Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-12-04 11:15:25 -04:00
commit 458b3d8e52
GPG key ID: DB12DB0FF05F8F38
6 changed files with 271 additions and 0 deletions

@ -0,0 +1,57 @@
### Please describe the problem.
When running `git annex sync` with new changes from a remote (i.e. synced/main and/or some_remote/main is ahead of our main), git-annex appears to try to take a lock at least twice. With pidlock, this of course isn't possible, so somewhere around a `git merge`, we get the following message:
```
waiting for pid lock file .git/annex/pidlock which is held by another process (or may be stale)
```
When I inspect the content of the pidlock, the `git-annex-sync` process has the lock.
Manually running `git merge <ref that is ahead>` and then `git annex sync` doesn't have this issue, so it seems related to merging changes to the main branch (not the git-annex branch).
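Checking which process holds the lock can also be scripted. The following is a minimal sketch and not part of git-annex: the helper names are hypothetical, and it assumes only that the first run of digits in `.git/annex/pidlock` is the PID of the holder, which may not be true for every git-annex version.

```shell
# Hypothetical helpers; the pidlock file format is an assumption here:
# the first run of digits in the file is taken to be the holder's PID.
pidlock_holder() {
    # $1: path to the pidlock file, e.g. .git/annex/pidlock
    grep -oE '[0-9]+' "$1" | head -n 1
}

pidlock_holder_alive() {
    holder="$(pidlock_holder "$1")"
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # It also fails for a live process owned by another user, and it says
    # nothing about a holder running on another host that shares the NFS mount.
    [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null
}

# Example (run from the top of the repository):
#   pidlock_holder .git/annex/pidlock
#   pidlock_holder_alive .git/annex/pidlock && echo "holder is running"
```

If the holder turns out to be the `git-annex-sync` process that is itself waiting, that would be consistent with the process blocking on its own lock.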
### What steps will reproduce the problem?
I've really struggled to find a minimal reproducer, but I've hit this bug with several large real-world repos (@joeyh, I would be more than happy to give private access to one of these if you think it would be useful for debugging).
The latest time this happened, this was the full log:
```
$ git annex sync
pull origin
Updating 130dffc63..f8889be0c
waiting for pid lock file .git/annex/pidlock which is held by another process (or may be stale)
#### hangs indefinitely ######
^C
$ git merge origin/main
Updating 130dffc63..f8889be0c
Updating files: 100% (223/223), done.
Fast-forward
.gitignore
<and then quite a large diff, including many files created/deleted>
$ git annex sync
# merges git-annex branch and pushes to all remotes successfully
```
Sometimes, but not always, it seems that a git merge updates the files on disk, but not the git index, leading to an inconsistent state where I have the working tree of the latest commit, but git believes I'm still on the older HEAD and shows the diff as unstaged changes. In these cases one must `git reset --hard HEAD && git clean -df` to clear the state back to HEAD, and then git merge manually, and only then will git annex sync behave as expected.
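The manual recovery described above can be wrapped in a small shell function. This is only a sketch of the workaround, not a fix: the function name and its `--reset` argument are hypothetical, and it uses just the commands already mentioned (`git reset --hard HEAD`, `git clean -df`, `git merge`, `git annex sync`).

```shell
# Hypothetical wrapper around the manual workaround; not part of git-annex.
# Usage: annex_sync_workaround <ref-that-is-ahead> [--reset]
annex_sync_workaround() {
    ref="$1"
    if [ "${2:-}" = "--reset" ]; then
        # WARNING: this throws away uncommitted changes and deletes untracked
        # files. Only use it when the working tree is in the inconsistent
        # state described above.
        git reset --hard HEAD || return 1
        git clean -df || return 1
    fi
    # Merge by hand first, then let git-annex sync the rest.
    git merge "$ref" || return 1
    git annex sync
}

# Example:
#   annex_sync_workaround origin/main
#   annex_sync_workaround origin/main --reset
```

The function stops at the first failing step, so a failed merge never reaches `git annex sync`.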
### What version of git-annex are you using? On what operating system?
This issue seems to only exist on versions 10.xxxx, and I remember first running into this a bit over a year ago (I first assumed that it was user error, but I've since had it occur quite a few times where it can't be, e.g. freshly logging into a server that was just restarted). At least the following versions are affected:
* git-annex version: 10.20220526-gc6b112108
* git-annex version: 10.20230803-gb2887edc9
* git-annex version: 10.20230926-g44a7b4c9734adfda5912dd82c1aa97c615689f57
This is on various Linux distributions, mostly a few years old, as these are institutional supercomputing clusters (Ubuntu 20.04, Debian 10, SLES 15.4).
### Please provide any additional information below.
This only affects clones with pidlock enabled (on compute clusters with NFS filesystems); the same repo on a laptop or any other machine with a standard local filesystem (e.g. ext4, xfs) works perfectly.
Could this be caused by e.g. `git annex sync` running `git merge`, which runs `git annex filter-process` (directly or via `git status`), with the filter process then trying to take the pidlock that the `git annex sync` process already holds?
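To help test that hypothesis, the settings involved can be printed for an affected clone. This is a minimal sketch, assuming the relevant keys are `annex.pidlock` and the `filter.annex.*` entries that git-annex writes to the repository's git config; the function name is hypothetical.

```shell
# Hypothetical helper: print the git config keys that seem relevant to the
# pidlock and to the smudge/clean filter that git runs during a merge.
show_lock_related_config() {
    for key in annex.pidlock filter.annex.process filter.annex.smudge filter.annex.clean; do
        # git config --get exits non-zero when the key is not set.
        value="$(git config --get "$key")" || value="(unset)"
        printf '%s = %s\n' "$key" "$value"
    done
}

# Example (run inside the affected clone):
#   show_lock_related_config
```

Comparing the output between an NFS clone and a clone on a local filesystem may show whether the filter is configured in both and only the pidlock setting differs.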
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Lots! This problem popped up during our regular use of git-annex in plant genomic research, where we use git annex to manage and move our analyses between the many clusters we must use for computation. Git annex is indispensable for this use case!!

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/86b8c2d893dfdf2146e1bbb8ac4165fb"
subject="comment 4"
date="2023-12-03T21:11:19Z"
content="""
I'd flip that around; make `--fast` the default and add a `--full` flag to show full info. I rarely need it.
"""]]

@ -0,0 +1,20 @@
I seem to be having issues with annex.largefiles. I initialize git and the annex, set largefiles to put everything in the annex, generate a 1 MiB file, `git add` it, and commit it. The file is copied into .git/annex/objects and renamed to its hash value, but it also remains in the main directory instead of being replaced with a symlink. Here are my steps to reproduce the issue:

    git init
    git annex init
    git annex config --set annex.largefiles anything
    fallocate -l 1M test.bin
    git add test.bin
    git commit -a -m "Test"

I've also tried creating a .gitattributes file in the main directory with the following attribute:

    * annex.largefiles=anything

Still, nothing is symlinked.
It works just fine when I run `git annex add test.bin`. It puts the file in the annex and creates a symlink to it.
I've tried this on Fedora 39 with git annex version 10.20230626 and on Ubuntu 22.04.2 LTS with git annex version 8.20210223. These are both fresh machines that have never had git or git-annex run on them before.
What am I doing wrong here? Should I be filing a bug report?

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="kdm9"
avatar="http://cdn.libravatar.org/avatar/b7b736335a0e9944a8169a582eb4c43d"
subject="comment 1"
date="2023-12-04T10:09:15Z"
content="""
I think this is intended behavior when adding with `git add`, or at least it's what I've seen for long enough for me to have forgotten if it ever was different. `git annex add` will create symlinks, as will `git add && git annex lock`.
If this was actually a small file, you wouldn't see it hashed & copied under .git/annex/objects. You should also see in git log that the change is an addition of some git annex key, not a git blob diff as would be the case for a small file.
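A quick way to check this, as a rough sketch: for a file added with `git add` under `annex.largefiles`, the blob committed to git should be a short pointer beginning with `/annex/objects/` rather than the file's content. The function name below is made up for illustration, and the check only covers files committed in this unlocked form, not the symlinks created by `git annex add`.

```shell
# Hypothetical helper: succeeds if the version of a file committed in HEAD
# is an annex pointer (an unlocked annexed file) rather than regular content.
is_annex_pointer() {
    # $1: path relative to the top of the repository
    # "/annex/objects/" is 15 bytes long, so only that prefix is compared.
    git show "HEAD:$1" 2>/dev/null | head -c 15 | grep -q '^/annex/objects/$'
}

# Example:
#   is_annex_pointer test.bin && echo "annexed" || echo "stored in git"
```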
NB: I'm just another user, @joey please correct me if this is wrong
"""]]

@ -0,0 +1,156 @@
I'm trying to set up git-annex for syncing two clients using a transfer repository, all without the webapp UI.
Here's the reproducible scenario with a bash script:
```bash
#!/usr/bin/env bash
# Just a way to access the script's directory
cd "$(dirname "$0")"
DIR="$(pwd)"
# Create the 1st client repository
mkdir $DIR/client1
cd $DIR/client1
git init && git annex init
# Create the 2nd client repository
mkdir $DIR/client2
cd $DIR/client2
git init && git annex init
# Create the transfer repository
mkdir $DIR/share
cd $DIR/share
git init && git annex init
# Set up the remotes and groups for the transfer repository
cd $DIR/share
git remote add client1 $DIR/client1
git remote add client2 $DIR/client1
git annex group . transfer
git annex group client1 client
git annex group client2 client
git checkout -b main
# Set up the remotes and groups for the 1st client repository.
cd $DIR/client1
git remote add share $DIR/share
git annex group . client
git annex group share transfer
git checkout -b main
# Set up the remotes and groups for the 2nd client repository.
cd $DIR/client2
git remote add share $DIR/share
git annex group . client
git annex group share transfer
git checkout -b main
# Run git-annex assistant for each repository
cd $DIR/client1 && git annex assistant
cd $DIR/client2 && git annex assistant
cd $DIR/share && git annex assistant
# Add a single file to the 1st client.
cd $DIR/client1
echo "My first file" >> file.txt
```
Result:
client1: I see the auto-commit has been added for file.txt
share: I get the following daemon logs:
```
(scanning...) (started...)
From /home/xxx/git-annex-scenarios/share-between-clients/client1
* [new branch] git-annex -> client2/git-annex
(merging client2/git-annex into git-annex...)
From /home/xxx/git-annex-scenarios/share-between-clients/client1
* [new branch] git-annex -> client1/git-annex
merge: refs/remotes/client2/main - not something we can merge
merge: refs/remotes/client2/synced/main - not something we can merge
merge: refs/remotes/client1/main - not something we can merge
merge: refs/remotes/client1/synced/main - not something we can merge
(merging synced/git-annex into git-annex...)
(recording state in git...)
```
client2: I get the following daemon logs:
```
From /home/xxx/git-annex-scenarios/share-between-clients/share
* [new branch] git-annex -> share/git-annex
(merging share/git-annex into git-annex...)
(recording state in git...)
merge: refs/remotes/share/main - not something we can merge
merge: refs/remotes/share/synced/main - not something we can merge
```
Then, I thought that maybe I needed to do an initial `git pull` for each repository. So I tried adding to the bash script the following lines:
```bash
# Need to do this if there are no commits in the 'client2' and 'share' repositories.
# Or else, I'll get the following logs:
#
# merge: refs/remotes/share/main - not something we can merge
# merge: refs/remotes/share/synced/main - not something we can merge
sleep 3;
cd $DIR/share
git pull client1 main
sleep 3;
cd $DIR/client2
git pull share main
```
But I'm still getting the same error:
```
(scanning...) (started...)
From /home/xxx/git-annex-scenarios/share-between-clients/share
* [new branch] git-annex -> share/git-annex
(merging share/git-annex into git-annex...)
(recording state in git...)
merge: refs/remotes/share/main - not something we can merge
merge: refs/remotes/share/synced/main - not something we can merge
(recording state in git...)
To /home/kolam/git-annex-scenarios/share-between-clients/share
+ 28079ec...ca3c481 git-annex -> synced/git-annex (forced update)
Everything up-to-date
To /home/kolam/git-annex-scenarios/share-between-clients/share
+ 28079ec...ca3c481 git-annex -> synced/git-annex (forced update)
```
However, even though I have that error, `file.txt` now appears in `client2`.
But, the content of `file.txt` is:
```
/annex/objects/SHA256E-s14--14b99b7ab1e9777f7e1c2b482fe2cd95653c7cf35f459ef0b15bd0d75b2245c9.txt
```
and that link doesn't exist in my filesystem.
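That content has the shape of an annex pointer rather than the file itself. As a rough sketch (the function name is hypothetical, and the `BACKEND-sSIZE--NAME` key layout is inferred from the key shown above rather than taken from a specification), the size recorded in the key can be extracted and compared with the 14 bytes that `echo "My first file"` writes, newline included:

```shell
# Hypothetical helper: extract the size field from an annex pointer or key.
# Assumes a key of the form BACKEND-sSIZE--NAME, as in the key shown above.
annex_key_size() {
    key="${1##*/}"        # drop any leading path such as /annex/objects/
    rest="${key#*-s}"     # drop the backend name and the "-s" marker
    printf '%s\n' "${rest%%--*}"
}

# Example:
#   annex_key_size "$(head -n 1 file.txt)"
#   printf 'My first file\n' | wc -c     # size of the original content
```

If the two numbers agree, `client2` has received the pointer for the right file, and only the content itself has not arrived.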
Running `git annex whereis file.txt` in `client2` gives me:
```
whereis file.txt (0 copies) failed
whereis: 1 failed
```
So my questions are:
* did I miss something in the steps required to set up the repositories?
* is there some documentation outlining the steps to do so without the webapp?
* how can we enhance the UX for that scenario with better messages?
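
On the first question, one thing that may be worth checking: the `share` daemon log above shows `client2/git-annex` being fetched from the `client1` directory, which suggests that two remotes point at the same path. A minimal sketch for spotting that, assuming remote URLs contain no whitespace (the function name is hypothetical):

```shell
# Hypothetical helper: print every fetch URL that is used by more than one
# remote of the current repository. Prints nothing if all URLs are distinct.
duplicate_remote_urls() {
    # `git remote -v` prints lines of the form: NAME<TAB>URL (fetch|push)
    git remote -v | awk '$3 == "(fetch)" { print $2 }' | sort | uniq -d
}

# Example (run in the transfer repository):
#   cd "$DIR/share" && duplicate_remote_urls
```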

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="branch"
subject="comment 2"
date="2023-12-03T11:57:56Z"
content="""
There is no specific log to highlight when running the command with `--debug`.
```
[2023-12-03 12:43:49.274023] (Utility.Process) process [40369] done ExitSuccess
git-annex: git: createProcess: chdir: invalid argument (Bad file descriptor)
failed
[2023-12-03 12:43:49.276644] (Utility.Process) process [40197] done ExitSuccess
initremote: 1 failed
```
I ended up refactoring my systems to allow the use of SSH, which seems to be the supported method, and to avoid any further issue down the line.
"""]]