Merge branch 'master' into proxy

Joey Hess 2024-06-17 09:29:34 -04:00
commit 3970bbb03b
GPG key ID: DB12DB0FF05F8F38
9 changed files with 420 additions and 8 deletions

@@ -7,6 +7,8 @@ git-annex (10.20240532) UNRELEASED; urgency=medium
   * Fix a bug where interrupting git-annex while it is updating the
     git-annex branch for an export could later lead to git fsck
     complaining about missing tree objects.
+  * Tab completion of options like --from now includes special remotes,
+    as well as proxied remotes and clusters.
   * Fix Windows build with Win32 2.13.4+
     Thanks, Oleg Tolmatcev

@@ -40,6 +40,7 @@ import qualified Types.Backend as Backend
 import Utility.HumanTime
 import Utility.DataUnits
 import Annex.Concurrent
+import Remote.List

 -- Options that are accepted by all git-annex sub-commands,
 -- although not always used.
@@ -569,14 +570,30 @@ parseDaemonOptions canstop
 	)

 completeRemotes :: HasCompleter f => Mod f a
-completeRemotes = completer $ mkCompleter $ \input -> do
-	r <- maybe (pure Nothing) (Just <$$> Git.Config.read)
-		=<< Git.Construct.fromCwd
-	return $ filter (input `isPrefixOf`) $
-		mapMaybe remoteKeyToRemoteName $
-			filter isRemoteUrlKey $
-				maybe [] (M.keys . config) r
+completeRemotes = completer $ mkCompleter $ \input ->
+	Git.Construct.fromCwd >>= \case
+		Nothing -> return []
+		Just g -> completeRemotes' g input
+
+completeRemotes' :: Repo -> [Char] -> IO [[Char]]
+completeRemotes' g input = do
+	g' <- Git.Config.read g
+	state <- Annex.new g'
+	Annex.eval state $ do
+		Annex.setOutput QuietOutput
+		gc <- Annex.getGitConfig
+		if isinitialized gc
+			then do
+				rs <- remoteList
+				matches $ map Remote.name rs
+			else matches $
+				mapMaybe remoteKeyToRemoteName $
+					filter isRemoteUrlKey $
+						M.keys $ config g
+  where
+	isinitialized gc = annexUUID gc /= NoUUID && isJust (annexVersion gc)
+	matches = return . filter (input `isPrefixOf`)

 completeBackends :: HasCompleter f => Mod f a
 completeBackends = completeWith $
 	map (decodeBS . formatKeyVariety . Backend.backendVariety) Backend.builtinList

@@ -0,0 +1,93 @@
### Please describe the problem.
With an external special remote that handles a custom URL scheme, I receive a "Verification of content failed" error on the first `git annex get` of a file (i.e. when git-annex cannot yet know a checksum for the file).
Sorry that this is hidden behind a bit of indirection in a datalad extension. What it does is effectively just implement an external special remote that handles `cds:` URLs and then run `git annex addurl --fast --verifiable` on those URLs. I get the same verification error with `--relaxed` instead of `--fast` (though I would like to have the semantics of `--fast`, i.e. record the checksum on first download and then always check against it).
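The semantics I am after (record a checksum on the first download, verify against it on every later one) can be sketched as a small shell function. This is only an illustration of the desired behaviour, not how git-annex implements VURL keys; the name `verify_or_record` and its record-file argument are hypothetical, and `sha256sum` from GNU coreutils is assumed to be available.

```sh
# verify_or_record FILE RECORD
# Hypothetical helper illustrating trust-on-first-use verification.
# If RECORD is missing or empty, store FILE's SHA-256 in it and succeed.
# Otherwise succeed only if FILE's SHA-256 matches the stored value.
verify_or_record() {
    [ -r "$1" ] || return 2
    sum=$(sha256sum < "$1" | cut -d' ' -f1)
    if [ -s "$2" ]; then
        [ "$sum" = "$(cat "$2")" ]
    else
        printf '%s\n' "$sum" > "$2"
    fi
}
```

Under this sketch the very first download can never fail verification, because there is nothing recorded to compare against yet.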
### What steps will reproduce the problem?
Install datalad, and datalad-cds from this PR: <https://github.com/matrss/datalad-cds/pull/16>. Then:
[[!format sh """
datalad create test-ds
cd test-ds/
datalad download-cds --lazy --path download.grib '{
"dataset": "reanalysis-era5-pressure-levels",
"sub-selection": {
"variable": "temperature",
"pressure_level": "1000",
"product_type": "reanalysis",
"date": "2017-12-01/2017-12-31",
"time": "12:00",
"format": "grib"
}
}'
git annex get download.grib
"""]]
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20240430
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.5 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
```
on Ubuntu, installed from a recent version of nixpkgs. Also happens in CI (see PR in datalad-cds) where git-annex is installed from NeuroDebian.
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ datalad create test-ds
create(ok): <...> (dataset)
$ cd test-ds/
$ datalad download-cds --lazy --path download.grib '{
"dataset": "reanalysis-era5-pressure-levels",
"sub-selection": {
"variable": "temperature",
"pressure_level": "1000",
"product_type": "reanalysis",
"date": "2017-12-01/2017-12-31",
"time": "12:00",
"format": "grib"
}
}'
save(ok): . (dataset)
cds(ok): <...> (dataset)
$ git annex info download.grib
file: download.grib
size: 0 bytes (+ 1 unknown size)
key: VURL--cds:v1-eyJkYXRhc2V0IjoicmVhbmFs-77566133ebfe9220aefbeed5a58b6972
present: false
$ git annex get download.grib
get download.grib (from cds...)
CDS request is submitted
CDS request is completed
Starting download from CDS
(checksum...)
Verification of content failed
Unable to access these remotes: cds
No other repository is known to contain the file.
failed
get: 1 failed
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

@@ -0,0 +1,88 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2024-06-11T17:36:51Z"
content="""
Interestingly, on the client `git restore --staged PATH` managed to make the link \"proper\" again. `git-annex restage`, however, did nothing to fix the situation with the `Modified` file:
```
[bids@rolando VIDS] > git merge --ff-only synced/master
Updating b4f3af57..263dad67
Updating files: 100% (871/871), done.
Fast-forward
.gitattributes | 1 +
.gitignore
...
create mode 100644 logs/2024-05-24T07:35-04:00.log
create mode 100644 logs/2024-05-24T07:35-04:00.logpwd
git-annex: git status will show Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log to be modified, since content availability has changed and git-annex was unable to update the index. This is only a cosmetic problem affecting git status; git add, git commit, etc won't be affected. To fix the git status display, you can run: git-annex restage
[bids@rolando VIDS] >
[bids@rolando VIDS] >
[bids@rolando VIDS] >
[bids@rolando VIDS] > git-annex restage
restage ok
[bids@rolando VIDS] > git status
On branch master
Changes to be committed:
(use \"git restore --staged <file>...\" to unstage)
modified: Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
[bids@rolando VIDS] > git-annex restage
restage ok
[bids@rolando VIDS] > git status
On branch master
Changes to be committed:
(use \"git restore --staged <file>...\" to unstage)
modified: Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
[bids@rolando VIDS] > git-annex restage Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
git-annex: This command takes no parameters.
[bids@rolando VIDS] > git status
On branch master
Changes to be committed:
(use \"git restore --staged <file>...\" to unstage)
modified: Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
[bids@rolando VIDS] > git restore --staged Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
[bids@rolando VIDS] > git status
On branch master
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
no changes added to commit (use \"git add\" and/or \"git commit -a\")
[bids@rolando VIDS] > git diff
diff --git a/Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log b/Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
index 92b79020..fc930f54 100644
--- a/Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
+++ b/Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
@@ -1 +1 @@
-/annex/objects/MD5E-s69--08983cc11522233e5d4815e4ef62275a.mkv.log
+/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log
diff --git a/Videos/2024/04/2024.04.04.06.01.22.647_.mkv.log b/Videos/2024/04/2024.04.04.06.01.22.647_.mkv.log
--- a/Videos/2024/04/2024.04.04.06.01.22.647_.mkv.log
+++ b/Videos/2024/04/2024.04.04.06.01.22.647_.mkv.log
@@ -1 +0,0 @@
-/annex/objects/MD5E-s0--d41d8cd98f00b204e9800998ecf8427e.mkv.log
[bids@rolando VIDS] > git log Videos/2024/03/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
commit ef5549f74dfea19c11bf963a7ec9789bce0d925d
Author: ReproStim User <changeme@example.com>
Date: Wed Apr 17 09:38:23 2024 -0400
Move files under subfolders
```
```
[bids@rolando VIDS] > git --version
git version 2.39.2
[bids@rolando VIDS] > git annex version --raw
10.20231129+git83-g86dbe9a825-1~ndall+1
```
"""]]

@@ -0,0 +1,66 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-06-13T16:31:57Z"
content="""
First I wanted to see if I could get this to happen without the assistant.
joey@darkstar:~/tmp/y>echo '/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4' > new
joey@darkstar:~/tmp/y>git annex add new
add new ok
joey@darkstar:~/tmp/y>git annex find --format='${key}\n' new
SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4
joey@darkstar:~/tmp/y>git config annex.largefiles anything
joey@darkstar:~/tmp/y>echo '/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4' > new2
joey@darkstar:~/tmp/y>git add new2
joey@darkstar:~/tmp/y>git annex find --format='${key}\n' new2
SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4
So no, it must be only the assistant that can mess up and add an annexed
link to the annex.
Secondly, here's a way to manually create a repository with this behavior
w/o using the assistant.
joey@darkstar:~/tmp/y>git remote add z ../z
joey@darkstar:~/tmp/y>git-annex move --key SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4 --to z
joey@darkstar:~/tmp/y>echo '/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4' > funkyobj
joey@darkstar:~/tmp/y>git-annex setkey WORM--foo funkyobj
setkey funkyobj ok
joey@darkstar:~/tmp/y>echo '/annex/objects/WORM--foo' > funky
joey@darkstar:~/tmp/y>git add funky
git-annex: git status will show funky to be modified, since content availability has changed and git-annex was unable to update the index. This is only a cosmetic problem affecting git status; git add, git commit, etc won't be affected. To fix the git status display, you can run: git-annex restage
joey@darkstar:~/tmp/y>git commit -m add funky
joey@darkstar:~/tmp/y>git annex find --format='${key}\n' funky
WORM--foo
joey@darkstar:~/tmp/y>cat funky
/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4
joey@darkstar:~/tmp/y>git-annex get funky
joey@darkstar:~/tmp/y>
Nothing has gone wrong yet: funky is an unlocked file that happens to have
the content of an annex pointer file, but git-annex is not treating that
content *as* an annex pointer file. If it were, the `git-annex get funky` above
would get the SHA256 key from remote z.
But in a fresh clone, it's another story:
joey@darkstar:~/tmp>git clone y x
joey@darkstar:~/tmp>cd x
joey@darkstar:~/tmp/x>git remote add z ../z
joey@darkstar:~/tmp/x>cat funky
/annex/objects/WORM--foo
joey@darkstar:~/tmp/x>git-annex get funky
get funky (from origin...)
ok
(recording state in git...)
joey@darkstar:~/tmp/x>git-annex get funky
get funky (from z...)
ok
(recording state in git...)
joey@darkstar:~/tmp/x>cat funky
Thu Jun 13 12:30:17 JEST 2024
Which reproduces what you showed. I think this on its own is a bug, leaving aside whatever caused the assistant to generate this.
"""]]

@@ -0,0 +1,38 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2024-06-13T17:07:02Z"
content="""
`git-annex add` (and smudge) use `isPointerFile` to check if a file that is
being added is an annex pointer file. And in that case they stage the
pointer file, rather than injecting it into the annex.
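To make the check concrete, here is a rough shell sketch of what "looks like an annex pointer file" means. It is a simplified stand-in, not the actual `isPointerFile` logic (which has more rules than this); the function name `pointer_key` is hypothetical.

```sh
# pointer_key FILE
# Hypothetical, simplified stand-in for the isPointerFile check: if the
# first line of FILE starts with /annex/objects/ followed by a key,
# print the key and succeed; otherwise fail.
pointer_key() {
    first_line=''
    [ -f "$1" ] || return 1
    # read fails on a final line without a newline but still fills the variable
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case "$first_line" in
        /annex/objects/?*) printf '%s\n' "${first_line#/annex/objects/}" ;;
        *) return 1 ;;
    esac
}
```

A caller would stage the file as-is when `pointer_key` succeeds, and ingest its content into the annex otherwise.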
The assistant also checks `isPointerFile` though. And in the simple case,
it also commits a newly added pointer file correctly:
joey@darkstar:~/tmp/b2/a>git-annex assistant
joey@darkstar:~/tmp/b2/a>echo '/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4' > new
joey@darkstar:~/tmp/b2/a>git show|tail -n 1
+/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4
So this makes me think of a race condition. What if the file is not a pointer
file when the assistant checks `isPointerFile`, but then gets turned into
one before the assistant ingests it?
In `git-annex add`, it first stats the file before checking if it's a pointer
file, and later it checks if the file has changed while it was being added,
which should avoid such races.
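The stat-then-recheck pattern can be sketched in shell roughly like this. It is only an illustration under stated assumptions: the helper names `file_sig` and `unchanged_since` are hypothetical, GNU coreutils `stat` (the `-c` flag) is assumed, and the real check in git-annex is more thorough than comparing these three fields.

```sh
# file_sig FILE
# Print a signature for FILE: inode, size and mtime (in seconds).
# Assumes GNU coreutils stat; BSD stat uses -f with other format codes.
file_sig() {
    stat -c '%i %s %Y' -- "$1"
}

# unchanged_since FILE SIG
# Succeed only if FILE still has the signature recorded earlier.
unchanged_since() {
    [ "$(file_sig "$1")" = "$2" ]
}
```

The caller records `sig=$(file_sig new)` first, then inspects and ingests the file, and finally only trusts the result if `unchanged_since new "$sig"` succeeds. A change that keeps inode, size and mtime identical would slip past this sketch.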
Looking at the assistant, I'm not at all confident it handles such a race.
It might even be another thread of the assistant that triggered the race.
Could be that something caused the assistant to drop the file,
then get it again, then drop it again. (Eg something wrong with
configuration causing a non-stable state... like "not present" in preferred
content).
I've tried running a get/drop/get/drop loop while the assistant is running,
and have not seen this happen to a file yet. But the race window is probably small.
An interesting thing I did notice is that sometimes when such a loop runs for a while,
the file will be left as a pointer file after `git-annex get`.
"""]]

@@ -0,0 +1,91 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2024-06-13T18:01:01Z"
content="""
Looking at the behavior of `git-annex get`, the first one leaves the index
in a state where it differs from HEAD:
joey@darkstar:~/tmp/b2/x>git-annex get funky
get funky (from origin...)
ok
(recording state in git...)
joey@darkstar:~/tmp/b2/x>git diff --cached
diff --git a/funky b/funky
index a8813f1..9488a18 100644
--- a/funky
+++ b/funky
@@ -1 +1 @@
-/annex/objects/WORM--foo
+/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4
To the second `git-annex get`, this is indistinguishable from a different
unlocked file having been moved over top of funky. So the behavior of the
second one is fine.
The problem is with the first `git-annex get` leaving the index in that state.
What's happening is, it doesn't restage the index, because the restage
itself can't tell the difference between this state and an unlocked file having
been moved over top of funky. In particular, `git update-index --refresh --stdin`
when run after the first `git-annex get`, and fed "funky", leaves the index in diff state.
joey@darkstar:~/tmp/b2/x>touch funky
joey@darkstar:~/tmp/b2/x>echo funky | GIT_TRACE=1 git update-index --refresh --stdin
14:14:33.911458 git.c:465 trace: built-in: git update-index --refresh --stdin
14:14:33.911759 run-command.c:657 trace: run_command: 'git-annex filter-process'
14:14:33.917118 git.c:465 trace: built-in: git config --null --list
14:14:33.919641 git.c:465 trace: built-in: git show-ref git-annex
14:14:33.921390 git.c:465 trace: built-in: git show-ref --hash refs/heads/git-annex
14:14:33.925579 git.c:465 trace: built-in: git cat-file --batch
14:14:33.927011 run-command.c:50 trace: run_command: running exit handler for pid 1164525
joey@darkstar:~/tmp/b2/x>git status --short
M funky
So git update-index is running `git-annex filter-process`, which is doing
the same as `git-annex smudge --clean funky` in this case.
And in Command.Smudge.clean, there is a `parseLinkTargetOrPointerLazy'` call
which is intended to avoid storing a pointer file in the annex... The very
thing that the assistant is somehow incorrectly doing. In this case
though, that notices that funky's content looks like an annex pointer file,
so it outputs that pointer. So git stages that pointer.
To avoid this, the first `git-annex get` would need to notice that the
content it got looks like a pointer file. And it would need to communicate
that through the `git update-index` somehow to `git-annex filter-process`. Then
when that saw the same pointer file, it could output the original key, and
this situation would be avoided. Also bear in mind that the
`git update-index` can be interrupted and get restarted later and
it would still need to remember that it was dealing with this case then.
This seems... doable, but it will not be easy.
PS, Full script to synthesize a repository with this situation follows:
git init z
cd z
git-annex init
git commit --allow-empty -m created
cd ..
git clone z y
cd y
git-annex init
echo 'Thu Jun 13 12:30:17 JEST 2024' > foo
git-annex add foo
git commit -m added
git-annex move foo --to origin
git rm foo
git commit -m removed
echo '/annex/objects/SHA256E-s30--93c16dbf65b7b66e479bd484398c09c920338e4a1df1fe352b245078d04645f4' > funkyobj
git-annex setkey WORM--foo funkyobj
echo '/annex/objects/WORM--foo' > funky
git add funky
git commit -m add\ funky
git annex find --format='${key}\n' funky
git-annex get funky
cd ..
git clone y x
cd x
git remote add z ../z
git-annex get funky
git-annex get funky
"""]]

@@ -118,3 +118,5 @@ Stephen Seo,
 Antoine Balaine,
 mycroft,
 Lerrr,
+Eve,
+Marco,

@@ -0,0 +1,15 @@
```
NAME
git-annex-log - shows location log information
SYNOPSIS
git annex log [path ...]
```
Currently `git annex log` only accepts paths, although it is quite often desirable to look up the log by key, which might not even be in the tree. `whereis` (a sister command for similar investigations) has `--key`, so I thought it would be great to have it here too.
In my case -- doing archaeology on AFNI's test data in [https://github.com/afni/afni/pull/656](https://github.com/afni/afni/pull/656).
[[!meta author=yoh]]
[[!tag projects/repronim]]