move old fixed datalad/dandi/repronim bugs to the project pages

As done previously in 2023 in commit bcc69f07e8

Commands used:

    for f in $(git grep -l '\[\[!tag projects/dandi\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/dandi/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/dandi/bugs-done; fi; fi; done
    for f in $(git grep -l '\[\[!tag projects/repronim\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/repronim/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/repronim/bugs-done; fi; fi; done
    for f in $(git grep -l '\[\[!tag projects/datalad\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/datalad/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/datalad/bugs-done; fi; fi; done
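
For readability, the same loop can be sketched as a shell function. This is an illustrative rewrite, not the commands that were run: `move_done_bugs` and `page_dir` are hypothetical names, and `page_dir` uses `${1%.mdwn}` in place of the `sed` call so that only a trailing `.mdwn` is removed.

```shell
#!/bin/sh
# Hypothetical helper: strip a trailing ".mdwn" to get the name of the
# directory that holds a page's comments.
page_dir() {
    printf '%s\n' "${1%.mdwn}"
}

# Hypothetical helper: move every closed bug tagged with project $1 into
# ../projects/$1/bugs-done. Meant to be run from the bugs directory,
# like the one-liners above.
move_done_bugs() {
    project=$1
    dest="../projects/$project/bugs-done"
    git grep -l "\[\[!tag projects/$project\]\]" | while IFS= read -r f; do
        # Only pages closed with a [[...|done]] link are moved.
        grep -q 'done\]\]' "$f" || continue
        git mv "$f" "$dest"
        d=$(page_dir "$f")
        # Move the comment directory too, if the page has one.
        if [ -d "$d" ]; then
            git mv "$d" "$dest"
        fi
    done
}
```

Under these assumptions, `move_done_bugs dandi` would correspond to the first command above.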
Joey Hess 2025-01-01 13:12:56 -04:00
parent 2fe36b35a2
commit 292acd3c28
108 changed files with 0 additions and 0 deletions

@ -1,42 +0,0 @@
### Please describe the problem.
For the past few days consistently
```
496 T Mar 06 GitHub Actions *-4.4* (4.8K/0) datalad/git-annex daily summary: 14 PASSED, 3 FAILED, 5 INCOMPLETE
935 T Mar 05 GitHub Actions *-4.3* (4.8K/0) datalad/git-annex daily summary: 14 PASSED, 3 FAILED, 5 INCOMPLETE
1471 N T Mar 04 GitHub Actions *-1.9* (4.8K/0) datalad/git-annex daily summary: 15 PASSED, 2 FAILED, 5 INCOMPLETE
1704 T Mar 03 GitHub Actions *-0.9* (4.8K/0) datalad/git-annex daily summary: 15 PASSED, 2 FAILED, 5 INCOMPLETE
2619 T Mar 01 GitHub Actions *-3.1* (6.5K/0) datalad/git-annex daily summary: 30 PASSED
2935 O T Feb 28 GitHub Actions *-3.8* (6.5K/0) datalad/git-annex daily summary: 30 PASSED
```
[sample build on OSX](https://github.com/datalad/git-annex/actions/runs/4320138939/jobs/7540059666) says
```
Utility/RawFilePath.hs:40:1: error:
Could not load module System.Posix.Files.ByteString
It is a member of the hidden package unix-2.7.2.2.
You can run :set -package unix to expose it.
(Note: this unloads all the modules in the current scope.)
Use -v (or `:set -v` in ghci) to see a list of the files searched for.
|
40 | import System.Posix.Files.ByteString
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Utility/RawFilePath.hs:41:1: error:
Could not load module System.Posix.Directory.ByteString
It is a member of the hidden package unix-2.7.2.2.
You can run :set -package unix to expose it.
(Note: this unloads all the modules in the current scope.)
Use -v (or `:set -v` in ghci) to see a list of the files searched for.
|
41 | import qualified System.Posix.Directory.ByteString as D
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-08T16:17:53Z"
content="""
Windows was already fixed.
OSX got further than that after some fixes on Monday. I've fixed
one more build problem on it which may get it to build again.
"""]]

@ -1,68 +0,0 @@
### Please describe the problem.
Unable to addurl to a `file:///` on Windows
1. doesn't understand `file:///C:/`
2. with `file://C:/` blows with permission denied:
[[!format sh """
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex addurl --file buga file:///C:/123
addurl file:///C:/123
download failed: /C:/123: openBinaryFile: invalid argument (Invalid argument)
failed
git-annex: addurl: 1 failed
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex addurl --file buga file://C:/123
addurl file://C:/123
(to buga)
git-annex: .git\annex\tmp\URL-s6--file&c%%C&c%123: renameFile:renamePath:MoveFileEx "\\\\?\\C:\\Users\\appveyor\\
AppData\\Local\\Temp\\1\\datalad_temp_testrepo_tmphjl88\\.git\\annex\\tmp\\URL-s6--file&c%%C&c%123" Just "\\\\?\\
C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_testrepo_tmphjl88\\buga": permission denied (The proce
ss cannot access the file because it is being used by another process.)
failed
git-annex: addurl: 1 failed
"""]]
here are some relevant details (also showing curl handling both file:// and file:///):
[[!format sh """
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git status
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex version
git-annex version: 7.20181205-g51d6f38b1
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV TorrentParser Feeds Testsuite
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client
-0.5.7.1 persistent-sqlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3
_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B51
2E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S
160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM
URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
operating system: mingw32 i386
supported repository versions: 5 7
upgrade supported from repository versions: 2 3 4 5 6
local repository version: 7
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git status
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>curl file://C:/123
124
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>curl file:///C:/123
124
"""]]
More information about this appveyor server could be obtained from [datalad wtf](http://paste.debian.net/1055359/) output
A while back we [had a related discussion](https://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/) but at least `addurl` seemed to work then.
[[!meta author=yoh]]
[[!tag projects/repronim]]
> [[fixed|done]] --[[Joey]]
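
A minimal sketch of how the two URL spellings in this report could be mapped to the same Windows path. `file_url_to_path` is a hypothetical helper used only for illustration, not git-annex's implementation, and it ignores percent-encoding and host components.

```shell
#!/bin/sh
# Hypothetical helper: turn a file:// URL into a local path.
# Assumption: a leading "/" directly before a drive letter ("/C:/...")
# is not part of the Windows path and is dropped.
file_url_to_path() {
    p=${1#file://}
    case $p in
        /[A-Za-z]:/*) p=${p#/} ;;
    esac
    printf '%s\n' "$p"
}
```

With this sketch, `file:///C:/123` and `file://C:/123` both yield `C:/123`, while a POSIX-style URL such as `file:///tmp/123` keeps its leading slash.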

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-27T16:27:00Z"
content="""
I tried this on windows, and the second command succeeds now.
The first command still fails as shown.
At this point, what's left of this bug seems to be the same as
<https://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/>
"""]]

@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-03-27T17:57:12Z"
content="""
Ok, put in an ugly hack to fix this.
"""]]

@ -1,62 +0,0 @@
### Please describe the problem.
Somewhat too late for our current usecase since older git-annex would not know about it, but I think it could be generalized into adding a configuration variable right away for **any** automated migration. E.g. there is no variable to prevent autoupgrades of the repos (e.g. from v5 to the next one etc), but AFAIK there is none for automated conversion into `adjusted/master(unlocked)` mode.
Rationale: With thaw/freeze commands we now can use git-annex in indirect (default) mode on our HPC. But that requires a recent version of git-annex. User might have some other (older) version of git-annex available system-wide by default, and if the user forgets to switch to new version of git-annex before using it, it might trigger git-annex to realize that it operates on crippled FS, and since not knowing about thaw/freeze -- it just migrates repository to adjusted, which is very undesired.
### What steps will reproduce the problem?
here is a demo of older git-annex going back to adjusted branch mode... yet to discover how else we could have migrated without directly invoking `git annex init`:
```
[d31548v@discovery7 d31548v]$ mkdir repo
[d31548v@discovery7 d31548v]$ cd repo
[d31548v@discovery7 repo]$ git init
Initialized empty Git repository in /dartfs/rc/lab/D/DBIC/DBIC/d31548v/repo/.git/
[d31548v@discovery7 repo]$ git config --add annex.thawcontent-command "$HOME/bin-annex/thaw-content %path"
[d31548v@discovery7 repo]$ git config --add annex.freezecontent-command "$HOME/bin-annex/freeze-content %path"
[d31548v@discovery7 repo]$ git annex init
init ok
(recording state in git...)
[d31548v@discovery7 repo]$ echo 123 > 123
[d31548v@discovery7 repo]$ git annex add 123
add 123
ok
(recording state in git...)
[d31548v@discovery7 repo]$ git commit -m 'added 123 in indirect mode' 123
[master (root-commit) 3ceb200] added 123 in indirect mode
1 file changed, 1 insertion(+)
create mode 120000 123
[d31548v@discovery7 repo]$ ls -ld 123
lrwxr-x--- 1 d31548v rc-DBIC 178 May 6 11:19 123 -> .git/annex/objects/G6/qW/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
[d31548v@discovery7 repo]$ ls -l .git/annex/objects
total 3
drwxr-x--- 3 d31548v rc-DBIC 20 May 6 11:19 G6
[d31548v@discovery7 repo]$ export PATH=/opt/bin:$PATH
[d31548v@discovery7 repo]$ git annex version | head -n 1
git-annex version: 8.20200502-g55acb2e52
[d31548v@discovery7 repo]$ git annex drop 123
drop 123
git-annex: failed to lock content: .git/annex/objects/G6/qW/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b: openFd: permission denied (Permission denied)
failed
git-annex: drop: 1 failed
[d31548v@discovery7 repo]$ git annex init
init
Filesystem allows writing to files whose write bit is not set.
Detected a crippled filesystem.
Disabling core.symlinks.
(scanning for unlocked files...)
Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.
Switched to branch 'adjusted/master(unlocked)'
ok
```
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[notabug|done]] per comments --[[Joey]]

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-05-09T15:00:47Z"
content="""
git-annex does not enter adjusted branch mode except on `git-annex
init` or when you explicitly tell it to. The only exception to this that I
can find is that upgrading from a v5 repository that was in direct mode
will enter an adjusted branch.
Switching back from an adjusted branch to master is a simple `git
checkout`.
These two facts do not argue for a separate config setting IMHO.
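
As a rough illustration of the `git checkout` step mentioned above, the original branch name can be derived from the adjusted branch name. `original_branch` is a hypothetical helper, and it assumes the `adjusted/NAME(ADJUSTMENT)` naming seen in this report.

```shell
#!/bin/sh
# Hypothetical helper: map "adjusted/master(unlocked)" back to "master".
# Assumption: adjusted branches are named adjusted/NAME(ADJUSTMENT).
original_branch() {
    b=${1#adjusted/}
    b=${b%%"("*}
    printf '%s\n' "$b"
}
```

Inside a repository this could be used as `git checkout "$(original_branch "$(git symbolic-ref --short HEAD)")"`.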
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-05-09T15:13:39Z"
content="""
It might be worth preventing `git-annex init` when in an existing, already
initialized repo from entering an adjusted branch. But re-running `git-annex
init` generally re-does initialization, except for generating a new UUID
and description. If a repo has been moved to a crippled filesystem,
I think it would be reasonable for a user to expect re-running git-annex
init will react to that. (Which can also involve setting annex.pidlock or
disabling annex.sshcaching.)
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2023-02-17T21:50:19Z"
content="""
here my \"concern\" was freeze/thawing procedure. I am yet to get to the bottom of \"variance\" of how differently different ACL paths behave (some exploration from this friday is [here](https://github.com/dbic/handbook/issues/20)). And what I am afraid is that at some point, something would \"trigger\" git-annex to decide that path here is crippled now -- go to adjusted branches mode. If you say it cannot happen, it is ok - I will become more brave ;-)
"""]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-02-20T18:40:00Z"
content="""
Well, if you unset annex.version, it will automatically reinitialize, and
would enter an adjusted branch if a crippled filesystem was detected.
That's the only way I can see that does not involve you running
`git-annex init` (or upgrade from v5 direct mode as mentioned earlier).
"""]]

@ -1,58 +0,0 @@
### Please describe the problem.
Here is a reproducer
```
#!/bin/bash
export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
mkdir d-in d-repo
echo content >| d-in/file
function dance() {
git annex import master --from d-in
# but we need to merge it
git merge d-in/master
ls -l
grep -e . *
}
cd d-repo
git init
git annex init
git annex initremote d-in type=directory directory=../d-in exporttree=yes importtree=yes encryption=none
git config annex.addunlocked true
ls -l ../d-in
dance
echo "sample" > samplefile
git annex add samplefile
git commit -m 'Committing explicitly samplefile'
ls -l samplefile
git show
dance
```
which, even with a very fresh annex 10.20240831+git21-gd717e9aca0-1~ndall+1, shows that files obtained via `annex import` are not added unlocked, whereas those which are `git annex add`ed directly are:
```
> ls -l
total 8
lrwxrwxrwx 1 yoh yoh 178 Sep 11 16:45 file -> .git/annex/objects/zm/2W/SHA256E-s8--434728a410a78f56fc1b5899c3593436e61ab0c731e9072d95e96db290205e53/SHA256E-s8--434728a410a78f56fc1b5899c3593436e61ab0c731e9072d95e96db290205e53
-rw-rw-r-- 1 yoh yoh 7 Sep 11 16:45 samplefile
```
IMHO behavior of `import` should respect setting of `annex.addunlocked`.
This was to consider using `import` for a folder with DANDI stats. For now I will just add them directly.
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-12-18T19:24:38Z"
content="""
Turns out that while `git-annex import` from a directory does support
addunlocked, this was forgotten about when implementing the newer special
remote tree import.
I agree that this should be supported.
"""]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-12-19T15:31:59Z"
content="""
Note that for --no-content imports, it will not be possible for mimetype=
and mimeencoding= expressions to match.
So if addunlocked is set to such an expression, it will not match and will
add the file locked. Does not seem like a blocker.
"""]]

@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-12-19T15:34:48Z"
content="""
Implemented this.
"""]]

@ -1,26 +0,0 @@
### Please describe the problem.
Familiarizing myself more with adjusted branches mode and might be doing smth wrong. But in this http://www.oneukrainian.com/tmp/case-20230630.tgz case I observe that `annex sync` simply updates `master` to some prior state, thus possibly silently causing a data loss for me if I don't spot it:
```
tar -xzf case-20230630.tgz
cd case
content.html@ datasets.datalad.org/ subfolder/
( source ~/git-annexes/10.20230626+git13-g029d12815c.env; git annex version | head -n 1; git describe master; git checkout 'adjusted/master(unlocked)'; git annex sync ; git describe master; )
git-annex version: 10.20230626+git13-g029d12815c-1~ndall+1
0.0.0-2-gf34191a
Switched to branch 'adjusted/master(unlocked)'
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
ok
0.0.0-1-gde710c5
```
PS investigation of adjusted/unlocked came up in ReproNim context where people wanted a "hard copy" of the fmriprep results without symlinks to simplify navigation of the results in the browser, which otherwise due to browser resolving symlinks makes it hard and require a workaround like starting a webserver [as we documented in dbic handbook](https://dbic-handbook.readthedocs.io/en/latest/datalad.html#how-to-view-mriqcfmriprepetc-dataladified-results-in-a-browser)
[[!meta author=yoh]]
[[!tag projects/repronim]]
> [[fixed|done]] --[[Joey]]

@ -1,33 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-07-05T19:49:19Z"
content="""
Simplified test case:

    git init tc
    cd tc
    git-annex init
    echo 1 > foo
    git-annex add
    git commit -m add
    git annex adjust --unlock
    git checkout master
    rm foo
    echo 2 > foo
    git-annex add
    git commit -m "this commit will be lost"
    git checkout 'adjusted/master(unlocked)'
    git annex adjust --unlock # or git-annex sync
    git log master

What an unfortunate oversight! And it's not a reversion, it's been there
since the beginning of adjusted branches.
git-annex adjust should display a warning message in that situation,
since the original branch has diverged from the adjusted branch.
And git-annex sync should be able to resolve the divergence by
auto-merging the changes from the original branch into the adjusted
branch.
"""]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-07-05T21:01:53Z"
content="""
I've fixed the data loss part of this bug.
`git-annex sync` is able to resolve the divergence too. But for some
reason, the first time it's run after the divergence, it leaves it
diverged, and the second time it resolves it. That needs to be fixed.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-07-06T16:16:36Z"
content="""
Ok, fixed git-annex sync to immediately merge the changes from the original
branch into the adjusted branch.
"""]]

@ -1,36 +0,0 @@
### Please describe the problem.
I have not checked if it is legit to have an "empty" port number in the (http) URL but I see that `git` itself handles it fine, but git-annex is not happy:
```
git clone https://datasets.datalad.org:/dbic/QA/.git/
Cloning into 'QA'...
warning: redirecting to https://datasets.datalad.org/dbic/QA/.git/
remote: Enumerating objects: 61661, done.
remote: Counting objects: 100% (61661/61661), done.
remote: Compressing objects: 100% (23181/23181), done.
remote: Total 61661 (delta 31300), reused 56651 (delta 26299)
Receiving objects: 100% (61661/61661), 33.27 MiB | 25.03 MiB/s, done.
Resolving deltas: 100% (31300/31300), done.
git annex get sub-emmet/ses-20180508/anat/sub-emmet_ses-20180508_acq-MPRAGE_T1w.nii.gz
Remote origin not usable by git-annex; setting annex-ignore
https://datasets.datalad.org:/dbic/QA/.git//config download failed: Unsupported url scheme https://datasets.datalad.org:/dbic/QA/.git//config
get sub-emmet/ses-20180508/anat/sub-emmet_ses-20180508_acq-MPRAGE_T1w.nii.gz (from datasets.datalad.org...)
(scanning for annexed files...)
ok
(recording state in git...)
```
so it got the file only after enabling type=git special remote for the same location but with correct URL.
I think it would be nice if git-annex was as robust as git in such cases to avoid "late surprise".
Backstory: Happened to a user trying to access some NWB files on gin for DANDI project, here I used different/simpler/faster URL
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-02-10T17:04:51Z"
content="""
Not a legal url really, RFC 1738 says "If the port is omitted, the colon is as well."
But web browsers, curl, wget, etc do mostly seem to support it, so at least
Postel's law seems to apply.
Here's the root cause of it failing:

    ghci> parseRequest "https://datasets.datalad.org:/dbic/QA/.git/"
    *** Exception: InvalidUrlException "https://datasets.datalad.org:/dbic/QA/.git/" "Invalid port"

So http-conduit refuses to parse it and so can't be used to download it.
Filed an issue, but I don't know if they'll want to change
http-conduit to accept a malformed url.
<https://github.com/snoyberg/http-client/issues/501>
Since network-uri is able to parse it, into a URI
that has `uriPort = ":"`, git-annex could special
case handling of the empty port there, changing it to ""
and so generating an url that http-conduit can parse.
I've implemented this fix.
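
The rewrite described above happens on the parsed URI inside git-annex. The shell sketch below only illustrates the same idea on the URL string: `strip_empty_port` is a hypothetical helper, and it deliberately leaves URLs with userinfo or IPv6 literals untouched.

```shell
#!/bin/sh
# Hypothetical helper: drop an empty port, i.e. a ":" that directly
# precedes the path (or the end of the URL). URLs with a real port,
# such as https://host:8080/, are left unchanged.
strip_empty_port() {
    printf '%s\n' "$1" |
        sed -E 's#^([A-Za-z][A-Za-z0-9+.-]*://[^/:]*):(/|$)#\1\2#'
}
```

For the URL from this report, the sketch turns `https://datasets.datalad.org:/dbic/QA/.git/` into `https://datasets.datalad.org/dbic/QA/.git/`.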
"""]]

@ -1,190 +0,0 @@
### Please describe the problem.
Our DataLad test which explicitly tests that we are not breeding commits in git-annex branch while adding files/urls to point to datalad-archive special remote started to fail going from git-annex 10.20240532-gf9ce7a452cc0fd5cdd2d58739741f7264fdbc598 to 10.20240532-g28f5c47b5a0daf96e5ed9aa719ff1e2763d3cc8b
(invocation: `python -m pytest -s -v datalad/local/tests/test_add_archive_content.py::TestAddArchiveOptions::test_add_delete_after_and_drop_subdir`)
If before we had a single commit
<details>
<summary></summary>
```shell
git log -p git-annex^..git-annex
commit b42433cab9f671d206fe937ee7b68b53f11a0c54 (git-annex)
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:48:16 2024 -0400
update
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
new file mode 100644
index 0000000..cc638db
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
@@ -0,0 +1,2 @@
+1719758896s 1 c04eb54b-4b4e-5755-8436-866b043170fa
+1719758897s 0 d53ab0e3-21a9-4084-806f-bf9f5812f34e
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
new file mode 100644
index 0000000..8ef0f1f
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
@@ -0,0 +1 @@
+1719758896s 1 :dl+archive:MD5E-s3584--2f350c3650d5e3a21785d55f5a94ce70.tar#path=1/file.txt&size=4
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
new file mode 100644
index 0000000..cc638db
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -0,0 +1,2 @@
+1719758896s 1 c04eb54b-4b4e-5755-8436-866b043170fa
+1719758897s 0 d53ab0e3-21a9-4084-806f-bf9f5812f34e
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
new file mode 100644
index 0000000..30bb5e9
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
@@ -0,0 +1 @@
+1719758896s 1 :dl+archive:MD5E-s3584--2f350c3650d5e3a21785d55f5a94ce70.tar#path=1/1.dat&size=5
```
</details>
<details>
<summary>now we got two</summary>
```shell
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:45:12 2024 -0400
update
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
new file mode 100644
index 0000000..97acf53
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
@@ -0,0 +1,2 @@
+1719758713s 0 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
new file mode 100644
index 0000000..e5bafba
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
@@ -0,0 +1 @@
+1719758712s 1 :dl+archive:MD5E-s3584--de6498c9ca26fee011f289f5f5972ed0.tar#path=1/file.txt&size=4
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
index 11934b6..97acf53 100644
--- a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -1,2 +1,2 @@
-1719758712s 1 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758713s 0 86661c7b-0604-49e7-8d65-1baf4ca9f469
1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa
commit 8c4fdbadb4b1735cbb47f833ef99235790b8bcbf
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:45:12 2024 -0400
update
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
new file mode 100644
index 0000000..11934b6
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -0,0 +1,2 @@
+1719758712s 1 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
new file mode 100644
index 0000000..107c66f
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
@@ -0,0 +1 @@
+1719758712s 1 :dl+archive:MD5E-s3584--de6498c9ca26fee011f289f5f5972ed0.tar#path=1/1.dat&size=5
```
</details>
for the same effect. And I believe the command which triggers them is `['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch']` which before (for years?!) resulted in expected single commit.
<details>
<summary>Here is the full set of datalad logs for the steps triggering that </summary>
```shell
[DEBUG ] Determined class of decorated function: <class 'datalad.local.add_archive_content.AddArchiveContent'>
[DEBUG ] Resolved dataset to add-archive-content: /home/yoh/.tmp/datalad_temp_tree_rsua9kmg
[DEBUG ] Determined class of decorated function: <class 'datalad.core.local.status.Status'>
[DEBUG ] Resolved dataset to report status: /home/yoh/.tmp/datalad_temp_tree_rsua9kmg
[DEBUG ] Querying AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).diffstatus() for paths: [PosixPath('/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/1.tar')]
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '-z', '-m', '-d', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'status', '--porcelain', '--untracked-files=normal', '--ignore-submodules=none'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'find', '--anything', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '--', 'subdir/1.tar'] (protocol_class=AnnexJsonProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'find', '--anything', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '--', 'subdir/1.tar'] with status 0
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'contentlocation', 'MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar', '-c', 'annex.dotfiles=true'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[INFO ] Adding content of the archive subdir/1.tar into annex AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Initiating clean cache for the archives under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives
[DEBUG ] Cache initialized
[DEBUG ] Not initiating existing cache for the archives under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives
[DEBUG ] Cached directory for archive /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar is fbab09b98e
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'cat-file', 'blob', 'git-annex:remote.log'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[Level 11] CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false cat-file blob git-annex:remote.log' failed with exitcode 128 [err: 'fatal: path 'remote.log' does not exist in 'git-annex'']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'cat-file', 'blob', 'git-annex:trust.log'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[Level 11] CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false cat-file blob git-annex:trust.log' failed with exitcode 128 [err: 'fatal: path 'trust.log' does not exist in 'git-annex'']
[INFO ] Initializing special remote datalad-archives
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'initremote', 'datalad-archives', 'encryption=none', 'type=external', 'autoenable=true', 'externaltype=datalad-archives', 'uuid=c04eb54b-4b4e-5755-8436-866b043170fa', '-c', 'annex.dotfiles=true'] (protocol_class=StdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'initremote', 'datalad-archives', 'encryption=none', 'type=external', 'autoenable=true', 'externaltype=datalad-archives', 'uuid=c04eb54b-4b4e-5755-8436-866b043170fa', '-c', 'annex.dotfiles=true'] with status 0
[DEBUG ] Run ['git', 'config', '-z', '-l', '--show-origin'] (protocol_class=StdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', 'config', '-z', '-l', '--show-origin'] with status 0
[DEBUG ] Acquiring a lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck
[DEBUG ] Acquired? lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck: True
[DEBUG ] Extracting /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e
[DEBUG ] Run ['7z', 'x', '/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar'] (protocol_class=KillOutput) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e)
[DEBUG ] Finished ['7z', 'x', '/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar'] with status 0
[DEBUG ] Releasing lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck
[INFO ] Start Extracting archive
[DEBUG ] Adding /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi/1/1.dat to annex pointing to dl+archive:MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar#path=1/1.dat&size=5 and with options None
[DEBUG ] Starting new runner for BatchedAnnex(command=['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch'], encoding=None, exception_on_timeout=False, last_request=None, output_proc=<function readline_json at 0x7f165f5adf80>, path=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg, return_code=None, runner=None, stderr_output=b'', timeout=None, wait_timed_out=None)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch'] (protocol_class=BatchedCommandProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Starting new runner for BatchedAnnex(command=['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'dropkey', '--force', '--json', '--json-error-messages', '--batch'], encoding=None, exception_on_timeout=False, last_request=None, output_proc=<function readline_json at 0x7f165f5adf80>, path=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg, return_code=None, runner=None, stderr_output=b'', timeout=None, wait_timed_out=None)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'dropkey', '--force', '--json', '--json-error-messages', '--batch'] (protocol_class=BatchedCommandProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Adding /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi/1/file.txt to annex pointing to dl+archive:MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar#path=1/file.txt&size=4 and with options None
[INFO ] Finished adding subdir/1.tar: Files processed: 2, renamed: 2, removed: 2, +annex: 2
[DEBUG ] Removing extracted and annexed files under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rm', '--force', '-r', '--', 'subdir/.dataladiwgxvqzi'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Query status of AnnexRepo('/home/yoh/.tmp/datalad_temp_tree_rsua9kmg') for all paths
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-files', '--stage', '-z']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '--stage', '-z'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-files', '--stage', '-z']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '-z', '-m', '-d'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[INFO ] Extracting archive 2 Files done in 0.872975 sec at 2.29102 Files/sec
[DEBUG ] Cleaning up the cache for /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e
[DEBUG ] Cleaning up the stamp file for /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.stamp
add-archive-content(ok): /home/yoh/.tmp/datalad_temp_tree_rsua9kmg (dataset)
```
</details>
[[!meta author=yoh]]
[[!tag projects/repronim]]
> [[fixed|done]] --[[Joey]]

View file

@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-07-31T14:20:51Z"
content="""
Note that this does not affect the number of commits made by `addurl` generally,
e.g. when adding multiple URLs with --batch from the web.
Also, I don't think that the commits you picked out and showed necessarily
correspond to one-another. The state being recorded in the commit in the 1st
run is not the same as the state that gets recorded by the two commits in the
2nd run. Unless, there is an actual behavior change that eg, leaves the file
present in a repository that it was not present in before.
In the first run the commit shows key
MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat ends up recorded as present in
datalad-archives but not in the local repository. In the second run, the
commits show that the same key ends up recorded present in both repositories.
"""]]

View file

@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-07-31T16:06:38Z"
content="""
Bisected to [[!commit 780367200b14d532f745079dfa09ffaa214d0a84]],
"remove dead nodes when loading the cluster log".
Replacing `loadClusters` with a noop on top of that commit gets the test
suite passing again.
Since nothing in `loadClusters` involves the location log at all, I think
this must come down to a difference in when/if git-annex starts reading
from the git-annex branch. There could be git-annex commands that previously
did not read from the branch, but now do. Which might mean merging
in other git-annex branches at different points in time than happened
before, which I suppose can result in an additional commit.
Unfortunately, I can't avoid the early `loadClusters` for reasons explained
in that commit.
Anyway, I doubt this will result in a lot of additional commits.
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-07-31T19:50:38Z"
content="""
Aha! I found a way around the dependency loop.
This is fixed.
"""]]

View file

@ -1,88 +0,0 @@
### Please describe the problem.
I need to "quickly" ensure that a remote has all the files it should have gotten. For that I use an invocation like
```
time git annex copy --fast --from web --to dandi-dandisets-dropbox
```
or
```
time git annex copy --auto --from web --to dandi-dandisets-dropbox
```
but then in the cases where all files are already there according to
```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex find --not --in dandi-dandisets-dropbox
real 0m0.562s
user 0m0.051s
sys 0m0.019s
```
the `copy` still goes and checks every chunk of every file
```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --from web --to dandi-dandisets-dropbox
copy sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
^C
real 0m3.886s
user 0m0.037s
sys 0m0.032s
```
so to achieve what I need, I thought to explicitly specify the query:
```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --not --in dandi-dandisets-dropbox --from web --to dandi-dandisets-dropbox
real 0m0.221s
user 0m0.056s
sys 0m0.018s
```
but it doesn't work out correctly whenever there are some files to actually copy:
```
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex find --in web --not --in dandi-dandisets-dropbox | nl | tail -n 2
40 sub-440889/sub-440889_ses-837360280_obj-raw_behavior+image+ophys.nwb
41 sub-440889/sub-440889_ses-838633305_obj-raw_behavior+image+ophys.nwb
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --not --in dandi-dandisets-dropbox
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox
```
so the only way now would be to pipe `find` output into `copy`?
note on edit: filed a dedicated [https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/](https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/)
NB `git annex find` has `-z` for input but not for output...
refs to related reports/issues which were said to be addressed for `--fast` mode:
- [https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/](https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/)
- [https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/](https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/)
### What version of git-annex are you using? On what operating system?
```
10.20230321-1~ndall+1
```
and then in conda with `10.20230626-g801c4b7`
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

View file

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-17T20:57:19Z"
content="""
> but it doesn't work out correctly whenever there are some files to actually copy
I think that was due to the bug you linked, which is now fixed.
I've confirmed that `--fast` is not actually implemented for `git-annex
copy --from --to`. Explicitly specifying `--not --in destremote` is a
fine workaround. But I've gone ahead and implemented `--fast` for it too.
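
A minimal sketch of that workaround as a shell helper. The function name and its argument are made up for illustration; the `git annex copy` options are the ones used in the bug report.

```shell
# Hypothetical helper: copy from the web remote only those files that the
# location log does not already list as present on the destination remote.
copy_missing_from_web() {
    dest=$1
    git annex copy --from web --to "$dest" --not --in "$dest"
}
```

Called as `copy_missing_from_web dandi-dandisets-dropbox`, it only considers files that location tracking does not already place on the destination.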
"""]]

View file

@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-17T21:16:35Z"
content="""
BTW `git-annex find --print0` is the output equivalent of -z.
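
To illustrate why NUL-terminated output matters, here is a small sketch that runs without git-annex. `fake_find_print0` is a stand-in (an assumption for the demo) for `git annex find --print0`, and one of its two names deliberately contains a newline.

```shell
# Stand-in for `git annex find --print0`: two NUL-terminated file names,
# the second of which contains a newline.
fake_find_print0() {
    printf 'a.nwb\0odd\nname.nwb\0'
}

# Records seen by a NUL-aware consumer (xargs -0 runs the command once per name).
nul_records=$(fake_find_print0 | xargs -0 -n1 sh -c 'echo x' _ | wc -l | tr -d ' ')

# Records seen by a line-based consumer for the same two names.
line_records=$(fake_find_print0 | tr '\0' '\n' | wc -l | tr -d ' ')

echo "NUL-delimited: $nul_records, line-delimited: $line_records"
```

The NUL-aware reader sees two names, while the line-based one splits the odd name in two.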
"""]]

View file

@ -1,55 +0,0 @@
### Please describe the problem.
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex whereis 0/0/0/13/2/12 | head
whereis 0/0/0/13/2/12 (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/0/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/1/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/10/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/11/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/12/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/13/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/14/0
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ time git annex copy --from web --to dandi-dandizarrs-dropbox 0/0/0/13/2/12
copy 0/0/0/13/2/12 (from web...) (to dandi-dandizarrs-dropbox...) ok
real 0m0.366s
user 0m0.104s
sys 0m0.042s
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex whereis 0/0/0/13/2/12 | head
whereis 0/0/0/13/2/12 (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/0/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/1/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/10/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/11/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/12/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/13/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/14/0
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex list 0/0/0/13/2/12
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandizarrs-dropbox (untrusted)
||||||
__XX__ 0/0/0/13/2/12
```
I would expect `copy` to record locally that the content is now also on the destination remote, so that a 2nd invocation of `copy --from ... --to ... --auto` does nothing.
### What version of git-annex are you using? On what operating system?
```
10.20230227-gb02b9cc Debian GNU/Linux
```
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-13T18:34:36Z"
content="""
Aah, I see, this is when the content is present on the --to
remote, but git-annex is not locally aware of that yet.
And `git-annex copy --to remote` does
update location tracking in such a case, so --from --to should also.
"""]]

View file

@ -1,93 +0,0 @@
### Please describe the problem.
originally reported while composing [https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/](https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/) but it is a separate issue: some files are simply not `annex copy`'ed at all: here it tries 6 out of 8 files and still reports that 2 are not on the target remote:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex copy --from web --to dandi-dandisets-dropbox --fast
copy sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 696.194 MBytes (730012683 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 224.618 MBytes (235528804 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 295.387 MBytes (309735634 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 860.168 MBytes (901951882 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 856.342 MBytes (897939760 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 948.656 MBytes (994737479 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | nl
1 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
2 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
```
and it seems to boil down (at least in one case, don't know yet if generalizes to other cases I have) to having those keys present locally:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | xargs ls -lL
-r--r--r-- 1 dandi dandi 3878847966 Mar 16 2023 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
-r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
```
but somehow it doesn't know that it has them according to `list`:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex list
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandisets-dropbox (untrusted)
||||||
__XX_x sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb
__XX__ sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
__XX_x sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb
__XX_x sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb
__XX_x sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb
__XX__ sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
__XX_x sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb
__XX_x sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb
```
running without `--from web` starts the transfer:
```
git annex copy --fast --to dandi-dandisets-dropbox
```
IMHO it should perform copy from the local store into the remote since in effect it would be fulfilling the goal - adding a copy to the destination.
I didn't check the `move` command, but if it supports a similar `--from --to` and has a similar defect, it should just complement the copy by dropping the content from the original remote afterwards.
### What version of git-annex are you using? On what operating system?
10.20230626-g801c4b7 from conda-forge .
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

View file

@ -1,57 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-17T19:58:39Z"
content="""
> -r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
This could be an unlocked file that has gotten modified but the staged
version is not actually present locally. Or if `git-annex fsck` on it says
it's fixing the location logs, that would tell us something happened that
got the location tracking out of sync with reality.
So possibly there's an issue that could be tracked down regarding the state
of that file. But in either case, git-annex doesn't know it has a local
copy of the file, so `copy --from --to` could not use it.
----
But: `copy --from --to` does in fact have an interesting bug:
joey@darkstar:~/tmp/bench/r2>git-annex whereis foo
whereis foo (2 copies)
22dfa446-7482-4c0a-92c9-70db793859fb -- joey@darkstar:~/tmp/bench/r [origin]
8a504049-2c22-4baa-9a16-218e9561608b -- joey@darkstar:~/tmp/bench/r2 [here]
ok
joey@darkstar:~/tmp/bench/r2>git-annex copy foo --from origin --to r3
joey@darkstar:~/tmp/bench/r2>
So the file content being present locally prevents it sending it to the remote! This needs to get fixed.
Hmm: In the corresponding case of `git-annex move --from --to`, it does not
behave that way.
----
As far as what the behavior ought to be when a file is present locally but not on the --from remote,
the documentation does say:
--from=remote
Copy the content of files from the specified remote to the local repository.
Any files that are not available on the remote will be silently skipped.
So it is behaving as documented. I can think of two reasons why that
documented behavior makes some sense:
* The user may be intending to only copy files --to that are present in --from.
The local repo may have a lot of files they do not want to populate --to.
(For example, perhaps the goal is to make a replica of the --from
repository.)
With that said, the user could do `git-annex copy --from foo --to bar --in foo`
to explicitly only act on files that are present in it.
* Performance. Needing to check if there is a local copy when there is no
remote copy would be a little extra work. Likely not enough to be
significant though.
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-17T20:27:37Z"
content="""
> So the file content being present locally prevents it sending it to the remote!
Fixed that.
"""]]

View file

@ -1,27 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-11-17T20:33:01Z"
content="""
That bug I fixed would also explain the behavior that you saw if the
content *was* present locally, and the location log *was* out of date about
that.
In that situation, git-annex sees that the object file is present, and so
treats the content as present, despite the location log not knowing it's
present. Which triggers the situation of the bug I fixed, causing it to
skip copying the file.
Also, there's a pretty easy way to get into this situation. When the file
is not present, run `git-annex copy --from --to`. Then interrupt it after it's
downloaded the file --from but before it's finished sending it --to.
This results in the file being present locally, but only transiently so it
didn't update the location log.
So my guess is you interrupted a copy like that (or it failed incomplete
for whatever reason).
Now that I've fixed that bug, the behavior in that situation is that it
does copy the file to the remote. And then it drops the local copy since
the location log doesn't contain it. So it resumes correctly now.
"""]]

View file

@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-11-17T20:42:24Z"
content="""
So that leaves only the question of what it should do when
content is present locally but not on the --from remote.
Another reason for the current behavior is to be symmetric with `git-annex
move --from foo --to bar`. It would be surprising, I think, if that
populated bar with files that are not present in foo, but are in the local
repository!
So I'm inclined to not change the documented behavior. If you want to
populate a remote with files that are either in the local repo or in a
--from remote, you can just run `git-annex copy` twice after all.
(Or there could be a new option like `git-annex copy --to bar --from foo --or-from-here`)
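
A rough sketch of the "run it twice" approach. The helper name and the order of the two runs are illustrative choices, not something git-annex prescribes.

```shell
# Hypothetical helper: populate remote "$2" from the local repository first,
# then from remote "$1".
copy_from_here_then_remote() {
    from=$1
    to=$2
    git annex copy --to "$to" &&
    git annex copy --from "$from" --to "$to"
}
```

For example, `copy_from_here_then_remote foo bar` sends local content to bar and then whatever foo has.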
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2023-11-18T01:35:35Z"
content="""
> (Or there could be a new option like git-annex copy --to bar --from foo --or-from-here)
or maybe
`git-annex copy --to bar --from remote1 --or-from remote2 ...` or similar, so there could be a sequence (in order of preference) of remotes? Or better, a general `git-annex copy --to bar --from-anywhere` so that `annex` first `get`'s it following the currently set costs etc. if not present here, and then copies it over.
"""]]

View file

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-11-30T18:26:30Z"
content="""
I like the idea of `copy --from-anywhere --to=remote` that just
uses the lowest cost remote (when not in local repo), like `git-annex get`
and `git-annex copy --to=here`.
Hmm, if there's a remote that is too expensive to want to use in such a
copy, it would be possible to use `-c remote.foo.annex-ignore=true`
to make it avoid using that remote. As can also be done in the case of
`git-annex get`, although that was not documented well.
I've implemented --from-anywhere..
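
A sketch combining the two, assuming that `-c` can be given after the subcommand as with other git-annex options. The helper name and its arguments are hypothetical.

```shell
# Hypothetical helper: copy to remote "$1" from whichever remote is cheapest,
# while asking git-annex to ignore the remote named "$2" for this run only.
copy_from_anywhere_except() {
    to=$1
    skip=$2
    git annex copy -c "remote.$skip.annex-ignore=true" --from-anywhere --to="$to"
}
```

The config override applies only to that one invocation, so the expensive remote stays usable otherwise.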
"""]]

View file

@ -1,45 +0,0 @@
### Please describe the problem.
```
( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1; git annex findkeys --in here | git annex dropkey --force --batch -z ; )
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
dropkey MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
ok
ls -ld .git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
-r-------- 1 yoh yoh 5663237 May 19 09:50 .git/annex/objects/V7/Pj/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1; git annex findkeys --in here | git annex dropkey --force --batch ; )
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
dropkey MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz ok
ls -ld .git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
ls: cannot access '.git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz': No such file or directory
```
and also was reported on 10.20230407 to not return anything causing us to stall: [https://github.com/datalad/datalad/issues/7315#issuecomment-1554348911](https://github.com/datalad/datalad/issues/7315#issuecomment-1554348911).
[[!meta author=yoh]]
[[!tag projects/datalad]]
### What steps will reproduce the problem?
### What version of git-annex are you using? On what operating system?
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
> [[closing|done]] per my comments --[[Joey]]

View file

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-05-19T17:39:29Z"
content="""
You are piping non-null-terminated output into a command that needs
terminating nulls. So, it reads the entire findkeys output, including
newlines as the name of a key. And drops that key, which doesn't exist of
course.
With `findkeys --print0`, it does work. It would also be fine to not use
`-z`, since keys should never actually contain a newline in their name.
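
The mismatch can be reproduced without git-annex. In this sketch the two `fake_findkeys` functions are stand-ins (assumptions for the demo) for `findkeys` with and without `--print0`, and `xargs -0` plays the role of a NUL-delimited reader.

```shell
# Stand-in for `git annex findkeys`: newline-terminated keys.
fake_findkeys() {
    printf 'KEY-A\nKEY-B\nKEY-C\n'
}

# Stand-in for `git annex findkeys --print0`: NUL-terminated keys.
fake_findkeys_print0() {
    printf 'KEY-A\0KEY-B\0KEY-C\0'
}

# Count how many records a NUL-delimited reader receives on stdin.
count_nul_records() {
    xargs -0 -n1 sh -c 'echo x' _ | wc -l | tr -d ' '
}

mismatched=$(fake_findkeys | count_nul_records)
matched=$(fake_findkeys_print0 | count_nul_records)
echo "newline input: $mismatched record(s); NUL input: $matched record(s)"
```

With newline-terminated input the reader gets a single record holding all three keys and the newlines between them.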
"""]]

View file

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-05-19T17:53:51Z"
content="""
However, after successfully dropping all the keys with `--print0`, there
is then this oddity:
git-annex: Batch input parse failure: bad key
That's a bug in nul splitting when there's a trailing nul. Oops. I've
fixed that.
Also while I reproduced the rest of the behavior, I didn't see this part:
commitBuffer: resource vanished
I'm not sure which command that comes from. I think it is probably the findkeys,
if its entire output was not consumed for some reason.
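
For illustration only, here is one general way a trailing delimiter can produce a spurious empty record. This is a guess at the class of problem, not taken from git-annex's code, and `;` stands in for NUL so the split is easy to see.

```shell
# ';' stands in for NUL; the input ends with a delimiter, as --print0 output does.
input='KEY-A;KEY-B;'

# Treating the delimiter as a separator yields a trailing empty field.
as_separator=$(printf '%s' "$input" | awk -F';' '{ print NF }')

# Treating it as a terminator counts one record per delimiter.
as_terminator=$(printf '%s' "$input" | tr -cd ';' | wc -c | tr -d ' ')

echo "separator: $as_separator fields, terminator: $as_terminator records"
```

The extra empty field in the separator reading is the kind of thing that would fail to parse as a key.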
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2023-05-19T18:47:49Z"
content="""
It makes total sense, thank you Joey! I guess the only slightly odd behavior is git-annex reporting `ok` for dropping an unknown key. I guess like with `rm unknownfile` (unless `-f` is used) I would have expected it to error out.
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2023-05-19T18:49:29Z"
content="""
re vanished -- it is from `annex version` whenever its output is not fully written out due to use of `head`:
```
( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1)
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
```
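
A possible way to sidestep the message on an affected version is to let the reader consume all of the output. This is sketched with a stand-in producer (`many_lines`, a made-up name) instead of `git annex version`, and it assumes the message is caused only by the reader exiting early.

```shell
# Stand-in for a command that prints many lines.
many_lines() {
    awk 'BEGIN { for (i = 1; i <= 50000; i++) print "line " i }'
}

# `sed -n 1p` prints the first line but keeps reading until end of input,
# so the writer is never left with a closed pipe.
first=$(many_lines | sed -n 1p)
echo "$first"
```

Applied here, that would be `git annex version | sed -n 1p` instead of `| head -n 1`.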
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2023-05-19T18:49:48Z"
content="""
re vanished -- it is from `annex version` whenever its output is not fully written out due to use of `head`:
```
( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1)
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
```
"""]]

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-05-19T18:54:53Z"
content="""
Aha, thanks for clearing up that `git-annex version` does that! That seems
like a bit of a bug on its own really.
.. Fixed that.
"""]]

View file

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2023-05-19T18:55:35Z"
content="""
The reason dropkey does not error on an unknown key is that it's entirely
possible to get a repository into a state where a key's content is present
but the key is otherwise unknown to git-annex. Eg, it doesn't have any
location tracking information for it, there are no files in the git repo
that point to it, etc.
It makes sense to support dropping the content of such a key.
And, dropkey intentionally operates the same on a key when its content is
not present as it does when the content is present and it successfully
dropped it. Because in either case the result is now that the specified
key's content is not present.
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2023-05-19T19:22:04Z"
content="""
Gotcha. Just food for possible future discussion: I think it is more a matter of \"annotating\" the action outcome so it is not just a binary \"ok/fail\". Indeed `dropkey` can say \"ok\" as a promise that in the end there is no key (whether it was known or not, etc.). But it can arrive there differently. Similarly for \"fail\". For that reason DataLad now has 4 \"status\" states: \"ok\", \"notneeded\", \"impossible\", \"error\", where the first two map to \"ok\" and the other two to \"fail\", [documented here](https://github.com/datalad/datalad/blob/HEAD/docs/source/design/result_records.rst#status).
So here `dropkey unknown` would be more of a \"notneeded\" success, I guess, if it were for DataLad to report. Maybe `--json` records and non-json output of `git-annex` could in the future somehow discriminate between those outcomes.
"""]]

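To make the four-state idea above concrete, here is a minimal Python sketch. It is an illustration only, not DataLad's or git-annex's actual code: the `Status` enum, `is_success`, and `dropkey_status` are hypothetical names, and the only facts taken from the thread are the four state names and which two count as success.

```python
from enum import Enum


class Status(Enum):
    # The four state names quoted in the comment above.
    OK = "ok"
    NOTNEEDED = "notneeded"
    IMPOSSIBLE = "impossible"
    ERROR = "error"


# The first two states collapse to a binary "ok", the other two to "fail".
SUCCESS_STATES = {Status.OK, Status.NOTNEEDED}


def is_success(status: Status) -> bool:
    # Collapse the richer status back to a plain ok/fail answer.
    return status in SUCCESS_STATES


def dropkey_status(content_was_present: bool) -> Status:
    # Hypothetical classifier: either way the content ends up absent,
    # but only one path actually had to drop something.
    return Status.OK if content_was_present else Status.NOTNEEDED
```

With this shape a caller that only cares about the end state checks `is_success`, while a caller that wants to know whether anything was done can look at the status itself.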
@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2023-05-23T15:42:39Z"
content="""
Many commands do reflect "notneeded" by not displaying any output.
(I suppose that could even be a problem with --json --batch, since
a command like drop will not output anything when it has nothing to do.)
In the case of dropkey, it could have skipped displaying anything for keys
that don't exist, but changing that now doesn't seem wise.
"""]]

@ -1,46 +0,0 @@
### Please describe the problem.
```shell
$> git annex version
git-annex version: 10.20230828+git6-g86c70833a1-1~ndall+1
...
$> git annex enableremote typhon
enableremote (normal) typhon
Unable to parse git config from typhon
Remote typhon does not have git-annex installed; setting annex-ignore
This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote typhon
failed
enableremote: 1 failed
```
Here git-annex hints that git-annex is not installed (totally not true), or that it is unable to parse the config (in effect true, but not because the config is wrong etc).
It is all because that folder on the ssh remote belongs to someone else, and if I run the shell command manually I see the hint from `git` itself:
```
$> ssh typhon git-annex-shell configlist /mnt/DATA/data/studies/bep302/gin_BEP032-examples --debug
[2023-08-31 11:57:26.338523978] (Utility.Process) process [3594411] read: git ["--git-dir=/mnt/DATA/data/studies/bep302/gin_BEP032-examples/.git","--work-tree=/mnt/DATA/data/studies/bep302/gin_BEP032-examples","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2023-08-31 11:57:26.339808748] (Utility.Process) process [3594411] done ExitSuccess
[2023-08-31 11:57:26.340366568] (Utility.Process) process [3594412] read: git ["config","--local","--list"]
[2023-08-31 11:57:26.342570264] (Utility.Process) process [3594412] done ExitFailure 128
[2023-08-31 11:57:26.342620672] (Git.Config) config output: fatal: --local can only be used inside a git repository
git-annex-shell: Git refuses to operate in this repository,
probably because it is owned by someone else.
To add an exception for this directory, call:
git config --global --add safe.directory /mnt/DATA/data/studies/bep302/gin_BEP032-examples
```
So, ideally `git annex enableremote` should provide similar diagnostic output instead of stating incorrect reasons.
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-09-07T17:01:41Z"
content="""
I wonder if it even makes sense for git-annex-shell to replicate this git
security check, or would it be better for it to instruct git to trust the
repository, so it can be used on it?
git's CVE-2022-24765 involves a malicious creation of a .git repository
above the victim's cwd, with a .git/config that causes things like eg shell
prompts that run git to execute attacker-controlled commands.
git-annex-shell commands all take the directory that the repository is
in, and uses that repository. So it doesn't traverse above looking for
other .git directories.
And, `git clone` will happily clone a remote repository that's owned
by another user, including over ssh. And pull and push etc work with such a
remote. So git-annex-shell should too.
(For that matter, other git-annex-shell commands do work, it's only the
command that reads the git config that fails to work.)
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-09-07T18:21:30Z"
content="""
Closely related, when a local repo is owned by someone else, cloning it and
using it as a git-annex remote also fails, at the same config listing
stage.
I think the same reasoning applies to that, the path to the repo is
explicitly specified in the remote url, so it should treat it as a safe
repo for the purposes of listing its config.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-09-07T18:32:57Z"
content="""
Basically the same fix works for both the ssh remote and the local
remote cases.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-09-07T18:36:37Z"
content="""
Another related case is when git has been configured with
safe.bareRepository=explicit and the remote (either ssh or local)
is a bare repo. git-annex-shell will fail with the same misleading message,
and for a local repo, git-annex will also display the same misleading
message.
I think it also ought to override safe.bareRepository for such remotes,
because eg git pull works with such remotes. The point of
safe.bareRepository=explicit is not to prevent using bare remotes, but to
prevent things like shell prompts from accidentally using bare repos that are
eg, committed by a malicious attacker to a git repository, to avoid using
git configs that allow running arbitrary code.
"""]]

@ -1,33 +0,0 @@
### Please describe the problem.
See e.g. on [https://github.com/datalad/git-annex/actions/runs/6680765679/job/18154374923](https://github.com/datalad/git-annex/actions/runs/6680765679/job/18154374923)
```
Repo Tests v10 unlocked
Init Tests
init: OK (0.17s)
add: OK (0.73s)
addurl: OK (0.57s)
crypto: FAIL (3.07s)
./Test/Framework.hs:86:
initremote failed with unexpected exit code (transcript follows)
initremote foo (encryption setup) (to gpg keys: 129D6E0AC537B9C7)
git-annex: .git/annex/othertmp/remote.log: hPut: invalid argument (invalid character)
failed
(recording state in git...)
initremote: 1 failed
```
started only recently but consistently:
```
(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2023/10[master]git
$> git grep -l 'hPut: invalid argument'
cron-20231027/build-ubuntu.yaml-1289-1c03c8fd-failed/0_test-annex (normal, ubuntu-latest).txt
...
```
[[!meta author=yoh]]
[[!tag projects/repronim]]
> [[fixed|done]] --[[Joey]]

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-01T16:07:15Z"
content="""
Reproduced with LANG=C:
./Test/Framework.hs:86:
initremote failed with unexpected exit code (transcript follows)
initremote foo (encryption setup) (to gpg keys: 129D6E0AC537B9C7)
git-annex: .git/annex/othertmp/remote.log: withFile: invalid argument (cannot encode character '\132')
failed
(recording state in git...)
initremote: 1 failed
Not quite the same error but almost certainly the same problem.
I've confirmed this is caused by
[[!commit 3742263c99180d1391e4fd51724aae52d6d02137]]
"""]]

@ -1,25 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-01T16:53:48Z"
content="""
Will probably need to revert the Remote/Helper/Encryptable.hs part of that
commit.
What is happening here is, encodeBS is failing when run on the String from
a SharedPubKeyCipher. That String comes from Utility.Gpg.genRandom and is
literally a bunch of random bytes. So it's not encoded with the filesystem
encoding. And it really ought to be a ByteString of course, but since it's
not, anything involving encoding it fails.
That's why the old code had this comment:
{- Not using Utility.Base64 because these "Strings" are really
- bags of bytes and that would convert to unicode and not round-trip
- cleanly. -}
And converted that String to a ByteString via `B.pack . s2w8`, which avoids this problem.
What an ugly thing. Really ought to be fixed to use ByteString throughout.
But for now, let's revert.
"""]]

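The failure mode described above (a "String" that is really a bag of random bytes, which breaks as soon as anything tries to encode it) can be illustrated with a rough Python analogy. This is not git-annex code: `string_to_bytes_lossless` is a hypothetical stand-in for the one-code-point-per-byte conversion the comment attributes to `B.pack . s2w8`, and the ASCII codec stands in for the encoding in effect under `LANG=C`.

```python
def string_to_bytes_lossless(s: str) -> bytes:
    # Take each character's code point as one byte, with no text encoding
    # involved. Raises ValueError if a code point is above 255.
    return bytes(ord(c) for c in s)


# A "string" that is really random bytes; 0x84 is the '\132' from the
# error message quoted in the previous comment.
random_string = "".join(chr(b) for b in [0x41, 0x84, 0xFF])

# Going through a text encoding fails for such data ...
try:
    random_string.encode("ascii")
    encoded_ok = True
except UnicodeEncodeError:
    encoded_ok = False

# ... while the byte-per-character conversion round-trips cleanly.
raw = string_to_bytes_lossless(random_string)
round_tripped = "".join(chr(b) for b in raw)
```

The longer-term fix named in the comment, using ByteString throughout, corresponds to never putting such data in a text string in the first place.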
@ -1,75 +0,0 @@
### Please describe the problem.
I have been running
`git annex --debug import --from s3-dandiarchive master`
from an S3 bucket which is versioned but I did not enable versioning for this "import" case (due to [git-annex unable to sense versioning read-only](https://git-annex.branchable.com/bugs/importtree_with_versioning__61__yes__58___check_first/)) and expected it to "quickly" import tree (with about 7k files) from S3. Note that some of the keys have **many** older revisions for one reason or another.
But currently that process, started hours ago yesterday IIRC, is
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3912831 dandi 20 0 1024.1g 51.7g 16000 S 100.0 82.4 19,48 git-annex
```
CPU heavy and very slow at actually "importing" (it flipped through pages faster when it started), now listing a page every 30 seconds or so
```
[2024-11-12 14:59:23.587433059] (Remote.S3) Header: [("Date","Tue, 12 Nov 2024 19:59:23 GMT")]
[2024-11-12 14:59:58.073945529] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"}
[2024-11-12 14:59:58.074057102] (Remote.S3) Response header 'x-amz-id-2': 'sxDUdIkuRLs3jjjTyIbFaI+cQqLCGpTXZNFcvykT2+F6OcqVRM2IMn6P1YquVrdH3fXmV9nRnTDs9EtOtctV05GptcIaBaF2'
[2024-11-12 14:59:58.07410232] (Remote.S3) Response header 'x-amz-request-id': 'Y35X1Z41GMF9PHY8'
[2024-11-12 14:59:58.074135941] (Remote.S3) Response header 'Date': 'Tue, 12 Nov 2024 19:59:24 GMT'
[2024-11-12 14:59:58.074167094] (Remote.S3) Response header 'x-amz-bucket-region': 'us-east-2'
[2024-11-12 14:59:58.074197609] (Remote.S3) Response header 'Content-Type': 'application/xml'
[2024-11-12 14:59:58.074228873] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
[2024-11-12 14:59:58.074259342] (Remote.S3) Response header 'Server': 'AmazonS3'
[2024-11-12 14:59:58.171273277] (Remote.S3) String to sign: "GET\n\n\nTue, 12 Nov 2024 19:59:58 GMT\n/dandiarchive/"
[2024-11-12 14:59:58.171355688] (Remote.S3) Host: "dandiarchive.s3.amazonaws.com"
[2024-11-12 14:59:58.17139206] (Remote.S3) Path: "/"
[2024-11-12 14:59:58.17142278] (Remote.S3) Query string: "prefix=dandisets%2F"
[2024-11-12 14:59:58.171463294] (Remote.S3) Header: [("Date","Tue, 12 Nov 2024 19:59:58 GMT")]
```
and not sure how many pages it got so far.
I suspect (can't tell from the above) that it is using the API to list all versions of keys, not just the current version, even though I have not asked for versioning support.
Note: the bucket is too heavy (about 300 million keys IIRC) to list all of it for all the versions. I do not have information ready on how many versions of keys there are under the `dandisets/` prefix (could be some hundreds of thousands), but I would still expect/hope it to complete by now. Nothing seems to have been done on the filesystem or in the git store yet (du says it is 280k total size); git-annex is just being fed information from S3.
### What steps will reproduce the problem?
- add s3 importtree special remote matching
```
bucket=dandiarchive datacenter=US encryption=none fileprefix=dandisets/ host=s3.amazonaws.com importtree=yes name=s3-dandiarchive port=80 publicurl=https://dandiarchive.s3.amazonaws.com/ signature=anonymous storageclass=STANDARD type=S3 timestamp=1731015643s
```
- run `annex import` from it
### What version of git-annex are you using? On what operating system?
invocation of `static-git-annex-10.20241031` (built by kyleam https://git.kyleam.com/static-annex/ ... but I think I tried a different one before):
```shell
(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ /home/dandi/git-annexes/static-git-annex-10.20241031/bin/git-annex version
git-annex version: 10.20241031
build flags: Pairing DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.2 bloomfilter-2.0.1.2 crypton-1.0.1 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.3 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```
[[!meta author=yoh]]
[[!tag projects/dandi]]
> Calling this [[done]] although memory use improvements still seem
> possible.. --[[Joey]]

@ -1,29 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 1"
date="2024-11-13T18:08:59Z"
content="""
At the end (after over a day of torturing that poor bucket, whereas it took just a few minutes for `s3cmd sync` to get everything including content) it crashed with
```
[2024-11-12 22:58:00.366878941] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = \"OK\"}
[2024-11-12 22:58:00.373456754] (Remote.S3) Response header 'x-amz-id-2': 'DGXJztoRJRuHQrcOqs3FtnEUJomRz+53jawFoKoRbKQATcvAppqJcfcAVfR1d8cu7uepkEDvSXo='
[2024-11-12 22:58:00.384304583] (Remote.S3) Response header 'x-amz-request-id': 'W1PSPV7ZSBKJ7HTT'
[2024-11-12 22:58:00.38437407] (Remote.S3) Response header 'Date': 'Wed, 13 Nov 2024 03:50:18 GMT'
[2024-11-12 22:58:00.384436037] (Remote.S3) Response header 'x-amz-bucket-region': 'us-east-2'
[2024-11-12 22:58:00.384486611] (Remote.S3) Response header 'Content-Type': 'application/xml'
[2024-11-12 22:58:00.384533794] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
[2024-11-12 22:58:00.384581117] (Remote.S3) Response header 'Server': 'AmazonS3'
git-annex: Unable to list contents of s3-dandiarchive: Network.Socket.recvBuf: resource vanished (Connection reset by peer)
failed
[2024-11-12 22:58:00.565431711] (Utility.Process) process [3912839] done ExitSuccess
import: 1 failed
```
attesting that it is doing something unnecessary -- either listing full bucket (unlikely) or listing all versions of keys under the prefix (e.g. using [ListObjectVersions](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) instead of [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html)).
It would have been useful if logs included the API call involved here.
"""]]

@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-11-14T18:23:54Z"
content="""
No, it does not request versions from S3 when versioning is not enabled.
This feels fairly similar to
[[git-annex-import_stalls_and_uses_all_ram_available]].
But I don't think it's really the same, that one used versioning, and relied
on preferred content to filter the wanted files.
Is the size of the whole bucket under the fileprefix, in your case, large
enough that storing a list of all the files (without the versions) could
logically take as much memory as you're seeing? At one point you said it
was 7k files, but later hundreds of thousands, so I'm confused about how
big it is.
Is this bucket supposed to be public? I am having difficulty finding an
initremote command that works.
It also seems quite possible, looking at the code, that it's keeping all
the responses from S3 in memory until it gets done with listing all the
files, which would further increase memory use.
I don't see any `O(N^2)` operations though.
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-11-14T18:50:37Z"
content="""
This is the initremote for it:
git-annex initremote dandiarchive type=S3 encryption=none fileprefix=dandisets/ bucket=dandiarchive publicurl=https://dandiarchive.s3.amazonaws.com/ signature=anonymous host=s3.amazonaws.com datacenter=US importtree=yes
It started at 1 API call per second, but it slowed down as memory rapidly
went up. 3 gb in a few minutes, so I think there is definitely a memory
leak involved.
"""]]

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2024-11-14T19:05:48Z"
content="""
I suspect one way the CLI tool is faster, aside from not leaking memory,
is that there is a max-keys parameter that git-annex is not using.
Less pagination would speed it up.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2024-11-14T19:21:33Z"
content="""
Apparently gbrNextMarker is Nothing despite the response being truncated. So
git-annex is looping forever, getting the same first page each time, and
storing it all in a list.
I think this is a bug in the aws library, or I'm using it wrong.
It looks for a NextMarker in the response XML, but according to
<https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html>
> This element is returned only if you have the delimiter request parameter
> specified. If the response does not include the NextMarker element and it is
> truncated, you can use the value of the last Key element in the response as the
> marker parameter in the subsequent request to get the next set of object keys.
"""]]

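The pagination rule quoted from the AWS documentation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the `aws` Haskell library's code or the actual git-annex fix: `Page`, `next_marker`, `list_all_keys`, and `fake_fetch` are hypothetical names, and `fake_fetch` simulates a server that never sends NextMarker (the no-delimiter case).

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class Page(NamedTuple):
    keys: List[str]
    is_truncated: bool
    next_marker: Optional[str]  # None models a response without NextMarker


def next_marker(page: Page) -> Optional[str]:
    # Marker for the following request, or None when listing is finished.
    if not page.is_truncated:
        return None
    if page.next_marker is not None:
        return page.next_marker
    # Truncated but no NextMarker: fall back to the last key of this page.
    return page.keys[-1] if page.keys else None


def list_all_keys(fetch: Callable[[Optional[str]], Page]) -> Iterator[str]:
    # Follow markers until a page reports that it is not truncated.
    marker = None
    while True:
        page = fetch(marker)
        yield from page.keys
        marker = next_marker(page)
        if marker is None:
            return


def fake_fetch(all_keys: List[str], page_size: int) -> Callable[[Optional[str]], Page]:
    # Simulated listing over sorted keys that never returns a NextMarker.
    def fetch(marker: Optional[str]) -> Page:
        remaining = [k for k in all_keys if marker is None or k > marker]
        return Page(remaining[:page_size], len(remaining) > page_size, None)
    return fetch
```

Without the last-key fallback, a loop that only trusts NextMarker would request the first page again and again, which matches the "looping forever" symptom described above.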
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2024-11-14T20:14:29Z"
content="""
Fixed in [[!commit 4b87669ae229c89eadb4ff88eba927e105c003c4]]. Now it runs
in seconds.
Note that this bug does not seem to affect S3 remotes that have versioning
enabled.
"""]]

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2024-11-15T17:16:51Z"
content="""
Trying the same command but with versioning=yes, I have verified that
* it does not have the same loop forever behavior
* it does use a lot of memory quite quickly
Going back to the unversioned command, I was able to reduce the memory use
by 20% by processing each result, rather than building up a list of results
and processing at the end. It will be harder to do that in the versioning
case, but I expect it will improve it at least that much, and probably
more, since it will be able to GC all the delete markers.
"""]]

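The memory optimisation described above, processing each result as it arrives instead of building up a list and processing at the end, follows a general pattern that can be sketched in Python. This is only an illustration of that pattern; `process_buffered` and `process_streaming` are hypothetical names, and git-annex itself is written in Haskell.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def process_buffered(pages: Iterable[List[A]], process: Callable[[A], B]) -> List[B]:
    # Collect every result from every page first, then process them all.
    # All raw results stay in memory until the listing is complete.
    results: List[A] = []
    for page in pages:
        results.extend(page)
    return [process(r) for r in results]


def process_streaming(pages: Iterable[List[A]], process: Callable[[A], B]) -> Iterator[B]:
    # Process each result as soon as its page arrives. A page can be
    # garbage collected once the loop moves on to the next one.
    for page in pages:
        for r in page:
            yield process(r)
```

Both variants produce the same values in the same order; the difference is only in how much raw input is alive at any one time.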
@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2024-11-15T17:48:08Z"
content="""
Did same memory optimisation for the versioned case, and the results are
striking! Running the command until it had made 45 API requests, it was
using 592788 kb of memory. Now it uses only 110968 kb.
Of that, about 78900 kb are used at startup, so it grew 29836 kb.
At that point, it has gathered 23537 changes. So about 1 kb is used per
change. That seems a bit more memory than really should be needed,
each change takes about 75 bytes of data, eg:
"y3RixvrmLvr1oWJ7meEa4vWK6B.C.aad",3340,"dandisets/000003/draft/dandiset.jsonld",2021-09-28 02:12:39 UTC
I did try some further memory optimisation, making it avoid storing the
same filename repeatedly in memory when gathering versioned changes. Which
oddly didn't save any memory.
Memory profiling might let this be improved further, but needing 1 gb of
memory to import a million changes to files doesn't seem too bad.
Update: Did some memory profiling, nothing stuck out as badly wrong.
Lists and tuples are using as much memory as anything.
"""]]

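The per-change estimate in the comment above can be reproduced as a small worked calculation. The inputs are the figures stated there (29836 kb of growth, 23537 changes, about 75 bytes of data per change); treating 1 kb as 1024 bytes is an assumption of this sketch.

```python
# Figures as stated in the comment above.
growth_kb = 29836          # memory growth after startup
changes = 23537            # changes gathered at that point
raw_bytes_per_change = 75  # approximate size of one change's data

# Memory actually used per gathered change, in kb (a bit over 1 kb).
kb_per_change = growth_kb / changes

# How much larger that is than the raw data of one change
# (assumes 1 kb = 1024 bytes).
overhead_factor = kb_per_change * 1024 / raw_bytes_per_change

# Projected memory for importing a million changes, in gb.
projected_gb_for_million = kb_per_change * 1_000_000 / (1024 * 1024)
```

This lands a little above 1 kb per change and a little above 1 gb for a million changes, consistent with the rough figures given in the comment.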
@ -1,34 +0,0 @@
### Please describe the problem.
I wanted to use the S3 special remote to "crawl" an S3 bucket in `importtree=yes` mode. The bucket (dandiarchive) supports versioning, so it would be great to enable versioning here as well so that URLs would use versionId. But unfortunately adding `versioning=yes` makes `git-annex` try to establish versioning on the bucket (even if it is already enabled).
Command to try with (should work for anyone since it is a public bucket):
```
git annex --debug initremote s3-dandiarchive bucket=dandiarchive type=S3 encryption=none importtree=yes publicurl=https://dandiarchive.s3.amazonaws.com/ fileprefix=dandisets/000027/ signature=anonymous versioning=yes
```
to see that annex (I use 10.20240927) would try to enable versioning:
```
(enabling bucket versioning...) [2024-11-07 16:30:37.830416324] (Remote.S3) String to sign: "PUT\n\n\nThu, 07 Nov 2024 21:30:37 GMT\n/dandiarchive/?versioning"
[2024-11-07 16:30:37.830449238] (Remote.S3) Host: "dandiarchive.s3.amazonaws.com"
[2024-11-07 16:30:37.830459034] (Remote.S3) Path: "/"
[2024-11-07 16:30:37.830470676] (Remote.S3) Query string: "versioning"
[2024-11-07 16:30:37.830480666] (Remote.S3) Header: [("Date","Thu, 07 Nov 2024 21:30:37 GMT")]
[2024-11-07 16:30:37.830498329] (Remote.S3) Body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><VersioningConfiguration xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Status>Enabled</Status></VersioningConfiguration>"
[2024-11-07 16:30:37.879924822] (Remote.S3) Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
```
It seems to be easy to check whether versioning is enabled:
```
curl -s "https://dandiarchive.s3.amazonaws.com/?versioning"
<?xml version="1.0" encoding="UTF-8"?>
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Status>Enabled</Status></VersioningConfiguration>
```
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

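As a sketch of the check suggested above, the following Python parses a `?versioning` response body like the one returned by the `curl` command and reports whether the status is `Enabled`. It is an illustration only, not what git-annex or the `aws` library does; `versioning_enabled` is a hypothetical helper, and the network request is deliberately left out so the example works on the XML text alone.

```python
import xml.etree.ElementTree as ET

# XML namespace used in the response shown above.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def versioning_enabled(xml_body: str) -> bool:
    # Parse a VersioningConfiguration document and look at its Status.
    # A missing Status element, or any value other than "Enabled",
    # is treated as "not enabled".
    root = ET.fromstring(xml_body.encode("utf-8"))
    status = root.find(S3_NS + "Status")
    return status is not None and status.text == "Enabled"
```

Fed the body from the `curl` output above, this returns `True`; a configuration document with no `Status` element yields `False`.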
@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-11-11T20:11:37Z"
content="""
Unfortunately <https://hackage.haskell.org/package/aws> does not implement
the versioning check, so it will need to be added there. And it tends to take
some time for new versions of the build dependency to reach everywhere.
<https://github.com/aristidb/aws/issues/290>
I do think that is the only safe way to go though. I considered making
git-annex assume that a bucket where versioning cannot be set is read-only.
If git-annex is really never going to write to a bucket, it's safe to
assume versioning is enabled. But, unfortunately, ACLs can sometimes
prevent changing configs like versioning, but still allow other write
operations. Also, a S3 remote might be initialized without permission to
write to an existing bucket, but later S3 creds be used that do allow
writing.
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-11-12T17:35:16Z"
content="""
Made a pull request to aws <https://github.com/aristidb/aws/pull/292>
(As sometimes S3 maintainer of aws, I'll probably accept it if nobody
objects to it.)
"""]]

@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-11-12T17:35:50Z"
content="""
Wait though... We have signature=anonymous. So git-annex does in fact know
that this special remote is read-only. git-annex will never try to write to
it (even if the bucket somehow allowed anonymous writes) as long as it's
configured with signature=anonymous.
So, it could just avoid trying to set versioning when signature=anonymous,
and assume the bucket has versioning enabled.
Hmm, in lockContentS3, when versioning is enabled, it calls
checkVersioning, which checks if a S3 version ID has been recorded for the
file. What if the bucket did not actually have versioning enabled? Then an
import from it would not record a S3 version ID. That would make this, and
other places like checkKey that expect versioned buckets to have S3 version
IDs fail in unexpected ways.
So, I guess I'm inclined to not go down this read-only path, and instead wait for
aws to get updated and use that.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2024-11-12T18:32:53Z"
content="""
The `checkbucketversioning` branch has this implemented, to be merged once
aws is released supporting it.
"""]]

@ -1,59 +0,0 @@
### Please describe the problem.
Initially filed/expressed myself [on datalad issues](https://github.com/datalad/datalad/issues/7286#issuecomment-1434042685) but decided to duplicate here.
Since git-annex [10.20230126-78-g452b080db AKA 10.20230214~12](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=452b080dba11f0d9d5251061acfc50729bf6f633)
(released so rapidly after the change was introduced that datalad testing had only just started breaking and I had only just gotten to report it): the behavior change was not just about a nonzero exit, but rather that git-annex no longer outputs any info for any file as soon as it encounters a path it doesn't know.
<details>
<summary>In the case of an untracked file, a completely wrong path, and an annexed file:</summary>
```shell
git status
On branch dl-test-branch
Your branch is up to date with 'dl-test-remote/dl-test-branch'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
not-committed.txt
nothing added to commit but untracked files present (use "git add" to track)
ls -l
total 8
-rw------- 1 yoh yoh 3 Feb 16 22:10 not-committed.txt
lrwxrwxrwx 1 yoh yoh 186 Feb 16 22:10 test-annex.dat -> .git/annex/objects/Gm/mv/SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat/SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat
```
</details>
Compare behavior before:
```shell
( source ~/git-annexes/10.20230126.env; git annex version | head -n 1; git annex info --json --json-error-messages not-committed.txt INFO.txt test-annex.dat; echo exit $?)
git-annex version: 10.20230126-1~ndall+1
fatal: Not a valid object name not-committed.txt
{"command":"info","note":"not a directory or an annexed file or a treeish or a remote or a uuid","success":false,"input":["not-committed.txt"],"error-messages":[],"file":"not-committed.txt"}
fatal: Not a valid object name INFO.txt
{"command":"info","note":"not a directory or an annexed file or a treeish or a remote or a uuid","success":false,"input":["INFO.txt"],"error-messages":[],"file":"INFO.txt"}
{"command":"info test-annex.dat","size":"7 bytes","success":true,"input":["test-annex.dat"],"key":"SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat","error-messages":[],"present":false,"file":"test-annex.dat"}
exit 0
```
where it did spit out errors to stderr but nevertheless dutifully returned json records for all files, including eventually the one it knows about (and we rely on such behavior!), to now
```shell
( source ~/git-annexes/10.20230214.env; git annex version | head -n 1; git annex info --json --json-error-messages not-committed.txt INFO.txt test-annex.dat ; echo exit $?)
git-annex version: 10.20230214-1~ndall+1
fatal: Not a valid object name not-committed.txt
git-annex: not a directory or an annexed file or a treeish or a remote or a uuid
exit 1
```
where we get only an immediate error message on stderr and not a single record is output.
IMHO the prior behavior is "more correct", and we rely on it in datalad to get a response for each path. If it exits with non-0 afterwards, that is ok with me. If it stops producing results completely, it would take extra effort to first sort out the paths.
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-02-20T17:56:39Z"
content="""
Fixed this. Note that with the fix, it will still exit nonzero at the end
when given a path that does not exist, but it will first process the other
inputs.
Also I've added a test case.
"""]]

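A consumer that relies on getting one JSON record per input, as described in this report, might look roughly like the Python sketch below. It is an illustration only and not DataLad's code: `split_info_records` is a hypothetical helper, and it assumes one JSON object per stdout line with a boolean `success` field, as in the transcript above.

```python
import json
from typing import Dict, List, Tuple


def split_info_records(stdout_text: str) -> Tuple[List[Dict], List[Dict]]:
    # Split line-delimited JSON records into (succeeded, failed).
    # The command's exit code is deliberately not consulted here: per the
    # comment above, it may be nonzero even though the other inputs were
    # still processed and reported.
    ok: List[Dict] = []
    failed: List[Dict] = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        (ok if record.get("success") else failed).append(record)
    return ok, failed
```

The caller can then report each failed path individually and still use the records for the paths git-annex does know about.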
@ -1,36 +0,0 @@
### Please describe the problem.
ref: [https://github.com/datalad/datalad/issues/7173#issuecomment-1314968568](https://github.com/datalad/datalad/issues/7173#issuecomment-1314968568)
```
mkdir "/tmp/new
dquote> line"
cd "/tmp/new
line"
git init
Initialized empty Git repository in /tmp/new
line/.git/
git annex init
init ok
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
fatal: Cannot open '/tmp/new': No such file or directory
git-annex: fd:19: Data.ByteString.hGetLine: end of file
git annex version
git-annex version: 10.20230214+git26-g8f2829e646-1~ndall+1
```
As `git` doesn't mind, and annex batched commands already support `-z` for filenames with newlines in them, I think git-annex should tolerate repository folders with newlines in them too.
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-13T16:20:54Z"
content="""
Unfortunately, `git hash-object --stdin-paths` does not support
-z or anything like that. It is a newline based protocol.
Ok, made git-annex fall back to running git hash-object once
per file when the filenames contain newlines to work around that.
BTW, another problem I noticed is that the repository description
written to uuid.log contains a newline, which prevents parsing that line of
the log correctly. This can also be seen by passing a value
with a newline to `git-annex describe`. It would also happen in the
case with the newline directory if it didn't fail earlier.
Also fixed this, though, with a one-way escaping,
see [[!commit 38e9ea8497bb2ab058e5bd46a666857789c0a84d]].
"""]]

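The workaround described above, falling back to one `git hash-object` run per file when a filename contains a newline, amounts to partitioning the paths before feeding a newline-based protocol. A minimal Python sketch of that partitioning follows; `split_for_hashing` and `stdin_for_batch` are hypothetical names, and no git process is started here.

```python
from typing import List, Tuple


def split_for_hashing(paths: List[str]) -> Tuple[List[str], List[str]]:
    # Separate paths that are safe for a newline-delimited batch protocol
    # from those that must be handled one process invocation at a time.
    batchable = [p for p in paths if "\n" not in p]
    one_by_one = [p for p in paths if "\n" in p]
    return batchable, one_by_one


def stdin_for_batch(paths: List[str]) -> str:
    # Build the input for a one-path-per-line protocol. Only valid for
    # paths without embedded newlines.
    return "".join(p + "\n" for p in paths)
```

A path such as `new\nline/file` would be read by a line-based consumer as two separate paths, which is consistent with the `Cannot open '/tmp/new'` errors in the report.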
@ -1,45 +0,0 @@
### Please describe the problem.
Original case has more in [datalad github issue](https://github.com/datalad/datalad/issues/7371).
In a nutshell, in my words: a user has a repository which is v9, under ACL (but git annex works fine as is). A user clones from another user locally. `git annex init` fails to determine (does not record) the UUID of the `origin` remote, but also does not mark it to be ignored by `git-annex`. If we manually set the `origin` uuid within .git/config of the clone, then `git annex whereis` reports presence fine. But if we do `git annex get` (see [here](https://github.com/datalad/datalad/issues/7371#issuecomment-1546158732)), it says that it is unable to access remote origin, and suggests two other remotes (not available).
The sad part is that `git-annex` did not really give any reason (in `--debug` output) why it didn't discover the UUID or why it is unable to access the remote; e.g., here is the output from `git annex init` in the clone, at the point where I think it should have discovered/recorded the UUID
```
[2023-05-12 11:26:12.750934374] (Annex.Branch) read uuid.log
[2023-05-12 11:26:12.753755353] (Annex.Branch) set uuid.log
[2023-05-12 11:26:12.7539016] (Annex.Branch) read remote.log
[2023-05-12 11:26:12.755652872] (Utility.Process) process [43725] read: git ["config","--null","--list"]
[2023-05-12 11:26:12.763856026] (Utility.Process) process [43725] done ExitSuccess
[2023-05-12 11:26:12.76467482] (Utility.Process) process [43726] call: /usr/local/miniconda3/share/git-annex-10.20220927-0/bin/git-annex ["upgrade","--quiet","--autoonly"]
[2023-05-12 11:26:12.794100842] (Utility.Process) process [43726] done ExitSuccess
[2023-05-12 11:26:12.79481645] (Utility.Process) process [43733] read: git ["config","--null","--list"]
[2023-05-12 11:26:12.802972197] (Utility.Process) process [43733] done ExitSuccess
[2023-05-12 11:26:12.803473974] (Annex.Branch) read trust.log
ok
```
from [this comment](https://github.com/datalad/datalad/issues/7371#issuecomment-1545929998).
So what we really need is some debug logging to tell us more.
### What steps will reproduce the problem?
We failed to create a reproducer. So it is something about that user + original location.
`git annex upgrade` from v9 to v10 somehow resolved it in one sample case. We have more cases like that, which we are not upgrading yet so that we can reproduce again.
### What version of git-annex are you using? On what operating system?
originally in some older 8.2022 but now in 10.20230407
[[!meta author=yoh]]
[[!tag projects/datalad]]
> Hard to know when there is *enough* debugging, but with what I've added,
> I can't think of any more I could add that would help with a problem of
> this kind. Unless of course git-annex has a deep dark bug where it reads
> an annex.uuid from git config, but then somehow misplaces it. But I can't
> imagine such a bug so it's hard to add debugging for it. So, I suppose
> this is [[done]] --[[Joey]]

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-05-15T17:43:21Z"
content="""
Something that prevents `git config` from working, or prevents it from
listing an annex.uuid for the remote, seems like the overridingly likely
reason for their problem. (You were asking the right questions
[here](https://github.com/datalad/datalad/issues/7371#issuecomment-1545975295)
and I don't think they really answered them, unless it happened in your office
hours.)
I've made --debug include the output of `git config --list`,
which allows seeing if a problem prevents git from reading the config of
the remote.
I also made the debug output tell what directory it's running a command in
when it's not the pwd.
So, for example:

    [2023-05-15 15:16:01.414302245] (Utility.Process) process [59665] read: git ["config","--null","--list"] in "/home/joey/tmp/a"
    [2023-05-15 15:16:01.419396816] (Git.Config) git config read: [("",[]),("annex.uuid",["9553f51c-87ad-4321-86fb-de4aa630e997"]) [...]
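
To cross-check by hand what this debug output shows, one can ask git directly for the `annex.uuid` of the repository the remote points at. A minimal sketch: `remote_annex_uuid` is a hypothetical helper name, not part of git-annex, and the path argument is assumed to be whatever the clone's `origin` url points to.

```shell
# Print the annex.uuid stored in the git config of the repository at $1.
# remote_annex_uuid is a hypothetical helper name, not a git-annex command.
remote_annex_uuid() {
    repo=$1
    # "git config --get" exits non-zero when the key is unset or when
    # git cannot read the repository's config at all.
    if uuid=$(git -C "$repo" config --get annex.uuid); then
        printf '%s\n' "$uuid"
    else
        printf 'no annex.uuid readable in %s\n' "$repo" >&2
        return 1
    fi
}
```

Running it as the affected user, e.g. `remote_annex_uuid /path/to/original/repo`, should show whether git itself can read the setting.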
"""]]

@ -1,106 +0,0 @@
### Please describe the problem.
Reference: issue/discovery in [repronim/containers while adding neurodesk images](https://github.com/ReproNim/containers/issues/64#issuecomment-1492256561)
- apparently no URLs got registered for the images despite running `registerurl KEY ANNEX`
- some images do have urls
It took a while to grasp what is going on, and then I found an unfinished reproducer from `Mar 15 2021 annex-claimurl.sh`, without any recollection of why I had not finished it. It might be "operator error" somehow, but that seems unlikely... might it be a datalad special remote bug?
Summary of the problem: if there is an external git-annex-remote which claims the URL (CLAIMURL), `git-annex registerurl` does **not** associate that URL with any remote (neither that external one nor web), and thus does not make that key available to the user despite knowing the url.
Should it btw default to `web` if no remote is associated with it?
Filed a complementary [registerurl --remote REMOTE](https://git-annex.branchable.com/todo/registerurl_--remote_REMOTE/) TODO, since in this case I would have preferred to just register against the web remote.
### What steps will reproduce the problem?
Here is a new "quick" reproducer, but you need datalad installed to get `git-annex-remote-datalad`.
```
#!/bin/bash
export PS4='> '
set -eu
set -x
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
git init
git annex init
# It works fine if we do not enable datalad special remote!
# so it is something about interaction there
git annex initremote datalad externaltype=datalad type=external encryption=none autoenable=true uuid=65b6c36b-debd-4a23-8fa3-675cbd200496
git annex enableremote datalad
git annex info
# so it seems that addurl does it right
git annex addurl --debug --file 123.dat http://www.oneukrainian.com/tmp/123.dat
# but if I do via registerurl -- not quite so
echo 124 > 124.dat
git annex add 124.dat
key=$(readlink -f 124.dat | xargs basename)
git annex registerurl --debug "$key" http://www.oneukrainian.com/tmp/124.dat
git commit -m 'added those two files with urls'
git annex whereis --debug 123.dat
git annex whereis --debug 124.dat
git checkout git-annex
: # URLs are known for both
git grep oneukrainian
: # but only 123.dat would be associated with datalad remote
git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
```
The [full log is here](http://www.oneukrainian.com/tmp/annex-claimurl-2023.sh.log); with the `--debug` lines filtered out, it ends like this:
```
grep -v '^\[' annex-claimurl-2023.sh.log | tail -n 29
(recording state in git...)
> git commit -m 'added those two files with urls'
2 files changed, 2 insertions(+)
create mode 120000 123.dat
create mode 120000 124.dat
> git annex whereis --debug 123.dat
whereis 123.dat [2023-03-31 18:29:27.56573965] (Utility.Process) process [1429290] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(2 copies)
62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
65b6c36b-debd-4a23-8fa3-675cbd200496 -- [datalad]
datalad: http://www.oneukrainian.com/tmp/123.dat
ok
> git annex whereis --debug 124.dat
whereis 124.dat [2023-03-31 18:29:27.857735575] (Utility.Process) process [1429322] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(1 copy)
62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
ok
> git checkout git-annex
Switched to branch 'git-annex'
> :
> git grep oneukrainian
060/68b/SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat.log.web:1680301767.477711756s 1 :http://www.oneukrainian.com/tmp/124.dat
ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log.web:1680301767.037966322s 1 :http://www.oneukrainian.com/tmp/123.dat
> :
> git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log:1680301767.038748415s 1 65b6c36b-debd-4a23-8fa3-675cbd200496
remote.log:65b6c36b-debd-4a23-8fa3-675cbd200496 autoenable=true encryption=none externaltype=datalad name=datalad type=external timestamp=1680301766.517251391s
uuid.log:65b6c36b-debd-4a23-8fa3-675cbd200496 datalad timestamp=1680301765.789226249s
```
So both keys have urls, but only the 123.dat one is associated with the datalad special remote, and only it has a url reported by whereis.
### What version of git-annex are you using? On what operating system?
10.20230126, but I also tried the older 8.20210803 since I thought it must be a regression -- the same result.
[[!meta author=yoh]]
[[!tag projects/repronim]]
> [[fixed|done]] --[[Joey]]

@ -1,29 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-04-04T17:07:37Z"
content="""
This is intentional, see [[!commit 451171b7c1eaccfd0f39d4ec1d64c6964613f55a]]
which changed setUrlPresent to only update presence info when the url
belongs to the web but not when it's claimed by other special remotes.
It makes sense for registerurl to be symmetric with rmurl, and rmurl only
updates presence info when the url is a web url.
To the extent I've been able to follow the complex reasoning there for why,
part of it is clear: The web special remote is different from other special
remotes in that content cannot be dropped from it by git-annex, and the url is
the only pointer to content. So when rmurl removes the last web url, it makes
sense to treat the content as no longer present on the web. But if the url is
claimed by another special remote, which does support dropping content, the
content would still be present on it after removing its url, and would be
accessible w/o using that url, and `git-annex fsck --fast --from` would notice
it was present and fix up the location log if it didn't show the content as present.
Also note that the rmurl man page documents this when it says:

    Removing the last web url will make git-annex no longer treat content as being
    present in the web special remote.

All you need to do is use `git-annex setpresentkey` along with registerurl.
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2023-04-04T20:15:59Z"
content="""
yet to re-review that reasoning, but does it mean that to merely register a URL a client needs to
- call `annex registerurl`
- inspect to which remote the URL was added / by which it was claimed (is there a way? `whereis` is silent)
- if it was claimed by some special remote other than web -- use `annex setpresentkey`?
Sounds like too much / too fragile, and somewhat different from how `addurl` behaves, which does it all just fine regardless of whether it is web or some claimurl'ed remote.
"""]]

@ -1,35 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2023-04-05T00:30:00Z"
content="""
So to some degree it is a regression / broken behavior which initially worked just fine with registerurl -- tried the 6.20180913+git149-g23bd27773 version and it performed \"as expected\". Eh, never enough tests ;)
I have looked at that commit changelog and the [detailed description](http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/bugs/suggests_to_enable_web_remote_even_when_there_is_no_web_urls_for_the_file/comment_4_6dff7befbaacbff573c5f72688966af5._comment;h=c636b09291a23bbce52b0367a767717137f99a21;hb=451171b7c1eaccfd0f39d4ec1d64c6964613f55a). I am not fully grasping yet why `registerurl` should not behave symmetrically with `addurl` in being sufficient by itself to add a url to content so it becomes usable for `get` right away, without some other dances like `setpresentkey`. I think I do get the `rmurl` \"ambiguity\", but more on that below.
Rereading your comment [above](https://git-annex.branchable.com/bugs/registerurl_does_not_register_if_external_remote/#comment-ba9d6517d8f8c10167da95b122a022b3):
> part of it is clear: The web special remote is different from other special remotes in that content cannot be dropped from it by git-annex, and the url is the only pointer to content.
This is just an assumption on some \"special nature of web remote\", e.g. the `datalad` remote also doesn't support dropping, and URL is also just the pointer to content. And CLAIMURL functionality came IIRC exactly for that use case and before adding some kind of duality for having content accessible directly from special remote and via url.
> But if the url is claimed by another special remote, which does support dropping content, the content would still be present on it after removing its url, and would be accessible w/o using that url,
that is yet another assumption, since e.g. in the case of datalad remote `rmurl` effect would be identical to `web` remote, and there is no other way to get content from that remote. (so there is no duality mentioned above)
> All you need to do is use git-annex setpresentkey along with registerurl.
this somewhat contradicts the above \"the content would still be present on it after removing its url\", which suggests that the presence of a URL for the remote is already a sufficient indication of the content being present on the remote.
Overall, there seem to be some assumptions about URLs and external remotes which ideally should be avoided. Maybe it should somehow be reflected in the external remote protocol that CLAIMing a URL indicates that the content is present at that URL, and that there is no other way to access that content from the remote besides via the URL.
As a workaround I of course will now either `setpresentkey` or will just reassign all urls to be handled directly by the web remote somehow. But in the long run I think it is a problematic design, since `registerurl` doesn't even report to which remote that URL was registered:
```
> git annex registerurl --json SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat http://www.oneukrainian.com/tmp/124.dat
{\"command\":\"registerurl\",\"error-messages\":[],\"file\":null,\"input\":[\"SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat\",\"http://www.oneukrainian.com/tmp/124.dat\"],\"success\":true}
```
so how could I generally know the proper `setpresentkey` invocation to follow it up?
"""]]

@ -1,64 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-04-05T17:25:48Z"
content="""
Whups, I forgot about the newish unregisterurl! That's the true inverse of
registerurl. So rmurl is really more the inverse of addurl.
I think I've fully understood the situation that led to this regression now.
I do think it was a regression. That change was all about SETURLPRESENT and
SETURLMISSING in the external special remote protocol, as well as rmurl;
I think that the effect on registerurl was not considered.
So while I'd like to simplify registerurl to as basic a plumbing command as
possible, and would prefer it not to update location tracking, there's the
matter of backward compatibility. Especially for simple cases like adding
regular web urls with it. It would be ok to change it back to update location
tracking for remotes that claim an url. As long as unregisterurl can be
symmetric with it --- can it?
rmurl also has its own wacky behavior in this area:

    # git-annex addurl --fast https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent
    (downloading torrent file...) addurl https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent (from bittorrent) (to debian-11.6.0-i386-netinst.iso) ok
    (recording state in git...)
    # git-annex rmurl debian-11.6.0-i386-netinst.iso https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent
    rmurl debian-11.6.0-i386-netinst.iso ok
    (recording state in git...)
    # git-annex whereis debian-11.6.0-i386-netinst.iso
    whereis debian-11.6.0-i386-netinst.iso (1 copy)
    00000000-0000-0000-0000-000000000002 -- bittorrent
    ok
    # git-annex get debian-11.6.0-i386-netinst.iso
    (fails)

Is that a bug? It's certainly not ideal for the bittorrent special
remote, which can't download the file once the url is removed. (It is
documented behavior though.)
While thinking about those questions, I thought of this situation:

    # git-annex initremote s3 type=S3 ..
    # git-annex copy --key $key --to s3
    # git-annex registerurl $key $url
    # git-annex unregisterurl $key $url
    # git-annex drop --key $key --from s3

At the end there, it's still able to drop the content from s3.
Now, consider hypothetically, if I decide to make the S3 remote CLAIMURL
urls that are in the S3 bucket. As things stand, that won't change the
above scenario. (Although the key won't be recorded as located in the web
after registerurl.)
But... If unregisterurl is changed to update remote tracking for other remotes
than web, after the S3 CLAIMURL change, the behavior of that scenario will not
be the same! After unregisterurl, it will no longer consider the content to be
present in S3. Now you're racking up S3 charges with content that git-annex
stored in S3, but that it refuses to delete. That seems bad.
So, that scenario is leading me to think that I should not change
unregisterurl (or rmurl) to update location tracking of remotes other than web.
And so changing registerurl is also looking like a bad idea.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2023-04-05T18:47:51Z"
content="""
What I'm inclined to do is add a --remote= parameter to registerurl and
unregisterurl. If the specified remote does not claim the url, have it fail
to add it. (See also [[todo/registerurl_--remote_REMOTE]])
So, you can then use registerurl with --remote=$uuid, check that it
succeeded, and then use setpresentkey to mark it present on that uuid.
Without the fragility you complained of.
Update: The --remote parameter is implemented now.
(Could registerurl with --remote update location tracking itself? Maybe,
but I'd worry about a scenario like in the previous comment.)
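
The two steps described above could be scripted roughly as follows. This is only a sketch, assuming a git-annex new enough to have `registerurl --remote`; `register_url_on_remote` is a made-up helper name.

```shell
# Register a url for a key against one specific remote, then mark the key
# as present there. register_url_on_remote is a hypothetical helper name.
# $3 is the remote's UUID, since setpresentkey is given a UUID.
register_url_on_remote() {
    key=$1 url=$2 uuid=$3
    # With --remote, registering fails if the remote does not claim the url.
    git annex registerurl --remote="$uuid" "$key" "$url" || return 1
    # Only reached when registration succeeded: record the key as present.
    git annex setpresentkey "$key" "$uuid" 1
}
```

Because the second command only runs when the first one succeeds, the caller never has to guess which remote ended up with the url.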
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2023-04-05T19:36:40Z"
content="""
Obviously, as the author of the referenced wishlist, I would welcome addition of `--remote` option to both those commands.
But IMHO addition of the option doesn't solve the initial/naive/programmatic user-oriented use case, where the user doesn't know which remote could or should handle the URL, and just wants, analogously or complementary to `addurl`, to extend the list of urls available for some key. There is not even a user-level interface to ask \"what remotes can handle this url\" in order to erect some tandem of commands to register extra URLs for a key. So I don't see how addition of the option would solve the problem.
"""]]

@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2023-04-05T19:57:37Z"
content="""
Well, unregisterurl and rmurl can't safely update location tracking for remotes
other than the web. Unless there were some way to know that simply removing an
url was *sufficient*, like it is for the web, and unlike how it would be
with my S3 remote scenario above.
But, the only issue with registerurl updating location tracking is that it's
not symmetric with unregisterurl.
So is that symmetry more important than comment 6? I don't know. In both
cases, some users are going to be surprised by inconsistent behavior.
The only way to avoid all user surprise would be to go back in time and
make these plumbing commands not update location tracking from the start.
"""]]

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2023-04-05T21:00:04Z"
content="""
Guess I'll come down on the side of restoring old behavior which was
changed w/o warning (and without the new behavior ever being documented).
And on the side of user experience showing the current behavior is surprising.
The future users who get surprised by the resulting inconsistency
of unregisterurl not unsetting location tracking will just have to
live with it.. Sigh.
"""]]

@ -1,66 +0,0 @@
### Please describe the problem.
Our datalad tests started to fail recently ([this PR](https://github.com/datalad/datalad/pull/7372) is the effort to troubleshoot etc).
Here is what we see with a recent version, using this simple script:
```
#!/bin/bash
export PS4='> '
set -eu
set -x
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
git init
git annex init
n='gl\orious'
# touch "$n"
git annex add --json --json-error-messages "$n"
```
which now gives
```
( source /home/yoh/git-annexes/10.20230407+git63-g3d1d77a1bb.env ; bash escaped.sh )
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-OAXQ1CE
> git init
Initialized empty Git repository in /home/yoh/.tmp/dl-OAXQ1CE/.git/
> git annex init
init ok
(recording state in git...)
> n='gl\orious'
> git annex add --json --json-error-messages 'gl\orious'
git-annex: "gl\\orious" not found
add: 1 failed
```
so we get `\\` instead of `\` in the output printed by git-annex
<details>
<summary>previously was all fine</summary>
```shell
( source /home/yoh/git-annexes/10.20230407.env ; bash escaped.sh )
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-1TzrWdi
> git init
Initialized empty Git repository in /home/yoh/.tmp/dl-1TzrWdi/.git/
> git annex init
init ok
(recording state in git...)
> n='gl\orious'
> git annex add --json --json-error-messages 'gl\orious'
git-annex: gl\orious not found
add: 1 failed
```
</details>
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[closed|done]] --[[Joey]]

@ -1,40 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-04-24T15:13:35Z"
content="""
The next release is going to escape and quote filenames that contain
special characters similarly to how git does. (But json will not be affected due
to already being escaped.)
This will affect filenames output in error messages. So if you are
parsing error messages or non-json output with filenames that contain
characters that need to be escaped, you will need to deal with the
change.
See [[todo/terminal_escapes_in_filenames]] for the full details of this
change.
One thing I noticed about your example is that git-annex add doesn't display
that particular filename the same as git add does:

    joey@darkstar:~/tmp/xxx>git-annex add 'gl\orious'
    git-annex: "gl\\orious" not found
    joey@darkstar:~/tmp/xxx>git add 'gl\orious'
    fatal: pathspec 'gl\orious' did not match any files

But, that is an inconsistency in git itself. More commonly it uses the same
display as git-annex for this filename:

    joey@darkstar:~/tmp/xxx>touch 'gl\orious'
    joey@darkstar:~/tmp/xxx>git add 'gl\orious'
    joey@darkstar:~/tmp/xxx>git diff --cached
    diff --git "a/gl\\orious" "b/gl\\orious"
    new file mode 100644
    index 0000000..e69de29

So I don't think there's a problem with git-annex's behavior here. With
that said, we can talk about adding something to make back-compatibility
easy for you, or whatever. A config like core.quotePath but that also
affects special characters, not just unicode, for example.
"""]]

@ -1,41 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2023-04-24T19:23:22Z"
content="""
hm, I didn't look inside `git`, but `git diff` is likely to have it escaped because `patch` (and/or other tools operating on unified diffs) expects it so. In other words -- `git diff` must encode paths escaped because the \"diff standard\" expects it so.
On the other hand, as you confirmed, `git add` just displays the name on the screen, and as such it does not bother escaping it, since maybe I would just cut/paste it as a string which is \"raw\" and thus not expecting any escape characters.
RTFMing [git-config on core.quotePath](https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath) I spotted
> ... enclosing the pathname in double-quotes and escaping ...
so it talks about double-quotes. `git` `status`, `diff` report paths in double (`\"`) not single (`'`) quotes. I wonder if that is where/how `git` is consistent since in your example that is the difference too:
```
# current master git-annex
joey@darkstar:~/tmp/xxx>git-annex add 'gl\orious'
git-annex: \"gl\\orious\" not found
joey@darkstar:~/tmp/xxx>git add 'gl\orious'
fatal: pathspec 'gl\orious' did not match any files
```
that git uses `'` (and does not escape) while git annex uses `\"` (and escapes)? Did you see git doing escaping in paths where it reports them within single (`'`) quotes?
and thus git-annex should have just wrapped it in `'` to be consistent with git, as in:
```shell
# released git-annex
> git annex add --json --json-error-messages '\e[31mfo\o\e[0m'
git-annex: \e[31mfo\o\e[0m not found
add: 1 failed
> git add '\e[31mfo\o\e[0m'
fatal: pathspec '\e[31mfo\o\e[0m' did not match any files
> git rm '\e[31mfo\o\e[0m'
fatal: pathspec '\e[31mfo\o\e[0m' did not match any files
```
"""]]

@ -1,37 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-04-24T19:25:12Z"
content="""
git escapes filenames like this extensively:

    joey@darkstar:~/tmp/xxx>git ls-files
    "gl\\orious"
    joey@darkstar:~/tmp/xxx>git status
    Changes to be committed:
    (use "git restore --staged <file>..." to unstage)
    new file: "gl\\orious"
    joey@darkstar:~/tmp/xxx>git grep hi
    "gl\\orious":hi

This message from `git add` escapes slightly differently, but it still escapes
some characters:

    joey@darkstar:~/tmp/xxx>git add $(echo -e "\e[31mfoo\e[0m")
    fatal: pathspec '?[31mfoo?[0m' did not match any files

Git only does this type of escaping when displaying a fatal error
(it's `vreportf` in the git source, used by things like `die`).
It's basically a last-ditch filtering of a string, which may contain a filename
or other untrusted data, to avoid displaying escape characters. git-annex does
contain such a last-ditch filtering too (safeOutput) but type safety let me avoid
needing to use it to handle this filename here. I don't think it's at all necessary
for git-annex to be bug-for-bug equivalent with git in its display of error
messages; but it is important that it escape somehow. Git's double-quoted escaping
is documented, and this other escaping is not.
Since either behavior would be a behavior change from before when git-annex didn't
escape the filename in the error message with either method, it seems to me either
one would likely break your assumption. So I don't know why you're arguing for
one way over the other way.
"""]]

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2023-04-24T21:25:08Z"
content="""
I am \"arguing\" because ideally I prefer not to handle some not quite standardized un-escaping.
1. Which characters should I expect to be escaped? [Here](https://github.com/datalad/datalad/blob/maint/datalad/support/network.py#L925) are the ones we have for SSH: `_SSH_ESCAPED_CHARACTERS = '\\#&;`|*?~<>^()[]{}$\'\" '`. The same here?
2. Would it be sensible to request `add --json --json-error-messages` to produce a proper machine readable json record for \"unknown\" input, or is there a reason why there is no json record here in `--json` mode?
Also I just wanted to make sure that we are not missing some aspect, like what I felt was an unnoticed (to me at least) difference between `'` and `\"` escaping methods, and possibly some original reason for why `git` has them different in those cases -- maybe there is some good reason?
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-04-25T17:18:23Z"
content="""
Note that I should avoid releasing git-annex with the added escaping until
this is addressed.
"""]]

@ -1,29 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-04-25T16:31:44Z"
content="""
I dug into your code, and datalad is parsing git-annex's (and perhaps
git's in some cases) stderr to find error messages like this one for files
that don't exist, and then it internally dummies up something as if git-annex
were outputting a --json-error-messages record for the file. See
`./datalad/support/annex_utils.py` `_get_non_existing_from_annex_output`
Ok, I can understand now how needing to do an additional form of unescaping
on top of that existing pain point would cause the reaction I have seen in
this bug report.
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]] is a todo
item I opened the last time I became aware of this error message parsing ugliness.
(Also relevant is [this closed todo](https://git-annex.branchable.com/projects/datalad/bugs-done/copy_does_not_reflect_some_failed_copies_in_--json_output/)
where I discuss why --json-error-messages cannot include these errors
as-is.)
So the choice is between implementing
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]]
and changing datalad to use that. Or adding a git config
that avoids escaping filenames. The latter would be easy
to do (and easier for datalad to use), but it kicks the can down the road.
Datalad parsing error messages would continue to be a problem going
forward. (Imagine if git-annex gets localized error messages..)
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2023-04-25T23:27:25Z"
content="""
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]] is
implemented now.
In datalad, all you should need to do now is check for a json object with
`errorid:"FileNotFound"` and the `file` field is the name of the file.
Note that the parser for error messages like "did not match any file(s)
known to git" from `git ls-files --error-unmatch` will still be needed in
datalad.
I'm going to leave this open as a git-annex release blocker until the
necessary changes get made to datalad.
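
For the `errorid` check mentioned above, here is a rough way to pick such records out of the `--json --json-error-messages` stream. A sketch only: it assumes the records arrive on stdout as one compact JSON object per line, and `file_not_found_records` is a hypothetical name. It merely filters lines; extracting the `file` field is better left to a real JSON parser.

```shell
# Keep only the JSON records that carry errorid "FileNotFound".
# Uses a fixed-string match (-F); matching records pass through unchanged.
# file_not_found_records is a hypothetical helper name.
file_not_found_records() {
    grep -F '"errorid":"FileNotFound"'
}

# Usage (reads git-annex's JSON output on stdin):
#   git annex add --json --json-error-messages "$n" | file_not_found_records
```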
"""]]

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2023-04-25T23:33:56Z"
content="""
Also yarik mentioned this change
<https://github.com/datalad/datalad/pull/7372/commits/45ddd4b12ff637c6c77e982225c0e9d9eb53c1b6>
which was caused by [[!commit a0e6fa18eb3c16c3c8079bb41c18151e6ea8b554]],
which was part of my series for escaping control characters.
I think git-annex needs to be returned to the old behavior there, even
though `git-annex info dne` is not technically operating on a file when it
doesn't exist.
Update: Fixed this.
"""]]

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2023-06-05T19:06:30Z"
content="""
This has been blocking git-annex release for the past month.
Apparently datalad has mostly been updated now.
<https://github.com/datalad/datalad/pull/7372> is still not merged, but I'm
not clear if the 2 failed tests there are caused by this change or
something else.
I'm feeling now that I've waited long enough for datalad. I've closed this
bug and don't consider it as blocking release.
"""]]

@ -1,41 +0,0 @@
### Please describe the problem.
I am running `testremote` on a windows CI system to test a special remote implementation for dataverse.org. I run into this error:
```
git-annex: MoveFileEx "C:\\DLTMP\\ran2133" Just ".git\\annex\\objects\\f76\\373\\SHA256E-s1048576--813fea02438e9569e6222f802958fcd89bee742d06ffe9aabe27fd940ef01196.this-is-a-test-key\\SHA256E-s1048576--813fea02438e9569e6222f802958fcd89bee742d06ffe9aabe27fd940ef01196.this-is-a-test-key": does not exist (The system cannot find the path specified.)
```
I suspect this could be a path-length issue (the system reports a max length of 285, and the relative path given above is already 230 chars).
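
That suspicion can be checked with a little arithmetic on the path lengths. A sketch, with `path_fits` as a hypothetical helper and the limit passed in rather than assumed:

```shell
# Succeed when "base/rel" is at most "limit" characters long.
# path_fits is a hypothetical helper name.
path_fits() {
    base=$1 rel=$2 limit=$3
    full="$base/$rel"
    # ${#full} expands to the length of the joined path
    [ "${#full}" -le "$limit" ]
}
```

For example, `path_fits "$PWD" "$relpath" 285 || echo "path too long"`, where `$relpath` stands for the relative object path from the error message.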
I thought to run `git annex testremote --backend=MD5E` instead, to shorten the key length, but this option is not honored (enough): the error, showing a SHA256 key, remains the same.
`testremote` man page says "Also the git-annex-common-options(1) can be used." and `--backend` is explicitly listed in the help output, hence I assumed this should work.
### What steps will reproduce the problem?
It happens when running the https://github.com/datalad/datalad-dataverse tests on a windows appveyor worker. Running on a crippled FS is not enough to trigger the initial `testremote` error, it only happens on windows proper. However, I assume that `--backend` not having the effect that I assumed it should have, is not platform specific.
Here is a demo test log: https://ci.appveyor.com/project/mih/datalad-dataverse/builds/44079592/job/b38woai0ekmq7bn5#L856
The corresponding datalad issue is https://github.com/datalad/datalad-dataverse/issues/127
### What version of git-annex are you using? On what operating system?
CI used
- annex: 8.20211117-gc3af94eff
- git: 2.37.0.windows.1
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
All the time! Sorry to mostly show up when there is an issue!
[[!tag projects/datalad]]
[[!meta title="testremote failure on windows due to long filename issues"]]
> [[fixed|done]] --[[Joey]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="mih"
avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
subject="Recent version tested"
date="2022-07-06T12:18:36Z"
content="""
The behavior is the same for the more recent git-annex 10.20220624-g17e4081d4
"""]]

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-07-12T16:50:16Z"
content="""
Actually testremote will not accept --backend in current master, since that
is no longer a global option and is accepted only by commands that can
actually use it.

testremote cannot support an arbitrary backend here, because it needs to
generate a test key that cannot possibly be used for real data. The only
backend that has a way implemented to do that is SHA256. It would not,
for example, be possible to make the WORM backend support that, since every
possible WORM key could be used by real data.

It would be possible to add support for --backend=MD5 and have it reject
other backends. But this does not strike me as solving the real problem.

Also, in [[bugs/tests_fail_on_windows:_retrieveKeyFile_resume]]
I ran into this same problem, when `git-annex test` was run, and
worked around it by disabling that part of the test suite on windows.
If this is fixed, it would be worth re-enabling that, although it may have
also been failing for other reasons on windows.
"""]]

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2022-07-12T17:42:51Z"
content="""
ghc's IO manager tries to support Windows long paths by normalizing to
a UNC-style path in many system calls. However, when git-annex calls
rename, on windows that ends up in Win32's moveFileEx (via unix-compat),
and that does not do UNC-style normalization. And given the description of
the Win32 package, I think it's intended to pass data directly through
to the API without anything fancy.

System.Directory.renamePath could be used instead of Win32.
While it still uses Win32 moveFileEx, it first does a UNC-style
normalization. Filed an issue:
<https://github.com/jacobstanley/unix-compat/issues/56>

Rather than waiting for that to be fixed, I've made git-annex
use System.Directory.renamePath instead itself. But I don't know
if it will be enough to make testremote work, or if it will fall over
on a later operation on the same too-long path.
getFileStatus/getSymbolicLinkStatus seem like the main things in
unix-compat that would still be a problem.
"""]]

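The "UNC-style normalization" mentioned in comment 3 can be illustrated with a small sketch. This is not git-annex's or ghc's code (those are Haskell); it is a Python illustration that assumes the normalization means rewriting an absolute path into the Windows extended-length form (`\\?\C:\...` or `\\?\UNC\server\share\...`). The helper name `to_extended_length_path` is hypothetical.

```python
import ntpath  # Windows path semantics, usable on any platform

EXTENDED_PREFIX = "\\\\?\\"          # the literal text \\?\
UNC_EXTENDED_PREFIX = "\\\\?\\UNC\\"  # the literal text \\?\UNC\


def to_extended_length_path(path: str) -> str:
    """Return `path` in extended-length form (hypothetical helper).

    Assumption: this mirrors the kind of normalization described in the
    comment above; it is not taken from any real library.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    # The \\?\ form turns off Win32 path parsing, so the path has to be
    # absolute, use backslashes, and contain no "." or ".." components.
    if not ntpath.isabs(path):
        raise ValueError("expected an absolute path: %r" % (path,))
    normalized = ntpath.normpath(path)
    if normalized.startswith("\\\\"):
        # \\server\share\x  becomes  \\?\UNC\server\share\x
        return UNC_EXTENDED_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized
```

A rename that receives the prefixed path is not subject to the classic path length limit, which is presumably why going through a normalizing wrapper helps here while a call that passes the path straight to the Win32 API does not.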

@ -1,245 +0,0 @@
[[!comment format=mdwn
username="mih"
avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
subject="Update for git-annex 10.20230227-ga206cdddb4"
date="2023-02-28T15:35:50Z"
content="""
Sorry for the long silence. Coming back to this issue, I find that the behavior has changed, but not sufficiently to get the test suite to run in full on Windows.

I ran `git annex testremote --fast` on Windows `msys_nt-10.0-17763` with git-annex 10.20230227-ga206cdddb4 and git 2.38.1.windows.1
[[!toggle id=\"ipsum\" text=\"Show test output\"]]
[[!toggleable id=\"ipsum\" text=\"\"\"
```
[00:14:23] E unavailable remote
[00:14:23] E removeKey: OK (0.02s)
[00:14:23] E storeKey: OK
[00:14:23] E checkPresent: OK
[00:14:23] E retrieveKeyFile: OK (0.03s)
[00:14:23] E retrieveKeyFileCheap: OK
[00:14:23] E key size Just 1048576; remote chunksize=0 encryption=none
[00:14:23] E removeKey when not present: OK (2.53s)
[00:14:23] E present False: OK (0.31s)
[00:14:23] E storeKey: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.storeKey\"' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
[00:14:23] E storeKey when already present: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.storeKey when already present/' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
[00:14:23] E retrieveKeyFile: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile\"' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 0: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from 0/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
[00:14:23] E Exception: .git\annex\objects\86a\533\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from 33%/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.07s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from end/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E removeKey when present: OK
[00:14:23] E present False: OK
[00:14:23] E key size Just 1048576; remote chunksize=0 encryption=shared
[00:14:23] E removeKey when not present: OK (2.61s)
[00:14:23] E present False: OK (0.31s)
[00:14:23] E storeKey: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.storeKey\"' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
[00:14:23] E storeKey when already present: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.storeKey when already present/' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
[00:14:23] E retrieveKeyFile: FAIL (0.01s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile\"' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 0: FAIL (0.05s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 0/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
[00:14:23] E Exception: .git\annex\objects\86a\533\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 33%/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.08s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from end/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E removeKey when present: OK
[00:14:23] E present False: OK
[00:14:23] E key size Just 1048575; remote chunksize=0 encryption=none
[00:14:23] E removeKey when not present: OK
[00:14:23] E present False: OK
[00:14:23] E storeKey: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.storeKey\"' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
[00:14:23] E storeKey when already present: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.storeKey when already present/' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
[00:14:23] E retrieveKeyFile: FAIL (0.03s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile\"' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 0: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from 0/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
[00:14:23] E Exception: .git\annex\objects\8c4\c61\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from 33%/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.07s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from end/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E removeKey when present: OK
[00:14:23] E present False: OK
[00:14:23] E key size Just 1048575; remote chunksize=0 encryption=shared
[00:14:23] E removeKey when not present: OK
[00:14:23] E present False: OK
[00:14:23] E storeKey: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.storeKey\"' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
[00:14:23] E storeKey when already present: FAIL
[00:14:23] E Exception: content not available to send
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.storeKey when already present/' to rerun this test only.
[00:14:23] E present True: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
[00:14:23] E retrieveKeyFile: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile\"' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 0: FAIL
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 0/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
[00:14:23] E Exception: .git\annex\objects\8c4\c61\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 33%/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.06s)
[00:14:23] E .\Command\TestRemote.hs:292:
[00:14:23] E failed
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from end/' to rerun this test only.
[00:14:23] E fsck downloaded object: OK
[00:14:23] E removeKey when present: OK
[00:14:23] E present False: OK
[00:14:23] E exporttree=yes; key size Just 1048576; key size Just 1048575
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048576; key size Just 1048576
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048575; key size Just 1048575
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048575; key size Just 1048576
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E
[00:14:23] E 32 out of 125 tests failed (6.39s)
```
[[!toggle id=\"ipsum\" text=\"hide\"]]
\"\"\"]]
Right now, I cannot say whether this points to a problem in my implementation or still to something in git-annex. However, the same implementation passes the test suite on Linux.

Sidenote: I am not sure if you have access to a Windows system for debugging. If this is needed or helpful, please let me know.

Thanks!
"""]]

@ -1,21 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2023-03-01T16:39:39Z"
content="""
Seems like my renamePath fix did work, because looking back at the original
failure log, it was failing to generate test keys, before it got to run the
test cases at all.

The new failures seem likely to be due to getFileStatus/getSymbolicLinkStatus
failing on the long filename on windows, as I suspected might happen in
comment #3. I've updated the issue at
<https://github.com/jacobstanley/unix-compat/issues/56>. And maybe that
will get fixed; my understanding is that unix-compat has a new maintainer
recently. But, git-annex does contain a convertToNativeNamespace function
that it could use to work around the problem itself.

(I am able to run Windows in emulation, but it's sufficiently slow and such a
disk hog that I generally am not in a position to do it easily and appreciate
users who can save me the bother.)
"""]]

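To see why the long filename is a plausible culprit, the length of the object path from the failing log can be worked out directly. The sketch below is Python and only illustrative; it assumes the classic Win32 `MAX_PATH` limit of 260 characters (including the terminating NUL) applies to calls that skip normalization. The repository roots used with `fits_in_max_path` are hypothetical.

```python
import ntpath  # Windows path semantics, usable on any platform

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL

# Key name and directory layout copied from the failing log above.
digest = "38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591"
key = "SHA256E-s1048576--" + digest + ".this-is-a-test-key"
relative = ntpath.join(".git", "annex", "objects", "86a", "533", key, key)

# Characters left for the repository root: subtract the NUL, the relative
# path, and the separator that joins the root to it.
root_budget = MAX_PATH - 1 - len(relative) - 1


def fits_in_max_path(repo_root: str) -> bool:
    """True if the object path under repo_root stays within MAX_PATH."""
    full = ntpath.join(repo_root, relative)
    return len(full) + 1 <= MAX_PATH  # +1 for the terminating NUL


if __name__ == "__main__":
    print(len(key), len(relative), root_budget)
```

The relative path alone is 230 characters, so only a very short repository root fits. A temporary directory of the kind a CI worker typically uses would exceed the limit unless the path is normalized first.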

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-03-01T19:56:27Z"
content="""
In [[!commit 54ad1b4cfb1c8302f1b862cb2699ab9351e3eb5b]] I fully worked
around this class of problems with unix-compat.

I think it's reasonably likely that every access of a file in git-annex
on Windows now goes through UNC-style normalization, allowing long
filenames to be used. Assuming that everything in ghc base does it, which I
think it does.

So there is a good chance this is fixed now.
"""]]

Some files were not shown because too many files have changed in this diff