move old fixed datalad/dandi/repronim bugs to the project pages
As done previously in 2023 in commit bcc69f07e8
Commands used:
for f in $(git grep -l '\[\[!tag projects/dandi\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/dandi/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/dandi/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/repronim\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/repronim/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/repronim/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/datalad\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/datalad/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/datalad/bugs-done; fi; fi; done
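The three loops above differ only in the project name. Below is a sketch, not what was run for this commit, of the same move written once and repeated per project. Like the commands above, it assumes it is run from the wiki's `bugs/` directory, so that `../projects/<name>/bugs-done` exists. `page_dir` and `move_done_bugs` are hypothetical helper names. The sketch uses fixed-string matching (`-F`) instead of escaped brackets, and it strips the suffix with `${f%.mdwn}` instead of `sed 's/.mdwn//'`. The second change only matters for page names that contain `mdwn` earlier in the name, because the unanchored `.` in the sed pattern matches any character.

```shell
#!/bin/bash
# Sketch only: one loop for every project tag instead of three copies.
# Assumes the current directory is the wiki's bugs/ directory, so that
# ../projects/<project>/bugs-done already exists.

# "foo.mdwn" -> "foo": the directory holding a page's extra files (e.g. its
# comments). Removes only a literal ".mdwn" suffix.
page_dir() {
    printf '%s\n' "${1%.mdwn}"
}

# Move every page tagged projects/<project> that links to "done", plus its
# directory if there is one, into ../projects/<project>/bugs-done.
move_done_bugs() {
    local project=$1
    local dest="../projects/$project/bugs-done"
    local files f g
    # Collect the whole list first, then move; git grep exits 1 on no match.
    files=$(git grep -lF '[[!tag projects/'"$project"']]') || return 0
    printf '%s\n' "$files" | while IFS= read -r f; do
        grep -qF 'done]]' "$f" || continue
        git mv "$f" "$dest"
        g=$(page_dir "$f")
        if [ -d "$g" ]; then
            git mv "$g" "$dest"
        fi
    done
}

# Usage (not executed here):
#   for p in dandi repronim datalad; do move_done_bugs "$p"; done
```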
parent 2fe36b35a2
commit 292acd3c28
108 changed files with 0 additions and 0 deletions
@@ -1,42 +0,0 @@
### Please describe the problem.

For the past few days the daily builds have consistently had failures:

```
496 T Mar 06 GitHub Actions *-4.4* (4.8K/0) datalad/git-annex daily summary: 14 PASSED, 3 FAILED, 5 INCOMPLETE
935 T Mar 05 GitHub Actions *-4.3* (4.8K/0) datalad/git-annex daily summary: 14 PASSED, 3 FAILED, 5 INCOMPLETE
1471 N T Mar 04 GitHub Actions *-1.9* (4.8K/0) datalad/git-annex daily summary: 15 PASSED, 2 FAILED, 5 INCOMPLETE
1704 T Mar 03 GitHub Actions *-0.9* (4.8K/0) datalad/git-annex daily summary: 15 PASSED, 2 FAILED, 5 INCOMPLETE
2619 T Mar 01 GitHub Actions *-3.1* (6.5K/0) datalad/git-annex daily summary: 30 PASSED
2935 O T Feb 28 GitHub Actions *-3.8* (6.5K/0) datalad/git-annex daily summary: 30 PASSED
```

A [sample build on OSX](https://github.com/datalad/git-annex/actions/runs/4320138939/jobs/7540059666) says:

```
Utility/RawFilePath.hs:40:1: error:
    Could not load module ‘System.Posix.Files.ByteString’
    It is a member of the hidden package ‘unix-2.7.2.2’.
    You can run ‘:set -package unix’ to expose it.
    (Note: this unloads all the modules in the current scope.)
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
   |
40 | import System.Posix.Files.ByteString
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Utility/RawFilePath.hs:41:1: error:
    Could not load module ‘System.Posix.Directory.ByteString’
    It is a member of the hidden package ‘unix-2.7.2.2’.
    You can run ‘:set -package unix’ to expose it.
    (Note: this unloads all the modules in the current scope.)
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
   |
41 | import qualified System.Posix.Directory.ByteString as D
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

[[!meta author=yoh]]
[[!tag projects/datalad]]

> [[fixed|done]] --[[Joey]]
@@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-08T16:17:53Z"
content="""
Windows was already fixed.

OSX got further than that after some fixes on Monday. I've fixed
one more build problem on it which may get it to build again.
"""]]
@@ -1,68 +0,0 @@
### Please describe the problem.

Unable to addurl to a `file:///` on Windows

1. It doesn't understand `file:///C:/`.
2. With `file://C:/`, it fails with "permission denied":

[[!format sh """
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex addurl --file buga file:///C:/123
addurl file:///C:/123
download failed: /C:/123: openBinaryFile: invalid argument (Invalid argument)
failed
git-annex: addurl: 1 failed

C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex addurl --file buga file://C:/123
addurl file://C:/123
(to buga)
git-annex: .git\annex\tmp\URL-s6--file&c%%C&c%123: renameFile:renamePath:MoveFileEx "\\\\?\\C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_testrepo_tmphjl88\\.git\\annex\\tmp\\URL-s6--file&c%%C&c%123" Just "\\\\?\\C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_testrepo_tmphjl88\\buga": permission denied (The process cannot access the file because it is being used by another process.)
failed
git-annex: addurl: 1 failed
"""]]

Here are some relevant details (also showing curl handling both `file://` and `file:///`):
[[!format sh """
C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git status
On branch adjusted/master(unlocked)
nothing to commit, working tree clean

C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git annex version
git-annex version: 7.20181205-g51d6f38b1
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV TorrentParser Feeds Testsuite
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client-0.5.7.1 persistent-sqlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
operating system: mingw32 i386
supported repository versions: 5 7
upgrade supported from repository versions: 2 3 4 5 6
local repository version: 7

C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>git status
On branch adjusted/master(unlocked)
nothing to commit, working tree clean

C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>curl file://C:/123
124

C:\...pData\Local\Temp\1\datalad_temp_testrepo_tmphjl88>curl file:///C:/123
124
"""]]

More information about this AppVeyor server can be found in the [datalad wtf](http://paste.debian.net/1055359/) output.

A while back we [had a related discussion](https://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/), but at least `addurl` seemed to work then.

[[!meta author=yoh]]
[[!tag projects/repronim]]

> [[fixed|done]] --[[Joey]]
@@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-27T16:27:00Z"
content="""
I tried this on Windows, and the second command succeeds now.

The first command still fails as shown.

At this point, what's left of this bug seems to be the same as
<https://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/>
"""]]
@@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-03-27T17:57:12Z"
content="""
Ok, put in an ugly hack to fix this.
"""]]
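The two failing URLs in this report differ in where the drive letter ends up. With `file:///C:/123` the URL path is `/C:/123`, which is exactly the name in the `openBinaryFile` error. With `file://C:/123`, the `C:` sits where a host name would normally go. As a sketch of one way to map both forms to the same Windows path, a helper can strip the scheme and then drop the slash in front of a drive letter. This is an illustration only, not the "ugly hack" git-annex actually uses. `file_url_to_path` is a hypothetical name, and the sketch ignores percent-encoding and any host other than `localhost`.

```shell
#!/bin/bash
# Sketch: turn a file: URL into a local path, accepting both forms used in
# the report above. Does not decode %XX escapes or handle remote hosts.
file_url_to_path() {
    local path=${1#file://}
    # file://localhost/x means the same as file:///x
    case $path in
        localhost/*) path=${path#localhost} ;;
    esac
    # file:///C:/123 leaves "/C:/123": drop the slash before a drive letter.
    # file://C:/123 already leaves "C:/123".
    if [[ $path =~ ^/[A-Za-z]: ]]; then
        path=${path#/}
    fi
    printf '%s\n' "$path"
}
```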
@@ -1,62 +0,0 @@
### Please describe the problem.

This is somewhat too late for our current use case, since an older git-annex would not know about it, but I think it could be generalized into adding a configuration variable right away for **any** automated migration. E.g. there is a variable to prevent autoupgrades of repositories (e.g. from v5 to the next version), but AFAIK there is none for the automated conversion into `adjusted/master(unlocked)` mode.

Rationale: with the thaw/freeze commands we can now use git-annex in indirect (default) mode on our HPC, but that requires a recent version of git-annex. A user might have some other (older) version of git-annex available system-wide by default. If the user forgets to switch to the newer git-annex before using it, the older one might decide that it operates on a crippled filesystem and, since it does not know about thaw/freeze, just migrate the repository to an adjusted branch, which is very undesirable.

### What steps will reproduce the problem?

Here is a demo of an older git-annex switching to adjusted branch mode. I have yet to discover how else we could have migrated without directly invoking `git annex init`:

```
[d31548v@discovery7 d31548v]$ mkdir repo
[d31548v@discovery7 d31548v]$ cd repo
[d31548v@discovery7 repo]$ git init
Initialized empty Git repository in /dartfs/rc/lab/D/DBIC/DBIC/d31548v/repo/.git/
[d31548v@discovery7 repo]$ git config --add annex.thawcontent-command "$HOME/bin-annex/thaw-content %path"
[d31548v@discovery7 repo]$ git config --add annex.freezecontent-command "$HOME/bin-annex/freeze-content %path"
[d31548v@discovery7 repo]$ git annex init
init ok
(recording state in git...)
[d31548v@discovery7 repo]$ echo 123 > 123
[d31548v@discovery7 repo]$ git annex add 123
add 123
ok
(recording state in git...)
git comm[d31548v@discovery7 repo]$ git commit -m 'added 123 in indirect mode' 123
[master (root-commit) 3ceb200] added 123 in indirect mode
 1 file changed, 1 insertion(+)
 create mode 120000 123
[d31548v@discovery7 repo]$ ls -ld 123
lrwxr-x--- 1 d31548v rc-DBIC 178 May 6 11:19 123 -> .git/annex/objects/G6/qW/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
[d31548v@discovery7 repo]$ ls -l .git/annex/objects
total 3
drwxr-x--- 3 d31548v rc-DBIC 20 May 6 11:19 G6
[d31548v@discovery7 repo]$ export PATH=/opt/bin:$PATH
[d31548v@discovery7 repo]$ git annex version | head -n 1
git-annex version: 8.20200502-g55acb2e52
[d31548v@discovery7 repo]$ git annex drop 123
drop 123
git-annex: failed to lock content: .git/annex/objects/G6/qW/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b: openFd: permission denied (Permission denied)
failed
git-annex: drop: 1 failed
[d31548v@discovery7 repo]$ git annex init
init
Filesystem allows writing to files whose write bit is not set.

Detected a crippled filesystem.

Disabling core.symlinks.
(scanning for unlocked files...)

Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.

Switched to branch 'adjusted/master(unlocked)'
ok
```

[[!meta author=yoh]]
[[!tag projects/datalad]]

> [[notabug|done]] per comments --[[Joey]]
@@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-05-09T15:00:47Z"
content="""
git-annex does not enter adjusted branch mode except on `git-annex
init` or when you explicitly tell it to. The only exception to this that I
can find is that upgrading from a v5 repository that was in direct mode
will enter an adjusted branch.

Switching back from an adjusted branch to master is a simple `git
checkout`.

These two facts do not argue for a separate config setting IMHO.
"""]]
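As the comment above says, leaving an adjusted branch is just a `git checkout` of the original branch. A script that must not operate on an adjusted branch can therefore recognize one by its name and check out the original. This sketch assumes the `adjusted/<branch>(<mode>)` naming seen in these reports, such as `adjusted/master(unlocked)`. `original_branch` and `leave_adjusted_branch` are hypothetical helpers, not git-annex commands.

```shell
#!/bin/bash
# Sketch, assuming adjusted branches are named "adjusted/<branch>(<mode>)".

# Print the branch an adjusted branch was derived from; any other branch
# name is printed unchanged.
original_branch() {
    local b=$1
    case $b in
        adjusted/*\(*\))
            b=${b#adjusted/}   # drop the "adjusted/" prefix
            b=${b%"("*}        # drop the trailing "(mode)" suffix
            ;;
    esac
    printf '%s\n' "$b"
}

# Check out the original branch if HEAD is on an adjusted branch.
# Not executed here; call it inside a repository.
leave_adjusted_branch() {
    local current
    current=$(git symbolic-ref --short HEAD) || return 1
    git checkout "$(original_branch "$current")"
}
```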
@@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-05-09T15:13:39Z"
content="""
It might be worth preventing `git-annex init`, when run in an existing, already
initialized repo, from entering an adjusted branch. But re-running `git-annex
init` generally re-does initialization, except for generating a new UUID
and description. If a repo has been moved to a crippled filesystem,
I think it would be reasonable for a user to expect that re-running git-annex
init will react to that. (Which can also involve setting annex.pidlock or
disabling annex.sshcaching.)
"""]]
@@ -1,8 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2023-02-17T21:50:19Z"
content="""
Here my \"concern\" was the freeze/thaw procedure. I have yet to get to the bottom of the \"variance\" in how differently various ACL paths behave (some exploration from this Friday is [here](https://github.com/dbic/handbook/issues/20)). What I am afraid of is that at some point something would \"trigger\" git-annex to decide that the path is now crippled and switch to adjusted branch mode. If you say it cannot happen, that is ok, and I will be braver ;-)
"""]]
@@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-02-20T18:40:00Z"
content="""
Well, if you unset annex.version, it will automatically reinitialize, and
would enter an adjusted branch if a crippled filesystem was detected.

That's the only way I can see that does not involve you running
`git-annex init` (or upgrade from v5 direct mode as mentioned earlier).
"""]]
@@ -1,58 +0,0 @@
### Please describe the problem.

Here is a reproducer:
```
#!/bin/bash

export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"

mkdir d-in d-repo
echo content >| d-in/file

function dance() {
    git annex import master --from d-in
    # but we need to merge it
    git merge d-in/master
    ls -l
    grep -e . *
}

cd d-repo
git init
git annex init
git annex initremote d-in type=directory directory=../d-in exporttree=yes importtree=yes encryption=none
git config annex.addunlocked true

ls -l ../d-in
dance

echo "sample" > samplefile
git annex add samplefile
git commit -m 'Committing explicitly samplefile'
ls -l samplefile
git show

dance
```

Even with a very fresh git-annex 10.20240831+git21-gd717e9aca0-1~ndall+1, it shows that files obtained via `annex import` are not added unlocked, whereas those that are `git annex add`ed directly are:

```
> ls -l
total 8
lrwxrwxrwx 1 yoh yoh 178 Sep 11 16:45 file -> .git/annex/objects/zm/2W/SHA256E-s8--434728a410a78f56fc1b5899c3593436e61ab0c731e9072d95e96db290205e53/SHA256E-s8--434728a410a78f56fc1b5899c3593436e61ab0c731e9072d95e96db290205e53
-rw-rw-r-- 1 yoh yoh 7 Sep 11 16:45 samplefile
```

IMHO the behavior of `import` should respect the `annex.addunlocked` setting.

This came up while considering using `import` for a folder with DANDI stats. For now I will just add them directly.

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]
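In the reproducer above, the difference shows up only in `ls -l`. The imported `file` is a symlink into `.git/annex/objects` (locked), while `samplefile` is a regular file (unlocked). A small check can make that explicit, for example to assert it at the end of the script. This is a filesystem-only sketch with a hypothetical `link_state` helper. It does not ask git-annex anything, so it cannot tell an unlocked annexed file apart from a file that is not in the annex at all. It tests `-L` first, so a locked file whose content is absent (a dangling symlink) still counts as locked.

```shell
#!/bin/bash
# Sketch: report whether each path is a symlink (how a locked annexed file
# appears) or a regular file (how an unlocked one appears).
link_state() {
    local p
    for p in "$@"; do
        if [ -L "$p" ]; then
            printf '%s: locked (symlink)\n' "$p"
        elif [ -f "$p" ]; then
            printf '%s: unlocked (regular file)\n' "$p"
        else
            printf '%s: missing\n' "$p"
        fi
    done
}
```

Based on the `ls -l` output above, appending `link_state file samplefile` to the reproducer would report `file` as locked and `samplefile` as unlocked.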
@@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-12-18T19:24:38Z"
content="""
Turns out that while `git-annex import` from a directory does support
addunlocked, this was forgotten about when implementing the newer special
remote tree import.

I agree that this should be supported.
"""]]
@@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-12-19T15:31:59Z"
content="""
Note that for --no-content imports, it will not be possible for mimetype=
and mimeencoding= expressions to match.

So if addunlocked is set to such an expression, it will not match and will
add the file locked. Does not seem like a blocker.
"""]]
@@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-12-19T15:34:48Z"
content="""
Implemented this.
"""]]
@@ -1,26 +0,0 @@
### Please describe the problem.

I am familiarizing myself more with adjusted branch mode and might be doing something wrong. But in this http://www.oneukrainian.com/tmp/case-20230630.tgz case I observe that `annex sync` simply moves `master` back to some prior state, possibly causing silent data loss for me if I don't spot it:

```
❯ tar -xzf case-20230630.tgz
❯ cd case
content.html@ datasets.datalad.org/ subfolder/
❯ ( source ~/git-annexes/10.20230626+git13-g029d12815c.env; git annex version | head -n 1; git describe master; git checkout 'adjusted/master(unlocked)'; git annex sync ; git describe master; )
git-annex version: 10.20230626+git13-g029d12815c-1~ndall+1
0.0.0-2-gf34191a
Switched to branch 'adjusted/master(unlocked)'
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
ok
0.0.0-1-gde710c5
```

PS: investigating adjusted/unlocked mode came up in the ReproNim context, where people wanted a "hard copy" of the fmriprep results without symlinks, to simplify navigating the results in a browser. Otherwise, because the browser resolves symlinks, navigation is hard and requires a workaround such as starting a webserver, [as we documented in the dbic handbook](https://dbic-handbook.readthedocs.io/en/latest/datalad.html#how-to-view-mriqcfmriprepetc-dataladified-results-in-a-browser).

[[!meta author=yoh]]
[[!tag projects/repronim]]

> [[fixed|done]] --[[Joey]]
@@ -1,33 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-07-05T19:49:19Z"
content="""
Simplified test case:

    git init tc
    cd tc
    git-annex init
    echo 1 > foo
    git-annex add
    git commit -m add
    git annex adjust --unlock
    git checkout master
    rm foo
    echo 2 > foo
    git-annex add
    git commit -m "this commit will be lost"
    git checkout 'adjusted/master(unlocked)'
    git annex adjust --unlock # or git-annex sync
    git log master

What an unfortunate oversight! And it's not a regression; it's been there
since the beginning of adjusted branches.

git-annex adjust should display a warning message in that situation,
since the original branch has diverged from the adjusted branch.

And git-annex sync should be able to resolve the divergence by
auto-merging the changes from the original branch into the adjusted
branch.
"""]]
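The report above spotted the problem by comparing `git describe master` before and after `git annex sync`. Below is a sketch of the same guard for a script, using a hypothetical `check_not_rewound` helper. It records the branch tip beforehand, then asks git whether the old tip is still the new tip or one of its ancestors. If it is not, commits that were on the branch have been dropped.

```shell
#!/bin/bash
# Sketch: guard against a branch being moved back to an older commit.
# check_not_rewound is a hypothetical helper; run it inside the repository.

# Print "ok" if commit $1 is $2 or one of its ancestors, i.e. nothing that
# was on the branch before has been dropped; otherwise print "rewound".
check_not_rewound() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo ok
    else
        echo rewound
    fi
}

# Usage inside the repository (not executed here):
#   before=$(git rev-parse master)
#   git annex sync --no-content
#   check_not_rewound "$before" "$(git rev-parse master)"
```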
@@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-07-05T21:01:53Z"
content="""
I've fixed the data loss part of this bug.

`git-annex sync` is able to resolve the divergence too. But for some
reason, the first time it's run after the divergence, it leaves it
diverged, and the second time it resolves it. That needs to be fixed.
"""]]
@@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-07-06T16:16:36Z"
content="""
Ok, fixed git-annex sync to immediately merge the changes from the original
branch into the adjusted branch.
"""]]
@@ -1,36 +0,0 @@
### Please describe the problem.

I have not checked whether it is legit to have an "empty" port number in an (http) URL, but I see that `git` itself handles it fine, while git-annex is not happy:

```
❯ git clone https://datasets.datalad.org:/dbic/QA/.git/
Cloning into 'QA'...
warning: redirecting to https://datasets.datalad.org/dbic/QA/.git/
remote: Enumerating objects: 61661, done.
remote: Counting objects: 100% (61661/61661), done.
remote: Compressing objects: 100% (23181/23181), done.
remote: Total 61661 (delta 31300), reused 56651 (delta 26299)
Receiving objects: 100% (61661/61661), 33.27 MiB | 25.03 MiB/s, done.
Resolving deltas: 100% (31300/31300), done.

❯ git annex get sub-emmet/ses-20180508/anat/sub-emmet_ses-20180508_acq-MPRAGE_T1w.nii.gz

Remote origin not usable by git-annex; setting annex-ignore

https://datasets.datalad.org:/dbic/QA/.git//config download failed: Unsupported url scheme https://datasets.datalad.org:/dbic/QA/.git//config
get sub-emmet/ses-20180508/anat/sub-emmet_ses-20180508_acq-MPRAGE_T1w.nii.gz (from datasets.datalad.org...)
(scanning for annexed files...)
ok
(recording state in git...)
```

So it got the file only after enabling a type=git special remote for the same location, but with the correct URL.

I think it would be nice if git-annex were as robust as git in such cases, to avoid a "late surprise".

Backstory: this happened to a user trying to access some NWB files on gin for the DANDI project; here I used a different, simpler and faster URL.

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]
@@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-02-10T17:04:51Z"
content="""
Not a legal url really, RFC 1738 says "If the port is omitted, the colon is as well."
But web browsers, curl, wget, etc do mostly seem to support it, so at least
Postel's law seems to apply.

Here's the root cause of it failing:

    ghci> parseRequest "https://datasets.datalad.org:/dbic/QA/.git/"
    *** Exception: InvalidUrlException "https://datasets.datalad.org:/dbic/QA/.git/" "Invalid port"

So http-conduit refuses to parse it and so can't be used to download it.

Filed an issue, but I don't know if they'll want to change
http-conduit to accept a malformed url.
<https://github.com/snoyberg/http-client/issues/501>

Since network-uri is able to parse it, into a URI
that has `uriPort = ":"`, git-annex could special-case
handling of the empty port there, changing it to ""
and so generating an url that http-conduit can parse.
I've implemented this fix.
"""]]
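The fix described above special-cases the empty port in the parsed URI. Below is a sketch of the same idea applied to the URL string itself, for example to clean such URLs before passing them to a tool that rejects them. It removes a `:` that ends the host part with no port digits after it. This is an illustration under that assumption, using a hypothetical `strip_empty_port` helper and bash's `=~` regex matching. It is not how git-annex implements the fix, which works on network-uri's `uriPort` in Haskell.

```shell
#!/bin/bash
# Sketch: drop an empty port (a ":" directly before the path, query,
# fragment, or end of the URL) from scheme://host:/path style URLs.
# URLs with a real port, or without any port colon, are printed unchanged.
strip_empty_port() {
    local url=$1
    # group 1: scheme://authority up to the empty port's ":"
    # group 2: the rest, starting with "/", "?" or "#" (may be empty)
    local re='^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]*):([/?#].*)?$'
    if [[ $url =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$url"
    fi
}
```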
@@ -1,190 +0,0 @@
### Please describe the problem.

Our DataLad test, which explicitly checks that we are not creating extra commits in the git-annex branch while adding files/URLs that point to the datalad-archive special remote, started to fail going from git-annex 10.20240532-gf9ce7a452cc0fd5cdd2d58739741f7264fdbc598 to 10.20240532-g28f5c47b5a0daf96e5ed9aa719ff1e2763d3cc8b
(invocation: `python -m pytest -s -v datalad/local/tests/test_add_archive_content.py::TestAddArchiveOptions::test_add_delete_after_and_drop_subdir`)

If before we had a single commit
<details>
<summary></summary>

```shell
❯ git log -p git-annex^..git-annex
commit b42433cab9f671d206fe937ee7b68b53f11a0c54 (git-annex)
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:48:16 2024 -0400

update

diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
new file mode 100644
index 0000000..cc638db
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
@@ -0,0 +1,2 @@
+1719758896s 1 c04eb54b-4b4e-5755-8436-866b043170fa
+1719758897s 0 d53ab0e3-21a9-4084-806f-bf9f5812f34e
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
new file mode 100644
index 0000000..8ef0f1f
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
@@ -0,0 +1 @@
+1719758896s 1 :dl+archive:MD5E-s3584--2f350c3650d5e3a21785d55f5a94ce70.tar#path=1/file.txt&size=4
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
new file mode 100644
index 0000000..cc638db
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -0,0 +1,2 @@
+1719758896s 1 c04eb54b-4b4e-5755-8436-866b043170fa
+1719758897s 0 d53ab0e3-21a9-4084-806f-bf9f5812f34e
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
new file mode 100644
index 0000000..30bb5e9
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
@@ -0,0 +1 @@
+1719758896s 1 :dl+archive:MD5E-s3584--2f350c3650d5e3a21785d55f5a94ce70.tar#path=1/1.dat&size=5
```
</details>

<details>
<summary>now we got two</summary>

```shell
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:45:12 2024 -0400

update

diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
new file mode 100644
index 0000000..97acf53
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log
@@ -0,0 +1,2 @@
+1719758713s 0 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa
diff --git a/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
new file mode 100644
index 0000000..e5bafba
--- /dev/null
+++ b/d77/a0b/MD5E-s4--ec4d1eb36b22d19728e9d1d23ca84d1c.txt.log.web
@@ -0,0 +1 @@
+1719758712s 1 :dl+archive:MD5E-s3584--de6498c9ca26fee011f289f5f5972ed0.tar#path=1/file.txt&size=4
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
index 11934b6..97acf53 100644
--- a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -1,2 +1,2 @@
-1719758712s 1 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758713s 0 86661c7b-0604-49e7-8d65-1baf4ca9f469
 1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa

commit 8c4fdbadb4b1735cbb47f833ef99235790b8bcbf
Author: DataLad Tester <test@example.com>
Date: Sun Jun 30 10:45:12 2024 -0400

update

diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
new file mode 100644
index 0000000..11934b6
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log
@@ -0,0 +1,2 @@
+1719758712s 1 86661c7b-0604-49e7-8d65-1baf4ca9f469
+1719758712s 1 c04eb54b-4b4e-5755-8436-866b043170fa
diff --git a/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
new file mode 100644
index 0000000..107c66f
--- /dev/null
+++ b/f45/7f1/MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat.log.web
@@ -0,0 +1 @@
+1719758712s 1 :dl+archive:MD5E-s3584--de6498c9ca26fee011f289f5f5972ed0.tar#path=1/1.dat&size=5
```
</details>

for the same effect. And I believe the command that triggers them is `['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch']`, which before (for years?!) resulted in the expected single commit.
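The regression here concerns how many commits land on the `git-annex` branch for one batched `addurl` run. The DataLad test itself is Python and is not shown in this report. As a shell sketch of the same kind of check, using a hypothetical `new_commits` helper, record the branch tip before the operation and then count the commits added since with `git rev-list --count`.

```shell
#!/bin/bash
# Sketch: count the commits an operation adds to a branch, e.g. git-annex.
# new_commits is a hypothetical helper; run it inside the repository.

# Print how many commits are reachable from $2 but not from $1.
new_commits() {
    git rev-list --count "$1..$2"
}

# Usage inside the repository (not executed here):
#   before=$(git rev-parse git-annex)
#   ...run the batched "git annex addurl" step...
#   [ "$(new_commits "$before" git-annex)" -eq 1 ] || echo "extra git-annex commits"
```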
|
||||
|
||||
<details>
|
||||
<summary>Here is the full set of datalad logs for the steps triggering that </summary>
|
||||
|
||||
```shell
|
||||
[DEBUG ] Determined class of decorated function: <class 'datalad.local.add_archive_content.AddArchiveContent'>
|
||||
[DEBUG ] Resolved dataset to add-archive-content: /home/yoh/.tmp/datalad_temp_tree_rsua9kmg
|
||||
[DEBUG ] Determined class of decorated function: <class 'datalad.core.local.status.Status'>
|
||||
[DEBUG ] Resolved dataset to report status: /home/yoh/.tmp/datalad_temp_tree_rsua9kmg
|
||||
[DEBUG ] Querying AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).diffstatus() for paths: [PosixPath('/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/1.tar')]
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
|
||||
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
|
||||
[DEBUG ] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
|
||||
[DEBUG ] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
|
||||
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '-z', '-m', '-d', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
|
||||
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
|
||||
[DEBUG ] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l', '--', 'subdir/1.tar'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
|
||||
[DEBUG ] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
|
||||
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'status', '--porcelain', '--untracked-files=normal', '--ignore-submodules=none'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
|
||||
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'find', '--anything', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '--', 'subdir/1.tar'] (protocol_class=AnnexJsonProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'find', '--anything', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '--', 'subdir/1.tar'] with status 0
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'contentlocation', 'MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar', '-c', 'annex.dotfiles=true'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[INFO ] Adding content of the archive subdir/1.tar into annex AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Initiating clean cache for the archives under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives
[DEBUG ] Cache initialized
[DEBUG ] Not initiating existing cache for the archives under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives
[DEBUG ] Cached directory for archive /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar is fbab09b98e
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'cat-file', 'blob', 'git-annex:remote.log'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[Level 11] CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false cat-file blob git-annex:remote.log' failed with exitcode 128 [err: 'fatal: path 'remote.log' does not exist in 'git-annex'']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'cat-file', 'blob', 'git-annex:trust.log'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[Level 11] CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false cat-file blob git-annex:trust.log' failed with exitcode 128 [err: 'fatal: path 'trust.log' does not exist in 'git-annex'']
[INFO ] Initializing special remote datalad-archives
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'initremote', 'datalad-archives', 'encryption=none', 'type=external', 'autoenable=true', 'externaltype=datalad-archives', 'uuid=c04eb54b-4b4e-5755-8436-866b043170fa', '-c', 'annex.dotfiles=true'] (protocol_class=StdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'initremote', 'datalad-archives', 'encryption=none', 'type=external', 'autoenable=true', 'externaltype=datalad-archives', 'uuid=c04eb54b-4b4e-5755-8436-866b043170fa', '-c', 'annex.dotfiles=true'] with status 0
[DEBUG ] Run ['git', 'config', '-z', '-l', '--show-origin'] (protocol_class=StdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Finished ['git', 'config', '-z', '-l', '--show-origin'] with status 0
[DEBUG ] Acquiring a lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck
[DEBUG ] Acquired? lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck: True
[DEBUG ] Extracting /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e
[DEBUG ] Run ['7z', 'x', '/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar'] (protocol_class=KillOutput) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e)
[DEBUG ] Finished ['7z', 'x', '/home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar'] with status 0
[DEBUG ] Releasing lock /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.extract-lck
[INFO ] Start Extracting archive
[DEBUG ] Adding /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi/1/1.dat to annex pointing to dl+archive:MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar#path=1/1.dat&size=5 and with options None
[DEBUG ] Starting new runner for BatchedAnnex(command=['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch'], encoding=None, exception_on_timeout=False, last_request=None, output_proc=<function readline_json at 0x7f165f5adf80>, path=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg, return_code=None, runner=None, stderr_output=b'', timeout=None, wait_timed_out=None)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'addurl', '--with-files', '--json', '--json-error-messages', '--batch'] (protocol_class=BatchedCommandProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Starting new runner for BatchedAnnex(command=['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'dropkey', '--force', '--json', '--json-error-messages', '--batch'], encoding=None, exception_on_timeout=False, last_request=None, output_proc=<function readline_json at 0x7f165f5adf80>, path=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg, return_code=None, runner=None, stderr_output=b'', timeout=None, wait_timed_out=None)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'annex', 'dropkey', '--force', '--json', '--json-error-messages', '--batch'] (protocol_class=BatchedCommandProtocol) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Adding /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi/1/file.txt to annex pointing to dl+archive:MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar#path=1/file.txt&size=4 and with options None
[INFO ] Finished adding subdir/1.tar: Files processed: 2, renamed: 2, removed: 2, +annex: 2
[DEBUG ] Removing extracted and annexed files under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/subdir/.dataladiwgxvqzi
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rm', '--force', '-r', '--', 'subdir/.dataladiwgxvqzi'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Query status of AnnexRepo('/home/yoh/.tmp/datalad_temp_tree_rsua9kmg') for all paths
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-files', '--stage', '-z']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '--stage', '-z'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-files', '--stage', '-z']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-files', '-z', '-m', '-d'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[DEBUG ] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/home/yoh/.tmp/datalad_temp_tree_rsua9kmg)
[DEBUG ] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Done AnnexRepo(/home/yoh/.tmp/datalad_temp_tree_rsua9kmg).get_content_info(...)
[INFO ] Extracting archive 2 Files done in 0.872975 sec at 2.29102 Files/sec
[DEBUG ] Cleaning up the cache for /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e
[DEBUG ] Cleaning up the stamp file for /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/annex/objects/gg/zf/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar/MD5E-s3584--bb87b72d411b7415410da27950d2a165.tar under /home/yoh/.tmp/datalad_temp_tree_rsua9kmg/.git/datalad/tmp/archives/fbab09b98e.stamp
add-archive-content(ok): /home/yoh/.tmp/datalad_temp_tree_rsua9kmg (dataset)

```
</details>

[[!meta author=yoh]]
[[!tag projects/repronim]]

> [[fixed|done]] --[[Joey]]
@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-07-31T14:20:51Z"
content="""
Note that this does not affect the number of commits made by `addurl` generally,
e.g., when adding multiple URLs with --batch from the web.

Also, I don't think that the commits you picked out and showed necessarily
correspond to one another. The state being recorded in the commit in the 1st
run is not the same as the state that gets recorded by the two commits in the
2nd run, unless there is an actual behavior change that, e.g., leaves the file
present in a repository that it was not present in before.

In the first run, the commit shows that key
MD5E-s5--db87ebcba59a8c9f34b68e713c08a718.dat ends up recorded as present in
datalad-archives but not in the local repository. In the second run, the
commits show that the same key ends up recorded as present in both repositories.
"""]]
@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-07-31T16:06:38Z"
content="""
Bisected to [[!commit 780367200b14d532f745079dfa09ffaa214d0a84]],
"remove dead nodes when loading the cluster log".

Replacing `loadClusters` with a no-op on top of that commit gets the test
suite passing again.

Since nothing in `loadClusters` involves the location log at all, I think
this must come down to a difference in when/if git-annex starts reading
from the git-annex branch. There could be git-annex commands that didn't
use to read from the branch before, but now do. That might mean merging
in other git-annex branches at different points in time than happened
before, which I suppose can result in an additional commit.

Unfortunately, I can't avoid the early `loadClusters` for reasons explained
in that commit.

Anyway, I doubt this will result in a lot of additional commits.
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-07-31T19:50:38Z"
content="""
Aha! I found a way around the dependency loop.

This is fixed.
"""]]
@ -1,88 +0,0 @@
### Please describe the problem.

I need to "quickly" ensure that the remote has all the files it should have gotten. For that I use an invocation like

```
time git annex copy --fast --from web --to dandi-dandisets-dropbox
```

or

```
time git annex copy --auto --from web --to dandi-dandisets-dropbox
```

but then, in the cases where all files are already there according to

```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex find --not --in dandi-dandisets-dropbox

real 0m0.562s
user 0m0.051s
sys 0m0.019s
```

the `copy` still goes and checks every chunk of every file:

```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --from web --to dandi-dandisets-dropbox
copy sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
^C

real 0m3.886s
user 0m0.037s
sys 0m0.032s

```

so to achieve what I need, I thought to specify the query explicitly:

```
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --not --in dandi-dandisets-dropbox --from web --to dandi-dandisets-dropbox

real 0m0.221s
user 0m0.056s
sys 0m0.018s
```

but it doesn't work out correctly whenever there are some files to actually copy:

```
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex find --in web --not --in dandi-dandisets-dropbox | nl | tail -n 2
40 sub-440889/sub-440889_ses-837360280_obj-raw_behavior+image+ophys.nwb
41 sub-440889/sub-440889_ses-838633305_obj-raw_behavior+image+ophys.nwb
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --not --in dandi-dandisets-dropbox
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox
```

so the only way now would be to pipe `find` output into `copy`?

note on edit: filed a dedicated bug report: [https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/](https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/)

NB `git annex find` has `-z` for input but not for output...

refs to related reports/issues which were said to be addressed for `--fast` mode:

- [https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/](https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/)
- [https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/](https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/)

### What version of git-annex are you using? On what operating system?

```
10.20230321-1~ndall+1
```

and then in conda with `10.20230626-g801c4b7`

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]
@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-17T20:57:19Z"
content="""
> but it doesn't works out correctly whenever there are some files to actually copy

I think that was due to the bug you linked, which is now fixed.

I've confirmed that `--fast` is not actually implemented for `git-annex
copy --from --to`. Explicitly specifying `--not --in destremote` is a
fine workaround. But I've gone ahead and implemented `--fast` for it too.
"""]]
@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-17T21:16:35Z"
content="""
BTW `git-annex find --print0` is the output equivalent of `-z`.
"""]]
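The report above asks whether the only way is to pipe `find` output into `copy`. Comment 1 notes that passing `--not --in destremote` to `copy` directly is a fine workaround, but a pipe can still help when the file list needs extra filtering. Below is a minimal sketch of such a pipe using the `--print0` output from comment 2, assuming a POSIX shell and an `xargs` that supports the GNU/BSD `-0` and `-r` options; `copy_missing` is a hypothetical helper name, not part of git-annex.

```shell
# copy_missing FROM TO (hypothetical helper): copy to TO only the files that
# git-annex believes are present on FROM but not yet on TO.
# Names travel NUL-delimited (find --print0 / xargs -0) so that spaces or
# newlines in paths survive the pipe; -r skips the copy when nothing matches.
copy_missing() {
  git annex find --print0 --in "$1" --not --in "$2" |
    xargs -0 -r git annex copy --from "$1" --to "$2" --
}
```

For the report's case this would be called as `copy_missing web dandi-dandisets-dropbox`. Note that `xargs` may split a very long file list across several `git annex copy` invocations.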
@ -1,55 +0,0 @@
### Please describe the problem.

```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex whereis 0/0/0/13/2/12 | head
whereis 0/0/0/13/2/12 (1 copy)
00000000-0000-0000-0000-000000000001 -- web

web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/0/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/1/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/10/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/11/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/12/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/13/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/14/0
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ time git annex copy --from web --to dandi-dandizarrs-dropbox 0/0/0/13/2/12
copy 0/0/0/13/2/12 (from web...) (to dandi-dandizarrs-dropbox...) ok

real 0m0.366s
user 0m0.104s
sys 0m0.042s
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex whereis 0/0/0/13/2/12 | head
whereis 0/0/0/13/2/12 (1 copy)
00000000-0000-0000-0000-000000000001 -- web

web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/0/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/1/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/10/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/11/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/12/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/13/0
web: https://api.dandiarchive.org/api/zarr/27964c5b-6ccd-4e23-a67d-a535282bab34.zarr/0/0/0/1/14/0

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/27964c5b-6ccd-4e23-a67d-a535282bab34$ git annex list 0/0/0/13/2/12
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandizarrs-dropbox (untrusted)
||||||
__XX__ 0/0/0/13/2/12
```

I would expect `copy` to make a record locally that the content is now also on the destination remote, so that a second invocation of `copy --from ... --to ... --auto` does nothing.

### What version of git-annex are you using? On what operating system?

```
10.20230227-gb02b9cc Debian GNU/Linux
```

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]
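A reading aid for the `git annex list` matrix in the report above: the header lines name one repository per column (the number of leading `|` characters gives the column), and each file row carries one flag character per column. In the output shown, `X` marks repositories recorded as having the content, `_` marks those that do not, and the lowercase `x` only appears under the repository labelled `(untrusted)`. The sketch below assumes exactly that layout and a POSIX `awk`; `decode_annex_list` is a hypothetical helper name, not a git-annex command.

```shell
# decode_annex_list (hypothetical helper): read `git annex list` output on
# stdin and print "path: repo, repo, ..." for every file row, naming each
# repository whose column holds X or x (x appears for the untrusted one).
decode_annex_list() {
  awk '
    NF == 0 { next }                         # ignore blank lines
    !body && /^[|]+$/ { body = 1; next }     # the all-pipes line ends the header
    !body {
      # Header line: the number of leading pipes is the column index.
      n = 0
      while (substr($0, n + 1, 1) == "|") n++
      name = substr($0, n + 1)
      sub(/ [(][^)]*[)]$/, "", name)         # drop a trailing note such as (untrusted)
      names[n] = name
      next
    }
    {
      # File row: one flag character per column, a space, then the path.
      flags = $1
      path = substr($0, length(flags) + 2)
      out = ""
      for (i = 1; i <= length(flags); i++) {
        c = substr(flags, i, 1)
        if (c == "X" || c == "x")
          out = out (out == "" ? "" : ", ") names[i - 1]
      }
      print path ": " out
    }
  '
}
```

Usage: `git annex list | decode_annex_list`. For the row `__XX__ 0/0/0/13/2/12` above it would name only `dandiapi` and `web`, i.e. nothing is recorded for `dandi-dandizarrs-dropbox`.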
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-03-13T18:34:36Z"
content="""
Aah, I see, this is when the content is present on the --to
remote, but git-annex is not locally aware of that yet.

And `git-annex copy --to remote` does
update location tracking in such a case, so --from --to should also.
"""]]
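To check the expectation stated in the report, namely that after `copy --from web --to dandi-dandizarrs-dropbox` the local git-annex branch records the content as present on the destination, one can ask `git annex find --in`. That matching option reflects what git-annex believes from its location tracking and does not contact the remote. A minimal sketch; `recorded_on` is a hypothetical helper name.

```shell
# recorded_on REMOTE FILE (hypothetical helper): succeed only if local
# location tracking says FILE's content is on REMOTE. `git annex find --in`
# consults what git-annex believes; it does not contact the remote.
recorded_on() {
  [ -n "$(git annex find --in "$1" -- "$2")" ]
}
```

For example, after the copy in the report: `recorded_on dandi-dandizarrs-dropbox 0/0/0/13/2/12 || echo "not recorded"`.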
@ -1,93 +0,0 @@
### Please describe the problem.

originally reported while composing [https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/](https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/) but it is a separate issue: some files are simply not `annex copy`'ed at all. Here it tries 6 out of 8 files and still reports that 2 are not on the target remote:

```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex copy --from web --to dandi-dandisets-dropbox --fast
copy sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 696.194 MBytes (730012683 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 224.618 MBytes (235528804 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 295.387 MBytes (309735634 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 860.168 MBytes (901951882 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 856.342 MBytes (897939760 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 948.656 MBytes (994737479 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | nl
1 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
2 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
```

and it seems to boil down (at least in one case; I don't know yet if it generalizes to other cases I have) to having those keys present locally:

```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | xargs ls -lL
-r--r--r-- 1 dandi dandi 3878847966 Mar 16 2023 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
-r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
```

but somehow it doesn't know that it has them according to `list`:

```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex list
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandisets-dropbox (untrusted)
||||||
__XX_x sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb
__XX__ sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
__XX_x sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb
__XX_x sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb
__XX_x sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb
__XX__ sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
__XX_x sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb
__XX_x sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb

```

running without `--from web` starts the transfer:

```
git annex copy --fast --to dandi-dandisets-dropbox
```

IMHO it should perform the copy from the local store into the remote, since in effect that would fulfill the goal: adding a copy to the destination.
I didn't check the `move` command, but if it supports a similar `--from --to` and has a similar defect, it should just complement the copy by dropping from the original remote afterwards.

### What version of git-annex are you using? On what operating system?

10.20230626-g801c4b7 from conda-forge.

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]
@ -1,57 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-17T19:58:39Z"
content="""
> -r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb

This could be an unlocked file that has gotten modified but the staged
version is not actually present locally. Or, if `git-annex fsck` on it says
it's fixing the location logs, that would tell us something happened that
got the location tracking out of sync with reality.

So possibly there's an issue that could be tracked down regarding the state
of that file. But in either case, git-annex doesn't know it has a local
copy of the file, so `copy --from --to` could not use it.

----

But: `copy --from --to` does in fact have an interesting bug:

    joey@darkstar:~/tmp/bench/r2>git-annex whereis foo
    whereis foo (2 copies)
    22dfa446-7482-4c0a-92c9-70db793859fb -- joey@darkstar:~/tmp/bench/r [origin]
    8a504049-2c22-4baa-9a16-218e9561608b -- joey@darkstar:~/tmp/bench/r2 [here]
    ok
    joey@darkstar:~/tmp/bench/r2>git-annex copy foo --from origin --to r3
    joey@darkstar:~/tmp/bench/r2>

So the file content being present locally prevents it sending it to the remote! This needs to get fixed.

Hmm: in the corresponding case of `git-annex move --from --to`, it does not
behave that way.

----

As for what the behavior ought to be when a file is present locally but not on the --from remote,
the documentation does say:

--from=remote

Copy the content of files from the specified remote to the local repository.

Any files that are not available on the remote will be silently skipped.

So it is behaving as documented. I can think of two reasons why that
documented behavior makes some sense:

* The user may be intending to only copy files --to that are present in --from.
The local repo may have a lot of files they do not want to populate --to with.
(For example, perhaps the goal is to make a replica of the --from
repository.)
With that said, the user could do `git-annex copy --from foo --to bar --in foo`
to explicitly only act on files that are present in foo.
* Performance. Needing to check if there is a local copy when there is no
remote copy would be a little extra work. Likely not enough to be
significant, though.
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-17T20:27:37Z"
content="""
> So the file content being present locally prevents it sending it to the remote!

Fixed that.
"""]]
@ -1,27 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-11-17T20:33:01Z"
content="""
That bug I fixed would also explain the behavior that you saw if the
content *was* present locally, and the location log *was* out of date about
that.

In that situation, git-annex sees that the object file is present, and so
treats the content as present, despite the location log not knowing it's
present. That triggers the situation of the bug I fixed, causing it to
skip copying the file.

Also, there's a pretty easy way to get into this situation. When the file
is not present, run `git-annex copy --from --to`. Then interrupt it after it has
downloaded the file --from but before it has finished sending it --to.
This results in the file being present locally, but only transiently, so it
didn't update the location log.

So my guess is you interrupted a copy like that (or it failed partway
through for whatever reason).

Now that I've fixed that bug, the behavior in that situation is that it
does copy the file to the remote. And then it drops the local copy since
the location log doesn't contain it. So it resumes correctly now.
"""]]
@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-11-17T20:42:24Z"
content="""
So that leaves only the question of what it should do when
content is present locally but not on the --from remote.

Another reason for the current behavior is to be symmetric with `git-annex
move --from foo --to bar`. It would be surprising, I think, if that
populated bar with files that are not present in foo, but are in the local
repository!

So I'm inclined to not change the documented behavior. If you want to
populate a remote with files that are either in the local repo or in a
--from remote, you can just run `git-annex copy` twice after all.

(Or there could be a new option like `git-annex copy --to bar --from foo --or-from-here`)
"""]]
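Comment 4 above suggests simply running `git-annex copy` twice to populate a remote from both the local repository and a `--from` remote. A minimal sketch of that, assuming a POSIX shell; `populate_remote` is a hypothetical helper name, and the remote names are whatever the caller passes in.

```shell
# populate_remote SRC DEST (hypothetical helper): fill DEST with content that
# is present either in the local repository or on SRC, by running
# `git annex copy` twice as suggested in the comment above.
populate_remote() {
  # Pass 1: send content that is already present locally.
  git annex copy --to "$2" || return
  # Pass 2: relay content that is available on SRC to DEST.
  git annex copy --from "$1" --to "$2"
}
```

The order is a design choice: sending local content first avoids depending on SRC for files that are already here, and a failure in the first pass stops before the second one starts.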
@ -1,12 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2023-11-18T01:35:35Z"
content="""
> (Or there could be a new option like git-annex copy --to bar --from foo --or-from-here)

or maybe

`git-annex copy --to bar --from remote1 --or-from remote2 ...` or similar, so there could be a sequence of remotes (in order of preference)? Or, better, a general `git-annex copy --to bar --from-anywhere`, so that `annex` first `get`s it (following the currently configured costs etc.) if it is not present here, and then copies it over.
"""]]

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-11-30T18:26:30Z"
content="""
I like the idea of `copy --from-anywhere --to=remote` that just uses
the lowest cost remote (when the content is not in the local repo), like
`git-annex get` and `git-annex copy --to=here` do.

Hmm, if there's a remote that is too expensive to want to use in such a
copy, it would be possible to use `-c remote.foo.annex-ignore=true`
to make it avoid using that remote. That can also be done in the case of
`git-annex get`, although it was not documented well.

I've implemented `--from-anywhere`.
"""]]

@ -1,45 +0,0 @@
### Please describe the problem.

```
❯ ( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1; git annex findkeys --in here | git annex dropkey --force --batch -z ; )
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
dropkey MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
ok
❯ ls -ld .git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
-r-------- 1 yoh yoh 5663237 May 19 09:50 .git/annex/objects/V7/Pj/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
❯ ( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1; git annex findkeys --in here | git annex dropkey --force --batch ; )
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
dropkey MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz ok
❯ ls -ld .git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
ls: cannot access '.git/annex/objects/**/*gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz': No such file or directory
```

It was also reported, with 10.20230407, to not return anything, causing us to stall: [https://github.com/datalad/datalad/issues/7315#issuecomment-1554348911](https://github.com/datalad/datalad/issues/7315#issuecomment-1554348911).

[[!meta author=yoh]]
[[!tag projects/datalad]]

### What steps will reproduce the problem?

### What version of git-annex are you using? On what operating system?

### Please provide any additional information below.

[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log


# End of transcript or log.
"""]]

### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

> [[closing|done]] per my comments --[[Joey]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-05-19T17:39:29Z"
content="""
You are piping non-null-terminated output into a command that needs
terminating nulls. So it reads the entire findkeys output, including
newlines, as the name of a single key. And it drops that key, which of
course doesn't exist.

With `findkeys --print0`, it does work. It would also be fine to not use
`-z`, since keys should never actually contain a newline in their name.
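
To illustrate, here is a small bash sketch (not git-annex code) of a reader that, like a `-z` batch consumer, splits its input only on NUL bytes. The two `printf` calls are stand-ins for plain `findkeys` output and for `findkeys --print0` output, and `KEY1`/`KEY2` are placeholder key names. Each record is printed inside brackets so embedded newlines stay visible.

```shell
#!/usr/bin/env bash
# Split stdin into NUL-terminated records, the way a "-z" batch reader does,
# and print each record inside brackets so embedded newlines stay visible.
split_on_nul() {
  local rec
  # read -d '' stops at a NUL byte; the "|| [ -n "$rec" ]" part keeps a
  # final record that has no terminating NUL at all.
  while IFS= read -r -d '' rec || [ -n "$rec" ]; do
    printf '[%s]\n' "$rec"
  done
}

# Newline-separated keys (like plain findkeys output): one single record
# that contains both keys and their newlines.
printf 'KEY1\nKEY2\n' | split_on_nul

# NUL-terminated keys (like findkeys --print0): two records, and the
# trailing NUL does not produce an extra empty record.
printf 'KEY1\0KEY2\0' | split_on_nul
```

The first input comes out as one record holding both keys and their newlines, which is the "key" that got dropped; the second comes out as two records, with nothing extra after the trailing NUL.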
"""]]

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-05-19T17:53:51Z"
content="""
However, after successfully dropping all the keys with `--print0`, there
is then this oddity:

    git-annex: Batch input parse failure: bad key

That's a bug in nul splitting when there's a trailing nul. Oops. I've
fixed that.

Also, while I reproduced the rest of the behavior, I didn't see this part:

    commitBuffer: resource vanished

I'm not sure which command that comes from. Probably findkeys, I think,
if its entire output was not consumed for some reason.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2023-05-19T18:47:49Z"
content="""
It makes total sense, thank you Joey! I guess the only slightly odd behavior is git-annex reporting `ok` for dropping an unknown key. As with `rm unknownfile` (unless `-f` is used), I would have expected it to error out.
"""]]

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2023-05-19T18:49:29Z"
content="""
re `resource vanished` -- it comes from `annex version` whenever its output is not fully consumed, due to the use of `head`:

```
❯ ( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1)
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
```
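
A bash sketch of the mechanism, offered as an illustration rather than git-annex code: once the reader (`head -n 1`) has exited after the first line, any later write by the producer goes to a pipe with no reader and fails with EPIPE, which the transcript above shows git-annex reporting as `resource vanished (Broken pipe)`. `fake_version` is a hypothetical stand-in for `git annex version`. A reader that prints only the first line but still consumes the whole stream, such as `sed -n 1p`, never closes the pipe early, so the producer exits cleanly.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "git annex version": the first line is the
# interesting one, and more output keeps coming after it.
fake_version() {
  echo "git-annex version: X.Y"
  local i
  for ((i = 1; i <= 20000; i++)); do
    echo "extra line $i"
  done
}

# Print only the first line, but keep reading until EOF so the producer
# never writes into a pipe whose reader has already gone away.
first_line() {
  sed -n '1p'
}

fake_version | first_line
echo "producer exit status: ${PIPESTATUS[0]}"
```

With the real command, the equivalent would be `git annex version | sed -n 1p`.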
"""]]

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2023-05-19T18:49:48Z"
content="""
re `resource vanished` -- it comes from `annex version` whenever its output is not fully consumed, due to the use of `head`:

```
❯ ( source ~/git-annexes/10.20230407+git201-g5df89d58c7.env; git annex version | head -n 1)
git-annex version: 10.20230407+git201-g5df89d58c7-1~ndall+1
git-annex: <stdout>: commitBuffer: resource vanished (Broken pipe)
```
"""]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-05-19T18:54:53Z"
content="""
Aha, thanks for clearing up that `git-annex version` does that! That seems
like a bit of a bug on its own really.

.. Fixed that.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2023-05-19T18:55:35Z"
content="""
The reason dropkey does not error on an unknown key is that it's entirely
possible to get a repository into a state where a key's content is present
but the key is otherwise unknown to git-annex. Eg, it doesn't have any
location tracking information for it, there are no files in the git repo
that point to it, etc.

It makes sense to support dropping the content of such a key.

And, dropkey intentionally operates the same on a key when its content is
not present as it does when the content is present and it successfully
dropped it. Because in either case the result is now that the specified
key's content is not present.
"""]]

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2023-05-19T19:22:04Z"
content="""
Gotcha. Just food for possible discussion in the future: I think the action outcome deserves more of an \"annotation\" than just a binary \"ok/fail\". Indeed `dropkey` can say \"ok\" as a promise that in the end there is no key (whether it was known or not, etc). But it can arrive there in different ways. Similarly for \"fail\". For that reason, in DataLad we now have 4 \"status\" states: \"ok\", \"notneeded\", \"impossible\", \"error\", where the first two count as \"ok\" and the other two as \"fail\" ([documented here](https://github.com/datalad/datalad/blob/HEAD/docs/source/design/result_records.rst#status)).
So, here `dropkey unknown` was more of a \"notneeded\" success, I guess, if DataLad were to report it. Maybe `--json` records and non-json output of `git-annex` could in the future somehow discriminate between those outcomes.
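
A hypothetical bash sketch of that distinction, modelling a key's content as a plain file. Like `dropkey`, `drop_content` succeeds whenever the content ends up absent, but it reports `notneeded` instead of `ok` when there was nothing to drop. The status names follow the DataLad states listed above; none of this is git-annex's actual implementation or output.

```shell
#!/usr/bin/env bash
# Hypothetical model: a key's content is "present" when a file exists.
# Prints a DataLad-style status; returns 0 for ok/notneeded, 1 for error.
drop_content() {
  local path=$1
  if [ ! -e "$path" ]; then
    echo "notneeded"   # nothing to drop: the end state already holds
    return 0
  fi
  rm -f -- "$path"
  if [ ! -e "$path" ]; then
    echo "ok"          # content was present and is now gone
    return 0
  fi
  echo "error"         # content is unexpectedly still present
  return 1
}

# Group the four status values into success and failure.
is_success() {
  case $1 in
    ok|notneeded) return 0 ;;
    impossible|error) return 1 ;;
    *) return 2 ;;     # unknown status value
  esac
}

tmp=$(mktemp -d)
echo data > "$tmp/key"
drop_content "$tmp/key"   # content present: reports ok
drop_content "$tmp/key"   # already absent: reports notneeded
rm -rf -- "$tmp"
```

Both calls leave the same end state, so both count as success; only the reported status differs.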
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2023-05-23T15:42:39Z"
content="""
Many commands do reflect "notneeded" by not displaying any output.

(I suppose that could even be a problem with --json --batch, since
a command like drop will not output anything when it has nothing to do.)

In the case of dropkey, it could have skipped displaying anything for keys
that don't exist, but changing that now doesn't seem wise.
"""]]

@ -1,46 +0,0 @@
### Please describe the problem.

```shell
$> git annex version
git-annex version: 10.20230828+git6-g86c70833a1-1~ndall+1
...

$> git annex enableremote typhon
enableremote (normal) typhon
Unable to parse git config from typhon

Remote typhon does not have git-annex installed; setting annex-ignore

This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote typhon
failed
enableremote: 1 failed
```

Here git-annex suggests that git-annex is not installed (totally not true), or that it is unable to parse the config (in effect true, but not because the config is wrong etc.).

It is all because that folder on the ssh remote belongs to someone else, and if I run the shell command manually, I see the hint from `git` itself:

```
$> ssh typhon git-annex-shell configlist /mnt/DATA/data/studies/bep302/gin_BEP032-examples --debug
[2023-08-31 11:57:26.338523978] (Utility.Process) process [3594411] read: git ["--git-dir=/mnt/DATA/data/studies/bep302/gin_BEP032-examples/.git","--work-tree=/mnt/DATA/data/studies/bep302/gin_BEP032-examples","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2023-08-31 11:57:26.339808748] (Utility.Process) process [3594411] done ExitSuccess
[2023-08-31 11:57:26.340366568] (Utility.Process) process [3594412] read: git ["config","--local","--list"]
[2023-08-31 11:57:26.342570264] (Utility.Process) process [3594412] done ExitFailure 128
[2023-08-31 11:57:26.342620672] (Git.Config) config output: fatal: --local can only be used inside a git repository

git-annex-shell: Git refuses to operate in this repository,
probably because it is owned by someone else.

To add an exception for this directory, call:
git config --global --add safe.directory /mnt/DATA/data/studies/bep302/gin_BEP032-examples
```

So, ideally, `git annex enableremote` should provide similar diagnostic output instead of stating incorrect reasons.

[[!meta author=yoh]]
[[!tag projects/dandi]]

> [[fixed|done]] --[[Joey]]

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-09-07T17:01:41Z"
content="""
I wonder if it even makes sense for git-annex-shell to replicate this git
security check, or would it be better for it to instruct git to trust the
repository, so it can be used on it?

git's CVE-2022-24765 involves a malicious creation of a .git repository
above the victim's cwd, with a .git/config that causes things like eg shell
prompts that run git to execute attacker-controlled commands.

git-annex-shell commands all take the directory that the repository is
in, and use that repository. So they don't traverse above looking for
other .git directories.

And, `git clone` will happily clone a remote repository that's owned
by another user, including over ssh. And pull and push etc work with such a
remote. So git-annex-shell should too.

(For that matter, other git-annex-shell commands do work; it's only the
command that reads the git config that fails to work.)
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-09-07T18:21:30Z"
content="""
Closely related, when a local repo is owned by someone else, cloning it and
using it as a git-annex remote also fails, at the same config listing
stage.

I think the same reasoning applies to that: the path to the repo is
explicitly specified in the remote url, so it should treat it as a safe
repo for the purposes of listing its config.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-09-07T18:32:57Z"
content="""
Basically the same fix works for both the ssh remote and the local
remote cases.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-09-07T18:36:37Z"
content="""
Another related case is when git has been configured with
safe.bareRepository=explicit and the remote (either ssh or local)
is a bare repo. git-annex-shell will fail with the same misleading message,
and for a local repo, git-annex will also display the same misleading
message.

I think it also ought to override safe.bareRepository for such remotes,
because eg git pull works with such remotes. The point of
safe.bareRepository=explicit is not to prevent using bare remotes, but to
prevent things like shell prompts from accidentally using bare repos that
were, eg, committed to a git repository by a malicious attacker, and so to
avoid using git configs that allow running arbitrary code.
"""]]

@ -1,33 +0,0 @@
### Please describe the problem.

See e.g. [https://github.com/datalad/git-annex/actions/runs/6680765679/job/18154374923](https://github.com/datalad/git-annex/actions/runs/6680765679/job/18154374923)

```
Repo Tests v10 unlocked
Init Tests
init: OK (0.17s)
add: OK (0.73s)
addurl: OK (0.57s)
crypto: FAIL (3.07s)
./Test/Framework.hs:86:
initremote failed with unexpected exit code (transcript follows)
initremote foo (encryption setup) (to gpg keys: 129D6E0AC537B9C7)
git-annex: .git/annex/othertmp/remote.log: hPut: invalid argument (invalid character)
failed
(recording state in git...)
initremote: 1 failed
```

It started only recently, but fails consistently:

```
(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2023/10[master]git
$> git grep -l 'hPut: invalid argument'
cron-20231027/build-ubuntu.yaml-1289-1c03c8fd-failed/0_test-annex (normal, ubuntu-latest).txt
...
```

[[!meta author=yoh]]
[[!tag projects/repronim]]

> [[fixed|done]] --[[Joey]]

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-11-01T16:07:15Z"
content="""
Reproduced with LANG=C:

    ./Test/Framework.hs:86:
    initremote failed with unexpected exit code (transcript follows)
    initremote foo (encryption setup) (to gpg keys: 129D6E0AC537B9C7)
    git-annex: .git/annex/othertmp/remote.log: withFile: invalid argument (cannot encode character '\132')
    failed
    (recording state in git...)
    initremote: 1 failed

Not quite the same error, but almost certainly the same problem.

I've confirmed this is caused by
[[!commit 3742263c99180d1391e4fd51724aae52d6d02137]]
"""]]

@ -1,25 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-11-01T16:53:48Z"
content="""
Will probably need to revert the Remote/Helper/Encryptable.hs part of that
commit.

What is happening here is, encodeBS is failing when run on the String from
a SharedPubKeyCipher. That String comes from Utility.Gpg.genRandom and is
literally a bunch of random bytes. So it's not encoded with the filesystem
encoding. And it really ought to be a ByteString of course, but since it's
not, anything involving encoding it fails.

That's why the old code had this comment:

    {- Not using Utility.Base64 because these "Strings" are really
     - bags of bytes and that would convert to unicode and not round-trip
     - cleanly. -}

And converted that String to a ByteString via `B.pack . s2w8`, which avoids this problem.

What an ugly thing. Really ought to be fixed to use ByteString throughout.
But for now, let's revert.
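
A small bash illustration of the problem, assuming a hypothetical `raw_bytes` stand-in for the gpg-generated random data. Its first byte, 0x84, is decimal 132, the same value as the `'\132'` character (Haskell escapes are decimal) in the LANG=C error above. Pushing such bytes through a strict text encoding fails, while handling them as opaque bytes round-trips cleanly. Here `iconv` serves only as a strict UTF-8 validator, and `base64` only as a stand-in for a byte-level encoding; neither is what git-annex itself calls.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for random cipher bytes. 0x84 is decimal 132, the
# same value as the '\132' character in the LANG=C error message.
raw_bytes() {
  printf '\x84\xff\x41'
}

# Treating the bytes as UTF-8 text fails: 0x84 cannot start a UTF-8
# sequence, and 0xff never appears in valid UTF-8.
if raw_bytes | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
  echo "as UTF-8 text: valid"
else
  echo "as UTF-8 text: invalid"
fi

# Treating them as opaque bytes round-trips exactly (printed as hex).
hex=$(raw_bytes | base64 | base64 -d | od -An -tx1 | tr -d ' \n')
echo "byte round-trip: $hex"
```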
"""]]

@ -1,75 +0,0 @@
### Please describe the problem.

I have been running

`git annex --debug import --from s3-dandiarchive master`

from an S3 bucket which is versioned, but I did not enable versioning for this "import" case (due to [git-annex being unable to sense versioning read-only](https://git-annex.branchable.com/bugs/importtree_with_versioning__61__yes__58___check_first/)), and I expected it to import the tree (with about 7k files) from S3 "quickly". Note that some of the keys have **many** older revisions for one reason or another.

But currently that process, started hours ago yesterday IIRC, is

```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3912831 dandi 20 0 1024.1g 51.7g 16000 S 100.0 82.4 19,48 git-annex
```

CPU-heavy and very slow at actually "importing" (it started out flipping through pages faster), now listing a page only every 30 seconds or so:

```
[2024-11-12 14:59:23.587433059] (Remote.S3) Header: [("Date","Tue, 12 Nov 2024 19:59:23 GMT")]

[2024-11-12 14:59:58.073945529] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"}
[2024-11-12 14:59:58.074057102] (Remote.S3) Response header 'x-amz-id-2': 'sxDUdIkuRLs3jjjTyIbFaI+cQqLCGpTXZNFcvykT2+F6OcqVRM2IMn6P1YquVrdH3fXmV9nRnTDs9EtOtctV05GptcIaBaF2'
[2024-11-12 14:59:58.07410232] (Remote.S3) Response header 'x-amz-request-id': 'Y35X1Z41GMF9PHY8'
[2024-11-12 14:59:58.074135941] (Remote.S3) Response header 'Date': 'Tue, 12 Nov 2024 19:59:24 GMT'
[2024-11-12 14:59:58.074167094] (Remote.S3) Response header 'x-amz-bucket-region': 'us-east-2'
[2024-11-12 14:59:58.074197609] (Remote.S3) Response header 'Content-Type': 'application/xml'
[2024-11-12 14:59:58.074228873] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
[2024-11-12 14:59:58.074259342] (Remote.S3) Response header 'Server': 'AmazonS3'
[2024-11-12 14:59:58.171273277] (Remote.S3) String to sign: "GET\n\n\nTue, 12 Nov 2024 19:59:58 GMT\n/dandiarchive/"
[2024-11-12 14:59:58.171355688] (Remote.S3) Host: "dandiarchive.s3.amazonaws.com"
[2024-11-12 14:59:58.17139206] (Remote.S3) Path: "/"
[2024-11-12 14:59:58.17142278] (Remote.S3) Query string: "prefix=dandisets%2F"
[2024-11-12 14:59:58.171463294] (Remote.S3) Header: [("Date","Tue, 12 Nov 2024 19:59:58 GMT")]
```

and I am not sure how many pages it has gotten through so far.

I suspect (I can't tell from the above) that it is using the API to list all versions of keys, not just the current version, even though I have not asked for versioned support.

Note: the bucket is too heavy (about 300 million keys IIRC) to list all of it for all the versions. I do not have information at hand on how many versions of keys there are under the `dandisets/` prefix (could be some hundreds of thousands), but I would still expect (hope) it to have completed by now. Nothing seems to have been written to the filesystem or to the git store yet (du says it is 280k total size); git-annex is just being fed information from S3.

### What steps will reproduce the problem?

- add an S3 importtree special remote matching

```
bucket=dandiarchive datacenter=US encryption=none fileprefix=dandisets/ host=s3.amazonaws.com importtree=yes name=s3-dandiarchive port=80 publicurl=https://dandiarchive.s3.amazonaws.com/ signature=anonymous storageclass=STANDARD type=S3 timestamp=1731015643s
```

- run `annex import` from it

### What version of git-annex are you using? On what operating system?

invocation of `static-git-annex-10.20241031` (built by kyleam https://git.kyleam.com/static-annex/ ... but I think I tried a different one before):

```shell
(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ /home/dandi/git-annexes/static-git-annex-10.20241031/bin/git-annex version
git-annex version: 10.20241031
build flags: Pairing DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.2 bloomfilter-2.0.1.2 crypton-1.0.1 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.3 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```

[[!meta author=yoh]]
[[!tag projects/dandi]]

> Calling this [[done]] although memory use improvements still seem
> possible.. --[[Joey]]

@ -1,29 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 1"
date="2024-11-13T18:08:59Z"
content="""
In the end (after over a day of torturing that poor bucket, whereas it took just a few minutes for `s3cmd sync` to get everything, including content) it crashed with

```
[2024-11-12 22:58:00.366878941] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = \"OK\"}
[2024-11-12 22:58:00.373456754] (Remote.S3) Response header 'x-amz-id-2': 'DGXJztoRJRuHQrcOqs3FtnEUJomRz+53jawFoKoRbKQATcvAppqJcfcAVfR1d8cu7uepkEDvSXo='
[2024-11-12 22:58:00.384304583] (Remote.S3) Response header 'x-amz-request-id': 'W1PSPV7ZSBKJ7HTT'
[2024-11-12 22:58:00.38437407] (Remote.S3) Response header 'Date': 'Wed, 13 Nov 2024 03:50:18 GMT'
[2024-11-12 22:58:00.384436037] (Remote.S3) Response header 'x-amz-bucket-region': 'us-east-2'
[2024-11-12 22:58:00.384486611] (Remote.S3) Response header 'Content-Type': 'application/xml'
[2024-11-12 22:58:00.384533794] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
[2024-11-12 22:58:00.384581117] (Remote.S3) Response header 'Server': 'AmazonS3'

git-annex: Unable to list contents of s3-dandiarchive: Network.Socket.recvBuf: resource vanished (Connection reset by peer)
failed
[2024-11-12 22:58:00.565431711] (Utility.Process) process [3912839] done ExitSuccess
import: 1 failed
```

attesting that it is doing something unnecessary -- either listing the full bucket (unlikely) or listing all versions of keys under the prefix (e.g. using [ListObjectVersions](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) instead of [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html)).

It would have been useful if the logs included the API call involved here.
"""]]

@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-11-14T18:23:54Z"
content="""
No, it does not request versions from S3 when versioning is not enabled.

This feels fairly similar to
[[git-annex-import_stalls_and_uses_all_ram_available]].
But I don't think it's really the same; that one used versioning, and relied
on preferred content to filter the wanted files.

Is the size of the whole bucket under the fileprefix, in your case, large
enough that storing a list of all the files (without the versions) could
logically take as much memory as you're seeing? At one point you said it
was 7k files, but later hundreds of thousands, so I'm confused about how
big it is.

Is this bucket supposed to be public? I am having difficulty finding an
initremote command that works.

It also seems quite possible, looking at the code, that it's keeping all
the responses from S3 in memory until it gets done with listing all the
files, which would further increase memory use.
I don't see any `O(N^2)` operations though.
"""]]

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-11-14T18:50:37Z"
content="""
This is the initremote for it:

    git-annex initremote dandiarchive type=S3 encryption=none fileprefix=dandisets/ bucket=dandiarchive publicurl=https://dandiarchive.s3.amazonaws.com/ signature=anonymous host=s3.amazonaws.com datacenter=US importtree=yes

It started at 1 API call per second, but it slowed down as memory rapidly
went up. 3 gb in a few minutes, so I think there is definitely a memory
leak involved.
"""]]

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2024-11-14T19:05:48Z"
content="""
I suspect one way the CLI tool is faster, aside from not leaking memory,
is that there is a max-keys parameter that git-annex is not using.
Less pagination would speed it up.
"""]]

@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2024-11-14T19:21:33Z"
content="""
Apparently gbrNextMarker is Nothing despite the response being truncated. So
git-annex is looping forever, getting the same first page each time, and
storing it all in a list.

I think this is a bug in the aws library, or I'm using it wrong.
It looks for a NextMarker in the response XML, but according to
<https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html>

> This element is returned only if you have the delimiter request parameter
> specified. If the response does not include the NextMarker element and it is
> truncated, you can use the value of the last Key element in the response as the
> marker parameter in the subsequent request to get the next set of object keys.
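
A bash sketch of the pagination rule quoted above, using an in-memory key list instead of S3. `list_page` is a hypothetical stand-in for a ListObjects request without a delimiter: it returns up to a page size of keys sorting after the marker, plus a truncation flag, and never a NextMarker. Passing the last key of each truncated page as the next marker walks the whole listing, whereas re-sending the initial empty marker would return the first page forever, which matches the loop described above.

```shell
#!/usr/bin/env bash
# In-memory stand-in for the (already sorted) object keys under a prefix.
ALL_KEYS=(a b c d e f g)

# Hypothetical stand-in for one ListObjects request without a delimiter:
# prints up to $2 keys that sort after the marker $1, then "TRUNCATED" if
# more keys remain. As S3 does in that case, it returns no NextMarker.
list_page() {
  local marker=$1 max_keys=$2 n=0 k
  for k in "${ALL_KEYS[@]}"; do
    # Skip keys up to and including the marker.
    if [ -n "$marker" ] && ! [ "$k" \> "$marker" ]; then
      continue
    fi
    if (( n == max_keys )); then
      echo "TRUNCATED"
      return 0
    fi
    echo "$k"
    n=$((n + 1))
  done
}

# Fetch every page, using the last Key of each truncated page as the marker
# for the next request. Re-sending the initial empty marker instead would
# return the first page over and over.
list_all() {
  local marker="" page line last truncated
  while true; do
    page=$(list_page "$marker" 3)
    truncated=no
    last=""
    while IFS= read -r line; do
      if [ "$line" = TRUNCATED ]; then
        truncated=yes
      elif [ -n "$line" ]; then
        echo "$line"
        last=$line
      fi
    done <<< "$page"
    # Stop when the listing is complete (or a truncated page was empty).
    if [ "$truncated" = no ] || [ -z "$last" ]; then
      break
    fi
    marker=$last
  done
}

list_all
```

The page size of 3 is only there to force several pages out of seven keys.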
"""]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2024-11-14T20:14:29Z"
content="""
Fixed in [[!commit 4b87669ae229c89eadb4ff88eba927e105c003c4]]. Now it runs
in seconds.

Note that this bug does not seem to affect S3 remotes that have versioning
enabled.
"""]]

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2024-11-15T17:16:51Z"
content="""
Trying the same command but with versioning=yes, I have verified that

* it does not have the same loop forever behavior
* it does use a lot of memory quite quickly

Going back to the unversioned command, I was able to reduce the memory use
by 20% by processing each result, rather than building up a list of results
and processing at the end. It will be harder to do that in the versioning
case, but I expect it will improve it at least that much, and probably
more, since it will be able to GC all the delete markers.
"""]]

@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2024-11-15T17:48:08Z"
content="""
Did the same memory optimisation for the versioned case, and the results are
striking! Running the command until it had made 45 API requests, it was
using 592788 kb of memory. Now it uses only 110968 kb.

Of that, about 78900 kb are used at startup, so it grew 29836 kb.
At that point, it has gathered 23537 changes. So about 1 kb is used per
change. That seems a bit more memory than really should be needed;
each change takes about 75 bytes of data, eg:

    "y3RixvrmLvr1oWJ7meEa4vWK6B.C.aad",3340,"dandisets/000003/draft/dandiset.jsonld",2021-09-28 02:12:39 UTC

I did try some further memory optimisation, making it avoid storing the
same filename repeatedly in memory when gathering versioned changes, which
oddly didn't save any memory.

Memory profiling might let this be improved further, but needing 1 gb of
memory to import a million changes to files doesn't seem too bad.

Update: Did some memory profiling; nothing stuck out as badly wrong.
Lists and tuples are using as much memory as anything.
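
Spelled out, the arithmetic behind the estimates in this comment (using its figures, and treating kb as kilobytes):

$$
\begin{aligned}
\frac{592788\ \text{kB}}{110968\ \text{kB}} &\approx 5.3 && \text{(reduction after 45 API requests)}\\
\frac{29836\ \text{kB}}{23537\ \text{changes}} &\approx 1.27\ \text{kB per change}\\
10^{6}\ \text{changes} \times 1.27\ \text{kB} &\approx 1.3\ \text{GB}\\
\frac{1.27\ \text{kB}}{75\ \text{bytes}} &\approx 17 && \text{(overhead relative to the raw change data)}
\end{aligned}
$$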
"""]]
|
|
@ -1,34 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
I wanted to use S3 special remote to "crawl" S3 bucket in `importtree=yes` mode. Bucket (dandiarchive) supports versioning, so it would be great to enable versioning here as well so URLs would use versionId. But unfortunately adding `versioning=yes` makes `git-annex` to try to establish versioning on the bucket (even if it is already enabled).
|
||||
|
||||
command to try with (should work for anyone since public bucket):
|
||||
|
||||
```
|
||||
git annex --debug initremote s3-dandiarchive bucket=dandiarchive type=S3 encryption=none importtree=yes publicurl=https://dandiarchive.s3.amazonaws.com/ fileprefix=dandisets/000027/ signature=anonymous versioning=yes
|
||||
```
|
||||
|
||||
to see that annex (I use 10.20240927) would try to enable versioning:
|
||||
|
||||
```
|
||||
(enabling bucket versioning...) [2024-11-07 16:30:37.830416324] (Remote.S3) String to sign: "PUT\n\n\nThu, 07 Nov 2024 21:30:37 GMT\n/dandiarchive/?versioning"
|
||||
[2024-11-07 16:30:37.830449238] (Remote.S3) Host: "dandiarchive.s3.amazonaws.com"
|
||||
[2024-11-07 16:30:37.830459034] (Remote.S3) Path: "/"
|
||||
[2024-11-07 16:30:37.830470676] (Remote.S3) Query string: "versioning"
|
||||
[2024-11-07 16:30:37.830480666] (Remote.S3) Header: [("Date","Thu, 07 Nov 2024 21:30:37 GMT")]
|
||||
[2024-11-07 16:30:37.830498329] (Remote.S3) Body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><VersioningConfiguration xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Status>Enabled</Status></VersioningConfiguration>"
|
||||
[2024-11-07 16:30:37.879924822] (Remote.S3) Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
|
||||
```
|
||||
|
||||
It seems easy to check whether versioning is enabled:
|
||||
|
||||
```
|
||||
❯ curl -s "https://dandiarchive.s3.amazonaws.com/?versioning"
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Status>Enabled</Status></VersioningConfiguration>
|
||||
```
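A minimal shell sketch of such a check, assuming the bucket permits anonymous reads of its versioning configuration (as dandiarchive evidently does, per the curl output above). The function names are hypothetical, the XML is parsed with `sed` only for brevity, and this is not how git-annex implements it.

```shell
#!/bin/sh
# Print the versioning state contained in a "?versioning" XML reply:
# "Enabled", "Suspended", or "Unversioned" when no <Status> element is
# present (the case for a bucket whose versioning was never configured).
versioning_status_from_xml() {
    status=$(printf '%s\n' "$1" |
        sed -n 's/.*<Status>\([A-Za-z]*\)<\/Status>.*/\1/p')
    printf '%s\n' "${status:-Unversioned}"
}

# Hypothetical helper: fetch "?versioning" anonymously, like the curl call
# above. Only works where the bucket allows anonymous reads of this
# configuration, which many buckets do not.
bucket_versioning_status() {
    xml=$(curl -sf "https://$1.s3.amazonaws.com/?versioning") || return 1
    versioning_status_from_xml "$xml"
}
```

Given the response shown above, `bucket_versioning_status dandiarchive` would print `Enabled`.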
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,20 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2024-11-11T20:11:37Z"
|
||||
content="""
|
||||
Unfortunately <https://hackage.haskell.org/package/aws> does not implement
|
||||
the versioning check, so it will need to be added there. And it tends to take
|
||||
some time for new versions of the build dependency to reach everywhere.
|
||||
|
||||
<https://github.com/aristidb/aws/issues/290>
|
||||
|
||||
I do think that is the only safe way to go though. I considered making
|
||||
git-annex assume that a bucket where versioning cannot be set is read-only.
|
||||
If git-annex is really never going to write to a bucket, it's safe to
|
||||
assume versioning is enabled. But, unfortunately, ACLs can sometimes
|
||||
prevent changing configs like versioning, but still allow other write
|
||||
operations. Also, an S3 remote might be initialized without permission to
|
||||
write to an existing bucket, and later be used with S3 creds that do allow
|
||||
writing.
|
||||
"""]]
|
|
@ -1,10 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2024-11-12T17:35:16Z"
|
||||
content="""
|
||||
Made a pull request to aws <https://github.com/aristidb/aws/pull/292>
|
||||
|
||||
(As the occasional S3 maintainer of aws, I'll probably accept it if nobody
|
||||
objects to it.)
|
||||
"""]]
|
|
@ -1,23 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2024-11-12T17:35:50Z"
|
||||
content="""
|
||||
Wait though... We have signature=anonymous. So git-annex does in fact know
|
||||
that this special remote is read-only. git-annex will never try to write to
|
||||
it (even if the bucket somehow allowed anonymous writes) as long as it's
|
||||
configured with signature=anonymous.
|
||||
|
||||
So, it could just avoid trying to set versioning when signature=anonymous,
|
||||
and assume the bucket has versioning enabled.
|
||||
|
||||
Hmm, in lockContentS3, when versioning is enabled, it calls
|
||||
checkVersioning, which checks if an S3 version ID has been recorded for the
|
||||
file. What if the bucket did not actually have versioning enabled? Then an
|
||||
import from it would not record an S3 version ID. That would make this, and
|
||||
other places like checkKey that expect versioned buckets to have S3 version
|
||||
IDs, fail in unexpected ways.
|
||||
|
||||
So, I guess I'm inclined to not go down this read-only path, and instead wait for
|
||||
aws to get updated and use that.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2024-11-12T18:32:53Z"
|
||||
content="""
|
||||
The `checkbucketversioning` branch has this implemented, to be merged once
|
||||
aws is released supporting it.
|
||||
"""]]
|
|
@ -1,59 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Initially reported [on datalad issues](https://github.com/datalad/datalad/issues/7286#issuecomment-1434042685), but decided to duplicate it here.
|
||||
|
||||
Since git-annex [10.20230126-78-g452b080db AKA 10.20230214~12](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=452b080dba11f0d9d5251061acfc50729bf6f633)
|
||||
(released so soon after its introduction that datalad's tests only just got a chance to start breaking and me to report it), the behavior change is not just a nonzero exit: git-annex no longer outputs any info for any file as soon as it encounters a path it doesn't know.
|
||||
|
||||
<details>
|
||||
<summary>In the case of an untracked file, a completely wrong path, and an annexed file:</summary>
|
||||
|
||||
```shell
|
||||
❯ git status
|
||||
On branch dl-test-branch
|
||||
Your branch is up to date with 'dl-test-remote/dl-test-branch'.
|
||||
|
||||
Untracked files:
|
||||
(use "git add <file>..." to include in what will be committed)
|
||||
not-committed.txt
|
||||
|
||||
nothing added to commit but untracked files present (use "git add" to track)
|
||||
❯ ls -l
|
||||
total 8
|
||||
-rw------- 1 yoh yoh 3 Feb 16 22:10 not-committed.txt
|
||||
lrwxrwxrwx 1 yoh yoh 186 Feb 16 22:10 test-annex.dat -> .git/annex/objects/Gm/mv/SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat/SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
Compare behavior before:
|
||||
|
||||
```shell
|
||||
❯ ( source ~/git-annexes/10.20230126.env; git annex version | head -n 1; git annex info --json --json-error-messages not-committed.txt INFO.txt test-annex.dat; echo exit $?)
|
||||
git-annex version: 10.20230126-1~ndall+1
|
||||
fatal: Not a valid object name not-committed.txt
|
||||
{"command":"info","note":"not a directory or an annexed file or a treeish or a remote or a uuid","success":false,"input":["not-committed.txt"],"error-messages":[],"file":"not-committed.txt"}
|
||||
fatal: Not a valid object name INFO.txt
|
||||
{"command":"info","note":"not a directory or an annexed file or a treeish or a remote or a uuid","success":false,"input":["INFO.txt"],"error-messages":[],"file":"INFO.txt"}
|
||||
{"command":"info test-annex.dat","size":"7 bytes","success":true,"input":["test-annex.dat"],"key":"SHA256E-s7--ed7002b439e9ac845f22357d822bac1444730fbdb6016d3ec9432297b9ec9f73.dat","error-messages":[],"present":false,"file":"test-annex.dat"}
|
||||
exit 0
|
||||
```
|
||||
|
||||
where it did print errors to stderr, but nevertheless returned JSON records for all files, including the one it knows about (and we rely on such behavior!), to now:
|
||||
|
||||
```shell
|
||||
❯ ( source ~/git-annexes/10.20230214.env; git annex version | head -n 1; git annex info --json --json-error-messages not-committed.txt INFO.txt test-annex.dat ; echo exit $?)
|
||||
git-annex version: 10.20230214-1~ndall+1
|
||||
fatal: Not a valid object name not-committed.txt
|
||||
git-annex: not a directory or an annexed file or a treeish or a remote or a uuid
|
||||
exit 1
|
||||
```
|
||||
|
||||
where we get only an immediate error message on stderr, and not a single record is output.
|
||||
|
||||
IMHO the prior behavior is "more correct", and we rely on it in datalad: we get a response for each path. If it exits non-zero afterwards, that is OK with me. If it stops producing results entirely, we would need extra effort to sort out the paths first.
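To illustrate the per-path contract described above, here is a minimal sketch of a consumer that relies on getting one JSON record per input path, whatever the exit status. `summarize_records` is a hypothetical helper, and it extracts fields with `sed` purely for brevity; a real consumer (like datalad) should use a proper JSON parser.

```shell
#!/bin/sh
# Read `git annex info --json` output (one JSON object per line) and print
# "<file> ok" or "<file> FAILED" for each record. Extracting the field with
# sed is a simplification that assumes no escaped quotes in file names.
summarize_records() {
    while IFS= read -r line; do
        file=$(printf '%s\n' "$line" | sed -n 's/.*"file":"\([^"]*\)".*/\1/p')
        case "$line" in
            *'"success":true'*) printf '%s ok\n' "$file" ;;
            *) printf '%s FAILED\n' "$file" ;;
        esac
    done
}
```

Piping `git annex info --json --json-error-messages not-committed.txt INFO.txt test-annex.dat` into `summarize_records` gives one line per path with the 10.20230126 output shown above, but nothing at all with the 10.20230214 output.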
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,11 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2023-02-20T17:56:39Z"
|
||||
content="""
|
||||
Fixed this. Note that with the fix, it will still exit nonzero at the end
|
||||
when given a path that does not exist, but it will first process the other
|
||||
inputs.
|
||||
|
||||
Also I've added a test case.
|
||||
"""]]
|
|
@ -1,36 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
ref: [https://github.com/datalad/datalad/issues/7173#issuecomment-1314968568](https://github.com/datalad/datalad/issues/7173#issuecomment-1314968568)
|
||||
|
||||
```
|
||||
❯ mkdir "/tmp/new
|
||||
dquote> line"
|
||||
❯ cd "/tmp/new
|
||||
line"
|
||||
❯ git init
|
||||
Initialized empty Git repository in /tmp/new
|
||||
line/.git/
|
||||
❯ git annex init
|
||||
init ok
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
fatal: Cannot open '/tmp/new': No such file or directory
|
||||
git-annex: fd:19: Data.ByteString.hGetLine: end of file
|
||||
|
||||
❯ git annex version
|
||||
git-annex version: 10.20230214+git26-g8f2829e646-1~ndall+1
|
||||
```
|
||||
|
||||
As `git` doesn't mind, and annex batch commands already support `-z` for filenames with newlines in them, I think git-annex should tolerate repository directories with newlines in them too.
|
||||
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,20 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2023-03-13T16:20:54Z"
|
||||
content="""
|
||||
Unfortunately, `git hash-object --stdin-paths` does not support
|
||||
-z or anything like that. It is a newline based protocol.
|
||||
|
||||
Ok, made git-annex fall back to running git hash-object once
|
||||
per file when the filenames contain newlines to work around that.
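A minimal shell sketch of that kind of fallback, for illustration only (git-annex does this in Haskell; `hash_paths` and `has_newline` are hypothetical names). It uses a single `git hash-object --stdin-paths` process when no name contains a newline, and otherwise hashes each file with its own `git hash-object` call, so the output order still matches the input order.

```shell
#!/bin/sh
# Illustrative sketch only; not git-annex's implementation.
nl='
'

# Succeed if the argument contains a newline character.
has_newline() {
    case "$1" in
        *"$nl"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one blob hash per path, in input order.
hash_paths() {
    [ "$#" -eq 0 ] && return 0
    for path in "$@"; do
        if has_newline "$path"; then
            # --stdin-paths is newline-delimited, so such a name would be
            # split in two: hash every file individually instead.
            for p in "$@"; do
                git hash-object -- "$p" || return 1
            done
            return 0
        fi
    done
    # No newlines anywhere: one batched process handles all paths.
    printf '%s\n' "$@" | git hash-object --stdin-paths
}
```

Falling back for the whole batch keeps the sketch simple; a more careful version could batch the safe names and only run the slow path for the others, at the cost of reassembling the output order.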
|
||||
|
||||
BTW, another problem I noticed is that the repository description
|
||||
written to uuid.log contains a newline, which prevents parsing that line of
|
||||
the log correctly. This can also be seen by passing a value
|
||||
with a newline to `git-annex describe`. It would also happen in the
|
||||
case of the directory with a newline, if it didn't fail earlier.
|
||||
|
||||
Also fixed this, though with one-way escaping;
|
||||
see [[!commit 38e9ea8497bb2ab058e5bd46a666857789c0a84d]].
|
||||
"""]]
|
|
@ -1,45 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
The original case has more details in the [datalad github issue](https://github.com/datalad/datalad/issues/7371).
|
||||
In a nutshell, in my words: a user has a v9 repository under ACLs (git-annex works fine in it as is). Another user clones it locally. `git annex init` in the clone fails to determine (does not record) the UUID of the `origin` remote, but also does not make git-annex ignore it. If we manually set the `origin` UUID in the clone's .git/config, then `git annex whereis` reports presence fine. But `git annex get` (see [here](https://github.com/datalad/datalad/issues/7371#issuecomment-1546158732)) says that it is unable to access remote origin, and suggests two other remotes (which are not available).
|
||||
|
||||
The sad part is that `git-annex` did not give any reason (even with --debug) for why it didn't discover the UUID or why it is unable to access the remote. E.g. here is the output from `git annex init` in the clone, where I think it should have discovered/recorded the UUID:
|
||||
|
||||
```
|
||||
[2023-05-12 11:26:12.750934374] (Annex.Branch) read uuid.log
|
||||
[2023-05-12 11:26:12.753755353] (Annex.Branch) set uuid.log
|
||||
[2023-05-12 11:26:12.7539016] (Annex.Branch) read remote.log
|
||||
[2023-05-12 11:26:12.755652872] (Utility.Process) process [43725] read: git ["config","--null","--list"]
|
||||
[2023-05-12 11:26:12.763856026] (Utility.Process) process [43725] done ExitSuccess
|
||||
[2023-05-12 11:26:12.76467482] (Utility.Process) process [43726] call: /usr/local/miniconda3/share/git-annex-10.20220927-0/bin/git-annex ["upgrade","--quiet","--autoonly"]
|
||||
[2023-05-12 11:26:12.794100842] (Utility.Process) process [43726] done ExitSuccess
|
||||
[2023-05-12 11:26:12.79481645] (Utility.Process) process [43733] read: git ["config","--null","--list"]
|
||||
[2023-05-12 11:26:12.802972197] (Utility.Process) process [43733] done ExitSuccess
|
||||
[2023-05-12 11:26:12.803473974] (Annex.Branch) read trust.log
|
||||
ok
|
||||
```
|
||||
from [this comment](https://github.com/datalad/datalad/issues/7371#issuecomment-1545929998).
|
||||
|
||||
|
||||
So what we really need is some debug logging to tell us more.
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
We failed to create a reproducer, so it is something specific to that user and the original location.
|
||||
|
||||
`git annex upgrade` from v9 to v10 somehow resolved it in one sample case. We have more cases like that, which we are not upgrading yet so that we can reproduce it again.
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
Originally some older 8.2022 version, but now 10.20230407.
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> Hard to know when there is *enough* debugging, but with what I've added,
|
||||
> I can't think of any more I could add that would help with a problem of
|
||||
> this kind. Unless of course git-annex has a deep dark bug where it reads
|
||||
> an annex.uuid from git config, but then somehow misplaces it. But I can't
|
||||
> imagine such a bug so it's hard to add debugging for it. So, I suppose
|
||||
> this is [[done]] --[[Joey]]
|
|
@ -1,24 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2023-05-15T17:43:21Z"
|
||||
content="""
|
||||
Something that prevents `git config` from working, or prevents it from
|
||||
listing an annex.uuid for the remote, seems like the overridingly likely
|
||||
reason for their problem. (You were asking the right questions
|
||||
[here](https://github.com/datalad/datalad/issues/7371#issuecomment-1545975295)
|
||||
and I don't think they really answered them, unless it happened in your office
|
||||
hours.)
|
||||
|
||||
I've made --debug include the output of `git config --list`,
|
||||
which allows seeing if a problem prevents git from reading the config of
|
||||
the remote.
|
||||
|
||||
I also made the debug output tell what directory it's running a command in
|
||||
when it's not the pwd.
|
||||
|
||||
So, for example:
|
||||
|
||||
[2023-05-15 15:16:01.414302245] (Utility.Process) process [59665] read: git ["config","--null","--list"] in "/home/joey/tmp/a"
|
||||
[2023-05-15 15:16:01.419396816] (Git.Config) git config read: [("",[]),("annex.uuid",["9553f51c-87ad-4321-86fb-de4aa630e997"]) [...]
|
||||
"""]]
|
|
@ -1,106 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Reference: issue/discovery in [repronim/containers while adding neurodesk images](https://github.com/ReproNim/containers/issues/64#issuecomment-1492256561)
|
||||
|
||||
- apparently we had no URLs registered with the images, despite running `registerurl KEY URL`
|
||||
- some images do have URLs
|
||||
|
||||
It took a while to grasp what was going on, and then I found an unfinished reproducer from `Mar 15 2021 annex-claimurl.sh`, with no recollection of why I had not finished it. It might be "operator error" somehow, but that seems unlikely... might it be a datalad special remote bug?
|
||||
|
||||
Summary of the problem: if there is an external git-annex remote which CLAIMURLs the URL, `git annex registerurl` does **not** associate that URL with any remote (neither that external one nor web), and thus does not make the key available to the user despite knowing the URL.
|
||||
|
||||
Should it btw default to `web` if no remote is associated with it?
|
||||
|
||||
Filed a complementary [registerurl --remote REMOTE](https://git-annex.branchable.com/todo/registerurl_--remote_REMOTE/) TODO, since in this case I would have preferred to just register against the web remote.
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
Here is a new "quick" reproducer; you need datalad installed to get `git-annex-remote-datalad`.
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
|
||||
export PS4='> '
|
||||
|
||||
set -eu
|
||||
set -x
|
||||
|
||||
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
|
||||
|
||||
git init
|
||||
git annex init
|
||||
|
||||
# It works fine if we do not enable datalad special remote!
|
||||
# so it is something about interaction there
|
||||
git annex initremote datalad externaltype=datalad type=external encryption=none autoenable=true uuid=65b6c36b-debd-4a23-8fa3-675cbd200496
|
||||
git annex enableremote datalad
|
||||
|
||||
git annex info
|
||||
|
||||
# so it seems that addurl does it right
|
||||
git annex addurl --debug --file 123.dat http://www.oneukrainian.com/tmp/123.dat
|
||||
|
||||
# but if I do via registerurl -- not quite so
|
||||
echo 124 > 124.dat
|
||||
git annex add 124.dat
|
||||
key=$(readlink -f 124.dat | xargs basename)
|
||||
git annex registerurl --debug "$key" http://www.oneukrainian.com/tmp/124.dat
|
||||
|
||||
git commit -m 'added those two files with urls'
|
||||
|
||||
git annex whereis --debug 123.dat
|
||||
git annex whereis --debug 124.dat
|
||||
|
||||
git checkout git-annex
|
||||
: # URLs are known for both
|
||||
git grep oneukrainian
|
||||
: # but only 123.dat would be associated with datalad remote
|
||||
git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
|
||||
```
|
||||
|
||||
The [full log is here](http://www.oneukrainian.com/tmp/annex-claimurl-2023.sh.log); without the `--debug` lines it ends like:
|
||||
|
||||
```
|
||||
❯ grep -v '^\[' annex-claimurl-2023.sh.log | tail -n 29
|
||||
(recording state in git...)
|
||||
> git commit -m 'added those two files with urls'
|
||||
2 files changed, 2 insertions(+)
|
||||
create mode 120000 123.dat
|
||||
create mode 120000 124.dat
|
||||
> git annex whereis --debug 123.dat
|
||||
whereis 123.dat [2023-03-31 18:29:27.56573965] (Utility.Process) process [1429290] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
|
||||
(2 copies)
|
||||
62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
|
||||
65b6c36b-debd-4a23-8fa3-675cbd200496 -- [datalad]
|
||||
|
||||
datalad: http://www.oneukrainian.com/tmp/123.dat
|
||||
ok
|
||||
> git annex whereis --debug 124.dat
|
||||
whereis 124.dat [2023-03-31 18:29:27.857735575] (Utility.Process) process [1429322] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
|
||||
(1 copy)
|
||||
62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
|
||||
ok
|
||||
> git checkout git-annex
|
||||
Switched to branch 'git-annex'
|
||||
> :
|
||||
> git grep oneukrainian
|
||||
060/68b/SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat.log.web:1680301767.477711756s 1 :http://www.oneukrainian.com/tmp/124.dat
|
||||
ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log.web:1680301767.037966322s 1 :http://www.oneukrainian.com/tmp/123.dat
|
||||
> :
|
||||
> git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
|
||||
ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log:1680301767.038748415s 1 65b6c36b-debd-4a23-8fa3-675cbd200496
|
||||
remote.log:65b6c36b-debd-4a23-8fa3-675cbd200496 autoenable=true encryption=none externaltype=datalad name=datalad type=external timestamp=1680301766.517251391s
|
||||
uuid.log:65b6c36b-debd-4a23-8fa3-675cbd200496 datalad timestamp=1680301765.789226249s
|
||||
```
|
||||
|
||||
So both keys have URLs, but only the 123.dat one is associated with the datalad special remote, and only it has its URL reported by whereis.
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
10.20230126, but I also tried the older 8.20210803, since I thought it must be a regression; same result.
|
||||
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/repronim]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,29 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2023-04-04T17:07:37Z"
|
||||
content="""
|
||||
This is intentional, see [[!commit 451171b7c1eaccfd0f39d4ec1d64c6964613f55a]]
|
||||
which changed setUrlPresent to only update presence info when the url
|
||||
belongs to the web but not when it's claimed by other special remotes.
|
||||
|
||||
It makes sense for registerurl to be symmetric with rmurl, and rmurl only
|
||||
updates presence info when the url is a web url.
|
||||
|
||||
To the extent I've been able to follow the complex reasoning there for why,
|
||||
part of it is clear: The web special remote is different from other special
|
||||
remotes in that content cannot be dropped from it by git-annex, and the url is
|
||||
the only pointer to content. So when rmurl removes the last web url, it makes
|
||||
sense to treat the content as no longer present on the web. But if the url is
|
||||
claimed by another special remote, which does support dropping content, the
|
||||
content would still be present on it after removing its url, and would be
|
||||
accessible w/o using that url, and `git-annex fsck --fast --from` would notice
|
||||
it was present and fix up the location log if it didn't show the content as present.
|
||||
|
||||
Also note that the rmurl man page documents this when it says:
|
||||
|
||||
Removing the last web url will make git-annex no longer treat content as being
|
||||
present in the web special remote.
|
||||
|
||||
All you need to do is use `git-annex setpresentkey` along with registerurl.
|
||||
"""]]
|
|
@ -1,13 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2023-04-04T20:15:59Z"
|
||||
content="""
|
||||
I have yet to re-review that reasoning, but does it mean that, to merely register a URL, a client needs to:
|
||||
- call `annex registerurl`
|
||||
- inspect which remote the URL was added to/claimed by (is there a way? `whereis` is silent)
|
||||
- if it was claimed by some special remote other than web -- use `annex setpresentkey`?
|
||||
|
||||
That sounds like too much / too fragile, and different from how `addurl` behaves, which handles it all just fine regardless of whether it is the web or some CLAIMURL'ing remote.
|
||||
"""]]
|
|
@ -1,35 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 3"
|
||||
date="2023-04-05T00:30:00Z"
|
||||
content="""
|
||||
So to some degree it is a regression: this initially worked just fine with registerurl. I tried the 6.20180913+git149-g23bd27773 version and it performed \"as expected\". Eh, never enough tests ;)
|
||||
|
||||
I have looked at that commit's changelog and [detailed description](http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/bugs/suggests_to_enable_web_remote_even_when_there_is_no_web_urls_for_the_file/comment_4_6dff7befbaacbff573c5f72688966af5._comment;h=c636b09291a23bbce52b0367a767717137f99a21;hb=451171b7c1eaccfd0f39d4ec1d64c6964613f55a). I do not yet fully grasp why `registerurl` should not behave like `addurl` in being sufficient by itself to add a URL to content so it becomes usable for `get` right away, without extra dances like `setpresentkey`. I think I do get the `rmurl` \"ambiguity\"; more on that below.
|
||||
|
||||
Rereading your comment [above](https://git-annex.branchable.com/bugs/registerurl_does_not_register_if_external_remote/#comment-ba9d6517d8f8c10167da95b122a022b3):
|
||||
|
||||
> part of it is clear: The web special remote is different from other special remotes in that content cannot be dropped from it by git-annex, and the url is the only pointer to content.
|
||||
|
||||
This is just an assumption about some \"special nature of the web remote\"; e.g. the `datalad` remote also doesn't support dropping, and the URL is likewise just the pointer to content. And IIRC the CLAIMURL functionality was added exactly for that use case, before any kind of duality of having content accessible both directly from a special remote and via a URL.
|
||||
|
||||
> But if the url is claimed by another special remote, which does support dropping content, the content would still be present on it after removing its url, and would be accessible w/o using that url,
|
||||
|
||||
That is yet another assumption, since e.g. in the case of the datalad remote, the effect of `rmurl` would be identical to that for the `web` remote, and there is no other way to get content from that remote (so there is no duality as mentioned above).
|
||||
|
||||
> All you need to do is use git-annex setpresentkey along with registerurl.
|
||||
|
||||
This somewhat contradicts the above \"the content would still be present on it after removing its url\", which suggests that the presence of a URL for the remote is already a sufficient indication of the content being present on the remote.
|
||||
|
||||
Overall, there seem to be some assumptions about URLs and external remotes which ideally should be avoided. Maybe it should somehow be reflected in the external remote protocol, so that a remote can indicate that CLAIMing a URL means the content is present at that URL, and that there is no other way to access that content from the remote besides via the URL.
|
||||
|
||||
As a workaround I will of course now either use `setpresentkey`, or just reassign all URLs to be handled directly by the web remote somehow. But in the long run I think it is a problematic design, since `registerurl` doesn't even report which remote the URL was registered to:
|
||||
|
||||
```
|
||||
> git annex registerurl --json SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat http://www.oneukrainian.com/tmp/124.dat
|
||||
{\"command\":\"registerurl\",\"error-messages\":[],\"file\":null,\"input\":[\"SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat\",\"http://www.oneukrainian.com/tmp/124.dat\"],\"success\":true}
|
||||
```
|
||||
so how could I, in general, know the proper `setpresentkey` invocation to follow it up with?
|
||||
|
||||
"""]]
|
|
@ -1,64 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2023-04-05T17:25:48Z"
|
||||
content="""
|
||||
Whups, I forgot about the newish unregisterurl! That's the true inverse of
|
||||
registerurl. So rmurl is really more the inverse of addurl.
|
||||
|
||||
I think I've fully understood the situation that led to this regression now.
|
||||
I do think it was a regression. That change was all about SETURLPRESENT and
|
||||
SETURLMISSING in the external special remote protocol, as well as rmurl;
|
||||
I think that the effect on registerurl was not considered.
|
||||
|
||||
So while I'd like to simplify registerurl to as basic a plumbing command as
|
||||
possible, and would prefer it not to update location tracking, there's the
|
||||
matter of backward compatibility. Especially for simple cases like adding
|
||||
regular web urls with it. It would be ok to change it back to update location
|
||||
tracking for remotes that claim an url. As long as unregisterurl can be
|
||||
symmetric with it --- can it?
|
||||
|
||||
rmurl also has its own wacky behavior in this area:
|
||||
|
||||
# git-annex addurl --fast https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent
|
||||
(downloading torrent file...) addurl https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent (from bittorrent) (to debian-11.6.0-i386-netinst.iso) ok
|
||||
(recording state in git...)
|
||||
# git-annex rmurl debian-11.6.0-i386-netinst.iso https://cdimage.debian.org/debian-cd/current/i386/bt-cd/debian-11.6.0-i386-netinst.iso.torrent
|
||||
rmurl debian-11.6.0-i386-netinst.iso ok
|
||||
(recording state in git...)
|
||||
# git-annex whereis debian-11.6.0-i386-netinst.iso
|
||||
whereis debian-11.6.0-i386-netinst.iso (1 copy)
|
||||
00000000-0000-0000-0000-000000000002 -- bittorrent
|
||||
ok
|
||||
# git-annex get debian-11.6.0-i386-netinst.iso
|
||||
(fails)
|
||||
|
||||
Is that a bug? It's certainly not ideal for the bittorrent special
|
||||
remote, which can't download the file once the url is removed. (It is
|
||||
documented behavior though.)
|
||||
|
||||
While thinking about those questions, I thought of this situation:
|
||||
|
||||
# git-annex initremote s3 type=S3 ..
|
||||
# git-annex copy --key $key --to s3
|
||||
# git-annex registerurl $key $url
|
||||
# git-annex unregisterurl $key $url
|
||||
# git-annex drop --key $key --from s3
|
||||
|
||||
At the end there, it's still able to drop the content from s3.
|
||||
|
||||
Now, consider hypothetically, if I decide to make the S3 remote CLAIMURL
|
||||
urls that are in the S3 bucket. As things stand, that won't change the
|
||||
above scenario. (Although the key won't be recorded as located in the web
|
||||
after registerurl.)
|
||||
|
||||
But... if unregisterurl is changed to update location tracking for other remotes
|
||||
than web, after the S3 CLAIMURL change, the behavior of that scenario will not
|
||||
be the same! After unregisterurl, it will no longer consider the content to be
|
||||
present in S3. Now you're racking up S3 charges with content that git-annex
|
||||
stored in S3, but that it refuses to delete. That seems bad.
|
||||
|
||||
So, that scenario is leading me to think that I should not change
|
||||
unregisterurl (or rmurl) to update location tracking of remotes other than web.
|
||||
And so changing registerurl is also looking like a bad idea.
|
||||
"""]]
|
|
@ -1,18 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2023-04-05T18:47:51Z"
|
||||
content="""
|
||||
What I'm inclined to do is add a --remote= parameter to registerurl and
|
||||
unregisterurl. If the specified remote does not claim the url, have it fail
|
||||
to add it. (See also [[todo/registerurl_--remote_REMOTE]])
|
||||
|
||||
So, you can then use registerurl with --remote=$uuid, check that it
|
||||
succeeded, and then use setpresentkey to mark it present on that uuid.
|
||||
Without the fragility you complained of.
|
||||
|
||||
Update: The --remote parameter is implemented now.
|
||||
|
||||
(Could registerurl with --remote update location tracking itself? Maybe,
|
||||
but I'd worry about a scenario like in the previous comment.)
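A minimal sketch of the workflow proposed in this comment, assuming `registerurl --remote=` accepts a UUID as described above; `register_url_present` is a hypothetical helper name. Because registerurl fails when the given remote does not claim the URL, setpresentkey only runs once the claim has succeeded.

```shell
#!/bin/sh
# Hypothetical helper: register URL for KEY only if the remote with UUID
# claims it, and only then mark KEY as present on that remote.
register_url_present() {
    key=$1
    url=$2
    uuid=$3
    # With --remote, registerurl fails if that remote does not claim
    # the URL, so we stop before touching location tracking.
    git annex registerurl --remote="$uuid" "$key" "$url" || return 1
    # setpresentkey arguments: key, uuid, and 1 for present.
    git annex setpresentkey "$key" "$uuid" 1
}
```

For the reproducer in this bug, that would be `register_url_present "$key" http://www.oneukrainian.com/tmp/124.dat 65b6c36b-debd-4a23-8fa3-675cbd200496`.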
|
||||
"""]]
|
|
@ -1,10 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 6"
|
||||
date="2023-04-05T19:36:40Z"
|
||||
content="""
|
||||
Obviously, as the author of the referenced wishlist, I would welcome addition of `--remote` option to both those commands.
|
||||
|
||||
But IMHO adding the option doesn't solve the initial, naive, programmatic use case, where the user doesn't know which remote could or should handle the URL and just wants, analogously or complementary to `addurl`, to extend the list of URLs available for some key. There isn't even a user-level interface to ask \"what remotes can handle this url\" in order to build such a tandem of commands for registering extra URLs for a key. So I don't see how adding the option would solve the problem.
|
||||
"""]]
|
|
@ -1,19 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 7"""
|
||||
date="2023-04-05T19:57:37Z"
|
||||
content="""
|
||||
Well, unregisterurl and rmurl can't safely update location tracking for remotes
|
||||
other than the web. Unless there were some way to know that simply removing an
|
||||
url was *sufficient*, like it is for the web, and unlike how it would be
|
||||
with my S3 remote scenario above.
|
||||
|
||||
But, the only issue with registerurl updating location tracking is that it's
|
||||
not symmetric with unregisterurl.
|
||||
|
||||
So is that symmetry more important than comment 6? I don't know. In both
|
||||
cases, some users are going to be surprised by inconsistent behavior.
|
||||
|
||||
The only way to avoid all user surprise would be to go back in time and
|
||||
make these plumbing commands not update location tracking from the start.
|
||||
"""]]
|
|
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 8"""
|
||||
date="2023-04-05T21:00:04Z"
|
||||
content="""
|
||||
Guess I'll come down on the side of restoring old behavior which was
|
||||
changed w/o warning (and without the new behavior ever being documented).
|
||||
|
||||
And on the side of user experience showing the current behavior is surprising.
|
||||
|
||||
The future users who get surprised by the resulting inconsistency
|
||||
of unregisterurl not unsetting location tracking will just have to
|
||||
live with it. Sigh.
|
||||
"""]]
|
|
@ -1,66 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Our datalad tests started to fail recently ([this PR](https://github.com/datalad/datalad/pull/7372) is the effort to troubleshoot).
|
||||
|
||||
Here is what we see with a recent version, using this simple script:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
|
||||
export PS4='> '
|
||||
|
||||
set -eu
|
||||
set -x
|
||||
|
||||
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
|
||||
|
||||
git init
|
||||
git annex init
|
||||
|
||||
n='gl\orious'
|
||||
# touch "$n"
|
||||
git annex add --json --json-error-messages "$n"
|
||||
```
|
||||
|
||||
which now produces
|
||||
|
||||
```
|
||||
❯ ( source /home/yoh/git-annexes/10.20230407+git63-g3d1d77a1bb.env ; bash escaped.sh )
|
||||
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
|
||||
> cd /home/yoh/.tmp/dl-OAXQ1CE
|
||||
> git init
|
||||
Initialized empty Git repository in /home/yoh/.tmp/dl-OAXQ1CE/.git/
|
||||
> git annex init
|
||||
init ok
|
||||
(recording state in git...)
|
||||
> n='gl\orious'
|
||||
> git annex add --json --json-error-messages 'gl\orious'
|
||||
git-annex: "gl\\orious" not found
|
||||
add: 1 failed
|
||||
```
|
||||
|
||||
So we get `\\` instead of `\` in the output printed by git-annex (and the filename is now wrapped in double quotes).
|
||||
|
||||
<details>
|
||||
<summary>previously was all fine</summary>
|
||||
|
||||
```shell
|
||||
❯ ( source /home/yoh/git-annexes/10.20230407.env ; bash escaped.sh )
|
||||
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
|
||||
> cd /home/yoh/.tmp/dl-1TzrWdi
|
||||
> git init
|
||||
Initialized empty Git repository in /home/yoh/.tmp/dl-1TzrWdi/.git/
|
||||
> git annex init
|
||||
init ok
|
||||
(recording state in git...)
|
||||
> n='gl\orious'
|
||||
> git annex add --json --json-error-messages 'gl\orious'
|
||||
git-annex: gl\orious not found
|
||||
add: 1 failed
|
||||
```
|
||||
</details>
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[closed|done]] --[[Joey]]
|
|
@ -1,40 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2023-04-24T15:13:35Z"
|
||||
content="""
|
||||
The next release is going to escape and quote filenames that contain
|
||||
special characters similarly to how git does. (But json will not be affected due
|
||||
to already being escaped.)
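For reference, a rough Python sketch of git's double-quoted (C-style) path quoting as documented for core.quotePath: control characters, DEL, `"` and `\` are escaped, other special bytes become 3-digit octal, and bytes >= 0x80 are escaped only while core.quotePath is true. This models git, not git-annex's Haskell code, so git-annex may differ in details; `quote_c_style` is a hypothetical name.

```python
# Byte values that git writes as single-letter escapes.
_LETTER_ESCAPES = {0x07: b"a", 0x08: b"b", 0x09: b"t", 0x0A: b"n",
                   0x0B: b"v", 0x0C: b"f", 0x0D: b"r"}


def _needs_escape(b: int, quote_path: bool) -> bool:
    # Control characters, DEL, double quote and backslash are always escaped;
    # bytes >= 0x80 only when core.quotePath is true (git's default).
    if b < 0x20 or b == 0x7F or b in (0x22, 0x5C):
        return True
    return quote_path and b >= 0x80


def quote_c_style(name: bytes, quote_path: bool = True) -> bytes:
    # Names that need no escaping are returned as-is, without quotes.
    if not any(_needs_escape(b, quote_path) for b in name):
        return name
    out = bytearray(b'"')
    for b in name:
        if not _needs_escape(b, quote_path):
            out.append(b)
        elif b in (0x22, 0x5C):
            out += b"\\" + bytes([b])          # \" and \\
        elif b in _LETTER_ESCAPES:
            out += b"\\" + _LETTER_ESCAPES[b]  # \t, \n, ...
        else:
            out += b"\\%03o" % b               # everything else as octal
    out += b'"'
    return bytes(out)
```

With this, `gl\orious` becomes `"gl\\orious"`, the same form that `git diff` and the new git-annex error message show for that filename.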
|
||||
|
||||
This will affect filenames output in error messages. So if you are
|
||||
parsing error messages or non-json output with filenames that contain
|
||||
characters that need to be escaped, you will need to deal with the
|
||||
change.
|
||||
|
||||
See [[todo/terminal_escapes_in_filenames]] for the full details of this
|
||||
change.
|
||||
|
||||
One thing I noticed about your example is that git-annex add doesn't display
|
||||
that particular filename the same as git add does:
|
||||
|
||||
joey@darkstar:~/tmp/xxx>git-annex add 'gl\orious'
|
||||
git-annex: "gl\\orious" not found
|
||||
joey@darkstar:~/tmp/xxx>git add 'gl\orious'
|
||||
fatal: pathspec 'gl\orious' did not match any files
|
||||
|
||||
But, that is an inconsistency in git itself. More commonly it uses the same
|
||||
display as git-annex for this filename:
|
||||
|
||||
joey@darkstar:~/tmp/xxx>touch 'gl\orious'
|
||||
joey@darkstar:~/tmp/xxx>git add 'gl\orious'
|
||||
joey@darkstar:~/tmp/xxx>git diff --cached
|
||||
diff --git "a/gl\\orious" "b/gl\\orious"
|
||||
new file mode 100644
|
||||
index 0000000..e69de29
|
||||
|
||||
So I don't think there's a problem with git-annex's behavior here. With
|
||||
that said, we can talk about adding something to make backwards compatibility
|
||||
easy for you, or whatever. A config like core.quotePath but that also
|
||||
affects special characters, not just unicode, for example.
|
||||
"""]]
|
|
@ -1,41 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2023-04-24T19:23:22Z"
|
||||
content="""
|
||||
Hm, I didn't look inside `git`, but `git diff` likely escapes it because `patch` (and/or other tools operating on unified diffs) expects that. In other words, `git diff` must encode paths escaped because the \"diff standard\" expects it.
|
||||
|
||||
On the other hand, as you confirmed, `git add` just displays the name on the screen, so it does not bother escaping it, since maybe I will just cut/paste it as a \"raw\" string and thus not expect any escape characters.
|
||||
|
||||
RTFMing [git-config on core.quotePath](https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath) I spotted
|
||||
|
||||
> ... enclosing the pathname in double-quotes and escaping ...
|
||||
|
||||
So it talks about double quotes. `git status` and `git diff` report paths in double (`\"`), not single (`'`), quotes. I wonder if that is where/how `git` is consistent, since in your example that is the difference too:
|
||||
|
||||
```
|
||||
# current master git-annex
|
||||
joey@darkstar:~/tmp/xxx>git-annex add 'gl\orious'
|
||||
git-annex: \"gl\\orious\" not found
|
||||
joey@darkstar:~/tmp/xxx>git add 'gl\orious'
|
||||
fatal: pathspec 'gl\orious' did not match any files
|
||||
```
|
||||
|
||||
that git uses `'` (and does not escape) while git-annex uses `\"` (and escapes)? Did you see git escaping paths that it reports within single (`'`) quotes?
|
||||
|
||||
And thus git-annex should have just wrapped it in `'` to be consistent with git in:
|
||||
|
||||
```shell
|
||||
# released git-annex
|
||||
> git annex add --json --json-error-messages '\e[31mfo\o\e[0m'
|
||||
git-annex: \e[31mfo\o\e[0m not found
|
||||
add: 1 failed
|
||||
> git add '\e[31mfo\o\e[0m'
|
||||
fatal: pathspec '\e[31mfo\o\e[0m' did not match any files
|
||||
> git rm '\e[31mfo\o\e[0m'
|
||||
fatal: pathspec '\e[31mfo\o\e[0m' did not match any files
|
||||
```
|
||||
|
||||
|
||||
"""]]
|
|
@ -1,37 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2023-04-24T19:25:12Z"
|
||||
content="""
|
||||
git escapes filenames like this extensively:
|
||||
|
||||
joey@darkstar:~/tmp/xxx>git ls-files
|
||||
"gl\\orious"
|
||||
joey@darkstar:~/tmp/xxx>git status
|
||||
Changes to be committed:
|
||||
(use "git restore --staged <file>..." to unstage)
|
||||
new file: "gl\\orious"
|
||||
joey@darkstar:~/tmp/xxx>git grep hi
|
||||
"gl\\orious":hi
|
||||
|
||||
This message from `git add` escapes slightly differently, but it still escapes
|
||||
some characters:
|
||||
|
||||
joey@darkstar:~/tmp/xxx>git add $(echo -e "\e[31mfoo\e[0m")
|
||||
fatal: pathspec '?[31mfoo?[0m' did not match any files
|
||||
|
||||
Git only does this type of escaping when displaying a fatal error
|
||||
(it's `vreportf` in the git source, used by things like `die`).
|
||||
It's basically a last-ditch filtering of a string, which may contain a filename
|
||||
or other untrusted data, to avoid displaying escape characters. git-annex does
|
||||
contain such a last-ditch filtering too (safeOutput), but type safety let me avoid
|
||||
needing to use it to handle this filename here. I don't think it's at all necessary
|
||||
for git-annex to be bug-for-bug equivalent with git in its display of error
|
||||
messages; but it is important that it escape somehow. Git's double-quoted escaping
|
||||
is documented, and this other escaping is not.
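Since the double-quoted form is documented, undoing it is mechanical. A minimal sketch of an unquoter in Python (the language datalad uses); `unquote_c_style` is a hypothetical helper, not part of datalad or git-annex. It only handles the double-quoted form: the `'?[31mfoo?[0m'` filtering in git's fatal errors replaces characters with `?` and cannot be reversed.

```python
# Escape letters and the byte values they stand for.
_UNESCAPES = {ord("a"): 0x07, ord("b"): 0x08, ord("t"): 0x09, ord("n"): 0x0A,
              ord("v"): 0x0B, ord("f"): 0x0C, ord("r"): 0x0D,
              ord('"'): 0x22, ord("\\"): 0x5C}
_OCTAL_DIGITS = b"01234567"


def unquote_c_style(text: bytes) -> bytes:
    # Text not wrapped in double quotes was not escaped; return it unchanged.
    if len(text) < 2 or not (text.startswith(b'"') and text.endswith(b'"')):
        return text
    body, out, i = text[1:-1], bytearray(), 0
    while i < len(body):
        if body[i] != 0x5C:  # ordinary byte
            out.append(body[i])
            i += 1
            continue
        esc = body[i + 1:i + 2]
        octal = body[i + 1:i + 4]
        if esc and esc[0] in _UNESCAPES:  # \n, \", \\ ...
            out.append(_UNESCAPES[esc[0]])
            i += 2
        elif len(octal) == 3 and all(c in _OCTAL_DIGITS for c in octal):
            out.append(int(octal, 8))  # \ooo is one raw byte
            i += 4
        else:
            raise ValueError(f"bad escape at offset {i} in {text!r}")
    return bytes(out)
```

Working on bytes rather than str matters because octal escapes can encode individual bytes of a multi-byte UTF-8 character.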
|
||||
|
||||
Since either behavior would be a behavior change from before when git-annex didn't
|
||||
escape the filename in the error message with either method, it seems to me either
|
||||
one would likely break your assumption. So I don't know why you're arguing for
|
||||
one way over the other way.
|
||||
"""]]
|
|
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 4"
|
||||
date="2023-04-24T21:25:08Z"
|
||||
content="""
|
||||
I am \"arguing\" because ideally I prefer not to handle some not quite standardized un-escaping.
|
||||
|
||||
1. Which characters should I expect to be escaped? [Here](https://github.com/datalad/datalad/blob/maint/datalad/support/network.py#L925) are the ones we have for SSH: `` _SSH_ESCAPED_CHARACTERS = '\\#&;`|*?~<>^()[]{}$\'\" ' ``. Are they the same here?
|
||||
2. Would it be sensible to request that `add --json --json-error-messages` produce a proper machine-readable json record for \"unknown\" input, or is there a reason why there is no json record here in `--json` mode?
|
||||
|
||||
Also I just wanted to make sure that we are not missing some aspect, like what I felt was an unnoticed (to me at least) difference between the `'` and `\"` escaping methods, and possibly some original reason why `git` treats them differently in those cases; maybe there is a good reason?
|
||||
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 6"""
|
||||
date="2023-04-25T17:18:23Z"
|
||||
content="""
|
||||
Note that I should avoid releasing git-annex with the added escaping until
|
||||
this is addressed.
|
||||
"""]]
|
|
@ -1,29 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 6"""
|
||||
date="2023-04-25T16:31:44Z"
|
||||
content="""
|
||||
I dug into your code, and datalad is parsing git-annex's (and perhaps
|
||||
git's in some cases) stderr to find error messages like this one for files
|
||||
that don't exist, and then it internally dummies up something as if git-annex
|
||||
were outputting a --json-error-messages record for the file. See
|
||||
`./datalad/support/annex_utils.py` `_get_non_existing_from_annex_output`
|
||||
|
||||
Ok, I can understand now how needing to do an additional form of unescaping
|
||||
on top of that existing pain point would cause the reaction I have seen in
|
||||
this bug report.
|
||||
|
||||
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]] is a todo
|
||||
item I opened the last time I became aware of this error message parsing ugliness.
|
||||
(Also relevant is [this closed todo](https://git-annex.branchable.com/projects/datalad/bugs-done/copy_does_not_reflect_some_failed_copies_in_--json_output/)
|
||||
where I discuss why --json-error-messages cannot include these errors
|
||||
as-is.)
|
||||
|
||||
So the choice is between implementing
|
||||
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]]
|
||||
and changing datalad to use that. Or adding a git config
|
||||
that avoids escaping filenames. The latter would be easy
|
||||
to do (and easier for datalad to use), but it kicks the can down the road.
|
||||
Datalad parsing error messages would continue to be a problem going
|
||||
forward. (Imagine if git-annex gets localized error messages.)
|
||||
"""]]
|
|
@ -1,18 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 7"""
|
||||
date="2023-04-25T23:27:25Z"
|
||||
content="""
|
||||
[[todo/api_for_telling_when_nonexistant_or_non_git_files_passed]] is
|
||||
implemented now.
|
||||
|
||||
In datalad, all you should need to do now is check for a json object with
|
||||
`errorid:"FileNotFound"` and the `file` field is the name of the file.
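A sketch of that check in Python, assuming the output of a command such as `git annex add --json --json-error-messages` is read as one JSON object per line. Only the `errorid` and `file` fields come from the description above; the function name and the other fields in the sample records are illustrative. Because the filename arrives JSON-encoded, no C-style unquoting is needed.

```python
import json
from typing import Iterable, List


def files_not_found(json_lines: Iterable[str]) -> List[str]:
    # Collect the `file` field of every record whose errorid is FileNotFound.
    missing = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("errorid") == "FileNotFound":
            missing.append(record["file"])
    return missing
```

For example, feeding it the lines of `subprocess.run([...], capture_output=True, text=True).stdout.splitlines()` replaces parsing "not found" messages out of stderr.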
|
||||
|
||||
Note that the parser for error messages like "did not match any file(s)
|
||||
known to git" from `git ls-files --error-unmatch` will still be needed in
|
||||
datalad.
|
||||
|
||||
I'm going to leave this open as a git-annex release blocker until the
|
||||
necessary changes get made to datalad.
|
||||
"""]]
|
|
@ -1,16 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 8"""
|
||||
date="2023-04-25T23:33:56Z"
|
||||
content="""
|
||||
Also yarik mentioned this change
|
||||
<https://github.com/datalad/datalad/pull/7372/commits/45ddd4b12ff637c6c77e982225c0e9d9eb53c1b6>
|
||||
which was caused by [[!commit a0e6fa18eb3c16c3c8079bb41c18151e6ea8b554]],
|
||||
which was part of my series for escaping control characters.
|
||||
|
||||
I think git-annex needs to be returned to the old behavior there, even
|
||||
though `git-annex info dne` is not technically operating on a file when it
|
||||
doesn't exist.
|
||||
|
||||
Update: Fixed this.
|
||||
"""]]
|
|
@ -1,15 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 9"""
|
||||
date="2023-06-05T19:06:30Z"
|
||||
content="""
|
||||
This has been blocking git-annex release for the past month.
|
||||
|
||||
Apparently datalad has mostly been updated now.
|
||||
<https://github.com/datalad/datalad/pull/7372> is still not merged, but I'm
|
||||
not clear if the 2 failed tests there are caused by this change or
|
||||
something else.
|
||||
|
||||
I'm feeling now that I've waited long enough for datalad. I've closed this
|
||||
bug and don't consider it as blocking release.
|
||||
"""]]
|
|
@ -1,41 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
I am running `testremote` on a windows CI system to test a special remote implementation for dataverse.org. I run into this error:
|
||||
|
||||
```
|
||||
git-annex: MoveFileEx "C:\\DLTMP\\ran2133" Just ".git\\annex\\objects\\f76\\373\\SHA256E-s1048576--813fea02438e9569e6222f802958fcd89bee742d06ffe9aabe27fd940ef01196.this-is-a-test-key\\SHA256E-s1048576--813fea02438e9569e6222f802958fcd89bee742d06ffe9aabe27fd940ef01196.this-is-a-test-key": does not exist (The system cannot find the path specified.)
|
||||
```
|
||||
|
||||
I suspect this could be a path-length issue (the system reports a max length of 285, and the relative path given above is already 230 chars).
|
||||
|
||||
I tried running `git annex testremote --backend=MD5E` instead, to shorten the key length, but this option is not honored (enough): the error still shows a SHA256 key.
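The arithmetic behind the 230-character figure, as a small Python sketch: the annex object path contains the key twice, so each hex digit of the hash costs two characters. The MD5E key below is a hypothetical stand-in with a placeholder digest; only its length matters. The repository's absolute path is prepended at runtime, so the real path is longer than these numbers.

```python
import ntpath

# Key copied from the error message above (SHA256E, 1 MiB test key).
SHA256E_KEY = ("SHA256E-s1048576--"
               "813fea02438e9569e6222f802958fcd89bee742d06ffe9aabe27fd940ef01196"
               ".this-is-a-test-key")

# Hypothetical MD5E key for the same size: the digest is a placeholder,
# only its length (32 hex digits instead of 64) matters here.
MD5E_KEY = "MD5E-s1048576--" + "0" * 32 + ".this-is-a-test-key"


def object_path(key: str, hashdirs=("f76", "373")) -> str:
    # Relative object path as in the error: .git\annex\objects\xxx\yyy\KEY\KEY
    # (the two hash directories are 3 characters each whatever their value).
    return ntpath.join(".git", "annex", "objects", *hashdirs, key, key)


for name, key in (("SHA256E", SHA256E_KEY), ("MD5E", MD5E_KEY)):
    print(name, len(key), len(object_path(key)))
# SHA256E 101 230
# MD5E 66 160
```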
|
||||
|
||||
`testremote` man page says "Also the git-annex-common-options(1) can be used." and `--backend` is explicitly listed in the help output, hence I assumed this should work.
|
||||
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
It happens when running the https://github.com/datalad/datalad-dataverse tests on a windows appveyor worker. Running on a crippled FS is not enough to trigger the initial `testremote` error; it only happens on windows proper. However, I assume that `--backend` not having the effect I expected is not platform-specific.
|
||||
|
||||
Here is a demo test log: https://ci.appveyor.com/project/mih/datalad-dataverse/builds/44079592/job/b38woai0ekmq7bn5#L856
|
||||
|
||||
The corresponding datalad issue is https://github.com/datalad/datalad-dataverse/issues/127
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
CI used
|
||||
|
||||
- annex: 8.20211117-gc3af94eff
|
||||
- git: 2.37.0.windows.1
|
||||
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
All the time! Sorry to mostly show up when there is an issue!
|
||||
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
[[!meta title="testremote failure on windows due to long filename issues"]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="mih"
|
||||
avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
|
||||
subject="Recent version tested"
|
||||
date="2022-07-06T12:18:36Z"
|
||||
content="""
|
||||
The behavior is the same for the more recent git-annex 10.20220624-g17e4081d4
|
||||
"""]]
|
|
@ -1,24 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2022-07-12T16:50:16Z"
|
||||
content="""
|
||||
Actually testremote will not accept --backend in current master, since that
|
||||
is no longer a global option and is accepted only by commands that can
|
||||
actually use it.
|
||||
|
||||
testremote cannot support an arbitrary backend here, because it needs to
|
||||
generate a test key that cannot possibly be used for real data. The only
|
||||
backend that has a way implemented to do that is SHA256. It would not,
|
||||
for example, be possible to make the WORM backend support that, since every
|
||||
possible WORM key could be used by real data.
|
||||
|
||||
It would be possible to add support for --backend=MD5 and have it reject
|
||||
other backends. But this does not strike me as solving the real problem.
|
||||
|
||||
Also, in [[bugs/tests_fail_on_windows:_retrieveKeyFile_resume]]
|
||||
I ran into this same problem when `git-annex test` was run, and
|
||||
worked around it by disabling that part of the test suite on windows.
|
||||
If this is fixed, it would be worth re-enabling that, although it may have
|
||||
also been failing for other reasons on windows.
|
||||
"""]]
|
|
@ -1,24 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2022-07-12T17:42:51Z"
|
||||
content="""
|
||||
ghc's IO manager tries to support Windows long paths by normalizing to
|
||||
a UNC-style path in many system calls. However, when git-annex calls
|
||||
rename, on windows that ends up in Win32's moveFileEx (via unix-compat),
|
||||
and that does not do UNC-style normalization. And given the description of
|
||||
the Win32 package, I think it's intended to pass data directly through
|
||||
to the API without anything fancy.
|
||||
|
||||
System.Directory.renamePath could be used instead of Win32.
|
||||
While it still uses Win32 moveFileEx, it first does a UNC-style
|
||||
normalization. Filed an issue:
|
||||
<https://github.com/jacobstanley/unix-compat/issues/56>
|
||||
|
||||
Rather than waiting for that to be fixed, I've made git-annex
|
||||
use System.Directory.renamePath itself instead. But I don't know
|
||||
if it will be enough to make testremote work, or if it will fall over
|
||||
on a later operation on the same too-long path.
|
||||
getFileStatus/getSymbolicLinkStatus seem like the main things in
|
||||
unix-compat that would still be a problem.
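For background, the usual Win32 way around MAX_PATH (260 characters) is the `\\?\` extended-length prefix. A minimal Python sketch of that conversion using the standard `ntpath` module; it is not the code in ghc or the directory package, only an illustration of the kind of normalization meant above, and `to_extended_length` is a hypothetical name.

```python
import ntpath


def to_extended_length(path: str) -> str:
    # Already in \\?\ form: nothing to do.
    if path.startswith("\\\\?\\"):
        return path
    drive, _ = ntpath.splitdrive(path)
    # Only fully qualified paths (drive letter or UNC share plus root) can be
    # prefixed; \\?\ turns off Win32 normalization, so '.', '..' and '/'
    # would no longer be interpreted in a relative path.
    if not drive or not ntpath.isabs(path):
        return path
    path = ntpath.normpath(path)  # resolve '.', '..'; use backslashes
    if path.startswith("\\\\"):
        # UNC share: \\server\share\dir -> \\?\UNC\server\share\dir
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path  # C:\dir -> \\?\C:\dir
```

Relative paths such as `.git\annex\objects\...` are left alone here, which is why a caller would first need to make the path absolute before every system call that touches it.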
|
||||
"""]]
|
|
@ -1,245 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="mih"
|
||||
avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
|
||||
subject="Update for git-annex 10.20230227-ga206cdddb4"
|
||||
date="2023-02-28T15:35:50Z"
|
||||
content="""
|
||||
Sorry for the long silence. Coming back to this issue I find the behavior changed, but not sufficiently to get the test suite to run in full on windows.
|
||||
|
||||
I ran `git annex testremote --fast` on Windows `msys_nt-10.0-17763` with git-annex 10.20230227-ga206cdddb4 and git 2.38.1.windows.1
|
||||
|
||||
[[!toggle id=\"ipsum\" text=\"Show test output\"]]
|
||||
|
||||
[[!toggleable id=\"ipsum\" text=\"\"\"
|
||||
```
|
||||
[00:14:23] E unavailable remote
|
||||
[00:14:23] E removeKey: OK (0.02s)
|
||||
[00:14:23] E storeKey: OK
|
||||
[00:14:23] E checkPresent: OK
|
||||
[00:14:23] E retrieveKeyFile: OK (0.03s)
|
||||
[00:14:23] E retrieveKeyFileCheap: OK
|
||||
[00:14:23] E key size Just 1048576; remote chunksize=0 encryption=none
|
||||
[00:14:23] E removeKey when not present: OK (2.53s)
|
||||
[00:14:23] E present False: OK (0.31s)
|
||||
[00:14:23] E storeKey: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.storeKey\"' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
|
||||
[00:14:23] E storeKey when already present: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.storeKey when already present/' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
|
||||
[00:14:23] E retrieveKeyFile: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile\"' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 0: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from 0/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
|
||||
[00:14:23] E Exception: .git\annex\objects\86a\533\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from 33%/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.07s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=none.retrieveKeyFile resume from end/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E removeKey when present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E key size Just 1048576; remote chunksize=0 encryption=shared
|
||||
[00:14:23] E removeKey when not present: OK (2.61s)
|
||||
[00:14:23] E present False: OK (0.31s)
|
||||
[00:14:23] E storeKey: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.storeKey\"' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
|
||||
[00:14:23] E storeKey when already present: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.storeKey when already present/' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
|
||||
[00:14:23] E retrieveKeyFile: FAIL (0.01s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile\"' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 0: FAIL (0.05s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 0/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
|
||||
[00:14:23] E Exception: .git\annex\objects\86a\533\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key\SHA256E-s1048576--38d246a8a1798726446526c42d4603e4d4ceecc7a2030b774c4bfb89588a3591.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 33%/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.08s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048576; remote chunksize=0 encryption=shared.retrieveKeyFile resume from end/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E removeKey when present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E key size Just 1048575; remote chunksize=0 encryption=none
|
||||
[00:14:23] E removeKey when not present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E storeKey: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.storeKey\"' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
|
||||
[00:14:23] E storeKey when already present: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.storeKey when already present/' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.present True\"' to rerun this test only.
|
||||
[00:14:23] E retrieveKeyFile: FAIL (0.03s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile\"' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 0: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from 0/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
|
||||
[00:14:23] E Exception: .git\annex\objects\8c4\c61\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from 33%/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.07s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=none.retrieveKeyFile resume from end/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E removeKey when present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E key size Just 1048575; remote chunksize=0 encryption=shared
|
||||
[00:14:23] E removeKey when not present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E storeKey: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.storeKey\"' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
|
||||
[00:14:23] E storeKey when already present: FAIL
|
||||
[00:14:23] E Exception: content not available to send
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.storeKey when already present/' to rerun this test only.
|
||||
[00:14:23] E present True: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.present True\"' to rerun this test only.
|
||||
[00:14:23] E retrieveKeyFile: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '$0==\"Remote Tests.key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile\"' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 0: FAIL
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 0/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from 33%: FAIL
|
||||
[00:14:23] E Exception: .git\annex\objects\8c4\c61\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key\SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from 33%/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E retrieveKeyFile resume from end: FAIL (0.06s)
|
||||
[00:14:23] E .\Command\TestRemote.hs:292:
|
||||
[00:14:23] E failed
|
||||
[00:14:23] E Use -p '/key size Just 1048575; remote chunksize=0 encryption=shared.retrieveKeyFile resume from end/' to rerun this test only.
|
||||
[00:14:23] E fsck downloaded object: OK
|
||||
[00:14:23] E removeKey when present: OK
|
||||
[00:14:23] E present False: OK
|
||||
[00:14:23] E exporttree=yes; key size Just 1048576; key size Just 1048575
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048576; key size Just 1048576
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048575; key size Just 1048575
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E exporttree=yes; key size Just 1048575; key size Just 1048576
[00:14:23] E check present export when not present: OK
[00:14:23] E remove export when not present: OK
[00:14:23] E store export: OK
[00:14:23] E check present export after store: OK
[00:14:23] E store export when already present: OK
[00:14:23] E retrieve export: OK
[00:14:23] E store new content to export: OK
[00:14:23] E check present export after store of new content: OK
[00:14:23] E retrieve export new content: OK
[00:14:23] E remove export: OK
[00:14:23] E check present export after remove: OK
[00:14:23] E retrieve export fails after removal: OK
[00:14:23] E remove export directory: OK
[00:14:23] E remove export directory that is already removed: OK
[00:14:23] E
[00:14:23] E 32 out of 125 tests failed (6.39s)
```
[[!toggle id=\"ipsum\" text=\"hide\"]]
\"\"\"]]

Right now, I cannot say whether this points to a problem in my implementation or to something still in git-annex. However, the same implementation passes the test suite on Linux.

Side note: I am not sure if you have access to a Windows system for debugging. If that is needed or helpful, please let me know.

Thanks!
"""]]

@ -1,21 +0,0 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 5"""
 date="2023-03-01T16:39:39Z"
 content="""
Seems like my renamePath fix did work: looking back at the original
failure log, it was failing to generate test keys before it even got to run
the test cases.

The new failures seem likely to be due to getFileStatus/getSymbolicLinkStatus
failing on the long filename on Windows, as I suspected might happen in
comment #3. I've updated the issue at
<https://github.com/jacobstanley/unix-compat/issues/56>, and maybe that
will get fixed; my understanding is that unix-compat recently got a new
maintainer. But git-annex does contain a convertToNativeNamespace function
that it could use to work around the problem itself.

(I am able to run Windows in emulation, but it's slow and enough of a disk
hog that I generally am not in a position to do it easily, so I appreciate
users who can save me the bother.)
"""]]

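As a rough illustration of the long-filename theory in comment 5, the Haskell sketch below measures the relative object path from the failing `retrieveKeyFile resume from 33%` test in the log above and checks an absolute version of it against the classic Win32 `MAX_PATH` limit (260 characters including the terminating NUL, so 259 usable characters). The helper names and both repository roots are hypothetical: the log does not show where the test repository actually lived on the Windows machine, so this only shows how little room the doubled key leaves, not that this exact path is what failed.

```haskell
import Data.List (intercalate)

-- The SHA256E test key that appears twice in the failing object path in the log.
testKey :: String
testKey =
  "SHA256E-s1048575--79c930fcd7d08355513f158bbf82532eb6a25d5c8688f4504ac1e240e35e7dc5.this-is-a-test-key"

-- Relative object path (.git\annex\objects\8c4\c61\KEY\KEY), joined with
-- Windows backslash separators.
objectPath :: FilePath
objectPath = intercalate "\\" [".git", "annex", "objects", "8c4", "c61", testKey, testKey]

-- Classic Win32 MAX_PATH is 260 including the terminating NUL,
-- leaving 259 visible characters for a path without the \\?\ prefix.
maxPathChars :: Int
maxPathChars = 259

-- Hypothetical helper: join a repository root (no trailing separator)
-- with the relative object path.
absoluteObjectPath :: FilePath -> FilePath
absoluteObjectPath root = root ++ "\\" ++ objectPath

-- Does a path fit under MAX_PATH without the extended-length prefix?
fitsInMaxPath :: FilePath -> Bool
fitsInMaxPath p = length p <= maxPathChars

-- Longest repository root that still fits: 259 - 1 (separator) - 230 = 28.
rootBudget :: Int
rootBudget = maxPathChars - 1 - length objectPath
```

Loaded into GHCi, `length objectPath` is 230 and `rootBudget` is 28, so a very short root such as `C:\r` fits, while a hypothetical temporary directory like `C:\Users\someone\AppData\Local\Temp\annex-test` (46 characters) pushes the absolute path to 277 characters.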
@ -1,15 +0,0 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 6"""
 date="2023-03-01T19:56:27Z"
 content="""
In [[!commit 54ad1b4cfb1c8302f1b862cb2699ab9351e3eb5b]] I fully worked
around this class of problems with unix-compat.

I think it's reasonably likely that every file access in git-annex
on Windows now goes through UNC-style normalization, allowing long
filenames to be used. That assumes everything in ghc's base library does
the same, which I think it does.

So there's a good chance this is fixed now.
"""]]

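Comment 6 describes routing file accesses through UNC-style (extended-length) normalization. The Haskell sketch below shows what such a normalization commonly looks like, following the documented Win32 path prefixes: a drive-absolute path `C:\...` becomes `\\?\C:\...`, a UNC path `\\server\share\...` becomes `\\?\UNC\server\share\...`, and paths already in the `\\?\` or `\\.\` namespaces are left alone. `toNativeNamespace` and `isDriveAbsolute` are hypothetical names; this is not git-annex's convertToNativeNamespace or the code from the commit above, whose details may differ.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Data.List (isPrefixOf)

-- Hypothetical helper, NOT git-annex's convertToNativeNamespace:
-- map an absolute Windows path into the \\?\ extended-length namespace
-- so that it is not subject to the MAX_PATH limit.
toNativeNamespace :: FilePath -> FilePath
toNativeNamespace raw
  | "\\\\?\\" `isPrefixOf` p = p                       -- already \\?\...
  | "\\\\.\\" `isPrefixOf` p = p                       -- device namespace \\.\..., leave alone
  | "\\\\" `isPrefixOf` p = "\\\\?\\UNC\\" ++ drop 2 p -- \\server\share\... -> \\?\UNC\server\share\...
  | isDriveAbsolute p = "\\\\?\\" ++ p                 -- C:\... -> \\?\C:\...
  | otherwise = p                                      -- relative: the prefix is not valid here
  where
    -- \\?\ paths are passed through without '/' translation,
    -- so convert forward slashes to backslashes first.
    p = map (\c -> if c == '/' then '\\' else c) raw

-- True for paths of the form X:\... with an ASCII drive letter.
isDriveAbsolute :: FilePath -> Bool
isDriveAbsolute (d : ':' : '\\' : _) = isAsciiUpper d || isAsciiLower d
isDriveAbsolute _ = False
```

Relative paths come back unchanged because the `\\?\` prefix is only valid on absolute paths. The prefix also disables `.` and `..` processing, so the sketch assumes the caller has already turned the path into an absolute, canonical one.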