Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2022-08-09 12:27:38 -04:00
commit 1d079c33df
GPG key ID: DB12DB0FF05F8F38
4 changed files with 103 additions and 0 deletions

@@ -0,0 +1,15 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 2"
date="2022-08-09T06:14:05Z"
content="""
Agreed.
I see two potential ways to improve performance:
* Batching:
Bup can split multiple files at once. If it's given 10, 20, or 100 files at a time, the per-split overhead matters less. Batching is something git-annex might need to learn sooner or later anyway, because file transfers generally don't scale well at the moment (bup's slowness just exacerbates the problem).
* Bup index+save:
Use the same pattern as Borg: back up the whole git-annex repo at once and selectively restore in order to `get`. I'm not sure this would be a great idea, but it should improve performance in my use case (copy everything).
"""]]
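
The batching argument above can be illustrated with a toy cost model. This is only a sketch under stated assumptions: `batched`, `total_cost`, and the overhead and per-file numbers are hypothetical names and values chosen for illustration, not measurements of bup and not part of git-annex or bup.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_cost(n_files: int, batch_size: int,
               per_call_overhead: float, per_file_cost: float) -> float:
    """Hypothetical cost model: every invocation pays a fixed overhead
    once, plus a cost proportional to the number of files it handles."""
    n_calls = sum(1 for _ in batched(range(n_files), batch_size))
    return n_calls * per_call_overhead + n_files * per_file_cost


if __name__ == "__main__":
    # Assumed numbers: 2 time units of overhead per call, 1 per file.
    for size in (1, 10, 20, 100):
        print(size, total_cost(100, size, 2.0, 1.0))
```

In this model the per-file work stays constant while the fixed overhead is paid once per batch, so larger batches reduce only the overhead term. Whether real bup invocations behave this way would need to be measured.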

@@ -0,0 +1,15 @@
[[!comment format=mdwn
username="rinomizu5@5ead4c82685c65d7717dbd5591b80425036ae9e3"
nickname="rinomizu5"
avatar="http://cdn.libravatar.org/avatar/62478823018c68821064febcda7e5d4f"
subject="comment 5"
date="2022-08-09T00:33:55Z"
content="""
Thank you, Joey, for the additional help.
I see that I need to use the SSH protocol to upload file contents to GIN with git-annex.
I also understand now why it showed (not found).
I appreciate your advice.
Rino Mizuguchi
"""]]

@@ -0,0 +1,61 @@
### Please describe the problem.
Our datalad tests started to fail with the 04 Aug 2022 build of git-annex. `git bisect` brought me to the [10.20220724-56-g3a513cfe7](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=3a513cfe73ed873aeeabbc17d2c458b62dd4198c) change, which added `--dry-run` to `annex add`. The "regression" manifests as a file that ends up not added/committed. Unfortunately, I do not yet have more details or a minimal git-annex reproducer.
Meanwhile, here is the full log from running `DATALAD_LOG_OUTPUTS=1 DATALAD_LOG_LEVEL=DEBUG python -m pytest -s -v datalad/local/tests/test_add_archive_content.py::test_add_archive_content`:
[http://www.onerussian.com/tmp/test_add_archive_content-fail.log](http://www.onerussian.com/tmp/test_add_archive_content-fail.log)
It contains:
```shell
$> grep '2/1_f.txt\>' /tmp/test_add_archive_content-fail.log
[DEBUG] Adding /home/yoh/.tmp/datalad_temp_test_add_archive_contentxyf002ii/2/1_f.txt to annex pointing to dl+archive:MD5E-s151--eb922c8b7151d0c53f56e03c10bb0e70.tar.gz#path=1/1+f.txt&size=8 and with options ['-c', 'annex.largefiles=exclude=*.txt']
[DEBUG] File /home/yoh/.tmp/datalad_temp_test_add_archive_contentxyf002ii/2/1_f.txt was added to git, not adding url
first = '2/1_f.txt'
E AssertionError: assert '2/1_f.txt' in ['.datalad/.gitattributes', '.datalad/config', '.gitattributes', '1/1 f-1.1.txt', '1/1 f-1.2.txt', '1/1 f-1.txt', ...]
DEBUG datalad.local.add_archive_content:add_archive_content.py:609 Adding /home/yoh/.tmp/datalad_temp_test_add_archive_contentxyf002ii/2/1_f.txt to annex pointing to dl+archive:MD5E-s151--eb922c8b7151d0c53f56e03c10bb0e70.tar.gz#path=1/1+f.txt&size=8 and with options ['-c', 'annex.largefiles=exclude=*.txt']
DEBUG datalad.local.add_archive_content:add_archive_content.py:627 File /home/yoh/.tmp/datalad_temp_test_add_archive_contentxyf002ii/2/1_f.txt was added to git, not adding url
```
So git-annex seems to have reported that the file was added to `git`, but if we stop at that point (in another run) and look at the repo, we see that it was not added:
```
$> DATALAD_TESTS_TEMP_KEEP=1 DATALAD_LOG_OUTPUTS=1 DATALAD_LOG_LEVEL=DEBUG python -m pytest -s -v --pdb datalad/local/tests/test_add_archive_content.py::test_add_archive_content
...
DEBUG datalad.runner.runner:runner.py:171 Run ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'addurl', '-c', 'annex.largefiles=exclude=*.txt', '--with-files', '--json', '--json-error-messages', '--batch'] (protocol_class=BatchedCommandProtocol) (cwd=/home/yoh/.tmp/datalad_temp_test_add_archive_content0r8qsr6l)
DEBUG datalad.local.add_archive_content:add_archive_content.py:627 File /home/yoh/.tmp/datalad_temp_test_add_archive_content0r8qsr6l/2/1_f.txt was added to git, not adding url
INFO datalad.local.add_archive_content:log.py:431 Files to extract 0
DEBUG datalad.local.add_archive_content:add_archive_content.py:506 Skipping 1/d/1d since contains d pattern
DEBUG datalad.local.add_archive_content:add_archive_content.py:641 Removing the original archive 1.tar.gz
...
```
If we go to that folder, we see that `2/` was not added to git:
```
(git-annex)lena:~/.tmp/datalad_temp_test_add_archive_content0r8qsr6l[dl-test-branch]
$> git status
On branch dl-test-branch
Untracked files:
(use "git add <file>..." to include in what will be committed)
2/
nothing added to commit but untracked files present (use "git add" to track)
$> ls 2/
1_f.txt
$> ls .git/annex/journal/
# came out empty, FWIW
```
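
The manual check above could be automated along these lines. This is a minimal sketch, not part of datalad or git-annex: `untracked_paths` and `untracked_in_repo` are hypothetical helper names, and the parsing assumes the default `git status --porcelain` (v1) output, where untracked entries start with `?? `. Paths containing special characters are quoted by git in that format and are not unquoted here.

```python
import subprocess
from typing import List


def untracked_paths(porcelain_output: str) -> List[str]:
    """Extract untracked entries from `git status --porcelain` output.

    Untracked entries are reported as '?? <path>'; an untracked
    directory is collapsed into a single entry such as '2/'.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if line.startswith("?? "):
            paths.append(line[3:])
    return paths


def untracked_in_repo(repo_dir: str) -> List[str]:
    """Run git in `repo_dir` and return its untracked entries."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return untracked_paths(result.stdout)
```

Calling `untracked_in_repo` on the kept test directory should list `2/` if the state matches the `git status` output shown above.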
Versions: annexremote=1.6.0 boto=2.49.0 cmd:7z=16.02 cmd:annex=10.20220724+git77-ga24ae0814-1~ndall+1 cmd:bundled-git=2.30.2 cmd:git=2.30.2 cmd:ssh=9.0p1 cmd:system-git=2.35.1 cmd:system-ssh=9.0p1 datalad=0.17.2+75.g3bc853bb2 exifread=3.0.0 humanize=4.2.3 iso8601=1.0.2 keyring=23.6.0 keyrings.alt=UNKNOWN msgpack=1.0.4 mutagen=1.45.1 platformdirs=2.5.2 requests=2.28.1 tqdm=4.64.0
I will try to dig deeper some time later, unless you, Joey, immediately see what the culprit could be or can recommend something specific to try.
[[!meta author=yoh]]
[[!tag projects/datalad]]

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 5"
date="2022-08-09T06:22:50Z"
content="""
Thank you!
That's sad to hear, as it means bup will perform even worse where it wasn't performing well to begin with, but unsafe is unsafe.
I'd consider this issue solved, as bup performance problems are already tracked in https://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/.
"""]]