Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2020-02-27 11:54:37 -04:00
commit eb884085b5
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 221 additions and 0 deletions

View file

@ -0,0 +1,194 @@
### Please describe the problem.
Originally [reported and troubleshooted against datalad](https://github.com/datalad/datalad/issues/3929). With older git-annex, behavior was even worse -- the `.nfs*` file was left behind, forbidding datalad remove that repository directory entirely. With later git-annex (eg. current `7.20200219+git135-g34d726cba-1~ndall+1`) situation improved: No `.nfs*` file, but the KEY directory itself is still there:
<details>
<summary>Full protocol with details on NFS etc (click to expand)</summary>
[[!format sh """
/tmp > mkdir -p /tmp/nfsmount{,_}
/tmp > cat /etc/exports
/tmp/nfsmount_ localhost(rw)
/tmp > sudo exportfs -a
exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "localhost:/tmp/nfsmount_".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
/tmp > sudo mount -t nfs localhost:/tmp/nfsmount_ /tmp/nfsmount
/tmp > mount | grep /tmp/nfsmount
localhost:/tmp/nfsmount_ on /tmp/nfsmount type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)
/tmp > cat /home/yoh/proj/datalad/trash/nfsdrop
#!/bin/bash
set -xeu
git annex version 2>&1 | head -n 1
if [ -e nfsdrop ]; then
chmod +w -R nfsdrop
rm -rf nfsdrop
fi
mkdir nfsdrop
cd nfsdrop
git init
# git config --add annex.pidlock true
git annex init
echo 123 > file1
git annex add file1
git commit -m dummy
tree -a .git/annex/objects
git annex drop --force file1
tree -a .git/annex/objects
/tmp > git annex version
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.0.0 ghc-8.4.4 http-client-0.5.13.1 persistent-sqlite-2.8.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
/tmp > cd nfsmount
/tmp/nfsmount > /home/yoh/proj/datalad/trash/nfsdrop
+git annex version
+head -n 1
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
+'[' -e nfsdrop ']'
+mkdir nfsdrop
+cd nfsdrop
+git init
Initialized empty Git repository in /tmp/nfsmount/nfsdrop/.git/
+git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+echo 123
+git annex add file1
add file1
ok
(recording state in git...)
+git commit -m dummy
[master (root-commit) f9539b4] dummy
1 file changed, 1 insertion(+)
create mode 120000 file1
+tree -a .git/annex/objects
.git/annex/objects
└── G6
└── qW
└── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
└── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 1 file
+git annex drop --force file1
drop file1 ok
(recording state in git...)
+tree -a .git/annex/objects
.git/annex/objects
└── G6
└── qW
└── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 0 files
/tmp/nfsmount > cat nfsdrop/.git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[annex]
uuid = 7c5d4d5f-c2da-4616-8a91-25f9d932d7aa
version = 8
[filter "annex"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
"""]]
</details>
What was noticed is that git-annex did not decide to have pidlock enabled.
<details>
<summary> If we enable it via config -- no KEY directory is left behind:
</summary>
[[!format sh """
/tmp/nfsmount > cat /home/yoh/proj/datalad/trash/nfsdrop
#!/bin/bash
set -xeu
git annex version 2>&1 | head -n 1
if [ -e nfsdrop ]; then
chmod +w -R nfsdrop
rm -rf nfsdrop
fi
mkdir nfsdrop
cd nfsdrop
git init
git config --add annex.pidlock true
git annex init
echo 123 > file1
git annex add file1
git commit -m dummy
tree -a .git/annex/objects
git annex drop --force file1
tree -a .git/annex/objects
/tmp/nfsmount > /home/yoh/proj/datalad/trash/nfsdrop
+git annex version
+head -n 1
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
+'[' -e nfsdrop ']'
+chmod +w -R nfsdrop
+rm -rf nfsdrop
+mkdir nfsdrop
+cd nfsdrop
+git init
Initialized empty Git repository in /tmp/nfsmount/nfsdrop/.git/
+git config --add annex.pidlock true
+git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+echo 123
+git annex add file1
add file1
ok
(recording state in git...)
+git commit -m dummy
[master (root-commit) ff91d78] dummy
1 file changed, 1 insertion(+)
create mode 120000 file1
+tree -a .git/annex/objects
.git/annex/objects
└── G6
└── qW
└── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
└── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 1 file
+git annex drop --force file1
drop file1 ok
(recording state in git...)
+tree -a .git/annex/objects
.git/annex/objects
0 directories, 0 files
"""]]
</details>
[[!meta author=yoh]]
[[!tag projects/datalad]]

View file

@ -76,3 +76,5 @@ remote annex size: 103.28 gigabytes (+ 632 unknown size)
[[!meta author=yoh]]
[[!tag projects/datalad]]
[[fixed|done]] as the one no longer observed --[[yarikoptic]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2020-02-27T15:52:10Z"
content="""
Not sure what you meant by \"need to know how to reproduce this to get any further\" since a full example was provided, and to reproduce \"this\" problem, commands could have been cut/paste'ed.
Now, with all the changes to git-annex, and that dataset (may be next time I would do some kind of fast-export on them or host a dedicated clone), it would indeed be hard if not impossible to reproduce, and I do not observe it with git annex `7.20200219+git135-g34d726cba-1~ndall+1` so I will mark it as done
"""]]

View file

@ -0,0 +1,5 @@
Having very big files in chunked special remotes makes operations like fsck, move or drop extremely slow. Eg. a movie of 20GiB in a special remote with the [recommended chunk size](https://github.com/DanielDent/git-annex-remote-rclone/issues/1) of 50MiB results in 400 lookups each time the presence of the key is checked. One could increase the chunk size, but that has other disadvantages: a) Chunks are apparently still [buffered in memory](https://git-annex.branchable.com/todo/upload_large_chunks_without_buffering_in_memory/) b) one would lose more transfer progress if the special remote does not implement resumable transfers and c) it would render the use of `git annex inprogress` for streaming purposes useless. Letting the remote know about all the keys that are about to be checked would allow it to optimise the request to the remote server.
So I propose a protocol extension `CHECKPRESENT-MULTI Key1 Key2...` to let the remote do the optimised lookup after which it can reply with a list of present keys `PRESENT Key1 Key2...`. This list can be empty if none of the keys are present: `PRESENT`.
--Lykos

View file

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="lykos@d125a37d89b1cfac20829f12911656c40cb70018"
nickname="lykos"
avatar="http://cdn.libravatar.org/avatar/085df7b04d3408ba23c19f9c49be9ea2"
subject="comment 2"
date="2020-02-27T13:12:28Z"
content="""
git-annex-remote-googledrive compares partial checksums after every transmitted chunk and before resuming a transfer, so it would not be affected by both problems you describe. However, you're probably right that some other remote might not behave that way.
My current workaround is to move the file to a tmp directory specific to the remote (and UUID) and when uploading, prefer files inside this directory. Downsides are that it can use slightly more disc space and that git-annex-remote-googledrive has to handle cleanup itself. But I think it's good enough. And the problem probably does not justify changes as big as apparently needed. Thanks for your thoughts on this!
"""]]