Merge branch 'master' into append

Joey Hess 2022-07-18 16:46:01 -04:00
commit 71b4a6ba26
GPG key ID: DB12DB0FF05F8F38
12 changed files with 478 additions and 0 deletions

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="cehteh"
avatar="http://cdn.libravatar.org/avatar/8ff07aa39d4d817cf3af3e717f41ab1a"
subject="comment 6"
date="2022-07-18T13:19:43Z"
content="""
Starting `git annex webapp` fails for me (Android 12, CalyxOS) with:
```
CANNOT LINK EXECUTABLE \"/system/bin/app_process\": library \"libnativeloader.so\" not found: needed by main executable
failed to start web browser
```
The webapp still runs, and I can connect the browser by copy-pasting the link.
Instead of trying to open the browser natively, I have a bit more success with Termux:API, by running
    termux-open-url 'http://127.0.0.1:42005/?auth=...
This seems to be the right thing to do. Unfortunately, for some unknown reason it sometimes fails too, but I think that's an issue
within Termux that may eventually be fixed.
"""]]

@ -0,0 +1,81 @@
### Please describe the problem.
Copying many files (thousands) to a bup remote is *very* slow. <1MiB/s slow.
When evaluating multiple options for compressed deduplicated storage, I tried storing my documents repo that has 1.35 gigabytes split across 1265 local annex keys.
This is how long it took (tmpfs -> bup in tmpfs):
```
* J1
real 72m19.684s
user 63m8.171s
sys 12m10.564s
* J2
copy: 3 failed
real 37m1.465s
user 66m24.674s
sys 11m48.973s
* J4
copy: 17 failed
real 22m36.806s
user 75m21.566s
sys 12m54.847s
```
(Failures due to https://git-annex.branchable.com/bugs/bup_often_errors_out_when_-J___62___1/)
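To put the `real` (wall-clock) times above in perspective, here is a minimal shell sketch that converts them to seconds and computes the speedup relative to the `-J1` run. The helper names `to_seconds` and `speedup` are made up for illustration; only the timings come from the table above.

```shell
# Hypothetical helper: convert a `time`-style value such as 72m19.684s to seconds.
# LC_ALL=C keeps the decimal separator a "." regardless of the user's locale.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, p, "m")          # p[1] = minutes, p[2] = seconds with trailing "s"
        sub(/s$/, "", p[2])
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Hypothetical helper: wall-clock speedup of a run relative to a baseline run.
speedup() {
    LC_ALL=C awk -v base="$(to_seconds "$1")" -v run="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", base / run }'
}

speedup 72m19.684s 37m1.465s     # -J2 versus -J1
speedup 72m19.684s 22m36.806s    # -J4 versus -J1
```

This prints 1.95 and 3.20, so the wall-clock time does scale with the number of jobs, while the `user` CPU time in the table grows from about 63 to about 75 minutes.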
### What steps will reproduce the problem?
```
$ cd /tmp
$ wget http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
$ unzip silesia.zip
$ cd silesia
$ git init
$ git annex init
$ git annex add
$ git commit -m "silesia corpus"
$ git annex initremote bup type=bup bupdir=/tmp/bup
$ time git annex copy --to bup -J4
```
This is decently fast: 12s on my machine (~17MB/s).
Now do the same, but untar `mozilla`, `samba` and `xml` and remove the original archives before adding.
On my machine, it now took **34 minutes**; 210MiB/34min ~= 0.1MiB/s.
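The throughput figure is plain arithmetic: size divided by wall-clock time. A minimal sketch of that calculation, assuming the 210 MiB size and the 34 minute run time quoted above (the function name `throughput_mib_s` is hypothetical):

```shell
# Hypothetical helper: average throughput in MiB/s, given a size in MiB
# and a duration in seconds. LC_ALL=C keeps the decimal separator a ".".
throughput_mib_s() {
    LC_ALL=C awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

# 210 MiB copied in 34 minutes (34 * 60 = 2040 seconds)
throughput_mib_s 210 $((34 * 60))
```

This prints 0.10, in line with the roughly 0.1MiB/s stated in the report.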
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20220624
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.11 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin aarch64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 8
```
(Same happens on Linux)
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

@ -0,0 +1,42 @@
### Please describe the problem.
When copying to a bup remote with `-J`, many copies "fail":
```
copy samba-2.2.3a/packaging/LSB/lsb-samba.spec (to bup...)
user error (bup ["split","-r","/tmp/silesia_bup","-q","-n","SHA256E-s2260--1e95bcc60c9b332608774459570b80ab9576b5e971013a30ff6da6d93ebcfd9d.spec"] exited 1)
```
They do seem to complete in some sense, though, since copying them again prints `ok` instantly.
### What steps will reproduce the problem?
`git annex copy --to bup -J2`
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20220624
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.11 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 8
```
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
It's amazing and I use it every day :)

@ -0,0 +1,244 @@
### Please describe the problem.
I'm not sure this is a bug.
We are trying to share a git-lfs special remote on GitLab, encrypted with gcrypt. Sync works well, but one client gets "gpg: decryption failed: No secret key" when getting or copying content.
I followed the [private encrypted git remote on a git-lfs hosting site](https://git-annex.branchable.com/tips/fully_encrypted_git_repositories_with_gcrypt/#index4h2) guide.
I can sync/add/get git-annex content from the git-lfs special remote (named origin); I can also clone the repository on another machine and get/copy content to origin with no problems.
My colleague (listed in gcrypt-participants) is able to clone the repo and sync with the remote, but he cannot get content from it or copy content to it (see below).
### What steps will reproduce the problem?
First I created a new empty project on GitLab.
On my machine I created an empty swws-library directory, then:
[[!format sh """
git init
# initialize git annex and set the encrypted git-lfs remote
git annex init
git annex initremote origin type=git-lfs url=gcrypt::git@gitlab.com:softwareworkers/swws-library.git keyid=FCE2EDE78BD9B2CB keyid=D37D0EA7CECC3912
git config remote.origin.gcrypt-participants "D37D0EA7CECC3912 FCE2EDE78BD9B2CB"
git annex sync
# I had to unprotect the master branch in the GitLab project repository settings and again:
git annex sync
# Add a couple of media files
git annex sync --content
# All went fine
"""]]
Then my colleague (the owner of the FCE2EDE78BD9B2CB gpg key) cloned the repository and ran:
[[!format sh """
git annex enableremote origin
git config remote.origin.gcrypt-participants "D37D0EA7CECC3912 FCE2EDE78BD9B2CB"
git push origin master git-annex
git annex sync
# git annex add <some media>
git annex sync
# all is going fine
"""]]
On my machine I can "git annex sync origin" and get the updated repository, but my colleague cannot get content from that remote or copy content to it:
[[!format sh """
a9i@trix:~/Repos/SW/swws-library$ git annex get
smarketing/220517_workshop_testuale_part1.mp3
get smarketing/220517_workshop_testuale_part1.mp3 (from origin...)
gpg: decryption failed: No secret key
user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)
Unable to access these remotes: origin
Maybe add some of these git remotes (git remote add ...):
5e070ec6-adbe-4a60-be06-af60b777d03f --
g@renaissance:~/{git}/softwareworkers.it/swws-library
failed
get: 1 failed
"""]]
This is the debug output when copying annexed files to the origin remote:
[[!format sh """
a9i@trix:~/Repos/SW/swws-library$ LC_ALL=C git annex copy --debug --to=origin corsi/
[2022-07-18 15:47:28.478299533] (Utility.Process) process [269581] read:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2022-07-18 15:47:28.480014105] (Utility.Process) process [269581] done
ExitSuccess
[2022-07-18 15:47:28.480418294] (Utility.Process) process [269582] read:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2022-07-18 15:47:28.482213732] (Utility.Process) process [269582] done
ExitSuccess
[2022-07-18 15:47:28.491746214] (Utility.Process) process [269583] read:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..19a97025cc5689891c920766f8fa3ed85fe16229","--pretty=%H","-n1"]
[2022-07-18 15:47:28.494757234] (Utility.Process) process [269583] done
ExitSuccess
[2022-07-18 15:47:28.496856609] (Utility.Process) process [269584] chat:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
[2022-07-18 15:47:28.507245457] (Utility.Process) process [269585] read:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","corsi/"]
[2022-07-18 15:47:28.509053804] (Utility.Process) process [269586] chat:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname)
%(objecttype) %(objectsize)","--buffer"]
[2022-07-18 15:47:28.510594336] (Utility.Process) process [269587] chat:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname)
%(objecttype) %(objectsize)","--buffer"]
[2022-07-18 15:47:28.511891608] (Utility.Process) process [269584] done
ExitSuccess
[2022-07-18 15:47:28.513074507] (Utility.Process) process [269588] chat:
git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname)
%(objecttype) %(objectsize)","--buffer"]
copy corsi/2022-06-09_Landing-Pages/2022-06-09_Landing-Pages.org
[2022-07-18 15:47:28.524203277] (Utility.Process) process [269590] chat:
gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.532883171] (Utility.Process) process [269590] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/2022-06-09_Landing-Pages.org~
[2022-07-18 15:47:28.533500984] (Utility.Process) process [269592] chat:
gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.540138973] (Utility.Process) process [269592] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/202206 Formaper - Landing page.pdf
[2022-07-18 15:47:28.540791533] (Utility.Process) process [269594] chat:
gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.551983197] (Utility.Process) process [269594] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/3700213292852724482.mp4 [2022-07-18
15:47:28.554726904] (Utility.Process) process [269596] chat: gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.567432702] (Utility.Process) process [269596] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/6258438201216682752.mp4 [2022-07-18
15:47:28.571835037] (Utility.Process) process [269598] chat: gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.579064056] (Utility.Process) process [269598] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/8959772244561726210.mp4 [2022-07-18
15:47:28.581423831] (Utility.Process) process [269600] chat: gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.602507936] (Utility.Process) process [269600] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/Slide - Landing page efficaci.pdf
[2022-07-18 15:47:28.6080323] (Utility.Process) process [269602] chat:
gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.616162922] (Utility.Process) process [269602] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/Story_mapping.png [2022-07-18
15:47:28.617007178] (Utility.Process) process [269604] chat: gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.624214363] (Utility.Process) process [269604] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
copy corsi/2022-06-09_Landing-Pages/canvas.png [2022-07-18
15:47:28.624826009] (Utility.Process) process [269606] chat: gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
gpg: decryption failed: No secret key
[2022-07-18 15:47:28.634726242] (Utility.Process) process [269606] done
ExitFailure 2
(user error (gpg
["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
exited 2)) failed
[2022-07-18 15:47:28.634918681] (Utility.Process) process [269588] done
ExitSuccess
[2022-07-18 15:47:28.634986759] (Utility.Process) process [269587] done
ExitSuccess
[2022-07-18 15:47:28.635050868] (Utility.Process) process [269586] done
ExitSuccess
[2022-07-18 15:47:28.63509396] (Utility.Process) process [269585] done
ExitSuccess
copy: 9 failed
"""]]
### What version of git-annex are you using? On what operating system?
git-annex version: 10.20220624, installed with Guix on Debian (as a foreign distro).
### Please provide any additional information below.
This is the git repository configuration of my colleague (the failing party):
[[!format sh """
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = gcrypt::git@gitlab.com:softwareworkers/swws-library.git
gcrypt-id = :id:Fb42A4B5ODZW/35Z5ZoI
fetch = +refs/heads/*:refs/remotes/origin/*
annex-git-lfs = true
annex-uuid = 9b43bd79-3d24-4e6f-b196-46f5bc67f214
gcrypt-participants = D37D0EA7CECC3912 FCE2EDE78BD9B2CB
gcrypt-publish-participants = true
annex-ignore = false
[branch "master"]
remote = origin
merge = refs/heads/master
[annex]
uuid = 6a09ad78-19a1-451a-8ab5-fb51d18966eb
version = 8
[filter "annex"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Oh yes, I've been using git-annex for many, many years and it's my preferred storage solution. I really love it!
**Kudos** Joey!

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="oliv5"
avatar="http://cdn.libravatar.org/avatar/d7f0d33c51583bbd8578e4f1f9f8cf4b"
subject="comment 2"
date="2022-07-17T20:41:51Z"
content="""
I'm already using the latest git-annex version available for arm64 on Android. I checked both the stable build and the autobuild; both point to the same archive, with git-annex 10.20220121-g0bcb94487.
I upgraded git to the latest revision available in the Termux base packages, namely git 2.37.1. It did not change anything.
"""]]

@ -0,0 +1 @@
Is there a way to have git-annex list missing files / broken links within a repository? This seems like it would be a common thing to want to do, and I feel like I've done it before, but I can find nothing about this anywhere. It may just be that I'm not searching using the right terminology.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 1"
date="2022-07-17T09:33:04Z"
content="""
`git annex find --not --in=here`
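As a complement that does not depend on git-annex at all, dangling symlinks can also be listed with plain `find`. This is only a sketch: it assumes annexed files are locked (i.e. symlinks, not unlocked pointer files), and the function name `list_broken_links` is made up for illustration.

```shell
# Hypothetical helper: print symlinks under $1 (default: current directory)
# whose target does not exist, skipping the .git directory.
list_broken_links() {
    # `test -e` follows the symlink, so it fails for dangling links.
    find "${1:-.}" -name .git -prune -o -type l ! -exec test -e {} \; -print
}

list_broken_links .
```

Unlike `git annex find`, this also reports broken symlinks that git-annex does not manage.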
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="tomdhunt"
avatar="http://cdn.libravatar.org/avatar/02694633d0fb05bb89f025cf779218a3"
subject="Hashes for files added via addurl"
date="2022-07-16T20:12:20Z"
content="""
If you add a file to your repo first via `addurl --fast`, it writes the filename as a symlink to a file that incorporates the URL, rather than the file hash. This is expected, since git-annex can't know the file hash until it's actually downloaded the file.
If you then `git annex get` that file, it downloads the file to the path that uses the URL. Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?
"""]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="checksums and addurl --fast"
date="2022-07-18T16:01:41Z"
content="""
>Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?
Hash is not recorded, but file size is. You can disable the size check with `--relaxed`. See [[tips/using_the_web_as_a_special_remote]].
After [[`git-annex-get`|git-annex-get]]ting the file, you can use [[`git-annex-migrate`|git-annex-migrate]] to record it under a new checksum-based hash, then use [[`git-annex-unused`|git-annex-unused]] to find and remove the old key.
Sometimes you can get the hash without downloading the file, e.g. if the hash is stored next to the file at `http://my/file.md5`, or if the file is stored in the Google Cloud. Then you can use the plumbing commands [[`git-annex-registerurl`|git-annex-registerurl]] to associate the checksum-based key with the URL, and [[`git-annex-setpresentkey`|git-annex-setpresentkey]] to record the key's presence in the (web) remote.
Related discussion: [[todo/alternate_keys_for_same_content]]
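The get, migrate, unused sequence described above could be sketched as follows. This is a hedged illustration, not a tested recipe: the wrapper names `run` and `rekey_to_checksum`, the file name `example.iso` and the choice of the `SHA256E` backend are assumptions. By default the sketch only prints the commands.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps described above.
rekey_to_checksum() {
    file=$1
    run git annex get "$file"                         # download under the URL key
    run git annex migrate --backend=SHA256E "$file"   # record it under a checksum-based key
    run git annex unused                              # list keys that are no longer referenced
}

rekey_to_checksum example.iso
```

The old key may only be reported by `git annex unused` once the migrated file has been committed; it can then be removed with `git annex dropunused`, using the numbers that `git annex unused` printed.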
"""]]

@ -0,0 +1 @@
Many websites return an ETag in the HTTP response header, indicating the version of the resource. Could the ETag (or a checksum of it) be recorded in the URL- key, the way size is now? Then e.g. `fsck --from web` could do a stronger check that the same file is still downloadable from the web, and the situation where different remotes have different versions of a file with the same URL- key could be better prevented.
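For illustration, this is roughly how the ETag could be read from a response. The sketch only covers extracting the header value; how git-annex would encode it in a key is left open. The function name `extract_etag` is hypothetical.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first ETag header, if there is one.
extract_etag() {
    LC_ALL=C awk 'tolower($0) ~ /^etag:/ {
        sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading whitespace
        sub(/\r$/, "")             # drop the trailing carriage return
        print
        exit
    }'
}

# Example with canned headers; a live check would pipe `curl -sI URL` into it.
printf 'HTTP/1.1 200 OK\r\nETag: "abc123"\r\nContent-Length: 5\r\n\r\n' | extract_etag
```

Servers are free to omit the header, in which case the function prints nothing, so any such check would have to stay optional.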

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""comment 13"""
date="2022-07-18T18:01:09Z"
content="""
The `append` branch has basic appending implemented, but it's not yet
done atomically.
For benchmarking, I'm using this command.
    perl -e 'for (1..'$ITER') { print "WORM--foo http://example.com/$_\n" }' | /usr/bin/time git-annex registerurl --batch
    ITER=2000
    Old: 52s
    Appending: 28s
    Appending without reading old value: 2s
    ITER=4000
    Old: 190s
    Appending: 111s
    Appending without reading old value: 5s
So an improvement of nearly 50% (46% at ITER=2000, 42% at ITER=4000). But it remains nonlinear even when appending,
because it needs to read the existing log file each time to determine if it
can append, or if it needs to compact it. (Disk cache didn't work as well
as I had hoped.)
What this suggests to me is that it would be good to also add a mode that
blindly appends without compacting. Or, possibly, to blindly append,
but then compact the journalled file before committing it to the git-annex
branch.
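The scaling behaviour can be read off the timings quoted above with a little arithmetic. A minimal sketch, using only those numbers (the helper names `percent_saved` and `growth` are made up for illustration):

```shell
# Hypothetical helper: reduction in run time, in percent, from BEFORE to AFTER.
percent_saved() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Hypothetical helper: factor by which the run time grew when ITER doubled.
growth() {
    LC_ALL=C awk -v small="$1" -v large="$2" 'BEGIN { printf "%.2f\n", large / small }'
}

percent_saved 52 28      # ITER=2000, old vs appending
percent_saved 190 111    # ITER=4000, old vs appending
growth 52 190            # old
growth 28 111            # appending
growth 2 5               # appending without reading old value
```

This prints 46.2, 41.6, 3.65, 3.96 and 2.50. Linear scaling would give a growth factor of 2 when ITER doubles, so factors close to 4 are consistent with roughly quadratic behaviour, while the blind-append variant stays much closer to linear.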
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""re: comment 12"""
date="2022-07-18T18:40:20Z"
content="""
@yarikoptic, the new git-annex would resolve the inconsistency the next
time it ran. Only when annex.alwayscommit=false would there be any time
window where the old git-annex missed something written by a git-annex
process that ran before the one that got interrupted. This does not seem
like a large problem.
"""]]