Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2023-03-27 13:38:13 -04:00
commit 5fdc074fa7
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 170 additions and 0 deletions

View file

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="gioele@678b7c03f524f2669b179b603f65352fcc16774e"
nickname="gioele"
avatar="http://cdn.libravatar.org/avatar/366dbda84e78aff8a8a070622aeb63ce"
subject="comment 5"
date="2023-03-25T08:47:01Z"
content="""
> I wonder if there are other situations where modifications can prevent checkout of the updated adjusted branch?
I clearly remember seeing something similar when a `sync` failed due a non-recoverable network failure. Unfortunately I have no logs from that day.
> It seems wise for git-annex to defend against it in depth, by making sure no crash can leave it in detached head state.
Perhaps also extend `fsck` to automatically recover from such a state? Or at least suggest a couple of possible solutions to the user?
"""]]

View file

@ -0,0 +1,86 @@
### Please describe the problem.
When getting a lot of objects from an external special remote of the exporttree kind,
sometimes the `EXPORT` command is sent to a different process of the external remote
than the following `TRANSFEREXPORT` command.
As a result, one process receives two `EXPORT`s in a row, the first of which it ignores,
while some other process receives a `TRANSFEREXPORT` without a prior `EXPORT`,
reusing whatever filename was set in the previous transaction,
and either ovewriting the last accessed remote file with the wrong content,
or retrieving the content of the last accessed remote file instead of the one git-annex wanted.
Or I misunderstood the protocol - please tell me if that's the case.
### What steps will reproduce the problem?
I don't have a minimal reproducing example.
I was working on adding tree export support to git-annex-remote-rclone,
and its the testsuite on my branch can reproduce the problem around half of the time, but only on MacOS.
It seems like it's a race condition and the interleaving that triggers it doesn't happen on Linux
Here's an example test run when it fails
[on github](https://github.com/Wolf480pl/git-annex-remote-rclone/actions/runs/4525558194/jobs/7970140554)
and the test code - [on github](https://github.com/Wolf480pl/git-annex-remote-rclone/blob/bd1a497fca8614286ec290bb557c83442c0e23c9/tests/all-in-one.sh#L224)
(I'd post it as an attachment but I'm getting an error that only admin can upload attachments).
### What version of git-annex are you using? On what operating system?
git-annex version: 10.20230227
OS: macOS 12.6.3 21G419
### Please provide any additional information below.
Full log is quite long so I put it in a [github gist](https://gist.github.com/Wolf480pl/4bdc83e23154827aad46e84bad631419).
I wanted to upload it as an attachment but apparently I'm not allowed.
Here are the interesting parts:
First, during the tree export, we see an example of correct upload of the `test 9.dat` and `test 10.dat` files:
[[!format txt """
2023-03-26T16:34:52.2551390Z [2023-03-26 16:34:52.085377] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- EXPORT test 9.dat
2023-03-26T16:34:52.2653060Z [2023-03-26 16:34:52.086085] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- TRANSFEREXPORT STORE SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat .git/annex/objects/zZ/kw/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat
[...]
2023-03-26T16:34:49.9382650Z [2023-03-26 16:34:49.629684] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- EXPORT test 10.dat
2023-03-26T16:34:49.9384770Z [2023-03-26 16:34:49.629761] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- TRANSFEREXPORT STORE SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat .git/annex/objects/6k/X3/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat
"""]]
so we know that key of `test 9.dat` is `a30e...973.dat`, and for `test 10.dat` it's `d56...0b0.dat`
Then, during the get, we see the bug happen:
[[!format txt """
2023-03-26T16:34:55.4392510Z [2023-03-26 16:34:54.534833] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[5] --> TRANSFER-SUCCESS RETRIEVE SHA256E-s8--fbb0a327e1528cdb3214abb2ec3fb1dd97cc39dfc12751df102eca019f602e73.dat
2023-03-26T16:34:55.4394480Z [2023-03-26 16:34:54.53532] (Annex.Perms) freezing content .git/annex/objects/45/Gp/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat
2023-03-26T16:34:55.4395900Z [2023-03-26 16:34:54.545524] (Annex.Branch) read b51/a3b/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat.log
2023-03-26T16:34:55.4397280Z [2023-03-26 16:34:54.545836] (Annex.Perms) freezing content directory .git/annex/objects/45/Gp/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat
2023-03-26T16:34:55.4398630Z [2023-03-26 16:34:54.54609] (Annex.Branch) read 253/975/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat.log
2023-03-26T16:34:55.4399950Z [2023-03-26 16:34:54.5477] (Annex.Branch) set 253/975/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat.log
2023-03-26T16:34:55.4401900Z [2023-03-26 16:34:54.558828] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] --> TRANSFER-SUCCESS RETRIEVE SHA256E-s7--b163189df1a08d63ba85e1566799cf07ca2de3865ac97a85b2b2d0cbfbd9a2f3.dat
2023-03-26T16:34:55.4403360Z [2023-03-26 16:34:54.559391] (Annex.Branch) read c9a/21e/SHA256E-s8--128888bb8975fc51a7fd410322b088593e158b37a24973483da2ad17fb6d7ff5.dat.log
2023-03-26T16:34:55.4405510Z [2023-03-26 16:34:54.561866] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- EXPORT test 9.dat
2023-03-26T16:34:55.4407810Z [2023-03-26 16:34:54.561942] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- EXPORT test 10.dat
2023-03-26T16:34:55.4409480Z [2023-03-26 16:34:54.561946] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[5] <-- TRANSFEREXPORT RETRIEVE SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat .git/annex/tmp/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat
2023-03-26T16:34:55.4411390Z [2023-03-26 16:34:54.562032] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- TRANSFEREXPORT RETRIEVE SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat .git/annex/tmp/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat
"""]]
As you can see in the last 4 lines:
* process 2 got `EXPORT test 9.dat` and then immediately `EXPORT test 10.dat`
* then process 5 got the `TRANSFEREXPORT` command for the `test 9.dat` file (`a30..973.dat`)
* process 5 did not, prior to this transfer, receive any `EXPORT` command since its last transfer of an unrelated file (see `TRANSFER-SUCCESS` in the first quoted line).
* so it was still holding the old filename from that unrelated transfer
* then, process 2 receives a (correct) `TRANSFEREXPORT` for the `test 10.dat` file (`d56...0b0.dat`).
It appears that the `EXPORT test 9.dat` should've been sent to process 5 instead of process 2.
But since it wasn't, process 5 retrieved the wrong file into what git-annex expected to be `test 9.dat`.
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I really wanted to use git annex for my media storage, it seems like the project I've been looking for for years.
But I first needed to be able to exporttree over FTP.
(I have a cheap NAS is terribly slow at encryption (SSH, etc) so I'm using FTP to get reasonable speeds with it.)
So I thought I'd implement it and then hit this bug.

View file

@ -0,0 +1,42 @@
I'm using version `10.20230321-gb624394c7` on Arch Linux and BTRFS.
I'd love to have the directory special remote using `cp --reflink=auto` to export to a directory on the same file system.
My use case is using syncthing as a way to export (parts of my repo) to my phone.
I'm deliberately not using `adb` as this needs my phone to be connected to the computer somehow.
I basically want to do what's described on [syncthing special remote](/todo/syncthing_special_remote/#index2h2) under “Copying objects”.
The main drawback, as also noted in the linked document, is that all files are duplicated.
As I'm using BTRFS it would really help if the `export` were to use `cp --reflink=auto` instead of its own copying mechanism.
I've read [here](/forum/Use_reflinks_on_BTRFS_instead_of_symlinks___63__/) that you suggested (albeit 7 years ago) using a shared clone instead.
However, this does not work for me as syncthing does not synchronize symlinks to Android (they are ignored even if they point to something).
(Trying with a shared repo and `adjust --unlock-present` gives merge problems when dropping files from the directory.)
Nearly two years ago, you wrote [here](/todo/import_from_directory_does_not_use_cp_--reflink__63___/) that you “Implemented CoW for directory special remote, comprehensively”.
It seems to me that this is only true for importing files which, as I checked, actually uses cow.
Since I want to manage relatively many files of decent size (music, audio books, pictures, videos / movies) it would be much better if the tree export would also use reflink.
Maybe having this as an option, for the cost of not having a nice progress bar, would be something worth considering.
A test script to show that an export does indeed use cow is the following:
```bash
mkdir export import
git init repo
cd repo
git annex init
git annex initremote import type=directory directory=../import encryption=none importtree=yes
git annex initremote export type=directory directory=../export encryption=none exporttree=yes
git config remote.import.annex-tracking-branch main
git config remote.export.annex-tracking-branch main
for i in {1..100}; do
dd if=/dev/urandom bs=1M count=10 of=../import/file$i
done
git annex sync --content
```
After running, a `btrfs filesystem du ...` tells me that the `repo` and `import` files point to the same file, whereas `export` does not.
---
In general: Thank you so much for this wonderful piece of software, I'm using it for years now and manage virtually everything with it (audio, video, pictures, important documents, papers, …).

View file

@ -0,0 +1,27 @@
[[!comment format=mdwn
username="jt"
avatar="http://cdn.libravatar.org/avatar/920cb5f2bc783736e679d046aa7cf987"
subject="securehashesonly conflicts with addurl"
date="2023-03-25T03:22:50Z"
content="""
Turning on `securehashesonly` seems to disable the `addurl` command:
```console
% git config --get annex.securehashesonly
true
% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html
addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html
annex.securehashesonly blocked transfer of URL key
failed
addurl: 1 failed
% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html --relaxed
addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html (to www.gutenberg.org_cache_epub_2591_pg2591-images.html) ok
(recording state in git...)
% ls -l www.gutenberg.org_cache_epub_2591_pg2591-images.html
www.gutenberg.org_cache_epub_2591_pg2591-images.html -> .git/annex/objects/gg/kG/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html
```
Does this have something to do with the URL prefix that the annex object has?
"""]]