Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2024-04-06 19:52:26 -04:00
commit 6401a1eed5
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 203 additions and 1 deletions

View file

@ -0,0 +1,39 @@
[[!comment format=mdwn
username="jasonc"
nickname="mail"
avatar="http://cdn.libravatar.org/avatar/cb07bdfbe978aa83388d64e08a972eb2"
subject="Possible simplified scenario"
date="2024-04-03T16:48:04Z"
content="""
Hello, firstly thank you for developing a really useful piece of software. During my initial experimentation I came across what appears to be a variation of this bug, and think I've distilled it to a minimal reproducible scenario.
Initialise in the usual way on an NTFS partition, then add a directory special remote (no `encryption`, no `importtree` and no `exporttree`):
<pre>
git init
git annex init local
git annex initremote nextdoor type=directory directory=N:\nextdoordir encryption=none
</pre>
In my case I then added and committed the files locally, then moved them to the directory special remote and back again:
<pre>
git annex add .
git commit --all --message=\"first commit\"
git annex move . --to nextdoor
git annex move . --from nextdoor
</pre>
This completes successfully, however repeating the last two steps a second time triggers the `permission denied (Access is denied.)` failure at the start of the bug report.
Going through each part step by step:
* Since NTFS is designated as a \"crippled filesystem\", the annexed objects appear to be read-write by default (no ACL modifications, no ReadOnly attribute).
* When the files are moved away to the directory special remote (in my test, the same NTFS partition), they pick up a ReadOnly attribute in the new location, so `Archive+Compression` becomes `ReadOnly+Archive+Compression`.
* When the files are then moved back from the directory special remote, the ReadOnly attribute persists.
* Repeating the movement then fails, as the file cannot be dropped locally (the UNC path exists, but `DeleteFile` fails).
If I remove the ReadOnly attributes and try again, the move away is successful. Similarly if I use a networked ext4 location for the directory special remote (and NTFS locally), the same cycle of success then failure can be observed.
Version information: git `git version 2.44.0.windows.1`, annex `git-annex version: 10.20240130-gad8e32c09d3ec866e0c0654cdcd146bf1aefbc5e` (installer from 2024-02-27), Windows 10 22H2
If you require logs or other information, please let me know.
"""]]

View file

@ -0,0 +1,142 @@
### Please describe the problem.
With URLs that are handled by the web remote the URL will be kept through a migration, i.e.
```
git annex addurl --relaxed <some-http-url>
git annex get <the-added-file>
git annex migrate
```
will migrate the key of the file to be hash based, and keep the URL associated to that key.
If I do the same with a URL whose scheme is handled by a custom special remote (this one specifically is what I got the issue with: <https://github.com/matrss/datalad-cds/blob/main/src/datalad_cds/cds_remote.py>, it registers itself for the `cds:` scheme), the URL seems to be dropped from the key (i.e. whereis no longer shows it and git annex can no longer fetch it from the special remote).
### What steps will reproduce the problem?
The steps mentioned above for a URL whose scheme is handled by an (external?) special remote. Specifically, I saw it with datalad-cds, it might happen for other special remotes as well.
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20231129
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.32 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.15 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```
Installed with nix from nixpkgs on an ubuntu system.
### Please provide any additional information below.
It works with the web remote:
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ datalad create test-ds
create(ok): [...] (dataset)
$ cd test-ds/
$ git annex addurl --relaxed "https://git-annex.branchable.com/"
addurl https://git-annex.branchable.com/ (to git-annex.branchable.com_) ok
(recording state in git...)
$ git annex whereis git-annex.branchable.com_
whereis git-annex.branchable.com_ (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: https://git-annex.branchable.com/
ok
$ ls -l
[...] git-annex.branchable.com_ -> '.git/annex/objects/pJ/v4/URL--https&c%%git-annex.branchable.com%/URL--https&c%%git-annex.branchable.com%'
$ git annex get git-annex.branchable.com_
get git-annex.branchable.com_ (from web...)
ok
(recording state in git...)
$ git annex migrate
migrate git-annex.branchable.com_ (checksum...) ok
(recording state in git...)
$ ls -l
[...] git-annex.branchable.com_ -> .git/annex/objects/31/qv/MD5E-s12462--6b956d66f8352205df79936ada326ec3/MD5E-s12462--6b956d66f8352205df79936ada326ec3
$ git annex whereis git-annex.branchable.com_
whereis git-annex.branchable.com_ (2 copies)
00000000-0000-0000-0000-000000000001 -- web
60079e0e-42e4-492e-a7b1-dde764d069eb -- [here]
web: https://git-annex.branchable.com/
ok
# End of transcript or log.
"""]]
It doesn't work with the custom special remote:
[[!format sh """
$ datalad create test-ds
create(ok): [...] (dataset)
$ cd test-ds/
$ datalad download-cds --lazy --nosave --path "2022-01-01.grib" "$(cat <<EOF
{
"dataset": "reanalysis-era5-complete",
"sub-selection": {
"class": "ea",
"date": "2022-01-01",
"expver": "1",
"levelist": "1",
"levtype": "ml",
"param": "130",
"stream": "oper",
"time": "00:00:00/06:00:00/12:00:00/18:00:00",
"type": "an",
"grid": ".3/.3"
}
}
EOF
)"
cds(ok): [...] (dataset)
$ ls -l
[...] 2022-01-01.grib -> '.git/annex/objects/Mx/4G/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a'
$ git annex whereis 2022-01-01.grib
whereis 2022-01-01.grib (1 copy)
923e2755-e747-42f4-890a-9c921068fb82 -- [cds]
cds: {"dataset":"reanalysis-era5-complete","sub-selection":{"class":"ea","date":"2022-01-01","expver":"1","levelist":"1","levtype":"ml","param":"130","stream":"oper","time":"00:00:00/06:00:00/12:00:00/18:00:00","type":"an","grid":".3/.3"}}
cds: cds:v1-eyJkYXRhc2V0IjoicmVhbmFseXNpcy1lcmE1LWNvbXBsZXRlIiwic3ViLXNlbGVjdGlvbiI6eyJjbGFzcyI6ImVhIiwiZGF0ZSI6IjIwMjItMDEtMDEiLCJleHB2ZXIiOiIxIiwibGV2ZWxpc3QiOiIxIiwibGV2dHlwZSI6Im1sIiwicGFyYW0iOiIxMzAiLCJzdHJlYW0iOiJvcGVyIiwidGltZSI6IjAwOjAwOjAwLzA2OjAwOjAwLzEyOjAwOjAwLzE4OjAwOjAwIiwidHlwZSI6ImFuIiwiZ3JpZCI6Ii4zLy4zIn19
ok
$ git config --local remote.cds.annex-security-allow-unverified-downloads ACKTHPPT
$ git annex get 2022-01-01.grib
get 2022-01-01.grib (from cds...)
2024-04-06 11:37:05,250 INFO Welcome to the CDS
2024-04-06 11:37:05,251 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-complete
2024-04-06 11:37:05,340 INFO Request is queued
2024-04-06 11:37:06,400 INFO Request is running
2024-04-06 11:37:26,399 INFO Request is completed
2024-04-06 11:37:26,399 INFO Downloading https://download-0017.copernicus-climate.eu/cache-compute-0017/cache/data9/adaptor.mars.external-1712396225.5545986-18258-18-822e5b91-cf60-4dbd-a808-a1253d4fe109.grib to .git/annex/tmp/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a (5.5M)
0%| | 0.00/5.51M [00:00<?, ?B/s] 2%|▏ | 104k/5.51M [00:00<00:06, 866kB/s] 17%|█▋ | 942k/5.51M [00:00<00:00, 5.00MB/s] 57%|█████▋ | 3.15M/5.51M [00:00<00:00, 13.0MB/s] 99%|█████████▉| 5.48M/5.51M [00:00<00:00, 17.3MB/s] 2024-04-06 11:37:27,322 INFO Download rate 6M/s
ok
(recording state in git...)
$ git annex migrate
migrate 2022-01-01.grib (checksum...) ok
(recording state in git...)
$ ls -l
[...] 2022-01-01.grib -> .git/annex/objects/KJ/6K/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib
$ git annex whereis 2022-01-01.grib
whereis 2022-01-01.grib (1 copy)
5dfef0c9-8e18-4ea2-9ee1-646830b5749b -- [here]
ok
$ git annex drop 2022-01-01.grib
drop 2022-01-01.grib (unsafe)
Could only verify the existence of 0 out of 1 necessary copy
Rather than dropping this file, try using: git annex move
(Use --force to override this check, or adjust numcopies.)
failed
drop: 1 failed
"""]]
I know that this is sort of abusing the URL handling in git-annex, but it was super easy to implement. You recommended me to use SETSTATE/GETSTATE from the external special remote protocol instead already at some point, but I didn't get around to reworking it for that yet.
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! It is absolutely great, thank you for it.

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="m.risse@77eac2c22d673d5f10305c0bade738ad74055f92"
nickname="m.risse"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 1"
date="2024-04-06T10:05:04Z"
content="""
Looking at this again I am also surprised that I didn't need to set the equivalent of `git config --local remote.cds.annex-security-allow-unverified-downloads ACKTHPPT` for the web special remote when get'ing a file with just a URL key.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="more use cases for configurable default preferred content"
date="2024-04-06T15:12:50Z"
content="""
This came up again at the distribits meeting.
DataLad itself is designed to work like `git annex wanted . present` (i.e. content is supposed to be fetched manually. It is assumed that the user does not generally want all content of a DataLad dataset / git annex repo). DataLad could itself run `git annex wanted . present` as part of its setup (talked about that with @mih), but I still think a setting in the git-annex branch that auto-sets the above settings in fresh clones (even when using plain git annex, not DataLad), is useful. It enhances the user experience of sparse checkouts (a `git annex assist` in a freshly cloned annex repo can then be configured to only pull specific or no files).
I also discussed it with people in the context of handling confidential patient data that should not necessarily be copied everywhere. The default of just wanting all worktree content increases the delicacy of the matter a bit. Were there a way to have fresh clones (or even freshly created remotes that were not yet given a preferred content manually) have a preconfigured default wanted content, it would reduce the possibility of confidential data accidentally being copied all over the place.
"""]]

View file

@ -10,4 +10,4 @@ I made a [Thunar plugin](https://gitlab.com/nobodyinperson/thunar-plugins) for g
In an attempt to [#gitAnnexAllTheThings](https://fosstodon.org/tags/gitAnnexAllTheThings), I used git annex as a backend for a cli time tracker [annextimelog](https://pypi.org/project/annextimelog/). It has similarities with timewarrior but adresses many of its inconveniences.
At the [Tübix 2023](https://www.tuebix.org/) I gave a (German) git annex workshop, of which you can find a recording of the initial talk [📹 here in the fediverse](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) and [📹 here on Odysee](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6).
At the [Tübix 2023](https://www.tuebix.org/) I gave a git annex workshop, of which you can find a recording of the initial talk [📹 here (🇩🇪 German)](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) or [📹 here (English)](https://tube.tchncs.de/w/1U4vbTAhSEje3KQ1dGqvxh) in the fediverse and [📹 here on Odysee (🇩🇪 German)](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6).