From 0ed64f5bd08d1d7d3fcfa40dca46470ec29c7f16 Mon Sep 17 00:00:00 2001 From: nobodyinperson Date: Wed, 3 Apr 2024 11:09:08 +0000 Subject: [PATCH 1/5] =?UTF-8?q?Add=20link=20to=20English=20re-recording=20?= =?UTF-8?q?of=20Yann's=20git-annex=20workshop=20kickoff=20talk=20@T=C3=BCb?= =?UTF-8?q?ix2023?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- doc/users/nobodyinperson.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/users/nobodyinperson.mdwn b/doc/users/nobodyinperson.mdwn index 18b8cfb8f1..56a3d1ee05 100644 --- a/doc/users/nobodyinperson.mdwn +++ b/doc/users/nobodyinperson.mdwn @@ -10,4 +10,4 @@ I made a [Thunar plugin](https://gitlab.com/nobodyinperson/thunar-plugins) for g In an attempt to [#gitAnnexAllTheThings](https://fosstodon.org/tags/gitAnnexAllTheThings), I used git annex as a backend for a cli time tracker [annextimelog](https://pypi.org/project/annextimelog/). It has similarities with timewarrior but adresses many of its inconveniences. -At the [Tübix 2023](https://www.tuebix.org/) I gave a (German) git annex workshop, of which you can find a recording of the initial talk [📹 here in the fediverse](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) and [📹 here on Odysee](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6). +At the [Tübix 2023](https://www.tuebix.org/) I gave a git annex workshop, of which you can find a recording of the initial talk [📹 here (🇩🇪 German)](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) or [📹 here (English)](https://tube.tchncs.de/w/1U4vbTAhSEje3KQ1dGqvxh) in the fediverse and [📹 here on Odysee (🇩🇪 German)](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6). 
From dbd2f0e7bf3a545a685bb42798da65a26cc27f89 Mon Sep 17 00:00:00 2001 From: jasonc Date: Wed, 3 Apr 2024 16:48:04 +0000 Subject: [PATCH 2/5] Added a comment: Possible simplified scenario --- ..._cdedb5512dc9c27a88632e963a27a389._comment | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 doc/bugs/Fails_to_drop_key_on_windows___40__Access_denied__41__/comment_2_cdedb5512dc9c27a88632e963a27a389._comment diff --git a/doc/bugs/Fails_to_drop_key_on_windows___40__Access_denied__41__/comment_2_cdedb5512dc9c27a88632e963a27a389._comment b/doc/bugs/Fails_to_drop_key_on_windows___40__Access_denied__41__/comment_2_cdedb5512dc9c27a88632e963a27a389._comment new file mode 100644 index 0000000000..1f1090c353 --- /dev/null +++ b/doc/bugs/Fails_to_drop_key_on_windows___40__Access_denied__41__/comment_2_cdedb5512dc9c27a88632e963a27a389._comment @@ -0,0 +1,39 @@ +[[!comment format=mdwn + username="jasonc" + nickname="mail" + avatar="http://cdn.libravatar.org/avatar/cb07bdfbe978aa83388d64e08a972eb2" + subject="Possible simplified scenario" + date="2024-04-03T16:48:04Z" + content=""" +Hello, firstly thank you for developing a really useful piece of software. During my initial experimentation I came across what appears to be a variation of this bug, and think I've distilled it to a minimal reproducible scenario. + +Initialise in the usual way on an NTFS partition, then add a directory special remote (no `encryption`, no `importtree` and no `exporttree`): +
+    git init
+    git annex init local
+    git annex initremote nextdoor type=directory directory=N:\nextdoordir encryption=none
+
+ +In my case I then added and committed the files locally, then moved them to the directory special remote and back again: +
+    git annex add .
+    git commit --all --message=\"first commit\"
+    git annex move . --to nextdoor
+    git annex move . --from nextdoor
+
+ +This completes successfully, however repeating the last two steps a second time triggers the `permission denied (Access is denied.)` failure at the start of the bug report. + +Going through each part step by step: + +* Since NTFS is designated as a \"crippled filesystem\", the annexed objects appear to be read-write by default (no ACL modifications, no ReadOnly attribute). +* When the files are moved away to the directory special remote (in my test, the same NTFS partition), they pick up a ReadOnly attribute in the new location, so `Archive+Compression` becomes `ReadOnly+Archive+Compression`. +* When the files are then moved back from the directory special remote, the ReadOnly attribute persists. +* Repeating the movement then fails, as the file cannot be dropped locally (the UNC path exists, but `DeleteFile` fails). + +If I remove the ReadOnly attributes and try again, the move away is successful. Similarly if I use a networked ext4 location for the directory special remote (and NTFS locally), the same cycle of success then failure can be observed. + +Version information: git `git version 2.44.0.windows.1`, annex `git-annex version: 10.20240130-gad8e32c09d3ec866e0c0654cdcd146bf1aefbc5e` (installer from 2024-02-27), Windows 10 22H2 + +If you require logs or other information, please let me know. 
+"""]] From 554b73bb3908f3c714f034c89230d2e4ba238892 Mon Sep 17 00:00:00 2001 From: "m.risse@77eac2c22d673d5f10305c0bade738ad74055f92" Date: Sat, 6 Apr 2024 09:59:37 +0000 Subject: [PATCH 3/5] --- ...es_associated_URLs_with_custom_scheme.mdwn | 142 ++++++++++++++++++ 1 file changed, 142 insertions(+) create mode 100644 doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn diff --git a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn new file mode 100644 index 0000000000..7ebff66146 --- /dev/null +++ b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn @@ -0,0 +1,142 @@ +### Please describe the problem. + +With URLs that are handled by the web remote the URL will be kept through a migration, i.e. + +``` +git annex addurl --relaxed +git annex get +git annex migrate +``` + +will migrate the key of the file to be hash based, and keep the URL associated to that key. +If I do the same with a URL whose scheme is handled by a custom special remote (this one specifically is what I got the issue with: , it registers itself for the `cds:` scheme), the URL seems to be dropped from the key (i.e. whereis no longer shows it and git annex can no longer fetch it from the special remote). + +### What steps will reproduce the problem? + +The steps mentioned above for a URL whose scheme is handled by an (external?) special remote. Specifically, I saw it with datalad-cds, it might happen for other special remotes as well. + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20231129 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.32 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.15 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +Installed with nix from nixpkgs on an ubuntu system. + +### Please provide any additional information below. + +It works with the web remote: +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log +$ datalad create test-ds +create(ok): [...] (dataset) +$ cd test-ds/ +$ git annex addurl --relaxed "https://git-annex.branchable.com/" +addurl https://git-annex.branchable.com/ (to git-annex.branchable.com_) ok +(recording state in git...) +$ git annex whereis git-annex.branchable.com_ +whereis git-annex.branchable.com_ (1 copy) + 00000000-0000-0000-0000-000000000001 -- web + + web: https://git-annex.branchable.com/ +ok +$ ls -l +[...] 
git-annex.branchable.com_ -> '.git/annex/objects/pJ/v4/URL--https&c%%git-annex.branchable.com%/URL--https&c%%git-annex.branchable.com%' +$ git annex get git-annex.branchable.com_ +get git-annex.branchable.com_ (from web...) +ok +(recording state in git...) +$ git annex migrate +migrate git-annex.branchable.com_ (checksum...) ok +(recording state in git...) +$ ls -l +[...] git-annex.branchable.com_ -> .git/annex/objects/31/qv/MD5E-s12462--6b956d66f8352205df79936ada326ec3/MD5E-s12462--6b956d66f8352205df79936ada326ec3 +$ git annex whereis git-annex.branchable.com_ +whereis git-annex.branchable.com_ (2 copies) + 00000000-0000-0000-0000-000000000001 -- web + 60079e0e-42e4-492e-a7b1-dde764d069eb -- [here] + + web: https://git-annex.branchable.com/ +ok +# End of transcript or log. +"""]] + +It doesn't work with the custom special remote: +[[!format sh """ +$ datalad create test-ds +create(ok): [...] (dataset) +$ cd test-ds/ +$ datalad download-cds --lazy --nosave --path "2022-01-01.grib" "$(cat < '.git/annex/objects/Mx/4G/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a' +$ git annex whereis 2022-01-01.grib +whereis 2022-01-01.grib (1 copy) + 923e2755-e747-42f4-890a-9c921068fb82 -- [cds] + + cds: {"dataset":"reanalysis-era5-complete","sub-selection":{"class":"ea","date":"2022-01-01","expver":"1","levelist":"1","levtype":"ml","param":"130","stream":"oper","time":"00:00:00/06:00:00/12:00:00/18:00:00","type":"an","grid":".3/.3"}} + cds: cds:v1-eyJkYXRhc2V0IjoicmVhbmFseXNpcy1lcmE1LWNvbXBsZXRlIiwic3ViLXNlbGVjdGlvbiI6eyJjbGFzcyI6ImVhIiwiZGF0ZSI6IjIwMjItMDEtMDEiLCJleHB2ZXIiOiIxIiwibGV2ZWxpc3QiOiIxIiwibGV2dHlwZSI6Im1sIiwicGFyYW0iOiIxMzAiLCJzdHJlYW0iOiJvcGVyIiwidGltZSI6IjAwOjAwOjAwLzA2OjAwOjAwLzEyOjAwOjAwLzE4OjAwOjAwIiwidHlwZSI6ImFuIiwiZ3JpZCI6Ii4zLy4zIn19 +ok +$ git config --local remote.cds.annex-security-allow-unverified-downloads ACKTHPPT +$ git annex get 2022-01-01.grib +get 
2022-01-01.grib (from cds...) +2024-04-06 11:37:05,250 INFO Welcome to the CDS +2024-04-06 11:37:05,251 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-complete +2024-04-06 11:37:05,340 INFO Request is queued +2024-04-06 11:37:06,400 INFO Request is running +2024-04-06 11:37:26,399 INFO Request is completed +2024-04-06 11:37:26,399 INFO Downloading https://download-0017.copernicus-climate.eu/cache-compute-0017/cache/data9/adaptor.mars.external-1712396225.5545986-18258-18-822e5b91-cf60-4dbd-a808-a1253d4fe109.grib to .git/annex/tmp/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a (5.5M) + 0%| | 0.00/5.51M [00:00 .git/annex/objects/KJ/6K/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib +$ git annex whereis 2022-01-01.grib +whereis 2022-01-01.grib (1 copy) + 5dfef0c9-8e18-4ea2-9ee1-646830b5749b -- [here] +ok +$ git annex drop 2022-01-01.grib +drop 2022-01-01.grib (unsafe) + Could only verify the existence of 0 out of 1 necessary copy + + Rather than dropping this file, try using: git annex move + + (Use --force to override this check, or adjust numcopies.) +failed +drop: 1 failed +"""]] + +I know that this is sort of abusing the URL handling in git-annex, but it was super easy to implement. You recommended me to use SETSTATE/GETSTATE from the external special remote protocol instead already at some point, but I didn't get around to reworking it for that yet. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes! It is absolutely great, thank you for it. 
From 26549ec34301afba8de41af44336a6afb1b3f946 Mon Sep 17 00:00:00 2001 From: "m.risse@77eac2c22d673d5f10305c0bade738ad74055f92" Date: Sat, 6 Apr 2024 10:05:04 +0000 Subject: [PATCH 4/5] Added a comment --- .../comment_1_6e9a2b3ff2be4e2b3ff5b67ac913efb8._comment | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 doc/bugs/migrate_removes_associated_URLs_with_custom_scheme/comment_1_6e9a2b3ff2be4e2b3ff5b67ac913efb8._comment diff --git a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme/comment_1_6e9a2b3ff2be4e2b3ff5b67ac913efb8._comment b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme/comment_1_6e9a2b3ff2be4e2b3ff5b67ac913efb8._comment new file mode 100644 index 0000000000..5f493c6841 --- /dev/null +++ b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme/comment_1_6e9a2b3ff2be4e2b3ff5b67ac913efb8._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="m.risse@77eac2c22d673d5f10305c0bade738ad74055f92" + nickname="m.risse" + avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de" + subject="comment 1" + date="2024-04-06T10:05:04Z" + content=""" +Looking at this again I am also surprised that I didn't need to set the equivalent of `git config --local remote.cds.annex-security-allow-unverified-downloads ACKTHPPT` for the web special remote when get'ing a file with just a URL key. 
+"""]] From bae7076e43beb2eb0bea041911964645f55df123 Mon Sep 17 00:00:00 2001 From: nobodyinperson Date: Sat, 6 Apr 2024 15:12:50 +0000 Subject: [PATCH 5/5] Added a comment: more use cases for configurable default preferred content --- ...mment_2_575320e07f005d46d26cc6db9ad3ebd3._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/todo/Setting_default_preferred_content_expressions/comment_2_575320e07f005d46d26cc6db9ad3ebd3._comment diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_2_575320e07f005d46d26cc6db9ad3ebd3._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_2_575320e07f005d46d26cc6db9ad3ebd3._comment new file mode 100644 index 0000000000..9ebfd99b8e --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_2_575320e07f005d46d26cc6db9ad3ebd3._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="more use cases for configurable default preferred content" + date="2024-04-06T15:12:50Z" + content=""" +This came up again at the distribits meeting. + +DataLad itself is designed to work like `git annex wanted . present` (i.e. content is supposed to be fetched manually. It is assumed that the user does not generally want all content of a DataLad dataset / git annex repo). DataLad could itself run `git annex wanted . present` as part of its setup (talked about that with @mih), but I still think a setting in the git-annex branch that auto-sets the above settings in fresh clones (even when using plain git annex, not DataLad), is useful. It enhances the user experience of sparse checkouts (a `git annex assist` in a freshly cloned annex repo can then be configured to only pull specific or no files). + +I also discussed it with people in the context of handling confidential patient data that should not necessarily be copied everywhere. 
The default of just wanting all worktree content increases the delicacy of the matter a bit. Were there a way to have fresh clones (or even freshly created remotes that were not yet given a preferred content manually) have a preconfigured default wanted content, it would reduce the possibility of confidential data accidentally being copied all over the place. +"""]]