From ca687413efdd35d4a253977f24fddb2a8095fc6d Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Wed, 5 Jun 2024 16:53:51 +0000 Subject: [PATCH 01/11] Added a comment --- ...comment_1_23260a8010a9dc707783408ac1663b00._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/todo/wherewas/comment_1_23260a8010a9dc707783408ac1663b00._comment diff --git a/doc/todo/wherewas/comment_1_23260a8010a9dc707783408ac1663b00._comment b/doc/todo/wherewas/comment_1_23260a8010a9dc707783408ac1663b00._comment new file mode 100644 index 0000000000..7c91260bad --- /dev/null +++ b/doc/todo/wherewas/comment_1_23260a8010a9dc707783408ac1663b00._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" + nickname="ruslan" + avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" + subject="comment 1" + date="2024-06-05T16:53:50Z" + content=""" +Yes, limiting it to a single file would be sufficient for the use case I encountered, and keep it simple from the usage / user interface stand point IMHO +Would look forward to this! +"""]] From 6b4ae7b635b1dd6967d5ba63542f24fae1123130 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Wed, 5 Jun 2024 17:22:04 +0000 Subject: [PATCH 02/11] --- ..._unannex_-_some_files_still_symlinked.mdwn | 35 +++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 doc/bugs/git_annex_unannex_-_some_files_still_symlinked.mdwn diff --git a/doc/bugs/git_annex_unannex_-_some_files_still_symlinked.mdwn b/doc/bugs/git_annex_unannex_-_some_files_still_symlinked.mdwn new file mode 100644 index 0000000000..14c1a4cf89 --- /dev/null +++ b/doc/bugs/git_annex_unannex_-_some_files_still_symlinked.mdwn @@ -0,0 +1,35 @@ +### Please describe the problem. + +1. Some files remain symlinked after aborted `git annex add` and completed `git annex unannex` +2. 
These files are present in `.git/annex/objects` but `git annex unused` does not find them. Running `git annex whereused --key=SHA256E...` returns nothing. + +To restore the files and remove them from the git-annex objects folder, manual workarounds or hacks are needed, like adding the file again with `git annex add` and then trying to remove it again + +### What steps will reproduce the problem? + +1. run `git annex add` and abort the operation mid-way (this was on a directory with a large number of files, ~3K, and running with the 12 jobs command switch) +2. run `git annex unannex` until done +3. find that some of the files that were added were restored, and some are still symlinked but are not tracked by git annex + + +### What version of git-annex are you using? On what operating system? + +Debian Bookworm / git-annex version: 10.20240227-1 + +### Please provide any additional information below. + +Similar report from another user here: +https://git-annex.branchable.com/forum/File_still_symlinked_after_git_annex_unannex/ + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before?
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +Yes, using it extensively for a few years with terabytes of data From 93b11da4db471af3c781ca1a3f6016db259d39b7 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Wed, 5 Jun 2024 17:34:32 +0000 Subject: [PATCH 03/11] Added a comment --- ..._a6c2c4e87743da11dcc2ed718a350bb4._comment | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 doc/bugs/git_annex_unannex_-_some_files_still_symlinked/comment_1_a6c2c4e87743da11dcc2ed718a350bb4._comment diff --git a/doc/bugs/git_annex_unannex_-_some_files_still_symlinked/comment_1_a6c2c4e87743da11dcc2ed718a350bb4._comment b/doc/bugs/git_annex_unannex_-_some_files_still_symlinked/comment_1_a6c2c4e87743da11dcc2ed718a350bb4._comment new file mode 100644 index 0000000000..c507684212 --- /dev/null +++ b/doc/bugs/git_annex_unannex_-_some_files_still_symlinked/comment_1_a6c2c4e87743da11dcc2ed718a350bb4._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" + nickname="ruslan" + avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" + subject="comment 1" + date="2024-06-05T17:34:32Z" + content=""" +Solution with running `git annex add` is also described at the link below: + +https://git-annex.branchable.com/forum/git_annex_add_crash_and_subsequent_recovery/#comment-4f5af644597a055624009c5bbb9aca3f + +--- + +So need to find files that are symlinks to git annex object folder and run `git annex add` / `git annex unused` - I can handle that with a script, though would be nice to have a built-in method + +--- + +Additional notes: + +1. There should be a way to find files that were added to git annex folder but are not tracked by git annex. Is this something that can be done with existing commands? +2. It's desirable to have a way to abort `git annex add` gracefully on long-running jobs. 
Is there a way to do it now? It looks like Ctrl-C resulted in a broken state. Would Ctrl-Z work better? +"""]] From 7dbfb16415ccf2c9a3aaa3e4e4740bd5e47c4eb2 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Wed, 5 Jun 2024 17:45:49 +0000 Subject: [PATCH 04/11] --- .../How_to_add_git_annex_metadata_to_directories__63__.mdwn | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn new file mode 100644 index 0000000000..f5920e2b55 --- /dev/null +++ b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn @@ -0,0 +1,3 @@ +As I understand - there is currently now way to track metadata for directories with `git annex metadata`, and it only works for files. Is that indeed the case? + +One workaround I'm looking at is to add a metadata placeholder file for directory metadata inside the directory. As I understand - each directory would need to have such file with some unique content (perhaps UUID), otherwise metadata between files for different directories will actually collide. Are there alternatives/better solutions for tracking datasets metadata (groups of files in a folder)?
From 6985c62a47b9c1ba673ba2a764118f92215611b9 Mon Sep 17 00:00:00 2001 From: nobodyinperson Date: Thu, 6 Jun 2024 09:09:03 +0000 Subject: [PATCH 05/11] Added a comment --- .../comment_1_3eee0688143bbd0696cde16c7fca8d06._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_1_3eee0688143bbd0696cde16c7fca8d06._comment diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_1_3eee0688143bbd0696cde16c7fca8d06._comment b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_1_3eee0688143bbd0696cde16c7fca8d06._comment new file mode 100644 index 0000000000..fd9ecec870 --- /dev/null +++ b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_1_3eee0688143bbd0696cde16c7fca8d06._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 1" + date="2024-06-06T09:09:03Z" + content=""" +You are absolutely right. You might be interested in [DataLad](https://datalad.org), which provides a lot of convenience around git-annex, has the concept of datasets (git submodules) and also an extended approach to metadata. 
+"""]] From a1e1af35af5a4ac2b4eed8f92468f5e2e74d711c Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Thu, 6 Jun 2024 10:29:21 +0000 Subject: [PATCH 06/11] --- .../How_to_add_git_annex_metadata_to_directories__63__.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn index f5920e2b55..e67ec99e77 100644 --- a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn +++ b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__.mdwn @@ -1,3 +1,3 @@ -As I understand - there is currently now way to track metadata for directories with `git annex metadata`, and it only works for files. Is that indeed the case? +As I understand - there is currently no way to track metadata for directories with `git annex metadata` (it only works for files). Is that indeed the case? One workaround I'm looking at is to add a metadata placeholder file for directory metadata inside the directory. As I understand - each directory would need to have such file with some unique content (perhaps UUID), otherwise metadata between files for different directories will actually collide. Are there alternatives/better solutions for tracking datasets metadata (groups of files in a folder)? 
From d4993248eb808430e6017db46ba52a19ba714731 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Thu, 6 Jun 2024 11:23:34 +0000 Subject: [PATCH 07/11] Added a comment --- ...nt_2_efe39f86b7ab71a64cd6ce4770f39d42._comment | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_2_efe39f86b7ab71a64cd6ce4770f39d42._comment diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_2_efe39f86b7ab71a64cd6ce4770f39d42._comment b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_2_efe39f86b7ab71a64cd6ce4770f39d42._comment new file mode 100644 index 0000000000..e6a32f5a0a --- /dev/null +++ b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_2_efe39f86b7ab71a64cd6ce4770f39d42._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" + nickname="ruslan" + avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" + subject="comment 2" + date="2024-06-06T11:23:34Z" + content=""" +Thank you for the heads up! + +I've actually looked into DataLad, and have been using git annex with submodules. + +The problem I found with submodules is that they require a lot of additional steps for adding/moving/deleting/syncing them. It is a very manual process, with a lot of complexity and some rough edge cases. They also interfere with some git-annex functionality like metadata-driven views, I believe. So I'm using submodules very sparingly, only when I really need them. + +As for DataLad - it looks like a mature and well-supported project, and I would love to see more feedback/reviews on it.
+"""]] From 6274d16102d3829d0a8f306ae017f7f7ce5a2757 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Thu, 6 Jun 2024 11:23:55 +0000 Subject: [PATCH 08/11] Added a comment --- ...nt_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment new file mode 100644 index 0000000000..74c78fe2bf --- /dev/null +++ b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" + nickname="ruslan" + avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" + subject="comment 3" + date="2024-06-06T11:23:55Z" + content=""" +Thank you for the heads up! + +I've actually looked in to DataLad, and have been using git annex with submodules. + +Problem I found with submodules is that they required a lot of additional steps as far as adding/moving/deleting/syncing them. A very manual process, with a lot of complexity and some rough edge cases. They also interfere with some of Git-Annex functionality like metadata driven views I believe. So I'm using submodules very sparingly, only when I really need them. + +As far as DataLad - it looks like a mature and well supported project, would love to see more feedback/reviews on it. 
+"""]] From 1e6b4f324abcb142108a03511be2648720e07248 Mon Sep 17 00:00:00 2001 From: "ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" Date: Thu, 6 Jun 2024 13:40:26 +0000 Subject: [PATCH 09/11] removed --- ...nt_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment | 15 --------------- 1 file changed, 15 deletions(-) delete mode 100644 doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment diff --git a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment b/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment deleted file mode 100644 index 74c78fe2bf..0000000000 --- a/doc/forum/How_to_add_git_annex_metadata_to_directories__63__/comment_3_1a6f7ef00e8cdabf8b52dfd01a1f6148._comment +++ /dev/null @@ -1,15 +0,0 @@ -[[!comment format=mdwn - username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" - nickname="ruslan" - avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" - subject="comment 3" - date="2024-06-06T11:23:55Z" - content=""" -Thank you for the heads up! - -I've actually looked in to DataLad, and have been using git annex with submodules. - -Problem I found with submodules is that they required a lot of additional steps as far as adding/moving/deleting/syncing them. A very manual process, with a lot of complexity and some rough edge cases. They also interfere with some of Git-Annex functionality like metadata driven views I believe. So I'm using submodules very sparingly, only when I really need them. - -As far as DataLad - it looks like a mature and well supported project, would love to see more feedback/reviews on it. 
-"""]] From d59383beafc72609b8d4c61767793571b8f8b8fa Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 6 Jun 2024 17:23:51 -0400 Subject: [PATCH 10/11] update --- doc/todo/git-annex_proxies.mdwn | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn index ddac3b9cad..84b012368f 100644 --- a/doc/todo/git-annex_proxies.mdwn +++ b/doc/todo/git-annex_proxies.mdwn @@ -34,21 +34,9 @@ For June's work on [[design/passthrough_proxy]], implementation plan: 1. Add `git-annex updateproxy` command and remote.name.annex-proxy configuration. (done) -1. getProxies should be cached to avoid repeatedly reading the log and - parsing. +2. Test implementation of remote instantiation for proxies. -1. Remote names coming from the git-annex branch need to be - limited to what's legal in git remote names. If a remote name is not - legal, munge it until it is. - This will also prevent remote names being a security hazard - via eg escape characters. - -2. Remote instantiation for proxies. When a remote "foo" is a proxy, - and has a remote "bar", instantiate a remote "foo-bar" that has the UUID - of bar but is of the same type and configuration of remote "foo". - -3. Implement proxying in git-annex-shell so connections with the UUID - of one of the proxy's +3. Implement proxying in git-annex-shell. 4. Let `storeKey` return a list of UUIDs where content was stored, and make proxies accept uploads directed at them, rather than a specific @@ -73,4 +61,4 @@ For June's work on [[design/passthrough_proxy]], implementation plan: 11. indirect uploads (to be considered). See design. - +12. Support using a proxy when its url is a P2P address. 
From 058726ee866c5c60d628f6a5ee5d635142ae9ffa Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 6 Jun 2024 18:06:45 -0400 Subject: [PATCH 11/11] next step identified --- doc/todo/git-annex_proxies.mdwn | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn index 84b012368f..90dc9c614d 100644 --- a/doc/todo/git-annex_proxies.mdwn +++ b/doc/todo/git-annex_proxies.mdwn @@ -34,7 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan: 1. Add `git-annex updateproxy` command and remote.name.annex-proxy configuration. (done) -2. Test implementation of remote instantiation for proxies. +2. Remote instantiation for proxies almost works, but fails at: + "git-annex: cannot determine uuid for origin-foo" + + getRepoUUID does not look at the Repo's UUID setting, but reads it + from git-config. It's not set there for a proxied remote. + + So: Add annex-uuid parsing to RemoteConfig. 3. Implement proxying in git-annex-shell.