From e0e16187a87c2dc42b54b655a5aa19bbdcecf9c3 Mon Sep 17 00:00:00 2001
From: prancewit
Date: Tue, 13 Sep 2022 19:45:23 +0000
Subject: [PATCH 1/8] Added a comment: My current use case

---
 ...t_3_4f18712f974f85c3cef810714d304d85._comment | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_3_4f18712f974f85c3cef810714d304d85._comment

diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_3_4f18712f974f85c3cef810714d304d85._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_3_4f18712f974f85c3cef810714d304d85._comment
new file mode 100644
index 0000000000..1edb09b9d0
--- /dev/null
+++ b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_3_4f18712f974f85c3cef810714d304d85._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="prancewit"
+ avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c"
+ subject="My current use case"
+ date="2022-09-13T19:45:23Z"
+ content="""
+Thanks for the response, Joey.
+
+Let me start by providing more details about the use case where I noticed this slowness.
+
+I was using a slow remote with a lot of chunks. I stopped the upload and wanted to do a cleanup of the uploaded chunks. That's when I noticed that git-annex was requesting a removal of each chunk individually, even ones that never actually got uploaded.
+
+In this particular case, I could \"preload\" the data to make it faster, since I knew which chunks were valid and which ones weren't (though I actually just waited it out).
+
+Also, like you mentioned, this MULTIREMOVE is most useful for this specific case, so a more generic solution will definitely be much better.
+"""]]

From 97ce72210b0f89c4f617ef0709062fb408f1e440 Mon Sep 17 00:00:00 2001
From: pat
Date: Tue, 13 Sep 2022 21:14:03 +0000
Subject: [PATCH 2/8] Added a comment

---
 ...comment_3_d9bd5f9ef8c8a8cc08cc39396f0a6f67._comment | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 doc/forum/Ensure_all_versions_are_on_remotes/comment_3_d9bd5f9ef8c8a8cc08cc39396f0a6f67._comment

diff --git a/doc/forum/Ensure_all_versions_are_on_remotes/comment_3_d9bd5f9ef8c8a8cc08cc39396f0a6f67._comment b/doc/forum/Ensure_all_versions_are_on_remotes/comment_3_d9bd5f9ef8c8a8cc08cc39396f0a6f67._comment
new file mode 100644
index 0000000000..34794bd433
--- /dev/null
+++ b/doc/forum/Ensure_all_versions_are_on_remotes/comment_3_d9bd5f9ef8c8a8cc08cc39396f0a6f67._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="pat"
+ avatar="http://cdn.libravatar.org/avatar/6b552550673a6a6df3b33364076f8ea8"
+ subject="comment 3"
+ date="2022-09-13T21:14:02Z"
+ content="""
+hrm… as you can see in my post, I AM using “anything” as the wanted content. So I would expect all of the remotes (wasabi and machines) to get all of the file versions. But that’s not happening. It’s behaving more like “used” would.
+
+I will try “anything or unused” despite the fact that it seems like “or unused” should be unnecessary.
+"""]] From ca1d6b0c5065abfc8ecc4798bdc609d4a38aa06c Mon Sep 17 00:00:00 2001 From: pat Date: Tue, 13 Sep 2022 21:15:33 +0000 Subject: [PATCH 3/8] --- doc/forum/Ensure_all_versions_are_on_remotes.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/forum/Ensure_all_versions_are_on_remotes.mdwn b/doc/forum/Ensure_all_versions_are_on_remotes.mdwn index 662f43a2f1..a4656f27d8 100644 --- a/doc/forum/Ensure_all_versions_are_on_remotes.mdwn +++ b/doc/forum/Ensure_all_versions_are_on_remotes.mdwn @@ -22,7 +22,7 @@ git annex required wasabi-west groupwanted git annex group machine1 active git annex group machine2 active -git annex groupwanted anything +git annex groupwanted active anything # from machine1 git annex sync -a origin machine2 wasabi-east wasabi-west From 518105f89c4af3e30f132e7e059adf5f8579c197 Mon Sep 17 00:00:00 2001 From: prancewit Date: Tue, 13 Sep 2022 21:32:45 +0000 Subject: [PATCH 4/8] Added a comment --- ...4_217b6295850d28cc797675ddc4904244._comment | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment new file mode 100644 index 0000000000..ae97f3eb09 --- /dev/null +++ b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="prancewit" + avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c" + subject="comment 4" + date="2022-09-13T21:32:45Z" + content=""" +At a high level, I see 2 possible ways in which special remotes can optimize for larger queries. +1) Pre-fetch or cache state of existing keys (Mostly useful only for no-op requests. For instance, pre-fetch the list of keys in the remote enabling no-op REMOVEs, but hard to tell if there's been a separate change since the fetch) +2) Batch multiple requests in a single call. (Batching can be done before or after sending the SUCCESS response to git-annex with corresponding results) + +> So it might be that a protocol extension would be useful, some way for git-annex to indicate that it is blocked waiting on current requests to finish. + +I can think of a few related ways to do this: + +1) Have the remote send ACKs to notify that it's ready for the next request, and send SUCCESS when the request is actually completed. The remote can then have the flexibility to run them in whatever batch/async manner suitable. In the case of chunk-keys, git-annex could rapidly send successive keys in sequence since there's no additional lookup required making it pretty efficient. +2) Have git-annex send some kind of group identifier (all chunks of same key might be grouped together) or delimiter(eg: GROUP_COMPLETED). This acts as a hint that these requests could be batched together without any obligation from the remote to do so. Coupled with a guarantee that all items in one group will be sent sequentially, the first item that belongs to a different group provides a guarantee that the previous group is completed. In this case, the SUCCESS for the last item could be taken to mean that the entire group is completed. One issue here is that this could leak some information in encrypted repositories. +3) Define a CACHE_TIMEOUT_SECONDS. 
+3) Define a CACHE_TIMEOUT_SECONDS. This could be used by the remote to decide if any pre-fetched or cached data can be trusted or if they should be re-checked. Git-annex would use this during merge/sync with other remotes to determine if there's a conflict that needs to be handled differently. (Seems too complicated TBH, but trying to see how we can make pre-fetch/caching work)
+"""]]

From 9f5f960548479ba4ef303772dbd82703d5a1f7ea Mon Sep 17 00:00:00 2001
From: prancewit
Date: Tue, 13 Sep 2022 21:34:39 +0000
Subject: [PATCH 5/8] removed

---
 ...4_217b6295850d28cc797675ddc4904244._comment | 18 ------------------
 1 file changed, 18 deletions(-)
 delete mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment

diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment
deleted file mode 100644
index ae97f3eb09..0000000000
--- a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_217b6295850d28cc797675ddc4904244._comment
+++ /dev/null
@@ -1,18 +0,0 @@
-[[!comment format=mdwn
- username="prancewit"
- avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c"
- subject="comment 4"
- date="2022-09-13T21:32:45Z"
- content="""
-At a high level, I see 2 possible ways in which special remotes can optimize for larger queries.
-1) Pre-fetch or cache state of existing keys (Mostly useful only for no-op requests. For instance, pre-fetch the list of keys in the remote enabling no-op REMOVEs, but hard to tell if there's been a separate change since the fetch)
-2) Batch multiple requests in a single call. (Batching can be done before or after sending the SUCCESS response to git-annex with corresponding results)
-
-> So it might be that a protocol extension would be useful, some way for git-annex to indicate that it is blocked waiting on current requests to finish.
-
-I can think of a few related ways to do this:
-
-1) Have the remote send ACKs to notify that it's ready for the next request, and send SUCCESS when the request is actually completed. The remote can then have the flexibility to run them in whatever batch/async manner suitable. In the case of chunk-keys, git-annex could rapidly send successive keys in sequence since there's no additional lookup required making it pretty efficient.
-2) Have git-annex send some kind of group identifier (all chunks of same key might be grouped together) or delimiter(eg: GROUP_COMPLETED). This acts as a hint that these requests could be batched together without any obligation from the remote to do so. Coupled with a guarantee that all items in one group will be sent sequentially, the first item that belongs to a different group provides a guarantee that the previous group is completed. In this case, the SUCCESS for the last item could be taken to mean that the entire group is completed. One issue here is that this could leak some information in encrypted repositories.
-3) Define a CACHE_TIMEOUT_SECONDS. This could be used by the remote to decide if any pre-fetched or cached data can be trusted or if they should be re-checked. Git-annex would use this during merge/sync with other remotes to determine if there's a conflict that needs to be handled differently. (Seems too complicated TBH, but trying to see how we can make pre-fetch/caching work)
-"""]]

From bef6eb5d022784dff7d6410f0f19c81871f06e78 Mon Sep 17 00:00:00 2001
From: prancewit
Date: Tue, 13 Sep 2022 21:36:38 +0000
Subject: [PATCH 6/8] Added a comment

---
 ..._2dbf3558c7c969281f1cc5e1738d2c0b._comment | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_2dbf3558c7c969281f1cc5e1738d2c0b._comment

diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_2dbf3558c7c969281f1cc5e1738d2c0b._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_2dbf3558c7c969281f1cc5e1738d2c0b._comment
new file mode 100644
index 0000000000..3e838fd098
--- /dev/null
+++ b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_4_2dbf3558c7c969281f1cc5e1738d2c0b._comment
@@ -0,0 +1,21 @@
+[[!comment format=mdwn
+ username="prancewit"
+ avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c"
+ subject="comment 4"
+ date="2022-09-13T21:36:38Z"
+ content="""
+At a high level, I see 2 possible ways in which special remotes can optimize for larger queries.
+
+* Pre-fetch or cache state of existing keys (Mostly useful only for no-op requests. For instance, pre-fetch the list of keys in the remote, enabling no-op REMOVEs, but it's hard to tell if there's been a separate change since the fetch)
+* Batch multiple requests in a single call. (Batching can be done before or after sending the SUCCESS response to git-annex with corresponding results)
+
+
+> So it might be that a protocol extension would be useful, some way for git-annex to indicate that it is blocked waiting on current requests to finish.
+
+I can think of a few related ways to do this:
+
+* Have the remote send ACKs to notify that it's ready for the next request, and send SUCCESS when the request is actually completed. The remote can then have the flexibility to run them in whatever batch/async manner is suitable. In the case of chunk-keys, git-annex could rapidly send successive keys in sequence since there's no additional lookup required, making it pretty efficient.
+* Have git-annex send some kind of group identifier (all chunks of the same key might be grouped together) or delimiter (e.g. GROUP_COMPLETED). This acts as a hint that these requests could be batched together without any obligation from the remote to do so. Coupled with a guarantee that all items in one group will be sent sequentially, the first item that belongs to a different group provides a guarantee that the previous group is completed. In this case, the SUCCESS for the last item could be taken to mean that the entire group is completed. One issue here is that this could leak some information in encrypted repositories.
+* Define a CACHE_TIMEOUT_SECONDS. This could be used by the remote to decide if any pre-fetched or cached data can be trusted or if they should be re-checked. Git-annex would use this during merge/sync with other remotes to determine if there's a conflict that needs to be handled differently. (Seems too complicated TBH, but trying to see how we can make pre-fetch/caching work)
+
+"""]]

From d21bac4d887d5a8a7138fe03359fedb1859356e2 Mon Sep 17 00:00:00 2001
From: "nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
Date: Tue, 13 Sep 2022 23:32:01 +0000
Subject: [PATCH 7/8] Added a comment

---
 ...omment_6_e13932ead5920c7599031e85127317bb._comment | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 doc/bugs/http_remotes_ignore_annex.web-options_--netrc/comment_6_e13932ead5920c7599031e85127317bb._comment

diff --git a/doc/bugs/http_remotes_ignore_annex.web-options_--netrc/comment_6_e13932ead5920c7599031e85127317bb._comment b/doc/bugs/http_remotes_ignore_annex.web-options_--netrc/comment_6_e13932ead5920c7599031e85127317bb._comment
new file mode 100644
index 0000000000..14182c2689
--- /dev/null
+++ b/doc/bugs/http_remotes_ignore_annex.web-options_--netrc/comment_6_e13932ead5920c7599031e85127317bb._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
+ nickname="nick.guenther"
+ avatar="http://cdn.libravatar.org/avatar/9e85c6ca61c3f877fef4f91c2bf6e278"
+ subject="comment 6"
+ date="2022-09-13T23:32:01Z"
+ content="""
+That's awesome! Thanks very much, joey.
+
+I'll mark this done now :)
+"""]]

From 50cdf369b99cb048b9f2ce9535fc505513c1894e Mon Sep 17 00:00:00 2001
From: "nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
Date: Tue, 13 Sep 2022 23:32:49 +0000
Subject: [PATCH 8/8] closing

---
 doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn b/doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn
index 1f198c25d8..389a95bd94 100644
--- a/doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn
+++ b/doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn
@@ -290,3 +290,5 @@ UBUNTU_CODENAME=jammy
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 Sure! Lots! We use it to share a large open access dataset at https://github.com/spine-generic, and [I'm working on](https://github.com/neuropoly/gitea/pull/1) helping other researchers share their datasets on their own infrastructure using git-annex + gitea.
+
+[[done]]