From e13444fb2b2ae0a780f1a578bb1d156f4a5928da Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 13 Sep 2022 12:46:05 -0400
Subject: [PATCH] comments

---
 ..._cc95104ae0803db35fee504152db046a._comment | 38 +++++++++++++++++
 ..._f58cc7b04948a8c758f97778631b0f02._comment | 41 +++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_1_cc95104ae0803db35fee504152db046a._comment
 create mode 100644 doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_2_f58cc7b04948a8c758f97778631b0f02._comment

diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_1_cc95104ae0803db35fee504152db046a._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_1_cc95104ae0803db35fee504152db046a._comment
new file mode 100644
index 0000000000..966e9d220b
--- /dev/null
+++ b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_1_cc95104ae0803db35fee504152db046a._comment
@@ -0,0 +1,38 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2022-09-13T16:02:58Z"
+ content="""
+This extension to the protocol would only be useful when removing chunks,
+because otherwise git-annex doesn't have a way to build up a list of keys
+that are going to be removed, in a way that could usefully be sent to the
+external special remote together.
+
+For chunks, it has a list of keys. So this is feasible.
+
+I wonder if it's necessary to extend the protocol though. If an external
+special remote wants to, it can buffer a list of keys that it's been told
+to remove, and return REMOVE-SUCCESS to each request before actually
+doing the removal. It could then
+remove all the buffered keys in a single API call or whatever.
+
+The risk of course is that if the removal fails, or it's interrupted before
+it can do the removal, it will have incorrectly claimed to remove the keys.
+And git-annex will have recorded incorrect information and wrongly
+indicated the removal succeeded. This would not be a good idea for non-chunk
+keys (although `fsck --fast --from` the remote could recover from it).
+
+For a set of chunk keys that are all chunks of the same key, though,
+git-annex doesn't record anything until they've all been successfully
+removed. Also, it so happens that after asking for all the chunked keys to
+be removed, git-annex normally[1] then asks for the unchunked key to be
+removed too. So, a special remote could buffer chunked keys until it sees
+an unchunked key, and then remove them all efficiently, and reply to the
+removal of the unchunked key with the combined result of all the removals.
+
+[1] The exception is that, if the special remote is not currently
+configured to use chunking, git-annex happens to remove the unchunked key
+first, followed by all the chunked keys. I don't think there is a good
+reason for this in removal; it's a useful optimisation for retrieving
+content that happens to affect removal too.
+"""]]
diff --git a/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_2_f58cc7b04948a8c758f97778631b0f02._comment b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_2_f58cc7b04948a8c758f97778631b0f02._comment
new file mode 100644
index 0000000000..37ef514e6e
--- /dev/null
+++ b/doc/todo/Special_remotes__58___support_for_MULTIREMOVE/comment_2_f58cc7b04948a8c758f97778631b0f02._comment
@@ -0,0 +1,41 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2022-09-13T16:26:06Z"
+ content="""
+I don't know that the above comment is really a good idea for an external
+remote to try to implement. Needing to know about chunk keys is not too
+bad, but it also relies on details of git-annex's implementation.
+
+But it seems worth considering possibilities
+like that, since this extension would only be used in such a relative
+corner case.
+
+Or possibly considering ways to generalize the idea to be usable in more
+cases..
+
+Along those lines, it occurs to me that the async extension to the
+protocol is somewhat similar, since git-annex can ask the external remote
+to do several things at the same time. Removals of chunk keys are not
+currently run concurrently, but they could be. An external remote could
+then gather together some number of concurrent remove requests and perform
+them all in a single API call (or whatever).
+
+But how would the external remote know when it's seen all the remove
+requests for chunks of a key? It seems like it would need to use a
+heuristic, like no new requests in some amount of time means git-annex is
+waiting on it to remove everything it's been requested to remove.
+
+So it might be that a protocol extension would be useful, some way for
+git-annex to indicate that it is blocked waiting on current requests to
+finish. That seems more general purpose than a MULTIREMOVE extension.
+For example, git-annex could also send it when retrieving chunks.
+(Although retrieving chunks is also not currently done concurrently.)
+
+(There's also a question of how many concurrent removals of chunk
+keys it would make sense for git-annex to request at the same time. It
+could request removing all chunks concurrently but if the special
+remote needs to do much work or use resources for each request, that
+might not be good. It would probably be more natural to use something
+based on -J.)
+"""]]