Joey Hess 2022-09-13 12:46:05 -04:00
parent 733a74a7e8
commit e13444fb2b
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 79 additions and 0 deletions

@@ -0,0 +1,38 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-09-13T16:02:58Z"
content="""
This extension to the protocol would only be useful when removing chunks,
because otherwise git-annex doesn't have a way to build up a list of keys
that are going to be removed, in a way that could usefully be sent to the
external special remote together.

For chunks, it has a list of keys. So this is feasible.

I wonder if it's necessary to extend the protocol though. If an external
special remote wants to, it can buffer a list of keys that it's been told
to remove, and return REMOVE-SUCCESS to each request before actually
doing the removal. It could then remove all the buffered keys in a single
API call or whatever.

The risk, of course, is that if the removal fails, or is interrupted before
it happens, the remote will have incorrectly claimed to remove the keys,
and git-annex will have recorded that the removal succeeded when it did
not. This would not be a good idea for non-chunk keys (although
`fsck --fast --from` the remote could recover from it).

For a set of chunk keys that are all chunks of the same key, though,
git-annex doesn't record anything until they've all been successfully
removed. Also, it so happens that after asking for all the chunk keys to
be removed, git-annex normally[1] then asks for the unchunked key to be
removed too. So, a special remote could buffer chunk keys until it sees
an unchunked key, and then remove them all efficiently, and reply to the
removal of the unchunked key with the combined result of all the removals.

[1] The exception is that, if the special remote is not currently
configured to use chunking, git-annex happens to remove the unchunked key
first, followed by all the chunk keys. I don't think there is a good
reason for this in removal; it's a useful optimisation for retrieving
content that happens to affect removal too.
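
A minimal sketch of that buffering idea, in Python. It is not a complete
external special remote. It assumes chunk keys can be recognised by the
`-S<chunksize>-C<chunknum>` fields before the `--` in the key, and
`BufferedRemover` and `bulk_delete` are hypothetical names standing in for
whatever the remote's backend provides.

```python
import re
from typing import Callable, List

# Assumption: a chunk key carries -S<chunksize>-C<chunknum> fields before "--",
# e.g. SHA256E-s10-S5-C1--abc. Check this against the keys your remote sees.
CHUNK_KEY = re.compile(r"^[^-]+(?:-[A-Za-z]\d+)*-S\d+-C\d+--")


def is_chunk_key(key: str) -> bool:
    return CHUNK_KEY.match(key) is not None


class BufferedRemover:
    # bulk_delete is a hypothetical backend call: it takes a list of keys and
    # returns True only if every one of them was removed.
    def __init__(self, bulk_delete: Callable[[List[str]], bool]) -> None:
        self.bulk_delete = bulk_delete
        self.pending: List[str] = []

    def remove(self, key: str) -> str:
        # Returns the protocol reply line for one REMOVE request.
        if is_chunk_key(key):
            # Optimistic reply; the real outcome is reported later, on the
            # unchunked key.
            self.pending.append(key)
            return f"REMOVE-SUCCESS {key}"
        # Unchunked key: remove it together with every buffered chunk key.
        batch = self.pending + [key]
        self.pending = []
        try:
            ok = self.bulk_delete(batch)
        except Exception:
            ok = False
        if ok:
            return f"REMOVE-SUCCESS {key}"
        return f"REMOVE-FAILURE {key} bulk removal failed"
```

This only covers the normal ordering. In the case described in [1], the
chunk keys arrive after the unchunked key and would be left sitting in
`pending`, so a real remote would need some other point at which to flush
them.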
"""]]

@@ -0,0 +1,41 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-09-13T16:26:06Z"
content="""
I don't know that the approach in the above comment is really a good idea
for an external remote to try to implement. Needing to know about chunk
keys is not too bad, but it also relies on details of git-annex's
implementation.

But it seems worth considering possibilities like that, since this
extension would only be used in such a relative corner case.
Or possibly considering ways to generalize the idea to be usable in more
cases.

Along those lines, it occurs to me that the async extension to the
protocol is somewhat similar, since git-annex can ask the external remote
to do several things at the same time. Removals of chunk keys are not
currently run concurrently, but they could be. An external remote could
then gather together some number of concurrent remove requests and perform
them all in a single API call (or whatever).

But how would the external remote know when it's seen all the remove
requests for chunks of a key? It seems like it would need to use a
heuristic, like no new requests in some amount of time means git-annex is
waiting on it to remove everything it's been requested to remove.
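
Such a heuristic could look something like the following. This is only a
sketch: `IdleBatcher` and its method names are made up for illustration,
the clock is injectable so the timing can be exercised without sleeping,
and how the replies for each request get sent back is left out entirely.

```python
import time
from typing import Callable, List


class IdleBatcher:
    # Collects keys from remove requests and hands them back as one batch
    # once no new request has arrived for idle_seconds.
    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.pending: List[str] = []
        self.last_request = clock()

    def add(self, key: str) -> None:
        # Called for each incoming remove request.
        self.pending.append(key)
        self.last_request = self.clock()

    def take_if_idle(self) -> List[str]:
        # Returns the whole batch if the quiet period has elapsed,
        # otherwise an empty list.
        quiet_for = self.clock() - self.last_request
        if self.pending and quiet_for >= self.idle_seconds:
            batch = self.pending
            self.pending = []
            return batch
        return []
```

The remote would call `take_if_idle` periodically and pass any batch it
gets to its bulk removal call. Picking `idle_seconds` is the weak point:
too short and batches get split, too long and every removal is delayed.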

So it might be that a protocol extension would be useful, some way for
git-annex to indicate that it is blocked waiting on current requests to
finish. That seems more general purpose than a MULTIREMOVE extension.
For example, git-annex could also send it when retrieving chunks.
(Although retrieving chunks is also not currently done concurrently.)

(There's also a question of how many concurrent removals of chunk
keys it would make sense for git-annex to request at the same time. It
could request removing all chunks concurrently, but if the special
remote needs to do much work or use resources for each request, that
might not be good. It would probably be more natural to use something
based on `-J`.)
"""]]