comments

parent 733a74a7e8
commit e13444fb2b
2 changed files with 79 additions and 0 deletions

[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-09-13T16:02:58Z"
content="""
This extension to the protocol would only be useful when removing chunks,
because otherwise git-annex has no way to build up a list of the keys that
are going to be removed that could usefully be sent to the external
special remote all at once.

For chunks, git-annex does have such a list of keys, so this is feasible.

I wonder if it's necessary to extend the protocol, though. If an external
special remote wants to, it can buffer a list of keys that it's been told
to remove, and return REMOVE-SUCCESS to each request before actually doing
the removal. It could then remove all the buffered keys in a single API
call or whatever.

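A minimal Python sketch of that buffering approach, for illustration only.
It handles just `REMOVE` lines. `BufferingRemover`, `flush` and the
`batch_delete` callback are hypothetical names: `batch_delete` stands in
for whatever bulk removal the storage offers. Deciding when to call
`flush` is left open.

```python
# Hypothetical sketch: reply REMOVE-SUCCESS at once, remove later in bulk.
from typing import Callable, List


class BufferingRemover:
    def __init__(self, batch_delete: Callable[[List[str]], bool]):
        # batch_delete is a hypothetical bulk-delete call on the storage.
        self.batch_delete = batch_delete
        self.pending: List[str] = []

    def handle(self, line: str) -> List[str]:
        # Handle one protocol line from git-annex and return the replies.
        cmd, _, key = line.partition(" ")
        if cmd != "REMOVE":
            return []  # other requests are out of scope for this sketch
        self.pending.append(key)
        # Success is claimed before anything is removed; if flush() later
        # fails, or never runs, git-annex has recorded a wrong result.
        return ["REMOVE-SUCCESS " + key]

    def flush(self) -> bool:
        # Remove every buffered key with a single call.
        if not self.pending:
            return True
        keys, self.pending = self.pending, []
        return self.batch_delete(keys)
```
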
The risk, of course, is that if the removal fails, or the special remote is
interrupted before it can do the removal, it will have incorrectly claimed
to have removed the keys. git-annex will then have recorded incorrect
information and wrongly reported that the removal succeeded. This would
not be a good idea for non-chunk keys (although running
`fsck --fast --from` the remote could recover from it).

For a set of chunk keys that are all chunks of the same key, though,
git-annex doesn't record anything until they've all been successfully
removed. Also, it so happens that after asking for all the chunk keys to
be removed, git-annex normally[1] then asks for the unchunked key to be
removed too. So a special remote could reply REMOVE-SUCCESS to each chunk
key while buffering it, then, when it sees the unchunked key, remove them
all efficiently and reply to the removal of the unchunked key with the
combined result of all the removals.

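A rough Python sketch of that, for illustration. It assumes git-annex's key
format, in which a chunk key has extra `-S<chunksize>` and `-C<chunknumber>`
fields before the `--` that precedes the key name. Under that assumption,
dropping those fields gives the unchunked key. `ChunkBatchingRemover`, the
helper functions and `batch_delete` are hypothetical names. The sketch does
not handle the reversed ordering described in the footnote.

```python
# Hypothetical sketch: hold chunk-key removals until the unchunked key
# arrives. Assumes chunk keys carry "-S<chunksize>" and "-C<chunknumber>"
# fields before the "--" that precedes the key name.
from typing import Callable, Dict, List, Tuple


def parse_key(key: str) -> Tuple[str, List[str], str, str]:
    # "BACKEND-f1-f2--name" -> ("BACKEND", ["f1", "f2"], "--", "name")
    head, sep, name = key.partition("--")
    backend, *fields = head.split("-")
    return backend, fields, sep, name


def is_chunk_key(key: str) -> bool:
    _, fields, _, _ = parse_key(key)
    return any(f.startswith("C") and f[1:].isdigit() for f in fields)


def parent_key(key: str) -> str:
    # Dropping the chunk size and chunk number fields gives the unchunked key.
    backend, fields, sep, name = parse_key(key)
    kept = [f for f in fields if not f.startswith(("S", "C"))]
    return "-".join([backend] + kept) + sep + name


class ChunkBatchingRemover:
    def __init__(self, batch_delete: Callable[[List[str]], bool]):
        self.batch_delete = batch_delete  # hypothetical bulk-delete call
        self.pending: Dict[str, List[str]] = {}  # unchunked key -> chunk keys

    def handle(self, line: str) -> List[str]:
        cmd, _, key = line.partition(" ")
        if cmd != "REMOVE":
            return []
        if is_chunk_key(key):
            # Buffer it and reply at once; the real outcome is reported
            # on the unchunked key.
            self.pending.setdefault(parent_key(key), []).append(key)
            return ["REMOVE-SUCCESS " + key]
        # Unchunked key: remove its buffered chunks and itself together.
        keys = self.pending.pop(key, []) + [key]
        if self.batch_delete(keys):
            return ["REMOVE-SUCCESS " + key]
        return ["REMOVE-FAILURE " + key + " batch removal failed"]
```
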
[1] The exception is that, if the special remote is not currently
configured to use chunking, git-annex happens to remove the unchunked key
first, followed by all the chunk keys. I don't think there is a good
reason for this order in removal; it's an optimisation that is useful when
retrieving content, and it happens to affect removal too.
"""]]

[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-09-13T16:26:06Z"
content="""
I don't know that the approach in the above comment is really a good idea
for an external remote to try to implement. Needing to know about chunk
keys is not too bad, but it also relies on details of git-annex's
implementation.

But it seems worth considering possibilities like that, since this
extension would only be used in a relatively rare corner case.

Or possibly considering ways to generalize the idea so it is usable in
more cases.

Along those lines, it occurs to me that the async extension to the
protocol is somewhat similar, since it lets git-annex ask the external
remote to do several things at the same time. Removals of chunk keys are
not currently run concurrently, but they could be. An external remote
could then gather together some number of concurrent remove requests and
perform them all in a single API call (or whatever).

But how would the external remote know when it has seen all the remove
requests for the chunks of a key? It seems it would need to use a
heuristic, such as taking a period with no new requests to mean that
git-annex is waiting on it to remove everything it has been asked to
remove.

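A small Python sketch of such a heuristic, assuming the remote receives
remove requests concurrently and has some loop that polls periodically.
`IdleBatcher`, `add` and `poll` are hypothetical names, and the idle
threshold is an arbitrary tuning choice. Time is passed in explicitly
rather than read from a clock. Sending the deferred replies is left out.

```python
# Hypothetical sketch of the idle-time heuristic: once no new REMOVE
# request has arrived for idle_seconds, assume git-annex is blocked
# waiting for replies and hand back everything gathered as one batch.
# Times are passed in explicitly instead of being read from a clock.
from typing import List, Optional


class IdleBatcher:
    def __init__(self, idle_seconds: float):
        self.idle_seconds = idle_seconds  # arbitrary tuning choice
        self.pending: List[str] = []
        self.last_arrival: Optional[float] = None

    def add(self, now: float, key: str) -> None:
        # Record a concurrent remove request; its reply is deferred.
        self.pending.append(key)
        self.last_arrival = now

    def poll(self, now: float) -> List[str]:
        # Called periodically; returns the batch to remove, or [] if
        # requests may still be arriving.
        if not self.pending or self.last_arrival is None:
            return []
        if now - self.last_arrival < self.idle_seconds:
            return []
        batch, self.pending = self.pending, []
        return batch
```
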
So a protocol extension might be useful: some way for git-annex to
indicate that it is blocked waiting on the current requests to finish.
That seems more general purpose than a MULTIREMOVE extension. For example,
git-annex could also send it when retrieving chunks (although retrieving
chunks is also not currently done concurrently).

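A Python sketch of how a remote might use such a signal in place of a
timer. The signal is only proposed here and has no defined name or
format. The `blocked()` hook, the request ids, `SignalledBatcher` and
`batch_delete` are all hypothetical placeholders.

```python
# Hypothetical sketch: gather remove requests until git-annex signals that
# it is blocked waiting on them. No such signal exists in the protocol;
# the blocked() hook, request ids and batch_delete are all made up.
from typing import Callable, List, Tuple


class SignalledBatcher:
    def __init__(self, batch_delete: Callable[[List[str]], bool]):
        self.batch_delete = batch_delete  # hypothetical bulk-delete call
        self.pending: List[Tuple[str, str]] = []  # (request id, key)

    def remove(self, request_id: str, key: str) -> None:
        # Defer the request; nothing is replied yet.
        self.pending.append((request_id, key))

    def blocked(self) -> List[Tuple[str, str]]:
        # git-annex is waiting, so remove everything in one call and
        # reply to each deferred request with the shared result.
        if not self.pending:
            return []
        requests, self.pending = self.pending, []
        ok = self.batch_delete([key for _, key in requests])
        if ok:
            return [(rid, f"REMOVE-SUCCESS {key}") for rid, key in requests]
        return [(rid, f"REMOVE-FAILURE {key} batch removal failed")
                for rid, key in requests]
```
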
(There's also the question of how many concurrent removals of chunk keys
it would make sense for git-annex to request at the same time. It could
request removal of all the chunks concurrently, but if the special remote
needs to do much work or use significant resources for each request, that
might not be good. It would probably be more natural to base it on `-J`.)
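
A sketch of the requesting side in Python. It caps outstanding chunk
removals at the `-J` job count with a thread pool. `remove_chunks` and
`remove_chunk` are hypothetical names, and `remove_chunk` stands in for
sending one `REMOVE` and waiting for its reply. git-annex itself is
written in Haskell and does not currently do this, so treat it only as an
illustration of the bounding idea.

```python
# Hypothetical sketch: issue chunk-key removals concurrently, with at most
# `jobs` outstanding at once (as -J would set), and combine the results.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def remove_chunks(chunk_keys: List[str], jobs: int,
                  remove_chunk: Callable[[str], bool]) -> bool:
    # remove_chunk stands in for sending one REMOVE and awaiting its reply.
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        results = list(pool.map(remove_chunk, chunk_keys))
    # The key only counts as removed if every chunk removal succeeded.
    return all(results)
```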
"""]]