Merge branch 'master' of ssh://git-annex.branchable.com
commit e3d19c7674
6 changed files with 61 additions and 1 deletion

@@ -290,3 +290,5 @@ UBUNTU_CODENAME=jammy

### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Sure! Lots! We use it to share a large open access dataset at https://github.com/spine-generic, and [I'm working on](https://github.com/neuropoly/gitea/pull/1) helping other researchers share their datasets on their own infrastructure using git-annex + gitea.

[[done]]

@@ -0,0 +1,11 @@

[[!comment format=mdwn
username="nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
nickname="nick.guenther"
avatar="http://cdn.libravatar.org/avatar/9e85c6ca61c3f877fef4f91c2bf6e278"
subject="comment 6"
date="2022-09-13T23:32:01Z"
content="""
That's awesome! Thanks very much joey.

I'll mark this done now :)
"""]]

@@ -22,7 +22,7 @@ git annex required wasabi-west groupwanted

git annex group machine1 active
git annex group machine2 active
git annex groupwanted anything
git annex groupwanted active anything

# from machine1
git annex sync -a origin machine2 wasabi-east wasabi-west

@@ -0,0 +1,10 @@

[[!comment format=mdwn
username="pat"
avatar="http://cdn.libravatar.org/avatar/6b552550673a6a6df3b33364076f8ea8"
subject="comment 3"
date="2022-09-13T21:14:02Z"
content="""
hrm… as you can see in my post, I AM using “anything” as the wanted content. So I would expect all of the remotes (wasabi and machines) to get all of the file versions. But that’s not happening. It’s behaving more like “used” would.

I will try “anything or unused” despite the fact that it seems like “or unused” should be unnecessary.
"""]]

@@ -0,0 +1,16 @@

[[!comment format=mdwn
username="prancewit"
avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c"
subject="My current use case"
date="2022-09-13T19:45:23Z"
content="""
Thanks for the response, Joey.

Let me start by providing more details about the use case where I noticed this slowness.

I was using a slow remote with a lot of chunks. I stopped the upload and wanted to do a cleanup of the uploaded chunks. That's when I noticed that git-annex was requesting a removal of each chunk individually, even ones that never actually got uploaded.

In this particular case, I could \"preload\" the data, since I knew which chunks were valid and which ones weren't, to make it faster (though I actually just waited it out).

Also, like you mentioned, this MULTIREMOVE is most useful for this specific case, so a more generic solution will definitely be much better.
"""]]

@@ -0,0 +1,21 @@

[[!comment format=mdwn
username="prancewit"
avatar="http://cdn.libravatar.org/avatar/f6cc165b68a5cca3311f9a1cd7fd027c"
subject="comment 4"
date="2022-09-13T21:36:38Z"
content="""
At a high level, I see 2 possible ways in which special remotes can optimize for larger queries:

* Pre-fetch or cache the state of existing keys. (Mostly useful for no-op requests. For instance, pre-fetch the list of keys in the remote, enabling no-op REMOVEs, though it's hard to tell if there's been a separate change since the fetch.)
* Batch multiple requests in a single call. (Batching can be done before or after sending the SUCCESS response to git-annex with corresponding results.)
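The first strategy can be sketched roughly like this (a hypothetical illustration, not git-annex's actual code: the class, the in-memory dict standing in for the backend, and the key names are all invented; only the `REMOVE-SUCCESS` wording is borrowed from the external special remote protocol):

```python
# Hypothetical sketch of the pre-fetch idea: list the backend's keys once,
# then answer REMOVE for never-uploaded chunks without a per-key round trip.
class PrefetchingRemote:
    def __init__(self, backend):
        self.backend = backend          # stand-in for the real storage backend
        self.present = set(backend)     # one bulk listing, cached up front

    def remove(self, key):
        # Removal of an absent key is a no-op, so keys that were never
        # uploaded are answered from the cache with no backend call at all.
        if key in self.present:
            del self.backend[key]       # the only real (slow) backend call
            self.present.discard(key)
        return f"REMOVE-SUCCESS {key}"

remote = PrefetchingRemote({"SHA256E-s10--aaa": b"chunk"})
print(remote.remove("SHA256E-s10--aaa"))  # present: actually deleted
print(remote.remove("SHA256E-s10--bbb"))  # never uploaded: instant no-op
```

As the bullet notes, the weakness is staleness: another process writing to the backend after the bulk listing would not be reflected in `present`.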

> So it might be that a protocol extension would be useful, some way for git-annex to indicate that it is blocked waiting on current requests to finish.

I can think of a few related ways to do this:

* Have the remote send ACKs to notify that it's ready for the next request, and send SUCCESS when the request is actually completed. The remote then has the flexibility to run them in whatever batched/async manner is suitable. In the case of chunk keys, git-annex could rapidly send successive keys in sequence, since no additional lookup is required, making it pretty efficient.
* Have git-annex send some kind of group identifier (all chunks of the same key might be grouped together) or delimiter (e.g. GROUP_COMPLETED). This acts as a hint that these requests could be batched together, without any obligation on the remote to do so. Coupled with a guarantee that all items in one group will be sent sequentially, the first item that belongs to a different group confirms that the previous group is complete. In this case, the SUCCESS for the last item could be taken to mean that the entire group is completed. One issue here is that this could leak some information in encrypted repositories.
* Define a CACHE_TIMEOUT_SECONDS. The remote could use this to decide whether pre-fetched or cached data can still be trusted or should be re-checked. Git-annex would use it during merge/sync with other remotes to determine if there's a conflict that needs to be handled differently. (Seems too complicated TBH, but I'm trying to see how we can make pre-fetch/caching work.)
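The first proposal could look roughly like this (a hypothetical protocol sketch only: `ACK`, the class name, and the batch size are invented for illustration and are not part of the real external special remote protocol):

```python
from collections import deque

# Hypothetical sketch of the ACK-then-SUCCESS idea: every request is ACKed
# immediately so the sender can keep going, and SUCCESS replies are emitted
# later, once a whole batch is flushed in one backend call.
class BatchingRemote:
    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.pending = deque()
        self.replies = []               # protocol lines the remote sends back

    def remove_request(self, key):
        self.pending.append(key)
        self.replies.append(f"ACK {key}")   # ready for the next request
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # a single batched backend deletion would happen here
        while self.pending:
            key = self.pending.popleft()
            self.replies.append(f"REMOVE-SUCCESS {key}")

remote = BatchingRemote(batch_size=3)
for key in ["k1", "k2", "k3"]:
    remote.remove_request(key)
print(remote.replies)
```

Note the ACKs interleave with sending, while the SUCCESSes arrive only at flush time, which is exactly the decoupling the bullet describes.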

"""]]