fix addurl concurrency issue
addurl: Support adding the same url to multiple files at the same time when using -J with --batch --with-files. Implementation was easier than expected, was able to reuse OnlyActionOn. While it will download the url's content multiple times, that seems like the best thing to do; see my comment for why. Sponsored-by: Dartmouth College's DANDI project
This commit is contained in:
parent
a3cdff3fd5
commit
eb95ed4863
5 changed files with 42 additions and 2 deletions
|
@ -27,6 +27,8 @@ git-annex (8.20211012) UNRELEASED; urgency=medium
|
|||
the hook to freeze the file in ways that prevent moving it, such as
|
||||
removing the Windows delete permission.
|
||||
Thanks, Reiko Asakura.
|
||||
* addurl: Support adding the same url to multiple files at the same
|
||||
time when using -J with --batch --with-files.
|
||||
|
||||
-- Joey Hess <id@joeyh.name> Mon, 11 Oct 2021 14:09:13 -0400
|
||||
|
||||
|
|
|
@ -372,11 +372,21 @@ checkRaw o a
|
|||
- a filename. It's not displayed then for output consistency,
|
||||
- but is added to the json when available. -}
|
||||
startingAddUrl :: SeekInput -> URLString -> AddUrlOptions -> CommandPerform -> CommandStart
|
||||
startingAddUrl si url o p = starting "addurl" (ActionItemOther (Just url)) si $ do
|
||||
startingAddUrl si url o p = starting "addurl" ai si $ do
|
||||
case fileOption (downloadOptions o) of
|
||||
Nothing -> noop
|
||||
Just file -> maybeShowJSON $ JSONChunk [("file", file)]
|
||||
p
|
||||
where
|
||||
-- Avoid failure when the same url is downloaded concurrently
|
||||
-- to two different files, by using OnlyActionOn with a key
|
||||
-- based on the url. Note that this may not be the actual key
|
||||
-- that is used for the download; later size information may be
|
||||
-- available and get added to it. That's ok, this is only
|
||||
-- used to prevent two threads running concurrently when that would
|
||||
-- likely fail.
|
||||
ai = OnlyActionOn urlkey (ActionItemOther (Just url))
|
||||
urlkey = Backend.URL.fromUrl url Nothing
|
||||
|
||||
showDestinationFile :: FilePath -> Annex ()
|
||||
showDestinationFile file = do
|
||||
|
|
|
@ -2,3 +2,5 @@ While running `git-annex addurl --batch --with-files --jobs 10 --json --json-err
|
|||
|
||||
[[!meta author=jwodder]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
||||
|
|
|
@ -8,7 +8,7 @@ downloading, and the key transfer machinery prevents redundant downloads
|
|||
of the same Key at the same time.
|
||||
|
||||
Arguably, the problem is not where the message gets put, but that
|
||||
it fails when adding an url to two different paths.
|
||||
it fails when adding an url to two different paths at the same time.
|
||||
|
||||
I have, though, moved that message so it will appear in error-messages.
|
||||
"""]]
|
||||
|
|
|
@ -0,0 +1,26 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2021-10-27T18:56:23Z"
|
||||
content="""
|
||||
The best solution I can find is for it to notice when another thread is
|
||||
downloading the same url, and wait until it finishes. Then proceed
|
||||
with downloading the url for a second time.
|
||||
|
||||
It's not very satisfying to re-download. But once the url Key is downloaded,
|
||||
it does not keep that url Key populated, but hashes the content and moves
|
||||
the content to the final Key. It would be a real complication to
|
||||
communicate, across threads, what Key the content ended up at, and have the
|
||||
waiting thread use that. And addurl is already complicated well beyond a
|
||||
point I am comfortable with.
|
||||
|
||||
Also, the content of an url can of course change over time. If I feed
|
||||
"$url foo" into git-annex addurl --batch -J10 and then some time
|
||||
later, I feed "$url bar", I might expect that file bar gets whatever
|
||||
content the url has now, not the content that the url had back when I added
|
||||
the same url to file foo. And if I cared about avoiding re-downloading,
|
||||
I could add the url to the first file, and then copy the annex link to the
|
||||
second file myself.
|
||||
|
||||
Implemented this approach.
|
||||
"""]]
|
Loading…
Add table
Reference in a new issue