Avoid concurrent git-config setting problem when running concurrent threads.

See my comment. This only avoids the problem for -J; two git-annex
processes started at the same time could still both try to write to
.git/config and one fail. That would be very unlikely though, and it
doesn't really seem worth adding an additional layer of locking around
.git/config.

This commit was supported by the NSF-funded DataLad project.
This commit is contained in:
Joey Hess 2017-05-25 18:27:01 -04:00
parent 7db37ddde0
commit e1cf095ae8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 40 additions and 0 deletions

View file

@ -6,6 +6,8 @@ git-annex (6.20170520) UNRELEASED; urgency=medium
a transfer does not resume.
* Fix transfer log file locking problem when running concurrent
transfers.
* Avoid concurrent git-config setting problem when running concurrent
threads.
-- Joey Hess <id@joeyh.name> Wed, 24 May 2017 14:03:40 -0400

View file

@ -16,6 +16,7 @@ import Types.Command
import Types.Concurrency
import Messages.Concurrent
import Types.Messages
import Remote.List
import Control.Concurrent.Async
import Control.Exception (throwIO)
@ -57,6 +58,12 @@ commandAction a = go =<< Annex.getState Annex.concurrency
ws <- Annex.getState Annex.workers
(st, ws') <- if null ws
then do
-- Generate the remote list now, to avoid
-- each thread generating it, which would
-- be more expensive and could cause
-- threads to contend over eg, calls to
-- setConfig.
_ <- remoteList
st <- dupState
return (st, replicate (n-1) (Left st))
else do

View file

@ -29,3 +29,5 @@ SHA256E-s328--c2eb8088cdc71a0d4cbd660312bef5421a47ce7da3655efdb17712e7188be4a1.t
"""]]
[[!meta author=yoh]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2017-05-25T21:52:37Z"
content="""
The .git/config concurrent access happens because the remote
list is only generated on demand, and nothing demands it when running with
-J until all the threads are spun up and each thread has its own state
then, so each generates the remote list.
There don't look to be any other git-config settings that
would cause problems for concurrency other than ones run while
generating the remote list.
So, generating the remote list before starting concurrent threads
would avoid that problem, and also leads to a slightly faster startup
since the remote git config only has to be read once, etc.
The only risk in doing that would be if generating a Remote
opens some kind of handle, which can't be used concurrently, or
is less efficient if only opened once and then used by multiple threads.
I've audited all the Remote.*.gen methods, and they all
seem ok. For example, Remote.External.gen sets up a worker pool
of external special remote processes, and new ones are allocated as needed.
And Remote.P2P.chainGen sets up a connection pool.
Ok, gone ahead with this fix.
"""]]