analysis

2021-12-01 13:38:47 -04:00 · 2021-12-01 13:38:47 -04:00 · d4e99d902b
commit d4e99d902b
parent b7976e08f0
1 changed files with 56 additions and 0 deletions
--- a/doc/bugs/annex_get_should_retry_failed_downloads_from_S3/comment_2_5da78a05b781029fa7f0f9d8ead7e093._comment
+++ b/doc/bugs/annex_get_should_retry_failed_downloads_from_S3/comment_2_5da78a05b781029fa7f0f9d8ead7e093._comment
@ -0,0 +1,56 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2021-12-01T17:05:14Z"
+ content="""
+What's happening is Utility.LockFile.PidLock.tryLock
+calls trySideLock and in these failure cases,
+that returns Nothing, because another thread also happens to
+have taken the sidelock. 
+
+Due to the concurrency, the pid lock file always already exists, so
+linkToLock fails. When it is able to take the side lock, it treats this as
+a stale pid lock file situation, and takes over the pid lock.
+
+(This could also affect other locks than transfer locks, potentially.)
+
+One way to solve it would be to use a LockPool instead to take the side
+lock. Then multiple concurrent threads could all lock the side lock.
+
+Or, it could special case when the pid in the pid lock file is the same as
+the current pid, and handle it in the lock file take over code.
+
+----
+
+Now, annex.pidlock is supposed to be a big top-level lock, which is used
+instead of the fine-grained locking. What if 2 threads are each wanting
+to take a lock before operating on the same resource? If either of the
+solutions above is implemented, then both threads will "succeed" at locking
+even though it's a shared pidlock. Which could result in any undefined
+behavior.
+
+And, in all the cases where it does *not* fail to take the transfer lock,
+but instead takes over the pid lock, we're perilously close to such a thing
+happening! The only reason it's not a problem, in the case of transfers
+is that OnlyActionOn is used and prevents two threads transferring the same
+key. (Also used for dropping etc.)
+
+But this makes me worry that using annex.pidlock with concurrency enabled
+is flirting with disaster, and perhaps it should refuse to use concurrency
+to avoid such potential problems. Unless there's a way to avoid them
+entirely.
+
+Hmm.. Normally all locking in git-annex uses LockPool, which
+handles inter-process locking. If LockPool is used, one of those 2 threads
+will win at the lock and the other one will wait for it. But,
+Annex.LockPool.PosixOrPid.tryPidLock/pidLock do not use LockPool
+for fine-grained locking when pid locking is enabled.
+
+So, I think it's potentially buggy in the unsafe direction of letting
+2 threads "lock" the same resource, as well as in the safe direction of
+sometimes unncessarily failing. Both need to be fixed.
+
+I am leaning toward a process only taking the pid lock once and holding it
+until exit, with LockPool used to make that thread safe. And add fine grained
+locking using LockPool when doing pid locking.
+"""]]