This locking has been missing from the beginning of annex.pidlock.
It used to be possble, when two threads are doing conflicting things,
for both to run at the same time despite using locking. Seems likely
that nothing actually had a problem, but it was possible, and this
eliminates that possible source of failure.
Sponsored-by: Dartmouth College's Datalad project
Seem there are several races that happen when 2 threads run PidLock.tryLock
at the same time. One involves checkSaneLock of the side lock file, which may
be deleted by another process that is dropping the lock, causing checkSaneLock
to fail. And even with the deletion disabled, it can still fail, Probably due
to linkToLock failing when a second thread overwrites the lock file.
The same can happen when 2 processes do, but then one process just fails
to take the lock, which is fine. But with 2 threads, some actions where failing
even though the process as a whole had the pid lock held.
Utility.LockPool.PidLock already maintains a STM lock, and since it uses
LockShared, 2 threads can hold the pidlock at the same time, and when
the first thread drops the lock, it will remain held by the second
thread, and so the pid lock file should not get deleted until the last
thread to hold it drops the lock. Which is the right behavior, and why a
LockShared STM lock is used in the first place.
The problem is that each time it takes the STM lock, it then also calls
PidLock.tryLock. So that was getting called repeatedly and concurrently.
Fixed by noticing when the shared lock is already held, and stop calling
PidLock.tryLock again, just use the pid lock that already exists then.
Also, LockFile.PidLock.tryLock was deleting the pid lock when it failed
to take the lock, which was entirely wrong. It should only drop the side
lock.
Sponsored-by: Dartmouth College's Datalad project
This fixes a reversion introduced in commit
ac56a5c2a0.
I didn't notice there that it was handling the case of a shared lock
file that was still open elsewhere by not running the close action.
This was especially deadly when annex.pidlock is set, as it caused early
deletion of the pid lock file.
Sponsored-by: Dartmouth College's Datalad project
It ought to exist, since linkToLock has just created it. However,
Lustre seems to have a rather probabilisitic view of the contents of a
directory, so catching the error if it somehow does not exist and
running the same code path that would be ran if linkToLock failed
might avoid this fun Lustre failure.
Sponsored-by: Dartmouth College's Datalad project
Commit b6e4ed9aa7 made non-annexed files
be re-uploaded every time, since they're not tracked in the location log,
and it made it check the location log. Don't do that for non-annexed files.
Sponsored-by: Brock Spratlen on Patreon