Commit graph

39 commits

Author SHA1 Message Date
Joey Hess
ef3ab0769e
close pid lock only once no threads use it
This fixes a FD leak when annex.pidlock is set and -J is used. Also, it
fixes bugs where the pid lock file got deleted because one thread was
done with it, while another thread was still holding it open.

The LockPool now has two distinct types of resources,
one is per-LockHandle and is used for file Handles, which get closed
when the associated LockHandle is closed. The other one is per lock
file, and gets closed when no more LockHandles use that lock file,
including other shared locks of the same file.

That latter kind is used for the pid lock file, so it's opened by the
first thread to use a lock, and closed when the last thread closes a lock.

In practice, this means that eg git-annex get of several files opens and
closes the pidlock file a few times per file. While with -J5 it will open
the pidlock file, process a number of files, until all the threads happen to
finish together, at which point the pidlock file gets closed, and then
that repeats. So in either case, another process still gets a chance to
take the pidlock.

registerPostRelease has a rather intricate dance, there are fine-grained
STM locks, a STM lock of the pidfile itself, and the actual pidlock file
on disk that are all resolved in stages by it.

Sponsored-by: Dartmouth College's Datalad project
2021-12-06 15:01:39 -04:00
Joey Hess
774c7dab2f
Merge branch 'master' into pidlockfinegrained 2021-12-06 13:00:40 -04:00
Joey Hess
e464ffd641
update comment to current status 2021-12-03 18:41:51 -04:00
Joey Hess
e5ca67ea1c
fine-grained locking when annex.pidlock is enabled
This locking has been missing from the beginning of annex.pidlock.
It used to be possble, when two threads are doing conflicting things,
for both to run at the same time despite using locking. Seems likely
that nothing actually had a problem, but it was possible, and this
eliminates that possible source of failure.

Sponsored-by: Dartmouth College's Datalad project
2021-12-03 17:20:21 -04:00
Joey Hess
ed0afbc36b
avoid concurrent threads trying to take pid lock at same time
Seem there are several races that happen when 2 threads run PidLock.tryLock
at the same time. One involves checkSaneLock of the side lock file, which may
be deleted by another process that is dropping the lock, causing checkSaneLock
to fail. And even with the deletion disabled, it can still fail, Probably due
to linkToLock failing when a second thread overwrites the lock file.

The same can happen when 2 processes do, but then one process just fails
to take the lock, which is fine. But with 2 threads, some actions where failing
even though the process as a whole had the pid lock held.

Utility.LockPool.PidLock already maintains a STM lock, and since it uses
LockShared, 2 threads can hold the pidlock at the same time, and when
the first thread drops the lock, it will remain held by the second
thread, and so the pid lock file should not get deleted until the last
thread to hold it drops the lock. Which is the right behavior, and why a
LockShared STM lock is used in the first place.

The problem is that each time it takes the STM lock, it then also calls
PidLock.tryLock. So that was getting called repeatedly and concurrently.

Fixed by noticing when the shared lock is already held, and stop calling
PidLock.tryLock again, just use the pid lock that already exists then.

Also, LockFile.PidLock.tryLock was deleting the pid lock when it failed
to take the lock, which was entirely wrong. It should only drop the side
lock.

Sponsored-by: Dartmouth College's Datalad project
2021-12-01 17:14:39 -04:00
Joey Hess
a6699be79d
catch error statting pid lock file if it somehow does not exist
It ought to exist, since linkToLock has just created it. However,
Lustre seems to have a rather probabilisitic view of the contents of a
directory, so catching the error if it somehow does not exist and
running the same code path that would be ran if linkToLock failed
might avoid this fun Lustre failure.

Sponsored-by: Dartmouth College's Datalad project
2021-11-29 14:53:07 -04:00
Joey Hess
e853ef3095
decorate openTempFile errors with the template name
This is to track down what file in .git/annex/ is being written to via a
temp file when the repository is read-only.

Sponsored-by: Dartmouth College's Datalad project
2021-08-30 13:05:02 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
08cbaee1f8
more RawFilePath conversion
Most of Git/ builds now.

Notable win is toTopFilePath no longer double converts

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-10-28 15:55:30 -04:00
Joey Hess
b68f214312
Display a message when git-annex has to wait for a pid lock file held by another process 2020-08-26 13:05:34 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
82448bdf39
fix a annex.pidlock issue
That made eg git-annex get of an unlocked file hang until the
annex.pidlocktimeout and then fail.

This fix should be fully thread safe no matter what else git-annex is
doing.

Only using runsGitAnnexChildProcess in the one place it's known to be a
problem. Could audit for all places where git-annex runs itself as a child
and add it to all of them, later.
2020-06-17 15:30:59 -04:00
Joey Hess
0210e81d83
async exception safety for openFd
Audited for openFile and openFd, and this fixes all the ones I found
where an async exception could prevent the file getting closed.

Except for the lock pool, which is a whole other can of worms.
2020-06-05 15:48:00 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
19a6227e6e
remove temp file in failure case 2017-06-06 14:23:33 -04:00
Joey Hess
ed639c140d
Fix bug that prevented transfer locks from working when run on SMB or other filesystem that does not support fcntl locks and hard links.
This commit was sponsored by Ethan Aubin.
2017-06-06 14:22:03 -04:00
Joey Hess
6dd806f1ad
stop using MissingH for MD5
Cryptonite is faster and allocates less, and I want to get rid of
MissingH use.

Note that the new dependency on memory is free; it's a dependency of
cryptonite.

This commit was supported by the NSF-funded DataLad project.
2017-05-15 21:36:03 -04:00
Joey Hess
5e6ced7d0f
Improve pid locking code to work on filesystems that don't support hard links.
Probing for hard link support in the pid locking code is redundant since
git-annex init already probes that. But, it didn't seem worth threading
that data through; the pid locking code runs at most once per git-annex
process, and only on unusual filesystems. Optimising a single hard link
and unlink isn't worth it.

This commit was sponsored by Francois Marier on Patreon.
2017-02-10 15:22:28 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
b22409db38
avoid warnings about not exported System.Directory.isSymbolicLink 2016-04-28 15:18:11 -04:00
Joey Hess
5fe450514b
Fix build with directory-1.2.6.2.
It started exporting a isSymbolicLink which supports windows. But,
git-annex does no use symlinks on windows yet and this conflicts with the
function by the same name from unix-compat, so hide it.
2016-04-28 13:18:44 -04:00
Joey Hess
f219ffc33b
comment typo fix 2016-03-01 12:24:22 -04:00
Joey Hess
b6ac443b60
fix build warnings under ghc 7.10
Caused by AMP.. Since I've finally upgraded my dev laptop to 7.10,
I may start missing imports that are not needed with it but are with older
versions..
2015-12-19 17:42:45 -04:00
Joey Hess
c670a0642c
fix warning 2015-11-16 15:37:27 -04:00
Joey Hess
e2b4861bff
store abspath to the lock file
Avoids problems if the program chdirs
2015-11-16 15:25:04 -04:00
Joey Hess
2e44da5c46
clean up side lock files when we're done with them
There's a potential race, but it's detected and just results in the other
process failing to take the side lock, so possibly retrying one second
later on. The race window is quite narrow so the extra delay is minor.

Left the side lock files mode 666 because an interruption can leave a side
lock file created by another user for a shared repository. When this
happens, the non-owning user can't delete it (+t) but can still lock it,
and so the code falls back to acting as it did before this commit.
2015-11-16 11:36:11 -04:00
Joey Hess
8efd3d71c8
starting to get a handle on how to detect that mad gleam in lustre's eye 2015-11-13 16:18:44 -04:00
Joey Hess
70bfe218f5
one more try to get sane behavior our of lustre 2015-11-13 15:51:45 -04:00
Joey Hess
389c6c7d37
fixed a fd double-close 2015-11-13 15:43:09 -04:00
Joey Hess
b0155d9093
also compare lock file contents to double-check link worked
And it closes the tmp file before this. I don't know if this will help
avoid lustre's craziness, but it can't hurt..
2015-11-13 15:20:52 -04:00
Joey Hess
1aba23ab4e
use /tmp for sidelock file when no /dev/shm 2015-11-13 14:49:30 -04:00
Joey Hess
60a9c7f5c6
require the side lock be held to take pidlock
This is less portable, since currently sidelocks rely on /dev/shm.
But, I've seen crazy lustre inconsistencies that make me not trust the
link() method at all, so what can you do.
2015-11-13 14:44:53 -04:00
Joey Hess
85345abe8b
avoid over-long filenames for side lock files 2015-11-13 14:04:29 -04:00
Joey Hess
c2cbe5619b
add stat check
I have a strace taken on a lustre filesystem on which link() returned 0,
but didn't actually succeed, since the file already existed.

One of the linux man pages recommended using link followed by checking like
this. I was reading it yesterday, but cannot find it now.
2015-11-13 13:22:45 -04:00
Joey Hess
88d94e674c
clean up temp file 2015-11-13 12:52:24 -04:00
Joey Hess
e31a51c5bb
better lock dropping order 2015-11-13 12:36:37 -04:00
Joey Hess
77b490bfba
add timeout for pid lock waiting 2015-11-12 17:12:54 -04:00
Joey Hess
0f25a7365a
module for PidLocks in LockPool 2015-11-12 16:31:34 -04:00
Joey Hess
710d1eeeac
module for pid lock files with atomic stale lock file takeover when possible 2015-11-12 15:39:49 -04:00