7ebf234616
Deleting lock files is tricky, tricky stuff. I think I got it right!
61 lines
2.5 KiB
Markdown
61 lines
2.5 KiB
Markdown
There's a race that can result in 2 concurrent downloads
|
|
of the same key. This can happen because the transfer lock files get
|
|
deleted after a transfer completes.
|
|
|
|
Scenario:
|
|
|
|
1. first process creates lock file, takes lock, starts download
|
|
2. second process opens existing lock file
|
|
3. first process finishes/fails download, so deletes lock file, and then
|
|
drops lock
|
|
4. second process takes lock (exclusive, non-blocking) of deleted file!
|
|
5. third process sees no lock file, so creates a new one, takes lock,
|
|
starts download
|
|
|
|
Worst-case, this might result in data loss, if the two concurrent
|
|
downloaders confuse one-another. Perhaps the second one overwrites the file
|
|
the first was downloading, and then the first, thinking it's written the
|
|
file, moves it into the object directory.
|
|
|
|
Note that the window between 4-5 can be as long as it takes for a
|
|
file to download. However, the window between 2-4 is very narrow indeed,
|
|
since the second process is not blocking on the lock.
|
|
So, this is very unlikely to happen.
|
|
|
|
But, it could. Can it be prevented?
|
|
|
|
Yes: The second process, after taking the lock, can check that
|
|
the lock file exists, and has the same dev and fileno as the fd it has
|
|
locked. If the lock file doesn't exist, or is different, then the
|
|
second process can give up.
|
|
|
|
Oh BTW, this race cannot happen on windows, since on windows, an open lock
|
|
file cannot be deleted by another process.
|
|
|
|
> [[fixed|done]]
|
|
|
|
Let's also consider if this change will allow for safely expiring stale
|
|
lock files..
|
|
|
|
Situation before with expiry would have been buggy (which is why it never
|
|
tried to expire them):
|
|
|
|
1. first process creates lock file, takes lock, is interrupted (no more
|
|
lock)
|
|
2. second process wants to delete stale lock file; checks if it's locked;
|
|
isn't
|
|
3. third process opens existing lock file, prepares to take lock
|
|
4. second process deletes lock file
|
|
5. third process takes lock
|
|
6. third process is now doing a transfer w/o a (visible) lock
|
|
|
|
But, after this bug fix, the third process checks if it's lock file
|
|
exists after taking the lock. Since the second process deleted it,
|
|
the third process fails with an error "transfer in progress"
|
|
which is perhaps not accurate, but at least it stopped.
|
|
|
|
For this to work though, the second process needs to actually take
|
|
the lock, in non-blocking mode. If it succeeds, it can keep the lock
|
|
held while deleting the lock file. This ensures that when the third process
|
|
takes the lock, the lock file will already be deleted by the time it checks
|
|
if it's there.
|