f259be7f39
When adding a small file, it does not get locked down, so can be modified after git-annex checks that it's small. The use of queued git add made the race window nice and wide too. Fixed by checking if the file has changed, and by not using git add. Instead, have to recapitulate git add's handling of things like symlinks and executable files. Sponsored-by: Jochen Bartl on Patreon
59 lines
1.9 KiB
Markdown
59 lines
1.9 KiB
Markdown
I was running `git-annex add` on 5 gb of files, and accidentially overwrote
|
|
some of the first ones, which it had already processed, while it was
|
|
running. This caused binary files to get staged in git, rather than the
|
|
annex pointers.
|
|
|
|
Test case:
|
|
|
|
echo hi > 1
|
|
dd if=/dev/urandom of=2 bs=1M count=1000
|
|
(sleep 2s; rm 1; echo bye > 1) &
|
|
git-annex add
|
|
git diff --cached 1
|
|
diff --git a/1 b/1
|
|
new file mode 100644
|
|
index 0000000..b023018
|
|
--- /dev/null
|
|
+++ b/1
|
|
@@ -0,0 +1 @@
|
|
+bye
|
|
|
|
This happens due to ingestAdd using addLink on the symlink,
|
|
which just queues a "git add" of the file for later. In the
|
|
meantime, the symlink is replaced with something else, so git
|
|
adds that.
|
|
|
|
It seems that the solution will be to use update-index rather than git add.
|
|
Note that addLink has a comment about why it uses git add, which seems to mostly
|
|
be that it's faster. It also sometimes relies on git add to check gitignore,
|
|
although sometimes redundandly, some of the callers of it may rely on that
|
|
and have to be changed to check it first themselves.
|
|
|
|
Since adding a file to the annex also involves locking it down and
|
|
detecting modifications made while generating the key, update-index is
|
|
sufficient.
|
|
|
|
> Update: This is fixed.
|
|
|
|
When it's adding a file unlocked, it already stages the pointer file using
|
|
update-index instead so there is no overwrite problem there.
|
|
|
|
But, there's a similar problem when it decides not to annex a file
|
|
and adds it to git. If the file content is overwritten then, it will
|
|
git add the new content. Which may be large enough that it should have been
|
|
added to the annex after all. Test case for this:
|
|
|
|
git config annex.largefiles largerthan=3b
|
|
echo hi > 1
|
|
dd if=/dev/urandom of=2 bs=1M count=1000
|
|
(sleep 2s; rm 1; echo bye > 1) &
|
|
git-annex add
|
|
git diff --cached 1
|
|
|
|
Guess it needs to cache the inode,
|
|
hash the file content, then verifiy the inode did not change during
|
|
hashing, and then also use update-index.
|
|
|
|
> [[done]]
|
|
|
|
--[[Joey]]
|