bug report

Sponsored-by: Luke Shumaker on Patreon
This commit is contained in:
Joey Hess 2022-06-14 12:51:04 -04:00
parent b9e9ad1ffb
commit b471438c51
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,48 @@
I was running `git-annex add` on 5 gb of files, and accidentially overwrote
some of the first ones, which it had already processed, while it was
running. This caused binary files to get staged in git, rather than the
annex pointers.
Test case:
echo hi > 1
dd if=/dev/urandom of=2 bs=1M count=1000
(sleep 2s; rm 1; echo bye > 1) &
git-annex add
git diff --cached 1
diff --git a/1 b/1
new file mode 100644
index 0000000..b023018
--- /dev/null
+++ b/1
@@ -0,0 +1 @@
+bye
This happens due to ingestAdd using addLink on the symlink,
which just queues a "git add" of the file for later. In the
meantime, the symlink is replaced with something else, so git
adds that.
It seems that the solution will be to use update-index rather than git add.
Note that addLink has a comment about why it uses git add, which seems to mostly
be that it's faster. It also sometimes relies on git add to check gitignore,
although sometimes redundandly, some of the callers of it may rely on that
and have to be changed to check it first themselves.
When it's adding a file unlocked, it already stages the pointer file using
update-index instead so there is no overwrite problem there.
But, there's a similar problem when it decides not to annex a file
and adds it to git. If the file content is overwritten then, it will
git add the new content. Which may be large enough that it should have been
added to the annex after all. Test case for this:
git config annex.largefiles largerthan=3b
echo hi > 1
dd if=/dev/urandom of=2 bs=1M count=1000
(sleep 2s; rm 1; echo bye > 1) &
git-annex add
git diff --cached 1
Unsure how to fix this case yet?
--[[Joey]]