bug report

Sponsored-by: Luke Shumaker on Patreon
2022-06-14 12:51:04 -04:00 · b471438c51
commit b471438c51
parent b9e9ad1ffb
1 changed files with 48 additions and 0 deletions
--- a/doc/bugs/add_overwrite_race.mdwn
+++ b/doc/bugs/add_overwrite_race.mdwn
@@ -0,0 +1,48 @@
+I was running `git-annex add` on 5 GB of files, and accidentally overwrote
+some of the first ones, which it had already processed, while it was
+running. This caused binary files to get staged in git, rather than the
+annex pointers.
+
+Test case: 
+
+	echo hi > 1
+	dd if=/dev/urandom of=2 bs=1M count=1000
+	(sleep 2s; rm 1; echo bye > 1) &
+	git-annex add
+	git diff --cached 1
+	diff --git a/1 b/1
+	new file mode 100644
+	index 0000000..b023018
+	--- /dev/null
+	+++ b/1
+	@@ -0,0 +1 @@
+	+bye
+
+This happens due to ingestAdd using addLink on the symlink, 
+which just queues a "git add" of the file for later. In the
+meantime, the symlink is replaced with something else, so git
+adds that.
+
+It seems that the solution will be to use update-index rather than git add.
+Note that addLink has a comment about why it uses git add, which seems to mostly
+be that it's faster. It also sometimes relies on git add to check gitignore,
+although sometimes redundantly. Some of its callers may rely on that
+and would have to be changed to check it first themselves.
+
+When it's adding a file unlocked, it already stages the pointer file using
+update-index instead, so there is no overwrite problem there.
+
+But there's a similar problem when it decides not to annex a file
+and adds it to git. If the file content is overwritten at that point, it will
+git add the new content, which may be large enough that it should have been
+added to the annex after all. Test case for this:
+
+	git config annex.largefiles largerthan=3b
+	echo hi > 1
+	dd if=/dev/urandom of=2 bs=1M count=1000
+	(sleep 2s; rm 1; echo bye > 1) &
+	git-annex add
+	git diff --cached 1
+
+Unsure how to fix this case yet.
+--[[Joey]]
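
The update-index idea from the report can be sketched in plain git, outside of git-annex. This is only an illustration under assumptions, not the change that was made to addLink: the link target below is a made-up placeholder, and the script works in a throwaway repository. It hashes the symlink target at processing time and stages that blob with `git update-index --cacheinfo`, so overwriting the path afterwards does not change what is staged.

```shell
#!/bin/sh
# Sketch: stage a symlink by the target recorded at processing time,
# rather than queuing "git add" to read the path later.
set -e

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical annex-style link target (an assumption; real targets differ).
target=".git/annex/objects/aa/bb/KEY/KEY"
ln -s "$target" 1

# A symlink is stored as a blob holding the target text with no trailing
# newline, and is recorded in the index with mode 120000.
sha=$(printf '%s' "$target" | git hash-object -w --stdin)
git update-index --add --cacheinfo "120000,$sha,1"

# Simulate the race: the path is overwritten after it was processed.
rm 1
echo bye > 1

# Inspect what is staged: mode from the index, content from the blob.
staged_mode=$(git ls-files -s 1 | cut -d' ' -f1)
staged_target=$(git cat-file -p "$sha")
echo "$staged_mode $staged_target"
```

Because `--cacheinfo` takes the object id directly, git never reads the work tree file here, which is what closes the window between processing and staging. The gitignore check that git add performs is not done by update-index, so callers would still need to do that themselves, as noted in the report.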