fix reversion

add: Avoid unncessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surpises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project
2022-03-21 15:50:27 -04:00 · 2022-03-21 15:50:27 -04:00 · 6079b0c72c
commit 6079b0c72c
parent d7cd8491b0
4 changed files with 101 additions and 10 deletions
--- a/7
+++ b/7
@ -28,6 +28,13 @@ git-annex (10.20220223) UNRELEASED; urgency=medium
    a new directory special remote, or change the configuration of your
    existing remote:
      git-annex enableremote foo ignoreinodes=yes
+  * add: Avoid unncessarily converting a newly unlocked file to be stored
+    in git when it is not modified, even when annex.largefiles does not
+    match it.
+  * The above change to add fixes a reversion in version 10.20220222,
+    where git-annex unlock followed by git-annex add, followed by git
+    commit file could result in git thinking the file was modified
+    after the commit.

 -- Joey Hess <id@joeyh.name>  Wed, 23 Feb 2022 14:14:09 -0400

--- a/Command/Add.hs
+++ b/Command/Add.hs
@ -78,16 +78,18 @@ seek o = startConcurrency commandStages $ do
 	largematcher <- largeFilesMatcher
 	addunlockedmatcher <- addUnlockedMatcher
 	annexdotfiles <- getGitConfigVal annexDotFiles 
-	let gofile (si, file) = case largeFilesOverride o of
+	let gofile includingsmall (si, file) = case largeFilesOverride o of
 		Nothing -> 
 			ifM (pure (annexdotfiles || not (dotfile file))
 				<&&> (checkFileMatcher largematcher file 
 				<||> Annex.getState Annex.force))
 				( start o si file addunlockedmatcher
-				, ifM (annexAddSmallFiles <$> Annex.getGitConfig)
-					( startSmall o si file
-					, stop
-					)
+				, if includingsmall
+					then ifM (annexAddSmallFiles <$> Annex.getGitConfig)
+						( startSmall o si file
+						, stop
+						)
+					else stop
 				)
 		Just True -> start o si file addunlockedmatcher
 		Just False -> startSmallOverridden o si file
@ -96,7 +98,7 @@ seek o = startConcurrency commandStages $ do
 			| updateOnly o ->
 				giveup "--update --batch is not supported"
 			| otherwise -> batchOnly Nothing (addThese o) $
-				batchFiles fmt gofile
+				batchFiles fmt (gofile True)
 		NoBatch -> do
 			-- Avoid git ls-files complaining about files that
 			-- are not known to git yet, since this will add
@ -104,11 +106,14 @@ seek o = startConcurrency commandStages $ do
 			-- problems, like files that don't exist.
 			let ww = WarnUnmatchWorkTreeItems
 			l <- workTreeItems ww (addThese o)
-			let go a = a ww (commandAction . gofile) l
+			let go b a = a ww (commandAction . gofile b) l
 			unless (updateOnly o) $
-				go (withFilesNotInGit (checkGitIgnoreOption o))
-			go withFilesMaybeModified
-			go withUnmodifiedUnlockedPointers
+				go True (withFilesNotInGit (checkGitIgnoreOption o))
+			go True withFilesMaybeModified
+			-- Convert newly unlocked files back to locked files,
+			-- same as a modified unlocked file would get
+			-- locked when added.
+			go False withUnmodifiedUnlockedPointers

 {- Pass file off to git-add. -}
 startSmall :: AddOptions -> SeekInput -> RawFilePath -> CommandStart
--- a/doc/bugs/Change_to_annex.largefiles_leaves_repo_modified.mdwn
+++ b/doc/bugs/Change_to_annex.largefiles_leaves_repo_modified.mdwn
@ -175,3 +175,5 @@ Lots! I love git-annex!

 [[!meta author=bpoldrack]]
 [[!tag projects/datalad]]
+
+> [[fixed|done]] --[[Joey]]
--- a/doc/bugs/Change_to_annex.largefiles_leaves_repo_modified/comment_2_725d46358366ea79e375fd47beac6acd._comment
+++ b/doc/bugs/Change_to_annex.largefiles_leaves_repo_modified/comment_2_725d46358366ea79e375fd47beac6acd._comment
@ -0,0 +1,77 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2022-03-21T17:33:28Z"
+ content="""
+Isolated to [[!commit c68f52c6a25ebd8a12f4572da53fbba827d682aa]]
+
+Since that commit is only a speed improvement, it can be reverted, which
+does avoid the problem.
+
+But I don't really see a bug in that commit either. It only causes
+git to smudge the file when it's unlocked, rather than at some later
+point. The result should be the same either way. Except, of course,
+that you're varying the annex.largefiles config, so when git smudges
+the file later, it sees a different result than when it smudges the file
+while unlock is running.
+
+I think that the actually surprising thing here is that 
+`git-annex add somefile` re-adds the file (adding it to git per the new
+annex.largefiles configuration). The file is in typechange state at that
+point but is not modified, so it does not need to be re-added at all.
+But add seeks `withUnmodifiedUnlockedPointers` which finds files in
+typechange state.
+
+So git-annex add decides to add the file to git even though it was
+previously added to the annex. At that point, I'd expect `git add` would
+update the index and replace the smudged content with the unsmudged
+content. However, it actually does not!
+Looking at `git ls-files  --stage` before and after git-annex add,
+the annex pointer content remains staged:
+
+	100644 bf1114c25db2833499838552f3fa69635fcc1292 0	somefile
+	add .gitattributes (non-large file; adding content to git repository) ok
+	add somefile (non-large file; adding content to git repository) ok
+	(recording state in git...)
+	100644 bf1114c25db2833499838552f3fa69635fcc1292 0	somefile
+
+The only reason that git commit goes on to commit the non-annexed
+content is because you pass somefile to it, so it re-adds somefile
+at that point, which runs the smudge filter, which passes the content
+through.
+
+And that's why git status then sees the file as modified -- 
+the index has the annex pointer still staged in it, which is different
+from what got committed.
+
+So, probably this same thing could happen in another situation, even if
+the change to git-annex unlock were reverted. Eg, if something else
+caused git smudge to run just after unlock, before the annex.largefiles
+config got changed.
+
+Why is `git add` not updating the index? Presumably because the inode
+and mtime have not changed, so it does not think it needs to. Even though
+it's being run with the smudge filter disabled, and so sees a different
+hash than the one in the index. I think a case could be made for this
+being a bug in `git add`. Although maybe the `--renormalize` option
+is intended to handle this kind of situation.
+
+The documented way for a user to change a file from being stored in git-annex
+to git is to first `git rm --cached` and then `git annex add --force-small`.
+Which avoids that `git add` behavior.
+
+I tracked git-annex add's behavior here back to 
+[[!commit 2743224658cee33657ad30653ff97de2363366c6]].
+Which addressed an old bug, filed by Kyle, which can be seen at
+[[!commit 5567496518]]. So it was intentional that unlocking a file
+and then `git annex add` should lock it, for consistency with v5
+and for consistency with unlocking a file
+and then modifying it, and then `git annex add`. I am actually a bit
+surprised at this behavior, but it was intended to handle locking files,
+not adding files to git.
+
+So, I think it would be fine for `git-annex add` to skip over a typechanged
+file when it would be added to git rather than to the annex. 
+
+I've implemented that, and it does fix this bug.
+"""]]