Revert "smudge: check for known annexed inodes before checking annex.largefiles"

This reverts commit 424bef6b6f. This commit caused other buggy behavior unfortunately.
2021-05-10 12:19:26 -04:00 · 2021-05-10 12:19:26 -04:00 · 72a8bbce12
commit 72a8bbce12
parent 56ccc0302e
4 changed files with 65 additions and 30 deletions
--- a/3
+++ b/3
@ -2,9 +2,6 @@ git-annex (8.20210429) UNRELEASED; urgency=medium

  * fromkey: Create an unlocked file when used in an adjusted branch
    where the file should be unlocked, or when configured by annex.addunlocked.
-  * smudge: Fix a case where an unlocked annexed file that annex.largefiles
-    does not match could get its unchanged content checked into git,
-    due to git running the smudge filter unecessarily.
  * Fix behavior of several commands, including reinject, addurl, and rmurl
    when given an absolute path to an unlocked file, or a relative path
    that leaves and re-enters the repository.
--- a/Command/Smudge.hs
+++ b/Command/Smudge.hs
@ -168,26 +168,25 @@ clean file = do
 		filepath <- liftIO $ absPath file
 		return $ not $ dirContains repopath filepath

-- If annex.largefiles is configured (and not disabled by annex.gitaddtoannex
-- being set to false), matching files are added to the annex and the rest to
-- git.
+-- If annex.largefiles is configured, matching files are added to the
+-- annex. But annex.gitaddtoannex can be set to false to disable that.
 --
 -- When annex.largefiles is not configured, files are normally not
-- added to the annex, so will be added to git. However, if the file
-- is annexed in the index, keep it annexed. This prevents accidental
-- conversions when previously annexed files get modified and added.
+-- added to the annex, so will be added to git. But some heuristics
+-- are used to avoid bad behavior:
 --
-- In either case, if the file's inode is the same as one that was used
-- for annexed content before, annex it. And if the file is not annexed
-- in the index, and has the same content, leave it in git.
-- This handles cases such as renaming a file followed by git add,
-- which the user naturally expects to behave the same as git mv.
+-- If the file is annexed in the index, keep it annexed.
+-- This prevents accidental conversions.
+--
+-- Otherwise, when the file's inode is the same as one that was used for
+-- annexed content before, annex it. This handles cases such as renaming an
+-- unlocked annexed file followed by git add, which the user naturally
+-- expects to behave the same as git mv.
 shouldAnnex :: RawFilePath -> Maybe (Sha, FileSize, ObjectType) -> Maybe Key -> Annex Bool
-shouldAnnex file indexmeta moldkey = do
-	ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
-		( checkunchanged $ checkmatcher checkwasannexed
-		, checkunchanged checkwasannexed
-		)
+shouldAnnex file indexmeta moldkey = ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
+	( checkunchangedgitfile $ checkmatcher checkheuristics
+	, checkunchangedgitfile checkheuristics
+	)
  where
 	checkmatcher d
 		| dotfile file = ifM (getGitConfigVal annexDotFiles)
@ -200,21 +199,14 @@ shouldAnnex file indexmeta moldkey = do
 			matcher <- largeFilesMatcher
 			checkFileMatcher' matcher file d
 	
-	checkwasannexed = pure $ isJust moldkey
+	checkheuristics = case moldkey of
+		Just _ -> return True
+		Nothing -> checkknowninode

-	isknownannexedinode = withTSDelta (liftIO . genInodeCache file) >>= \case
+	checkknowninode = withTSDelta (liftIO . genInodeCache file) >>= \case
 		Nothing -> pure False
 		Just ic -> Database.Keys.isInodeKnown ic =<< sentinalStatus

-	-- If the inode matches one known used for annexed content,
-	-- keep the file annexed. This handles a case where the file
-	-- has been annexed before, and the git is running the clean filter
-	-- again on it for whatever reason.
-	checkunchanged cont = ifM isknownannexedinode
-		( return True
-		, checkunchangedgitfile cont
-		)
-
 	-- This checks for a case where the file had been added to git
 	-- previously, not to the annex before, and its content is not
 	-- changed, but git is running the clean filter again on it
--- a/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn
+++ b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn
@ -115,3 +115,5 @@ Thanks for taking a look.

 [[!meta author=kyle]]
 [[!tag projects/datalad]]
+
+> commit reverted; [[done]] --[[Joey]]
--- a/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment
+++ b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment
@ -0,0 +1,44 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-05-10T16:04:07Z"
+ content="""
+On the second `git annex add foo`, I see:
+
+	add foo (non-large file; adding content to git repository) ok
+
+Which seems right as far as the largefiles config goes.
+
+That in turn runs `git add`, which runs the smudge filter. Due to my change
+in [[!commit 424bef6b6]], the smudge filter then sees an inode it already
+knows (because the file was unlocked), and so it avoids adding the file to
+git, and leaves it annexed.
+
+I don't quite understand how the different ways of running git commit play
+into this, or how the file ends up, after the commit, to having a
+non-annexed file recorded in the index. I guess the smudge filter must
+end up being run again, and perhaps the [[!commit 424bef6b6]] change
+doesn't work then, but anyway the behavior before that point is already a
+problem.
+
+Another instance of I think the same problem is following the
+[[tips/largefiles]] recipe to convert an annexed file to git:
+
+	git init repo; cd repo; git annex init
+	echo foo > file
+	git annex add file
+	git commit -m add
+	git annex unlock file
+	git rm --cached file
+	git annex add --force-small file
+	git commit -m convert file
+
+This results in a commit that unlocked the file but leaves
+it annexed, and after the commit, git diff --cached shows that the index
+has staged a conversion from unlocked to stored in git. So very similar,
+and clearly [[!commit 424bef6b6]] cannot be allowed to cause that breakage,
+which does not even involve largefiles!
+
+So I've reverted the commit for now, and we can discuss next steps in the
+forum thread.
+"""]]