From 72a8bbce124f84c7f104497875bd8e84992ab998 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 10 May 2021 12:19:26 -0400 Subject: [PATCH] Revert "smudge: check for known annexed inodes before checking annex.largefiles" This reverts commit 424bef6b6f43ba9e09b61faa337fc6c9104e1a1f. This commit caused other buggy behavior unfortunately. --- CHANGELOG | 3 -- Command/Smudge.hs | 46 ++++++++----------- ...ing_pathspec_with_git-commit_leaves_s.mdwn | 2 + ..._a71ce38ec25075ac7037550094da2bf7._comment | 44 ++++++++++++++++++ 4 files changed, 65 insertions(+), 30 deletions(-) create mode 100644 doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment diff --git a/CHANGELOG b/CHANGELOG index 4f73fbcec7..b5d2290e43 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -2,9 +2,6 @@ git-annex (8.20210429) UNRELEASED; urgency=medium * fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. - * smudge: Fix a case where an unlocked annexed file that annex.largefiles - does not match could get its unchanged content checked into git, - due to git running the smudge filter unecessarily. * Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. diff --git a/Command/Smudge.hs b/Command/Smudge.hs index cbecd055f6..4903ce4a4f 100644 --- a/Command/Smudge.hs +++ b/Command/Smudge.hs @@ -168,26 +168,25 @@ clean file = do filepath <- liftIO $ absPath file return $ not $ dirContains repopath filepath --- If annex.largefiles is configured (and not disabled by annex.gitaddtoannex --- being set to false), matching files are added to the annex and the rest to --- git. +-- If annex.largefiles is configured, matching files are added to the +-- annex. But annex.gitaddtoannex can be set to false to disable that. -- -- When annex.largefiles is not configured, files are normally not --- added to the annex, so will be added to git. However, if the file --- is annexed in the index, keep it annexed. This prevents accidental --- conversions when previously annexed files get modified and added. +-- added to the annex, so will be added to git. But some heuristics +-- are used to avoid bad behavior: -- --- In either case, if the file's inode is the same as one that was used --- for annexed content before, annex it. And if the file is not annexed --- in the index, and has the same content, leave it in git. --- This handles cases such as renaming a file followed by git add, --- which the user naturally expects to behave the same as git mv. +-- If the file is annexed in the index, keep it annexed. +-- This prevents accidental conversions. +-- +-- Otherwise, when the file's inode is the same as one that was used for +-- annexed content before, annex it. This handles cases such as renaming an +-- unlocked annexed file followed by git add, which the user naturally +-- expects to behave the same as git mv. shouldAnnex :: RawFilePath -> Maybe (Sha, FileSize, ObjectType) -> Maybe Key -> Annex Bool -shouldAnnex file indexmeta moldkey = do - ifM (annexGitAddToAnnex <$> Annex.getGitConfig) - ( checkunchanged $ checkmatcher checkwasannexed - , checkunchanged checkwasannexed - ) +shouldAnnex file indexmeta moldkey = ifM (annexGitAddToAnnex <$> Annex.getGitConfig) + ( checkunchangedgitfile $ checkmatcher checkheuristics + , checkunchangedgitfile checkheuristics + ) where checkmatcher d | dotfile file = ifM (getGitConfigVal annexDotFiles) @@ -200,21 +199,14 @@ shouldAnnex file indexmeta moldkey = do matcher <- largeFilesMatcher checkFileMatcher' matcher file d - checkwasannexed = pure $ isJust moldkey + checkheuristics = case moldkey of + Just _ -> return True + Nothing -> checkknowninode - isknownannexedinode = withTSDelta (liftIO . genInodeCache file) >>= \case + checkknowninode = withTSDelta (liftIO . genInodeCache file) >>= \case Nothing -> pure False Just ic -> Database.Keys.isInodeKnown ic =<< sentinalStatus - -- If the inode matches one known used for annexed content, - -- keep the file annexed. This handles a case where the file - -- has been annexed before, and the git is running the clean filter - -- again on it for whatever reason. - checkunchanged cont = ifM isknownannexedinode - ( return True - , checkunchangedgitfile cont - ) - -- This checks for a case where the file had been added to git -- previously, not to the annex before, and its content is not -- changed, but git is running the clean filter again on it diff --git a/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn index 862bb1fc8d..a9d33dff6e 100644 --- a/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn +++ b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn @@ -115,3 +115,5 @@ Thanks for taking a look. [[!meta author=kyle]] [[!tag projects/datalad]] + +> commit reverted; [[done]] --[[Joey]] diff --git a/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment new file mode 100644 index 0000000000..d4776c2799 --- /dev/null +++ b/doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s/comment_1_a71ce38ec25075ac7037550094da2bf7._comment @@ -0,0 +1,44 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2021-05-10T16:04:07Z" + content=""" +On the second `git annex add foo`, I see: + + add foo (non-large file; adding content to git repository) ok + +Which seems right as far as the largefiles config goes. + +That in turn runs `git add`, which runs the smudge filter. Due to my change +in [[!commit 424bef6b6]], the smudge filter then sees an inode it already +knows (because the file was unlocked), and so it avoids adding the file to +git, and leaves it annexed. + +I don't quite understand how the different ways of running git commit play +into this, or how the file ends up, after the commit, to having a +non-annexed file recorded in the index. I guess the smudge filter must +end up being run again, and perhaps the [[!commit 424bef6b6]] change +doesn't work then, but anyway the behavior before that point is already a +problem. + +Another instance of I think the same problem is following the +[[tips/largefiles]] recipe to convert an annexed file to git: + + git init repo; cd repo; git annex init + echo foo > file + git annex add file + git commit -m add + git annex unlock file + git rm --cached file + git annex add --force-small file + git commit -m convert file + +This results in a commit that unlocked the file but leaves +it annexed, and after the commit, git diff --cached shows that the index +has staged a conversion from unlocked to stored in git. So very similar, +and clearly [[!commit 424bef6b6]] cannot be allowed to cause that breakage, +which does not even involve largefiles! + +So I've reverted the commit for now, and we can discuss next steps in the +forum thread. +"""]]