smudge: check for known annexed inodes before checking annex.largefiles
smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily.

When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git.

Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks whether it has the same hash in the index.

It would be possible to generate a key for the file and check whether it is the same as the old key. However, that could be considerably more expensive than a sha1 of a small file, and it is not necessary for the case I have, at least, where the file is not modified or touched, so its inode will match the cache.
parent f2876804ca
commit 424bef6b6f
3 changed files with 43 additions and 19 deletions
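The commit message relies on git-annex's inode cache. A file whose inode cache matches one recorded for annexed content is assumed to be unmodified and untouched. Here is a minimal Haskell sketch of that idea. It assumes a cache made of inode number, size and modification time. The `InodeCache` record, `matchesKnownInode` and `touch` are hypothetical stand-ins for illustration, not git-annex's own definitions.

```haskell
-- Hypothetical sketch of the inode-cache idea; not git-annex's InodeCache.
-- Assumption: a cache entry is (inode number, size, mtime).
import qualified Data.Set as Set

data InodeCache = InodeCache
  { icInode :: Integer -- inode number
  , icSize  :: Integer -- size in bytes
  , icMtime :: Integer -- modification time, in seconds
  } deriving (Show, Eq, Ord)

-- Caches remembered for content that was annexed earlier.
type KnownInodes = Set.Set InodeCache

-- A match means the file still has the same inode, size and mtime as
-- annexed content, so it is assumed to be unmodified and untouched.
matchesKnownInode :: KnownInodes -> InodeCache -> Bool
matchesKnownInode known ic = Set.member ic known

-- What `touch` does to the cache: same inode and size, new mtime.
touch :: Integer -> InodeCache -> InodeCache
touch now ic = ic { icMtime = now }
```

Take `ic = InodeCache 1234 42 1620000000` and `known = Set.fromList [ic]`. In GHCi, `matchesKnownInode known ic` is `True`, while `matchesKnownInode known (touch 1620000100 ic)` is `False`. This matches the scope the commit message describes: the check only covers files that are neither modified nor touched, and a touched file falls through to the other checks.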
@@ -2,6 +2,9 @@ git-annex (8.20210429) UNRELEASED; urgency=medium
 
   * fromkey: Create an unlocked file when used in an adjusted branch
     where the file should be unlocked, or when configured by annex.addunlocked.
+  * smudge: Fix a case where an unlocked annexed file that annex.largefiles
+    does not match could get its unchanged content checked into git,
+    due to git running the smudge filter unecessarily.
 
  -- Joey Hess <id@joeyh.name>  Mon, 03 May 2021 10:33:10 -0400
 
@@ -168,25 +168,26 @@ clean file = do
 		filepath <- liftIO $ absPath file
 		return $ not $ dirContains repopath filepath
 
--- If annex.largefiles is configured, matching files are added to the
--- annex. But annex.gitaddtoannex can be set to false to disable that.
+-- If annex.largefiles is configured (and not disabled by annex.gitaddtoannex
+-- being set to false), matching files are added to the annex and the rest to
+-- git.
 --
 -- When annex.largefiles is not configured, files are normally not
--- added to the annex, so will be added to git. But some heuristics
--- are used to avoid bad behavior:
+-- added to the annex, so will be added to git. However, if the file
+-- is annexed in the index, keep it annexed. This prevents accidental
+-- conversions when previously annexed files get modified and added.
 --
--- If the file is annexed in the index, keep it annexed.
--- This prevents accidental conversions.
---
--- Otherwise, when the file's inode is the same as one that was used for
--- annexed content before, annex it. This handles cases such as renaming an
--- unlocked annexed file followed by git add, which the user naturally
--- expects to behave the same as git mv.
+-- In either case, if the file's inode is the same as one that was used
+-- for annexed content before, annex it. And if the file is not annexed
+-- in the index, and has the same content, leave it in git.
+-- This handles cases such as renaming a file followed by git add,
+-- which the user naturally expects to behave the same as git mv.
 shouldAnnex :: RawFilePath -> Maybe (Sha, FileSize, ObjectType) -> Maybe Key -> Annex Bool
-shouldAnnex file indexmeta moldkey = ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
-	( checkunchangedgitfile $ checkmatcher checkheuristics
-	, checkunchangedgitfile checkheuristics
-	)
+shouldAnnex file indexmeta moldkey = do
+	ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
+		( checkunchanged $ checkmatcher checkwasannexed
+		, checkunchanged checkwasannexed
+		)
   where
 	checkmatcher d
 		| dotfile file = ifM (getGitConfigVal annexDotFiles)
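This hunk changes the order of the checks, and a small pure model makes the effect easy to verify. The sketch is hypothetical. The `Situation` record, `oldShouldAnnex` and `newShouldAnnex` reduce each check of the real code to a Bool: annex.gitaddtoannex, annex.largefiles, `moldkey`, the known-inode lookup, and `checkunchangedgitfile`. Dotfile handling and all I/O are left out. Take an unlocked annexed file with a known inode that annex.largefiles is configured but does not match. The old order sends this file to git, and the new order keeps it annexed.

```haskell
-- Hypothetical model of shouldAnnex's decision order; not git-annex code.
-- Dotfile handling and all I/O are deliberately left out.
import Data.Maybe (fromMaybe)

data Situation = Situation
  { gitAddToAnnex    :: Bool       -- annex.gitaddtoannex
  , largefiles       :: Maybe Bool -- Nothing: not configured; Just m: does it match?
  , wasAnnexed       :: Bool       -- annexed in the index (moldkey is Just)
  , inodeKnown       :: Bool       -- inode cache matches annexed content
  , unchangedGitFile :: Bool       -- already in git with the same content
  } deriving Show

-- Like checkmatcher: annex.largefiles decides if configured, else the fallback.
checkmatcher :: Situation -> Bool -> Bool
checkmatcher s fallback = fromMaybe fallback (largefiles s)

-- Before this commit: the inode check was only a heuristic for files that
-- annex.largefiles did not decide on.
oldShouldAnnex :: Situation -> Bool
oldShouldAnnex s
  | unchangedGitFile s = False
  | gitAddToAnnex s    = checkmatcher s heuristics
  | otherwise          = heuristics
  where
    heuristics = wasAnnexed s || inodeKnown s

-- After this commit: a known annexed inode wins before anything else.
newShouldAnnex :: Situation -> Bool
newShouldAnnex s
  | inodeKnown s       = True
  | unchangedGitFile s = False
  | gitAddToAnnex s    = checkmatcher s (wasAnnexed s)
  | otherwise          = wasAnnexed s

-- An unlocked annexed file that annex.largefiles does not match,
-- whose inode still matches the annexed content.
bugCase :: Situation
bugCase = Situation
  { gitAddToAnnex = True
  , largefiles = Just False
  , wasAnnexed = True
  , inodeKnown = True
  , unchangedGitFile = False
  }
```

`oldShouldAnnex bugCase` evaluates to `False`, meaning the content would be checked into git. `newShouldAnnex bugCase` evaluates to `True`. Because each check is a plain Bool, the two orderings can be compared case by case.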
@@ -199,14 +200,21 @@ shouldAnnex file indexmeta moldkey = ifM (annexGitAddToAnnex <$> Annex.getGitCon
 			matcher <- largeFilesMatcher
 			checkFileMatcher' matcher file d
 
-	checkheuristics = case moldkey of
-		Just _ -> return True
-		Nothing -> checkknowninode
+	checkwasannexed = pure $ isJust moldkey
 
-	checkknowninode = withTSDelta (liftIO . genInodeCache file) >>= \case
+	isknownannexedinode = withTSDelta (liftIO . genInodeCache file) >>= \case
 		Nothing -> pure False
 		Just ic -> Database.Keys.isInodeKnown ic =<< sentinalStatus
 
+	-- If the inode matches one known used for annexed content,
+	-- keep the file annexed. This handles a case where the file
+	-- has been annexed before, and the git is running the clean filter
+	-- again on it for whatever reason.
+	checkunchanged cont = ifM isknownannexedinode
+		( return True
+		, checkunchangedgitfile cont
+		)
+
 	-- This checks for a case where the file had been added to git
 	-- previously, not to the annex before, and its content is not
 	-- changed, but git is running the clean filter again on it
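The new `checkunchanged` wraps the remaining checks as a continuation. When `isknownannexedinode` succeeds, neither `checkunchangedgitfile` nor the continuation runs. Here is a minimal sketch of that control flow in plain `IO`. It assumes an `ifM` with the `m Bool -> (m a, m a) -> m a` shape used in the diff, defined locally here. The `unchangedGitFile`, `traced` and `runChain` helpers are hypothetical. They return fixed answers and record which checks actually run.

```haskell
-- Hypothetical sketch of the continuation structure of checkunchanged.
-- The checks are stand-ins with fixed answers; none of this is git-annex code.
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Local ifM: run the condition, then one of the two branches.
ifM :: Monad m => m Bool -> (m a, m a) -> m a
ifM cond (yes, no) = do
  c <- cond
  if c then yes else no

-- Same shape as the diff: a known annexed inode short-circuits to True.
checkunchanged :: IO Bool -> (IO Bool -> IO Bool) -> IO Bool -> IO Bool
checkunchanged isknownannexedinode checkunchangedgitfile cont =
  ifM isknownannexedinode
    ( return True
    , checkunchangedgitfile cont
    )

-- Stand-in for checkunchangedgitfile: an unchanged git file stays in git,
-- anything else is decided by the continuation.
unchangedGitFile :: IO Bool -> IO Bool -> IO Bool
unchangedGitFile isunchanged cont = ifM isunchanged (return False, cont)

-- A fake check that logs its name when it runs, then returns its answer.
traced :: IORef [String] -> String -> Bool -> IO Bool
traced logRef name answer = do
  modifyIORef logRef (++ [name])
  return answer

-- Run the chain with fixed answers; return the decision and the checks that ran.
runChain :: Bool -> Bool -> Bool -> IO (Bool, [String])
runChain inodeKnown gitUnchanged contAnswer = do
  logRef <- newIORef []
  decision <- checkunchanged
    (traced logRef "inode" inodeKnown)
    (unchangedGitFile (traced logRef "githash" gitUnchanged))
    (traced logRef "cont" contAnswer)
  ran <- readIORef logRef
  return (decision, ran)
```

For instance, `runChain True False False` evaluates to `(True, ["inode"])`, because the known inode short-circuits the chain. `runChain False False True` evaluates to `(True, ["inode", "githash", "cont"])`, because neither earlier check decided. The ordering has a second benefit besides fixing the bug. An unchanged annexed file is settled by the inode lookup (`genInodeCache` plus `Database.Keys.isInodeKnown` in the diff) before `checkunchangedgitfile` would hash its content.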
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2021-05-03T16:26:26Z"
+ content="""
+I think this is a bug.
+
+The smudge/clean filter already handles several similar cases so ought to
+also be able to handle this one.
+
+I've made a change that seems to work, and will *probably* not break other
+cases, although this is a complex and subtle area.
+"""]]