Revert "smudge: check for known annexed inodes before checking annex.largefiles"

This reverts commit 424bef6b6f.

This commit caused other buggy behavior unfortunately.
This commit is contained in:
Joey Hess 2021-05-10 12:19:26 -04:00
parent 56ccc0302e
commit 72a8bbce12
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 65 additions and 30 deletions

View file

@ -2,9 +2,6 @@ git-annex (8.20210429) UNRELEASED; urgency=medium
* fromkey: Create an unlocked file when used in an adjusted branch
where the file should be unlocked, or when configured by annex.addunlocked.
* smudge: Fix a case where an unlocked annexed file that annex.largefiles
does not match could get its unchanged content checked into git,
due to git running the smudge filter unecessarily.
* Fix behavior of several commands, including reinject, addurl, and rmurl
when given an absolute path to an unlocked file, or a relative path
that leaves and re-enters the repository.

View file

@ -168,26 +168,25 @@ clean file = do
filepath <- liftIO $ absPath file
return $ not $ dirContains repopath filepath
-- If annex.largefiles is configured (and not disabled by annex.gitaddtoannex
-- being set to false), matching files are added to the annex and the rest to
-- git.
-- If annex.largefiles is configured, matching files are added to the
-- annex. But annex.gitaddtoannex can be set to false to disable that.
--
-- When annex.largefiles is not configured, files are normally not
-- added to the annex, so will be added to git. However, if the file
-- is annexed in the index, keep it annexed. This prevents accidental
-- conversions when previously annexed files get modified and added.
-- added to the annex, so will be added to git. But some heuristics
-- are used to avoid bad behavior:
--
-- In either case, if the file's inode is the same as one that was used
-- for annexed content before, annex it. And if the file is not annexed
-- in the index, and has the same content, leave it in git.
-- This handles cases such as renaming a file followed by git add,
-- which the user naturally expects to behave the same as git mv.
-- If the file is annexed in the index, keep it annexed.
-- This prevents accidental conversions.
--
-- Otherwise, when the file's inode is the same as one that was used for
-- annexed content before, annex it. This handles cases such as renaming an
-- unlocked annexed file followed by git add, which the user naturally
-- expects to behave the same as git mv.
shouldAnnex :: RawFilePath -> Maybe (Sha, FileSize, ObjectType) -> Maybe Key -> Annex Bool
shouldAnnex file indexmeta moldkey = do
ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
( checkunchanged $ checkmatcher checkwasannexed
, checkunchanged checkwasannexed
)
shouldAnnex file indexmeta moldkey = ifM (annexGitAddToAnnex <$> Annex.getGitConfig)
( checkunchangedgitfile $ checkmatcher checkheuristics
, checkunchangedgitfile checkheuristics
)
where
checkmatcher d
| dotfile file = ifM (getGitConfigVal annexDotFiles)
@ -200,21 +199,14 @@ shouldAnnex file indexmeta moldkey = do
matcher <- largeFilesMatcher
checkFileMatcher' matcher file d
checkwasannexed = pure $ isJust moldkey
checkheuristics = case moldkey of
Just _ -> return True
Nothing -> checkknowninode
isknownannexedinode = withTSDelta (liftIO . genInodeCache file) >>= \case
checkknowninode = withTSDelta (liftIO . genInodeCache file) >>= \case
Nothing -> pure False
Just ic -> Database.Keys.isInodeKnown ic =<< sentinalStatus
-- If the inode matches one known used for annexed content,
-- keep the file annexed. This handles a case where the file
-- has been annexed before, and the git is running the clean filter
-- again on it for whatever reason.
checkunchanged cont = ifM isknownannexedinode
( return True
, checkunchangedgitfile cont
)
-- This checks for a case where the file had been added to git
-- previously, not to the annex before, and its content is not
-- changed, but git is running the clean filter again on it

View file

@ -115,3 +115,5 @@ Thanks for taking a look.
[[!meta author=kyle]]
[[!tag projects/datalad]]
> commit reverted; [[done]] --[[Joey]]

View file

@ -0,0 +1,44 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-05-10T16:04:07Z"
content="""
On the second `git annex add foo`, I see:
add foo (non-large file; adding content to git repository) ok
Which seems right as far as the largefiles config goes.
That in turn runs `git add`, which runs the smudge filter. Due to my change
in [[!commit 424bef6b6]], the smudge filter then sees an inode it already
knows (because the file was unlocked), and so it avoids adding the file to
git, and leaves it annexed.
I don't quite understand how the different ways of running git commit play
into this, or how the file ends up, after the commit, to having a
non-annexed file recorded in the index. I guess the smudge filter must
end up being run again, and perhaps the [[!commit 424bef6b6]] change
doesn't work then, but anyway the behavior before that point is already a
problem.
Another instance of I think the same problem is following the
[[tips/largefiles]] recipe to convert an annexed file to git:
git init repo; cd repo; git annex init
echo foo > file
git annex add file
git commit -m add
git annex unlock file
git rm --cached file
git annex add --force-small file
git commit -m convert file
This results in a commit that unlocked the file but leaves
it annexed, and after the commit, git diff --cached shows that the index
has staged a conversion from unlocked to stored in git. So very similar,
and clearly [[!commit 424bef6b6]] cannot be allowed to cause that breakage,
which does not even involve largefiles!
So I've reverted the commit for now, and we can discuss next steps in the
forum thread.
"""]]