fix reversion

add: Avoid unncessarily converting a newly unlocked file to be stored
in git when it is not modified, even when annex.largefiles does not
match it.

This fixes a reversion in version 10.20220222, where git-annex unlock
followed by git-annex add, followed by git commit file could result in
git thinking the file was modified after the commit.

I do have half a mind to remove the withUnmodifiedUnlockedPointers part
of git-annex add. It seems weird, despite that old bug report arguing
a case of consistency that it ought to behave that way. When git-annex
add surpises me, it seems likely it's wrong.. But for now, this is the
smallest possible fix.

Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
Joey Hess 2022-03-21 15:50:27 -04:00
parent d7cd8491b0
commit 6079b0c72c
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 101 additions and 10 deletions

View file

@ -28,6 +28,13 @@ git-annex (10.20220223) UNRELEASED; urgency=medium
a new directory special remote, or change the configuration of your
existing remote:
git-annex enableremote foo ignoreinodes=yes
* add: Avoid unncessarily converting a newly unlocked file to be stored
in git when it is not modified, even when annex.largefiles does not
match it.
* The above change to add fixes a reversion in version 10.20220222,
where git-annex unlock followed by git-annex add, followed by git
commit file could result in git thinking the file was modified
after the commit.
-- Joey Hess <id@joeyh.name> Wed, 23 Feb 2022 14:14:09 -0400

View file

@ -78,16 +78,18 @@ seek o = startConcurrency commandStages $ do
largematcher <- largeFilesMatcher
addunlockedmatcher <- addUnlockedMatcher
annexdotfiles <- getGitConfigVal annexDotFiles
let gofile (si, file) = case largeFilesOverride o of
let gofile includingsmall (si, file) = case largeFilesOverride o of
Nothing ->
ifM (pure (annexdotfiles || not (dotfile file))
<&&> (checkFileMatcher largematcher file
<||> Annex.getState Annex.force))
( start o si file addunlockedmatcher
, ifM (annexAddSmallFiles <$> Annex.getGitConfig)
( startSmall o si file
, stop
)
, if includingsmall
then ifM (annexAddSmallFiles <$> Annex.getGitConfig)
( startSmall o si file
, stop
)
else stop
)
Just True -> start o si file addunlockedmatcher
Just False -> startSmallOverridden o si file
@ -96,7 +98,7 @@ seek o = startConcurrency commandStages $ do
| updateOnly o ->
giveup "--update --batch is not supported"
| otherwise -> batchOnly Nothing (addThese o) $
batchFiles fmt gofile
batchFiles fmt (gofile True)
NoBatch -> do
-- Avoid git ls-files complaining about files that
-- are not known to git yet, since this will add
@ -104,11 +106,14 @@ seek o = startConcurrency commandStages $ do
-- problems, like files that don't exist.
let ww = WarnUnmatchWorkTreeItems
l <- workTreeItems ww (addThese o)
let go a = a ww (commandAction . gofile) l
let go b a = a ww (commandAction . gofile b) l
unless (updateOnly o) $
go (withFilesNotInGit (checkGitIgnoreOption o))
go withFilesMaybeModified
go withUnmodifiedUnlockedPointers
go True (withFilesNotInGit (checkGitIgnoreOption o))
go True withFilesMaybeModified
-- Convert newly unlocked files back to locked files,
-- same as a modified unlocked file would get
-- locked when added.
go False withUnmodifiedUnlockedPointers
{- Pass file off to git-add. -}
startSmall :: AddOptions -> SeekInput -> RawFilePath -> CommandStart

View file

@ -175,3 +175,5 @@ Lots! I love git-annex!
[[!meta author=bpoldrack]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,77 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-03-21T17:33:28Z"
content="""
Isolated to [[!commit c68f52c6a25ebd8a12f4572da53fbba827d682aa]]
Since that commit is only a speed improvement, it can be reverted, which
does avoid the problem.
But I don't really see a bug in that commit either. It only causes
git to smudge the file when it's unlocked, rather than at some later
point. The result should be the same either way. Except, of course,
that you're varying the annex.largefiles config, so when git smudges
the file later, it sees a different result than when it smudges the file
while unlock is running.
I think that the actually surprising thing here is that
`git-annex add somefile` re-adds the file (adding it to git per the new
annex.largefiles configuration). The file is in typechange state at that
point but is not modified, so it does not need to be re-added at all.
But add seeks `withUnmodifiedUnlockedPointers` which finds files in
typechange state.
So git-annex add decides to add the file to git even though it was
previously added to the annex. At that point, I'd expect `git add` would
update the index and replace the smudged content with the unsmudged
content. However, it actually does not!
Looking at `git ls-files --stage` before and after git-annex add,
the annex pointer content remains staged:
100644 bf1114c25db2833499838552f3fa69635fcc1292 0 somefile
add .gitattributes (non-large file; adding content to git repository) ok
add somefile (non-large file; adding content to git repository) ok
(recording state in git...)
100644 bf1114c25db2833499838552f3fa69635fcc1292 0 somefile
The only reason that git commit goes on to commit the non-annexed
content is because you pass somefile to it, so it re-adds somefile
at that point, which runs the smudge filter, which passes the content
through.
And that's why git status then sees the file as modified --
the index has the annex pointer still staged in it, which is different
from what got committed.
So, probably this same thing could happen in another situation, even if
the change to git-annex unlock were reverted. Eg, if something else
caused git smudge to run just after unlock, before the annex.largefiles
config got changed.
Why is `git add` not updating the index? Presumably because the inode
and mtime have not changed, so it does not think it needs to. Even though
it's being run with the smudge filter disabled, and so sees a different
hash than the one in the index. I think a case could be made for this
being a bug in `git add`. Although maybe the `--renormalize` option
is intended to handle this kind of situation.
The documented way for a user to change a file from being stored in git-annex
to git is to first `git rm --cached` and then `git annex add --force-small`.
Which avoids that `git add` behavior.
I tracked git-annex add's behavior here back to
[[!commit 2743224658cee33657ad30653ff97de2363366c6]].
Which addressed an old bug, filed by Kyle, which can be seen at
[[!commit 5567496518]]. So it was intentional that unlocking a file
and then `git annex add` should lock it, for consistency with v5
and for consistency with unlocking a file
and then modifying it, and then `git annex add`. I am actually a bit
surprised at this behavior, but it was intended to handle locking files,
not adding files to git.
So, I think it would be fine for `git-annex add` to skip over a typechanged
file when it would be added to git rather than to the annex.
I've implemented that, and it does fix this bug.
"""]]