fix reversion
add: Avoid unncessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surpises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
parent
d7cd8491b0
commit
6079b0c72c
4 changed files with 101 additions and 10 deletions
|
@ -28,6 +28,13 @@ git-annex (10.20220223) UNRELEASED; urgency=medium
|
|||
a new directory special remote, or change the configuration of your
|
||||
existing remote:
|
||||
git-annex enableremote foo ignoreinodes=yes
|
||||
* add: Avoid unncessarily converting a newly unlocked file to be stored
|
||||
in git when it is not modified, even when annex.largefiles does not
|
||||
match it.
|
||||
* The above change to add fixes a reversion in version 10.20220222,
|
||||
where git-annex unlock followed by git-annex add, followed by git
|
||||
commit file could result in git thinking the file was modified
|
||||
after the commit.
|
||||
|
||||
-- Joey Hess <id@joeyh.name> Wed, 23 Feb 2022 14:14:09 -0400
|
||||
|
||||
|
|
|
@ -78,16 +78,18 @@ seek o = startConcurrency commandStages $ do
|
|||
largematcher <- largeFilesMatcher
|
||||
addunlockedmatcher <- addUnlockedMatcher
|
||||
annexdotfiles <- getGitConfigVal annexDotFiles
|
||||
let gofile (si, file) = case largeFilesOverride o of
|
||||
let gofile includingsmall (si, file) = case largeFilesOverride o of
|
||||
Nothing ->
|
||||
ifM (pure (annexdotfiles || not (dotfile file))
|
||||
<&&> (checkFileMatcher largematcher file
|
||||
<||> Annex.getState Annex.force))
|
||||
( start o si file addunlockedmatcher
|
||||
, ifM (annexAddSmallFiles <$> Annex.getGitConfig)
|
||||
( startSmall o si file
|
||||
, stop
|
||||
)
|
||||
, if includingsmall
|
||||
then ifM (annexAddSmallFiles <$> Annex.getGitConfig)
|
||||
( startSmall o si file
|
||||
, stop
|
||||
)
|
||||
else stop
|
||||
)
|
||||
Just True -> start o si file addunlockedmatcher
|
||||
Just False -> startSmallOverridden o si file
|
||||
|
@ -96,7 +98,7 @@ seek o = startConcurrency commandStages $ do
|
|||
| updateOnly o ->
|
||||
giveup "--update --batch is not supported"
|
||||
| otherwise -> batchOnly Nothing (addThese o) $
|
||||
batchFiles fmt gofile
|
||||
batchFiles fmt (gofile True)
|
||||
NoBatch -> do
|
||||
-- Avoid git ls-files complaining about files that
|
||||
-- are not known to git yet, since this will add
|
||||
|
@ -104,11 +106,14 @@ seek o = startConcurrency commandStages $ do
|
|||
-- problems, like files that don't exist.
|
||||
let ww = WarnUnmatchWorkTreeItems
|
||||
l <- workTreeItems ww (addThese o)
|
||||
let go a = a ww (commandAction . gofile) l
|
||||
let go b a = a ww (commandAction . gofile b) l
|
||||
unless (updateOnly o) $
|
||||
go (withFilesNotInGit (checkGitIgnoreOption o))
|
||||
go withFilesMaybeModified
|
||||
go withUnmodifiedUnlockedPointers
|
||||
go True (withFilesNotInGit (checkGitIgnoreOption o))
|
||||
go True withFilesMaybeModified
|
||||
-- Convert newly unlocked files back to locked files,
|
||||
-- same as a modified unlocked file would get
|
||||
-- locked when added.
|
||||
go False withUnmodifiedUnlockedPointers
|
||||
|
||||
{- Pass file off to git-add. -}
|
||||
startSmall :: AddOptions -> SeekInput -> RawFilePath -> CommandStart
|
||||
|
|
|
@ -175,3 +175,5 @@ Lots! I love git-annex!
|
|||
|
||||
[[!meta author=bpoldrack]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
||||
|
|
|
@ -0,0 +1,77 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2022-03-21T17:33:28Z"
|
||||
content="""
|
||||
Isolated to [[!commit c68f52c6a25ebd8a12f4572da53fbba827d682aa]]
|
||||
|
||||
Since that commit is only a speed improvement, it can be reverted, which
|
||||
does avoid the problem.
|
||||
|
||||
But I don't really see a bug in that commit either. It only causes
|
||||
git to smudge the file when it's unlocked, rather than at some later
|
||||
point. The result should be the same either way. Except, of course,
|
||||
that you're varying the annex.largefiles config, so when git smudges
|
||||
the file later, it sees a different result than when it smudges the file
|
||||
while unlock is running.
|
||||
|
||||
I think that the actually surprising thing here is that
|
||||
`git-annex add somefile` re-adds the file (adding it to git per the new
|
||||
annex.largefiles configuration). The file is in typechange state at that
|
||||
point but is not modified, so it does not need to be re-added at all.
|
||||
But add seeks `withUnmodifiedUnlockedPointers` which finds files in
|
||||
typechange state.
|
||||
|
||||
So git-annex add decides to add the file to git even though it was
|
||||
previously added to the annex. At that point, I'd expect `git add` would
|
||||
update the index and replace the smudged content with the unsmudged
|
||||
content. However, it actually does not!
|
||||
Looking at `git ls-files --stage` before and after git-annex add,
|
||||
the annex pointer content remains staged:
|
||||
|
||||
100644 bf1114c25db2833499838552f3fa69635fcc1292 0 somefile
|
||||
add .gitattributes (non-large file; adding content to git repository) ok
|
||||
add somefile (non-large file; adding content to git repository) ok
|
||||
(recording state in git...)
|
||||
100644 bf1114c25db2833499838552f3fa69635fcc1292 0 somefile
|
||||
|
||||
The only reason that git commit goes on to commit the non-annexed
|
||||
content is because you pass somefile to it, so it re-adds somefile
|
||||
at that point, which runs the smudge filter, which passes the content
|
||||
through.
|
||||
|
||||
And that's why git status then sees the file as modified --
|
||||
the index has the annex pointer still staged in it, which is different
|
||||
from what got committed.
|
||||
|
||||
So, probably this same thing could happen in another situation, even if
|
||||
the change to git-annex unlock were reverted. Eg, if something else
|
||||
caused git smudge to run just after unlock, before the annex.largefiles
|
||||
config got changed.
|
||||
|
||||
Why is `git add` not updating the index? Presumably because the inode
|
||||
and mtime have not changed, so it does not think it needs to. Even though
|
||||
it's being run with the smudge filter disabled, and so sees a different
|
||||
hash than the one in the index. I think a case could be made for this
|
||||
being a bug in `git add`. Although maybe the `--renormalize` option
|
||||
is intended to handle this kind of situation.
|
||||
|
||||
The documented way for a user to change a file from being stored in git-annex
|
||||
to git is to first `git rm --cached` and then `git annex add --force-small`.
|
||||
Which avoids that `git add` behavior.
|
||||
|
||||
I tracked git-annex add's behavior here back to
|
||||
[[!commit 2743224658cee33657ad30653ff97de2363366c6]].
|
||||
Which addressed an old bug, filed by Kyle, which can be seen at
|
||||
[[!commit 5567496518]]. So it was intentional that unlocking a file
|
||||
and then `git annex add` should lock it, for consistency with v5
|
||||
and for consistency with unlocking a file
|
||||
and then modifying it, and then `git annex add`. I am actually a bit
|
||||
surprised at this behavior, but it was intended to handle locking files,
|
||||
not adding files to git.
|
||||
|
||||
So, I think it would be fine for `git-annex add` to skip over a typechanged
|
||||
file when it would be added to git rather than to the annex.
|
||||
|
||||
I've implemented that, and it does fix this bug.
|
||||
"""]]
|
Loading…
Reference in a new issue