fix reversion
add: Avoid unnecessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it.

This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit.

I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surprises me, it seems likely it's wrong. But for now, this is the smallest possible fix.

Sponsored-by: Dartmouth College's Datalad project
parent d7cd8491b0
commit 6079b0c72c
4 changed files with 101 additions and 10 deletions
@@ -28,6 +28,13 @@ git-annex (10.20220223) UNRELEASED; urgency=medium
     a new directory special remote, or change the configuration of your
     existing remote:
       git-annex enableremote foo ignoreinodes=yes
+  * add: Avoid unncessarily converting a newly unlocked file to be stored
+    in git when it is not modified, even when annex.largefiles does not
+    match it.
+  * The above change to add fixes a reversion in version 10.20220222,
+    where git-annex unlock followed by git-annex add, followed by git
+    commit file could result in git thinking the file was modified
+    after the commit.
 
  -- Joey Hess <id@joeyh.name>  Wed, 23 Feb 2022 14:14:09 -0400
 
@@ -78,16 +78,18 @@ seek o = startConcurrency commandStages $ do
 	largematcher <- largeFilesMatcher
 	addunlockedmatcher <- addUnlockedMatcher
 	annexdotfiles <- getGitConfigVal annexDotFiles
-	let gofile (si, file) = case largeFilesOverride o of
+	let gofile includingsmall (si, file) = case largeFilesOverride o of
 		Nothing ->
 			ifM (pure (annexdotfiles || not (dotfile file))
 				<&&> (checkFileMatcher largematcher file
 				<||> Annex.getState Annex.force))
 				( start o si file addunlockedmatcher
-				, ifM (annexAddSmallFiles <$> Annex.getGitConfig)
-					( startSmall o si file
-					, stop
-					)
+				, if includingsmall
+					then ifM (annexAddSmallFiles <$> Annex.getGitConfig)
+						( startSmall o si file
+						, stop
+						)
+					else stop
 				)
 		Just True -> start o si file addunlockedmatcher
 		Just False -> startSmallOverridden o si file
@@ -96,7 +98,7 @@ seek o = startConcurrency commandStages $ do
 			| updateOnly o ->
 				giveup "--update --batch is not supported"
 			| otherwise -> batchOnly Nothing (addThese o) $
-				batchFiles fmt gofile
+				batchFiles fmt (gofile True)
 		NoBatch -> do
 			-- Avoid git ls-files complaining about files that
 			-- are not known to git yet, since this will add
@@ -104,11 +106,14 @@ seek o = startConcurrency commandStages $ do
 			-- problems, like files that don't exist.
 			let ww = WarnUnmatchWorkTreeItems
 			l <- workTreeItems ww (addThese o)
-			let go a = a ww (commandAction . gofile) l
+			let go b a = a ww (commandAction . gofile b) l
 			unless (updateOnly o) $
-				go (withFilesNotInGit (checkGitIgnoreOption o))
-			go withFilesMaybeModified
-			go withUnmodifiedUnlockedPointers
+				go True (withFilesNotInGit (checkGitIgnoreOption o))
+			go True withFilesMaybeModified
+			-- Convert newly unlocked files back to locked files,
+			-- same as a modified unlocked file would get
+			-- locked when added.
+			go False withUnmodifiedUnlockedPointers
 
 {- Pass file off to git-add. -}
 startSmall :: AddOptions -> SeekInput -> RawFilePath -> CommandStart
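The effect of the new `includingsmall` argument can be sketched as a pure decision function. This is a minimal model, not git-annex code: `Action`, `Env`, and `decide` are hypothetical names, and the record fields stand in for values the real `gofile` reads from the command options, the matchers, and the git config.

```haskell
-- What `git-annex add` ends up doing with one file (illustrative type,
-- not part of git-annex).
data Action = AddToAnnex | AddToGit | Skip
  deriving (Eq, Show)

-- Hypothetical stand-ins for the inputs that gofile consults.
data Env = Env
  { override          :: Maybe Bool -- largeFilesOverride o
  , dotfileAllowed    :: Bool       -- annexdotfiles || not (dotfile file)
  , matchesLargeFiles :: Bool       -- checkFileMatcher largematcher file
  , forced            :: Bool       -- Annex.getState Annex.force
  , addSmallFiles     :: Bool       -- annexAddSmallFiles from the git config
  }

-- Mirrors the branch structure of gofile after this commit. The first
-- argument plays the role of includingsmall: when it is False, a file
-- that would otherwise be handed to startSmall is skipped instead.
decide :: Bool -> Env -> Action
decide includingSmall env = case override env of
  Just True  -> AddToAnnex
  Just False -> AddToGit
  Nothing
    | dotfileAllowed env && (matchesLargeFiles env || forced env) -> AddToAnnex
    | includingSmall && addSmallFiles env -> AddToGit
    | otherwise -> Skip
```

In this model the `withUnmodifiedUnlockedPointers` pass corresponds to `decide False`, so an unmodified unlocked file that no longer matches annex.largefiles yields `Skip`, while the other two passes use `decide True` and behave as before.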
@@ -175,3 +175,5 @@ Lots! I love git-annex!
 
 [[!meta author=bpoldrack]]
 [[!tag projects/datalad]]
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,77 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2022-03-21T17:33:28Z"
+ content="""
+Isolated to [[!commit c68f52c6a25ebd8a12f4572da53fbba827d682aa]]
+
+Since that commit is only a speed improvement, it can be reverted, which
+does avoid the problem.
+
+But I don't really see a bug in that commit either. It only causes
+git to smudge the file when it's unlocked, rather than at some later
+point. The result should be the same either way. Except, of course,
+that you're varying the annex.largefiles config, so when git smudges
+the file later, it sees a different result than when it smudges the file
+while unlock is running.
+
+I think that the actually surprising thing here is that
+`git-annex add somefile` re-adds the file (adding it to git per the new
+annex.largefiles configuration). The file is in typechange state at that
+point but is not modified, so it does not need to be re-added at all.
+But add seeks `withUnmodifiedUnlockedPointers` which finds files in
+typechange state.
+
+So git-annex add decides to add the file to git even though it was
+previously added to the annex. At that point, I'd expect `git add` would
+update the index and replace the smudged content with the unsmudged
+content. However, it actually does not!
+Looking at `git ls-files --stage` before and after git-annex add,
+the annex pointer content remains staged:
+
+	100644 bf1114c25db2833499838552f3fa69635fcc1292 0	somefile
+	add .gitattributes (non-large file; adding content to git repository) ok
+	add somefile (non-large file; adding content to git repository) ok
+	(recording state in git...)
+	100644 bf1114c25db2833499838552f3fa69635fcc1292 0	somefile
+
+The only reason that git commit goes on to commit the non-annexed
+content is because you pass somefile to it, so it re-adds somefile
+at that point, which runs the smudge filter, which passes the content
+through.
+
+And that's why git status then sees the file as modified --
+the index has the annex pointer still staged in it, which is different
+from what got committed.
+
+So, probably this same thing could happen in another situation, even if
+the change to git-annex unlock were reverted. Eg, if something else
+caused git smudge to run just after unlock, before the annex.largefiles
+config got changed.
+
+Why is `git add` not updating the index? Presumably because the inode
+and mtime have not changed, so it does not think it needs to. Even though
+it's being run with the smudge filter disabled, and so sees a different
+hash than the one in the index. I think a case could be made for this
+being a bug in `git add`. Although maybe the `--renormalize` option
+is intended to handle this kind of situation.
+
+The documented way for a user to change a file from being stored in git-annex
+to git is to first `git rm --cached` and then `git annex add --force-small`.
+Which avoids that `git add` behavior.
+
+I tracked git-annex add's behavior here back to
+[[!commit 2743224658cee33657ad30653ff97de2363366c6]].
+Which addressed an old bug, filed by Kyle, which can be seen at
+[[!commit 5567496518]]. So it was intentional that unlocking a file
+and then `git annex add` should lock it, for consistency with v5
+and for consistency with unlocking a file
+and then modifying it, and then `git annex add`. I am actually a bit
+surprised at this behavior, but it was intended to handle locking files,
+not adding files to git.
+
+So, I think it would be fine for `git-annex add` to skip over a typechanged
+file when it would be added to git rather than to the annex.
+
+I've implemented that, and it does fix this bug.
+"""]]
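The comment's guess about why `git add` leaves the index untouched can be illustrated with a toy model. This is only a sketch of the presumed behavior ("inode and mtime have not changed, so it does not think it needs to"), not git's actual implementation; `StatInfo`, `IndexEntry`, and `reAdd` are hypothetical names introduced for illustration.

```haskell
-- Stat data that an index entry is assumed to cache (toy model).
data StatInfo = StatInfo
  { inode :: Integer
  , mtime :: Integer
  } deriving (Eq, Show)

-- A staged file: the cached stat data plus the hash of what is staged.
data IndexEntry = IndexEntry
  { cachedStat :: StatInfo
  , stagedHash :: String
  } deriving (Eq, Show)

-- Model of re-adding a file. When the file's current stat data matches
-- what the index cached, the entry is kept as-is and the freshly
-- computed hash is never consulted, even if it would differ.
reAdd :: IndexEntry -> StatInfo -> String -> IndexEntry
reAdd entry currentStat freshHash
  | cachedStat entry == currentStat = entry
  | otherwise = IndexEntry currentStat freshHash
```

Under this model, an entry holding the annex pointer hash survives a re-add with unchanged stat data, which matches the identical `git ls-files --stage` lines shown before and after `git-annex add` in the comment.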