preserve unlocked file mtime when dropping

When dropping an unlocked file, preserve its mtime, which avoids git status
unncessarily running the clean filter on the file.

If the index file has close to the same mtime as a work tree file, git will
not trust the index to be up-to-date, and re-runs the clean filter
unncessarily. Preserving the mtime when depopulating a pointer file avoids
git status doing a little (or maybe a lot) of unncessary work.

There are other places that the mtime could be preserved, including other
places where pointer files are written perhaps, but also
populatePointerFile. But, I don't know of cases where those lead to git
status doing unncessary work, so I just fixed the one I'm aware of for now.
This commit is contained in:
Joey Hess 2019-10-08 14:01:12 -04:00
parent 7c91eb35f7
commit 1113caa53e
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 88 additions and 2 deletions

View file

@ -9,15 +9,20 @@
module Annex.Content.PointerFile where
#if ! defined(mingw32_HOST_OS)
import System.Posix.Files
#else
import System.PosixCompat.Files
#endif
import Annex.Common
import Annex.Perms
import Annex.Link
import Annex.ReplaceFile
import Annex.InodeSentinal
import Utility.InodeCache
import Annex.Content.LowLevel
import Utility.InodeCache
import Utility.Touch
{- Populates a pointer file with the content of a key.
-
@ -48,10 +53,18 @@ populatePointerFile restage k obj f = go =<< liftIO (isPointerFile f)
- Does not check if the pointer file is modified. -}
depopulatePointerFile :: Key -> FilePath -> Annex ()
depopulatePointerFile key file = do
mode <- liftIO $ catchMaybeIO $ fileMode <$> getFileStatus file
st <- liftIO $ catchMaybeIO $ getFileStatus file
let mode = fmap fileMode st
secureErase file
liftIO $ nukeFile file
ic <- replaceFile file $ \tmp -> do
liftIO $ writePointerFile tmp key mode
#if ! defined(mingw32_HOST_OS)
-- Don't advance mtime; this avoids unncessary re-smudging
-- by git in some cases.
liftIO $ maybe noop
(\t -> touch tmp t False)
(fmap modificationTimeHiRes st)
#endif
withTSDelta (liftIO . genInodeCache tmp)
maybe noop (restagePointerFile (Restage True) file) ic

View file

@ -19,6 +19,8 @@ git-annex (7.20190913) UNRELEASED; urgency=medium
it was only updated after 1000 files were checked, which may be more
files that are possible to fsck in a given fsck time window.
Thanks to Peter Simons for help with analysis of this bug.
* When dropping an unlocked file, preserve its mtime, which avoids
git status unncessarily running the clean filter on the file.
-- Joey Hess <id@joeyh.name> Thu, 19 Sep 2019 11:11:19 -0400

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-10-08T16:22:16Z"
content="""
When using v7 repositories, git will do this when there are unlocked files
that have had their content changed.
git only displays that message when it thinks its going to have to
do a lot of work (re-smudging a lot of files), so you'd mostly see this
happen in a larger repository that has had a lot of (unlocked) files change
in some way.
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-10-08T16:37:49Z"
content="""
I noticed that `git annex drop` of an unlocked file causes the next `git
status` to re-smudge the file. That's surprising because git-annex
internally updates the index using git update-index, so git should not see
any need to revisit it.
So my guess is you were getting or dropping a lot of unlocked files when
you saw that.
"""]]

View file

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-10-08T16:43:49Z"
content="""
Dropping a single unlocked file and then running git
status with `GIT_TRACE=1` replicates the additional smudge every time.
The index before and after git status smudging have identical content, so
git-annex seems to have updated it correctly already. It seems that the
mtime of the index file is causing git to do that one additional smudge.
Eg, the file git-annex dropped had a mtime of 13:02:43.716964483. The
~/index file that git-annex generated has a mtime of 13:02:43.752964130
which is newer, but only by a fraction of a second. So, git probably
assumes the mtimes are sufficiently close that it can't trust that the
index file really reflects the current content of the work tree file. And
so it re-smudges the work tree file unncessarily.
I have not been able to find a number of files to drop that replicates
the bug report. When a lot of files are dropped, it starts taking
sufficiently long to update the index file that it ends up with a newer
timestamp, which avoids the unncessary additional smudging. The worse
case I have found here is dropping 9 files causes all 9 to get re-smudged,
but that's not enough to get git to use the progress display.
"""]]

View file

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2019-10-08T17:32:52Z"
content="""
Anyway, if git-annex could preserve the mtime of an unlocked file when
writing its pointer file or when populating it with content, that would
avoid the unncessary smudging. (Which seems better than adding a delay when
updating the index file, or setting the index's mtime ahead of the current
time..)
That's easier done for pointer files than populated files, because
with annex.thin the content is a hard link and it would probably not be
good to change its mtime.
For now, I didn't do it extensively, but only in depopulatePointerFile.
That was enough to eliminate the unncessary smudging after drop that I was
seeing.
"""]]