omit inode from ContentIdentifier for directory special remote

Directory special remotes with importtree=yes now avoid unncessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.

A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.

I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.

There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.

This commit was sponsored by Brett Eisenberg on Patreon.
This commit is contained in:
Joey Hess 2021-01-19 12:57:15 -04:00
parent 7ccddd4aea
commit 73df633a62
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 34 additions and 11 deletions

View file

@ -30,6 +30,9 @@ git-annex (8.20201130) UNRELEASED; urgency=medium
Thanks, Grond for the patch. Thanks, Grond for the patch.
* Avoid crashing when there are remotes using unparseable urls. * Avoid crashing when there are remotes using unparseable urls.
Including the non-standard URI form that git-remote-gcrypt uses for rsync. Including the non-standard URI form that git-remote-gcrypt uses for rsync.
* Directory special remotes with importtree=yes now avoid unncessary
overhead when inodes of files have changed, as happens whenever a FAT
filesystem gets remounted.
-- Joey Hess <id@joeyh.name> Mon, 04 Jan 2021 12:52:41 -0400 -- Joey Hess <id@joeyh.name> Mon, 04 Jan 2021 12:52:41 -0400

View file

@ -1,6 +1,6 @@
{- A "remote" that is just a filesystem directory. {- A "remote" that is just a filesystem directory.
- -
- Copyright 2011-2020 Joey Hess <id@joeyh.name> - Copyright 2011-2021 Joey Hess <id@joeyh.name>
- -
- Licensed under the GNU AGPL version 3 or higher. - Licensed under the GNU AGPL version 3 or higher.
-} -}
@ -354,18 +354,21 @@ listImportableContentsM dir = liftIO $ do
sz <- getFileSize' f st sz <- getFileSize' f st
return $ Just (mkImportLocation relf, (cid, sz)) return $ Just (mkImportLocation relf, (cid, sz))
-- Make a ContentIdentifier that contains an InodeCache. -- Make a ContentIdentifier that contains the size and mtime of the file.
--
-- The InodeCache is generated without checking a sentinal file.
-- So in a case when a remount etc causes all the inodes to change,
-- files may appear to be modified when they are not, which will only
-- result in extra work to re-import them.
--
-- If the file is not a regular file, this will return Nothing. -- If the file is not a regular file, this will return Nothing.
--
-- The inode is zeroed because often this is used for import from a
-- FAT filesystem, whose inodes change each time it's mounted, and
-- including inodes would cause repeated re-hashing of files, and
-- bloat the git-annex branch with changes to content identifier logs.
--
-- This does mean that swaps of two files with the same size and
-- mtime won't be noticed, nor will modifications to files that
-- preserve the size and mtime. Both very unlikely so acceptable.
mkContentIdentifier :: RawFilePath -> FileStatus -> IO (Maybe ContentIdentifier) mkContentIdentifier :: RawFilePath -> FileStatus -> IO (Maybe ContentIdentifier)
mkContentIdentifier f st = mkContentIdentifier f st =
fmap (ContentIdentifier . encodeBS . showInodeCache) fmap (ContentIdentifier . encodeBS . showInodeCache)
<$> toInodeCache noTSDelta f st <$> toInodeCache' noTSDelta f st 0
guardSameContentIdentifiers :: a -> ContentIdentifier -> Maybe ContentIdentifier -> a guardSameContentIdentifiers :: a -> ContentIdentifier -> Maybe ContentIdentifier -> a
guardSameContentIdentifiers cont old new guardSameContentIdentifiers cont old new

View file

@ -24,6 +24,7 @@ module Utility.InodeCache (
showInodeCache, showInodeCache,
genInodeCache, genInodeCache,
toInodeCache, toInodeCache,
toInodeCache',
InodeCacheKey, InodeCacheKey,
inodeCacheToKey, inodeCacheToKey,
@ -189,7 +190,10 @@ genInodeCache f delta = catchDefaultIO Nothing $
toInodeCache delta f =<< R.getFileStatus f toInodeCache delta f =<< R.getFileStatus f
toInodeCache :: TSDelta -> RawFilePath -> FileStatus -> IO (Maybe InodeCache) toInodeCache :: TSDelta -> RawFilePath -> FileStatus -> IO (Maybe InodeCache)
toInodeCache (TSDelta getdelta) f s toInodeCache d f s = toInodeCache' d f s (fileID s)
toInodeCache' :: TSDelta -> RawFilePath -> FileStatus -> FileID -> IO (Maybe InodeCache)
toInodeCache' (TSDelta getdelta) f s inode
| isRegularFile s = do | isRegularFile s = do
delta <- getdelta delta <- getdelta
sz <- getFileSize' f s sz <- getFileSize' f s
@ -198,7 +202,7 @@ toInodeCache (TSDelta getdelta) f s
#else #else
let mtime = modificationTimeHiRes s let mtime = modificationTimeHiRes s
#endif #endif
return $ Just $ InodeCache $ InodeCachePrim (fileID s) sz (MTimeHighRes (mtime + highResTime delta)) return $ Just $ InodeCache $ InodeCachePrim inode sz (MTimeHighRes (mtime + highResTime delta))
| otherwise = pure Nothing | otherwise = pure Nothing
{- Some filesystem get new random inodes each time they are mounted. {- Some filesystem get new random inodes each time they are mounted.

View file

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-01-19T16:53:54Z"
content="""
Well, the changes I made for that todo make changes to inodes due to
remounting in the middle of an import not cause behavior like this. Of
course I don't know that's what caused this behavior, but it does seem
likely those changes would turn out to have fixed this, if we understood
how to reproduce the problem.
"""]]

View file

@ -26,3 +26,5 @@ an inode sentinal file and checking it to tell when inodes have changed
would need importing to write to the drive. That seems strange, and the would need importing to write to the drive. That seems strange, and the
drive could even be read-only. May be the directory special remote should drive could even be read-only. May be the directory special remote should
just not use inode numbers at all? just not use inode numbers at all?
> [[done]] --[[Joey]]