enable filter.annex.process in v9

This has tradeoffs, but is generally a win, and users for whom it makes
git add unacceptably slow can just disable it again.

It needed to happen in an upgrade, since there are git-annex versions
that do not support it, and using such an old version with a v8
repository that has filter.annex.process set will cause bad behavior.
By enabling it in v9, it's guaranteed that any git-annex version that
can use the repository does support it. This is not perfect protection
against problems, though: if an old git-annex version is used with a
v9 repository, git add will try to run git-annex filter-process, which
will fail. But at least the user is unlikely to have an old git-annex
in PATH if they are using a v9 repository, since it won't work in that
repository.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2022-01-21 13:11:18 -04:00
parent fad11c2250
commit 47084b8a1d
GPG key ID: DB12DB0FF05F8F38
7 changed files with 31 additions and 8 deletions

@ -57,6 +57,11 @@ setVersion (RepoVersion v) = setConfig versionField (show v)
removeVersion :: Annex ()
removeVersion = unsetConfig versionField

versionSupportsFilterProcess :: Maybe RepoVersion -> Bool
versionSupportsFilterProcess (Just v)
	| v >= RepoVersion 9 = True
versionSupportsFilterProcess _ = False

versionNeedsWritableContentFiles :: Maybe RepoVersion -> Bool
versionNeedsWritableContentFiles (Just v)
	| v >= RepoVersion 10 = False
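To make the behavior of `versionSupportsFilterProcess` concrete: when the guard `v >= RepoVersion 9` fails, Haskell falls through to the next equation, so both a v8 repository and an unknown version (`Nothing`) yield `False`. The sketch below reproduces the function against a simplified `RepoVersion` newtype, which is an assumption standing in for git-annex's own type, so that it can be loaded into GHCi on its own.

```haskell
-- Simplified stand-in for git-annex's RepoVersion type (assumption:
-- the real type is defined elsewhere in git-annex and may differ).
newtype RepoVersion = RepoVersion Int
  deriving (Eq, Ord, Show)

-- Same shape as the function added in the diff. If the guard fails
-- for a Just value, matching continues with the catch-all equation.
versionSupportsFilterProcess :: Maybe RepoVersion -> Bool
versionSupportsFilterProcess (Just v)
  | v >= RepoVersion 9 = True
versionSupportsFilterProcess _ = False
```

In GHCi, `versionSupportsFilterProcess (Just (RepoVersion 8))` evaluates to `False`, while `versionSupportsFilterProcess (Just (RepoVersion 10))` evaluates to `True`, so v10 repositories keep the setting.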

@ -9,6 +9,11 @@ git-annex (10.20220121) UNRELEASED; urgency=medium
upgrade, in order to allow time for any old git-annex processes that
are not aware of the locking change to finish. Or git-annex upgrade
can be used to upgrade to v10 immediately.
* In v9, set filter.annex.process. This makes git add/checkout faster when
there are a lot of unlocked annexed files or non-annexed files, but can
also make git add of large files to the annex somewhat slower.
If this tradeoff does not work for your use case, you can still unset
filter.annex.process.
* export: When a non-annexed symlink is in the tree to be exported, skip it.
* import: When the previously exported tree contained a non-annexed symlink,
preserve it in the imported tree so it does not get deleted.

@ -1,6 +1,6 @@
{- Git smudge filter configuration
-
- - Copyright 2011-2019 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2022 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -16,6 +16,7 @@ import qualified Git.Command
import Git.Types
import Config
import Utility.Directory.Create
import Annex.Version
import qualified System.FilePath.ByteString as P
@ -32,6 +33,8 @@ configureSmudgeFilter = unlessM (fromRepo Git.repoIsLocalBare) $ do
	setConfig (ConfigKey "filter.annex.smudge") "git-annex smudge -- %f"
	setConfig (ConfigKey "filter.annex.clean") "git-annex smudge --clean -- %f"
	whenM (versionSupportsFilterProcess <$> getVersion)
		configureSmudgeFilterProcess
	lf <- Annex.fromRepo Git.attributesLocal
	gf <- Annex.fromRepo Git.attributes
	lfs <- readattr lf
@ -43,6 +46,10 @@ configureSmudgeFilter = unlessM (fromRepo Git.repoIsLocalBare) $ do
  where
	readattr = liftIO . catchDefaultIO "" . readFileStrict . fromRawFilePath

configureSmudgeFilterProcess :: Annex ()
configureSmudgeFilterProcess =
	setConfig (ConfigKey "filter.annex.process") "git-annex filter-process"

stdattr :: [String]
stdattr =
	[ "* filter=annex"
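The git config written by this hunk can be modeled as a pure function of the repository version. In the sketch below, `smudgeFilterConfig` is a hypothetical name, the `RepoVersion` newtype is a simplified stand-in, and the attributes-file handling that the real `configureSmudgeFilter` also performs is left out; the keys and values are the ones passed to `setConfig` in the diff.

```haskell
-- Simplified stand-in for git-annex's RepoVersion (assumption).
newtype RepoVersion = RepoVersion Int
  deriving (Eq, Ord, Show)

versionSupportsFilterProcess :: Maybe RepoVersion -> Bool
versionSupportsFilterProcess (Just v)
  | v >= RepoVersion 9 = True
versionSupportsFilterProcess _ = False

-- Hypothetical pure model of the config keys set by configureSmudgeFilter.
-- smudge and clean are always set; process only for v9 and above.
smudgeFilterConfig :: Maybe RepoVersion -> [(String, String)]
smudgeFilterConfig version =
  [ ("filter.annex.smudge", "git-annex smudge -- %f")
  , ("filter.annex.clean", "git-annex smudge --clean -- %f")
  ]
    ++ [ ("filter.annex.process", "git-annex filter-process")
       | versionSupportsFilterProcess version
       ]
```

For a v8 repository the model yields only the `smudge` and `clean` keys; for v9 and above it also yields `filter.annex.process`, matching the `whenM` guard in the hunk.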

@ -9,10 +9,13 @@ module Upgrade.V8 where
import Annex.Common
import Types.Upgrade
import Config.Smudge
upgrade :: Bool -> Annex UpgradeResult
upgrade automatic = do
	unless automatic $
		showAction "v8 to v9"
	configureSmudgeFilterProcess
	return UpgradeSuccess
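Both this upgrade step and `configureSmudgeFilter` call `configureSmudgeFilterProcess`, so a repository upgraded from v8 should end up with the same filter settings as one configured from scratch at v9. The sketch below checks that property on a toy model in which git config is an association list. `GitConfig`, `setKey`, `baseSmudgeFilter`, `freshV9`, `upgradeV8toV9` and `sameConfig` are hypothetical names, and real git config semantics (multi-valued keys, config scopes) are deliberately ignored.

```haskell
import Data.List (sort)

-- Hypothetical toy model: git config as (key, value) pairs.
type GitConfig = [(String, String)]

-- Set a key, replacing any earlier value (a simplification of
-- `git config <key> <value>` that ignores multi-valued keys).
setKey :: String -> String -> GitConfig -> GitConfig
setKey k v cfg = (k, v) : filter ((/= k) . fst) cfg

-- Mirrors configureSmudgeFilterProcess from the diff.
configureSmudgeFilterProcess :: GitConfig -> GitConfig
configureSmudgeFilterProcess =
  setKey "filter.annex.process" "git-annex filter-process"

-- The smudge/clean settings that are written regardless of version.
baseSmudgeFilter :: GitConfig -> GitConfig
baseSmudgeFilter =
  setKey "filter.annex.clean" "git-annex smudge --clean -- %f"
    . setKey "filter.annex.smudge" "git-annex smudge -- %f"

-- A repository configured from scratch at v9.
freshV9 :: GitConfig
freshV9 = configureSmudgeFilterProcess (baseSmudgeFilter [])

-- The v8 -> v9 upgrade step applied to an existing configuration.
upgradeV8toV9 :: GitConfig -> GitConfig
upgradeV8toV9 = configureSmudgeFilterProcess

-- Compare two configurations, ignoring the order of keys.
sameConfig :: GitConfig -> GitConfig -> Bool
sameConfig a b = sort a == sort b
```

Because `setKey` replaces rather than appends, running the upgrade step twice in this model leaves the configuration unchanged, which is the property an automatic upgrade needs.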

@ -17,7 +17,8 @@ to git. git-lfs uses it that way.
The first problem with the interface was that it ran a command once per
file. This was later fixed by extending it to support long-running filter
processes, which git-lfs uses. git-annex can also use that interface,
-when `git-annex filter-process` is enabled, but it does not by default.
+when `git-annex filter-process` is enabled. That is the case in v9
+repositories and above.
A second problem with the interface, which affects git-lfs AFAIK, is that
git buffers the output of the smudge filter in memory before updating the
@ -81,12 +82,12 @@ And here's the consequences of git-annex's workarounds:
* It doesn't use the long-running filter process interface by default,
so `git add` of a lot of files runs `git-annex smudge --clean` once per file,
which is slower than it could be. Using `git-annex add` avoids this problem.
-So does enabling `git-annex filter-process`.
+So does enabling `git-annex filter-process`, which is default in v9.
* After a git-annex get/drop or a git checkout or pull that affects a lot
of files, the clean filter gets run once per file, which is again, slower
than ideal. Enabling `git-annex filter-process` can speed this up
-in some cases.
+in some cases, and is default in v9.
* When `git-annex filter-process` is enabled, it cannot use the trick
described above that `git-annex smudge --clean` uses to avoid git
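The long-running filter process interface referred to above frames every message between git and the filter as pkt-lines: four hex digits giving the line's total length (counting those four bytes), followed by the payload, with `0000` as a flush packet that ends each group. The sketch below encodes the welcome message git sends when it starts a filter process (`git-filter-client`, then `version=2`, then a flush), with a trailing newline on each line. `pktLine`, `flushPkt` and `gitWelcome` are hypothetical names; the sketch assumes ASCII payloads, so that `String` length equals byte length, and does not enforce git's maximum packet size.

```haskell
import Numeric (showHex)

-- Encode one pkt-line: 4 hex digits holding the total length
-- (payload plus the 4 prefix bytes), then the payload itself.
-- Assumes ASCII payloads; no maximum-length check is done.
pktLine :: String -> String
pktLine payload = pad (showHex (length payload + 4) "") ++ payload
  where
    pad s = replicate (4 - length s) '0' ++ s

-- The flush packet that terminates a group of pkt-lines.
flushPkt :: String
flushPkt = "0000"

-- Welcome message git sends to a newly started filter process.
gitWelcome :: String
gitWelcome =
  concatMap pktLine ["git-filter-client\n", "version=2\n"] ++ flushPkt
```

For example, `pktLine "version=2\n"` is `"000eversion=2\n"`: ten payload bytes plus four prefix bytes gives 14, which is `000e` in hex.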

@ -1,7 +1,7 @@
-When `git-annex filter-process` is enabled, `git add` pipes the content of
-files into it, but that's thrown away, and the file is read again by git-annex
-to generate a hash. It would improve performance to hash the content
-provided via the pipe.
+When `git-annex filter-process` is enabled (v9 and above), `git add` pipes
+the content of files into it, but that's thrown away, and the file is read
+again by git-annex to generate a hash. It would improve performance to hash
+the content provided via the pipe.
When filter-process is not enabled, `git-annex smudge --clean` reads
the file to hash it, then reads it a second time to copy it into
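The todo's idea is that content git pipes into `git-annex filter-process` could be hashed as it streams past, instead of being discarded and read again from disk. That works because a streaming hash folded over the chunks gives the same digest as hashing the whole content at once. The sketch below shows that property with 64-bit FNV-1a, a swapped-in non-cryptographic hash chosen only because it fits in a few lines of base Haskell; it does not implement git-annex's actual key backends (such as SHA256), and `feedChunk`, `hashChunks` and `hashWhole` are hypothetical names.

```haskell
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word64, Word8)

-- 64-bit FNV-1a parameters (stand-in hash, not a git-annex backend).
fnvOffset, fnvPrime :: Word64
fnvOffset = 0xcbf29ce484222325
fnvPrime = 0x100000001b3

-- Fold one chunk of bytes into the running hash state.
-- Word64 multiplication wraps modulo 2^64, as FNV-1a requires.
feedChunk :: Word64 -> [Word8] -> Word64
feedChunk = foldl' step
  where
    step h b = (h `xor` fromIntegral b) * fnvPrime

-- Hash content as it arrives, chunk by chunk (e.g. from a pipe),
-- without needing to reread it afterwards.
hashChunks :: [[Word8]] -> Word64
hashChunks = foldl' feedChunk fnvOffset

-- Hash content that is available all at once, for comparison.
hashWhole :: [Word8] -> Word64
hashWhole = feedChunk fnvOffset
```

For instance, `hashChunks [[97, 98], [99]]` equals `hashWhole [97, 98, 99]`, and `hashWhole [97]` (the single byte `a`) is `0xaf63dc4c8601ec8c`, the standard FNV-1a 64-bit test vector.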

@ -18,3 +18,5 @@ could change and if it does, these things could be included.
seem worth it.
May want to implement [[incremental_hashing_for_add]] first.
[[done]] --[[Joey]]