git-remote-annex: brought back max-git-bundles config

An incremental push that gets converted to a full push due to this
config results in the inManifest having just one bundle in it, and the
outManifest listing every other bundle. So it actually takes up more
space on the special remote. But, it speeds up clone and fetch to not
have to download a long series of bundles for incremental pushes.
Joey Hess 2024-05-28 13:26:21 -04:00
parent ce95cac195
commit 2ffe077cc2
GPG key ID: DB12DB0FF05F8F38
4 changed files with 54 additions and 30 deletions


@@ -273,6 +273,10 @@ fullPush :: State -> Remote -> [Ref] -> Annex (Bool, State)
fullPush st rmt refs = guardPush st $ do
oldmanifest <- maybe (downloadManifestWhenPresent rmt) pure
(manifestCache st)
fullPush' oldmanifest st rmt refs
fullPush' :: Manifest -> State -> Remote -> [Ref] -> Annex (Bool, State)
fullPush' oldmanifest st rmt refs = do
let bs = map Git.Bundle.fullBundleSpec refs
(bundlekey, uploadbundle) <- generateGitBundle rmt bs oldmanifest
let manifest = mkManifest [bundlekey] $
@@ -297,14 +301,19 @@ guardPush st a = catchNonAsync a $ \ex -> do
incrementalPush :: State -> Remote -> M.Map Ref Sha -> M.Map Ref Sha -> Annex (Bool, State)
incrementalPush st rmt oldtrackingrefs newtrackingrefs = guardPush st $ do
oldmanifest <- maybe (downloadManifestWhenPresent rmt) pure (manifestCache st)
bs <- calc [] (M.toList newtrackingrefs)
(bundlekey, uploadbundle) <- generateGitBundle rmt bs oldmanifest
let manifest = oldmanifest <> mkManifest [bundlekey] mempty
manifest' <- startPush rmt manifest
uploadbundle
uploadManifest rmt manifest'
return (True, st { manifestCache = Nothing })
if length (inManifest oldmanifest) + 1 > remoteAnnexMaxGitBundles (Remote.gitconfig rmt)
then fullPush' oldmanifest st rmt (M.keys newtrackingrefs)
else go oldmanifest
where
go oldmanifest = do
bs <- calc [] (M.toList newtrackingrefs)
(bundlekey, uploadbundle) <- generateGitBundle rmt bs oldmanifest
let manifest = oldmanifest <> mkManifest [bundlekey] mempty
manifest' <- startPush rmt manifest
uploadbundle
uploadManifest rmt manifest'
return (True, st { manifestCache = Nothing })
calc c [] = return (reverse c)
calc c ((ref, sha):refs) = case M.lookup ref oldtrackingrefs of
Just oldsha
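
The threshold check added to `incrementalPush` can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the `Manifest` type below is a simplified stand-in using `String` keys rather than git-annex's real type, the names `needsFullPush`, `incremental`, `full`, and `push` are hypothetical, and the way old bundles are moved into `outManifest` is inferred from the commit message rather than taken from the code. The comparison is copied from the diff as shown, so the sketch does not model the documented "set to 0" case.

```haskell
-- Simplified stand-in for git-annex's manifest (assumption: String keys).
data Manifest = Manifest
    { inManifest :: [String]   -- bundles needed to reconstruct the repository
    , outManifest :: [String]  -- bundles no longer needed but still stored
    } deriving (Show, Eq)

-- Same comparison as in the diff: would one more bundle exceed the maximum?
needsFullPush :: Int -> Manifest -> Bool
needsFullPush maxbundles m = length (inManifest m) + 1 > maxbundles

-- Incremental push: the new bundle is appended to the existing ones.
incremental :: String -> Manifest -> Manifest
incremental k m = m { inManifest = inManifest m ++ [k] }

-- Full push, per the commit message: one bundle in inManifest,
-- every other bundle listed in outManifest (ordering is an assumption).
full :: String -> Manifest -> Manifest
full k m = Manifest [k] (inManifest m ++ outManifest m)

-- Hypothetical wrapper choosing between the two kinds of push.
push :: Int -> String -> Manifest -> Manifest
push maxbundles k m
    | needsFullPush maxbundles m = full k m
    | otherwise = incremental k m

main :: IO ()
main = print (foldl (flip (push 3)) (Manifest [] []) ["b1", "b2", "b3", "b4"])
```

With a maximum of 3, the first three pushes in this model are incremental and the fourth is converted to a full push, leaving `b4` alone in `inManifest` and the three earlier bundles in `outManifest`.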


@@ -373,6 +373,7 @@ data RemoteGitConfig = RemoteGitConfig
, remoteAnnexBwLimitDownload :: Maybe BwRate
, remoteAnnexAllowUnverifiedDownloads :: Bool
, remoteAnnexConfigUUID :: Maybe UUID
, remoteAnnexMaxGitBundles :: Int
, remoteAnnexAllowEncryptedGitRepo :: Bool
, remoteUrl :: Maybe String
@@ -453,6 +454,8 @@ extractRemoteGitConfig r remotename = do
readBwRatePerSecond =<< getmaybe "bwlimit-download"
, remoteAnnexAllowUnverifiedDownloads = (== Just "ACKTHPPT") $
getmaybe ("security-allow-unverified-downloads")
, remoteAnnexMaxGitBundles =
fromMaybe 100 (getmayberead "max-git-bundles")
, remoteAnnexConfigUUID = toUUID <$> getmaybe "config-uuid"
, remoteAnnexShell = getmaybe "shell"
, remoteAnnexSshOptions = getoptions "ssh-options"
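
The `fromMaybe 100 (getmayberead "max-git-bundles")` pattern above can be sketched in isolation. This is a minimal illustration, not git-annex's implementation: `Config`, `getMaybeRead`, and `maxGitBundles` are hypothetical names, the config is modelled as a plain association list, and it is an assumption that an unparsable value falls back to the default in the same way as an unset one.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Hypothetical stand-in for a remote's git config: setting name to raw value.
type Config = [(String, String)]

-- Look up a setting and parse it; Nothing when unset or unparsable.
getMaybeRead :: Read a => String -> Config -> Maybe a
getMaybeRead k cfg = readMaybe =<< lookup k cfg

-- Fall back to the default of 100 when no usable value is configured.
maxGitBundles :: Config -> Int
maxGitBundles = fromMaybe 100 . getMaybeRead "max-git-bundles"

main :: IO ()
main = do
    print (maxGitBundles [])                             -- unset
    print (maxGitBundles [("max-git-bundles", "25")])    -- explicit value
    print (maxGitBundles [("max-git-bundles", "lots")])  -- unparsable value
```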


@@ -1648,6 +1648,17 @@ Remotes are configured using these settings in `.git/config`.
remotes, and is set when using [[git-annex-initremote]](1) with the
`--private` option.
* `remote.<name>.annex-max-git-bundles`, `annex.max-git-bundles`
When using [[git-remote-annex]] to store a git repository in a special
remote, this configures how many separate git bundle objects to store
in the special remote before re-uploading a single git bundle that contains
the entire git repository.
The default is 100, which aims to avoid needing to re-upload often,
while preventing a clone or fetch needing to download too many objects.
Set to 0 to disable re-uploading.
* `remote.<name>.annex-allow-encrypted-gitrepo`
Setting this to true allows using [[git-remote-annex]] to push the git


@@ -36,29 +36,11 @@ When using the shorthand "annex::" url, the full url will be displayed
each time you git pull or push, when it's possible for git-annex to
determine it.
When a special remote needs some additional credentials to be provided,
they are not included in the URL, and need to be provided when cloning from
the special remote. That is typically done by setting environment
variables. Some special remotes may also need environment variables to be
set when pulling or pushing.
The git repository is stored in the special remote using special annex objects
with names starting with "GITMANIFEST" and "GITBUNDLE". For details about
how the git repository is stored, see
<https://git-annex.branchable.com/internals/git-remote-annex/>
Pushes to a special remote are usually done incrementally. However,
sometimes the whole git repository (but not the annex) needs to be
re-uploaded. That is done when force pushing a ref, or deleting a
ref from the remote.
The special remote accumulates one GITBUNDLE object per push, and old
objects are usually not deleted. This means that refs pushed to the special
remote can still be accessed even after deleting or overwriting them.
A push that deletes every ref from the special remote does delete all
the accumulated GITBUNDLE objects. But of course, making such a push
means that someone clones from the special remote at that point in time
will see an empty remote.
When a special remote needs some credentials to be used, they are not
included in the URL, and will need to be provided when cloning from the
special remote. That is typically done by setting environment variables.
Some special remotes may also need environment variables to be set when
pulling or pushing.
Like any git repository, a git repository stored on a special remote can
have conflicting things pushed to it from different places. This mostly
@@ -69,6 +51,25 @@ to be overwritten by the other one. In this situation, the overwritten
push will appear to have succeeded, but pulling later will show the true
situation.
The git repository is stored in the special remote using special annex objects
with names starting with "GITMANIFEST" and "GITBUNDLE". For details, see:
<https://git-annex.branchable.com/internals/git-remote-annex/>
Pushes to a special remote are usually done incrementally. However,
sometimes the whole git repository (but not the annex) needs to be
re-uploaded. That is done when force pushing a ref, or deleting a
ref from the remote. It's also done when too many git bundles
accumulate in the special remote, as configured by the
`remote.<name>.annex-max-git-bundles` git config.
Note that a re-upload of the repository does not delete old GITBUNDLE
objects from it. This means that refs pushed to the special
remote can still be accessed even after deleting or overwriting them.
A push that deletes every ref from the special remote will delete all
the accumulated GITBUNDLE objects. But of course, making such a push
means that someone who clones from the special remote at that point in time
will see an empty remote.
# SEE ALSO
gitremote-helpers(1)