only delete bundles on pushEmpty

This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.

Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
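The verification rule this commit adds can be sketched with a toy model. This is a hedged illustration with simplified stand-in types (git-annex's real Manifest, Remote, and Annex types are richer); it only demonstrates the rule: a manifest that lists a bundle no longer present on the remote must be treated as an empty manifest, since only a pushEmpty could have deleted bundles.

```haskell
import qualified Data.Set as S

-- Simplified stand-ins for git-annex's types (hypothetical, for illustration).
type Bundle = String

newtype Manifest = Manifest { inManifest :: [Bundle] }
    deriving (Eq, Show)

-- The set of bundle objects actually present on the remote.
type RemoteContents = S.Set Bundle

-- If every listed bundle is still present, the manifest is trusted as-is.
-- Otherwise a pushEmpty must have deleted bundles out from under it, so the
-- manifest is treated the same as an empty one.
verifyManifest :: RemoteContents -> Manifest -> Manifest
verifyManifest remote m
    | all (`S.member` remote) (inManifest m) = m
    | otherwise = Manifest []

main :: IO ()
main = do
    let remote = S.fromList ["GITBUNDLE--1", "GITBUNDLE--2"]
    print (verifyManifest remote (Manifest ["GITBUNDLE--1"]))
    print (verifyManifest remote (Manifest ["GITBUNDLE--1", "GITBUNDLE--3"]))
```

The second call prints an empty manifest, because GITBUNDLE--3 is not present on the toy remote.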
Joey Hess 2024-05-21 10:41:48 -04:00
parent f544946b09
commit 3e7324bbcb
6 changed files with 53 additions and 103 deletions


@@ -272,9 +272,8 @@ fullPush' :: Manifest -> State -> Remote -> [Ref] -> Annex (Bool, State)
 fullPush' oldmanifest st rmt refs = do
 	let bs = map Git.Bundle.fullBundleSpec refs
 	bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
-	oldmanifest' <- dropOldKeys rmt oldmanifest (/= bundlekey)
 	let manifest = mkManifest [bundlekey]
-		(inManifest oldmanifest ++ outManifest oldmanifest')
+		(inManifest oldmanifest ++ outManifest oldmanifest)
 	uploadManifest rmt manifest
 	return (True, st { manifestCache = Nothing })
@@ -291,17 +290,11 @@ guardPush st a = catchNonAsync a $ \ex -> do
 incrementalPush :: State -> Remote -> M.Map Ref Sha -> M.Map Ref Sha -> Annex (Bool, State)
 incrementalPush st rmt oldtrackingrefs newtrackingrefs = guardPush st $ do
 	oldmanifest <- maybe (downloadManifestWhenPresent rmt) pure (manifestCache st)
-	if length (inManifest oldmanifest) + 1 > remoteAnnexMaxGitBundles (Remote.gitconfig rmt)
-		then fullPush' oldmanifest st rmt (M.keys newtrackingrefs)
-		else go oldmanifest
+	bs <- calc [] (M.toList newtrackingrefs)
+	bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
+	uploadManifest rmt (oldmanifest <> mkManifest [bundlekey] [])
+	return (True, st { manifestCache = Nothing })
   where
-	go oldmanifest = do
-		bs <- calc [] (M.toList newtrackingrefs)
-		bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
-		oldmanifest' <- dropOldKeys rmt oldmanifest (/= bundlekey)
-		uploadManifest rmt (oldmanifest' <> mkManifest [bundlekey] [])
-		return (True, st { manifestCache = Nothing })
 	calc c [] = return (reverse c)
 	calc c ((ref, sha):refs) = case M.lookup ref oldtrackingrefs of
 		Just oldsha
@@ -610,7 +603,7 @@ downloadManifest rmt = get mkmain >>= maybe (get mkbak) (pure . Just)
 		_ <- dl tmp
 		b <- liftIO (B.readFile tmp)
 		case parseManifest b of
-			Right m -> return (Just m)
+			Right m -> Just <$> verifyManifest rmt m
 			Left err -> giveup err
 	getexport _ [] = return Nothing
@@ -634,13 +627,6 @@ downloadManifest rmt = get mkmain >>= maybe (get mkbak) (pure . Just)
 -- So this may be interrupted and leave the manifest key not present.
 -- To deal with that, there is a backup manifest key. This takes care
 -- to ensure that one of the two keys will always exist.
---
--- Once the manifest has been uploaded, attempts to drop all outManifest
--- keys. A failure to drop does not cause an error to be thrown, because
--- the push has already succeeded. Avoids re-uploading the manifest with
--- the dropped keys removed from outManifest, because dropping the keys
--- takes some time and another push may have already overwritten
--- the manifest in the meantime.
 uploadManifest :: Remote -> Manifest -> Annex ()
 uploadManifest rmt manifest = do
 	ok <- ifM (Remote.checkPresent rmt mkbak)

@@ -650,9 +636,8 @@ uploadManifest rmt manifest = do
 		-- This ensures that at no point are both deleted.
 		, put mkbak <&&> dropandput mkmain
 		)
-	if ok
-		then void $ dropOldKeys rmt manifest (const True)
-		else uploadfailed
+	unless ok
+		uploadfailed
   where
 	mkmain = genManifestKey (Remote.uuid rmt)
 	mkbak = genBackupManifestKey (Remote.uuid rmt)
@@ -696,6 +681,17 @@ dropOldKeys rmt manifest p =
 	mkManifest (inManifest manifest)
 		<$> filterM (dropKey rmt) (filter p (outManifest manifest))
 
+-- When pushEmpty raced with another push, it could result in the manifest
+-- listing bundles that it deleted. Such a manifest has to be treated the
+-- same as an empty manifest. To detect that, this checks that all the
+-- bundles listed in the manifest still exist on the remote.
+verifyManifest :: Remote -> Manifest -> Annex Manifest
+verifyManifest rmt manifest =
+	ifM (allM (checkPresentGitBundle rmt) (inManifest manifest))
+		( return manifest
+		, return $ mkManifest [] (inManifest manifest <> outManifest manifest)
+		)
+
 -- Downloads a git bundle to the annex objects directory, unless
 -- the object file is already present. Returns the filename of the object
 -- file.

@@ -732,6 +728,16 @@ downloadGitBundle rmt k = getKeyExportLocations rmt k >>= \case
 	rsp = Remote.retrievalSecurityPolicy rmt
 	vc = Remote.RemoteVerify rmt
 
+-- Checks if a bundle is present. Throws errors if the remote cannot be
+-- accessed.
+checkPresentGitBundle :: Remote -> Key -> Annex Bool
+checkPresentGitBundle rmt k =
+	getKeyExportLocations rmt k >>= \case
+		Nothing -> Remote.checkPresent rmt k
+		Just locs -> anyM checkexport locs
+  where
+	checkexport = Remote.checkPresentExport (Remote.exportActions rmt) k
+
 -- Uploads a bundle or manifest object from the annex objects directory
 -- to the remote.
 --


@@ -373,7 +373,6 @@ data RemoteGitConfig = RemoteGitConfig
 	, remoteAnnexBwLimitDownload :: Maybe BwRate
 	, remoteAnnexAllowUnverifiedDownloads :: Bool
 	, remoteAnnexConfigUUID :: Maybe UUID
-	, remoteAnnexMaxGitBundles :: Int
 	, remoteAnnexAllowEncryptedGitRepo :: Bool
 
 	{- These settings are specific to particular types of remotes

@@ -479,8 +478,6 @@ extractRemoteGitConfig r remotename = do
 		, remoteAnnexDdarRepo = getmaybe "ddarrepo"
 		, remoteAnnexHookType = notempty $ getmaybe "hooktype"
 		, remoteAnnexExternalType = notempty $ getmaybe "externaltype"
-		, remoteAnnexMaxGitBundles =
-			fromMaybe 100 (getmayberead "max-git-bundles")
 		, remoteAnnexAllowEncryptedGitRepo =
 			getbool "allow-encrypted-gitrepo" False
 		}


@@ -1648,17 +1648,6 @@ Remotes are configured using these settings in `.git/config`.
   remotes, and is set when using [[git-annex-initremote]](1) with the
   `--private` option.
 
-* `remote.<name>.annex-max-git-bundles`, `annex.max-git-bundles`
-
-  When using [[git-remote-annex]] to store a git repository in a special
-  remote, this configures how many separate git bundle objects to store
-  in the special remote before re-uploading a single git bundle that contains
-  the entire git repository.
-
-  The default is 100, which aims to avoid often needing to re-upload,
-  while preventing a new clone needing to download too many objects. Set to
-  0 to disable re-uploading.
-
 * `remote.<name>.annex-allow-encrypted-gitrepo`
 
   Setting this to true allows using [[git-remote-annex]] to push the git


@@ -38,23 +38,29 @@ variables. Some special remotes may also need environment variables to be
 set when pulling or pushing.
 
 The git repository is stored in the special remote using special annex objects
-with names starting with "GITMANIFEST--" and "GITBUNDLE--". For details about
+with names starting with "GITMANIFEST" and "GITBUNDLE". For details about
 how the git repository is stored, see
 <https://git-annex.branchable.com/internals/git-remote-annex/>
 
 Pushes to a special remote are usually done incrementally. However,
 sometimes the whole git repository (but not the annex) needs to be
 re-uploaded. That is done when force pushing a ref, or deleting a
-ref from the remote. It's also done when too many git bundles
-accumulate in the special remote, as configured by the
-`remote.<name>.annex-max-git-bundles` git config.
+ref from the remote.
+
+The special remote accumulates one GITBUNDLE object per push, and old
+objects are usually not deleted. This means that refs pushed to the special
+remote can still be accessed even after deleting or overwriting them.
+
+A push that deletes every ref from the special remote does delete all
+the accumulated GITBUNDLE objects. But of course, making such a push
+means that someone who clones from the special remote at that point in time
+will see an empty remote.
 
 Like any git repository, a git repository stored on a special remote can
 have conflicting things pushed to it from different places. This mostly
 works the same as any other git repository, eg a push that overwrites other
 work will be prevented unless forced. However, it is possible, when
 conflicting pushes are being done at the same time, for one of the pushes
-to be overwritten by the other one. In this sitiation, the push will appear
+to be overwritten by the other one. In this situation, the push will appear
 to have succeeded, but pulling later will show the true situation.
 
 # SEE ALSO


@@ -43,6 +43,13 @@ stored in such a special remote, this procedure will work:
 (Note that later bundles can update refs from the versions in previous
 bundles.)
 
-When the special remote is encrypted, the GITMANIFEST and GITBUNDLE will
-also be encrypted. To decrypt those manually, see this
-[[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]].
+Note that, if a GITBUNDLE listed in the GITMANIFEST turns out not to exist,
+a clone should treat this the same as if the GITMANIFEST were empty.
+Bundle objects are deleted when a push is made to the remote that
+deletes all refs from it, and in a race between such a push and another
+push of some refs, it is possible for the GITMANIFEST to refer to deleted
+bundles.
+
+When the special remote is encrypted, both the names and content of
+the GITMANIFEST and GITBUNDLE will also be encrypted. To
+decrypt those manually, see this [[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]].


@@ -12,6 +12,9 @@ This is implemented and working. Remaining todo list for it:
 
 * Test incremental push edge cases involving checkprereq.
 
+* Cloning a special remote with an empty manifest results in a repo where
+  git fetch fails, claiming the special remote is encrypted, when it's not.
+
 * Cloning from an annex:: url with importtree=yes doesn't work
   (with or without exporttree=yes). This is because the ContentIdentifier
   db is not populated. It should be possible to work around this.

@@ -60,61 +63,3 @@ This is implemented and working. Remaining todo list for it:
   They would have to use git fetch --prune to notice the deletion.
   Once the user does notice, they can re-push their ref to recover.
   Can this be improved?
-
-## bundle deletion problems
-
-Deleting bundles results in some problems involving races,
-detailed below, that result in the manifest file listing a bundle that has
-been deleted. Which breaks cloning, and is data loss, and so *must*
-be solved before release.
-
-* A race between an incremental push and a full push can result in
-  a bundle that the incremental push is based on being deleted by the full
-  push, and then the incremental push's manifest file being written later.
-  Which will prevent cloning or some pulls from working.
-
-  A fix: Make each full push (and emptying push) also write to a fallback
-  manifest file that is only written by full pushes (and emptying pushes),
-  not incremental pushes. When fetching the main manifest file, always
-  check that all bundles mentioned in it are still in the remote. If any
-  are missing, fetch and use the fallback manifest file instead.
-
-* A race between two full pushes can also result in the manifest file listing
-  a bundle that has been deleted:
-
-  Start with a full push of bundle A.
-  Then there are 2 racing full pushes X and Y, of bundle A and B
-  respectively. With this series of operations:
-
-  1. Y: write bundle B
-  2. Y: read manifest (listing A)
-  3. Y: write B to manifest
-  4. X: write bundle A
-  5. Y: delete bundle A
-  6. X: read manifest (listing B)
-  7. X: write A to manifest
-  8. X: delete bundle B
-
-  Which results in a manifest that lists A, but that bundle was deleted.
-
-The problems above *could* be solved by not deleting bundles, but that is
-unsatisfactory.
-
-Old bundles could be deleted after some period of time. But a process can
-be suspended at any point and resumed later, so the race windows can be
-arbitrarily wide.
-
-What if only emptying pushes delete bundles? If a manifest file refers to a
-bundle that has been deleted, that can be treated the same as if the
-manifest file was empty, because we know that, for that bundle to have been
-deleted, there must have been an emptying push. So this would work.
-
-It is kind of a cop-out, because it requires the user to do an emptying
-push from time to time. But by doing that, the user will expect that
-someone who pulls at that point gets an empty repository.
-
-Note that a race between an emptying push and a ref push will result in the
-emptying push winning, so the ref push is lost. This is the same behavior
-as can happen in a push race not involving deletion though, and any
-improvements that are made to the UI around that will also help with this.
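The two-full-push race removed from the todo above can be replayed against a toy model of the remote to confirm it leaves the manifest pointing at a deleted bundle. This is a hypothetical sketch (a plain map from object name to content, not git-annex code); the numbered comments match the steps in the race description:

```haskell
import qualified Data.Map.Strict as M

-- A toy remote: object name -> content (hypothetical, for illustration).
type Remote = M.Map String String

-- Replay the interleaved operations of the two racing full pushes X and Y.
raced :: Remote
raced = foldl (flip ($)) start steps
  where
    -- Start with a full push of bundle A.
    start = M.fromList [("manifest", "A"), ("A", "bundle A")]
    steps =
        [ M.insert "B" "bundle B"      -- 1. Y: write bundle B
                                       -- 2. Y: read manifest (listing A)
        , M.insert "manifest" "B"      -- 3. Y: write B to manifest
        , M.insert "A" "bundle A"      -- 4. X: write bundle A
        , M.delete "A"                 -- 5. Y: delete bundle A
                                       -- 6. X: read manifest (listing B)
        , M.insert "manifest" "A"      -- 7. X: write A to manifest
        , M.delete "B"                 -- 8. X: delete bundle B
        ]

main :: IO ()
main = do
    -- The manifest ends up listing A, but bundle A has been deleted.
    print (M.lookup "manifest" raced)  -- prints Just "A"
    print (M.member "A" raced)         -- prints False
```

Since process suspension makes each step's timing arbitrary, no deletion scheme based on waiting can close this window, which is why the commit restricts deletion to pushEmpty.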