only delete bundles on pushEmpty
This avoids some apparently otherwise unsolveable problems involving races that resulted in the manifest listing bundles that were deleted. Removed the annex-max-git-bundles config because it can't actually result in deleting old bundles. It would still be possible to have a config that controls how often to do a full push, which would avoid needing to download too many bundles on clone, as well as needing to checkpresent too many bundles in verifyManifest. But it would need a different name and description.
This commit is contained in:
parent
f544946b09
commit
3e7324bbcb
6 changed files with 53 additions and 103 deletions
|
@ -272,9 +272,8 @@ fullPush' :: Manifest -> State -> Remote -> [Ref] -> Annex (Bool, State)
|
||||||
fullPush' oldmanifest st rmt refs =do
|
fullPush' oldmanifest st rmt refs =do
|
||||||
let bs = map Git.Bundle.fullBundleSpec refs
|
let bs = map Git.Bundle.fullBundleSpec refs
|
||||||
bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
|
bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
|
||||||
oldmanifest' <- dropOldKeys rmt oldmanifest (/= bundlekey)
|
|
||||||
let manifest = mkManifest [bundlekey]
|
let manifest = mkManifest [bundlekey]
|
||||||
(inManifest oldmanifest ++ outManifest oldmanifest')
|
(inManifest oldmanifest ++ outManifest oldmanifest)
|
||||||
uploadManifest rmt manifest
|
uploadManifest rmt manifest
|
||||||
return (True, st { manifestCache = Nothing })
|
return (True, st { manifestCache = Nothing })
|
||||||
|
|
||||||
|
@ -291,17 +290,11 @@ guardPush st a = catchNonAsync a $ \ex -> do
|
||||||
incrementalPush :: State -> Remote -> M.Map Ref Sha -> M.Map Ref Sha -> Annex (Bool, State)
|
incrementalPush :: State -> Remote -> M.Map Ref Sha -> M.Map Ref Sha -> Annex (Bool, State)
|
||||||
incrementalPush st rmt oldtrackingrefs newtrackingrefs = guardPush st $ do
|
incrementalPush st rmt oldtrackingrefs newtrackingrefs = guardPush st $ do
|
||||||
oldmanifest <- maybe (downloadManifestWhenPresent rmt) pure (manifestCache st)
|
oldmanifest <- maybe (downloadManifestWhenPresent rmt) pure (manifestCache st)
|
||||||
if length (inManifest oldmanifest) + 1 > remoteAnnexMaxGitBundles (Remote.gitconfig rmt)
|
bs <- calc [] (M.toList newtrackingrefs)
|
||||||
then fullPush' oldmanifest st rmt (M.keys newtrackingrefs)
|
bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
|
||||||
else go oldmanifest
|
uploadManifest rmt (oldmanifest <> mkManifest [bundlekey] [])
|
||||||
|
return (True, st { manifestCache = Nothing })
|
||||||
where
|
where
|
||||||
go oldmanifest = do
|
|
||||||
bs <- calc [] (M.toList newtrackingrefs)
|
|
||||||
bundlekey <- generateAndUploadGitBundle rmt bs oldmanifest
|
|
||||||
oldmanifest' <- dropOldKeys rmt oldmanifest (/= bundlekey)
|
|
||||||
uploadManifest rmt (oldmanifest' <> mkManifest [bundlekey] [])
|
|
||||||
return (True, st { manifestCache = Nothing })
|
|
||||||
|
|
||||||
calc c [] = return (reverse c)
|
calc c [] = return (reverse c)
|
||||||
calc c ((ref, sha):refs) = case M.lookup ref oldtrackingrefs of
|
calc c ((ref, sha):refs) = case M.lookup ref oldtrackingrefs of
|
||||||
Just oldsha
|
Just oldsha
|
||||||
|
@ -610,7 +603,7 @@ downloadManifest rmt = get mkmain >>= maybe (get mkbak) (pure . Just)
|
||||||
_ <- dl tmp
|
_ <- dl tmp
|
||||||
b <- liftIO (B.readFile tmp)
|
b <- liftIO (B.readFile tmp)
|
||||||
case parseManifest b of
|
case parseManifest b of
|
||||||
Right m -> return (Just m)
|
Right m -> Just <$> verifyManifest rmt m
|
||||||
Left err -> giveup err
|
Left err -> giveup err
|
||||||
|
|
||||||
getexport _ [] = return Nothing
|
getexport _ [] = return Nothing
|
||||||
|
@ -634,13 +627,6 @@ downloadManifest rmt = get mkmain >>= maybe (get mkbak) (pure . Just)
|
||||||
-- So this may be interrupted and leave the manifest key not present.
|
-- So this may be interrupted and leave the manifest key not present.
|
||||||
-- To deal with that, there is a backup manifest key. This takes care
|
-- To deal with that, there is a backup manifest key. This takes care
|
||||||
-- to ensure that one of the two keys will always exist.
|
-- to ensure that one of the two keys will always exist.
|
||||||
--
|
|
||||||
-- Once the manifest has been uploaded, attempts to drop all outManifest
|
|
||||||
-- keys. A failure to drop does not cause an error to be thrown, because
|
|
||||||
-- the push has already succeeded. Avoids re-uploading the manifest with
|
|
||||||
-- the dropped keys removed from outManifest, because dropping the keys
|
|
||||||
-- takes some time and another push may have already overwritten
|
|
||||||
-- the manifest in the meantime.
|
|
||||||
uploadManifest :: Remote -> Manifest -> Annex ()
|
uploadManifest :: Remote -> Manifest -> Annex ()
|
||||||
uploadManifest rmt manifest = do
|
uploadManifest rmt manifest = do
|
||||||
ok <- ifM (Remote.checkPresent rmt mkbak)
|
ok <- ifM (Remote.checkPresent rmt mkbak)
|
||||||
|
@ -650,9 +636,8 @@ uploadManifest rmt manifest = do
|
||||||
-- This ensures that at no point are both deleted.
|
-- This ensures that at no point are both deleted.
|
||||||
, put mkbak <&&> dropandput mkmain
|
, put mkbak <&&> dropandput mkmain
|
||||||
)
|
)
|
||||||
if ok
|
unless ok
|
||||||
then void $ dropOldKeys rmt manifest (const True)
|
uploadfailed
|
||||||
else uploadfailed
|
|
||||||
where
|
where
|
||||||
mkmain = genManifestKey (Remote.uuid rmt)
|
mkmain = genManifestKey (Remote.uuid rmt)
|
||||||
mkbak = genBackupManifestKey (Remote.uuid rmt)
|
mkbak = genBackupManifestKey (Remote.uuid rmt)
|
||||||
|
@ -696,6 +681,17 @@ dropOldKeys rmt manifest p =
|
||||||
mkManifest (inManifest manifest)
|
mkManifest (inManifest manifest)
|
||||||
<$> filterM (dropKey rmt) (filter p (outManifest manifest))
|
<$> filterM (dropKey rmt) (filter p (outManifest manifest))
|
||||||
|
|
||||||
|
-- When pushEmpty raced with another push, it could result in the manifest
|
||||||
|
-- listing bundles that it deleted. Such a manifest has to be treated the
|
||||||
|
-- same as an empty manifest. To detect that, this checks that all the
|
||||||
|
-- bundles listed in the manifest still exist on the remote.
|
||||||
|
verifyManifest :: Remote -> Manifest -> Annex Manifest
|
||||||
|
verifyManifest rmt manifest =
|
||||||
|
ifM (allM (checkPresentGitBundle rmt) (inManifest manifest))
|
||||||
|
( return manifest
|
||||||
|
, return $ mkManifest [] (inManifest manifest <> outManifest manifest)
|
||||||
|
)
|
||||||
|
|
||||||
-- Downloads a git bundle to the annex objects directory, unless
|
-- Downloads a git bundle to the annex objects directory, unless
|
||||||
-- the object file is already present. Returns the filename of the object
|
-- the object file is already present. Returns the filename of the object
|
||||||
-- file.
|
-- file.
|
||||||
|
@ -732,6 +728,16 @@ downloadGitBundle rmt k = getKeyExportLocations rmt k >>= \case
|
||||||
rsp = Remote.retrievalSecurityPolicy rmt
|
rsp = Remote.retrievalSecurityPolicy rmt
|
||||||
vc = Remote.RemoteVerify rmt
|
vc = Remote.RemoteVerify rmt
|
||||||
|
|
||||||
|
-- Checks if a bundle is present. Throws errors if the remote cannot be
|
||||||
|
-- accessed.
|
||||||
|
checkPresentGitBundle :: Remote -> Key -> Annex Bool
|
||||||
|
checkPresentGitBundle rmt k =
|
||||||
|
getKeyExportLocations rmt k >>= \case
|
||||||
|
Nothing -> Remote.checkPresent rmt k
|
||||||
|
Just locs -> anyM checkexport locs
|
||||||
|
where
|
||||||
|
checkexport = Remote.checkPresentExport (Remote.exportActions rmt) k
|
||||||
|
|
||||||
-- Uploads a bundle or manifest object from the annex objects directory
|
-- Uploads a bundle or manifest object from the annex objects directory
|
||||||
-- to the remote.
|
-- to the remote.
|
||||||
--
|
--
|
||||||
|
|
|
@ -373,7 +373,6 @@ data RemoteGitConfig = RemoteGitConfig
|
||||||
, remoteAnnexBwLimitDownload :: Maybe BwRate
|
, remoteAnnexBwLimitDownload :: Maybe BwRate
|
||||||
, remoteAnnexAllowUnverifiedDownloads :: Bool
|
, remoteAnnexAllowUnverifiedDownloads :: Bool
|
||||||
, remoteAnnexConfigUUID :: Maybe UUID
|
, remoteAnnexConfigUUID :: Maybe UUID
|
||||||
, remoteAnnexMaxGitBundles :: Int
|
|
||||||
, remoteAnnexAllowEncryptedGitRepo :: Bool
|
, remoteAnnexAllowEncryptedGitRepo :: Bool
|
||||||
|
|
||||||
{- These settings are specific to particular types of remotes
|
{- These settings are specific to particular types of remotes
|
||||||
|
@ -479,8 +478,6 @@ extractRemoteGitConfig r remotename = do
|
||||||
, remoteAnnexDdarRepo = getmaybe "ddarrepo"
|
, remoteAnnexDdarRepo = getmaybe "ddarrepo"
|
||||||
, remoteAnnexHookType = notempty $ getmaybe "hooktype"
|
, remoteAnnexHookType = notempty $ getmaybe "hooktype"
|
||||||
, remoteAnnexExternalType = notempty $ getmaybe "externaltype"
|
, remoteAnnexExternalType = notempty $ getmaybe "externaltype"
|
||||||
, remoteAnnexMaxGitBundles =
|
|
||||||
fromMaybe 100 (getmayberead "max-git-bundles")
|
|
||||||
, remoteAnnexAllowEncryptedGitRepo =
|
, remoteAnnexAllowEncryptedGitRepo =
|
||||||
getbool "allow-encrypted-gitrepo" False
|
getbool "allow-encrypted-gitrepo" False
|
||||||
}
|
}
|
||||||
|
|
|
@ -1648,17 +1648,6 @@ Remotes are configured using these settings in `.git/config`.
|
||||||
remotes, and is set when using [[git-annex-initremote]](1) with the
|
remotes, and is set when using [[git-annex-initremote]](1) with the
|
||||||
`--private` option.
|
`--private` option.
|
||||||
|
|
||||||
* `remote.<name>.annex-max-git-bundles`, `annex.max-git-bundles`
|
|
||||||
|
|
||||||
When using [[git-remote-annex]] to store a git repository in a special
|
|
||||||
remote, this configures how many separate git bundle objects to store
|
|
||||||
in the special remote before re-uploading a single git bundle that contains
|
|
||||||
the entire git repository.
|
|
||||||
|
|
||||||
The default is 100, which aims to avoid often needing to often re-upload,
|
|
||||||
while preventing a new clone needing to download too many objects. Set to
|
|
||||||
0 to disable re-uploading.
|
|
||||||
|
|
||||||
* `remote.<name>.annex-allow-encrypted-gitrepo`
|
* `remote.<name>.annex-allow-encrypted-gitrepo`
|
||||||
|
|
||||||
Setting this to true allows using [[git-remote-annex]] to push the git
|
Setting this to true allows using [[git-remote-annex]] to push the git
|
||||||
|
|
|
@ -38,23 +38,29 @@ variables. Some special remotes may also need environment variables to be
|
||||||
set when pulling or pushing.
|
set when pulling or pushing.
|
||||||
|
|
||||||
The git repository is stored in the special remote using special annex objects
|
The git repository is stored in the special remote using special annex objects
|
||||||
with names starting with "GITMANIFEST--" and "GITBUNDLE--". For details about
|
with names starting with "GITMANIFEST" and "GITBUNDLE". For details about
|
||||||
how the git repository is stored, see
|
how the git repository is stored, see
|
||||||
<https://git-annex.branchable.com/internals/git-remote-annex/>
|
<https://git-annex.branchable.com/internals/git-remote-annex/>
|
||||||
|
|
||||||
Pushes to a special remote are usually done incrementally. However,
|
Pushes to a special remote are usually done incrementally. However,
|
||||||
sometimes the whole git repository (but not the annex) needs to be
|
sometimes the whole git repository (but not the annex) needs to be
|
||||||
re-uploaded. That is done when force pushing a ref, or deleting a
|
re-uploaded. That is done when force pushing a ref, or deleting a
|
||||||
ref from the remote. It's also done when too many git bundles
|
ref from the remote.
|
||||||
accumulate in the special remote, as configured by the
|
|
||||||
`remote.<name>.annex-max-git-bundles` git config.
|
The special remote accumulates one GITBUNDLE object per push, and old
|
||||||
|
objects are usually not deleted. This means that refs pushed to the special
|
||||||
|
remote can still be accessed even after deleting or overwriting them.
|
||||||
|
A push that deletes every ref from the special remote does delete all
|
||||||
|
the accumulated GITBUNDLE objects. But of course, making such a push
|
||||||
|
means that someone clones from the special remote at that point in time
|
||||||
|
will see an empty remote.
|
||||||
|
|
||||||
Like any git repository, a git repository stored on a special remote can
|
Like any git repository, a git repository stored on a special remote can
|
||||||
have conflicting things pushed to it from different places. This mostly
|
have conflicting things pushed to it from different places. This mostly
|
||||||
works the same as any other git repository, eg a push that overwrites other
|
works the same as any other git repository, eg a push that overwrites other
|
||||||
work will be prevented unless forced. However, it is possible, when
|
work will be prevented unless forced. However, it is possible, when
|
||||||
conflicting pushes are being done at the same time, for one of the pushes
|
conflicting pushes are being done at the same time, for one of the pushes
|
||||||
to be overwritten by the other one. In this sitiation, the push will appear
|
to be overwritten by the other one. In this situation, the push will appear
|
||||||
to have succeeded, but pulling later will show the true situation.
|
to have succeeded, but pulling later will show the true situation.
|
||||||
|
|
||||||
# SEE ALSO
|
# SEE ALSO
|
||||||
|
|
|
@ -43,6 +43,13 @@ stored in such a special remote, this procedure will work:
|
||||||
(Note that later bundles can update refs from the versions in previous
|
(Note that later bundles can update refs from the versions in previous
|
||||||
bundles.)
|
bundles.)
|
||||||
|
|
||||||
When the special remote is encrypted, the GITMANIFEST and GITBUNDLE will
|
Note that, if a GITBUNDLE listed in the GITMANIFEST turns out not to exist,
|
||||||
also be encrypted. To decrypt those manually, see this
|
a clone should treat this the same as if the GITMANIFEST were empty.
|
||||||
[[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]].
|
bundle objects are deleted when a push is made to the remote that
|
||||||
|
deletes all refs from it, and in a race between such a push and another
|
||||||
|
push of some refs, it is possible for the GITMANIFEST to refer to deleted
|
||||||
|
bundles.
|
||||||
|
|
||||||
|
When the special remote is encrypted, both the names and content of
|
||||||
|
the GITMANIFEST and GITBUNDLE will also be encrypted. To
|
||||||
|
decrypt those manually, see this [[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]].
|
||||||
|
|
|
@ -12,6 +12,9 @@ This is implememented and working. Remaining todo list for it:
|
||||||
|
|
||||||
* Test incremental push edge cases involving checkprereq.
|
* Test incremental push edge cases involving checkprereq.
|
||||||
|
|
||||||
|
* Cloning a special remote with an empty manifest results in a repo where
|
||||||
|
git fetch fails, claiming the special remote is encrypted, when it's not.
|
||||||
|
|
||||||
* Cloning from an annex:: url with importtree=yes doesn't work
|
* Cloning from an annex:: url with importtree=yes doesn't work
|
||||||
(with or without exporttree=yes). This is because the ContentIdentifier
|
(with or without exporttree=yes). This is because the ContentIdentifier
|
||||||
db is not populated. It should be possible to work around this.
|
db is not populated. It should be possible to work around this.
|
||||||
|
@ -60,61 +63,3 @@ This is implememented and working. Remaining todo list for it:
|
||||||
They would have to use git fetch --prune to notice the deletion.
|
They would have to use git fetch --prune to notice the deletion.
|
||||||
Once the user does notice, they can re-push their ref to recover.
|
Once the user does notice, they can re-push their ref to recover.
|
||||||
Can this be improved?
|
Can this be improved?
|
||||||
|
|
||||||
## bundle deletion problems
|
|
||||||
|
|
||||||
Deleting bundles results in some problems involving races,
|
|
||||||
detailed below, that result in the manifest file listing a bundle that has
|
|
||||||
been deleted. Which breaks cloning, and is data loss, and so *must*
|
|
||||||
be solved before release.
|
|
||||||
|
|
||||||
* A race between an incremental push and a full push can result in
|
|
||||||
a bundle that the incremental push is based on being deleted by the full
|
|
||||||
push, and then incremental push's manifest file being written later.
|
|
||||||
Which will prevent cloning or some pulls from working.
|
|
||||||
|
|
||||||
A fix: Make each full push (and emptying push) also write to a fallback
|
|
||||||
manifest file that is only written by full pushes (and emptying pushes),
|
|
||||||
not incremental pushes. When fetching the main manifest file, always
|
|
||||||
check that all bundles mentioned in it are still in the remote. If any
|
|
||||||
are missing, fetch and use the fallback manifest file instead.
|
|
||||||
|
|
||||||
* A race between two full pushes can also result in the manifest file listing
|
|
||||||
a bundle that has been deleted:
|
|
||||||
|
|
||||||
Start with a full push of bundle A.
|
|
||||||
|
|
||||||
Then there are 2 racing full pushes X and Y, of bundle A and B
|
|
||||||
respectively. With this series of operations:
|
|
||||||
|
|
||||||
1. Y: write bundle B
|
|
||||||
1. Y: read manifest (listing A)
|
|
||||||
1. Y: write B to manifest
|
|
||||||
1. X: write bundle A
|
|
||||||
1. Y: delete bundle A
|
|
||||||
1. X: read manifest (listing B)
|
|
||||||
1. X: write A to manifest
|
|
||||||
1. X: delete bundle B
|
|
||||||
|
|
||||||
Which results in a manifest that lists A, but that bundle was deleted.
|
|
||||||
|
|
||||||
The problems above *could* be solved by not deleting bundles, but that is
|
|
||||||
unsatisfactory.
|
|
||||||
|
|
||||||
Old bundles could be deleted after some period of time. But a process can
|
|
||||||
be suspended at any point and resumed later, so the race windows can be
|
|
||||||
arbitrarily wide.
|
|
||||||
|
|
||||||
What if only emptying pushes delete bundles? If a manifest file refers to a
|
|
||||||
bundle that has been deleted, that can be treated the same as if the
|
|
||||||
manifest file was empty, because we know that, for that bundle to have been
|
|
||||||
deleted, there must have been an emptying push. So this would work.
|
|
||||||
|
|
||||||
It is kind of a cop-out, because it requires the user to do an emptying
|
|
||||||
push from time to time. But by doing that, the user will expect that
|
|
||||||
someone who pulls at that point gets an empty repository.
|
|
||||||
|
|
||||||
Note that a race between an emptying push an a ref push will result in the
|
|
||||||
emptying push winning, so the ref push is lost. This is the same behavior
|
|
||||||
as can happen in a push race not involving deletion though, and any
|
|
||||||
improvements that are made to the UI around that will also help with this.
|
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue