S3: Detect when version=yes but an exported file lacks versioning, and refuse to delete it, to avoid data loss.
This commit was sponsored by Denis Dzyubenko on Patreon.
This commit is contained in:
parent
a4f71aa1e8
commit
a8f1add4d1
3 changed files with 46 additions and 22 deletions
|
@ -10,6 +10,8 @@ git-annex (7.20190123) UNRELEASED; urgency=medium
|
||||||
* enableremote S3: Do not let versioning=yes be set on existing remote,
|
* enableremote S3: Do not let versioning=yes be set on existing remote,
|
||||||
because when git-annex lacks S3 version IDs for files stored in
|
because when git-annex lacks S3 version IDs for files stored in
|
||||||
the bucket, deleting them would cause data loss.
|
the bucket, deleting them would cause data loss.
|
||||||
|
* S3: Detect when version=yes but an exported file lacks versioning,
|
||||||
|
and refuse to delete it, to avoid data loss.
|
||||||
|
|
||||||
-- Joey Hess <id@joeyh.name> Wed, 23 Jan 2019 12:54:56 -0400
|
-- Joey Hess <id@joeyh.name> Wed, 23 Jan 2019 12:54:56 -0400
|
||||||
|
|
||||||
|
|
24
Remote/S3.hs
24
Remote/S3.hs
|
@ -383,7 +383,7 @@ retrieveExportS3 u info mh _k loc f p =
|
||||||
exporturl = bucketExportLocation info loc
|
exporturl = bucketExportLocation info loc
|
||||||
|
|
||||||
removeExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> Annex Bool
|
removeExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> Annex Bool
|
||||||
removeExportS3 _u info (Just h) _k loc =
|
removeExportS3 u info (Just h) k loc = checkVersioning info u $
|
||||||
catchNonAsync go (\e -> warning (show e) >> return False)
|
catchNonAsync go (\e -> warning (show e) >> return False)
|
||||||
where
|
where
|
||||||
go = do
|
go = do
|
||||||
|
@ -406,7 +406,8 @@ checkPresentExportS3 u info Nothing k loc = case getPublicUrlMaker info of
|
||||||
|
|
||||||
-- S3 has no move primitive; copy and delete.
|
-- S3 has no move primitive; copy and delete.
|
||||||
renameExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> ExportLocation -> Annex Bool
|
renameExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> ExportLocation -> Annex Bool
|
||||||
renameExportS3 _u info (Just h) _k src dest = catchNonAsync go (\_ -> return False)
|
renameExportS3 u info (Just h) k src dest = checkVersioning info u k src $
|
||||||
|
catchNonAsync go (\_ -> return False)
|
||||||
where
|
where
|
||||||
go = do
|
go = do
|
||||||
let co = S3.copyObject (bucket info) dstobject
|
let co = S3.copyObject (bucket info) dstobject
|
||||||
|
@ -887,3 +888,22 @@ enableBucketVersioning ss c gc u = do
|
||||||
, "It's important you enable versioning before storing anything in the bucket!"
|
, "It's important you enable versioning before storing anything in the bucket!"
|
||||||
]
|
]
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
-- If the remote has versioning enabled, but the version ID is for some
|
||||||
|
-- reason not being recorded, it's not safe to perform an action that
|
||||||
|
-- will remove the unversioned file. The file may be the only copy of an
|
||||||
|
-- annex object.
|
||||||
|
--
|
||||||
|
-- This code could be removed eventually, since enableBucketVersioning
|
||||||
|
-- will avoid this situation. Before that was added, some remotes
|
||||||
|
-- were created without versioning, some unversioned files exported to
|
||||||
|
-- them, and then versioning enabled, and this is to avoid data loss in
|
||||||
|
-- those cases.
|
||||||
|
checkVersioning :: S3Info -> UUID -> Key -> Annex Bool -> Annex Bool
|
||||||
|
checkVersioning info u k a
|
||||||
|
| versioning info = getS3VersionID u k >>= \case
|
||||||
|
[] -> do
|
||||||
|
warning $ "Remote is configured to use versioning, but no S3 version ID is recorded for this key, so it cannot safely be modified."
|
||||||
|
return False
|
||||||
|
_ -> a
|
||||||
|
| otherwise = a
|
||||||
|
|
|
@ -3,10 +3,6 @@ it, and then it's later changed to also have versioning=yes, an exporttree
|
||||||
that removes some of the original files can lose the only remaining copy of
|
that removes some of the original files can lose the only remaining copy of
|
||||||
them.
|
them.
|
||||||
|
|
||||||
exporttree does not currently check numcopies before removing from an
|
|
||||||
export. Normally all export remotes are untrusted, so they can't count as a
|
|
||||||
copy, and so removing something from them cannot violate numcopies.
|
|
||||||
|
|
||||||
An appendonly remote, such as S3 with exporttree=yes, is supposed to not
|
An appendonly remote, such as S3 with exporttree=yes, is supposed to not
|
||||||
let git-annex remove content from it. So such a remote can be not
|
let git-annex remove content from it. So such a remote can be not
|
||||||
untrusted, and exporttree can remove content from its exported tree without
|
untrusted, and exporttree can remove content from its exported tree without
|
||||||
|
@ -19,25 +15,31 @@ appendonly remote. When exporttree removes a file from that S3 remote,
|
||||||
it could have contained the only copy of the file, and it may not have
|
it could have contained the only copy of the file, and it may not have
|
||||||
versioning info for that file, so the only copy is lost.
|
versioning info for that file, so the only copy is lost.
|
||||||
|
|
||||||
S3 remotes could refuse to allow versioning=yes to be set during
|
## Migration advice for users affected by this bug
|
||||||
enableremote, and only allow it at initremote time. And check that the
|
|
||||||
bucket does indeed have versioning enabled or refuse to allow that
|
|
||||||
configuration. That would avoid the problem.
|
|
||||||
|
|
||||||
(Unless the user changed the bucket configuration later to not allow
|
If you think your S3 remote may be affected by this problem, you should
|
||||||
versioning. But if they did so, and an old version of the bucket was the
|
immediately set it to untrusted to avoid data loss:
|
||||||
only place a file was stored, they would lose data without git-annex being
|
`git annex untrust $mys3remotename`
|
||||||
run at all, so it's equivilant to them deleting the bucket, so this seems
|
|
||||||
not something it needs to worry about).
|
|
||||||
|
|
||||||
Plan:
|
If you see a warning message "Remote is configured to use versioning, but no S3 version ID is recorded for this key",
|
||||||
|
your S3 remote is affected.
|
||||||
|
|
||||||
|
Also, the fixed git-annex (version 7.20190129) will detect the problem,
|
||||||
|
and refuse to delete unversioned files from your versioned S3 bucket.
|
||||||
|
|
||||||
|
This will leave you with a S3 remote containing some versioned and some
|
||||||
|
unversioned files. Kind of a mess. Best thing to do is to make a new
|
||||||
|
S3 remote, with versioning=yes exporttree=yes set from the beginning,
|
||||||
|
and copy all the content that was in the old S3 remote over to it.
|
||||||
|
Then you can delete the old S3 bucket, and use `git annex dead` to
|
||||||
|
make git-annex stop using it.
|
||||||
|
|
||||||
|
## Fix
|
||||||
|
|
||||||
* Wait for the PutBucketVersioning pull request to be merged.
|
|
||||||
(Done, not in a release yet, but will probably be aws-0.22)
|
|
||||||
* Auto-enable versioning during initremote (and not enableremote)
|
* Auto-enable versioning during initremote (and not enableremote)
|
||||||
when versioning=yes. (Or prompt user to do it when aws is too old.)
|
when versioning=yes. (Or prompt user to do it when aws is too old.)
|
||||||
(Done)
|
|
||||||
* Do not allow changing versioning= during enableremote.
|
* Do not allow changing versioning= during enableremote.
|
||||||
* Any repos that previously enabled versioning after storing some
|
* Make removeExport and renameExport check that
|
||||||
unversioned files still are at risk of data loss. Detect this
|
there is a S3 version ID known, and fail if not.
|
||||||
case and treat them as versioning=no. How?
|
|
||||||
|
[[done]] --[[Joey]]
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue