S3: Detect when version=yes but an exported file lacks versioning, and refuse to delete it, to avoid data loss.

This commit was sponsored by Denis Dzyubenko on Patreon.
This commit is contained in:
Joey Hess 2019-01-29 15:07:27 -04:00
parent a4f71aa1e8
commit a8f1add4d1
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 46 additions and 22 deletions

View file

@ -10,6 +10,8 @@ git-annex (7.20190123) UNRELEASED; urgency=medium
* enableremote S3: Do not let versioning=yes be set on existing remote, * enableremote S3: Do not let versioning=yes be set on existing remote,
because when git-annex lacks S3 version IDs for files stored in because when git-annex lacks S3 version IDs for files stored in
the bucket, deleting them would cause data loss. the bucket, deleting them would cause data loss.
* S3: Detect when version=yes but an exported file lacks versioning,
and refuse to delete it, to avoid data loss.
-- Joey Hess <id@joeyh.name> Wed, 23 Jan 2019 12:54:56 -0400 -- Joey Hess <id@joeyh.name> Wed, 23 Jan 2019 12:54:56 -0400

View file

@ -383,7 +383,7 @@ retrieveExportS3 u info mh _k loc f p =
exporturl = bucketExportLocation info loc exporturl = bucketExportLocation info loc
removeExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> Annex Bool removeExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> Annex Bool
removeExportS3 _u info (Just h) _k loc = removeExportS3 u info (Just h) k loc = checkVersioning info u $
catchNonAsync go (\e -> warning (show e) >> return False) catchNonAsync go (\e -> warning (show e) >> return False)
where where
go = do go = do
@ -406,7 +406,8 @@ checkPresentExportS3 u info Nothing k loc = case getPublicUrlMaker info of
-- S3 has no move primitive; copy and delete. -- S3 has no move primitive; copy and delete.
renameExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> ExportLocation -> Annex Bool renameExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> ExportLocation -> Annex Bool
renameExportS3 _u info (Just h) _k src dest = catchNonAsync go (\_ -> return False) renameExportS3 u info (Just h) k src dest = checkVersioning info u k src $
catchNonAsync go (\_ -> return False)
where where
go = do go = do
let co = S3.copyObject (bucket info) dstobject let co = S3.copyObject (bucket info) dstobject
@ -887,3 +888,22 @@ enableBucketVersioning ss c gc u = do
, "It's important you enable versioning before storing anything in the bucket!" , "It's important you enable versioning before storing anything in the bucket!"
] ]
#endif #endif
-- If the remote has versioning enabled, but the version ID is for some
-- reason not being recorded, it's not safe to perform an action that
-- will remove the unversioned file. The file may be the only copy of an
-- annex object.
--
-- This code could be removed eventually, since enableBucketVersioning
-- will avoid this situation. Before that was added, some remotes
-- were created without versioning, some unversioned files exported to
-- them, and then versioning enabled, and this is to avoid data loss in
-- those cases.
checkVersioning :: S3Info -> UUID -> Key -> Annex Bool -> Annex Bool
checkVersioning info u k a
| versioning info = getS3VersionID u k >>= \case
[] -> do
warning $ "Remote is configured to use versioning, but no S3 version ID is recorded for this key, so it cannot safely be modified."
return False
_ -> a
| otherwise = a

View file

@ -3,10 +3,6 @@ it, and then it's later changed to also have versioning=yes, an exporttree
that removes some of the original files can lose the only remaining copy of that removes some of the original files can lose the only remaining copy of
them. them.
exporttree does not currently check numcopies before removing from an
export. Normally all export remotes are untrusted, so they can't count as a
copy, and so removing something from them cannot violate numcopies.
An appendonly remote, such as S3 with exporttree=yes, is supposed to not An appendonly remote, such as S3 with exporttree=yes, is supposed to not
let git-annex remove content from it. So such a remote can be not let git-annex remove content from it. So such a remote can be not
untrusted, and exporttree can remove content from its exported tree without untrusted, and exporttree can remove content from its exported tree without
@ -19,25 +15,31 @@ appendonly remote. When exporttree removes a file from that S3 remote,
it could have contained the only copy of the file, and it may not have it could have contained the only copy of the file, and it may not have
versioning info for that file, so the only copy is lost. versioning info for that file, so the only copy is lost.
S3 remotes could refuse to allow versioning=yes to be set during ## Migration advice for users affected by this bug
enableremote, and only allow it at initremote time. And check that the
bucket does indeed have versioning enabled or refuse to allow that
configuration. That would avoid the problem.
(Unless the user changed the bucket configuration later to not allow If you think your S3 remote may be affected by this problem, you should
versioning. But if they did so, and an old version of the bucket was the immediately set it to untrusted to avoid data loss:
only place a file was stored, they would lose data without git-annex being `git annex untrust $mys3remotename`
run at all, so it's equivilant to them deleting the bucket, so this seems
not something it needs to worry about).
Plan: If you see a warning message "Remote is configured to use versioning, but no S3 version ID is recorded for this key",
your S3 remote is affected.
Also, the fixed git-annex (version 7.20190129) will detect the problem,
and refuse to delete unversioned files from your versioned S3 bucket.
This will leave you with a S3 remote containing some versioned and some
unversioned files. Kind of a mess. Best thing to do is to make a new
S3 remote, with versioning=yes exporttree=yes set from the beginning,
and copy all the content that was in the old S3 remote over to it.
Then you can delete the old S3 bucket, and use `git annex dead` to
make git-annex stop using it.
## Fix
* Wait for the PutBucketVersioning pull request to be merged.
(Done, not in a release yet, but will probably be aws-0.22)
* Auto-enable versioning during initremote (and not enableremote) * Auto-enable versioning during initremote (and not enableremote)
when versioning=yes. (Or prompt user to do it when aws is too old.) when versioning=yes. (Or prompt user to do it when aws is too old.)
(Done)
* Do not allow changing versioning= during enableremote. * Do not allow changing versioning= during enableremote.
* Any repos that previously enabled versioning after storing some * Make removeExport and renameExport check that
unversioned files still are at risk of data loss. Detect this there is a S3 version ID known, and fail if not.
case and treat them as versioning=no. How?
[[done]] --[[Joey]]