S3: Detect when version=yes but an exported file lacks versioning, and refuse to delete it, to avoid data loss.

This commit was sponsored by Denis Dzyubenko on Patreon.
This commit is contained in:
Joey Hess 2019-01-29 15:07:27 -04:00
parent a4f71aa1e8
commit a8f1add4d1
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 46 additions and 22 deletions

View file

@ -10,6 +10,8 @@ git-annex (7.20190123) UNRELEASED; urgency=medium
* enableremote S3: Do not let versioning=yes be set on existing remote,
because when git-annex lacks S3 version IDs for files stored in
the bucket, deleting them would cause data loss.
* S3: Detect when version=yes but an exported file lacks versioning,
and refuse to delete it, to avoid data loss.
-- Joey Hess <id@joeyh.name> Wed, 23 Jan 2019 12:54:56 -0400

View file

@ -383,7 +383,7 @@ retrieveExportS3 u info mh _k loc f p =
exporturl = bucketExportLocation info loc
removeExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> Annex Bool
removeExportS3 _u info (Just h) _k loc =
removeExportS3 u info (Just h) k loc = checkVersioning info u $
catchNonAsync go (\e -> warning (show e) >> return False)
where
go = do
@ -406,7 +406,8 @@ checkPresentExportS3 u info Nothing k loc = case getPublicUrlMaker info of
-- S3 has no move primitive; copy and delete.
renameExportS3 :: UUID -> S3Info -> Maybe S3Handle -> Key -> ExportLocation -> ExportLocation -> Annex Bool
renameExportS3 _u info (Just h) _k src dest = catchNonAsync go (\_ -> return False)
renameExportS3 u info (Just h) k src dest = checkVersioning info u k src $
catchNonAsync go (\_ -> return False)
where
go = do
let co = S3.copyObject (bucket info) dstobject
@ -887,3 +888,22 @@ enableBucketVersioning ss c gc u = do
, "It's important you enable versioning before storing anything in the bucket!"
]
#endif
-- If the remote has versioning enabled, but the version ID is for some
-- reason not being recorded, it's not safe to perform an action that
-- will remove the unversioned file. The file may be the only copy of an
-- annex object.
--
-- This code could be removed eventually, since enableBucketVersioning
-- will avoid this situation. Before that was added, some remotes
-- were created without versioning, some unversioned files exported to
-- them, and then versioning enabled, and this is to avoid data loss in
-- those cases.
checkVersioning :: S3Info -> UUID -> Key -> Annex Bool -> Annex Bool
checkVersioning info u k a
| versioning info = getS3VersionID u k >>= \case
[] -> do
warning $ "Remote is configured to use versioning, but no S3 version ID is recorded for this key, so it cannot safely be modified."
return False
_ -> a
| otherwise = a

View file

@ -3,10 +3,6 @@ it, and then it's later changed to also have versioning=yes, an exporttree
that removes some of the original files can lose the only remaining copy of
them.
exporttree does not currently check numcopies before removing from an
export. Normally all export remotes are untrusted, so they can't count as a
copy, and so removing something from them cannot violate numcopies.
An appendonly remote, such as S3 with exporttree=yes, is supposed to not
let git-annex remove content from it. So such a remote can be not
untrusted, and exporttree can remove content from its exported tree without
@ -19,25 +15,31 @@ appendonly remote. When exporttree removes a file from that S3 remote,
it could have contained the only copy of the file, and it may not have
versioning info for that file, so the only copy is lost.
S3 remotes could refuse to allow versioning=yes to be set during
enableremote, and only allow it at initremote time. And check that the
bucket does indeed have versioning enabled or refuse to allow that
configuration. That would avoid the problem.
## Migration advice for users affected by this bug
(Unless the user changed the bucket configuration later to not allow
versioning. But if they did so, and an old version of the bucket was the
only place a file was stored, they would lose data without git-annex being
run at all, so it's equivilant to them deleting the bucket, so this seems
not something it needs to worry about).
If you think your S3 remote may be affected by this problem, you should
immediately set it to untrusted to avoid data loss:
`git annex untrust $mys3remotename`
Plan:
If you see a warning message "Remote is configured to use versioning, but no S3 version ID is recorded for this key",
your S3 remote is affected.
Also, the fixed git-annex (version 7.20190129) will detect the problem,
and refuse to delete unversioned files from your versioned S3 bucket.
This will leave you with a S3 remote containing some versioned and some
unversioned files. Kind of a mess. Best thing to do is to make a new
S3 remote, with versioning=yes exporttree=yes set from the beginning,
and copy all the content that was in the old S3 remote over to it.
Then you can delete the old S3 bucket, and use `git annex dead` to
make git-annex stop using it.
## Fix
* Wait for the PutBucketVersioning pull request to be merged.
(Done, not in a release yet, but will probably be aws-0.22)
* Auto-enable versioning during initremote (and not enableremote)
when versioning=yes. (Or prompt user to do it when aws is too old.)
(Done)
* Do not allow changing versioning= during enableremote.
* Any repos that previously enabled versioning after storing some
unversioned files still are at risk of data loss. Detect this
case and treat them as versioning=no. How?
* Make removeExport and renameExport check that
there is a S3 version ID known, and fail if not.
[[done]] --[[Joey]]