From 362a2808a5acf2b5df92e088486e721c7a9ffa33 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 8 Dec 2023 14:25:05 -0400 Subject: [PATCH] split out todo for special remotes and close the main todo --- doc/todo/distributed_migration.mdwn | 40 ++++-------------- ...ributed_migration_for_special_remotes.mdwn | 41 +++++++++++++++++++ 2 files changed, 49 insertions(+), 32 deletions(-) create mode 100644 doc/todo/distributed_migration_for_special_remotes.mdwn diff --git a/doc/todo/distributed_migration.mdwn b/doc/todo/distributed_migration.mdwn index 12d7f8da3c..45a6d14398 100644 --- a/doc/todo/distributed_migration.mdwn +++ b/doc/todo/distributed_migration.mdwn @@ -20,29 +20,10 @@ just hard link object files from the old to new key, and update the location log for the new key to indicate the content is present in the repo. This command could be something like `git-annex migrate --update`. -That wouldn't be entirely sufficient though, because special remotes from -pre-migration will be populated with the old keys. A similar command could -upload the new content to special remotes, but that would double the data -stored in a special remote (or drop the old keys from them), -and use a lot of bandwidth. Probably not a good idea. - -Alternatively, the old key could be left on a special remote, but update -the location log for the special remote to say it has the new key, -and have git-annex request the old key when it wants to get (or checkpresent) -the new key from the special remote. -This would need the mapping to be cheap enough to query that it won't -signficantly slow down accessing a special remote. - -Dropping the new key from the special remote would then need to drop the -old key. But that could violate numcopies for the old key. Perhaps it could -check numcopies for the old key and drop it, otherwise leave the old key on -the special remote. 
- Rather than a dedicated command that users need to remember to run, distributed migration could be done automatically when merging a git-annex branch that adds migration information. Just hardlink object files and -update the location log for the local repo and for available special -remotes. +update the location log. It would be possible to avoid updating the location log, but then all location log queries would have to check the migration mapping. It would be @@ -51,6 +32,8 @@ queries the location log for each file. --[[Joey]] +> [[done]] --[[Joey]] + # security It is possible for bad migration information to be recorded in the @@ -59,20 +42,10 @@ when bad migration information is recorded: * When updating the local repository with a migration, verify that the object file hashes to the new key before hardlinking. -* When downloading content from a special remote by getting the old - pre-migration key, verify that download hashes to the new key. -That leaves at least two possible security problems: +> This was done. -* checkpresent against the special remote has to trust that the content - stored on it for the old key will hash to the new key. This could result - in data loss when a bad migration is provided, and the special remote is - trusted. - - Eg, if key A is locally present, and B is present on the special - remote, and then wrong migration is recorded from B to A, - the special remote will be treated as containing a copy of A, - allowing dropping the local copy of A, which was the only copy. +That leaves these possible security problems: * DOS by flooding the git-annex branch with migrations, resulting in lots of hard links (or copies on filesystems not supporting hard links) @@ -93,3 +66,6 @@ remote to contain the only copy. If we pull a git-annex branch from someone, they can already DOS disk space and CPU by checking a lot of junk into git. So maybe a DOS by migration is not really a concern. 
+ +> If people are worried about this kind of thing, they can avoid using the +> feature. --[[Joey]] diff --git a/doc/todo/distributed_migration_for_special_remotes.mdwn b/doc/todo/distributed_migration_for_special_remotes.mdwn new file mode 100644 index 0000000000..5794aed185 --- /dev/null +++ b/doc/todo/distributed_migration_for_special_remotes.mdwn @@ -0,0 +1,41 @@ +[[distributed_migration]] is implemented for local repositories via +`git-annex migrate --update`. + +That leaves updating special remotes after a migration as the main pain +point in doing migrations. + +One approach would be a command like `git-annex migrate +--update-remote=foo` that uploads new keys and drops old keys. +But that would double the data stored in the special remote and use a lot +of bandwidth. + +Alternatively, the old key could be left on a special remote, but update +the location log for the special remote to say it has the new key, +and have git-annex request the old key when it wants to get (or checkpresent) +the new key from the special remote. +This would need the mapping to be cheap enough to query that it won't +significantly slow down accessing a special remote. + +Dropping the new key from the special remote would then need to drop the +old key. But that could violate numcopies for the old key. Perhaps it could +check numcopies for the old key and drop it, otherwise leave the old key on +the special remote. + +--[[Joey]] + +# security + + +When downloading content from a special remote by getting the old +pre-migration key, it's important to verify that the download hashes to the new key. +See [[distributed_migration]]'s security section for relevant background. + +Another problem to consider: checkpresent against the special remote has to +trust that the content stored on it for the old key will hash to the new +key. This could result in data loss when a bad migration is provided, and +the special remote is trusted. 
+ +Eg, if key A is locally present, and B is present on the special +remote, and then a wrong migration is recorded from B to A, +the special remote will be treated as containing a copy of A, +allowing dropping the local copy of A, which was the only copy.