split out todo for special remotes and close the main todo

2023-12-08 14:25:05 -04:00
commit 362a2808a5
parent 76e11e4458
2 changed files with 49 additions and 32 deletions
--- a/doc/todo/distributed_migration.mdwn
+++ b/doc/todo/distributed_migration.mdwn
@@ -20,29 +20,10 @@ just hard link object files from the old to new key, and update the location
 log for the new key to indicate the content is present in the repo.
 This command could be something like `git-annex migrate --update`.
 That wouldn't be entirely sufficient though, because special remotes from
 pre-migration will be populated with the old keys. A similar command could
 upload the new content to special remotes, but that would double the data
 stored in a special remote (or drop the old keys from them), 
 and use a lot of bandwidth. Probably not a good idea.
 Alternatively, the old key could be left on a special remote, but update
 the location log for the special remote to say it has the new key,
 and have git-annex request the old key when it wants to get (or checkpresent)
 the new key from the special remote.
 This would need the mapping to be cheap enough to query that it won't
 significantly slow down accessing a special remote.
 Dropping the new key from the special remote would then need to drop the
 old key. But that could violate numcopies for the old key. Perhaps it could
 check numcopies for the old key and drop it, otherwise leave the old key on
 the special remote.
 Rather than a dedicated command that users need to remember to run,
 distributed migration could be done automatically when merging a git-annex
 branch that adds migration information. Just hardlink object files and
-update the location log for the local repo and for available special
+update the location log.
 remotes.
 It would be possible to avoid updating the location log, but then all
 location log queries would have to check the migration mapping. It would be
@@ -51,6 +32,8 @@ queries the location log for each file.
 --[[Joey]]
 > [[done]] --[[Joey]]
 # security
 It is possible for bad migration information to be recorded in the
@@ -59,20 +42,10 @@ when bad migration information is recorded:
 * When updating the local repository with a migration, verify that
  the object file hashes to the new key before hardlinking.
 * When downloading content from a special remote by getting the old
  pre-migration key, verify that download hashes to the new key.
-That leaves at least two possible security problems:
+> This was done.
-* checkpresent against the special remote has to trust that the content
+That leaves these possible security problems:
  stored on it for the old key will hash to the new key. This could result
  in data loss when a bad migration is provided, and the special remote is
  trusted.
  Eg, if key A is locally present, and B is present on the special
 remote, and then a wrong migration is recorded from B to A,
  the special remote will be treated as containing a copy of A,
  allowing dropping the local copy of A, which was the only copy.
 * DOS by flooding the git-annex branch with migrations, resulting in 
  lots of hard links (or copies on filesystems not supporting hard links)
@@ -93,3 +66,6 @@ remote to contain the only copy.
 If we pull a git-annex branch from someone, they can already DOS disk space
 and CPU by checking a lot of junk into git. So maybe a DOS by migration is
 not really a concern.
 > If people are worried about this kind of thing, they can avoid using the
 > feature. --[[Joey]]
--- a/doc/todo/distributed_migration_for_special_remotes.mdwn
+++ b/doc/todo/distributed_migration_for_special_remotes.mdwn
@@ -0,0 +1,41 @@
 [[distributed_migration]] is implemented for local repositories via
 `git-annex migrate --update`.
 That leaves updating special remotes after a migration as the main pain
 point in doing migrations.
 One approach would be a command like `git-annex migrate
 --update-remote=foo` that uploads new keys and drops old keys.
 But that would double the data stored in the special remote and use a lot
 of bandwidth.
 Alternatively, the old key could be left on a special remote, but update
 the location log for the special remote to say it has the new key,
 and have git-annex request the old key when it wants to get (or checkpresent)
 the new key from the special remote.
 This would need the mapping to be cheap enough to query that it won't
 significantly slow down accessing a special remote.
 Dropping the new key from the special remote would then need to drop the
 old key. But that could violate numcopies for the old key. Perhaps it could
 check numcopies for the old key and drop it, otherwise leave the old key on
 the special remote.
 --[[Joey]]
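
The key-mapping alternative described above can be sketched roughly as follows. This is a toy Python model, not git-annex code: `SpecialRemote`, `candidate_keys`, `checkpresent`, `retrieve`, `drop` and the `other_copies_of` callback are all hypothetical names, and the migration mapping is assumed to be a plain in-memory dict from new key to old key, so lookups stay cheap.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecialRemote:
    """Hypothetical in-memory stand-in for a special remote."""
    stored: Dict[str, bytes] = field(default_factory=dict)


def candidate_keys(key: str, migrations: Dict[str, str]) -> List[str]:
    """Keys to try on the remote: the key itself, then its pre-migration key."""
    keys = [key]
    old = migrations.get(key)  # assumed mapping: new key -> old key
    if old is not None:
        keys.append(old)
    return keys


def checkpresent(remote: SpecialRemote, key: str, migrations: Dict[str, str]) -> bool:
    """True if the remote holds the key or its pre-migration key."""
    return any(k in remote.stored for k in candidate_keys(key, migrations))


def retrieve(remote: SpecialRemote, key: str, migrations: Dict[str, str]) -> bytes:
    """Fetch content for the key, falling back to the old key."""
    for k in candidate_keys(key, migrations):
        if k in remote.stored:
            return remote.stored[k]
    raise KeyError(key)


def drop(remote: SpecialRemote, key: str, migrations: Dict[str, str],
         other_copies_of: Callable[[str], int], numcopies: int) -> bool:
    """Drop the new key; return True only if the old key's content was removed.

    The old key is removed only when enough other copies of it exist,
    otherwise it is left on the remote.
    """
    remote.stored.pop(key, None)
    old = migrations.get(key)
    if old is None or old not in remote.stored:
        return False
    if other_copies_of(old) >= numcopies:
        del remote.stored[old]
        return True
    return False


remote = SpecialRemote({"OLD": b"data"})
migrations = {"NEW": "OLD"}
present = checkpresent(remote, "NEW", migrations)
content = retrieve(remote, "NEW", migrations)
kept = not drop(remote, "NEW", migrations, lambda k: 0, numcopies=1)
```

In this sketch, when the old key is left in place, `checkpresent` for the new key would still succeed through the mapping, so the location log would have to be what records that the new key was dropped.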
 # security
 When downloading content from a special remote by getting the old
 pre-migration key it's important to verify that download hashes to the new key.
 See [[distributed_migration]]'s security section for relevant background.
 Another problem to consider: checkpresent against the special remote has to
 trust that the content stored on it for the old key will hash to the new
 key. This could result in data loss when a bad migration is provided, and
 the special remote is trusted.
 Eg, if key A is locally present, and B is present on the special
 remote, and then a wrong migration is recorded from B to A,
 the special remote will be treated as containing a copy of A,
 allowing dropping the local copy of A, which was the only copy.
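
The scenario above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not git-annex's implementation: `make_key` builds a simplified `SHA256--<hex>` string as a stand-in for a real key, and `trusting_checkpresent` and `verify_download` are hypothetical helpers.

```python
import hashlib
from typing import Dict


def make_key(content: bytes) -> str:
    """Simplified stand-in for a hash-based key (assumed format)."""
    return "SHA256--" + hashlib.sha256(content).hexdigest()


def verify_download(content: bytes, new_key: str) -> bool:
    """Check that downloaded content really hashes to the new key."""
    return make_key(content) == new_key


def trusting_checkpresent(remote_store: Dict[str, bytes], new_key: str,
                          migrations: Dict[str, str]) -> bool:
    """Presence check that trusts the migration mapping without hashing."""
    old = migrations.get(new_key)
    return new_key in remote_store or (old is not None and old in remote_store)


content_a = b"content A"
content_b = b"content B"
key_a, key_b = make_key(content_a), make_key(content_b)

remote_store = {key_b: content_b}   # the special remote only holds B
bad_migrations = {key_a: key_b}     # wrong record: claims B was migrated to A

# checkpresent cannot hash remote content, so it reports A as present.
present = trusting_checkpresent(remote_store, key_a, bad_migrations)

# A download can be verified, and the bad migration is caught there.
downloaded = remote_store[bad_migrations[key_a]]
ok = verify_download(downloaded, key_a)
```

Here the download check rejects the content, but the presence check has no content to hash, which is why a trusted special remote plus a bad migration could allow dropping the only real copy of A.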