Currently `git-annex migrate` only hard links the objects in the local
repo. This leaves other clones without the new keys' objects unless
they re-download them, or unless the same migrate command is
re-run, in the same tree, on each clone.

It would be good to support distributed migration, so that whatever
migration is done in one repo is reflected in the other repos.

This needs some way to store, in the git repo, a mapping between the old
key and the new key it has been migrated to. (I investigated
how much space that would need in the git repo, in
[this comment](https://git-annex.branchable.com/todo/alternate_keys_for_same_content/#comment-917eba0b2d1637236c5d900ecb5d8da0).)

The mapping might be communicated via the git branch but be locally stored
in a sqlite database to make querying it fast.
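
As a rough illustration of the local cache idea, here is a minimal sketch of an old-key to new-key mapping kept in SQLite, using Python's standard `sqlite3` module. The table name `migrations`, its columns, and the helper functions are assumptions made up for this example; this is not git-annex's actual schema.

```python
import sqlite3

def open_mapping_db(path=":memory:"):
    """Open (or create) a hypothetical local cache of key migrations."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS migrations ("
        " old_key TEXT NOT NULL,"
        " new_key TEXT NOT NULL,"
        " PRIMARY KEY (old_key, new_key))"
    )
    # Index the reverse direction too, so lookups by new key are also fast.
    db.execute(
        "CREATE INDEX IF NOT EXISTS migrations_by_new ON migrations (new_key)"
    )
    return db

def record_migration(db, old_key, new_key):
    """Record that old_key was migrated to new_key (safe to repeat)."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO migrations (old_key, new_key) VALUES (?, ?)",
            (old_key, new_key),
        )

def migrated_to(db, old_key):
    """Return every key that old_key is recorded as having been migrated to."""
    rows = db.execute(
        "SELECT new_key FROM migrations WHERE old_key = ? ORDER BY new_key",
        (old_key,),
    )
    return [row[0] for row in rows]
```

In a design like this, the database would be a cache rebuilt from whatever is recorded in git, so it could be deleted and regenerated at any time.
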

Once that mapping is available, one simple way to use it would be a
git-annex command that updates the local repo to reflect migrations that
have happened elsewhere. It would not touch the HEAD branch, but would
just hard link object files from the old to new key, and update the location
log for the new key to indicate the content is present in the repo.

This command could be something like `git-annex migrate --update`.
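
The update step described above could look roughly like the following sketch. It is an illustration only, not how git-annex implements it: `object_path` and `mark_present` are hypothetical stand-ins for locating a key's object file and for updating the location log, and the copy fallback is assumed for filesystems without hard link support.

```python
import os
import shutil

def link_object(old_path, new_path):
    """Make the old key's object file available under the new key's path.

    Returns True if new_path was created, False if nothing was done.
    """
    if not os.path.exists(old_path) or os.path.exists(new_path):
        return False
    os.makedirs(os.path.dirname(new_path) or ".", exist_ok=True)
    try:
        os.link(old_path, new_path)       # hard link: no extra disk space
    except OSError:
        shutil.copy2(old_path, new_path)  # fallback where hard links fail
    return True

def update_from_migrations(mapping, object_path, mark_present):
    """Apply known migrations to objects that are present locally.

    mapping: iterable of (old_key, new_key) pairs
    object_path: hypothetical function from a key to its object file path
    mark_present: hypothetical callback standing in for the location log
    """
    updated = []
    for old_key, new_key in mapping:
        if link_object(object_path(old_key), object_path(new_key)):
            mark_present(new_key)
            updated.append(new_key)
    return updated
```

Nothing here touches the work tree or any branch; only object files and the (stand-in) location log are affected, which matches the intent of leaving HEAD alone.
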

Rather than a dedicated command that users need to remember to run,
distributed migration could be done automatically when merging a git-annex
branch that adds migration information. Just hardlink object files and
update the location log.

It would be possible to avoid updating the location log, but then all
location log queries would have to check the migration mapping. It would be
hard to make that fast enough. Consider `git-annex find --in foo`, which
queries the location log for each file.

--[[Joey]]

> [[done]] --[[Joey]]

# security

It is possible for bad migration information to be recorded in the
git-annex branch by someone malicious. To avoid bad or insecure behavior
when bad migration information is recorded:

* When updating the local repository with a migration, verify that
  the object file hashes to the new key before hardlinking.

> This was done.
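
To make the verification step concrete, here is a minimal sketch of checking that a file hashes to the digest named in a new key before linking it. It assumes a simplified key of the form `SHA256-s<size>--<hex digest>` (or `SHA256E-...` with a file extension after the digest) and rejects every other backend; the function name is hypothetical and this is not git-annex's own code.

```python
import hashlib

def verify_sha256_key(key, path):
    """Return True only if the file at path hashes to the digest in key.

    Simplified assumption: key looks like "SHA256-s<size>--<hex digest>",
    optionally "SHA256E-..." with an extension after the digest.
    Any other backend is rejected rather than trusted.
    """
    fields, sep, name = key.partition("--")
    if not sep or fields.split("-")[0] not in ("SHA256", "SHA256E"):
        return False
    expected = name.split(".")[0].lower()  # drop any extension
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected
```

A migration whose new key fails this check would simply be skipped, so bad mapping data cannot cause wrong content to appear under a key.
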

That leaves these possible security problems:

* DOS by flooding the git-annex branch with migrations, resulting in
  lots of hard links (or copies on filesystems not supporting hard links)
  and hashing of large files.

Note that a malicious person who can write to the git-annex branch
can already set their own repo as trusted, wait for someone
to drop their local copy, and then demand a ransom for the content.
For that matter, someone hosting a git-annex remote on a server can wait
for someone to rely on it to contain the only copy of content and ransom
it then.

git-annex is probably not normally used in situations where we
need to worry about this kind of attack; if we don't trust someone we
shouldn't pull the git-annex branch from them, and should not trust their
remote to contain the only copy.

If we pull a git-annex branch from someone, they can already DOS disk space
and CPU by checking a lot of junk into git. So maybe a DOS by migration is
not really a concern.

> If people are worried about this kind of thing, they can avoid using the
> feature. --[[Joey]]