git-annex/doc/todo/alternate_keys_for_same_content.mdwn
Joey Hess cffa2446e8
tagged the past 2 years of open todos and followed up to a few of them
also moved some that were really bug reports to bugs/ and closed a
couple
2020-01-30 15:22:05 -04:00

12 lines
1.2 KiB
Markdown

Designate a metadata field, say alt_keys, to store alternate keys for the content designated by the key with the metadata.
Then, after initially adding a URL key, and after some time getting its content, a checksum-based key such as MD5 could be added as the URL key's metadata.
Then, without needing to migrate, the URL key could be treated like checksum-based keys, e.g. downloaded from untrusted remotes, fsck'ed, etc.
The problem with migrating keys is that a separate copy of the contents is stored in the annex under the old key; but it you force-drop that, symlinks in
older commits will become invalid. You could rewrite git history, but that brings its own problems.
Also, sometimes one can determine the MD5 from the URL without downloading the file; e.g. with gsutil stat for gs:// URIs, or by downloading an .md5 file stored next to the main file,
or because an MD5 was computed by a workflow manager that produced the file (Cromwell does this). The special remote's "CHECKURL" implementation could record an MD5E key in the
alt_keys metadata field of the URL key. Then 'addurl --fast' could check alt_keys, and store in git an MD5E key rather than a URL key, if available.
[[!tag unlikely]]