git-annex/doc/todo/alternate_keys_for_same_content.mdwn

12 lines
1.2 KiB
Text
Raw Normal View History

Designate a metadata field, say alt_keys, to store alternate keys for the content designated by the key with the metadata.
Then, after initially adding a URL key, and after some time getting its content, a checksum-based key such as MD5 could be added as the URL key's metadata.
Then, without needing to migrate, the URL key could be treated like checksum-based keys, e.g. downloaded from untrusted remotes, fsck'ed, etc.
The problem with migrating keys is that a separate copy of the contents is stored in the annex under the old key; but it you force-drop that, symlinks in
older commits will become invalid. You could rewrite git history, but that brings its own problems.
2018-10-30 01:40:18 +00:00
Also, sometimes one can determine the MD5 from the URL without downloading the file; e.g. with gsutil stat for gs:// URIs, or by downloading an .md5 file stored next to the main file,
or because an MD5 was computed by a workflow manager that produced the file (Cromwell does this). The special remote's "CHECKURL" implementation could record an MD5E key in the
alt_keys metadata field of the URL key. Then 'addurl --fast' could check alt_keys, and store in git an MD5E key rather than a URL key, if available.