git-annex/doc/todo/key_checksum_from_chunk_checksums.mdwn

Would it be hard to add a variation of the checksumming [[backends]] that changes how the checksum is computed: instead of computing it over the whole file, it would first be computed over file chunks of a given size, and the final checksum would then be computed over the concatenation of the chunk checksums? A new [[key field|internals/key_format]], say cNNNNN, would specify the chunk size (the last chunk might be shorter). Then (1) for large files, checksum computation could be parallelized (a config option could specify the default chunk size for newly added files); and (2) I often have large files on a remote for which I have an md5 of each chunk, but not of the full file; this would let me register the location of those files with git-annex without downloading them, while still using a checksum-based key.
> Closing, because [[external_backends]] is implemented, so you should be
> able to roll your own backend for your use case here. --[[Joey]]