external backends implemented

Joey Hess 2020-07-29 17:24:21 -04:00
parent ea63d1dfe3
commit 049807dbba
6 changed files with 27 additions and 4 deletions

@@ -1,5 +1,8 @@
git-annex (8.20200720.2) UNRELEASED; urgency=medium
* Added support for external backend programs. So if you want a hash
  that git-annex doesn't support, or something stranger, you can write a
  small program to implement it. (A protocol sketch follows below.)
* Fix a lock file descriptor leak that could occur when running commands
like git-annex add with -J. Bug was introduced as part of a different FD
leak fix in version 6.20160318.
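
Below is a minimal sketch of what such an external backend program could look like, based on the line-oriented stdin/stdout protocol described in [[design/external_backend_protocol]]. The backend name XMD5 and all implementation details are illustrative assumptions, not part of this commit; consult the protocol page for the authoritative message set.

```python
#!/usr/bin/env python3
# Hypothetical external backend program, e.g. installed in PATH as
# git-annex-backend-XMD5. Illustrative sketch only; see the
# external_backend_protocol design page for the authoritative protocol.
import hashlib
import os
import sys

def emit(line):
    sys.stdout.write(line + "\n")
    sys.stdout.flush()

def md5_file(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

for line in sys.stdin:
    words = line.rstrip("\n").split(" ")
    cmd, args = words[0], words[1:]
    if cmd == "GETVERSION":
        emit("VERSION 1")
    elif cmd == "CANVERIFY":
        emit("CANVERIFY-YES")
    elif cmd == "ISSTABLE":
        emit("ISSTABLE-YES")
    elif cmd == "ISCRYPTOGRAPHICALLYSECURE":
        emit("ISCRYPTOGRAPHICALLYSECURE-NO")  # MD5 is not cryptographically secure
    elif cmd == "GENKEY":
        path = " ".join(args)
        key = "XMD5-s%d--%s" % (os.path.getsize(path), md5_file(path))
        emit("GENKEY-SUCCESS " + key)
    elif cmd == "VERIFYKEYCONTENT":
        key, path = args[0], " ".join(args[1:])
        wanted = key.split("--", 1)[1]
        emit("VERIFYKEYCONTENT-SUCCESS" if md5_file(path) == wanted
             else "VERIFYKEYCONTENT-FAILURE")
    else:
        emit("ERROR unknown request")
```

With such a program in PATH, something like `git annex add --backend=XMD5 file` should select it, assuming the details above match the design page.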

@@ -2,4 +2,7 @@ Would it be hard to support MD5E keys that omit the -sSIZE part, the way this is
Another (and more generally useful) solution would be [[todo/alternate_keys_for_same_content/]]. Then one could start with a URL-based key but later attach an MD5 to it as metadata, and have the key treated as a checksum-containing key, without needing to migrate the contents to a new key.
[[!tag moreinfo]]
> Closing, because [[external_backends]] is implemented, so you should be
> able to roll your own backend for your use case here. Assuming you can't
> just use regular MD5E and omit the file size field, which will work too.
> --[[Joey]]
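
For reference, omitting the size field just means leaving the -s part out of the key string. A rough sketch of building an MD5E key by hand, per [[internals/key_format]] (the extension handling here is a simplifying assumption; git-annex applies rules like annex.maxextensionlength):

```python
# Sketch: build an MD5E key string, with or without the optional -sSIZE field.
import hashlib
import os

def md5e_key(path, with_size=True):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    size = "-s%d" % os.path.getsize(path) if with_size else ""
    ext = os.path.splitext(path)[1]  # e.g. ".gz", or "" if none
    return "MD5E%s--%s%s" % (size, h.hexdigest(), ext)
```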

@@ -4,5 +4,6 @@ It would be good if one could define custom external [[backends]], the way one c
Thoughts?
[[!tag needsthought]]
[[!tag projects/datalad]]
> fully implemented. [[done]] --[[Joey]]

@@ -1,3 +1,4 @@
Would it be hard to add a variation to checksumming [[backends]], that would change how the checksum is computed: instead of computing it on the whole file, it would first be computed on file chunks of a given size, and then the final checksum computed on the concatenation of the chunk checksums? You'd add a new [[key field|internals/key_format]], say cNNNNN, specifying the chunking size (the last chunk might be shorter). Then (1) for large files, checksum computation could be parallelized (there could be a config option specifying the default chunk size for newly added files); (2) I often have large files on a remote, for which I have md5 for each chunk, but not for the full file; this would enable me to register the location of these files with git-annex without downloading them, while still using a checksum-based key.
[[!tag needsthought]]
> Closing, because [[external_backends]] is implemented, so you should be
> able to roll your own backend for your use case here. --[[Joey]]
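
A sketch of the chunked checksum the todo proposes: MD5 each fixed-size chunk, then MD5 the concatenation of the chunk digests. The proposal doesn't pin down whether raw or hex digests are concatenated; raw bytes are an assumption here.

```python
# Sketch of the proposed scheme. Since each chunk digest is independent,
# the inner hashes could also be computed in parallel.
import hashlib

def chunked_md5(path, chunk_size=1048576):
    outer = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            outer.update(hashlib.md5(chunk).digest())
    return outer.hexdigest()
```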

@@ -12,4 +12,9 @@ This enables attaching metadata not to file contents, but to the file itself; or
deduplication. This loss may be acceptable. The loss can be mitigated for local repo and non-special remotes: after storing an object with e.g. MD5 d41d8cd98f00b204e9800998ecf8427e under .git/annex/objects, check if there is a symlink .git/annex/contenthash/d41d8cd98f00b204e9800998ecf8427e; if not, make this a symlink to the object just stored; if yes, erase the object just stored, and hardlink the symlink's target instead.
[[!tag unlikely moreinfo]]
> Closing since [[external_backends]] is implemented, and you could do this
> using it. Whether that's a good idea, I'm fairly doubtful about. Be sure
> to read "considerations for generating keys" in
> <https://git-annex.branchable.com/design/external_backend_protocol/#index7h2>
>
> [[done]] --[[Joey]]
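
For concreteness, the mitigation described above could look roughly like this. The .git/annex/contenthash directory is the suggestion's own hypothetical index, not something git-annex maintains:

```python
# Sketch of the suggested mitigation: index content by its contenthash
# and hardlink duplicates instead of keeping a second copy.
import os

def dedup_object(stored_path, contenthash, index_dir=".git/annex/contenthash"):
    os.makedirs(index_dir, exist_ok=True)
    link = os.path.join(index_dir, contenthash)
    if not os.path.islink(link):
        # First time this content is seen: remember where it lives.
        os.symlink(os.path.abspath(stored_path), link)
    else:
        # Duplicate content: drop the new copy, hardlink the original.
        os.remove(stored_path)
        os.link(os.path.realpath(link), stored_path)
```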

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2020-07-29T21:22:42Z"
content="""
[[external_backends]] is now implemented, so you can write a program that
makes keys use some other, shorter hash encoding.
I don't know if that's really sufficient to close this.
"""]]