Merge branch 'master' of ssh://git-annex.branchable.com
commit
d6b2468b4c
4 changed files with 146 additions and 0 deletions
35
doc/todo/consider_meow_backend.mdwn
Normal file
@@ -0,0 +1,35 @@
I recently discovered (thanks to Paul Wise) the [Meow hash][]. The
TL;DR is that it's a fast non-crypto hash which might be useful for
git-annex. Here's their intro, quoted from the website:

[Meow hash]: https://mollyrocket.com/meowhash

> The Meow hash is a high-speed hash function named after the character
> Meow in [Meow the Infinite][]. We developed the hash function at
> [Molly Rocket][] for use in the asset pipeline of [1935][].
>
> Because we have to process hundreds of gigabytes of art assets to build
> game packages, we wanted a fast, non-cryptographic hash for use in
> change detection and deduplication. We had been using a cryptographic
> hash ([SHA-1][]), but it was
> unnecessarily slowing things down.
>
> To our surprise, we found a lack of published, well-optimized,
> large-data hash functions. Most hash work seems to focus on small input
> sizes (for things like dictionary lookup) or on cryptographic quality.
> We wanted the fastest possible hash that would be collision-free in
> practice (like SHA-1 was), and we didn't need any cryptographic security.
>
> We ended up creating Meow to fill this niche.

[1935]: https://molly1935.com/
[Molly Rocket]: https://mollyrocket.com/
[Meow the Infinite]: https://meowtheinfinite.com/
[SHA-1]: https://en.m.wikipedia.org/wiki/SHA-1

I don't have an immediate use case for this, but I think it could
be useful to speed up checks on larger files. The license is a
*little* weird but seems close enough to BSD to be acceptable.

I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]
5
doc/todo/external_backends.mdwn
Normal file
@@ -0,0 +1,5 @@
It would be good if one could define custom external [[backends]], the way one can define custom external remotes. This would solve [[todo/consider_meow_backend]] but also have other uses. For instance, files sometimes contain details that are irrelevant to their semantics (e.g. comments) but that change the checksum; with a custom backend, one could "canonicalize" a file before computing the checksum.

@joey pointed out a potential problem: "needing to deal with the backend being missing or failing to work could have wide repercussions in the code base." I wonder if there are ways around that. Suppose you specified a default backend to use in case a custom one was unavailable? Then you could always compute a key from a file, even if not with the intended backend. And once a key is stored in git-annex, most of git-annex treats the key as just a string. If the custom backend supports checksum verification but its implementation is missing, keys from that backend would be treated like WORM/URL keys, which do not support checksum checking.

Thoughts?
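
To make the fallback idea a bit more concrete, here is a minimal Python sketch. It is not git-annex code and does not reflect any existing interface: the backend name `XCANON`, the functions `make_key` and `verify`, the comment-stripping rule, and the simplified `NAME--digest` key layout are all assumptions made up for illustration.

```python
import hashlib
from typing import Callable, Dict, Optional

# A backend here is just a function from file contents to a hex digest.
Hasher = Callable[[bytes], str]


def sha256_hasher(data: bytes) -> str:
    """Stand-in for a built-in checksum backend."""
    return hashlib.sha256(data).hexdigest()


def canonical_hasher(data: bytes) -> str:
    """Hypothetical custom backend: ignore '#' comment lines, then hash."""
    kept = [line for line in data.splitlines()
            if not line.lstrip().startswith(b"#")]
    return hashlib.sha256(b"\n".join(kept)).hexdigest()


def make_key(data: bytes, preferred: str, backends: Dict[str, Hasher],
             fallback: str = "SHA256") -> str:
    """Compute a key with the preferred backend, or with the fallback
    backend when the preferred one is not installed."""
    name = preferred if preferred in backends else fallback
    return f"{name}--{backends[name](data)}"


def verify(key: str, data: bytes,
           backends: Dict[str, Hasher]) -> Optional[bool]:
    """Return True/False when the key's backend is available.
    Return None when it is missing: the key is then only an opaque
    string and no checksum check is possible (like WORM/URL keys)."""
    name, _, digest = key.partition("--")
    hasher = backends.get(name)
    if hasher is None:
        return None
    return hasher(data) == digest


if __name__ == "__main__":
    builtin = {"SHA256": sha256_hasher}
    with_custom = dict(builtin, XCANON=canonical_hasher)

    original = b"x = 1\n# a comment\n"
    key = make_key(original, "XCANON", with_custom)
    print(key.split("--")[0])                                   # backend used with the custom one installed
    print(make_key(original, "XCANON", builtin).split("--")[0])  # backend used without it
    print(verify(key, original, builtin))                        # verification result without the backend
```

In this sketch a repository without the custom backend can still add files (using the fallback) and can still store and move keys made elsewhere with the custom backend; it just cannot verify them.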