git-annex/doc/todo/consider_meow_backend.mdwn

2018-10-22 01:56:22 +00:00
I recently discovered (thanks to Paul Wise) the [Meow hash][]. The
TL;DR is that it's a fast non-crypto hash which might be useful for
git-annex. Here's their intro, quoted from the website:

[Meow hash]: https://mollyrocket.com/meowhash

> The Meow hash is a high-speed hash function named after the character
> Meow in [Meow the Infinite][]. We developed the hash function at
> [Molly Rocket][] for use in the asset pipeline of [1935][].
>
> Because we have to process hundreds of gigabytes of art assets to build
> game packages, we wanted a fast, non-cryptographic hash for use in
> change detection and deduplication. We had been using a cryptographic
> hash ([SHA-1][]), but it was
> unnecessarily slowing things down.
>
> To our surprise, we found a lack of published, well-optimized,
> large-data hash functions. Most hash work seems to focus on small input
> sizes (for things like dictionary lookup) or on cryptographic quality.
> We wanted the fastest possible hash that would be collision-free in
> practice (like SHA-1 was), and we didn't need any cryptographic security.
>
> We ended up creating Meow to fill this niche.

[1935]: https://molly1935.com/
[Molly Rocket]: https://mollyrocket.com/
[Meow the Infinite]: https://meowtheinfinite.com/
[SHA-1]: https://en.m.wikipedia.org/wiki/SHA-1

I don't have an immediate use case for this right now, but I think it could
be useful to speed up checks on larger files. The license is a
*little* weird but seems close enough to BSD to be acceptable.
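
To make the "speed up checks on larger files" idea a bit more concrete, here is a rough sketch of change detection with a swappable digest. It is only an illustration under loud assumptions: the function names (`sha1_of`, `fast_checksum_of`, `changed`) are made up for this example, it is not how git-annex backends are implemented, and CRC-32 from Python's `zlib` is merely a stand-in for a fast non-cryptographic hash. It is not Meow, and at 32 bits it is far too short to be collision-free in practice.

```python
import hashlib
import zlib

# Read 1 MiB at a time so large files never sit fully in memory.
CHUNK = 1 << 20


def sha1_of(path):
    """Cryptographic digest, the kind of hash the quoted intro moved away from."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def fast_checksum_of(path):
    """Stand-in for a fast non-crypto hash: CRC-32 (an assumption, NOT Meow)."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            # Feed the running value back in so chunks chain into one checksum.
            crc = zlib.crc32(block, crc)
    return format(crc, "08x")


def changed(path, recorded, digest=fast_checksum_of):
    """Return True if the file's current digest differs from the recorded one."""
    return digest(path) != recorded
```

The point of passing `digest` as a parameter is that the check itself does not care which hash is used, so a faster function could be dropped in where cryptographic strength is not needed. Timing the two functions on your own files (for example with `time.perf_counter`) would show whether the swap is worth it.
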

I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]