From 7a3650fac7351a88700b406f1099fa323524f56e Mon Sep 17 00:00:00 2001
From: anarcat
Date: Mon, 22 Oct 2018 01:56:22 +0000
Subject: [PATCH] a possible new and faster hash backend

---
 doc/todo/consider_meow_backend.mdwn | 35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 doc/todo/consider_meow_backend.mdwn

diff --git a/doc/todo/consider_meow_backend.mdwn b/doc/todo/consider_meow_backend.mdwn
new file mode 100644
index 0000000000..cb2aea4e3e
--- /dev/null
+++ b/doc/todo/consider_meow_backend.mdwn
@@ -0,0 +1,35 @@
+I recently discovered (thanks to Paul Wise) the [Meow hash][]. The
+TL;DR is that it's a fast non-crypto hash which might be useful for
+git-annex. Here's their intro, quoted from the website:
+
+[Meow hash]: https://mollyrocket.com/meowhash
+
+> The Meow hash is a high-speed hash function named after the character
+> Meow in [Meow the Infinite][]. We developed the hash function at
+> [Molly Rocket][] for use in the asset pipeline of [1935][].
+>
+> Because we have to process hundreds of gigabytes of art assets to build
+> game packages, we wanted a fast, non-cryptographic hash for use in
+> change detection and deduplication. We had been using a cryptographic
+> hash ([SHA-1][]), but it was
+> unnecessarily slowing things down.
+>
+> To our surprise, we found a lack of published, well-optimized,
+> large-data hash functions. Most hash work seems to focus on small input
+> sizes (for things like dictionary lookup) or on cryptographic quality.
+> We wanted the fastest possible hash that would be collision-free in
+> practice (like SHA-1 was), and we didn't need any cryptographic security.
+>
+> We ended up creating Meow to fill this niche.
+
+ [1935]: https://molly1935.com/
+ [Molly Rocket]: https://mollyrocket.com/
+ [Meow the Infinite]: https://meowtheinfinite.com/
+ [SHA-1]: https://en.m.wikipedia.org/wiki/SHA-1
+
+I don't have an immediate use case for this right now, but I think it could
+be useful to speed up checks on larger files. The license is a
+*little* weird but seems close enough to BSD to be acceptable.
+
+I know it might sound like a conflict of interest, but I *swear* I am
+not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]