git-annex

Author	SHA1	Message	Date
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	25f912de5b	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 16:59:27 -04:00
Joey Hess	0c6b7e288d	Add BLAKE2BP512 and BLAKE2BP512E backends using a blake2 variant optimised for 4-way CPUs This had been deferred because the Debian package of cryptonite, and possibly other builds, was broken for blake2bp, but I've confirmed #892855 is fixed. This commit was sponsored by Brett Eisenberg on Patreon.	2019-07-05 15:30:03 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	a5764c4a78	fix build with old ghc	2019-01-18 13:59:29 -04:00
Joey Hess	c3afb3434d	remove recently added cache from KeyVariety Adding that field broke the Read/Show serialization back-compat, and also the Eq and Ord instances were not blinded to it, which broke git annex fsck and probably more. I think that the new approach used in formatKeyVariety will be nearly as fast, but have not benchmarked it.	2019-01-16 16:33:08 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached seralization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	151562b537	convert key2file and file2key to use builder and attoparsec The new parser is significantly stricter than the old one: The old file2key allowed the fields to come in any order, but the new one requires the fixed order that git-annex has always used. Hopefully this will not cause any breakage. And the old file2key allowed eg SHA1-m1-m2-m3-m4-m5-m6--xxxx while the new does not allow duplication of fields. This could potentially improve security, because allowing lots of extra junk like that in a key could potentially be used in a SHA1 collision attack, although the current attacks need binary data and not this kind of structured numeric data. Speed improved of course, and fairly substantially, in microbenchmarks: benchmarking old/key2file time 2.264 μs (2.257 μs .. 2.273 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.265 μs (2.260 μs .. 2.275 μs) std dev 21.17 ns (13.06 ns .. 39.26 ns) benchmarking new/key2file' time 1.744 μs (1.741 μs .. 1.747 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.745 μs (1.742 μs .. 1.751 μs) std dev 13.55 ns (9.099 ns .. 21.89 ns) benchmarking old/file2key time 6.114 μs (6.102 μs .. 6.129 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 6.118 μs (6.106 μs .. 6.143 μs) std dev 55.00 ns (30.08 ns .. 100.2 ns) benchmarking new/file2key' time 1.791 μs (1.782 μs .. 1.801 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 1.792 μs (1.785 μs .. 1.804 μs) std dev 32.46 ns (20.59 ns .. 50.82 ns) variance introduced by outliers: 19% (moderately inflated)	2019-01-11 16:33:42 -04:00
Joey Hess	b552551b33	use ByteString in Key for speed This is an easy win for parseKeyVariety: benchmarking old/parseKeyVariety time 1.515 μs (1.512 μs .. 1.517 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.515 μs (1.513 μs .. 1.517 μs) std dev 6.417 ns (4.992 ns .. 8.113 ns) benchmarking new/parseKeyVariety time 54.97 ns (54.70 ns .. 55.40 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 55.42 ns (55.05 ns .. 56.03 ns) std dev 1.562 ns (969.5 ps .. 2.442 ns) variance introduced by outliers: 44% (moderately inflated) For formatKeyVariety, using a Builder is marginally worse than building a String... (This is with criterion evaluating fully to nf not whnf) benchmarking old/formatKeyVariety time 434.3 ns (428.0 ns .. 440.4 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 430.6 ns (428.2 ns .. 433.9 ns) std dev 9.166 ns (6.932 ns .. 11.94 ns) variance introduced by outliers: 27% (moderately inflated) benchmarking Builder/formatKeyVariety time 526.5 ns (524.7 ns .. 528.8 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 526.1 ns (524.9 ns .. 528.5 ns) std dev 5.687 ns (3.762 ns .. 8.000 ns) Manually building the ByteString was better, but still slightly slower than String, due to innefficient need to S.pack . show the HashSize: benchmarking formatKeyVariety time 459.5 ns (455.8 ns .. 463.2 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 459.9 ns (457.4 ns .. 466.6 ns) std dev 11.65 ns (6.860 ns .. 21.41 ns) variance introduced by outliers: 35% (moderately inflated) So I cheated and made parseKeyVariety cache the original ByteString, for formatKeyVariety to use instead of re-building it. Final benchmark: benchmarking new/formatKeyVariety time 50.64 ns (50.57 ns .. 50.73 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 51.05 ns (50.60 ns .. 52.71 ns) std dev 2.790 ns (259.6 ps .. 5.916 ns) variance introduced by outliers: 75% (severely inflated) benchmarking new/parseKeyVariety time 71.88 ns (71.54 ns .. 72.24 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 71.97 ns (71.69 ns .. 72.47 ns) std dev 1.249 ns (910.7 ps .. 1.791 ns) variance introduced by outliers: 22% (moderately inflated)	2019-01-11 16:32:51 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	521d4ede1e	fix build with cryptonite-0.20 Some blake hash varieties were not yet available in that version. Rather than tracking exact details of what cryptonite supported when, disable blake unless using a current cryptonite.	2018-03-15 11:16:00 -04:00
Joey Hess	050ada746f	Added backends for the BLAKE2 family of hashes. There are a lot of different variants and sizes, I suppose we might as well export all the common ones. Bump dep to cryptonite to 0.16, earlier versions lacked BLAKE2 support. Even android has 0.16 or newer. On Debian, Blake2bp_512 is buggy, so I have omitted it for now. http://bugs.debian.org/892855 This commit was sponsored by andrea rota.	2018-03-13 16:23:42 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	5383340691	improve layout	2017-03-01 12:46:01 -04:00
Joey Hess	ea1f812ebf	Fix reversion in yesterday's release that made SHA1E and MD5E backends not work.	2017-03-01 12:43:15 -04:00
Joey Hess	0fda7c08d0	add cryptographicallySecure Note that GPGHMAC keys are not cryptographically secure, because their content has no relation to the name of the key. So, things that use this function to avoid sending keys to a remote will need to special case in support for those keys. If GPGHMAC keys were accepted as cryptographically secure, symlinks using them could be committed to a git repo, and their content would be accepted into the repo, with no guarantee that two repos got the same content, which is what we're aiming to prevent.	2017-02-27 12:54:06 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	ca0daa8bb8	factor non-type stuff out of Key	2017-02-24 13:42:30 -04:00
Joey Hess	35739a74c2	make file2key reject E* backend keys with a long extension I am not happy that I had to put backend-specific code in file2key. But it would be very difficult to avoid this layering violation. Most of the time, when parsing a Key from a symlink target, git-annex never looks up its Backend at all, so adding this check to a method of the Backend object would not work. The Key could be made to contain the appropriate Backend, but since Backend is parameterized on an "a" that is fixed to the Annex monad later, that would need Key to change to "Key a". The only way to clean this up that I can see would be to have the Key contain a LowlevelBackend, and put the validation in LowlevelBackend. Perhaps later, but that would be an extensive change, so let's not do it in this commit which may want to cherry-pick to backports. This commit was sponsored by Ethan Aubin.	2017-02-24 11:22:15 -04:00
Joey Hess	60d99a80a6	Tighten key parser to not accept keys containing a non-numeric fields, which could be used to embed data useful for a SHA1 attack against git. Also todo about why this is important, and with some further hardening to add. This commit was sponsored by Ignacio on Patreon.	2017-02-24 00:17:25 -04:00
Joey Hess	65e903397c	implementation of peer-to-peer protocol For use with tor hidden services, and perhaps other transports later. Based on Utility.SimpleProtocol, it's a line-based protocol, interspersed with transfers of bytestrings of a specified size. Implementation of the local and remote sides of the protocol is done using a free monad. This lets monadic code be included here, without tying it to any particular way to get bytes peer-to-peer. This adds a dependency on the haskell package "free", although that was probably pulled in transitively from other dependencies already. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2016-11-17 18:30:50 -04:00
Joey Hess	928fbb162d	improved use of Aeson for JSONActionItem	2016-07-26 19:50:02 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	b0626230b7	fix use of hifalutin terminology	2015-11-16 14:37:31 -04:00
Joey Hess	adba0595bd	use bloom filter in second pass of sync --all --content This is needed because when preferred content matches on files, the second pass would otherwise want to drop all keys. Using a bloom filter avoids this, and in the case of a false positive, a key will be left undropped that preferred content would allow dropping. Chances of that happening are a mere 1 in 1 million.	2015-06-16 18:50:13 -04:00
Joey Hess	a0a8127956	instance Hashable Key for bloomfilter	2015-06-16 18:37:41 -04:00
Joey Hess	36b9c9ca5f	fromkey, registerurl: Improve handling of urls that happen to also be parsable as strange keys.	2015-05-30 02:08:49 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	32e4368377	S3: support chunking The assistant defaults to 1MiB chunk size for new S3 special remotes. Which will work around a couple of bugs: http://git-annex.branchable.com/bugs/S3_memory_leaks/ http://git-annex.branchable.com/bugs/S3_upload_not_using_multipart/	2014-08-02 15:51:58 -04:00
Joey Hess	9d4a766cd7	resume interrupted chunked downloads Leverage the new chunked remotes to automatically resume downloads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundry. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also properly handle starting a download from one remote, interrupting, and resuming from another one, and so on. (Resuming interrupted chunked uploads is similarly doable, although slightly more expensive.) This commit was sponsored by Thomas Djärv.	2014-07-27 18:56:32 -04:00
Joey Hess	8f93982df6	use same hash directories for chunked key as are used for its parent This avoids a proliferation of hash directories when using new-style chunking, and should improve performance since chunks are accessed in sequence and so should have a common locality. Of course, when a chunked key is encrypted, its hash directories have no relation to the parent key. This commit was sponsored by Christian Kellermann.	2014-07-25 16:09:23 -04:00
Joey Hess	d751591ac8	add chunk metadata to Key Added new fields for chunk number, and chunk size. These will not appear in normal keys ever, but will be used for chunked data stored on special remotes. This commit was sponsored by Jouni K Seppanen.	2014-07-24 13:36:23 -04:00
Joey Hess	7f9a0c153b	thought of another way to break prop_idempotent_key_decode	2014-03-05 00:23:22 -04:00
Joey Hess	e8ab82390e	quickcheck says: "a-s--a" is not a legal key filename Found this in failed armhf build log, where quickcheck found a way to break prop_idempotent_key_decode. The "s" indicates size, but since nothing comes after it, that's not valid. When encoding the resulting key, no size was present, so it encoded to "a--a". Also, "a-sX--a" is not legal, since X is not a number. Not found by quickcheck.	2014-03-05 00:10:11 -04:00
Joey Hess	4b32a6c711	file2key should return Nothing if the backend is empty This failed a quickcheck test on the filename "-a"	2013-11-11 15:41:31 -04:00
Joey Hess	396e47b07e	tighten file2key to not produce invalid keys with no keyName A file named "foo-" or "foo-bar" was taken as a key's file, with a backend of "foo", and an empty keyName. This led to various problems, especially because converting that key back to a file did not yeild the same filename.	2013-10-16 12:46:24 -04:00
Joey Hess	d14b13e18e	add - and _	2013-09-11 13:02:10 -04:00
Joey Hess	e4d0b2f180	Fix problem with test suite in non-unicode locale.	2013-09-11 12:07:59 -04:00
Joey Hess	e214114d3b	layout	2013-07-04 02:45:46 -04:00
Joey Hess	7a7e426352	moved AssociatedFile definition	2013-07-04 02:36:02 -04:00
Joey Hess	24316f6562	improve imports	2013-02-27 21:48:46 -04:00
Joey Hess	a2f17146fa	move Arbitrary instances out of Test and into modules that define the types This is possible now that we build-depend on QuickCheck.	2013-02-27 21:42:07 -04:00
Joey Hess	2172cc586e	where indenting	2012-11-11 00:51:07 -04:00
Joey Hess	77af38ec6c	git-annex-shell transferinfo command TODO: Use this when running sendkey, to feed back transfer info from the client side rsync.	2012-09-21 16:23:25 -04:00
Joey Hess	94fcd0cf59	add routes to pause/start/cancel transfers This commit includes a paydown on technical debt incurred two years ago, when I didn't know that it was bad to make custom Read and Show instances for types. As the routes need Read and Show for Transfer, which includes a Key, and deriving my own Read instance of key was not practical, I had to finally clean that up. So the compact Key read and show functions are now file2key and key2file, and Read and Show are now derived instances. Changed all code that used the old instances, compiler checked. (There were a few places, particularly in Command.Unused, and the test suite where the Show instance continue to be used for legitimate comparisons; ie show key_x == show key_y (though really in a bloom filter))	2012-08-08 16:20:24 -04:00
Joey Hess	ba6088b249	rename readMaybe to readish a stricter (but also partial) readMaybe is getting added to base	2012-01-23 17:00:10 -04:00

1 2

53 commits