git-annex

Author	SHA1	Message	Date
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	e468fbc518	simplify	2020-02-20 18:22:31 -04:00
Joey Hess	09df58c4ea	handle keys with extensions consistently in all locales Fix some cases where handling of keys with extensions varied depending on the locale. A filename with a unicode extension would before generate a key with an extension in a unicode locale, but not in LANG=C, because the extension was not all alphanumeric. Also the the length of the extension could be counted differently depending on the locale. In a non-unicode locale, git-annex migrate would see that the extension was not all alphanumeric and want to "upgrade" it. Now that doesn't happen. As far as backwards compatability, this does mean that unicode extensions are counted by the number of bytes, not number of characters. So, if someone is using unicode extensions, they may find git-annex stops using them when adding files, because their extensions are too long. Keys already in their repo with the "too long" extensions will still work though, so this only prevents adding the same content with the same extension generating the same key. Documented this by documenting that annex.maxextensionlength is a number of bytes. Also, if a filename has an extension that is not valid utf-8 and the locale is utf-8, the extension will be allowed now, and an old git-annex, in the same locale would not, and would also want to "upgrade" that.	2020-02-20 17:30:25 -04:00
Joey Hess	322c542b5c	fix ByteString conversion on windows the encode' and decode' functions on Windows should not apply the filesystem encoding, which does not work there. Instead, convert to and from UTF-8. Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem encoding, so won't work as expected on windows.	2019-12-18 13:32:56 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	fc09a41ed1	storing objects in git-lfs is working Still need to record the sha256 and size when they cannot be determined by inspecting the key.	2019-08-02 13:56:55 -04:00
Joey Hess	0c6b7e288d	Add BLAKE2BP512 and BLAKE2BP512E backends using a blake2 variant optimised for 4-way CPUs This had been deferred because the Debian package of cryptonite, and possibly other builds, was broken for blake2bp, but I've confirmed #892855 is fixed. This commit was sponsored by Brett Eisenberg on Patreon.	2019-07-05 15:30:03 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	554b307931	update progress meter while hashing files The hash was actually not being fully evaluated before, used rnf to fix that. The added dependency on deepseq is a free dependency, because eg text depends on it.	2019-06-25 13:10:06 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	c3afb3434d	remove recently added cache from KeyVariety Adding that field broke the Read/Show serialization back-compat, and also the Eq and Ord instances were not blinded to it, which broke git annex fsck and probably more. I think that the new approach used in formatKeyVariety will be nearly as fast, but have not benchmarked it.	2019-01-16 16:33:08 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached seralization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	b3c69eaaf8	strict bytestring encoders and decoders Only had lazy ones before. Already sped up a few parts of the code.	2019-01-01 14:55:15 -04:00
Joey Hess	4ecba916a1	annex.maxextensionlength Added annex.maxextensionlength for use cases where extensions longer than 4 characters are needed. This commit was sponsored by Henrik Riomar on Patreon.	2018-09-24 12:10:18 -04:00
Joey Hess	c565340adc	stop using external hash programs, since cryptonite is faster In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external hashers". Re-benchmarking today, I found cryptonite's sha256 consistently outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb files with both sha256 and sha512. And for smaller files, the external process startup time swamps the hash time. Perhaps cryptonite has improved. Or it could just do better on my current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite is slower in some situations, seems likely it would only be marginally slower; it's got the same class of highly optimised C code under the hood as coreutils. The main difference between the two sha256 implementations seems to be how much of the inner loop they unroll.. This commit was sponsored by Henrik Riomar on Patreon.	2018-08-28 18:10:58 -04:00
Joey Hess	2da2ae0919	fix migration bug and make fsck warn * migrate: Fix bug in migration between eg SHA256 and SHA256E, that caused the extension to be included in SHA256 keys, and omitted from SHA256E keys. (Bug introduced in version 6.20170214) * migrate: Check for above bug when migrating from SHA256 to SHA256 (and same for SHA1 to SHA1 etc), and remove the extension that should not be in the SHA256 key. * fsck: Detect and warn when keys need an upgrade, either to fix up from the above migrate bug, or to add missing size information (a long ago transition), or because of a few other past key related bugs. This commit was sponsored by Henrik Riomar on Patreon.	2018-05-23 14:07:51 -04:00
Joey Hess	521d4ede1e	fix build with cryptonite-0.20 Some blake hash varieties were not yet available in that version. Rather than tracking exact details of what cryptonite supported when, disable blake unless using a current cryptonite.	2018-03-15 11:16:00 -04:00
Joey Hess	050ada746f	Added backends for the BLAKE2 family of hashes. There are a lot of different variants and sizes, I suppose we might as well export all the common ones. Bump dep to cryptonite to 0.16, earlier versions lacked BLAKE2 support. Even android has 0.16 or newer. On Debian, Blake2bp_512 is buggy, so I have omitted it for now. http://bugs.debian.org/892855 This commit was sponsored by andrea rota.	2018-03-13 16:23:42 -04:00
Joey Hess	07e253b1fb	Improve SHA*E extension extraction code Do not treat parts of the filename that contain punctuation or other non-alphanumeric characters as extensions. Before, such characters were filtered out. Note that in `45308ec78b` "foo.ba__________r" was munged to ".bar" and so incorrectly treated as an extension. That was fixed by changing the filter order, but not allowing punctuation seems a better fix. This assumes that extensions containing punctuation are rare. "_" seems the most likely character; I used it in ikiwiki "._comment" files. But I can't recall seeing it anywhere else. It certianly seems that no commonly used extensions contain punctuation. If git-annex doesn't treat "._comment" as an extension, it's not likely to break software that expects to see that extension like some software expects to see .epub or .mp3. This commit was sponsored by Jack Hill on Patreon.	2018-03-05 11:25:01 -04:00
Joey Hess	308cd1383c	fold Build/SysConfig.hs into BuildInfo via include This avoids warnings from stack about the module not being listed in the cabal file. So, the generated file is also renamed to Build/SysConfig. Note that the setup program seems to be cached despite these changes; I had to cabal clean to get cabal to update it so that Build/SysConfig was written. This commit was sponsored by Jochen Bartl on Patreon.	2017-12-14 12:46:57 -04:00
Joey Hess	fc845e6530	more lambda-case conversion	2017-12-05 15:00:50 -04:00
Joey Hess	96c055eda2	migrate: WORM keys containing spaces will be migrated to not contain spaces anymore To work around the problem that the external special remote protocol does not support keys containing spaces. This commit was sponsored by Denis Dzyubenko on Patreon.	2017-08-17 15:09:38 -04:00
Joey Hess	6dd806f1ad	stop using MissingH for MD5 Cryptonite is faster and allocates less, and I want to get rid of MissingH use. Note that the new dependency on memory is free; it's a dependency of cryptonite. This commit was supported by the NSF-funded DataLad project.	2017-05-15 21:36:03 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	40327cab6e	Removed support for building with the old cryptohash library. Building with that library made git-annex not support SHA3; it's time for that to always be supported in case SHA2 dominoes.	2017-02-24 20:56:26 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	9eb10caa27	Some optimisations to string splitting code. Turns out that Data.List.Utils.split is slow and makes a lot of allocations. Here's a much simpler single character splitter that behaves the same (even in wacky corner cases) while running in half the time and 75% the allocations. As well as being an optimisation, this helps move toward eliminating use of missingh. (Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and allocates even more.) I have not benchmarked the effect on git-annex, but would not be surprised to see some parsing of eg, large streams from git commands run twice as fast, and possibly in less memory. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2017-01-31 19:06:22 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	45308ec78b	Improve SHA*E extension extraction code. Filter out over-long "extensions" before stripping out non-alphanumerics from them, so that eg "foo.ba__________r" is not considered a .bar extension.	2016-05-27 13:14:51 -04:00
Joey Hess	d2fa4a6873	rename function	2016-05-27 13:10:23 -04:00
Joey Hess	b946ca44c3	Support --metadata field<number, --metadata field>number etc to match ranges of numeric values. Similarly (well, for free), support preferred content expressions like metadata=field<number and metadata=field>number	2016-02-27 10:55:02 -04:00
Joey Hess	ac8af8da07	better forcing of hash	2016-02-26 16:36:24 -04:00
Joey Hess	e3a73e5bb7	try again at forcing file read while hashing	2016-02-26 14:04:10 -04:00
Joey Hess	d1f87e8c8e	test revert "force hash to finish with file before returning" This reverts commit `7482853ddd`. This seems to have caused a memory leak.	2016-02-26 13:30:38 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	7482853ddd	force hash to finish with file before returning Fixes a minor fd leak, never more than 1 in normal use, which broke the test suite when I tried to write to a file that was still open for a previous hashing.	2016-01-06 22:09:36 -04:00
Joey Hess	a0fcb8ec93	generalize catchHardwareFault to catchIOErrorType	2015-12-06 16:26:38 -04:00
Joey Hess	fa9333e99f	use action, not sideAction sideAction is for things not generally related to the current action being performed. And, it adds a newline after the side action. This was not the right thing to use for stuff like "checksum", where doing a checksum is part of the git annex get process, and indeed we want it to display "(checksum...) ok"	2015-10-11 13:29:44 -04:00
Joey Hess	cad3349001	rename fsckKey to verifyKeyContent No behavior changes.	2015-10-01 13:29:17 -04:00
Joey Hess	0ec9bc2200	Added support for SHA3 hashed keys (in 8 varieties), when git-annex is built using the cryptonite library. While cryptohash has SHA3 support, it has not been updated for the final version of the spec. Note that cryptonite has not been ported to all arches that cryptohash builds on yet.	2015-08-06 15:02:25 -04:00
Joey Hess	8990d4cc68	fsck: When checksumming a file fails due to a hardware fault, the file is now moved to the bad directory, and the fsck proceeds. Before, the fsck immediately failed.	2015-05-27 16:40:03 -04:00
Joey Hess	30960c0465	refactor	2015-05-27 16:00:44 -04:00
Joey Hess	b256f861ca	if external hash command fails for any reason, fall back to internal hashing This way, if a system's sha1sum etc is broken, it will be tried if git-annex was built to use it, but at least it will fall back to using internal hashing when it fails. A side benefit of this is that hashFile consistently throws an IOError if the file is unable to be read. In particular, if the disk is failing with IO errors, and external hash command is used, it used to throw a user error with the error message from externalSHA. Now, the external hash command will fail, that message will be printed as a warning, and it'll fall back to the internal hash command. If the disk IO error is not intermittent, it will re-occur, and so an IOError will be thrown. Of course, this can mean it reads a file twice, but only in edge cases.	2015-05-27 15:58:32 -04:00
Joey Hess	77c43a388e	fromkey, registerurl: Allow urls to be specified instead of keys, and generate URL keys. This is especially useful because the caller doesn't need to generate valid url keys, which involves some escaping of characters, and may involve taking a md5sum of the url if it's too long.	2015-05-22 22:41:36 -04:00
Joey Hess	8eb01bc894	Added MD5 and MD5E backends.	2015-02-04 13:47:54 -04:00
Joey Hess	95c1593098	Remove support for building without cryptohash. This will prevent backporting to wheezy, but it's time to simplify the code.	2015-02-04 13:41:26 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	4f657aa14e	add getFileSize, which can get the real size of a large file on Windows Avoid using fileSize which maxes out at just 2 gb on Windows. Instead, use hFileSize, which doesn't have a bounded size. Fixes support for files > 2 gb on Windows. Note that the InodeCache code only needs to compare a file size, so it doesn't matter it the file size wraps. So it has been left as-is. This was necessary both to avoid invalidating existing inode caches, and because the code passed FileStatus around and would have become more expensive if it called getFileSize. This commit was sponsored by Christian Dietrich.	2015-01-20 17:09:24 -04:00
Joey Hess	c0f2b992ed	Generate shorter keys for WORM and URL, avoiding keys that are longer than used for SHA256, so as to not break on systems like Windows that have very small maximum path length limits.	2015-01-06 17:58:57 -04:00
Joey Hess	73928c2274	Avoid re-checksumming when migrating from hash to hashE backend. Closes: #774494	2015-01-04 12:33:10 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	9711d529c8	WORM backend: Switched to include the relative path to the file inside the repository, rather than just the file's base name. Note that if you're relying on such things to keep files separate with WORM, you should really be using a better backend.	2014-09-11 14:50:18 -04:00
Joey Hess	f0df660570	WORM backend: When adding a file in a subdirectory, avoid including the subdirectory in the key name.	2014-08-12 14:38:53 -04:00
Joey Hess	9720ee9e56	testremote: New command to test uploads/downloads to a remote. This only performs some basic tests so far; no testing of chunking or resuming. Also, the existing encryption type of the remote is used; it would be good later to derive an encrypted and a non-encrypted version of the remote and test them both. This commit was sponsored by Joseph Liu.	2014-08-01 15:10:01 -04:00
Joey Hess	13bbb61a51	add key stability checking interface Needed for resuming from chunks. Url keys are considered not stable. I considered treating url keys with a known size as stable, but just don't feel that is enough information.	2014-07-27 12:33:46 -04:00
Joey Hess	d751591ac8	add chunk metadata to Key Added new fields for chunk number, and chunk size. These will not appear in normal keys ever, but will be used for chunked data stored on special remotes. This commit was sponsored by Jouni K Seppanen.	2014-07-24 13:36:23 -04:00
Joey Hess	9d71903c2f	migrate: Avoid re-checksumming when migrating from hashE to hash backend.	2014-07-10 17:06:04 -04:00
Joey Hess	c3d2d371ee	bring back the (checksum) when fscking This is useful because it shows users which files it checksums, vs ones that are not present, or don't use a hash backend, or --fast	2014-02-20 16:06:51 -04:00
Joey Hess	64160a9679	import: Add --skip-duplicates option. Note that the hash backends were made to stop printing a (checksum..) message as part of this, since it showed up without a file when deciding whether to act on a file. Should have probably removed that message a while ago anyway, I suppose.	2013-12-04 13:13:30 -04:00
Joey Hess	1be4d281d6	Better sanitization of problem characters when generating URL and WORM keys. FAT has a lot of characters it does not allow in filenames, like ? and * It's probably the worst offender, but other filesystems also have limitiations. In 2011, I made keyFile escape : to handle FAT, but missed the other characters. It also turns out that when I did that, I was also living dangerously; any existing keys that contained a : had their object location change. Oops. So, adding new characters to escape to keyFile is out. Well, it would be possible to make keyFile behave differently on a per-filesystem basis, but this would be a real nightmare to get right. Consider that a rsync special remote uses keyFile to determine the filenames to use, and we don't know the underlying filesystem on the rsync server.. Instead, I have gone for a solution that is backwards compatable and simple. Its only downside is that already generated URL and WORM keys might not be able to be stored on FAT or some other filesystem that dislikes a character used in the key. (In this case, the user can just migrate the problem keys to a checksumming backend. If this became a big problem, fsck could be made to detect these and suggest a migration.) Going forward, new keys that are created will escape all characters that are likely to cause problems. And if some filesystem comes along that's even worse than FAT (seems unlikely, but here it is 2013, and people are still using FAT!), additional characters can be added to the set that are escaped without difficulty. (Also, made WORM limit the part of the filename that is embedded in the key, to deal with filesystem filename length limits. This could have already been a problem, but is more likely now, since the escaping of the filename can make it longer.) This commit was sponsored by Ian Downes	2013-10-05 15:01:49 -04:00
Joey Hess	20fb905bb6	allow building w/o cryptohash Mostly for the debian stable autobuilds, which have a too old version to use the Crypto.Hash module.	2013-10-03 12:33:38 -04:00
Joey Hess	a3692b4ab2	better name	2013-10-01 22:32:44 -04:00
Joey Hess	547a18019f	ensure that hash representations don't change in future	2013-10-01 21:11:47 -04:00
Joey Hess	a05b763b01	Added SKEIN256 and SKEIN512 backends SHA3 is still waiting for final standardization. Although this is looking less likely given https://www.cdt.org/blogs/joseph-lorenzo-hall/2409-nist-sha-3 In the meantime, cryptohash implements skein, and it's used by some of the haskell ecosystem (for yesod sessions, IIRC), so this implementation is likely to continue working. Also, I've talked with the cryprohash author and he's a reasonable guy. It makes sense to have an alternate high security hash, in case some horrible attack is found against SHA2 tomorrow, or in case SHA3 comes out and worst fears are realized. I'd also like to support using skein for HMAC. But no hurry there and a new version of cryptohash has much nicer HMAC code, so I will probably wait until I can use that version.	2013-10-01 20:34:36 -04:00
Joey Hess	b405295aee	hlint test suite still passes	2013-09-25 03:09:06 -04:00
Joey Hess	7390f08ef9	Use cryptohash rather than SHA for hashing. This is a massive win on OSX, which doesn't have a sha256sum normally. Only use external hash commands when the file is > 1 mb, since cryptohash is quite close to them in speed. SHA is still used to calculate HMACs. I don't quite understand cryptohash's API for those. Used the following benchmark to arrive at the 1 mb number. 1 mb file: benchmarking sha256/internal mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950 std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950 found 5 outliers among 100 samples (5.0%) 4 (4.0%) high mild 1 (1.0%) high severe variance introduced by outliers: 10.415% variance is moderately inflated by outliers benchmarking sha256/external mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950 std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950 found 3 outliers among 100 samples (3.0%) 2 (2.0%) high mild 1 (1.0%) high severe 2 mb file: benchmarking sha256/internal mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950 std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950 variance introduced by outliers: 35.540% variance is moderately inflated by outliers benchmarking sha256/external mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950 std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950 found 6 outliers among 100 samples (6.0%) import Crypto.Hash import Data.ByteString.Lazy as L import Criterion.Main import Common testfile :: FilePath testfile = "/run/shm/data" -- on ram disk main = defaultMain [ bgroup "sha256" [ bench "internal" $ whnfIO internal , bench "external" $ whnfIO external ] ] sha256 :: L.ByteString -> Digest SHA256 sha256 = hashlazy internal :: IO String internal = show . sha256 <$> L.readFile testfile external :: IO String external = do s <- readProcess "sha256sum" [testfile] return $ fst $ separate (== ' ') s	2013-09-22 20:06:02 -04:00
Joey Hess	ddd46db09a	Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit. Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I'm not missed any.. Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this is likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes. Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to truncate an url that did properly. A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length. Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown. Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it. There may be compatability problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those. This commit was sponsored by Alexander Brem. Thanks!	2013-07-30 19:18:29 -04:00
Joey Hess	abe8d549df	fix permission damage (thanks, Windows)	2013-05-11 23:54:25 -04:00
Joey Hess	18bdff3fae	clean up from windows porting	2013-05-11 18:23:41 -04:00
Joey Hess	3c7e30a295	git-annex now builds on Windows (doesn't work)	2013-05-11 15:03:00 -05:00
Joey Hess	d38854f3d1	configure: Better checking that sha commands output in the desired format. Run the same code git-annex used to get the sha, including its sanity checking. Much better than old grep. Should detect FreeBSD systems with sha commands that output in stange format.	2013-05-08 11:17:09 -04:00
Joey Hess	cda0ed5d25	SHA: Add a runtime sanity check that sha commands output something that appears to be a real sha. This after fielding a bug where git-annex was built with a sha256 program whose output checked out, but was then run with one that output lines like: SHA256 (file) = <sha here> Which it then parsed as having a SHA256 of "SHA256"! Now the output of the command is required to be of the right length, and contain only the right characters.	2013-05-07 20:19:37 -04:00
Joey Hess	8a2d1988d3	expose Control.Monad.join I think I've been looking for that function for some time. Ie, I remember wanting to collapse Just Nothing to Nothing.	2013-04-22 20:24:53 -04:00
Joey Hess	bd0d06be23	SHAE backends: Exclude non-alphanumeric characters from extensions. SHAE backends: Exclude non-alphanumeric characters from extensions. migrate: Remove leading \ in SHA* checksums, and non-alphanumerics from extensions of SHA*E keys.	2012-12-20 17:16:55 -04:00
Joey Hess	e71f85645e	handle shasum's leading \ in checksum with certian unsual filenames Bugfix: Remove leading \ from checksums output by shasum commands, when the filename contains \ or a newline. Closes: #696384 fsck: Still accept checksums with a leading \ as valid, now that above bug is fixed. * migrate: Remove leading \ in checksums	2012-12-20 17:07:10 -04:00
Joey Hess	2172cc586e	where indenting	2012-11-11 00:51:07 -04:00
Joey Hess	0b12db64d8	Avoid crashing on encoding errors in filenames when writing transfer info files and reading from checksum commands.	2012-09-16 01:53:06 -04:00
Joey Hess	3724344461	SHA256E is new default backend The default backend used when adding files to the annex is changed from SHA256 to SHA256E, to simplify interoperability with OSX, media players, and various programs that needlessly look at symlink targets. To get old behavior, add a .gitattributes containing: * annex.backend=SHA256	2012-09-12 13:22:16 -04:00
Joey Hess	1f83dafc7e	Bugfix: Fix fsck in SHA*E backends, when the key contains composite extensions, as added in 3.20120721.	2012-08-24 12:16:17 -04:00
Joey Hess	9fc94d780b	better readProcess	2012-07-19 00:57:40 -04:00
Joey Hess	1db7d27a45	add back debug logging Make Utility.Process wrap the parts of System.Process that I use, and add debug logging to them. Also wrote some higher-level code that allows running an action with handles to a processes stdin or stdout (or both), and checking its exit status, all in a single function call. As a bonus, the debug logging now indicates whether the process is being run to read from it, feed it data, chat with it (writing and reading), or just call it for its side effect.	2012-07-19 00:46:52 -04:00
Joey Hess	d1da9cf221	switch from System.Cmd.Utils to System.Process Test suite now passes with -threaded! I traced back all the hangs with -threaded to System.Cmd.Utils. It seems it's just crappy/unsafe/outdated, and should not be used. System.Process seems to be the cool new thing, so converted all the code to use it instead. In the process, --debug stopped printing commands it runs. I may try to bring that back later. Note that even SafeSystem was switched to use System.Process. Since that was a modified version of code from System.Cmd.Utils, it needed to be converted too. I also got rid of nearly all calls to forkProcess, and all calls to executeFile, which I'm also doubtful about working well with -threaded.	2012-07-18 18:00:24 -04:00
Joey Hess	8ad844e45c	fix leading period before two-element extensions	2012-07-06 17:22:56 -06:00
Joey Hess	5a753a7b8a	SHAnE backends are now smarter about composite extensions, such as .tar.gz Closes: #680450	2012-07-05 16:24:02 -06:00
Joey Hess	40729e7fa2	Use SHA library for files less than 50 kb in size, at which point it's faster than forking the more optimised external program.	2012-07-04 13:04:01 -04:00
Joey Hess	1da79ea61f	When shaNsum commands cannot be found, use the Haskell SHA library (already a dependency) to do the checksumming. This may be slower, but avoids portability problems. Using Crypto's version of the hashes would be another option. I need to benchmark it. The SHA2 library (which provides SHA1 also, confusing name) may be the fastest option, but is not currently in Debian.	2012-07-04 09:11:36 -04:00
Joey Hess	e0fdfb2e70	maintain set of files pendingAdd Kqueue needs to remember which files failed to be added due to being open, and retry them. This commit gets the data in place for such a retry thread. Broke KeySource out into its own file, and added Eq and Ord instances so it can be stored in a Set.	2012-06-20 16:31:46 -04:00
Joey Hess	d3cee987ca	separate source of content from the filename associated with the key when generating a key This already made migrate's code a lot simpler.	2012-06-05 19:51:03 -04:00
Joey Hess	2183fd2abd	Require that the SHA256 backend can be used when building, since it's the default.	2012-05-31 23:15:40 -04:00
Joey Hess	8f9b501515	handle really long urls Using the whole url as a key can make the filename too long. Truncate and use a md5sum for uniqueness if necessary.	2012-02-16 02:05:06 -04:00
Joey Hess	17fed709c8	addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key.	2012-02-10 19:23:46 -04:00
Joey Hess	90319afa41	fsck --from Fscking a remote is now supported. It's done by retrieving the contents of the specified files from the remote, and checking them, so can be an expensive operation. (Several optimisations are possible, to speed it up, of course.. This is the slow and stupid remote fsck to start with.) Still, if the remote is a special remote, or a git repository that you cannot run fsck in locally, it's nice to have the ability to fsck it. If you have any directory special remotes, now would be a good time to fsck them, in case you were hit by the data loss bug fixed in the previous release!	2012-01-19 15:24:05 -04:00
Joey Hess	d36525e974	convert fsckKey to a Maybe This way it's clear when a backend does not implement its own fsck checks.	2012-01-19 13:51:30 -04:00
Joey Hess	4a02c2ea62	type alias cleanup	2011-12-31 04:11:58 -04:00

1 2 3 4 5 ...

287 commits