git-annex

Author	SHA1	Message	Date
Joey Hess	a05b763b01	Added SKEIN256 and SKEIN512 backends. SHA3 is still waiting for final standardization. Although this is looking less likely given https://www.cdt.org/blogs/joseph-lorenzo-hall/2409-nist-sha-3 In the meantime, cryptohash implements skein, and it's used by some of the haskell ecosystem (for yesod sessions, IIRC), so this implementation is likely to continue working. Also, I've talked with the cryptohash author and he's a reasonable guy. It makes sense to have an alternate high security hash, in case some horrible attack is found against SHA2 tomorrow, or in case SHA3 comes out and worst fears are realized. I'd also like to support using skein for HMAC. But no hurry there and a new version of cryptohash has much nicer HMAC code, so I will probably wait until I can use that version.	2013-10-01 20:34:36 -04:00
Joey Hess	b405295aee	hlint test suite still passes	2013-09-25 03:09:06 -04:00
Joey Hess	7390f08ef9	Use cryptohash rather than SHA for hashing. This is a massive win on OSX, which doesn't have a sha256sum normally. Only use external hash commands when the file is > 1 mb, since cryptohash is quite close to them in speed. SHA is still used to calculate HMACs. I don't quite understand cryptohash's API for those. Used the following benchmark to arrive at the 1 mb number. 1 mb file: benchmarking sha256/internal mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950 std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950 found 5 outliers among 100 samples (5.0%) 4 (4.0%) high mild 1 (1.0%) high severe variance introduced by outliers: 10.415% variance is moderately inflated by outliers benchmarking sha256/external mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950 std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950 found 3 outliers among 100 samples (3.0%) 2 (2.0%) high mild 1 (1.0%) high severe 2 mb file: benchmarking sha256/internal mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950 std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950 variance introduced by outliers: 35.540% variance is moderately inflated by outliers benchmarking sha256/external mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950 std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950 found 6 outliers among 100 samples (6.0%) import Crypto.Hash import Data.ByteString.Lazy as L import Criterion.Main import Common testfile :: FilePath testfile = "/run/shm/data" -- on ram disk main = defaultMain [ bgroup "sha256" [ bench "internal" $ whnfIO internal , bench "external" $ whnfIO external ] ] sha256 :: L.ByteString -> Digest SHA256 sha256 = hashlazy internal :: IO String internal = show . sha256 <$> L.readFile testfile external :: IO String external = do s <- readProcess "sha256sum" [testfile] return $ fst $ separate (== ' ') s	2013-09-22 20:06:02 -04:00
Joey Hess	ddd46db09a	Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit. Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I've not missed any. Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this as likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes. Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to properly truncate an url that did. A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length. Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown. Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it. There may be compatibility problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those. This commit was sponsored by Alexander Brem. Thanks!	2013-07-30 19:18:29 -04:00
Joey Hess	abe8d549df	fix permission damage (thanks, Windows)	2013-05-11 23:54:25 -04:00
Joey Hess	18bdff3fae	clean up from windows porting	2013-05-11 18:23:41 -04:00
Joey Hess	3c7e30a295	git-annex now builds on Windows (doesn't work)	2013-05-11 15:03:00 -05:00
Joey Hess	d38854f3d1	configure: Better checking that sha commands output in the desired format. Run the same code git-annex uses to get the sha, including its sanity checking. Much better than old grep. Should detect FreeBSD systems with sha commands that output in a strange format.	2013-05-08 11:17:09 -04:00
Joey Hess	cda0ed5d25	SHA: Add a runtime sanity check that sha commands output something that appears to be a real sha. This after fielding a bug where git-annex was built with a sha256 program whose output checked out, but was then run with one that output lines like: SHA256 (file) = <sha here> Which it then parsed as having a SHA256 of "SHA256"! Now the output of the command is required to be of the right length, and contain only the right characters.	2013-05-07 20:19:37 -04:00
Joey Hess	8a2d1988d3	expose Control.Monad.join I think I've been looking for that function for some time. Ie, I remember wanting to collapse Just Nothing to Nothing.	2013-04-22 20:24:53 -04:00
Joey Hess	bd0d06be23	SHAE backends: Exclude non-alphanumeric characters from extensions. migrate: Remove leading \ in SHA* checksums, and non-alphanumerics from extensions of SHA*E keys.	2012-12-20 17:16:55 -04:00
Joey Hess	e71f85645e	handle shasum's leading \ in checksum with certain unusual filenames. Bugfix: Remove leading \ from checksums output by shasum commands, when the filename contains \ or a newline. Closes: #696384 fsck: Still accept checksums with a leading \ as valid, now that above bug is fixed. migrate: Remove leading \ in checksums	2012-12-20 17:07:10 -04:00
Joey Hess	2172cc586e	where indenting	2012-11-11 00:51:07 -04:00
Joey Hess	0b12db64d8	Avoid crashing on encoding errors in filenames when writing transfer info files and reading from checksum commands.	2012-09-16 01:53:06 -04:00
Joey Hess	3724344461	SHA256E is new default backend The default backend used when adding files to the annex is changed from SHA256 to SHA256E, to simplify interoperability with OSX, media players, and various programs that needlessly look at symlink targets. To get old behavior, add a .gitattributes containing: * annex.backend=SHA256	2012-09-12 13:22:16 -04:00
Joey Hess	1f83dafc7e	Bugfix: Fix fsck in SHA*E backends, when the key contains composite extensions, as added in 3.20120721.	2012-08-24 12:16:17 -04:00
Joey Hess	9fc94d780b	better readProcess	2012-07-19 00:57:40 -04:00
Joey Hess	1db7d27a45	add back debug logging Make Utility.Process wrap the parts of System.Process that I use, and add debug logging to them. Also wrote some higher-level code that allows running an action with handles to a processes stdin or stdout (or both), and checking its exit status, all in a single function call. As a bonus, the debug logging now indicates whether the process is being run to read from it, feed it data, chat with it (writing and reading), or just call it for its side effect.	2012-07-19 00:46:52 -04:00
Joey Hess	d1da9cf221	switch from System.Cmd.Utils to System.Process Test suite now passes with -threaded! I traced back all the hangs with -threaded to System.Cmd.Utils. It seems it's just crappy/unsafe/outdated, and should not be used. System.Process seems to be the cool new thing, so converted all the code to use it instead. In the process, --debug stopped printing commands it runs. I may try to bring that back later. Note that even SafeSystem was switched to use System.Process. Since that was a modified version of code from System.Cmd.Utils, it needed to be converted too. I also got rid of nearly all calls to forkProcess, and all calls to executeFile, which I'm also doubtful about working well with -threaded.	2012-07-18 18:00:24 -04:00
Joey Hess	8ad844e45c	fix leading period before two-element extensions	2012-07-06 17:22:56 -06:00
Joey Hess	5a753a7b8a	SHAnE backends are now smarter about composite extensions, such as .tar.gz Closes: #680450	2012-07-05 16:24:02 -06:00
Joey Hess	40729e7fa2	Use SHA library for files less than 50 kb in size, at which point it's faster than forking the more optimised external program.	2012-07-04 13:04:01 -04:00
Joey Hess	1da79ea61f	When shaNsum commands cannot be found, use the Haskell SHA library (already a dependency) to do the checksumming. This may be slower, but avoids portability problems. Using Crypto's version of the hashes would be another option. I need to benchmark it. The SHA2 library (which provides SHA1 also, confusing name) may be the fastest option, but is not currently in Debian.	2012-07-04 09:11:36 -04:00
Joey Hess	e0fdfb2e70	maintain set of files pendingAdd Kqueue needs to remember which files failed to be added due to being open, and retry them. This commit gets the data in place for such a retry thread. Broke KeySource out into its own file, and added Eq and Ord instances so it can be stored in a Set.	2012-06-20 16:31:46 -04:00
Joey Hess	d3cee987ca	separate source of content from the filename associated with the key when generating a key This already made migrate's code a lot simpler.	2012-06-05 19:51:03 -04:00
Joey Hess	2183fd2abd	Require that the SHA256 backend can be used when building, since it's the default.	2012-05-31 23:15:40 -04:00
Joey Hess	8f9b501515	handle really long urls Using the whole url as a key can make the filename too long. Truncate and use a md5sum for uniqueness if necessary.	2012-02-16 02:05:06 -04:00
Joey Hess	17fed709c8	addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key.	2012-02-10 19:23:46 -04:00
Joey Hess	90319afa41	fsck --from Fscking a remote is now supported. It's done by retrieving the contents of the specified files from the remote, and checking them, so can be an expensive operation. (Several optimisations are possible, to speed it up, of course.. This is the slow and stupid remote fsck to start with.) Still, if the remote is a special remote, or a git repository that you cannot run fsck in locally, it's nice to have the ability to fsck it. If you have any directory special remotes, now would be a good time to fsck them, in case you were hit by the data loss bug fixed in the previous release!	2012-01-19 15:24:05 -04:00
Joey Hess	d36525e974	convert fsckKey to a Maybe This way it's clear when a backend does not implement its own fsck checks.	2012-01-19 13:51:30 -04:00
Joey Hess	4a02c2ea62	type alias cleanup	2011-12-31 04:11:58 -04:00
Joey Hess	95d2391f58	more partial function removal Left a few Prelude.head's in where it was checked not null and too hard to remove, etc.	2011-12-15 18:19:36 -04:00
Joey Hess	480495beb4	Prevent key names from containing newlines. There are several places where it's assumed a key can be written on one line. One is in the format of the .git/annex/unused files. The difficult one is that filenames derived from keys are fed into git cat-file --batch, which has a line based input. (And no -z option.) So, for now it's best to block such keys being created.	2011-12-06 13:03:09 -04:00
Joey Hess	da9cd315be	add support for using hashDirLower in addition to hashDirMixed Supporting multiple directory hash types will allow converting to a different one, without a flag day. gitAnnexLocation now checks which of the possible locations have a file. This means more statting of files. Several places currently use gitAnnexLocation and immediately check if the returned file exists; those need to be optimised.	2011-11-28 22:43:51 -04:00
Joey Hess	bf460a0a98	reorder repo parameters last Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.	2011-11-08 16:27:20 -04:00
Joey Hess	ef3457196a	use SHA256 by default To get old behavior, add a .gitattributes containing: * annex.backend=WORM I feel that SHA256 is a better default for most people, as long as their systems are fast enough that checksumming their files isn't a problem. git-annex should default to preserving the integrity of data as well as git does. Checksum backends also work better with editing files via unlock/lock. I considered just using SHA1, but since that hash is believed to be somewhat near to being broken, and git-annex deals with large files which would be a perfect exploit medium, I decided to go to a SHA-2 hash. SHA512 is annoyingly long when displayed, and git-annex displays it in a few places (and notably it is shown in ls -l), so I picked the shorter hash. Considered SHA224 as it's even shorter, but feel it's a bit weird. I expect git-annex will use SHA-3 at some point in the future, but probably not soon! Note that systems without a sha256sum (or sha256) program will fall back to defaulting to SHA1.	2011-11-04 15:51:01 -04:00
Joey Hess	eec137f33a	Record uuid when auto-initializing a remote so it shows in status.	2011-11-02 14:18:21 -04:00
Joey Hess	c643136e32	playing with >=> Apparently in haskell if you teach a man to fish, he'll write more pointfree code.	2011-10-31 23:39:55 -04:00
Joey Hess	b505ba83e8	minor syntax changes	2011-10-11 14:43:45 -04:00
Joey Hess	6a6ea06cee	rename	2011-10-05 16:02:51 -04:00
Joey Hess	cfe21e85e7	rename	2011-10-04 00:59:08 -04:00
Joey Hess	8ef2095fa0	factor out common imports no code changes	2011-10-03 23:29:48 -04:00
Joey Hess	9f6b7935dd	go go gadget hlint	2011-09-20 23:24:48 -04:00
Joey Hess	203148363f	split groups of related functions out of Utility	2011-08-22 16:14:12 -04:00
Joey Hess	737b5d14c9	moved files around	2011-08-20 16:11:42 -04:00
Joey Hess	dede05171b	addurl: --fast can be used to avoid immediately downloading the url. The tricky part about this is that to generate a key, the file must be present already. Worked around by adding (back) an URL key type, which is used for addurl --fast.	2011-08-06 14:57:22 -04:00
Joey Hess	3ffc0bb4f5	foo	2011-08-06 12:50:20 -04:00
Joey Hess	00153eed48	unify ellipsis handling And add a simple dots-based progress display, currently only used in v2 upgrade.	2011-07-19 14:07:23 -04:00
Joey Hess	e784757376	hlint tweaks Did all sources except Remotes/* and Command/*	2011-07-15 03:12:05 -04:00
Joey Hess	9f1577f746	remove unused backend machinery The only remaining vestige of backends is different types of keys. These are still called "backends", mostly to avoid needing to change user interface and configuration. But everything to do with storing keys in different backends was gone; instead different types of remotes are used. In the refactoring, lots of code was moved out of odd corners like Backend.File, to closer to where it's used, like Command.Drop and Command.Fsck. Quite a lot of dead code was removed. Several data structures became simpler, which may result in better runtime efficiency. There should be no user-visible changes.	2011-07-05 19:57:46 -04:00
Joey Hess	fb58d1a560	wording	2011-07-01 17:17:51 -04:00
Joey Hess	2cdacfbae6	remove URL backend	2011-07-01 16:01:04 -04:00
Joey Hess	cdbcd6f495	add web special remote Generalized LocationLog to PresenceLog, and use a presence log to record urls for the web special remote.	2011-07-01 15:30:42 -04:00
Joey Hess	f6063a094e	renamed GitRepo to Git It was always imported qualified as Git anyway	2011-06-30 13:21:39 -04:00
Joey Hess	e3384eb476	tweak fsck wording so file is at the end of the line	2011-06-23 19:56:24 -04:00
Joey Hess	7ee636f6dd	avoid unnecessary read of trust.log	2011-06-23 13:39:04 -04:00
Joey Hess	1870186632	fixed logFile	2011-06-22 16:17:16 -04:00
Joey Hess	d3f0106f2e	move LocationLog into Annex monad from IO It will need to run in Annex so it can use Branch	2011-06-22 14:27:50 -04:00
Joey Hess	9a272815dd	Bugfix: Fix fsck to not think all SHAnE keys are bad.	2011-06-10 11:43:28 -04:00
Joey Hess	90dd245522	get --from is the same as copy --from get not honoring --from has surprised me a few times, so least surprise suggests it should just behave like copy --from. This leaves the difference between get and copy being that copy always requires the remote to copy from, while get will decide whether to get a file from a key/value store or a remote.	2011-06-09 18:54:49 -04:00
Joey Hess	703c437bd9	rename modules for data types into Types/ directory	2011-06-01 21:56:04 -04:00
Joey Hess	971ab27e78	better types allowed breaking module dep loop	2011-06-01 19:11:27 -04:00
Joey Hess	a8fb97d2ce	Add --trust, --untrust, and --semitrust options.	2011-06-01 17:57:31 -04:00
Joey Hess	3d567aa64f	Add --numcopies option.	2011-06-01 16:49:17 -04:00
Joey Hess	2a8efc7af1	Added filename extension preserving variant backends SHA1E, SHA256E, etc.	2011-05-16 11:46:34 -04:00
Joey Hess	5d8e0d5a1c	remove unused file	2011-04-29 12:20:59 -04:00
Joey Hess	b889543507	let's use Maybe String for commands that may not be available	2011-04-07 21:47:56 -04:00
Fraser Tweedale	f5b2d650bb	recognise differently-named shaN programs	2011-04-08 10:08:11 +10:00
Joey Hess	48418cb92b	reexport RemoteClass from Remote for cleanliness	2011-03-27 17:12:32 -04:00
Joey Hess	f30320aa75	add remotes slot to Annex This required parameterizing the type for Remote, to avoid a cycle.	2011-03-27 16:17:56 -04:00
Joey Hess	b40f253d6e	start of generalizing remotes Goal is to support multiple different types of remotes, some of which are not git repositories. To that end, added a Remote class, and moved git remote specific code into Remote.GitRemote. Remotes.hs is still present as some code has not been converted to use the new Remote class yet.	2011-03-27 16:04:25 -04:00
Joey Hess	6246b807f7	migrate: Support migrating v1 SHA keys to v2 SHA keys with size information that can be used for free space checking.	2011-03-23 17:57:10 -04:00
Joey Hess	c43e3b5c78	check key size when available, no matter the backend Now that SHA and other backends can have size info, fsck should check it whenever available.	2011-03-23 02:10:59 -04:00
Joey Hess	c21998722c	fast mode Add --fast flag, that can enable less expensive, but also less thorough versions of some commands. fsck: In fast mode, avoid checking checksums. unused: In fast mode, just show all existing temp files as unused, and avoid expensive scan for other unused content.	2011-03-22 17:41:06 -04:00
Joey Hess	7b5b127608	Fix dropping of files using the URL backend.	2011-03-17 11:49:21 -04:00
Joey Hess	da504f647f	fromkey, and url backend download work now	2011-03-15 22:28:18 -04:00
Joey Hess	4594bd51c1	rename file	2011-03-15 22:04:50 -04:00
Joey Hess	9d49fe2c17	first pass at using new keys It compiles. It sorta works. Several subcommands are FIXME marked and broken, because things that used to accept separate --backend and --key params need to be changed to accept just a --key that encodes all the key info, now that there is metadata in keys.	2011-03-15 21:34:13 -04:00
Joey Hess	72d2684016	Rethink filename encoding handling for display. Since filename encoding may or may not match locale settings, any attempt to decode filenames will fail for some files. So instead, do all output in binary mode.	2011-03-12 15:30:17 -04:00
Joey Hess	a3daac8a8b	only enable SHA backends that configure finds support for	2011-03-02 13:47:45 -04:00
Joey Hess	1b9c4477fb	New backends: SHA512 SHA384 SHA256 SHA224	2011-03-01 17:07:15 -04:00
Joey Hess	b7f4801801	generic SHA size support	2011-03-01 16:50:53 -04:00
Joey Hess	4cd96ad2db	rename	2011-02-28 16:25:31 -04:00
Joey Hess	fcdc4797a9	use ShellParam type So, I have a type checked safe handling of filenames starting with dashes, throughout the code.	2011-02-28 16:18:55 -04:00
Joey Hess	836e71297b	Support filenames that start with a dash; when such a file is passed to a utility it will be escaped to avoid it being interpreted as an option.	2011-02-25 01:13:01 -04:00
Joey Hess	5a50a7cf13	update unicode FilePath handling Based on http://hackage.haskell.org/trac/ghc/ticket/3307 , whether FilePath contains decoded unicode varies by OS. So, add a configure check for it. Also, renamed showFile to filePathToString	2011-02-11 15:37:37 -04:00
Joey Hess	fe55b4644e	Fix display of unicode filenames. Internally, the filenames are stored as un-decoded unicode. I tried decoding them, but then haskell tries to access the wrong files. Hmm. So, I've unhappily chosen option "B", which is to decode filenames before they are displayed.	2011-02-10 14:21:44 -04:00
Joey Hess	1b0edc1ab2	idiomatic elem	2011-01-30 12:13:34 -04:00
Joey Hess	167523f09d	better directory handling Rename Locations functions for better consistency, and make their values more consistent too. Used </> rather than manually building paths. There are still more places that manually do so, but are tricky, due to the behavior of </> when the second FilePath is absolute. So I only changed places where it obviously was relative.	2011-01-27 17:00:32 -04:00
Joey Hess	5e54eb79b8	less verbose	2011-01-27 15:12:38 -04:00
Joey Hess	e1d213d6e3	make filename available to fsck messages	2011-01-26 20:37:46 -04:00
Joey Hess	3cb5cb6bf6	bring back display of keys in fsck -q, that's the only way to know what file it means	2011-01-26 20:08:37 -04:00
Joey Hess	ee2e94f087	this should be a warning	2011-01-26 20:03:12 -04:00
Joey Hess	1a11085a50	drop: support untrusted repos	2011-01-26 19:35:35 -04:00
Joey Hess	6b48f740f1	rework note	2011-01-26 17:47:02 -04:00
Joey Hess	ba748a1198	fsck: handle untrusted repos	2011-01-26 17:44:40 -04:00
Joey Hess	b7903eb2d1	move partitioning out of keyPossibilities And a bug fix in passing.	2011-01-26 16:44:14 -04:00
Joey Hess	616d1d4a20	rename TypeInternals to BackendTypes Now that it only contains types used by the backends	2011-01-26 00:37:50 -04:00
Joey Hess	6a97b10fcb	rework config storage Moved away from a map of flags to storing config directly in the AnnexState structure. Got rid of most accessor functions in Annex. This allowed supporting multiple --exclude flags.	2011-01-26 00:17:38 -04:00
Joey Hess	082b022f9a	successfully split Annex and AnnexState out of TypeInternals	2011-01-25 21:49:04 -04:00
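
Commits cda0ed5d25 and e71f85645e above describe two checks applied to the output of external sha commands: the checksum must have the right length and contain only the right characters, and a leading \ (emitted by shasum for filenames containing \ or a newline) is removed. The following is a minimal sketch of that idea, not the code from those commits. The names extractSha and saneSha are hypothetical, and it assumes that "the right characters" means hexadecimal digits and that the checksum is the first whitespace-delimited word of the output.

```haskell
import Data.Char (isHexDigit, isSpace)

-- Hypothetical helpers for illustration; not taken from the git-annex source.

-- | Take the first whitespace-delimited word of the command's output and
-- strip a single leading backslash, if present.
extractSha :: String -> String
extractSha out = case takeWhile (not . isSpace) out of
    ('\\' : rest) -> rest
    word -> word

-- | Accept the output only if the extracted word has the length expected
-- for a SHA of the given bit size (4 bits per hex digit) and consists
-- solely of hex digits (assumption: upper- or lowercase both allowed).
saneSha :: Int -> String -> Maybe String
saneSha bits out
    | length sha == bits `div` 4 && all isHexDigit sha = Just sha
    | otherwise = Nothing
  where
    sha = extractSha out

main :: IO ()
main = do
    -- GNU-style output: checksum first, then the filename.
    print (saneSha 256 (replicate 64 'a' ++ "  file\n"))
    -- BSD-style output: the first word is "SHA256", which is rejected.
    print (saneSha 256 "SHA256 (file) = abc")
```

With a check of this shape, the BSD-style line from cda0ed5d25 yields Nothing instead of being parsed as a checksum of "SHA256".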


218 commits