git-annex

Author	SHA1	Message	Date
Joey Hess	ef28b3fef7	split out Git/Command.hs	2011-12-14 15:56:11 -04:00
Joey Hess	02f1bd2bf4	split more stuff out of Git.hs	2011-12-14 15:43:13 -04:00
Joey Hess	acd7a52dfd	always find optimal merge Testing `b9ac585454`, it didn't find the optimal union merge, the second sha was the one to use, at least in the case I tried. Let's just try all shas to see if any can be reused. I stopped using the expensive nub, so despite the use of sets to sort/uniq file contents, this is probably as fast or faster than it was before.	2011-12-12 01:59:29 -04:00
Joey Hess	0cbab5de65	refactor	2011-12-12 00:48:25 -04:00
Joey Hess	b9ac585454	more efficient union merges Tries to avoid generating a new object when the merged content has the same lines that were in the old object. I've noticed some merge commits that only move lines around, like this: - 1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5 1323204972.062151s 1 87e06c7a-7388-11e0-ba07-03cdf300bd87 ++1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5 Unsure if this will really save anything in practice, since it only looks at one of the two old objects, and maybe I didn't pick the best one.	2011-12-11 23:02:25 -04:00
Joey Hess	d64132a43a	hslint	2011-12-09 01:57:13 -04:00
Joey Hess	9290095fc2	improve type signatures with a Ref newtype In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.	2011-11-16 02:41:46 -04:00
Joey Hess	272a67921c	better name	2011-11-16 01:46:46 -04:00
Joey Hess	e83b966eb5	cleanup	2011-11-15 23:51:24 -04:00
Joey Hess	21a925dcf1	merge: Now runs in constant space. Before, a merge was first calculated, by running various actions that called git and built up a list of lines, which were at the end sent to git update-index. This necessarily used space proportional to the size of the diff between the trees being merged. Now, lines are streamed into git update-index from each of the actions in turn. Runtime size of git-annex merge when merging 50000 location log files drops from around 100 mb to a constant 4 mb. Presumably it runs quite a lot faster, too.	2011-11-15 23:28:01 -04:00
Joey Hess	922e9af528	cleanup	2011-11-15 22:40:40 -04:00
Joey Hess	b76dc2d210	avoid space leak writing merge This reduces the memory use of a merge by 1/3rd. The space leak was apparently because the whole update-index input was generated strictly, not lazily. I wondered if the change to ByteStrings contributed to this, due to the need to convert with L.pack here. But going back to the old code, I still see a much similar leak, and worse performance besides due to it not using ByteStrings. The fix is to just hPutStr the lines repeatedly. (Note the \0 is written separately, to avoid allocation overheads in adding it to the string.) The Git.pipeWrite interface is probably just wrong for any large inputs to git. This was the only place using it for input of any size. There is still at least one other space leak in the merge code.	2011-11-15 22:19:12 -04:00
Joey Hess	04edae6791	Optimised union merging; now only runs git cat-file once.	2011-11-12 17:45:12 -04:00
Joey Hess	637b5feb45	lint	2011-11-11 01:52:58 -04:00
Joey Hess	bf460a0a98	reorder repo parameters last Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.	2011-11-08 16:27:20 -04:00
Joey Hess	3acdba3995	faster union merge of multiple branches into index; only write index once	2011-10-07 13:36:48 -04:00
Joey Hess	7ff89ccfee	convert all git read/write functions to use ByteStrings This yields a second or so speedup in unused, find, etc. Seems that even when the ByteString is immediately split and then converted to Strings, it's faster. I may try to push ByteStrings out into more of git-annex gradually, although I suspect most of the time-critical parts are already covered now, and many of the rest rely on libraries that only support Strings.	2011-09-29 23:48:57 -04:00
Joey Hess	67f2b7cb3e	use ByteStrings when reading content of files; didn't bother to benchmark this	2011-09-29 19:19:28 -04:00
Joey Hess	203148363f	split groups of related functions out of Utility	2011-08-22 16:14:12 -04:00
Joey Hess	e784757376	hlint tweaks Did all sources except Remotes/* and Command/*	2011-07-15 03:12:05 -04:00
Joey Hess	896726cde4	rename GitUnionMerge to Git.UnionMerge Also, moved commit function into Git proper, it's not union merge specific.	2011-06-30 13:32:47 -04:00

21 commits