Commit graph

522 commits

Author SHA1 Message Date
Joey Hess
21a925dcf1 merge: Now runs in constant space.
Before, a merge was first calculated, by running various actions that
called git and built up a list of lines, which were at the end sent
to git update-index. This necessarily used space proportional to the size
of the diff between the trees being merged.

Now, lines are streamed into git update-index from each of the actions in
turn.

Runtime size of git-annex merge when merging 50000 location log files
drops from around 100 mb to a constant 4 mb.

Presumably it runs quite a lot faster, too.
2011-11-15 23:28:01 -04:00
Joey Hess
922e9af528 cleanup 2011-11-15 22:40:40 -04:00
Joey Hess
b76dc2d210 avoid space leak writing merge
This reduces the memory use of a merge by 1/3rd. The space leak was
apparently because the whole update-index input was generated strictly, not
lazily.

I wondered if the change to ByteStrings contributed to this, due to the
need to convert with L.pack here. But going back to the old code, I still
see a much similar leak, and worse performance besides due to it not using
ByteStrings.

The fix is to just hPutStr the lines repeatedly. (Note the \0 is written
separately, to avoid allocation overheads in adding it to the string.)
The Git.pipeWrite interface is probably just wrong for any large inputs to
git. This was the only place using it for input of any size.

There is still at least one other space leak in the merge code.
2011-11-15 22:19:12 -04:00
Joey Hess
04edae6791 Optimised union merging; now only runs git cat-file once. 2011-11-12 17:45:12 -04:00
Joey Hess
637b5feb45 lint 2011-11-11 01:52:58 -04:00
Joey Hess
bf460a0a98 reorder repo parameters last
Many functions took the repo as their first parameter. Changing it
consistently to be the last parameter allows doing some useful things with
currying, that reduce boilerplate.

In particular, g <- gitRepo is almost never needed now, instead
use inRepo to run an IO action in the repo, and fromRepo to get
a value from the repo.

This also provides more opportunities to use monadic and applicative
combinators.
2011-11-08 16:27:20 -04:00
Joey Hess
3acdba3995 faster union merge of multiple branches into index
only write index once
2011-10-07 13:36:48 -04:00
Joey Hess
7ff89ccfee convert all git read/write functions to use ByteStrings
This yields a second or so speedup in unused, find, etc. Seems that even
when the ByteString is immediately split and then converted to Strings,
it's faster.

I may try to push ByteStrings out into more of git-annex gradually,
although I suspect most of the time-critical parts are already covered
now, and many of the rest rely on libraries that only support Strings.
2011-09-29 23:48:57 -04:00
Joey Hess
949ef94d5e layout 2011-09-29 22:31:20 -04:00
Joey Hess
67f2b7cb3e use ByteStrings when reading content of files
didn't bother to benchmark this
2011-09-29 19:19:28 -04:00
Joey Hess
a91c8a15d5 Sped up unused.
Added Git.ByteString which replaces Git IO methods with ones using lazy
ByteStrings. This can be more efficient when large quantities of data are
being read from git.

In Git.LsTree, parse git ls-tree output more efficiently, thanks
to ByteString. This benchmarks 25% faster, in a benchmark that includes
(probably predominately) the run time for git ls-tree itself.

In real world numbers, this makes git annex unused 2 seconds faster for
each branch it needs to check, in my usual large repo.
2011-09-29 19:04:24 -04:00
Joey Hess
297bc648b9 make unused check branches and tags too
needs time and space optimisation
2011-09-28 16:43:10 -04:00
Joey Hess
ad245a6375 refactor catfile code
split into generic IO code, and a thin Annex wrapper
2011-09-28 15:17:36 -04:00
Joey Hess
a3cb5c47e5 use FileMode 2011-09-28 14:14:52 -04:00
Joey Hess
93807564d0 add ls-tree interface
This parser should be fast. I hope.
2011-09-28 14:03:59 -04:00
Joey Hess
7724f895a8 tweak 2011-09-25 14:37:13 -04:00
Joey Hess
203148363f split groups of related functions out of Utility 2011-08-22 16:14:12 -04:00
Joey Hess
e784757376 hlint tweaks
Did all sources except Remotes/* and Command/*
2011-07-15 03:12:05 -04:00
Joey Hess
ded2591124 unannex: Clean up use of git commit -a.
This was more complex than would be expected. unannex has to use git commit -a
since it's removing files from git; git commit filelist won't do.

Allow commands to be added to the Git queue that have no associated files,
and run such commands once.
2011-07-14 17:15:37 -04:00
Joey Hess
896726cde4 rename GitUnionMerge to Git.UnionMerge
Also, moved commit function into Git proper, it's not union merge specific.
2011-06-30 13:32:47 -04:00
Joey Hess
f0497312a7 rename GitQueue to Git.Queue 2011-06-30 13:25:37 -04:00
Joey Hess
f6063a094e renamed GitRepo to Git
It was always imported qualified as Git anyway
2011-06-30 13:21:39 -04:00