git-annex

Author	SHA1	Message	Date
Joey Hess	0093a456e8	test suite saved my bacon git config reading memoization shouldn't be used when changing config	2012-05-19 10:22:43 -04:00
Joey Hess	a1885bd116	make GIT_DIR, GIT_WORK_TREE absolute GIT_DIR is set to something relative, like ".git" in the pre-commit hook. But internally all the directories are assumed to be absolute.	2012-05-18 18:32:19 -04:00
Joey Hess	eb6cb1b87f	Add support for core.worktree, and fix support for GIT_WORK_TREE and GIT_DIR. The environment needs to override git-config. Changed when git config is read, and avoid rereading it once it's been read. chdir for both worktree settings.	2012-05-18 18:20:53 -04:00
Joey Hess	bb4f31a0ee	Clean up handling of git directory and git worktree. Baked into the code was an assumption that a repository's git directory could be determined by adding ".git" to its work tree (or nothing for bare repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are used to separate the two. This was attacked at the type level, by storing the gitdir and worktree separately, so Nothing for the worktree means a bare repo. A complication arose because we don't learn where a repository is bare until its configuration is read. So another Location type handles repositories that have not had their config read yet. I am not entirely happy with this being a Location type, rather than representing them entirely separate from the Git type. The new code is not worse than the old, but better types could enforce more safety. Added support for core.worktree. Overriding it with -c isn't supported because it's not really clear what to do if a git repo's config is read, is not bare, and is then overridden to bare. What is the right git directory in this case? I will worry about this if/when someone has a use case for overriding core.worktree with -c. (See Git.Config.updateLocation) Also removed and renamed some functions like gitDir and workTree that misused git's terminology. One minor regression is known: git annex add in a bare repository does not print a nice error message, but runs git ls-files in a way that fails earlier with a less nice error message. This is because before --work-tree was always passed to git commands, even in a bare repo, while now it's not.	2012-05-18 17:03:12 -04:00
Joey Hess	84ac8c58db	Add annex.httpheaders and annex.httpheader-command config settings Allow custom headers to be sent with all HTTP requests. (Requested by the Internet Archive)	2012-04-22 01:13:09 -04:00
Joey Hess	ed79596b75	noop	2012-04-21 23:32:33 -04:00
Joey Hess	b4a5e39ee6	Support git's core.sharedRepository configuration This is incomplete, it does not honor it yet for hash directories and other annex bookkeeping files. Some of that is not needed for a bare repo; some of it may be.	2012-04-21 15:36:52 -04:00
Joey Hess	70538dac84	compute distance in correct direction	2012-04-14 16:01:08 -04:00
Joey Hess	52a158a7c6	autocorrection git-annex (but not git-annex-shell) supports the git help.autocorrect configuration setting, doing fuzzy matching using the restricted Damerau-Levenshtein edit distance, just as git does. This adds a build dependency on the haskell edit-distance library.	2012-04-12 15:37:21 -04:00
Joey Hess	c924542e61	bup: Properly handle key names with spaces or other things that are not legal git refs. Continue using the key name as bup ref name, to preserve backwards compatibility, unless it is an illegal git ref. In that case, use a sha256 of the key name instead.	2012-04-11 12:45:49 -04:00
Joey Hess	378f61d0ef	nicer style; also empty refs are implicitly not allowed	2012-04-11 12:29:31 -04:00
Joey Hess	0be6ebb0aa	added a git ref legality checker git-check-ref-format is .. wow. Good design on one level, but what a mess.	2012-04-11 12:21:54 -04:00
Joey Hess	184a69171d	removed another 10 lines via ifM	2012-03-16 01:59:07 -04:00
Joey Hess	00d814aecc	fix filename encoding for git cat-file The filename sent to git cat-file needs to be sent on a File encoded handle. Also set the read handle to use the File encoding, so that any error message mentioning the filename is received properly. The actual file content is read using Data.ByteString.Char8, which will ignore the read handle's encoding, so this won't change that. (Whether that is entirely correct remains to be seen.)	2012-02-26 14:11:50 -04:00
Joey Hess	cac130b205	cleanup	2012-02-21 00:16:24 -04:00
Joey Hess	6c0155efb7	refactor	2012-02-20 15:22:21 -04:00
Joey Hess	f0f07db01d	reorder params and put -- after attributes, for compatibility with old git (cherry picked from commit `c8ec0e233e`)	2012-02-15 14:01:06 -04:00
Joey Hess	52c5b164d8	Added a annex.queuesize setting useful when adding hundreds of thousands of files on a system with plenty of memory. git add gets quite slow in such a large repository, so if the system has more than the ~32 mb of memory the queue can use by default, it's a useful optimisation to increase the queue size, in order to decrease the number of times git add is run.	2012-02-15 11:14:19 -04:00
Joey Hess	7ebd98d8d8	fix memory leak when staging the journal The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.	2012-02-14 14:37:59 -04:00
Joey Hess	a40ec5e03e	Fixed a memory leak due to excessive strictness when committing journal files. When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.	2012-02-14 11:20:34 -04:00
Joey Hess	8f76d66f32	set fileEncoding on CheckAttr handles Seemed to work without it, but this is correct.	2012-02-14 04:31:39 -04:00
Joey Hess	a2f241d503	fix LsFiles.typeChanged paths Passing absolute paths to Command.Add used to work, but after recent changes doesn't. All LsFiles should use relative paths anyway, so fix it there.	2012-02-14 00:22:42 -04:00
Joey Hess	cbaebf538a	rework git check-attr interface Now gitattributes are looked up, efficiently, in only the places that really need them, using the same approach used for cat-file. The old CheckAttr code seemed very fragile, in the way it streamed files through git check-attr. I actually found that `cad8824852` was still deadlocking with ghc 7.4, at the end of adding a lot of files. This should fix that problem, and avoid future ones. The best part is that this removes withAttrFilesInGit and withNumCopies, which were complicated Seek methods, as well as simplifying the types for several other Seek methods that had a Backend tupled in.	2012-02-13 23:52:21 -04:00
Joey Hess	d35a8d85b5	another place hGetBoth was used without a writer thread	2012-02-13 20:23:45 -04:00
Joey Hess	cad8824852	thinko I removed the now unnecessary forkProcess, but forgot to change back to pipeBoth, so there was no writer thread.	2012-02-13 20:01:37 -04:00
Joey Hess	3ac2677e00	comment typo	2012-02-13 16:58:26 -04:00
Joey Hess	e4d0923544	wording	2012-02-09 17:35:36 -04:00
Joey Hess	dc682e53a2	use fileEncoding for git-update-index input handle	2012-02-04 13:03:33 -04:00
Joey Hess	586be39952	fix file encoding of HashObject	2012-02-04 13:01:00 -04:00
Joey Hess	d8fb97806c	support all filename encodings with ghc 7.4 Under ghc 7.4, this seems to be able to handle all filename encodings again. Including filename encodings that do not match the LANG setting. I think this will not work with earlier versions of ghc, it uses some ghc internals. Turns out that ghc 7.4 has a special filesystem encoding that it uses when reading/writing filenames (as FilePaths). This encoding is documented to allow "arbitrary undecodable bytes to be round-tripped through it". So, to get FilePaths from eg, git ls-files, set the Handle that is reading from git to use this encoding. Then things basically just work. However, I have not found a way to make Text read using this encoding. Text really does assume unicode. So I had to switch back to using String when reading/writing data to git. Which is a pity, because it's some percent slower, but at least it works. Note that stdout and stderr also have to be set to this encoding, or printing out filenames that contain undecodable bytes causes a crash. IMHO this is a misfeature in ghc, that the user can pass you a filename, which you can readFile, etc, but that default, putStr of filename may cause a crash! Git.CheckAttr gave me special trouble, because the filenames I got back from git, after feeding them in, had further encoding breakage. Rather than try to deal with that, I just zip up the input filenames with the attributes. Which must be returned in the same order queried for this to work. Also of note is an apparent GHC bug I worked around in Git.CheckAttr. It used to forkProcess and feed git from the child process. Unfortunately, after this forkProcess, accessing the `files` variable from the parent returns []. Not the value that was passed into the function. This screams of a bad bug, that's clobbering a variable, but for now I just avoid forkProcess there to work around it. That forkProcess was itself only added because of a ghc bug, #624389. I've confirmed that the test case for that bug doesn't reproduce it with ghc 7.4. So that's ok, except for the new ghc bug I have not isolated and reported. Why does this simple bit of code magnet the ghc bugs? :) Also, the symlink touching code is currently broken, when used on utf-8 filenames in a non-utf-8 locale, or probably on any filename containing undecodable bytes, and I temporarily commented it out.	2012-02-03 16:23:20 -04:00
Joey Hess	3d49258e5b	attempt at a quick, utf-8 only fix to the ghc 7.4 problem If you have only utf-8 filenames, and need to build git-annex with ghc 7.4, this will work. But, it will crash on non-utf-8 filenames.	2012-02-01 16:16:08 -04:00
Joey Hess	a964012fc3	switch to the strict state monad I had not realized what a memory leak the lazy state monad could be, although I have not seen much evidence of actual leaking in git-annex. However, if running git-annex on a great many files, this could matter. The additional Utility.State.changeState adds even more strictness, avoiding a problem I saw in github-backup where repeatedly modifying state built up a huge pile of thunks.	2012-01-29 22:55:06 -04:00
Joey Hess	97209ac08d	fix error message	2012-01-25 20:43:01 -04:00
Joey Hess	3ca7cf5db1	export fromPath Not used in git-annex, but I am using it in git-backup	2012-01-25 20:42:05 -04:00
Joey Hess	ce5637498f	remove Utility.Conditional and use IfElse This drops the >>! and >>? with the nice low fixity. IfElse does have undocumented >>=>>! and >>=>>? operators, but I deem that too fishy. Anyway, using whenM and unlessM is easier; I sometimes mixed the operators up.	2012-01-24 16:22:07 -04:00
Joey Hess	ba6088b249	rename readMaybe to readish a stricter (but also partial) readMaybe is getting added to base	2012-01-23 17:00:10 -04:00
Joey Hess	8c87293b48	avoid unnecessary stats when traversing to parent	2012-01-14 11:48:10 -04:00
Joey Hess	92a4af8b20	avoid unnecessary chdir	2012-01-14 11:42:51 -04:00
Joey Hess	1f66af2b53	optimize away 3 stats	2012-01-14 11:28:49 -04:00
Joey Hess	ff5703ce77	tweak	2012-01-13 21:06:00 -04:00
Joey Hess	66aac77467	support relative GIT_DIR	2012-01-13 14:40:36 -04:00
Joey Hess	1ae780ee79	git-annex, git-union-merge: Support GIT_DIR and GIT_WORK_TREE. Note that GIT_WORK_TREE cannot influence GIT_DIR; that is necessary for git-fake-bare and vcsh type things to work.	2012-01-13 12:52:09 -04:00
Joey Hess	0d5c402210	Add annex-trustlevel configuration settings, which can be used to override the trust level of a remote. This overrides the trust.log, and is overridden by the command-line trust parameters. It would have been nicer to have Logs.Trust.trustMap just look up the configuration for all remotes, but a dependency loop prevented that (Remotes depends on Logs.Trust in several ways). So instead, look up the configuration when building remotes, storing it in the same forcetrust field used for the command-line trust parameters.	2012-01-09 23:31:44 -04:00
Joey Hess	9fb5f3edc7	log --after=date	2012-01-06 17:24:03 -04:00
Joey Hess	0b27e6baa0	Support unescaped repository urls, like git does. Turns out that git will accept a .git/config containing an url with eg, spaces in its name. Handle this by escaping the url if it's not valid. This also fixes support for urls containing escaped characters like %20 for space. Before, the path from the url was not unescaped properly.	2012-01-05 14:32:20 -04:00
Joey Hess	f0957426c5	skip local remotes that are not available (ie, not mounted) With --fast, unavailable local remotes are filtered out of the fast set. This way, if there are local remotes, --fast always acts only on them, and if none are mounted, acts on nothing. This consistency is better than --fast acting on different remotes depending on what's mounted.	2011-12-31 04:50:39 -04:00
Joey Hess	a2ec2d3760	refactor and check for a detached HEAD	2011-12-31 03:38:58 -04:00
Joey Hess	52104dae6f	refactor	2011-12-30 18:36:40 -04:00
Joey Hess	26040d6419	add base, under The describe function was only intended to generate a human-visible description of a branch, but taking the base of a branch is a useful operation to be able to do no matter the human-visible representation. Converting a branch like refs/heads/master to refs/heads/origin/master is also a useful operation, and under can do that.	2011-12-30 16:48:26 -04:00
Joey Hess	5287d1dc3f	fixed behavior when multiple insteadOf configs are provided for the same url base Consider this git config --list case: url.git+ssh://git@example.com/.insteadOf=gl url.git+ssh://git@example.com/.insteadOf=shared Since config is stored in a Map, only the last of the values for this key was stored and available for use by the insteadOf code. But that is wrong; git allows either "gl" or "shared" to be used in an url and the insteadOf value to be substituted in. To support this, it seems best to keep the existing config map as-is, and add a second map that accumulates a list of multiple values for config keys. This new fullconfig map can be used in the rare places where multiple values for a key make sense, without needing to complicate everything else. Haskell's laziness and data sharing keep the overhead of adding this second map low.	2011-12-30 14:07:46 -04:00
Joey Hess	cba3ce08df	handle C-style escapes in Format I was happily able to repurpose some code from Git.Filename to handle this. I remember writing that code... a whole afternoon at a coffee shop, after which I felt I'd struggled with Haskell and git, and sorta lost, in needing to write this nasty piece of code. But was also pleased at the use of a pair of functions and quickcheck that allowed me to get it 100% right. So, turns out I not only got it right, but the code wasn't as special-purpose as I'd feared. Yay!	2011-12-23 01:05:16 -04:00
Joey Hess	5a275a3f5d	Can now be built with older git versions (before 1.7.7); the resulting binary should only be used with old git. Remove git old version check from configure, and use the git version it was built against in the git check-attr code.	2011-12-22 15:01:13 -04:00
Joey Hess	6bffe509d7	Add --include, which is the same as --not --exclude.	2011-12-22 14:00:17 -04:00
Joey Hess	ee3b5b2a42	use Common in a few more modules	2011-12-20 14:37:53 -04:00
Joey Hess	95d2391f58	more partial function removal Left a few Prelude.head's in where it was checked not null and too hard to remove, etc.	2011-12-15 18:19:36 -04:00
Joey Hess	fbc3d32f7d	avoid partial function, and parse git-ref output better It's possible that a ref name might contain a space, this properly preserves the space.	2011-12-15 16:58:04 -04:00
Joey Hess	eb132a854e	avoid partial head function (although it was used safely)	2011-12-15 16:04:08 -04:00
Joey Hess	111b6937ec	avoid partial functions, and added check for correct sha content	2011-12-15 15:57:47 -04:00
Joey Hess	a8643ca44c	refactor	2011-12-15 13:05:47 -04:00
Joey Hess	09cd042775	Properly handle multiline git config values. A crash on parsing was fixed a while ago. This adds support for fully correctly parsing multiline git config values, using git config --null. Since git-annex-shell configlist uses normal git config output, I left in support for that too; the two forms of config output can be easily identified by the parser. Since configlist only prints the annex.uuid config, there's no risk of multiline values there, so no need to change it.	2011-12-15 12:48:27 -04:00
Joey Hess	ef28b3fef7	split out Git/Command.hs	2011-12-14 15:56:11 -04:00
Joey Hess	02f1bd2bf4	split more stuff out of Git.hs	2011-12-14 15:43:13 -04:00
Joey Hess	9db8ec210f	split out two more Git modules	2011-12-13 15:24:23 -04:00
Joey Hess	25b2cc4148	move commit to Git.Branch	2011-12-13 15:08:44 -04:00
Joey Hess	13fff71f20	split out three modules from Git Constructors and configuration make sense in separate modules. A separate Git.Types is needed to avoid cycles.	2011-12-13 15:06:49 -04:00
Joey Hess	46588674b0	avoid closing pipe before all the shas are read from it Could have just used hGetContentsStrict here, but that would require storing all the shas in memory. Since this is called at the end of a git-annex run, it may have created a lot of shas, so I avoid that memory use and stream them out like before.	2011-12-12 21:41:37 -04:00
Joey Hess	0e45b762a0	broke out Git/HashObject.hs	2011-12-12 21:24:55 -04:00
Joey Hess	31a0c07ee9	broke out Git/Branch.hs and reorganized	2011-12-12 21:12:51 -04:00
Joey Hess	543d0d2501	split out Git/Ref.hs	2011-12-12 18:30:33 -04:00
Joey Hess	acd7a52dfd	always find optimal merge Testing `b9ac585454`, it didn't find the optimal union merge, the second sha was the one to use, at least in the case I tried. Let's just try all shas to see if any can be reused. I stopped using the expensive nub, so despite the use of sets to sort/uniq file contents, this is probably as fast or faster than it was before.	2011-12-12 01:59:29 -04:00
Joey Hess	0cbab5de65	refactor	2011-12-12 00:48:25 -04:00
Joey Hess	b9ac585454	more efficient union merges Tries to avoid generating a new object when the merged content has the same lines that were in the old object. I've noticed some merge commits that only move lines around, like this: - 1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5 1323204972.062151s 1 87e06c7a-7388-11e0-ba07-03cdf300bd87 ++1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5 Unsure if this will really save anything in practice, since it only looks at one of the two old objects, and maybe I didn't pick the best one.	2011-12-11 23:02:25 -04:00
Joey Hess	d64132a43a	hslint	2011-12-09 01:57:13 -04:00
Joey Hess	9290095fc2	improve type signatures with a Ref newtype In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.	2011-11-16 02:41:46 -04:00
Joey Hess	272a67921c	better name	2011-11-16 01:46:46 -04:00
Joey Hess	e83b966eb5	cleanup	2011-11-15 23:51:24 -04:00
Joey Hess	21a925dcf1	merge: Now runs in constant space. Before, a merge was first calculated, by running various actions that called git and built up a list of lines, which were at the end sent to git update-index. This necessarily used space proportional to the size of the diff between the trees being merged. Now, lines are streamed into git update-index from each of the actions in turn. Runtime size of git-annex merge when merging 50000 location log files drops from around 100 mb to a constant 4 mb. Presumably it runs quite a lot faster, too.	2011-11-15 23:28:01 -04:00
Joey Hess	922e9af528	cleanup	2011-11-15 22:40:40 -04:00
Joey Hess	b76dc2d210	avoid space leak writing merge This reduces the memory use of a merge by 1/3rd. The space leak was apparently because the whole update-index input was generated strictly, not lazily. I wondered if the change to ByteStrings contributed to this, due to the need to convert with L.pack here. But going back to the old code, I still see a much similar leak, and worse performance besides due to it not using ByteStrings. The fix is to just hPutStr the lines repeatedly. (Note the \0 is written separately, to avoid allocation overheads in adding it to the string.) The Git.pipeWrite interface is probably just wrong for any large inputs to git. This was the only place using it for input of any size. There is still at least one other space leak in the merge code.	2011-11-15 22:19:12 -04:00
Joey Hess	04edae6791	Optimised union merging; now only runs git cat-file once.	2011-11-12 17:45:12 -04:00
Joey Hess	637b5feb45	lint	2011-11-11 01:52:58 -04:00
Joey Hess	bf460a0a98	reorder repo parameters last Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.	2011-11-08 16:27:20 -04:00
Joey Hess	3acdba3995	faster union merge of multiple branches into index only write index once	2011-10-07 13:36:48 -04:00
Joey Hess	7ff89ccfee	convert all git read/write functions to use ByteStrings This yields a second or so speedup in unused, find, etc. Seems that even when the ByteString is immediately split and then converted to Strings, it's faster. I may try to push ByteStrings out into more of git-annex gradually, although I suspect most of the time-critical parts are already covered now, and many of the rest rely on libraries that only support Strings.	2011-09-29 23:48:57 -04:00
Joey Hess	949ef94d5e	layout	2011-09-29 22:31:20 -04:00
Joey Hess	67f2b7cb3e	use ByteStrings when reading content of files didn't bother to benchmark this	2011-09-29 19:19:28 -04:00
Joey Hess	a91c8a15d5	Sped up unused. Added Git.ByteString which replaces Git IO methods with ones using lazy ByteStrings. This can be more efficient when large quantities of data are being read from git. In Git.LsTree, parse git ls-tree output more efficiently, thanks to ByteString. This benchmarks 25% faster, in a benchmark that includes (probably predominately) the run time for git ls-tree itself. In real world numbers, this makes git annex unused 2 seconds faster for each branch it needs to check, in my usual large repo.	2011-09-29 19:04:24 -04:00
Joey Hess	297bc648b9	make unused check branches and tags too needs time and space optimisation	2011-09-28 16:43:10 -04:00
Joey Hess	ad245a6375	refactor catfile code split into generic IO code, and a thin Annex wrapper	2011-09-28 15:17:36 -04:00
Joey Hess	a3cb5c47e5	use FileMode	2011-09-28 14:14:52 -04:00
Joey Hess	93807564d0	add ls-tree interface This parser should be fast. I hope.	2011-09-28 14:03:59 -04:00
Joey Hess	7724f895a8	tweak	2011-09-25 14:37:13 -04:00
Joey Hess	203148363f	split groups of related functions out of Utility	2011-08-22 16:14:12 -04:00
Joey Hess	e784757376	hlint tweaks Did all sources except Remotes/* and Command/*	2011-07-15 03:12:05 -04:00
Joey Hess	ded2591124	unannex: Clean up use of git commit -a. This was more complex than would be expected. unannex has to use git commit -a since it's removing files from git; git commit filelist won't do. Allow commands to be added to the Git queue that have no associated files, and run such commands once.	2011-07-14 17:15:37 -04:00
Joey Hess	896726cde4	rename GitUnionMerge to Git.UnionMerge Also, moved commit function into Git proper, it's not union merge specific.	2011-06-30 13:32:47 -04:00
Joey Hess	f0497312a7	rename GitQueue to Git.Queue	2011-06-30 13:25:37 -04:00
Joey Hess	f6063a094e	renamed GitRepo to Git It was always imported qualified as Git anyway	2011-06-30 13:21:39 -04:00

298 commits