git-annex

Author	SHA1	Message	Date
Joey Hess	b49c0c2633	add annex.alwayscommit option To avoid commits of data to the git-annex branch after each command is run, set annex.alwayscommit=false. Its data will then be committed less frequently, when a merge or sync is done.	2012-02-25 15:31:42 -04:00
Joey Hess	bd66f962d3	Deal with NFS problem that caused a failure to remove a directory when removing content from the annex. I was able to reproduce this on linux using the kernel's nfs server and mounting localhost:/. Determined that removing the directory fails when the just-deleted file in it was locked. Considered dropping the lock before removing the directory, but this would complicate parts of the code that should not need to worry about locking. So instead, ignore the failure to remove the directory in this case. While I was at it, made it attempt to remove both levels of hash directories, in case they're empty.	2012-02-24 16:30:47 -04:00
Joey Hess	a1e52f0ce5	hlint	2012-02-16 00:44:51 -04:00
Joey Hess	52c5b164d8	Added a annex.queuesize setting useful when adding hundreds of thousands of files on a system with plenty of memory. git add gets quite slow in such a large repository, so if the system has more than the ~32 mb of memory the queue can use by default, it's a useful optimisation to increase the queue size, in order to decrease the number of times git add is run.	2012-02-15 11:14:19 -04:00
Joey Hess	03c559f8d6	tweak	2012-02-14 14:51:26 -04:00
Joey Hess	7ebd98d8d8	fix memory leak when staging the journal The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.	2012-02-14 14:37:59 -04:00
Joey Hess	a40ec5e03e	Fixed a memory leak due to excessive strictness when committing journal files. When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.	2012-02-14 11:20:34 -04:00
Joey Hess	cbaebf538a	rework git check-attr interface Now gitattributes are looked up, efficiently, in only the places that really need them, using the same approach used for cat-file. The old CheckAttr code seemed very fragile, in the way it streamed files through git check-attr. I actually found that `cad8824852` was still deadlocking with ghc 7.4, at the end of adding a lot of files. This should fix that problem, and avoid future ones. The best part is that this removes withAttrFilesInGit and withNumCopies, which were complicated Seek methods, as well as simplfying the types for several other Seek methods that had a Backend tupled in.	2012-02-13 23:52:21 -04:00
Joey Hess	d55f3c0716	Fix teardown of stale cached ssh connections.	2012-02-09 21:49:46 -04:00
Joey Hess	146c36ca54	IO exception rework ghc 7.4 comaplains about use of System.IO.Error to catch exceptions. Ok, use Control.Exception, with variants specialized to only catch IO exceptions.	2012-02-03 16:47:24 -04:00
Joey Hess	b81d662cbf	Avoid repeated location log commits when a remote is receiving files. Done by adding a oneshot mode, in which location log changes are written to the journal, but not committed. Taking advantage of git-annex's existing ability to recover in this situation. This is used by git-annex-shell and other places where changes are made to a remote's location log.	2012-01-28 15:41:52 -04:00
Joey Hess	ba6088b249	rename readMaybe to readish a stricter (but also partial) readMaybe is getting added to base	2012-01-23 17:00:10 -04:00
Joey Hess	eb9001044f	order user provided params after connection caching params So the user can override them.	2012-01-20 17:32:32 -04:00
Joey Hess	6ef82665de	add annex.sshcaching config setting	2012-01-20 17:15:46 -04:00
Joey Hess	47250a153a	ssh connection caching Ssh connection caching is now enabled automatically by git-annex. Only one ssh connection is made to each host per git-annex run, which can speed some things up a lot, as well as avoiding repeated password prompts. Concurrent git-annex processes also share ssh connections. Cached ssh connections are shut down when git-annex exits. Note: The rsync special remote does not yet participate in the ssh connection caching.	2012-01-20 17:14:56 -04:00
Joey Hess	61dbad505d	fsck --from remote --fast Avoids expensive file transfers, at the expense of checking file size and/or contents. Required some reworking of the remote code.	2012-01-20 13:23:11 -04:00
Joey Hess	effaa298fa	optimise fsck --from normal git remotes For a local git remote, can symlink the file. For a git remote using rsync, can preseed any local content. There are a few reasons to use fsck --from on a normal git remote. One is if it's using gitosis or similar, and you don't have shell access to run git annex locally. Another reason could be if you just want to fsck certian files of a bare remote.	2012-01-19 17:10:44 -04:00
Joey Hess	81856c3175	add a configure check for StatFS This way, the build log will indicate whether StatFS can be relied on. I've tested all the failing architectures now, and on all of them, the StatFS code now returns Nothing, rather than Just nonsense. Also, if annex.diskreserve is set on a platform where StatFS is not working, git-annex will complain. Also, the Makefile was missing the sources target used when building with cabal.	2012-01-15 13:49:32 -04:00
Joey Hess	a3d97e0c85	tweak	2012-01-14 14:31:16 -04:00
Joey Hess	5e2b4e16ba	avoid multiple unnecessary stats of the index file Up to one per file processed.	2012-01-14 12:07:36 -04:00
Joey Hess	abdacf58ed	tweaks	2012-01-11 00:06:54 -04:00
Joey Hess	16e7178f20	reorg	2012-01-10 15:29:10 -04:00
Joey Hess	a3a9f87047	log: New command that displays the location log for file, showing each repository they were added to and removed from. This needs to run git log on the location log files to get at all past versions of the file, which tends to be a bit slow. It would be possible to make a version optimised for showing the location logs for every key. That would only need to run git log once, so would be faster, but it would need to process an enormous amount of data, so would not speed up the individual file case. In the future it would be nice to support log --format. log --json also doesn't work right yet.	2012-01-06 15:40:07 -04:00
Joey Hess	aa0882691b	Added remote.name.annex-web-options configuration setting, which can be used to provide parameters to whichever of wget or curl git-annex uses (depends on which is available, but most of their important options suitable for use here are the same).	2012-01-02 14:20:20 -04:00
Joey Hess	252376d639	Merge branch 'master' into autosync	2011-12-30 20:38:59 -04:00
Joey Hess	52104dae6f	refactor	2011-12-30 18:36:40 -04:00
Joey Hess	925b6390aa	add forceUpdate This code is picked from my tweak-fetch branch, which already did the needed refactoring.	2011-12-30 15:57:28 -04:00
Joey Hess	6d4382a89e	Merge branch 'new-monad-control'	2011-12-24 23:02:42 -04:00
Joey Hess	ee3b5b2a42	use Common in a few more modules	2011-12-20 14:37:53 -04:00
Joey Hess	ef28b3fef7	split out Git/Command.hs	2011-12-14 15:56:11 -04:00
Joey Hess	02f1bd2bf4	split more stuff out of Git.hs	2011-12-14 15:43:13 -04:00
Joey Hess	25b2cc4148	move commit to Git.Branch	2011-12-13 15:08:44 -04:00
Joey Hess	13fff71f20	split out three modules from Git Constructors and configuration make sense in separate modules. A separate Git.Types is needed to avoid cycles.	2011-12-13 15:06:49 -04:00
Joey Hess	46588674b0	avoid closing pipe before all the shas are read from it Could have just used hGetContentsStrict here, but that would require storing all the shas in memory. Since this is called at the end of a git-annex run, it may have created a lot of shas, so I avoid that memory use and stream them out like before.	2011-12-12 21:41:37 -04:00
Joey Hess	0e45b762a0	broke out Git/HashObject.hs	2011-12-12 21:24:55 -04:00
Joey Hess	31a0c07ee9	broke out Git/Branch.hs and reorganized	2011-12-12 21:12:51 -04:00
Joey Hess	543d0d2501	split out Git/Ref.hs	2011-12-12 18:30:33 -04:00
Joey Hess	da95cbadca	split out Annex/Journal.hs	2011-12-12 18:03:28 -04:00
Joey Hess	98dfc0c9b0	split out Annex/BranchState.hs	2011-12-12 17:38:46 -04:00
Joey Hess	b2f934e07a	update comment	2011-12-12 17:24:12 -04:00
Joey Hess	79345ad5fc	optimisation avoids a redundant call to git show-ref	2011-12-12 03:30:47 -04:00
Joey Hess	f9cd3f6ad1	optimisation avoids a useless diff from git-annex..refs/heads/git-annex	2011-12-12 02:31:07 -04:00
Joey Hess	2332afb4bc	cleanup	2011-12-12 02:04:48 -04:00
Joey Hess	29b88ad657	avoid redundant call to updateIndex commitBranch calls updateIndex	2011-12-11 21:46:21 -04:00
Joey Hess	c4c965d602	detect and recover from branch push/commit race Dealing with a race without using locking is exceedingly difficult and tricky. Fully tested, I hope. There are three places left where the branch can be updated, that are not covered by the race recovery code. Let's prove they're all immune to the race: 1. tryFastForwardTo checks to see if a fast-forward can be done, and then does git-update-ref on the branch to fast-forward it. If a push comes in before the check, then either no fast-forward will be done (ok), or the push set the branch to a ref that can still be fast-forwarded (also ok) If a push comes in after the check, the git-update-ref will undo the ref change made by the push. It's as if the push did not come in, and the next git-push will see this, and try to re-do it. (acceptable) 2. When creating the branch for the very first time, an empty index is created, and a commit of it made to the branch. The commit's ref is recorded as the current state of the index. If a push came in during that, it will be noticed the next time a commit is made to the branch, since the branch will have changed. (ok) 3. Creating the branch from an existing remote branch involves making the branch, and then getting its ref, and recording that the index reflects that ref. If a push creates the branch first, git-branch will fail (ok). If the branch is created and a racing push is then able to change it (highly unlikely!) we're still ok, because it first records the ref into the index.lck, and then updating the index. The race can cause the index.lck to have the old branch ref, while the index has the newly pushed branch merged into it, but that only results in an unnecessary update of the index file later on.	2011-12-11 20:41:35 -04:00
Joey Hess	e04852c8af	Merge branch 'master' into new-monad-control Conflicts: git-annex.cabal	2011-12-11 16:55:36 -04:00
Joey Hess	cfbbda99f4	optimize index updating The last branch ref that the index was updated to is stored in .git/annex/index.lck, and the index only updated when the current branch ref differs. (The .lck file should later be used for locking too.) Some more optimization is still needed, since there is some redundancy in calls to git show-ref.	2011-12-11 16:14:59 -04:00
Joey Hess	8680c415de	slow, stupid, and safe index updating Always merge the git-annex branch into .git/annex/index before making a commit from the index. This ensures that, when the branch has been changed in any way (by a push being received, or changes pulled directly into it, or even by the user checking it out, and committing a change), the index reflects those changes. This is much too slow; it needs to be optimised to only update the index when the branch has really changed, not every time. Also, there is an unhandled race, when a change is made to the branch right after the index gets updated. I left it in for now because it's unlikely and I didn't want to complicate things with additional locking yet.	2011-12-11 15:05:53 -04:00
Joey Hess	0ba4b1de18	move a file location to Locations.hs	2011-12-11 14:14:28 -04:00
Joey Hess	eecaf42485	no need to show, it's a string	2011-12-10 12:30:31 -04:00
Joey Hess	d64132a43a	hslint	2011-12-09 01:57:13 -04:00
Joey Hess	f3a2f60abc	adjust to build with monad-control-0.3 I had to, I hope temporarily, lose my nice Annex newtype, and use a type synonym. This because I cannot find a way to derive a MonadBaseControl instance of the Annex newtype. I've emailed Bas van Dijk in hope he can help get the newtype back. Otherwise appears to build & work.	2011-12-05 22:51:37 -04:00
Joey Hess	598eb2e2da	cleanup	2011-11-30 12:01:15 -04:00
Joey Hess	da9cd315be	add support for using hashDirLower in addition to hashDirMixed Supporting multiple directory hash types will allow converting to a different one, without a flag day. gitAnnexLocation now checks which of the possible locations have a file. This means more statting of files. Several places currently use gitAnnexLocation and immediately check if the returned file exists; those need to be optimised.	2011-11-28 22:43:51 -04:00
Joey Hess	6869e6023e	support .git/annex on a different disk than the rest of the repo The only fully supported thing is to have the main repository on one disk, and .git/annex on another. Only commands that move data in/out of the annex will need to copy it across devices. There is only partial support for putting arbitrary subdirectories of .git/annex on different devices. For one thing, but this can require more copies to be done. For example, when .git/annex/tmp is on one device, and .git/annex/journal on another, every journal write involves a call to mv(1). Also, there are a few places that make hard links between various subdirectories of .git/annex with createLink, that are not handled. In the common case without cross-device, the new moveFile is actually faster than renameFile, avoiding an unncessary stat to check that a file (not a directory) is being moved. Of course if a cross-device move is needed, it is as slow as mv(1) of the data.	2011-11-28 16:17:55 -04:00
Joey Hess	128b4bd015	tweaks	2011-11-19 15:57:08 -04:00
Joey Hess	0fa1d136dc	tweak	2011-11-19 15:40:40 -04:00
Joey Hess	1ffd54ef78	ensure branch exists before trying to update it The branch may not exist, if .git/annex has been copied over from another repo (or a corrupted repo). I suppose it could also have gotten deleted somehow. Without this, there is a confusing failure.	2011-11-16 18:56:06 -04:00
Joey Hess	9290095fc2	improve type signatures with a Ref newtype In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.	2011-11-16 02:41:46 -04:00
Joey Hess	272a67921c	better name	2011-11-16 01:46:46 -04:00
Joey Hess	21a925dcf1	merge: Now runs in constant space. Before, a merge was first calculated, by running various actions that called git and built up a list of lines, which were at the end sent to git update-index. This necessarily used space proportional to the size of the diff between the trees being merged. Now, lines are streamed into git update-index from each of the actions in turn. Runtime size of git-annex merge when merging 50000 location log files drops from around 100 mb to a constant 4 mb. Presumably it runs quite a lot faster, too.	2011-11-15 23:28:01 -04:00
Joey Hess	04edae6791	Optimised union merging; now only runs git cat-file once.	2011-11-12 17:45:12 -04:00
Joey Hess	e9bfa8eaed	avoid unnecessary auto-merge when only changing a file in the branch. Avoids doing auto-merging in commands that don't need fully current information from the git-annex branch. In particular, git annex add no longer needs to auto-merge. Affected commands: Anything that doesn't look up data from the branch, but does write a change to it. It might seem counterintuitive that we can change a value without first making sure we have the current value. This optimisation works because these two sequences are equivilant: 1. pull from remote 2. union merge 3. read file from branch 4. modify file and write to branch vs. 1. read file from branch 2. modify file and write to branch 3. pull from remote 4. union merge After either sequence, the git-annex branch contains the same logical content for the modified file. (Possibly with lines in a different order or additional old lines of course).	2011-11-12 15:15:57 -04:00
Joey Hess	897bf938f6	merge: Improve commit messages to mention what was merged.	2011-11-12 14:51:19 -04:00
Joey Hess	637b5feb45	lint	2011-11-11 01:52:58 -04:00
Joey Hess	49d2177d51	factored out some useful error catching methods	2011-11-10 20:57:28 -04:00
Joey Hess	9570421251	better message when content is locked	2011-11-10 02:59:13 -04:00
Joey Hess	a218ce41cf	exclusive locks, ugh	2011-11-09 22:15:33 -04:00
Joey Hess	cf0174c922	content locking I've tested that this solves the cyclic drop problem. Have not looked at cyclic move, etc.	2011-11-09 21:54:42 -04:00
Joey Hess	d3e1a3619f	safer inannex checking git-annex-shell inannex now returns always 0, 1, or 100 (the last when it's unclear if content is currently in the index due to it currently being moved or dropped). (Actual locking code still not yet written.)	2011-11-09 18:33:15 -04:00
Joey Hess	8ce7e73f74	reorg to allow taking content lock The lock will only persist during the perform stage, so the content must be removed from the annex then, rather than in the cleanup stage. (No lock is actually taken yet.)	2011-11-09 16:54:18 -04:00
Joey Hess	56b8194470	cleanup	2011-11-09 01:33:20 -04:00
Joey Hess	bf460a0a98	reorder repo parameters last Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.	2011-11-08 16:27:20 -04:00
Joey Hess	b11a63a860	clean up read/show abuse Avoid ever using read to parse a non-haskell formatted input string. show :: Key is arguably still show abuse, but displaying Keys as filenames is just too useful to give up.	2011-11-08 00:17:54 -04:00
Joey Hess	63a292324d	add a UUID type Should have done this a long time ago.	2011-11-07 15:59:16 -04:00
Joey Hess	f229911715	optimization The last commit added some git-log calls to a merge. This removes some, by only merging branches that have unique refs.	2011-11-06 15:33:15 -04:00
Joey Hess	c99fb58909	merge: Use fast-forward merges when possible. Thanks Valentin Haenel for a test case showing how non-fast-forward merges could result in an ongoing pull/merge/push cycle. While the git-annex branch is fast-forwarded, git-annex's index file is still updated using the union merge strategy as before. There's no other way to update the index that would be any faster. It is possible that a union merge and a fast-forward result in different file contents: Files should have the same lines, but a union merge may change their order. If this happens, the next commit made to the git-annex branch will have some unnecessary changes to line orders, but the consistency of data should be preserved. Note that when the journal contains changes, a fast-forward is never attempted, which is fine, because committing those changes would be vanishingly unlikely to leave the git-annex branch at a commit that already exists in one of the remotes. The real difficulty is handling the case where multiple remotes have all changed. git-annex does find the best (ie, newest) one and fast forwards to it. If the remotes are diverged, no fast-forward is done at all. It would be possible to pick one, fast forward to it, and make a merge commit to the rest, I see no benefit to adding that complexity. Determining the best of N changed remotes requires N*2+1 calls to git-log, but these are fast git-log calls, and N is typically small. Also, typically some or all of the remote refs will be the same, and git-log is not called to compare those. In the real world I expect this will almost always add only 1 git-log call to the merge process. (Which already makes N anyway.)	2011-11-06 15:22:40 -04:00
Joey Hess	5f3dd3d246	ensure directory exists when locking journal Fixes git annex init in a bare repository that already has a git-annex branch.	2011-11-02 15:09:19 -04:00
Joey Hess	1826b3bd67	cleanup	2011-10-27 18:01:52 -04:00
Joey Hess	373cad993d	Sped up some operations on remotes that are on the same host. Specifically, disabled trying to update the git-annex branch on the remote, since that data is never used by operations that act on such remotes. Also, when copying content to such a remote, skip committing the presence information changes to its git-annex branch. Leaving it in the journal there is ok: Any command run on the remote that needs the info will flush the journal. This may partially solve this bug: http://git-annex.branchable.com/bugs/fails_to_handle_lot_of_files/ Although I still see unreaped git processes piling up when doing a copy --to.	2011-10-27 14:55:06 -04:00
Joey Hess	91366c896d	clean Annex stuff out of Utility/	2011-10-16 00:04:26 -04:00
Joey Hess	ee9af605bc	break out non-log stuff to separate module	2011-10-15 17:47:03 -04:00
Joey Hess	1a29b5b52e	reorganize log modules no code changes	2011-10-15 16:21:08 -04:00
Joey Hess	b505ba83e8	minor syntax changes	2011-10-11 14:43:45 -04:00
Joey Hess	025ded4a2d	tweaks	2011-10-10 17:37:44 -04:00
Joey Hess	f0153f9fd7	fix a race Another process may stage journalled files before the lock is taken, so need to get the list of journalled files afterwards. It's unfortunate this means getting the directory contents twice, but it seems better to do that than sometimes take the lock unnecessarily.	2011-10-09 16:19:09 -04:00
Joey Hess	dfee6e1ed6	better layout And a theoretical fix to branchstate cache invalidation, but not a bug that could actually happen.	2011-10-07 13:59:34 -04:00
Joey Hess	82e655efd0	performance fix It was checking if it needed to merge on every branch access, fix it to only check once.	2011-10-07 13:38:56 -04:00
Joey Hess	44fc358885	avoid merging multiple branches that point to the same tree avoids git warning "error: duplicate parent xxx ignored"	2011-10-07 13:37:01 -04:00
Joey Hess	3acdba3995	faster union merge of multiple branches into index only write index once	2011-10-07 13:36:48 -04:00
Joey Hess	6a6ea06cee	rename	2011-10-05 16:02:51 -04:00
Joey Hess	cfe21e85e7	rename	2011-10-04 00:59:08 -04:00
Joey Hess	ff21fd4a65	factor out Annex exception handling module	2011-10-04 00:34:04 -04:00

... 2 3 4 5 6

293 commits