git-annex

Author	SHA1	Message	Date
Joey Hess	e8188ea611	flip catchDefaultIO	2012-09-17 00:18:07 -04:00
Joey Hess	750c4ac6c2	bugfix: avoid staging but not committing changes to git-annex branch Branch.get is not able to see changes that have been staged to the index but not committed. This is a limitation of git cat-file --batch; when reading from the index, as opposed to from a branch, it does not notice changes made after the first time it reads the index. So, had to revert the changes made in `1f73db3469` to make annex.alwayscommit=false stage changes. Also, ensure that Branch.change and Branch.get always see changes at all points during a commit, by not deleting journal files when staging to the index. Delete them only after committing the branch. Before, there was a race during commits where a different git-annex could see out-of-date info from the branch while a commit was in progress. That's also done when updating the branch to merge in remote branches. In the case where the local git-annex branch has had changes pushed into it that are not yet reflected in the index, and there are journalled changes as well, a merge commit has to be done.	2012-09-15 20:15:16 -04:00
Joey Hess	a1f93f06fd	eliminate some commits to the git-annex branch Commits used to be made to the git-annex branch whenever there were journalled changes from a previous command, and the current command looked up the value of a file. This no longer happens. This means that transferkey, which is a oneshot command that stages changes, can be run multiple times by the assistant, without each of them committing the changes made by the command before. Which will be a lot faster and use less space by batching up the commits. Commits still happen if a remote git-annex branch has been changed and is merged in.	2012-09-15 18:36:42 -04:00
Joey Hess	87fb9c690e	remove withIndexUpdate helper	2012-09-15 15:48:21 -04:00
Joey Hess	c9b3b8829d	thread safe git-annex index file use	2012-08-24 20:50:39 -04:00
Joey Hess	5c3e14649e	avoid unnecessary transfer scans when syncing a disconnected remote Found a very cheap way to determine when a disconnected remote has diverged, and has new content that needs to be transferred: Piggyback on the git-annex branch update, which already checks for divergence. However, this does not check if new content has appeared locally while disconnected, that should be transferred to the remote. Also, this does not handle cases where the two git repos are in sync, but their content syncing has not caught up yet. This code could have its efficiency improved: * When multiple remotes are synced, if any one has diverged, they're all queued for transfer scans. * The transfer scanner could be told whether the remote has new content, the local repo has new content, or both, and could optimise its scan accordingly.	2012-08-22 15:05:57 -04:00
Joey Hess	942d8f7298	hlint	2012-06-12 11:32:06 -04:00
Joey Hess	d45a9a7831	refactor and function name cleanup (oops, I had a calcMerge and a calc_merge!)	2012-06-08 00:29:39 -04:00
Joey Hess	b819f644ad	close the git add race There's a race adding a new file to the annex: The file is moved to the annex and replaced with a symlink, and then we git add the symlink. If someone comes along in the meantime and replaces the symlink with something else, such as a new large file, we add that instead. Which could be bad.. This race is fixed by avoiding using git add, instead the symlink is directly staged into the index. It would be nice to make `git annex add` use this same technique. I have not done so yet because it currently runs git update-index once per file, which would slow does `git annex add`. A future enhancement would be to extend the Git.Queue to include the ability to run update-index with a list of Streamers.	2012-06-06 14:29:10 -04:00
Joey Hess	f1bd72ea54	factor out generic update-index code from unionmerge code	2012-06-06 00:10:34 -04:00
Joey Hess	76102c1c75	display "Recording state in git..." when staging the journal A bit tricky to avoid printing it twice in a row when there are queued git commands to run and journal to stage. Added a generic way to run an action that may output multiple side messages, with only the first displayed.	2012-04-27 13:54:33 -04:00
Joey Hess	bee420bd2d	in which I discover void void :: Functor f => f a -> f () -- ah, of course that's useful :)	2012-04-21 23:06:19 -04:00
Joey Hess	b98b69e8c6	honor core.sharedRepository when making all the other files in the annex Lock files, directories, etc.	2012-04-21 19:36:03 -04:00
Joey Hess	1f73db3469	improve alwayscommit=false mode Now changes are staged into the branch's index, but not committed, which avoids growing a large journal. And sync and merge always explicitly commit, ensuring that even when they do nothing else, they commit the staged changes. Added a flag file to indicate that the branch's journal contains uncommitted changes. (Could use git ls-files, but don't want to run that every time.) In the future, this ability to have uncommitted changes staged in the journal might be used on remotes after a series of oneshot commands.	2012-02-25 16:18:55 -04:00
Joey Hess	a1e52f0ce5	hlint	2012-02-16 00:44:51 -04:00
Joey Hess	03c559f8d6	tweak	2012-02-14 14:51:26 -04:00
Joey Hess	7ebd98d8d8	fix memory leak when staging the journal The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.	2012-02-14 14:37:59 -04:00
Joey Hess	a40ec5e03e	Fixed a memory leak due to excessive strictness when committing journal files. When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.	2012-02-14 11:20:34 -04:00
Joey Hess	5e2b4e16ba	avoid multiple unnecessary stats of the index file Up to one per file processed.	2012-01-14 12:07:36 -04:00
Joey Hess	abdacf58ed	tweaks	2012-01-11 00:06:54 -04:00
Joey Hess	a3a9f87047	log: New command that displays the location log for file, showing each repository they were added to and removed from. This needs to run git log on the location log files to get at all past versions of the file, which tends to be a bit slow. It would be possible to make a version optimised for showing the location logs for every key. That would only need to run git log once, so would be faster, but it would need to process an enormous amount of data, so would not speed up the individual file case. In the future it would be nice to support log --format. log --json also doesn't work right yet.	2012-01-06 15:40:07 -04:00
Joey Hess	52104dae6f	refactor	2011-12-30 18:36:40 -04:00
Joey Hess	925b6390aa	add forceUpdate This code is picked from my tweak-fetch branch, which already did the needed refactoring.	2011-12-30 15:57:28 -04:00
Joey Hess	ef28b3fef7	split out Git/Command.hs	2011-12-14 15:56:11 -04:00
Joey Hess	02f1bd2bf4	split more stuff out of Git.hs	2011-12-14 15:43:13 -04:00
Joey Hess	25b2cc4148	move commit to Git.Branch	2011-12-13 15:08:44 -04:00
Joey Hess	46588674b0	avoid closing pipe before all the shas are read from it Could have just used hGetContentsStrict here, but that would require storing all the shas in memory. Since this is called at the end of a git-annex run, it may have created a lot of shas, so I avoid that memory use and stream them out like before.	2011-12-12 21:41:37 -04:00
Joey Hess	0e45b762a0	broke out Git/HashObject.hs	2011-12-12 21:24:55 -04:00
Joey Hess	31a0c07ee9	broke out Git/Branch.hs and reorganized	2011-12-12 21:12:51 -04:00
Joey Hess	543d0d2501	split out Git/Ref.hs	2011-12-12 18:30:33 -04:00
Joey Hess	da95cbadca	split out Annex/Journal.hs	2011-12-12 18:03:28 -04:00
Joey Hess	98dfc0c9b0	split out Annex/BranchState.hs	2011-12-12 17:38:46 -04:00
Joey Hess	b2f934e07a	update comment	2011-12-12 17:24:12 -04:00
Joey Hess	79345ad5fc	optimisation avoids a redundant call to git show-ref	2011-12-12 03:30:47 -04:00
Joey Hess	f9cd3f6ad1	optimisation avoids a useless diff from git-annex..refs/heads/git-annex	2011-12-12 02:31:07 -04:00
Joey Hess	2332afb4bc	cleanup	2011-12-12 02:04:48 -04:00
Joey Hess	29b88ad657	avoid redundant call to updateIndex commitBranch calls updateIndex	2011-12-11 21:46:21 -04:00
Joey Hess	c4c965d602	detect and recover from branch push/commit race Dealing with a race without using locking is exceedingly difficult and tricky. Fully tested, I hope. There are three places left where the branch can be updated, that are not covered by the race recovery code. Let's prove they're all immune to the race: 1. tryFastForwardTo checks to see if a fast-forward can be done, and then does git-update-ref on the branch to fast-forward it. If a push comes in before the check, then either no fast-forward will be done (ok), or the push set the branch to a ref that can still be fast-forwarded (also ok) If a push comes in after the check, the git-update-ref will undo the ref change made by the push. It's as if the push did not come in, and the next git-push will see this, and try to re-do it. (acceptable) 2. When creating the branch for the very first time, an empty index is created, and a commit of it made to the branch. The commit's ref is recorded as the current state of the index. If a push came in during that, it will be noticed the next time a commit is made to the branch, since the branch will have changed. (ok) 3. Creating the branch from an existing remote branch involves making the branch, and then getting its ref, and recording that the index reflects that ref. If a push creates the branch first, git-branch will fail (ok). If the branch is created and a racing push is then able to change it (highly unlikely!) we're still ok, because it first records the ref into the index.lck, and then updating the index. The race can cause the index.lck to have the old branch ref, while the index has the newly pushed branch merged into it, but that only results in an unnecessary update of the index file later on.	2011-12-11 20:41:35 -04:00
Joey Hess	cfbbda99f4	optimize index updating The last branch ref that the index was updated to is stored in .git/annex/index.lck, and the index only updated when the current branch ref differs. (The .lck file should later be used for locking too.) Some more optimization is still needed, since there is some redundancy in calls to git show-ref.	2011-12-11 16:14:59 -04:00
Joey Hess	8680c415de	slow, stupid, and safe index updating Always merge the git-annex branch into .git/annex/index before making a commit from the index. This ensures that, when the branch has been changed in any way (by a push being received, or changes pulled directly into it, or even by the user checking it out, and committing a change), the index reflects those changes. This is much too slow; it needs to be optimised to only update the index when the branch has really changed, not every time. Also, there is an unhandled race, when a change is made to the branch right after the index gets updated. I left it in for now because it's unlikely and I didn't want to complicate things with additional locking yet.	2011-12-11 15:05:53 -04:00
Joey Hess	0ba4b1de18	move a file location to Locations.hs	2011-12-11 14:14:28 -04:00
Joey Hess	eecaf42485	no need to show, it's a string	2011-12-10 12:30:31 -04:00
Joey Hess	d64132a43a	hslint	2011-12-09 01:57:13 -04:00
Joey Hess	598eb2e2da	cleanup	2011-11-30 12:01:15 -04:00
Joey Hess	6869e6023e	support .git/annex on a different disk than the rest of the repo The only fully supported thing is to have the main repository on one disk, and .git/annex on another. Only commands that move data in/out of the annex will need to copy it across devices. There is only partial support for putting arbitrary subdirectories of .git/annex on different devices. For one thing, but this can require more copies to be done. For example, when .git/annex/tmp is on one device, and .git/annex/journal on another, every journal write involves a call to mv(1). Also, there are a few places that make hard links between various subdirectories of .git/annex with createLink, that are not handled. In the common case without cross-device, the new moveFile is actually faster than renameFile, avoiding an unncessary stat to check that a file (not a directory) is being moved. Of course if a cross-device move is needed, it is as slow as mv(1) of the data.	2011-11-28 16:17:55 -04:00
Joey Hess	1ffd54ef78	ensure branch exists before trying to update it The branch may not exist, if .git/annex has been copied over from another repo (or a corrupted repo). I suppose it could also have gotten deleted somehow. Without this, there is a confusing failure.	2011-11-16 18:56:06 -04:00
Joey Hess	9290095fc2	improve type signatures with a Ref newtype In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.	2011-11-16 02:41:46 -04:00
Joey Hess	272a67921c	better name	2011-11-16 01:46:46 -04:00
Joey Hess	21a925dcf1	merge: Now runs in constant space. Before, a merge was first calculated, by running various actions that called git and built up a list of lines, which were at the end sent to git update-index. This necessarily used space proportional to the size of the diff between the trees being merged. Now, lines are streamed into git update-index from each of the actions in turn. Runtime size of git-annex merge when merging 50000 location log files drops from around 100 mb to a constant 4 mb. Presumably it runs quite a lot faster, too.	2011-11-15 23:28:01 -04:00
Joey Hess	04edae6791	Optimised union merging; now only runs git cat-file once.	2011-11-12 17:45:12 -04:00

1 2

68 commits