git-annex

Author	SHA1	Message	Date
Joey Hess	c9b3b8829d	thread safe git-annex index file use	2012-08-24 20:50:39 -04:00
Joey Hess	5c3e14649e	avoid unnecessary transfer scans when syncing a disconnected remote Found a very cheap way to determine when a disconnected remote has diverged, and has new content that needs to be transferred: Piggyback on the git-annex branch update, which already checks for divergence. However, this does not check if new content has appeared locally while disconnected, that should be transferred to the remote. Also, this does not handle cases where the two git repos are in sync, but their content syncing has not caught up yet. This code could have its efficiency improved: * When multiple remotes are synced, if any one has diverged, they're all queued for transfer scans. * The transfer scanner could be told whether the remote has new content, the local repo has new content, or both, and could optimise its scan accordingly.	2012-08-22 15:05:57 -04:00
Joey Hess	9fc94d780b	better readProcess	2012-07-19 00:57:40 -04:00
Joey Hess	1db7d27a45	add back debug logging Make Utility.Process wrap the parts of System.Process that I use, and add debug logging to them. Also wrote some higher-level code that allows running an action with handles to a processes stdin or stdout (or both), and checking its exit status, all in a single function call. As a bonus, the debug logging now indicates whether the process is being run to read from it, feed it data, chat with it (writing and reading), or just call it for its side effect.	2012-07-19 00:46:52 -04:00
Joey Hess	d1da9cf221	switch from System.Cmd.Utils to System.Process Test suite now passes with -threaded! I traced back all the hangs with -threaded to System.Cmd.Utils. It seems it's just crappy/unsafe/outdated, and should not be used. System.Process seems to be the cool new thing, so converted all the code to use it instead. In the process, --debug stopped printing commands it runs. I may try to bring that back later. Note that even SafeSystem was switched to use System.Process. Since that was a modified version of code from System.Cmd.Utils, it needed to be converted too. I also got rid of nearly all calls to forkProcess, and all calls to executeFile, which I'm also doubtful about working well with -threaded.	2012-07-18 18:00:24 -04:00
Joey Hess	05310538ef	more debugging	2012-07-18 13:31:00 -04:00
Joey Hess	75b6ee81f9	avoid ByteString.Char8 where not needed Its truncation behavior is a red flag, so avoid using it in these places where only raw ByteStrings are used, without looking at the data inside.	2012-06-20 13:13:40 -04:00
Joey Hess	e0095b0bdc	fishy commit	2012-06-14 00:01:48 -04:00
Joey Hess	942d8f7298	hlint	2012-06-12 11:32:06 -04:00
Joey Hess	ca9ee21bd7	crazy optimisation Crazy like a fox..	2012-06-10 19:58:34 -04:00
Joey Hess	c5707c84d3	queue size fix Increase queue size for update-index actions, because otherwise they'll never be flushed.	2012-06-10 13:56:04 -04:00
Joey Hess	d45a9a7831	refactor and function name cleanup (oops, I had a calcMerge and a calc_merge!)	2012-06-08 00:29:39 -04:00
Joey Hess	20f425be19	make watch use the queue May not work. Certianly needs to flush the queue from time to time when only symlink changes are being made.	2012-06-07 15:40:44 -04:00
Joey Hess	0a11b35d89	extend Git.Queue to be able to queue more than simple git commands While I was in there, I noticed and fixed a bug in the queue size calculations. It was never encountered only because Queue.add was only ever run with 1 file in the list.	2012-06-07 15:19:44 -04:00
Joey Hess	b819f644ad	close the git add race There's a race adding a new file to the annex: The file is moved to the annex and replaced with a symlink, and then we git add the symlink. If someone comes along in the meantime and replaces the symlink with something else, such as a new large file, we add that instead. Which could be bad.. This race is fixed by avoiding using git add, instead the symlink is directly staged into the index. It would be nice to make `git annex add` use this same technique. I have not done so yet because it currently runs git update-index once per file, which would slow does `git annex add`. A future enhancement would be to extend the Git.Queue to include the ability to run update-index with a list of Streamers.	2012-06-06 14:29:10 -04:00
Joey Hess	993e6459a3	factor out nukeFile	2012-06-06 13:13:13 -04:00
Joey Hess	27cfeca4ea	Merge branch 'master' into watch	2012-06-06 02:16:21 -04:00
Joey Hess	f1bd72ea54	factor out generic update-index code from unionmerge code	2012-06-06 00:10:34 -04:00
Joey Hess	7a6fb8ae4e	flush the git queue when a new type of action is being added to it This allows the queue to be used in a single process for multiple possibly conflicting commands, like add and rm, without running them out of order. This assumes that running the same git subcommand with different parameters cannot itself conflict.	2012-06-04 20:41:22 -04:00
Joey Hess	bb4f31a0ee	Clean up handling of git directory and git worktree. Baked into the code was an assumption that a repository's git directory could be determined by adding ".git" to its work tree (or nothing for bare repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are used to separate the two. This was attacked at the type level, by storing the gitdir and worktree separately, so Nothing for the worktree means a bare repo. A complication arose because we don't learn where a repository is bare until its configuration is read. So another Location type handles repositories that have not had their config read yet. I am not entirely happy with this being a Location type, rather than representing them entirely separate from the Git type. The new code is not worse than the old, but better types could enforce more safety. Added support for core.worktree. Overriding it with -c isn't supported because it's not really clear what to do if a git repo's config is read, is not bare, and is then overridden to bare. What is the right git directory in this case? I will worry about this if/when someone has a use case for overriding core.worktree with -c. (See Git.Config.updateLocation) Also removed and renamed some functions like gitDir and workTree that misused git's terminology. One minor regression is known: git annex add in a bare repository does not print a nice error message, but runs git ls-files in a way that fails earlier with a less nice error message. This is because before --work-tree was always passed to git commands, even in a bare repo, while now it's not.	2012-05-18 17:03:12 -04:00
Joey Hess	f7d8982672	Fix use of several config settings annex.ssh-options, annex.rsync-options, annex.bup-split-options. And adjust types to avoid the bugs that broke several config settings recently. Now "annex." prefixing is enforced at the type level.	2012-05-05 20:16:56 -04:00
Joey Hess	76102c1c75	display "Recording state in git..." when staging the journal A bit tricky to avoid printing it twice in a row when there are queued git commands to run and journal to stage. Added a generic way to run an action that may output multiple side messages, with only the first displayed.	2012-04-27 13:54:33 -04:00
Joey Hess	e0b7012ccc	uninit: Clear annex.uuid from .git/config. Closes: #670639	2012-04-27 12:21:38 -04:00
Joey Hess	84ac8c58db	Add annex.httpheaders and annex.httpheader-command config settings Allow custom headers to be sent with all HTTP requests. (Requested by the Internet Archive)	2012-04-22 01:13:09 -04:00
Joey Hess	ed79596b75	noop	2012-04-21 23:32:33 -04:00
Joey Hess	bee420bd2d	in which I discover void void :: Functor f => f a -> f () -- ah, of course that's useful :)	2012-04-21 23:06:19 -04:00
Joey Hess	cab63b89f2	cache parsed core.sharedrepository	2012-04-21 19:42:49 -04:00
Joey Hess	b98b69e8c6	honor core.sharedRepository when making all the other files in the annex Lock files, directories, etc.	2012-04-21 19:36:03 -04:00
Joey Hess	7e45712d19	better file mode setting code	2012-04-21 16:01:56 -04:00
Joey Hess	b4a5e39ee6	Support git's core.sharedRepository configuration This is incomplete, it does not honor it yet for hash directories and other annex bookkeeping files. Some of that is not needed for a bare repo; some of it may be.	2012-04-21 15:36:52 -04:00
Joey Hess	b65e257b13	inverted logic	2012-04-20 16:16:13 -04:00
Joey Hess	262017e17d	export a more generalized checkDiskSpace	2012-04-20 16:06:10 -04:00
Joey Hess	e38a839a80	Rewrote free disk space checking code Moving the portability handling into a small C library cleans up things a lot, avoiding the pain of unpacking structs from inside haskell code.	2012-03-22 17:32:47 -04:00
Joey Hess	f1398b5583	use new getConfig	2012-03-22 17:32:47 -04:00
Joey Hess	4eb5112681	rationalize getConfig getConfig got a remote-specific config, and this confusing name caused it to be used a couple of places that only were interested in global configs. Rename to getRemoteConfig and make getConfig only get global configs. There are no behavior changes here, but remote.<name>.annex-web-options never actually worked (and per-remote web options is a very unlikely to be useful case so I didn't make it work), so fix the documentation for it.	2012-03-22 17:32:47 -04:00
Joey Hess	188e2edc41	status: Prints available local disk space, or shows if git-annex doesn't know.	2012-03-21 21:55:02 -04:00
Joey Hess	181d2ccd20	Improve detection of inability to check free disk space. Don't check if configure indicated checks won't work. This should fix a FTBFS on mipsel, where configure correctly detects the checks won't work, while garbage is returned for disk space info at git-annex runtime. It also means that, when built via cabal, disk space checks are not enabled, unfortunatly.	2012-03-21 21:21:20 -04:00
Joey Hess	60ab3d84e1	added ifM and nuked 11 lines of code no behavior changes	2012-03-14 17:43:34 -04:00
Joey Hess	b325694645	getKeysPresent is now fully lazy .. Allowing it to be used by things in constant space! Random statistics: git annex status has gone from taking 239 mb of memory and 26 seconds in a repo, to 8 mb and 13 seconds. The trick here is the unsafeInterleaveIO, and the form of the function's recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat. The difference is, this one goes to a limited depth and avoids statting everything.	2012-03-11 18:04:58 -04:00
Joey Hess	ff3644ad38	status: Fixed to run in nearly constant space. Before, it leaked space due to caching lists of keys. Now all necessary data about keys is calculated as they stream in. The "nearly constant" is due to getKeysPresent, which builds up a lot of [] thunks as it traverses .git/annex/objects/. Will deal with it later.	2012-03-11 17:15:58 -04:00
Joey Hess	d08ee1a9d2	syscall optimisation	2012-03-06 13:56:20 -04:00
Joey Hess	12b89a3eb8	configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly.	2012-02-25 19:15:29 -04:00
Joey Hess	1f73db3469	improve alwayscommit=false mode Now changes are staged into the branch's index, but not committed, which avoids growing a large journal. And sync and merge always explicitly commit, ensuring that even when they do nothing else, they commit the staged changes. Added a flag file to indicate that the branch's journal contains uncommitted changes. (Could use git ls-files, but don't want to run that every time.) In the future, this ability to have uncommitted changes staged in the journal might be used on remotes after a series of oneshot commands.	2012-02-25 16:18:55 -04:00
Joey Hess	b49c0c2633	add annex.alwayscommit option To avoid commits of data to the git-annex branch after each command is run, set annex.alwayscommit=false. Its data will then be committed less frequently, when a merge or sync is done.	2012-02-25 15:31:42 -04:00
Joey Hess	bd66f962d3	Deal with NFS problem that caused a failure to remove a directory when removing content from the annex. I was able to reproduce this on linux using the kernel's nfs server and mounting localhost:/. Determined that removing the directory fails when the just-deleted file in it was locked. Considered dropping the lock before removing the directory, but this would complicate parts of the code that should not need to worry about locking. So instead, ignore the failure to remove the directory in this case. While I was at it, made it attempt to remove both levels of hash directories, in case they're empty.	2012-02-24 16:30:47 -04:00
Joey Hess	a1e52f0ce5	hlint	2012-02-16 00:44:51 -04:00
Joey Hess	52c5b164d8	Added a annex.queuesize setting useful when adding hundreds of thousands of files on a system with plenty of memory. git add gets quite slow in such a large repository, so if the system has more than the ~32 mb of memory the queue can use by default, it's a useful optimisation to increase the queue size, in order to decrease the number of times git add is run.	2012-02-15 11:14:19 -04:00
Joey Hess	03c559f8d6	tweak	2012-02-14 14:51:26 -04:00
Joey Hess	7ebd98d8d8	fix memory leak when staging the journal The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.	2012-02-14 14:37:59 -04:00
Joey Hess	a40ec5e03e	Fixed a memory leak due to excessive strictness when committing journal files. When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.	2012-02-14 11:20:34 -04:00

1 2 3

136 commits