git-annex

Author	SHA1	Message	Date
Joey Hess	6e213d04f1	sync: Show a nicer message if a user tries to sync to a special remote.	2012-05-27 20:55:56 -04:00
Joey Hess	ab07762ddb	releasing version 3.20120522	2012-05-22 11:27:22 -04:00
Joey Hess	eb6cb1b87f	Add support for core.worktree, and fix support for GIT_WORK_TREE and GIT_DIR. The environment needs to override git-config. Changed when git config is read, and avoid rereading it once it's been read. chdir for both worktree settings.	2012-05-18 18:20:53 -04:00
Joey Hess	bb4f31a0ee	Clean up handling of git directory and git worktree. Baked into the code was an assumption that a repository's git directory could be determined by adding ".git" to its work tree (or nothing for bare repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are used to separate the two. This was attacked at the type level, by storing the gitdir and worktree separately, so Nothing for the worktree means a bare repo. A complication arose because we don't learn where a repository is bare until its configuration is read. So another Location type handles repositories that have not had their config read yet. I am not entirely happy with this being a Location type, rather than representing them entirely separate from the Git type. The new code is not worse than the old, but better types could enforce more safety. Added support for core.worktree. Overriding it with -c isn't supported because it's not really clear what to do if a git repo's config is read, is not bare, and is then overridden to bare. What is the right git directory in this case? I will worry about this if/when someone has a use case for overriding core.worktree with -c. (See Git.Config.updateLocation) Also removed and renamed some functions like gitDir and workTree that misused git's terminology. One minor regression is known: git annex add in a bare repository does not print a nice error message, but runs git ls-files in a way that fails earlier with a less nice error message. This is because before --work-tree was always passed to git commands, even in a bare repo, while now it's not.	2012-05-18 17:03:12 -04:00
Joey Hess	e36808e167	Pass -a to cp even when it supports --reflink=auto, to preserve permissions. Amoung other things, this makes unlocking a WORM backed file and then re-adding it without making any changes not add a new object, as the timestamp is preserved.	2012-05-15 14:18:51 -04:00
Joey Hess	61a5df33d4	releasing version 3.20120511	2012-05-11 12:37:26 -04:00
Joey Hess	32a41f8af1	add a favicon	2012-05-10 14:18:35 -04:00
Joey Hess	bbfa74e7ac	format	2012-05-07 13:19:00 -04:00
Joey Hess	f7d8982672	Fix use of several config settings annex.ssh-options, annex.rsync-options, annex.bup-split-options. And adjust types to avoid the bugs that broke several config settings recently. Now "annex." prefixing is enforced at the type level.	2012-05-05 20:16:56 -04:00
Joey Hess	392931eca9	addunused: New command, the opposite of dropunused, it relinks unused content into the git repository.	2012-05-02 14:59:05 -04:00
Joey Hess	8f45300479	dropunused: Allow specifying ranges to drop. Sort of by popular demand, but the last straw for not using seq was that it can run into command line length limits.	2012-05-02 13:15:19 -04:00
Joey Hess	6d61067599	rsync shellescape disable option Rsync special remotes can be configured with shellescape=no to avoid shell quoting that is normally done when using rsync over ssh. This is known to be needed for certian rsync hosting providers (specificially hidrive.strato.com) that use rsync over ssh but do not pass it through the shell.	2012-05-02 13:08:33 -04:00
Joey Hess	76b80d6af0	releasing version 3.20120430	2012-04-30 13:59:28 -04:00
Joey Hess	1c16f616df	Added shared cipher mode to encryptable special remotes. This option avoids gpg key distribution, at the expense of flexability, and with the requirement that all clones of the git repository be equally trusted.	2012-04-29 14:02:43 -04:00
Joey Hess	e0b7012ccc	uninit: Clear annex.uuid from .git/config. Closes: #670639	2012-04-27 12:21:38 -04:00
Joey Hess	1db09af14c	fix names	2012-04-22 11:42:38 -04:00
Joey Hess	84ac8c58db	Add annex.httpheaders and annex.httpheader-command config settings Allow custom headers to be sent with all HTTP requests. (Requested by the Internet Archive)	2012-04-22 01:13:09 -04:00
Joey Hess	b4a5e39ee6	Support git's core.sharedRepository configuration This is incomplete, it does not honor it yet for hash directories and other annex bookkeeping files. Some of that is not needed for a bare repo; some of it may be.	2012-04-21 15:36:52 -04:00
Joey Hess	5cc76098ca	Directory special remotes now check annex.diskreserve.	2012-04-20 16:24:44 -04:00
Joey Hess	e807502666	had the wrong name for this	2012-04-20 16:14:29 -04:00
Joey Hess	840315c350	releasing version 3.20120418	2012-04-18 12:22:22 -04:00
Joey Hess	626697b459	cabal file now autodetects whether S3 support is available.	2012-04-14 14:22:33 -04:00
Joey Hess	1ca41044e8	cabal now installs git-annex-shell as a symlink to git-annex.	2012-04-14 14:01:14 -04:00
Joey Hess	3642c72320	Renamed diskfree.c to avoid OSX case insensativity bug.	2012-04-13 11:26:39 -04:00
Joey Hess	52a158a7c6	autocorrection git-annex (but not git-annex-shell) supports the git help.autocorrect configuration setting, doing fuzzy matching using the restricted Damerau-Levenshtein edit distance, just as git does. This adds a build dependency on the haskell edit-distance library.	2012-04-12 15:37:21 -04:00
Joey Hess	c924542e61	bup: Properly handle key names with spaces or other things that are not legal git refs. Continue using the key name as bup ref name, to preserve backwards compatability, unless it is an illegal git ref. In that case, use a sha256 of the key name instead.	2012-04-11 12:45:49 -04:00
Joey Hess	182778d664	bugfix: Adding a dotfile also caused all non-dotfiles to be added. When only a dotfile was specified, the list of non-dotfiles was empty, triggering the fallback behavior of finding all files.	2012-04-08 12:25:54 -04:00
Joey Hess	29acf62ba3	releasing version 3.20120406	2012-04-07 15:58:13 -04:00
Joey Hess	62c69e7e25	Disable diskfree on kfreebsd, as I have a build failure on kfreebsd-i386 that is quite likely caused by it.	2012-04-07 15:50:34 -04:00
Joey Hess	16acc507f3	releasing version 3.20120405	2012-04-05 16:37:44 -04:00
Joey Hess	a398db7885	update	2012-03-24 11:58:22 -04:00
Joey Hess	e38a839a80	Rewrote free disk space checking code Moving the portability handling into a small C library cleans up things a lot, avoiding the pain of unpacking structs from inside haskell code.	2012-03-22 17:32:47 -04:00
Joey Hess	188e2edc41	status: Prints available local disk space, or shows if git-annex doesn't know.	2012-03-21 21:55:02 -04:00
Joey Hess	181d2ccd20	Improve detection of inability to check free disk space. Don't check if configure indicated checks won't work. This should fix a FTBFS on mipsel, where configure correctly detects the checks won't work, while garbage is returned for disk space info at git-annex runtime. It also means that, when built via cabal, disk space checks are not enabled, unfortunatly.	2012-03-21 21:21:20 -04:00
Joey Hess	d1e136193b	releasing version 3.20120315	2012-03-15 12:23:34 -04:00
Joey Hess	d2769cf795	shave some 12 mb from the installed size * git-annex now behaves as git-annex-shell if symlinked to and run by that name. The Makefile sets this up, saving some 8 mb of installed size. * git-union-merge is a demo program, so it is no longer built by default.	2012-03-15 12:00:19 -04:00
Joey Hess	a4f72c9625	update	2012-03-14 12:44:17 -04:00
Joey Hess	342fc28437	Merge branch 'master' into bloom Conflicts: Command/Commit.hs debian/changelog	2012-03-14 12:41:48 -04:00
Joey Hess	5b869eef91	git-annex-shell: Runs hooks/annex-content after content is received or dropped.	2012-03-14 12:18:10 -04:00
Joey Hess	caf97fcffd	git-annex-shell: Runs hooks/annex-content after content is received or dropped.	2012-03-14 12:01:56 -04:00
Joey Hess	b27760aa68	Work around a bug in rsync (IMHO) introduced by openSUSE's SIP patch. openSUSE patches rsync with a patch adding SIP protocol support. https://gist.github.com/2026167 With this patch, running rsync with no hostname parameter is apparently supposed to list SIP hosts on the network. Practically, it does nothing and exits 0. git-annex uses rsync in a very special way to allow git-annex-shell to be run on the remote host, and so did not need to specify a hostname, or a file to transfer as a rsync parameter. So it sent ":", a degenerate case of "host:file". But the patch cannot differentiate ":" with no host parameter (a bug in the SIP patch surely). Results were that getting files failed, as rsync seemed to succeed, but the requested file failed to arrive. Also I think that sending files will make git-annex think a file has been transferred to the remote when really rsync does nothing. The workaround for this buggy rsync patch is to use "dummy:" as the hostname.	2012-03-12 22:53:43 -04:00
Joey Hess	94aff8b878	Merge branch 'master' into bloom Conflicts: debian/changelog	2012-03-12 16:32:29 -04:00
Joey Hess	25809ce2e0	finish bloom filters Add tuning, docs, etc. Not sure if status is the right place to remote size.. perhaps unused should report the size and also warn if it sees more keys than the bloom filter allows?	2012-03-12 16:18:35 -04:00
Joey Hess	89ee70c43a	status: More accurate display of sizes of tmp and bad keys. Can't trust the key size to be accurate for tmp and bad keys, so check actual file size. In the wild I saw the old code be wrong by a factor of about 100! If all tmp/bad keys are empty, they're not shown in status at all. Showing 0 bytes and suggesting to clean it up seemed weird..	2012-03-12 00:41:48 -04:00
Joey Hess	b325694645	getKeysPresent is now fully lazy .. Allowing it to be used by things in constant space! Random statistics: git annex status has gone from taking 239 mb of memory and 26 seconds in a repo, to 8 mb and 13 seconds. The trick here is the unsafeInterleaveIO, and the form of the function's recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat. The difference is, this one goes to a limited depth and avoids statting everything.	2012-03-11 18:04:58 -04:00
Joey Hess	ff3644ad38	status: Fixed to run in nearly constant space. Before, it leaked space due to caching lists of keys. Now all necessary data about keys is calculated as they stream in. The "nearly constant" is due to getKeysPresent, which builds up a lot of [] thunks as it traverses .git/annex/objects/. Will deal with it later.	2012-03-11 17:15:58 -04:00
Joey Hess	b086e32c63	unused: Reduce memory usage significantly. Much of the memory bloat turned out to be due to getKeysReferenced containing a mapM, which is strict and buffered the whole list rather than streaming it. The other half of the bloat was due to building a temporary Set in order to call S.difference. While that is more cpu efficient, I switched to successive S.delete, since with it, I can run a whole git annex unused in less than 8 mb of memory. The whole Set of keys with content available is still stored in memory, so running unused in a repo with a whole lot of file content will still use more memory. In a repo containing 6000 files, it needed 40 mb. Note that the status command still uses the bloatful getKeysReferenced.	2012-03-11 16:24:07 -04:00
Joey Hess	997e29f294	sync: Sync to lower cost remotes first. This has two benefits. 1. When a lot of refs are going to be received, get them via lower cost connection when possible. 2. Allows ctrl-c of sync after the cheaper remotes have been pulled from (or pushed to).	2012-03-10 15:37:38 -04:00
Joey Hess	5ab82230f7	fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release.	2012-03-10 14:46:21 -04:00
Joey Hess	433b5fe59e	releasing version 3.20120309	2012-03-09 20:14:34 -04:00
Joey Hess	bca3fd65b9	fix key directory hash calculation code Fix Key directory hash calculation code to behave as it did before version 3.20120227 when a key contains non-ascii. The hash directories for a given Key are based on its md5sum. Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was taken of each byte in turn. With the ghc 7.4 filename encoding change, keys contains decoded unicode characters (possibly with surrigates for undecodable bytes). This changes the result of the md5sum, since the md5sum used is pure haskell and supports unicode. And that won't do, as git-annex will start looking in a different hash directory for the content of a key. The surrigates are particularly bad, since that's essentially a ghc implementation detail, so could change again at any time. Also, changing the locale changes how the bytes are decoded, which can also change the md5sum. Symptoms would include things like: * git annex fsck would complain that no copies existed of a file, despite its symlink pointing to the content that was locally present * git annex fix would change the symlink to use the wrong hash directory. Only WORM backend is likely to have been affected, since only it tends to include much filename data (SHA1E could in theory also be affected). I have not tried to support the hash directories used by git-annex versions 3.20120227 to 3.20120308, so things added with those versions with WORM will require manual fixups. Sorry for the inconvenience!	2012-03-09 20:03:51 -04:00
Joey Hess	0d41899304	releasing version 3.20120230	2012-03-05 13:47:20 -04:00
Joey Hess	51338486dc	Fix a bug in symlink calculation code, that triggered in rare cases where an annexed file is in a subdirectory that nearly matched to the .git/annex/object/xx/yy subdirectories. This is a straight up pure-code stinker. The relative path calculation looked for common subdirectories in the two paths, but failed to stop after the paths diverged. When a later pair of subdirectories were the same, the resulting relative path was wrong. Added regression test for this.	2012-03-05 12:42:52 -04:00
Joey Hess	52e88f3ebf	add remote start and stop hooks Locking is used, so that, if there are multiple git-annex processes using a remote concurrently, the stop hook is only run by the last process that uses it.	2012-03-04 19:12:58 -04:00
Joey Hess	9856c24a59	Add progress bar display to the directory special remote. So far I've only written progress bars for sending files, not yet receiving. No longer uses external cp at all. ByteString IO is fast enough.	2012-03-04 03:17:25 -04:00
Joey Hess	3436aba6de	Directory special remotes now support chunking files written to them Avoiding writing files larger than a specified size is useful on certian things. For example, box.com has a file size limit of 100 mb. Could also be useful on really crappy removable media.	2012-03-03 18:05:55 -04:00
Joey Hess	1098bc37ab	"here" can be used to refer to the current repository, which can read better than the old "." (which still works too).	2012-03-01 22:35:10 -04:00
Joey Hess	6571831b92	releasing version 3.20120229	2012-02-29 02:39:44 -04:00
Joey Hess	e5fee3f352	Fix test suite to not require a unicode locale. Without a unicode locale, it will fail to print a unicode filename to console, and fails.	2012-02-29 02:32:05 -04:00
Joey Hess	8cae4115a8	releasing version 3.20120227	2012-02-27 13:07:04 -04:00
Joey Hess	2fd294d06f	move --from, copy --from: 10 times faster scanning remote on local disk Rather than go through the location log to see which files are present on the remote, it simply looks at the disk contents directly. I benchmarked this speeding up scanning 834 files, from an annex on my phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.) Also benchmarked 8139 files, from an annex on spinning storage, speeding up from 103.17 to 13.39 seconds. Note that benchmarking with an encrypted annex on flash actually showed a minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems the overhead of doing the crypto needed to get the filenames to directly check can be higher than the overhead of looking up data in the location log. (Which says good things about how well the location log and git have been optimised!) It may make sense to make encrypted local remotes not have hasKeyCheap set; further benchmarking is called for.	2012-02-26 14:59:48 -04:00
Joey Hess	b889581945	version dependency on openssh-client This is only to ensure that it's as new a version as it was built with, so partial upgrades work.	2012-02-25 19:31:46 -04:00
Joey Hess	12b89a3eb8	configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly.	2012-02-25 19:15:29 -04:00
Joey Hess	c3fbe07d7a	do a cleanup commit after moving data from or to a git remote Added Annex.cleanup, which is a general purpose interface for adding actions to run at the end. Remotes with the old git-annex-shell will commit every time, and have no commit command, so hide stderr when running the commit command.	2012-02-25 18:02:49 -04:00
Joey Hess	1f73db3469	improve alwayscommit=false mode Now changes are staged into the branch's index, but not committed, which avoids growing a large journal. And sync and merge always explicitly commit, ensuring that even when they do nothing else, they commit the staged changes. Added a flag file to indicate that the branch's journal contains uncommitted changes. (Could use git ls-files, but don't want to run that every time.) In the future, this ability to have uncommitted changes staged in the journal might be used on remotes after a series of oneshot commands.	2012-02-25 16:18:55 -04:00
Joey Hess	b49c0c2633	add annex.alwayscommit option To avoid commits of data to the git-annex branch after each command is run, set annex.alwayscommit=false. Its data will then be committed less frequently, when a merge or sync is done.	2012-02-25 15:31:42 -04:00
Joey Hess	df3a310b83	update copyright format url	2012-02-25 10:40:05 -04:00
Joey Hess	bd66f962d3	Deal with NFS problem that caused a failure to remove a directory when removing content from the annex. I was able to reproduce this on linux using the kernel's nfs server and mounting localhost:/. Determined that removing the directory fails when the just-deleted file in it was locked. Considered dropping the lock before removing the directory, but this would complicate parts of the code that should not need to worry about locking. So instead, ignore the failure to remove the directory in this case. While I was at it, made it attempt to remove both levels of hash directories, in case they're empty.	2012-02-24 16:30:47 -04:00
Joey Hess	5bf07b3b5c	Store web special remote url info in a more efficient location. storing it in remotes/web/xx/yy/foo.log meant lots of extra directory objects in git. Now I use xx/yy/foo.log.web, which is just as unique, but more efficient since foo.log is there anyway. Of course, it still looks in the old location too.	2012-02-17 23:15:29 -04:00
Joey Hess	db6b4cdfcf	rekey: New plumbing level command, can be used to change the keys used for files en masse.	2012-02-16 16:36:35 -04:00
Joey Hess	aeaaa0ff87	reorder	2012-02-16 15:07:59 -04:00
Joey Hess	39c3f56b33	addurl: Add --pathdepth option.	2012-02-16 12:25:19 -04:00
Joey Hess	4d8afc1713	tweak wording	2012-02-15 19:43:15 -04:00
Joey Hess	63152428e9	changelog	2012-02-15 17:33:21 -04:00
Joey Hess	52c5b164d8	Added a annex.queuesize setting useful when adding hundreds of thousands of files on a system with plenty of memory. git add gets quite slow in such a large repository, so if the system has more than the ~32 mb of memory the queue can use by default, it's a useful optimisation to increase the queue size, in order to decrease the number of times git add is run.	2012-02-15 11:14:19 -04:00
Joey Hess	7ebd98d8d8	fix memory leak when staging the journal The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.	2012-02-14 14:37:59 -04:00
Joey Hess	a40ec5e03e	Fixed a memory leak due to excessive strictness when committing journal files. When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.	2012-02-14 11:20:34 -04:00
Joey Hess	cb631ce518	whereis: Prints the urls of files that the web special remote knows about.	2012-02-14 03:49:48 -04:00
Joey Hess	59b2adea4f	changelog for `a964012fc3` Turns out that commit really made some serious improvements to memory use. With the lazy state monad, git-annex add in a huge tree grew seemingly without bound until it overflowed the stack. With the strict monad, it uses 42 mb max. It's possible another change since the 3.20120123 release fixed that, but `a964012fc3` seems most likely.	2012-02-13 16:58:58 -04:00
Joey Hess	17fed709c8	addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key.	2012-02-10 19:23:46 -04:00
Joey Hess	9030f68452	When checking that an url has a key, verify that the Content-Length, if available, matches the size of the key. If there's no Content-Length, or the key has no size, this check is not done, but it should happen most of the time, and protect against web content that has changed.	2012-02-10 19:23:41 -04:00
Joey Hess	d55f3c0716	Fix teardown of stale cached ssh connections.	2012-02-09 21:49:46 -04:00
Joey Hess	1c0bd81ba6	addurl: Normalize badly encoded urls.	2012-02-09 14:19:58 -04:00
Joey Hess	ef013506cb	addurl: Added a --file option Can be used to specify what file the url is added to. This can be used to override the default filename that is used when adding an url, which is based on the url. Or, when the file already exists, the url is recorded as another location of the file.	2012-02-08 15:35:29 -04:00
Joey Hess	57a747d081	S3: Fix irrefutable pattern failure when accessing encrypted S3 credentials.	2012-02-08 11:41:15 -04:00
Joey Hess	995bf51e10	correction	2012-02-07 16:52:39 -04:00
Joey Hess	3f4f96228e	changelog	2012-02-06 20:42:49 -04:00
Joey Hess	91fc975964	note 7.4 needed	2012-02-04 14:51:52 -04:00
Joey Hess	ed64bd8a4b	remove; unused	2012-01-30 13:20:36 -04:00
Joey Hess	b81d662cbf	Avoid repeated location log commits when a remote is receiving files. Done by adding a oneshot mode, in which location log changes are written to the journal, but not committed. Taking advantage of git-annex's existing ability to recover in this situation. This is used by git-annex-shell and other places where changes are made to a remote's location log.	2012-01-28 15:41:52 -04:00
Joey Hess	ce5637498f	remove Utility.Conditional and use IfElse This drops the >>! and >>? with the nice low fixity. IfElse does have undocumented >>=>>! and >>=>>? operators, but I deem that too fishy. Anyway, using whenM and unlessM is easier; I sometimes mixed the operators up.	2012-01-24 16:22:07 -04:00
Joey Hess	20d0288802	releasing version 3.20120123	2012-01-23 15:09:50 -04:00
Joey Hess	47250a153a	ssh connection caching Ssh connection caching is now enabled automatically by git-annex. Only one ssh connection is made to each host per git-annex run, which can speed some things up a lot, as well as avoiding repeated password prompts. Concurrent git-annex processes also share ssh connections. Cached ssh connections are shut down when git-annex exits. Note: The rsync special remote does not yet participate in the ssh connection caching.	2012-01-20 17:14:56 -04:00
Joey Hess	61dbad505d	fsck --from remote --fast Avoids expensive file transfers, at the expense of checking file size and/or contents. Required some reworking of the remote code.	2012-01-20 13:23:11 -04:00
Joey Hess	711c154561	update NEWS Add news item recommending fscking directory special remotes. Remote news item about URL backend being removed; it was later added back to be used by git annex addurl --fast. Link NEWS into top level.	2012-01-19 15:27:39 -04:00
Joey Hess	90319afa41	fsck --from Fscking a remote is now supported. It's done by retrieving the contents of the specified files from the remote, and checking them, so can be an expensive operation. (Several optimisations are possible, to speed it up, of course.. This is the slow and stupid remote fsck to start with.) Still, if the remote is a special remote, or a git repository that you cannot run fsck in locally, it's nice to have the ability to fsck it. If you have any directory special remotes, now would be a good time to fsck them, in case you were hit by the data loss bug fixed in the previous release!	2012-01-19 15:24:05 -04:00
Joey Hess	2837e8fef1	releasing version 3.20120116	2012-01-16 16:52:26 -04:00
Joey Hess	f161b5eb59	Fix data loss bug in directory special remote When moving a file to the remote failed, and partially transferred content was left behind in the directory, re-running the same move would think it succeeded and delete the local copy. I reproduced data loss when moving files to a partition that was almost full. Interrupting a transfer could have similar results. Easily fixed by using a temp file which is then moved atomically into place once the transfer completes. I've audited other calls to copyFileExternal, and other special remote file transfer code; everything else seems to use temp files correctly (rsync, git), or otherwise use atomic transfers (bup, S3).	2012-01-16 16:28:15 -04:00
Joey Hess	e3ea5fe938	debhelper v9 kills that ugly python message during build	2012-01-15 14:53:38 -04:00
Joey Hess	ce608303a3	releasing version 3.20120115	2012-01-15 14:02:32 -04:00

1 2 3 4 5 ...

650 commits