git-annex

Author	SHA1	Message	Date
Joey Hess	2fd294d06f	move --from, copy --from: 10 times faster scanning remote on local disk Rather than go through the location log to see which files are present on the remote, it simply looks at the disk contents directly. I benchmarked this speeding up scanning 834 files, from an annex on my phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.) Also benchmarked 8139 files, from an annex on spinning storage, speeding up from 103.17 to 13.39 seconds. Note that benchmarking with an encrypted annex on flash actually showed a minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems the overhead of doing the crypto needed to get the filenames to directly check can be higher than the overhead of looking up data in the location log. (Which says good things about how well the location log and git have been optimised!) It may make sense to make encrypted local remotes not have hasKeyCheap set; further benchmarking is called for.	2012-02-26 14:59:48 -04:00
Joey Hess	a3c9d06a26	add git-annex-shell commit Eventually, git-annex might try running this after making changes to a remote. I have not yet thought of a good way for it to tell which remotes it needs to run it on though. It can't just do it when shutting down a cached ssh connection, because ssh connection caching is optional, and that would not handle local remotes not accessed over ssh either.	2012-02-25 16:47:28 -04:00
Joey Hess	1f73db3469	improve alwayscommit=false mode Now changes are staged into the branch's index, but not committed, which avoids growing a large journal. And sync and merge always explicitly commit, ensuring that even when they do nothing else, they commit the staged changes. Added a flag file to indicate that the branch's journal contains uncommitted changes. (Could use git ls-files, but don't want to run that every time.) In the future, this ability to have uncommitted changes staged in the journal might be used on remotes after a series of oneshot commands.	2012-02-25 16:18:55 -04:00
Joey Hess	779ec91908	more robustness fixes	2012-02-18 12:08:02 -04:00
Joey Hess	abd50e01fb	don't fail with --pathdepth when file already exists	2012-02-18 12:05:13 -04:00
Joey Hess	00340dfe49	don't error out entirely if an url cannot be downloaded	2012-02-18 11:44:21 -04:00
Joey Hess	1ed5e4d9e3	variable name	2012-02-17 00:21:35 -04:00
Joey Hess	f3c75b601f	reorg	2012-02-17 00:19:47 -04:00
Joey Hess	ba5515d422	reorder for clarity	2012-02-16 22:38:08 -04:00
Joey Hess	156a631f63	make Migrate use ReKey rather than the other way around as ReKey is plumbing, this makes sense	2012-02-16 22:36:56 -04:00
Joey Hess	69a0161c3a	fix filename limit when using --pathdepth	2012-02-16 19:37:02 -04:00
Joey Hess	db6b4cdfcf	rekey: New plumbing level command, can be used to change the keys used for files en masse.	2012-02-16 16:36:35 -04:00
Joey Hess	d05550e803	zero still bad	2012-02-16 14:28:54 -04:00
Joey Hess	346c934409	allow pathdepth to drop from the front or take from the end (negative)	2012-02-16 14:26:53 -04:00
Joey Hess	c2245260b1	improve usage	2012-02-16 12:37:30 -04:00
Joey Hess	39c3f56b33	addurl: Add --pathdepth option.	2012-02-16 12:25:19 -04:00
Joey Hess	a86d937b5b	avoid too long filename when making up a filename for addurl too	2012-02-16 02:09:09 -04:00
Joey Hess	a1e52f0ce5	hlint	2012-02-16 00:44:51 -04:00
Joey Hess	e7aaa55c53	create parent directories as needed for addurl --file	2012-02-16 00:05:49 -04:00
Joey Hess	90a8b38ac0	set oneshot mode on a per-command basis Avoids ugly (and test suite failing) hack in Command.Version	2012-02-14 12:40:40 -04:00
Joey Hess	2f1f1e6b13	avoid version saving state This is not the place to commit journal files.	2012-02-14 10:59:48 -04:00
Joey Hess	cb631ce518	whereis: Prints the urls of files that the web special remote knows about.	2012-02-14 03:49:48 -04:00
Joey Hess	cbaebf538a	rework git check-attr interface Now gitattributes are looked up, efficiently, in only the places that really need them, using the same approach used for cat-file. The old CheckAttr code seemed very fragile, in the way it streamed files through git check-attr. I actually found that `cad8824852` was still deadlocking with ghc 7.4, at the end of adding a lot of files. This should fix that problem, and avoid future ones. The best part is that this removes withAttrFilesInGit and withNumCopies, which were complicated Seek methods, as well as simplifying the types for several other Seek methods that had a Backend tupled in.	2012-02-13 23:52:21 -04:00
Joey Hess	a3ebf16e62	also verify new urls when adding them to existing files	2012-02-10 19:40:54 -04:00
Joey Hess	17fed709c8	addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key.	2012-02-10 19:23:46 -04:00
Joey Hess	1c0bd81ba6	addurl: Normalize badly encoded urls.	2012-02-09 14:19:58 -04:00
Joey Hess	ac97454659	improve error message	2012-02-08 15:49:42 -04:00
Joey Hess	ef013506cb	addurl: Added a --file option Can be used to specify what file the url is added to. This can be used to override the default filename that is used when adding an url, which is based on the url. Or, when the file already exists, the url is recorded as another location of the file.	2012-02-08 15:35:29 -04:00
Joey Hess	a81297065d	use "known" instead of "visible" I think it's clearer, also it's the same length as "local" :)	2012-02-06 20:42:49 -04:00
Joey Hess	90ab17e153	remove old comment	2012-02-04 16:34:13 -04:00
Joey Hess	f1c7dc1212	fix touch and statfs to work on any files in any locale Use withCAString rather than withCString. XXX Actually, this only works in non-unicode locales when presented with unicode characters. Help?	2012-02-04 12:44:51 -04:00
Joey Hess	44b115e0b1	Merge branch 'master' into ghc7.4 Conflicts: Utility/Misc.hs	2012-02-03 16:48:40 -04:00
Joey Hess	146c36ca54	IO exception rework ghc 7.4 complains about use of System.IO.Error to catch exceptions. Ok, use Control.Exception, with variants specialized to only catch IO exceptions.	2012-02-03 16:47:24 -04:00
Joey Hess	d8fb97806c	support all filename encodings with ghc 7.4 Under ghc 7.4, this seems to be able to handle all filename encodings again. Including filename encodings that do not match the LANG setting. I think this will not work with earlier versions of ghc, it uses some ghc internals. Turns out that ghc 7.4 has a special filesystem encoding that it uses when reading/writing filenames (as FilePaths). This encoding is documented to allow "arbitrary undecodable bytes to be round-tripped through it". So, to get FilePaths from eg, git ls-files, set the Handle that is reading from git to use this encoding. Then things basically just work. However, I have not found a way to make Text read using this encoding. Text really does assume unicode. So I had to switch back to using String when reading/writing data to git. Which is a pity, because it's some percent slower, but at least it works. Note that stdout and stderr also have to be set to this encoding, or printing out filenames that contain undecodable bytes causes a crash. IMHO this is a misfeature in ghc, that the user can pass you a filename, which you can readFile, etc, but that default, putStr of filename may cause a crash! Git.CheckAttr gave me special trouble, because the filenames I got back from git, after feeding them in, had further encoding breakage. Rather than try to deal with that, I just zip up the input filenames with the attributes. Which must be returned in the same order queried for this to work. Also of note is an apparent GHC bug I worked around in Git.CheckAttr. It used to forkProcess and feed git from the child process. Unfortunately, after this forkProcess, accessing the `files` variable from the parent returns []. Not the value that was passed into the function. This screams of a bad bug, that's clobbering a variable, but for now I just avoid forkProcess there to work around it. That forkProcess was itself only added because of a ghc bug, #624389. I've confirmed that the test case for that bug doesn't reproduce it with ghc 7.4. So that's ok, except for the new ghc bug I have not isolated and reported. Why does this simple bit of code magnet the ghc bugs? :) Also, the symlink touching code is currently broken, when used on utf-8 filenames in a non-utf-8 locale, or probably on any filename containing undecodable bytes, and I temporarily commented it out.	2012-02-03 16:23:20 -04:00
Joey Hess	3d49258e5b	attempt at a quick, utf-8 only fix to the ghc 7.4 problem If you have only utf-8 filenames, and need to build git-annex with ghc 7.4, this will work. But, it will crash on non-utf-8 filenames.	2012-02-01 16:16:08 -04:00
Joey Hess	a964012fc3	switch to the strict state monad I had not realized what a memory leak the lazy state monad could be, although I have not seen much evidence of actual leaking in git-annex. However, if running git-annex on a great many files, this could matter. The additional Utility.State.changeState adds even more strictness, avoiding a problem I saw in github-backup where repeatedly modifying state built up a huge pile of thunks.	2012-01-29 22:55:06 -04:00
Joey Hess	b81d662cbf	Avoid repeated location log commits when a remote is receiving files. Done by adding a oneshot mode, in which location log changes are written to the journal, but not committed. Taking advantage of git-annex's existing ability to recover in this situation. This is used by git-annex-shell and other places where changes are made to a remote's location log.	2012-01-28 15:41:52 -04:00
Joey Hess	61dbad505d	fsck --from remote --fast Avoids expensive file transfers, at the expense of checking file size and/or contents. Required some reworking of the remote code.	2012-01-20 13:23:11 -04:00
Joey Hess	f35a84fac7	use a different tmp file when fscking remote data Since the content might be symlinked into place, it's not appropriate to use withTmp here.	2012-01-19 16:56:07 -04:00
Joey Hess	06b0cb6224	add tmp flag parameter to retrieveKeyFile	2012-01-19 16:07:36 -04:00
Joey Hess	90319afa41	fsck --from Fscking a remote is now supported. It's done by retrieving the contents of the specified files from the remote, and checking them, so can be an expensive operation. (Several optimisations are possible, to speed it up, of course.. This is the slow and stupid remote fsck to start with.) Still, if the remote is a special remote, or a git repository that you cannot run fsck in locally, it's nice to have the ability to fsck it. If you have any directory special remotes, now would be a good time to fsck them, in case you were hit by the data loss bug fixed in the previous release!	2012-01-19 15:24:05 -04:00
Joey Hess	d36525e974	convert fsckKey to a Maybe This way it's clear when a backend does not implement its own fsck checks.	2012-01-19 13:51:30 -04:00
Joey Hess	abdacf58ed	tweaks	2012-01-11 00:06:54 -04:00
Joey Hess	16e7178f20	reorg	2012-01-10 15:29:10 -04:00
Joey Hess	07cacbeee9	break module dependency loop A PITA but worth it to clean up the trust configuration code.	2012-01-10 13:32:38 -04:00
Joey Hess	7675b83efa	map: Fix display of remote repos A change to break local cycles made remote repos be dropped entirely.	2012-01-08 16:05:57 -04:00
Joey Hess	a35278430a	log: Add --gource mode, which generates output usable by gource. As part of this, I fixed up how log was getting the descriptions of remotes.	2012-01-07 18:18:09 -04:00
Joey Hess	bdc49ddbdb	typo	2012-01-07 00:45:01 -04:00
Joey Hess	dfa76069d4	reap zombies	2012-01-07 00:22:16 -04:00
Joey Hess	b8966433ef	sped up git annex log rather a lot See comment! Isn't git fun, always interesting approaches to optimise things that seemed unfixably slow.	2012-01-07 00:15:01 -04:00
Joey Hess	945f56f348	cleanup Broke out pure general functions etc.	2012-01-07 00:11:15 -04:00
Joey Hess	24b35113cf	tweak	2012-01-06 23:43:18 -04:00
Joey Hess	64f9d00bed	tweak	2012-01-06 21:51:39 -04:00
Joey Hess	2557bb8764	complete set of log options	2012-01-06 21:48:30 -04:00
Joey Hess	8e7de01047	log --before=date	2012-01-06 21:32:08 -04:00
Joey Hess	539f8c6f14	--boundary was not needed	2012-01-06 21:09:23 -04:00
Joey Hess	d8d72781af	better data type	2012-01-06 18:58:35 -04:00
Joey Hess	3c88d57399	log --max-count=n	2012-01-06 17:48:02 -04:00
Joey Hess	078788a9e7	change log display Including the file in the lines behaves better when limiting with --after, since only files that changed in the time period are shown. Still not fully happy with the line layout, but putting the +/- first followed by the date seems a good change.	2012-01-06 17:36:13 -04:00
Joey Hess	9fb5f3edc7	log --after=date	2012-01-06 17:24:03 -04:00
Joey Hess	47646d44b7	use a zipper	2012-01-06 16:24:40 -04:00
Joey Hess	a3a9f87047	log: New command that displays the location log for file, showing each repository they were added to and removed from. This needs to run git log on the location log files to get at all past versions of the file, which tends to be a bit slow. It would be possible to make a version optimised for showing the location logs for every key. That would only need to run git log once, so would be faster, but it would need to process an enormous amount of data, so would not speed up the individual file case. In the future it would be nice to support log --format. log --json also doesn't work right yet.	2012-01-06 15:40:07 -04:00
Joey Hess	1f8a1058c9	tweak	2012-01-06 10:57:57 -04:00
Joey Hess	df21cbfdd2	look up --to and --from remote names only once This will speed up commands like move and drop.	2012-01-06 04:06:13 -04:00
Joey Hess	0a36f92a31	more command-specific options Made --from and --to command-specific options. Added generic storage for values of command-specific options, which allows removing some of the special case fields in AnnexState. (Also added generic storage for command-specific flags, although there are not yet any.) Note that this storage uses a Map, so repeatedly looking up the same value is slightly more expensive than looking up an AnnexState field. But, the value can be looked up once in the seek stage, transformed as necessary, and passed in a closure to the start stage, and this avoids that overhead. Still, I'm hesitant to use this for things like force or fast flags. It's probably best to reserve it for flags that are only used by a few commands, or options like --from and --to that it's important only be allowed to be used with commands that implement them, to avoid user confusion.	2012-01-06 03:16:42 -04:00
Joey Hess	ad43f03626	per-command options Finally commands can define their own options. Moved --format and --print0 to be options only of find.	2012-01-05 23:11:07 -04:00
Joey Hess	a1aea174d7	fsck: Do backend-specific check before checking numcopies is satisfied. This way, when a checksum check fails and the content is moved aside, the numcopies check also warns if there are not enough copies.	2012-01-03 18:40:47 -04:00
Joey Hess	aa0882691b	Added remote.name.annex-web-options configuration setting, which can be used to provide parameters to whichever of wget or curl git-annex uses (depends on which is available, but most of their important options suitable for use here are the same).	2012-01-02 14:20:20 -04:00
Joey Hess	508b427c7b	tweak	2012-01-02 11:57:02 -04:00
Joey Hess	f0957426c5	skip local remotes that are not available (ie, not mounted) With --fast, unavailable local remotes are filtered out of the fast set. This way, if there are local remotes, --fast always acts only on them, and if none are mounted, acts on nothing. This consistency is better than --fast acting on different remotes depending on what's mounted.	2011-12-31 04:50:39 -04:00
Joey Hess	4a02c2ea62	type alias cleanup	2011-12-31 04:11:58 -04:00
Joey Hess	a2ec2d3760	refactor and check for a detached HEAD	2011-12-31 03:38:58 -04:00
Joey Hess	8a33573caf	better filtering out of special remotes	2011-12-31 03:27:37 -04:00
Joey Hess	6cd4c7efcd	never pick special remotes in --fast even if they have the lowest cost, we cannot use them	2011-12-31 03:14:05 -04:00
Joey Hess	c61642ef0c	remove unnecessary check mergeLocal always creates the local sync branch, so no need to check that it exists later.	2011-12-31 03:08:44 -04:00
Joey Hess	aa64b8ceaf	refactor	2011-12-31 03:01:18 -04:00
Joey Hess	2998340abb	really fix check that remote needs merged	2011-12-31 02:45:12 -04:00
Joey Hess	9a7a77488e	tweak	2011-12-31 02:18:16 -04:00
Joey Hess	0396f9c795	tweak	2011-12-31 02:15:13 -04:00
Joey Hess	f2b584ad74	fix check that remote branch needs merged	2011-12-31 02:03:39 -04:00
Joey Hess	79231bcff0	minor cleanups mergeFrom is never called on branches that don't exist anymore	2011-12-31 01:51:39 -04:00
Joey Hess	015a497914	avoid syncing remotes configured annex-ignore, unless explicitly specified	2011-12-31 01:42:42 -04:00
Joey Hess	e7d3e546c2	sync --fast: Selects some of the remotes with the lowest annex.cost and syncs those, in addition to any specified at the command line.	2011-12-30 21:17:36 -04:00
Joey Hess	a31b7d93c8	push when git-annex branch changed I was too heavy-handed in optimising away pushes	2011-12-30 19:38:46 -04:00
Joey Hess	79872e360e	automated syncing Some changes to make automated syncing nicer. Merge from both the remote's $branch and its synced/$branch; either could have new changes. Create synced/$branch on the remote when pushing.	2011-12-30 19:24:57 -04:00
Joey Hess	f6f7ee7131	automatically create the syncbranch	2011-12-30 18:52:24 -04:00
Joey Hess	14d16b77b3	refactor	2011-12-30 18:37:55 -04:00
Joey Hess	52104dae6f	refactor	2011-12-30 18:36:40 -04:00
Joey Hess	56488e807b	check that synced/master exists before trying to use it and a nice error message if syncing is not set up yet	2011-12-30 18:19:45 -04:00
Joey Hess	f2fa29bf3b	check if branches are up-to-date before merging, pushing This optimises away the need to run anything in some common cases. It's particularly useful on push; no need to push if the tracking branch we just pulled is the same as the branch we're going to push.	2011-12-30 18:04:01 -04:00
Joey Hess	9d85baa314	improve wording	2011-12-30 17:54:09 -04:00
Joey Hess	4400f65967	message cleanup	2011-12-30 17:38:38 -04:00
Joey Hess	556618a3ec	avoid using Git.Ref.describe except for when generating user messages The other uses of it can all be simplified using Git.Ref.base, Git.Ref.under, and show. In some cases, describe was being used to shorten the branch name unnecessarily, and I instead pass the fully qualified name to git.	2011-12-30 17:01:03 -04:00
Joey Hess	5d17da5eb3	update to my indentation style	2011-12-30 16:24:30 -04:00
Joey Hess	5728bb58e0	force git-annex branch update after fetching remotes git-annex normally only runs the branch update once per run, for speed, but since this fetches new remote git-annex tracking branches, they need to be merged in after that fetch. An earlier call to Remote.byName was causing the update to run before the fetch sometimes, but it could have been anything. Just force the update to happen in the right place.	2011-12-30 16:03:41 -04:00
Joachim Breitner	b6e7b40be4	By default, sync with all remotes having the synced/ branch	2011-12-29 20:50:57 +01:00
Joachim Breitner	0ee1141f30	Implement branch-syncing in Command.Sync as described in the previous commit to the documentation. The logging UI is not great yet.	2011-12-29 18:37:30 +01:00
Joey Hess	b05c08b5c1	reorder less expensive terminal first Out of general principles, it did not seem to actually speed it up appreciably. (I suspect ghc is being smart.)	2011-12-23 13:19:28 -04:00
Joey Hess	fdf02986cf	find --json	2011-12-23 01:08:19 -04:00
Joey Hess	06bafae9e0	Format strings can be specified using the new --format option, to control what is output by git annex find.	2011-12-22 18:31:44 -04:00


512 commits