git-annex

Author	SHA1	Message	Date
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	0be23bae2f	refactor Better to not have a single function module, and better to have a more specific type than Bool. This commit was sponsored by Jack Hill on Patreon	2019-11-11 19:10:52 -04:00
Joey Hess	3b34d123ed	Added annex.allowsign option. This commit was sponsored by Ilya Shlyakhter on Patreon.	2019-11-11 16:28:56 -04:00
Joey Hess	4306dfbe68	remove empty log files in transition forget --drop-dead: Remove several classes of git-annex log files when they become empty, further reducing the size of the git-annex branch. Noticed while testing sameas uuid removal, but it could happen other times too. An empty log file is always treated by git-annex the same as no file being present, and when the files are per-key, it can be a sizable space saving to exclude them from the tree.	2019-10-14 16:04:15 -04:00
Joey Hess	5e9a2cc37f	forget state of sameas remotes during DropDead transitions It would have been a lot less round-about to just make git annex dead also add the uuids of sameas remotes to the trust.log as dead. But, that would fail in the case where there's an unmerged other clone that has a sameas remote that the current repo does not know about. Then it would not get marked as dead. Handling it at transition time avoids that scenario. Note that the generation of trustmap' in dropDead should only happen once, due to the partial application.	2019-10-14 15:47:42 -04:00
Joey Hess	2a41712ef1	avoid stageJournal escaping withOtherTmp This is only done for correctness sake; I don't see any way that it would have caused a problem here. The jlog file escaped withOtherTmp so another process could swoop in and delete it, but the file is only used as a buffer for a list of filenames, and its handle gets rewound and they're read back out, which will still work even if it's already been deleted. The only reason I didn't just pre-delete the file and keep the handle open is I'm not sure that works on all OS's (eg Windows). If there was a problem that this fixed it might involve an OS that doesn't support deleting an open file or something like that.	2019-05-07 11:57:12 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	7af55de83c	optimisation: use graftTree to remember the export branch Sped up git-annex export in repositories with lots of keys. Old method read whole git-annex branch tree into memory.	2019-02-22 11:16:22 -04:00
Joey Hess	8fdea8f444	WIP Added graftTree but it's buggy. Should use graftTree in Annex.Branch.graftTreeish; it will be faster than the current implementation there. Started Annex.Import, but untested and it doesn't yet handle tree grafting.	2019-02-21 17:32:59 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Joey Hess	2eadb6cd68	convert transitions.log to attoparsec and bytestring-builder Not likely to be any speed gain here, but this completes porting every log file over. And, it let me get rid of code copied from ghc and modified, so simplifying the licensing.	2019-01-10 17:13:30 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	232b1a08f3	simplification now that all logs use Builder	2019-01-09 14:10:05 -04:00
Joey Hess	bfc9039ead	convert git-annex branch access to ByteStrings and Builders Most of the individual logs are not converted yet, only presense logs have an efficient ByteString Builder implemented so far. The rest convert to and from String.	2019-01-03 13:21:48 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	b3c69eaaf8	strict bytestring encoders and decoders Only had lazy ones before. Already sped up a few parts of the code.	2019-01-01 14:55:15 -04:00
Joey Hess	4c918437ab	Fix git-annex branch data loss that could occur after git-annex forget --drop-dead Added getStaged, to get the versions of git-annex branch files staged in its index, and use during transitions so the result of merging sibling branches is used. The catFileStop in performTransitionsLocked is absolutely necessary, without that the bug still occurred, because git cat-file was already running and was looking at the old index file. Note that getLocal still has cat-file look at the git-annex branch, not the index. It might be faster if it looked at the index, but probably only marginally so, and I've not benchmarked it to see if it's faster at all. I didn't want to change unrelated behavior as part of this bug fix. And as the need for catFileStop shows, using the index file has added complications. Anyway, it still seems fine for getLocal to look at the git-annex branch, because normally the index file is updated just before the git-annex branch is committed, and so they'll contain the same information. It's only during a transition that the two diverge. This commit was sponsored by Paul Walmsley in honor of Mark Phillips.	2018-08-06 17:36:30 -04:00
Joey Hess	ae11394efa	added annex.commitmessage Added annex.commitmessage config that can specify a commit message for the git-annex branch instead of the usual "update". This commit was supported by the NSF-funded DataLad project.	2018-08-02 14:06:06 -04:00
Joey Hess	0b7f6d24d3	rename BlobType and add submodule to it This was badly named, it's a not a blob necessarily, but anything that a tree can refer to. Also removed the Show instance which was used for serialization to git format, instead use fmtTreeItemType. This commit was supported by the NSF-funded DataLad project.	2018-05-14 14:45:41 -04:00
Joey Hess	d1961e4498	back out incorrect IO interleaving change Fix regression in last release that crashes when using --all or running git-annex in a bare repository. May have also affected git-annex unused and git-annex info. Reversed the order of the (++) in Annex.Branch.files so --all will stream lazily still when there are not a bunch of uncommitted journal files. Added a todo to maybe improve this later. This commit was sponsored by Trenton Cronholm on Patreon.	2018-05-08 13:54:42 -04:00
Joey Hess	bea0ad220a	avoid --all buffering list of all keys In Annex.Branch.branch, the (++) was killing laziness. Rewrote so it streams lazily. filterM also kills laziness, so made loggedKeys use a Unchecked type, and check if the key is dead in the seek loop. Note that loggedKeysFor still buffers, so git-annex info <remote> and git-annex unused --from remote still use more memory than necessary. Also removed some unused functions from Annex.Journal.	2018-04-26 16:00:20 -04:00
Joey Hess	09e73a3ab6	annex.merge-annex-branches Added annex.merge-annex-branches config setting which can be used to disable automatic merge of git-annex branches. I wonder if git-annex merge/sync/assistant should disable this setting? Not sure yet, so have not done so. May be that users will not set it in git config, but pass it via -c to commands that need it. Checking the config setting adds a very small overhead, but it's only checked once per command so should be insignificant. This commit was supported by the NSF-funded DataLad project.	2018-02-22 14:25:32 -04:00
Joey Hess	2e25185a9c	Remove temporary code added in 6.20160619 to prime the mergedrefs log. Repositories that are upgraded from before that version to this one will not break, but will just not see the benefit of the mergedrefs log speeding things up, until one new ref gets merged in.	2018-02-22 12:31:27 -04:00
Joey Hess	2b66492d6e	Improve startup time for commands that do not operate on remotes And for tab completion, by not unnessessarily statting paths to remotes, which used to cause eg, spin-up of removable drives. Got rid of the remotes member of Git.Repo. This was a bit painful. Remote.Git modifies the list of remotes as it reads their configs, so still need a persistent list of remotes. So, put it in as Annex.gitremotes. It's only populated by getGitRemotes, so commands like examinekey that don't care about remotes won't do so. This commit was sponsored by Jake Vosloo on Patreon.	2018-01-09 16:22:07 -04:00
Joey Hess	faf03ee4da	more core.sharedRepository perm fixes Fix more places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-01-04 14:46:58 -04:00
Joey Hess	25703e1413	finally really add back custom-setup stanza Fourth or fifth try at this and finally found a way to make it work. Absurd amount of busy-work forced on me by change in cabal's behavior. Split up Utility modules that need posix stuff out of ones used by Setup. Various other hacks around inability for Setup to use anything that ifdefs a use of unix. Probably lost a full day of my life to this. This is how build systems make their users hate them. Just saying.	2017-12-31 16:36:39 -04:00
Joey Hess	187b3e7780	enable LambdaCase and convert around 10% of places that could use it Needs ghc 7.6.1, so minimum base version increased slightly. All builds are well above this version of ghc, and debian oldstable is as well. Code that could use lambdacase can be found by running: git grep -B 1 'case ' \| less and searching in less for "<-" This commit was sponsored by andrea rota.	2017-11-15 16:59:32 -04:00
Joey Hess	f8fd66d3f8	fix compaction of export.log It was not getting old lines removed, because the tree graft confused the updater, so it union merged from the previous git-annex branch, which still contained the old lines. Fixed by carefully using setIndexSha. This commit was supported by the NSF-funded DataLad project.	2017-09-12 18:30:36 -04:00
Joey Hess	5483ea90ec	graft exported tree into git-annex branch So it will be available later and elsewhere, even after GC. I first though to use git update-index to do this, but feeding it a line with a tree object seems to always cause it to generate a git subtree merge. So, fell back to using the Git.Tree interface to maniupulate the trees, and not involving the git-annex branch index file at all. This commit was sponsored by Andreas Karlsson.	2017-08-31 18:06:49 -04:00
Joey Hess	a1730cd6af	adeiu, MissingH Removed dependency on MissingH, instead depending on the split library. After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing. Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size. This commit was sponsored by Shane-o on Patreon.	2017-05-16 01:03:52 -04:00
Edward Betts	0750913136	correct spelling mistakes	2017-02-12 17:30:23 -04:00
Joey Hess	339464e847	config: New command for storing configuration in the git-annex branch. Any config names can be set using this; git-annex commands will only look at specific ones that make sense and are worth the overhead of querying the branch. This might also be useful for storing whatever other config-type stuff the user might want to shove into the git-annex branch. This commit was sponsored by Jochen Bartl on Patreon.	2017-01-30 16:46:38 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	154c939830	Speed up startup time by caching the refs that have been merged into the git-annex branch. This can speed up git-annex commands by as much as a second, depending on the number of remotes.	2016-07-17 12:24:34 -04:00
Joey Hess	823c28d2dc	nub transitionList to avoid ugly message after repeated transitions, and avoid redundant work for repeated ForgetDeadRemotes transitions	2016-05-18 12:26:38 -04:00
Joey Hess	93c03b5dd5	Work around git bug in handling of relative path to GIT_INDEX_FILE when in a subdirectory of the repository. This affected git annex view. It turns out that some other places that use GIT_INDEX_FILE were already working around the bug. I removed the workaround from Annex.Branch since the new workaround will do.	2016-05-17 13:29:51 -04:00
Joey Hess	b9e4e2ba84	new method for merging changes into adjusted branch that avoids unncessary merge conflicts Still needs work when there are actual merge conflicts.	2016-04-06 15:36:18 -04:00
Joey Hess	a2b668a8f6	reuse annex's HashObjectHandle	2016-03-14 16:29:59 -04:00
Joey Hess	2d234de781	Sped up git-annex merge by using git hash-object --batch. This does mean that it has to write out temp files containing updated objects for the merge. So may use more disk space, and disk IO, but that should generally win out over needing to launch N separate git hash-object processes.	2016-03-14 16:23:22 -04:00
Joey Hess	00d9da3534	use hash-object --batch Handle was plumbed through, but not used.	2016-03-14 16:12:55 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	f9adb905fc	Avoid unncessary write to the location log when a file is unlocked and then added back with unchanged content. Implemented with no additional overhead of compares etc. This is safe to do for presence logs because of their locality of change; a given repo's presence logs are only ever changed in that repo, or in a repo that has just been actively changing the content of that repo. So, we don't need to worry about a split-brain situation where there'd be disagreement about the location of a key in a repo. And so, it's ok to not update the timestamp when that's the only change that would be made due to logging presence info.	2015-10-12 14:46:47 -04:00
Joey Hess	24800b1bf1	Only look at reflogs for relevant branches, not for git-annex branches This speeds it up quite a bit.. May still be too slow in large repos.	2015-07-07 17:36:30 -04:00
Joey Hess	b11d2f5a8a	unused: --used-refspec can now be configured to look at refs in the reflog. This provides a way to not consider old versions of files to be unused after they have reached a specified age, when the old refs in the reflog expire. May be slow.	2015-07-07 17:13:50 -04:00
Joey Hess	f7dc20595e	refactor ls-tree params All in one place to avoid bugs like `174da80ddc`	2015-07-06 14:21:43 -04:00
Joey Hess	174da80ddc	bugfix: Pass --full-tree when using git ls-files to get a list of files on the git-annex branch, so it works when run in a subdirectory. This bug affected git-annex unused, and potentially also transitions running code and other things.	2015-07-06 14:09:54 -04:00
Joey Hess	eb33569f9d	remove Params constructor from Utility.SafeCommand This removes a bit of complexity, and should make things faster (avoids tokenizing Params string), and probably involve less garbage collection. In a few places, it was useful to use Params to avoid needing a list, but that is easily avoided. Problems noticed while doing this conversion: * Some uses of Params "oneword" which was entirely unnecessary overhead. * A few places that built up a list of parameters with ++ and then used Params to split it! Test suite passes.	2015-06-01 13:52:23 -04:00

1 2 3 4

173 commits