git-annex

Author	SHA1	Message	Date
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	5d98cba923	use ByteStrings when reading annex symlinks and pointers Now there's a ByteString used all the way from disk to Key. The main complication in this conversion was the use of fromInternalGitPath in several places to munge things on Windows. The things that used that were changed to parse the ByteString using either path separator. Also some code that had read from files to a String lazily was changed to read a minimal strict ByteString.	2019-01-14 15:37:08 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	b3c69eaaf8	strict bytestring encoders and decoders Only had lazy ones before. Already sped up a few parts of the code.	2019-01-01 14:55:15 -04:00
Joey Hess	4a788fbb3b	sync --content now supports --hide-missing adjusted branches This relies on git ls-files --with-tree, which I'm using in a way that its man page does not document. Hm. I emailed the git list to try to get the docs improved, but at least the git test suite does test the same kind of use case I'm using here. Performance impact when not in an adjusted branch is limited to some additional MVar accesses, and a single git call to determine the name of the current branch. So very minimal. When in an adjusted branch, the performance impact is in Annex.WorkTree.lookupFile, which starts doing an equal amount of work for files that didn't exist as it already did for files that were unlocked. This commit was sponsored by Jochen Bartl on Patreon.	2018-10-19 17:51:25 -04:00
Joey Hess	10138056dc	v6: avoid accidental conversion when annex.largefiles is not configured v6: When annex.largefiles is not configured for a file, running git add or git commit, or otherwise using git to stage a file will add it to the annex if the file was in the annex before, and to git otherwise. This is to avoid accidental conversion. Note that git-annex add's behavior has not changed, for reasons explained in the added comment. Performance: No added overhead when annex.largefiles is configured. When not configured, there is an added call to catObjectMetaData, which involves a round trip through git cat-file --batch. However, the earlier catKeyFile primes the cache for it. This commit was supported by the NSF-funded DataLad project.	2018-08-27 14:51:10 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	34530e59d9	Avoid using a lot of memory when large objects are present in the git repository .. and have to be checked to see if they are a pointed to an annexed file. Cases where such memory use could occur included, but were not limited to: - git commit -a of a large unlocked file (in v5 mode) - git-annex adjust when a large file was checked into git directly Generally, any use of catKey was a potential problem. Fix by using git cat-file --batch-check to check size before catting. This adds another git batch process, which is included in the CatFileHandle for simplicity. There could be performance impact, anywhere catKey is used. Particularly likely to affect adjusted branch generation speed, and operations on unlocked files in v6 mode. Hopefully since the --batch-check and --batch read the same data, disk buffering will avoid most overhead. Leaving only the overhead of talking to the process over the pipe and whatever computation --batch-check needs to do. This commit was sponsored by Bruno BEAUFILS on Patreon.	2016-10-05 15:24:13 -04:00
Joey Hess	1cd02762bf	Optimisations to git-annex branch query and setting, avoiding repeated copies of the environment. Speeds up commands like "git-annex find --in remote" by over 50%. Profiling showed that adjustGitEnv was 21% of the time and 37% of the allocations of that command. It copied the environment each time with getEnvironment. The only repeated use of adjustGitEnv is in withIndexFile, which tends to be run at least once per file. So, it was optimised by keeping a cache of the environment, which can be reused. There could be other better ways to optimise this. Maybe get the while environment once at startup. But, then it would have to be serialized back out each time running a child process, so I doubt that would be a net win. It might be better to cache a version of the environment that is pre-modified to use .git-annex/index. But, profiling doesn't show that modifying the enviroment is taking any significant time.	2016-09-29 13:36:48 -04:00
Joey Hess	e91037a38b	use indexEnv	2016-05-17 13:38:04 -04:00
Joey Hess	04d4830ac3	add catCommit	2016-02-25 15:34:46 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	f324ad24c1	improve comment	2015-12-26 13:47:36 -04:00
Joey Hess	78a6b8ce05	refactor and improve pointer file handling code	2015-12-09 14:27:43 -04:00
Joey Hess	712c9fc590	require "annex/objects/" before key in pointer files This removes ambiguity, because while someone might have "WORM--foo" in a file that's not intended to be a git-annex pointer file, "annex/objects/WORM--foo" is less likely. Also, `664cc987e8` had a caveat about symlink targets being parsed as pointer files, and now the same parser is used for both. I did not include any hash directories before the key in the pointer file, as they're not needed. However, if they were included, the parser would still work ok.	2015-12-07 15:45:08 -04:00
Joey Hess	664cc987e8	support pointer files Backend.lookupFile is changed to always fall back to catKey when operating on a file that's not a symlink. catKey is changed to understand pointer files, as well as annex symlinks. Before, catKey needed a file mode witness, to be sure it was looking at a symlink. That was complicated stuff. Now, it doesn't actually care if a file in git is a symlink or not; in either case asking git for the content of the file will get the pointer to the key. This does mean that git-annex will treat a link foo -> WORM--bar as a git-annex file, and also treats a regular file containing annex/objects/WORM--bar as a git-annex file. Calling catKey could make git-annex commands need to do more work than before. This would especially be the case if a repo contained many regular files, and only a few annexed files, as now git-annex will need to ask git about the contents of the regular files.	2015-12-07 15:35:36 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	96dc423e39	When accessing a local remote, shut down git-cat-file processes afterwards, to ensure that remotes on removable media can be unmounted. Closes: #758630 This does mean that eg, copying multiple files to a local remote will become slightly slower, since it now restarts git-cat-file after each copy. Should not be significant slowdown. The reason git-cat-file is run on the remote at all is to update its location log. In order to add an item to it, it needs to get the current content of the log. Finding a way to avoid needing to do that would be a good path to avoiding this slowdown if it does become a problem somehow. This commit was sponsored by Evan Deaubl.	2014-08-20 12:07:57 -04:00
Joey Hess	ba42b67c70	Fix bug in automatic merge conflict resolution When one side is an annexed symlink, and the other side is a non-annexed symlink. In this case, git-merge does not replace the annexed symlink in the work tree with the non-annexed symlink, which is different from it's handling of conflicts between annexed symlinks and regular files or directories. So, while git-annex generated the correct merge commit, the work tree didn't get updated to reflect it. See comments on bug for additional analysis. Did not add this to the test suite yet; just unloaded a truckload of firewood and am feeling lazy. This commit was sponsored by Adam Spiers.	2014-07-08 13:55:11 -04:00
Joey Hess	1052eeface	Windows: Fix some filename encoding bugs. http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/ Not a complete fix yet.	2014-03-19 15:57:56 -04:00
Joey Hess	1192d98721	sync: Fix bug in direct mode that caused a file not checked into git to be deleted when merging with a remote that added a file by the same name. (Thanks, jkt)	2014-03-03 14:57:16 -04:00
Joey Hess	8d5158fa31	Preserve metadata when staging a new version of an annexed file. Performance impact: When adding a large tree of new files, this needs to do some git cat-file queries to check if any of the files already existed and might need a metadata copy. I tried a benchmark in a copy of my sound repository (so there was already a significant git tree to check against. Adding 10000 small files, with a cold cache: before: 1m48.539s after: 1m52.791s So, impact is 0.0004 seconds per file added. Which seems acceptable, so did not add some kind of configuration to enable/disable this. This commit was sponsored by Lisa Feilen.	2014-02-24 14:41:33 -04:00
Joey Hess	7d5b25515c	Add plumbing-level lookupkey command.	2013-12-15 14:02:23 -04:00
Joey Hess	59ecc804cd	add new status command This works for both direct and indirect mode. It may need some performance tuning. Note that unlike git status, it only shows the status of the work tree, not the status of the index. So only one status letter, not two .. and since files that have been added and not yet committed do not differ between the work tree and the index, they are not shown. Might want to add display of the index vs the last commit eventually. This commit was sponsored by an unknown bitcoin contributor, whose contribution as been going up lately! ;)	2013-11-07 14:07:25 -04:00
Joey Hess	4f871f89ba	git-recover-repository 1/2 done	2013-10-20 17:50:51 -04:00
Joey Hess	3588729f0d	completely solve catKey memory leak Since `006cf7976f` was incomplete, not being able to get the right mode of the file when the index differs from HEAD, this is a final workaround. Only buffering the start of the file in this case avoids leaking memory. This does not prevent git-cat-file being asked to output the whole file, which needs to be consumed, and can be slow. But this only happens in a rare edge case.	2013-09-19 20:09:03 -04:00
Joey Hess	006cf7976f	more completely solve catKey memory leak Done using a mode witness, which ensures it's fixed everywhere. Fixing catFileKey was a bear, because git cat-file does not provide a nice way to query for the mode of a file and there is no other efficient way to do it. Oh, for libgit2.. Note that I am looking at tree objects from HEAD, rather than the index. Because I cat-file cannot show a tree object for the index. So this fix is technically incomplete. The only cases where it matters are: 1. A new large file has been directly staged in git, but not committed. 2. A file that was committed to HEAD as a symlink has been staged directly in the index. This could be fixed a lot better using libgit2.	2013-09-19 16:41:21 -04:00
Joey Hess	412dcb8017	Fix bug that caused typechanged symlinks to be assumed to be unlocked files, so they were added to the annex by the pre-commit hook.	2013-08-22 13:57:07 -04:00
Joey Hess	729eab1f89	assistant: Work around git-cat-file's not reloading the index after files are staged. Argh.	2013-05-25 00:37:41 -04:00
Joey Hess	aba49995b6	Merge branch 'master' into windows	2013-05-15 19:18:04 -04:00
Joey Hess	c62b54d80d	start one git-cat-file per index file This reverts `1c83b6c439` and properly fixes the issue discussed there. This makes git-annex behave much nicer in direct mode.	2013-05-15 18:46:38 -04:00
Joey Hess	03e8594369	fix the day's windows permissions damage	2013-05-12 19:09:48 -04:00
Joey Hess	73d2f8b280	deal with git using / internally, even on DOS	2013-05-12 17:29:49 -05:00
Joey Hess	1c83b6c439	work around a very strange git-cat-file behavior Sometimes it seems that git-cat-file --batch stops getting info for files in the current repo, when ":file" is fed to it. I have not reproduced this at the command line, but only when using git annex whereis and git annex move inside a direct mode repo. Those failed, because cat-file returned "file missing". OTOH, git annex find works fine, despite passing the same file to cat-file. It seems that the failing commands first asked cat-file to show a file on the git-annex branch. Perhaps it got "stuck" on that branch? But I cannot repoduce it running cat-file by hand. Most strange. HEAD is a workaround for this extreme weirdness, since I spent a good 2 hours struggling with it already.	2013-01-05 17:06:24 -04:00
Joey Hess	1cdf2b923d	assistant: Make expensive transfer scan work fully in direct mode. The expensive scan uses lookupFile, but in direct mode, that doesn't work for files that are present. So the scan was not finding things that are present that need to be uploaded. (It did find things not present that needed to be downloaded.) Now lookupFile also works in direct mode. Note that it still prefers symlinks on disk to info committed to git, in direct mode. This is necessary to make things like Assistant.Threads.Watcher.onAddSymlink work correctly, when given a new symlink not yet checked into git (or replacing a file checked into git).	2013-01-05 15:57:53 -04:00
Joey Hess	eb40227d15	assistant direct mode file add/change bookkeeping When a file is changed in direct mode, the old content is probably lost (at least from the local repo), and bookeeping needs to be updated to reflect this. Also, synthetic add events are generated at assistant startup, so make it detect when the file has not really changed, and avoid re-adding it. This does add the overhead of querying the runing git cat-file for the key that's recorded in git for the file, each time a file is added or modified in direct mode.	2012-12-25 15:48:15 -04:00
Joey Hess	b080a58b76	Merge branch 'master' into desymlink Conflicts: Annex/CatFile.hs Annex/Content.hs Git/LsFiles.hs Git/LsTree.hs	2012-12-13 00:29:06 -04:00
Joey Hess	f87a781aa6	finished where indentation changes	2012-12-13 00:24:19 -04:00
Joey Hess	e7b8cb0063	direct mode committing	2012-12-12 19:20:38 -04:00
Joey Hess	ca45cea113	Revert "add catFileIndex" This interface is not a good idea, because a running git cat-file --batch does not notice when existing files in the index are changed.	2012-09-15 18:30:53 -04:00
Joey Hess	e1baf48d88	add catFileIndex	2012-09-15 17:06:10 -04:00
Joey Hess	75b6ee81f9	avoid ByteString.Char8 where not needed Its truncation behavior is a red flag, so avoid using it in these places where only raw ByteStrings are used, without looking at the data inside.	2012-06-20 13:13:40 -04:00
Joey Hess	ca9ee21bd7	crazy optimisation Crazy like a fox..	2012-06-10 19:58:34 -04:00
Joey Hess	c4c965d602	detect and recover from branch push/commit race Dealing with a race without using locking is exceedingly difficult and tricky. Fully tested, I hope. There are three places left where the branch can be updated, that are not covered by the race recovery code. Let's prove they're all immune to the race: 1. tryFastForwardTo checks to see if a fast-forward can be done, and then does git-update-ref on the branch to fast-forward it. If a push comes in before the check, then either no fast-forward will be done (ok), or the push set the branch to a ref that can still be fast-forwarded (also ok) If a push comes in after the check, the git-update-ref will undo the ref change made by the push. It's as if the push did not come in, and the next git-push will see this, and try to re-do it. (acceptable) 2. When creating the branch for the very first time, an empty index is created, and a commit of it made to the branch. The commit's ref is recorded as the current state of the index. If a push came in during that, it will be noticed the next time a commit is made to the branch, since the branch will have changed. (ok) 3. Creating the branch from an existing remote branch involves making the branch, and then getting its ref, and recording that the index reflects that ref. If a push creates the branch first, git-branch will fail (ok). If the branch is created and a racing push is then able to change it (highly unlikely!) we're still ok, because it first records the ref into the index.lck, and then updating the index. The race can cause the index.lck to have the old branch ref, while the index has the newly pushed branch merged into it, but that only results in an unnecessary update of the index file later on.	2011-12-11 20:41:35 -04:00
Joey Hess	9290095fc2	improve type signatures with a Ref newtype In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.	2011-11-16 02:41:46 -04:00
Joey Hess	04edae6791	Optimised union merging; now only runs git cat-file once.	2011-11-12 17:45:12 -04:00
Joey Hess	637b5feb45	lint	2011-11-11 01:52:58 -04:00
Joey Hess	bf460a0a98	reorder repo parameters last Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.	2011-11-08 16:27:20 -04:00

1 2

52 commits