git-annex

Author	SHA1	Message	Date
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	5f391179f1	use RawFilePath getFileStatus for speed Only done on those calls to getFileStatus that had a RawFilePath, not a FilePath. The others would probably be just as fast if converted to use it with toRawFilePath, but I'm not 100% sure. Note that genInodeCache' uses fromRawFilePath, but that value only gets used on Windows, so on unix the thunk will never be evaluated.	2019-12-06 14:44:42 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agony of making it build), but still very unmergeable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically (num files, old, new, speedup): 48500, 4.77, 3.73, 28%; 12500, 1.36, 1.02, 66%; 20, 0.075, 0.074, 0% (so startup time is unchanged). That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	99536e3a0b	remove one more warningIO Had to generalize Git.Queue so it can run an Annex action, yipes. Only remaining warningIO are in the legacy chunk code.	2019-11-12 10:45:52 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	1e95bc4fd1	avoid git warning about CRLF in restagePointerFile Saw it on Windows, could probably also happen on linux with some configuration. Since this is a pointer file, the warning does not apply.	2019-02-18 18:35:36 -04:00
Joey Hess	1a367cad83	Fix path separator bug on Windows that completely broke git-annex since version 7.20190122.	2019-02-18 17:16:39 -04:00
Joey Hess	5d98cba923	use ByteStrings when reading annex symlinks and pointers Now there's a ByteString used all the way from disk to Key. The main complication in this conversion was the use of fromInternalGitPath in several places to munge things on Windows. The things that used that were changed to parse the ByteString using either path separator. Also some code that had read from files to a String lazily was changed to read a minimal strict ByteString.	2019-01-14 15:37:08 -04:00
Joey Hess	53905490df	convert Git.HashObject to use ByteStrings Both lazy and strict, because sometimes it's more efficient to build a small strict bytestring, and other times better to lazily stream.	2019-01-03 13:21:01 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	b3c69eaaf8	strict bytestring encoders and decoders Only had lazy ones before. Already sped up a few parts of the code.	2019-01-01 14:55:15 -04:00
Joey Hess	54d49eeac8	avoid update-index race This commit was supported by the NSF-funded DataLad project.	2018-08-17 16:03:40 -04:00
Joey Hess	0f25d48639	pass absolute path to update-index Test suite found a case where this is necessary. And the man page says this, although current behavior is not as documented: "Note that files beginning with . are discarded. This includes ./file and dir/./file. If you don’t want this, then use cleaner names." This may hit path length limits on Windows. shrug This commit was supported by the NSF-funded DataLad project.	2018-08-16 16:00:29 -04:00
Joey Hess	82a239675f	narrow the race where a file gets modified before update-index Check just before running update-index if the worktree file's content is still the same, don't update it when it's been modified. This narrows the race window a lot, from possibly minutes or hours, to seconds or less. (Use replaceFile so that the worktree update happens atomically, allowing the InodeCache of the new worktree file to itself be gathered w/o any other race.) This doesn't eliminate the race; it can still occur in the window before update-index runs. When annex.queue is large, a lot of files will be statted by the checks, and so the window may still be large enough to be a problem. When only a few files are being processed, the window is as small as it is in the race where a modification gets overwritten by git-annex when it updates the worktree. Or maybe as small as whatever race git checkout/pull/merge may have when the worktree gets modified during it. Still, I've kept a todo about this race. This commit was supported by the NSF-funded DataLad project.	2018-08-16 15:56:43 -04:00
Joey Hess	82cfcfc838	better index file refresh method Use git update-index --refresh, since it's a little bit more efficient and the user can be told to run it if a locked index prevents git-annex from running it. This also fixes the problem where an annexed file was deleted in the index and a get of another file that uses the same key caused the index update to add back the deleted file. update-index will not add back the deleted file. Documented in tips/unlocked_files.mdwn the gotcha that the index update may conflict with other operations. I can't see any way to possibly avoid that conflict. One new todo about a race that causes a modification to be accidentally staged. Note that the assistant only flushes the git command queue when it commits a modification. I have not tested the assistant with v6 unlocked files, but assume most users of the assistant won't care if the index shows a file as modified for a while. This commit was supported by the NSF-funded DataLad project.	2018-08-16 14:16:24 -04:00
Joey Hess	0b7f6d24d3	rename BlobType and add submodule to it This was badly named, it's not necessarily a blob, but anything that a tree can refer to. Also removed the Show instance which was used for serialization to git format, instead use fmtTreeItemType. This commit was supported by the NSF-funded DataLad project.	2018-05-14 14:45:41 -04:00
Joey Hess	fc845e6530	more lambda-case conversion	2017-12-05 15:00:50 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	34530e59d9	Avoid using a lot of memory when large objects are present in the git repository .. and have to be checked to see if they are a pointer to an annexed file. Cases where such memory use could occur included, but were not limited to: - git commit -a of a large unlocked file (in v5 mode) - git-annex adjust when a large file was checked into git directly Generally, any use of catKey was a potential problem. Fix by using git cat-file --batch-check to check size before catting. This adds another git batch process, which is included in the CatFileHandle for simplicity. There could be performance impact, anywhere catKey is used. Particularly likely to affect adjusted branch generation speed, and operations on unlocked files in v6 mode. Hopefully since the --batch-check and --batch read the same data, disk buffering will avoid most overhead. Leaving only the overhead of talking to the process over the pipe and whatever computation --batch-check needs to do. This commit was sponsored by Bruno BEAUFILS on Patreon.	2016-10-05 15:24:13 -04:00
Joey Hess	b7c8bf5274	Preserve execute bits of unlocked files in v6 mode. When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.	2016-04-14 14:47:08 -04:00
Joey Hess	2046502407	v6: Close pointer file handles more quickly, to avoid problems on Windows. Was using L.readFile, so the Handle would remain open until the garbage collector got around to it. Changed to explicit open and close, so we know it's always closed when the function returns.	2016-04-04 15:42:33 -04:00
Joey Hess	88a4a6f396	Sped up git-annex add in direct mode and v6 by using git hash-object --batch. Speeds up hashSymlink and hashPointerFile.	2016-03-14 15:58:46 -04:00
Joey Hess	1df49506c4	Correct git-annex info to include unlocked files in v6 repository. An unlocked present file does not have a pointer file in the worktree, so info skipped counting it. It may be that unused was also affected by the problem, but it seemed not to be in my tests. I think because of the use of the associatedFilesFilter. This fix slows down both info and unused a little bit, since they have to query the contents of files from git, but only when handling unlocked files.	2016-03-14 13:14:01 -04:00
Joey Hess	b0081598c7	Fix memory leak in last release, which affected commands like git-annex status when a large non-annexed file is present in the work tree. The whole file was strictly read, and so buffered in memory, and remained buffered for some time when running git-annex status.	2016-02-19 14:45:26 -04:00
Joey Hess	adc27f081a	escape slashes in annex pointer files The problem with having the slashes unescaped is, it broke parsing, since the parser takes the filename to get the part containing the key. That particularly affected URL keys. This makes the format be the same as symlinks point to, which keeps things simple. Existing pointer files will continue to work ok.	2016-02-16 14:10:08 -04:00
Joey Hess	7899f7248a	force strict file read Avoid possibly having the file open still when it gets deleted. Needed on Windows, particularly.	2016-02-15 16:47:34 -04:00
Joey Hess	4d89a1ffd1	allow \r in pointer files git-annex doesn't write \r, but it can be present due to line ending conversions or perhaps user edits.	2016-02-15 16:37:40 -04:00
Joey Hess	f9d79d194b	Windows: Fix v6 unlocked files to actually work. Pointer files were not being treated as annex content, so "git annex get" didn't replace them with the object.	2016-02-15 16:12:18 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	a2c056df65	convert isPointerFile from Annex to IO	2016-01-01 13:22:38 -04:00
Joey Hess	06a8256bf6	always format pointer file with a trailing newline Before the smudge filter added a trailing newline, but other things that wrote formatPointer to a file did not. also some new pointer staging code to use later	2015-12-10 16:06:58 -04:00
Joey Hess	78a6b8ce05	refactor and improve pointer file handling code	2015-12-09 14:27:43 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	6ecd3ff421	diffdriver: New git-annex command, to make git external diff drivers work with annexed files. Closes https://github.com/datalad/datalad/issues/18	2014-11-24 16:14:06 -04:00
Joey Hess	ba42b67c70	Fix bug in automatic merge conflict resolution When one side is an annexed symlink, and the other side is a non-annexed symlink. In this case, git-merge does not replace the annexed symlink in the work tree with the non-annexed symlink, which is different from its handling of conflicts between annexed symlinks and regular files or directories. So, while git-annex generated the correct merge commit, the work tree didn't get updated to reflect it. See comments on bug for additional analysis. Did not add this to the test suite yet; just unloaded a truckload of firewood and am feeling lazy. This commit was sponsored by Adam Spiers.	2014-07-08 13:55:11 -04:00
Joey Hess	67fd06af76	add git annex view command (And a vpop command, which is still a bit buggy.) Still need to do vadd and vrm, though this also adds their documentation. Currently not very happy with the view log data serialization. I had to lose the TDFA regexps temporarily, so I can have Read/Show instances of View. I expect the view log format will change in some incompatible way later, probably adding last known refs for the parent branch to View or something like that. Anyway, it basically works, although it's a bit slow looking up the metadata. The actual git branch construction is about as fast as it can be using the current git plumbing. This commit was sponsored by Peter Hogg.	2014-02-18 18:22:20 -04:00
Joey Hess	1572c460e8	avoid using openFile when withFile can be used Potentially fixes some FD leak if an action on an opened file handle fails for some reason. There have been some hard to reproduce reports of git-annex leaking FDs, and this may solve them.	2014-02-03 10:19:06 -04:00
Joey Hess	b405295aee	hlint test suite still passes	2013-09-25 03:09:06 -04:00
Joey Hess	7b0970b340	Fix inverted logic in last release's fix for data loss bug, that caused git-annex sync on FAT or other crippled filesystems to add symlink standin files to the annex.	2013-07-30 16:08:09 -04:00
Joey Hess	ecdfa40cbe	avoid false positives when detecting core.symlinks=false symlink standin files If the file is > 8192 bytes, it's certainly not a symlink file. And if it contains nuls or newlines or whitespace, it's certainly not a link to annexed content. But it might be a tarball containing a git-annex repo.	2013-07-20 19:28:02 -04:00
Joey Hess	ae341c1a37	avoid reading files that are not symlinks when core.symlinks=false This hack is only needed on FAT filesystems, so there's no point in doing it the rest of the time. And it's possible for there to be a false positive, so it's best to avoid the hack when possible.	2013-07-20 19:14:29 -04:00
Joey Hess	d80a0f62a4	avoid lazy read of file contents On Windows, that means the file could still be open when later code wants to delete it, which fails. Since we're only reading 8k anyway, just read it, strictly. However, avoid reading the whole file strictly, so no getContentsStrict here.	2013-06-17 21:12:09 -04:00
Joey Hess	b7674b464b	typo in comment	2013-06-17 20:45:04 -04:00
Joey Hess	25cb9a48da	fix the day's Windows permissions damage	2013-05-14 20:15:14 -04:00
Joey Hess	8a2ff023a3	convert from internal git path when checking symlink standin file	2013-05-14 15:08:40 -05:00
Joey Hess	e7936b1a34	always try to read symlink; only fall back to looking inside file On Windows with Cygwin, checking out a git-annex repo will create symlinks on disk, so we need to always try to read the symlink, even when core.symlinks says they're not supported.	2013-05-14 14:18:47 -04:00
Joey Hess	03e8594369	fix the day's windows permissions damage	2013-05-12 19:09:48 -04:00
Joey Hess	73d2f8b280	deal with git using / internally, even on DOS	2013-05-12 17:29:49 -05:00

57 commits