git-annex

Author	SHA1	Message	Date
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	ee251b2e2e	implement updating the ContentIdentifier db with info from the git-annex branch untested This won't be super slow, but it does need to diff two likely large trees, and since the git-annex branch rarely sits still, it will most likely be run at the beginning of every import. A possible speed improvement would be to only run this when the database did not contain a ContentIdentifier. But that would only speed up imports when there is no new version of a file on the special remote, at most renames of existing files being imported. A better speed improvement would be to record something in the git-annex branch that indicates when an import has been run, and only do the diff if the git-annex branch has record of a newer import than we've seen before. Then, it would only run when there is in fact new ContentIdentifier information available from a remote. Certianly doable, but didn't want to complicate things yet.	2019-03-06 18:04:30 -04:00
Joey Hess	1b3d04979e	speed up slow quickcheck test Only test parsing of ContentIdentifier lists, not the whole log.	2019-03-06 16:43:41 -04:00
Joey Hess	6ef38df881	fix another parser bug	2019-03-06 14:51:31 -04:00
Joey Hess	c0bd202147	fix failing test case An empty list of [ContentIdenfier] serialized to the same thing as a single ContentIdentifier "". Avoid this ambiguity by requiring the list be non-empty.	2019-03-06 14:27:15 -04:00
Joey Hess	b0fe4916b7	forgot to change list delimiter in parser	2019-03-06 11:59:42 -04:00
Joey Hess	7af55de83c	optimisation: use graftTree to remember the export branch Sped up git-annex export in repositories with lots of keys. Old method read whole git-annex branch tree into memory.	2019-02-22 11:16:22 -04:00
Joey Hess	56137ce0d2	use colon not space to delimit content identifier list InodeCache serializes to a value with spaces, and seems likely other things will too, and want to avoid unncessary base64 of content identifiers when possible.	2019-02-21 13:45:16 -04:00
Joey Hess	1f6339ade7	fix parsing of empty content identifier Seems very unlikely an empty content identifier would be used, but quickcheck found this bug.	2019-02-21 13:44:09 -04:00
Joey Hess	9887a378fe	renamings to make clean when old-format logs are being used	2019-02-21 13:43:44 -04:00
Joey Hess	936aee6a60	quickcheck property for parsing of content identifier logs	2019-02-21 13:17:43 -04:00
Joey Hess	5a294f0dd7	add Logs.ContentIdentifier	2019-02-20 17:22:56 -04:00
Joey Hess	a818bc5e73	add Database.ContentIdentifier Does not yet have a way to update with new information from the git-annex branch, which will be needed when multiple repos are importing from the same remote.	2019-02-20 16:59:10 -04:00
Joey Hess	e8bfc3640b	storing ContentIdentifier in the git-annex branch	2019-02-20 15:40:07 -04:00
Joey Hess	ad1d422dd7	fix false positive in export conflict detection Like the earlier fixed one in Command.Export, it occurred when the same tree was exported by multiple clones. Previous fix was incomplete since several other places looked at the list of exported trees to detect when there was an export conflict. Added a single unified function to avoid missing any places it needed to be fixed. This commit was sponsored by mo on Patreon.	2019-01-30 12:36:30 -04:00
Joey Hess	d913961b7b	avoid shadowing warning	2019-01-21 20:40:30 -04:00
Joey Hess	b79ef475ac	still fixing this sigh	2019-01-21 18:39:25 -04:00
Joey Hess	f603fc252d	fixed a different way this test case could fail in LANG=C	2019-01-21 16:46:40 -04:00
Joey Hess	0aea1a7b93	fix inverted logic	2019-01-21 15:41:35 -04:00
Joey Hess	e15d18058b	add back exclusion of whitespace and =	2019-01-21 12:38:00 -04:00
Joey Hess	928e08904d	avoid two test failures with LANG=C Log.Remote.prop_parse_show_Config failed on an input of fromList [("\28162","")] in LANG=C, encodeBS "\28162" == "\STX=", while in UTF-8 locale, encodeBS "\28162" == "\230\184\130". So in the C locale, the String that's the parsed Map key ends up being encoded differently than it was in the input Map. Logs.Presence.Pure.prop_parse_build_log was failing in LANG=C because the Arbitrary LogLine for some reason sometimes generated LogInfo values containing \n or \r, despite using suchThat to prevent that. I don't understand why at all, but switching the suchThat to filter the ByteString instead of the String before conversion with encodeBS somehow avoids the problem. Both of these suggest something wonky with encodeBS in LANG=C, but I think it's not a problem except for with test data generated by Arbitrary.	2019-01-18 13:33:48 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	2eadb6cd68	convert transitions.log to attoparsec and bytestring-builder Not likely to be any speed gain here, but this completes porting every log file over. And, it let me get rid of code copied from ghc and modified, so simplifying the licensing.	2019-01-10 17:13:30 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	66603d6f75	attoparsec parsers for all new-format uuid-based logs There should be some speed gains here, especially for chunk and remote state logs, which are queried once per key. Now only old-format uuid-based logs still need to be converted to attoparsec.	2019-01-10 13:30:36 -04:00
Joey Hess	7e54c215b4	Remove redundant endOfInput parseLogLines consumes till endOfInput	2019-01-10 12:42:59 -04:00
Joey Hess	6f66b53a30	newtype Group to ByteString This may speed up queries for things in groups, due to Eq and Ord being faster.	2019-01-09 15:05:49 -04:00
Joey Hess	2fef43dd71	convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining innefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString.	2019-01-09 14:00:35 -04:00
Joey Hess	de4980ef85	simplify Show instance by deriving	2019-01-09 13:13:31 -04:00
Joey Hess	11f4d993b5	fix format of show instance It's only used to show test suite failures..	2019-01-09 13:12:25 -04:00
Joey Hess	2d46038754	converting more log files to use Builder Probably not any particular speedup in this, since most of these logs are not written to often. Possibly chunk log writing is sped up, but writes to chunk logs are interleaved with expensive data transfers to remotes, so unlikely to be a noticiable speedup.	2019-01-09 13:06:37 -04:00
Joey Hess	cb375977a6	follow-on changes from MetaData type changes Including writing and parsing the metadata log files with bytestring-builder and attoparsec.	2019-01-07 15:51:05 -04:00
Joey Hess	ef8ddaa713	attoparsec parser for presence logs	2019-01-03 15:27:29 -04:00
Joey Hess	bfc9039ead	convert git-annex branch access to ByteStrings and Builders Most of the individual logs are not converted yet, only presense logs have an efficient ByteString Builder implemented so far. The rest convert to and from String.	2019-01-03 13:21:48 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	9cc6d5549b	convert UUID from String to ByteString This should make == comparison of UUIDs somewhat faster, and perhaps a few other operations around maps of UUIDs etc. FromUUID/ToUUID are used to convert String, which is still used for all IO of UUIDs. Eventually the hope is those instances can be removed, and all git-annex branch log files etc use ByteString throughout, for a real speed improvement. Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually contains only alphanumerics and so could be treated as ascii, it's conceivable that some git-annex repository has been initialized using a UUID that is not only not a canonical UUID, but contains high unicode or invalid unicode. Using the filesystem encoding avoids any problems with such a thing. However, a NUL in a UUID seems extremely unlikely, so I didn't use encodeBS / decodeBS to avoid their extra overhead in handling NULs. The Read/Show instance for UUID luckily serializes the same way for ByteString as it did for String.	2019-01-01 14:45:33 -04:00
Joey Hess	9127fe4821	add DebugLocks build flag Using the method described in https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell but my own code to implement it, and with callstacks added. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-19 15:02:43 -04:00
Joey Hess	2e9f128dea	moved module and relicensed	2018-10-29 23:13:36 -04:00
Joey Hess	917a2c6095	defer updating unlocked files until after smudge filter The smuge filter no longer provides git with annexed file content, to avoid a git memory leak, and because that did not honor annex.thin. git annex smudge --update has to be run after a checkout to update unlocked files in the working tree with annexed file contents. No hooks yet to run it. This commit was sponsored by Nick Piper on Patreon.	2018-10-25 15:08:20 -04:00
Joey Hess	451171b7c1	clean up url removal presence update * rmurl: Fix a case where removing the last url left git-annex thinking content was still present in the web special remote. * SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING used to update the presence information of the external special remote that called them; this was not documented behavior and is no longer done. Done by making setUrlPresent and setUrlMissing only update presence info for the web, and only when the url is a web url. See the comment for reasoning about why that's the right thing to do. In AddUrl, had to make it update location tracking, to handle the non-web-url case. This commit was sponsored by Ewen McNeill on Patreon.	2018-10-04 17:35:49 -04:00
Joey Hess	0a7c5a9982	dropdead per-remote metadata Had to refactor pure code into separate modules so it is accessible inside Annex.Branch.Transitions. This commit was sponsored by Peter on Patreon.	2018-09-05 13:52:46 -04:00
Joey Hess	b3d42283ad	use per-remote metadata storage for S3 version ID Since the same key can be stored in a versioned S3 bucket multiple times with different version IDs, this allows tracking them all. Not currently needed, but if we ever want to drop from a versioned S3 bucket, we'll need to know them all. This commit was supported by the NSF-funded DataLad project.	2018-08-31 13:27:29 -04:00
Joey Hess	5c99f6247e	per-remote metadata storage Actually very straightforward reuse of the metadata log file code. Although I had to add a todo item as git-annex forget won't clean up dead remote's metadata yet. This would be worth adding to the external special remote interface sometime. Have not opened a todo though, guess I'll wait until something needs it. This commit was supported by the NSF-funded DataLad project.	2018-08-31 12:23:22 -04:00
Joey Hess	358178fbfb	don't untrust appendonly exports Make exporttree=yes remotes that are appendonly not be untrusted, and not force verification of content, since the usual concerns about losing data when an export is updated by someone else don't apply. Note that all the remote operations on keys are left as usual for appendonly export remotes, except for storing content. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:48:04 -04:00
Joey Hess	10138056dc	v6: avoid accidental conversion when annex.largefiles is not configured v6: When annex.largefiles is not configured for a file, running git add or git commit, or otherwise using git to stage a file will add it to the annex if the file was in the annex before, and to git otherwise. This is to avoid accidental conversion. Note that git-annex add's behavior has not changed, for reasons explained in the added comment. Performance: No added overhead when annex.largefiles is configured. When not configured, there is an added call to catObjectMetaData, which involves a round trip through git cat-file --batch. However, the earlier catKeyFile primes the cache for it. This commit was supported by the NSF-funded DataLad project.	2018-08-27 14:51:10 -04:00
Joey Hess	ae11394efa	added annex.commitmessage Added annex.commitmessage config that can specify a commit message for the git-annex branch instead of the usual "update". This commit was supported by the NSF-funded DataLad project.	2018-08-02 14:06:06 -04:00
Joey Hess	2fc768ce72	avoid git annex info remote buffering list of keys This leaves git annex unused --from remote still using loggedKeysFor and buffering more than ought to be necessary, but I can't see a way to improve that.	2018-04-26 16:13:05 -04:00
Joey Hess	bea0ad220a	avoid --all buffering list of all keys In Annex.Branch.branch, the (++) was killing laziness. Rewrote so it streams lazily. filterM also kills laziness, so made loggedKeys use a Unchecked type, and check if the key is dead in the seek loop. Note that loggedKeysFor still buffers, so git-annex info <remote> and git-annex unused --from remote still use more memory than necessary. Also removed some unused functions from Annex.Journal.	2018-04-26 16:00:20 -04:00
Joey Hess	256d8f07e8	avoid insertWith' depreaction warning Switch to Data.Map.Strict everywhere that used it. There are still lots of lazy maps in git-annex. I think switching these is safe. The risk is that there might be a map that is used in a way that relies on the values not being evaluated to WHNF, and switching to strict might result in bad performance or memory use. So, I have not switched everything.	2018-04-22 13:28:31 -04:00

1 2 3 4 5 ...

392 commits