git-annex

Author	SHA1	Message	Date
Joey Hess	f603fc252d	fixed a different way this test case could fail in LANG=C	2019-01-21 16:46:40 -04:00
Joey Hess	0aea1a7b93	fix inverted logic	2019-01-21 15:41:35 -04:00
Joey Hess	e15d18058b	add back exclusion of whitespace and =	2019-01-21 12:38:00 -04:00
Joey Hess	928e08904d	avoid two test failures with LANG=C Log.Remote.prop_parse_show_Config failed on an input of fromList [("\28162","")] in LANG=C, encodeBS "\28162" == "\STX=", while in UTF-8 locale, encodeBS "\28162" == "\230\184\130". So in the C locale, the String that's the parsed Map key ends up being encoded differently than it was in the input Map. Logs.Presence.Pure.prop_parse_build_log was failing in LANG=C because the Arbitrary LogLine for some reason sometimes generated LogInfo values containing \n or \r, despite using suchThat to prevent that. I don't understand why at all, but switching the suchThat to filter the ByteString instead of the String before conversion with encodeBS somehow avoids the problem. Both of these suggest something wonky with encodeBS in LANG=C, but I think it's not a problem except for with test data generated by Arbitrary.	2019-01-18 13:33:48 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	2eadb6cd68	convert transitions.log to attoparsec and bytestring-builder Not likely to be any speed gain here, but this completes porting every log file over. And, it let me get rid of code copied from ghc and modified, so simplifying the licensing.	2019-01-10 17:13:30 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	66603d6f75	attoparsec parsers for all new-format uuid-based logs There should be some speed gains here, especially for chunk and remote state logs, which are queried once per key. Now only old-format uuid-based logs still need to be converted to attoparsec.	2019-01-10 13:30:36 -04:00
Joey Hess	7e54c215b4	Remove redundant endOfInput parseLogLines consumes till endOfInput	2019-01-10 12:42:59 -04:00
Joey Hess	6f66b53a30	newtype Group to ByteString This may speed up queries for things in groups, due to Eq and Ord being faster.	2019-01-09 15:05:49 -04:00
Joey Hess	2fef43dd71	convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining innefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString.	2019-01-09 14:00:35 -04:00
Joey Hess	de4980ef85	simplify Show instance by deriving	2019-01-09 13:13:31 -04:00
Joey Hess	11f4d993b5	fix format of show instance It's only used to show test suite failures..	2019-01-09 13:12:25 -04:00
Joey Hess	2d46038754	converting more log files to use Builder Probably not any particular speedup in this, since most of these logs are not written to often. Possibly chunk log writing is sped up, but writes to chunk logs are interleaved with expensive data transfers to remotes, so unlikely to be a noticiable speedup.	2019-01-09 13:06:37 -04:00
Joey Hess	cb375977a6	follow-on changes from MetaData type changes Including writing and parsing the metadata log files with bytestring-builder and attoparsec.	2019-01-07 15:51:05 -04:00
Joey Hess	ef8ddaa713	attoparsec parser for presence logs	2019-01-03 15:27:29 -04:00
Joey Hess	bfc9039ead	convert git-annex branch access to ByteStrings and Builders Most of the individual logs are not converted yet, only presense logs have an efficient ByteString Builder implemented so far. The rest convert to and from String.	2019-01-03 13:21:48 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	9cc6d5549b	convert UUID from String to ByteString This should make == comparison of UUIDs somewhat faster, and perhaps a few other operations around maps of UUIDs etc. FromUUID/ToUUID are used to convert String, which is still used for all IO of UUIDs. Eventually the hope is those instances can be removed, and all git-annex branch log files etc use ByteString throughout, for a real speed improvement. Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually contains only alphanumerics and so could be treated as ascii, it's conceivable that some git-annex repository has been initialized using a UUID that is not only not a canonical UUID, but contains high unicode or invalid unicode. Using the filesystem encoding avoids any problems with such a thing. However, a NUL in a UUID seems extremely unlikely, so I didn't use encodeBS / decodeBS to avoid their extra overhead in handling NULs. The Read/Show instance for UUID luckily serializes the same way for ByteString as it did for String.	2019-01-01 14:45:33 -04:00
Joey Hess	9127fe4821	add DebugLocks build flag Using the method described in https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell but my own code to implement it, and with callstacks added. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-19 15:02:43 -04:00
Joey Hess	2e9f128dea	moved module and relicensed	2018-10-29 23:13:36 -04:00
Joey Hess	917a2c6095	defer updating unlocked files until after smudge filter The smuge filter no longer provides git with annexed file content, to avoid a git memory leak, and because that did not honor annex.thin. git annex smudge --update has to be run after a checkout to update unlocked files in the working tree with annexed file contents. No hooks yet to run it. This commit was sponsored by Nick Piper on Patreon.	2018-10-25 15:08:20 -04:00
Joey Hess	451171b7c1	clean up url removal presence update * rmurl: Fix a case where removing the last url left git-annex thinking content was still present in the web special remote. * SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING used to update the presence information of the external special remote that called them; this was not documented behavior and is no longer done. Done by making setUrlPresent and setUrlMissing only update presence info for the web, and only when the url is a web url. See the comment for reasoning about why that's the right thing to do. In AddUrl, had to make it update location tracking, to handle the non-web-url case. This commit was sponsored by Ewen McNeill on Patreon.	2018-10-04 17:35:49 -04:00
Joey Hess	0a7c5a9982	dropdead per-remote metadata Had to refactor pure code into separate modules so it is accessible inside Annex.Branch.Transitions. This commit was sponsored by Peter on Patreon.	2018-09-05 13:52:46 -04:00
Joey Hess	b3d42283ad	use per-remote metadata storage for S3 version ID Since the same key can be stored in a versioned S3 bucket multiple times with different version IDs, this allows tracking them all. Not currently needed, but if we ever want to drop from a versioned S3 bucket, we'll need to know them all. This commit was supported by the NSF-funded DataLad project.	2018-08-31 13:27:29 -04:00
Joey Hess	5c99f6247e	per-remote metadata storage Actually very straightforward reuse of the metadata log file code. Although I had to add a todo item as git-annex forget won't clean up dead remote's metadata yet. This would be worth adding to the external special remote interface sometime. Have not opened a todo though, guess I'll wait until something needs it. This commit was supported by the NSF-funded DataLad project.	2018-08-31 12:23:22 -04:00
Joey Hess	358178fbfb	don't untrust appendonly exports Make exporttree=yes remotes that are appendonly not be untrusted, and not force verification of content, since the usual concerns about losing data when an export is updated by someone else don't apply. Note that all the remote operations on keys are left as usual for appendonly export remotes, except for storing content. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:48:04 -04:00
Joey Hess	10138056dc	v6: avoid accidental conversion when annex.largefiles is not configured v6: When annex.largefiles is not configured for a file, running git add or git commit, or otherwise using git to stage a file will add it to the annex if the file was in the annex before, and to git otherwise. This is to avoid accidental conversion. Note that git-annex add's behavior has not changed, for reasons explained in the added comment. Performance: No added overhead when annex.largefiles is configured. When not configured, there is an added call to catObjectMetaData, which involves a round trip through git cat-file --batch. However, the earlier catKeyFile primes the cache for it. This commit was supported by the NSF-funded DataLad project.	2018-08-27 14:51:10 -04:00
Joey Hess	ae11394efa	added annex.commitmessage Added annex.commitmessage config that can specify a commit message for the git-annex branch instead of the usual "update". This commit was supported by the NSF-funded DataLad project.	2018-08-02 14:06:06 -04:00
Joey Hess	2fc768ce72	avoid git annex info remote buffering list of keys This leaves git annex unused --from remote still using loggedKeysFor and buffering more than ought to be necessary, but I can't see a way to improve that.	2018-04-26 16:13:05 -04:00
Joey Hess	bea0ad220a	avoid --all buffering list of all keys In Annex.Branch.branch, the (++) was killing laziness. Rewrote so it streams lazily. filterM also kills laziness, so made loggedKeys use a Unchecked type, and check if the key is dead in the seek loop. Note that loggedKeysFor still buffers, so git-annex info <remote> and git-annex unused --from remote still use more memory than necessary. Also removed some unused functions from Annex.Journal.	2018-04-26 16:00:20 -04:00
Joey Hess	256d8f07e8	avoid insertWith' depreaction warning Switch to Data.Map.Strict everywhere that used it. There are still lots of lazy maps in git-annex. I think switching these is safe. The risk is that there might be a map that is used in a way that relies on the values not being evaluated to WHNF, and switching to strict might result in bad performance or memory use. So, I have not switched everything.	2018-04-22 13:28:31 -04:00
Joey Hess	f56594af9e	finish fixing inverted Ord for TrustLevel Flipped all comparisons. When a TrustLevel list was wanted from Trusted downwards, used Down to compare it in that order. This commit was sponsored by mo on Patreon.	2018-04-13 15:17:54 -04:00
Joey Hess	a0e4b9678b	fix inverted Ord for TrustLevel (intermediate commit) This commit removes the Ord and Enum instances, commenting out all code that depends on them, to make sure that all code effected by the inversion fix has been identified. (Assuming no ifdefs involve TrustLevel.) The next commit will fix up all the identified code.	2018-04-13 14:50:14 -04:00
Joey Hess	10d3b7fc62	Fix reversion introduced in 6.20171214 that caused concurrent transfers to incorrectly fail with "transfer already in progress". Avoid creating transfer info file before transfer lock is created and locked. The wrong order for one thing caused transfer info to be overwritten when a transfer was already in progress. But worse, it caused checkTransfer to see the transfer info, and so lock the transfer lock in order to verify the transfer was not in progress. Which in a concurrent situation, prevented the transferrer from locking the transfer lock, so it failed with "transfer already in progress". Note that the transferinfo command does not lock the transfer lock before creating the transfer info. But, that's only run after recvkey is running, and recvkey does lock the transfer lock, so that seems more or less ok. (Other than being a super complicated legacy mess that the P2P code has mostly obsoleted now.) This commit was supported by the NSF-funded DataLad project.	2018-03-14 18:55:34 -04:00
Joey Hess	faf03ee4da	more core.sharedRepository perm fixes Fix more places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-01-04 14:46:58 -04:00
Joey Hess	24df95f0f6	Fix several places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account. git grep writeFile finds some more that might also be problems, but for now I've concentrated on .git/annex/ log files. There are certianly cases where writeFile is not a problem too. This commit was sponsored by mo on Patreon.	2018-01-02 17:25:25 -04:00
Joey Hess	edd25f04d9	unused: Write .git/annex/unused etc files with appropriate permissions for the core.sharedRepository config. This commit was sponsored by an anonymous bitcoin donor.	2018-01-02 16:25:27 -04:00
Joey Hess	4e38c4f57f	Allow exporttree remotes to be marked as dead. Union with max so that DeadTrusted wins over UnTrusted. This commit was sponsored by Trenton Cronholm on Patreon.	2017-12-05 13:46:55 -04:00
Joey Hess	3febb79c8f	wip	2017-11-28 17:17:40 -04:00
Joey Hess	7ad1d3210f	remove dead code	2017-11-07 14:18:10 -04:00
Joey Hess	e1ac299ad0	better dup key with -J fix This avoids all the complication about redundant work discussed in the previous try at fixing this. At the expense of needing each command that could have the problem to be patched to simply wrap the action in onlyActionOn once the key is known. But there do not seem to be many such commands. onlyActionOn' should not be used with a CommandStart (or CommandPerform), although the types do allow it. onlyActionOn handles running the whole CommandStart chain. I couldn't immediately see a way to avoid mistken use of onlyActionOn'. This commit was supported by the NSF-funded DataLad project.	2017-10-17 18:48:53 -04:00
Joey Hess	68a49adcda	Improve behavior when -J transfers multiple files that point to the same key After a false start, I found a fairly non-intrusive way to deal with it. Although it only handles transfers -- there may be issues with eg concurrent dropping of the same key, or other operations. There is no added overhead when -J is not used, other than an added inAnnex check. When -J is used, it has to maintain and check a small Set, which should be negligible overhead. It could output some message saying that the transfer is being done by another thread. Or it could even display the same progress info for both files that are being downloaded since they have the same content. But I opted to keep it simple, since this is rather an edge case, so it just doesn't say anything about the transfer of the file until the other thread finishes. Since the deferred transfer action still runs, actions that do more than transfer content will still get a chance to do their other work. (An example of something that needs to do such other work is P2P.Annex, where the download always needs to receive the content from the peer.) And, if the first thread fails to complete a transfer, the second thread can resume it. But, this unfortunately means that there's a risk of redundant work being done to transfer a key that just got transferred. That's not ideal, but should never cause breakage; the same thing can occur when running two separate git-annex processes. The get/move/copy/mirror --from commands had extra inAnnex checks added, inside the download actions. Without those checks, the first thread downloaded the content, and then the second thread woke up and downloaded the same content redundantly. move/copy/mirror --to is left doing redundant uploads for now. It would need a second checkPresent of the remote inside the upload to avoid them, which would be expensive. A better way to avoid redundant work needs to be found.. This commit was supported by the NSF-funded DataLad project.	2017-10-17 17:10:50 -04:00
Joey Hess	099bfc2daf	remove redundant definition of lck	2017-10-03 13:23:10 -04:00
Joey Hess	3cd47f9978	info: Improve cleanup of stale transfer info files. In my git-annex repos, I found some stale transfer info files without lock files. Pass a mode to tryLockExclusive, so it will create the lock file if not present, and so not fail to clean up such transfer info files. Normally, transfer info files are accompanied by a lock file. But, when alwaysRunTransfer is used, the locking can fail and it will still write the transfer info file. Perhaps there are other cases too? Note that mkProgressUpdater's meter writes to the transfer info file too, and it might be possible for that meter to fire after runTransfer has cleaned up. This commit was sponsored by andrea rota.	2017-10-02 13:55:26 -04:00
Joey Hess	4d0e522b72	Warn when metadata is inherited from a previous version of a file to avoid the user being surprised in cases where that behavior is not desired or expected This commit was supported by the NSF-funded DataLad project.	2017-09-28 12:56:35 -04:00
Joey Hess	b31d682539	cleanup imports	2017-09-13 12:53:19 -04:00
Joey Hess	f8fd66d3f8	fix compaction of export.log It was not getting old lines removed, because the tree graft confused the updater, so it union merged from the previous git-annex branch, which still contained the old lines. Fixed by carefully using setIndexSha. This commit was supported by the NSF-funded DataLad project.	2017-09-12 18:30:36 -04:00
Joey Hess	c8ed941a26	change export.log format to support multiple export remotes This breaks backwards compatibility, but only with unreleased versions of git-annex, which I think is acceptable. This commit was supported by the NSF-funded DataLad project.	2017-09-12 17:45:52 -04:00

1 2 3 4 5 ...

375 commits