git-annex

Author	SHA1	Message	Date
Joey Hess	2e9becf989	typo	2019-01-24 00:10:16 -04:00
Joey Hess	467c3b393d	refactor magic	2019-01-23 12:40:59 -04:00
Joey Hess	47cb1a98b6	remove seemingly bogus sigINT handler stuff I am very doubtful that commit `613e747d91` was right about this doing anything, and I've verified that without it, ctrl-c sends sigINT to child processes, and git-annex get does not continue to the next item. It seems likely that the real problem back then was something catching the async exception. Hard to see how installing a default signal handler could cause any change from default behavior either. One reason to want to get rid of this cruft now is that tasty has a sigINT handler of its own, and this would override it. (Tasty is not currently setting that handler up the way git-annex uses it, due to a problem in tasty, but that will hopefully change.)	2019-01-21 17:21:02 -04:00
Joey Hess	67c5a628eb	fix build with old ghc	2019-01-18 14:09:35 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Joey Hess	c3afb3434d	remove recently added cache from KeyVariety Adding that field broke the Read/Show serialization back-compat, and also the Eq and Ord instances were not blinded to it, which broke git annex fsck and probably more. I think that the new approach used in formatKeyVariety will be nearly as fast, but have not benchmarked it.	2019-01-16 16:33:08 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached serialization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	2be6130053	better function name	2019-01-14 20:59:09 -04:00
Joey Hess	1b6319a2c8	double speed of keyFile Optimising for the common case of nothing needing to be escaped, from 5.434 μs to 1.727 μs. In the uncommon case, it only runs around 70 ns slower.	2019-01-14 20:52:54 -04:00
Joey Hess	d9a33d98cf	remove unused import	2019-01-14 18:29:10 -04:00
Joey Hess	d5bbf123fd	bugfix The first item in the list from split '&' did not start with a '&'	2019-01-14 17:42:18 -04:00
Joey Hess	e0c4ac99b5	convert serializeKey' to strict ByteString The builder produces a lazy ByteString, and L.toStrict has to copy it, but needing to use the builder is no longer the common case; the serialization will normally be cached already as a strict ByteString, and this avoids keyFile' needing to use L.toStrict . serializeKey'	2019-01-14 17:03:46 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	5d98cba923	use ByteStrings when reading annex symlinks and pointers Now there's a ByteString used all the way from disk to Key. The main complication in this conversion was the use of fromInternalGitPath in several places to munge things on Windows. The things that used that were changed to parse the ByteString using either path separator. Also some code that had read from files to a String lazily was changed to read a minimal strict ByteString.	2019-01-14 15:37:08 -04:00
Joey Hess	0a8d93cb8a	convert to ByteString	2019-01-14 14:02:47 -04:00
Joey Hess	1791447cc8	avoid creating work tree files in subdirectories in an edge case A keyName could contain "/", though this is unlikely and certainly only ever could happen with WORM keys. The change to addunused to escape that is no problem at all. The change to VariantFile to escape it means that different versions of git-annex could resolve a merge conflict differently in this case, which is unfortunate. There would be different .variant files used, so the two resolutions would themselves merge together without additional conflicts, but the user would have to clean up the extra .variant files.	2019-01-14 13:14:25 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	ff0a2bee2d	avoid unnecessary conversion from and back to ByteString	2019-01-14 12:40:13 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	2eadb6cd68	convert transitions.log to attoparsec and bytestring-builder Not likely to be any speed gain here, but this completes porting every log file over. And, it let me get rid of code copied from ghc and modified, so simplifying the licensing.	2019-01-10 17:13:30 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	66603d6f75	attoparsec parsers for all new-format uuid-based logs There should be some speed gains here, especially for chunk and remote state logs, which are queried once per key. Now only old-format uuid-based logs still need to be converted to attoparsec.	2019-01-10 13:30:36 -04:00
Joey Hess	1928b82867	marginally faster VectorClock Builder show of a POSIXTime is 7-bit ascii, so no need to use the filesystem encoding on it	2019-01-09 14:17:00 -04:00
Joey Hess	232b1a08f3	simplification now that all logs use Builder	2019-01-09 14:10:05 -04:00
Joey Hess	2fef43dd71	convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining inefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString.	2019-01-09 14:00:35 -04:00
Joey Hess	de4980ef85	simplify Show instance by deriving	2019-01-09 13:13:31 -04:00
Joey Hess	2d46038754	converting more log files to use Builder Probably not any particular speedup in this, since most of these logs are not written to often. Possibly chunk log writing is sped up, but writes to chunk logs are interleaved with expensive data transfers to remotes, so unlikely to be a noticeable speedup.	2019-01-09 13:06:37 -04:00
Joey Hess	cb375977a6	follow-on changes from MetaData type changes Including writing and parsing the metadata log files with bytestring-builder and attoparsec.	2019-01-07 15:51:05 -04:00
Joey Hess	ef8ddaa713	attoparsec parser for presence logs	2019-01-03 15:27:29 -04:00
Joey Hess	bfc9039ead	convert git-annex branch access to ByteStrings and Builders Most of the individual logs are not converted yet, only presence logs have an efficient ByteString Builder implemented so far. The rest convert to and from String.	2019-01-03 13:21:48 -04:00
Joey Hess	53905490df	convert Git.HashObject to use ByteStrings Both lazy and strict, because sometimes it's more efficient to build a small strict bytestring, and other times better to lazily stream.	2019-01-03 13:21:01 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	b3c69eaaf8	strict bytestring encoders and decoders Only had lazy ones before. Already sped up a few parts of the code.	2019-01-01 14:55:15 -04:00
Joey Hess	9cc6d5549b	convert UUID from String to ByteString This should make == comparison of UUIDs somewhat faster, and perhaps a few other operations around maps of UUIDs etc. FromUUID/ToUUID are used to convert String, which is still used for all IO of UUIDs. Eventually the hope is those instances can be removed, and all git-annex branch log files etc use ByteString throughout, for a real speed improvement. Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually contains only alphanumerics and so could be treated as ascii, it's conceivable that some git-annex repository has been initialized using a UUID that is not only not a canonical UUID, but contains high unicode or invalid unicode. Using the filesystem encoding avoids any problems with such a thing. However, a NUL in a UUID seems extremely unlikely, so I didn't use encodeBS / decodeBS to avoid their extra overhead in handling NULs. The Read/Show instance for UUID luckily serializes the same way for ByteString as it did for String.	2019-01-01 14:45:33 -04:00
Joey Hess	84e71dae2e	comment typo	2018-12-30 15:51:20 -04:00
Joey Hess	a26514d67e	Fix doubled progress display when downloading an url when -J is used. downloadUrl uses meteredFile, which sets up one progress meter, and Remote.Web also uses metered, so two progress meters are displayed for the same download. Reversion introduced with the http-conduit switch in `c34152777b` -- I don't know why the extra call to metered was added there. When -J is not used, the extra progress meter didn't display, but an extra blank line did get output, which is also fixed. This commit was sponsored by John Pellman on Patreon.	2018-12-30 12:29:49 -04:00
Joey Hess	5759e93444	honor init --version=5 on crippled filesystem init: When --version=5 is passed on a crippled filesystem, use a v5 direct mode repo as requested, rather than upgrading to v7 adjusted unlocked. Fixed test suite on crippled filesystems, making it request --version=5 to test direct mode.	2018-12-19 13:17:04 -04:00
Joey Hess	6d381df0e6	sync --content: Fix dropping unwanted content from the local repository This fixes a bug with the numcopies counting when using sync --content. It did not always pass the local repo uuid to handleDropsFrom, and so the numcopies counting was off by one, and unwanted local content would only be dropped when there were numcopies+1 remote copies. Also, support dropping local content that has reached an exporttree remote that is not untrusted (currently only S3 remotes with versioning).	2018-12-18 13:58:12 -04:00
Joey Hess	bbf7dcc193	fix bugs involving v7 unlocked files and direct mode * Fix bug upgrading from direct mode to v7: when files in the repository were already committed as v7 unlocked files elsewhere, and the content was present in the direct mode repository, the annexed files got their full content checked into git. * Fix bug that caused v7 unlocked files in a direct mode repository to get locked when committing. This commit was sponsored by Nick Piper on Patreon.	2018-12-11 13:47:35 -04:00
Joey Hess	992110c1be	remove debug	2018-12-11 13:10:33 -04:00
Joey Hess	11dbb829bc	Fix a case where upgrade to v7 caused git to think that unlocked files were modified When a file was already unlocked, but the annex object was present, the upgrade process populated the unlocked file, but neglected to update the index. This commit was sponsored by Jochen Bartl on Patreon.	2018-12-11 13:05:03 -04:00
Joey Hess	029ae8d4db	support findref and --branch with file matching options * findref: Support file matching options: --include, --exclude, --want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin * Commands supporting --branch now apply file matching options --include, --exclude, --want-get, --want-drop to filenames from the branch. Previously, combining --branch with those would fail to match anything. * add, import, findref: Support --time-limit. This commit was sponsored by Jake Vosloo on Patreon.	2018-12-09 13:38:35 -04:00
Joey Hess	aa8243df4c	dropunused edge case when annex.thin caused unused object to be modified dropunused: When an unused object file has gotten modified, eg due to annex.thin being set, don't silently skip it, but display a warning and let --force drop it. This commit was sponsored by Ethan Aubin.	2018-12-04 12:20:34 -04:00
Joey Hess	865d556103	fix init in crippled filesystem version issues * init: When a crippled filesystem causes an adjusted unlocked branch to be used, set repo version to 7, which it neglected to do before. * init: When on a crippled filesystem, and the git version is too old to use an adjusted unlocked branch, fall back to using direct mode. This commit was sponsored by Ilya Shlyakhter on Patreon.	2018-12-03 12:57:23 -04:00
Joey Hess	efbf889e36	clarify comment	2018-11-30 12:37:45 -04:00
Joey Hess	ecdba3ed3f	When running youtube-dl to get a filename, pass --no-playlist Seems that youtube-dl --get-filename on a playlist lists all the filenames for the playlist, which can take quite some time. The code already only took the first name, so --no-playlist can speed it up a lot. This commit was sponsored by Brett Eisenberg on Patreon.	2018-11-28 17:14:47 -04:00
Joey Hess	65bb30bcf5	fix accidental commit	2018-11-20 11:43:33 -04:00
Joey Hess	9c0cece35a	followup	2018-11-19 18:12:03 -04:00
Joey Hess	9127fe4821	add DebugLocks build flag Using the method described in https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell but my own code to implement it, and with callstacks added. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-19 15:02:43 -04:00

1172 commits