git-annex

Author	SHA1	Message	Date
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	4932972487	fix STM deadlock `659640e224` was buggy, it had a STM deadlock because two actions both wanted to takeTMVar the WorkerPool and so blocked one-another. Fixed by completely reworking how the pool is maintained. Maintenance threads now wait for the Async actions and update the WorkerPool. This means twice as many threads as before, but green threads so will only use a few extra bytes ram per thread.	2019-06-05 20:07:35 -04:00
Joey Hess	659640e224	separate queue for cleanup actions When running multiple concurrent actions, the cleanup phase is run in a separate queue than the main action queue. This can make some commands faster, because less time is spent on bookkeeping in between each file transfer. But as far as I can see, nothing will be sped up much by this yet, because all the existing cleanup actions are very light-weight. This is just groundwork for deferring checksum verification to cleanup time. This change does mean that if the user expects -J2 will mean that they see no more than 2 jobs running at a time, they may be surprised to see 4 in some cases (if the cleanup actions are slow enough to notice). It might also make sense to enable background cleanup without the -J, for at least one cleanup action. Indeed, that's the behavior that -J1 has now. At some point in the future, it may make sense to make the behavior with no -J the same as -J1. The only reason it's not currently is that git-annex can build w/o concurrent-output, and also any bugs in concurrent-output (such as perhaps misbehaving on non-VT100 compatible terminals) are avoided by default by only using it when -J is used.	2019-06-05 17:54:35 -04:00
Joey Hess	c04b2af3e1	improved WorkerPool abstraction No behavior changes.	2019-06-05 14:26:48 -04:00
Joey Hess	0b5cc89687	wording	2019-06-05 12:20:11 -04:00
Joey Hess	6e51b9ae88	clarify	2019-06-04 21:49:53 -04:00
Joey Hess	500f72ec3d	comment typo	2019-06-04 14:40:07 -04:00
Joey Hess	1871295765	rename annex.security.allowed-http-addresses Renamed annex.security.allowed-http-addresses to annex.security.allowed-ip-addresses because it is not really specific to the http protocol, also limiting eg, git-annex's use of ftp and via youtube-dl, several other protocols. The old name for the config will still work. If both old and new name are set, the new name will win.	2019-05-30 12:43:40 -04:00
Joey Hess	e06feb7316	honor preferred content when importing Importing from a special remote honors its preferred content too; unwanted files are not imported. But, some preferred content expressions can't be checked before files are imported, and trying to import with such an expression will fail. Tested this with scenarios including changing the preferred content expression and making sure merging the import didn't delete files that were no longer wanted. There was one minor inefficiency mentioned in the todo that I punted on.	2019-05-21 14:38:06 -04:00
Joey Hess	82186ca58f	annex.jobs=cpus etc Added the ability to run one job per CPU (core), by setting annex.jobs=cpus, or using option --jobs=cpus or -Jcpus. Built with future expansion in mind, including not defaulting matching on Concurrency so more constructors can later be added, and using "cpu" instead of "0".	2019-05-10 13:27:08 -04:00
Joey Hess	9dd764e6f7	Added mimeencoding= term to annex.largefiles expressions. * Added mimeencoding= term to annex.largefiles expressions. This is probably mostly useful to match non-text files with eg "mimeencoding=binary" * git-annex matchexpression: Added --mimeencoding option.	2019-04-30 12:17:22 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	8ae0db925b	fix name of annex-tracking-branch config	2019-03-11 13:56:59 -04:00
Joey Hess	2912429640	better indicate when special remotes do not support renameExport Avoid a warning message when renameExport is not supported, and just fallback to deleting with a subsequent re-upload. Especially needed for importtree remotes, where renameExport needs to be disabled. This changes the external special remote protocol, but in a backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older version of git-annex will cause it to make renameExport return False.	2019-03-11 12:53:24 -04:00
Joey Hess	3b412aaae0	simplify Applicative instance	2019-03-06 16:44:17 -04:00
Joey Hess	c6c5f6336b	avoid whitespace in Arbitrary UUID and empty UUID	2019-03-06 15:44:27 -04:00
Joey Hess	46d33e804a	added checkPresentExportWithContentIdentifier Ugh, don't like needing to add this, but I can't see a way around it.	2019-03-05 16:03:03 -04:00
Joey Hess	8c54604e67	import+export from directory special remote fully working Had to add two more API calls to override export APIs that are not safe for use in combination with import. It's unfortunate that removeExportDirectory is documented to be allowed to remove non-empty directories. I'm not entirely sure why it's that way, my best guess is it was intended to make it easy to implement with just rm -rf.	2019-03-05 14:20:14 -04:00
Joey Hess	aaacf431d8	handle importtree=yes config For now, it's only allowed when exporttree=yes is also set. That simplified the implementation, but could later be changed if there's a remote that makes sense to be an import but not an export. However, it may work just as well to make a remote be readonly to prevent export to it while still allowing import.	2019-03-04 16:07:35 -04:00
Joey Hess	88ccfaa78c	storeExportWithContentIdentifierM for directory special remote Not sure if my reasoning about the races really holds. It would certainly be possible to better guard against races by using Linux-specific renameat2 with RENAME_EXCHANGE or RENAME_NOREPLACE. Or by using link and relying on it not overwriting existing files -- but that would need a filesystem that supports hard links and directory can be used in filesystems that don't.	2019-03-04 14:46:25 -04:00
Joey Hess	e2e57f8556	initial export support for directory special remote This does not guard against race condition yet, it's only for testing purposes.	2019-02-27 13:42:34 -04:00
Joey Hess	45aacd888b	import downloader complete (untested) Made some api changes. listImportableContents needs to provide the size of the data, so the downloader can check disk free space. retrieveExportWithContentIdentifier is passed the filepath to write to. Use temporary "CID" key during download of a ContentIdentifier from a remote, so withTmp can be used and then move the content to the real key once it's known.	2019-02-27 13:15:02 -04:00
Joey Hess	f4b773e9a1	incomplete action to download files from import	2019-02-26 15:25:28 -04:00
Joey Hess	4747fa923d	export: Deprecated the --tracking option. Instead, users can configure remote.<name>.annex-tracking-branch themselves.	2019-02-23 15:54:33 -04:00
Joey Hess	bab6c570b0	buildImportTrees is fully working buildImportCommit not yet tested	2019-02-22 12:41:17 -04:00
Joey Hess	8fdea8f444	WIP Added graftTree but it's buggy. Should use graftTree in Annex.Branch.graftTreeish; it will be faster than the current implementation there. Started Annex.Import, but untested and it doesn't yet handle tree grafting.	2019-02-21 17:32:59 -04:00
Joey Hess	fd304dce60	split out Types.Import and some changes to the types in it	2019-02-21 13:39:09 -04:00
Joey Hess	936aee6a60	quickcheck property for parsing of content identifier logs	2019-02-21 13:17:43 -04:00
Joey Hess	ccc0684d21	no remotes support import yet	2019-02-20 16:59:04 -04:00
Joey Hess	0442842622	add import tree interface to Remote	2019-02-20 15:35:22 -04:00
Joey Hess	d839c2110a	fix encoding of metadata containing newlines This fixes a reversion in the ByteString conversion. The old code used isSpace to decide when the metadata value needs to be base64 encoded, and that incorrectly changed to only checking if it contained ' '. Note that only '\n' and '\r' were added and not other sorts of whitespace that isSpace matches, like '\t' and '\v'. Only the former would cause problems.	2019-02-20 14:26:18 -04:00
Joey Hess	1b8026b2cb	constrain Arbitrary MetaField to ascii Same reason other Arbitrary's have been. I saw a test failure on Windows that was probably caused by non-ascii there.	2019-02-18 17:50:06 -04:00
Joey Hess	9cebfd7002	purify exportActions Purifying exportActions will allow introspecting and modifying it, which is needed to add progress bar display to it. Only S3 and WebDAV ran an Annex action while constructing ExportActions. There was a small performance gain from them doing that, since a resource was able to be prepared and reused for multiple actions by Command.Export. As seen in commits `809cfbbd8a` and `5d394023eb`, S3 and WebDAV actually create a new handle for each access in normal, non-export use. It doesn't seem worth making export use of them marginally more efficient than normal use. It would be better to do that work upfront when constructing the remote. Or perhaps use a MVar to cache a handle. This commit was sponsored by Nick Piper on Patreon.	2019-01-30 15:11:40 -04:00
Joey Hess	f76c4a0973	avoid Arbitrary generating excessively long lists Turns out what it was doing often generated too long lists, or spun with suchThat rejecting too large numbers. Limit lists to 10.	2019-01-21 13:50:24 -04:00
Joey Hess	e8ff3c3e73	fix build with old ghc	2019-01-18 14:08:10 -04:00
Joey Hess	a5764c4a78	fix build with old ghc	2019-01-18 13:59:29 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Joey Hess	c3afb3434d	remove recently added cache from KeyVariety Adding that field broke the Read/Show serialization back-compat, and also the Eq and Ord instances were not blinded to it, which broke git annex fsck and probably more. I think that the new approach used in formatKeyVariety will be nearly as fast, but have not benchmarked it.	2019-01-16 16:33:08 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached serialization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	151562b537	convert key2file and file2key to use builder and attoparsec The new parser is significantly stricter than the old one: The old file2key allowed the fields to come in any order, but the new one requires the fixed order that git-annex has always used. Hopefully this will not cause any breakage. And the old file2key allowed eg SHA1-m1-m2-m3-m4-m5-m6--xxxx while the new does not allow duplication of fields. This could potentially improve security, because allowing lots of extra junk like that in a key could potentially be used in a SHA1 collision attack, although the current attacks need binary data and not this kind of structured numeric data. Speed improved of course, and fairly substantially, in microbenchmarks: benchmarking old/key2file time 2.264 μs (2.257 μs .. 2.273 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.265 μs (2.260 μs .. 2.275 μs) std dev 21.17 ns (13.06 ns .. 39.26 ns) benchmarking new/key2file' time 1.744 μs (1.741 μs .. 1.747 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.745 μs (1.742 μs .. 1.751 μs) std dev 13.55 ns (9.099 ns .. 21.89 ns) benchmarking old/file2key time 6.114 μs (6.102 μs .. 6.129 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 6.118 μs (6.106 μs .. 6.143 μs) std dev 55.00 ns (30.08 ns .. 100.2 ns) benchmarking new/file2key' time 1.791 μs (1.782 μs .. 1.801 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 1.792 μs (1.785 μs .. 1.804 μs) std dev 32.46 ns (20.59 ns .. 50.82 ns) variance introduced by outliers: 19% (moderately inflated)	2019-01-11 16:33:42 -04:00
Joey Hess	b552551b33	use ByteString in Key for speed This is an easy win for parseKeyVariety: benchmarking old/parseKeyVariety time 1.515 μs (1.512 μs .. 1.517 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.515 μs (1.513 μs .. 1.517 μs) std dev 6.417 ns (4.992 ns .. 8.113 ns) benchmarking new/parseKeyVariety time 54.97 ns (54.70 ns .. 55.40 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 55.42 ns (55.05 ns .. 56.03 ns) std dev 1.562 ns (969.5 ps .. 2.442 ns) variance introduced by outliers: 44% (moderately inflated) For formatKeyVariety, using a Builder is marginally worse than building a String... (This is with criterion evaluating fully to nf not whnf) benchmarking old/formatKeyVariety time 434.3 ns (428.0 ns .. 440.4 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 430.6 ns (428.2 ns .. 433.9 ns) std dev 9.166 ns (6.932 ns .. 11.94 ns) variance introduced by outliers: 27% (moderately inflated) benchmarking Builder/formatKeyVariety time 526.5 ns (524.7 ns .. 528.8 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 526.1 ns (524.9 ns .. 528.5 ns) std dev 5.687 ns (3.762 ns .. 8.000 ns) Manually building the ByteString was better, but still slightly slower than String, due to inefficient need to S.pack . show the HashSize: benchmarking formatKeyVariety time 459.5 ns (455.8 ns .. 463.2 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 459.9 ns (457.4 ns .. 466.6 ns) std dev 11.65 ns (6.860 ns .. 21.41 ns) variance introduced by outliers: 35% (moderately inflated) So I cheated and made parseKeyVariety cache the original ByteString, for formatKeyVariety to use instead of re-building it. Final benchmark: benchmarking new/formatKeyVariety time 50.64 ns (50.57 ns .. 50.73 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 51.05 ns (50.60 ns .. 52.71 ns) std dev 2.790 ns (259.6 ps .. 5.916 ns) variance introduced by outliers: 75% (severely inflated) benchmarking new/parseKeyVariety time 71.88 ns (71.54 ns .. 72.24 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 71.97 ns (71.69 ns .. 72.47 ns) std dev 1.249 ns (910.7 ps .. 1.791 ns) variance introduced by outliers: 22% (moderately inflated)	2019-01-11 16:32:51 -04:00
Joey Hess	ed8d9a29fe	add missing case	2019-01-10 17:17:37 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	6f66b53a30	newtype Group to ByteString This may speed up queries for things in groups, due to Eq and Ord being faster.	2019-01-09 15:05:49 -04:00
Joey Hess	2fef43dd71	convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining inefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString.	2019-01-09 14:00:35 -04:00
Joey Hess	16c798b5ef	switch MetaValue to ByteString and MetaField to Text MetaField was already limited to alphanumerics, so it makes sense to use Text for it. Note that technically a UUID can contain invalid UTF-8, and so remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8 values with '?' or whatever. In practice, a UUID is usually also text, I only kept open the possibility of it containing invalid UTF-8 to avoid breaking parsing of strange UUIDs in git-annex branch files. So, I decided to let this edge case slip by. Have not updated the rest of the code base yet for this change, as the change took 2.5 hours longer than I expected to get working properly.	2019-01-07 14:18:24 -04:00
Joey Hess	11d6e2e260	new improved benchmark command that can benchmark anything git-annex does	2019-01-04 13:46:36 -04:00
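
The stricter key parsing described in commit 151562b537 (fields must appear in one fixed order, and no field may be repeated) can be illustrated with a small sketch. This is not git-annex's code, which is Haskell built on attoparsec; it is a Python approximation for illustration only. The function name `parse_key_strict`, the returned dictionary layout, and the assumed field order (`s` size, `m` mtime, `S` chunk size, `C` chunk number) are assumptions, not taken from the log above.

```python
import re

# Assumed fixed field order (hypothetical, for illustration):
# size (s), mtime (m), chunk size (S), chunk number (C).
FIELD_ORDER = "smSC"


def parse_key_strict(serialized):
    """Parse 'VARIETY[-sN][-mN][-SN][-CN]--NAME'.

    Returns a dict on success, or None if the fields are malformed,
    out of order, or duplicated.
    """
    head, sep, name = serialized.partition("--")
    if not sep or not name:
        return None
    variety, *fields = head.split("-")
    if not variety:
        return None
    parsed = {}
    position = 0  # index into FIELD_ORDER of the next field still allowed
    for field in fields:
        match = re.fullmatch(r"([smSC])(\d+)", field)
        if match is None:
            return None
        index = FIELD_ORDER.index(match.group(1))
        if index < position:
            # Either a repeated field or one that arrives out of order.
            return None
        position = index + 1
        parsed[match.group(1)] = int(match.group(2))
    return {"variety": variety, "fields": parsed, "name": name}
```

Under these assumptions, `parse_key_strict("SHA256-s100-m5--abc")` succeeds, while both the reordered `"SHA256-m5-s100--abc"` and the duplicated-field example from the commit message, `"SHA1-m1-m2-m3-m4-m5-m6--xxxx"`, are rejected. Tracking a single advancing position is what lets one check cover both the ordering rule and the no-duplicates rule.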


491 commits