git-annex

Author	SHA1	Message	Date
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	c9357bdc0e	ifdef persistent-template 2.8.0 fixes The i386ancient build has a ghc too old for these extensions. Build with persistent-template 2.8.0 tested.	2020-02-04 13:53:00 -04:00
Joey Hess	4920df6573	Fix build with newest version of persistent-template. This is untested because of rain, also I am operating from truncated compiler error messages in a bug report that also doesn't mention what the library version is. Still, it should work. May break builds with old ghc, in particular DerivingStrategies is I think fairly new? The pragmas could be ifdefed if necessary. Works with ghc 8.6.5.	2020-02-04 12:03:30 -04:00
Joey Hess	6db4aee7df	use --no-abbrev instead of --abbrev=40 This avoids hardcoding the sha size, so when git uses sha256, it will output the full sha256 and not a truncation to 40 characters. I reviewed git's history, and while there have been some bugs with commands not supporting --no-abbrev (eg git diff --no-index --no-abbrev was broken in git 2.1), none of the commands git-annex uses will be impacted by those old bugs.	2020-01-07 12:29:37 -04:00
Joey Hess	5e4deb3620	support sha256 git repos Git will eventually switch to sha2 and there will not be one single shaSize anymore, but two (40 and 64). Changed all parsers for git plumbing output to support both sizes of shas. One potential problem this does not deal with is, if somewhere in git-annex it reads two shas from different sources, and compares them to see if they're the same sha, it would fail if they're sha1 and sha256 of the same value. I don't know if that will really be a concern.	2020-01-07 12:22:19 -04:00
Joey Hess	02e00fd7ab	Merge branch 'master' into sqlite	2019-12-19 16:33:42 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	535b153381	building again after merge Nice, several conversions fell out.	2019-12-18 15:02:46 -04:00
Joey Hess	d5628a16b8	Merge branch 'bs' into sqlite-bs	2019-12-18 14:51:03 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	2f9a80d803	merging sqlite and bs branches Since the sqlite branch uses blobs extensively, there are some performance benefits, ByteStrings now get stored and retrieved w/o conversion in some cases like in Database.Export.	2019-12-06 15:30:45 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agony of making it build), but still very unmergeable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically (num files: old, new, speedup): 48500: 4.77, 3.73, 28%; 12500: 1.36, 1.02, 66%; 20: 0.075, 0.074, 0% (so startup time is unchanged). That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster, whereis is 3% faster, and get when all files are already present is 5% faster. Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	70a8716324	improve benchmark addAssociatedFileNewBench would sometimes pick a random number that a previous call had already added. Using an MVar, make it always advance, so the same behavior is benchmarked each time.	2019-11-22 13:20:22 -04:00
Joey Hess	7263aafd2b	Merge branch 'master' into sqlite	2019-11-22 12:49:35 -04:00
Joey Hess	92e1bb250b	simplify the name of the test cases	2019-11-21 17:38:58 -04:00
Joey Hess	d4661959de	Merge branch 'master' into sqlite	2019-11-21 17:26:50 -04:00
Joey Hess	25ba8156bc	improve benchmark --databases * benchmark: Changed --databases to take a parameter specifying the size of the database to benchmark. * benchmark --databases: Display size of the populated database. * benchmark --databases: Improve the "addAssociatedFile to (new)" benchmark to really add new values, not overwriting old values.	2019-11-21 17:25:20 -04:00
Joey Hess	1b5f4b67b5	use new name for new format fsck db The old db is cleaned up when a new incremental fsck is started. The incremental fsck won't pick up where the old one left off, but I consider this a minor enough thing that it can just be documented and won't be a problem.	2019-11-06 16:27:25 -04:00
Joey Hess	d3e4de0175	fix test suite The test suite found a bug; select_ can fail now because a uniqueness constraint has been added. Now the test suite passes. Also, I'm satisfied the changed PersistField instances work. Looking over what changed, and what I've already tested, Key, FilePath, and InodeCache are known working; ContentIdentifier is trivial ByteString to blob; and SSha is trivial String to varchar. Both are tested by the test suite. I've also tested the new FileSize and EpochTime instances already, and they work.	2019-10-30 15:51:37 -04:00
Joey Hess	4940a135af	eliminate raw sql LIKE query	2019-10-30 15:19:52 -04:00
Joey Hess	9085a2cfec	make sure all sqlite selects have indexes Bearing in mind that these indexes are really uniqueness constraints that just happen to also make sqlite generate indexes. In Database.ContentIdentifier, the ContentIdentifiersKeyRemoteCidIndex is fine as a uniqueness constraint because it contains all columns from the table. The ContentIdentifiersCidRemoteIndex is also ok because there can only be one key for a given (cid, uuid) combination. In Database.Export, the new ExportTreeFileKeyIndex is the same pair as the old ExportTreeKeyFileIndex (previously ExportTreeIndex). And in Database.Keys.SQL, the new InodeCacheKeyIndex is the same pair as the old KeyInodeCacheIndex.	2019-10-30 13:46:52 -04:00
Joey Hess	3732f27722	document indexes This was really confusing, though the code was ok. I think I now understand it fully again.	2019-10-30 13:28:00 -04:00
Joey Hess	f6cfb84dfe	rename row	2019-10-30 13:22:41 -04:00
Joey Hess	aa27969e55	improve layout	2019-10-29 17:08:36 -04:00
Joey Hess	e8437ae7a3	removed now unused import	2019-10-29 17:07:15 -04:00
Joey Hess	c35a9047d3	improve data types for sqlite This is a non-backwards-compatible change, so it is not suitable for merging without an annex.version bump and transition code. Not yet tested. This improves performance of git-annex benchmark --databases across the board by 10-25%, since eg Key roundtrips as a ByteString. (serializeKey' produces a lazy ByteString, so there is still a copy involved in converting it to a strict ByteString. It may be faster to switch to using bytestring-strict-builder.) FilePath and Key are both stored as blobs. This avoids mojibake in some situations. It would be possible to use varchar instead, if persistent could avoid converting that to Text, but it seems there is no good way to do so. See doc/todo/sqlite_database_improvements.mdwn. Eliminated some ugly artifacts of using Read/Show serialization; constructors and quoted strings are no longer stored in sqlite. Renamed SRef to SSha to reflect that it is only ever a git sha, not a ref name. Since it is limited to the characters in a sha, it is not affected by mojibake, so still uses String.	2019-10-29 17:05:36 -04:00
Joey Hess	e1b21a0491	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 17:05:10 -04:00
Joey Hess	25f912de5b	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 16:59:27 -04:00
Joey Hess	0f7fd008d4	fix sql syntax	2019-10-24 11:57:17 -04:00
Joey Hess	098afe144e	display sqlite error message when it crashes	2019-10-24 11:50:55 -04:00
Joey Hess	94efc400e9	horrible implementation of isInodeKnown The only good thing about it is it does not require a major version bump to improve the database. That will need to happen at some point though. Potentially very very slow in a large repository. Ugly use of raw sql.	2019-10-23 14:37:29 -04:00
Joey Hess	904b175707	Fix build with persistent-2.10. Added an additional constraint that persistent needs. This also builds with persistent-2.9.2 without needing any cpp.	2019-10-17 11:58:31 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content. While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	2e6fd5de71	fix flipped diffUTCTime fsck --incremental/--more: Fix bug that prevented the incremental fsck information from being updated every 5 minutes as it was supposed to be; it was only updated after 1000 files were checked, which may be more files than it is possible to fsck in a given fsck time window. Thanks to Peter Simons for help with analysis of this bug. Auditing for other cases of the same mistake, I found that the keys db also had it backwards. This seems unlikely to really have been a problem; it would need associated files updates etc to be coming in slowly for some reason and then be interrupted to cause any problem. IIRC the design of the keys db assumes that any interrupted operation will be restarted, and so it can lose any buffered database updates safely.	2019-10-03 09:54:19 -04:00
Joey Hess	9628ae2e67	Close sqlite databases more robustly. Had a report of close throwing ErrorBusy on CIFS. Retrying up to 16 seconds is a balance between hopefully waiting long enough for the problem to clear up and waiting so long that git-annex seems to hang. The new dependency is free; persistent depends on unliftio-core.	2019-09-26 12:25:21 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	018b5b8173	Support building with socks-0.6 and persistent-template-2.7 persistent-template now needs UndecidableInstances. socks changed defaultSocksConf to take a SockAddr.	2019-07-30 12:50:48 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with versions of several Haskell libraries older than those that will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) harder, and that has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	6babb2c73f	remove wrong uniqueness constraint from ContentIdentifier db Fix bug that caused importing from a special remote to repeatedly download unchanged files when multiple files in the remote have the same content. Unfortunately, there's really no good way to remove a uniqueness constraint from a sqlite database. The best that can be done is to make a new table and copy the data over. But that would require using persistent's migrations or raw sql, and I don't want to do either. Instead, a sledgehammer approach: Renamed .git/annex/cid to .git/annex/cids. When the new database doesn't exist, it will be populated from the git-annex branch. Nothing deletes the old database. Don't want to delete it out from under some long-running git-annex process that might be using it. It could eventually be deleted. But this is such a new feature, probably few repos have the database in any case.	2019-04-09 19:58:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e3a704224f	fix export db locking deadlock	2019-03-07 16:06:02 -04:00
Joey Hess	9a72785307	fixes to export db lookup when accessing importtree=yes Now in a fresh clone with a importtree=yes remote enabled, git annex fsck --from the remote works.	2019-03-07 14:10:56 -04:00
Joey Hess	50797ee2c5	remove obsolete comment	2019-03-07 13:02:46 -04:00
Joey Hess	71fec9060c	move	2019-03-07 12:56:40 -04:00
Joey Hess	ee251b2e2e	implement updating the ContentIdentifier db with info from the git-annex branch untested This won't be super slow, but it does need to diff two likely large trees, and since the git-annex branch rarely sits still, it will most likely be run at the beginning of every import. A possible speed improvement would be to only run this when the database did not contain a ContentIdentifier. But that would only speed up imports when there is no new version of a file on the special remote, at most renames of existing files being imported. A better speed improvement would be to record something in the git-annex branch that indicates when an import has been run, and only do the diff if the git-annex branch has record of a newer import than we've seen before. Then, it would only run when there is in fact new ContentIdentifier information available from a remote. Certainly doable, but didn't want to complicate things yet.	2019-03-06 18:04:30 -04:00
Joey Hess	f85f06aae3	change to more efficient IKey	2019-03-06 11:14:33 -04:00
Joey Hess	3c652e1499	limit to requested remote	2019-03-05 15:56:28 -04:00
Joey Hess	cd3a2b023a	initial try at using storeExportWithContentIdentifier Untested, and I'm not sure about the locking of the ContentIdentifier db.	2019-03-04 17:50:41 -04:00
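
Commit 5e4deb3620 changes the parsers for git plumbing output to accept shas of both sizes (40 and 64 hex digits). The following is a minimal sketch of that idea, assuming a plain hex-digit check and a "<sha> <rest>" line shape; `parseSha` and `splitShaLine` are hypothetical names for illustration, not git-annex's actual Git.Sha code.

```haskell
import Data.Char (isHexDigit)

-- Hex lengths of a SHA-1 (40) and a SHA-256 (64) object name.
shaSizes :: [Int]
shaSizes = [40, 64]

-- Hypothetical helper: accept a string as a sha only if it has one of the
-- known lengths and is entirely hex digits, rather than assuming one shaSize.
parseSha :: String -> Maybe String
parseSha s
  | length s `elem` shaSizes && all isHexDigit s = Just s
  | otherwise = Nothing

-- Hypothetical helper: split a "<sha> <rest>" line of plumbing output,
-- whichever hash size the repository uses.
splitShaLine :: String -> Maybe (String, String)
splitShaLine line =
  case break (== ' ') line of
    (sha, ' ' : rest) | Just s <- parseSha sha -> Just (s, rest)
    _ -> Nothing
```

As the commit itself notes, this does nothing about comparing a sha1 and a sha256 of the same value; they would still compare as different strings.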
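
Commit 2e6fd5de71 fixes a flipped `diffUTCTime` call. Since `diffUTCTime a b` computes `a - b`, reversing the arguments makes the elapsed time negative, so a "has 5 minutes passed?" check never fires and only the 1000-file trigger remains. A minimal sketch under those assumptions; `dueForUpdate`, `buggyDueForUpdate` and `shouldFlush` are hypothetical illustrations reconstructed from the commit message, not git-annex's fsck code.

```haskell
import Data.Time.Clock

-- Update interval from the commit message: every 5 minutes.
updateInterval :: NominalDiffTime
updateInterval = 5 * 60

-- Correct order: diffUTCTime now lastUpdate grows as time passes.
dueForUpdate :: UTCTime -> UTCTime -> Bool
dueForUpdate lastUpdate now = diffUTCTime now lastUpdate > updateInterval

-- The bug: with the arguments flipped the difference is never positive,
-- so this is never True when lastUpdate is in the past.
buggyDueForUpdate :: UTCTime -> UTCTime -> Bool
buggyDueForUpdate lastUpdate now = diffUTCTime lastUpdate now > updateInterval

-- Hypothetical flush rule: update after 1000 checked files or once the
-- interval has elapsed, whichever comes first.
shouldFlush :: Int -> UTCTime -> UTCTime -> Bool
shouldFlush checked lastUpdate now =
  checked >= 1000 || dueForUpdate lastUpdate now
```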


162 commits