git-annex

Author	SHA1	Message	Date
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	c6e693b25d	remove ContentIndentifiersCidRemoteIndex uniqueness constraint For reasons explained in the bug report. Implemented using a persistent migration, which works fine. It may add a little startup overhead when a remote is enabled that uses this, but probably un-noticable. On the next major version, it would be fine to delete this database, and regenerate it from the git-annex branch information. Then this change could be reverted. Did nothing about adding back the data that got dropped from the db due to the bug. Only the borg special remote was probably affected, and it's not been released yet. rm -rf .git/annex/cidsdb does work.	2020-12-23 14:03:33 -04:00
Joey Hess	028d4517c7	enable extensions needed by new version of persistent Needed in order to use mkPersist in persistent version 2.11.0.1 persistent-template version 2.9.1.0	2020-11-07 14:09:17 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	2c8cf06e75	more RawFilePath conversion Converted file mode setting to it, and follow-on changes. Compiles up through 369/646. This commit was sponsored by Ethan Aubin.	2020-11-05 18:45:37 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	8d66f7ba0f	more RawFilePath conversion Added a RawFilePath createDirectory and kept making stuff build. Up to 296/645 This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-28 17:25:59 -04:00
Joey Hess	10e62a810e	remove unused import	2020-10-02 13:45:09 -04:00
Joey Hess	37426920d8	Fix build with Benchmark build flag Broke a while ago during optimisation work, and not noticed since the flag is disabled by default. This commit was sponsored by Brock Spratlen on Patreon.	2020-10-02 13:30:24 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unncessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	6c81e0c8f1	ByteString Ref continued Several nice speed wins I think. At 340/633 files converted.	2020-04-07 13:27:11 -04:00
Joey Hess	279991604d	started converting Ref from String to ByteString This should make code that reads shas and refs from git faster. Does not compile yet, a lot needs to be done still.	2020-04-06 17:14:49 -04:00
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	6d58ca94d6	some easy createDirectoryUnder conversions	2020-03-05 15:20:10 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	c9357bdc0e	ifdef persistent-template 2.8.0 fixes The i386ancient build has a ghc too old for these extensions. Build with persistent-template 2.8.0 tested.	2020-02-04 13:53:00 -04:00
Joey Hess	4920df6573	Fix build with newest version of persistent-template. This is untested because of rain, also I am operating from truncated copiler error messages in a bug report that also doesn't mention what the library version is. Still, it should work. May break builds with old ghc, in particular DerivingStrategies is I think fairly new? The pragmas could be ifdefed if necessary. Works with ghc 8.6.5.	2020-02-04 12:03:30 -04:00
Joey Hess	6db4aee7df	use --no-abbrev instead of --abbrev=40 This avoids hardcoding the sha size, so when git uses sha256, it will output the full sha256 and not a truncation to 40 characters. I reviewed git's history, and while there have been some bugs with commands not supporting --no-abbrev (eg git diff --no-index --no-abbrev was broken in git 2.1), none of the commands git-annex uses will be impacted by those old bugs.	2020-01-07 12:29:37 -04:00
Joey Hess	5e4deb3620	support sha256 git repos Git will eventually switch to sha2 and there will not be one single shaSize anymore, but two (40 and 64). Changed all parsers for git plumbing output to support both sizes of shas. One potential problem this does not deal with is, if somewhere in git-annex it reads two shas from different sources, and compares them to see if they're the same sha, it would fail if they're sha1 and sha256 of the same value. I don't know if that will really be a concern.	2020-01-07 12:22:19 -04:00
Joey Hess	02e00fd7ab	Merge branch 'master' into sqlite	2019-12-19 16:33:42 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	535b153381	building again after merge Nice, several conversions fell out.	2019-12-18 15:02:46 -04:00
Joey Hess	d5628a16b8	Merge branch 'bs' into sqlite-bs	2019-12-18 14:51:03 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	2f9a80d803	merging sqlite and bs branches Since the sqlite branch uses blobs extensively, there are some performance benefits, ByteStrings now get stored and retrieved w/o conversion in some cases like in Database.Export.	2019-12-06 15:30:45 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	70a8716324	improve benchmark addAssociatedFileNewBench would sometimes pick a random number that a previous call had already added. Using a MVar, make it always advance, so the same behavior is benchmarked each time.	2019-11-22 13:20:22 -04:00
Joey Hess	7263aafd2b	Merge branch 'master' into sqlite	2019-11-22 12:49:35 -04:00
Joey Hess	92e1bb250b	simplify the name of the test cases	2019-11-21 17:38:58 -04:00
Joey Hess	d4661959de	Merge branch 'master' into sqlite	2019-11-21 17:26:50 -04:00
Joey Hess	25ba8156bc	improve benchmark --databases * benchmark: Changed --databases to take a parameter specifiying the size of the database to benchmark. * benchmark --databases: Display size of the populated database. * benchmark --databases: Improve the "addAssociatedFile to (new)" benchmark to really add new values, not overwriting old values.	2019-11-21 17:25:20 -04:00
Joey Hess	1b5f4b67b5	use new name for new format fsck db The old db is cleaned up when a new incremental fsck is started. The incremental fsck won't pick up where the old one left off, but I consider this a minor enough thing that it can just be documented and won't be a problem.	2019-11-06 16:27:25 -04:00
Joey Hess	d3e4de0175	fix test suite The test suite found a bug; select_ can fail now because a uniqueness constrain has been added. Now the test suite passes. Also, I'm satisfied the changed PersistField instances work. Looking over what changed, and what I've already tested, Key, FilePath, and InodeCache are known working; ContentIdentifier is trivial ByteString to blob; and SSha is trivial String to varchar. Both are tested by the test suite. I've also tested the new FileSize and EpochTime instances already, and they work.	2019-10-30 15:51:37 -04:00
Joey Hess	4940a135af	eliminate raw sql LIKE query	2019-10-30 15:19:52 -04:00
Joey Hess	9085a2cfec	make sure all sqlite selects have indexes Bearing in mind that these indexes are really uniqueness constraints that just happen to also make sqlite generate indexes. In Database.ContentIndentifier, the ContentIndentifiersKeyRemoteCidIndex is fine as a uniqueness constraint because it contains all rows from the table. The ContentIndentifiersCidRemoteIndex is also ok because there can only be one key for a given (cid, uuid) combination. In Database.Export, the new ExportTreeFileKeyIndex is the same pair as the old ExportTreeKeyFileIndex (previously ExportTreeIndex). And in Database.Keys.SQL, the new InodeCacheKeyIndex is the same pair as the old KeyInodeCacheIndex.	2019-10-30 13:46:52 -04:00
Joey Hess	3732f27722	document indexes This was really confusing, though the code was ok. I think I now understand it fully again.	2019-10-30 13:28:00 -04:00
Joey Hess	f6cfb84dfe	rename row	2019-10-30 13:22:41 -04:00
Joey Hess	aa27969e55	improve layout	2019-10-29 17:08:36 -04:00
Joey Hess	e8437ae7a3	removed now unused import	2019-10-29 17:07:15 -04:00
Joey Hess	c35a9047d3	improve data types for sqlite This is a non-backwards compatable change, so not suitable for merging w/o a annex.version bump and transition code. Not yet tested. This improves performance of git-annex benchmark --databases across the board by 10-25%, since eg Key roundtrips as a ByteString. (serializeKey' produces a lazy ByteString, so there is still a copy involved in converting it to a strict ByteString. It may be faster to switch to using bytestring-strict-builder.) FilePath and Key are both stored as blobs. This avoids mojibake in some situations. It would be possible to use varchar instead, if persistent could avoid converting that to Text, but it seems there is no good way to do so. See doc/todo/sqlite_database_improvements.mdwn Eliminated some ugly artifacts of using Read/Show serialization; constructors and quoted strings are no longer stored in sqlite. Renamed SRef to SSha to reflect that it is only ever a git sha, not a ref name. Since it is limited to the characters in a sha, it is not affected by mojibake, so still uses String.	2019-10-29 17:05:36 -04:00
Joey Hess	e1b21a0491	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 17:05:10 -04:00
Joey Hess	25f912de5b	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 16:59:27 -04:00
Joey Hess	0f7fd008d4	fix sql syntax	2019-10-24 11:57:17 -04:00
Joey Hess	098afe144e	display sqlite error message when it crashes	2019-10-24 11:50:55 -04:00
Joey Hess	94efc400e9	horrible impementation of isInodeKnown The only good thing about it is it does not require a major version bump to improve the database. That will need to happen at some point though. Potentially very very slow in a large repository. Ugly use of raw sql.	2019-10-23 14:37:29 -04:00

1 2 3 4

179 commits