git-annex

Author	SHA1	Message	Date
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	1df9e72a78	update	2020-07-10 13:31:47 -04:00
Joey Hess	bf72316b08	add function split out from CatFile	2020-07-10 13:28:16 -04:00
Joey Hess	6b9d1c1317	Merge branch 'master' of ssh://git-annex.branchable.com	2020-07-10 13:16:11 -04:00
Joey Hess	bd2d304064	better catObjectStream' and use Chan The catObjectStream' is generic enough to let it be nicely used from inside Annex monad. Chan will be faster than DList here. Bearing in mind, it is unbounded, but in reality will be bounded by the size of the stdio buffer through git cat-file. This speeds up --all by about 10% although I think only getting back to the previous performance before I introduced that DList.	2020-07-10 13:15:14 -04:00
Joey Hess	f63a7aa0e7	fix headTList to drop the head item	2020-07-10 13:02:32 -04:00
barak	59c90643cd	typo	2020-07-10 14:26:22 +00:00
Joey Hess	6e9fcf468d	streamkeys branch	2020-07-09 14:48:03 -04:00
Joey Hess	cb6e19f4c5	work around catObjectStream polymorphism perf Breaking it up like this doesn't change perf, and lets another version be written in just a couple lines.	2020-07-09 14:27:07 -04:00
branchable@bafd175a4b99afd6ed72501042e364ebd3e0c45e	b2581d4dd1	Added a comment: I've moved my auto-sync-daemon script	2020-07-09 14:24:48 +00:00
branchable@bafd175a4b99afd6ed72501042e364ebd3e0c45e	bbc3800369	Added a comment: Update on my auto-commit / auto-sync scripts	2020-07-09 14:23:15 +00:00
Ilya_Shlyakhter	96aad5458b	Added a comment: re: git-annex-cat	2020-07-09 01:06:37 +00:00
Ilya_Shlyakhter	75b96059af	Added a comment: git-annex-cat	2020-07-09 00:21:02 +00:00
Joey Hess	9f6bd6cc05	add inRepoDetails planned to use for an optimisation most things using stagedDetails were not expecting to get dup files in a conflicted merge and deal with them, so converted them to use inRepoDetails.	2020-07-08 15:36:35 -04:00
Joey Hess	7347e50123	add stage number to stagedDetails parser And convert parser to attoparsec, probably faster. Before, a parse failure threw the whole --stage output line into the filename, which was certainly a bad idea, so fixed that.	2020-07-08 15:05:12 -04:00
Joey Hess	c1eaf5b930	note	2020-07-08 14:21:37 -04:00
Joey Hess	d08c178f97	avoid catObjectStream skipping over unavailable shas Not needed as it's used for --all, but will be needed later.	2020-07-08 13:57:17 -04:00
Joey Hess	de3d7d044d	make catObjectStream support newline and carriage return in filenames Turns out the %(rest) trick was not needed. Instead, just maintain a list of files we've asked for, and each cat-file response is for the next file in the list. This actually benchmarks 25% faster than before! Very surprising, but it must be due to needing to shove less data through the pipe, and parse less.	2020-07-08 13:49:03 -04:00
Joey Hess	2cf6717aec	thoughts	2020-07-08 10:51:24 -04:00
Joey Hess	5849bd6340	Merge branch 'master' of ssh://git-annex.branchable.com	2020-07-07 16:50:26 -04:00
Joey Hess	afd9b2f667	idea	2020-07-07 16:49:44 -04:00
yarikoptic	c9d0bf0e6a	reassign to datalad - generic enhancement	2020-07-07 19:05:59 +00:00
Joey Hess	ba0adefe4c	Merge branch 'master' of ssh://git-annex.branchable.com	2020-07-07 14:19:46 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted to, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certainly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support a URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	98e2e3cb9c	optimise logfile to key parsing Using bytestring-filepath	2020-07-07 13:03:33 -04:00
flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00	c4994c608e	Added a comment	2020-07-07 16:54:46 +00:00
kyle	a5142499ce	rsync hang	2020-07-07 15:26:38 +00:00
timothy.sanders@a7ce3a8bae11a60e0c4cda9cb4aef24ec459bbab	3b6754e2a5		2020-07-07 10:26:00 +00:00
timothy.sanders@a7ce3a8bae11a60e0c4cda9cb4aef24ec459bbab	8a9323f5b5		2020-07-07 10:24:29 +00:00
Lukey	56f5d99ceb	Added a comment	2020-07-06 21:20:58 +00:00
Joey Hess	9468675ba9	note	2020-07-06 15:12:26 -04:00
Joey Hess	d66fc1a464	Revert "async exception safety for coprocesses" This reverts commit `7013798df5`.	2020-07-06 15:11:28 -04:00
Joey Hess	6b8c961e1f	some analysis but stuck	2020-07-06 14:46:05 -04:00
Joey Hess	dfa1c21b8a	comment and update changelog with benchmark results	2020-07-06 13:39:42 -04:00
Joey Hess	0518b62d2b	update	2020-07-06 12:58:29 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	9a2fbc2ea8	comment	2020-07-06 11:58:14 -04:00
Joey Hess	27bbeea00e	close	2020-07-06 10:49:06 -04:00
Joey Hess	e2a4c49004	Merge branch 'master' of ssh://git-annex.branchable.com	2020-07-06 10:46:52 -04:00
andrew	48a6978d55	Added a comment: transfer repos	2020-07-05 17:25:49 +00:00
Joey Hess	4ac504cd2e	update	2020-07-05 11:12:10 -04:00
flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00	bbd5d1503d		2020-07-04 23:47:54 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421	151efb9c3c	Added a comment: problem fixed itself	2020-07-04 03:28:26 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421	e40ba93924	Added a comment: problem fixed itself	2020-07-04 03:27:59 +00:00
Ilya_Shlyakhter	f6af30a7af	Added a comment	2020-07-03 19:55:36 +00:00
Joey Hess	52e72f878e	expand	2020-07-03 14:42:04 -04:00
Joey Hess	c016527cb5	link	2020-07-03 14:40:13 -04:00
Joey Hess	d89b52086e	close	2020-07-03 14:31:12 -04:00

37489 commits