Commit graph

38075 commits

Author SHA1 Message Date
Joey Hess
9f6bd6cc05
add inRepoDetails
planned to use for an optimisation

most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
2020-07-08 15:36:35 -04:00
Joey Hess
7347e50123
add stage number to stagedDetails parser
And convert parser to attoparsec, probably faster.

Before, a parse failure threw the whole --stage output line in to the
filename, which was certianly a bad idea, so fixed that.
2020-07-08 15:05:12 -04:00
Joey Hess
c1eaf5b930
note 2020-07-08 14:21:37 -04:00
Joey Hess
d08c178f97
avoid catObjectStream skipping over unavailable shas
Not needed as it's used for --all, but will be needed later.
2020-07-08 13:57:17 -04:00
Joey Hess
de3d7d044d
make catObjectStream support newline and carriage return in filenames
Turns out the %(rest) trick was not needed. Instead, just maintain a
list of files we've asked for, and each cat-file response is for the
next file in the list.

This actually benchmarks 25% faster than before! Very surprising, but it
must be due to needing to shove less data through the pipe, and parse
less.
2020-07-08 13:49:03 -04:00
Joey Hess
2cf6717aec
thoughts 2020-07-08 10:51:24 -04:00
Joey Hess
5849bd6340
Merge branch 'master' of ssh://git-annex.branchable.com 2020-07-07 16:50:26 -04:00
Joey Hess
afd9b2f667
idea 2020-07-07 16:49:44 -04:00
yarikoptic
c9d0bf0e6a reassign to datalad - generic enhancement 2020-07-07 19:05:59 +00:00
Joey Hess
ba0adefe4c
Merge branch 'master' of ssh://git-annex.branchable.com 2020-07-07 14:19:46 -04:00
Joey Hess
9483b10469
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.

Especially when the --all optimisation in the previous commit
pre-cached the location log.

This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.

The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.

But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.

Clearly there could be some further benchmarking and tuning here.
2020-07-07 14:18:55 -04:00
Joey Hess
d010ab04be
sped up the --all option by 2x to 16x by using git cat-file --buffer
This assumes that no location log files will have a newline or carriage
return in their name. catObjectStream skips any such files due to
cat-file not supporting them.

Keys have been prevented from containing newlines since 2011,
commit 480495beb4. If some old repo
had a key with a newline in it, --all will just skip processing that key.
Other things, like .git/annex/unused files certianly assume no newlines in
keys too, and AFAICR, such keys never actually worked.

Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys
generated before that point could perhaps contain a CR. (URL probably not,
http probably doesn't support an URL with a raw CR in it.) So, added
a warning in fsck about such keys. Although, fsck --all will naturally
skip them, so won't be able to warn about them. Not entirely
satisfactory, but I'll bet there are not really any such keys in
existence.

Thanks to Lukey for finding this optimisation.
2020-07-07 13:54:04 -04:00
Joey Hess
98e2e3cb9c
optimise logfile to key parsing
Using bytestring-filepath
2020-07-07 13:03:33 -04:00
flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00
c4994c608e Added a comment 2020-07-07 16:54:46 +00:00
kyle
a5142499ce rsync hang 2020-07-07 15:26:38 +00:00
timothy.sanders@a7ce3a8bae11a60e0c4cda9cb4aef24ec459bbab
3b6754e2a5 2020-07-07 10:26:00 +00:00
timothy.sanders@a7ce3a8bae11a60e0c4cda9cb4aef24ec459bbab
8a9323f5b5 2020-07-07 10:24:29 +00:00
Lukey
56f5d99ceb Added a comment 2020-07-06 21:20:58 +00:00
Joey Hess
9468675ba9
note 2020-07-06 15:12:26 -04:00
Joey Hess
d66fc1a464
Revert "async exception safety for coprocesses"
This reverts commit 7013798df5.
2020-07-06 15:11:28 -04:00
Joey Hess
6b8c961e1f
some analysis but stuck 2020-07-06 14:46:05 -04:00
Joey Hess
dfa1c21b8a
comment
and update changelog with benchmark results
2020-07-06 13:39:42 -04:00
Joey Hess
0518b62d2b
update 2020-07-06 12:58:29 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
9a2fbc2ea8
comment 2020-07-06 11:58:14 -04:00
Joey Hess
27bbeea00e
close 2020-07-06 10:49:06 -04:00
Joey Hess
e2a4c49004
Merge branch 'master' of ssh://git-annex.branchable.com 2020-07-06 10:46:52 -04:00
andrew
48a6978d55 Added a comment: transfer repos 2020-07-05 17:25:49 +00:00
Joey Hess
4ac504cd2e
update 2020-07-05 11:12:10 -04:00
flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00
bbd5d1503d 2020-07-04 23:47:54 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421
151efb9c3c Added a comment: problem fixed itself 2020-07-04 03:28:26 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421
e40ba93924 Added a comment: problem fixed itself 2020-07-04 03:27:59 +00:00
Ilya_Shlyakhter
f6af30a7af Added a comment 2020-07-03 19:55:36 +00:00
Joey Hess
52e72f878e
expand 2020-07-03 14:42:04 -04:00
Joey Hess
c016527cb5
link 2020-07-03 14:40:13 -04:00
Joey Hess
d89b52086e
close 2020-07-03 14:31:12 -04:00
Joey Hess
b1fb4f9f58
clarify 2020-07-03 14:31:08 -04:00
Joey Hess
28fee95d4e
extend proposed interface with IMPORTKEY 2020-07-03 14:23:04 -04:00
Joey Hess
57cceac569
simplify interface by removing size
Add size to the returned key after the fact, unless the remote happened
to add it itself.
2020-07-03 14:22:22 -04:00
Joey Hess
85cd79ea01
no importKey for android yet
adb shell has sha256sum sha1sum and some others, so they could be used.
They're provided by toybox, so seem about as likely to keep
working as find and stat, which it already depends on.

Or to not add a dep, could use stat the same as getExportContentIdentifier
to get a mtime, and make a WORM key. But do I really want this to
default to WORM?

Unsure what's the best path, so punting for now.
2020-07-03 14:02:50 -04:00
Joey Hess
ddcab38e4a
no importKey for S3 for now
The Etag is sometimes a md5, but not if eg, there was a multipart
upload.

May revisit later if there's demand.
2020-07-03 13:53:14 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
a8099b9896
thought 2020-07-03 12:02:07 -04:00
Joey Hess
8ca0cdb0ac
reflow to fix man page conversion 2020-07-03 11:54:21 -04:00
Joey Hess
6953cc830a
Merge branch 'master' of ssh://git-annex.branchable.com 2020-07-03 10:34:18 -04:00
petersjt014@c20995a2e26a13566cd49aef30a99a7dff732a4e
668f3278b9 2020-07-03 07:36:41 +00:00
petersjt014@c20995a2e26a13566cd49aef30a99a7dff732a4e
ba209702c1 Added a comment: tor remote 2020-07-03 07:33:54 +00:00
Joey Hess
89108d6f5a
thought 2020-07-02 21:56:00 -04:00
Joey Hess
e463ef1b91
comment 2020-07-02 20:13:19 -04:00
Joey Hess
a3e125c1fe
move to re-notify 2020-07-02 20:05:54 -04:00