Commit graph

430 commits

Author SHA1 Message Date
Joey Hess
7b2d236556
importfeed: stream metadata for 5% speedup
On top of the 10% speedup from streaming url logs.
2020-07-14 14:35:26 -04:00
Joey Hess
535cdc8d48
importfeed: Made checking known urls step around 10% faster.
This was a bit disappointing, I was hoping for a 2x speedup. But, I think
the metadata lookup is wasting a lot of time and also needs to be made to
stream.

The changes to catObjectStreamLsTree were benchmarked to not also speed
up --all around 3% more. Seems I managed to make it polymorphic after all.
2020-07-14 12:47:51 -04:00
Joey Hess
9f6bd6cc05
add inRepoDetails
planned to use for an optimisation

most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
2020-07-08 15:36:35 -04:00
Joey Hess
7347e50123
add stage number to stagedDetails parser
And convert parser to attoparsec, probably faster.

Before, a parse failure threw the whole --stage output line in to the
filename, which was certianly a bad idea, so fixed that.
2020-07-08 15:05:12 -04:00
Joey Hess
57b89c635f
support required groupwanted
When the required content is set to "groupwanted", use whatever expression
has been set in groupwanted as the required content of the repo, similar to
how setting required content to "standard" already worked.
2020-04-28 13:31:26 -04:00
Joey Hess
f85ca7dc80
fix all remaining -Wincomplete-uni-patterns warnings
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
2020-04-15 13:55:08 -04:00
Joey Hess
0e4c92503e
fix warning
I don't think the NoConfigValue case ever actually occurs here.
2020-04-15 13:04:00 -04:00
Joey Hess
9cb69dbb76
support boolean git configs that are represented by the name of the setting with no value
Eg"core.bare" is the same as "core.bare = true".

Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
2020-04-13 13:35:22 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
c0cd07c36b
Ref ByteString conversion done
Test suite passes.
2020-04-07 17:41:09 -04:00
Joey Hess
6c81e0c8f1
ByteString Ref continued
Several nice speed wins I think.

At 340/633 files converted.
2020-04-07 13:27:11 -04:00
Joey Hess
eaa49ab53d
convert replaceFile to createDirectoryUnder
Since it was used on both worktree and .git/annex files, split into
multiple functions.

In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
2020-03-06 11:31:01 -04:00
Joey Hess
879f52a116
annex.tune.branchhash1=true bugfix
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
2020-02-14 15:22:48 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
37467a008f
annex.addunlocked expressions
* annex.addunlocked can be set to an expression with the same format used by
  annex.largefiles, in case you want to default to unlocking some files but
  not others.
* annex.addunlocked can be configured by git-annex config.

Added a git-annex-matching-expression man page, broken out from
tips/largefiles.

A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.

Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
2019-12-20 15:56:25 -04:00
Joey Hess
ce3fb0b2e5
fixed an oversight that had always prevented annex.resolvemerge from being honored, when it was configured by git-annex config
forgot to add it to the merge function
2019-12-20 11:00:08 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
ce48eb797c
make DropDead transition minimize remote.log for dead sameas remotes
All that needs to be retained in remote.log is the sameas-uuid.
The rest of the config is eliminated. This doesn't save enough space to
bother with, but it prevents anything sensitive in the config of the
dead sameas remote from lingering around.

Note that minimizesameasdead does not update the VectorClock when
changing the log line. That's normally a no-no, but in this case,
it makes each DropDead result in the exact same file contents,
and vector clocks are not needed because the transition breaks
the history chain.
2019-10-15 11:39:25 -04:00
Joey Hess
78f522e7c8
forgot to add this new file 2019-10-14 16:09:04 -04:00
Joey Hess
5e9a2cc37f
forget state of sameas remotes during DropDead transitions
It would have been a lot less round-about to just make git annex dead
also add the uuids of sameas remotes to the trust.log as dead.

But, that would fail in the case where there's an unmerged other clone
that has a sameas remote that the current repo does not know about.
Then it would not get marked as dead.

Handling it at transition time avoids that scenario.

Note that the generation of trustmap' in dropDead should only
happen once, due to the partial application.
2019-10-14 15:47:42 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
91eed85fd4
add sameas inherited configs to newConfig
This makes initremote --sameas work with encryption inherited.
2019-10-11 13:05:20 -04:00
Joey Hess
c3975ff3b4
sameas RemoteConfig inheritance
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.

The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
2019-10-10 15:58:22 -04:00
Joey Hess
84e729fda5
fix init default description reversion
init: Fix a reversion in the last release that prevented automatically
generating and setting a description for the repository.

Seemed best to factor out uuidDescMapRaw that does not
have the default mempty descrition behavior.

I don't much like that behavior, but I know things depend on it.
One thing in particular is `git annex info` which lists the uuids and
descriptions; if the current repo has been initialized in some way that
means it does not have a description, it would not show up w/o that.

(Not only repos created due to this bug might lack that. For example a repo
that was marked dead and had --drop-dead delete its git-annex branch info,
and then came back from the dead would similarly not be in the uuid.log.
Also there have been other versions of git-annex that didn't set a default
description; for years there was no default description.)
2019-06-20 20:30:24 -04:00
Joey Hess
258a7c5cd1
add Key to all ActionItem constructors 2019-06-06 12:53:24 -04:00
Joey Hess
b646a85c4a
oops, fixed wrong change 2019-05-23 12:44:27 -04:00
Joey Hess
e1aa8b9001
avoid build failure with older base 2019-05-23 12:16:57 -04:00
Joey Hess
16a2bed710
avoid build warning on Windows about unused import 2019-05-23 12:15:33 -04:00
Joey Hess
e06feb7316
honor preferred content when importing
Importing from a special remote honors its preferred content too; unwanted
files are not imported. But, some preferred content expressions can't be
checked before files are imported, and trying to import with such an
expression will fail.

Tested this with scenarios including changing the preferred content
expression and making sure merging the import didn't delete files that were
no longer wanted.

There was one minor inefficiency mentioned in the todo that I punted on.
2019-05-21 14:38:06 -04:00
Joey Hess
97fd9da6e7
add back non-preferred files to imported tree
Prevents merging the import from deleting the non-preferred files from
the branch it's merged into.

adjustTree previously appended the new list of items to the old, which
could result in it generating a tree with multiple files with the same
name. That is not good and confuses some parts of git. Gave it a
function to resolve such conflicts.

That allowed dealing with the problem of what happens when the import
contains some files (or subtrees) with the same name as files that were
filtered out of the export. The files from the import win.
2019-05-20 16:43:52 -04:00
Joey Hess
354c0eb57f
support standard and groupwanted in keyless mode
Only when the preferred content expression includes them will a parse
failure due to them needing keys result in the preferred content
expression not parsing in keyless mode.
2019-05-14 14:59:03 -04:00
Joey Hess
9411a7c93c
matching preferred content before key is known
This will let import try to match preferred content expressions before
downloading the content and generating its key.

If an expression needs a key, it preferredContentParser with
preferredContentKeylessTokens will fail to parse it.

standard and groupwanted are not in preferredContentKeylessTokens
because they may refer to an expression that refers to a key.
That needs further work to support them.
2019-05-14 14:28:23 -04:00
Joey Hess
aa7710982b
avoid list lookup by parseToken
Minor optimisation to parsing of a preferred content expression.
2019-05-14 13:11:29 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
ee251b2e2e
implement updating the ContentIdentifier db with info from the git-annex branch
untested

This won't be super slow, but it does need to diff two likely large
trees, and since the git-annex branch rarely sits still, it will most
likely be run at the beginning of every import.

A possible speed improvement would be to only run this when the database
did not contain a ContentIdentifier. But that would only speed up
imports when there is no new version of a file on the special remote,
at most renames of existing files being imported.

A better speed improvement would be to record something in the git-annex
branch that indicates when an import has been run, and only do the diff
if the git-annex branch has record of a newer import than we've seen
before. Then, it would only run when there is in fact new
ContentIdentifier information available from a remote. Certianly doable,
but didn't want to complicate things yet.
2019-03-06 18:04:30 -04:00
Joey Hess
1b3d04979e
speed up slow quickcheck test
Only test parsing of ContentIdentifier lists, not the whole log.
2019-03-06 16:43:41 -04:00
Joey Hess
6ef38df881
fix another parser bug 2019-03-06 14:51:31 -04:00
Joey Hess
c0bd202147
fix failing test case
An empty list of [ContentIdenfier] serialized to the same thing
as a single ContentIdentifier "". Avoid this ambiguity by requiring the
list be non-empty.
2019-03-06 14:27:15 -04:00
Joey Hess
b0fe4916b7
forgot to change list delimiter in parser 2019-03-06 11:59:42 -04:00
Joey Hess
7af55de83c
optimisation: use graftTree to remember the export branch
Sped up git-annex export in repositories with lots of keys.

Old method read whole git-annex branch tree into memory.
2019-02-22 11:16:22 -04:00
Joey Hess
56137ce0d2
use colon not space to delimit content identifier list
InodeCache serializes to a value with spaces, and seems likely other
things will too, and want to avoid unncessary base64 of content
identifiers when possible.
2019-02-21 13:45:16 -04:00
Joey Hess
1f6339ade7
fix parsing of empty content identifier
Seems very unlikely an empty content identifier would be used, but
quickcheck found this bug.
2019-02-21 13:44:09 -04:00
Joey Hess
9887a378fe
renamings to make clean when old-format logs are being used 2019-02-21 13:43:44 -04:00
Joey Hess
936aee6a60
quickcheck property for parsing of content identifier logs 2019-02-21 13:17:43 -04:00
Joey Hess
5a294f0dd7
add Logs.ContentIdentifier 2019-02-20 17:22:56 -04:00