After commit f4bdecc4ec, there is no
longer any distinction between SingleWriter and MultiWriter's handling
of read after write.
Databases that were SingleWriter still have lock files that are used to
prevent multiple writers.
This does make writing to such databases a bit more expensive,
because the MultiWriter code path that is now used opens a second db
connection in order to write to them.
This adds the overhead of a copy when serializing and deserializing keys.
I have not benchmarked extensively, but runtimes seem barely affected by it.
When a lot of keys are in memory, it improves memory use.
And it prevents keys sometimes getting pinned in memory and failing to
be GCed, a problem ByteString is prone to. In particular, git-annex sync
from a borg special remote had that problem, and this change reduced its
memory use by a large amount.
Sponsored-by: Shae Erisson on Patreon
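A minimal sketch of the pinning issue behind the fix above, with
hypothetical names (KeyBytes and fromParsedBytes are illustrations, not
git-annex's actual types): strict ByteStrings live in pinned memory, and
a parsed key can also retain the large buffer it was sliced from, so
copying into an unpinned ShortByteString lets that buffer be GCed.

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Short as S

    -- Hypothetical wrapper holding a key's bytes unpinned.
    newtype KeyBytes = KeyBytes S.ShortByteString

    -- Copying into a ShortByteString detaches the key from any large
    -- pinned buffer it was parsed out of.
    fromParsedBytes :: B.ByteString -> KeyBytes
    fromParsedBytes = KeyBytes . S.toShort

    toStrictBytes :: KeyBytes -> B.ByteString
    toStrictBytes (KeyBytes s) = S.fromShort s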
Lots of nice wins from this in avoiding unnecessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This broke a while ago during optimisation work, and went unnoticed
since the flag is disabled by default.
This commit was sponsored by Brock Spratlen on Patreon.
Since the sqlite branch uses blobs extensively, there are some
performance benefits: ByteStrings now get stored and retrieved without
conversion in some cases, such as in Database.Export.
This will speed up the common case where a Key is deserialized from
disk and is then serialized to build, eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
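A simplified sketch of the smart-constructor idea (the field names and
serializeKeyData here are stand-ins, not the real Types.Key
definitions): the cached serialization is computed in exactly one place,
so it can never go stale, and Eq compares the underlying data.

    import qualified Data.ByteString as B

    data KeyData = KeyData { keyName :: B.ByteString } -- plus variety, size, ...
      deriving (Show, Read, Eq, Ord)

    -- A Key carries its serialization along with the data it was built from.
    data Key = Key
      { keyData :: KeyData
      , keySerialization :: B.ByteString
      }

    -- Stand-in for the real serializer.
    serializeKeyData :: KeyData -> B.ByteString
    serializeKeyData = keyName

    -- Smart constructor: the only exported way to build a Key, so
    -- keySerialization always matches keyData.
    mkKey :: KeyData -> Key
    mkKey d = Key d (serializeKeyData d)

    instance Eq Key where
      a == b = keyData a == keyData b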
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk on my laptop.
addAssociatedFileNewBench would sometimes pick a random number that a
previous call had already added. Using an MVar, make it always advance,
so the same behavior is benchmarked each time.
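A minimal sketch of that approach (nextNum is a hypothetical helper,
not the benchmark's actual code): keep the counter in an MVar and bump
it on every call, so each benchmarked add is genuinely new.

    import Control.Concurrent.MVar

    -- Always hands out a fresh number, unlike picking at random.
    nextNum :: MVar Integer -> IO Integer
    nextNum v = modifyMVar v (\n -> return (n + 1, n))

The benchmark would allocate the MVar once with newMVar 0 and call
nextNum for each add.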
* benchmark: Changed --databases to take a parameter specifying the size
  of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
benchmark to really add new values, not overwriting old values.
This is a non-backwards-compatible change, so not suitable for merging
w/o an annex.version bump and transition code. Not yet tested.
This improves performance of git-annex benchmark --databases
across the board by 10-25%, since eg Key roundtrips as a ByteString.
(serializeKey' produces a lazy ByteString, so there is still a
copy involved in converting it to a strict ByteString. It may be faster
to switch to using bytestring-strict-builder.)
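The copy in question, illustrated with generic names (buildLazy and
buildStrict are not git-annex functions): a Builder yields a lazy
ByteString, and flattening that into a strict one for the blob costs one
more traversal and copy.

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Builder as BB
    import qualified Data.ByteString.Lazy as L

    buildLazy :: BB.Builder -> L.ByteString
    buildLazy = BB.toLazyByteString

    -- toStrict copies the lazy chunks into a single strict buffer;
    -- this is the extra copy mentioned above.
    buildStrict :: BB.Builder -> B.ByteString
    buildStrict = L.toStrict . buildLazy

bytestring-strict-builder avoids that by precomputing the total length
and writing a single strict buffer directly.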
FilePath and Key are both stored as blobs. This avoids mojibake in some
situations. It would be possible to use varchar instead, if persistent
could avoid converting that to Text, but it seems there is no good
way to do so. See doc/todo/sqlite_database_improvements.mdwn
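A hedged sketch of how a raw blob can be stored with persistent
(SerializedKey is a hypothetical wrapper, not the actual type used):
mapping to PersistByteString with an SqlBlob column keeps the bytes out
of Text entirely, which is what avoids the mojibake.

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import Database.Persist
    import Database.Persist.Sql

    newtype SerializedKey = SerializedKey B.ByteString

    -- Store and fetch raw bytes; no Text round-trip anywhere.
    instance PersistField SerializedKey where
      toPersistValue (SerializedKey b) = PersistByteString b
      fromPersistValue (PersistByteString b) = Right (SerializedKey b)
      fromPersistValue v = Left ("expected blob, got " <> T.pack (show v))

    -- Declare the column type as BLOB.
    instance PersistFieldSql SerializedKey where
      sqlType _ = SqlBlob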
Eliminated some ugly artifacts of using Read/Show serialization;
constructors and quoted strings are no longer stored in sqlite.
Renamed SRef to SSha to reflect that it is only ever a git sha,
not a ref name. Since it is limited to the characters in a sha,
it is not affected by mojibake, so still uses String.
Rescued from commit 11d6e2e260, which removed db benchmarks in favor of
benchmarking arbitrary git-annex commands. That is nice and general, but
microbenchmarks are useful too.