Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.
Benchmarking vs master, this git-annex find is significantly faster!
Specifically:
num files old new speedup
48500 4.77 3.73 28%
12500 1.36 1.02 66%
20 0.075 0.074 0% (so startup time is unchanged)
That's without really finishing the optimization. Things still to do:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
* benchmark: Changed --databases to take a parameter specifiying the size
of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
benchmark to really add new values, not overwriting old values.
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.
(When curl is used, its errors are still not caught.)
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)
Currently left at true; some users are encouraging it be set to false.
Renamed unlocked files are now detected, and will always be
annexed, unless annex.largefiles disallows it.
This allows for git add's behavior to later be changed to otherwise
not annex files (whether by default or as a config option), without
worrying about the rename case.
This is not a major behavior change; annexing is still the default. But
there is one case where the behavior is changed, I think for the better:
touch f
git -c annex.largefiles=nothing add f
git add bigfile
git commit -m ...
mv bigfile f
git add f
Before, git-annex would see that f was previously not annexed,
and so the renamed bigfile content gets added to git. Now, it notices
that the inode is the one that bigfile used, and so it annexes it.
This potentially slows down git add a lot in some repositories because
of the poor performance of isInodeKnown when there are a lot of unlocked
files. Configuring annex.largefiles avoids the speed hit.
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
initremote --sameas=remotename sets sameas-name and sameas-uuid
Using sameas-name rather than name prevents old git-annex initremote
from enabling a sameas remote by name, since it would not handle it
correctly.
Straightforward, except for the issue of how to reverse LockAdjustment.
With --unlock, a commit that modifies/adds unlocked files gets reverse
adjusted to use locked files. That's fairly reasonable, I think.
But reversing --lock by unlocking all modified files feels wrong. Maybe
that's just because repositories typically seem to still have mostly
locked files in them (unless one is in an adjusted unlocked branch of
course!)
It may be that eventually how to reverse both will need to be configurable,
I don't know.
The code is only needed because for a long time, git-annex didn't
install hooks in repos on crippled filesystems. Now it does, and they
work at least on FAT (where all files are executable) and Windows.
It would be possible to remove this code in v8 simply by re-installing
the hooks.
Can be set to false to prevent any automatic repository upgrades.
Also, removed direct mode specific upgrade code in Annex.Init, and made
needsUpgrade always include the name/path of the repo, so if
there's a problem it's clear what repo has the problem.
And, made needsUpgrade catch any exceptions that might occur during the
upgrade, so it can display a more useful error message than just the
exception.
Whether or not there's a false index, it can't Restage here.
When there's a false index, restaging would alter it and not the real
index, but it fails anyway because that index is locked.
When there's not a false index, the index is locked, and so restaging
can't alter it.
No longer used. The only possible user of it would be code in
Upgrade.V5, so I verified that the parts of Annex.Content it used were
not used to manipulate direct mode files.
* Automatically convert direct mode repositories to v7 with adjusted
unlocked branches and set annex.thin.
* init: When run on a crippled filesystem with --version=5,
will error out, since version 7 is needed for adjusted unlocked branch.
* direct: This command always errors out as direct mode is no longer
supported.
* indirect: This command has become a deprecated noop.
* proxy: This command is deprecated because it was only needed in direct
mode. (But it continues to work.)
Also removed mentions of direct mode throughough the documentation.
I have not removed all the direct mode code yet.
When file matching options are specified when getting info of
something other than a directory, they won't have any effect, so error out
to avoid confusion.
This commit was sponsored by mo on Patreon.
* merge: When run with a branch parameter, merges from that branch.
This is especially useful when using an adjusted branch, because
it applies the same adjustment to the branch before merging it.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.