This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
* benchmark: Changed --databases to take a parameter specifiying the size
of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
benchmark to really add new values, not overwriting old values.
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.
(When curl is used, its errors are still not caught.)
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)
Currently left at true; some users are encouraging it be set to false.
Renamed unlocked files are now detected, and will always be
annexed, unless annex.largefiles disallows it.
This allows for git add's behavior to later be changed to otherwise
not annex files (whether by default or as a config option), without
worrying about the rename case.
This is not a major behavior change; annexing is still the default. But
there is one case where the behavior is changed, I think for the better:
touch f
git -c annex.largefiles=nothing add f
git add bigfile
git commit -m ...
mv bigfile f
git add f
Before, git-annex would see that f was previously not annexed,
and so the renamed bigfile content gets added to git. Now, it notices
that the inode is the one that bigfile used, and so it annexes it.
This potentially slows down git add a lot in some repositories because
of the poor performance of isInodeKnown when there are a lot of unlocked
files. Configuring annex.largefiles avoids the speed hit.
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
initremote --sameas=remotename sets sameas-name and sameas-uuid
Using sameas-name rather than name prevents old git-annex initremote
from enabling a sameas remote by name, since it would not handle it
correctly.
Straightforward, except for the issue of how to reverse LockAdjustment.
With --unlock, a commit that modifies/adds unlocked files gets reverse
adjusted to use locked files. That's fairly reasonable, I think.
But reversing --lock by unlocking all modified files feels wrong. Maybe
that's just because repositories typically seem to still have mostly
locked files in them (unless one is in an adjusted unlocked branch of
course!)
It may be that eventually how to reverse both will need to be configurable,
I don't know.
The code is only needed because for a long time, git-annex didn't
install hooks in repos on crippled filesystems. Now it does, and they
work at least on FAT (where all files are executable) and Windows.
It would be possible to remove this code in v8 simply by re-installing
the hooks.
Can be set to false to prevent any automatic repository upgrades.
Also, removed direct mode specific upgrade code in Annex.Init, and made
needsUpgrade always include the name/path of the repo, so if
there's a problem it's clear what repo has the problem.
And, made needsUpgrade catch any exceptions that might occur during the
upgrade, so it can display a more useful error message than just the
exception.
Whether or not there's a false index, it can't Restage here.
When there's a false index, restaging would alter it and not the real
index, but it fails anyway because that index is locked.
When there's not a false index, the index is locked, and so restaging
can't alter it.
No longer used. The only possible user of it would be code in
Upgrade.V5, so I verified that the parts of Annex.Content it used were
not used to manipulate direct mode files.
* Automatically convert direct mode repositories to v7 with adjusted
unlocked branches and set annex.thin.
* init: When run on a crippled filesystem with --version=5,
will error out, since version 7 is needed for adjusted unlocked branch.
* direct: This command always errors out as direct mode is no longer
supported.
* indirect: This command has become a deprecated noop.
* proxy: This command is deprecated because it was only needed in direct
mode. (But it continues to work.)
Also removed mentions of direct mode throughough the documentation.
I have not removed all the direct mode code yet.
When file matching options are specified when getting info of
something other than a directory, they won't have any effect, so error out
to avoid confusion.
This commit was sponsored by mo on Patreon.
* merge: When run with a branch parameter, merges from that branch.
This is especially useful when using an adjusted branch, because
it applies the same adjustment to the branch before merging it.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
No behavior changes, but this shows everywhere that a progress meter
could be displayed when hashing a file to add to the annex.
Many of the places don't make sense to display a progress meter though,
eg when importing the copy of the file probably swamps the hashing of
the file.
This means that Command.Move and Command.Get don't need to
manually set the stage, and is a lot cleaner conceptually.
Also, this makes Command.Sync.syncFile use the worker pool better.
In the scenario where it first downloads content and then uploads it to
some other remotes, it will start in TransferStage, then enter VerifyStage
and then go back to TransferStage for each transfer to the remotes.
Before, it entered CleanupStage after the download, and stayed in it for
the upload, so too many transfer jobs could run at the same time.
Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent
inside onLocal. That has a Annex state for the remote, with no worker pool.
So the resulting calls to enteringStage won't block in there.
While Remote.Git.copyToRemote does do checksum verification, I
realized that should not use a verification slot in the WorkerPool
to do it. Because, it's reading back from eg, a removable disk to checksum.
That will contend with other writes to that disk. It's best to treat
that checksum verification as just part of the transer. So, removed the todo
item about that, as there's nothing needing to be done.
Rather than limiting it to PerformStage and CleanupStage, this opens it
up so any number of stages can be added as needed by commands.
Each concurrent command has a set of stages that it uses, and only
transitions between those can block waiting for a free slot in the
worker pool. Calling enteringStage for some other stage does not block,
and has very little overhead.
Note that while before the Annex state was duplicated on the first call
to commandAction, this now happens earlier, in startConcurrency.
That means that seek stage actions should that use startConcurrency
and then modify Annex state won't modify the state of worker threads
they then start. I audited all of them, and only Command.Seek
did so; prepMerge changes the working directory and so has to come
before startConcurrency.
Also, the remote list is built before duplicating the state, which means
that it gets built earlier now than it used to. This would only have an
effect of making commands that end up not needing to perform any actions
unncessary build the remote list (only when they're run with concurrency
enable), but that's a minor overhead compared to commands seeking
through the work tree and determining they don't need to do anything.