This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.
Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.
All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.
Currently, no UUIDs are treated as private yet, a way to configure that
is needed.
The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.
It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.
And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concacenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
init: Fix a crash when the repo's was cloned from a repo that had an
adjusted branch checked out, and the origin remote is not named "origin".
The only other hardcoding of the name of origin is in:
- Upgrade.V2, which can be ignored probably
- Annex.Branch, which doesn't fail if it has some other name, but just
doesn't set up the git-annex branch with quite as linear a history in
that case.
Reads of cached data are not debugged, only cache misses are, and since
many commands pre-cache location log data, this avoids a slew of
fastDebug calls when running commands such as git-annex get --from
Had to add to AnnexRead an indication of whether debugging is enabled.
Could have just made setupConsole not install a debug output action that
outputs, and have enableDebug be what installs that, but then in the
common case where there is no debug selector, and so all debug output is
selected, it would run the debug output action every time, which entails
an IORef access. Which would make fastDebug too slow..
Most of the changes here involve global option parsing: GlobalSetter
changed so it can both run an Annex action to set state, but can also
change the AnnexRead value, which is immutable once the Annex monad is
running.
That allowed a debugselector value to be added to AnnexRead, seeded
from the git config. The --debugfilter option's GlobalSetter then updates
the AnnexRead.
This improved GlobalSetter can later be used to move more stuff to
AnnexRead. Things that don't involve a git config will be easier to
move, and probably a *lot* of things can be moved eventually.
fastDebug, while implemented, is not used anywhere yet. But it should be
fast..
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.
The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.
Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticable.
This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.
The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
When git-annex transferrer started up, and the journal contained something,
it would commit it to the git-annex branch. This caused excess commits to
the branch, in cases where normally several changes would be journalled and
committed together. That generated some excess git objects and was also
just noisy on stdout.
Since transferrer uses enableInteractiveBranchAccess, it does not need to
commit journalled changes, since the optimisation that avoids checking
the journal when reading from the branch is disabled for processes that
call that.
This commit was sponsored by Svenne Krap on Patreon.
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.
The directory special remote also so mangles them. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so a S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.
This commit was sponsored by Graham Spencer on Patreon.
New error message:
Remote foo not usable by git-annex; setting annex-ignore
http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1
If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.
This commit was sponsored by Mark Reidenbach on Patreon.
Seems that hasOrigin was never finding origin's git-annex branch, so a new
one got created each time. And so then it later needed to merge the two
branches, which is expensive.
Added --no-track to git branch to avoid it displaying a message about
setting up tracking branches. Of course there's no reason to make the
git-annex branch a tracking branch since git-annex auto-merges it.
Can beet to false to avoid some expensive things needed to support unlocked
files.
See my comment for why this only controls what init sets up, and not other
behavior.
I didn't bother with making the v5 upgrade code path look at this, though
it easily could, because the docs say to run git-annex init after setting
it to make it take effect.
Not yet used, but allows getting the size of items in the tree fairly
cheaply.
I noticed that CmdLine.Seek uses ls-tree and the feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outwweigh the benefit.
When autoenabling special remotes of type S3, weddav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
This solves the problem that import of such files gets confused and
converts them back to annexed files.
The import code already used GIT keys internally when it determined a
file should not be annexed. So now when it sees a GIT key that export
used, it already does the right thing.
This also means that even older version of git-annex can import and will
do the right thing, once a fixed version has exported. Still, there may
be other complications around upgrades; still need to think it all
through.
Moved gitShaKey and keyGitSha from Key to Annex.Export since they're
only used for export/import.
Documented GIT keys in backends, since they do appear in the git-annex
branch now.
This commit was sponsored by Graham Spencer on Patreon.
Added LinkType to ProvidedInfo, and unified MatchingKey with
ProvidedInfo. They're both used in the same way, so there was no real
reason to keep separate.
Note that addLocked and addUnlocked still set matchNeedsFileName,
because to handle MatchingFile, they do need it. However, they
don't use it when MatchingInfo is provided. This should be ok,
the --branch case will be able skip checking matchNeedsFileName,
since it will provide a filename in any case.
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.
Not tested at all yet, but I imagine it will work!
Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.
Note that the progress meter display is not taken down when displaying
the message, so it will display like this:
0% 8 B 0 B/s
Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
0% 10 B 0 B/s
Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.
Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.
A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.
This commit was sponsored by Kevin Mueller on Patreon.
annex.stalldetection can now be set to "true" to make git-annex do
automatic stall detection when it detects a remote is updating its transfer
progress consistently enough.
This commit was sponsored by Luke Shumaker on Patreon.
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.
Complicated by git-annex addurl --fast which adds the file whose content
is not present, so it needs to stay unlocked when on such a branch.
This commit was sponsored by Brock Spratlen on Patreon.
This avoids the smudge --clean filter failing on the URL keys.
git checkout runs the post-checkout hook, which runs smudge --update.
That populates all the pointer files, but it neglected to store their inode
caches in the keys db. With that done, and the keys db flushed before
smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap
check can tell the file is not modified, so will not try to re-ingest it,
which does not work with URL keys because they do not support genKey.
It also seems possible that the isUnmodifiedCheap was also failing for
non-URL keys, which would cause them to be re-ingested, leading to a lot of
extra work. I have not verified that, but don't see why it wouldn't have
happened. So this probably also speeds up checking out adjusted branches.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This is probably a reversion, but not sure what caused it. By the time
Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has
at least sometimes already done the necessary fixups. (Eg, one run
indirectly by a git command.) But since the Repo is cached, it doesn't
realize and does them again. So, avoid crashing when git config --unset
fails.
This commit was sponsored by Jack Hill on Patreon.
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.
Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.
This commit was sponsored by Denis Dzyubenko on Patreon.
* add: Significantly speed up adding lots of non-large files to git,
by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
so any other smudge filters than the annex one that may be enabled will
be used.
Especially from borg, where the content identifier logs
all end up being the same identical file!
But also, for other imports, the location tracking logs can,
in some cases, be identical files.
Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
May actually work now.
Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.