As seen in this bug report, the lifted exception handling using the StateT
monad throws away state changes when an action throws an exception.
http://git-annex.branchable.com/bugs/git_annex_fork_bombs_on_gpg_file/
.. Which can result in cached values being redundantly calculated, or other
possibly worse bugs when the annex state gets out of sync with reality.
This switches from a StateT AnnexState to a ReaderT (MVar AnnexState).
All changes to the state go via the MVar. So when an Annex action is
running inside an exception handler, and it makes some changes, they
immediately go into affect in the MVar. If it then throws an exception
(or even crashes its thread!), the state changes are still in effect.
The MonadCatchIO-transformers change is actually only incidental.
I could have kept on using lifted-base for the exception handling.
However, I'd have needed to write a new instance of MonadBaseControl
for the new monad.. and I didn't write the old instance.. I begged Bas
and he kindly sent it to me. Happily, MonadCatchIO-transformers is
able to derive a MonadCatchIO instance for my monad.
This is a deep level change. It passes the test suite! What could it break?
Well.. The most likely breakage would be to code that runs an Annex action
in an exception handler, and *wants* state changes to be thrown away.
Perhaps the state changes leaves the state inconsistent, or wrong. Since
there are relatively few places in git-annex that catch exceptions in the
Annex monad, and the AnnexState is generally just used to cache calculated
data, this is unlikely to be a problem.
Oh yeah, this change also makes Assistant.Types.ThreadedMonad a bit
redundant. It's now entirely possible to run concurrent Annex actions in
different threads, all sharing access to the same state! The ThreadedMonad
just adds some extra work on top of that, with its own MVar, and avoids
such actions possibly stepping on one-another's toes. I have not gotten
rid of it, but might try that later. Being able to run concurrent Annex
actions would simplify parts of the Assistant code.
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
test case will recover and it will all just work.
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.
Not yet done for annex.<remote>.* configuration settings.
It occurs to me that all config settings should be parsed once at startup,
into a proper ADT, rather than all this ad-hoc parsing and memoization. One
day..
I had been using -ignore-package monads-tf to deal with this, but
the XMPP library uses monads-tf, so that also ignores it. Instead,
use PackageImports to force use of mtl in my own code.
Monitors git-annex branch for changes, which are noticed by the Merger
thread whenever the branch ref is changed (either due to an incoming push,
or a local change), and refreshes cached config values for modified config
files.
Rate limited to run no more often than once per minute. This is important
because frequent git-annex branch changes happen when files are being
added, or transferred, etc.
A primary use case is that, when preferred content changes are made,
and get pushed to remotes, the remotes start honoring those settings.
Other use cases include propigating repository description and trust
changes to remotes, and learning when a remote has added a new special
remote, so the webapp can present the GUI to enable that special remote
locally.
Also added a uuid.log cache. All other config files already had caches.
When in a subdir, both the normal filepath, and the filepath relative to
the top of the git repo are needed for matching. The former for key lookup,
and the latter for include/exclude to match against. Previously, key lookup
didn't work in this situation.
Solves the issue with preferred content expressions and dropping that
I mentioned yesterday. My solution was to add a parameter to specify a set
of repositories where content should be assumed not to be present. When
deciding whether to drop, it can put the current repository in, and then
if the expression fails to match, the content can be dropped.
Using yesterday's example "(not copies=trusted:2) and (not in=usbdrive)",
when the local repo is one of the 2 trusted copies, the drop check will
see only 1 trusted copy, so the expression matches, and so the content will
not be dropped.
This includes a full parser for the boolean expressions in the log,
that compiles them into Matchers. Those matchers are not used yet.
A complication is that matching against an expression should never
crash git-annex with an error. Instead, vicfg checks that the expressions
parse. If a bad expression (or an expression understood by some future
git-annex version) gets into the log, it'll be ignored.
Most of the code in Limit couldn't fail anyway, but I did have to make
limitCopies check its parameter first, and return an error if it's bad,
rather than erroring at runtime.
Uses a MVar again, as there seems no other way to thread the state through
inotify events.
This is a rather unsatisfactory result. I had wanted to run them in
the same monad so that the git queue could be used to coleasce git commands
and speed things up. But, that led to fragility: If several files are
added, and one is removed before queue flush, git add will fail to add
any of them. So, the queue is still explicitly flushed after each add for
now.
TODO: Investigate using git add --ignore-errors. This would need to be done
in Command.Add. And, git add still exits nonzero with it, so would need
to avoid crashing on queue flush.
The environment needs to override git-config. Changed when git config is
read, and avoid rereading it once it's been read.
chdir for both worktree settings.
This option avoids gpg key distribution, at the expense of flexability, and
with the requirement that all clones of the git repository be equally
trusted.
A bit tricky to avoid printing it twice in a row when there are queued git
commands to run and journal to stage.
Added a generic way to run an action that may output multiple side
messages, with only the first displayed.
Added Annex.cleanup, which is a general purpose interface for adding
actions to run at the end.
Remotes with the old git-annex-shell will commit every time, and have no
commit command, so hide stderr when running the commit command.
useful when adding hundreds of thousands of files on a system with plenty
of memory.
git add gets quite slow in such a large repository, so if the system has
more than the ~32 mb of memory the queue can use by default, it's a useful
optimisation to increase the queue size, in order to decrease the number
of times git add is run.
Now gitattributes are looked up, efficiently, in only the places that
really need them, using the same approach used for cat-file.
The old CheckAttr code seemed very fragile, in the way it streamed files
through git check-attr.
I actually found that cad8824852
was still deadlocking with ghc 7.4, at the end of adding a lot of files.
This should fix that problem, and avoid future ones.
The best part is that this removes withAttrFilesInGit and withNumCopies,
which were complicated Seek methods, as well as simplfying the types
for several other Seek methods that had a Backend tupled in.
I had not realized what a memory leak the lazy state monad could be,
although I have not seen much evidence of actual leaking in git-annex.
However, if running git-annex on a great many files, this could matter.
The additional Utility.State.changeState adds even more strictness,
avoiding a problem I saw in github-backup where repeatedly modifying
state built up a huge pile of thunks.
Ssh connection caching is now enabled automatically by git-annex. Only one
ssh connection is made to each host per git-annex run, which can speed some
things up a lot, as well as avoiding repeated password prompts. Concurrent
git-annex processes also share ssh connections. Cached ssh connections are
shut down when git-annex exits.
Note: The rsync special remote does not yet participate in the ssh
connection caching.
This overrides the trust.log, and is overridden by the command-line trust
parameters.
It would have been nicer to have Logs.Trust.trustMap just look up the
configuration for all remotes, but a dependency loop prevented that
(Remotes depends on Logs.Trust in several ways). So instead, look up
the configuration when building remotes, storing it in the same forcetrust
field used for the command-line trust parameters.
Made --from and --to command-specific options.
Added generic storage for values of command-specific options,
which allows removing some of the special case fields in AnnexState.
(Also added generic storage for command-specific flags, although there are
not yet any.)
Note that this storage uses a Map, so repeatedly looking up the same value
is slightly more expensive than looking up an AnnexState field. But, the
value can be looked up once in the seek stage, transformed as necessary,
and passed in a closure to the start stage, and this avoids that overhead.
Still, I'm hesitant to use this for things like force or fast flags.
It's probably best to reserve it for flags that are only used by a few
commands, or options like --from and --to that it's important only be
allowed to be used with commands that implement them, to avoid user
confusion.
Thanks to Bas van Dijk for providing the instance declarations I needed.
Grody stuff. Bas is talking about perhaps providing utility functions that
contain the ugly parts, so this code may be able to be removed using a
future version of monad-control.
I had to, I hope temporarily, lose my nice Annex newtype, and use a type
synonym. This because I cannot find a way to derive a MonadBaseControl
instance of the Annex newtype. I've emailed Bas van Dijk in hope he can
help get the newtype back.
Otherwise appears to build & work.
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
Many functions took the repo as their first parameter. Changing it
consistently to be the last parameter allows doing some useful things with
currying, that reduce boilerplate.
In particular, g <- gitRepo is almost never needed now, instead
use inRepo to run an IO action in the repo, and fromRepo to get
a value from the repo.
This also provides more opportunities to use monadic and applicative
combinators.
Before the config was read each time onLocal was called, and entirely
redundantly since it's read for same-host remotes on startup.
Also a minor bug fix: When rsyncing to a same-host remote, use the
rsync-options from the repository that the user ran git-annex in, not those of
the receiving repository.