Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.
There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
* numcopies: New command, sets global numcopies value that is seen by all
clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
command is used to set the global number of copies, any annex.numcopies
git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.
This global numcopies setting is needed to let preferred content
expressions operate on numcopies.
It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)
Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.
This commit was sponsored by Nicolas Pouillard.
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.
The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.
The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.
This was a real slog from 1 to 5 am.
Test suite passes.
Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.
Wall clock speed is identical, even in large repos.
This commit was sponsored by an anonymous bitcoiner.
Option parsing for commands that run outside git repos is still screwy,
as there is no Annex monad and so the flags cannot be passed in. But,
any remaining parameters can be, which is enough for this fix.
This avoids some small overhead by only running the check once per command;
it also ensures that, even if the command doesn't find anything to run on,
it still fails to run when in a bare repo.
* get/copy --auto: Transfer data even if it would exceed numcopies,
when preferred content settings want it.
* drop --auto: Fix dropping content when there are no preferred content
settings.
When rsyncProgress pipes rsync's stdout, this turns out to cause a ssh
process started by rsync to be left behind as a zombie. I don't know why,
but my recent zombie reaping cleanup was correct, it's just that this other
zombie, that's not directly started by git-annex, was no longer reaped
due to changes in the cleanup. Make rsyncProgress reap the zombie started
by rsync, as a workaround.
FWIW, the process tree looks like this. It seems like the rsync child
is for some reason starting but not waiting on this extra ssh process.
Ssh connection caching may be involved -- disabling it seemed to change
the shape of the tree, but did not eliminate the zombie.
9378 pts/14 S+ 0:00 | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9379 pts/14 S+ 0:00 | | \_ ssh ...
9380 pts/14 S+ 0:00 | | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9381 pts/14 Z+ 0:00 | \_ [ssh] <defunct>
Now gitattributes are looked up, efficiently, in only the places that
really need them, using the same approach used for cat-file.
The old CheckAttr code seemed very fragile, in the way it streamed files
through git check-attr.
I actually found that cad8824852
was still deadlocking with ghc 7.4, at the end of adding a lot of files.
This should fix that problem, and avoid future ones.
The best part is that this removes withAttrFilesInGit and withNumCopies,
which were complicated Seek methods, as well as simplfying the types
for several other Seek methods that had a Backend tupled in.
Many functions took the repo as their first parameter. Changing it
consistently to be the last parameter allows doing some useful things with
currying, that reduce boilerplate.
In particular, g <- gitRepo is almost never needed now, instead
use inRepo to run an IO action in the repo, and fromRepo to get
a value from the repo.
This also provides more opportunities to use monadic and applicative
combinators.
Checks location log information, and file contents.
Does not check that numcopies is satisfied, as .gitattributes information
about numcopies is not available in a bare repository. In practice, that
should not be a problem, since fsck is also run in a checkout and will
check numcopies there.
This new approach allows filtering out checks from the default set that are
not appropriate for a command, rather than having to list every check
that is appropriate. It also reduces some boilerplate.
Haskell does not define Eq for functions, so I had to go a long way around
with each check having a unique id. Meh.