This is the last memory leak that prevents git-annex from running
in constant space, as far as I can see. I can now run git annex find
dummied up to repeatedly find the same file over and over, on millions
olf files, and memory stays entirely constant.
When converting to the strict state monad, I missed this place where
thunks to the state could be built up, possibly. This seems to make
it run in some percentage less memory.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.
This is used by git-annex-shell and other places where changes are made to
a remote's location log.
Ssh connection caching is now enabled automatically by git-annex. Only one
ssh connection is made to each host per git-annex run, which can speed some
things up a lot, as well as avoiding repeated password prompts. Concurrent
git-annex processes also share ssh connections. Cached ssh connections are
shut down when git-annex exits.
Note: The rsync special remote does not yet participate in the ssh
connection caching.
This new approach allows filtering out checks from the default set that are
not appropriate for a command, rather than having to list every check
that is appropriate. It also reduces some boilerplate.
Haskell does not define Eq for functions, so I had to go a long way around
with each check having a unique id. Meh.
Actually, let's do a targeted fix of the actual forkProcess that was not
waited on. The global reap is moved back to the end, after the long-running
git processes actually exit.
when a git repository is first being created. Clones will automatically
notice that git-annex is in use and automatically perform a basic
initalization. It's still recommended to run "git annex init" in any
clones, to describe them.
The only remaining vestiage of backends is different types of keys. These
are still called "backends", mostly to avoid needing to change user interface
and configuration. But everything to do with storing keys in different
backends was gone; instead different types of remotes are used.
In the refactoring, lots of code was moved out of odd corners like
Backend.File, to closer to where it's used, like Command.Drop and
Command.Fsck. Quite a lot of dead code was removed. Several data structures
became simpler, which may result in better runtime efficiency. There should
be no user-visible changes.
Otherwise, the location log changes are only staged in its index,
and this can confuse matters if pulling or cloning from the remote.
The test suite was failing because this wasn't done.
Since the queue is flushed in between subcommand actions being run,
there should be no issues with actions that expect to queue up some stuff
and have it run after they do other stuff. So I didn't have to audit for
such assumptions.
Added a cheap way to query the size of a queue.
runQueueAt is not the default yet only because there may be some code that
expects to be able to queue some suff, do something else, and run the whole
queue at the end.
10240 is an arbitrary size for the queue. If we assume annexed
filenames are between 10 and 255 characters long, then the queue will
build up between 100kb and 2550kb long commands. The max command line
length on linux is somewhere above 20k, so this is a fairly good balance --
the queue will buffer only a few megabytes of stuff and a minimal number
of commands will be run by xargs.
Also, insert queue items strictly, this should save memory.
Haskell's IO layer crashes on characters > 255 when in a non-unicode (latin1)
locale. Until Haskell gets better behavior, put in an admittedly ugly
workaround for that: git-annex forces utf8 output mode no matter what
locale is selected. So if you use a non-utf8 locale, your filenames with
characters > 127 will not be displayed as you'd expect. But at least it
won't crash.
I had not taken into account that the code was written to run git and leave
zombies, for performance/laziness reasons, when I wrote the test suite.
So rather than the typical 1 zombie process that git-annex develops, test
developed dozens. Caused problems on system with low process limits.
Added a reap function to GitRepo, that waits for any zombie child processes.
* Improved temp file handling. Transfers of content can now be resumed
from temp files later; the resume does not have to be the immediate
next git-annex run.
* unused: Include partially transferred content in the list.
Rename Locations functions for better consitency, and make their values
more consistent too.
Used </> rather than manually building paths. There are still more places
that manually do so, but are tricky, due to the behavior of </> when
the second FilePath is absolute. So I only changed places where
it obviously was relative.
Moved away from a map of flags to storing config directly in the AnnexState
structure. Got rid of most accessor functions in Annex.
This allowed supporting multiple --exclude flags.
* fsck: Check if annex.numcopies is satisfied.
* fsck: Verify the sha1 of files when the SHA1 backend is used.
* fsck: Verify the size of files when the WORM backend is used.
* fsck: Allow specifying individual files to fsk if fscking everything
is not desired.
* fsck: Fix bug, introduced in 0.04, in detection of unused data.