filterM is not a good idea if you were streaming in a large list of files.
Fixing this memory leak that I introduced earlier today was a PITA because
to avoid the filterM, it's necessary to do the filtering only after
building up the data structures like BackendFile, and that means each
separate data structure needs it own function to apply the filter,
at least in this naive implementation.
There is also a minor performance regression, when using copy/drop/get/fsck
with a filter, git is now asked to look up attributes for all files,
since that now comes before the filter is applied. This is only a very
minor thing, since getting the attributes is very fast and --exclude was
probably not typically used to speed it up.
These were a mistake, they make the type signatures harder to read and
less flexible. The CommandSeek, CommandStart, CommandPerform, and
CommandCleanup types were a good idea, but composing them with the
parameters expected is going too far.
get, drop: Added --auto option, which decides whether to get/drop content
as needed to work toward the configured numcopies.
The problem with bundling it up in optimize was that I then found I wanted
to run an optmize that did not drop files, only got them. Considered adding
a --only-get switch to it, but that seemed wrong. Instead, let's make
existing subcommands optionally smarter.
Note that the only actual difference between drop and drop --auto is that
the latter does not even try to drop a file if it knows of not enough
copies, and does not print any error messages about files it was unable to
drop.
It might be nice to make get avoid asking git for attributes when not in
auto mode. For now it always asks for attributes.
The space leak was somehow caused by this line:
absfiles <- mapM absPath files
I confess, I don't quite understand why this caused bad buffering,
but apparently the whole pipeline from git-ls-files backed up at that
point.
Happily, rewriting the code to only get the cwd once and use a pure
function to calculate absfiles clears it up, and should be a little more
efficient in syscalls too.
It compiles. It sorta works. Several subcommands are FIXME marked and
broken, because things that used to accept separate --backend and --key
params need to be changed to accept just a --key that encodes all the key
info, now that there is metadata in keys.
Rename Locations functions for better consitency, and make their values
more consistent too.
Used </> rather than manually building paths. There are still more places
that manually do so, but are tricky, due to the behavior of </> when
the second FilePath is absolute. So I only changed places where
it obviously was relative.
Moved away from a map of flags to storing config directly in the AnnexState
structure. Got rid of most accessor functions in Annex.
This allowed supporting multiple --exclude flags.
* fsck: Check if annex.numcopies is satisfied.
* fsck: Verify the sha1 of files when the SHA1 backend is used.
* fsck: Verify the size of files when the WORM backend is used.
* fsck: Allow specifying individual files to fsk if fscking everything
is not desired.
* fsck: Fix bug, introduced in 0.04, in detection of unused data.