* This version of git-annex only works with git 1.7.7 and newer.
The breakage with old versions is subtle, and affects
annex.numcopies .gitattributes settings, so be sure to upgrade git
to 1.7.7. (Debian package now depends on that version.)
* Don't pass absolute paths to git show-attr, as it started following
symlinks when that's done in 1.7.7. Instead, use relative paths,
which show-attr only handles 100% correctly in 1.7.7. Closes: #645046
Unfortunatly I can find no way to work with the old and new gits, as
the old had bugs that require absolute paths, while the new doesn't like
them at all. And the behavior of git show-attr in 1.7.7. is the same as
eg, git add of an absolute path to a symlink, so seems entirely
intentional and not likely to change.
* git-annex now asks git-annex-shell to verify that it's operating in
the expected repository.
* Note that this git-annex will not interoperate with remotes using
older versions of git-annex-shell.
The reason for this check is to avoid git-annex getting confused about
what remote repository actually contains a value. It's a prerequisite for
supporting git insteadOf aliases.
* New or changed repository descriptions in uuid.log now have a timestamp,
which is used to ensure the newest description is used when the uuid.log
has been merged.
* Note that older versions of git-annex will display the timestamp as part
of the repository description, which is ugly but otherwise harmless.
This yields a second or so speedup in unused, find, etc. Seems that even
when the ByteString is immediately split and then converted to Strings,
it's faster.
I may try to push ByteStrings out into more of git-annex gradually,
although I suspect most of the time-critical parts are already covered
now, and many of the rest rely on libraries that only support Strings.
Added Git.ByteString which replaces Git IO methods with ones using lazy
ByteStrings. This can be more efficient when large quantities of data are
being read from git.
In Git.LsTree, parse git ls-tree output more efficiently, thanks
to ByteString. This benchmarks 25% faster, in a benchmark that includes
(probably predominately) the run time for git ls-tree itself.
In real world numbers, this makes git annex unused 2 seconds faster for
each branch it needs to check, in my usual large repo.
Using Sets is the right thing; they have constant size lookup like my
SizeList, and logn insertation, which beats nub to death.
Runs faster than --fast mode did before, and gives accurate counts.
13 seconds total runtime with a warm cache in a repository with 40 thousand
keys.
find: Rather than only showing files whose contents are present, when used
with --exclude --copies or --in, displays all files that match the
specified conditions.
Note that this is a behavior change for find --exclude! Old behavior
can be gotten with find --in . --exclude=...
get, drop: Added --auto option, which decides whether to get/drop content
as needed to work toward the configured numcopies.
The problem with bundling it up in optimize was that I then found I wanted
to run an optmize that did not drop files, only got them. Considered adding
a --only-get switch to it, but that seemed wrong. Instead, let's make
existing subcommands optionally smarter.
Note that the only actual difference between drop and drop --auto is that
the latter does not even try to drop a file if it knows of not enough
copies, and does not print any error messages about files it was unable to
drop.
It might be nice to make get avoid asking git for attributes when not in
auto mode. For now it always asks for attributes.
First, this ensures that git annex addurl, when run repeatedly with the
same url, doesn't create duplicate files, which it did before when it
fell back to the longer filename.
Secondly, the file part of an url is frequently not very descriptive on its
own.
The uri scheme, auth, and port is intentionally left out, as clutter.
This includes a generic JSONStream library built on top of Text.JSON
(somewhat hackishly).
It would be possible to stream out a single json document describing
all actions, but it's probably better for consumers if they can expect
one json document per line, so I did it that way instead.
Output from external programs used for transferring files is not
currently hidden when outputting json, which probably makes it not very
useful there. This may be dealt with if there is demand for json
output for --get or --move to be parsable.
The version, status, and find subcommands have hand-crafted output and
don't do json. The whereis subcommand needs to be modified to produce
useful json.
Using a single strictness annotation, in just the right place.
Tried several others, none of which helped and some of which potentially
hurt. This is only the second time I've really had to deal with this in
a year of using haskell, which is, I suppose not that bad.
Statting files returned by dirContents to see if they exist and are regular
files seems pretty useless. This code was originally part of fsck, and
perhaps the idea then was to avoid things returned by dirContents that were
not files. But it's certianly not needed in the current use cases for
getKeysPresent.
when a git repository is first being created. Clones will automatically
notice that git-annex is in use and automatically perform a basic
initalization. It's still recommended to run "git annex init" in any
clones, to describe them.
The tricky part about this is that to generate a key, the file must be
present already. Worked around by adding (back) an URL key type, which
is used for addurl --fast.