Checks location log information, and file contents.
Does not check that numcopies is satisfied, as .gitattributes information
about numcopies is not available in a bare repository. In practice, that
should not be a problem, since fsck is also run in a checkout and will
check numcopies there.
Specifically, disabled trying to update the git-annex branch on the remote,
since that data is never used by operations that act on such remotes.
Also, when copying content to such a remote, skip committing the presence
information changes to its git-annex branch. Leaving it in the journal there
is ok: Any command run on the remote that needs the info will flush the
journal.
This may partially solve this bug:
http://git-annex.branchable.com/bugs/fails_to_handle_lot_of_files/
Although I still see unreaped git processes piling up when doing a copy --to.
* This version of git-annex only works with git 1.7.7 and newer.
The breakage with old versions is subtle, and affects
annex.numcopies .gitattributes settings, so be sure to upgrade git
to 1.7.7. (Debian package now depends on that version.)
* Don't pass absolute paths to git show-attr, as it started following
symlinks when that's done in 1.7.7. Instead, use relative paths,
which show-attr only handles 100% correctly in 1.7.7. Closes: #645046
Unfortunatly I can find no way to work with the old and new gits, as
the old had bugs that require absolute paths, while the new doesn't like
them at all. And the behavior of git show-attr in 1.7.7. is the same as
eg, git add of an absolute path to a symlink, so seems entirely
intentional and not likely to change.
* git-annex now asks git-annex-shell to verify that it's operating in
the expected repository.
* Note that this git-annex will not interoperate with remotes using
older versions of git-annex-shell.
The reason for this check is to avoid git-annex getting confused about
what remote repository actually contains a value. It's a prerequisite for
supporting git insteadOf aliases.
* New or changed repository descriptions in uuid.log now have a timestamp,
which is used to ensure the newest description is used when the uuid.log
has been merged.
* Note that older versions of git-annex will display the timestamp as part
of the repository description, which is ugly but otherwise harmless.
This yields a second or so speedup in unused, find, etc. Seems that even
when the ByteString is immediately split and then converted to Strings,
it's faster.
I may try to push ByteStrings out into more of git-annex gradually,
although I suspect most of the time-critical parts are already covered
now, and many of the rest rely on libraries that only support Strings.
Added Git.ByteString which replaces Git IO methods with ones using lazy
ByteStrings. This can be more efficient when large quantities of data are
being read from git.
In Git.LsTree, parse git ls-tree output more efficiently, thanks
to ByteString. This benchmarks 25% faster, in a benchmark that includes
(probably predominately) the run time for git ls-tree itself.
In real world numbers, this makes git annex unused 2 seconds faster for
each branch it needs to check, in my usual large repo.
Using Sets is the right thing; they have constant size lookup like my
SizeList, and logn insertation, which beats nub to death.
Runs faster than --fast mode did before, and gives accurate counts.
13 seconds total runtime with a warm cache in a repository with 40 thousand
keys.
find: Rather than only showing files whose contents are present, when used
with --exclude --copies or --in, displays all files that match the
specified conditions.
Note that this is a behavior change for find --exclude! Old behavior
can be gotten with find --in . --exclude=...
get, drop: Added --auto option, which decides whether to get/drop content
as needed to work toward the configured numcopies.
The problem with bundling it up in optimize was that I then found I wanted
to run an optmize that did not drop files, only got them. Considered adding
a --only-get switch to it, but that seemed wrong. Instead, let's make
existing subcommands optionally smarter.
Note that the only actual difference between drop and drop --auto is that
the latter does not even try to drop a file if it knows of not enough
copies, and does not print any error messages about files it was unable to
drop.
It might be nice to make get avoid asking git for attributes when not in
auto mode. For now it always asks for attributes.
First, this ensures that git annex addurl, when run repeatedly with the
same url, doesn't create duplicate files, which it did before when it
fell back to the longer filename.
Secondly, the file part of an url is frequently not very descriptive on its
own.
The uri scheme, auth, and port is intentionally left out, as clutter.