tweak-fetch is a new git hook I have developed (not yet accepted into
git, but looking bright). Among other things, the hook can be used to
observe what is being fetched, notice remote git-annex branches that might
be updated, and merge them into the git-annex branch.
This will solve problems where users do a git pull, immediately followed
by a push, and git refuses to push because their git-annex branch has
diverged, since they neither ran git annex merge by hand nor ran other
git-annex commands that auto-merge.
The tweak-fetch hook is written by git annex init. Of course, existing
repositories won't have it, which is ok, because git-annex still
automatically does a merge if changed branches have appeared. Indeed,
it will always need to do that check, as long as it needs to support
git-annex branches that might be updated by other means.
Eventually though, I will want to ensure all repositories have the
tweak-fetch hook. Perhaps a minor version upgrade to ensure it is added?
A subtlety of the hook is that when it's run, the remote tracking refs
have not yet been updated. So Annex.Branch.updateTo has to be careful to
only use the sha1 that was fetched, not the branch name. The branch
name is only used in the commit message.
The other tricky thing is that the git tweak-fetch hook should *only*
output lines in a specific format, and git will be unhappy if it also
outputs status messages, etc. So those messages are sent to stderr.
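
The two subtleties above can be sketched roughly as follows. This is only
an illustration: FetchedRef, mergeTarget, mergeMessage and status are
hypothetical names, not the real Annex.Branch code, and the commit message
wording is made up.

```haskell
import System.IO (hPutStrLn, stderr)

-- Hypothetical record for one fetched ref; not git-annex's real type.
data FetchedRef = FetchedRef
    { fetchedSha   :: String -- the sha1 that was actually fetched
    , remoteBranch :: String -- branch name; its tracking ref may still be stale
    }

-- What to merge: always the fetched sha, never the branch name,
-- since the tracking ref has not been updated when the hook runs.
mergeTarget :: FetchedRef -> String
mergeTarget = fetchedSha

-- The branch name is only used for the commit message.
mergeMessage :: FetchedRef -> String
mergeMessage r = "merging " ++ remoteBranch r ++ " into git-annex"

-- Status messages go to stderr, leaving stdout for the lines git expects.
status :: String -> IO ()
status = hPutStrLn stderr
```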
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem: what data type to use to
represent the options between getopt and seek?
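
The workaround of finding the first non-dashed parameter could look
something like this. It is a minimal sketch; findCommand is a hypothetical
helper, and it assumes no option takes its value as a separate parameter.

```haskell
import Data.List (find)

-- Guess the subcommand as the first parameter that does not start
-- with a dash. Assumes options never take a separate value parameter.
findCommand :: [String] -> Maybe String
findCommand = find (not . isOption)
  where
    isOption ('-':_) = True
    isOption _       = False
```

For example, findCommand ["--fast", "get", "file"] yields Just "get". An
option with a detached value, as in --exclude foo get, would be misread as
the command foo, so a real implementation would need to know which options
take values.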
Using Sets is the right thing; they have constant-time size lookup like my
SizeList, and log n insertion, which beats nub to death.
Runs faster than --fast mode did before, and gives accurate counts.
13 seconds total runtime with a warm cache in a repository with 40 thousand
keys.
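
A minimal sketch of the Set approach, for illustration only (countUnique
is a hypothetical name, not the function git-annex uses):

```haskell
import Data.List (foldl')
import qualified Data.Set as S

-- Count distinct items by inserting each one into a Set.
-- Each insert is O(log n) and S.size is O(1), whereas
-- length . nub is quadratic in the number of items.
countUnique :: Ord a => [a] -> Int
countUnique = S.size . foldl' (flip S.insert) S.empty
```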
find: Rather than only showing files whose contents are present, when used
with --exclude, --copies, or --in, displays all files that match the
specified conditions.
Note that this is a behavior change for find --exclude! Old behavior
can be gotten with find --in . --exclude=...
get, drop: Added --auto option, which decides whether to get/drop content
as needed to work toward the configured numcopies.
The problem with bundling it up in optimize was that I then found I wanted
to run an optimize that did not drop files, only got them. Considered adding
a --only-get switch to it, but that seemed wrong. Instead, let's make
existing subcommands optionally smarter.
Note that the only actual difference between drop and drop --auto is that
the latter does not even try to drop a file if it does not know of enough
copies, and does not print any error messages about files it was unable to
drop.
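
Roughly, the --auto decision can be thought of like this. It is a sketch
under assumptions, not the actual code: FileState, autoGet and autoDrop are
hypothetical names, and knownCopies is assumed to count the local copy when
one is present.

```haskell
-- Hypothetical summary of what is known about one file.
data FileState = FileState
    { presentHere :: Bool -- is the content in this repository?
    , knownCopies :: Int  -- known copies, including the local one if present
    }

-- get --auto: only fetch content that is missing here and
-- still short of numcopies.
autoGet :: Int -> FileState -> Bool
autoGet numcopies s = not (presentHere s) && knownCopies s < numcopies

-- drop --auto: only try to drop when enough copies would remain
-- afterwards; otherwise skip the file silently.
autoDrop :: Int -> FileState -> Bool
autoDrop numcopies s = presentHere s && knownCopies s > numcopies
```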
It might be nice to make get avoid asking git for attributes when not in
auto mode. For now it always asks for attributes.
This includes a generic JSONStream library built on top of Text.JSON
(somewhat hackishly).
It would be possible to stream out a single json document describing
all actions, but it's probably better for consumers if they can expect
one json document per line, so I did it that way instead.
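
To illustrate the one-document-per-line format, here is a dependency-free
sketch. It is hand-rolled rather than built on Text.JSON like the real
JSONStream, only handles string values, and the field names used with it
are made up.

```haskell
import Data.Char (ord)
import Data.List (intercalate)
import Numeric (showHex)

-- Escape a string for use inside a JSON string literal.
escape :: String -> String
escape = concatMap esc
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc c
        | ord c < 0x20 = "\\u" ++ pad (showHex (ord c) "")
        | otherwise    = [c]
    pad s = replicate (4 - length s) '0' ++ s

-- Render one action as a JSON object on a single line.
jsonLine :: [(String, String)] -> String
jsonLine fields = "{" ++ intercalate "," (map field fields) ++ "}"
  where
    field (k, v) = quote k ++ ":" ++ quote v
    quote s = "\"" ++ escape s ++ "\""

-- One complete JSON document per line, so a consumer can parse
-- each line as soon as it arrives.
jsonStream :: [[(String, String)]] -> String
jsonStream = unlines . map jsonLine
```

Since newlines inside values are escaped, each action always occupies
exactly one line of output.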
Output from external programs used for transferring files is not
currently hidden when outputting json, which probably makes it not very
useful there. This may be dealt with if there is demand for json
output for get or move to be parsable.
The version, status, and find subcommands have hand-crafted output and
don't do json. The whereis subcommand needs to be modified to produce
useful json.
Backends are now only used to generate keys (and check them); they
are not arbitrary key-value stores for data, because it turned out such
a store is better modeled as a special remote. Updated docs to not
imply backends do more than they do now.
Sometimes I'm tempted to rename "backend" to "keytype" or something,
which would really be more clear. But it would be an annoying transition
for users, with annex.backends etc.
The tricky part about this is that to generate a key, the file must be
present already. Worked around by adding (back) a URL key type, which
is used for addurl --fast.
This allows, e.g., `git-annex -c annex.rsync-options=-6 get file`
The overridden git configs are not passed on to git plumbing commands
that are run. Perhaps someone will find a need to do that, but I don't yet
and it would require storing more state to know what config settings
have been overridden and need to be passed on.
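
Splitting a -c parameter such as annex.rsync-options=-6 into a setting and
its value might be sketched as below. parseOverride is a hypothetical
helper, and rejecting a parameter that has no "=" is just a simplification
made for this sketch.

```haskell
-- Split "key=value" at the first '=' so the value may itself contain '='.
-- Returns Nothing when there is no '=' or the key is empty.
parseOverride :: String -> Maybe (String, String)
parseOverride s = case break (== '=') s of
    (key, '=':value) | not (null key) -> Just (key, value)
    _                                 -> Nothing
```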