tweak-fetch is a new git hook I have developed (not yet accepted into
git, but looking bright). Amoung other things, the hook can be used to
observe what is being fetched, notice remote git-annex branches that might
be updated, and merge them into the git-annex branch.
This will solve problems where users do a git pull, immediately followed
by a push, and it refuses to push because their git-annex branch is
diverged, and they neither ran git annex merge by hand, nor ran other
git-annex commands that auto-merge.
The tweak-fetch is written by git annex init. Of course, existing
repositories won't have it, which is ok, because git-annex still
automatically does a merge if changed branches have appeared. Indeed,
it will always need to do that check, as long as it needs to support
support git-annex branches that might be updated by other means.
Eventually though, I will want to ensure all repositories have the
tweak-fetch hook. Perhaps a minor verison upgrade to ensure it is added?
A subtlety of the hook is that when it's run, the remote tracking refs
have not yet been updated. So Annex.Branch.updateTo has to be careful to
only use the sha1 that was fetched, not the branch name. The branch
name is only used in the commit message.
The other tricky thing is that git tweak-fetch hook should *only*
output lines in a specific format, and git will be unhappy if it also
outputs status messages, etc. So those messages are sent to stderr.
A crash on parsing was fixed a while ago. This adds support for fully
correctly parsing multiline git config values, using git config --null.
Since git-annex-shell configlist uses normal git config output, I left in
support for that too; the two forms of config output can be easily
identified by the parser. Since configlist only prints the annex.uuid
config, there's no risk of multiline values there, so no need to change it.
Added files don't have to be committed before they can be unannexed.
unannex no longer commits existing staged changes
unannex of the last file in a directory now works, before it failed because
git rm deleted the directory out from under it,
Supporting multiple directory hash types will allow converting to a
different one, without a flag day.
gitAnnexLocation now checks which of the possible locations have a file.
This means more statting of files. Several places currently use
gitAnnexLocation and immediately check if the returned file exists;
those need to be optimised.
The only fully supported thing is to have the main repository on one disk,
and .git/annex on another. Only commands that move data in/out of the annex
will need to copy it across devices.
There is only partial support for putting arbitrary subdirectories of
.git/annex on different devices. For one thing, but this can require more
copies to be done. For example, when .git/annex/tmp is on one device, and
.git/annex/journal on another, every journal write involves a call to
mv(1). Also, there are a few places that make hard links between various
subdirectories of .git/annex with createLink, that are not handled.
In the common case without cross-device, the new moveFile is actually
faster than renameFile, avoiding an unncessary stat to check that a file
(not a directory) is being moved. Of course if a cross-device move is
needed, it is as slow as mv(1) of the data.
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
Left out the backend usage graph for now, and bad/temp directory sizes
are only displayed when present. Also, disk usage is returned as a string
with units, which I can see changing later.
In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for
those. Note that this does not prevent mixing up of eg, refs and branches
at the type level. Since git really doesn't care, except rare cases like
git update-ref, or git tag -d, that seems ok for now.
There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref
may or may not be a tree-ish, depending on the object type, so there seems
no point in trying to represent it at the type level.
semitrusted uuids rarely are listed in trust.log, so a special case
is needed to get a list of them. Take the difference of all known uuids
with non-semitrusted uuids.
More accurately, it was supported already when map uses git-annex-shell,
but not when it does not.
Note that the user name cannot be shell escaped using git-annex's current
approach for shell escaping. I tried and some shells like dash cannot
cd ~'joey'. Rest of directory is still shell escaped, not for security but
in case a directory has a space or other weird character.