The smudge filter does need to be run, because if the key is in the local
annex already (due to renaming, or a copy of a file added, or a new file
added and its content has already arrived), git merge smudges the file and
this should provide its content.
This does probably mean that in merge conflict resolution, git smudges the
existing file, re-copying all its content to it, and then the file is
deleted. So, not efficient.
This is a behavior change for merge conflicts between locked files
that both pointed to the same key, in different ways.
Before, the conflict was resolved, but the file was renamed to .variant.
This was unnecessary, because there was only one variant.
Of course, this also handles conflicts between unlocked and locked, or even
two unlocked files with different pointer contents.
Since the file was present and locked, its annex object was not in the
inode cache. So, despite not needing to update the annex object when the
clean filter is run on the content by git merge, it does need to record the
inode cache of the annex object. Otherwise, the annex object will be
assumed to be bad, since its inode is not cached.
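
The inode cache bookkeeping described above can be modelled roughly as follows. This is only a sketch in Python, not git-annex's Haskell code: the names InodeCache, InodeCacheStore, record and is_unmodified are hypothetical, and it assumes a cache entry is just (inode, size, mtime).

```python
import os
from typing import Dict, NamedTuple, Set


class InodeCache(NamedTuple):
    # Assumed contents of a cache entry; the real format may differ.
    inode: int
    size: int
    mtime_ns: int


def gen_inode_cache(path: str) -> InodeCache:
    st = os.stat(path)
    return InodeCache(st.st_ino, st.st_size, st.st_mtime_ns)


class InodeCacheStore:
    """Maps a key to the inode caches of files known to hold its content."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Set[InodeCache]] = {}

    def record(self, key: str, path: str) -> None:
        # Called even when the object itself needs no update, so that
        # later checks do not treat the object as bad.
        self._by_key.setdefault(key, set()).add(gen_inode_cache(path))

    def is_unmodified(self, key: str, path: str) -> bool:
        # A file whose inode is not cached is assumed to be bad.
        return gen_inode_cache(path) in self._by_key.get(key, set())
```

In this model, an object that was never recorded fails is_unmodified even though its content is intact, which is the situation the clean filter has to avoid.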
Several tricky parts:
* When the conflict is just between the same key being locked and unlocked,
the unlocked version wins, and the file is not renamed in this case.
* Need to update associated file map when conflict resolution renames
an unlocked file.
* git merge runs the smudge filter on the conflicting file, and actually
overwrites the file with the same content it had before, and so
invalidates its inode cache. This makes it difficult to know when it's
safe to remove such files as conflict cruft, without going so far as to
compare their entire contents.
Dealt with this by preventing the smudge filter from populating the file
when a merge is run. However, that also prevents the smudge filter being
run for non-conflicting files, so eg moving a file won't put its new
content into place.
* Ideally, if a merge or a merge conflict resolution renames an unlocked
file, the file in the work tree can just be moved, rather than copying
the content to a new worktree file.
Merge conflict resolution attempts to do this, but due to git merge's
behavior of running smudge filters, what actually seems to happen is that
the old worktree file with the content is deleted and rewritten as a
pointer file, so it doesn't get reused.
So, this is probably not as efficient as it optimally could be.
If that becomes a problem, could look into running the merge in a separate
worktree and updating the real worktree more efficiently, similarly to the
direct mode merge. However, the direct mode merge had a lot of bugs, and
I'd rather not use that more error-prone method unless really needed.
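
The resolution rule from the first tricky part (same key on both sides: no rename, unlocked wins; different keys: keep both as variants) could be sketched like this. It is a Python model under stated assumptions, not the actual implementation: Side, resolve and variant_name are hypothetical names, and the variant suffix shown here is an illustrative scheme only.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Side:
    key: str        # annex key this side of the merge points to
    unlocked: bool  # True if this side has the file unlocked


def variant_name(path: str, key: str) -> str:
    # Hypothetical naming scheme, for illustration only.
    tag = hashlib.sha1(key.encode()).hexdigest()[:4]
    return f"{path}.variant-{tag}"


def resolve(path: str, ours: Side, theirs: Side) -> List[Tuple[str, Side]]:
    """Return the (filename, side) pairs to keep after the conflict."""
    if ours.key == theirs.key:
        # Only one variant exists, so the file keeps its name.
        # If either side is unlocked, the unlocked version wins.
        return [(path, Side(ours.key, ours.unlocked or theirs.unlocked))]
    # Genuinely different content: keep both, each under a variant name.
    return [(variant_name(path, s.key), s) for s in (ours, theirs)]
```

For example, resolve("foo", Side("K1", False), Side("K1", True)) keeps a single file named foo, unlocked.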
Make these features solely dependent on the OS being built on.
This lets stack build on windows w/o XMPP, on OSX w/o DBUS,
and on Linux with everything.
Since stack is being used to build the OSX autobuild now, I want xmpp
enabled.
This means stack can't be used to build git-annex on windows, unless the
user edits this file and disables xmpp. Unfortunate that stack is so
unconfigurable, compared with cabal.
Now available on mips, mipsel, but temporarily removed armel since build is
failing there.
If armel would just get caught up, I could remove the per-arch specs
entirely.
Maybe time to turn maint of this over to richih?
Decided it's too scary to make v6 unlocked files have 1 copy by default,
but that should be available to those who need it. This is consistent with
git-annex not dropping unused content without --force, etc.
* Added annex.thin setting, which makes unlocked files in v6 repositories
be hard linked to their content, instead of a copy. This saves disk
space but means any modification of an unlocked file will lose the local
(and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since
direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.
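
The tradeoff behind annex.thin can be seen in a small Python sketch. It is not git-annex code: populate_unlocked is a hypothetical helper, and it assumes the worktree file and the annex object live on the same filesystem so that a hard link is possible.

```python
import os
import shutil


def populate_unlocked(obj_path: str, work_path: str, thin: bool) -> None:
    """Put an annex object's content into place as an unlocked file."""
    if thin:
        # Hard link: no extra disk space, but both names share one inode,
        # so an in-place edit of the work file also changes the object.
        os.link(obj_path, work_path)
    else:
        # Copy: uses twice the space, but the object keeps the old version.
        shutil.copyfile(obj_path, work_path)
```

With thin=True, writing to the unlocked file in place overwrites the only local copy of the old version. With thin=False, the annex object still holds it.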
This optimisation was not necessary, and didn't work for v6 unlocked files.
Typically only a small number of files will be changed by a commit, so just
catKey them all.
In copyFromRemote, it used to check isDirect, but that was not needed;
the remote is sending the file, so it doesn't matter if the local,
receiving repository is in direct mode or not. And, since the content is not
present yet, it's certainly not unlocked. Note that the remote may indeed
be sending an unlocked file, but sendkey uses sendAnnex, which will detect
if the file is modified before or during transfer, and will exit nonzero,
aborting the upload. So, the receiver doesn't need any checks.
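
The kind of sender-side check relied on here can be sketched as follows. This is a Python model of the idea only. sendAnnex itself is Haskell and its details are not reproduced: signature and send_checked are hypothetical names, and comparing size and mtime is an assumed way of detecting modification.

```python
import os
from typing import Callable, Tuple

Signature = Tuple[int, int]


def signature(path: str) -> Signature:
    # Assumed change detector: file size plus modification time.
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def send_checked(path: str, expected: Signature,
                 transfer: Callable[[str], bool]) -> bool:
    """Send a file, failing if it changed before or during the transfer.

    expected is the signature recorded when the content was known good.
    """
    if signature(path) != expected:
        return False  # modified before the transfer started
    ok = transfer(path)
    # Modified during the transfer: report failure so the upload is aborted.
    return bool(ok) and signature(path) == expected
```

Because the sender reports failure in both cases, the receiver in this model needs no check of its own.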
In copyToRemote, it forces recvkey to verify content whenever it's being
sent from a v6 repository. recvkey is almost always going to verify content
anyway, unless annex.verify is set to false. So, this doesn't make it any more
expensive, except for in that unusual configuration. The alternative would
be to change the recvkey interface, so that the sender checks afterwards if
what it was sending changed, and the receiver then throws out the bad
transfer. That would be less expensive for the receiver, as it would not
need to do a checksum verification. But, it would mean another network
round trip, and since rsync closes the connection, it would need to open
another ssh connection to do this. Even with connection caching, that would
add latency to uploads. It would also complicate the interface, especially
because an older git-annex-shell would not have the new interface
available. For these reasons, I prefer punting on that at this time. The
cost is that someone might set annex.verify=false and be unhappy that it
still verifies.
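
The receiver-side decision amounts to something like the sketch below. It is Python for illustration, not the recvkey implementation: must_verify and verify_content are hypothetical names, and it assumes a key whose expected SHA-256 checksum is known.

```python
import hashlib


def must_verify(annex_verify: bool, sender_is_v6: bool) -> bool:
    # Content from a v6 repository is always verified, since the file
    # may have been unlocked and modified while it was being sent.
    return annex_verify or sender_is_v6


def verify_content(data: bytes, expected_sha256: str) -> bool:
    # Checksum verification: this is the cost the receiver pays.
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

Only the combination annex.verify=false with a v6 sender behaves differently from before. In every other case the receiver was already going to verify.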
(One other gotcha not dealt with is that a v5 repo could be upgraded to v6
while an upload is in progress, and a file unlocked and modified.)
(Also, I double-checked Remote.GCrypt's calls to rsyncParamsRemote, and
they're fine. When a file is being uploaded to gcrypt, or any other special
repository, it is mediated by sendAnnex, so changes will be detected at
that level and the special remote implementation doesn't need to worry
about them.)
The direct flag is also set when sending unlocked content, to support old
versions of git-annex-shell. At some point, the direct flag will be
removed, and only the unlocked flag will be used.