git-annex/doc/design/assistant/blog/day_83__3-way.mdwn
Joey Hess b6d46c212e
2014-04-02 21:42:53 +01:00


Syncing works well when the graph of repositories is strongly connected.
Now I'm working on making it work reliably with less connected graphs.

I've been focusing on and testing a doubly-connected list of repositories,
such as: `A <-> B <-> C`

----

I was seeing a lot of git-annex branch push failures occurring in
this linear topology of repositories. Sometimes it was able to recover from
these, but when two repositories were trying to push to one another at the
same time, and both failed, both would pull and merge, which actually leaves
the git-annex branch still diverged. (The two merge commits differ.)

A large part of the problem was that it pushed directly into the git-annex
branch on the remote; the same branch the remote modifies. I changed it to
push to `synced/git-annex` on the remote, which avoids most push failures.
Only when A and C are both trying to push into `B/synced/git-annex` at the
same time would one fail, and need to pull, merge, and retry.
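
To make the difference concrete, here is a toy model of git's fast-forward rule. It is only a sketch: the `Repo` class, its methods, and the commit ids are made up for illustration, and a real merge creates a merge commit rather than a plain set union.

```python
class Repo:
    """Toy repository (hypothetical): a branch is the set of commits reachable from it."""

    def __init__(self, name):
        self.name = name
        self.branches = {"git-annex": {"base"}}

    def commit(self, branch, commit_id):
        # A local change adds one commit to the branch's history.
        self.branches[branch] = self.branches[branch] | {commit_id}

    def receive_push(self, branch, history):
        # Like git without --force: accept only fast-forwards, meaning the
        # pushed history must contain everything the target branch already has.
        current = self.branches.get(branch, set())
        if not current <= history:
            return False
        self.branches[branch] = set(history)
        return True


a, b, c = Repo("A"), Repo("B"), Repo("C")
a.commit("git-annex", "a1")  # A, B and C each record a different change
b.commit("git-annex", "b1")
c.commit("git-annex", "c1")

# Pushing straight into the branch that B itself modifies is rejected.
direct = b.receive_push("git-annex", a.branches["git-annex"])

# Pushing to a separate sync branch succeeds, whatever B has committed locally.
via_synced = b.receive_push("synced/git-annex", a.branches["git-annex"])

# A second pusher racing for the same sync branch is the remaining failure case.
second_pusher = b.receive_push("synced/git-annex", c.branches["git-annex"])

# B folds the sync branch into its own branch (simplified merge).
b.branches["git-annex"] |= b.branches["synced/git-annex"]
```

Here `direct` is rejected and `via_synced` is accepted, while `second_pusher` shows the one case where a pull, merge, and retry is still needed.
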

-----

With that change, git syncing always succeeded in my tests, and without
needing any retries. But with more complex sets of repositories, or more
traffic, it could still fail.

I want to avoid repeated retries, exponential backoffs, and that kind of
thing. Retrying like that would probably be good enough, but I'm not happy
with it because it could take arbitrarily long to get git in sync.

I've settled on letting it retry once to push to the `synced/git-annex`
and `synced/master` branches. If the retry fails, it enters a fallback mode,
which is guaranteed to succeed, as long as the remote is accessible.

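
The control flow can be sketched roughly as follows. This is not the assistant's actual code. The function name, the callbacks, and the fallback branch name format are hypothetical, and it assumes the fallback works by pushing to a branch name that only this repository writes to, which is what would rule out races with other pushers.

```python
def sync_push(push, pull_and_merge, repo_id, branch="master"):
    """Push to the shared sync branch, retry once, then use a fallback branch.

    push(name) returns True when the remote accepted the push.
    pull_and_merge() brings the local branch up to date before the retry.
    Returns the name of the remote branch that ended up being used.
    """
    shared = f"synced/{branch}"
    if push(shared):
        return shared

    # One retry only: no retry loop and no exponential backoff.
    pull_and_merge()
    if push(shared):
        return shared

    # Fallback: hypothetical per-repository branch name (the "ugly" name).
    fallback = f"synced/{repo_id}/{branch}"
    if not push(fallback):
        raise ConnectionError("remote is not accessible")
    return fallback


# Usage: a remote whose shared branch is always contended.
attempts = []

def contended_push(name):
    attempts.append(name)
    return name.startswith("synced/repo-a/")

used = sync_push(contended_push, lambda: None, "repo-a")
```

In this run the shared branch is tried twice and the fallback branch once, so the number of push attempts is bounded at three however busy the remote is.
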
The problem with the fallback mode is it uses really ugly branch names.
Which is why Joachim Breitner and I originally decided on making `git annex
sync` use the single `synced/master` branch, despite the potential for
failed syncs. But in the assistant, the requirements are different,
and I'm ok with the uglier names.

It does seem to make sense to only use the uglier names as a fallback,
rather than by default. This preserves compatibility with `git annex sync`,
and it allows the assistant to delete fallback sync branches after it's
merged them, so the ugliness is temporary.

---

Also worked some today on a bug that prevents C from receiving files
added to A.

The problem is that file contents and git metadata sync independently. So C
will probably receive the git metadata from B before B has finished
downloading the file from A. C would normally queue a download of the
content when it sees the file appear, but at this point it has nowhere to
get it from.

My first stab at this was a failure. I made each download of a file result
in uploads of the file being queued to every remote that doesn't have it
yet. So rather than C downloading from B, B uploads to C. Which works fine,
but then C sees this download from B has finished, and proceeds to try to
re-upload to B. Which rejects it, but notices that this download has
finished, so re-uploads it to C...

The problem with that approach is that I don't have an event when a download
succeeds, just an event when a download ends. Of course, C could skip
uploading back to the same place it just downloaded from, but loops are
still possible with other network topologies (i.e., if D is connected to both
B and C, there would be an upload loop `B -> C -> D -> B`). So unless I can
find a better event to hook into, this idea is doomed.

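
A small simulation illustrates the loop. It is a toy model, not the assistant's transfer queue. The `simulate` function and its parameters are invented for this sketch, and it assumes a repository has no up-to-date knowledge of which peers already hold the file. The `only_on_success` switch stands in for the hypothetical "download succeeded" event that is missing.

```python
from collections import deque


def simulate(neighbors, start, only_on_success, max_transfers=50):
    """Count the transfers triggered by one file appearing in `start`.

    neighbors maps each repository to the repositories it can upload to.
    only_on_success=False reacts whenever a transfer ends (the failed attempt);
    only_on_success=True reacts only when new content was actually stored.
    max_transfers caps the simulation so a loop cannot run forever.
    """
    has_content = {start}
    queue = deque((start, peer) for peer in neighbors[start])
    transfers = 0
    while queue and transfers < max_transfers:
        sender, receiver = queue.popleft()
        transfers += 1
        stored_new = receiver not in has_content  # duplicates are rejected
        has_content.add(receiver)
        if only_on_success and not stored_new:
            continue
        # Never upload straight back to the repository the file came from.
        queue.extend(
            (receiver, peer) for peer in neighbors[receiver] if peer != sender
        )
    return transfers


line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
triangle = {"B": ["C", "D"], "C": ["B", "D"], "D": ["B", "C"]}

line_transfers = simulate(line, "A", only_on_success=False)
looping = simulate(triangle, "B", only_on_success=False)
bounded = simulate(triangle, "B", only_on_success=True)
```

Skipping the sender is enough for the `A <-> B <-> C` line, which settles after 2 transfers. In the triangle the same rule keeps going until it hits the cap of 50, while reacting only to successful transfers stops after 4.
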
I do have another idea to fix the same problem. C could certainly remember
that it saw a file and didn't know where to get the content from, and then
when it receives a git push of a git-annex branch, try again.
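
That idea might look something like the sketch below. The class, its method names, and the shape of the location data are all assumptions made for illustration, not git-annex's real data structures.

```python
class PendingContent:
    """Remember files whose content had no known source, and retry later."""

    def __init__(self):
        self.waiting = set()

    def file_appeared(self, key, known_locations):
        # Called when git metadata for a new file arrives.
        if known_locations:
            return ("download", key, sorted(known_locations)[0])
        # Nowhere to get the content from yet, so remember the file.
        self.waiting.add(key)
        return None

    def git_annex_branch_received(self, locations):
        # Called after a push of the git-annex branch arrives. `locations`
        # maps a key to the set of repositories now recorded as holding it.
        ready = []
        for key in sorted(self.waiting):
            sources = locations.get(key)
            if sources:
                ready.append(("download", key, sorted(sources)[0]))
        self.waiting -= {key for _, key, _ in ready}
        return ready


# Usage: C sees the file before B has finished downloading it from A.
tracker = PendingContent()
first_try = tracker.file_appeared("file1", set())
retried = tracker.git_annex_branch_received({"file1": {"B"}})
```

The first attempt queues nothing and records `file1` as waiting. Once the git-annex branch arrives listing B as a source, the retry produces a download from B and clears the entry.
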