Syncing works well when the graph of repositories is strongly connected.
Now I'm working on making it work reliably with less connected graphs.

I've been focusing on and testing a doubly-connected list of repositories,
such as: `A <-> B <-> C`

----

I was seeing a lot of git-annex branch push failures occurring in
this line-of-repositories topology. Sometimes it was able to recover from
these, but when two repositories were trying to push to one another at the
same time, and both failed, both would pull and merge, which left
the git-annex branch still diverged. (The two merge commits differ.)

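To see why two merges of the same pair of branch tips still differ, recall that a git commit ID is a hash over the commit's contents, which include the ordered parent list and the committer identity. A minimal sketch, using made-up object IDs and hypothetical committer names, with the timestamp held fixed so that it is not the cause of the difference:

```python
import hashlib

def commit_id(tree, parents, who, message):
    """Hash a commit the way git does: SHA-1 over a header plus the body."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    # Timestamp and timezone are fixed so only parents and identity vary.
    lines += [f"author {who} 1341000000 +0000",
              f"committer {who} 1341000000 +0000",
              "", message]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

# Made-up object IDs standing in for the merged tree and the two branch tips.
tree = "a" * 40
tip_a, tip_b = "1" * 40, "2" * 40

# Each side lists its own tip as the first parent and commits as itself.
merge_on_a = commit_id(tree, [tip_a, tip_b], "A <a@example.com>", "merge")
merge_on_b = commit_id(tree, [tip_b, tip_a], "B <b@example.com>", "merge")

print(merge_on_a != merge_on_b)  # same merged content, different commit IDs
```

Both sides end up with the same merged content, but neither commit is an ancestor of the other, so the branches remain diverged until one side merges again.
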
A large part of the problem was that it pushed directly into the git-annex
branch on the remote; the same branch the remote modifies. I changed it to
push to `synced/git-annex` on the remote, which avoids most push failures.
Only when A and C are both trying to push into `B/synced/git-annex` at the
same time would one fail, and need to pull, merge, and retry.

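A toy model of why pushing to a separate branch helps, assuming a push is only accepted when it fast-forwards the target branch (git's default for non-forced pushes). The `Remote` class, the commit names, and the `is_ancestor` helper are all hypothetical and are not git-annex's code:

```python
# Commit graph: each commit maps to the list of its parents.
parents = {"base": [], "local": ["base"], "remote": ["base"]}

def is_ancestor(old, new):
    """True if `old` is reachable from `new` by following parent links."""
    todo, seen = [new], set()
    while todo:
        c = todo.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return False

class Remote:
    """A remote that accepts only fast-forward pushes."""
    def __init__(self, refs):
        self.refs = dict(refs)

    def push(self, branch, commit):
        current = self.refs.get(branch)
        if current is not None and not is_ancestor(current, commit):
            return False  # rejected: would discard commits on the remote
        self.refs[branch] = commit
        return True

# The remote has committed "remote" to its own git-annex branch, while its
# synced/git-annex branch still sits at the last commit both sides share.
b = Remote({"git-annex": "remote", "synced/git-annex": "base"})

print(b.push("git-annex", "local"))         # False: diverged from "remote"
print(b.push("synced/git-annex", "local"))  # True: fast-forward from "base"
```
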
-----

With that change, git syncing always succeeded in my tests, and without
needing any retries. But with more complex sets of repositories, or more
traffic, it could still fail.

I want to avoid repeated retries, exponential backoffs, and that kind of
thing. It'd probably be good enough, but I'm not happy with it because
it could take arbitrarily long to get git in sync.

I've settled on letting it retry once to push to the `synced/git-annex`
and `synced/master` branches. If the retry fails, it enters a fallback mode,
which is guaranteed to succeed, as long as the remote is accessible.

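The retry-once-then-fallback control flow can be sketched as follows. The three callables are hypothetical, and the notion that the fallback cannot collide with other pushers is an assumption made for illustration; the post does not spell out how the guarantee is achieved:

```python
def sync_push(push, pull_and_merge, fallback_push):
    """Push, retry once after a pull and merge, then use the fallback.

    All three arguments are hypothetical callables; `push` returns True
    on success. Returns a label saying which path succeeded.
    """
    if push():
        return "pushed"
    pull_and_merge()      # catch up with whatever beat us to the remote
    if push():
        return "pushed after retry"
    fallback_push()       # assumed unable to conflict with other pushers
    return "fallback"

# Simulate a remote that rejects both the first push and the retry.
outcomes = iter([False, False])
calls = []
result = sync_push(
    push=lambda: next(outcomes),
    pull_and_merge=lambda: calls.append("pull+merge"),
    fallback_push=lambda: calls.append("fallback push"),
)
print(result, calls)
```

The point of the shape is that the number of attempts is bounded: at most two ordinary pushes, then one fallback, rather than an open-ended retry loop.
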
The problem with the fallback mode is that it uses really ugly branch names,
which is why Joachim Breitner and I originally decided on making `git annex
sync` use the single `synced/master` branch, despite the potential for
failed syncs. But in the assistant, the requirements are different,
and I'm ok with the uglier names.

It does seem to make sense to only use the uglier names as a fallback,
rather than by default. This preserves compatibility with `git annex sync`,
and it allows the assistant to delete fallback sync branches after it's
merged them, so the ugliness is temporary.

---

Also worked some today on a bug that prevents C from receiving files
added to A.

The problem is that file contents and git metadata sync independently. So C
will probably receive the git metadata from B before B has finished
downloading the file from A. C would normally queue a download of the
content when it sees the file appear, but at this point it has nowhere to
get it from.

My first stab at this was a failure. I made each download of a file result
in uploads of the file being queued to every remote that doesn't have it
yet. So rather than C downloading from B, B uploads to C. Which works fine,
but then C sees this download from B has finished, and proceeds to try to
re-upload to B. Which rejects it, but notices that this download has
finished, so re-uploads it to C...

The problem with that approach is that I don't have an event when a download
succeeds, just an event when a download ends. Of course, C could skip
uploading back to the same place it just downloaded from, but loops are
still possible with other network topologies (i.e., if D is connected to both
B and C, there would be an upload loop `B -> C -> D -> B`). So unless I can
find a better event to hook into, this idea is doomed.

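The loop can be reproduced in a small simulation. This is a simplified model, not the assistant's code: the topology and the `simulate` function are hypothetical, and nodes here never consult location tracking before uploading. Each node already skips the remote it just received from, yet the triangle B, C, D still loops when nodes react to every transfer that ends:

```python
from collections import deque

# Hypothetical topology: A - B, and B, C, D all connected to each other.
neighbors = {"A": ["B"], "B": ["A", "C", "D"],
             "C": ["B", "D"], "D": ["B", "C"]}

def simulate(react_to_failures, max_transfers=50):
    """Count transfers made after B downloads a file from A.

    A node reacts to a finished incoming transfer by uploading to every
    neighbor except the sender. If `react_to_failures` is True it reacts
    even when the transfer was rejected because it already had the file,
    which models having only a "download ended" event.
    """
    has_file = {"A", "B"}
    queue = deque(("B", n) for n in neighbors["B"] if n != "A")
    transfers = 0
    while queue and transfers < max_transfers:
        sender, receiver = queue.popleft()
        transfers += 1
        succeeded = receiver not in has_file
        has_file.add(receiver)
        if succeeded or react_to_failures:
            queue.extend((receiver, n) for n in neighbors[receiver]
                         if n != sender)
    return transfers, bool(queue)  # (transfers made, work still pending?)

print(simulate(react_to_failures=True))   # hits the cap with work pending
print(simulate(react_to_failures=False))  # stops once everyone has the file
```

With an event that fires only on success, the same topology settles after a handful of transfers, which is why a better event to hook into would rescue the idea.
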
I do have another idea to fix the same problem. C could certainly remember
that it saw a file and didn't know where to get the content from, and then
when it receives a git push of a git-annex branch, try again.

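A minimal sketch of that remember-and-retry idea, assuming the location information carried by the git-annex branch can be viewed as a mapping from file key to the set of remotes holding the content. The `PendingDownloads` class and its method names are hypothetical:

```python
class PendingDownloads:
    """Remember files whose content had no known source when first seen."""

    def __init__(self):
        self.pending = set()

    def file_appeared(self, key, known_locations):
        """Return a remote to download from, or None and remember the key."""
        remotes = known_locations.get(key)
        if remotes:
            return sorted(remotes)[0]
        self.pending.add(key)
        return None

    def git_annex_branch_received(self, known_locations):
        """Retry every remembered key against the updated location info."""
        downloads = []
        for key in sorted(self.pending):
            remotes = known_locations.get(key)
            if remotes:
                downloads.append((key, sorted(remotes)[0]))
        self.pending -= {key for key, _ in downloads}
        return downloads

c = PendingDownloads()
# C sees the file before anyone reachable is known to have its content.
print(c.file_appeared("photo.jpg", {}))                    # None
# Later a git-annex branch push records that B now has the content.
print(c.git_annex_branch_received({"photo.jpg": {"B"}}))   # [('photo.jpg', 'B')]
```

Keys that still have no known source stay in `pending`, so they are tried again on the next git-annex branch push rather than being dropped.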