diff --git a/doc/design/assistant/blog/day_83__3-way.mdwn b/doc/design/assistant/blog/day_83__3-way.mdwn
new file mode 100644
index 0000000000..d58ec9fe57
--- /dev/null
+++ b/doc/design/assistant/blog/day_83__3-way.mdwn
@@ -0,0 +1,73 @@
+Syncing works well when the graph of repositories is strongly connected.
+Now I'm working on making it work reliably with less connected graphs.
+
+I've been focusing on and testing a doubly-connected list of repositories,
+such as: `A <-> B <-> C`
+
+----
+
+I was seeing a lot of git-annex branch push failures occurring in
+this line-of-repositories topology. Sometimes it was able to recover from
+these, but when two repositories were trying to push to one another at the
+same time, and both failed, both would pull and merge, which actually leaves
+the git-annex branch still diverged. (The two merge commits differ.)
+
+A large part of the problem was that it pushed directly into the git-annex
+branch on the remote, the same branch the remote modifies. I changed it to
+push to `synced/git-annex` on the remote, which avoids most push failures.
+Only when A and C are both trying to push into `B/synced/git-annex` at the
+same time would one fail, and need to pull, merge, and retry.
+
+-----
+
+With that change, git syncing always succeeded in my tests, and without
+needing any retries. But with more complex sets of repositories, or more
+traffic, it could still fail.
+
+I want to avoid repeated retries, exponential backoffs, and that kind of
+thing. It'd probably be good enough, but I'm not happy with it because
+it could take arbitrarily long to get git in sync.
+
+I've settled on letting it retry once to push to the synced/git-annex
+and synced/master branches. If the retry fails, it enters a fallback mode,
+which is guaranteed to succeed, as long as the remote is accessible.
+
+The problem with the fallback mode is it uses really ugly branch names.
+Which is why Joachim Breitner and I originally decided on making `git annex
+sync` use the single `synced/master` branch, despite the potential for
+failed syncs. But in the assistant, the requirements are different,
+and I'm ok with the uglier names.
+
+It does seem to make sense to only use the uglier names as a fallback,
+rather than by default. This preserves compatibility with `git annex sync`,
+and it allows the assistant to delete fallback sync branches after it's
+merged them, so the ugliness is temporary.
+
+---
+
+Also worked some today on a bug that prevents C from receiving files
+added to A.
+
+The problem is that file contents and git metadata sync independently. So C
+will probably receive the git metadata from B before B has finished
+downloading the file from A. C would normally queue a download of the
+content when it sees the file appear, but at this point it has nowhere to
+get it from.
+
+My first stab at this was a failure. I made each download of a file result
+in uploads of the file being queued to every remote that doesn't have it
+yet. So rather than C downloading from B, B uploads to C. Which works fine,
+but then C sees this download from B has finished, and proceeds to try to
+re-upload to B. Which rejects it, but notices that this download has
+finished, so re-uploads it to C...
+
+The problem with that approach is that I don't have an event when a download
+succeeds, just an event when a download ends. Of course, C could skip
+uploading back to the same place it just downloaded from, but loops are
+still possible with other network topologies (i.e., if D is connected to both
+B and C, there would be an upload loop `B -> C -> D -> B`). So unless I can
+find a better event to hook into, this idea is doomed.
+
+I do have another idea to fix the same problem. C could certainly remember
+that it saw a file and didn't know where to get the content from, and then
+when it receives a git push of a git-annex branch, try again.
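
The "remember and try again" idea at the end of the post could look roughly like the sketch below. It is written in Python purely for illustration and is not the assistant's actual code: `PendingDownloads`, `file_appeared`, `annex_branch_pushed`, and the `locate` callback are all hypothetical names, and location tracking is reduced to a plain lookup function.

```python
# Hypothetical sketch: none of these names come from git-annex itself.
from typing import Callable, Dict, List, Set


class PendingDownloads:
    """Remembers files whose content had no known source when first seen."""

    def __init__(self, locate: Callable[[str], List[str]]) -> None:
        # locate(key) returns the remotes currently known to hold the content.
        self._locate = locate
        self._pending: Set[str] = set()
        self.queued: Dict[str, str] = {}  # key -> remote chosen for download

    def file_appeared(self, key: str) -> None:
        """Called when git metadata for a new file arrives."""
        if not self._try_queue(key):
            self._pending.add(key)

    def annex_branch_pushed(self) -> None:
        """Called when a push of a git-annex branch is received: retry lookups."""
        # sorted() copies the set, so discarding during the loop is safe.
        for key in sorted(self._pending):
            if self._try_queue(key):
                self._pending.discard(key)

    def pending(self) -> Set[str]:
        """Keys still waiting for a known source."""
        return set(self._pending)

    def _try_queue(self, key: str) -> bool:
        remotes = self._locate(key)
        if not remotes:
            return False
        self.queued[key] = remotes[0]
        return True


# Usage: C sees the file before B has its content, then retries after a push.
locations: Dict[str, List[str]] = {}
tracker = PendingDownloads(lambda key: locations.get(key, []))
tracker.file_appeared("file1")   # nowhere to get it from yet
locations["file1"] = ["B"]       # B finishes downloading from A
tracker.annex_branch_pushed()    # retry now finds B as a source
```

Because the retry is driven by an incoming branch push rather than by a download ending, this sketch avoids the upload loop described above: nothing is ever sent back to a remote as a side effect of receiving content.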