Merge branch 'master' into assistant
commit fa3aef96e2
50 changed files with 582 additions and 46 deletions

@@ -12,12 +12,13 @@ Feel free to chip in with comments! --[[Joey]]
* Month 3 "easy setup": [[!traillink configurators]] [[!traillink pairing]]
* Month 4 "polishing": [[!traillink cloud]] [[!traillink leftovers]]
* Months 5-6 "9k bonus round": [[!traillink Android]] [[!traillink partial_content]]
* Months 7-11: user-driven features and polishing
* Month 12: "Windows purgatory" [[Windows]]

## not yet on the map:

* [[desymlink]]
* [[deltas]]
* In my overfunded nighmares: [[Windows]]

## blog

@@ -23,7 +23,7 @@ In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
-them). Unfortunatly, I had to lose that gloriousness due to another
+them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

@@ -40,7 +40,7 @@ are still open for write.

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
-muliple processes open the same file, it will complain and not annex it
+multiple processes open the same file, it will complain and not annex it
until all the writers close it.

(Well, someone really evil could turn the write bit back on after git annex

@@ -3,13 +3,13 @@ to `kqueue`, and Haskell code to use that library. By now I think I
understand kqueue fairly well -- there are some very tricky parts to the
interface.

-But... it still did't work. After building all this, my code was
+But... it still didn't work. After building all this, my code was
failing the same way that the
[haskell kqueue library failed](https://github.com/hesselink/kqueue/issues/1)
yesterday. I filed a [bug report with a testcase]().

Then I thought to ask on #haskell. Got sorted out in quick order! The
-problem turns out to be that haskell's runtime has a peridic SIGALARM,
+problem turns out to be that haskell's runtime has a periodic SIGALARM,
that is interrupting my kevent call. It can be worked around with `+RTS -V0`,
but I put in a fix to retry to kevent when it's interrupted.

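The fix described in the hunk above, retrying `kevent` when the runtime's periodic signal interrupts it, follows the usual EINTR-retry pattern. Below is a minimal Haskell sketch, assuming the wrapped call reports failure as `Left errno`. `retryOnEINTR` and `scriptedCall` are hypothetical names and are not git-annex's code or the kqueue binding's API. For calls that return -1 and set `errno`, base's `Foreign.C.Error.throwErrnoIfMinus1Retry` provides the same behaviour directly.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.C.Error (Errno, eINTR, errnoToIOError)

-- Hypothetical helper: rerun a system call for as long as it fails with
-- EINTR (e.g. when interrupted by the RTS's periodic timer signal), and
-- turn any other errno into an IOError.
retryOnEINTR :: String -> IO (Either Errno a) -> IO a
retryOnEINTR name call = do
  r <- call
  case r of
    Left e
      | e == eINTR -> retryOnEINTR name call
      | otherwise -> ioError (errnoToIOError name e Nothing Nothing)
    Right x -> return x

-- Hypothetical stand-in for a real FFI call such as kevent(2): it replays
-- a scripted list of results, one per invocation, so the retry logic can
-- be exercised without kqueue.
scriptedCall :: [Either Errno a] -> IO (IO (Either Errno a))
scriptedCall script = do
  ref <- newIORef script
  return $ do
    rs <- readIORef ref
    case rs of
      (x : rest) -> writeIORef ref rest >> return x
      [] -> error "scriptedCall: script exhausted"
```

With a real binding, the wrapper would return `Left <$> getErrno` when the C call yields -1 and `Right` with the result otherwise.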
@@ -10,13 +10,13 @@ But it's not all easy. Syncing should happen as fast as possible, so
changes show up without delay. Eventually it'll need to support syncing
between nodes that cannot directly contact one-another. Syncing needs to
deal with nodes coming and going; one example of that is a USB drive being
-plugged in, which should immediatly be synced, but network can also come
+plugged in, which should immediately be synced, but network can also come
and go, so it should periodically retry nodes it failed to sync with. To
start with, I'll be focusing on fast syncing between directly connected
nodes, but I have to keep this wider problem space in mind.

One problem with `git annex sync` is that it has to be run in both clones
-in order for changes to fully propigate. This is because git doesn't allow
+in order for changes to fully propagate. This is because git doesn't allow
pushing changes into a non-bare repository; so instead it drops off a new
branch in `.git/refs/remotes/$foo/synced/master`. Then when it's run locally
it merges that new branch into `master`.

@@ -12,7 +12,7 @@ not sufficient. There are two problems with it:
So, instead, git-annex will use a regular `git merge`, and if it fails, it
will fix up the conflicts.

-That presented its own difficully, of finding which files in the tree
+That presented its own difficulty, of finding which files in the tree
conflict. `git ls-files --unmerged` is the way to do that, but its output
is a quite raw form:

@@ -21,9 +21,9 @@ is a quite raw form:
100644 1eabec834c255a127e2e835dadc2d7733742ed9a 2 bar
100644 36902d4d842a114e8b8912c02d239b2d7059c02b 3 bar

-I had to stare at the rather inpenetrable documentation for hours and
+I had to stare at the rather impenetrable documentation for hours and
write a lot of parsing and processing code to get from that to these mostly
-self expanatory data types:
+self explanatory data types:

data Conflicting v = Conflicting
{ valUs :: Maybe v

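For reference, git documents each `git ls-files --unmerged` line as `<mode> <object> <stage>\t<path>`, where stage 1 is the common ancestor, 2 is "ours" and 3 is "theirs" (the listing in the hunk above shows the tab as a space). Below is a minimal sketch of grouping those lines per path. It assumes nothing about git-annex's real types. `Stages`, `parseUnmerged` and `unmergedFiles` are hypothetical names used only for illustration.

```haskell
import qualified Data.Map.Strict as M
import Data.Maybe (fromMaybe, mapMaybe)

-- Hypothetical per-path view of the unmerged index stages
-- (1 = common ancestor, 2 = ours, 3 = theirs); each field holds a blob SHA.
data Stages = Stages
  { base   :: Maybe String
  , ours   :: Maybe String
  , theirs :: Maybe String
  } deriving (Show, Eq)

noStages :: Stages
noStages = Stages Nothing Nothing Nothing

-- Parse one line of the form "<mode> <sha> <stage>\t<path>".
parseUnmerged :: String -> Maybe (FilePath, Int, String)
parseUnmerged l = case break (== '\t') l of
  (meta, '\t' : path) -> case words meta of
    [_mode, sha, stage] -> case reads stage of
      [(n, "")] -> Just (path, n, sha)
      _ -> Nothing
    _ -> Nothing
  _ -> Nothing

-- Group all parsed entries by path, filling in the stage each one belongs to.
unmergedFiles :: String -> M.Map FilePath Stages
unmergedFiles = foldr add M.empty . mapMaybe parseUnmerged . lines
  where
    add (path, n, sha) = M.alter (Just . setStage n sha . fromMaybe noStages) path
    setStage :: Int -> String -> Stages -> Stages
    setStage 1 sha s = s { base = Just sha }
    setStage 2 sha s = s { ours = Just sha }
    setStage 3 sha s = s { theirs = Just sha }
    setStage _ _ s = s
```

Fed the two `bar` lines above, this yields stages 2 and 3 but no stage 1. That pattern means `bar` was added on both sides with no common ancestor.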
@@ -35,7 +35,7 @@ more threads:
1. Uploads new data to every configured remote. Triggered by the watcher
thread when it adds content. Easy; just use a `TSet` of Keys to send.

-2. Downloads new data from the cheapest remote that has it. COuld be
+2. Downloads new data from the cheapest remote that has it. Could be
triggered by the
merger thread, after it merges in a git sync. Rather hard; how does it
work out what new keys are in the tree without scanning it all? Scan

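The `TSet` of Keys mentioned in item 1 acts as a deduplicating work queue that threads share. Below is a minimal STM sketch, assuming the set is a plain `TVar` that holds a `Data.Set`. The `TSet` type and its functions are hypothetical stand-ins, not git-annex's implementation, and keys are simplified to strings.

```haskell
import Control.Concurrent.STM
import qualified Data.Set as S

-- Hypothetical transactional set: a TVar holding an ordinary Set, so a key
-- queued twice before the uploader runs is only sent once.
newtype TSet a = TSet (TVar (S.Set a))

newTSet :: IO (TSet a)
newTSet = TSet <$> newTVarIO S.empty

-- Called by the watcher thread when it adds content.
putTSet :: Ord a => TSet a -> a -> STM ()
putTSet (TSet v) k = modifyTVar' v (S.insert k)

-- Called by the uploader thread: blocks (via 'retry') until at least one
-- key is queued, then atomically takes every queued key.
takeTSet :: TSet a -> STM [a]
takeTSet (TSet v) = do
  s <- readTVar v
  if S.null s
    then retry
    else do
      writeTVar v S.empty
      return (S.toList s)
```

Because `retry` blocks the transaction until the `TVar` changes, the uploader thread sleeps without polling whenever the queue is empty.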
@@ -1,6 +1,6 @@
Well, sometimes you just have to go for the hack. Trying to find a way
to add additional options to git-annex-shell without breaking backwards
-compatability, I noticed that it ignores all options after `--`, because
+compatibility, I noticed that it ignores all options after `--`, because
those tend to be random rsync options due to the way rsync runs it.

So, I've added a new class of options, that come in between, like

@@ -21,5 +21,5 @@ nontrivial features can be added easily.

--

-Next up: Enough nonsense with tracking tranfers... Time to start actually
+Next up: Enough nonsense with tracking transfers... Time to start actually
transferring content around!

@@ -6,7 +6,7 @@ Details follow..

Made the committer thread queue Upload Transfers when new files
are added to the annex. Currently it tries to transfer the new content
-to *every* remote; this innefficiency needs to be addressed later.
+to *every* remote; this inefficiency needs to be addressed later.

Made the watcher thread queue Download Transfers when new symlinks
appear that point to content we don't have. Typically, that will happen

@@ -30,12 +30,12 @@ all the assistant's other threads from entering that monad while a transfer
is running. This is also necessary to allow multiple concurrent transfers
to run in the future.

-This is a very tricky peice of code, because that thread will modify the
+This is a very tricky piece of code, because that thread will modify the
git-annex branch, and its parent thread has to invalidate its cache in
order to see any changes the child thread made. Hopefully that's the extent
of the complication of doing this. The only reason this was possible at all
is that git-annex already support multiple concurrent processes running
-and all making independant changes to the git-annex branch, etc.
+and all making independent changes to the git-annex branch, etc.

After all my groundwork this week, file content transferring is now
fully working!

31
doc/design/assistant/blog/day_27__robust_transfers.mdwn
Normal file
@@ -0,0 +1,31 @@
Spent most of the day making file content transfers robust. There were lots
of bugs, hopefully I've fixed most of them. It seems to work well now,
even when I throw a lot of files at it.

One of the changes also sped up transfers; it no longer roundtrips to the
remote to verify it has a file. The idea here is that when the assistant is
running, repos should typically be fairly tightly synced to their remotes
by it, so some of the extra checks that the `move` command does are
unnecessary.

Also spent some time trying to use ghc's threaded runtime, but continue to
be baffled by the random hangs when using it. This needs fixing eventually;
all the assistant's threads can potentially be blocked when it's waiting on
an external command it has run.

Also changed how transfer info files are locked. The lock file is now
separate from the info file, which allows the TransferWatcher thread to
notice when an info file is created, and thus actually track transfers
initiated by remotes.

---

I'm fairly close now to merging the `assistant` branch into `master`.
The data syncing code is very brute-force, but it will work well enough
for a first cut.

Next I can either add some repository network mapping, and use graph
analysis to reduce the number of data transfers, or I can move on to the
[[webapp]]. Not sure yet which I'll do. It's likely that since DebConf
begins tomorrow I'll put off either of those big things until after the
conference.

@@ -0,0 +1,17 @@
I didn't plan to work on git-annex much while at DebConf, because the conference
always prevents the kind of concentration I need. But I unexpectedly also had to deal
with [three dead drives](http://joeyh.name/blog/entry/I_am_become_Joey_destroyer_of_drives/)
and illness this week.

That said, I have been trying to debug a problem with git-annex and Haskell's threaded
runtime all week. It just hangs, randomly. No luck so far isolating why, although I now
have a branch that hangs fairly reliably, and in which I am trying to whittle the entire
git-annex code base (all 18 thousand lines!) into a nice test case.

This threaded runtime problem doesn't affect the assistant yet, but if I want to use
Yesod in developing the webapp, I'll need the threaded runtime, and using the threaded
runtime in the assistant generally would make it more responsive and less hacky.

Since this is a task I can work on without much concentration, I'll probably keep beating
on it until I return home. Then I need to spend some quality thinking time on where
to go next in the assistant.

@@ -1,6 +1,6 @@
Last night I got `git annex watch` to also handle deletion of files.
This was not as tricky as feared; the key is using `git rm --ignore-unmatch`,
-which avoids most problimatic situations (such as a just deleted file
+which avoids most problematic situations (such as a just deleted file
being added back before git is run).

Also fixed some races when `git annex watch` is doing its startup scan of

9
doc/design/assistant/blog/day_36__minimal_test_case.mdwn
Normal file
@@ -0,0 +1,9 @@
Managed to find a minimal, 20 line test case for at least one of the ways
git-annex was hanging with GHC's threaded runtime. Sent it off to
haskell-cafe for analysis.
[thread](http://news.gmane.org/gmane.comp.lang.haskell.cafe)

Further managed to narrow the bug down to MissingH's use of logging code,
that git-annex doesn't use. [bug report](http://bugs.debian.org/681621).
So, I can at least get around this problem with a modified version of
MissingH. Hopefully that was the only thing causing the hangs I was seeing!

@@ -16,7 +16,7 @@ thread that wakes up periodically, flushes the queue, and autocommits.
(This will, in fact, be the start of the [[syncing]] phase of my roadmap!)
There's lots of room here for smart behavior. Like, if a lot of changes are
being made close together, wait for them to die down before committing. Or,
-if it's been idle and a single file appears, commit it immediatly, since
+if it's been idle and a single file appears, commit it immediately, since
this is probably something the user wants synced out right away. I'll start
with something stupid and then add the smarts.

@@ -11,7 +11,7 @@ things slow and ugly. This was not unexpected.

So next, I added some smarts to it. First, I wanted to stop it waking up
every second when there was nothing to do, and instead blocking wait on a
-change occuring. Secondly, I wanted it to know when past changes happened,
+change occurring. Secondly, I wanted it to know when past changes happened,
so it could detect batch mode scenarios, and avoid committing too
frequently.

@@ -52,6 +52,6 @@ shouldCommit now changetimes
thisSecond t = now `diffUTCTime` t <= 1
"""]]

-Still some polishing to do to eliminate minor innefficiencies and deal
+Still some polishing to do to eliminate minor inefficiencies and deal
with more races, but this part of the git-annex assistant is now very usable,
and will be going out to my beta testers soon!

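The `thisSecond` helper in the hunk above belongs to a batch-detection heuristic. When changes trickle in, the assistant commits right away. When many changes land within the same second, it holds off. Below is a minimal sketch of that idea, assuming pending change times are kept as a list of `UTCTime`. The threshold of 10 changes per second is made up for illustration and is not necessarily what git-annex uses.

```haskell
import Data.Time

-- Assumed threshold: this many changes within the last second is treated
-- as batch activity (e.g. untarring), so committing is deferred.
batchThreshold :: Int
batchThreshold = 10

-- Decide whether to commit now, given the times of pending changes.
shouldCommit :: UTCTime -> [UTCTime] -> Bool
shouldCommit now changetimes
  | null changetimes = False  -- nothing to commit
  | length (filter thisSecond changetimes) < batchThreshold = True
  | otherwise = False         -- still in a burst; wait for it to die down
  where
    thisSecond t = now `diffUTCTime` t <= 1
```

A burst such as 50 files written within half a second defers the commit. A single new file after an idle period is committed immediately.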
@@ -24,7 +24,7 @@ symlinks might have just been deleted and re-added, or changed, and
the index still have the old value.

Instead, I got creative. :) We can't trust what the index says about the
-symlink, but if the index happens to contian a symlink that looks right,
+symlink, but if the index happens to contain a symlink that looks right,
we can trust that the SHA1 of its blob is the right SHA1, and reuse it
when re-staging the symlink. Wham! Massive speedup!

@@ -10,7 +10,7 @@ own git index parser (or use one from Hackage), this check requires running
tree of files is being moved or unpacked into the watched directory.

Instead, I made it only do the check during `git annex watch`'s initial
-scan of the tree. This should be ok, because once it's running, you
+scan of the tree. This should be OK, because once it's running, you
won't be adding new files to git anyway, since it'll automatically annex
new files. This is good enough for now, but there are at least two problems
with it:

@@ -16,7 +16,7 @@ quickly is really only important so people don't think it's a resource hog.
First impressions are important. :)

But what does "made recently" mean exactly? Well, my answer is possibly
-overengineered, but most of it is really groundwork for things I'll need
+over engineered, but most of it is really groundwork for things I'll need
later anyway. I added a new data structure for tracking the status of the
daemon, which is periodically written to disk by another thread (thread #6!)
to `.git/annex/daemon.status` Currently it looks like this; I anticipate

@@ -3,11 +3,11 @@ all the other git clones, at both the git level and the key/value level.

## immediate action items

* Check that download transfer triggering code works (when a symlink appears
and the remote does *not* upload to us.
* At startup, and possibly periodically, look for files we have that
location tracking indicates remotes do not, and enqueue Uploads for
them. Also, enqueue Downloads for any files we're missing.
* After git sync, identify content that we don't have that is now available
on remotes, and transfer.

## longer-term TODO

@@ -29,6 +29,9 @@ all the other git clones, at both the git level and the key/value level.
only uploading new files but not downloading, and only downloading
files in some directories and not others. See for use cases:
[[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
* speed up git syncing by using the cached ssh connection for it too
(will need to use `GIT_SSH`, which needs to point to a command to run,
not a shell command line)

## misc todo

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlup4hyZo4eCjF8T85vfRXMKBxGj9bMdl0"
nickname="Ben"
subject="ARM support"
date="2012-07-13T16:51:15Z"
content="""
The closure of [this](http://hackage.haskell.org/trac/ghc/ticket/5839) ticket hopefully marks the end of TH issues on ARM. As of 7.4.2, GHC's linker has enough ARM support to allow a selection of common packages compile on my PandaBoard. That being said, it hasn't had a whole lot of testing so it's possible I still need to implement a few relocation types.
"""]]