Merge branch 'master' into assistant

Conflicts:
	debian/changelog
Joey Hess 2012-08-16 16:36:32 -07:00
commit cbca93cf7c
31 changed files with 337 additions and 10 deletions

@ -31,7 +31,7 @@ Time for the first of probably many polls!
What should the default directory name used by the git-annex assistant be?
[[!poll open=no 19 "Annex" 7 "GitAnnex" 10 "Synced" 0 "AutoSynced" 1 "Shared" 10 "something lowercase!" 1 "CowboyNeal" 1 "Annexbox"]]
[[!poll open=no 19 "Annex" 7 "GitAnnex" 1 "~/git-annex/" 10 "Synced" 0 "AutoSynced" 1 "Shared" 10 "something lowercase!" 1 "CowboyNeal" 1 "Annexbox"]]
(Note: This is a wiki. You can edit this page to add your own
[[ikiwiki/directive/poll]] options!)

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://cweiske.de/"
nickname="cweiske"
subject="comment 1"
date="2012-08-15T22:02:14Z"
content="""
I thought that most (if not all) browsers prevent opening local files/directories via web pages. How did you solve that problem?
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 2"
date="2012-08-16T23:34:12Z"
content="""
Yes, it's surprising to me when I see it work, and I wrote it. :) Since git-annex is running locally on the same system as the browser, it can open the file manager when the user clicks on a link in the web browser. To the web browser, this is just an AJAX request.
"""]]

@ -0,0 +1,8 @@
A bit under the weather, but got into building buttons to control running
and queued transfers today. The html and javascript side is done, with
each transfer now having a cancel button, as well as a pause/start button.
Canceling queued transfers works. Canceling running transfers will
need some more work, because killing a thread doesn't kill the processes
being run by that thread. So I'll have to make the assistant run separate
git-annex processes for transfers, that can be individually sent signals.

@ -0,0 +1,40 @@
Probably won't be doing any big coding on the git-annex assistant in the
upcoming week, as I'll be traveling and/or slightly ill enough that I can't
fully get into <a href="http://en.wikipedia.org/wiki/Flow_(psychology)">flow</a>.
---
There was a new Yesod release this week, which required minor changes to
make the webapp build with it. I managed to keep the old version of Yesod
also supported, and plan to keep that working so it can be built with the
version of Yesod available in, eg, Linux distributions. TBD how much pain
that will involve going forward.
---
I'm mulling over how to support stopping/pausing transfers. The problem
is that if the assistant is running a transfer in one thread, and the
webapp is used to cancel it, killing that thread won't necessarily stop the
transfer, because, at least in Haskell's thread model, killing a thread
does not kill processes started by the thread (like rsync).
So one option is to have the transfer thread run a separate git-annex
process, which will run the actual transfer. And killing that process will
stop the transfer nicely. However, using a separate git-annex process means
a little startup overhead for each file transferred (I don't know if it'd
be enough to matter). Also, there's the problem that git-annex is sometimes
not installed in PATH (wish I understood why cabal does that), which
makes it kind of hard for it to run itself. (It can't simply fork, sadly.
See past horrible pain with forking and threads.)
The other option is to change the API for git-annex remotes, so that
their `storeKey` and `retrieveKeyFile` methods return a pid of the program
that they run. When they *do* run a program.. not all remotes do. This
seems like it'd make the code in the remotes hairier, and it is also asking
for bugs, when a remote's implementation changes. Or I could go
lower-level, and make every place in the utility libraries that forks a
process record its pid in a per-thread MVar. Still seems to be asking for
bugs.
Oh well, at least git-annex is already crash-safe, so once I figure out
how to kill a transfer process, I can kill it safely. :)
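
A minimal sketch of the first option above (one separate process per transfer, killed to stop it), not the assistant's actual code. It assumes a POSIX `sleep` command as a stand-in for the real transfer, and the names `startTransfer` and `cancelTransfer` are hypothetical:

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Process (ProcessHandle, createProcess, proc, terminateProcess, waitForProcess)

-- | Start a stand-in "transfer" as its own process and return a handle
-- that can later be used to stop it. `sleep 60` is only a placeholder
-- for whatever command would really move the file.
startTransfer :: IO ProcessHandle
startTransfer = do
    (_, _, _, ph) <- createProcess (proc "sleep" ["60"])
    return ph

-- | Cancel the transfer by signalling the process (SIGTERM on unix),
-- then reap it so no zombie is left behind.
cancelTransfer :: ProcessHandle -> IO ExitCode
cancelTransfer ph = do
    terminateProcess ph
    waitForProcess ph

main :: IO ()
main = do
    ph <- startTransfer
    threadDelay 200000 -- let the "transfer" run for 0.2 seconds
    code <- cancelTransfer ph
    putStrLn ("transfer stopped: " ++ show code)
```

Killing the Haskell thread that called `createProcess` would leave `sleep` running; signalling the process handle is what actually stops it.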

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://claimid.com/strager"
nickname="strager"
subject="comment 1"
date="2012-08-11T04:50:52Z"
content="""
What if `storeKey`, `retrieveKeyFile`, etc. return an `IO ()` which cancels the operation, if possible? The implementation can be canceled regardless of whether it uses separate processes or Haskell threads.
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="http://claimid.com/strager"
nickname="strager"
subject="comment 2"
date="2012-08-11T04:55:13Z"
content="""
In fact, making a dedicated data type or some typeclasses may be more appropriate:
class Cancelable a where cancel :: a -> IO ()
class Pauseable a where pause :: a -> IO ()
-- Alternatively:
data Transfer = Transfer { cancel :: IO (), pause :: IO () }
-- Or both!
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 3"
date="2012-08-11T14:41:51Z"
content="""
Those are the lines I was thinking along, and I even made a throwaway branch with some types. But the problem is reworking all the code to do that. Particularly since lots of the code uses generic utility functions that are reused in other, unrelated places and would have to be modified to pass back cancel actions.
The first case the type checker landed me on when I changed the types was code that downloads a URL from the web. Naturally that uses `Utility.Url.download`. How to cancel `download`? Depends on its implementation -- it happens to currently shell out to curl, so you have to kill curl, but it could just as easily have used libcurl (other parts of my Utility.Url library do), and then it would need to fork its own thread. So it's an abstraction layer violation problem.
If I had a month to devote to this one problem, I might manage to come up with some clean solution involving monads, or maybe convert all my code to use conduit or something that might allow managing these effects better. Just a guess..
"""]]

@ -0,0 +1,27 @@
[[!comment format=mdwn
username="http://claimid.com/strager"
ip="173.228.13.253"
subject="comment 4"
date="2012-08-11T16:08:47Z"
content="""
> How to cancel download? Depends on its implementation .... So it's an abstraction layer violation problem.
Precisely why I suggested returning something as generic as `IO ()`:
-- Current
download :: URLString -> Headers -> [CommandParam] -> FilePath -> IO Bool
-- Suggestion
data Transfer a = Transfer { run :: IO a, cancel :: IO () }
download :: URLString -> Headers -> [CommandParam] -> FilePath -> IO (Transfer Bool)
transfer <- download ...
-- You can pass `cancel transfer` to another thread
-- which you want to be able to cancel the transfer.
run transfer -- blocking
I realized while writing this that you may not get any result from e.g. a download while it is occurring (because the function is blocking). Maybe that's where a misunderstanding occurred. I separated the concepts of creating a transfer and starting/canceling it.
(My idea is starting to feel a bit object-oriented... ;P)
"""]]

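The `Transfer` record suggested in the comment above could be fleshed out roughly as follows. This is only a sketch under assumptions: `processTransfer` and `waitPolling` are hypothetical names, the transfer is a stand-in POSIX `sleep` command, and `run` polls for the exit code instead of calling `waitForProcess` so that the example also works without GHC's threaded runtime:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import System.Exit (ExitCode (..))
import System.Process (ProcessHandle, createProcess, getProcessExitCode, proc, terminateProcess)

-- | A transfer that has been created but not necessarily started.
data Transfer a = Transfer
    { run    :: IO a   -- blocks until the transfer ends
    , cancel :: IO ()  -- may be called from another thread
    }

-- | Hypothetical transfer backed by an external command. The process is
-- only started when `run` is called; `cancel` waits until it has started.
processTransfer :: FilePath -> [String] -> IO (Transfer Bool)
processTransfer cmd args = do
    started <- newEmptyMVar
    return Transfer
        { run = do
            (_, _, _, ph) <- createProcess (proc cmd args)
            putMVar started ph
            (== ExitSuccess) <$> waitPolling ph
        , cancel = readMVar started >>= terminateProcess
        }

-- | Check for the exit code every 50ms without blocking other threads.
waitPolling :: ProcessHandle -> IO ExitCode
waitPolling ph = do
    done <- getProcessExitCode ph
    case done of
        Just code -> return code
        Nothing -> threadDelay 50000 >> waitPolling ph

main :: IO ()
main = do
    transfer <- processTransfer "sleep" ["60"]
    -- Another thread (think: the webapp) cancels after 0.2 seconds.
    _ <- forkIO (threadDelay 200000 >> cancel transfer)
    ok <- run transfer
    putStrLn (if ok then "transfer finished" else "transfer was canceled")
```

The caller only ever sees `run` and `cancel`, so a thread-based transfer could provide the same record with a different `cancel` action.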
@ -0,0 +1,26 @@
Unexpectedly managed a mostly productive day today.
Went ahead with making the assistant run separate `git-annex` processes for
transfers. This will currently fail if git-annex is not installed in PATH.
(Deferred dealing with that.)
To stop a transfer, the webapp needs to signal not just the git-annex
process, but all its children. I'm using process groups for this, which
works, but I'm not extremely happy with it.
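
A rough sketch of the process group approach, using only the `process` library rather than whatever the assistant really does. It assumes a POSIX shell, and `startInGroup` and `stopGroup` are hypothetical names. The shell command stands in for a git-annex process that has a child (such as rsync):

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Process
    ( CreateProcess (..), ProcessHandle, createProcess
    , interruptProcessGroupOf, shell, waitForProcess )

-- | Start a command in a new process group, so that it and any children
-- it forks can be signalled together.
startInGroup :: String -> IO ProcessHandle
startInGroup cmdline = do
    (_, _, _, ph) <- createProcess (shell cmdline) { create_group = True }
    return ph

-- | Send SIGINT to every process in the group, then reap the parent.
stopGroup :: ProcessHandle -> IO ExitCode
stopGroup ph = do
    interruptProcessGroupOf ph
    waitForProcess ph

main :: IO ()
main = do
    -- The shell forks `sleep` as a child; both are in the new group.
    ph <- startInGroup "sleep 60; true"
    threadDelay 200000 -- give the shell time to start its child
    code <- stopGroup ph
    putStrLn ("group stopped: " ++ show code)
```

Without `create_group = True`, the child would share the parent's process group, and signalling the group would also hit the program doing the signalling.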
Anyway, the webapp's UI can now be used for stopping transfers, and it
wasn't far from there to also implementing pausing of transfers.
Pausing a transfer is actually the same as stopping it, except a special
signal is sent to the transfer control thread, which keeps running, despite
the git-annex process having been killed, waits for a special resume
signal, and restarts the transfer. This way a paused transfer continues to
occupy a transfer slot, which prevents other queued transfers from running.
This seems to be the behavior that makes sense.
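
The pause behavior described above might be sketched like this. It is an illustration under assumptions, not the assistant's code: `Command`, `controlTransfer`, and `startTransfer` are hypothetical names, an `MVar` stands in for the "special signal", and a POSIX `sleep 1` stands in for the transfer (which here restarts from scratch on resume):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar, tryTakeMVar)
import System.Exit (ExitCode (..))
import System.Process
    ( ProcessHandle, createProcess, getProcessExitCode
    , proc, terminateProcess, waitForProcess )

data Command = Pause | Resume | Cancel deriving (Eq, Show)

-- | Runs in the transfer's control thread, which holds the transfer slot
-- until this returns. Returns True if the transfer ran to completion.
controlTransfer :: MVar Command -> IO ProcessHandle -> IO Bool
controlTransfer commands start = start >>= running
  where
    running ph = do
        cmd <- tryTakeMVar commands
        case cmd of
            Just Pause -> stop ph >> paused
            Just Cancel -> stop ph >> return False
            _ -> do
                done <- getProcessExitCode ph
                case done of
                    Just code -> return (code == ExitSuccess)
                    Nothing -> threadDelay 50000 >> running ph
    -- The process is gone, but the thread keeps waiting, so the slot
    -- stays occupied until a Resume or Cancel arrives.
    paused = do
        cmd <- takeMVar commands
        case cmd of
            Resume -> start >>= running
            Cancel -> return False
            Pause -> paused
    stop ph = terminateProcess ph >> waitForProcess ph

-- | Stand-in for starting the real transfer process.
startTransfer :: IO ProcessHandle
startTransfer = do
    (_, _, _, ph) <- createProcess (proc "sleep" ["1"])
    return ph

main :: IO ()
main = do
    commands <- newEmptyMVar
    -- Stand-in for the webapp: pause after 0.2s, resume 0.2s later.
    _ <- forkIO $ do
        threadDelay 200000 >> putMVar commands Pause
        threadDelay 200000 >> putMVar commands Resume
    ok <- controlTransfer commands startTransfer
    putStrLn (if ok then "transfer completed" else "transfer canceled")
```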
Still need to wire up the webapp's button for starting a transfer. For a
paused transfer, that will just need to resume it. I have not decided what
the button should do when used on a transfer that is queued but not running
yet. Maybe it forces it to run even if all transfer slots are already in
use? Maybe it stops one of the currently running transfers to free up a
slot?

@ -34,7 +34,7 @@ Also needed for USB keys and Android gadgets.
Of course, one option is to just use github etc to store the git repo.
Two things can store git repos in Anazon S3:
Two things can store git repos in Amazon S3:
* <http://gabrito.com/post/storing-git-repositories-in-amazon-s3-for-high-availability>
* <http://wiki.cs.pdx.edu/oss2009/index/projects/gits3.html>

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawneJXwhacIb0YvvdYFxhlNVpz6Wpg6V7AA"
nickname="Shayne"
subject="comment 11"
date="2012-08-13T00:37:35Z"
content="""
Yeah, definitely go with homebrew rather than macports if possible. macports and fink, whilst great systems, have a tendency to sort of create their own alternative-dimension of files within the system that just don't always feel particularly well integrated. As a result \"brew\" has become increasingly popular, to the point it's almost ubiquitous now.
Plus its brew-doctor thing is awesome.
The best approach, though, that's agnostic to distro systems is to simply go for a generic installer.
"""]]

@ -28,6 +28,10 @@ available!
* honor .gitignore, not adding files it excludes (difficult, probably
needs my own .gitignore parser to avoid excessive running of git commands
to check for ignored files)
* There needs to be a way for a new version of git-annex, when installed,
to restart any running watch or assistant daemons. Or for the daemons
to somehow detect it's been upgraded and restart themselves. Needed
to allow for incompatible changes and, I suppose, for security upgrades..
## beyond Linux

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkS6aFVrEwOrDuQBTMXxtGHtueA69NS_jo"
nickname="Hans"
subject="using sshfs + cryptmount is more secure"
date="2012-08-14T13:41:47Z"
content="""
\"For git-annex, note that an attacker with local machine access can tell at least all the filenames and metadata of files stored in the encrypted remote anyway, and can access whatever content is stored locally.\"
Better security is given by sshfs + cryptmount, which I used when I recently set up a git-annex repository on a free shell account from a provider I do not trust.
See http://code.cjb.net/free-secure-online-backup.html for what I did to get a really secure solution.
Kind regards,
Hans Ekbrand
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
nickname="Justin"
subject="comment 6"
date="2012-08-14T14:10:40Z"
content="""
Hans,
You are misunderstanding how git-annex encryption works. The \"untrusted host\" and the \"local machine\" are not the same machine. git-annex only transfers pre-encrypted files to the \"untrusted host\".
You should set up a git-annex encrypted remote and watch how it works so you can see for yourself that it is not insecure.
Your solution does not provide better security, it accomplishes the same thing as git-annex in a more complicated way. In addition, since you are mounting the image from the client your solution will not work with multiple clients.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkS6aFVrEwOrDuQBTMXxtGHtueA69NS_jo"
nickname="Hans"
subject="comment 7"
date="2012-08-15T19:16:10Z"
content="""
Justin,
thanks for clearing that up. It's great that git-annex has implemented mechanisms to work securely on untrusted hosts. My solution is thus only interesting for files that are impractical to manage with git-annex (e.g. data for/from applications that need rw-access to a large number of files). And, possibly, for providers that do not provide rsync.
Your remark that my solution does not work with more than one client is not entirely accurate. No more than one client can access the repository at any given time, but as long as access is not simultaneous, any number of clients can access the repository. Still, your point is taken; it's a limitation I should mention.
It would be interesting to compare the performance of individually encrypted files to an encrypted image file. My intuition says that an encrypted image file should be faster, but that's just a guess.
"""]]