Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2012-11-30 12:34:29 -04:00
commit c78fc95ec2
11 changed files with 153 additions and 1 deletions

View file

@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
Help me prioritize my work: What special remote would you most like
to use with the git-annex assistant?
[[!poll open=yes 15 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 9 "Box.com (done)" 63 "My phone (or MP3 player)" 16 "Tahoe-LAFS" 6 "OpenStack SWIFT" 23 "Google Drive"]]
[[!poll open=yes 15 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 9 "Box.com (done)" 63 "My phone (or MP3 player)" 17 "Tahoe-LAFS" 6 "OpenStack SWIFT" 23 "Google Drive"]]
This poll is ordered with the options I consider easiest to build
listed first. Mostly because git-annex already supports them and they

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="Steve"
ip="92.104.175.136"
subject="comment 4"
date="2012-11-29T23:51:21Z"
content="""
I've been thinking about writing a sort of git-annex du. I'm surprised to find someone else looking for such a thing. While \"du -L\" will tell you how much space is used by files you actually have, I was interested in knowing (approximately) how much space would be used if you were to git-annex get everything you don't yet have.
There are many options and variations to think about, such as:
* do you want to count duplicate files once or as many times as they appear (as if you 'git-annex lock'd them all)
* maybe you want to know how much space is used by files that reside only on a certain remote or set of remotes
* you might want to know how much space would be used by all the files you don't yet have, but not count the files you already have
All of the backends so far seem to store the size of the files in the filename, so my plan was to read it out of the links. If anybody has a better idea about how to get the sizes of annexed files or options that would be handy for a git-annex du, let me know. I'll see if I can get the start of something useful this weekend. I'll post here when I have something to share.
I'm also open to suggestions for the executable name. Right now I'm thinking \"gadu\" for git-annex disk usage.
"""]]
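
The plan of reading sizes out of the symlinks can be sketched in shell. This is a minimal sketch, not an existing tool: the function names `annex_key_size` and `annex_du` are hypothetical, and it assumes keys of the form `BACKEND-sSIZE--NAME` (optionally with further fields such as `-mMTIME` before the `--`) and link targets that pass through `.git/annex/objects/`. Duplicate files are counted once per link, and keys without a size field are ignored.

```shell
#!/bin/sh
# Sketch only: annex_key_size and annex_du are hypothetical names.
# Assumes keys look like BACKEND-sSIZE--NAME (e.g. SHA256-s1048576--<hash>),
# possibly with extra fields such as -mMTIME before the "--".

# Print the size recorded in a key; prints nothing if there is no -s field.
annex_key_size() {
    # ${1##*/} drops any directory part of the link target.
    printf '%s\n' "${1##*/}" |
        sed -n 's/--.*$//; s/^[A-Z0-9_]*-s\([0-9][0-9]*\).*$/\1/p'
}

# Sum the recorded sizes of all annexed symlinks below a directory,
# whether or not their content is present locally.
annex_du() {
    find "${1:-.}" -name .git -prune -o -type l -print |
        while IFS= read -r link; do
            target=$(readlink "$link")
            case $target in
                *.git/annex/objects/*) annex_key_size "$target" ;;
            esac
        done |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Running `annex_du ~/annex` would then print a total in bytes. Counting each key only once, or only keys missing locally, would need extra filtering on top of this.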

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://sunny256.sunbase.org/"
nickname="sunny256"
subject="comment 5"
date="2012-11-30T00:29:44Z"
content="""
Steve, that would be a very useful utility. I've been thinking of such a tool, but haven't gotten around to writing it yet. It would be practical to have before copying big/many files from another drive. If I've been short of free space, I've executed `du -L` in the source directory, but that's a bit cumbersome.
And \"gadu\" is a fine name, yes. Goes well along with my \"ga\" shortcut for \"git annex\", which I created two hours after I started using git-annex. I've probably saved thousands of keystrokes because of that. ☺
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://edheil.wordpress.com/"
ip="173.162.44.162"
subject="comment 7"
date="2012-11-29T22:15:11Z"
content="""
Sounds likely!! I'll get to deleting, and when I get a chance I'll grab your latest commit and recompile.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://edheil.wordpress.com/"
ip="173.162.44.162"
subject="comment 8"
date="2012-11-29T22:15:55Z"
content="""
(thanks very much!)
"""]]

View file

@ -0,0 +1,11 @@
I've noticed that if I'm using the git-annex assistant, it wants to pull down all my files from other repos onto my laptop, even after I've dropped them. (My laptop is set up as "client," my usb drive and an ssh server as "backup".)
I want to use git annex to save space on my laptop, but of course when I'm running the assistant, it pulls everything down there, even things I've manually dropped.
Is my "I want to save space, with a partial archive on my laptop" use case simply out of scope for the assistant? So I should just be using the command line for my needs? That's fine if that's the case.
Or maybe something like this is what I should be doing: http://git-annex.branchable.com/assistant/archival_walkthrough/ ? So instead of manually "git annex drop"-ping files in place, I should set up a directory called "archive" on my machine, from which files will magically disappear and get backed up elsewhere?
If it's the case that a directory named "archive" in your checkout has the magical property of having the assistant drop and archive its contents, that's awesome, maybe just what I need, but if that behavior is spelled out in so many words anywhere I managed to miss it.
Apologies for all these questions, just enjoying the software immensely and wanting to get to know it.

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawn7gQ1zZDdWhXy9H51W2krZYShNmKL3qfM"
nickname="Karsten"
subject="comment 1"
date="2012-11-30T08:26:24Z"
content="""
I second this request. Also, I'm posting this to enable the rss comment subscription button for this post.
Maybe these design documents are interesting:
<http://git-annex.branchable.com/design/assistant/transfer_control/>
<http://git-annex.branchable.com/design/assistant/partial_content/>
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.49"
subject="comment 2"
date="2012-11-30T16:31:21Z"
content="""
The archive directory is a new feature, but yes, it does work. Just use the webapp to edit the configuration of a remote, put it into the \"small archive\" group, and the contents of archive directories will be sent to it.
This is a particular application of [[preferred_content]] settings, which give you a large amount of control over which data end up where when using the assistant.
"""]]

View file

@ -0,0 +1,59 @@
If you're anything like me¹, you have a copy of your annex on a computer running at home², set up so you can access it from anywhere like this:
ssh myhome.no-ip.org
This is totally great! Except, there is no way for your home computer to pull your changes, because there is no *on-the-go.no-ip.org*. You can get clunky and use a *bare git repository and git push*, but there is a better way.
First, install *openssh-server* on your *on-the-go* computer:
sudo apt-get install openssh-server # Adjust to your flavor of unix
Then, log into your *home* computer, with *port forwarding*:
ssh me@myhome.no-ip.org -R 2201:localhost:22
Your *home* computer can now ssh into your *on-the-go* computer, as long as you keep the above shell running.
You can now add your *on-the-go* computer as a remote on your *home* computer. Use the port forwarding shell you just connected with the command above, if you like.
ssh-keygen -t rsa
ssh-copy-id -p 2201 me@localhost
cd ~/annex
git remote add on-the-go ssh://me@localhost:2201/home/myuser/annex
Now you can run normal annex operations, as long as the port forwarding shell is running³.
git annex sync
git annex get --from on-the-go some/big/file
git annex status
You can add more computers by repeating with a different port, e.g. 2202 or 2203 (or any other).
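
With several on-the-go computers, each one needs its own forwarded port. A small wrapper can make that less error-prone. The sketch below assumes the example host `me@myhome.no-ip.org` and uses a hypothetical function name, `open_tunnel`. The `SSH` variable is only there so the command can be previewed (for example with `SSH=echo`) before it is run for real.

```shell
#!/bin/sh
# Sketch only: open_tunnel is a hypothetical helper. User, host, and
# default port are the example values used in this tip.
open_tunnel() {
    port=${1:-2201}
    # -N: do not start a remote shell, only set up the forwarding.
    # -R: port $port on the home computer leads back to this machine's sshd.
    # ExitOnForwardFailure=yes: exit instead of continuing without the
    # forwarding if the port cannot be set up (e.g. it is already taken).
    ${SSH:-ssh} -N -o ExitOnForwardFailure=yes \
        -R "$port:localhost:22" me@myhome.no-ip.org
}
```

On a second on-the-go computer, `open_tunnel 2202` would claim port 2202, and the home computer would add a matching remote on `ssh://me@localhost:2202/...`.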
If you're security paranoid (like me), read on. If you're not, that's it! Thanks for reading!
---
Paranoid Area
Note that you're granting your home computer passwordless access to your on-the-go computer. I believe that's all right, as long as:
* Your home computer is really in your home, and not at a friend's house or some datacenter
* Your home computer can be accessed only by ssh, and not HTTP or Samba or NTP or (shoot me now!) FTP
* Only you (and perhaps trustworthy family) have access to your home computer
* You have reasonably strong passwords or key-only logins on both your home and on-the-go computers.
* You regularly install security updates on both computers (sudo apt-get update && sudo apt-get upgrade)
In any case, the setup is much, much, much more secure than Dropbox. With Dropbox, you have exactly the same setup, but:
* Your data is stored in some datacenter. It's supposed to be encrypted. It might not be.
* Lots of people have routine access to your files, and plausible reason to. Bored employees might regularly be doing some 'maintenance work' involving your pictures.
* The dropbox software can do anything it likes on your computer, and it's closed source so you don't know if it does. A disgruntled employee could put a trojan into it.
* Dropbox might have a backdoor for employee access to any file on your computer. This might be done with the best of intentions, but a mal-intentioned or careless employee might still erase things or send sensitive files from your computer by email.
* A truly huge number of eyes connected to incredibly smart brains have looked at openssh and found it secure. Everybody trusts openssh. With Dropbox, there is, well, Dropbox. Whoever that is.
-----
¹ Me=Carlo, not Joey. I'm pretty sure doing what I wrote here is a good idea, but in case it turns out to be catastrophically dumb, it's my fault, not his.
² My always-on computer at home is a raspberry pi with a 32GB USB stick. Best self-hosted dropbox you could imagine.
³ You can just forward the port, but not open a shell, by adding the -N option. This could be useful for connecting on startup, e.g. in /etc/rc.local. I prefer to open the shell to forward the ports, maybe use it, and close it to stop it.

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.49"
subject="comment 1"
date="2012-11-30T16:25:58Z"
content="""
If you don't trust your home computer with shell access, you can lock it down in `.ssh/authorized_keys` to only be able to run git-annex-shell. See [[forum/Restricting_git-annex-shell_to_a_specific_repository]]
"""]]

View file

@ -0,0 +1,5 @@
Right now the assistant can have a huge list of pending transfers for certain hosts if its data is a bit outdated, or a host hasn't been synced lately. When starting up it will then attempt each transfer to said host (each of which will fail, but sometimes only after a long timeout), possibly before doing other stuff like attempting to download new files, or copy files to online hosts.
I suggest that if a transfer fails for host X, and there are other pending transfers, say to host Y and from Z, then all other pending transfers to/from X get pushed to the back of the queue, to avoid having to wait a long time for several transfers to time out before doing useful stuff.
The prime example for me was this morning, when a laptop that was turned off had a huge amount of queued transfers to it, resulting in the assistant attempting a load of transfers to that host before it retrieved a new file that I had created on another machine yesterday.
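
The proposed reordering amounts to a stable partition of the queue. As a rough illustration only (the assistant's real queue is internal to git-annex and is not a text file), assume a hypothetical queue with one `host file` pair per line. The hypothetical `defer_host` function moves every entry for a failed host to the back while keeping the relative order of everything else.

```shell
#!/bin/sh
# Sketch only: defer_host and the "host file" line format are assumptions
# made for illustration, not git-annex's actual transfer queue.
# Reads a queue on stdin, writes the reordered queue to stdout.
defer_host() {
    awk -v failed="$1" '
        $1 != failed { print; next }      # other hosts keep their place
        { deferred[n++] = $0 }            # hold back the failed host
        END { for (i = 0; i < n; i++) print deferred[i] }
    '
}
```

For example, `printf 'X a\nY b\nX c\nZ d\n' | defer_host X` prints the entries for Y and Z first, followed by the two entries for X in their original order.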