Merge branch 'master' into assistant

Conflicts:
	Makefile
Joey Hess 2012-07-25 14:55:53 -04:00
commit 03979d4d54
29 changed files with 195 additions and 16 deletions

@@ -3,20 +3,29 @@ mans=git-annex.1 git-annex-shell.1
 sources=Build/SysConfig.hs Utility/Touch.hs Utility/Mounts.hs
 all=$(bins) $(mans) docs
+CFLAGS=-Wall
 OS:=$(shell uname | sed 's/[-_].*//')
 ifeq ($(OS),Linux)
-BASEFLAGS_OPTS+=-DWITH_INOTIFY -DWITH_DBUS
+BASEFLAGS_OPTS=-DWITH_INOTIFY -DWITH_DBUS
 clibs=Utility/libdiskfree.o Utility/libmounts.o
 else
-BASEFLAGS_OPTS+=-DWITH_KQUEUE
+# BSD system
+BASEFLAGS_OPTS=-DWITH_KQUEUE
 clibs=Utility/libdiskfree.o Utility/libmounts.o Utility/libkqueue.o
+ifeq ($(OS),Darwin)
+# Ensure OSX compiler builds for 32 bit when using 32 bit ghc
+GHCARCH:=$(shell ghc -e 'print System.Info.arch')
+ifeq ($(GHCARCH),i386)
+CFLAGS=-Wall -m32
+endif
+endif
 endif
 PREFIX=/usr
 IGNORE=-ignore-package monads-fd -ignore-package monads-tf
 BASEFLAGS=-threaded -Wall $(IGNORE) -outputdir tmp -IUtility -DWITH_ASSISTANT -DWITH_S3 $(BASEFLAGS_OPTS)
 GHCFLAGS=-O2 $(BASEFLAGS)
-CFLAGS=-Wall
 ifdef PROFILE
 GHCFLAGS=-prof -auto-all -rtsopts -caf-all -fforce-recomp $(BASEFLAGS)
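
The platform logic in this Makefile hunk can be restated as a small Python sketch. The helper names (`normalize_os`, `parse_ghc_arch`, `select_flags`) are illustrative assumptions and not part of git-annex; only the flag values are taken from the Makefile. One detail that may be worth checking against the Makefile: `print` applied to a Haskell `String` emits surrounding double quotes, so the sketch strips them before comparing with `i386`.

```python
import re


def normalize_os(uname_output: str) -> str:
    """Mirror `uname | sed 's/[-_].*//'`: drop everything from the first '-' or '_'."""
    return re.sub(r"[-_].*", "", uname_output.strip())


def parse_ghc_arch(ghc_output: str) -> str:
    """Clean the output of `ghc -e 'print System.Info.arch'`, quotes included."""
    return ghc_output.strip().strip('"')


def select_flags(os_name: str, ghc_arch: str) -> dict:
    """Return the settings the hunk would choose (hypothetical helper)."""
    common_clibs = ["Utility/libdiskfree.o", "Utility/libmounts.o"]
    cflags = ["-Wall"]
    if os_name == "Linux":
        opts = ["-DWITH_INOTIFY", "-DWITH_DBUS"]
        clibs = common_clibs
    else:
        # Everything that is not Linux is treated as a BSD system using kqueue.
        opts = ["-DWITH_KQUEUE"]
        clibs = common_clibs + ["Utility/libkqueue.o"]
        if os_name == "Darwin" and ghc_arch == "i386":
            # A 32 bit ghc needs 32 bit C objects to link against.
            cflags.append("-m32")
    return {"BASEFLAGS_OPTS": opts, "clibs": clibs, "CFLAGS": cflags}
```

Keeping `CFLAGS=-Wall` ahead of the OS test is what lets the Darwin branch override it, which matches the move of that line in the diff.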

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkZRoTRyW3tox-FD2DQWxskgI6_tkEtHL4"
nickname="Ben"
subject="comment 6"
date="2012-07-23T16:11:52Z"
content="""
The above would work fine for me but the files in my annex (e.g. .git/annex/objects/xx/yy/blah.ogg) don't have extensions like that, so my media player doesn't recognize them as media files. How do I get the files under \"objects\" to keep the extensions of the original files like in Joey's example?
"""]]

@@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 7"
date="2012-07-24T14:51:50Z"
content="""
You can get the extensions by migrating to the SHA1E (or SHA256E) backend.
"""]]
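
As a rough illustration of why this helps: keys from the `E` backends carry the file's extension after the hash, so the object stored under `.git/annex/objects` ends in something a media player recognizes. The Python sketch below builds such a key name. `sha1e_key` is a hypothetical helper; it assumes the `BACKEND-sSIZE--HASH.ext` layout and ignores git-annex's real rules about which extensions are kept.

```python
import hashlib
import os


def sha1e_key(filename: str, content: bytes) -> str:
    """Build a SHA1E-style key name: backend, size, hash, then the extension."""
    digest = hashlib.sha1(content).hexdigest()
    extension = os.path.splitext(filename)[1]  # e.g. ".ogg", or "" if none
    return f"SHA1E-s{len(content)}--{digest}{extension}"


print(sha1e_key("blah.ogg", b"example audio data"))
```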

@@ -46,3 +46,5 @@ Or just telling users to use the 64bit version of the haskell platform?
 It may also be possible to get osx's c compiler to output a universal binary
 to give you everything, but that would be going down the _being too platform
 specific route_.
+> [[done]], it'll detect this and force -m32. --[[Joey]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="hamish"
ip="203.0.139.24"
subject="dbus vs polling "
date="2012-07-22T07:13:37Z"
content="""
I, too, am running dbus but like to hand mount my filesystems. However, I'd imagine both that I am in a minority and that my minority would like the extra control, so perhaps even a \"re-read the mtab /now/\" command that can be manually run after something is manually mounted would suffice.
Is it not possible to use inotify on the mtab?
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.4.169"
subject="comment 3"
date="2012-07-22T16:03:52Z"
content="""
How did I not think about using my favorite hammer on this problem too? But, no, /proc/mounts cannot be watched with inotify it seems, and of course the BSDs don't seem to have a file at all.
I think the dbus stuff is sorted out for manual users, see later blog entries.
"""]]

@@ -0,0 +1,27 @@
Made the MountWatcher update state for remotes located in a drive that
gets mounted. This was tricky code. First I had to make remotes declare
when they're located in a local directory. Then it has to rescan git
configs of git remotes (because the git repo mounted at a mount point may
change), and update all the state that a newly available remote can affect.
And it works: I plug in a drive containing one of my git remotes, and the
assistant automatically notices it and syncs the git repositories.
---
But, data isn't transferred yet. When a disconnected remote becomes
connected, keys should be transferred in both directions to get back into
sync.
To that end, added Yet Another Thread; the TransferScanner thread
will scan newly available remotes to find keys, and queue low priority
transfers to get them fully in sync.
(Later, this will probably also be used for network remotes that become
available when moving between networks. I think network-manager sends
dbus events it could use..)
This new thread is missing a crucial piece: it doesn't yet have a way to
find the keys that need to be transferred. Doing that efficiently (without
scanning the whole git working copy) is Hard. I'm considering design
possibilities..
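
The queueing idea in this entry can be sketched as follows. This is a minimal Python illustration, not the assistant's Haskell implementation: `TransferQueue`, `Transfer`, `scan_remote`, and the two priority levels are assumed names, and the key sets are simply passed in, since how to obtain them is the open problem described above. It only shows that transfers queued by a scanner wait behind normal ones.

```python
import heapq
import itertools
from dataclasses import dataclass

NORMAL, LOW = 0, 1  # the smaller number is served first


@dataclass(frozen=True)
class Transfer:
    direction: str  # "upload" or "download"
    remote: str
    key: str


class TransferQueue:
    """Priority queue; ties are broken by insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, transfer: Transfer, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), transfer))

    def get(self) -> Transfer:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def scan_remote(queue, remote, local_keys, remote_keys):
    """Queue low priority transfers to bring one remote back in sync."""
    for key in sorted(local_keys - remote_keys):
        queue.put(Transfer("upload", remote, key), LOW)
    for key in sorted(remote_keys - local_keys):
        queue.put(Transfer("download", remote, key), LOW)
```

A transfer queued at normal priority after the scan would still be handed out first, which is the behaviour the low priority is meant to give.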

@@ -3,20 +3,59 @@ all the other git clones, at both the git level and the key/value level.
 ## immediate action items
-* At startup, and possibly periodically, look for files we have that
-location tracking indicates remotes do not, and enqueue Uploads for
-them. Also, enqueue Downloads for any files we're missing.
+* At startup, and possibly periodically, or when the network connection
+changes, or some heuristic suggests that a remote was disconnected from
+us for a while, queue remotes for processing by the TransferScanner,
+to queue Transfers of files it or we're missing.
 * After git sync, identify content that we don't have that is now available
-on remotes, and transfer. But first, need to ensure that when a remote
+on remotes, and transfer. (Needed when we have a uni-directional connection
+to a remote, so it won't be uploading content to us.)
+But first, need to ensure that when a remote
 receives content, and updates its location log, it syncs that update
 out.
-* When MountWatcher detects a newly mounted drive, rescan git remotes
-in order to get ones on the drive, and do a git sync and file transfers
-to sync any repositories on it.
+## TransferScanner
+The TransferScanner thread needs to find keys that need to be Uploaded
+to a remote, or Downloaded from it.
+How to find the keys to transfer? I'd like to avoid potentially
+expensive traversals of the whole git working copy if I can.
+One way would be to do a git diff between the (unmerged) git-annex branches
+of the git repo, and its remote. Parse that for lines that add a key to
+either, and queue transfers. That should work fairly efficiently when the
+remote is a git repository. Indeed, git-annex already does such a diff
+when it's doing a union merge of data into the git-annex branch. It
+might even be possible to have the union merge and scan use the same
+git diff data.
+But that approach has several problems:
+1. The list of keys it would generate wouldn't have associated git
+filenames, so the UI couldn't show the user what files were being
+transferred.
+2. Worse, without filenames, any later features to exclude
+files/directories from being transferred wouldn't work.
+3. Looking at a git diff of the git-annex branches would find keys
+that were added to either side while the two repos were disconnected.
+But if the two repos' keys were not fully in sync before they
+disconnected (which is quite possible; transfers could be incomplete),
+the diff would not show those older out of sync keys.
+The remote could also be a special remote. In this case, I have to either
+traverse the git working copy, or perhaps traverse the whole git-annex
+branch (which would have the same problems with filenames not being
+available).
+If a traversal is done, should check all remotes, not just
+one. Probably worth handling the case where a remote is connected
+while in the middle of such a scan, so part of the scan needs to be
+redone to check it.
 ## longer-term TODO
-* Test MountWatcher on Gnome (should work ok) and LXDE (dunno).
+* Test MountWatcher on LXDE.
 * git-annex needs a simple speed control knob, which can be plumbed
 through to, at least, rsync. A good job for an hour in an
 airport somewhere.

@@ -0,0 +1,19 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawk4YX0PWICfWGRLuncCPufMPDctT7KAYJA"
nickname="betabrain"
subject="selective data syncing"
date="2012-07-24T15:27:08Z"
content="""
How will the assistant know which files' data to distribute between the repos?
I'm using git-annex and its numcopies attribute to maintain a redundant archive spread over different computers and usb drives. Not all drives should get a copy of everything, e.g. the usb drive I take to work should not automatically get a copy of family pictures.
How about .gitattributes?
* \* annex.auto-sync-data = false # don't automatically sync the data
* archive/ annex.auto-push-repos = NAS # everything added to archive/ in any repo goes automatically to the NAS remote.
* work/ annex.auto-synced-repos = LAPTOP WORKUSB # everything added to work/ in LAPTOP or WORKUSB gets synced to WORKUSB and LAPTOP
* work/ annex.auto-push-repos = LAPTOP WORKUSB # stuff added to work/ anywhere gets synced to LAPTOP and WORKUSB
* important/ annex.auto-sync-data = true # push data to all repos
* webserver_logs/ annex.remote.WEBSERVER.auto-push-repos = S3 # only the assistant running in WEBSERVER pushes webserver_logs/ to S3 remote
"""]]

@@ -9,8 +9,8 @@
 * [[ArchLinux]]
 * [[NixOS]]
 * [[Gentoo]]
-* Windows: [[sorry, not possible yet|todo/windows_support]]
 * [[ScientificLinux5]] - This should cover RHEL5 clones such as CentOS5 and so on
+* Windows: [[sorry, not possible yet|todo/windows_support]]
 ## Using cabal

@@ -1,4 +1,10 @@
-Installation recipe for Fedora 14 through 17.
+git-annex is recently finding its way into Fedora.
+* [Status of getting a Fedora package](https://bugzilla.redhat.com/show_bug.cgi?id=662259)
+* [Koji build for F17](http://koji.fedoraproject.org/koji/buildinfo?buildID=328654)
+* [Koji build for F16](http://koji.fedoraproject.org/koji/buildinfo?buildID=328656)
+Installation recipe for Fedora 14 through 15.
 <pre>
 sudo yum install ghc cabal-install pcre-devel
@@ -15,6 +21,3 @@ cabal install --bindir=$HOME/bin
 Note: You can't just use `cabal install git-annex`, because Fedora does
 not yet ship ghc 7.4.
-* [Status of getting a Fedora package](https://bugzilla.redhat.com/show_bug.cgi?id=662259)a
-* [Koji build for F17](http://koji.fedoraproject.org/koji/buildinfo?buildID=328654)
-* [Koji build for F16](http://koji.fedoraproject.org/koji/buildinfo?buildID=328656)

@@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 2"
date="2012-07-24T15:09:29Z"
content="""
I've moved some outdated comments about installing on OSX to [[old_comments]].
"""]]

@@ -0,0 +1 @@
Moved a bunch of outdated comments here; AFAIK all these issues are fixed.

@@ -0,0 +1,30 @@
[[!comment format=mdwn
username="https://a-or-b.myopenid.com/"
ip="203.45.2.230"
subject="Compiling git-annex on OSX (with 32 bit Haskell)"
date="2012-07-24T03:26:45Z"
content="""
I came across an issue when following the instructions here:
<http://git-annex.branchable.com/install/OSX/>
I'm compiling the 'assistant' branch (522f568450a005ae81b24f63bb37e75320b51219).
The pre-compiled version of Haskell for OSX recommends the 32 bit installer, however git-annex compiles
> Utility/libdiskfree.o Utility/libkqueue.o Utility/libmounts.o
as 64 bit. The 'make' command fails on linking 32- and 64-bit code.
So... I made a small change to the Makefile
> CFLAGS=-Wall
becomes
> CFLAGS=-Wall -m32
I don't know if there is an easy way to programmatically check for this, or even if you'd want to spend time doing it, but it might help someone else out.
<https://gist.github.com/3167798>
"""]]

@@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 17"
date="2012-07-24T15:03:49Z"
content="""
The instructions say to use cabal for a reason -- it's more likely to work. But I have made the Makefile detect the mismatched GHC and C compiler and force the C compiler to 32 bit.
"""]]