Merge branch 'master' into watch

Joey Hess 2012-06-04 17:07:25 -04:00
commit ce9dc15ea0
13 changed files with 233 additions and 6 deletions

@ -0,0 +1,48 @@
I've just come across a subtle build issue (Haskell Platform just
got updated, so I thought I might give it a try). The scenario is:

* OS X 10.7 (everything is up to date with Xcode etc., the usual)
* The 32-bit version of Haskell Platform 2012.2

The issue is that when libdiskfree.c is compiled and linked into git-annex,
OS X defaults to a 64-bit object file while the 32-bit GHC links for i386, thus:

    Linking git-annex ...
    ld: warning: ignoring file Utility/libdiskfree.o, file was built for unsupported file format which is not the architecture being linked (i386)
    Undefined symbols for architecture i386:
    "_diskfree", referenced from:
    _UtilityziDiskFree_zdwa_info in DiskFree.o
    ld: symbol(s) not found for architecture i386
    collect2: ld returned 1 exit status
    make: *** [git-annex] Error 1

If you have the 32-bit version of Haskell Platform installed, you can either
compile the C library in 32-bit mode, as in the following example:

    laplace:git-annex jtang$ cc -m32 -c -o Utility/libdiskfree.o Utility/libdiskfree.c
    Utility/libdiskfree.c: In function diskfree:
    Utility/libdiskfree.c:61: warning: statfs64 is deprecated (declared at /usr/include/sys/mount.h:379)
    laplace:git-annex jtang$ make
    ghc -O2 -Wall -ignore-package monads-fd -outputdir tmp -IUtility -DWITH_S3 --make git-annex Utility/libdiskfree.o
    Utility/Touch.hs:1:12:
    Warning: -#include and INCLUDE pragmas are deprecated: They no longer have any effect
    Utility/Touch.hs:2:12:
    Warning: -#include and INCLUDE pragmas are deprecated: They no longer have any effect
    Utility/Touch.hs:3:12:
    Warning: -#include and INCLUDE pragmas are deprecated: They no longer have any effect
    Utility/Touch.hs:4:12:
    Warning: -#include and INCLUDE pragmas are deprecated: They no longer have any effect
    Linking git-annex ...

Or else just install the 64-bit Haskell Platform. I'm not too sure where
you would put the intelligence to detect 32- or 64-bit output from the
different compilers. I suspect checking what ghc outputs, then passing
the appropriate -m32 or -m64 to the C compiler, is the right thing to do.
Or just tell users to use the 64-bit version of the Haskell Platform?
It may also be possible to get OS X's C compiler to output a universal binary
to give you everything, but that would be going down the _being too platform
specific route_.
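A minimal sketch of the idea above: ask GHC which architecture it targets and pass the matching flag to the C compiler. This is not git-annex's actual build logic; it assumes `ghc -e 'print System.Info.arch'` prints a quoted string such as `"i386"` or `"x86_64"`, and the function names `cc_arch_flag` and `build_libdiskfree` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the architecture string reported by GHC
# (as printed by `ghc -e 'print System.Info.arch'`, quotes included)
# to the C compiler flag that produces a matching object file.
cc_arch_flag() {
    # Strip the double quotes that `print` adds around a String.
    arch=$(printf '%s' "$1" | tr -d '"')
    case "$arch" in
        i386)   echo "-m32" ;;
        x86_64) echo "-m64" ;;
        *)      echo "" ;;   # unknown: let the compiler use its default
    esac
}

# Hypothetical wrapper: compile libdiskfree.c for the architecture GHC links for.
build_libdiskfree() {
    flag=$(cc_arch_flag "$(ghc -e 'print System.Info.arch')")
    # $flag is left unquoted so that an empty result adds no argument.
    cc $flag -c -o Utility/libdiskfree.o Utility/libdiskfree.c
}
```

Running `build_libdiskfree` in the source tree before `make` would then produce an object file matching GHC's target, whichever Haskell Platform is installed.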

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 1"
date="2012-06-04T20:02:23Z"
content="""
Seems you built it using `make`. Could you try building with cabal instead, i.e. run `cabal install git-annex`, or `cabal build` in the source tree? I think cabal will probably do the right thing.
I could fix the Makefile, I suppose. What does `ghc -e 'print System.Info.arch'` say?
"""]]

@ -18,3 +18,8 @@ Feel free to chip in with comments! --[[Joey]]
* [[desymlink]]
* [[deltas]]
* In my overfunded nightmares: [[Windows]]
## blog
I'll be blogging about my progress in the [[blog]] on a semi-daily basis.
Follow along!

@ -0,0 +1 @@
[[!inline pages="page(design/assistant/blog/*)" show=30]]

@ -0,0 +1,57 @@
First day of [Kickstarter funded work](http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own/)!
Worked on [[inotify]] today. The `watch` branch in git now does a pretty
good job of following changes made to the directory, annexing files
as they're added and staging other changes into git. Here's a quick
transcript of it in action:

    joey@gnu:~/tmp>mkdir demo
    joey@gnu:~/tmp>cd demo
    joey@gnu:~/tmp/demo>git init
    Initialized empty Git repository in /home/joey/tmp/demo/.git/
    joey@gnu:~/tmp/demo>git annex init demo
    init demo ok
    (Recording state in git...)
    joey@gnu:~/tmp/demo>git annex watch &
    [1] 3284
    watch . (scanning...) (started)
    joey@gnu:~/tmp/demo>dd if=/dev/urandom of=bigfile bs=1M count=2
    add ./bigfile 2+0 records in
    2+0 records out
    2097152 bytes (2.1 MB) copied, 0.835976 s, 2.5 MB/s
    (checksum...) ok
    (Recording state in git...)
    joey@gnu:~/tmp/demo>ls -la bigfile
    lrwxrwxrwx 1 joey joey 188 Jun 4 15:36 bigfile -> .git/annex/objects/Wx/KQ/SHA256-s2097152--e5ced5836a3f9be782e6da14446794a1d22d9694f5c85f3ad7220b035a4b82ee/SHA256-s2097152--e5ced5836a3f9be782e6da14446794a1d22d9694f5c85f3ad7220b035a4b82ee
    joey@gnu:~/tmp/demo>git status -s
    A bigfile
    joey@gnu:~/tmp/demo>mkdir foo
    joey@gnu:~/tmp/demo>mv bigfile foo
    "del ./bigfile"
    joey@gnu:~/tmp/demo>git status -s
    AD bigfile
    A foo/bigfile

Due to Linux's inotify interface, this is surely some of the most subtle,
race-heavy code that I'll need to deal with while developing the git annex
assistant. But I can't start by wading in; I need to jump off the deep end
to make progress!
The hardest problem today involved the case where a directory is moved
outside of the tree that's being watched. Inotify will still send events
for such directories, but it doesn't make sense to continue to handle them.
Ideally I'd stop inotify watching such directories, but a lot of state
would need to be maintained to know which inotify handle to stop watching.
(Seems like Haskell's inotify API makes this harder than it needs to be...)
Instead, I put in a hack that will make it detect inotify events from
directories moved away, and ignore them. This is probably acceptable,
since this is an unusual edge case.
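The hack could look something like the following, written as a shell sketch rather than the Haskell code in the `watch` branch. It assumes each watch remembers the path its directory had when the watch was added; if no directory exists at that path any more, the directory was moved away (or removed), so events from that watch are dropped. `dispatch_event` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch of ignoring inotify events from moved-away directories.
# Assumption: every watch records the path its directory had when the watch
# was added. If nothing is at that path now, the directory was moved out of
# the tree (or removed), so its events are ignored instead of acted on.
dispatch_event() {
    watched_dir=$1   # directory path recorded when the watch was added
    name=$2          # file name reported in the inotify event
    if [ -d "$watched_dir" ]; then
        echo "handle $watched_dir/$name"
    else
        echo "ignore $watched_dir/$name"
    fi
}
```

One limitation of this approach: if a new directory later appears at the old path, events from the stale watch would be treated as valid again.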
----
The notable omission in the inotify code, which I'll work on next, is
staging the deletion of files. This is tricky because adding a file to the
annex happens to cause a deletion event. I need to make sure there are no
races where that deletion event causes data loss.
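One way to approach the add/delete race is to check, when a deletion event is handled, whether anything now exists at that path: `git annex add` leaves a symlink behind, so a symlink at the path means the event came from the add rather than a real deletion. This is a hypothetical sketch (`on_delete_event` is a made-up name), and it narrows the race rather than eliminating it, since the path can still change between the check and the staging.

```shell
#!/bin/sh
# Hypothetical sketch: decide what to do with an inotify deletion event.
# `git annex add` replaces a file with a symlink, which itself produces a
# deletion event for the original file. If something (such as that symlink)
# exists at the path when the event is handled, nothing is staged.
on_delete_event() {
    path=$1
    # -L is needed too: an annex symlink whose content is absent is a
    # broken link, for which -e is false.
    if [ -e "$path" ] || [ -L "$path" ]; then
        echo "skip"        # path was replaced, e.g. by the annex symlink
    else
        echo "stage-rm"
        # A real watcher might then run something like:
        #   git rm --quiet --cached -- "$path"
    fi
}
```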

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 2"
date="2012-06-04T19:45:00Z"
content="""
Jimmy, I hope to make it as easy as possible to install. I've been focusing on getting it directly into popular Linux distributions, rather than shipping my own binary. The OS X binary is static, and while I lack an OS X machine, I would like to make it easier to distribute to OS X users.
"""]]

@ -1,17 +1,37 @@
Finish "git annex watch" command, which runs, in the background, watching via
inotify for changes, and automatically annexing new files, etc.
There is a `watch` branch in git that adds such a command, although currently
it only handles adding new files, and nothing else. To make this really
useful, it needs to:
There is a `watch` branch in git that adds such a command. To make this
really useful, it needs to:
- notice deleted files and stage the deletion
(tricky; there's a race with add..)
- on startup, add any files that have appeared since last run **done**
- on startup, fix the symlinks for any renamed links **done**
- on startup, stage any files that have been deleted since last run
(seems to require a `git commit -a` on startup, or at least a
`git add --update`, which will notice deleted files)
- notice new files, and git annex add **done**
- notice renamed files, auto-fix the symlink, and stage the new file location
**done**
- handle cases where directories are moved outside the repo, and stop
watching them **done**
- when a whole directory is deleted or moved, stage removal of its
contents from the index **done**
- notice deleted files and stage the deletion
(tricky; there's a race with add since it replaces the file with a symlink..)
- periodically auto-commit staged changes (avoid autocommitting when
lots of changes are coming in)
- tunable delays before adding new files, etc
- honor .gitignore, not adding files it excludesa
- don't annex `.gitignore` and `.gitattributes` files, but do auto-stage
changes to them
- configurable option to only annex files meeting certain size or
filename criteria
- honor .gitignore, not adding files it excludes (difficult, probably
needs my own .gitignore parser to avoid excessive running of git commands
to check for ignored files)
- Possibly, when a directory is moved out of the annex location,
unannex its contents.
- Gracefully handle when the default limit of 8192 inotified directories
is exceeded. This can be tuned by root, so help the user fix it.
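For the last item, helping the user could amount to comparing the number of directories against the per-user watch limit and printing the `sysctl` setting root would need. On Linux the limit is exposed as `/proc/sys/fs/inotify/max_user_watches`; the function names and the choice to suggest twice the needed count are assumptions of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: read the per-user inotify watch limit (Linux only).
# Prints 0 if the limit cannot be read.
inotify_limit() {
    cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null || echo 0
}

# Hypothetical helper: print advice if NEEDED watches exceed LIMIT,
# and print nothing otherwise.
inotify_advice() {
    needed=$1
    limit=$2
    if [ "$needed" -gt "$limit" ]; then
        # Suggest twice the number needed, leaving room for new directories.
        suggested=$((needed * 2))
        echo "Too many directories to watch ($needed > $limit)."
        echo "As root, run: sysctl fs.inotify.max_user_watches=$suggested"
        echo "To make it permanent, add fs.inotify.max_user_watches=$suggested to /etc/sysctl.conf"
    fi
}
```

`inotify_advice "$(find . -type d | wc -l)" "$(inotify_limit)"` would give a rough check for the current tree, though it also counts directories such as `.git` that a watcher might skip.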
Also to do:

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://ciffer.net/~svend/"
subject="comment 1"
date="2012-06-04T19:42:07Z"
content="""
I would find it useful if the watch command could 'git add' new files (instead of 'git annex add') for certain repositories.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 2"
date="2012-06-04T19:46:03Z"
content="""
I think it's already on the list: \"configurable option to only annex files meeting certain size or filename criteria\" -- files not meeting those criteria would just be git added.
"""]]

@ -0,0 +1,46 @@
## Use case
A laptop with a relatively small hard drive has copies of a subset of
all annexed files. When annexed files are changed externally and `git
annex sync` is run on the laptop, the stale local copies are
invalidated and their symlinks break. How can I automatically fetch
the updated versions of these previously locally-cached files?
Because I only want a subset of files, I can't do
`git annex get --not --in here --and --in superset`.
Because files may be renamed, the
[[tips/automatically_getting_files_on_checkout/]] solution, by making
`dir` specify the subset, will require manually and redundantly
tracking renames.
## Simple (?) feature addition to git-annex to support this
When locally-cached files are invalidated by `git-annex sync`,
git-annex could notify the user, and give them the option to
`git-annex get` the invalidated files. Bonus points if the mechanism
allows this to be done at any point in the future, not just when
running `git-annex sync`. The idea is that git-annex could track
which files, previously cached locally, have been invalidated
*unintentionally* by syncs, and treat them differently from files,
previously cached locally, that have been *intentionally* dropped
using `git-annex drop` or `git-annex move`.
## More generally
The ability to specify a collection of files to always cache locally
(something like a numcopies.here=1), which is robust to renames, would
work. The "robust to renames" part seems tricky in git: whereas svn
attaches properties to files, and so properties are propagated by `svn
mv`, I believe git attributes are only specified by patterns in
.gitattributes files.
## Related questions / possible approaches
Other forum posts mention [[`git subtree`|forum/git-subtree_support__63__/]]
and [[sparse git checkouts|forum/sparse_git_checkouts_with_annex/]], but I'm not
familiar with these features and from reading those questions it's
unclear whether those approaches will work for me. Can anyone more
familiar with them see how to adapt one of those features to my use case?
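Until something like this exists, a partial workaround with existing commands is to record which files have content present before syncing (`git annex find` lists those by default, for the current directory), then `git annex get` any that lost their content. This is a hypothetical sketch, not a git-annex feature: it should be run from the top of the repository, it compares paths only and so does not follow renames, and the `GIT_ANNEX` override exists only to make it testable.

```shell
#!/bin/sh
# Hypothetical workaround sketch: remember which annexed files had their
# content present before syncing, then `git annex get` any of them whose
# content is missing afterwards (e.g. because they changed elsewhere).
# Renamed files are NOT followed: only unchanged paths are compared.
sync_and_refetch() {
    before=$(mktemp)
    after=$(mktemp)
    ${GIT_ANNEX:-git annex} find > "$before"   # files with content present
    ${GIT_ANNEX:-git annex} sync
    ${GIT_ANNEX:-git annex} find > "$after"
    sort -o "$before" "$before"
    sort -o "$after" "$after"
    # Lines only in the "before" list: content that went missing in the sync.
    comm -23 "$before" "$after" | while IFS= read -r f; do
        # Skip files the sync deleted; an annexed file is a symlink.
        if [ -L "$f" ]; then
            ${GIT_ANNEX:-git annex} get "$f"
        fi
    done
    rm -f "$before" "$after"
}
```

Files dropped on purpose before the sync never appear in the "before" list, so they are not fetched again, which roughly matches the intentional/unintentional distinction described above.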

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 1"
date="2012-06-04T19:56:05Z"
content="""
Personally, I deal with this problem by having a directory, or directories, where I put files that I want to have on my partial checkout laptop, and running `git annex get` in that directory.
It's not a perfect solution, but I don't know that a perfect solution exists.
"""]]

@ -0,0 +1,4 @@
Hi,
How can I debug git-annex code?
I am new to the Haskell Platform, and I would like to know which IDE can be used to debug Haskell code.
Thank you.

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="This is not an easy question to answer..."
date="2012-06-04T19:49:46Z"
content="""
Do you have a bug in git-annex that you need fixed, or are you just curious?
"""]]