Merge branch 'master' of ssh://git-annex.branchable.com

commit 8a2105c90a
Joey Hess, 2011-12-23 11:34:10 -04:00
6 changed files with 168 additions and 0 deletions

@@ -0,0 +1,37 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="I think Matt is right."
date="2011-12-23T14:04:44Z"
content="""
I got bitten by this too. It seems that the user is expected to fetch
remote git-annex branches themselves, but this is not documented
anywhere.
The man page says of \"git annex merge\":

    Automatically merges any changes from remotes into the git-annex
    branch.
I am not a git newbie, but even so I had incorrectly assumed that git
annex merge would take care of pulling the git-annex branch from the
remote prior to merging, thereby ensuring all versions of the
git-annex branch would be merged, and that the location tracking data
would be synced across all peer repositories.
My master branches do not track any specific upstream branch, because
I am operating in a decentralized fashion. Therefore the error
message caused by `git pull $remote` succeeded in encouraging me to
instead use `git pull $remote master`, and this excludes the git-annex
branch from the fetch. Even worse, a git newbie might realise this
and be tempted to do `git pull $remote git-annex`, which would merge
the git-annex branch directly into their current branch.
Therefore I think it needs to be explicitly documented that

    git fetch $remote
    git merge $remote/master

is required when the local branch doesn't track an upstream branch.
Or maybe a `--fetch` option could be added to `git annex merge` to
perform the fetch from all remotes before running the merge(s).
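In the meantime, a rough equivalent of that `--fetch` option can be
scripted; this is a minimal sketch which simply fetches every
configured remote before merging:

    for remote in $(git remote); do
        git fetch \"$remote\"   # the default refspec also fetches git-annex
    done
    git annex merge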
"""]]

@@ -0,0 +1,11 @@
I used to save movies with the .srt subtitle files next to them.
Usually vlc finds them because they're in the same directory as the
movie file; with git-annex, however, the link points at a file in
another folder, so after adding movies to the annex the subtitles
don't load anymore.

I couldn't find a quick fix. I'm thinking of a bash script, but wanted
to discuss it here with all annex users.

I know it's out of annex's scope, but I think a movie archive is a
great scenario for git-annex: most of my HD is filled up with movies
from the camcorder, screencasts, etc., and we usually don't modify
those files.
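For instance, something along these lines might work (an untested
sketch: it assumes vlc resolves the symlink and then looks for
subtitles next to the target, and that the object directory is
writable):

    #!/bin/sh
    # untested sketch: for each annexed movie, place a matching .srt
    # symlink next to the object file the movie's symlink points at
    for movie in *.mp4; do
        srt="${movie%.mp4}.srt"
        [ -e "$srt" ] || continue
        target=$(readlink -f "$movie")   # resolve the annex symlink
        ln -sf "$PWD/$srt" "$target.srt"
    done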

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="comment 1"
date="2011-12-23T13:31:33Z"
content="""
ControlPersist is awesome - thanks!
Here's [an alternative, git-specific approach](http://thread.gmane.org/gmane.comp.version-control.home-dir/502).
"""]]

@@ -0,0 +1,48 @@
I have two repos, both using the SHA1 backend and both using git.
The first one is on a laptop, the second one is a usb drive.
When I drop a file in the laptop repo, the file is not available in
that repo until I run *git annex get*, but when the usb drive is
plugged in, the file is actually available.
How about adding a feature to link some/all files to the remote repo?
For example, we have the *railscasts/196-nested-model-form-part-1.mp4*
file added to git, and only available on the usb drive:
    $ git annex whereis 196-nested-model-form-part-1.mp4
    whereis 196-nested-model-form-part-1.mp4 (1 copy)
        a7b7d7a4-2a8a-11e1-aebc-d3c589296e81 -- origin (Portable usb drive)
I can see the link with:
    $ cd railscasts
    $ ls -ls 196*
    8 lrwxr-xr-x 1 framallo staff 193 Dec 20 05:49 196-nested-model-form-part-1.mp4 -> ../.git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e
I save this in a variable just to make the example clearer:

    ID=".git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e"
The file doesn't exist in the local repo:

    $ ls ../$ID
    ls: ../$ID: No such file or directory
However, I can create a link to access that file on the remote repo.
First I create the needed directory:

    $ mkdir ../.git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/
Then I link to the remote file:

    $ ln -s /mnt/usb_drive/repo_folder/$ID ../$ID
Now I can open the file in the laptop repo.
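Something along these lines could automate it (a rough sketch; it
assumes the usb drive is mounted at the path shown, and that the
annexed file lives one directory below the repo root, as in this
example):

    #!/bin/sh
    # sketch: make a missing annexed file openable by linking its object
    # to the copy in a remote repo mounted at $REMOTE
    REMOTE=/mnt/usb_drive/repo_folder
    file=$1
    obj=$(readlink "$file" | sed 's|^\.\./||')   # .git/annex/objects/...
    if [ ! -e "../$obj" ] && [ -e "$REMOTE/$obj" ]; then
        mkdir -p "$(dirname "../$obj")"
        ln -s "$REMOTE/$obj" "../$obj"
    fi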
I think it could be easy to implement. Maybe it's a naive approach, but
it looks appealing, and checking whether something is a regular file or
a link shouldn't impact performance.
The limitation is that it would only work with remote repos on local
dirs.
It would also let you have a single directory structure, like AFS or
other distributed filesystems: if the file is not local, I go to the
remote server.
That's great for apps like Picasa, iTunes, and friends that depend on
the file location.

@@ -0,0 +1,54 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="comment 7"
date="2011-12-22T20:04:14Z"
content="""
> My main concern with putting this in git-annex is that finding
> duplicates necessarily involves storing a list of every key and file
> in the repository
Only if you want to search the *whole* repository for duplicates, and if
you do, then you're necessarily going to have to chew up memory in
some process anyway, so what difference does it make whether it's
git-annex or
(say) a Perl wrapper?
> and git-annex is very carefully built to avoid things that require
> non-constant memory use, so that it can scale to very big
> repositories.
That's a worthy goal, but if everything could be implemented with an
O(1) memory footprint then we'd be in a much more pleasant world :-)
Even O(n) isn't that bad ...
That aside, I like your `--format=\"%f %k\n\"` idea a lot. That opens
up the \"black box\" of `.git/annex/objects` and makes nice things
possible, as your pipeline already demonstrates. However, I'm not
sure why you think `git annex find | sort | uniq` would be more
efficient. Not only does the sort require the very thing you were
trying to avoid (i.e. the whole list in memory), but it's also
O(n log n) which is significantly slower than my O(n) Perl script
linked above.
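For the record, the same pipeline can be done in a single O(n) pass
with a hash instead of a sort (a sketch, assuming the proposed
`--format` option existed and that filenames contain no spaces):

    git annex find --format=\"%f %k\n\" |
        awk '{ files[$2] = files[$2] \" \" $1; seen[$2]++ }
             END { for (k in seen) if (seen[k] > 1) print k \":\" files[k] }'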
More considerations about this pipeline:
* Doesn't it only include locally available files? Ideally it should
spot duplicates even when the backing blob is not available locally.
* What's the point of `--include '*'` ? Doesn't `git annex find`
with no arguments already include all files, modulo the requirement
above that they're locally available?
* Any user using this `git annex find | ...` approach is likely to
run up against its limitations sooner rather than later, because
they're already used to the plethora of options `find(1)` provides.
Rather than reinventing the wheel, is there some way `git annex find`
could harness the power of `find(1)` ?
Those considerations aside, a combined approach would be to implement

    git annex find --format=...
and then alter my Perl wrapper to `popen(2)` from that rather than using
`File::Find`. But I doubt you would want to ship Perl wrappers in the
distribution, so if you don't provide a Haskell equivalent then users
who can't code are left high and dry.
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="How much memory would it actually use anyway?"
date="2011-12-22T20:15:22Z"
content="""
Another thought: an SHA1 digest is 20 bytes, so you can fit over 50 million keys into 1GB of RAM. Granted, you also need memory to store the values (pathnames), which in many cases will be longer, and some users may also choose more expensive backends than SHA1 ... but even so, it seems to me that you are at risk of throwing the baby out with the bath water.
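Back of the envelope:

    $ echo $((1024 * 1024 * 1024 / 20))
    53687091

i.e. roughly 53 million 20-byte digests per GiB, before counting the
pathnames.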
"""]]