Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2013-06-12 14:21:13 -04:00
commit f6050f92ad
15 changed files with 175 additions and 1 deletions

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawk3Wgg0XiqYFwM_Pw1RxZwlpNFi65g17sM"
nickname="James"
subject="comment 3"
date="2013-06-12T01:12:24Z"
content="""
Ah, ok, I presumed there was an option in git to set a per-repository ssh command. I've looked at vcsh, but I'm not that confident with git remotes, so I don't use it (I use hg). If a per-repository ssh command were added to git, would you consider adding this?
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnXybLxkPMYpP3yw4b_I6IdC3cKTD-xEdU"
nickname="Matt"
subject="comment 10"
date="2013-06-11T19:27:04Z"
content="""
First off, I really like git-annex :-)
Secondly, if I make the change as suggested, what are the consequences? When a file is added to the annex back-end, might it still be open and being written to? The next hash check would then reveal the differences from an incomplete upload and fix things... but by then it may be too late, as the file may already have been sent to other repositories. Hmm. I guess I want to know: if I do this, will my data be safe? I suspect not.
Perhaps the race condition could be mitigated (not solved) by simply introducing a slight delay? Even at only 5 secs it would catch many of these cases. A longer delay would also prevent git committing files that I save, realize I've got slightly wrong, tweak and save again.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 11"
date="2013-06-12T17:03:07Z"
content="""
There's an annex.delayadd git config setting you can use that makes it wait a specified number of seconds before committing. So it would indeed be a workaround to set: `git config annex.delayadd 2`
However, I'm pretty confident I can entirely avoid this problem, safely.
"""]]
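A minimal sketch of the workaround above, run in a throwaway repository so nothing real is touched. The temporary directory is hypothetical and the 2-second value is just the one from the comment. `git config` will store any key, so this runs even where git-annex is not installed; the setting only matters once git-annex reads it.

```shell
# Throwaway repository; the path is only for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store the delay (in seconds) in this repository's config only.
git config annex.delayadd 2

# Read the value back to confirm it was recorded.
delay=$(git config --get annex.delayadd)
echo "annex.delayadd = $delay"
```

Without `--global`, the setting applies only to the current repository, so it can be tried on one annex without affecting the others.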

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 7"
date="2013-06-11T15:14:44Z"
content="""
Re-reading, I see you're using Fedora and OSX.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 8"
date="2013-06-11T15:17:56Z"
content="""
I can reproduce the bug. Doesn't look likely to involve a race, since it happens every time.
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 9"
date="2013-06-11T15:30:14Z"
content="""
My guess about the write bit seems to be spot on. Which does mean it's a race, just one that happens to be easy to reproduce. It does not happen every time, but 1 time out of 10 or more often.
You can try commenting out the `preventWrite` line in `Command/Add.hs` and rebuilding to see whether that fixes it for you too. I will need to think long and hard about how to make files be ingested safely without turning off the write bit. But, I had been meaning to work on that at some point anyway, so good to have this bug to make it happen.
I instrumented latexmk's call to `$out_handle->open` to see how it's failing:
open failed: Permission denied 256
Which confirms the problem. It seems that it first creates the file, and then closes it, and then re-opens it to write to it some more. git-annex gets in between these two calls and messes up the permissions behind its back.
"""]]
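The sequence described above can be sketched with plain shell, with no git-annex or latexmk involved. The `chmod a-w` is a stand-in for the step that clears the write bit (an assumption about what `preventWrite` amounts to), and the file name is hypothetical. When run as root the second write succeeds anyway, because root bypasses permission checks.

```shell
tmp=$(mktemp -d)
f="$tmp/out.log"   # hypothetical output file

# 1. The writer creates the file and closes it.
echo "first pass" > "$f"

# 2. Another process clears the write bit in between the two opens.
chmod a-w "$f"
perms=$(ls -l "$f" | cut -c1-10)

# 3. The writer re-opens the file to append; for a non-root user
#    the open fails with a permission error.
if echo "second pass" >> "$f"; then
    result="reopened"
else
    result="denied"
fi
echo "$perms $result"
```

Because the window between steps 1 and 3 is short, whether step 2 lands inside it depends on timing, which fits the bug showing up only some of the time.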

View file

@ -0,0 +1,3 @@
My collaborators and I use git annex to track various large data files (among some smaller metadata files managed by ordinary git). Some of these data files need to change completely -- the old ones were just wrong. So I do a git checkout, but don't `git annex get` because it would just be a waste of time and bandwidth. This means that my "data files" are just broken symlinks. Now, I find that by making the necessary directories under `.git/annex/objects/`, I can write to these files in the usual way. The symlinks are preserved, and the files they link to now exist and are full of my corrected data. This seems like it's a problem because the hash has presumably changed. (I'm still a little fuzzy on how exactly git-annex works.) Also, git/git-annex doesn't seem to realize that anything has changed. Is this recoverable?
Would it have been better to just `git rm` (or something) the original version of the file, commit that, and then add the new data? And if so, how should I go about this now that I've created these many very large files? If not, what would be the preferred way to do this?

View file

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 1"
date="2013-06-12T16:57:52Z"
content="""
I think you're making this more complicated than it needs to be. You don't need to mess around with .git/annex/objects at all. You can replace git-annex symlinks with new files and git annex add the new content.
For example:
[[!format sh \"\"\"
joey@gnu:~/tmp/repo>git annex drop --force foo
drop foo ok
(Recording state in git...)
joey@gnu:~/tmp/repo>ls
foo@
joey@gnu:~/tmp/repo>rm foo
joey@gnu:~/tmp/repo>echo \"new good contents\" > foo
joey@gnu:~/tmp/repo>git annex add foo
add foo (checksum...) ok
(Recording state in git...)
joey@gnu:~/tmp/repo>git commit -m add
[master ec3ed14] add
1 file changed, 1 insertion(+), 1 deletion(-)
\"\"\"]]
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkxmke7K8gEXleVRuQvCK5LHPLIzQA6s0E"
nickname="Michael"
subject="comment 2"
date="2013-06-12T17:14:11Z"
content="""
Yes, it seems I was making this more complicated than it needed to be. Just a plain rm seems to work. But just to be clear, I never had the data so I don't need to drop it, right? (By which I mean, is there some other function of drop that I don't understand?)
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 3"
date="2013-06-12T17:15:26Z"
content="""
Right. I just put the drop in there to get my repository to the state you described.
"""]]

View file

@ -0,0 +1 @@
Since all the annexed (in indirect mode) files are symlinks to topdir/.git/annex/..., moving files among directories at different levels is not that straightforward, since the symlinks would get broken. And since there is no 'annex mv' command, what is the best way? (unlock is not the solution since it copies the file, which might be prohibitively large and inefficient)

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 1"
date="2013-06-12T16:42:54Z"
content="""
You can move the symlinks however you like (git mv, or regular mv and git add).
To fix up broken symlinks, you can either run `git annex fix`, or just commit the move. The pre-commit hook fixes up links automatically.
"""]]
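To see why a moved link breaks and what fixing it amounts to, here is a rough sketch using plain symlinks in a scratch directory. The layout is simplified and the names `KEY`, `foo` and `sub` are hypothetical; the final `ln -sf` only mimics by hand what `git annex fix` or the pre-commit hook would do in a real repository.

```shell
repo=$(mktemp -d)
cd "$repo"
mkdir -p .git/annex/objects sub
echo "data" > .git/annex/objects/KEY

# A relative link at the top level, as for an annexed file.
ln -s .git/annex/objects/KEY foo

# Moving the link one level down leaves its relative target wrong.
mv foo sub/foo
if [ -e sub/foo ]; then before="ok"; else before="broken"; fi

# Re-point the link relative to its new directory.
ln -sf ../.git/annex/objects/KEY sub/foo
if [ -e sub/foo ]; then after="ok"; else after="broken"; fi

echo "before: $before, after: $after"
```

Only the small link is moved and rewritten; the content under `.git/annex/objects` is never copied, which is why this approach avoids the cost of `unlock`.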

View file

@ -0,0 +1,47 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawl9J51AO9t75xN5k0sJgg8taUo4y0a4hpQ"
nickname="Daniel"
subject="comment 6"
date="2013-06-11T22:28:52Z"
content="""
This is what I ended up doing.
[https://gist.github.com/ifnull/5761255](https://gist.github.com/ifnull/5761255)
Basically you just add the extensions of the files you want to exclude to .gitignore_large_binaries and run \"git a .\" instead of \"git add .\"
#######################
# Setup
#######################
mkdir annex-test
cd annex-test
git init
git annex init master
#######################
# Fab setup task
#######################
git config --local core.excludesfile ./.gitignore_large_binaries
git config --local alias.a '! bash ./git-add.sh'
#######################
# git a (git-add.sh)
#######################
# Generate annex include arg from .gitignore_large_binaries
include_str=\"--include='.lazy'\";
while read line
do
if [[ \"$line\" != *\"#\"* ]] && [[ \"$line\" != \"\" ]]; then
include_str=\"$include_str --or --include=${line}\";
fi
done < \"./.gitignore_large_binaries\"
# git annex add
git config --local core.excludesfile ./.gitignore;
git annex add $1 $include_str;
# git add
git config --local core.excludesfile ./.gitignore_large_binaries;
git add $1
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/bBy7WkgQicYHIiiyj.Vm0TcMbxi2quzbPFef#6f9f7"
nickname="Frederik Vanrenterghem"
subject="comment 2"
date="2013-06-12T00:47:21Z"
content="""
I have witnessed that second problem as well, to the point where I've stopped autostarting (and hence using) git annex for the moment. I'll try to get some debugging data.
"""]]

View file

@ -41,8 +41,9 @@ brew link libxml2
cabal update
PATH=$HOME/bin:$PATH
PATH=$HOME/.cabal/bin:$PATH
cabal install c2hs --bindir=$HOME/bin
cabal install gnuidn
cabal install c2hs git-annex --bindir=$HOME/bin
cabal install git-annex --bindir=$HOME/bin
</pre>
## using MacPorts