Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2014-08-02 19:10:55 -04:00
commit b5eb02bf77
6 changed files with 164 additions and 17 deletions

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.112"
subject="comment 3"
date="2014-08-02T23:08:44Z"
content="""
hS3's author seems to have abandoned it and it has other problems. I should try to switch to a different S3 library.
There is now a workaround; S3 special remotes can be configured to use [[chunking]]. A max of one chunk will then be buffered in memory at a time.
For example, to reconfigure an existing mys3 remote: `git annex enableremote mys3 chunk=1MiB`
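
As a rough illustration of what a chunk size means for a given file, here is a minimal sketch. `chunk_count` is a hypothetical helper, not part of git-annex; it only does the arithmetic for how many chunks a file of a given size splits into, assuming every chunk except possibly the last is exactly the configured size.

```shell
# Hypothetical helper (not a git-annex command): number of chunks needed
# for a file of SIZE bytes when the remote uses CHUNK-byte chunks.
chunk_count() {
    size=$1
    chunk=$2
    # Integer ceiling division.
    echo $(( (size + chunk - 1) / chunk ))
}

# A 100 MiB file with chunk=1MiB (1 MiB = 1048576 bytes).
chunk_count 104857600 1048576
```

With one chunk buffered at a time, memory use for the transfer should then be bounded by roughly the chunk size rather than the file size.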
"""]]


@ -0,0 +1,57 @@
### Please describe the problem.
While the docs say that WORM keys are a function of a file's basename,
when doing «git annex add .», the generated keys will actually contain
the relative path (with slashes escaped). Not sure whether this is by
design or a bug in its own right. I suppose that to minimize the chance
of collisions on WORM, having the path within the key is preferable.
A problem with this, however, is that the path in the key is not
stable, but varies with the working dir when doing the «git annex
add». So, when a file is added from one working dir (say, the repo
base), later unlocked, and re-added from another working dir (say,
somewhere below the repo base), this will generate a different key
even when the file has not been touched.
Is there a rationale for this variability, or should «add» canonicalize
the encoded paths to the repo root?
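
To make the question concrete, here is a minimal sketch of what canonicalizing to the repo root could look like. Both functions are hypothetical helpers, not git-annex code: `repo_relative` assumes the caller passes the working dir's prefix relative to the repo root (the value `git rev-parse --show-prefix` prints, such as `baz/`), and `worm_key_name` only mimics the key shape seen in the `readlink` output of this report. Paths containing `..` are not handled.

```shell
# Hypothetical helpers, not part of git-annex.

# Join the working dir's prefix relative to the repo root (e.g. "baz/",
# as printed by `git rev-parse --show-prefix`) with a path relative to it.
repo_relative() {
    printf '%s%s\n' "$1" "$2"
}

# Build a key name in the shape shown by readlink in this report:
# WORM-s<size>-m<mtime>--<path>, with "/" rendered as "%".
worm_key_name() {
    printf 'WORM-s%s-m%s--%s\n' "$1" "$2" "$(printf '%s' "$3" | tr '/' '%')"
}

# Added from the repo root, then from inside baz/: same name both times.
worm_key_name 0 1406981486 "$(repo_relative "" baz/quux)"
worm_key_name 0 1406981486 "$(repo_relative "baz/" quux)"
```

Under that assumption, the key would no longer depend on where «git annex add» was run.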
### What steps will reproduce the problem?
[[!format sh """
# Init
$ git init /tmp/foo
$ cd /tmp/foo && git annex init
$ mkdir baz
$ touch baz/quux
# Add file with working dir at repo root.
$ git annex add --backend=WORM baz
$ git commit -m "first"
# Key includes relative path.
$ readlink baz/quux
../.git/annex/objects/8x/8V/WORM-s0-m1406981486--baz%quux/WORM-s0-m1406981486--baz%quux
# Unlock and re-add with working dir at path below repo root.
$ cd baz
$ git annex unlock quux
$ git annex add quux
$ git commit -m "second"
# Relative path is anchored to working dir instead of repo root.
$ readlink quux
../.git/annex/objects/9G/72/WORM-s0-m1406981486--quux/WORM-s0-m1406981486--quux
# End of transcript or log.
"""]]
### What version of git-annex are you using? On what operating system?
Linux 3.15.8
git-annex 5.20140716


@ -0,0 +1,77 @@
Sorry for putting all of this in one thread, but I don't know what happened or how the problems are related.
I have a simple setup: a git-annex client running the assistant (Windows 7) and a server (Debian, no assistant).
Suddenly weird things started to happen:
1.) On Windows, when I start the assistant, it writes "Attempting to repair THINKTANK:c:\data\annex [here]" but it never finishes.
2.) On Windows, I get "Pusher crashed: failed to read sha from git write-tree [Restart Thread]". When I click "Restart Thread", nothing happens and the message from (1) persists.
3.) When I run "git annex fsck" on the client, I get thousands of messages like:
fsck Fotos/2014/DSC_0303.JPG
** No known copies exist of Fotos/2014/DSC_0303.JPG
failed
Here the same:
$ git annex whereis "Fotos/2014/DSC_0303.JPG"
whereis Fotos/2014/DSC_0303.JPG (0 copies) failed
git-annex: whereis: 1 failed
4.) When I do "git annex status", a whole bunch of files are displayed with "M" (modified) although they are not; they are not even checked out and should only be on the server ...
5.) On the server, files that should ALWAYS be present (it is configured as "full backup") were suddenly wiped, including data that had also been made available on the client. The symlinks are dangling and their targets contain just binary data:
ls -l
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0011.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0012.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0013.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0014.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0015.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0018.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0019.JPG -> ????
lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0020.JPG -> ????
6.) "git annex fsck" on the server is still successful, returning no errors!
7.) Manually executing "git annex sync --content" on both sides does not change anything and does not output any error messages.
8.) On the client:
$ git annex group here
error: invalid object 100644 3b3767ae65e5c6d2e3835af3d55fbf2f9e145c8b for '000/0e6/SHA256Es193806--b6d4689fba8e15acd6497f9a7e584c93ea0c8c2199ad32eadac79d59b9f49814.JPG.log'
fatal: git-write-tree: error building trees
manual
(Recording state in git...)
git-annex: failed to read sha from git write-tree
$ git annex wanted here
error: invalid object 100644 3b3767ae65e5c6d2e3835af3d55fbf2f9e145c8b for '000/0e6/SHA256Es193806--b6d4689fba8e15acd6497f9a7e584c93ea0c8c2199ad32eadac79d59b9f49814.JPG.log'
fatal: git-write-tree: error building trees
exclude="*" and present
git-annex: failed to read sha from git write-tree
9.) OK, I don't know what happened. I did nothing special, but it seems that the repository is broken :( :(
$ git annex --verbose --debug repair
[...]
[2014-08-02 13:27:38 Pacific Daylight Time] read: git ["--git-dir=C:\\Data\\annex\\.git","--work-tree=C:\\Data\\annex","-c","core.bare=false","show","ef3fe549f457783dbbd877b467b4e54b0ebc813c"]
Running git fsck ...
git-annex: DeleteFile "C:\\Data\\annex\\.git\\objects\\2a\\54bb281c80c91ea7a732c0d48db0c5acc0ca2c": permission denied (Access is denied.)
failed
git-annex: repair: 1 failed
But this file exists, and I can read, write, and delete it manually, so permission is definitely not being denied ...
Oh no, I'm so desperate :-( Any ideas?
It seems the client repository is broken, but how can it be that files on the server repository, which should never be deleted, got deleted as well?
And how can there be not only broken symlinks but also symlinks whose target is just binary garbage, while "fsck" still returns success?
(I am happy to share all log files privately but I do not want to publish them here because they contain sensitive data)
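
For anyone trying to check a repository for the same symptom, here is a rough sketch that lists symlinks whose target does not look like an annex object path. `list_suspect_symlinks` is a hypothetical helper, not a git-annex command; it assumes an indirect-mode repository where healthy annexed symlinks point somewhere under `.git/annex/objects/`, and it does not cope with filenames containing newlines.

```shell
# Hypothetical helper: print every symlink under DIR (default ".") whose
# target does not contain ".git/annex/objects/".
list_suspect_symlinks() {
    find "${1:-.}" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case "$target" in
            *.git/annex/objects/*) ;;          # looks like an annex object
            *) printf '%s\n' "$link" ;;        # anything else is suspect
        esac
    done
}
```

Running `list_suspect_symlinks Fotos` on the server would show whether the 4-byte targets are limited to the files listed above.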


@ -1,6 +1,8 @@
# Introduction
I want to relate a usability story that happens fairly regularly when I show git-annex to people. The story goes like this.
----
# The story
Antoine sat down at his computer saying, "I have this great movie collection I want to share with you, my friend, because the fair use provisions allow for that, and I use this great git-annex tool that allows me to sync my movie collection between different places". His friend Charlie, a Linux user only vaguely familiar with the internals of how his operating system or legal system actually work, reads this as "yay free movies" and wholeheartedly agrees to lend himself to the experiment.
@ -10,7 +12,7 @@ Charlie logs into Antoine's computer, named `marcos`. Antoine shows Charlie wher
Antoine then has no solution but to convert the git-annex repository into direct mode, something which takes a significant amount of time and is actually [[designated as "untrusted"|direct_mode]] in the documentation. So much so, in fact, that he actually did [[screw up his repository magnificently|bugs/direct_command_leaves_repository_inconsistent_if_interrupted]] because he freaked out when `git-annex direct` started and interrupted it because he thought it would take too long.
----
# Technical analysis
Now I understand it is not necessarily `git-annex`'s responsibility if Thunar (or Nautilus, for that matter) doesn't know how to properly deal with symlinks (hint: just dereference the damn thing already). Maybe I should file a bug about this against Thunar? I also understand that symlinks are useful to ensure the security of the data hosted in `git-annex`, and that I could have used direct mode in the first place. But I like to track changes in git to those files, and direct mode makes that really difficult.
@ -19,3 +21,9 @@ I didn't file this as a bug because I want to start the conversation, but maybe
(The other being "how do I actually use git annex to sync those files instead of just copying them by hand", but that's for another story!)
-- [[anarcat]]
# Followup
Here is a bug report filed against Thunar, with a patch to fix this behavior: https://bugzilla.xfce.org/show_bug.cgi?id=11065
Similar bugs would need to be filed against Nautilus, at the very least, but probably other file managers, which makes this task a little daunting, to say the least. -- [[anarcat]]


@ -1,15 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.64"
subject="comment 21"
date="2013-11-24T15:58:30Z"
content="""
@Bence the closest I have is some tests of particular special remotes inside Test.hs. The shell equivalent of that code is:
[[!format sh \"\"\"
set -e
git annex copy file --to remote # tests store
git annex drop file # tests checkpresent when remote has file
git annex move file --from remote # tests retrieve and remove
\"\"\"]]
"""]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="zardoz"
ip="78.48.163.229"
subject="comment 2"
date="2014-08-02T14:29:26Z"
content="""
This could be achieved in a generic way by allowing filter binaries in expressions, which are run on the filename and return 0 or 1.
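
To sketch what such a filter might look like: the function below stands in for a filter binary that takes a filename as its only argument. Everything here is hypothetical, since git-annex has no such hook as far as this thread shows, and it assumes the usual shell convention that status 0 means "matches" and 1 means "does not match".

```shell
# Hypothetical filter: succeeds (status 0) for raw photo filenames,
# fails (status 1) for everything else. The extensions are examples only.
is_raw_photo() {
    case "$1" in
        *.NEF|*.CR2|*.DNG) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use from a shell:
if is_raw_photo "Fotos/2014/DSC_0303.NEF"; then
    echo "match"
fi
```

An expression could then refer to the filter by name and treat its exit status as the truth value.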
"""]]