Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2015-05-24 15:48:11 -04:00
commit f8171d92b3
7 changed files with 203 additions and 0 deletions

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="eigengrau"
subject="comment 2"
date="2015-05-24T14:58:30Z"
content="""
Thanks! This would be useful. While it probably takes up only a little disk space, the noise it creates during fsck would make it harder to discern which data has been unintentionally lost, as opposed to migrated to another backend.
"""]]

@ -0,0 +1,43 @@
### What steps will reproduce the problem?
1. Create a fresh annex repo. Create a dummy test file and add it to the annex.
2. `git init --bare` in an empty directory on a remote machine.
3. `git annex initremote testremote type=gcrypt encryption=hybrid gitrepo=ssh://machine/the/remote/machine/dir keyid=my_key_id`
4. `git annex sync testremote --content`
5. Unplug network/switch off WiFi.
6. `git annex sync testremote --content`, which fails due to the broken network.
7. Reconnect the network and check that you can ssh to the remote host.
8. `git annex sync testremote --content`
gcrypt issues the warning `gcrypt: WARNING: Remote ID has changed!`
### What version of git-annex are you using? On what operating system?
git-annex 5.20141125 on Debian Wheezy 32-bit.
### Please provide any additional information below.
This is essentially a gcrypt bug, so I don't know if you want to fix it, and I know that the gcrypt author is inactive.
My diagnosis is that when `git annex sync testremote --content` runs while the network is disconnected, git can't SSH to the remote, and gcrypt makes the mistake of regenerating the remote ID and setting up a new remote. So when the network comes back online, the local record of the remote's gcrypt ID is simply wrong. gcrypt ought not to "set up a new repository" when there is a network failure.
gcrypt: Development version -- Repository format MAY CHANGE
gcrypt: Repository not found: ssh://url-here
gcrypt: Setting up new repository
gcrypt: Remote ID is :id:agVyn7wBG/JGwN9LW5Qn
Counting objects: 22, done.
Compressing objects: 100% (17/17), done.
Total 22 (delta 4), reused 0 (delta 0)
gcrypt: Encrypting to: -r my_key_id_here
gcrypt: Requesting manifest signature
ssh: Could not resolve hostname my_remote_host_here: No such file or directory
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: failed to push some refs to 'gcrypt::ssh://url-here
Pushing to testremote failed.
(non-fast-forward problems can be solved by setting receive.denyNonFastforwards to false in the remote's git config)
failed
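
Until gcrypt distinguishes "unreachable" from "not found", one possible workaround is to probe the remote host before syncing and skip the sync when the probe fails. The following is only a sketch under that assumption: `guarded_sync` is a hypothetical wrapper, not part of git-annex or git-remote-gcrypt, and it narrows the window for the problem rather than closing it, since the network can still drop between the probe and the push.

```sh
#!/bin/sh
# Hypothetical wrapper (an assumption for illustration, not an existing tool).
# Usage: guarded_sync REMOTE PROBE_COMMAND [ARGS...]
guarded_sync() {
    if [ "$#" -lt 2 ]; then
        echo "usage: guarded_sync REMOTE PROBE_COMMAND [ARGS...]" >&2
        return 2
    fi
    remote=$1
    shift

    # Run the caller-supplied probe. If it fails, do not sync at all, so
    # gcrypt is never invoked while the remote host cannot be reached.
    if ! "$@" >/dev/null 2>&1; then
        echo "skipping sync: $remote is not reachable" >&2
        return 1
    fi

    # Same command as in the steps above.
    git annex sync "$remote" --content
}
```

With the remote from step 3, the probe could be a non-interactive ssh login, for example `guarded_sync testremote ssh -o BatchMode=yes -o ConnectTimeout=5 machine true`.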

@ -0,0 +1,14 @@
### Please describe the problem.
btrfs automatically validates checksums when data is read. If a checksum fails, the read throws an I/O error instead of returning the corrupted file contents. As a result, it seems that `git-annex fsck` does not recognize the file as faulty; instead it fails with a sha1sum parse error, without dropping the corresponding file as “bad”.
[[!format sh """
git annex fsck file
fsck file (checksum...)
sha1sum: .git/annex/objects/…: Input/output error
git-annex: sha1sum parse error
# End of transcript or log.
"""]]
### What version of git-annex are you using? On what operating system?
git-annex 5.20150508
linux 4.0.4
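
One way to keep a read failure apart from a checksum mismatch is to look at the exit status of `sha1sum` before parsing its output. The following is a minimal sketch in shell, not git-annex's actual implementation; `verify_sha1` and its three result words are hypothetical names chosen for illustration.

```sh
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not git-annex code).
# Prints "ok", "mismatch", or "unreadable" for FILE against EXPECTED_SHA1.
verify_sha1() {
    file=$1
    expected=$2

    # sha1sum exits non-zero when it cannot read the file (for example on an
    # Input/output error), so that case is reported before any parsing.
    if ! out=$(sha1sum -- "$file" 2>/dev/null); then
        echo unreadable
        return 2
    fi

    # The digest is everything before the first space in sha1sum's output.
    actual=${out%% *}
    if [ "$actual" = "$expected" ]; then
        echo ok
    else
        echo mismatch
        return 1
    fi
}
```

A caller could then warn or retry on `unreadable`, since an I/O error may be intermittent, and treat only `mismatch` as bad content.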

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="eigengrau"
subject="comment 1"
date="2015-05-24T15:07:34Z"
content="""
I suppose that, since an I/O error can be intermittent, the file can't be outright regarded as bad. Also, I'm not sure whether the read system call returns a dedicated error code for checksum errors.
"""]]

@ -0,0 +1,103 @@
### Please describe the problem.
Incremental fsck records when a file was last fscked by setting the mtime of the file's parent directory in `.git/annex/objects/`. When running an incremental fsck from a remote, files that are not available locally are never marked as checked (since that directory does not exist), so they are checked again at every invocation of `git annex fsck --more`.
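
The mechanism described above can be sketched in shell. This is an illustration only and assumes nothing about git-annex internals beyond what this report states; the function names and the state file are hypothetical. The first function marks a key by touching its object directory and therefore cannot mark remote-only content, while the second pair keeps the record in a separate file that does not depend on local content.

```sh
#!/bin/sh
# Hypothetical illustration; names and layout are assumptions.

# Mark content as checked by updating the mtime of its object directory.
# Fails when the directory is absent, i.e. when the content is remote-only.
mark_checked_mtime() {
    objdir=$1
    [ -d "$objdir" ] || return 1
    touch -- "$objdir"
}

# Succeeds if KEY is already listed in STATEFILE (one key per line).
already_checked() {
    statefile=$1
    key=$2
    [ -f "$statefile" ] && grep -Fxq -- "$key" "$statefile"
}

# Record KEY in STATEFILE unless it is already listed there. This works
# whether or not the content is present locally.
mark_checked_state() {
    statefile=$1
    key=$2
    already_checked "$statefile" "$key" || printf '%s\n' "$key" >> "$statefile"
}
```

With a record of the second kind, a resumed run could skip every key for which `already_checked` succeeds, regardless of where the content lives.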
### What steps will reproduce the problem?
Create a git-annex repository with some random content. Then add any remote, copy the files there, remove them locally, and run an incremental fsck from the remote. Interrupt it and run it again with `--more`. It will check all the files again, including those that have already been checked.
### What version of git-annex are you using? On what operating system?
Debian official package, 5.20141125, on Debian sid (more or less up-to-date).
### Please provide any additional information below.
[[!format sh """
# Create a test repository
giovanni@amalgama:~$ cd /tmp/
giovanni@amalgama:/tmp$ mkdir test
giovanni@amalgama:/tmp$ cd test/
giovanni@amalgama:/tmp/test$ git init
Initialized empty Git repository in /tmp/test/.git/
giovanni@amalgama:/tmp/test (master)$ git annex init
init ok
(Recording state in git...)
# Create random content
giovanni@amalgama:/tmp/test (master)$ dd if=/dev/urandom bs=1M count=20 of=test1
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 1.15928 s, 18.1 MB/s
giovanni@amalgama:/tmp/test (master)$ dd if=/dev/urandom bs=1M count=20 of=test2
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 1.12974 s, 18.6 MB/s
giovanni@amalgama:/tmp/test (master)$ dd if=/dev/urandom bs=1M count=20 of=test3
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 1.16881 s, 17.9 MB/s
giovanni@amalgama:/tmp/test (master)$ dd if=/dev/urandom bs=1M count=20 of=test4
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 1.14387 s, 18.3 MB/s
giovanni@amalgama:/tmp/test (master)$ git annex add .
add test1 ok
add test2 ok
add test3 ok
add test4 ok
(Recording state in git...)
# Create a remote of type directory and move content there
giovanni@amalgama:/tmp/test (master)$ mkdir /tmp/dir
giovanni@amalgama:/tmp/test (master)$ git annex initremote test type=directory encryption=none directory=/tmp/dir
initremote test ok
(Recording state in git...)
giovanni@amalgama:/tmp/test (master)$ git annex move --to test
move test1 (to test...)
ok
move test2 (to test...)
ok
move test3 (to test...)
ok
move test4 (to test...)
ok
(Recording state in git...)
# Launch a remote incremental fsck
giovanni@amalgama:/tmp/test (master)$ git annex fsck --from test --incremental
fsck test1 (checksum...)
ok
fsck test2 (checksum...)
ok
fsck test3 (checksum...)
ok
fsck test4 (checksum...)
ok
# Continue it; here I would expect nothing to happen, since all content has already been checked
giovanni@amalgama:/tmp/test (master)$ git annex fsck --from test --more
fsck test1 (checksum...)
ok
fsck test2 (checksum...)
ok
fsck test3 (checksum...)
ok
fsck test4 (checksum...)
ok
# Bring back content locally and launch again fsck
giovanni@amalgama:/tmp/test (master)$ git annex get
get test1 (from test...)
ok
get test2 (from test...)
ok
get test3 (from test...)
ok
get test4 (from test...)
ok
(Recording state in git...)
giovanni@amalgama:/tmp/test (master)$ git annex fsck --from test --incremental
fsck test1 (checksum...)
ok
fsck test2 (checksum...)
ok
fsck test3 (checksum...)
ok
fsck test4 (checksum...)
ok
# Now the --more semantics are respected
giovanni@amalgama:/tmp/test (master)$ git annex fsck --from test --more
giovanni@amalgama:/tmp/test (master)$
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="clacke"
subject="comment 4"
date="2015-05-24T17:41:17Z"
content="""
Hm, not so sure that \"rebooted, did not help\" was actually true. I take that back.
Now I saw a stray `git-annex-shell recv-key` process mentioning that file. I killed it and now everything seems fine. I will keep this in mind for next time, to see if I can verify that this was actually the cause of the message, but maybe it's a clue.
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="junk@5e3eeba2290e8a3fcf938d9f93b0dfa2565dc7b1"
nickname="junk"
subject="Tahoe-LAFS helper: multiple FURLs for the same grid"
date="2015-05-24T13:48:33Z"
content="""
Hi,
I would like to use git-annex in combination with Tahoe-LAFS. The grid will consist of private servers connected through slow DSL lines. Thus I would like to use the Tahoe-LAFS helper feature (like a Tahoe-LAFS upload proxy):
https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/helper.rst
This will result in a different FURL for each location pointing to the same Tahoe-LAFS grid.
How can I set up two git-annex clients to use two different FURLs for the same remote (the same Tahoe-LAFS grid)?
Thank you very much for your help!
Oliver
"""]]