Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2015-08-04 11:53:54 -04:00
commit 615b0f56a7
9 changed files with 220 additions and 0 deletions

@ -0,0 +1,59 @@
### Please describe the problem.
Files with umlauts in their names were not copied from the local system; all other files were copied successfully.
### What steps will reproduce the problem?
Trying to sync content from a repository on the same machine.
### What version of git-annex are you using? On what operating system?
git-annex version: 5.20150727 / Darwin tba.lan 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64
### Please provide any additional information below.
[[!format sh """
$ git annex get .
get Die Sterne/Flucht in die Flucht (Bonus Track Version)/03 Ihr wollt mich töten.m4a
Unable to access these remotes: tba
Try making some of these repositories available:
2cabf5e0-00ae-4cc6-b9b7-5d303a7f3f06 -- Music [tba]
8e315ed0-f318-45f7-98ca-1a791f9c92df -- jan@hostname:/srv/annex-Music
failed
get Die Sterne/Flucht in die Flucht (Bonus Track Version)/03 Ihr wollt mich töten.m4a
Unable to access these remotes: tba
Try making some of these repositories available:
2cabf5e0-00ae-4cc6-b9b7-5d303a7f3f06 -- Music [tba]
8e315ed0-f318-45f7-98ca-1a791f9c92df -- jan@hostname:/srv/annex-Music
failed
get Die Sterne/Flucht in die Flucht (Bonus Track Version)/10 Der Bär.m4a
Unable to access these remotes: tba
Try making some of these repositories available:
2cabf5e0-00ae-4cc6-b9b7-5d303a7f3f06 -- Music [tba]
8e315ed0-f318-45f7-98ca-1a791f9c92df -- jan@hostname:/srv/annex-Music
failed
get Die Sterne/Flucht in die Flucht (Bonus Track Version)/10 Der Bär.m4a
Unable to access these remotes: tba
Try making some of these repositories available:
2cabf5e0-00ae-4cc6-b9b7-5d303a7f3f06 -- Music [tba]
8e315ed0-f318-45f7-98ca-1a791f9c92df -- jan@hostname:/srv/annex-Music
failed
"""]]
The *tba* repository is accessible, since all the other files were synced correctly.
Git status reports untracked files which look like they were renamed.
[[!format sh """
$ git status
Untracked files:
(use "git add <file>..." to include in what will be committed)
"Ant\303\263nio Varia\303\247\303\265es/"
"B Fachada/B Fachada/01 sozinho no r\303\263que.mp3"
"B Fachada/B Fachada/03 D\303\241 mais m\303\272sica \303\240 b\303\263fia.mp3"
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="zedr0nre@e1b86776f21c5f6a6e064c6ab8da039b54915b7d"
nickname="zedr0nre"
subject="comment 10"
date="2015-08-04T00:00:23Z"
content="""
A workaround for now is to use a newer Git from [http://git-for-windows.github.io/](http://git-for-windows.github.io/), which ships Git 2.5+, and to use the new environment variable GIT_SSH_COMMAND, which overrides GIT_SSH. Setting
`export GIT_SSH_COMMAND=ssh`
allows git annex sync to work successfully for me.
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="zedr0nre@e1b86776f21c5f6a6e064c6ab8da039b54915b7d"
nickname="zedr0nre"
subject="comment 9"
date="2015-08-03T22:46:30Z"
content="""
By default on Windows in direct mode, sshcaching is disabled, but it seems that [5.20150617](https://git-annex.branchable.com/news/version_5.20150617/) included this change:
`sync, remotedaemon: Pass configured ssh-options even when annex.sshcaching is disabled.`
which might explain why GIT_SSH is still set up and points to git-annex instead of ssh?
"""]]

@ -0,0 +1,29 @@
I'm trying to use a Synology NAS (ARM architecture, DiskStation 214+) as a remote repository for my laptop, but I'm failing to get a convenient configuration to work.
I already set up git-annex on the NAS following the explanations found [here](http://git-annex.branchable.com/tips/Synology_NAS_and_git_annex/). I installed version **5.20150714-g8695533**.
On my laptop I have the version provided with Ubuntu 14.04: **5.20140412ubuntu1**.
If I call git annex from my laptop's command line and do everything manually (git remote add, copy file to the dir, git annex add, git commit, git push, git annex copy), it works properly.
But when I try the same with the assistant, I get this error:
```
fatal: unrecognized command 'sh -c 'mkdir -p '"'"'annex'"'"'&&cd '"'"'annex'"'"'&&if [ ! -d .git ]; then git init --bare --shared && git config receive.denyNonFastforwards false; fi&&git annex init''
git-annex-shell: git-shell failed
```
This is the content of daemon.log:
[[!format sh """
[2015-08-04 00:51:41 CEST] main: starting assistant version 5.20140412ubuntu1
[2015-08-04 00:51:41 CEST] Cronner: You should enable consistency checking to protect your data.
(Recording state in git...)
(scanning...) [2015-08-04 00:51:41 CEST] Watcher: Performing startup scan
(started...) [2015-08-04 00:52:41 CEST] Cronner: Consistency check in progress
[2015-08-04 00:59:12 CEST] read: ssh-keygen ["-F","git-annex-trusted"]
[2015-08-04 00:59:12 CEST] read: ssh ["-oNumberOfPasswordPrompts=0","-oStrictHostKeyChecking=no","-n","-p","22","git-annex@git-annex-trusted","sh -c 'echo git-annex-probe loggedin;if which git-annex-shell; then echo git-annex-probe git-annex-shell; fi;if which git; then echo git-annex-probe git; fi;if which rsync; then echo git-annex-probe rsync; fi;if which ~/.ssh/git-annex-shell; then echo git-annex-probe ~/.ssh/git-annex-shell; fi;cd '\"'\"'annex'\"'\"' && git config --list'"]
[2015-08-04 00:59:13 CEST] read: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--with-colons","--list-secret-keys","--fixed-list-mode"]
[2015-08-04 00:59:15 CEST] read: ssh ["-p","22","git-annex@git-annex-trusted","sh -c 'mkdir -p '\"'\"'annex'\"'\"'&&cd '\"'\"'annex'\"'\"'&&if [ ! -d .git ]; then git init --bare --shared && git config receive.denyNonFastforwards false; fi&&git annex init'"]
"""]]
Is there any problem with the version provided by Ubuntu that is producing this strange behavior?

@ -0,0 +1,41 @@
### Please describe the problem.
The shim for the standalone version of git annex invokes the dynamic linker with --library-path set appropriately. glibc's parsing of library-path treats : and ; as separators and has no quoting syntax for those characters. Also, path items have to be absolute paths, so you can't avoid including the whole path.
As a result, if you install the standalone version in a directory whose path includes either of those characters, it will produce a library-path that is interpreted incorrectly. The linker then looks for the libraries in the wrong place and may end up using the system's installed versions if they are available.
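To illustrate the splitting described above, here is a minimal sketch in shell. It only mimics the behaviour with `tr`; the function name `split_library_path` is made up for this example and is not part of glibc or git-annex.

```sh
# Mimic how a --library-path value is cut at ':' and ';' (illustration only,
# this is not the real glibc parser).
split_library_path() {
    printf '%s\n' "$1" | tr ':;' '\n\n'
}

# Install path taken from the transcript in this report.
split_library_path '/mnt/iabak-ext;/IA.BAK/git-annex.linux'
```

The call prints two entries, `/mnt/iabak-ext` and `/IA.BAK/git-annex.linux`, neither of which is the real install directory.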
### What steps will reproduce the problem?
1. Find a system with as few libraries installed as possible. libnettle.so.6 is an easy one to avoid.
2. Create a directory called 'test;test'.
3. Install git annex standalone in 'test;test'.
4. Use the standalone git annex to do a 'git annex get'.
5. There will be an error saying it can't find libnettle.so.6.
### What version of git-annex are you using? On what operating system?
This bug applies to any version. With a ':' it will probably occur on any Unix; with a ';' it will occur on at least Linux and Solaris.
### Please provide any additional information below.
This bug can't be fixed. It's unlikely to trip people up, but it might with a bad combination of label-based drive mounting and odd drive labels. The worst case would be incompatible libraries in the system install, which could lead to data corruption.
I recommend that the git annex shim should check for ':' or ';' in the path and exit non-zero if they are found.
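A minimal sketch of such a check, assuming the shim is a POSIX shell script that already knows its install directory. The function name `check_install_path` and the variable `base` are hypothetical and not taken from the actual shim.

```sh
# Hypothetical check for the start of the shim script.
# Returns non-zero if the install path contains ':' or ';',
# which --library-path would treat as separators.
check_install_path() {
    case "$1" in
        *:*|*\;*)
            echo "git-annex: cannot run from '$1': path contains ':' or ';'" >&2
            return 1
            ;;
    esac
    return 0
}

# Assumed usage inside the shim, where "$base" would hold the install directory:
#   check_install_path "$base" || exit 1
```

Failing early like this turns the silent fallback to system libraries into a visible error.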
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
jdamery-iabak@oklina:/mnt/iabak-ext;/IA.BAK/shard9$ PATH=../git-annex.linux/:$PATH git annex get prelinger_library/11annualreport00unitrich/11annualreport00unitrich_raw_jp2.zip
get prelinger_library/11annualreport00unitrich/11annualreport00unitrich_raw_jp2.zip (from web...)
/mnt/iabak-ext;/IA.BAK/git-annex.linux/shimmed/wget/wget: error while loading shared libraries: libnettle.so.6: cannot open shared object file: No such file or directory
Unable to access these remotes: web
Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
failed
git-annex: get: 1 failed
# End of transcript or log.
"""]]

@ -0,0 +1,34 @@
ATM, even with ControlPersist=yes and a fast interconnect between the hosts (it takes a millisecond or so to transfer one of the files I have there), transferring a single file takes up to a second. I guess the majority of the time is spent starting a new ssh (although with instructions for a persistent connection etc.). Or is there some intentional sleep somewhere, or is output just flushed at 1 sec intervals, so that ts shows those consistent subsecond offsets? I have ~25,000 files, so I would need about 5 hours, although a direct rsync -e 'ssh' -L takes a total of 2 seconds for those 100MB. Here is the protocol (with ts with subsecond timing):
Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000251.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1650--768d78ad49fc413d178a5cd9407b56bb442f40aaa629b7f608844330c2c4bbf9.jpeg
Aug 04 10:46:02.1438699562 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,650 100% 1.57MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:02.1438699562 ok
Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000252.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1662--d156f08ecfcc248aeb01239f5e605f3ac9ed72fa77c54e593a54b1f6a8b3f0f4.jpeg
Aug 04 10:46:02.1438699562 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,662 100% 1.59MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:02.1438699562 ok
Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000253.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1673--47562fe25853fe2972678cbaa1ef8e03bad068095d9f8575ba96f8df0a18cff0.jpeg
Aug 04 10:46:02.1438699562 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,673 100% 1.60MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:03.1438699563 ok
Aug 04 10:46:03.1438699563 get stimulus/task002/generate/part_1-182x121/000254.jpeg (from origin...)
Aug 04 10:46:03.1438699563 SHA256E-s1675--a3dae03d805040af4a7341479b782342431ee5377713c061e02daa075d188037.jpeg
Aug 04 10:46:03.1438699563 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,675 100% 1.60MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:03.1438699563 ok
Aug 04 10:46:03.1438699563 get stimulus/task002/generate/part_1-182x121/000255.jpeg (from origin...)
Aug 04 10:46:03.1438699563 SHA256E-s1674--a6743261238a87f9f3574295665896be2a8f373dd5b400fbf1552fb8d3b8fc66.jpeg
Aug 04 10:46:03.1438699563 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,674 100% 1.60MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:04.1438699564 ok
Aug 04 10:46:04.1438699564 get stimulus/task002/generate/part_1-182x121/000256.jpeg (from origin...)
Aug 04 10:46:04.1438699564 SHA256E-s1659--e1bd4829f53b226ad62ebccfbbb1a132d977af930545ade4161b5f9acf2e80b1.jpeg
Aug 04 10:46:04.1438699564 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,659 100% 1.58MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:04.1438699564 ok
Aug 04 10:46:04.1438699564 get stimulus/task002/generate/part_1-182x121/000257.jpeg (from origin...)
Aug 04 10:46:04.1438699564 SHA256E-s1663--ae0e673c60dede66c773441ee198fa8979c6de4a169e468fbb10dc22860afb27.jpeg
Aug 04 10:46:04.1438699564 ^M 0 0% 0.00kB/s 0:00:00 ^M 1,663 100% 1.59MB/s 0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:05.1438699565 ok
Neither host shows any high CPU load.

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4"
subject="comment 1"
date="2015-08-04T15:00:02Z"
content="""
Doh, apparently I had a rather old git-annex (version 5.20141125), but even with the fresher 5.20150706+gitgefc3bcd-1~ndall+1 the situation is the same.
"""]]

@ -0,0 +1,25 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4"
subject="comment 2"
date="2015-08-04T15:28:29Z"
content="""
Since I am not sure what the actual overhead is here, I can't provide any good advice, but maybe it is worth at least looking into bundling multiple transfers within the same rsync call? The rsync man page says:
The syntax for requesting multiple files from a remote host is done by specifying
additional remote-host args in the same style as the first, or
with the hostname omitted. For instance, all these work:
rsync -av host:file1 :file2 host:file{3,4} /dest/
so it should be quite possible to batch a hundred or two transfers into the same rsync call, I guess. The limit is probably different on other systems, but on Linux the command-line size limit is quite hefty:
$> xargs --show-limits
Your environment variables take up 3441 bytes
POSIX upper limit on argument length (this system): 2091663
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2088222
Size of command buffer we are actually using: 131072
Not sure if there are inherent limits within ssh etc.
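
As a rough sketch of the batching idea in shell: the function below only prints the rsync command line in the style quoted from the man page, it does not run anything. `batched_rsync_cmd`, the host, the destination and the file names are placeholders for illustration, and file names containing spaces are not handled.

```sh
# Print (not run) one rsync command that requests several files from one host.
# First file is written as host:file, the rest as :file (hostname omitted).
batched_rsync_cmd() {
    host=$1
    dest=$2
    shift 2
    cmd="rsync -av"
    first=1
    for f in "$@"; do
        if [ "$first" -eq 1 ]; then
            cmd="$cmd $host:$f"
            first=0
        else
            cmd="$cmd :$f"
        fi
    done
    printf '%s %s\n' "$cmd" "$dest"
}

batched_rsync_cmd host /dest/ file1 file2 file3
# prints: rsync -av host:file1 :file2 :file3 /dest/
```

How git-annex would map keys to remote paths in such a call is left open here.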
"""]]

@ -0,0 +1 @@
ATM --debug uses timestamps with second precision. It would be nice to have subsecond timing, to see where time is spent.
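
Until then, subsecond timestamps can be added from the outside by piping the output through a small filter. A minimal sketch, assuming GNU `date` (the `%N` nanoseconds format is not available everywhere); `stamp_lines` is a made-up name:

```sh
# Prefix every input line with seconds since the epoch plus nanoseconds.
# Assumes GNU date, which supports the %N format.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s.%N)" "$line"
    done
}

# Assumed usage:
#   git annex get --debug . 2>&1 | stamp_lines
```

Spawning `date` for every line adds some overhead of its own, so this is only good for a coarse picture of where the time goes.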