Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2016-09-09 16:40:11 -04:00
commit 61faf240d5
No known key found for this signature in database
GPG key ID: C910D9222512E3C7
10 changed files with 264 additions and 0 deletions

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://openid.stackexchange.com/user/3ee5cf54-f022-4a71-8666-3c2b5ee231dd"
nickname="Anthony DeRobertis"
subject="comment 2"
date="2016-09-09T06:51:45Z"
content="""
`sha256sum` isn't available on the tablet—at least not in the git-annex shell's PATH.
I tried checking if it's present in the apk, and I don't see it in there—but I'm also not sure exactly where in the jar file I should find it. Unfortunately, I couldn't find an older Android git-annex build to check—seems the download site only keeps the most recent.
"""]]

@@ -0,0 +1,77 @@
### Please describe the problem.
autoenable for special remotes seems to have stopped working (I didn't check with previous versions, though, so it might just be that something changed in how the repository is set up, or something else).
### What version of git-annex are you using? On what operating system?
6.20160902+gitgbc49d8a-1~ndall+1
### Please provide any additional information below.
As you can see below, autoenable=true is set for that remote, and it enables manually just fine.
[[!format sh """
(venv-tests) % rm -rf fbirn_phaseIII; git clone http://datasets.datalad.org/nidm/fbirn_phaseIII/.git
Cloning into 'fbirn_phaseIII'...
Checking connectivity... done.
(venv-tests) % cd fbirn_phaseIII
(venv-tests) % git annex info
(merging origin/git-annex into git-annex...)
(recording state in git...)
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 6
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
225f46f1-c353-48ce-89da-ccc94dc59d01 -- yoh@falkor:/srv/datasets.datalad.org/www/nidm/fbirn_phaseIII [origin]
72ce8ab3-19bd-4cef-95b0-5b150c53edc1 -- datalad-archives
d3ceb488-0266-4464-985d-4d4a265e4144 -- yoh@smaug:/mnt/datasets/datalad/crawl/nidm/fbirn_phaseIII
f779a37c-96a5-43b5-822b-0010651dc7b1 -- yoh@hopa:/tmp/autoenable/fbirn_phaseIII [here]
untrusted repositories: 0
transfers in progress: none
available local disk space: 1.2 gigabytes (+1 megabyte reserved)
local annex keys: 0
local annex size: 0 bytes
annexed files in working tree: 7521
size of annexed files in working tree: 2.76 gigabytes
bloom filter size: 32 mebibytes (0% full)
backend usage:
MD5E: 7521
(venv-tests) % echo 'git-annex:remote.log' | git cat-file --batch
951989a46d53a17d9a2621f6af82def73c2dc96e blob 328
72ce8ab3-19bd-4cef-95b0-5b150c53edc1 autoenable=true encryption=none externaltype=datalad-archives name=datalad-archives type=external timestamp=1473266618.950662s
72ce8ab3-19bd-4cef-95b0-5b150c53edc1 autoenable=true encryption=none externaltype=datalad-archives name=datalad-archives type=external timestamp=1473444735.988475s
(venv-tests) % git annex enableremote datalad-archives
enableremote datalad-archives ok
(recording state in git...)
(venv-tests) % git annex info
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 6
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
225f46f1-c353-48ce-89da-ccc94dc59d01 -- yoh@falkor:/srv/datasets.datalad.org/www/nidm/fbirn_phaseIII [origin]
72ce8ab3-19bd-4cef-95b0-5b150c53edc1 -- [datalad-archives]
d3ceb488-0266-4464-985d-4d4a265e4144 -- yoh@smaug:/mnt/datasets/datalad/crawl/nidm/fbirn_phaseIII
f779a37c-96a5-43b5-822b-0010651dc7b1 -- yoh@hopa:/tmp/autoenable/fbirn_phaseIII [here]
untrusted repositories: 0
transfers in progress: none
available local disk space: 1.2 gigabytes (+1 megabyte reserved)
local annex keys: 0
local annex size: 0 bytes
annexed files in working tree: 7521
size of annexed files in working tree: 2.76 gigabytes
bloom filter size: 32 mebibytes (0% full)
backend usage:
MD5E: 7521
"""]]
I am a little confused, though, since we do test for this scenario in datalad and the test still passes, i.e. the remote gets enabled...
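To check the flag mechanically, the two `remote.log` lines shown above can be parsed with a short Python sketch. It assumes each line is a UUID followed by space-separated `key=value` fields, with a `timestamp=<seconds>s` field, exactly as in the output above, and it assumes the line with the newest timestamp wins. Values containing spaces or escapes are not handled, and the function names are made up for illustration.

```python
def parse_remote_log_line(line):
    """Split one remote.log line into (uuid, settings dict, timestamp)."""
    uuid, *fields = line.split()
    settings = dict(field.split("=", 1) for field in fields)
    # The timestamp looks like "1473444735.988475s"; drop the trailing "s".
    timestamp = float(settings.pop("timestamp").rstrip("s"))
    return uuid, settings, timestamp


def latest_settings(lines):
    """Keep, per UUID, the settings from the line with the newest timestamp.

    Assumption: the newest line wins when a UUID appears more than once.
    """
    newest = {}
    for line in lines:
        if not line.strip():
            continue
        uuid, settings, timestamp = parse_remote_log_line(line)
        if uuid not in newest or timestamp > newest[uuid][0]:
            newest[uuid] = (timestamp, settings)
    return {uuid: settings for uuid, (_, settings) in newest.items()}


# The two lines printed by `git cat-file --batch` in the report above.
SAMPLE_LOG = [
    "72ce8ab3-19bd-4cef-95b0-5b150c53edc1 autoenable=true encryption=none "
    "externaltype=datalad-archives name=datalad-archives type=external "
    "timestamp=1473266618.950662s",
    "72ce8ab3-19bd-4cef-95b0-5b150c53edc1 autoenable=true encryption=none "
    "externaltype=datalad-archives name=datalad-archives type=external "
    "timestamp=1473444735.988475s",
]

for uuid, settings in latest_settings(SAMPLE_LOG).items():
    print(uuid, "autoenable =", settings.get("autoenable"))
```

Both lines carry `autoenable=true`, so under these assumptions the stored configuration itself looks fine, which points at the clone-time behaviour rather than at the log content.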
[[!meta author=yoh]]

@@ -0,0 +1,24 @@
Here is a line from the debug log...
[2016-09-08 13:08:37.01053] chat: ssh
["-oNumberOfPasswordPrompts=0","-oStrictHostKeyChecking=no",
"9553@git-annex-.usw.2Ds009.2Ersync.2Enet-9553_22_annex",
"mkdir -p .ssh;touch .ssh/authorized_keys;dd of=.ssh/authorized_keys oflag=append conv=notrunc;mkdir -p annex"]
The hostname I entered was ordinary: `usw-s009.rsync.net`... but as you can see, the `user@host:port` string is mangled.
I'm using git tag `6.20160907` with changes to `git-annex.cabal` and `stack.yaml` to force use of `concurrent-output-1.7.7` since `1.7.6` had a bug that kept it from building on Windows (I guess?).
Oh, this is on Windows, in case that wasn't clear...
I think the bug is in `${git-annex-root}/Assistant/Ssh.hs` or `${git-annex-root}/Assistant/Pairing/MakeRemote.hs`. The `.2D` and `.2E` bits in the mangled string make me think that the `-` and `.` characters in my hostname are being replaced by some Haskell representation of those values (`2D` in hexadecimal in ASCII is `-`, `2E` is `.`).
But I've never even written hello world in Haskell so my path ends there.
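The hex reading above can be checked with a short Python sketch. It only assumes what the log shows: a `.` followed by two uppercase hex digits stands for the character with that code. The function name is hypothetical, and this is not taken from the git-annex source.

```python
import re

# Matches "." followed by two uppercase hex digits, e.g. ".2D" or ".2E".
ESCAPE = re.compile(r"\.([0-9A-F]{2})")


def decode_hex_escapes(mangled):
    """Replace each ".XX" escape with the character whose code is 0xXX."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), mangled)


# Host part of the mangled string from the debug log.
print(decode_hex_escapes(".usw.2Ds009.2Ersync.2Enet"))
```

Decoding gives back the hostname I typed, apart from the leading `.`, which is not followed by hex digits and is left as-is; I don't know what it is for.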
I'm happy to pull some tag or branch from github and run `stack install` over again and try adding the rsync.net remote again.
I hope this helps!
Cheers,
--Dave

@@ -0,0 +1,18 @@
[[!comment format=mdwn
username="dave@2ab82f485adf7e2ce787066e35f5f9789bff430b"
nickname="dave"
subject="output of git annex version"
date="2016-09-08T18:48:43Z"
content="""
$ git annex version
git-annex version: 6.20160907-gad0a7f6
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV ConcurrentOutput TorrentParser Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E
SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: unknown
supported repository versions: 5 6
upgrade supported from repository versions: 2 3 4 5
operating system: mingw32 x86_64
"""]]

@@ -0,0 +1,33 @@
[[!comment format=mdwn
username="dave@2ab82f485adf7e2ce787066e35f5f9789bff430b"
nickname="dave"
subject="Similar issue on Debian machine"
date="2016-09-08T20:04:45Z"
content="""
OK, let's try this on a GNU machine (debian, sid) running a git-annex from the official debian repos:
$ git annex version
git-annex version: 6.20160808
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify
XMPP ConcurrentOutput TorrentParser MagicMime Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512
SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository versions: 5 6
upgrade supported from repository versions: 0 1 2 3 4 5
operating system: linux x86_64
Now, this time, the error message in the web app says something ordinary: `Permission denied (publickey,password,keyboard-interactive).` However, I see this in the log:
[2016-09-08 14:56:36.125939] read: ssh-keygen [\"-F\",\"usw-s009.rsync.net\"]
[2016-09-08 14:56:36.134026] process done ExitSuccess
[2016-09-08 14:56:36.1344] chat: ssh [
\"-oNumberOfPasswordPrompts=0\",
\"9553@git-annex-.usw.2Ds009.2Ersync.2Enet-9553_22_annex\",
\"mkdir -p .ssh;touch .ssh/authorized_keys;dd of=.ssh/authorized_keys oflag=append conv=notrunc;mkdir -p annex\"
]
[2016-09-08 14:56:36.71948] process done ExitFailure 255
As you can see, that string mangling is present there too.
"""]]

@@ -0,0 +1,48 @@
Hello.
I have been using a portable USB disk for storing media. After several weeks of inactivity, when I used it with git-annex again, I noticed it was running very slowly. It is a USB 2.0 connection, so `git-annex sync` used to take a few minutes, but now it takes *many hours*.
When adding files to it (`git-annex add`), I would see the list of files passing by at the speed I would expect, but then, at the end, I would have to wait; the next day I saw the operation had completed. `git-annex status` makes me wait a similar amount of time, and `git-annex sync` also makes me wait, even when it has nothing to do. Meanwhile, `top` reported 100% CPU usage, and `iotop` showed `git` and `fuse` saturating the USB connection.
File operations outside of `git-annex` seem to work fine, so I can't blame the hardware or the filesystem (NTFS mounted with `fuse`).
I would like to know your thoughts on this problem, which I had not noticed before. Here is the usual information to help you form an opinion:
% time git annex status --debug
[2016-09-07 20:20:46.093990203] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","core.bare=false","status","-uall","-z"]
[2016-09-08 15:05:23.239573491] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","core.bare=false","cat-file","--batch"]
[2016-09-08 15:08:25.605689622] process done ExitSuccess
git annex status --debug 33802.97s user 1162.38s system 51% cpu 18:47:39.54 total
% git-annex info
repository mode: direct
trusted repositories: 0
semitrusted repositories: 4
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
069de9a2-dc53-4c0a-82e0-a61a1f29e6b3 -- stratos PC [stratos]
49b5b3a4-56ac-4cf2-aed9-1c23d3181c97 -- Toshiba USB HDD [here]
untrusted repositories: 0
transfers in progress: none
available local disk space: 120.41 gigabytes (+1 megabyte reserved)
local annex keys: 6303
local annex size: 1.87 terabytes
annexed files in working tree: 7412
size of annexed files in working tree: 2.24 terabytes
bloom filter size: 32 mebibytes (1.3% full)
backend usage:
SHA256E: 7412
% git-annex version
git-annex version: 6.20160613-g1e4e6f4
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify XMPP ConcurrentOutput TorrentParser MagicMime Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository versions: 5 6
upgrade supported from repository versions: 0 1 2 3 4 5
operating system: linux x86_64
Thank you for your time.
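To see where the time goes, the gaps between the `--debug` timestamps above can be computed with a small Python sketch. It assumes each debug line starts with a `[YYYY-MM-DD HH:MM:SS.fraction]` stamp, as in the output shown; the fraction is truncated to microseconds because that is all `datetime` stores. The helper names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Leading "[date time.fraction]" stamp of a git-annex --debug line.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))?\]")


def parse_stamp(line):
    """Return the timestamp at the start of a debug line as a datetime."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    whole, fraction = match.groups()
    # Pad or truncate the fractional part to exactly six digits.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(microsecond=micros)


def gaps(lines):
    """Return the elapsed time between each pair of consecutive lines."""
    stamps = [parse_stamp(line) for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# The three debug lines from the `git annex status --debug` run above.
DEBUG_LINES = [
    '[2016-09-07 20:20:46.093990203] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","core.bare=false","status","-uall","-z"]',
    '[2016-09-08 15:05:23.239573491] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","core.bare=false","cat-file","--batch"]',
    '[2016-09-08 15:08:25.605689622] process done ExitSuccess',
]

for gap in gaps(DEBUG_LINES):
    print(gap)
print("total:", sum(gaps(DEBUG_LINES), timedelta()))
```

The first gap is about 18 hours 44 minutes and the second about 3 minutes, so nearly all of the wall-clock time passes between the `git status -uall` line and the next debug line.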

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="dave@2ab82f485adf7e2ce787066e35f5f9789bff430b"
nickname="dave"
subject="concurrent-output-1.7.7"
date="2016-09-08T17:41:28Z"
content="""
Hi. `concurrent-output-1.7.7` is not present in resolver `lts-5.18`.
"""]]

@@ -0,0 +1,9 @@
I have some Git Annex repos that I keep copies of on both NTFS on Linux and on ecryptfs (which Ubuntu uses for home directory encryption) on Linux. Now, ecryptfs allows each path component of a filename to be only up to 140-ish characters, because it has to encrypt that filename, add some encryption info to it, and store it inside another filename on a backing ext4 filesystem (which limits path components to 255 characters).
Several times now I've added a bunch of stuff to my annex on the NTFS checkout, where path components are allowed to be longer than 140 characters, synced it over to my other annex checkout on ecryptfs, and then had Git Annex fail during the sync, trying to create these empty symlinks with path components too long for the filesystem it is on. When in this state, I don't really know how to fix it. I can't just "git mv" the offending file to a valid name, both because "git mv" needs the source file to be on disk in the first place and because the failed "git checkout" leaves my repo thinking it has thousands of untracked files (because some stuff did get created, but git refused to officially move to the commit it was trying to check out, because the checkout failed).
I am looking for a solution for this inside Git Annex. The simplest thing, I think, would be to set a max path component length for the whole set of repos, so I could get an error when I go to "git annex add" on the NTFS checkout that the filenames being added are too long for some of the repos that will eventually want to check them out. Is it possible to do this with a pre-commit hook somehow?
The next simplest thing would be for Git Annex to look at the filesystem it is running on and do something smarter than exploding and leaving my repo in a weird out-of-sync state if some of the filenames it wants to create can't be created. Maybe it should fail the sync earlier, in Git Annex itself rather than in git checkout. Maybe it should just leave those files out of the checkout, or force/allow me to rename them right then.
The most complex thing would be to somehow make it work anyway and check out the symlinks under different, valid names. Perhaps it could just truncate those path components in the symlink view? There's already support for different metadata views; this would be sort of like that. You get a special view of the repo subject to the constraints of your filesystem.
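For the pre-commit hook idea, here is a minimal Python sketch of the check itself. The limit of 140 bytes is only the "140-ish" estimate from above, not a verified ecryptfs figure, and all names are hypothetical. It looks at paths added or renamed in the index and reports any path component that is too long.

```python
import subprocess
import sys

# Assumption: the "140-ish" estimate above, counted in UTF-8 bytes.
MAX_COMPONENT_BYTES = 140


def overlong_components(paths, limit=MAX_COMPONENT_BYTES):
    """Return (path, component) pairs whose component exceeds `limit` bytes."""
    bad = []
    for path in paths:
        for component in path.split("/"):
            if len(component.encode("utf-8", "surrogateescape")) > limit:
                bad.append((path, component))
    return bad


def staged_paths():
    """List paths added or renamed in the index (NUL-separated output)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=AR"],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def main():
    """Print offending paths to stderr; a non-zero result should abort the commit."""
    bad = overlong_components(staged_paths())
    for path, component in bad:
        size = len(component.encode("utf-8", "surrogateescape"))
        print("path component too long (%d bytes): %s" % (size, path),
              file=sys.stderr)
    return 1 if bad else 0
```

To use it as a hook, the script would end with `sys.exit(main())`. Since git-annex already installs a pre-commit hook of its own, this check would have to be called alongside that hook rather than replace it, and I have not verified that it catches every way files get committed (for example commits made by `git annex sync`).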

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4"
subject="comment 5"
date="2016-09-08T18:08:49Z"
content="""
Good idea on parametrizing the frequency of updates from json -- indeed, we probably wouldn't want to deal with its output more often than once per second per file. Either \"skinny\" progress indication using IDs or full ones (so we could match by e.g. key) would work for us, I think.
btw, just so that we don't forget: I guess all of this would also need to be done for the addurl, copy, and move commands, right? (copy and move do not have --json yet) ;)
"""]]
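The "no more than once per second per file" idea from comment 5 can be sketched on the consumer side in Python. This is only an illustration of per-key rate limiting; the class and parameter names are hypothetical and nothing here reflects an existing git-annex option or JSON field.

```python
import time


class ProgressThrottle:
    """Drop progress updates that arrive too soon after the last one for a key."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between updates per key
        self.clock = clock                # injectable for testing
        self._last = {}                   # key -> time of last emitted update

    def should_emit(self, key, finished=False):
        """Return True if an update for `key` should be passed on now."""
        now = self.clock()
        if finished:
            # Always report completion, and forget the key afterwards.
            self._last.pop(key, None)
            return True
        last = self._last.get(key)
        if last is not None and now - last < self.min_interval:
            return False
        self._last[key] = now
        return True
```

Each file (or key) is throttled independently, and a final "finished" update is never dropped, so the consumer still sees every transfer complete.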

@@ -0,0 +1,28 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4"
subject="comment 6"
date="2016-09-09T12:47:30Z"
content="""
ha -- a wild idea: instead of git ls-files git-annex | git cat-file, you could be much better off using \"git archive\" to dump the content of all the files on the git-annex branch!
[[!format sh \"\"\"
$> GIT_TRACE_PACKET=true GIT_TRACE_PERFORMANCE=true git annex find --not --in here >/dev/null
08:46:11.246625 trace.c:420 performance: 0.000291504 s: git command: '/usr/lib/git-annex.linux/shimmed/git/git' 'config' '--null' '--list'
08:46:11.267559 trace.c:420 performance: 0.000466198 s: git command: '/usr/lib/git-annex.linux/shimmed/git/git' '--git-dir=.git' '--work-tree=.' '--literal-pathspecs' 'show-ref' 'git-annex'
08:46:11.271522 trace.c:420 performance: 0.000434572 s: git command: '/usr/lib/git-annex.linux/shimmed/git/git' '--git-dir=.git' '--work-tree=.' '--literal-pathspecs' 'show-ref' '--hash' 'refs/heads/git-annex'
08:46:22.647051 trace.c:420 performance: 11.387079176 s: git command: '/usr/lib/git-annex.linux/shimmed/git/git' '--git-dir=.git' '--work-tree=.' '--literal-pathspecs' 'ls-files' '--cached' '-z' '--'
08:46:23.616005 trace.c:420 performance: 12.339791892 s: git command: '/usr/lib/git-annex.linux/shimmed/git/git' '--git-dir=.git' '--work-tree=.' '--literal-pathspecs' 'cat-file' '--batch'
08:46:23.616052 trace.c:420 performance: 12.391364205 s: git command: 'git' 'annex' 'find' '--not' '--in' 'here'
$> git ls-tree -r --name-only git-annex | sed -e \"s/^/git-annex:/g\" | time git --git-dir=.git cat-file --buffer --batch >| /tmp/111
git --git-dir=.git cat-file --buffer --batch >| /tmp/111 7.80s user 0.40s system 99% cpu 8.214 total
$> time git archive git-annex > /dev/null
git archive git-annex > /dev/null 0.20s user 0.00s system 97% cpu 0.212 total
\"\"\"]]
About 40 times faster (disregarding the time to parse/split the tar, which should not be too much, I think)
"""]]
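The `git archive` idea from comment 6 can be sketched in Python, including the tar parsing step that the timing above leaves out. This is an illustration only, not how git-annex reads its branch: it streams the output of `git archive` through the standard `tarfile` module and yields each file's name and content. The function names and the default `git_dir` are assumptions.

```python
import subprocess
import tarfile


def read_tar_stream(stream):
    """Yield (name, content) for each regular file in a tar byte stream."""
    # Mode "r|" reads sequentially, so it works on a pipe without seeking.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


def dump_branch(branch="git-annex", git_dir=".git"):
    """Yield (name, content) for every file on `branch` via `git archive`."""
    proc = subprocess.Popen(
        ["git", "--git-dir=" + git_dir, "archive", branch],
        stdout=subprocess.PIPE,
    )
    try:
        yield from read_tar_stream(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()
```

For example, `dict(dump_branch())` would load the whole branch into memory in one pass. Whether the parsing cost keeps the overall speedup close to the figure above would need to be measured on the same repository.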