Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-07-07 14:19:46 -04:00
commit ba0adefe4c
GPG key ID: DB12DB0FF05F8F38
4 changed files with 202 additions and 0 deletions

@ -0,0 +1,163 @@
I'm looking into another SSH-based hang in the DataLad tests. I can't
trigger the hang on my local system (Debian Buster), but in an Ubuntu
Xenial VM I can. The hang bisects to 1f2e2d15e (async exception
safety, 2020-06-03). It disappears on the parent of 1f2e2d15e or if I
revert that commit on top of master (currently 3b6754e2a).

I was able to reduce the hang down to a `git annex get` from an rsync
remote. Here is a script that triggers the hang via a Xenial Docker
container. Sorry for the length; given the system interaction, it's
the simplest reproducer I've managed to come up with.
[[!format sh """
cd "$(mktemp -d ${TMPDIR:-/tmp}/ga-XXXXXXX)"
cat >demo.sh <<'EOF'
git annex version
cd "$(mktemp -d /tmp/ga-XXXXXXX)"
remdir="$PWD/store"
mkdir "$remdir"
git init repo
(
cd repo
git annex init
git annex initremote r type=rsync rsyncurl="localhost:$remdir" encryption=none
echo 0 >f0
git annex add f0
git commit -m'f0'
git annex copy --to=r f0
git annex drop f0
)
git clone repo clone
(
cd clone
git annex init clone
git annex enableremote r
git annex get --debug f0
)
EOF
cat >Dockerfile <<'EOF'
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
curl rsync openssh-client openssh-server ca-certificates
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN cd /root && curl -fSsL \
https://downloads.kitenet.net/git-annex/autobuild/amd64/git-annex-standalone-amd64.tar.gz \
| tar xz
ENV PATH="/root/git-annex.linux:$PATH"
RUN git config --system user.name "u"
RUN git config --system user.email "u@e"
RUN mkdir -p /root/.ssh
RUN mkdir -p /var/run/sshd
RUN ssh-keygen -f /root/.ssh/id_rsa -N ""
RUN cat /root/.ssh/id_rsa.pub >>/root/.ssh/authorized_keys
RUN echo "Host localhost\nStrictHostKeyChecking no\n" >>/root/.ssh/config
COPY demo.sh /root/demo.sh
CMD /usr/sbin/sshd && sh /root/demo.sh
EOF
docker build -t ga-rsync-hang:latest .
docker run -it --rm ga-rsync-hang:latest
"""]]
Output, where the last line stalled:
```
[... 50 lines ...]
git-annex version: 8.20200618-g3b6754e2a
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
[... 32 lines ...]
[2020-07-07 14:28:55.252423613] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-07-07 14:28:55.256082009] process done ExitSuccess
[2020-07-07 14:28:55.256316845] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-07-07 14:28:55.26056207] process done ExitSuccess
[2020-07-07 14:28:55.260917245] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","-z","--cached","--","f0"]
get f0 [2020-07-07 14:28:55.26481346] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-07-07 14:28:55.268786541] process done ExitSuccess
[2020-07-07 14:28:55.268997933] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-07-07 14:28:55.272176401] process done ExitSuccess
[2020-07-07 14:28:55.272550418] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ea06186dfc3dab39316da534144d5d6ef3d6090c","--pretty=%H","-n1"]
[2020-07-07 14:28:55.275989985] process done ExitSuccess
[2020-07-07 14:28:55.27634403] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-07-07 14:28:55.276782737] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-07-07 14:28:55.279315206] read: git ["config","--null","--list"]
[2020-07-07 14:28:55.281990769] process done ExitSuccess
(from r...)
[2020-07-07 14:28:55.288194172] read: rsync ["-e","'ssh' '-S' '.git/annex/ssh/localhost' '-o' 'ControlMaster=auto' '-o' 'ControlPersist=yes' '-T'","--progress","--inplace","localhost:/tmp/ga-075MZKs/store/0c5/66e/'SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa'",".git/annex/tmp/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa"]
SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa
0 0% 0.00kB/s 0:00:00
2 100% 1.95kB/s 0:00:00 (xfr#1, to-chk=0/1)
^C
```
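
Since the hang shows up as the final `git annex get` never returning, an automated run (for example while bisecting) needs a way to turn "never returns" into a failure. The following is a minimal sketch, not part of the original reproducer. It assumes GNU coreutils `timeout` is available, which exits with status 124 when the limit is reached. The function name `run_or_report_hang` and the 60-second limit in the usage comment are hypothetical choices.

```sh
# Run a command under a time limit and report if the limit was hit.
# Assumption: GNU coreutils `timeout`, which exits with status 124
# when the command is still running at the end of the limit.
run_or_report_hang() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${limit}s" >&2
    fi
    return "$status"
}

# Hypothetical usage in demo.sh, in place of the final get:
#   run_or_report_hang 60 git annex get --debug f0
```

A command that itself exits with status 124 would be indistinguishable from a timeout here, which seems acceptable for a reproducer of this size.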
Replacing "FROM ubuntu:xenial" with "FROM ubuntu:bionic" (a later
release) resolves the hang, so perhaps there is some interaction with
an older rsync or openssh version. Here are the versions that are
present in Xenial:
```
OpenSSH_7.2p2 Ubuntu-4ubuntu2.10, OpenSSL 1.0.2g 1 Mar 2016
rsync version 3.1.1 protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, prealloc
```
To check whether an older version of openssh or rsync is responsible, I tried an older release of Debian. Using "FROM debian:stretch-slim" doesn't hang. Here are the versions there:
```
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019
rsync version 3.1.2 protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, prealloc
```
However, going back further, "FROM debian:jessie-slim" does hang. The
versions there:
```
OpenSSH_6.7p1 Debian-5+deb8u8, OpenSSL 1.0.1t 3 May 2016
rsync version 3.1.1 protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, prealloc
```
So perhaps there's some interaction with openssh before 7.4p1 or rsync
before 3.1.2. Those are pretty old versions, and, in the case of
Jessie, the release is now EOL. But I figured it was at least worth
writing up given that the hang isn't triggered until a recent commit
in git-annex.
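
The version hypothesis above can be written down as a small check. This is only a sketch of the guess (openssh before 7.4p1 or rsync before 3.1.2), not a confirmed diagnosis. It assumes GNU `sort -V` for version ordering, and the function names `version_lt` and `in_suspect_range` are made up for illustration.

```sh
# Succeed if version $1 sorts strictly before version $2.
# Assumption: GNU `sort -V` (version sort) is available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Succeed if the given openssh ($1) and rsync ($2) versions fall in
# the range suspected above. Returns 2 if either version is missing.
in_suspect_range() {
    [ -n "$1" ] && [ -n "$2" ] || return 2
    version_lt "$1" 7.4p1 || version_lt "$2" 3.1.2
}

# The three combinations reported above:
#   in_suspect_range 7.2p2 3.1.1   # Xenial,  hangs
#   in_suspect_range 7.4p1 3.1.2   # Stretch, no hang
#   in_suspect_range 6.7p1 3.1.1   # Jessie,  hangs
```

The check cannot tell which of the two programs matters, because both observed hangs involve an older openssh and an older rsync at the same time.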
[[!meta author=kyle]]
[[!tag projects/datalad]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00"
nickname="flpgdt"
avatar="http://cdn.libravatar.org/avatar/df837a3ae490227608cc38a7c9edfc7a"
subject="comment 2"
date="2020-07-07T16:54:46Z"
content="""
Thanks for the input. I will rethink this a bit and might share if I come up with something interesting.
Kind regards.
"""]]

@ -0,0 +1,20 @@
I wanted to share some thoughts on an idea I had.
There are times when I want to stream data from a remote -- I want to start processing it immediately, and do not want to keep it in my annex when I am done with it.
I can give some examples:
* I have several projects which have a large number of similar text files, and they compress really well with borg or bup. For example, I have a repo with many [ncdu](https://dev.yorhel.nl/ncdu) JSON index files. They total 60G, but in a bup special remote, they are ~3G. In another repo, I have large highly differential TSV files.
* I have an annex with 5-10G video files that are stored in a variety of network special remotes. Most of them are in my Google Drive. I would like to be able to immediately start playing them with VLC rather than downloading and verifying them in their entirety.
It would look like this:
```
git annex cat "someindex.ncdu" | ncdu -f -
diff <(git annex cat "huge-data-dump1.tsv" -f mybupremote ) <(git annex cat "huge-data-dump2.tsv" -f mybupremote )
git annex cat "myvideo.mp4" -f googledrive | vlc -
```
I imagine that there might be issues with verification. But I really am ok with not verifying a video file I am streaming.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 10"
date="2020-07-06T21:20:58Z"
content="""
I'm seeing larger improvements in my repo: ~40% speedup with -J2 and even ~200% speedup without jobs. Good work!
"""]]