Merge branch 'master' of ssh://git-annex.branchable.com
ba0adefe4c
4 changed files with 202 additions and 0 deletions
doc
bugs
forum/Is_it_possible_to_setup_a_git-annex_Glacier_gatway__63__
todo
idea__58___git-annex_cat.mdwn
speed_up_git_annex_sync_--content_--all
@@ -0,0 +1,163 @@
I'm looking into another SSH-based hang in the DataLad tests. I can't
trigger the hang on my local system (Debian Buster), but in an Ubuntu
Xenial VM I can. The hang bisects to 1f2e2d15e (async exception
safety, 2020-06-03). It disappears on the parent of 1f2e2d15e or if I
revert that commit on top of master (currently 3b6754e2a).
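
One way to automate a bisection like this is to treat a timeout as the failure signal. The following is a minimal sketch, not the procedure actually used for this report: `bisect_step` is a hypothetical helper, and it assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is reached.

```sh
# bisect_step LIMIT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND under a time limit and translate the
# result into the exit codes that `git bisect run` understands.
#   0   -> good (command finished in time)
#   1   -> bad  (command hit the time limit, i.e. it hung)
#   125 -> skip (command failed for some unrelated reason)
bisect_step() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   return 0 ;;
        124) return 1 ;;
        *)   return 125 ;;
    esac
}
```

A wrapper script that builds git-annex at the current commit and then calls, for example, `bisect_step 60 sh demo.sh` could be handed to `git bisect run`. The 60-second limit and the build step are assumptions to adjust for the system at hand.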

I was able to reduce the hang down to a `git annex get` from an rsync
remote. Here is a script that triggers the hang via a Xenial Docker
container. Sorry for the length; given the system interaction, it's
the simplest reproducer I've managed to come up with.

[[!format sh """
cd "$(mktemp -d ${TMPDIR:-/tmp}/ga-XXXXXXX)"

cat >demo.sh <<'EOF'
git annex version

cd "$(mktemp -d /tmp/ga-XXXXXXX)"
remdir="$PWD/store"
mkdir "$remdir"
git init repo
(
    cd repo
    git annex init
    git annex initremote r type=rsync rsyncurl="localhost:$remdir" encryption=none
    echo 0 >f0
    git annex add f0
    git commit -m'f0'
    git annex copy --to=r f0
    git annex drop f0
)

git clone repo clone
(
    cd clone
    git annex init clone
    git annex enableremote r
    git annex get --debug f0
)
EOF

cat >Dockerfile <<'EOF'
FROM ubuntu:xenial

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
  curl rsync openssh-client openssh-server ca-certificates
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN cd /root && curl -fSsL \
  https://downloads.kitenet.net/git-annex/autobuild/amd64/git-annex-standalone-amd64.tar.gz \
  | tar xz
ENV PATH="/root/git-annex.linux:$PATH"

RUN git config --system user.name "u"
RUN git config --system user.email "u@e"

RUN mkdir -p /root/.ssh
RUN mkdir -p /var/run/sshd
RUN ssh-keygen -f /root/.ssh/id_rsa -N ""
RUN cat /root/.ssh/id_rsa.pub >>/root/.ssh/authorized_keys
RUN echo "Host localhost\nStrictHostKeyChecking no\n" >>/root/.ssh/config

COPY demo.sh /root/demo.sh
CMD /usr/sbin/sshd && sh /root/demo.sh
EOF

docker build -t ga-rsync-hang:latest .
docker run -it --rm ga-rsync-hang:latest
"""]]

Output, where the last line stalled:

```
[... 50 lines ...]
git-annex version: 8.20200618-g3b6754e2a
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
[... 32 lines ...]
[2020-07-07 14:28:55.252423613] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-07-07 14:28:55.256082009] process done ExitSuccess
[2020-07-07 14:28:55.256316845] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-07-07 14:28:55.26056207] process done ExitSuccess
[2020-07-07 14:28:55.260917245] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","-z","--cached","--","f0"]
get f0 [2020-07-07 14:28:55.26481346] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-07-07 14:28:55.268786541] process done ExitSuccess
[2020-07-07 14:28:55.268997933] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-07-07 14:28:55.272176401] process done ExitSuccess
[2020-07-07 14:28:55.272550418] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ea06186dfc3dab39316da534144d5d6ef3d6090c","--pretty=%H","-n1"]
[2020-07-07 14:28:55.275989985] process done ExitSuccess
[2020-07-07 14:28:55.27634403] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-07-07 14:28:55.276782737] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-07-07 14:28:55.279315206] read: git ["config","--null","--list"]
[2020-07-07 14:28:55.281990769] process done ExitSuccess
(from r...)
[2020-07-07 14:28:55.288194172] read: rsync ["-e","'ssh' '-S' '.git/annex/ssh/localhost' '-o' 'ControlMaster=auto' '-o' 'ControlPersist=yes' '-T'","--progress","--inplace","localhost:/tmp/ga-075MZKs/store/0c5/66e/'SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa'",".git/annex/tmp/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa"]
SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa

0 0% 0.00kB/s 0:00:00
2 100% 1.95kB/s 0:00:00 (xfr#1, to-chk=0/1)
^C
```

Replacing "FROM ubuntu:xenial" with "FROM ubuntu:bionic" (a later
release) resolves the hang, so perhaps there is some interaction with
an older rsync or openssh version. Here are the versions that are
present in Xenial:

```
OpenSSH_7.2p2 Ubuntu-4ubuntu2.10, OpenSSL 1.0.2g 1 Mar 2016

rsync version 3.1.1 protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc
```

Thinking the cause might be an older version of openssh or rsync, I tried an older Debian release. Using "FROM debian:stretch-slim" doesn't hang. Here are the versions there:

```
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019

rsync version 3.1.2 protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc
```

However, going back farther, "FROM debian:jessie-slim" does hang. The
versions there:

```
OpenSSH_6.7p1 Debian-5+deb8u8, OpenSSL 1.0.1t 3 May 2016

rsync version 3.1.1 protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc
```

So perhaps there's some interaction with openssh before 7.4p1 or rsync
before 3.1.2. Those are pretty old versions, and, in the case of
Jessie, the release is now EOL. But I figured it was at least worth
writing up given that the hang isn't triggered until a recent commit
in git-annex.
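
To check whether a given system falls on the suspect side of those versions, something like the sketch below could help. The thresholds are only the hypothesis stated above, not a confirmed cause. `version_lt` and `parse_rsync_version` are hypothetical helper names, and `sort -V` assumes GNU coreutils.

```sh
# version_lt A B: succeed if version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# parse_rsync_version: read `rsync --version` output on stdin and print
# the version number from its first line (e.g. "3.1.1").
parse_rsync_version() {
    sed -n '1s/^rsync  *version  *v*\([0-9][0-9.]*\).*/\1/p'
}

# Example (requires rsync on PATH):
#   if version_lt "$(rsync --version | parse_rsync_version)" 3.1.2; then
#       echo "rsync is older than 3.1.2" >&2
#   fi
```

The same comparison could be applied to the version reported by `ssh -V`, which prints to stderr.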

[[!meta author=kyle]]
[[!tag projects/datalad]]
@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="flpgdt@f64318f00d9e1c9535e11f5d27c80c1d799cce00"
 nickname="flpgdt"
 avatar="http://cdn.libravatar.org/avatar/df837a3ae490227608cc38a7c9edfc7a"
 subject="comment 2"
 date="2020-07-07T16:54:46Z"
 content="""
Thanks for the input. I will rethink this a bit and might share if I come up with something interesting.

Kind regards.
"""]]
20
doc/todo/idea__58___git-annex_cat.mdwn
Normal file
@@ -0,0 +1,20 @@
I wanted to share some thoughts on an idea I had.

There are times when I want to stream data from a remote -- I want to start processing it immediately, and do not want to keep it in my annex when I am done with it.

I can give some examples:

* I have several projects which have a large number of similar text files, and they compress really well with borg or bup. For example, I have a repo with many [ncdu](https://dev.yorhel.nl/ncdu) JSON index files. They total 60G, but in a bup special remote, they are ~3G. In another repo, I have large highly differential TSV files.
* I have an annex with 5-10G video files that are stored in a variety of network special remotes. Most of them are in my Google Drive. I would like to be able to immediately start playing them with VLC rather than downloading and verifying them in their entirety.

It would look like this:

```
git annex cat "someindex.ncdu" | ncdu -f -

diff <(git annex cat "huge-data-dump1.tsv" -f mybupremote ) <(git annex cat "huge-data-dump2.tsv" -f mybupremote )

git annex cat "myvideo.mp4" -f googledrive | vlc -
```

I imagine that there might be issues with verification. But I really am ok with not verifying a video file I am streaming.
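
For comparison, here is a rough approximation using only existing commands. It is a sketch, not the proposed feature: `annex_cat` is a hypothetical shell function, and it does not stream, since `git annex get` downloads and verifies the whole file before any output appears.

```sh
# annex_cat FILE [REMOTE]
# Hypothetical stand-in for the proposed `git annex cat`:
# fetch the content, write it to stdout, then drop the local copy.
# git-annex's own messages go to stderr so stdout carries only file data.
annex_cat() {
    file=$1
    remote=${2:-}
    if [ -n "$remote" ]; then
        git annex get --from="$remote" "$file" >&2 || return 1
    else
        git annex get "$file" >&2 || return 1
    fi
    cat -- "$file"
    # Caveat: this also drops content that was already present locally
    # before the call, provided git-annex considers the drop safe.
    git annex drop "$file" >&2
}
```

With that in place, `annex_cat someindex.ncdu | ncdu -f -` would behave like the first example above, minus the streaming that motivates the idea.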
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="Lukey"
 avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
 subject="comment 10"
 date="2020-07-06T21:20:58Z"
 content="""
I'm seeing larger improvements in my repo: ~40% speedup with -J2 and even ~200% speedup without jobs. Good work!
"""]]