Merge branch 'master' of ssh://git-annex.branchable.com
commit 1d335520df
4 changed files with 99 additions and 0 deletions
47
doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn
Normal file
@@ -0,0 +1,47 @@
When debugging some ssh-related datalad tests that hang with newer
git-annex versions, I noticed that there was a regression in the
treatment of annex-ssh-options in c8fec6ab0 (Fix a minor bug that
caused options provided with -c to be passed multiple times to git,
2020-03-16).

Here's a demo script. Pointing `SSHURL` to any ssh-accessible annex
repo should do. In the case below, the target is an annex repo with
one commit and no files in the working tree.

[[!format sh """
SSHURL="smaug:/home/kyle/scratch/repo"

cd "$(mktemp -d ${TMPDIR:-/tmp}/gx-ssh-opts-XXXXXXX)"
git clone "$SSHURL" ./ >/dev/null 2>&1
git annex init \
    -c annex.sshcaching=false \
    -c remote.origin.annex-ssh-options="-o ControlMaster=auto -S CACHE" \
    --debug 2>&1 | grep 'read: ssh'
"""]]

With the parent of the above commit checked out (b166223d4), the
script outputs

```
[2020-06-30 11:09:43.853918422] read: ssh ["smaug","-o","ControlMaster=auto","-S","CACHE","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
```

With c8fec6ab0 checked out, it outputs

```
[2020-06-30 11:11:03.833678263] read: ssh ["smaug","-S",".git/annex/ssh/smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
[2020-06-30 11:11:04.448046366] read: ssh ["-O","stop","-S","smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","localhost"]
```

It looks like the options specified via
`remote.origin.annex-ssh-options` are dropped, and git-annex switches
to using its built-in ssh caching.

A recent commit on master (95b8b4a5a) shows the same behavior.

I've tried to work through the config-related handling and understand
why the condition from c8fec6ab0 results in the ssh options being
dropped, but I haven't made any progress yet.

[[!meta author=kyle]]
[[!tag projects/datalad]]
@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://bmwiedemann.zq1.de/"
nickname="bmwiedemann"
avatar="http://cdn.libravatar.org/avatar/96f3cd71c3d677f31ed8f79ffb8fb343a8282c085731f405997ff3ef77a1a71b"
subject="comment 2"
date="2020-06-30T15:26:10Z"
content="""
We are already building openSUSE haskell packages sequentially since 2017-07-14 for that reason:
<https://build.opensuse.org/package/rdiff/devel:languages:haskell/ghc-rpm-macros?linkrev=base&rev=79>

Here, non-determinism from filesystem readdir order is another independent class of issue.
"""]]
@@ -0,0 +1,33 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 3"
date="2020-06-30T16:26:17Z"
content="""
[ I don't have a good understanding of the build issues here. I'm
sorry if this isn't relevant. ]

> I found, the build becomes reproducible, when using a filesystem
> with deterministic readdir order such as disorderfs with sort mode.

This reminded me of an issue with Guix's git-annex build:
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33922>. In that case,
the nondeterminism came from \"package database files that are
generated by ghc-pkg (where readdir is used and the result isn’t
sorted)\".

The fix on Guix's end, 5de93cdba7 (gnu: ghc-8: Patch ghc-pkg for
reproducibility, 2019-01-17), was at the level of the ghc package. It
replaced the following line in utils/ghc-pkg/Main.hs

```
confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) fs
```

with

```
confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) (sort fs)
```

"""]]
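The same idea as the ghc-pkg change quoted above can be tried outside Haskell. The shell sketch below is only an illustration and is not part of the Guix patch: `list_confs` is a hypothetical helper that enumerates `*.conf` files with `find`, whose output typically follows directory order, and then sorts the result so that it no longer depends on the filesystem. It assumes a `find` that supports `-maxdepth`, as the GNU and BSD versions do.

```shell
#!/bin/sh
# list_confs DIR (hypothetical helper, for illustration only)
# Print the *.conf files directly inside DIR, one per line, in a
# stable order.
list_confs() {
    dir=$1
    # find reports entries in whatever order the filesystem returns
    # them, so sort explicitly. LC_ALL=C keeps the sort order
    # independent of the user's locale.
    # Assumption: file names do not contain newlines.
    find "$dir" -maxdepth 1 -type f -name '*.conf' | LC_ALL=C sort
}
```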
7
doc/todo/speed_up_git_annex_sync_--content_--all.mdwn
Normal file
@@ -0,0 +1,7 @@
Hello Joeyh,
Overall the performance of git-annex is good for me. However, one case where git-annex could improve is "git annex sync --content --all", which takes 20 minutes in my repo just to traverse all keys, without uploading or downloading anything. I've looked at the code (learning some Haskell along the way) and I think this is due to getting the location logs via git cat-file. I see two ways performance could be improved:

1. Use "git cat-file --batch-all-objects --unordered" and traverse the keys in whatever order that outputs the location logs.
2. Cache the location logs in the sqlite database.

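As a rough illustration of the first idea, `git cat-file --batch-all-objects --unordered` can be combined with `--batch-check` to enumerate every object in a repository without fetching contents. The sketch below only counts blobs. `count_blobs` is a hypothetical helper, it does not show how git-annex would map blobs back to keys, and it assumes a git new enough to support `--unordered`.

```shell
#!/bin/sh
# count_blobs REPO (hypothetical helper, for illustration only)
# Count the blob objects in the repository at REPO.
count_blobs() {
    # --batch-check prints "<object name> <type> <size>" per object;
    # --unordered lets git visit objects in an order that may be
    # cheaper than hash order.
    git -C "$1" cat-file --batch-all-objects --unordered --batch-check |
        awk '$2 == "blob" { n++ } END { print n + 0 }'
}
```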
Other than that, git-annex has really solved all my file syncing and archival needs and is just awesome!