Merge branch 'master' of ssh://git-annex.branchable.com
commit 1d335520df
4 changed files with 99 additions and 0 deletions
doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn (+47)
@@ -0,0 +1,47 @@
When debugging some ssh-related datalad tests that hang with newer
git-annex versions, I noticed that there was a regression in the
treatment of annex-ssh-options in c8fec6ab0 (Fix a minor bug that
caused options provided with -c to be passed multiple times to git,
2020-03-16).

Here's a demo script. Pointing `SSHURL` to any ssh-accessible annex
repo should do. In the case below, the target is an annex repo with
one commit and no files in the working tree.

[[!format sh """
SSHURL="smaug:/home/kyle/scratch/repo"

cd "$(mktemp -d ${TMPDIR:-/tmp}/gx-ssh-opts-XXXXXXX)"
git clone "$SSHURL" ./ >/dev/null 2>&1
git annex init \
    -c annex.sshcaching=false \
    -c remote.origin.annex-ssh-options="-o ControlMaster=auto -S CACHE" \
    --debug 2>&1 | grep 'read: ssh'
"""]]

With the parent of the above commit checked out (b166223d4), the
script outputs

```
[2020-06-30 11:09:43.853918422] read: ssh ["smaug","-o","ControlMaster=auto","-S","CACHE","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
```

With c8fec6ab0 checked out, it outputs

```
[2020-06-30 11:11:03.833678263] read: ssh ["smaug","-S",".git/annex/ssh/smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
[2020-06-30 11:11:04.448046366] read: ssh ["-O","stop","-S","smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","localhost"]
```

It looks like the options specified via
`remote.origin.annex-ssh-options` are dropped, and git-annex switches
to using its built-in ssh caching.

A recent commit on master (95b8b4a5a) shows the same behavior.

I've tried to work through the config-related handling and understand
why the condition from c8fec6ab0 results in the ssh options being
dropped, but I haven't made any progress yet.

[[!meta author=kyle]]
[[!tag projects/datalad]]

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://bmwiedemann.zq1.de/"
nickname="bmwiedemann"
avatar="http://cdn.libravatar.org/avatar/96f3cd71c3d677f31ed8f79ffb8fb343a8282c085731f405997ff3ef77a1a71b"
subject="comment 2"
date="2020-06-30T15:26:10Z"
content="""
We have been building openSUSE Haskell packages sequentially since 2017-07-14 for that reason:
<https://build.opensuse.org/package/rdiff/devel:languages:haskell/ghc-rpm-macros?linkrev=base&rev=79>

Here, non-determinism from filesystem readdir order is another, independent class of issue.
"""]]

@@ -0,0 +1,33 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 3"
date="2020-06-30T16:26:17Z"
content="""
[ I don't have a good understanding of the build issues here. I'm
sorry if this isn't relevant. ]

> I found, the build becomes reproducible, when using a filesystem
> with deterministic readdir order such as disorderfs with sort mode.

This reminded me of an issue with Guix's git-annex build:
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33922>. In that case,
the nondeterminism came from \"package database files that are
generated by ghc-pkg (where readdir is used and the result isn’t
sorted)\".

The fix on Guix's end, 5de93cdba7 (gnu: ghc-8: Patch ghc-pkg for
reproducibility, 2019-01-17), was at the level of the ghc package. It
replaced the following line in utils/ghc-pkg/Main.hs

```
confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) fs
```

with

```
confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) (sort fs)
```

"""]]
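The Guix change boils down to sorting the directory listing before filtering it, so that `confs` no longer depends on readdir order. A minimal Haskell sketch of that idea, under stated assumptions: `confFiles` and `listConfFiles` are hypothetical names, and the entries are assumed to come from `listDirectory` (the real ghc-pkg code may obtain `fs` differently). It only shows that two permutations of the same listing produce the same result.

```haskell
import Data.List (isSuffixOf, sort)
import System.Directory (listDirectory)
import System.FilePath ((</>))

-- Pure core (hypothetical name): sorting first makes the result
-- independent of the order in which the file system returned names.
confFiles :: FilePath -> [FilePath] -> [FilePath]
confFiles path fs = map (path </>) $ filter (".conf" `isSuffixOf`) (sort fs)

-- IO wrapper (assumption: names come from listDirectory, whose order
-- follows readdir and is not guaranteed to be stable).
listConfFiles :: FilePath -> IO [FilePath]
listConfFiles path = confFiles path <$> listDirectory path

main :: IO ()
main = do
  -- Two permutations of the same directory contents.
  let a = confFiles "db" ["b.conf", "a.conf", "cache"]
      b = confFiles "db" ["cache", "a.conf", "b.conf"]
  print a        -- ["db/a.conf","db/b.conf"] on POSIX
  print (a == b) -- True
```

Sorting is the simplest fix because the consumer only needs a stable order, not any particular one; a filesystem with sorted readdir, such as disorderfs in sort mode, gives the same stability from the other side.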
doc/todo/speed_up_git_annex_sync_--content_--all.mdwn (+7)
@@ -0,0 +1,7 @@
Hello Joeyh,
Overall the performance of git-annex is good for me. However, one case where git-annex could improve is "git annex sync --content --all": in my repo it takes 20 minutes just to traverse all keys, without uploading or downloading anything. I've looked at the code (learning some Haskell along the way) and I think the time goes into getting the location logs via git cat-file. I see two ways performance could be improved:

1. Use "git cat-file --batch-all-objects --unordered" and traverse the keys in whatever order it outputs the location logs.
2. Cache the location logs in the sqlite database.
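Option 1 has a wrinkle worth spelling out: `git cat-file --batch-all-objects` identifies each object only by its SHA, not by the path it has on the `git-annex` branch, so the key a location log belongs to has to be recovered another way. One possibility, sketched here in Haskell purely as an illustration and not as git-annex's actual code, is to list the branch once with `git ls-tree -r git-annex` and index blob SHAs to keys. It assumes location logs are stored as `<key>.log` under hashed directories; `parseLsTree`, `keyFromLogPath` and `shaToKeys` are hypothetical names, and the SHAs and keys in `main` are placeholders.

```haskell
import Data.List (isSuffixOf)
import qualified Data.Map.Strict as M
import System.FilePath (takeFileName)

-- Parse one line of `git ls-tree -r git-annex` output, whose format is
-- "<mode> SP <type> SP <object> TAB <path>". Non-blob entries are ignored.
parseLsTree :: String -> Maybe (String, FilePath)
parseLsTree line = case break (== '\t') line of
  (meta, '\t' : path) -> case words meta of
    [_mode, "blob", sha] -> Just (sha, path)
    _                    -> Nothing
  _ -> Nothing

-- Assumption: a per-key location log is named "<key>.log" inside the hashed
-- directories of the branch. Top-level files such as uuid.log contain no '/'
-- and are skipped, as are names that merely contain ".log" (e.g. ".log.met").
keyFromLogPath :: FilePath -> Maybe String
keyFromLogPath p
  | '/' `elem` p && ".log" `isSuffixOf` name =
      Just (take (length name - length ".log") name)
  | otherwise = Nothing
  where
    name = takeFileName p

-- Index blob SHA -> keys. Several keys can share one blob when their log
-- contents are byte-identical, so each SHA maps to a list of keys.
shaToKeys :: [String] -> M.Map String [String]
shaToKeys ls = M.fromListWith (++)
  [ (sha, [key])
  | Just (sha, path) <- map parseLsTree ls
  , Just key <- [keyFromLogPath path]
  ]

main :: IO ()
main = do
  -- Hypothetical tree listing with placeholder SHAs and keys.
  let sha c = replicate 40 c
      entry c path = "100644 blob " ++ sha c ++ "\t" ++ path
      idx = shaToKeys
        [ entry '1' "uuid.log"
        , entry '2' "0a1/2b3/KEY-A.log"
        , entry '2' "4c5/6d7/KEY-B.log"
        , entry '3' "0a1/2b3/KEY-A.log.met"
        ]
  -- Each `git cat-file --batch` header starts with the object's SHA,
  -- which is all that is needed to find the key(s) for that log.
  print (M.lookup (sha '2') idx) -- Just ["KEY-B","KEY-A"]
  print (M.lookup (sha '1') idx) -- Nothing
```

Blobs streamed in `--unordered` order could then be matched against this map; objects absent from it (older versions of a log, objects from other branches) would simply be skipped. Whether the extra tree walk pays off compared with per-key lookups would need measuring on a real repository.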

Other than that, git-annex has really solved all my file syncing and archival needs and is just awesome!