From 68549767e91b8b2d8de2b68a3ff7b997d85bfb15 Mon Sep 17 00:00:00 2001
From: "https://bmwiedemann.zq1.de/"
Date: Tue, 30 Jun 2020 15:26:10 +0000
Subject: [PATCH 1/4] Added a comment

---
 ...mment_2_a7d3117c18e0ae13d730071a8f2452d4._comment | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_2_a7d3117c18e0ae13d730071a8f2452d4._comment

diff --git a/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_2_a7d3117c18e0ae13d730071a8f2452d4._comment b/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_2_a7d3117c18e0ae13d730071a8f2452d4._comment
new file mode 100644
index 0000000000..ccc38e8d97
--- /dev/null
+++ b/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_2_a7d3117c18e0ae13d730071a8f2452d4._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="https://bmwiedemann.zq1.de/"
+ nickname="bmwiedemann"
+ avatar="http://cdn.libravatar.org/avatar/96f3cd71c3d677f31ed8f79ffb8fb343a8282c085731f405997ff3ef77a1a71b"
+ subject="comment 2"
+ date="2020-06-30T15:26:10Z"
+ content="""
+We are already building openSUSE haskell packages sequentially since 2017-07-14 for that reason:
+
+
+Here, non-determinism from filesystem readdir order is another independent class of issue.
+"""]]

From e6ca4cd0dff8ac2745d727ab9a124bc72698a84b Mon Sep 17 00:00:00 2001
From: Lukey
Date: Tue, 30 Jun 2020 15:46:57 +0000
Subject: [PATCH 2/4]

---
 doc/todo/speed_up_git_annex_sync_--content_--all.mdwn | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 doc/todo/speed_up_git_annex_sync_--content_--all.mdwn

diff --git a/doc/todo/speed_up_git_annex_sync_--content_--all.mdwn b/doc/todo/speed_up_git_annex_sync_--content_--all.mdwn
new file mode 100644
index 0000000000..4019122278
--- /dev/null
+++ b/doc/todo/speed_up_git_annex_sync_--content_--all.mdwn
@@ -0,0 +1,7 @@
+Hello Joeyh,
+Overall the performance of git-annex is good for me.
However, one case where git-annex could improve is with "git annex sync --content --all", as it takes 20 minutes just to traverse all keys without uploading/downloading anything in my repo. I've looked at the code (learning some haskell along the way) and I think it's due to getting the location logs via git cat-file. I see two ways how performance could be improved:
+
+1. Use "git cat-file --batch-all-objects --unordered" and traverse the keys in whatever order that outputs the location logs.
+2. Cache the location logs in the sqlite database
+
+Other than that, git-annex has really solved all my file syncing and archival needs and is just awesome!

From e4e28452a0b3e9ddd0bfd723df2934abed73ab47 Mon Sep 17 00:00:00 2001
From: kyle
Date: Tue, 30 Jun 2020 15:48:03 +0000
Subject: [PATCH 3/4] annex-ssh-options regression

---
 ...-ssh-options_dropped_since_8.20200330.mdwn | 47 +++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn

diff --git a/doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn b/doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn
new file mode 100644
index 0000000000..64a0214c09
--- /dev/null
+++ b/doc/bugs/annex-ssh-options_dropped_since_8.20200330.mdwn
@@ -0,0 +1,47 @@
+When debugging some ssh-related datalad tests that hang with newer
+git-annex versions, I noticed that there was a regression in the
+treatment of annex-ssh-options in c8fec6ab0 (Fix a minor bug that
+caused options provided with -c to be passed multiple times to git,
+2020-03-16).
+
+Here's a demo script. Pointing `SSHURL` to any ssh-accessible annex
+repo should do. In the case below, the target is an annex repo with
+one commit and no files in the working tree.
+
+[[!format sh """
+SSHURL="smaug:/home/kyle/scratch/repo"
+
+cd "$(mktemp -d ${TMPDIR:-/tmp}/gx-ssh-opts-XXXXXXX)"
+git clone "$SSHURL" ./ >/dev/null 2>&1
+git annex init \
+ -c annex.sshcaching=false \
+ -c remote.origin.annex-ssh-options="-o ControlMaster=auto -S CACHE" \
+ --debug 2>&1 | grep 'read: ssh'
+"""]]
+
+With the parent of the above commit checked out (b166223d4), the
+script outputs
+
+```
+[2020-06-30 11:09:43.853918422] read: ssh ["smaug","-o","ControlMaster=auto","-S","CACHE","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
+```
+
+With c8fec6ab0 checked out, it outputs
+
+```
+[2020-06-30 11:11:03.833678263] read: ssh ["smaug","-S",".git/annex/ssh/smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","-n","-T","git-annex-shell 'configlist' '/home/kyle/scratch/repo' '--debug'"]
+[2020-06-30 11:11:04.448046366] read: ssh ["-O","stop","-S","smaug","-o","ControlMaster=auto","-o","ControlPersist=yes","localhost"]
+```
+
+It looks like the options specified via
+`remote.origin.annex-ssh-options` are dropped, and git-annex switches
+to using its built-in ssh caching.
+
+A recent commit on master (95b8b4a5a) shows the same behavior.
+
+I've tried to work through the config-related handling and understand
+why the condition from c8fec6ab0 results in the ssh options being
+dropped, but I haven't made any progress yet.
+
+[[!meta author=kyle]]
+[[!tag projects/datalad]]

From 59e64842a7acd8002ef1c427c3471c72d8e82562 Mon Sep 17 00:00:00 2001
From: kyle
Date: Tue, 30 Jun 2020 16:26:17 +0000
Subject: [PATCH 4/4] Added a comment

---
 ..._1a8277f3db09f49c076f5938da2b4390._comment | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_3_1a8277f3db09f49c076f5938da2b4390._comment

diff --git a/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_3_1a8277f3db09f49c076f5938da2b4390._comment b/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_3_1a8277f3db09f49c076f5938da2b4390._comment
new file mode 100644
index 0000000000..b5ff287f67
--- /dev/null
+++ b/doc/bugs/git-annex_does_not_build_reproducibly_from_readdir_order/comment_3_1a8277f3db09f49c076f5938da2b4390._comment
@@ -0,0 +1,33 @@
+[[!comment format=mdwn
+ username="kyle"
+ avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
+ subject="comment 3"
+ date="2020-06-30T16:26:17Z"
+ content="""
+[ I don't have a good understanding of the build issues here. I'm
+ sorry if this isn't relevant. ]
+
+> I found, the build becomes reproducible, when using a filesystem
+> with deterministic readdir order such as disorderfs with sort mode.
+
+This reminded me of an issue with Guix's git-annex build:
+. In that case,
+the nondeterminism came from \"package database files that are
+generated by ghc-pkg (where readdir is used and the result isn’t
+sorted)\".
+
+The fix on Guix's end, 5de93cdba7 (gnu: ghc-8: Patch ghc-pkg for
+reproducibility, 2019-01-17), was at the level of the ghc package. It
+replaced the following line in utils/ghc-pkg/Main.hs
+
+```
+confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) fs
+```
+
+with
+
+```
+confs = map (path </>) $ filter (\".conf\" `isSuffixOf`) (sort fs)
+```
+
+"""]]