Added a comment

This commit is contained in:
jkniiv 2021-09-25 05:47:16 +00:00 committed by admin
parent 177de6f8e5
commit c688d549d7

View file

@ -0,0 +1,121 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 2"
date="2021-09-25T05:47:15Z"
content="""
This is really odd. I tried to reproduce the issue in a new git annex repo and if I remember correctly I couldn't
-- even on Windows. It must be something that has crept up during the usage of my specific annex meant for
storing whole-system images/backups. Most definitely it does not display \"(checksum...)\" at any point. Nor does
running my git-annex commands while the environment variable `GIT_TRACE=1` is set show any evidence of git running
the smudge/clean filters.
I prepared a minimal version of my repo by way of the tip [[tips/splitting_a_repository]]. It does not have any
preferred-content (or required) expressions active as far as I know. It includes only one \"small\" (9GB) backup file,
the differential backup named `9BEAE03792B9FAFB-01-01.mrimg`. This repo and its clones exhibit all the same symptoms
reported above and most notably also on Linux with a ntfs-3g mount. In fact -- after some iteration -- I wrote
a bash script to test it out with strace(1):
[[!format sh \"\"\"
#! /usr/bin/env bash
set -eu
set -o pipefail
file_under_test=9BEAE03792B9FAFB-01-01.mrimg
strace_log_file_pre=strace.git-annex--op-
strace_log_file_post=.LOG
strace_flags='-y -f -qq -tt'
debug_log_file_pre=git-annex--op-
debug_log_file_post=.LOG
echo '* Running/strace(1)ing `git annex get` and `git annex sync -C` while targeting the same file ($file_under_test).'
echo '* We also `git annex drop` the file before each operation.'
echo '*'
echo '* Depending on the file this might take a while :)'
echo '*'
# op: get
##
echo '* op: get'
echo
git annex drop ${file_under_test}
strace -e trace=read ${strace_flags} -o ${strace_log_file_pre}get${strace_log_file_post} \
/bin/time git annex get ${file_under_test} \
--debug 2> ${debug_log_file_pre}get${debug_log_file_post}
annexobj=$(git annex find --format='.git/annex/objects/${hashdirlower}${backend}-s${bytesize}--${keyname}/${backend}-s${bytesize}--${keyname}\n' ${file_under_test})
echo '* Number of read(2) file operation of annex object during `get` sub-command:' \
$(fgrep \"${annexobj}\" ${strace_log_file_pre}get${strace_log_file_post} | wc -l)
echo
# op: sync -C
##
echo '* op: sync -C'
echo
git annex drop ${file_under_test}
strace -e trace=read ${strace_flags} -o ${strace_log_file_pre}sync-C${strace_log_file_post} \
/bin/time git annex sync --no-commit --no-push --no-pull -C ${file_under_test} \
--debug 2> ${debug_log_file_pre}sync-C${debug_log_file_post}
annexobj=$(git annex find --format='.git/annex/objects/${hashdirlower}${backend}-s${bytesize}--${keyname}/${backend}-s${bytesize}--${keyname}\n' ${file_under_test})
echo '* Number of read(2) file operation of annex object during `sync -C` sub-command:' \
$(fgrep \"${annexobj}\" ${strace_log_file_pre}sync-C${strace_log_file_post} | wc -l)
echo
echo '* Finished.'
# end of script
\"\"\"
]]
One such test run on Ubuntu 20.04 (git-annex version 8.20210804-gab7b5a492) shows the following:
[[!format sh \"\"\"
jkniiv@ubuntu:/media/veracrypt1/Reflect-varmistukset-test@issue2$ _scripts/exemplify-issue.git-annex--get-vs-sync-C.sh
* Running/strace(1)ing `git annex get` and `git annex sync -C` whilst targeting the same file ($file_under_test).
* We also `git annex drop` the file before each operation.
*
* Depending on the file this might take a while :)
*
* op: get
get 9BEAE03792B9FAFB-01-01.mrimg (from origin...)
ok
(recording state in git...)
* Number of read(2) file operation of annex object during `get` sub-command: 296845
* op: sync -C
drop 9BEAE03792B9FAFB-01-01.mrimg ok
(recording state in git...)
get 9BEAE03792B9FAFB-01-01.mrimg (from origin...)
ok
(recording state in git...)
(recording state in git...)
* Number of read(2) file operation of annex object during `sync -C` sub-command: 593690
* Finished.
jkniiv@ubuntu:/media/veracrypt1/Reflect-varmistukset-test@issue2$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
296845 * 2
593690
\"\"\"
]]
So the `sync -C` operation above seems to issue exactly double the amounts of reads compared to the `get` operation.
The `get` op doesn't seems to do any reads of the annex object on Windows but I suppose on Linux what it does is
a `tailVerify` instead of passing the file through a pipe or sth.
"""]]