Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2021-09-25 11:17:45 -04:00
commit a2222b5259
GPG key ID: DB12DB0FF05F8F38
3 changed files with 145 additions and 0 deletions


@ -0,0 +1,121 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 2"
date="2021-09-25T05:47:15Z"
content="""
This is really odd. I tried to reproduce the issue in a new git-annex repo and, if I remember correctly, I couldn't,
even on Windows. It must be something that has crept in during the use of my specific annex, which is meant for
storing whole-system images/backups. It most definitely does not display \"(checksum...)\" at any point. Nor does
running my git-annex commands with the environment variable `GIT_TRACE=1` set show any evidence of git running
the smudge/clean filters.
I prepared a minimal version of my repo by way of the tip [[tips/splitting_a_repository]]. It does not have any
preferred-content (or required) expressions active as far as I know. It includes only one \"small\" (9 GB) backup file,
the differential backup named `9BEAE03792B9FAFB-01-01.mrimg`. This repo and its clones exhibit all the same symptoms
reported above, most notably also on Linux with an ntfs-3g mount. In fact, after some iteration, I wrote
a bash script to test it out with strace(1):
[[!format sh \"\"\"
#! /usr/bin/env bash
set -eu
set -o pipefail
file_under_test=9BEAE03792B9FAFB-01-01.mrimg
strace_log_file_pre=strace.git-annex--op-
strace_log_file_post=.LOG
strace_flags='-y -f -qq -tt'
debug_log_file_pre=git-annex--op-
debug_log_file_post=.LOG
echo '* Running/strace(1)ing `git annex get` and `git annex sync -C` while targeting the same file ($file_under_test).'
echo '* We also `git annex drop` the file before each operation.'
echo '*'
echo '* Depending on the file this might take a while :)'
echo '*'
# op: get
##
echo '* op: get'
echo
git annex drop ${file_under_test}
strace -e trace=read ${strace_flags} -o ${strace_log_file_pre}get${strace_log_file_post} \
/bin/time git annex get ${file_under_test} \
--debug 2> ${debug_log_file_pre}get${debug_log_file_post}
annexobj=$(git annex find --format='.git/annex/objects/${hashdirlower}${backend}-s${bytesize}--${keyname}/${backend}-s${bytesize}--${keyname}\n' ${file_under_test})
echo '* Number of read(2) file operation of annex object during `get` sub-command:' \
$(fgrep \"${annexobj}\" ${strace_log_file_pre}get${strace_log_file_post} | wc -l)
echo
# op: sync -C
##
echo '* op: sync -C'
echo
git annex drop ${file_under_test}
strace -e trace=read ${strace_flags} -o ${strace_log_file_pre}sync-C${strace_log_file_post} \
/bin/time git annex sync --no-commit --no-push --no-pull -C ${file_under_test} \
--debug 2> ${debug_log_file_pre}sync-C${debug_log_file_post}
annexobj=$(git annex find --format='.git/annex/objects/${hashdirlower}${backend}-s${bytesize}--${keyname}/${backend}-s${bytesize}--${keyname}\n' ${file_under_test})
echo '* Number of read(2) file operation of annex object during `sync -C` sub-command:' \
$(fgrep \"${annexobj}\" ${strace_log_file_pre}sync-C${strace_log_file_post} | wc -l)
echo
echo '* Finished.'
# end of script
\"\"\"
]]
One such test run on Ubuntu 20.04 (git-annex version 8.20210804-gab7b5a492) shows the following:
[[!format sh \"\"\"
jkniiv@ubuntu:/media/veracrypt1/Reflect-varmistukset-test@issue2$ _scripts/exemplify-issue.git-annex--get-vs-sync-C.sh
* Running/strace(1)ing `git annex get` and `git annex sync -C` whilst targeting the same file ($file_under_test).
* We also `git annex drop` the file before each operation.
*
* Depending on the file this might take a while :)
*
* op: get
get 9BEAE03792B9FAFB-01-01.mrimg (from origin...)
ok
(recording state in git...)
* Number of read(2) file operation of annex object during `get` sub-command: 296845
* op: sync -C
drop 9BEAE03792B9FAFB-01-01.mrimg ok
(recording state in git...)
get 9BEAE03792B9FAFB-01-01.mrimg (from origin...)
ok
(recording state in git...)
(recording state in git...)
* Number of read(2) file operation of annex object during `sync -C` sub-command: 593690
* Finished.
jkniiv@ubuntu:/media/veracrypt1/Reflect-varmistukset-test@issue2$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
296845 * 2
593690
\"\"\"
]]
So the `sync -C` operation above seems to issue exactly double the number of reads compared to the `get` operation.
The `get` op doesn't seem to do any reads of the annex object on Windows, but I suppose on Linux what it does is
a `tailVerify` instead of passing the file through a pipe or something.
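
The comparison can also be done without `bc`. This is only a minimal sketch: `count_reads` and `ratio` are hypothetical helper names introduced here for illustration, not git-annex commands, and the sketch assumes the strace log format produced by the script above (one line per `read` call, with the file path included thanks to `-y`).

```shell
#!/usr/bin/env bash
set -eu

# count_reads PATTERN LOGFILE  (hypothetical helper)
# Print how many lines of LOGFILE contain PATTERN as a fixed string.
# grep -c exits with status 1 when the count is 0, so that case is tolerated.
count_reads() {
    grep -cF -- "$1" "$2" || true
}

# ratio A B  (hypothetical helper)
# Print A divided by B with two decimal places; B must be non-zero.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# The two counts reported in the test run above: sync -C versus get.
ratio 593690 296845   # prints 2.00
```

With real logs, the two arguments to `ratio` would come from `count_reads` run against the `sync-C` and `get` strace logs, using the annex object path as the pattern.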
"""]]


@ -0,0 +1,13 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 3"
date="2021-09-25T06:05:40Z"
content="""
Joey, would you be interested in taking a look at my minimal test case if I could send you the repo somehow?
I could just forcefully `drop` the last sensitive file in the annex, add an innocuous (say, 100 MB) random file
to the annex instead, and then zip the repo up (with 7z or fsarchiver) and send it to you by whatever means,
if that would be fine with you. I know this is an odd corner case, but I would definitely be grateful if
you could oblige this hell-bent Windows user here. :)
"""]]


@ -0,0 +1,11 @@
[[!comment format=mdwn
username="bmx007@171b90624bc8f788a2a925a00b98aef5942e4787"
nickname="bmx007"
avatar="http://cdn.libravatar.org/avatar/d3a5bd12fe6d876527a3cf4ac0de5fc6"
subject="comment 2"
date="2021-09-24T20:08:22Z"
content="""
I am using `git annex diff-driver`. What do you mean by unlocked?
The file `$1` is indeed unlocked, but it works; the problem is `$2`, which is from `HEAD`.
I'm not sure how a file from the git history could be unlocked?
"""]]