This commit is contained in:
Joey Hess 2021-09-06 15:32:15 -04:00
parent 9b2c88cdaf
commit 0dc203b7c5
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,73 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-09-06T18:52:21Z"
content="""
I tried to reproduce this on Linux, and cannot:
joey@darkstar:~/tmp/r>/usr/bin/time git-annex get bigfile
get bigfile (from origin...)
ok
(recording state in git...)
11.66user 2.08system 0:13.76elapsed 99%CPU (0avgtext+0avgdata 99012maxresident)k
206440inputs+2048856outputs (46major+21865minor)pagefaults 0swaps
joey@darkstar:~/tmp/r>git annex drop
drop bigfile ok
(recording state in git...)
joey@darkstar:~/tmp/r>/usr/bin/time git annex sync --no-commit --no-push --no-pull --content-of bigfile
get bigfile (from origin...)
ok
(recording state in git...)
11.43user 1.91system 0:13.43elapsed 99%CPU (0avgtext+0avgdata 97132maxresident)k
60016inputs+2048312outputs (25major+11680minor)pagefaults 0swaps
One reason it may do an extra checksum is if for some reason the inode
cache is stale. Since [[!commit 3b5a3e168d8decd196509ad582ad4b8795d979a6]]
it will check that and checksum before starting to send the content.
In this case, it will display "(checksum...)" before the progress display
has started.
One way I can reproduce that, on Linux, is to touch the annex object file
in the remote git repository before running git-annex. This is the result:
joey@darkstar:~/tmp/r>/usr/bin/time git-annex get bigfile
get bigfile (from origin...) (checksum...)
ok
(recording state in git...)
22.06user 2.22system 0:24.85elapsed 97%CPU (0avgtext+0avgdata 97192maxresident)k
704inputs+2048432outputs (33major+10701minor)pagefaults 0swaps
But that shows that git-annex get also behaves that way, which does not
explain why only git-annex sync would have the problem for you.
(And the two run the same code; sync literally runs code from get.) Also
I don't know why the inode cache would be getting stale on windows.
If it does not display "(checksum...)" before the progress display,
it's certianly not that. And if it does not display "(checksum...)"
at any point at all, that would be good to know, because normally
git-annex does display that when checksumming.
Your transcript seems to show no such output.
At the "2021-08-27 00:15:36.6868986" timestamp in your log, it's just
completed updating the location log, and you identify the pause after that
as where it's checksumming. So that seems to suggest it's already
transferred the content there, and updated the location log, before
checksumming. But it doesn't do anything normally
after updating the location log, and it won't update the location log
before it's satisfied it transferred the right content, so after any
checksumming.
Possibly `git annex sync` is trying to drop the content from some
remote due to its preferred content setting no longer wanting the content
to be there. Do you have preferred content expressions configured?
(I'm doubtful about this theaory because I don't think it ever hashes the
content during a drop, but one difference between `get` and `sync`
is that sync drops..)
Only other thing I can think that it could be is perhaps git is running
the smudge/clean filter on the content, after the transfer, and git-annex
hashes it again when run that way. Hard for me to tell from the filespray
log if that might be the case. Setting `GIT_TRACE=1` would show if this
happens.
"""]]