Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2023-06-02 12:14:03 -04:00
commit b8750bcb17
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="jgoerzen"
avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4"
subject="comment 8"
date="2023-06-02T03:25:27Z"
content="""
I recompiled git-annex with the latest master, and I'm seeing the same behavior as before.
I attached strace to the process for 10 seconds and ran some statistics:
```
wc -l /tmp/foo
106710 /tmp/foo
egrep 'pread64.14' /tmp/foo | wc -l
103228
```
fd 14 corresponds to the db file. These preads are all 4k, which matches the Sqlite page size on this database.
There were about a dozen writes; they were to an `[eventfd]` fd. I'm not sure what that is used for here.
The rest were mainly calles to futex, reads from the timerfd 4, and epolls.
If it were doing one pread64 per file, at a rate of roughly 10,000 per second, it should query for every file in about 15 seconds. But as I write this, the process has used 84 minutes of CPU time. Either it is making many queries for each file, or there is some pathology in sqlite causing the queries to explode into a vast number of individual reads.
It seems to me there is some meaningful difference between my situation and yours.
One thing I have noticed is that it is getting permission denied on about a dozen files out of the test set. However, I don't believe this could have been the case with my original test set (consisting of about 150,000 photos).
I am happy to do any kind of debugging I can that would be helpful.
I note that the Sqlite default cache is about 2MB; I don't know if a larger cache may be helpful. I suspect doing fewer syscalls may help, but the underlying problem of numerous queries would still exist.
"""]]