From 1836f7a3186853e3531e74c763e26e15b2757548 Mon Sep 17 00:00:00 2001 From: "branchable@bafd175a4b99afd6ed72501042e364ebd3e0c45e" Date: Thu, 16 Jul 2020 16:32:41 +0000 Subject: [PATCH 1/3] --- ...ilter_drastically_slows_down_git_diff.mdwn | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 doc/bugs/smudge_filter_drastically_slows_down_git_diff.mdwn diff --git a/doc/bugs/smudge_filter_drastically_slows_down_git_diff.mdwn b/doc/bugs/smudge_filter_drastically_slows_down_git_diff.mdwn new file mode 100644 index 0000000000..7f05e76bdf --- /dev/null +++ b/doc/bugs/smudge_filter_drastically_slows_down_git_diff.mdwn @@ -0,0 +1,116 @@ +### Please describe the problem. + +Running `git diff` on a small (v8) repo is extremely slow. + +I see [[todo/Long_Running_Filter_Process]] so this is maybe a known issue, but I thought I'd report just in case, and hopefully the extra data is useful. + +### What steps will reproduce the problem? + +Just run `git diff` with some unstaged changes: + +``` +$ diff --git a/README.md b/README.md +index 6532edf..2edc620 100644 +--- a/README.md ++++ b/README.md +@@ -42,3 +42,4 @@ General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . ++foo +git diff 0.17s user 0.07s system 115% cpu 0.204 total + +``` + +We can see that execve(2) is called 77 times, and there's a lot of I/O: + +``` +$ ( strace -c -w -f git diff 3>&1 >/dev/null 2>&3- ) | grep -v strace | head -n 15 +% time seconds usecs/call calls errors syscall +------ ----------- ----------- --------- --------- ---------------- + 42.18 3.777959 1397 2704 27 read + 19.27 1.725834 13697 126 50 wait4 + 16.30 1.460262 3023 483 64 futex + 8.24 0.737923 92240 8 poll + 8.04 0.720108 11430 63 epoll_wait + 1.36 0.122156 22 5387 3519 openat + 0.87 0.078340 20 3812 2858 stat + 0.60 0.053536 23 2243 mmap + 0.52 0.046553 20 2319 16 close + 0.42 0.037637 19 1890 fstat + 0.41 0.037030 480 77 execve + 0.26 0.023351 19 1207 4 rt_sigaction + 0.17 0.014840 19 760 rt_sigprocmask +``` + +If I comment out the smudge filter in `.git/config`, it becomes instant: + +``` +git diff 0.00s user 0.00s system 108% cpu 0.005 total +``` + +and we see only one `execve(2)` and a lot less I/O: + +``` +% time seconds usecs/call calls errors syscall +------ ----------- ----------- --------- --------- ---------------- + 20.42 0.001402 18 75 lstat + 15.06 0.001034 19 53 15 openat + 11.25 0.000773 16 48 fstat + 11.16 0.000766 20 38 read + 9.49 0.000652 16 39 close + 8.40 0.000577 16 35 mmap + 4.95 0.000340 18 18 11 access + 4.49 0.000308 308 1 execve + 2.70 0.000185 23 8 getdents64 + 2.62 0.000180 20 9 mprotect + 2.62 0.000180 17 10 stat + 1.11 0.000076 15 5 brk + 1.08 0.000074 24 3 munmap +``` + +0.17s might not sound like a lot, but when there are many unstaged changes this becomes painful. For example, it changes a git status operation in one of the popular git GUIs (magit) from being instant to taking 6 seconds just for a few one-line changes in a small repository. + +It's worth noting that this repo doesn't even have any annexed files - `annex init` was only run recently: + +``` +$ git annex info +trusted repositories: 0 +semitrusted repositories: 7 + 00000000-0000-0000-0000-000000000001 -- web + 00000000-0000-0000-0000-000000000002 -- bittorrent + 31167abf-8d32-4b40-96cf-4a3310fd1b10 -- arabian + 6371770a-303e-474c-9209-2c1565b25608 -- adam@pacific:~/.config/mr [pacific] + da95efd3-09f9-432b-ae28-bc4c5891c062 -- adam@aegean:~/.config/mr [aegean] + e0112d70-c2b3-44df-9057-887eefe5d225 -- adam@ionian:~/.config/mr [here] + ec80d559-508f-445c-a1fd-48f2805315b0 -- my.personal.domain.org +untrusted repositories: 0 +transfers in progress: none +available local disk space: 1 terabyte (+1 megabyte reserved) +local annex keys: 0 +local annex size: 0 bytes +annexed files in working tree: 0 +size of annexed files in working tree: 0 bytes +bloom filter size: 32 mebibytes (0% full) +backend usage: +``` + +### What version of git-annex are you using? On what operating system? + +``` +git-annex version: 8.20200618-g4c44deb2c +build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite +dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external +operating system: linux x86_64 +supported repository versions: 8 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 +local repository version: 8 +``` + +Running on openSUSE Tumbleweed. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes, mostly a happy user ;-) From 059d56659bc60699eea0f5e8f804922011afed90 Mon Sep 17 00:00:00 2001 From: "branchable@bafd175a4b99afd6ed72501042e364ebd3e0c45e" Date: Thu, 16 Jul 2020 17:06:13 +0000 Subject: [PATCH 2/3] Added a comment: GIT_TRACE=1 debug shows smudge --clean is run twice per file --- ..._249cf95263d05cb6ed8ca4d895d58c54._comment | 36 +++++++++++++++++++ 1 file changed, 36 insertions(+) create mode 100644 doc/bugs/smudge_filter_drastically_slows_down_git_diff/comment_1_249cf95263d05cb6ed8ca4d895d58c54._comment diff --git a/doc/bugs/smudge_filter_drastically_slows_down_git_diff/comment_1_249cf95263d05cb6ed8ca4d895d58c54._comment b/doc/bugs/smudge_filter_drastically_slows_down_git_diff/comment_1_249cf95263d05cb6ed8ca4d895d58c54._comment new file mode 100644 index 0000000000..56b08e612a --- /dev/null +++ b/doc/bugs/smudge_filter_drastically_slows_down_git_diff/comment_1_249cf95263d05cb6ed8ca4d895d58c54._comment @@ -0,0 +1,36 @@ +[[!comment format=mdwn + username="branchable@bafd175a4b99afd6ed72501042e364ebd3e0c45e" + nickname="branchable" + avatar="http://cdn.libravatar.org/avatar/ae41dba34ee6000056f00793c695be75" + subject="GIT_TRACE=1 debug shows smudge --clean is run twice per file" + date="2020-07-16T17:06:12Z" + content=""" +I just realised that `GIT_TRACE=1 git diff` gives much more useful debug, and reveals that `smudge --clean` is run twice per file, as shown below. + +Based on [[todo/Long_Running_Filter_Process]], running it once per file would be expected (although somewhat unfortunate), but twice seems more like a bug, especially given that it causes the exact same set of git commands to be run twice. Or is there a good reason for that? + +``` +18:00:20.218359 git.c:442 trace: built-in: git diff +18:00:20.219470 run-command.c:663 trace: run_command: unset GIT_PAGER_IN_USE; LV=-c 'less -h100 -i -M -q -R -W -y100 -X -F -S' +18:00:20.221554 run-command.c:663 trace: run_command: 'git-annex smudge --clean -- '\''README.md'\''' +18:00:20.358194 git.c:442 trace: built-in: git config --null --list +18:00:20.376580 git.c:442 trace: built-in: git cat-file --batch +18:00:20.377090 git.c:442 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' +18:00:20.395429 git.c:442 trace: built-in: git show-ref git-annex +18:00:20.413485 git.c:442 trace: built-in: git show-ref --hash refs/heads/git-annex +18:00:20.430842 git.c:442 trace: built-in: git log refs/heads/git-annex..f8cdef27e7fae6b3cc312130b86aa3c553d2ff3b '--pretty=%H' -n1 +18:00:20.451166 git.c:442 trace: built-in: git cat-file --batch +18:00:20.456687 git.c:442 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' +18:00:20.467209 git.c:442 trace: built-in: git check-attr -z --stdin annex.backend annex.numcopies annex.largefiles -- +18:00:20.474348 run-command.c:663 trace: run_command: 'git-annex smudge --clean -- '\''README.md'\''' +18:00:20.607471 git.c:442 trace: built-in: git config --null --list +18:00:20.620552 git.c:442 trace: built-in: git cat-file --batch +18:00:20.626814 git.c:442 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' +18:00:20.639210 git.c:442 trace: built-in: git show-ref git-annex +18:00:20.656755 git.c:442 trace: built-in: git show-ref --hash refs/heads/git-annex +18:00:20.675204 git.c:442 trace: built-in: git log refs/heads/git-annex..f8cdef27e7fae6b3cc312130b86aa3c553d2ff3b '--pretty=%H' -n1 +18:00:20.692108 git.c:442 trace: built-in: git cat-file --batch +18:00:20.706338 git.c:442 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' +18:00:20.708876 git.c:442 trace: built-in: git check-attr -z --stdin annex.backend annex.numcopies annex.largefiles -- +``` +"""]] From 77299ae6e533f4808c9cc0eb201eedb638afbe8b Mon Sep 17 00:00:00 2001 From: Ilya_Shlyakhter Date: Thu, 16 Jul 2020 17:30:55 +0000 Subject: [PATCH 3/3] Added a comment: external backends --- ...mment_10_4c0f8a338a58cef8e239613292d5aa38._comment | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 doc/todo/external_backends/comment_10_4c0f8a338a58cef8e239613292d5aa38._comment diff --git a/doc/todo/external_backends/comment_10_4c0f8a338a58cef8e239613292d5aa38._comment b/doc/todo/external_backends/comment_10_4c0f8a338a58cef8e239613292d5aa38._comment new file mode 100644 index 0000000000..06d29cd0bf --- /dev/null +++ b/doc/todo/external_backends/comment_10_4c0f8a338a58cef8e239613292d5aa38._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="Ilya_Shlyakhter" + avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0" + subject="external backends" + date="2020-07-16T17:30:55Z" + content=""" +\"the user trusts the backend program as much as they do the git-annex program, when it comes to whether a hash is cryptographically secure\" -- I'd trust git-annex more because it has many more users than any one niche backend program, so gets more testing and scrutiny. + +\"just have a name registry on this site, first come first served\" -- seems fair; there may be more complex/robust solutions but doubt they're needed in practice. + +"""]]