rename an old closed bug to avoid filename too long on windows checkout
parent cd3c5afffd
commit 7f652c5a22
12 changed files with 0 additions and 0 deletions
@ -0,0 +1,15 @
[[!comment format=mdwn
username="joey"
subject="""comment 10"""
date="2020-04-20T17:24:53Z"
content="""
Implemented the cat-file pool. Capped at 2 cat-files of each distinct type,
so it will start a max of 8 no matter the -J level.

(Although cat-file can also be run in those repositories so there will be
more then.)

While testing, I noticed git-annex drop -Jn starts n git check-attr
processes, so the same thing ought to be done with them. Leaving this bug open
for that, but I do think that the problem you reported should be fixed now.
"""]]
@ -0,0 +1,7 @
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2020-04-21T15:23:57Z"
content="""
check-attr and check-ignore also converted to resource pools
"""]]
@ -0,0 +1,26 @
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-03-16T16:13:00Z"
content="""
The obvious question to ask, which I can't really imagine making any
progress without an answer to: What files did git-annex have open?

I did notice that of the two git-annex logs, one got 19 files before
failing, while the other got 27. It seems unlikely that, if git-annex, or
an external remote, or git, or whatever is somehow leaking file handles,
it would leak different numbers at different times. Which leads to the
second question: What else on the system has files open and how many?

OSX has a global limit of 12k open files, and a per process limit of 10k.

`git-annex get` on linux needs to open around 16 files per file it
downloads. So if git-annex were somehow leaking every single open FD,
it would successfully download over 600 files before hitting the
per-process limit. If every subprocess git-annex forks also leaked every
open FD, it would of course vary by remote, but with a regular git clone
on the local filesystem, the number of files opened per get is still only
62, so still over an order of magnitude less.
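
The arithmetic above can be checked with a few lines of shell. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the figures (a 10k per-process limit, 16 and 62 files per get) are the rough estimates quoted in this comment, not measurements.

```shell
#!/bin/sh
# All numbers are the rough estimates from the comment above.
per_process_limit=10000   # "10k" per-process open file limit
fds_per_get=16            # files opened by git-annex itself per downloaded file
fds_per_get_all=62        # same, also counting every forked subprocess

# Downloads that would succeed if every opened FD were leaked.
leak_self=$((per_process_limit / fds_per_get))
leak_all=$((per_process_limit / fds_per_get_all))

echo "leaking only git-annex's own FDs: $leak_self gets before the limit"
echo "leaking subprocess FDs as well:   $leak_all gets before the limit"
echo "observed failures: after 19 and 27 gets"
```

Either way, the computed figure is well above the 19 and 27 files at which the two logs failed.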

Seems much more likely that the system is unhappy for some other reason.
"""]]
@ -0,0 +1,26 @
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="it is many more "open files" in reality"
date="2020-04-16T03:41:07Z"
content="""
Michael has reported [a similar issue on Linux](https://github.com/datalad/datalad/issues/4404). I was initially also \"skeptical\". But the reason really is that each git-annex process takes HUNDREDS of open files (dynamic libraries etc), and parallel execution of `get` adds a good number of pipes on top (counted ~3000 for `get -J 8` process). I thought to investigate more before reporting and then randomly ran into this not so old report from myself ;)

A quick demo:
[[!format sh \"\"\"
$> echo BEFORE; lsof | grep annex | nl | tail -n 2; git clone http://datasets.datalad.org/allen-brain-observatory/visual-coding-neuropixels/ecephys-cache/.git && cd ecephys-cache && git annex get -J5 * >/dev/null & p=$! && sleep 3 && echo DURING && lsof | grep annex | nl | tail -n 2; kill %1
BEFORE
Cloning into 'ecephys-cache'...
remote: Counting objects: 5875, done.
remote: Compressing objects: 100% (3046/3046), done.
remote: Total 5875 (delta 2335), reused 4599 (delta 1424)
Receiving objects: 100% (5875/5875), 73.55 MiB | 30.39 MiB/s, done.
Resolving deltas: 100% (2335/2335), done.
Checking out files: 100% (573/573), done.
[1] 17335
DURING
2242 git 17424 yoh 67w REG 9,1 40018173 131020 /tmp/ecephys-cache/.git/annex/tmp/SHA512E-s665395296--8327d0715923b88a2b6b179d02a40acb1630e420a73a16a3422b6b245e9c0e57e21529919136492ab2c746256f99831200c36b7e071ea24f25abb37efc28de13.h5
2243 git 17424 yoh 68w REG 9,1 67095741 131021 /tmp/ecephys-cache/.git/annex/tmp/SHA512E-s166348896--3bb739a0df1acd478eb84545a7c22c31933458fcd44ce211d4dd555bc979170bef11126064fed730e3b289d41999cc1c6fb0b6c35870bb996a4faa2e34a75403.h5
\"\"\"]]
So, 3 seconds after the initial call with `-J5`, I get over 2k open files used by annex (according to grep; some may have escaped matching).
"""]]
@ -0,0 +1,8 @
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2020-04-16T03:43:13Z"
content="""
Sorry, forgot to mention: I do not think I spotted any file descriptor leaking. The number of used file descriptors (according to `lsof`) fluctuated as the process kept going, but was not steadily growing.
"""]]
@ -0,0 +1,55 @
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-04-17T16:37:07Z"
content="""
I'm seeing a lot of git cat-file processes, not a lot of any other process.

Each -J increment adds 3 threads for the different command stages
(start, perform, cleanup). Each thread might need a git cat-file
run with either of two different parameters, and on either of two different
index files. (Both are needed for unlocked files, only one for locked
files.)

So, 5x3x2x2=60 copies of git cat-file max for -J5.
And experimentally, that's exactly how many I see in the worst case
repo where all files are unlocked. (Plus 4 which I think are owned by
the controlling thread or something). Using your test case, I am seeing 44.
So I don't think there's a subprocess leak here.
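
The worst-case count above is just a product, sketched here in shell for a few -J values. The function name `max_cat_files` is made up for this illustration; the factors (3 stages, 2 parameter sets, 2 index files) are the ones described above.

```shell
#!/bin/sh
# Worst-case number of git cat-file processes for a given -J level:
# J jobs x 3 stages x 2 cat-file parameter sets x 2 index files.
max_cat_files() {
    echo $(($1 * 3 * 2 * 2))
}

for j in 1 2 5 8 32; do
    echo "-J$j: up to $(max_cat_files "$j") git cat-file processes"
done
```

The count grows linearly with -J, which is why a fixed-size pool of cat-file processes bounds it regardless of the -J level.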

IIUC, what you show is lsof of things open by git-annex and any git processes
that happen to open a file with "annex" in its name, being around 3000.

Now, lsof is for one thing showing a file that two different threads have
open, as being opened twice.

    git-annex 1459862 1459863 ghc_ticke joey mem REG 8,1 169720 9175285 /lib/x86_64-linux-gnu/ld-2.30.so
    git-annex 1459862 1459873 git-annex joey mem REG 8,1 169720 9175285 /lib/x86_64-linux-gnu/ld-2.30.so

That is different threads of the same process, that has certainly not
opened ld.so repeatedly.

So, you should be using `lsof -Ki` or something. With that, I see around
1019 files open, between git-annex and git. git-annex by itself has only
246.

(Interestingly, the majority of those seem to be sqlite. I'm unsure
why sqlite is opening the same database 30 times. A single thread often
has the same database opened repeatedly. Might be that the sqlite database
layer has a too large connection pool. There are also a lot of FIFO's,
which I think also belong to sqlite, unless they're something internal to
the ghc runtime.)

Looking at Michael's bug report, looks like they were running with -J8.
I don't see that exceeding the default ulimit of 1024. If they were really
running at -J32, it would. It's not clear to me either how datalad's --jobs
interacts with git-annex's -J, does it pass through or do you run multiple
git-annex processes? People in that bug report are referring to multiple
git-annex processes, which git-annex -J does not result in.

All these -J5 etc values seem a bit high. I doubt that more than -J2
makes a lot of sense given the command stages optimisation, that makes
it use 6 threads and balance the work better than it used to. The only
time it really would is if you're getting from several different
remotes that each bottleneck on a different resource.
"""]]
@ -0,0 +1,14 @
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="quick follow up"
date="2020-04-17T20:34:55Z"
content="""
> It's not clear to me either how datalad's --jobs interacts with git-annex's -J, does it pass through or do you run multiple git-annex processes?

ATM we just run a single `annex get` with the `-J` option (FWIW -- in `--batch` mode IIRC). Things might change in the future to balance across different submodules.

> All these -J5 etc values seem a bit high. I doubt that more than -J2 makes a lot of sense given the command stages optimisation, that makes it use 6 threads and balance the work better than it used to.

I could do some timing later on, but I did see benefits: I could not go over 40-60MBps in a single download process (e.g. from S3), but parallel ones (even as many as 8 or 10) could each carry that throughput, thus scaling up quite nicely. If interested -- you could experiment on smaug, to which you have access, to possibly observe similar effects.
"""]]
@ -0,0 +1,21 @
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-04-17T21:09:41Z"
content="""
The sqlite open files is a red herring: That happened only when
using a remote in a local directory. Anyway, I've fixed that.

The open files I'm seeing now in my artificial
test case (two local repos with 1000 unlocked files, git-annex get between them, lsof
-Ki run after that's moved 500 files, while the git-annex process is suspended):

    no -J 48
    -J2 104
    -J5 185
    -J32 964

Which seems fine, 28 file handles per -J increment.
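
The per-increment figure can be recomputed from the table, treating a run without -J as zero increments (an assumption; the 48 is then the fixed baseline cost). The last line is a naive linear extrapolation to an open file limit of 1024, not a measurement.

```shell
#!/bin/sh
baseline=48   # open files with no -J

# Integer division: (open files - baseline) / -J level.
per_j2=$(( (104 - baseline) / 2 ))
per_j5=$(( (185 - baseline) / 5 ))
per_j32=$(( (964 - baseline) / 32 ))
echo "handles per -J increment: $per_j2 (-J2), $per_j5 (-J5), $per_j32 (-J32)"

# Largest -J level that stays within 1024 open files at 28 per increment.
max_j=$(( (1024 - baseline) / 28 ))
echo "roughly -J$max_j fits under a 1024 open file limit"
```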

If you have something worse than that, show me the lsof.
"""]]
@ -0,0 +1,127 @
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2020-04-18T02:05:52Z"
content="""
> 5x3x2x2=60 copies of git cat-file max for -J5.

So there are now up to 60 `git` processes, where each one has about 20 open files, totaling up to 1200 open files... so we are getting into thousands
<details>
<summary>
In my current attempt on the laptop, here is a `pstree` with counts per each process and total at the bottom - 883 open files for -J5 invocation, with each `git cat-file` taking between 14 and 29:
</summary>

```shell
$> total=0; pstree -l -a --compact-not -T -p `pgrep datalad` | sed -e 's,--library-path /[^ ]*,,g' -e 's,/usr/lib/git-annex.linux/shimmed/git/,,g' -e 's,--git-dir=.git --work-tree=. --literal-pathspecs -c annex.dotfiles=true,,g' | nl | while read l; do pid=$(echo \"$l\" | sed -e 's/.*,\([0-9][0-9]*\).*/\1/g'); of=$(lsof -Ki -p $pid 2>/dev/null|grep -v COMMAND | wc -l); echo \"$l\" | sed -e \"s/,$pid/,$pid = $of/g\"; total=$(($total + $of)) ; done; echo \"Total: $total open files across all processes\"
1 datalad,2807826 = 54 /home/yoh/proj/datalad/datalad-maint/venvs/dev3/bin/datalad -l debug install -J 5 -g ///labs/haxby/raiders
2 `-git-annex,2808614 = 149 /usr/lib/git-annex.linux/shimmed/git-annex/git-annex get -c annex.dotfiles=true --json --json-error-messages --json-progress -J5 -- .
3 |-git,2808649 = 16 git cat-file --batch
4 |-git,2808650 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
5 |-(git,2808653 = 0)
6 |-git,2808654 = 16 git cat-file --batch
7 |-git,2808655 = 16 git cat-file --batch
8 |-git,2808656 = 16 git cat-file --batch
9 |-git,2808657 = 16 git cat-file --batch
10 |-git,2808658 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
11 |-git,2808659 = 16 git cat-file --batch
12 |-git,2808660 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
13 |-git,2808661 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
14 |-git,2808662 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
15 |-git,2808663 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
16 |-git,2808669 = 16 git cat-file --batch
17 |-git,2808670 = 16 git cat-file --batch
18 |-git,2808671 = 16 git cat-file --batch
19 |-git,2808672 = 16 git cat-file --batch
20 |-git,2808673 = 16 git cat-file --batch
21 |-git,2808674 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
22 |-git,2808675 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
23 |-git,2808676 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
24 |-git,2808677 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
25 |-git,2808678 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
26 |-git,2808679 = 17 git cat-file --batch
27 |-git,2808680 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
28 |-git,2808682 = 25 git cat-file --batch
29 |-git,2808683 = 23 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
30 |-git,2808685 = 26 git cat-file --batch
31 |-git,2808686 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
32 |-git,2808688 = 27 git cat-file --batch
33 |-git,2808689 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
34 |-git,2808690 = 28 git cat-file --batch
35 |-git,2808691 = 26 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
36 |-git,2808693 = 29 git cat-file --batch
37 |-git,2808694 = 27 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
38 |-git,2809036 = 26 git cat-file --batch
39 `-git,2809037 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
Total: 883 open files across all processes
```

</details>

<details>
<summary>
looking at the one with 29 open files:
</summary>

```shell
$> lsof -Ki -p 2808691
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
git 2808691 yoh cwd DIR 259,5 4096 16654129 /tmp/raiders
git 2808691 yoh rtd DIR 259,5 4096 2 /
git 2808691 yoh txt REG 259,5 165632 8395234 /usr/lib/git-annex.linux/lib64/ld-linux-x86-64.so.2
git 2808691 yoh mem REG 259,5 337024 11806232 /usr/lib/locale/aa_DJ.utf8/LC_CTYPE
git 2808691 yoh mem REG 259,5 3284 11807992 /usr/lib/locale/en_US.utf8/LC_TIME
git 2808691 yoh mem REG 259,5 1824496 8394627 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libc.so.6
git 2808691 yoh mem REG 259,5 35808 8395214 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/librt.so.1
git 2808691 yoh mem REG 259,5 114128 8395210 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libpthread.so.0
git 2808691 yoh mem REG 259,5 121280 8395232 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libz.so.1
git 2808691 yoh mem REG 259,5 539304 8395937 /usr/lib/git-annex.linux/usr/lib/x86_64-linux-gnu/libpcre2-8.so.0
git 2808691 yoh mem REG 259,5 3008120 8395330 /usr/lib/git-annex.linux/shimmed/git/git
git 2808691 yoh 0r FIFO 0,13 0t0 99400741 pipe
git 2808691 yoh 1w FIFO 0,13 0t0 99400742 pipe
git 2808691 yoh 2w REG 0,48 0 35023711 /home/yoh/.tmp/datalad_temp__runneroutput__mq55kiau
git 2808691 yoh 77u IPv4 99400731 0t0 TCP lena:38384->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 82u IPv4 99392984 0t0 TCP lena:38386->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 83u IPv4 99392024 0t0 TCP lena:38380->falkor.dartmouth.edu:http (ESTABLISHED)
git 2808691 yoh 84u IPv4 99400730 0t0 TCP lena:38382->falkor.dartmouth.edu:http (ESTABLISHED)
git 2808691 yoh 85u IPv4 99377953 0t0 TCP lena:38388->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 86w REG 259,5 68071663 16654712 /tmp/raiders/.git/annex/tmp/MD5E-s6658782008--8def61aac5f6742194027447390405ff.hdf5.gz
git 2808691 yoh 87w REG 259,5 53047218 16654713 /tmp/raiders/.git/annex/tmp/MD5E-s121517414--f83afa4a5dff04b5a1467afd30e74632.nii.gz
git 2808691 yoh 89w REG 259,5 23378713 16654715 /tmp/raiders/.git/annex/tmp/MD5E-s23378713--7a238410b5c496a29e7967da21332c03.nii.gz
git 2808691 yoh 94u IPv4 99398310 0t0 TCP lena:38390->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 102u IPv4 99394953 0t0 TCP lena:38400->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 106u IPv4 99399711 0t0 TCP lena:38402->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 107w REG 259,5 545747 16654726 /tmp/raiders/.git/annex/objects/2X/Kz/MD5E-s545747--e2c2cd58ad55da46bf778d223e01389e.nii.gz/MD5E-s545747--e2c2cd58ad55da46bf778d223e01389e.nii.gz
```

</details>

<details>
<summary>and only 16</summary>

```shell
$> lsof -Ki -p 2808669

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
git 2808669 yoh cwd DIR 259,5 4096 16654129 /tmp/raiders
git 2808669 yoh rtd DIR 259,5 4096 2 /
git 2808669 yoh txt REG 259,5 165632 8395234 /usr/lib/git-annex.linux/lib64/ld-linux-x86-64.so.2
git 2808669 yoh mem REG 259,5 1322694 16654180 /tmp/raiders/.git/objects/pack/pack-3ed86065dacf772445fec4258d6e60ebe21baf77.pack
git 2808669 yoh mem REG 259,5 505212 16654181 /tmp/raiders/.git/objects/pack/pack-3ed86065dacf772445fec4258d6e60ebe21baf77.idx
git 2808669 yoh mem REG 259,5 337024 11806232 /usr/lib/locale/aa_DJ.utf8/LC_CTYPE
git 2808669 yoh mem REG 259,5 3284 11807992 /usr/lib/locale/en_US.utf8/LC_TIME
git 2808669 yoh mem REG 259,5 1824496 8394627 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libc.so.6
git 2808669 yoh mem REG 259,5 35808 8395214 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/librt.so.1
git 2808669 yoh mem REG 259,5 114128 8395210 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libpthread.so.0
git 2808669 yoh mem REG 259,5 121280 8395232 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libz.so.1
git 2808669 yoh mem REG 259,5 539304 8395937 /usr/lib/git-annex.linux/usr/lib/x86_64-linux-gnu/libpcre2-8.so.0
git 2808669 yoh mem REG 259,5 3008120 8395330 /usr/lib/git-annex.linux/shimmed/git/git
git 2808669 yoh 0r FIFO 0,13 0t0 99388817 pipe
git 2808669 yoh 1w FIFO 0,13 0t0 99388818 pipe
git 2808669 yoh 2w REG 0,48 0 35023711 /home/yoh/.tmp/datalad_temp__runneroutput__mq55kiau
```
</details>
We can see why it is fluctuating, although I have no clue why those are opened by `git cat-file`: connections to the remote (although -- why??), looking at .git/annex/tmp?

But the overall problem, it seems to me, is this heavy growth of external processes due to multiple external `git` invocations per `annex get` thread, with each process then consuming a small number of open files, but still tens of them.
"""]]
@ -0,0 +1,8 @
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2020-04-18T02:14:32Z"
content="""
FWIW, I kept running it a bit more: the number of `git cat-file` processes grew a bit (to 42), with total open files at 1034, but seems to be stable, i.e. further support that there are no leaking processes or file descriptors -- just sheer growth of subprocesses leading to a large number of open files.
"""]]
@ -0,0 +1,27 @
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2020-04-20T15:54:15Z"
content="""
Thinking about this over the weekend, I had two ideas:

* The worker pool has an AnnexState for each thread. If those could be
  partitioned so eg perform stage is always run by the same threads,
  then when only one stage needs cat-file, the overall number of cat-file
  processes would be reduced by 1/3rd.

  This might be the least resource intensive approach. But, as threads
  transition between stages, their AnnexState necessarily does too,
  and the cleanup stage might need some state change made in the perform
  stage, so swapping out the perform AnnexState for a cleanup one
  seems hard to accomplish.

* Could have a pool of cat-files, and just have worker threads block until
  one is available. This would let it be pinned to the -J number, or
  even to a smaller number.

  Seems likely that only 2 or 3 in the cat-file pool will
  maximise concurrency, because it's not a major bottleneck most of the
  time, and when it is the actual bottleneck is probably disk IO and so
  won't be helped by more (and likely more would only increase unnecessary seeks).
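
The second idea, workers blocking until a pooled resource is free, can be illustrated with a small shell sketch. This is not how git-annex implements its pool; it is a generic counting-semaphore pattern using a FIFO of tokens, and the names (`pool_size`, `worker`, the `tokens` FIFO) are made up for the example. Opening a FIFO read-write works on Linux but is not guaranteed by POSIX.

```shell
#!/bin/sh
# Hypothetical sketch: 5 workers share a pool of 2 tokens, each token
# standing in for one pooled resource such as a cat-file process.
pool_size=2
workdir=$(mktemp -d)
mkfifo "$workdir/tokens"
exec 3<>"$workdir/tokens"   # read-write, so opening does not block

i=0
while [ "$i" -lt "$pool_size" ]; do
    echo >&3                # one empty line per available token
    i=$((i + 1))
done

worker() {
    read -r token <&3       # blocks until a token is available
    echo "worker $1 used a pooled resource" >> "$workdir/log"
    echo >&3                # hand the token back to the pool
}

for n in 1 2 3 4 5; do
    worker "$n" &
done
wait

completed=$(wc -l < "$workdir/log" | tr -d ' ')
echo "$completed workers finished, at most $pool_size holding a token at once"
exec 3>&-
rm -rf "$workdir"
```

Because a worker can only proceed after reading a token, the number of resources in use never exceeds `pool_size`, whatever the number of workers.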
"""]]