From: Phillip Ezolt <ez...@perf.zko.dec.com> Subject: Overscheduling DOES happen with high web server load. Date: 1999/05/05 Message-ID: <fa.lrbmocv.13gphs@ifi.uio.no> X-Deja-AN: 474518895 Original-Date: Wed, 5 May 1999 14:54:40 -0400 (EDT) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.OSF.3.96.990505144826.13001A-100000@perf.zko.dec.com> To: linux-ker...@vger.rutgers.edu Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Hi all, In doing some performance work with SPECWeb96 on ALpha/Linux with apache, it looks like "schedule" is the main bottleneck. (Kernel v2.2.5, Apache 1.3.4, egcs-1.1.1, iprobe-4.1) When running a SPECWeb96 strobe run on Alpha/linux, I found that when the CPU is pegged, 18% of the time is spent in the scheduler. Using Iprobe, I got the following function breakdown: (only functions >1% are shown) Begin End Sample Image Total Address Address Name Count Pct Pct ------- ------- ---- ----- --- --- 0000000000000000-00000000000029FC /usr/bin/httpd 127463 18.5 00000001200419A0-000000012004339F ap_vformatter 15061 11.8 2.2 FFFFFC0000300000-00000000FFFFFFFF vmlinux 482385 70.1 FFFFFC00003103E0-FFFFFC000031045F entInt 7848 1.6 1.1 FFFFFC0000315E40-FFFFFC0000315F7F do_entInt 48487 10.1 7.0 FFFFFC0000327A40-FFFFFC0000327D7F schedule 124815 25.9 18.1 FFFFFC000033FAA0-FFFFFC000033FCDF kfree 7876 1.6 1.1 FFFFFC00003A9960-FFFFFC00003A9EBF ip_queue_xmit 8616 1.8 1.3 FFFFFC00003B9440-FFFFFC00003B983F tcp_v4_rcv 11131 2.3 1.6 FFFFFC0000441CA0-FFFFFC000044207F do_csum_partial 43112 8.9 6.3 _copy_from_user I can't pin it down to the exact source line, but the cycles are spent in close proximity of one another. FFFFFC0000327A40 schedule vmlinux FFFFFC0000327C1C 01DC 2160 ( 1.7) * FFFFFC0000327C34 01F4 28515 ( 22.8) ********************** FFFFFC0000327C60 0220 1547 ( 1.2) * FFFFFC0000327C64 0224 26432 ( 21.2) ********************* FFFFFC0000327C74 0234 36470 ( 29.2) ***************************** FFFFFC0000327C9C 025C 24858 ( 19.9) ******************* (For those interested, I have the disassembled code. ) Apache has a fairly even cycle distribution, but in the kernel, 'schedule' really sticks out as the CPU burner. I think that the linear search for next runnable process is where time is being spent. As an independent test, I ran vmstat while SPECWeb was running. The leftmost column is the number of processes waiting to run. These number are above the 3 or 4 that are normally quoted. procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 21 0 208 5968 5240 165712 0 0 4001 303 10263 6519 31 66 4 26 27 1 208 6056 5240 165848 0 0 2984 96 5623 3440 29 60 11 0 15 0 208 5096 5288 166384 0 0 4543 260 10850 7346 32 66 3 0 17 0 208 6928 5248 164936 0 0 5741 309 13129 8052 32 65 3 37 19 1 208 5664 5248 166144 0 0 2502 142 6837 3896 33 63 5 0 14 0 208 5984 5240 165656 0 0 3894 376 12432 7276 32 65 3 0 19 1 208 4872 5272 166248 0 0 2247 124 7641 4514 32 64 4 0 17 0 208 5248 5264 166336 0 0 4229 288 8786 5144 31 67 2 56 16 1 208 6512 5248 165592 0 0 2159 205 8098 4641 32 62 6 94 18 1 208 5920 5248 165896 0 0 1745 191 5288 2952 32 60 7 71 14 1 208 5920 5256 165872 0 0 2063 160 6493 3729 30 62 8 0 25 1 208 5032 5256 166544 0 0 3008 112 5668 3612 31 60 9 62 22 1 208 5496 5256 165560 0 0 2512 109 5661 3392 28 62 11 43 22 1 208 4536 5272 166112 0 0 3003 202 7198 4813 30 63 7 0 26 1 208 4800 5288 166256 0 0 2407 93 5666 3563 29 60 11 32 17 1 208 5984 5296 165632 0 0 2046 329 7296 4305 31 62 6 23 7 1 208 6744 5248 164904 0 0 1739 284 9496 5923 33 65 2 14 18 1 208 5128 5272 166416 0 0 3755 322 9663 6203 32 65 3 0 22 1 208 4256 5304 167288 0 0 2593 156 5678 3219 31 60 9 44 20 1 208 3688 5264 167184 0 0 3010 149 7277 4398 31 62 7 29 24 1 208 5232 5264 166248 0 0 1954 104 5687 3496 31 61 9 26 23 1 208 5688 5256 165568 0 0 3029 169 7124 4473 30 60 10 0 18 1 208 5576 5256 165656 0 0 3395 270 8464 5702 30 63 7 It looks like the run queue is much longer than expected. I imagine this problem is compounded by the number of times "schedule" is called. On a webserver that does not have all of the web pages in memory, an httpd processes life is the following: 1. Wake up for a request from the network. 2. Figure out what web page to load. 3. Ask the disk for it. 4. Sleep (Schedule()) until the page is ready. This means that schedule will be called alot. In addition a process will wake and sleep in a time much shorter than its allotted time slice. Each time we schedule, we have to walk through the entire run queue. This will cause less requests to be serviced. This will cause more processes to be stuck on the run queue, this will make the walk down the runqueue even longer... Bottom line, under a heavy web load, the linux kernel seems to spend and unnecessary amount of time scheduling processes. Is it necessary to calculate the goodness of every process at every schedule? Can't we make the goodnesses static? Monkeying with the scheduler is big business, and I realize that this will not be a v2.2 issue, but what about v2.3? --Phil Digital/Compaq: HPSD/Benchmark Performance Engineering Phillip.Ez...@compaq.com ez...@perf.zko.dec.com ps. <shameless plug> For those interested in more detail there will be a WIP paper describing this work presented at Linux Expo. </shameless plug> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Richard Gooch <rgo...@atnf.csiro.au> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/06 Message-ID: <fa.i397bcv.k5i7ak@ifi.uio.no>#1/1 X-Deja-AN: 474602020 Original-Date: Thu, 6 May 1999 10:42:18 +1000 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-Id: <199905060042.KAA19778@vindaloo.atnf.CSIRO.AU> References: <fa.lrbmocv.13gphs@ifi.uio.no> To: Phillip Ezolt <ez...@perf.zko.dec.com> Original-References: <Pine.OSF.3.96.990505144826.13001A-100...@perf.zko.dec.com> Notfrom: spam...@atnf.csiro.au X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Phillip Ezolt writes: > Hi all, > > In doing some performance work with SPECWeb96 on ALpha/Linux with apache, > it looks like "schedule" is the main bottleneck. > > (Kernel v2.2.5, Apache 1.3.4, egcs-1.1.1, iprobe-4.1) > > When running a SPECWeb96 strobe run on Alpha/linux, I found that when the > CPU is pegged, 18% of the time is spent in the scheduler. This is some very interesting data! > Using Iprobe, I got the following function breakdown: (only functions >1% > are shown) > > Begin End Sample Image Total > Address Address Name Count Pct Pct > ------- ------- ---- ----- --- --- > 0000000000000000-00000000000029FC /usr/bin/httpd 127463 18.5 > 00000001200419A0-000000012004339F ap_vformatter 15061 11.8 2.2 > FFFFFC0000300000-00000000FFFFFFFF vmlinux 482385 70.1 > FFFFFC00003103E0-FFFFFC000031045F entInt 7848 1.6 1.1 > FFFFFC0000315E40-FFFFFC0000315F7F do_entInt 48487 10.1 7.0 > FFFFFC0000327A40-FFFFFC0000327D7F schedule 124815 25.9 18.1 > FFFFFC000033FAA0-FFFFFC000033FCDF kfree 7876 1.6 1.1 > FFFFFC00003A9960-FFFFFC00003A9EBF ip_queue_xmit 8616 1.8 1.3 > FFFFFC00003B9440-FFFFFC00003B983F tcp_v4_rcv 11131 2.3 1.6 > FFFFFC0000441CA0-FFFFFC000044207F do_csum_partial 43112 8.9 6.3 > _copy_from_user Why don't we see the time taken by the goodness() function? > Apache has a fairly even cycle distribution, but in the kernel, 'schedule' > really sticks out as the CPU burner. > > I think that the linear search for next runnable process is where time is > being spent. Could well be, especially if the context switches are happening between threads rather than separate processes. Thread switches are *really* fast under Linux. > As an independent test, I ran vmstat while SPECWeb was running. > > The leftmost column is the number of processes waiting to run. These number > are above the 3 or 4 that are normally quoted. > > procs memory swap io system cpu > r b w swpd free buff cache si so bi bo in cs us sy id [...] > 94 18 1 208 5920 5248 165896 0 0 1745 191 5288 2952 32 60 7 > > It looks like the run queue is much longer than expected. Indeed. As a separate question, we may wonder why so many processes/threads are being used, and whether that number could/should be reduced. Perhaps the server is doing something silly. But that's an aside. Instead, I'd like to explore ways of reducing the (already low) scheduler overhead. In September last year I wrote a patch which put RT processes on a separate run queue. While you don't have RT processes (I expect), one of the benefits of this patch is that it cleans up some of the scheduler code. Specifically, the goodness() function has some special-casing for RT processes removed. For a short run queue, the improvement is pretty marginal. However, for a long run queue, the improvement may be significant. So I'd ask you to redo your tests again with this patch applied. I've ported the patch to 2.2.7. It's untested in 2.2.7, but it worked fine in 2.1.x. See: http://www.atnf.csiro.au/~rgooch/linux/kernel-patches.html Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/06 Message-ID: <fa.n4qmvjv.b400bh@ifi.uio.no>#1/1 X-Deja-AN: 474660725 Original-Date: Thu, 6 May 1999 09:32:39 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9905060929130.219-100000@mirkwood.dummy.home> References: <fa.i397bcv.k5i7ak@ifi.uio.no> To: Richard Gooch <rgo...@atnf.csiro.au> X-Sender: r...@mirkwood.dummy.home X-Authentication-Warning: mirkwood.nl.linux.org: riel owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Thu, 6 May 1999, Richard Gooch wrote: > Phillip Ezolt writes: > > (Kernel v2.2.5, Apache 1.3.4, egcs-1.1.1, iprobe-4.1) > > > > When running a SPECWeb96 strobe run on Alpha/linux, I found that when the > > CPU is pegged, 18% of the time is spent in the scheduler. > > This is some very interesting data! Indeed. > > It looks like the run queue is much longer than expected. > > Instead, I'd like to explore ways of reducing the (already low) > scheduler overhead. > > In September last year I wrote a patch which put RT processes on a > separate run queue. I'll start working on my scheduler bigpatch again. There must be more things that can be incorporated in order to improve the efficiency and performance of the Linux scheduler... regards, Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: j...@pa.dec.com (Jim Gettys) Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/06 Message-ID: <fa.gsrd1gv.lkcv2h@ifi.uio.no>#1/1 X-Deja-AN: 474791689 Original-Date: Thu, 6 May 1999 06:30:31 -0700 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-Id: <9905061330.AA24536@pachyderm.pa.dec.com> References: <fa.lrcap4v.34p9h@ifi.uio.no> To: Phillip Ezolt <ez...@perf.zko.dec.com> Content-Type: text/plain X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list Mime-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Whle this discussion continues, I'd like to remind people of the the work of the linux scalability project, at http://www.citi.umich.edu/projects/citi-netscape/ Some of these patches have already been applied to current kernels; but some have not, and there are some issues described there for which there are yet no solutions, including the poll() work there.... - Jim -- Jim Gettys Industry Standards and Consortia Compaq Computer Corporation Visting Scientist, World Wide Web Consortium, M.I.T. http://www.w3.org/People/Gettys/ j...@w3.org, j...@pa.dec.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/06 Message-ID: <fa.j5b0g0v.175qmbb@ifi.uio.no>#1/1 X-Deja-AN: 474816447 Original-Date: Thu, 6 May 1999 17:39:35 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.05.9905061734270.2228-100000@laser.random> References: <fa.lrbmocv.13gphs@ifi.uio.no> To: Phillip Ezolt <ez...@perf.zko.dec.com> X-Sender: and...@laser.random X-Authentication-Warning: laser.random: andrea owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list X-Public-Key-URL: http://e-mind.com/~andrea/aa.asc MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Wed, 5 May 1999, Phillip Ezolt wrote: >Hi all, > >In doing some performance work with SPECWeb96 on ALpha/Linux with apache, >it looks like "schedule" is the main bottleneck. Alt. Alpha uses HZ at 1024 so you get a scheduling rate by default 10 times higher than in all other archs. >Is it necessary to calculate the goodness of every process at every schedule? I think what we should do is to reschedule _only_ if a different process will be scheduled. What I consider oversheduling is to schedule() and then not switch to another task. Could you apply the patch below and run vmstat 1 again and see the rate of the overscheduling? Index: kernel//sched.c =================================================================== RCS file: /var/cvs/linux/kernel/sched.c,v retrieving revision 1.1.2.33 diff -u -r1.1.2.33 sched.c --- sched.c 1999/05/05 11:26:37 1.1.2.33 +++ sched.c 1999/05/06 15:37:16 @@ -764,12 +764,12 @@ #ifdef __SMP__ sched_data->prev = prev; #endif - kstat.context_swtch++; get_mmu_context(next); switch_to(prev,next); __schedule_tail(); } + kstat.context_swtch++; reacquire_kernel_lock(current); return; If the rate is low there is no overscheduling and you should simply decrease HZ to spend less time in the scheduler. Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Kurt Garloff <garl...@suse.de> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/07 Message-ID: <fa.ed3pouv.ohimj6@ifi.uio.no>#1/1 X-Deja-AN: 475019424 X-Operating-System: Linux 2.2.5-ac3_devfs i686 Original-Date: Thu, 6 May 1999 21:02:04 +0200 Sender: owner-linux-ker...@vger.rutgers.edu References: <fa.j5b0g0v.175qmbb@ifi.uio.no> Mime-Version: 1.0 X-PGP-Version: 2.6.3i Original-Message-ID: <19990506210204.B5468@kurt.kdt.de> X-PGP-Fingerprint: 92 00 AC 56 59 50 13 83 3C 18 6F 1B 25 A0 3A 5F X-PGP-Key: 1024/CEFC9215 To: Andrea Arcangeli <and...@e-mind.com> Original-References: <Pine.OSF.3.96.990505144826.13001A-100...@perf.zko.dec.com> <Pine.LNX.4.05.9905061734270.2228-100...@laser.random> X-PGP-Info: on http://www.garloff.de/kurt/pgp.public.key.kurt.home.asc Content-Type: multipart/signed; boundary="2B/JsCI69OhZNC5r"; micalg=pgp-md5; protocol="application/pgp-signature" X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list Mail-Followup-To: Andrea Arcangeli <and...@e-mind.com>, Phillip Ezolt <ez...@perf.zko.dec.com>, linux-ker...@vger.rutgers.edu, j...@pa.dec.com, greg.ta...@digital.com Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu --2B/JsCI69OhZNC5r Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable On Thu, May 06, 1999 at 05:39:35PM +0200, Andrea Arcangeli wrote: > On Wed, 5 May 1999, Phillip Ezolt wrote: > > > >In doing some performance work with SPECWeb96 on ALpha/Linux with apache, > >it looks like "schedule" is the main bottleneck.=20 >=20 > Alt. Alpha uses HZ at 1024 so you get a scheduling rate by default 10 > times higher than in all other archs. =46rom what I understand, the scheduling is not caused by the timer but by blocking and woken processes. --=20 Kurt Garloff <garl...@suse.de> SuSE GmbH, N=FCrnberg, FRG Linux kernel development; SCSI driver: DC390 (tmscsim/AM53C974) --2B/JsCI69OhZNC5r Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: 2.6.3in iQCVAwUBNzHnLBaQN/7O/JIVAQE7kwP/fKhXWZhSES0WhOUUzYQpXNUAzlqF+yws MGoqncJT/fwexzLI06ghRPk22tISM4B26nW/ihQD4RnpXiM9IPyyRubf3HjJ5BD8 UWCqe4597aYeWXbvtOC43pKvityWKFfaKXqXF5tuIfZ6pZ5upbllIr4Lux01xcWY T9CAazJ52a8= =yird -----END PGP SIGNATURE----- --2B/JsCI69OhZNC5r-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/07 Message-ID: <fa.n7aov3v.9km0rq@ifi.uio.no>#1/1 X-Deja-AN: 475202853 Original-Date: Fri, 7 May 1999 17:15:25 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9905071703560.219-100000@mirkwood.dummy.home> References: <fa.ed3pouv.ohimj6@ifi.uio.no> To: Kurt Garloff <garl...@suse.de> X-Sender: r...@mirkwood.dummy.home X-Authentication-Warning: mirkwood.nl.linux.org: riel owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Thu, 6 May 1999, Kurt Garloff wrote: > From what I understand, the scheduling is not caused by the timer > but by blocking and woken processes. Not only that, the fact that we're using a traditional O(n) run queue is somewhat of a performance bottleneck. I'm currently looking into the scheduler used by Alliance (www.allos.org) and it looks very promising. If I have any time left, I'll port it to Linux. It's based on the idea of a priority heap. Processes have a quantum and a defer value. The quantum value is the length of the process' time slice and the defer value is the time at which it will be run next (sort of a deadline). The process at the top of the heap is the process with the lowest (run next) defer value. After a process finishes it's time slice, the defer value is incremented (say, by p->priority) and the process is sorted down the B-tree. This means that the algorithm for selecting the best process is O(1) on UP machines and O(log(nr_cpus)) on SMP machines. Schedule_timeout() can be replaced by a simple resorting of the heap. Basically we only need to remove processes from the heap if they do blocking I/O or wait for a signal (otherwise they would clutter up the heap). cheers, Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar <mi...@chiara.csoma.elte.hu> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/07 Message-ID: <fa.lhohhov.f563o4@ifi.uio.no>#1/1 X-Deja-AN: 475225521 Original-Date: Fri, 7 May 1999 18:17:33 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990507181436.8938B-100000@chiara.csoma.elte.hu> References: <fa.n7aov3v.9km0rq@ifi.uio.no> To: Rik van Riel <r...@nl.linux.org> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Fri, 7 May 1999, Rik van Riel wrote: > Schedule_timeout() can be replaced by a simple resorting of > the heap. Basically we only need to remove processes from > the heap if they do blocking I/O or wait for a signal > (otherwise they would clutter up the heap). schedule_timeout() is not really expected to have high performance. Its functionality was moved out of the main scheduler partly for this reason. Btw., the timer code (which is used by schedule_timeout() intrnally) has a sort of heap structure already. -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar <mi...@chiara.csoma.elte.hu> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/07 Message-ID: <fa.li8jhgv.cl43g4@ifi.uio.no>#1/1 X-Deja-AN: 475225522 Original-Date: Fri, 7 May 1999 18:09:43 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990507180747.8938A-100000@chiara.csoma.elte.hu> References: <fa.n7aov3v.9km0rq@ifi.uio.no> To: Rik van Riel <r...@nl.linux.org> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Fri, 7 May 1999, Rik van Riel wrote: > It's based on the idea of a priority heap. Processes have > a quantum and a defer value. The quantum value is the length > of the process' time slice and the defer value is the time > at which it will be run next (sort of a deadline). > > The process at the top of the heap is the process with the > lowest (run next) defer value. After a process finishes it's > time slice, the defer value is incremented (say, by p->priority) > and the process is sorted down the B-tree. This means that the > algorithm for selecting the best process is O(1) on UP machines > and O(log(nr_cpus)) on SMP machines. > > Schedule_timeout() can be replaced by a simple resorting of > the heap. Basically we only need to remove processes from > the heap if they do blocking I/O or wait for a signal > (otherwise they would clutter up the heap). so? And how do you handle VM and CPU (and who knows what type of other future) affinity? -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/07 Message-ID: <fa.n7a703v.8ks1rs@ifi.uio.no>#1/1 X-Deja-AN: 475316640 Original-Date: Fri, 7 May 1999 20:48:07 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9905072045440.219-100000@mirkwood.dummy.home> References: <fa.li8jhgv.cl43g4@ifi.uio.no> To: Ingo Molnar <mi...@chiara.csoma.elte.hu> X-Sender: r...@mirkwood.dummy.home X-Authentication-Warning: mirkwood.nl.linux.org: riel owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Fri, 7 May 1999, Ingo Molnar wrote: > On Fri, 7 May 1999, Rik van Riel wrote: > > > It's based on the idea of a priority heap. [snip] > so? And how do you handle VM and CPU (and who knows what type of > other future) affinity? I haven't thought about that yet, but I think we _will_ have to think up something to make the current scheduler more responsive and more predictable under load and to reduce the CPU time used by niced tasks. I think that even without the priority heap we can use the Quantum/Defer model in order to get better predictability and responsiveness under load. Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar <mi...@chiara.csoma.elte.hu> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/08 Message-ID: <fa.lfo5h8v.95m087@ifi.uio.no>#1/1 X-Deja-AN: 475726641 Original-Date: Sat, 8 May 1999 17:14:39 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990508170633.5517A-100000@chiara.csoma.elte.hu> References: <fa.n7a703v.8ks1rs@ifi.uio.no> To: Rik van Riel <r...@nl.linux.org> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Fri, 7 May 1999, Rik van Riel wrote: > > > It's based on the idea of a priority heap. > [snip] > > > [...] And how do you handle VM and CPU (and who knows what type of > > other future) affinity? > > I haven't thought about that yet [...] thats one of the tough things IMO ... without those constraints we could easily get rid of the linear behavior in the scheduler. We are trying to avoid situations where there are lots of tasks on the runqueue _and_ there are lots of reschedules happening, this way linearity doesnt matter. > [...], but I think we _will_ > have to think up something to make the current scheduler > more responsive and more predictable under load [...] do you still see responsiveness problems under pre4-2.2.8? If yes then i'd be very interested in reproducing it. > [...] and to > reduce the CPU time used by niced tasks. this can be achieved by increasing (or changing) the scale of priorities. (this can be done seemlessly) But i doubt this matters in most cases. (except when there are _lots_ of niced 100%-CPU using processes running) -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/09 Message-ID: <fa.lge9onv.6gipos@ifi.uio.no>#1/1 X-Deja-AN: 475889559 Original-Date: Sun, 9 May 1999 18:21:55 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9905091817410.256-100000@mirkwood.nl.linux.org> References: <fa.lfo5h8v.95m087@ifi.uio.no> To: Ingo Molnar <mi...@chiara.csoma.elte.hu> X-Authentication-Warning: mirkwood.nl.linux.org: riel owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sat, 8 May 1999, Ingo Molnar wrote: > On Fri, 7 May 1999, Rik van Riel wrote: > > > [...], but I think we _will_ > > have to think up something to make the current scheduler > > more responsive and more predictable under load [...] > > do you still see responsiveness problems under pre4-2.2.8? If yes > then i'd be very interested in reproducing it. I've read the patch and think you're right about this one. Under normal loads 2.2.8 will have no problems at all except possibly the fact that it calls reschedule_idle() twice on most reschedules and the fact that the kernel still recalculates the priority of _all_ processes as soon as one process runs out of it's time slice. > > [...] and to > > reduce the CPU time used by niced tasks. > > this can be achieved by increasing (or changing) the scale of > priorities. (this can be done seemlessly) More importantly, it can be done without an increase in the scheduling overhead and with removal of the current process recalculation ritual. It will increase SMP scalability of the scheduler, allow for a wider priority scale and maybe even decrease scheduler overhead (I'm not sure about the last one being measurable though)... cheers, Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar <mi...@chiara.csoma.elte.hu> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/09 Message-ID: <fa.nkq90mv.1j4eios@ifi.uio.no>#1/1 X-Deja-AN: 475897443 Original-Date: Sun, 9 May 1999 19:33:44 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990509192416.17632G-100000@chiara.csoma.elte.hu> References: <fa.lge9onv.6gipos@ifi.uio.no> To: Rik van Riel <r...@nl.linux.org> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sun, 9 May 1999, Rik van Riel wrote: > I've read the patch and think you're right about this one. > Under normal loads 2.2.8 will have no problems at all except > possibly the fact that it calls reschedule_idle() twice on > most reschedules [...] what do you mean? We call reschedule_idle() only when a process 1) gets runnable 2) gets scheduled away (but is still runnable). These are two different situations. > [...] and the fact that the kernel still recalculates > the priority of _all_ processes as soon as one process runs > out of it's time slice. this recalculation argument is a complete red herring. I've explained this earlier too: the number of recalculations happens at a _constant frequency_, with the additional mechanizm that there are less recalculations if there are more CPU-bound processes. The goal is to have tasks' increase their 'dynamic priority' even if they are not on the runqueue. So any architectural change that concentrates on removing these recalculations is seriously misguided. > > > > [...] and to > > > reduce the CPU time used by niced tasks. > > > > this can be achieved by increasing (or changing) the scale of > > priorities. (this can be done seemlessly) > > More importantly, it can be done without an increase in > the scheduling overhead and with removal of the current > process recalculation ritual. > > It will increase SMP scalability of the scheduler, allow > for a wider priority scale and maybe even decrease scheduler > overhead (I'm not sure about the last one being measurable > though)... > > cheers, > > Rik -- Open Source: you deserve to be in control of your data. > +-------------------------------------------------------------------+ > | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | > | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | > | Nederlandse Linux documentatie: http://www.nl.linux.org/ | > +-------------------------------------------------------------------+ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Overscheduling DOES happen with high web server load. Date: 1999/05/09 Message-ID: <fa.lfu3pfv.40cogo@ifi.uio.no>#1/1 X-Deja-AN: 475921984 Original-Date: Sun, 9 May 1999 20:49:23 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9905092043260.256-100000@mirkwood.nl.linux.org> References: <fa.nkq90mv.1j4eios@ifi.uio.no> To: Ingo Molnar <mi...@chiara.csoma.elte.hu> X-Authentication-Warning: mirkwood.nl.linux.org: riel owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sun, 9 May 1999, Ingo Molnar wrote: > On Sun, 9 May 1999, Rik van Riel wrote: > > > I've read the patch and think you're right about this one. > > Under normal loads 2.2.8 will have no problems at all except > > possibly the fact that it calls reschedule_idle() twice on > > most reschedules [...] > > what do you mean? We call reschedule_idle() only when a process 1) > gets runnable 2) gets scheduled away (but is still runnable). > These are two different situations. OK, you're right about that one -- a slight brain fart on my side. Then again, we probably want to make goodness() a bit cheaper than it is now because we're calling it more often than before.... > > [...] and the fact that the kernel still recalculates > > the priority of _all_ processes as soon as one process runs > > out of it's time slice. > > this recalculation argument is a complete red herring. I've explained this > earlier too: the number of recalculations happens at a _constant > frequency_, with the additional mechanizm that there are less > recalculations if there are more CPU-bound processes. The goal is to have > tasks' increase their 'dynamic priority' even if they are not on the > runqueue. So any architectural change that concentrates on removing these > recalculations is seriously misguided. I've got a new scheme in mind which uses a new variable in the task struct in order to make goodness simpler, increase the range of CPU usage (for niced tasks), favor interactive performance better and get rid of the recalculation as a nice side effect... Btw, the rate of recalculation increases with more CPUs in the machine so I wouldn't be surprised if it didn't scale well to number crunching machines with a lot of nice +19 tasks. Anyway, I think I can bring my point across much better in C than in english, so I'll focus on some source code now... cheers, Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://humbolt.geo.uu.nl/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/