From: Peter Waltenberg <pet...@dascom.com>
Subject: SMP Scheduling
Date: Mon, 09 Aug 1999 08:59:30 +1000 (EST)
To: andreas.bo...@munich.netsurf.de, linux-ker...@vger.rutgers.edu

I also have a dual-CPU machine.

Under 2.0, if you ran a CPU hog it would pretty well stick to one CPU.
That is, with xosview running you would see one CPU at 100% and the other
mostly idle. If there was a load burst the hog might move to the other CPU,
but that was fairly unusual.

Under 2.2, that one CPU hog hops between CPUs, and at regular intervals.
Tracking the load with xosview, you see a picket-fence effect. There are
more than "3 processes" running (more like 80 on my machine), so running
xosview alone shouldn't be enough to force this, and if it were, the other
processes should introduce enough noise to make the CPU switching more
erratic.

This does seem to be "wrong": not so much that the process is changing
CPUs, which is reasonable, but that it now does so with such regularity.

I know this has been reported before, and plausible explanations have been
offered. However, plausible isn't the same as correct, and this does look
like a symptom of a real problem, or at least a real change in behaviour.

Peter
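A minimal sketch of the kind of single-process CPU hog described above, for
anyone who wants to reproduce the observation. This is an illustrative
example, not the reporter's actual test program; an endless increment loop
is simply the cheapest way to keep one process runnable. Start one copy on
an otherwise idle SMP box and watch the per-CPU load in xosview (or top):

    /* cpuhog.c - trivial CPU hog (illustrative example, not from the
     * original report).  It never sleeps, so it stays runnable and,
     * ideally, should stay on one CPU. */

    int main(void)
    {
            volatile unsigned long n = 0;  /* volatile: keep the loop from
                                              being optimised away */

            for (;;)
                    n++;

            return 0;  /* never reached */
    }

Under the behaviour described above, the 100% load alternates between the
two CPUs at regular intervals instead of sticking to one of them.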
From: Horst von Brand <vonbr...@sleipnir.valparaiso.cl>
Subject: Re: SMP Scheduling
Date: Sun, 08 Aug 1999 21:28:50 -0400
To: pet...@dascom.com

Peter Waltenberg <pet...@dascom.com> said:
[...]
> Under 2.0, if you ran a CPU hog it would pretty well stick to one CPU.

How is that an improvement?

> That is, with xosview running you would see one CPU at 100% and the other
> mostly idle. If there was a load burst the hog might move to the other
> CPU, but that was fairly unusual.
>
> Under 2.2, that one CPU hog hops between CPUs, and at regular intervals.
> Tracking the load with xosview, you see a picket-fence effect. There are
> more than "3 processes" running (more like 80 on my machine), so running
> xosview alone shouldn't be enough to force this, and if it were, the other
> processes should introduce enough noise to make the CPU switching more
> erratic.

If you have that many processes running, your hog will have its state at
the CPU flushed anyway, so the CPU selected is irrelevant.

> This does seem to be "wrong": not so much that the process is changing
> CPUs, which is reasonable, but that it now does so with such regularity.

File it under "random trivia" then ;-)
--
Horst von Brand                           vonbr...@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile                             +56 32 672616
From: Peter Waltenberg <pet...@dascom.com>
Subject: Re: SMP Scheduling
Date: Mon, 09 Aug 1999 13:24:17 +1000 (EST)
To: Horst von Brand <vonbr...@sleipnir.valparaiso.cl>

On 09-Aug-99 Horst von Brand wrote:

> If you have that many processes running, your hog will have its state at
> the CPU flushed anyway, so the CPU selected is irrelevant.
>
> File it under "random trivia" then ;-)

I'd expect the process's cache state to be flushed too, but in that case
I'd expect it to be re-run on some random CPU. That isn't what happens:
the process swaps CPUs at REGULAR intervals. The scheduler is supposedly
designed so that a process tends to keep running on the same CPU.

It's not that the process changes CPUs, it's that it does so at regular
intervals that I find worrying. Yes, that could just be coincidence, or it
could be a real problem.

I would file it under random trivia, but the cost of moving a process from
one CPU to another is (relatively) quite high compared with other kernel
overheads, and it SEEMS to be happening when it isn't necessary. It's not
just me seeing this; I think we have 3 or 4 separate reports now.

Anyone want to produce figures for how large a percentage of our timeslice
a cache refill takes? I get a very small number at 100 Hz, but if we
increase the scheduling rate that obviously gets worse fairly quickly.

And is there anyone out there with a box with more than 2 CPUs? If there's
no scheduler problem, then with one CPU hog running I'd expect it to jump
back and forth between two CPUs at most; if it gets cycled around all 4
CPUs in a regular pattern, I'd say there is very likely a problem.

Peter
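The question about the cache-refill cost can be bounded with a
back-of-the-envelope calculation. The figures below (128 KB of cache,
200 MB/s of memory bandwidth, and a timeslice taken as a single 10 ms tick
at HZ=100) are assumptions chosen to be plausible for hardware of the time,
not numbers from this thread; a longer effective scheduling quantum only
makes the fraction smaller:

    /* Rough estimate: what fraction of a timeslice does it cost to
     * refill the CPU cache after migrating to a cold CPU?
     * All three inputs are assumed values, not measurements. */
    #include <stdio.h>

    int main(void)
    {
            double cache_bytes = 128.0 * 1024; /* assumed L2 size: 128 KB  */
            double mem_bw = 200e6;             /* assumed bandwidth: 200 MB/s */
            double hz = 100.0;                 /* scheduler tick rate      */

            double refill = cache_bytes / mem_bw; /* seconds to re-read cache */
            double slice = 1.0 / hz;              /* one 10 ms tick at HZ=100 */

            printf("refill ~%.2f ms, tick %.0f ms: ~%.1f%% of the slice\n",
                   refill * 1e3, slice * 1e3, 100.0 * refill / slice);
            return 0;
    }

With these assumed numbers the refill costs well under a tenth of a slice,
which matches the "very small number" above; raising HZ shrinks the slice,
so the fraction grows proportionally.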
From: Andrea Arcangeli <and...@suse.de>
Subject: Re: SMP Scheduling
Date: Mon, 9 Aug 1999 12:06:09 +0200 (CEST)
To: Peter Waltenberg <pet...@dascom.com>

On Mon, 9 Aug 1999, Peter Waltenberg wrote:

> It's not that the process changes CPUs, it's that it does so at regular
> intervals that I find worrying.

Please apply this patch against 2.2.10 or 2.3.x and let me know if it helps:

ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
ftp://master.softaplic.com.br/pub/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
ftp://ftp.linux.it/pub/People/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
ftp://e-mind.com/pub/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C

Andrea
From: Peter Waltenberg <pet...@dascom.com>
Subject: SMP Scheduling. Followup
Date: Mon, 16 Aug 1999 14:30:51 +1000 (EST)
To: linux-ker...@vger.rutgers.edu

OK, since I noted that there might be problems with SMP scheduling I've
collected quite a few replies. Some fall into the "plausible but not
necessarily correct" category:

  xosview is causing the problem.
    A red herring. Yes, running it can disturb the scheduler; is that the
    cause of the problem? Probably not (see below).

  Interrupts.
    Yes, they get serviced on both CPUs, but interrupts don't get
    scheduled; they just eat a (hopefully small) hole in cache and CPU
    time and then go away again.

And in the best traditions of Linux... here's the code. Thanks to Andrea
Arcangeli for the alternate scheduling policy.

============================ CUT ================================
/* Program to check for scheduling problems on SMP systems. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NITER 100000

/* Change this to match your cache size. */
#define BUFLEN (128*1024)          /* 128k (size of Celeron cache) */

int p[BUFLEN/sizeof(int)];         /* Hopefully gcc aligns this for us. */

int main(void)
{
        time_t t, t1;
        int i;

        while (1) {
                time(&t);
                for (i = 0; i < NITER; i++)
                        memset(p, i, BUFLEN);
                time(&t1);
                t1 -= t;
                printf("%ld seconds for %d iterations\n", (long)t1, NITER);
        }
}
============================ CUT ===================================

                             Standard scheduler    Andrea SMP-C
  Console                    17-18 seconds         9-10 seconds
  Console + X (xdm login)    24-26 seconds         9-10 seconds

I'll agree it's fairly pathological, but it's also the limiting case of a
well-written x86 program: it does most of its work in cache. Programs that
are written with performance in mind will tend to approach this.

Note: this is the hit a single cache-heavy process takes with the current
scheduler. It's possibly representative of games and some simulation work;
how well that relates to "real life" is another matter. Results with
multiple "hog" processes running also show Andrea's scheduler performing
better than the standard one.

If anyone has doubts, there's the code. Adjust the buffer size to match
the cache size in your machine and run it yourself.

Andrea's patches are available at:

  ftp://ftp.suse.com/pub/people/andrea/kernel-patches/

I'm not saying we should rewrite the scheduler on the basis of one
pathological test case, BUT there is now hard evidence that there are
cases where the current scheduler is far from optimal, and that it can be
altered to obtain substantial improvements.

Peter
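The multiple-hog results mentioned above can be approximated by forking
several copies of the same cache-walking loop and timing the whole run from
the shell (e.g. with "time ./multihog"). The harness below is an
illustrative sketch, not the posted test: NHOGS and the reuse of the 128 KB
buffer layout are assumptions.

    /* multihog.c - fork NHOGS copies of the cache-walking loop so they
     * compete for CPUs, then wait for all of them.  Illustrative sketch;
     * NHOGS and the buffer size are assumed values. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NHOGS  4                /* assumed number of competing hogs */
    #define NITER  100000
    #define BUFLEN (128*1024)       /* match your cache size, as above  */

    static int p[BUFLEN/sizeof(int)];

    static void hog(void)
    {
            int i;

            for (i = 0; i < NITER; i++)
                    memset(p, i, BUFLEN);
    }

    int main(void)
    {
            int n;

            for (n = 0; n < NHOGS; n++) {
                    pid_t pid = fork();

                    if (pid == 0) {         /* child: run one hog, exit */
                            hog();
                            _exit(0);
                    }
                    if (pid < 0) {
                            perror("fork");
                            return 1;
                    }
            }
            while (wait(NULL) > 0)          /* parent: reap all hogs    */
                    ;
            return 0;
    }

Run it under both schedulers and compare the wall-clock times; as with the
single-hog table above, the relative difference is the interesting number.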