From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Interesting scheduling times
Date: 1998/09/16
Message-ID: <199809161443.AAA00304@workaholix.atnf.CSIRO.AU>
X-Deja-AN: 391751418
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

  Hi, all. I've been playing around with measuring Linux context switch times, and I noticed something curious: a Pentium/MMX 200 is doing much better than a PPro 180. Furthermore, a PPro 180 isn't doing heaps better than a Pentium 100.

CPU               process switch   thread switch   Kernel version
Pentium 100             12               12        2.1.109
PPro 180                 8                4        2.1.122-pre2
Pentium/MMX 200          4                2        2.1.104

all times in microseconds for UP machines.

Do these times seem a little odd to people?

FYI: I've appended my testcode.

                Regards,

                    Richard....
===============================================================================
Code

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: ander...@inconnect.com (Erik Andersen)
Subject: Re: Interesting scheduling times
Date: 1998/09/16
Message-ID: <Pine.GSO.4.02A.9809161407400.25101-100000@ultra1>#1/1
X-Deja-AN: 391891454
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809161443.AAA00304@workaholix.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Thu, 17 Sep 1998, Richard Gooch wrote:

> Hi, all. I've been playing around with measuring Linux context
> switch times, and I noticed something curious: a Pentium/MMX 200 is
> doing much better than a PPro 180. Furthermore, a PPro 180 isn't doing
> heaps better than a Pentium 100.
>
> CPU               process switch   thread switch   Kernel version
> Pentium 100             12               12        2.1.109
> PPro 180                 8                4        2.1.122-pre2
> Pentium/MMX 200          4                2        2.1.104
>
> all times in microseconds for UP machines.
>
> Do these times seem a little odd to people?

Maybe you should run them all using the same kernel, so we see an "apples-to-apples" comparison. A lot has happened during the last 20 kernel releases...

 -Erik

--
Erik B. Andersen   Web:    http://www.inconnect.com/~andersen/
                   email:  ander...@debian.org
--This message was written using 73% post-consumer electrons--

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/16
Message-ID: <199809162351.JAA17376@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 391915302
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.GSO.4.02A.9809161407400.25101-100000@ultra1>
Newsgroups: muc.lists.linux-kernel

Erik Andersen writes:
> On Thu, 17 Sep 1998, Richard Gooch wrote:
>
> > Hi, all. I've been playing around with measuring Linux context
> > switch times, and I noticed something curious: a Pentium/MMX 200 is
> > doing much better than a PPro 180. Furthermore, a PPro 180 isn't doing
> > heaps better than a Pentium 100.
> >
> > all times in microseconds for UP machines.
> >
> > Do these times seem a little odd to people?
>
> Maybe you should run them all using the same kernel, so we see an
> "apples-to-apples" comparison. A lot has happened during the last
> 20 kernel releases...

OK:

CPU               process switch   thread switch   Kernel version
Pentium 100             12               12        2.1.109
Pentium 100             19                9        2.1.122-pre2
PPro 180                 8                4        2.1.122-pre2
Pentium/MMX 200          6                2        2.1.122-pre2
Pentium/MMX 200          4                2        2.1.104

Even more interesting: 2.1.122-pre2 is slower for process switching and faster for thread switching than 2.1.104/2.1.109.

                Regards,

                    Richard....

From: lin...@z.ml.org (Gregory Maxwell)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <Pine.LNX.3.96.980917000946.21056A-100000@z.ml.org>#1/1
X-Deja-AN: 391951245
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809162351.JAA17376@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Thu, 17 Sep 1998, Richard Gooch wrote:

[snip]
>
> OK:
>
> CPU               process switch   thread switch   Kernel version
> Pentium 100             12               12        2.1.109
> Pentium 100             19                9        2.1.122-pre2
L1 is 2 way. 16k

> PPro 180                 8                4        2.1.122-pre2
L1 is 2way/4way 16k

> Pentium/MMX 200          6                2        2.1.122-pre2
L1 is 4way 32k
> Pentium/MMX 200          4                2        2.1.104
[snip]

I'm sure if you test a PII it will be a lot more like the PMMX.

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <199809170431.OAA20382@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 391947277
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980917000946.21056A-100000@z.ml.org>
Newsgroups: muc.lists.linux-kernel

Gregory Maxwell writes:
> On Thu, 17 Sep 1998, Richard Gooch wrote:
>
> [snip]
> >
> > OK:
> >
> > CPU               process switch   thread switch   Kernel version
> > Pentium 100             12               12        2.1.109
> > Pentium 100             19                9        2.1.122-pre2
> L1 is 2 way. 16k
>
> > PPro 180                 8                4        2.1.122-pre2
> L1 is 2way/4way 16k
>
> > Pentium/MMX 200          6                2        2.1.122-pre2
> L1 is 4way 32k
> > Pentium/MMX 200          4                2        2.1.104
> [snip]

Yeah, I know about the better caches. I'm just surprised at the differences. More importantly, I'm surprised by the slowdown from 2.1.104 to 2.1.122-pre2.

                Regards,

                    Richard....

From: pet...@varel.bg (Petko Manolov)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <36011D0C.1028BAF8@varel.bg>#1/1
X-Deja-AN: 392021034
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809161443.AAA00304@workaholix.atnf.CSIRO.AU>
Organization: Varel Ltd.
Newsgroups: muc.lists.linux-kernel

Richard Gooch wrote:
>
> CPU               process switch   thread switch   Kernel version
> Pentium 100             12               12        2.1.109
> PPro 180                 8                4        2.1.122-pre2
> Pentium/MMX 200          4                2        2.1.104

I did a slightly different test on my Pentium 166 MMX:

kernel 2.0.35     5us
kernel 2.1.121    6us
kernel 2.1.122    6us - 7us
kernel 2.1.122    9us (under X Windows???)

Petkan
--
Petko Manolov - pet...@varel.bg
http://www.varel.bg/~petkan

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <199809171358.XAA25476@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 392071993
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <36011D0C.1028BAF8@varel.bg>
Newsgroups: muc.lists.linux-kernel

Petko Manolov writes:
> Richard Gooch wrote:
> >
> > CPU               process switch   thread switch   Kernel version
> > Pentium 100             12               12        2.1.109
> > PPro 180                 8                4        2.1.122-pre2
> > Pentium/MMX 200          4                2        2.1.104
>
> I did a slightly different test on my Pentium 166 MMX:
>
> kernel 2.0.35     5us
> kernel 2.1.121    6us
> kernel 2.1.122    6us - 7us
> kernel 2.1.122    9us (under X Windows???)

Did you run it as root? If you don't, the programme competes with X for time. Running it as root allows it to grab all the CPU it wants, and thus gives a more accurate test.

                Regards,

                    Richard....

From: pet...@varel.bg (Petko Manolov)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <3601441B.25E31CA5@varel.bg>#1/1
X-Deja-AN: 392116202
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809161443.AAA00304@workaholix.atnf.CSIRO.AU>
Organization: Varel Ltd.
Newsgroups: muc.lists.linux-kernel

Richard Gooch wrote:
>
> Petko Manolov writes:
> > I did a slightly different test on my Pentium 166 MMX:
> >
> > kernel 2.0.35     5us
> > kernel 2.1.121    6us
> > kernel 2.1.122    6us - 7us
> > kernel 2.1.122    9us (under X Windows???)
>
> Did you run it as root? If you don't, the programme competes with X
> for time. Running it as root allows it to grab all the CPU it wants,
> and thus gives a more accurate test.

Yes, I was root during all tests.
But I thought that a software switch is faster than a hardware one.
Another point: can these times be measured correctly from user level?

Petkan
--
Petko Manolov - pet...@varel.bg
http://www.varel.bg/~petkan

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Interesting scheduling times
Date: 1998/09/17
Message-ID: <6trfc3$io7$1@palladium.transmeta.com>#1/1
X-Deja-AN: 392174868
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809162351.JAA17376@vindaloo.atnf.CSIRO.AU>
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: muc.lists.linux-kernel

In article <199809170431.OAA20...@vindaloo.atnf.CSIRO.AU>,
Richard Gooch <rgo...@atnf.csiro.au> wrote:
>
>More importantly, I'm surprised by the slowdown from 2.1.104 to
>2.1.122-pre2.

What happened is that the very latest 2.1.x kernels are using a software context switch, rather than just a jump through a task-gate and using the hardware context switch code.

Intel documents hardware context switching as slow, and various people have at times complained to me about using it. They can now see _why_ I used it.

The reason I had to make the context switch be done in software rather than hardware was that I couldn't fix an oops any other way. What happens is that in order to streamline various other parts, I really cannot guarantee that all user-level segment registers are always up-to-date - when processes like Wine and DOSEMU change the LDT, the process context may no longer be valid in another thread, and with the hardware context switching I could force an oops in the context switch that I had no way to recover from.

In contrast, with the slightly slower software context switch I can recover gracefully from bad segments.

Oh, well. 2.0.x had this same problem too, but it's harder to trigger because threads don't share the LDT in 2.0.x. In addition, in 2.0.x the context switch isn't protected by a spinlock, so the oops was less damaging: in 2.1.x the oops would result in a completely dead system due to the scheduler spinlock being held stale.

The new software context switch routine might be slightly optimizable, so we may be able to make it faster, but on the whole it's always easier to make buggy code faster than correct code. And while I am a performance freak, fixing bugs takes precedence..

Oh, another change is that in the later kernels the scheduler also correctly does the "tq_scheduler" bottom half processing, which earlier 2.1.x kernels didn't do because I couldn't handle the kernel lock correctly. That may or may not make a difference.

        Linus

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <6ts8hp$uim$1@palladium.transmeta.com>#1/1
X-Deja-AN: 392305442
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809162351.JAA17376@vindaloo.atnf.CSIRO.AU>
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: muc.lists.linux-kernel

In article <6trfc3$io...@palladium.transmeta.com>,
Linus Torvalds <torva...@transmeta.com> wrote:
>In article <199809170431.OAA20...@vindaloo.atnf.CSIRO.AU>,
>Richard Gooch <rgo...@atnf.csiro.au> wrote:
>>
>>More importantly, I'm surprised by the slowdown from 2.1.104 to
>>2.1.122-pre2.
>
>What happened is that the very latest 2.1.x kernels are using a software
>context switch, rather than just a jump through a task-gate and using
>the hardware context switch code.

Happily I checked some more, just to make sure, and profiled the kernel.

Yes, it happened during the switch-over to a software task-switch routine, but the reason the numbers got worse for some people is simply that I screwed up the floating point save code, and it saved _every_ time through instead of doing the lazy save it was meant to do.

The effect of this is not huge, but it's certainly noticeable. With that fixed, the software context switch is pretty comparable to the hardware one.

2.1.123 should have this context switch slowdown fixed.

        Linus

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <199809180023.KAA29236@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 392305441
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <3601441B.25E31CA5@varel.bg>
Newsgroups: muc.lists.linux-kernel

Petko Manolov writes:
> Richard Gooch wrote:
> >
> > Petko Manolov writes:
> > > I did a slightly different test on my Pentium 166 MMX:
> > >
> > > kernel 2.0.35     5us
> > > kernel 2.1.121    6us
> > > kernel 2.1.122    6us - 7us
> > > kernel 2.1.122    9us (under X Windows???)
> >
> > Did you run it as root? If you don't, the programme competes with X
> > for time. Running it as root allows it to grab all the CPU it wants,
> > and thus gives a more accurate test.
>
> Yes, I was root during all tests.

OK, I think what this means is that under X you had more processes on the run queue. Since the scheduler has to scan the run queue, the longer it is the more time the scheduler takes. You can see the effect of this by using the <num_running> option: this will add the specified number of (low priority) processes to the run queue. Increasing this number will slow the scheduler down.

In fact, this indicates that Linux only maintains a single run queue. It could be argued that for RT performance, we'd be better off with two run queues: one for SCHED_OTHER and the other for RT scheduling classes. This way scheduling times for RT processes would not be affected by the number of normal processes on the run queue.

However, it could be counter-argued that there are not usually many processes on the run queue anyway. This is likely true for embedded systems, but isn't necessarily the case for a large system controlling an instrument as well as supporting users [1].

Another possible advantage of separating the run queue would be that it would then become cheap to *always* call schedule() (or perhaps schedule_rt()) upon return from interrupt/syscall.
In most cases there won't be any RT processes on the run queue, so this would only add a few cycles overhead to the scheduler. The advantage of doing this is that RT processes which are woken up by a driver will preempt whatever is currently running, and hence will actually start running immediately, rather than waiting N jiffies. This would make Linux more RT-friendly.

Linus: what do you think of this idea? A valid project for 2.3? I have to say I'm impressed with the soft-RT performance of Linux. In my view the main limitation is the jiffies delay between when an RT process is unblocked and when it starts running.

> But I thought that a software switch is faster than a hardware one.

Not always. See Linus' message about this.

> Another point: can these times be measured correctly from user
> level?

We have a pretty accurate gettimeofday(2) syscall, which makes doing this easy. Particularly on the Pentium and above, where we use the TSC for greater accuracy. I'm not sure how good it is on the 486 and lower. In any case, in my code I'm careful to amortise the cost of getting the time, and to reduce the inaccuracies of the timestamp, by running through the scheduler in a loop.

                Regards,

                    Richard....

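The amortised timing Richard describes can be sketched roughly as follows. This is not his test program, which is not reproduced in this archive; the helper names elapsed_us, yield_once and average_cost_us are hypothetical. The sketch assumes gettimeofday(2) and sched_yield(2) are available. It takes one timestamp on each side of a loop and divides by the iteration count, so that the cost and jitter of the two timestamps are spread over many trips into the scheduler.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday(2) samples. */
static double elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e6
         + (double)(end->tv_usec - start->tv_usec);
}

/* Hypothetical operation under test: one trip into the scheduler. */
static void yield_once(void)
{
    sched_yield();
}

/*
 * Average cost, in microseconds, of one call to op.
 * Only two timestamps are taken, so their overhead and granularity
 * are amortised over `iterations` calls.
 */
static double average_cost_us(void (*op)(void), long iterations)
{
    struct timeval start, end;
    long i;

    assert(op != NULL);
    if (iterations <= 0)
        return 0.0;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        op();
    gettimeofday(&end, NULL);

    return elapsed_us(&start, &end) / (double)iterations;
}
```

Calling average_cost_us(yield_once, 1000000) from a main() gives a per-call figure in microseconds. With only one runnable process, sched_yield() returns without switching to anything else, so a real switch measurement needs at least two processes or threads yielding to each other.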
From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <Pine.LNX.3.96.980917174717.495A-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 392309321
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809180023.KAA29236@vindaloo.atnf.CSIRO.AU>
Reply-To: Linus Torvalds <torva...@transmeta.com>
Newsgroups: muc.lists.linux-kernel

On Fri, 18 Sep 1998, Richard Gooch wrote:
>
> Linus: what do you think of this idea? A valid project for 2.3?
> I have to say I'm impressed with the soft-RT performance of Linux. In
> my view the main limitation is the jiffies delay between when an RT
> process is unblocked and when it starts running.

A single run-queue is almost always better than multiple run-queues, and I'm very unlikely to change that.

The reason for a single run-queue is that it's about 10 times simpler than any of the alternatives, and it's never slower in real life. Yes, we may end up walking a few more entries, but the simplicity more than pays back the cost of that walk.

Even under heavy load, the runqueue is seldom more than a few entries deep. More than 10 entries on the run-queue is already very rare, and when it does happen the scheduling overhead is very small compared to what else the machine is doing: having that many entries implies that the scheduler isn't your biggest bottle-neck anyway.

That said, the idea of just having two run-queues, one with real-time processes and one without is so far the best multi-runqueue idea I've heard. So yes, I could imagine doing something like that, but I still don't actually believe that the run-queue is the major bottle-neck.

        Linus

PS. Here's the patch to make 2.1.122 perform as it should wrt scheduling, and not save the FP register state all the time. Embarrassing.

diff -u --recursive --new-file v2.1.122/linux/arch/i386/kernel/process.c linux/arch/i386/kernel/process.c
--- v2.1.122/linux/arch/i386/kernel/process.c	Thu Sep 17 17:53:34 1998
+++ linux/arch/i386/kernel/process.c	Thu Sep 17 17:41:51 1998
@@ -540,10 +540,10 @@
 static inline void unlazy_fpu(struct task_struct *tsk)
 {
 	if (tsk->flags & PF_USEDFPU) {
-		tsk->flags &= ~PF_USEDFPU;
 		__asm__("fnsave %0":"=m" (tsk->tss.i387));
-		stts();
 		asm volatile("fwait");
+		tsk->flags &= ~PF_USEDFPU;
+		stts();
 	}
 }

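The run-queue argument in this exchange can be illustrated with a toy model. This is not kernel code: struct task, beats, pick_next and pick_next_split are hypothetical names, and the selection rule (an RT task beats a non-RT one, otherwise the higher priority wins) is a simplification assumed purely for illustration. The sketch only counts how many entries a linear scan visits with one queue versus a separate RT queue.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task descriptor (hypothetical, not the kernel's task_struct). */
struct task {
    int realtime;   /* non-zero for an RT scheduling class */
    int priority;   /* larger value wins within a class */
};

/* Return non-zero if a should run in preference to b. */
static int beats(const struct task *a, const struct task *b)
{
    if ((a->realtime != 0) != (b->realtime != 0))
        return a->realtime != 0;
    return a->priority > b->priority;
}

/*
 * Linear scan of one queue. Returns the index of the best task,
 * or -1 if the queue is empty. Every entry examined is added to
 * *visited, which stands in for scheduling cost.
 */
static int pick_next(const struct task *queue, size_t len, size_t *visited)
{
    int best = -1;
    size_t i;

    assert(visited != NULL);
    for (i = 0; i < len; i++) {
        (*visited)++;
        if (best < 0 || beats(&queue[i], &queue[best]))
            best = (int)i;
    }
    return best;
}

/*
 * Two-queue variant: if any RT task is runnable, only the RT queue
 * is scanned. *from_rt reports which queue the result indexes into.
 */
static int pick_next_split(const struct task *rt, size_t rt_len,
                           const struct task *other, size_t other_len,
                           int *from_rt, size_t *visited)
{
    if (rt_len > 0) {
        *from_rt = 1;
        return pick_next(rt, rt_len, visited);
    }
    *from_rt = 0;
    return pick_next(other, other_len, visited);
}
```

With one RT task and N normal tasks, the single queue visits N + 1 entries while the split version visits one. As Linus notes, N is usually small, so the model shows the shape of the cost rather than whether it matters in practice.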
From: pet...@varel.bg (Petko Manolov)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <36023855.4BE148E8@varel.bg>#1/1
X-Deja-AN: 392359527
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980917174717.495A-100000@penguin.transmeta.com>
Organization: Varel Ltd.
Newsgroups: muc.lists.linux-kernel

Linus Torvalds wrote:
>
> PS. Here's the patch to make 2.1.122 perform as it should wrt scheduling,
> and not save the FP register state all the time. Embarrassing.

OK, I've applied your patch and got strange results:
the first time I ran Richard's prog I got 4us. All subsequent
values were 6us. I rebooted the machine and the result
was the same!
I'm afraid I don't understand what exactly the patch
does. At first glance only the order of the lines has
changed - so what? Or is there some relation between
"fnsave" and setting the task switch bit?

Petkan

P.S. I'll try to do some profiling in kernel mode

--
Petko Manolov - pet...@varel.bg
http://www.varel.bg/~petkan

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <199809180751.RAA02219@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 392352642
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <36023855.4BE148E8@varel.bg>
Newsgroups: muc.lists.linux-kernel

Petko Manolov writes:
> Linus Torvalds wrote:
> >
> > PS. Here's the patch to make 2.1.122 perform as it should wrt scheduling,
> > and not save the FP register state all the time. Embarrassing.
>
> OK, I've applied your patch and got strange results:
> the first time I ran Richard's prog I got 4us. All subsequent
> values were 6us. I rebooted the machine and the result
> was the same!
> I'm afraid I don't understand what exactly the patch
> does. At first glance only the order of the lines has
> changed - so what? Or is there some relation between
> "fnsave" and setting the task switch bit?

Actually, after some further testing, I get the same result! The first time it's faster, then subsequent runs are slower (when using RT processes). For non-RT processes, the times are stable. Strange.

                Regards,

                    Richard....

From: rgo...@atnf.csiro.au (Richard Gooch)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <199809180637.QAA00514@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 392362734
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <6ts8hp$uim$1@palladium.transmeta.com>
Newsgroups: muc.lists.linux-kernel

Linus Torvalds writes:
> Yes, it happened during the switch-over to a software task-switch
> routine, but the reason the numbers got worse for some people is simply
> that I screwed up the floating point save code, and it saved _every_
> time through instead of doing the lazy save it was meant to do.
>
> The effect of this is not huge, but it's certainly noticeable. With
> that fixed, the software context switch is pretty comparable to the
> hardware one.
>
> 2.1.123 should have this context switch slowdown fixed.

I tried the patch you sent, and it makes no noticeable difference on my PPro 180 (less than 1 us).

                Regards,

                    Richard....

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Interesting scheduling times
Date: 1998/09/18
Message-ID: <Pine.LNX.3.96.980918110555.7095B-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 392564799
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199809180751.RAA02219@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Fri, 18 Sep 1998, Richard Gooch wrote:
>
> Actually, after some further testing, I get the same result! The first
> time it's faster, then subsequent runs are slower (when using RT
> processes). For non-RT processes, the times are stable. Strange.

I haven't even looked at the benchmark you seem to be talking about - I use "lmbench" myself which I trust to be reasonably realistic. It certainly showed an effect of my FPU screwup, although it's not all that large on any reasonable system (it's probably horrible on an i386/i387 combination where FP operations are slower).

lmbench uses a set of pipes and passes a token around to force scheduling, and that should work fine. I'd be nervous about any other kind of scheduling benchmark.

        Linus

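The token-passing scheme Linus describes can be sketched in miniature as below. This is not lmbench's source: token_round_trips is a hypothetical name, and the sketch uses just two processes and two pipes, assuming POSIX pipe(2), fork(2) and waitpid(2). Each side blocks in read() until the other writes the token, so on a uniprocessor every round trip forces the scheduler to switch to the child and back.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Pass a one-byte token from parent to child and back `rounds` times.
 * Returns the number of completed round trips, or -1 if the pipes or
 * the child process could not be created.
 */
static long token_round_trips(long rounds)
{
    int to_child[2], to_parent[2];
    char token = 'x';
    long done = 0;
    pid_t pid;

    assert(rounds >= 0);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: echo each token back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1) {
            if (write(to_parent[1], &token, 1) != 1)
                break;
        }
        _exit(0);
    }

    /* Parent: send the token, then block until it comes back. */
    close(to_child[0]);
    close(to_parent[1]);
    while (done < rounds) {
        if (write(to_child[1], &token, 1) != 1)
            break;
        if (read(to_parent[0], &token, 1) != 1)
            break;
        done++;
    }

    close(to_child[1]);     /* child sees EOF and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return done;
}
```

Timing a call such as token_round_trips(100000) with gettimeofday(2) and dividing by twice the round count gives a rough per-switch figure. That figure also includes the cost of the pipe reads and writes, so it is an upper bound rather than a pure context switch time.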