Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
 newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
 newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
 nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: Fri, 4 Jan 2002 18:05:23 +0100 (CET)
From: Ingo Molnar <mi...@elte.hu>
Reply-To: <mi...@elte.hu>
X-To: <linux-ker...@vger.kernel.org>
X-Cc: Linus Torvalds <torva...@transmeta.com>,
 Alan Cox <a...@lxorguk.ukuu.org.uk>, Anton Blanchard <an...@samba.org>
Subject: [patch] O(1) scheduler, 2.4.17-A1, 2.5.2-pre7-A1.
Message-ID: <linux.kernel.Pine.LNX.4.33.0201041743050.8766-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Approved: n...@nntp-server.caltech.edu
Lines: 66


this is the next release of the O(1) scheduler:

	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-A1.patch
	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.17-A1.patch

this release includes fixes and small improvements. (The 2.5.2-A1 patch
is against the 2.5.2-pre7 kernel.)

I cannot reproduce any more failures with this patch, but i couldn't test
the vfat lockup problem. The X lockup problem never occurred on any of my
boxes, but it might be fixed by one of the changes included in this patch
nevertheless.

Changes:

 - idle process notification fixes. This fixes the idle=poll breakage
   reported by Anton Blanchard.

 - fix a bug in setscheduler() which crashed if a non-SCHED_OTHER task
   did a setscheduler() call. This fixes the crash reported by Randy
   Hron. The Linux Test Project's syscall tests do not cause a crash
   anymore.

 - do some more unlikely()/likely() tagging of branches along the
   hotpath, suggested by Jeff Garzik.

 - fix the compile failures in md.c and loop.c and other files, reported
   by many people.

 - fix the too-big-by-one error in the bitmap sizing define, noticed by
   Anton Blanchard.

 - fix a bug in rt_lock() + setscheduler() that had a potential for a
   spinlock lockup.

 - introduce the idle_tick() function, so that idle CPUs can do
   load-balancing as well.

 - do LINUX_VERSION_CODE checking in jffs2 (Jeff Garzik)

 - optimize the big-kernel-lock releasing/acquiring code some more. From
   now on it's absolutely illegal to schedule() from cli()-ed code. (not
   that it was legal.) This moves a few instructions off the scheduler
   hotpath.

 - move the ->need_resched setting into idle_init().

 - do not clear RT tasks in reparent_to_init(). There's nothing wrong
   with running RT tasks in the background.

 - RT tasks' priority order was inverted: it should be 0-99, not 99-0.

 - make load-balancing a bit less eager when there are lots of processes
   running: it needs a ~10% imbalance in runqueue lengths to trigger a
   rebalance.

 - (there is a small hack in serial.c in the 2.5.2-pre7 patch, to make
   it compile.)

Comments, bug reports, suggestions are welcome,

	Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

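[Archive note: the changelog entry on less eager load-balancing says a rebalance needs a ~10% imbalance in runqueue lengths. The message does not give the actual comparison, so the following is only a minimal sketch of one way such a threshold could be expressed in integer arithmetic. The function name and the exact formula are assumptions for illustration, not code from the patch.]

```c
/*
 * Hypothetical helper (not from the patch): returns 1 when the busiest
 * runqueue is more than ~10% longer than the local one, 0 otherwise.
 * Integer math only, so no floating point is needed.
 */
static int imbalance_exceeds_threshold(int busiest_len, int this_len)
{
    /* busiest_len > 1.10 * this_len, scaled by 100 to stay in integers */
    return busiest_len * 100 > this_len * 110;
}
```

With long queues this filters out small differences (110 vs. 100 does not trigger, 111 vs. 100 does), while with short queues any whole-task difference is already far above 10% and still triggers.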
Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
 newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
 newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
 nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: Mon, 7 Jan 2002 20:23:41 +0100 (CET)
From: Ingo Molnar <mi...@elte.hu>
Reply-To: <mi...@elte.hu>
X-To: Linus Torvalds <torva...@transmeta.com>
X-Cc: <linux-ker...@vger.kernel.org>, george anzinger <geo...@mvista.com>,
 Davide Libenzi <davi...@xmailserver.org>
Subject: [patch] O(1) scheduler, -D0, 2.5.2-pre9, 2.4.17
Message-ID: <linux.kernel.Pine.LNX.4.33.0201071952270.11688-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Approved: n...@nntp-server.caltech.edu
Lines: 74


i've uploaded an updated O(1) scheduler patch:

	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-D0.patch
	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.17-D0.patch

this release uses Linus' idea of merging RT task priorities into the
normal scheduler priority bitspace. This allowed the removal of all the
ugly RT-related special-case code: the RT and non-RT schedulers are
united again. It's all just one kind of task - an RT task is 'just' a
task with a lower priority number. The RT locking/unlocking code is
completely gone. rt_schedule() is gone. There is only a single rt_task()
branch in the scheduler hotpaths.

I cannot overemphasize the level of cleanups this enabled. Eg.
schedule() itself has become a very simple, 60 lines long function. If
compiled with a gcc 3.1-ish compiler that knows about
likely()/unlikely(), the schedule() function has just two taken branches
in the hotpath! The rest is straight fall-through code. Altogether, the
cleanups reduced sched.c's source code size by more than 10%!

to enable the fast searching of the 100 + 40 bits bitmap, i've shifted
the SCHED_OTHER bitspace to 128-167. The RT task queues are in bits
0-99. Bits 100-127 are in essence unused. This way the bit-searching can
be done very quickly for the common (no RT) case, on x86:

	static inline int sched_find_first_zero_bit(char *bitmap)
	{
		unsigned int *b = (unsigned int *)bitmap;
		unsigned int rt;

		rt = b[0] & b[1] & b[2] & b[3];
		if (unlikely(rt != 0xffffffff))
			return find_first_zero_bit(bitmap, MAX_RT_PRIO);

		if (b[4] != ~0)
			return ffz(b[4]) + MAX_RT_PRIO;
		return ffz(b[5]) + 32 + MAX_RT_PRIO;
	}

also, the layout of the 'normal' task queues is thus cacheline aligned.
(and even in the RT case the find_first_zero_bit() is hand-optimized
assembly code as well.)

There is no measurable difference between the context-switch times of
the -C1 patch and this patch, both do 1.57 usecs on a 466 MHz Celeron.

RT tasks can still be made 'global' at any later point, by doing
directed wakeups towards lower priority CPUs. (The wakeup path has a
rt_task() branch already so there would be no wakeup overhead for normal
tasks.)

The patch is stable on my boxes, and two alpha-testers reported that
this patch fixes the crashes they saw with earlier patches.

Changelog:

 - export set_user_nice (Jens Axboe)

 - report correctly scaled priorities via /proc. (this unbreaks 'top'
   priority output.)

 - sped up the task-load estimator a bit.

 - cleaned up slip.c's and reiserfs/buffer2.c's scheduler usage.

 - lock both runqueues in init_idle(), this could explain some of the
   boot-time SMP crashes Anton saw.

	Ingo

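[Archive note: the bitmap search described in the -D0 message relies on kernel helpers (ffz(), find_first_zero_bit(), unlikely()). The following is a portable userspace sketch of the same idea, for experimentation only. It assumes a 192-bit bitmap stored as six 32-bit words in which a clear bit marks a non-empty priority queue, and MAX_RT_PRIO of 128 to match the 128-167 layout described above. The helper names ffz32() and find_first_zero() are stand-ins introduced here, not kernel APIs.]

```c
#include <stdint.h>

#define MAX_RT_PRIO  128   /* assumed: first SCHED_OTHER bit (128-167 layout) */
#define BITMAP_WORDS 6     /* 6 * 32 = 192 bits, enough for bits 0..167 */

/* Index of the lowest clear bit in w; yields 32 if every bit is set. */
static int ffz32(uint32_t w)
{
    int i = 0;

    while (w & 1u) {
        w >>= 1;
        i++;
    }
    return i;
}

/* Slow generic scan of the first nbits bits; returns nbits if none is clear. */
static int find_first_zero(const uint32_t *b, int nbits)
{
    for (int i = 0; i < nbits; i++)
        if (!(b[i / 32] & (1u << (i % 32))))
            return i;
    return nbits;
}

/* Fast path for the common case where no RT queue (bits 0..127) is in use. */
static int sched_find_first_zero(const uint32_t *b)
{
    uint32_t rt = b[0] & b[1] & b[2] & b[3];

    if (rt != 0xffffffffu)              /* some RT bit is clear: rare path */
        return find_first_zero(b, MAX_RT_PRIO);
    if (b[4] != 0xffffffffu)            /* bits 128..159 */
        return ffz32(b[4]) + MAX_RT_PRIO;
    return ffz32(b[5]) + 32 + MAX_RT_PRIO;  /* bits 160..191 */
}
```

The point of the layout is visible in the fast path: with no RT task queued, the search touches only one AND-reduced value plus one or two words, regardless of how many tasks are runnable.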
Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
 newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
 newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
 nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: Wed, 9 Jan 2002 19:22:00 +0100 (CET)
From: Ingo Molnar <mi...@elte.hu>
Reply-To: <mi...@elte.hu>
X-To: <linux-ker...@vger.kernel.org>
X-Cc: Linus Torvalds <torva...@transmeta.com>,
 Mike Kravetz <krav...@US.IBM.COM>, Anton Blanchard <an...@samba.org>,
 george anzinger <geo...@mvista.com>,
 Davide Libenzi <davi...@xmailserver.org>
Subject: [patch] O(1) scheduler, -G1, 2.5.2-pre10, 2.4.17
Message-ID: <linux.kernel.Pine.LNX.4.33.0201091824570.5876-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Approved: n...@nntp-server.caltech.edu
Lines: 115


this is the latest update of the O(1) scheduler:

	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-pre10-G1.patch
	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.17-G1.patch

this patch contains fixes to the scheduler (mainly from Rusty Russell),
and it also contains a much reworked load-balancing code, triggered by
Mike Kravetz's analysis/numbers.

the previous load-balancer had a number of childhood problems, the
biggest problem was that it rebalanced runqueues way too often. Also, it
sometimes got into a load-balancing resonance.

A usual 'make -j15' kernel compile on an 8-way box generates about 70
thousand reschedules until it finishes. Under the stock 2.5.2-pre10
kernel, about 20% of those reschedules were CPU-unaffine, ie. they
scheduled to a task that was load-balanced over into this queue from
another CPU.

With 2.5.2-pre10 + -G1, the number of total 'incorrect' reschedules is
down to 0.6%, and even the majority of those are caused by 'idle-pull'
rebalancing: a situation that inevitably causes an unaffine reschedule.
The number of 'unforced' unaffine reschedules is down to 0.2%.

fairness is equally good with both kernels, both the -G1 and the vanilla
kernel distribute CPU-using processes equally well between CPUs.

the new load balancer in the -G1 patch has the following logic: there
are two kinds of load-balancing activities, 'idle balancing' and
'fairness balancing'.

Idle balancing must happen if any CPU runs out of processes - in this
case it must find some new work or else it will stay idle and the CPU
power goes unused.

Fairness rebalancing must happen to even out the runqueues between CPUs
- to avoid a situation where eg. 5 processes are running on one CPU, and
1 process is running on the other CPU - the processes on CPU#0 will only
see 20% of single-CPU performance. The 'fair' distribution is to run 3-3
processes on both CPUs, so each process will get a fair 33% share of
single-CPU performance.

Whenever an idle rebalance situation happens, we try to find a new
process for the soon-to-be-idle CPU. The CPU searches all the other CPUs
and takes processes from the CPU that has the longest runqueue. The idle
CPU pulls only a *single* process - this is the minimum we must do to
avoid the CPU going idle.

Fairness rebalancing happens at a 250 msec pace: a 'rebalance tick'
happens on every CPU, every 250 msecs. In this case we will rebalance
multiple processes as well if needed.

A commonly occurring situation is that processes rush to a runqueue and
go off the runqueue quickly. Such 'fluctuations' of runqueue lengths
must not result in unnecessary rebalancing. Thus the fairness
rebalancing code uses a (simple & fast) method of recording the runqueue
length on any particular CPU in the last rebalancing tick. The balancer
takes the shorter runqueue length value of the 'previous' and 'current'
length, discarding the longer one as statistical fluctuation.

This mechanism works pretty well: if a runqueue is long during a long
period of time, then the balancer will 'accept' that the queue is long
and will rebalance it. If the runqueue is only temporarily long then the
load-balancer will not balance it.

in essence the fairness rebalancer establishes an 'average runqueue
length' of sorts by sampling the runqueue length - without adding
overhead to the actual runqueue manipulation code (wake_up() &
schedule()). There exist more accurate methods of sampling runqueue
length, but the current method works pretty well already.

[ there is one possible improvement to this logic that i'll add, it's
the ability of the wakeup code to trigger an idle rebalance. The wakeup
code does not want to trigger a fairness rebalance, the fairness
rebalance is purely timer-driven. ]

anyway, here are some kernel compilation times in seconds, on an 8-way,
Xeon, 700 MHz, 2MB L2 cache box. [lower numbers are better, results are
the best results from 4 successive runs, kernel tree fully cached,
exactly the same kernel tree was compiled under every kernel]:

	time make -j15 bzImage

	2.4.17-vanilla:       44.6 sec +- 0.2 sec
	2.5.2-pre9-vanilla:   45.3 sec +- 0.2 sec
	2.5.2-pre10-vanilla:  45.4 sec +- 0.2 sec
	2.5.2-pre10-G1:       43.4 sec +- 0.2 sec

ie. the -G1 kernel compiles kernels faster than any other kernel i
tried.

Changes:

 - Rusty Russell: fix rebalance tick definition if HZ < 100 in UML.

 - Rusty Russell: check for new_mask in set_cpus_allowed(), to be sure.

 - Rusty Russell: clean up rq_ macros so that HT can be done by changing
   just one of the macros.

 - Rusty Russell: remove rq->cpu.

 - Rusty Russell: remove cacheline padding from runqueue_t, it's
   pointless now.

 - Rusty Russell: sched.c comment fixes.

 - increase minimum timeslice length by 10 msec.

 - fix comments in sched.h

	Ingo

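[Archive note: the -G1 message describes the balancer's filtering rule (take the shorter of the previous and current runqueue length) and the two balancing modes in prose only. The following is a minimal userspace sketch of that logic. The struct, the function names, and in particular the "move half the difference" rule for fairness balancing are assumptions made for illustration. The message itself only says that multiple processes may be moved if needed.]

```c
#include <stddef.h>

/* Hypothetical per-CPU sample; field names are illustrative only. */
struct rq_sample {
    int nr_running;       /* current runqueue length */
    int prev_nr_running;  /* length recorded at the previous rebalance tick */
};

/* Load seen by the balancer: the shorter of previous and current length,
 * so a queue that is only briefly long is treated as fluctuation. */
static int filtered_load(const struct rq_sample *rq)
{
    return rq->nr_running < rq->prev_nr_running ? rq->nr_running
                                                : rq->prev_nr_running;
}

/* Index of the CPU whose filtered load is highest and exceeds this CPU's
 * current length, or -1 if there is none. */
static int find_busiest(const struct rq_sample *rqs, size_t ncpus,
                        size_t this_cpu)
{
    int best = -1;
    int best_load = rqs[this_cpu].nr_running;

    for (size_t i = 0; i < ncpus; i++) {
        if (i == this_cpu)
            continue;
        int load = filtered_load(&rqs[i]);
        if (load > best_load) {
            best_load = load;
            best = (int)i;
        }
    }
    return best;
}

/* How many tasks this CPU should pull. Idle balancing pulls exactly one;
 * fairness balancing (assumed rule) pulls half the difference. */
static int tasks_to_pull(const struct rq_sample *rqs, size_t ncpus,
                         size_t this_cpu, int idle)
{
    int busiest = find_busiest(rqs, ncpus, this_cpu);

    if (busiest < 0)
        return 0;
    if (idle)
        return 1;
    return (filtered_load(&rqs[busiest]) - rqs[this_cpu].nr_running) / 2;
}

/* Called at the end of each rebalance tick to remember current lengths. */
static void record_tick(struct rq_sample *rqs, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        rqs[i].prev_nr_running = rqs[i].nr_running;
}
```

For the 5-vs-1 example from the message, a sustained imbalance yields a pull of 2 tasks, giving the 3-3 split. If the long queue had only 1 task at the previous tick, the filtered load is 1 and nothing is moved. A real balancer would also have to skip tasks that are currently executing, which this sketch ignores.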
Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
 newsfeed.direct.ca!look.ca!newsfeed.media.kyoto-u.ac.jp!uio.no!
 nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: Fri, 11 Jan 2002 01:38:51 +0100 (CET)
From: Ingo Molnar <mi...@elte.hu>
Reply-To: <mi...@elte.hu>
To: Linus Torvalds <torva...@transmeta.com>
Cc: <linux-ker...@vger.kernel.org>
Subject: [patch] O(1) scheduler, -H5
Original-Message-ID: <Pine.LNX.4.33.0201110130290.11478-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 10 Jan 2002 22:44:29 GMT
Message-ID: <fa.nvkb4lv.1cmsmo6@ifi.uio.no>
Lines: 27


the -H5 patch adds a debugging check:

	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-pre11-H5.patch

it adds code to catch places that call schedule() from global-cli()
sections. Right now release_kernel_lock() doesn't automatically release
the IRQ lock if there is no kernel lock held. A fair amount of code does
this still, and i think we should fix them in 2.5. (Such code, while of
questionable quality, is safe if it also holds the big kernel lock, but
it's definitely SMP-unsafe if it doesn't hold the bkl - the BUG() assert
only catches the latter case.)

(Andi Kleen noticed this on the first day the patch was released, and
Andrew Morton reminded me today that i forgot to fix it ... :-| )

my systems do not trigger the BUG(), so there cannot be all that much
broken code left.

	Ingo

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
 news.tele.dk!small.news.tele.dk!193.213.112.26!newsfeed1.ulv.nextra.no!
 nextra.com!uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: Sun, 13 Jan 2002 20:34:39 +0100 (CET)
From: Ingo Molnar <mi...@elte.hu>
Reply-To: <mi...@elte.hu>
To: <linux-ker...@vger.kernel.org>
Cc: Linus Torvalds <torva...@transmeta.com>,
 Anton Blanchard <an...@samba.org>
Subject: [patch] O(1) scheduler, -H7
Original-Message-ID: <Pine.LNX.4.33.0201131933500.6560-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Sun, 13 Jan 2002 17:38:48 GMT
Message-ID: <fa.o6pdg0v.s52613@ifi.uio.no>
Lines: 34


the -H7 patch is available:

	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-pre11-H7.patch
	http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.17-H7.patch

there is an important SMP fix in this release, found by Anton Blanchard:
double-spin_unlock()ing triggered oopses on high-end SMP boxes.

stability status: all reported problems were fixed by -H6, the only
problem remaining was Anton's SMP crashes, which should be fixed in this
-H7 patch.

Changes between -H6 and -H7:

 - Anton Blanchard: fix double spin_unlock in sched.c. This fixes a
   high-end SMP oops he saw.

 - William Lee Irwin III: fix mwave's ->nice code.

 - cleanup: mmu_context.h renamed to sched.h, as suggested by Richard
   Henderson.

 - added an irqs_enabled() macro to the x86 tree, to simplify irq.c.

Bug reports, comments, suggestions welcome.

	Ingo
