Date: Thu, 15 Mar 2001 02:40:04 +0100
From: Nigel Gamble <ni...@nrg.org>
Reply-To: ni...@nrg.org
Subject: [PATCH for 2.5] preemptible kernel
Message-ID: <Pine.LNX.4.05.10103141653350.3094-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 35637.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
X-Original-Cc: linux-ker...@vger.kernel.org
X-Original-Date: Wed, 14 Mar 2001 17:25:22 -0800 (PST)
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Linus Torvalds <torva...@transmeta.com>
Lines: 943

Here is the latest preemptible kernel patch. It's much cleaner and
smaller than previous versions, so I've appended it to this mail. This
patch is against 2.4.2, although it's not intended for 2.4. I'd like
comments from anyone interested in a low-latency Linux kernel solution
for the 2.5 development tree.

Kernel preemption is not allowed while spinlocks are held, which means
that this patch alone cannot guarantee low preemption latencies. But
as long-held locks (in particular the BKL) are replaced by finer-grained
locks, this patch will enable lower latencies as the kernel also becomes
more scalable on large SMP systems.

Notwithstanding the comments in the Configure.help section for
CONFIG_PREEMPT, I think this patch has a negligible effect on
throughput. In fact, I got better average results from running
'dbench 16' on a 750MHz PIII with 128MB with kernel preemption turned
on (~30MB/s) than on the plain 2.4.2 kernel (~26MB/s).

(I had to rearrange three header files that are needed in sched.h before
task_struct is defined, but which include inline functions that cannot
now be compiled until after task_struct is defined. I chose not to
move them into sched.h, like d_path(), as I don't want to make it more
difficult to apply kernel patches to my kernel source tree.)

Nigel Gamble                                    ni...@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

Patch

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Message-ID: <m14fHk9-001PKgC@mozart>
From: Rusty Russell <ru...@rustcorp.com.au>
Subject: Re: [PATCH for 2.5] preemptible kernel
In-Reply-To: Your message of "Wed, 14 Mar 2001 17:25:22 -0800."
        <Pine.LNX.4.05.10103141653350.3094-100000@cosmic.nrg.org>
Date: Tue, 20 Mar 2001 10:10:04 +0100
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 83108.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
References: <Pine.LNX.4.05.10103141653350.3094-100000@cosmic.nrg.org>
X-Original-Cc: linux-ker...@vger.kernel.org
X-Original-Date: Tue, 20 Mar 2001 19:43:50 +1100
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: ni...@nrg.org
Lines: 80

In message <Pine.LNX.4.05.10103141653350.3094-100...@cosmic.nrg.org> you write:
> Kernel preemption is not allowed while spinlocks are held, which means
> that this patch alone cannot guarantee low preemption latencies. But
> as long held locks (in particular the BKL) are replaced by finer-grained
> locks, this patch will enable lower latencies as the kernel also becomes
> more scalable on large SMP systems.

Hi Nigel,

I can see three problems with this approach, only one of which is
serious.

The first is that code which is already SMP-unsafe is now a problem for
everyone, not just the 0.1% of SMP machines. I consider this a good
thing for 2.5 though.

The second is that there are "manual" locking schemes which are used in
several places in the kernel which rely on non-preemptability; de-facto
spinlocks if you will. I consider all these uses flawed: (1) they are
often subtly broken anyway, (2) they make reading those parts of the
code much harder, and (3) they break when things like this are done.

The third is that preemptivity conflicts with the naive quiescent-period
approach proposed for module unloading in 2.5, and useful for several
other things (eg. hotplugging CPUs). This method relies on knowing that
when a schedule() has occurred on every CPU, we know no one is holding
certain references. The simplest example is a single linked list: you
can traverse without a lock as long as you don't sleep, and then someone
can unlink a node, and wait for a schedule on every other CPU before
freeing it. The non-SMP case is a noop. See synchronize_kernel() below.

This, too, is soluble, but it means that synchronize_kernel() must
guarantee that each task which was running or preempted in kernel space
when it was called, has been non-preemptively scheduled before
synchronize_kernel() can exit. Icky.

Thoughts?
Rusty.
--
Premature optmztion is rt of all evl. --DK

/* We could keep a schedule count for each CPU and make idle tasks
   schedule (some don't unless need_resched), but this scales quite
   well (eg. 64 processors, average time to wait for first schedule =
   jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...

   At 1024 cpus, this is about 7.5 jiffies.  And that assumes no one
   schedules early. --RR */
void synchronize_kernel(void)
{
	unsigned long cpus_allowed, policy, rt_priority;

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL<<smp_num_cpus)-1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;
}
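The scaling estimate in the comment above synchronize_kernel() is a
harmonic sum, and it is easy to check numerically. The sketch below
assumes, as the comment does, that with k CPUs still to be visited the
next schedule arrives after about 1/k of a jiffy on average. The helper
name expected_wait_jiffies() is made up for this illustration and is not
part of any patch.

```c
#include <assert.h>

/*
 * Expected time, in jiffies, for the caller to be scheduled once on each
 * of the other (ncpus - 1) CPUs: 1/(ncpus-1) + 1/(ncpus-2) + ... + 1/1.
 * Hypothetical helper for checking the arithmetic only.
 */
double expected_wait_jiffies(unsigned int ncpus)
{
    double total = 0.0;
    unsigned int remaining;

    assert(ncpus >= 1);  /* at least the CPU we are running on */

    /* With 'remaining' CPUs left, the next schedule takes ~1/remaining. */
    for (remaining = ncpus - 1; remaining >= 1; remaining--)
        total += 1.0 / remaining;

    return total;
}
```

Under that assumption, expected_wait_jiffies(1024) comes out at roughly
7.5, which agrees with the figure quoted in the comment.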

X-Mailer: exmh version 2.1.1 10/15/1999
From: Keith Owens <k...@ocs.com.au>
Subject: Re: [PATCH for 2.5] preemptible kernel
In-Reply-To: Your message of "Tue, 20 Mar 2001 19:43:50 +1100."
        <m14fHk9-001PKgC@mozart>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 20 Mar 2001 10:50:03 +0100
Message-ID: <851.985080735@ocs3.ocs-net>
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 16738.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!hermes.visi.com!
        news-out.visi.com!newspump.sol.net!nntp.msen.com!newsxfer.eecs.umich.edu!
        news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed00.sul.t-online.de!
        t-online.de!bofh.it!robomod
References: <m14fHk9-001PKgC@mozart>
X-Original-Cc: ni...@nrg.org, linux-ker...@vger.kernel.org
X-Original-Date: Tue, 20 Mar 2001 20:32:15 +1100
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Rusty Russell <ru...@rustcorp.com.au>
Lines: 23

On Tue, 20 Mar 2001 19:43:50 +1100,
Rusty Russell <ru...@rustcorp.com.au> wrote:
>The third is that preemptivity conflicts with the naive
>quiescent-period approach proposed for module unloading in 2.5, and
>useful for several other things (eg. hotplugging CPUs). This method
>relies on knowing that when a schedule() has occurred on every CPU, we
>know no one is holding certain references.
>
>This, too, is soluble, but it means that synchronize_kernel() must
>guarantee that each task which was running or preempted in kernel
>space when it was called, has been non-preemptively scheduled before
>synchronize_kernel() can exit. Icky.

The preemption patch only allows preemption from interrupt and only for
a single level of preemption. That coexists quite happily with
synchronize_kernel() which runs in user context. Just count user
context schedules (preempt_count == 0), not preemptive schedules.

Date: Wed, 21 Mar 2001 02:00:04 +0100
From: Nigel Gamble <ni...@nrg.org>
Reply-To: ni...@nrg.org
Subject: Re: [PATCH for 2.5] preemptible kernel
In-Reply-To: <851.985080735@ocs3.ocs-net>
Message-ID: <Pine.LNX.4.05.10103201625430.26853-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 18895.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
References: <851.985080735@ocs3.ocs-net>
X-Original-Cc: Rusty Russell <ru...@rustcorp.com.au>,
        linux-ker...@vger.kernel.org
X-Original-Date: Tue, 20 Mar 2001 16:48:01 -0800 (PST)
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Keith Owens <k...@ocs.com.au>
Lines: 29

On Tue, 20 Mar 2001, Keith Owens wrote:
> The preemption patch only allows preemption from interrupt and only for
> a single level of preemption. That coexists quite happily with
> synchronize_kernel() which runs in user context. Just count user
> context schedules (preempt_count == 0), not preemptive schedules.

I'm not sure what you mean by "only for a single level of preemption."
It's possible for a preempting process to be preempted itself by a
higher priority process, and for that process to be preempted by an even
higher priority one, limited only by the number of processes waiting for
interrupt handlers to make them runnable. This isn't very likely in
practice (kernel preemptions tend to be rare compared to normal calls to
schedule()), but it could happen in theory.

If you're looking at preempt_schedule(), note the call to ctx_sw_off()
only increments current->preempt_count for the preempted task - the
higher priority preempting task that is about to be scheduled will have
a preempt_count of 0.

Nigel Gamble                                    ni...@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             ni...@mvista.com

X-Mailer: exmh version 2.1.1 10/15/1999
From: Keith Owens <k...@ocs.com.au>
Subject: Re: [PATCH for 2.5] preemptible kernel
In-Reply-To: Your message of "Tue, 20 Mar 2001 16:48:01 -0800."
        <Pine.LNX.4.05.10103201625430.26853-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 21 Mar 2001 02:30:04 +0100
Message-ID: <16074.985137800@kao2.melbourne.sgi.com>
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 30003.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
References: <Pine.LNX.4.05.10103201625430.26853-100000@cosmic.nrg.org>
X-Original-Cc: Rusty Russell <ru...@rustcorp.com.au>,
        linux-ker...@vger.kernel.org
X-Original-Date: Wed, 21 Mar 2001 12:23:20 +1100
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: ni...@nrg.org
Lines: 70

On Tue, 20 Mar 2001 16:48:01 -0800 (PST),
Nigel Gamble <ni...@nrg.org> wrote:
>On Tue, 20 Mar 2001, Keith Owens wrote:
>> The preemption patch only allows preemption from interrupt and only for
>> a single level of preemption. That coexists quite happily with
>> synchronize_kernel() which runs in user context. Just count user
>> context schedules (preempt_count == 0), not preemptive schedules.
>
>If you're looking at preempt_schedule(), note the call to ctx_sw_off()
>only increments current->preempt_count for the preempted task - the
>higher priority preempting task that is about to be scheduled will have
>a preempt_count of 0.

I misread the code, but the idea is still correct. Add a preemption
depth counter to each cpu, when you schedule and the depth is zero then
you know that the cpu is no longer holding any references to quiesced
structures.

>So, to make sure I understand this, the code to free a node would look
>like:
>
>	prev->next = node->next; /* assumed to be atomic */
>	synchronize_kernel();
>	free(node);
>
>So that any other CPU concurrently traversing the list would see a
>consistent state, either including or not including "node" before the
>call to synchronize_kernel(); but after synchronize_kernel() all other
>CPUs are guaranteed to see a list that no longer includes "node", so it
>is now safe to free it.
>
>It looks like there are also implicit assumptions to this approach, like
>no other CPU is trying to use the same approach simultaneously to free
>"prev".

Not quite. The idea is that readers can traverse lists without locks,
code that changes the list needs to take a semaphore first.

Read
	node = node->next;

Update
	down(&list_sem);
	prev->next = node->next;
	synchronize_kernel();
	free(node);
	up(&list_sem);

Because the readers have no locks, other cpus can have references to the
node being freed. The updating code needs to wait until all other cpus
have gone through at least one schedule to ensure that all dangling
references have been flushed. Adding preemption complicates this
slightly, we have to distinguish between the bottom level schedule and
higher level schedules for preemptive code. Only when all preemptive
code on a cpu has ended is it safe to say that there are no dangling
references left on that cpu.

This method is a win for high read, low update lists. Instead of
penalizing the read code every time on the off chance that somebody will
update the data, speed up the common code and penalize the update code.
The classic example is module code, it is rarely unloaded but right now
everything that *might* be entering a module has to grab the module spin
lock and update the module use count. So main line code is being slowed
down all the time.
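The read/update scheme described in the message above can be modelled
in user space to make the ordering concrete. What follows is only a
single-threaded simulation under stated assumptions: NCPUS, sched_count,
cpu_schedules(), struct grace, grace_start(), grace_elapsed() and
unlink_after() are all hypothetical names, the per-CPU schedule counter
is the alternative mentioned in the comment above synchronize_kernel(),
and the list semaphore is left out because nothing here runs
concurrently. It deliberately ignores preemption, which is the open
problem in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4  /* illustrative number of simulated CPUs */

struct node {
    int value;
    struct node *next;
};

/* Simulated per-CPU count of completed schedule() calls. */
static unsigned long sched_count[NCPUS];

/* The simulation calls this when a CPU passes through schedule(). */
void cpu_schedules(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    sched_count[cpu]++;
}

/* Snapshot of the counters, taken by the updater after unlinking. */
struct grace {
    unsigned long snap[NCPUS];
    int self;  /* the updater's own CPU is not waited for */
};

void grace_start(struct grace *g, int self_cpu)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        g->snap[cpu] = sched_count[cpu];
    g->self = self_cpu;
}

/* True once every other CPU has scheduled since the snapshot. */
bool grace_elapsed(const struct grace *g)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (cpu != g->self && sched_count[cpu] == g->snap[cpu])
            return false;
    return true;
}

/*
 * Unlink the node following 'prev' and return it. Its 'next' pointer is
 * left intact so that a reader still standing on it can keep walking.
 * The caller may free it only after grace_elapsed() returns true.
 */
struct node *unlink_after(struct node *prev)
{
    struct node *victim = prev->next;

    assert(victim != NULL);
    prev->next = victim->next;
    return victim;
}
```

An updater in this model would call unlink_after(), then grace_start(),
and would free the returned node only once grace_elapsed() reports that
each other simulated CPU has been through cpu_schedules() at least once.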

Date: Wed, 21 Mar 2001 04:50:03 +0100
From: Nigel Gamble <ni...@nrg.org>
Reply-To: ni...@nrg.org
Subject: Re: [PATCH for 2.5] preemptible kernel
In-Reply-To: <16074.985137800@kao2.melbourne.sgi.com>
Message-ID: <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 54379.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
References: <16074.985137800@kao2.melbourne.sgi.com>
X-Original-Cc: Rusty Russell <ru...@rustcorp.com.au>,
        linux-ker...@vger.kernel.org
X-Original-Date: Tue, 20 Mar 2001 19:35:17 -0800 (PST)
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Keith Owens <k...@ocs.com.au>
Lines: 21

On Wed, 21 Mar 2001, Keith Owens wrote:
> I misread the code, but the idea is still correct. Add a preemption
> depth counter to each cpu, when you schedule and the depth is zero then
> you know that the cpu is no longer holding any references to quiesced
> structures.

A task that has been preempted is on the run queue and can be
rescheduled on a different CPU, so I can't see how a per-CPU counter
would work. It seems to me that you would need a per run queue
counter, like the example I gave in a previous posting.

Nigel Gamble                                    ni...@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             ni...@mvista.com

Message-ID: <3AB860A8.182A10C7@mvista.com>
Date: Wed, 21 Mar 2001 09:30:04 +0100
From: george anzinger <geo...@mvista.com>
Organization: Monta Vista Software
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20b i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [PATCH for 2.5] preemptible kernel
References: <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 49545.anti-phl.bofh.it
Newsgroups: linux.kernel
Path: supernews.google.com!sn-xit-02!supernews.com!news.tele.dk!194.25.134.62!
        newsfeed00.sul.t-online.de!t-online.de!bofh.it!robomod
X-Original-Cc: Keith Owens <k...@ocs.com.au>,
        Rusty Russell <ru...@rustcorp.com.au>, linux-ker...@vger.kernel.org
X-Original-Date: Wed, 21 Mar 2001 00:04:56 -0800
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: ni...@nrg.org
Lines: 30

Nigel Gamble wrote:
>
> On Wed, 21 Mar 2001, Keith Owens wrote:
> > I misread the code, but the idea is still correct. Add a preemption
> > depth counter to each cpu, when you schedule and the depth is zero then
> > you know that the cpu is no longer holding any references to quiesced
> > structures.
>
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work. It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Exactly so. The method does not depend on the sum of preemption being
zip, but on each potential reader (writers take locks) passing thru a
"sync point". Your notion of waiting for each task to arrive
"naturally" at schedule() would work. It is, in fact, overkill as you
could also add arrival at sys call exit as a (the) "sync point". In
fact, for module unload, isn't this the real "sync point"? After all, a
module can call schedule, or did I miss a usage counter somewhere?

By the way, there is a paper on this somewhere on the web. Anyone
remember where?

George

Message-ID: <3AC1BAD3.BBBD97E1@sequent.com>
Date: Wed, 28 Mar 2001 12:30:06 +0200
From: Dipankar Sarma <dipan...@sequent.com>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [PATCH for 2.5] preemptible kernel
References: <16074.985137800@kao2.melbourne.sgi.com>
        <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 74434.anti-phl.bofh.it
Newsgroups: linux.kernel
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
X-Original-Cc: linux-ker...@vger.kernel.org, mcken...@sequent.com
X-Original-Date: Wed, 28 Mar 2001 15:50:03 +0530
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: ni...@nrg.org
Lines: 51

Nigel Gamble wrote:
>
> On Wed, 21 Mar 2001, Keith Owens wrote:
> > I misread the code, but the idea is still correct. Add a preemption
> > depth counter to each cpu, when you schedule and the depth is zero then
> > you know that the cpu is no longer holding any references to quiesced
> > structures.
>
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work. It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Also, a task could be preempted and then rescheduled on the same cpu
making the depth counter 0 (right ?), but it could still be holding
references to data structures to be updated using synchronize_kernel().
There seem to be two approaches to tackle preemption -

1. Disable pre-emption during the time when references to data
   structures updated using such Two-phase updates are held.

   Pros: easy to implement using a flag (ctx_sw_off() ?)
   Cons: not so easy to use since critical sections need to be clearly
         identified and interfaces defined. Also affects preemptive
         behavior.

2. In synchronize_kernel(), distinguish between "natural" and preemptive
   schedules() and ignore preemptive ones.

   Pros: easy to use
   Cons: Not so easy to implement. Also a low priority task that keeps
         getting preempted often can affect update side performance
         significantly.

I intend to experiment with both to understand the impact.

Thanks
Dipankar
--
Dipankar Sarma (dipan...@sequent.com)
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net

Message-ID: <3AC1CF4B.17B29EA4@sequent.com>
Date: Wed, 28 Mar 2001 14:00:07 +0200
From: Dipankar Sarma <dipan...@sequent.com>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [PATCH for 2.5] preemptible kernel
References: <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
        <3AB860A8.182A10C7@mvista.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 93503.anti-phl.bofh.it
Newsgroups: linux.kernel
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
X-Original-Cc: linux-ker...@vger.kernel.org, mcken...@sequent.com
X-Original-Date: Wed, 28 Mar 2001 17:17:23 +0530
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: george anzinger <geo...@mvista.com>
Lines: 55

Hi George,

george anzinger wrote:
>
> Exactly so. The method does not depend on the sum of preemption being
> zip, but on each potential reader (writers take locks) passing thru a
> "sync point". Your notion of waiting for each task to arrive
> "naturally" at schedule() would work. It is, in fact, over kill as you
> could also add arrival at sys call exit as a (the) "sync point". In
> fact, for module unload, isn't this the real "sync point"? After all, a
> module can call schedule, or did I miss a usage counter somewhere?

It is certainly possible to implement a synchronize_kernel()-like
primitive for two phase update using a "sync point". Waiting for sys
call exit will perhaps work in the module unloading case, but there
could be performance issues if a cpu spends most of its time in idle
task/interrupts. synchronize_kernel() provides a simple generic way of
implementing a two phase update without serialization for reading.

I am working on a "sync point" based version of such an approach,
available at http://lse.sourceforge.net/locking/rclock.html. It is based
on the original DYNIX/ptx stuff that Paul McKenney developed in the
early 90s. This and synchronize_kernel() are very similar in approach
and each can be implemented using the other.

As for handling preemption, we can perhaps try 2 things -

1. The read side of the critical section is enclosed in
   RC_RDPROTECT()/RC_RDUNPROTECT() which are currently nops. We can
   disable/enable preemption using these.

2. Avoid counting preemptive context switches. I am not sure about
   this one though.

>
> By the way, there is a paper on this somewhere on the web. Anyone
> remember where?

If you are talking about Paul's paper, the link is
http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf.

Thanks
Dipankar
--
Dipankar Sarma (dipan...@sequent.com)
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
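The first option above, turning the read-side markers into a preemption
disable/enable pair, can be sketched as a nesting counter. This is an
illustration under assumptions, not the rclock patch: struct
reader_task, rd_protect(), rd_unprotect() and may_preempt() are
hypothetical names standing in for whatever RC_RDPROTECT() and
RC_RDUNPROTECT() would expand to, and the counter plays the role that
preempt_count plays in Nigel's patch as described earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not a kernel struct. */
struct reader_task {
    int preempt_count;  /* > 0 means "do not preempt me" */
};

/* Enter a read-side critical section: forbid preemption (nestable). */
void rd_protect(struct reader_task *t)
{
    t->preempt_count++;
}

/* Leave a read-side critical section. */
void rd_unprotect(struct reader_task *t)
{
    assert(t->preempt_count > 0);  /* unbalanced unprotect is a bug */
    t->preempt_count--;
}

/*
 * What an interrupt-return path would check in this model: the task may
 * be preempted only when it is outside every protected section.
 */
bool may_preempt(const struct reader_task *t)
{
    return t->preempt_count == 0;
}
```

Because the counter nests, a reader that calls rd_protect() twice stays
non-preemptible until the matching second rd_unprotect(), so a schedule
seen on that CPU can again be treated as a sync point.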

Message-ID: <3AC24EB6.1F0DD551@mvista.com>
Date: Wed, 28 Mar 2001 23:10:03 +0200
From: george anzinger <geo...@mvista.com>
Organization: Monta Vista Software
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20b i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [PATCH for 2.5] preemptible kernel
References: <16074.985137800@kao2.melbourne.sgi.com>
        <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
        <3AC1BAD3.BBBD97E1@sequent.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 77794.anti-phl.bofh.it
Newsgroups: linux.kernel
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
X-Original-Cc: ni...@nrg.org, linux-ker...@vger.kernel.org,
        mcken...@sequent.com
X-Original-Date: Wed, 28 Mar 2001 12:51:02 -0800
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Dipankar Sarma <dipan...@sequent.com>
Lines: 55

Dipankar Sarma wrote:
>
> Nigel Gamble wrote:
> >
> > On Wed, 21 Mar 2001, Keith Owens wrote:
> > > I misread the code, but the idea is still correct. Add a preemption
> > > depth counter to each cpu, when you schedule and the depth is zero then
> > > you know that the cpu is no longer holding any references to quiesced
> > > structures.
> >
> > A task that has been preempted is on the run queue and can be
> > rescheduled on a different CPU, so I can't see how a per-CPU counter
> > would work. It seems to me that you would need a per run queue
> > counter, like the example I gave in a previous posting.
>
> Also, a task could be preempted and then rescheduled on the same cpu
> making the depth counter 0 (right ?), but it could still be holding
> references to data structures to be updated using synchronize_kernel().
> There seems to be two approaches to tackle preemption -
>
> 1. Disable pre-emption during the time when references to data
>    structures updated using such Two-phase updates are held.

Doesn't this fly in the face of the whole Two-phase system? It seems to
me that the point was to not require any locks. Preemption disable IS a
lock. Not as strong as some, but a lock nonetheless.

>
> Pros: easy to implement using a flag (ctx_sw_off() ?)
> Cons: not so easy to use since critical sections need to be clearly
>       identified and interfaces defined. also affects preemptive
>       behavior.
>
> 2. In synchronize_kernel(), distinguish between "natural" and preemptive
>    schedules() and ignore preemptive ones.
>
> Pros: easy to use
> Cons: Not so easy to implement. Also a low priority task that keeps
>       getting preempted often can affect update side performance
>       significantly.

Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
state). Don't you also have to have some sort of task flag that
indicates that the task is one that needs to sync? Something that gets
set when it enters the area of interest and cleared when it hits the
sync point?

George

Date: Thu, 29 Mar 2001 11:50:03 +0200
From: Dipankar Sarma <dipan...@sequent.com>
Subject: Re: [PATCH for 2.5] preemptible kernel
Message-ID: <20010329151330.A7361@in.ibm.com>
Reply-To: dipan...@sequent.com
References: <16074.985137800@kao2.melbourne.sgi.com>
        <Pine.LNX.4.05.10103201920410.26853-100000@cosmic.nrg.org>
        <3AC1BAD3.BBBD97E1@sequent.com> <3AC24EB6.1F0DD551@mvista.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <3AC24EB6.1F0DD551@mvista.com>; from george@mvista.com
        on Wed, Mar 28, 2001 at 12:51:02PM -0800
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: 52692.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: supernews.google.com!sn-xit-03!supernews.com!bofh.it!robomod
X-Original-Cc: Dipankar Sarma <dipan...@sequent.com>, ni...@nrg.org,
        linux-ker...@vger.kernel.org, mcken...@sequent.com
X-Original-Date: Thu, 29 Mar 2001 15:13:30 +0530
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: george anzinger <geo...@mvista.com>
Lines: 75

On Wed, Mar 28, 2001 at 12:51:02PM -0800, george anzinger wrote:
> Dipankar Sarma wrote:
> >
> > Also, a task could be preempted and then rescheduled on the same cpu
> > making the depth counter 0 (right ?), but it could still be holding
> > references to data structures to be updated using
> > synchronize_kernel(). There seems to be two approaches to tackle
> > preemption -
> >
> > 1. Disable pre-emption during the time when references to data
> >    structures updated using such Two-phase updates are held.
>
> Doesn't this fly in the face of the whole Two-phase system? It seems to
> me that the point was to not require any locks. Preemption disable IS a
> lock. Not as strong as some, but a lock none the less.

The point is to avoid acquiring costly locks, so it is a question of
relative cost. Disabling preemption (by an atomic increment) for short
critical sections may not be as bad as spin-waiting for highly contended
locks or thrashing lock cachelines.

> >
> > Pros: easy to implement using a flag (ctx_sw_off() ?)
> > Cons: not so easy to use since critical sections need to be clearly
> >       identified and interfaces defined. also affects preemptive
> >       behavior.
> >
> > 2. In synchronize_kernel(), distinguish between "natural" and
> >    preemptive schedules() and ignore preemptive ones.
> >
> > Pros: easy to use
> > Cons: Not so easy to implement. Also a low priority task that keeps
> >       getting preempted often can affect update side performance
> >       significantly.
>
> Actually is is fairly easy to distinguish the two (see TASK_PREEMPTED in
> state). Don't you also have to have some sort of task flag that
> indicates that the task is one that needs to sync? Something that gets
> set when it enters the area of interest and cleared when it hits the
> sync point?

Neither of the two two-phase update implementations (synchronize_kernel()
by Rusty and read-copy update by us) monitors the tasks that require
sync for update. synchronize_kernel() forces a schedule on every cpu and
read-copy update waits until every cpu goes through a quiescent state,
before updating. Both approaches will require significant special
handling because they implicitly assume that tasks inside the kernel are
bound to the current cpu until they reach a quiescent state (like a
"normal" context switch). Since preempted tasks can run later on any
cpu, we will have to keep track of sync points on a per-task basis and
that will probably require using a snapshot of the running tasks from
the global runqueue. That may not be a good thing from a performance
standpoint, not to mention the complexity.

Also, in situations where the read-to-write ratio is not heavily skewed
towards read, or lots of updates are happening, a very low priority task
can have a significant impact on performance by getting preempted all
the time.

Thanks
Dipankar
--
Dipankar Sarma (dipan...@sequent.com)
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
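The per-task bookkeeping described in the last message can be modelled
in a few lines. The sketch below is a single-threaded simulation under
stated assumptions, not either of the implementations discussed:
NTASKS, struct sim_task, task_switched_out(), struct task_snapshot,
snapshot_take() and snapshot_quiescent() are hypothetical names. It
follows the condition Rusty stated at the start of the thread: every
task that was running or preempted in kernel space at snapshot time must
go through a non-preemptive schedule before the update side may proceed.

```c
#include <assert.h>
#include <stdbool.h>

#define NTASKS 4  /* illustrative number of simulated tasks */

struct sim_task {
    bool in_kernel;           /* running or preempted in kernel space */
    unsigned long voluntary;  /* non-preemptive schedule() calls so far */
};

static struct sim_task tasks[NTASKS];

/* Record a context switch; preemptive ones are deliberately ignored. */
void task_switched_out(int id, bool preempted)
{
    assert(id >= 0 && id < NTASKS);
    if (!preempted)
        tasks[id].voluntary++;
}

/* What the updater remembers about each task when it starts waiting. */
struct task_snapshot {
    bool watched[NTASKS];
    unsigned long voluntary[NTASKS];
};

void snapshot_take(struct task_snapshot *s)
{
    for (int id = 0; id < NTASKS; id++) {
        /* Only tasks inside the kernel can hold stale references. */
        s->watched[id] = tasks[id].in_kernel;
        s->voluntary[id] = tasks[id].voluntary;
    }
}

/* True once every watched task has scheduled voluntarily. */
bool snapshot_quiescent(const struct task_snapshot *s)
{
    for (int id = 0; id < NTASKS; id++)
        if (s->watched[id] && tasks[id].voluntary == s->voluntary[id])
            return false;
    return true;
}
```

Because the count travels with the task rather than with a CPU, it does
not matter where a preempted task is later resumed. The model also shows
the cost raised above: a watched task that is only ever preempted keeps
snapshot_quiescent() false indefinitely.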