From: p...@3dillusion.com (Paul Miller) Subject: Re: [OFFTOPIC] Very amusing DNS... Date: 1998/06/17 Message-ID: <Pine.LNX.3.96.980616235359.6838A-100000@serv1.3dillusion.com>#1/1 X-Deja-AN: 363415092 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96.980616173734.1139A-100000@scitus> Newsgroups: muc.lists.linux-kernel A couple of days ago, they had a web page up at http://linus.microsoft.com. Guess what it was -- The default page for Apache on a RedHat installation! hmm... I guess microsoft finally decided that windows was too unstable to run. Or, maybe they just wanted to steal some of the source code! -Paul On Tue, 16 Jun 1998, Spirilis wrote: > Hmm... > > <root>:/root# nslookup 131.107.74.11 198.6.1.1 > Server: cache00.ns.uu.net > Address: 198.6.1.1 > > Name: linus.microsoft.com > Address: 131.107.74.11 > > > <root>:/root# nslookup linus.microsoft.com 198.6.1.1 > Server: cache00.ns.uu.net > Address: 198.6.1.1 > > Non-authoritative answer: > Name: linus.microsoft.com > Address: 131.107.74.11 > > I wonder what MS uses that host for? ;-) > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.rutgers.edu > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: barba...@mail.cis.fordham.edu (Anthony Barbachan) Subject: Re: [OFFTOPIC] Very amusing DNS... Date: 1998/06/18 Message-ID: <009b01bd9a8a$e7b67260$04c809c0@Fake.Domain.com>#1/1 X-Deja-AN: 363784533 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner Newsgroups: muc.lists.linux-kernel -----Original Message----- From: Paul Miller <p...@3dillusion.com> To: Spirilis <spiri...@mindmeld.dyn.ml.org> Cc: linux-ker...@vger.rutgers.edu <linux-ker...@vger.rutgers.edu> Date: Tuesday, June 16, 1998 11:27 PM Subject: Re: [OFFTOPIC] Very amusing DNS... > >A couple of days ago, they had a web page up at >http://linus.microsoft.com. Guess what it was -- The default page for >Apache on a RedHat installation! > This could mean that they have finally started porting IE 4.01 to Linux as they have done for Solaris and HPUX. I heard that the IE for UNIX programmers were all (or at least mostly) Linux guys, they may have convinced MS to release IE for Linux. Or they might just have been compiling Apache 1.3.0 with frontpage extensions (and the other bundled utilities) for Linux. If it is IE, the addition of MS as an application provider for Linux should be benifitial to us. >hmm... I guess microsoft finally decided that windows was too unstable to >run. Or, maybe they just wanted to steal some of the source code! > >-Paul > >On Tue, 16 Jun 1998, Spirilis wrote: > >> Hmm... >> >> <root>:/root# nslookup 131.107.74.11 198.6.1.1 >> Server: cache00.ns.uu.net >> Address: 198.6.1.1 >> >> Name: linus.microsoft.com >> Address: 131.107.74.11 >> >> >> <root>:/root# nslookup linus.microsoft.com 198.6.1.1 >> Server: cache00.ns.uu.net >> Address: 198.6.1.1 >> >> Non-authoritative answer: >> Name: linus.microsoft.com >> Address: 131.107.74.11 >> >> I wonder what MS uses that host for? ;-) >> >> >> >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majord...@vger.rutgers.edu >> > > >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majord...@vger.rutgers.edu > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: n...@bayside.net Subject: Re: [OFFTOPIC] Very amusing DNS... Date: 1998/06/19 Message-ID: <Pine.LNX.3.96.980618213107.2725C-100000@nuklear.steelcity.net>#1/1 X-Deja-AN: 363930472 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <009b01bd9a8a$e7b67260$04c809c0@Fake.Domain.com> Newsgroups: muc.lists.linux-kernel > > > >A couple of days ago, they had a web page up at > >http://linus.microsoft.com. Guess what it was -- The default page for > >Apache on a RedHat installation! > > > > This could mean that they have finally started porting IE 4.01 to Linux as > they have done for Solaris and HPUX. I heard that the IE for UNIX > programmers were all (or at least mostly) Linux guys, they may have > convinced MS to release IE for Linux. Or they might just have been > compiling Apache 1.3.0 with frontpage extensions (and the other bundled > utilities) for Linux. If it is IE, the addition of MS as an application > provider for Linux should be benifitial to us. > > >hmm... I guess microsoft finally decided that windows was too unstable to > >run. Or, maybe they just wanted to steal some of the source code! oh, you haven't read http://www.microsoft.com/ie/unix/devs.htm yet? a quick quote from the page: And the fact is that both Chapman and Dawson [IE4/solaris developers] have grown quite comfortable shuttling back and forth between the worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go today to catch up to NT," says Dawson. "Take, just for one example, threading support. UNIX still has benefits, but NT is just a lot more full-featured." it's good for a laugh, at least :) _ _ __ __ _ _ _ | / |/ /_ __/ /_____ | Nuke Skyjumper | | / / // / '_/ -_) | "Master of the Farce" | |_ /_/|_/\_,_/_/\_\\__/ _|_ n...@bayside.net _| - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: alex.bu...@tahallah.demon.co.uk (Alex Buell) Subject: Re: [OFFTOPIC] Very amusing DNS... Date: 1998/06/18 Message-ID: <35895509.2A79@tahallah.demon.co.uk>#1/1 X-Deja-AN: 363936230 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96.980618213107.2725C-100000@nuklear.steelcity.net> Organization: Advanced Buell Software Engineering Ltd Newsgroups: muc.lists.linux-kernel n...@bayside.net wrote: > And the fact is that both Chapman and Dawson [IE4/solaris developers] > have grown quite comfortable shuttling back and forth between the > worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go > today to catch up to NT," says Dawson. "Take, just for one example, > threading support. UNIX still has benefits, but NT is just a lot more > full-featured." OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on a wall and mangled his private bits. Who are Chapman and Dawson kidding? HAHAHA!! I can't believe these two are Solaris developers and yet come out with this tripe?! -- Cheers, Alex. Watch out, the NSA are everywhere. Your computer must be watched! /\_/\ Legalise cannabis now! ( o.o ) Smoke some cannabis today! > ^ < Peace, Love, Unity and Respect to all. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: [OFFTOPIC] Very amusing DNS... Date: 1998/06/18 Message-ID: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 363941479 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <35895509.2A79@tahallah.demon.co.uk> Newsgroups: muc.lists.linux-kernel On Thu, 18 Jun 1998, Alex Buell wrote: > n...@bayside.net wrote: > > > And the fact is that both Chapman and Dawson [IE4/solaris developers] > > have grown quite comfortable shuttling back and forth between the > > worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go > > today to catch up to NT," says Dawson. "Take, just for one example, > > threading support. UNIX still has benefits, but NT is just a lot more > > full-featured." > > OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on > a wall and mangled his private bits. Who are Chapman and Dawson kidding? > HAHAHA!! I can't believe these two are Solaris developers and yet come > out with this tripe?! Have you worked with threads under NT and worked with threads under, say, linux? Linux is in the dark ages as far as threads go. There's linuxthreads, but to debug them you need to patch the kernel. You don't get core dumps without another kernel patch. gdb doesn't support it all directly, unless you patch it. None of that has made it into the main distributions. Even with the debugging problems solved, linuxthreads are heavier than solaris pthreads or NT fibers. Both of those use a multiplexed user-level and kernel-level threading system which results in fewer kernel context switches. In userland a "context switch" is just a function call. But we'll see this solved with Netscape's NSPR which was released with mozilla -- it provides a multiplexed threading model (that particular model isn't ported to linux yet). There's a paper from sun regarding solaris pthreads, see <http://www.arctic.org/~dgaudet/apache/2.0/impl_threads.ps.gz> for a copy of it. You may also want to visit the JAWS papers at <http://www.cs.wustl.edu/~jxh/research/research.html> for more discussion on various threading paradigms. Have you read my posts regarding file descriptors and other unix semantics that are "unfortunate" when threading? They're not the end of the world, but it's really obvious once you start digging into things that much of unix was designed with a process in mind. For example, on NT there is absolutely no problem with opening up 10000 files at the same time and holding onto the file handles. This is exactly what's required to build a top end webserver to get winning Specweb96 numbers on NT using TransmitFile. On unix there's no TransmitFile, and instead we end up using mmap() which has performance problems. Even if we had TransmitFile, 10k file descriptors isn't there. "You have to recompile your kernel for that." Uh, no thanks, I have a hard enough time getting webserver reviewers to use the right configuration file, asking them to recompile a kernel is absolutely out of the question. Unix multiplexing facilities -- select and poll -- are wake-all primitives. When something happens, everything waiting is awakened and immediately starts fighting for something to do. What a waste. They make a lot of sense for processes though. On NT completion ports provide wake-one semantics... which are perfect for threads. NT may not be stable, but there's a lot of nice ideas in there. Don't just shoo it away saying "pah, that's microsoft's piece of crap". 
DEC had their hand in some of the architecture. Dean P.S. And now I'll go ask myself why I'm even responding to an advocacy thread on linux-kernel. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
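To make the mmap() point above concrete, here is a minimal sketch of the Unix-side file send Dean describes: map the file, write it to the socket, unmap. It is an illustration only (send_file, client_fd and path are made-up names, error handling is abbreviated), not code from Apache or from the thread; it shows where the copies and the per-request map/unmap work come from that something like TransmitFile avoids.

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Send a regular file down a connected socket using mmap() + write(). */
static int send_file(int client_fd, const char *path)
{
    struct stat st;
    void *p;
    ssize_t sent;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* The data is still copied through the socket buffers, and the
     * map/unmap cost is paid on every request -- the "performance
     * problems" mentioned above. */
    p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    sent = write(client_fd, p, st.st_size);

    munmap(p, st.st_size);
    close(fd);
    return sent == (ssize_t)st.st_size ? 0 : -1;
}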
From: da...@dm.cobaltmicro.com (David S. Miller) Subject: Thread implementations... Date: 1998/06/19 Message-ID: <199806190241.TAA03833@dm.cobaltmicro.com>#1/1 X-Deja-AN: 364067658 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT) From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> [ My comments are not directed to Dean or anyone in particular, there were just some things I wanted to state in general wrt. the issues raised here. ] Even with the debugging problems solved, linuxthreads are heavier than solaris pthreads or NT fibers. Both of those use a multiplexed user-level and kernel-level threading system which results in fewer kernel context switches. In userland a "context switch" is just a function call. But we'll see this solved with Netscape's NSPR which was released with mozilla -- it provides a multiplexed threading model (that particular model isn't ported to linux yet). Making threads under Linux not be multiplexed at the user side was a conscious design decision. Doing it half in user half in kernel (and this is the distinction being mentioned when Solaris nomenclature speaks of kernel bound and non-kernel bound threads) leads to enormous levels of complexity for fundamental things such as signal handling. The folks at Solaris spent a lot of time fixing bugs that were solely getting signals right in their threads implementation. Keeping track of what the kernel sends to a "kernel bound thread" and making sure the right "pure user thread" within gets that signal correctly is tricky business. It's complex and hell to get right. (search the Solaris patch databases for "threads" and "signals" to see that I'm for real here about how difficult it is to get right) This is why we do it the way we do it. For example, on NT there is absolutely no problem with opening up 10000 files at the same time and holding onto the file handles. This is exactly what's required to build a top end webserver to get winning Specweb96 numbers on NT using TransmitFile. Yes, I know this. On unix there's no TransmitFile, and instead we end up using mmap() which has performance problems. Even if we had TransmitFile, 10k file descriptors isn't there. One thing to keep in mind when people start howling "xxx OS allows such and such feature and Linux still does not yet, why is it so limited etc.???": go do a little research, and find out what the cost of a 10k file descriptor capability under NT is for processes which don't use nearly that many. I know, without actually being able to look at how NT does it, it's hard to say for sure. But I bet low end processes pay a bit of a price so these high end programs can have the facility. This is the reason the feature is still forthcoming in Linux. We won't put it in until we come up with an implementation which costs next to nothing for "normal" programs. "You have to recompile your kernel for that." Uh, no thanks, I have a hard enough time getting webserver reviewers to use the right configuration file, asking them to recompile a kernel is absolutely out of the question. I actually don't tell people to do this. Instead I tell them to find a solution within the current framework, and that what they are after is in fact in the works. If someone can't make it work in the current framework, Linux is not for them at least for now.
A bigger danger than losing users or apps for the moment due to missing features is to mis-design something and end up paying for it forever; this is the path other unixes have gone down. Unix multiplexing facilities -- select and poll -- are wake-all primitives. When something happens, everything waiting is awakened and immediately starts fighting for something to do. What a waste. They make a lot of sense for processes though. On NT completion ports provide wake-one semantics... which are perfect for threads. Yes, this does in fact suck. However, the path to go down is not to expect the way select/poll work to change, rather look at other existing facilities or invent new ones which solve this problem. Too much user code exists which depends upon the wake-all semantics, so the only person to blame is whoever designed the behaviors of these unix operations to begin with ;-) Later, David S. Miller da...@dm.cobaltmicro.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: a...@muc.de (Andi Kleen) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <k27m2egm8h.fsf@zero.aec.at>#1/1 X-Deja-AN: 364077104 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel "David S. Miller" <da...@dm.cobaltmicro.com> writes: > The folks at Solaris spent a lot of time fixing bugs that were solely > getting signals right in their threads implementation. Keeping track > of what the kernel sends to a "kernel bound thread" and making sure > the right "pure user thread" within gets that signal correctly is > tricky business. It's complex and hell to get right. (search the > Solaris patch databases for "threads" and "signals" to see that I'm > for real here about how difficult it is to get right) Linux (LinuxThreads) does not really have this right, unfortunately. There is no way to send a signal to a process consisting of multiple threads and have it delivered to the first thread that has it unblocked (as defined in POSIX) - it will always be delivered to the thread with the pid it was directed to. To fix it CLONE_PID would need to be made fully working. Unfortunately that opens a can of worms - either a new tid is needed (with new system calls etc. - ugly), or the upper 16 bits of pid space are reused - but those are already allocated from Beowulf. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
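To picture the POSIX behaviour Andi is describing, here is the idiom applications use on conforming implementations: block the signals of interest in every thread and let one dedicated thread collect them with sigwait(), so it does not matter which thread a process-directed signal lands on. This is a general-purpose sketch (mine, not from the thread); as Andi notes, under LinuxThreads a signal is still delivered to the specific pid it was sent to, which is exactly why this does not quite work there.

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static sigset_t handled;

/* One thread owns the signals of interest. */
static void *signal_thread(void *arg)
{
    int sig;
    for (;;) {
        if (sigwait(&handled, &sig) == 0)
            printf("got signal %d\n", sig);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    sigemptyset(&handled);
    sigaddset(&handled, SIGTERM);
    sigaddset(&handled, SIGINT);

    /* Block the signals before creating any threads so every thread
     * inherits the mask; only the dedicated thread ever sees them. */
    pthread_sigmask(SIG_BLOCK, &handled, NULL);
    pthread_create(&tid, NULL, signal_thread, NULL);

    /* ... create worker threads and do the real work here ... */

    pthread_join(tid, NULL);
    return 0;
}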
From: spiri...@mindmeld.dyn.ml.org (Spirilis) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <Pine.LNX.3.96.980619001045.17049A-100000@mindmeld.dyn.ml.org>#1/1 X-Deja-AN: 364084144 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806190241.TAA03833@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel On Thu, 18 Jun 1998, David S. Miller wrote: > > For example, on NT there is absolutely no problem with opening up > 10000 files at the same time and holding onto the file handles. > This is exactly what's required to build a top end webserver to get > winning Specweb96 numbers on NT using TransmitFile. > > Yes, I know this. Is it not possible to configure Linux to be able to use 10k or greater file descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max? (shooting down the earlier comment regarding recompiling the kernel to allow 10k or greater file descriptors...) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <Pine.LNX.3.96dg4.980618222356.18429D-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364103122 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806190241.TAA03833@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel On Thu, 18 Jun 1998, David S. Miller wrote: > Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT) > From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> > > [ My commented is not directed to Dean or anyone in particular, > there were just some things I wanted to state in general wrt. > to the issues raised here. ] > > Even with the debugging problems solved, linuxthreads are heavier > than solaris pthreads or NT fibers. Both of those use a > multiplexed user-level and kernel-level threading system which > results in fewer kernel context switches. In userland a "context > switch" is just a function call. But we'll see this solved with > Netscape's NSPR which was released with mozilla -- it provides a > multiplexed threading model (that particular model isn't ported to > linux yet). > > Making threads under Linux not be multiplexed at the user side was a > conscious design decision. Doing it half in user half in kernel (and > this is the distinction being mentioned when Solaris nomenclature > speaks of kernel bound and non-kernel bound threads) leads to enormous > levels of complexity for fundamental things such a signal handling. Sure. If you need signals that sucks. This makes pthreads really hard to split up like this, and I can totally see why linuxthreads is the way it is. But something like NSPR which requires folks to write in a dialect that is portable between unix and NT (and still access performance features on both) doesn't have signals... because asynchronous signalling leads to far too many race conditions and other crap, it's not even considered good programming practice these days. I don't miss it at all. NSPR gives me primitives like PR_Send() which writes data, with a timeout.... which nails the main thing I would use signals for in posix -- for timeouts. (For reference NSPR on linux defaults to single process, multiplexed via poll/select. It can be compiled to use pthreads directly, which also works on linux. It has a hybrid mode that hasn't been ported to linux yet.) > One thing to keep in mind when people start howling "xxx OS allows > such and such feature and Linux still does not yet, why is it so > limited etc.???" Go do a little research, and find out what the cost > of 10k file descriptors capability under NT is for processes which > don't use nearly that many. > > I know, without actually being able to look at how NT does it, it's > hard to say for sure. But I bet low end processes pay a bit of a > price so these high end programs can have the facility. I'm not sure. Did you see my extended file handles proposal? I carefully avoided O(n) crap, I think it can be done O(1) for everything but process destruction (where you have to scan the open descriptors). And the stuff I was proposing is close to what NT provides. But of course it's not POSIX :) Briefly, an extended file handle is a global index, all processes get handles out of this single space. To implement access rights you place an extra field in each file structure, call it file_access_right. Each process also has a file_access_right, they have to compare equal for the handle's use to be permitted. 
exec() causes a new file_access_right to be selected. fork() uses the same file_access_right (to set up exec), clone() uses the same file_access_right. This is essentially what NT provides. They don't have fork -- when you create a process you explicitly decide which handles will be passed into the new process... and they're given new addresses in the new process. To do that with my scheme you first need to dup an extended fh into a regular handle. NT does that "behind the scenes". > Unix multiplexing facilities -- select and poll -- are wake-all > primitives. When something happens, everything waiting is awakened > and immediately starts fighting for something to do. What a waste. > They make a lot of sense for processes though. On NT completion > ports provide wake-one semantics... which are perfect for threads. > > Yes, this does in fact suck. However, the path to go down is not to > expect the way select/poll work to change, rather look at other > existing facilities or invent new ones which solve this problem. > Too much user code exists which depends upon the wake-all semantics, > so the only person to blame is whoever designed the behaviors of these > unix operations to begin with ;-) Right, I've said before that I don't care what the facility looks like, as long as it provides wake-one :) Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
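For readers trying to picture the proposal, here is a rough sketch of the data structures Dean is describing: a single global handle table plus a per-process access right that must match. The code is my illustration of the idea (every name other than file_access_right is invented), not an actual kernel patch.

#include <stddef.h>

/* One slot in the single, global extended-handle table. */
struct ext_file {
    void          *file;               /* underlying file object          */
    unsigned long  file_access_right;  /* stamped when the handle is set up */
};

struct task {
    unsigned long  file_access_right;  /* new value chosen on exec();
                                          kept across fork()/clone()      */
};

extern struct ext_file ext_table[];    /* global index space */

/* O(1) lookup: the handle is only usable by a task whose access right
 * matches the one recorded in the slot. */
static struct ext_file *ext_lookup(struct task *t, unsigned long handle)
{
    struct ext_file *ef = &ext_table[handle];

    if (ef->file == NULL || ef->file_access_right != t->file_access_right)
        return NULL;
    return ef;
}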
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 364163551 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806190241.TAA03833@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel David S. Miller writes: > Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT) > From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> [...] > Unix multiplexing facilities -- select and poll -- are wake-all > primitives. When something happens, everything waiting is awakened > and immediately starts fighting for something to do. What a waste. > They make a lot of sense for processes though. On NT completion > ports provide wake-one semantics... which are perfect for threads. > > Yes, this does in fact suck. However, the path to go down is not to > expect the way select/poll work to change, rather look at other > existing facilities or invent new ones which solve this problem. > Too much user code exists which depends upon the wake-all semantics, > so the only person to blame is whoever designed the behaviors of these > unix operations to begin with ;-) On the other hand you could say that the UNIX semantics are fine and are quite scalable, provided you use them sensibly. Some of these "problems" are due to applications not being properly thought out in the first place. If for example you have N threads each polling a chunk of FDs, things can run well, provided you don't have *each* thread polling *all* FDs. Of course, you want to use poll(2) rather than select(2), but other than that the point stands. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
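A minimal sketch of the arrangement Richard describes -- N threads, each calling poll(2) on its own chunk of descriptors instead of every thread polling all of them. This is my illustration; handle_fd() and the way descriptors get assigned to chunks are assumptions.

#include <poll.h>
#include <pthread.h>

#define FDS_PER_CHUNK 100

struct chunk {
    struct pollfd fds[FDS_PER_CHUNK];
    int nfds;
};

extern void handle_fd(int fd, short revents);   /* application callback */

/* Each thread runs this on its own chunk; a busy descriptor only forces
 * a rescan of its own chunk, never of the full set. */
static void *poll_chunk(void *arg)
{
    struct chunk *c = arg;
    int i, n;

    for (;;) {
        n = poll(c->fds, c->nfds, -1);
        if (n <= 0)
            continue;
        for (i = 0; i < c->nfds; i++)
            if (c->fds[i].revents)
                handle_fd(c->fds[i].fd, c->fds[i].revents);
    }
    return 0;
}

Starting one thread per chunk with pthread_create(..., poll_chunk, &chunks[i]) is all the "load balancing" needed for the simple, static case.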
From: a...@lxorguk.ukuu.org.uk (Alan Cox) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <m0yn0VT-000aOnC@the-village.bc.nu>#1/1 X-Deja-AN: 364179101 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96.980619001045.17049A-100000@mindmeld.dyn.ml.org> Newsgroups: muc.lists.linux-kernel > > 10000 files at the same time and holding onto the file handles. > > This is exactly what's required to build a top end webserver to get > > winning Specweb96 numbers on NT using TransmitFile. > > > > Yes, I know this. > > Is it not possible to configure Linux to be able to use 10k or greater file > descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max? > (shooting down the earlier comment regarding recompiling the kernel to allow 10k > or greater file descriptors...) With Bill Hawes patches for handling file arrays it is. For the generic case its not. Note that you can forget using select() with 10K descriptors if you ever want to get any work done. Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: abel...@phobos.illtel.denver.co.us (Alex Belits) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us>#1/1 X-Deja-AN: 364188372 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Fri, 19 Jun 1998, Richard Gooch wrote: > David S. Miller writes: > > Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT) > > From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> > [...] > > Unix multiplexing facilities -- select and poll -- are wake-all > > primitives. When something happens, everything waiting is awakened > > and immediately starts fighting for something to do. What a waste. > > They make a lot of sense for processes though. On NT completion > > ports provide wake-one semantics... which are perfect for threads. > > > > Yes, this does in fact suck. However, the path to go down is not to > > expect the way select/poll work to change, rather look at other > > existing facilities or invent new ones which solve this problem. > > Too much user code exists which depends upon the wake-all semantics, > > so the only person to blame is whoever designed the behaviors of these > > unix operations to begin with ;-) > > On the other hand you could say that the UNIX semantics are fine and > are quite scalable, provided you use them sensibly. Some of these > "problems" are due to applications not being properly thought out in > the first place. #ifdef SARCASM "Thundering Herd Problem II", with all original cast... ;-) This time it's not accept(), but poll(), and the whole thing is multithreaded... #endif > If for example you have N threads each polling a > chunk of FDs, things can run well, provided you don't have *each* > thread polling *all* FDs. Of course, you want to use poll(2) rather > than select(2), but other than that the point stands. Can anyone provide a clear explanation of what the benefit is of doing that in multiple threads vs. having one thread polling everything, if the response to an fd status change takes negligible time for the thread/process that is polling them (other processes complete the operation while polling continues)? I have a server that uses a separate process mostly for polling; however, I'm not sure what poll()/select() scalability problems it may encounter if used with a huge number of fds. -- Alex - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: da...@dm.cobaltmicro.com (David S. Miller) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <199806191311.GAA14665@dm.cobaltmicro.com>#1/1 X-Deja-AN: 364193482 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us> Newsgroups: muc.lists.linux-kernel Date: Fri, 19 Jun 1998 06:11:10 -0700 (PDT) From: Alex Belits <abel...@phobos.illtel.denver.co.us> Can anyone provide a clear explanation, what is the benefit of doing that in multiple threads vs. having one thread polling everything, if the response on fd status change takes negligible time for the thread/process that is polling them (other processes complete the operation while polling continues)? I have a server that uses separate process mostly for polling, however I'm not sure what poll()/select() scalability problems it may encounter if used with huge fd number. I look at it this way. If you can divide the total set of fd's logically into separate groups, one strictly per thread, do it this way. One thread polling all fd's and passing event notification to threads via some other mechanism has the problem that this one thread becomes the bottleneck. The problem, for one, with web etc. servers is the incoming connection socket. If you could tell select/poll "hey, when a new conn comes in, wake up one of us", poof this issue would be solved. However the defined semantics for these interfaces says to wake everyone polling on it up. Later, David S. Miller da...@dm.cobaltmicro.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: rjo...@orchestream.com (Richard Jones) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <358A7A62.BCE36B77@orchestream.com>#1/1 X-Deja-AN: 364218529 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us> Organization: Orchestream Ltd. Newsgroups: muc.lists.linux-kernel David S. Miller wrote: > The problem, for one, with web etc. servers is the incoming connection > socket. If you could tell select/poll "hey, when a new conn comes in, > wake up one of us", poof this issue would be solved. However the > defined semantics for these interfaces says to wake everyone polling > on it up. Apache handles this very nicely. It runs a group of processes, and each *blocks* on accept(2). When a new connection comes in, the kernel wakes up one, which handles that socket alone, using blocking I/O (it uses alarm(2) to do timeouts). This way they avoid the poll/select issue entirely. [This applies to Apache 1.2, not sure about later versions] Rich. -- Richard Jones rjo...@orchestream.com Tel: +44 171 598 7557 Fax: 460 4461 Orchestream Ltd. 125 Old Brompton Rd. London SW7 3RP PGP: www.four11.com "boredom ... one of the most overrated emotions ... the sky is made of bubbles ..." Original message content Copyright © 1998 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
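For completeness, the pre-forked blocking-accept model Rich describes looks roughly like the sketch below. It is my illustration only; the real Apache 1.2 code is considerably more involved (scoreboard, graceful restart, and so on).

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

extern void serve_connection(int fd);   /* blocking I/O, alarm() timeouts */

static void prefork_server(int listen_fd, int nchildren)
{
    int i;

    for (i = 0; i < nchildren; i++) {
        if (fork() == 0) {
            /* Child: sit in accept(); each new connection is handed to
             * exactly one child, so no select()/poll() is ever needed on
             * the listening socket. */
            for (;;) {
                int conn = accept(listen_fd, NULL, NULL);
                if (conn < 0)
                    continue;
                serve_connection(conn);
                close(conn);
            }
        }
    }
    /* Parent: in a real server, watch the children and replace any that
     * exit. */
}

Whether all the sleeping children are woken to race for each new connection, or just one of them, is exactly the wakeup-all vs. wakeup-one question discussed elsewhere in this thread.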
From: f...@omnicron.com (Mike Ford Ditto) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <358AB36A.30CD@yoda.omnicron.com>#1/1 X-Deja-AN: 364284416 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <358A7A62.BCE36B77@orchestream.com> Newsgroups: muc.lists.linux-kernel > > The problem, for one, with web etc. servers is the incoming connection > > socket. If you could tell select/poll "hey, when a new conn comes in, > > wake up one of us", poof this issue would be solved. However the > > defined semantics for these interfaces says to wake everyone polling > > on it up. > > Apache handles this very nicely. It runs a group of processes, > and each *blocks* on accept(2). When a new connection comes in, > the kernel wakes up one, which handles that socket alone, using > blocking I/O (it uses alarm(2) to do timeouts). This demonstrates the point that select and poll are workarounds for the lack of threading support in Unix. They aren't needed if you use a threads facility (or a separate process for each thread you need). Once you have threads you can stick to the intuitive synchronous model of system calls, which has always effectively handled waking one of multiple waiters. Off topic, I would like to pick a nit: accept() is a system call. accept(2) is not a system call, it is a manual page. One doesn't block on accept(2), one *reads* accept(2) to find out how to use accept(). -=] Ford [=- "Heaven is exactly like where you (In Real Life: Mike Ditto) are right now, only much, much better." f...@omnicron.com -- Laurie Anderson http://www.omnicron.com/~ford/ford.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: a...@lxorguk.ukuu.org.uk (Alan Cox) Subject: Re: Thread implementations... Date: 1998/06/19 Message-ID: <m0yn6zH-000aOpC@the-village.bc.nu>#1/1 X-Deja-AN: 364289956 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <358AB36A.30CD@yoda.omnicron.com> Newsgroups: muc.lists.linux-kernel > > the kernel wakes up one, which handles that socket alone, using > > blocking I/O (it uses alarm(2) to do timeouts). > > This demonstrates the point that select and poll are workarounds for > the lack of threading support in Unix. They aren't needed if you use > a threads facility (or a separate process for each thread you need). Actually select and poll are more efficient ways of describing most multiple source event models without the overhead of threads. And there are plenty of cases where each one is better. Select is clearly a better model for inetd for example. > accept() is a system call. accept(2) is not a system call, it is a > manual page. One doesn't block on accept(2), one *reads* accept(2) > to find out how to use accept(). Using accept(2) to indicate you are talking about the system call goes back to at least my student days read comp.unix.wizards Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/20 Message-ID: <Pine.LNX.3.96dg4.980619181258.29884c-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364363398 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Fri, 19 Jun 1998, Richard Gooch wrote: > On the other hand you could say that the UNIX semantics are fine and > are quite scalable, provided you use them sensibly. Some of these > "problems" are due to applications not being properly thought out in > the first place. If for example you have N threads each polling a > chunk of FDs, things can run well, provided you don't have *each* > thread polling *all* FDs. Of course, you want to use poll(2) rather > than select(2), but other than that the point stands. You may not be able to exploit the parallelism available in the hardware unless you can "load balance" the descriptors well enough... Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/20 Message-ID: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 364448988 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980619181258.29884c-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > > On Fri, 19 Jun 1998, Richard Gooch wrote: > > > On the other hand you could say that the UNIX semantics are fine and > > are quite scalable, provided you use them sensibly. Some of these > > "problems" are due to applications not being properly thought out in > > the first place. If for example you have N threads each polling a > > chunk of FDs, things can run well, provided you don't have *each* > > thread polling *all* FDs. Of course, you want to use poll(2) rather > > than select(2), but other than that the point stands. > > You may not be able to exploit the parallism available in the hardware > unless you can "load balance" the descriptors well enough... Use 10 threads. Seems to me that would provide reasonable load balancing. And increasing that to 100 threads would be even better. The aim is to ensure that, statistically, most threads will remain sleeping for several clock ticks. With a bit of extra work you could even slowly migrate consistently active FDs to one or a few threads. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: l...@bitmover.com (Larry McVoy) Subject: Re: Thread implementations... Date: 1998/06/20 Message-ID: <199806201951.MAA30491@bitmover.com>#1/1 X-Deja-AN: 364548603 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner Newsgroups: muc.lists.linux-kernel : Even with the debugging problems solved, linuxthreads are heavier : than solaris pthreads or NT fibers. So how about quantifying that a bit and show us some numbers and how they affect things in real life? : Unix multiplexing facilities -- select and poll -- are wake-all : primitives. When something happens, everything waiting is awakened : and immediately starts fighting for something to do. What a waste. : They make a lot of sense for processes though. On NT completion : ports provide wake-one semantics... which are perfect for threads. : : Yes, this does in fact suck. However, the path to go down is not to : expect the way select/poll work to change, rather look at other : existing facilities or invent new ones which solve this problem. : Too much user code exists which depends upon the wake-all semantics, Hmm. SGI changed accept() from wakeup-all to wakeup-one with no problem. I'd be interested in knowing which programs depend on the race. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: l...@bitmover.com (Larry McVoy) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806210128.SAA31866@bitmover.com>#1/1 X-Deja-AN: 364611098 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner Newsgroups: muc.lists.linux-kernel : This demonstrates the point that select and poll are workarounds for : the lack of threading support in Unix. They aren't needed if you use : a threads facility (or a separate process for each thread you need). : : Once you have threads you can stick to the intuitive synchronous model : of system calls, which has always effectively handled waking one of : multiple waiters. There are a number of people, usually systems / kernel types, who realize that multiple threads/processes can have a severe negative effect on performance, especially when you are trying to make things fit in a small processor cache. Event driven programming tends to use less system resources than threaded programming. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/20 Message-ID: <Pine.LNX.3.96dg4.980620132805.15494K-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364552022 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Sat, 20 Jun 1998, Richard Gooch wrote: > Dean Gaudet writes: > > > > On Fri, 19 Jun 1998, Richard Gooch wrote: > > > > > On the other hand you could say that the UNIX semantics are fine and > > > are quite scalable, provided you use them sensibly. Some of these > > > "problems" are due to applications not being properly thought out in > > > the first place. If for example you have N threads each polling a > > > chunk of FDs, things can run well, provided you don't have *each* > > > thread polling *all* FDs. Of course, you want to use poll(2) rather > > > than select(2), but other than that the point stands. > > > > You may not be able to exploit the parallism available in the hardware > > unless you can "load balance" the descriptors well enough... > > Use 10 threads. Seems to me that would provide reasonable load > balancing. And increasing that to 100 threads would be even better. No it wouldn't. 100 kernel-level threads is overkill. Unless your box can do 100 things at a time there's no benefit from giving the kernel 100 objects to schedule. 10 is a much more reasonable number, and even that may be too high. You only need as many kernel threads as there is parallelism to exploit in the hardware. Everything else can, and should, happen in userland where timeslices can be maximized and context switches minimized. > The aim is to ensure that, statistically, most threads will remain > sleeping for several clock ticks. What? If I am wasting system memory for a kernel-level thread I'm not going to go about ensuring that it remains asleep! no way. I'm going to use each and every time slice to its fullest -- because context switches have a non-zero cost, it may be small, but it is non-zero. > With a bit of extra work you could even slowly migrate consistently > active FDs to one or a few threads. But migrating them costs you extra CPU time. That's time that strictly speaking, which does not need to be spent. NT doesn't have to spend this time when using completion ports (I'm sounding like a broken record). Look at this another way. If I'm using poll() to implement something, then I typically have a structure that describes each FD and the state it is in. I'm always interested in whether that FD is ready for read or write. When it is ready I'll do some processing, modify the state, read/write something, and then do nothing with it until it is ready again. To do this I list for the kernel all the FDs and call poll(). Then the kernel goes around and polls everything. For many descriptors (i.e. slow long haul internet clients) this is a complete waste. There are two approaches I've seen to deal with this: - don't poll everything as frequently, do complex migration between different "pools" sorted by how active the FD is. This reduces the number of times slow sockets are polled. This is a win, but I feel it is far too complex (read: easy to get wrong). - let the kernel queue an event when the FD becomes ready. So rather than calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD basis "when this is ready for read/write queue an event on this pipe, and could you please hand me back this void * with it? 
thanks". In this model when a write() returns EWOULDBLOCK the kernel implicitly sets that FD up as "waiting for write", similarly for a read(). This means that no matter what speed the socket is, it won't be polled and no complex dividing of the FDs into threads needs to be done. The latter model is a lot like completion ports... but probably far easier to implement. When the kernel changes an FD in a way that could cause it to become ready for read or write it checks if it's supposed to queue an event. If the event queue becomes full the kernel should queue one event saying "event queue full, you'll have to recover in whatever way you find suitable... like use poll()". Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/20 Message-ID: <Pine.LNX.3.96dg4.980620142324.15494N-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364563215 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806201951.MAA30491@bitmover.com> Newsgroups: muc.lists.linux-kernel On Sat, 20 Jun 1998, Larry McVoy wrote: > : Even with the debugging problems solved, linuxthreads are heavier > : than solaris pthreads or NT fibers. > > So how about quantifying that a bit and show us some numbers and how they > affect things in real life? As a matter of fact I can quantify this somewhat. NSPR provides two modes of operation on linux -- one uses pthreads, the other users a portable userland threads library (the standard setjmp/longjmp deal although it uses sigsetjmp/siglongjmp, and needs a little more optimization). I've ported apache 1.3 to NSPR as an experiment for future versions of apache. I built the non-debugging versions of the NSPR library, linked my apache-nspr code against it, and set up a rather crude benchmark. % dd if=/dev/zero of=htdocs/6k bs=1024 count=6 (the squid folks used to tell me 6k was the average object size on the net, maybe the number is different these days) % zb 127.0.0.1 /6k -p 8080 -c 10 -t 10 -k (this is zeusbench asking for the 6k document, 10 simultaneous clients (it uses select to multiplex), run for 10 seconds, use keep-alive persistent http connections) With pthreads it achieves 811 req/s. With user threads it achieves 1024.40 req/s. The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. Caveats: While NSPR has been designed extremely well, and the interfaces don't show any immediate problems with doing underlying optimizations, it's certainly not top speed yet. This applies in both cases however. NSPR has a hybrid user/system model that lives on top of pthreads, I haven't tried it yet (it's not ported to linux according to the docs). I can do comparisons with the process-model based apache, and I used to have a native pthreads port of apache... but the latter is out of date now because I switched my efforts to NSPR in order to have greater portability (including win32). Larry does lmbench have a threads component that can benchmark different threads libraries easily? I have to admit I'm not terribly familiar with lmbench... but if you've got some benchmarks you'd like me to run I can try them. Or you can try them -- NSPR comes with mozilla, after downloading the tarball, "cd mozilla/nsprpub", then do "make BUILD_OPT=1" to get the user-threads version, and do "make BUILD_OPT=1 USE_PTHREADS=1" to get the pthreads version. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: da...@dm.cobaltmicro.com (David S. Miller) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806210213.TAA02349@dm.cobaltmicro.com>#1/1 X-Deja-AN: 364616896 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980620142324.15494N-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Date: Sat, 20 Jun 1998 14:37:36 -0700 (PDT) From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> With pthreads it achieves 811 req/s. With user threads it achieves 1024.40 req/s. The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. If you have the opportunity, perform the same benchmark on an architecture that implements context pids in the TLB. The entire TLB is for all intents and purposes, flushed entirely of all userland translations for even thread context switches. Later, David S. Miller da...@dm.cobaltmicro.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: mi...@valerie.inf.elte.hu (MOLNAR Ingo) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <Pine.GSO.3.96.980621045311.25881B-100000@valerie.inf.elte.hu>#1/1 X-Deja-AN: 364627002 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806210213.TAA02349@dm.cobaltmicro.com> Reply-To: MOLNAR Ingo <mi...@valerie.inf.elte.hu> Newsgroups: muc.lists.linux-kernel On Sat, 20 Jun 1998, David S. Miller wrote: > With pthreads it achieves 811 req/s. > With user threads it achieves 1024.40 req/s. > > The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. > > If you have the opportunity, perform the same benchmark on an > architecture that implements context pids in the TLB. The entire TLB > is for all intents and purposes, flushed entirely of all userland > translations for even thread context switches. on x86 it is not flushed across thread-thread switches ... and on a PPro, parts of the TLB are tagged as 'global' (kernel pages obviously), which keeps the TLB-lossage even across non-shared-VM threads small. (zb->apache and apache->zb switches in this case). one thing i noticed about LinuxThreads: the most 'out of balance' basic pthreads operation is pthread_create(). Does NSPR create a pre-allocated pool of threads? (or some kind of adaptive pool?) If it's creating threads heavily (say per-request), then that's bad, at least with the current LinuxThreads implementation. We have a 1:5 gap between the latency of clone() and pthread_create() there... -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
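On the pre-allocated pool question: the point of such a pool is that pthread_create() is paid once at startup rather than per request. Below is a minimal single-slot sketch of the idea (mine, not NSPR's code); a real pool would feed the workers from a work queue rather than a single slot, but the create-once structure is the same.

#include <pthread.h>

#define POOL_SIZE 16

static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  have_work = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  slot_free = PTHREAD_COND_INITIALIZER;
static void          (*pending)(void *);
static void           *pending_arg;

static void *worker(void *unused)
{
    void (*job)(void *);
    void *arg;

    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending == 0)
            pthread_cond_wait(&have_work, &lock);
        job = pending;
        arg = pending_arg;
        pending = 0;
        pthread_cond_signal(&slot_free);
        pthread_mutex_unlock(&lock);

        job(arg);            /* the per-request work */
    }
    return 0;
}

/* Called once at startup: all the thread creation cost is paid here. */
void pool_start(void)
{
    pthread_t tid;
    int i;

    for (i = 0; i < POOL_SIZE; i++)
        pthread_create(&tid, 0, worker, 0);
}

/* Hand one job to the pool; blocks briefly if the slot is occupied. */
void pool_submit(void (*job)(void *), void *arg)
{
    pthread_mutex_lock(&lock);
    while (pending != 0)
        pthread_cond_wait(&slot_free, &lock);
    pending = job;
    pending_arg = arg;
    pthread_cond_signal(&have_work);
    pthread_mutex_unlock(&lock);
}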
From: da...@dm.cobaltmicro.com (David S. Miller) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806210312.UAA02799@dm.cobaltmicro.com>#1/1 X-Deja-AN: 364627003 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.GSO.3.96.980621045311.25881B-100000@valerie.inf.elte.hu> Newsgroups: muc.lists.linux-kernel Date: Sun, 21 Jun 1998 05:03:29 +0200 (MET DST) From: MOLNAR Ingo <mi...@valerie.inf.elte.hu> on x86 it is not flushed across thread-thread switches ... and on a PPro, parts of the TLB are tagged as 'global' (kernel pages obviously), which keeps the TLB-lossage even across non-shared-VM threads small. (zb->apache and apache->zb switches in this case). I assumed that TSS switches were defined to reload csr3, which by definition flushes the TLB of user entries. Later, David S. Miller da...@dm.cobaltmicro.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: mi...@valerie.inf.elte.hu (MOLNAR Ingo) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <Pine.GSO.3.96.980621051437.25881D-100000@valerie.inf.elte.hu>#1/1 X-Deja-AN: 364627004 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806210312.UAA02799@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel On Sat, 20 Jun 1998, David S. Miller wrote: > I assumed that TSS switches were defined to reload csr3, which by > definition flushes the TLB of user entires. it does have a 'short-cut' in the microcode, it does not flush the TLB if cr3(A) == cr3(B) ... ugly :( -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: da...@dm.cobaltmicro.com (David S. Miller) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806210320.UAA02864@dm.cobaltmicro.com>#1/1 X-Deja-AN: 364630047 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806210312.UAA02799@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel Date: Sat, 20 Jun 1998 20:12:35 -0700 From: "David S. Miller" <da...@dm.cobaltmicro.com> I assumed that TSS switches were defined to reload csr3, which by definition flushes the TLB of user entires. Thats broken, not because it's a silly workaround for the Intel TLB mis-design, but rather because it changes behavior from what older CPU's did. So if someone optimized things to defer TLB flushes for mapping changes, when they knew they would task switch once before running the task again, this "microcode optimization" would break the behavior such a trick would depend upon. Later, David S. Miller da...@dm.cobaltmicro.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: mi...@valerie.inf.elte.hu (MOLNAR Ingo) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <Pine.GSO.3.96.980621052422.25881E-100000@valerie.inf.elte.hu>#1/1 X-Deja-AN: 364630046 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806210320.UAA02864@dm.cobaltmicro.com> Newsgroups: muc.lists.linux-kernel On Sat, 20 Jun 1998, David S. Miller wrote: > I assumed that TSS switches were defined to reload csr3, which by > definition flushes the TLB of user entires. > > Thats broken, not because it's a silly workaround for the Intel TLB > mis-design, but rather because it changes behavior from what older > CPU's did. So if someone optimized things to defer TLB flushes for > mapping changes, when they knew they would task switch once before > running the task again, this "microcode optimization" would break the > behavior such a trick would depend upon. unless this deferred TLB flush feature gets into 2.1, i plan on making a new version of the softswitch stuff (that replaces TSS switching) for 2.3, which should give us more pronounced control over TLB flushes and more ... -- mingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806210320.NAA20480@vindaloo.atnf.CSIRO.AU> X-Deja-AN: 364630052 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980620132805.15494K-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > > > On Sat, 20 Jun 1998, Richard Gooch wrote: [...] > > Use 10 threads. Seems to me that would provide reasonable load > > balancing. And increasing that to 100 threads would be even better. > > No it wouldn't. 100 kernel-level threads is overkill. Unless your box > can do 100 things at a time there's no benefit from giving the kernel 100 > objects to schedule. 10 is a much more reasonable number, and even that > may be too high. You only need as many kernel threads as there is > parallelism to exploit in the hardware. Everything else can, and should, > happen in userland where timeslices can be maximized and context switches > minimized. > > > The aim is to ensure that, statistically, most threads will remain > > sleeping for several clock ticks. > > What? If I am wasting system memory for a kernel-level thread I'm not > going to go about ensuring that it remains asleep! no way. I'm going to > use each and every time slice to its fullest -- because context switches > have a non-zero cost, it may be small, but it is non-zero. The point is that *most* FDs are inactive. If in every timeslice you have only 5 active FDs (taken from a uniform random distribution), then with 10 threads only half of those are woken up. Hence only half the number of FDs have to be scanned when these threads have processed the activity. For 1000 FDs, that is a saving of 500 FD scans, which is 1.5 ms. So scanning load has gone from 30% to 15% (10 ms timeslice). Also note that only 5 threads are woken up (scheduled), the other 5 remain asleep. Now let's look at 100 threads. With 5 active FDs, you still get at most 5 threads woken up. But now FD scanning after processing activity drops to a total of 50 FDs. So scanning load (per timeslice!) has dropped to 150 us. So compared with the 10 thread case, we have saved 1.35 ms of FD scanning time. Compared with the 1 thread case, we have saved 2.85 ms of scanning time (as always, per 10 ms timeslice). In other words, only 1.5% scanning load. And still we are only scheduling 5 threads *this timeslice*! I don't know why you care so much about context switches: the time taken for select(2) or poll(2) for many FDs is dominant! Just how much time do you think scheduling is taking??? > > With a bit of extra work you could even slowly migrate consistently > > active FDs to one or a few threads. > > But migrating them costs you extra CPU time. That's time that strictly > speaking, which does not need to be spent. NT doesn't have to spend this > time when using completion ports (I'm sounding like a broken record). Migration is pretty cheap: it's a matter of swapping some entries in a table. And migration only happens upon FD activity. Adding a few extra microseconds for migration is peanuts compared with the time taken to process a datagram. > Look at this another way. If I'm using poll() to implement something, > then I typically have a structure that describes each FD and the state it > is in. I'm always interested in whether that FD is ready for read or > write. When it is ready I'll do some processing, modify the state, > read/write something, and then do nothing with it until it is ready again.
Yep, fine. My conceptual model is that I call a callback for each active FD. Same thing. > To do this I list for the kernel all the FDs and call poll(). Then the > kernel goes around and polls everything. For many descriptors (i.e. slow > long haul internet clients) this is a complete waste. There are two > approaches I've seen to deal with this: > > - don't poll everything as frequently, do complex migration between > different "pools" sorted by how active the FD is. This reduces the number > of times slow sockets are polled. This is a win, but I feel it is far too > complex (read: easy to get wrong). It only needs to be done "right" once. In a library. Heck, I might even modify my own FD management library code to do this just to prove the point. Write once, use many! Note that even the "complex" migration is optional: simply dividing up FDs equally between N threads is a win. Having migration between a small number of threads is going to be a *real* win. > - let the kernel queue an event when the FD becomes ready. So rather than > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD > basis "when this is ready for read/write queue an event on this pipe, and > could you please hand me back this void * with it? thanks". In this > model when a write() returns EWOULDBLOCK the kernel implicitly sets that > FD up as "waiting for write", similarly for a read(). This means that no > matter what speed the socket is, it won't be polled and no complex > dividing of the FDs into threads needs to be done. I think this will be more complex to implement than a small userspace library that uses a handful of threads. > The latter model is a lot like completion ports... but probably far easier > to implement. When the kernel changes an FD in a way that could cause it > to become ready for read or write it checks if it's supposed to queue an > event. If the event queue becomes full the kernel should queue one event > saying "event queue full, you'll have to recover in whatever way you find > suitable... like use poll()". This involves kernel bloat. It seems to me that there is such a simple userspace solution, so why bother hacking the kernel? I'd much rather hack the kernel to speed up select(2) and poll(2) a few times. This benefits all existing Linux/UNIX programmes. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
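To make the arithmetic above concrete, here is a minimal sketch of the kind of userspace arrangement Richard is describing: the descriptor set is split statically across a handful of threads, each of which poll()s only its own slice and hands activity to a callback. All names here (start_pollers, on_ready and so on) are invented for illustration; this is not his karma library, and it omits the migration of busy FDs he mentions.

/*
 * Sketch: split nfds descriptors across NTHREADS poller threads.
 * Each thread sleeps in poll() on its own slice, so an idle slice
 * costs nothing and an active one wakes only its own thread.
 */
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 10

struct slice {
    struct pollfd *fds;                      /* this thread's share */
    int nfds;
    void (*on_ready)(int fd, short revents); /* per-FD callback */
};

static void *poll_slice(void *arg)
{
    struct slice *s = arg;
    int i, n;

    for (;;) {
        n = poll(s->fds, s->nfds, -1);       /* sleep until activity */
        if (n <= 0)
            continue;                        /* EINTR and friends */
        for (i = 0; i < s->nfds && n > 0; i++) {
            if (s->fds[i].revents) {
                s->on_ready(s->fds[i].fd, s->fds[i].revents);
                n--;
            }
        }
    }
    return NULL;
}

int start_pollers(struct pollfd *fds, int nfds,
                  void (*on_ready)(int, short))
{
    int per = (nfds + NTHREADS - 1) / NTHREADS;
    int t;

    for (t = 0; t < NTHREADS && t * per < nfds; t++) {
        pthread_t tid;
        struct slice *s = malloc(sizeof(*s));

        if (s == NULL)
            return -1;
        s->fds = fds + t * per;
        s->nfds = (nfds - t * per < per) ? nfds - t * per : per;
        s->on_ready = on_ready;
        if (pthread_create(&tid, NULL, poll_slice, s) != 0)
            return -1;
        pthread_detach(tid);
    }
    return 0;
}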
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364786346 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806210320.NAA20480@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Sun, 21 Jun 1998, Richard Gooch wrote: > Just how much time do you think scheduling is taking??? I care more about cache pollution. That is a side-effect of context-switching which isn't entirely obvious from the context-switch cost itself. > It only needs to be done "right" once. In a library. Heck, I might > even modify my own FD management library code to do this just to prove > the point. Write once, use many! > Note that even the "complex" migration is optional: simply dividing up > FDs equally between N threads is a win. > Having migration between a small number of threads is going to be a > *real* win. Right, and if you'll release this in a license other than GPL (i.e. LGPL or MPL) so that it can be reused in non-GPL code (i.e. NSPR which is NPL), that would be most excellent. (acronyms rewl). > This involves kernel bloat. It seems to me that there is such a simple > userspace solution, so why bother hacking the kernel? I don't think the userspace solution is as fast as the event queue solution. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: ko...@jagunet.com (John Kodis) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <19980621103913.09836@jagunet.com>#1/1 X-Deja-AN: 364714846 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU> Reply-To: ko...@jagunet.com Newsgroups: muc.lists.linux-kernel On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote: > - let the kernel queue an event when the FD becomes ready. So rather than > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD > basis "when this is ready for read/write queue an event on this pipe, and > could you please hand me back this void * with it? thanks". Yow! Shades of VMS! This sounds very much like the VMS Async System Trap mechanism that allowed you to perform a queued IO operation using a call something like: status = sys$qio( READ_OPCODE, fd, buffer, sizeof(buffer), <lots of other parameters that I've long since forgotten>, ast_function, ast_parameter, ...); The read would get posted, and when complete the ast_function would get called with the ast_parameter in the context of the process that posted the QIO. This provided a powerful and easy-to-use method of dealing with async IO. It's one of the few VMS features that I wish Unix supported. -- John Kodis. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
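For what it's worth, the POSIX.1b realtime extensions define something in the same spirit as QIO/AST: aio_read() queues the I/O, and a sigevent can name a notification function to be called on completion. Whether any 1998-era libc delivers this usefully is another question; take the following purely as an illustration of the interface, not as a claim about Linux support.

/*
 * Rough POSIX analogue of the QIO/AST pattern: queue a read and have
 * a function called (in a new thread) when it completes.
 */
#include <aio.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static char buffer[4096];

static void read_done(union sigval sv)       /* plays the AST role */
{
    struct aiocb *cb = sv.sival_ptr;
    long n = (long) aio_return(cb);          /* completion status, like the QIO status block */

    printf("async read finished: %ld bytes\n", n);
}

int start_async_read(int fd)
{
    static struct aiocb cb;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;

    /* the sigevent carries the ast_function/ast_parameter pair */
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb.aio_sigevent.sigev_notify_function = read_done;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;

    return aio_read(&cb);                    /* returns immediately; the read is queued */
}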
From: groud...@club-internet.fr (Gerard Roudier) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <Pine.LNX.3.95.980621171244.475A-100000@localhost>#1/1 X-Deja-AN: 364743618 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <19980621103913.09836@jagunet.com> Newsgroups: muc.lists.linux-kernel On Sun, 21 Jun 1998, John Kodis wrote: > On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote: > > > - let the kernel queue an event when the FD becomes ready. So rather than > > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD > > basis "when this is ready for read/write queue an event on this pipe, and > > could you please hand me back this void * with it? thanks". > > Yow! Shades of VMS! This sounds very much like the VMS Async System > Trap mechanism that allowed you to perform a queued IO operation using > a call something like: > > status = sys$qio( > READ_OPCODE, fd, buffer, sizeof(buffer), > <lots of other parameters that I've long since forgotten>, > ast_function, ast_parameter, ...); > > The read would get posted, and when complete the ast_function would > get called with the ast_parameter in the context of the process that > posted the QIO. This provided a powerful and easy-to-use method of > dealing with async IO. It's one of the few VMS features that I wish > Unix supported. RSX and friends (IAS, ...) already had such a feature. With such a mechanism, application programs get IO completion (software) interrupt as the kernel get completion interrupt from the hardware. DEC O/Ses have had AST mechanisms for years without offering threads. Speaking about VMS, you can pass data (or event) using interlocked queues between AST and process and between processes using shared memory and so you donnot need to use critical sections for synchonizing data or event passing. No need to use several threads sharing a process address space to make things rights. Using multi-threading into a single process context is, IMO, just importing into user-land kernel-like problems and providing such a feature complexifies significantly the involved kernel. Multi-threading into processes is not the way to go, IMO, especially if you want to be portable across platforms. If one really need to use threads, then, one of the following is true, in my opinion: - One likes complexity since one is stupid as most programmers. - One's O/S handles processes as bloat entities. - One has heared too much O/S 2 lovers. - One is believing that MicroSoft-BASIC is multi-threaded. There is probably lots of OS2 multi-threaded programs that can only be broken on SMP, since I often heared OS2 multi-braindeaded programmers assuming that threads inside a process are only preempted when they call a system service. I have written and maintained lots of application programs under VMS, UNIX, some have been ported to a dozen of O/S, none of them uses threads. I donnot envision to use multi-threads in application software and I donnot want to have to deal with applications that uses this, for the simple reasons that threads semantics differs too much between operating systems and that application programs are often large programs that donnot follow the high level of quality of O/S softwares. Traditionnal UNIXes used light processes and preferently blocking I/Os. Signals were preferently for error conditions. The select() semantic has been a hack that has been very usefull for implementing event-driven applications using a low number of fds, as the X Server. 
Trying to use such a semantic to deal with thousands of handles can only lead to performance problems. This is trivial. Regards, Gerard. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: a...@lxorguk.ukuu.org.uk (Alan Cox) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <m0yntEW-000aOnC@the-village.bc.nu>#1/1 X-Deja-AN: 364820020 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel > > This involves kernel bloat. It seems to me that there is such a simple > > userspace solution, so why bother hacking the kernel? > > I don't think the userspace solution is as fast as the event queue > solution. I think that's pretty obvious. Select() is an event queue mechanism which does a setup for each select(). Asynchronous I/O has some similar properties (clone, I/O, signal) but is only per handle. A pure event queue model does one setup per handle (and only per handle that matters), not a setup per event. You just get the queue overheads. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/21 Message-ID: <199806212338.JAA26410@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 364833942 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <m0yntEW-000aOnC@the-village.bc.nu> Newsgroups: muc.lists.linux-kernel Alan Cox writes: > > > This involves kernel bloat. It seems to me that there is such a simple > > > userspace solution, so why bother hacking the kernel? > > > > I don't think the userspace solution is as fast as the event queue > > solution. > > I think that's pretty obvious. Select() is an event queue mechanism which > does a setup for each select(). Asynchronous I/O has some similar > properties (clone, I/O, signal) but is only per handle. A pure event > queue model does one setup per handle (and only per handle that matters), > not a setup per event. You just get the queue overheads. The point is that a good userspace solution should be *fast enough*. I define "fast enough" to be "such that polling overheads contribute less than 10% of the application load". Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/22 Message-ID: <199806220715.RAA28264@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 364905841 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel I've written a document that tries to cover the various issues with I/O events. Check out: http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/22 Message-ID: <Pine.LNX.3.96dg4.980622004318.19675P-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 364909153 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806220715.RAA28264@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel Note that the poll_ctrl you introduce in <ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme> is almost all the work required for a completion queue. The additional code required is to add "void *user_data; int completion_fd;" to the event structure. If the low level code is smart enough to fill in your events structure it's smart enough to plop a word into a pipe when necessary. So are you sure it'd be too much bloat to do completion queues? :) Dean On Mon, 22 Jun 1998, Richard Gooch wrote: > I've written a document that tries to cover the various issues with > I/O events. Check out: > http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html > > Regards, > > Richard.... > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
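Roughly, the addition Dean is describing might look like the fragment below. This is a userspace mock-up of the idea with invented field and function names; the real per-FD event structure lives in Richard's fastpoll patch, which is not reproduced here, and an in-kernel version would obviously not call write() on itself.

/*
 * Mock-up of the proposed extension: the per-FD readiness record
 * grows a user cookie and a pipe to drop it into.
 */
#include <unistd.h>

struct fd_event {
    short ready_mask;       /* POLLIN/POLLOUT bits the driver keeps current */
    void *user_data;        /* Dean's addition: handed back verbatim */
    int completion_fd;      /* Dean's addition: write end of the event pipe, -1 if unused */
};

/* Called wherever ready_mask is updated anyway. */
static void event_notify(struct fd_event *ev, short newly_ready)
{
    ev->ready_mask |= newly_ready;
    if (ev->completion_fd >= 0) {
        /* "plop a word into a pipe": one cookie per readiness event */
        (void) write(ev->completion_fd, &ev->user_data, sizeof(ev->user_data));
    }
}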
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/22 Message-ID: <199806220753.RAA28663@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 364911751 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980622004318.19675P-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > Note that the poll_ctrl you introduce in > > <ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme> Hey! Someone's already read it:-) > is almost all the work required for a completion queue. The additional > code required is to add "void *user_data; int completion_fd;" to the event > structure. If the low level code is smart enough to fill in your events > structure it's smart enough to plop a word into a pipe when necessary. So > are you sure it'd be too much bloat to do completion queues? :) > > On Mon, 22 Jun 1998, Richard Gooch wrote: > > > I've written a document that tries to cover the various issues with > > I/O events. Check out: > > http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html The new mechanism I introduce optimises an existing POSIX interface. Also, it is optional: drivers which continue to do things the old way will still work, they just won't be as fast. With completion ports all drivers will have to be modified, so it involves a lot more work. I do agree that if my fastpoll optimisation is added, then the logical place to add completion port support is in poll_notify(). I've added a note in my documentation about that. BTW: what happens when a FD is closed before the completion event is read by the application? Protecting against that could be tricky, and may require more code than simply dropping an int into a pipe. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: l...@bitmover.com (Larry McVoy) Subject: Re: Thread implementations... Date: 1998/06/22 Message-ID: <199806221544.IAA03108@bitmover.com>#1/1 X-Deja-AN: 365026270 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner Newsgroups: muc.lists.linux-kernel : > If one really need to use threads, then, one of the following is true, : > in my opinion: : > - One likes complexity since one is stupid as most programmers. : > - One's O/S handles processes as bloat entities. : > - One has heared too much O/S 2 lovers. : > - One is believing that MicroSoft-BASIC is multi-threaded. : : Wow! This is really arrogant! Maybe, maybe not. I happen to agree with him, minus the inflammatory stuff. : > The select() semantic has been a hack that has been very usefull for : > implementing event-driven applications using a low number of fds, as : > the X Server. Trying to use such a semantic to deal with thousands of : > handles can only lead to performance problems. This is trivial. : : A lightweight userspace solution that uses a modest number of threads : is cabable of giving us a fast and scalable mechanism for handling : very large numbers of FDs. And it can do this without changing one : line of kernel code. So this is interesting. Can you either point towards a document or explain why using threads would make your mechanism faster? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: groud...@club-internet.fr (Gerard Roudier) Subject: Re: Thread implementations... Date: 1998/06/22 Message-ID: <Pine.LNX.3.95.980622200832.371A-100000@localhost>#1/1 X-Deja-AN: 365082668 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806212336.JAA26380@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Mon, 22 Jun 1998, Richard Gooch wrote: > Gerard Roudier writes: > > > > Using multi-threading into a single process context is, IMO, just > > importing into user-land kernel-like problems and providing such > > a feature complexifies significantly the involved kernel. > > Multi-threading into processes is not the way to go, IMO, especially > > if you want to be portable across platforms. > > I'm proposing a userspace abstraction that, on Unix systems, uses > select(2)/poll(2) and a modest number of threads. It could be ported > to another OS which has completion ports, if you cared. If I have completion, then I donnot need threads at all. > > If one really need to use threads, then, one of the following is true, > > in my opinion: > > - One likes complexity since one is stupid as most programmers. > > - One's O/S handles processes as bloat entities. > > - One has heared too much O/S 2 lovers. > > - One is believing that MicroSoft-BASIC is multi-threaded. > > Wow! This is really arrogant! Nothing arrogant here, only kindness. The thread-mania started with OS2 and I have had to deal for years with people who claim that threads are fine since you can use 1 thread to read the keyboard and another thread to send data to the printer. No need to use an O/S for such a construct, doing I/O directly from the hardware is easier to me. OS2 did not have completion nor real signals, so you had to use threads if you wanted to be asynchronous. BTW, no need to kill a dead O/S. About the thread-oriented Win/NT which is a 32 bit hardwired O/S that has been invented once 32 bit architectures has become obsolete, I could tell some other kindnesses, too ... What about the ridiculous 32 bit port to Alpha? The thread-mania that does bloat UNIX systems comes from these brain-deaded things. Microsoft guys are modern alchemists who are able to make gold from sh*t. Gold is for them, sh*t is for us. :-) > > There is probably lots of OS2 multi-threaded programs that can only be > > broken on SMP, since I often heared OS2 multi-braindeaded programmers > > assuming that threads inside a process are only preempted when > > they call a system service. > > I don't see what this has to do with real threads on a real Unix. When I see a > 5 MB kernel image, I donnot beleive it is a real UNIX. I have the impression that recent UNIXen try to look like some proprietary O/Ses and POSIX extensions to UNIX services lead to BLOATIX, IMO. > > I have written and maintained lots of application programs under VMS, > > UNIX, some have been ported to a dozen of O/S, none of them uses threads. > > I donnot envision to use multi-threads in application software and I > > donnot want to have to deal with applications that uses this, for the > > simple reasons that threads semantics differs too much between operating > > systems and that application programs are often large programs that > > donnot follow the high level of quality of O/S softwares. > > Threads have their uses. Sure, they can be abused. So what? I agree with you that threads have their uses, but it seems to me that programmers want to use them even if it is not needed. 
> > Traditionnal UNIXes used light processes and preferently blocking I/Os. > > Signals were preferently for error conditions. > > The select() semantic has been a hack that has been very usefull for > > implementing event-driven applications using a low number of fds, as > > the X Server. Trying to use such a semantic to deal with thousands of > > handles can only lead to performance problems. This is trivial. > > A lightweight userspace solution that uses a modest number of threads > is cabable of giving us a fast and scalable mechanism for handling > very large numbers of FDs. And it can do this without changing one > line of kernel code. > Independently, we can optimise the kernel to speed up select(2) and > poll(2) so that both this userspace library as well as other Unix > programmes benefit. select() and poll() are slow by design, at least in user-land. Existing programs will get benefits, but this is not a long term solution. The right solution is an asynchronous completion mechanism as DEC O/Ses are offering since more than 20 years. Regards, Gerard. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <199806230249.MAA04068@vindaloo.atnf.CSIRO.AU> X-Deja-AN: 365192280 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.95.980622200832.371A-100000@localhost> Newsgroups: muc.lists.linux-kernel Gerard Roudier writes: > > On Mon, 22 Jun 1998, Richard Gooch wrote: > > > Gerard Roudier writes: > > > > > > Using multi-threading into a single process context is, IMO, just > > > importing into user-land kernel-like problems and providing such > > > a feature complexifies significantly the involved kernel. > > > Multi-threading into processes is not the way to go, IMO, especially > > > if you want to be portable across platforms. > > > > I'm proposing a userspace abstraction that, on Unix systems, uses > > select(2)/poll(2) and a modest number of threads. It could be ported > > to another OS which has completion ports, if you cared. > > If I have completion, then I donnot need threads at all. Completion ports are not likely to ever be POSIX. Hence we should first evaluate a lightweight solution based on threads before implementing completion ports. If the threads solution is shown to be unsatisfactory, only then does it make sense to look further. > > > If one really need to use threads, then, one of the following is true, > > > in my opinion: > > > - One likes complexity since one is stupid as most programmers. > > > - One's O/S handles processes as bloat entities. > > > - One has heared too much O/S 2 lovers. > > > - One is believing that MicroSoft-BASIC is multi-threaded. > > > > Wow! This is really arrogant! > > Nothing arrogant here, only kindness. The thread-mania started with OS2 > and I have had to deal for years with people who claim that threads are > fine since you can use 1 thread to read the keyboard and another thread > to send data to the printer. No need to use an O/S for such a construct, > doing I/O directly from the hardware is easier to me. > OS2 did not have completion nor real signals, so you had to use threads > if you wanted to be asynchronous. BTW, no need to kill a dead O/S. > About the thread-oriented Win/NT which is a 32 bit hardwired O/S that > has been invented once 32 bit architectures has become obsolete, I could > tell some other kindnesses, too ... What about the ridiculous 32 bit > port to Alpha? The thread-mania that does bloat UNIX systems comes > from these brain-deaded things. It's arrogant because when someone proposes a solution based on threads, you belittle the whole idea (and by implication, the person). Instead, you should first evaluate the idea on it's merits. I may well be that a clever solution based on threads will work very well. If my userspace solution doesn't perform well, then I'll advocate (and maybe even implement) completion ports. But I first want to see how far we can go without departing from UNIX. > Microsoft guys are modern alchemists who are able to make gold from sh*t. > Gold is for them, sh*t is for us. :-) I have no interst in what M$ does in their OS. They are already a step behind the UNIX world, two steps behind Linux, and we are widening the gap. > > > There is probably lots of OS2 multi-threaded programs that can only be > > > broken on SMP, since I often heared OS2 multi-braindeaded programmers > > > assuming that threads inside a process are only preempted when > > > they call a system service. > > > > I don't see what this has to do with real threads on a real Unix. 
> > When I see a > 5 MB kernel image, I donnot beleive it is a real UNIX. > I have the impression that recent UNIXen try to look like some > proprietary O/Ses and POSIX extensions to UNIX services lead to > BLOATIX, IMO. So Linux is bloatware? After all, it has threads. And I still don't see the relevance of your kernel-bashing arguments. I'm proposing a wholly userspace solution. Hence it requires no extra kernel code. Unlike completion ports, I might add... > > > I have written and maintained lots of application programs under VMS, > > > UNIX, some have been ported to a dozen of O/S, none of them uses threads. > > > I donnot envision to use multi-threads in application software and I > > > donnot want to have to deal with applications that uses this, for the > > > simple reasons that threads semantics differs too much between operating > > > systems and that application programs are often large programs that > > > donnot follow the high level of quality of O/S softwares. > > > > Threads have their uses. Sure, they can be abused. So what? > > I agree with you that threads have their uses, but it seems to me that > programmers want to use them even if it is not needed. Well, I'm not one to jump to using threads just for the hell of it. Why not read the proposal carefully before jumping up and saying a threads-based solution is flawed? See: http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html > > > Traditionnal UNIXes used light processes and preferently blocking I/Os. > > > Signals were preferently for error conditions. > > > The select() semantic has been a hack that has been very usefull for > > > implementing event-driven applications using a low number of fds, as > > > the X Server. Trying to use such a semantic to deal with thousands of > > > handles can only lead to performance problems. This is trivial. > > > > A lightweight userspace solution that uses a modest number of threads > > is cabable of giving us a fast and scalable mechanism for handling > > very large numbers of FDs. And it can do this without changing one > > line of kernel code. > > Independently, we can optimise the kernel to speed up select(2) and > > poll(2) so that both this userspace library as well as other Unix > > programmes benefit. > > select() and poll() are slow by design, at least in user-land. > Existing programs will get benefits, but this is not a long term > solution. The right solution is an asynchronous completion mechanism > as DEC O/Ses are offering since more than 20 years. The right solution is one that works with minimal departure from the UNIX interface. If completion ports provide no measurable performance advantage over a userspace solution, there is no point to implementing completion ports. We want compatibility with the UNIX world, not with VMS. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <Pine.LNX.3.96dg4.980622230902.20096T-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 365227081 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806220753.RAA28663@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Mon, 22 Jun 1998, Richard Gooch wrote: > The new mechanism I introduce optimises an existing POSIX > interface. Also, it is optional: drivers which continue to do things > the old way will still work, they just won't be as fast. With > completion ports all drivers will have to be modified, so it involves > a lot more work. As long as ext2 and sockets support it I'd be happy ;) > I do agree that if my fastpoll optimisation is added, then the logical > place to add completion port support is in poll_notify(). I've added a > note in my documentation about that. > > BTW: what happens when a FD is closed before the completion event is > read by the application? Protecting against that could be tricky, and > may require more code than simply dropping an int into a pipe. I don't see a problem -- it's the application that interprets the meanings of the ints coming off the pipe. If the app closes while it possibly still has outstanding stuff then that's a bug in the app. There's no problem for the kernel -- if the FD doesn't get re-used it'll return EBADF when the app tries to use it... if it's re-used then the app gets whatever damage it creates. But suppose it was re-used. The data coming off the completion port means only "ready for read" or "ready for write". The app is almost certainly using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK if things weren't ready. Although I suppose you could queue a special event on close... so that the app could be sure that all events were flushed. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
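Dean's argument is easier to see as code. Assuming the queue is a pipe carrying one int (the FD) per event, a consumer that simply retries the non-blocking I/O tolerates both stale and re-used descriptors; handle_data() and the one-int-per-event protocol are assumptions made only for this sketch.

/*
 * Drain the readiness pipe; stale entries are harmless because the
 * FDs are non-blocking and errno tells us what actually happened.
 */
#include <errno.h>
#include <unistd.h>

extern void handle_data(int fd, char *buf, long n);   /* application callback (assumed) */

void drain_ready_pipe(int ready_pipe)
{
    int fd;
    char buf[4096];
    long n;

    while (read(ready_pipe, &fd, sizeof(fd)) == sizeof(fd)) {
        n = read(fd, buf, sizeof(buf));        /* fd was set O_NONBLOCK */
        if (n >= 0) {
            handle_data(fd, buf, n);           /* n == 0 is EOF, also useful */
        } else if (errno == EWOULDBLOCK) {
            /* FD number was re-used and isn't actually ready: ignore */
        } else if (errno == EBADF) {
            /* event for an FD we already closed: ignore */
        }
        /* anything else is left to the application */
    }
}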
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <199806230607.QAA05654@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 365227083 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980622230902.20096T-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > > > On Mon, 22 Jun 1998, Richard Gooch wrote: > > > The new mechanism I introduce optimises an existing POSIX > > interface. Also, it is optional: drivers which continue to do things > > the old way will still work, they just won't be as fast. With > > completion ports all drivers will have to be modified, so it involves > > a lot more work. > > As long as ext2 and sockets support it I'd be happy ;) ext2? You mean regular files? > > I do agree that if my fastpoll optimisation is added, then the logical > > place to add completion port support is in poll_notify(). I've added a > > note in my documentation about that. > > > > BTW: what happens when a FD is closed before the completion event is > > read by the application? Protecting against that could be tricky, and > > may require more code than simply dropping an int into a pipe. > > I don't see a problem -- it's the application that interprets the meanings > of the ints coming off the pipe. If the app closes while it possibly > still has outstanding stuff then that's a bug in the app. There's no > problem for the kernel -- if the FD doesn't get re-used it'll return EBADF > when the app tries to use it... if it's re-used then the app gets whatever > damage it creates. > > But suppose it was re-used. The data coming off the completion port means > only "ready for read" or "ready for write". The app is almost certainly > using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK > if things weren't ready. > > Although I suppose you could queue a special event on close... so that the > app could be sure that all events were flushed. What does NT do? If we're considering implementing something similar to NT, it would be worth knowing what the NT policy is. I still think that you can get as good performance with a few threads and simple FD migration, though. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <Pine.LNX.3.96dg4.980622233140.20096V-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 365227084 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806230607.QAA05654@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Tue, 23 Jun 1998, Richard Gooch wrote: > ext2? You mean regular files? Yeah. > > > I do agree that if my fastpoll optimisation is added, then the logical > > > place to add completion port support is in poll_notify(). I've added a > > > note in my documentation about that. > > > > > > BTW: what happens when a FD is closed before the completion event is > > > read by the application? Protecting against that could be tricky, and > > > may require more code than simply dropping an int into a pipe. > > > > I don't see a problem -- it's the application that interprets the meanings > > of the ints coming off the pipe. If the app closes while it possibly > > still has outstanding stuff then that's a bug in the app. There's no > > problem for the kernel -- if the FD doesn't get re-used it'll return EBADF > > when the app tries to use it... if it's re-used then the app gets whatever > > damage it creates. > > > > But suppose it was re-used. The data coming off the completion port means > > only "ready for read" or "ready for write". The app is almost certainly > > using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK > > if things weren't ready. > > > > Although I suppose you could queue a special event on close... so that the > > app could be sure that all events were flushed. > > What does NT do? If we're considering implementing something similar > to NT, it would be worth knowing what the NT policy is. You know it just occured to me that some time back I stopped advocating the NT method -- and by extension the VMS method... but I wasn't too clear on describing my current method maybe. NT/VMS actually do completion -- for example, if you do a write() you're told when it completes. That I think is way too expensive... I'm totally with you on the bloatness of that. What I'm advocating now is something akin to select()/poll(), and I've been wrong to be calling it "completion ports". It's more like a "ready queue" -- a queue of FDs which are ready for read or write. You do a write() the kernel says EWOULDBLOCK, so you stop writing and you put that "thread" to sleep (you note that it's waiting for the FD to become ready for write). Sometime later the kernel tells you its ready for write() by sending the FD down the ready queue. That seems like a fairly light kernel change (and I think the stuff may be in POSIX already -- rt signals and aio? I need to get a copy of the POSIX docs some day!). If you need true completion ports you can implement the rest of them at user-level. > I still think that you can get as good performance with a few threads > and simple FD migration, though. Yeah I need to investigate this anyhow -- because I still need to support other unixes. So it's probably the first approach that would be good once me (or someone else) gets into optimizing NSPR. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
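As a sketch of the application side of that model: the kernel interface is the part that does not exist, so ready_queue_watch() below is a made-up stand-in for "tell me on the queue when this FD is writable again"; everything else is ordinary non-blocking write handling.

#include <errno.h>
#include <poll.h>
#include <unistd.h>

struct conn {
    int fd;               /* non-blocking socket */
    const char *buf;      /* unsent data */
    long left;            /* bytes still to send */
};

/* Hypothetical: register interest; the FD later appears on queue_fd. */
extern int ready_queue_watch(int queue_fd, int fd, short events);

/* Returns 1 when done, 0 when parked on the ready queue, -1 on error. */
int conn_write(int queue_fd, struct conn *c)
{
    while (c->left > 0) {
        long n = write(c->fd, c->buf, c->left);

        if (n > 0) {
            c->buf += n;
            c->left -= n;
        } else if (n < 0 && errno == EWOULDBLOCK) {
            /* park the "thread": resume when the queue says POLLOUT */
            ready_queue_watch(queue_fd, c->fd, POLLOUT);
            return 0;
        } else if (n < 0 && errno != EINTR) {
            return -1;
        }
    }
    return 1;
}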
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <199806230732.RAA06474@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 365239583 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980622233140.20096V-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > On Tue, 23 Jun 1998, Richard Gooch wrote: > > > ext2? You mean regular files? > > Yeah. You currently can't poll for when a regular file delivers the block of data you asked for. I'm not aware of any UNIX that supports this. This is a whole new can of worms than the implementation of completion ports/whatever. > You know it just occured to me that some time back I stopped advocating > the NT method -- and by extension the VMS method... but I wasn't too clear > on describing my current method maybe. NT/VMS actually do completion -- > for example, if you do a write() you're told when it completes. That I > think is way too expensive... I'm totally with you on the bloatness > of that. What exactly do you mean "you're told when it completes"? > What I'm advocating now is something akin to select()/poll(), and I've > been wrong to be calling it "completion ports". It's more like a "ready > queue" -- a queue of FDs which are ready for read or write. You do a > write() the kernel says EWOULDBLOCK, so you stop writing and you put that > "thread" to sleep (you note that it's waiting for the FD to become ready > for write). Sometime later the kernel tells you its ready for write() > by sending the FD down the ready queue. Earlier last year when you described completion ports, you suggested that the queue for the completion events could just be a simple FD, so I've assumed that's what you meant. How is this different from "completion ports" in NT/VMS? It looks to me these "event queues" are much the same as "completion ports", based on the (vague) descriptions. > That seems like a fairly light kernel change (and I think the stuff may > be in POSIX already -- rt signals and aio? I need to get a copy of the > POSIX docs some day!). If you need true completion ports you can > implement the rest of them at user-level. When people have talked about implementing AIO in Linux, they had in mind a userspace library which used threads to do the work. Each AIO request is given a thread. I think part of the reason for such an implementation is that you can't poll a regular file, so you need a blocking thread. The other reason is why do it in the kernel if we can develop a good userspace solution? > > I still think that you can get as good performance with a few threads > > and simple FD migration, though. > > Yeah I need to investigate this anyhow -- because I still need to support > other unixes. So it's probably the first approach that would be good > once me (or someone else) gets into optimizing NSPR. Or you could use my library... Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <Pine.LNX.3.96dg4.980623014005.20096i-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 365254338 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806230732.RAA06474@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Tue, 23 Jun 1998, Richard Gooch wrote: > You currently can't poll for when a regular file delivers the block of > data you asked for. I'm not aware of any UNIX that supports this. > This is a whole new can of worms than the implementation of completion > ports/whatever. This is just asynch i/o. I'd be surprised if any of the commercial unixes lack it. > What exactly do you mean "you're told when it completes"? You write/read a buffer, and control returns immediately. Some unspecified time later, when the write/read completes, your program is informed either via a completion port (NT), or via a function you passed to the kernel (VMS). > How is this different from "completion ports" in NT/VMS? It looks to > me these "event queues" are much the same as "completion ports", based > on the (vague) descriptions. Nope, completion ports are far heavier... they actually imply that some I/O has completed. Whereas what I'm advocating only implies that some I/O wouldn't block if tried. > When people have talked about implementing AIO in Linux, they had in > mind a userspace library which used threads to do the work. Each AIO > request is given a thread. I think part of the reason for such an > implementation is that you can't poll a regular file, so you need a > blocking thread. The other reason is why do it in the kernel if we can > develop a good userspace solution? This is going in circles -- this is exactly the point I've been debating -- whether this is "good" or not. > Or you could use my library... I believe you said in a previous post that you don't care about NT. Unfortunately, I do. And I can't find anything on your pages about portability... I'm assuming you're referring to karma. NSPR is already ported to 20 unixes, plus WIN32, and has ports underway for pretty much everything else of interest. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <199806230904.TAA07206@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 365257749 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980623014005.20096i-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > On Tue, 23 Jun 1998, Richard Gooch wrote: > > > You currently can't poll for when a regular file delivers the block of > > data you asked for. I'm not aware of any UNIX that supports this. > > This is a whole new can of worms than the implementation of completion > > ports/whatever. > > This is just asynch i/o. I'd be surprised if any of the commercial unixes > lack it. Ah, OK, you're referring explicitely to aio_*(), right? > > What exactly do you mean "you're told when it completes"? > > You write/read a buffer, and control returns immediately. Some > unspecified time later, when the write/read completes, your program is > informed either via a completion port (NT), or via a function you passed > to the kernel (VMS). Can these NT completion ports multiple events from multiple FDs? > > How is this different from "completion ports" in NT/VMS? It looks to > > me these "event queues" are much the same as "completion ports", based > > on the (vague) descriptions. > > Nope, completion ports are far heavier... they actually imply that some > I/O has completed. Whereas what I'm advocating only implies that some I/O > wouldn't block if tried. We have that now with non-blocking I/O. I still don't understand the model you are proposing. > > When people have talked about implementing AIO in Linux, they had in > > mind a userspace library which used threads to do the work. Each AIO > > request is given a thread. I think part of the reason for such an > > implementation is that you can't poll a regular file, so you need a > > blocking thread. The other reason is why do it in the kernel if we can > > develop a good userspace solution? > > This is going in circles -- this is exactly the point I've been debating > -- whether this is "good" or not. So you want AIO in the kernel. That is even more bloatware than "completion ports", "event queues" or whatever you're calling them. From what I've seen on this list in the past, a kernel-space AIO implementation is not favoured. If you think that a userpace implementation is going to be too slow, you have to show evidence of that. > > Or you could use my library... > > I believe you said in a previous post that you don't care about NT. I don't care about it in the context of a solution for a UNIX system. If there is an NT solution, but it doesn't exist in UNIX, then it doesn't help me, or others who want to get the best out of their UNIX systems. > Unfortunately, I do. And I can't find anything on your pages about > portability... I'm assuming you're referring to karma. NSPR is already Karma has been ported to: VXMVX alpha_OSF1 c2_ConvexOS crayPVP_UNICOS hp9000_HPUX i386_Linux i386_Solaris mips1_IRIX5 mips1_ULTRIX mips2_IRIX5 mips2_IRIX6 mips4_IRIX6 rs6000_AIX sparc_Solaris sparc_SunOS and code that doesn't care about CPU type (i.e. most of it) will also compile on a "generic" POSIX machine. plus I'll be distributing a small tarball which contains just the stuff needed to compile the FD management package, so it can be included in a separate package/library. > ported to 20 unixes, plus WIN32, and has ports underway for pretty much > everything else of interest. 
Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/23 Message-ID: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 365257750 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806230904.TAA07206@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel Richard Gooch writes: > Dean Gaudet writes: > > On Tue, 23 Jun 1998, Richard Gooch wrote: > > > > > What exactly do you mean "you're told when it completes"? > > > > You write/read a buffer, and control returns immediately. Some > > unspecified time later, when the write/read completes, your program is > > informed either via a completion port (NT), or via a function you passed > > to the kernel (VMS). > > Can these NT completion ports multiple events from multiple FDs? Make that: "Can these NT completion ports multiplex events from multiple FDs?" > > > Or you could use my library... > > > > I believe you said in a previous post that you don't care about NT. > > I don't care about it in the context of a solution for a UNIX > system. If there is an NT solution, but it doesn't exist in UNIX, then > it doesn't help me, or others who want to get the best out of their > UNIX systems. I should also say that I have no problem with making use of some native NT mechanism where appropriate, *for NT*. My library makes the best use of whatever the OS supplies. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/24 Message-ID: <Pine.LNX.3.96dg4.980623164906.29998D-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 366064024 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU> Reply-To: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org> Newsgroups: muc.lists.linux-kernel On Tue, 23 Jun 1998, Richard Gooch wrote: > Richard Gooch writes: > > Dean Gaudet writes: > > > On Tue, 23 Jun 1998, Richard Gooch wrote: > > > > > > > What exactly do you mean "you're told when it completes"? > > > > > > You write/read a buffer, and control returns immediately. Some > > > unspecified time later, when the write/read completes, your program is > > > informed either via a completion port (NT), or via a function you passed > > > to the kernel (VMS). > > > > Can these NT completion ports multiple events from multiple FDs? > > Make that: "Can these NT completion ports multiplex events from > multiple FDs?" Yes. A typical method of using them is to maintain a homogenous pool of worker threads. Each worker thread can pick up a completed I/O, do further processing on the request, and "suspend" the request when it next needs to do I/O, and loop back to pick up some other completed I/O. To get an event on the port you have to start an I/O and the kernel then registers when the I/O has completed. This is different from select/poll event processing. In this case the events that the kernel delivers are of the form "if you read/write this FD right now, it won't block". To get an event to occur you first try to read/write and get EWOULDBLOCK and then you ask the kernel to tell you when it wouldn't block. Your proposal puts an event structure onto each FD, which the low level driver updates to indicate read/write readiness. I'm advocating taking that one step further and plop that readiness event onto a readiness queue. In this way you can completely avoid the select/poll and all the associated overhead -- instead you get a stream of "readiness" events from the kernel. Note that with sockets/pipes there is a read and write buffer, and it's obvious how the above works for them (readiness indicates a non-empty/non-full buffer as appropriate). It's somewhat less critical for non-sockets, but something similar is possible. Readiness for read means that a readahead completed... when the app finally read()s the buffer may or may not be present -- if it isn't present then return EWOULDBLOCK. For write, "readiness for write" means that there is buffer space to take at least one page of data. And if the app takes too long to issue the write(), return EWOULDBLOCK. i.e. just pretend there is a read and write buffer... there is one, it's all the buffer memory. Now, completion ports and readiness queues are totally related. You can implement a completion port in userland given a readiness queue... and you can implement a completion port in userland given select/poll. At issue is the efficiency of each solution. BTW there's another class of problems with regular files which applications like Squid run into (and which Apache will possibly run into as we thread it... although I think I have an architecture to mostly avoid the problems). open(), close(), unlink(), rename(), ... all the metadata operations are synchronous. For example if I write a lean and mean single threaded poll() based web server I'm still stuck with synchronous open()... 
and to work around that I need to spawn multiple threads which do the synchronous work. (This is how Squid works.) Making all of this work without extra threads is a lot of trouble... and is probably not worth it. Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
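The metadata problem in that last paragraph is usually worked around exactly the way Dean says Squid does it: push the synchronous call into a helper thread and let the answer come back over a pipe the main loop already poll()s. A stripped-down sketch follows, with one thread per open() rather than a pool, invented names, and minimal error handling.

#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct open_req {
    char path[1024];
    int reply_fd;                 /* write end of a pipe the main loop polls */
};

static void *open_worker(void *arg)
{
    struct open_req *req = arg;
    int fd = open(req->path, O_RDONLY);   /* may sleep on the disk; fine in here */

    (void) write(req->reply_fd, &fd, sizeof(fd));
    free(req);
    return NULL;
}

/* Called from the single-threaded event loop; never blocks on the disk. */
int async_open(const char *path, int reply_fd)
{
    pthread_t tid;
    struct open_req *req = malloc(sizeof(*req));

    if (req == NULL)
        return -1;
    strncpy(req->path, path, sizeof(req->path) - 1);
    req->path[sizeof(req->path) - 1] = '\0';
    req->reply_fd = reply_fd;
    if (pthread_create(&tid, NULL, open_worker, req) != 0) {
        free(req);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}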
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch) Subject: Re: Thread implementations... Date: 1998/06/24 Message-ID: <199806240030.KAA06585@vindaloo.atnf.CSIRO.AU>#1/1 X-Deja-AN: 366064066 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <Pine.LNX.3.96dg4.980623164906.29998D-100000@twinlark.arctic.org> Newsgroups: muc.lists.linux-kernel Dean Gaudet writes: > > > On Tue, 23 Jun 1998, Richard Gooch wrote: > > > Richard Gooch writes: > > > Can these NT completion ports multiple events from multiple FDs? > > > > Make that: "Can these NT completion ports multiplex events from > > multiple FDs?" > > Yes. > > A typical method of using them is to maintain a homogenous pool of worker > threads. Each worker thread can pick up a completed I/O, do further > processing on the request, and "suspend" the request when it next needs to > do I/O, and loop back to pick up some other completed I/O. To get an > event on the port you have to start an I/O and the kernel then registers > when the I/O has completed. > > This is different from select/poll event processing. In this case the > events that the kernel delivers are of the form "if you read/write this FD > right now, it won't block". To get an event to occur you first try to > read/write and get EWOULDBLOCK and then you ask the kernel to tell you > when it wouldn't block. > > Your proposal puts an event structure onto each FD, which the low level > driver updates to indicate read/write readiness. I'm advocating taking > that one step further and plop that readiness event onto a readiness > queue. In this way you can completely avoid the select/poll and all the > associated overhead -- instead you get a stream of "readiness" events from > the kernel. Sorry, I still don't see the difference between your completion ports and event queues. In both cases, as far as I can tell, when I/O completes a "message" is sent to some place. The application can then pick off these events. Part of the message includes the FD which had the completed I/O. > Note that with sockets/pipes there is a read and write buffer, and it's > obvious how the above works for them (readiness indicates a > non-empty/non-full buffer as appropriate). > > It's somewhat less critical for non-sockets, but something similar is > possible. Readiness for read means that a readahead completed... when the > app finally read()s the buffer may or may not be present -- if it isn't > present then return EWOULDBLOCK. For write, "readiness for write" means > that there is buffer space to take at least one page of data. And if the > app takes too long to issue the write(), return EWOULDBLOCK. i.e. just > pretend there is a read and write buffer... there is one, it's all the > buffer memory. The last time I tried non-blocking I/O on a regular file, it still blocked :-( This was with Linux 2.1.x. Regards, Richard.... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: ch...@cybernet.co.nz (Chris Wedgwood) Subject: Re: Thread implementations... Date: 1998/06/24 Message-ID: <19980624125030.A8986@caffeine.ix.net.nz>#1/1 X-Deja-AN: 366064008 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU> Newsgroups: muc.lists.linux-kernel On Wed, Jun 24, 1998 at 10:30:00AM +1000, Richard Gooch wrote: > > The last time I tried non-blocking I/O on a regular file, it still > blocked :-( This was with Linux 2.1.x. I just looked at the fs code briefly and don't see anything to handle O_NONBLOCK for regular files. In fact... I'm not even sure how easy this would be to add to the kernel as you would really need a kernel thread for each outstanding request (this is starting to go in circles). I was looking into this for sendfile(2), which can have similar constraints and requirements. -Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet) Subject: Re: Thread implementations... Date: 1998/06/24 Message-ID: <Pine.LNX.3.96dg4.980623181943.29998O-100000@twinlark.arctic.org>#1/1 X-Deja-AN: 366064027 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <19980624125030.A8986@caffeine.ix.net.nz> Newsgroups: muc.lists.linux-kernel On Wed, 24 Jun 1998, Chris Wedgwood wrote: > I was looking into this for sendfile(2), which can have similar constraints > and requirements. It occurred to me last night that sendfile() may not be the best thing... my latest scheme for speeding up apache involves what I'm calling "HTTP flows", and the short story is that the web server has a front-end and a back-end. The front-end is extremely light, dumb, and single threaded; the back-end is full featured, and looks almost the same as current apache. The front-end handles only well-formed HTTP requests and only requests that fit patterns that the back-end has fed it. In its simplest form it's a mapping from URL to mmap-region/FD (but it can handle far more than just these static-only servers). If sendfile() is blocking I can't use it for this. I've got a prototype of this method already, and it outperforms threaded apache by about 50%. It all makes sense when you sit back and realise the cache benefits from a single thread, not to mention the coding short-cuts I can take because I can punt any request that isn't well-formed to the slower, fully functional backend. The backend is fully threaded (one thread per request) because it's far easier to let folks extend the server in a threaded programming model... the backend wouldn't have any problem with a blocking sendfile(). But the front-end is where sendfile() would be of the most use... right now it's a typical poll()/write() implementation. Food for thought... glad to see someone is thinking about sendfile() :) Dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
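In its simplest static-file form, the table the front-end consults might be little more than the following. This is a guess at the shape of the idea, not Dean's actual prototype; the names are invented.

#include <string.h>
#include <sys/types.h>

/* One entry the back-end feeds to the front-end: a well-formed URL
 * mapped to a ready-to-send response (headers plus mmap()ed body). */
struct flow_entry {
    const char *url;        /* e.g. "/index.html" */
    const char *headers;    /* precomputed response header block */
    size_t hdr_len;
    void *body;             /* mmap()ed file contents */
    size_t body_len;
};

/* Linear lookup is enough for a sketch; a real front-end would hash. */
const struct flow_entry *flow_lookup(const struct flow_entry *tab,
                                     int ntab, const char *url)
{
    int i;

    for (i = 0; i < ntab; i++)
        if (strcmp(tab[i].url, url) == 0)
            return &tab[i];
    return NULL;
}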
From: ch...@cybernet.co.nz (Chris Wedgwood) Subject: Re: Thread implementations... Date: 1998/06/24 Message-ID: <19980624150059.A9649@caffeine.ix.net.nz>#1/1 X-Deja-AN: 366064034 Approved: g...@greenie.muc.de Sender: muc.de!l-linux-kernel-owner References: <19980624125030.A8986@caffeine.ix.net.nz> Newsgroups: muc.lists.linux-kernel On Tue, Jun 23, 1998 at 06:37:58PM -0700, Dean Gaudet wrote: > It occurred to me last night that sendfile() may not be the best thing... It's probably not. I'm not even sure if sendfile belongs in the kernel (well, not initially, long term it probably does), but it probably does need implementing at some point, as most other OSes have or will have some variant of it. > my latest scheme for speeding up apache involves what I'm calling "HTTP > flows", and the short story is that the web server has a front-end and a > back-end. The front-end is extremely light, dumb, and single threaded; > the back-end is full featured, and looks almost the same as current > apache. I've looked at the code and stuff. Looks pretty nice, but my head still needs twisting before I can get my mind completely around it. How does this scale for n processors, n frontends? > The front-end handles only well-formed HTTP requests and only requests > that fit patterns that the back-end has fed it. In its simplest form it's > a mapping from URL to mmap-region/FD (but it can handle far more than just > these static-only servers). If sendfile() is blocking I can't use it for > this. sendfile needn't be blocking, but the question is, under which conditions should sendfile block? For something like (a la HP-UX): ssize_t sendfile(int s, int fd, off_t offset, size_t nbytes, const struct iovec *hdtrl, int flags); where s is the NETWORK socket, fd is the FILESYSTEM file descriptor. Now, if both s and fd are set non-blocking, then logically, sendfile shouldn't block; if s and fd are set to block, then logically it should block. But what if s is blocking and fd isn't, or vice versa? I would say here we are entitled (and perhaps should be required) to block, but it's not terribly clear what is logical in this instance. Oh, logically being defined as what I think makes sense. YMMV. > The backend is fully threaded (one thread per request) because it's far > easier to let folks extend the server in a threaded programming model... One thread/request? I assume this means when I send "GET /index.html HTTP/0.9" it wakes up one thread (from a preallocated pool), does the work, then sleeps (returning the thread)? > the backend wouldn't have any problem with a blocking sendfile(). But the > front-end is where sendfile() would be of the most use... right now it's a > typical poll()/write() implementation. > > Food for thought... glad to see someone is thinking about sendfile() :) As mentioned above, if async I/O can be done (at least in part) in userspace, then I think sendfile should probably be implemented at the libc level to start with. Sure, this partially defeats the purpose, but then again some buffer-cache + VM tweaks may make it quite viable. -Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
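A libc-level starting point of the kind Chris suggests could be as dumb as mmap()-and-write(). The sketch below drops the header/trailer iovec from the quoted prototype and assumes offset is already page-aligned, so it illustrates the idea rather than being a usable implementation.

#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace sendfile sketch: map the file, push the mapping down the
 * socket.  offset must be page-aligned here; a real version would
 * round it down and adjust. */
ssize_t my_sendfile(int s, int fd, off_t offset, size_t nbytes)
{
    size_t done = 0;
    char *map = mmap(NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);

    if (map == MAP_FAILED)
        return -1;

    while (done < nbytes) {
        ssize_t n = write(s, map + done, nbytes - done);

        if (n > 0) {
            done += n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            break;      /* EWOULDBLOCK on a non-blocking socket, or a real error */
        }
    }
    munmap(map, nbytes);
    return done > 0 ? (ssize_t) done : -1;
}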
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 366064016
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980624150059.A9649@caffeine.ix.net.nz>
Newsgroups: muc.lists.linux-kernel

Chris Wedgwood writes:
> On Tue, Jun 23, 1998 at 06:37:58PM -0700, Dean Gaudet wrote:
>
> > It occurred to me last night that sendfile() may not be the best thing...
>
> It's probably not.  I'm not even sure if sendfile belongs in the kernel
> (well, not initially; long term it probably does), but it probably does
> need implementing at some point, as most other OSes have or will have
> some variant of it.
>
> > my latest scheme for speeding up apache involves what I'm calling "HTTP
> > flows", and the short story is that the web server has a front-end and a
> > back-end.  The front-end is extremely light, dumb, and single threaded;
> > the back-end is full featured, and looks almost the same as current
> > apache.
>
> I've looked at the code and stuff.  Looks pretty nice, but my head still
> needs twisting before I can get my mind completely around it.
>
> How does this scale for n processors, n front-ends?
>
> > The front-end handles only well-formed HTTP requests and only requests
> > that fit patterns that the back-end has fed it.  In its simplest form it's
> > a mapping from URL to mmap-region/FD (but it can handle far more than just
> > these static-only servers).  If sendfile() is blocking I can't use it for
> > this.
>
> sendfile needn't be blocking, but the question is: under which conditions
> should sendfile block?
>
> For something like (a la HP-UX):
>
>   ssize_t sendfile(int s, int fd, off_t offset, size_t nbytes,
>                    const struct iovec *hdtrl, int flags);
>
> where s is the NETWORK socket and fd is the FILESYSTEM file descriptor.
>
> Now, if both s and fd are set non-blocking, then logically sendfile
> shouldn't block; if s and fd are set to block, then logically it should
> block.
>
> But what if s is blocking and fd isn't, or vice versa?  I would say here
> we are entitled (and perhaps should be required) to block, but it's not
> terribly clear what is logical in this instance.
>
> Oh, "logically" being defined as what I think makes sense.  YMMV.
>
> > The backend is fully threaded (one thread per request) because it's far
> > easier to let folks extend the server in a threaded programming model...
>
> One thread/request?
>
> I assume this means when I send "GET /index.html HTTP/0.9" it wakes up one
> thread (from a preallocated pool), does the work, then sleeps (returning
> the thread)?
>
> > the backend wouldn't have any problem with a blocking sendfile().  But the
> > front-end is where sendfile() would be of the most use... right now it's a
> > typical poll()/write() implementation.
> >
> > Food for thought... glad to see someone is thinking about sendfile() :)
>
> As mentioned above, if async I/O can be done (at least in part) in
> userspace, then I think sendfile should probably be implemented at the
> libc level to start with.

Why bother with sendfile() if you have aio_*() available?  sendfile() is a
trivial wrapper to aio_*().

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
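To make Richard's "trivial wrapper" remark concrete, here is one way a
user-space sendfile() could be layered on POSIX aio_read().  This is a
sketch only: the function name is invented, the double-buffering that would
make it worthwhile is omitted, and, as Dean points out in the next message,
the data still crosses user space:

/* Sketch: user-space sendfile() built on POSIX AIO.  Synchronous for
 * clarity; a real wrapper would overlap the next aio_read() with the
 * current write(). */
#include <aio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t sendfile_aio(int sock, int fd, off_t off, size_t len)
{
    char buf[32768];
    size_t total = 0;

    while (total < len) {
        size_t chunk = len - total;
        if (chunk > sizeof buf)
            chunk = sizeof buf;

        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = chunk;
        cb.aio_offset = off + (off_t) total;

        if (aio_read(&cb) < 0)
            return -1;
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);          /* wait for the read to finish */
        ssize_t got = aio_return(&cb);
        if (got <= 0)
            return -1;

        /* assumes the socket takes the whole chunk; a real wrapper loops */
        if (write(sock, buf, (size_t) got) != got)
            return -1;
        total += (size_t) got;
    }
    return (ssize_t) total;
}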
From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <Pine.LNX.3.96dg4.980624025515.26983E-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 366064026
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Wed, 24 Jun 1998, Richard Gooch wrote:

> Why bother with sendfile() if you have aio_*() available?  sendfile()
> is a trivial wrapper to aio_*().

aio_* are user space.  So they use either read() or mmap() to get the data
to be sent... which are the methods already available to apps, so there's
no need to use aio.

read() is painful because it involves an extra copy of the data --
although that could be optimized by putting page flipping into the kernel,
and writing the app to ensure it uses page aligned buffers.  read() cannot
exercise the hardware to its fullest.

mmap() is painful when your working set exceeds the RAM available because
it doesn't readahead more than a page.  read() does 4 page readahead (I
think these are the numbers), and outperforms mmap() in this situation.
DavidM gave me a patch to improve things... but they're still not quite at
the level that read() is at... and read() isn't at the level the hardware
can handle.

sendfile() could be used to give a huge hint to the kernel about the
nature of the data to be sent... so the kernel can make better judgements
about when to readahead, and what to throw away in low memory situations.
It isn't terribly necessary if the mmap() readahead problem is solved, but
DavidM made it sound like that was an icky problem to solve.

The main reason you want mmap() (or sendfile()) over read() is to be able
to perform single-copy and zero-copy TCP.  read() with page-flipping is
another way to do it, but I really don't know the difficulty.

Dean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
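The two user-space data paths Dean is comparing look roughly like this (a
sketch; buffer size and names are arbitrary).  The read() path copies every
byte through a user buffer; the mmap() path avoids that copy but, in the VM
behaviour Dean describes, faults pages in with almost no readahead:

#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: read()+write() -- every byte is copied into buf and out again. */
ssize_t copy_with_read(int sock, int fd, size_t len)
{
    char buf[32768];
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            return -1;
        for (ssize_t off = 0; off < n; ) {   /* sockets may take partial writes */
            ssize_t w = write(sock, buf + off, (size_t) (n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}

/* Path 2: mmap()+write() -- no user-space copy of the file data. */
ssize_t copy_with_mmap(int sock, int fd, size_t len)
{
    char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(sock, map + done, len - done);
        if (n <= 0) {
            munmap(map, len);
            return -1;
        }
        done += (size_t) n;
    }
    munmap(map, len);
    return (ssize_t) done;
}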
From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <199806241213.WAA10661@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 366116036
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980624025515.26983E-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> On Wed, 24 Jun 1998, Richard Gooch wrote:
>
> > Why bother with sendfile() if you have aio_*() available?  sendfile()
> > is a trivial wrapper to aio_*().
>
> aio_* are user space.  So they use either read() or mmap() to get the data
> to be sent... which are the methods already available to apps, so there's
> no need to use aio.

OK, you're looking from the point of view of squeezing out more
performance.  Whether aio_*() is implemented in user-space or kernel-space
probably makes very little difference.

> read() is painful because it involves an extra copy of the data --
> although that could be optimized by putting page flipping into the kernel,
> and writing the app to ensure it uses page aligned buffers.  read() cannot
> exercise the hardware to its fullest.
>
> mmap() is painful when your working set exceeds the RAM available because
> it doesn't readahead more than a page.  read() does 4 page readahead (I
> think these are the numbers), and outperforms mmap() in this situation.
> DavidM gave me a patch to improve things... but they're still not quite at
> the level that read() is at... and read() isn't at the level the hardware
> can handle.

That could be fixed with some decent flags for madvise(2).  We could do
with that anyway for other applications.

> sendfile() could be used to give a huge hint to the kernel about the
> nature of the data to be sent... so the kernel can make better judgements
> about when to readahead, and what to throw away in low memory situations.
> It isn't terribly necessary if the mmap() readahead problem is solved, but
> DavidM made it sound like that was an icky problem to solve.

I think the madvise(2) problem needs to be solved in any case.

> The main reason you want mmap() (or sendfile()) over read() is to be able
> to perform single-copy and zero-copy TCP.  read() with page-flipping is
> another way to do it, but I really don't know the difficulty.

If we get madvise(2) right, we don't need sendfile(2), correct?

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
From: ch...@cybernet.co.nz (Chris Wedgwood)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <19980625161310.B22513@caffeine.ix.net.nz>#1/1
X-Deja-AN: 366135313
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Wed, Jun 24, 1998 at 10:13:57PM +1000, Richard Gooch wrote:

> If we get madvise(2) right, we don't need sendfile(2), correct?

It would probably suffice.  In fact, having a working implementation of
madvise, etc. would make sendfile pretty trivial to do in libc.  (Again,
I'm assuming that whether or not we need it, if it can be implemented in
userspace then why not...)

-Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
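Putting Richard's and Chris's points together, a libc-level sendfile()
would be little more than mmap(), the right madvise() advice, and a write()
loop.  A sketch, assuming BSD-style MADV_SEQUENTIAL/MADV_WILLNEED advice
values and an invented function name:

/* Sketch: "sendfile in libc" -- map the file, tell the VM it will be read
 * once sequentially, and push it out the socket. */
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t sendfile_user(int sock, int fd, size_t len)
{
    char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* one sequential pass: ask for aggressive readahead instead of
     * faulting a page at a time */
    madvise(map, len, MADV_SEQUENTIAL);
    madvise(map, len, MADV_WILLNEED);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(sock, map + done, len - done);
        if (n <= 0) {
            munmap(map, len);
            return -1;
        }
        done += (size_t) n;
    }
    munmap(map, len);
    return (ssize_t) done;
}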
From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <6msk1d$n4$1@palladium.transmeta.com>#1/1
X-Deja-AN: 366135316
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: muc.lists.linux-kernel

In article <19980625161310.B22...@caffeine.ix.net.nz>,
Chris Wedgwood <ch...@cybernet.co.nz> wrote:
>On Wed, Jun 24, 1998 at 10:13:57PM +1000, Richard Gooch wrote:
>
>> If we get madvise(2) right, we don't need sendfile(2), correct?
>
>It would probably suffice.  In fact, having a working implementation of
>madvise, etc. would make sendfile pretty trivial to do in libc.  (Again,
>I'm assuming that whether or not we need it, if it can be implemented in
>userspace then why not...)

However, the thing to notice is that a "sendfile()" system call can
potentially be a lot faster than anything else.  In particular, it can be
as clever as it wants about sending stuff directly from kernel buffers
etc.

I know there are a lot of people who think zero-copying is cool, and that
tricks with mmap() etc can be used to create zero-copy.  But don't forget
that it's a major mistake to think that performance is about whether the
algorithm is O(1) or O(n) or O(n^2).  People tend to forget the constant
factor, and look blindly at other things.

In particular, doing a mmap() itself is fairly expensive.  It implies a
lot of bookkeeping, and it also implies a fair amount of mucking around
with CPU VM issues (TLBs, page tables etc).  In short, it can be rather
expensive.

Due to that expense, things that use mmap() often have a "cache" of
mappings that they have active.  That gets rid of one expense, but then
there is the new expense of maintaining that cache (and it can be a fairly
costly thing to maintain if you want to do a threaded webserver).

In contrast, a "sendfile()" approach can be extremely light-weight, and it
threads much better because it doesn't imply the same kinds of
maintenance.

Now, I'm no NT person, but I suspect that we actually do want to have a
"sendfile()" kind of thing just because it should be fairly easy to
implement, and would offer some interesting performance advantages for
some cases.  No, it's not truly generic, but it is useful enough in many
circumstances.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
From: e...@arbat.com (Erik Corry)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <19980625090558.A1141@arbat.com>#1/1
X-Deja-AN: 366135218
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

In article <6msk1d$n...@palladium.transmeta.com> you wrote:

> Now, I'm no NT person, but I suspect that we actually do want to have a
> "sendfile()" kind of thing just because it should be fairly easy to
> implement, and would offer some interesting performance advantages for
> some cases.  No, it's not truly generic, but it is useful enough in many
> circumstances.

I'm a little curious as to which circumstances you are thinking of.
As far as I can see, it's a syscall for a single application (a web
server serving static objects) which is basically little more than a
benchmark.  If you really have such a hugely loaded web server you are
likely to be doing lots of database lookups, cookie-controlled variable
content, shtml, other cgi trickery, etc.  And if you really just want to
serve static objects as fast as possible, a round-robin DNS with multiple
servers gets you more robustness and a solution that scales above
Ethernet speeds.

Would we just be doing this to look good against NT in webstones?

--
Erik Corry

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <Pine.LNX.3.95.980625094857.27350A-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 366135400
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980625090558.A1141@arbat.com>
Newsgroups: muc.lists.linux-kernel

On Thu, 25 Jun 1998, Erik Corry wrote:
>
> [ "sendfile()" ]
>
> I'm a little curious as to which circumstances you are thinking of.
> As far as I can see, it's a syscall for a single application (a
> web server serving static objects) which is basically little more
> than a benchmark.

It's actually perfectly usable for other things too, like ftp servers
etc.  The way I would probably implement it, it would actually work for
"cp" as well - you could "sendfile()" to another file, not just to a
socket.

> If you really have such a hugely loaded web server
> you are likely to be doing lots of database lookups, cookie-controlled
> variable content, shtml, other cgi trickery, etc.

My personal observation has been that most webservers do mostly static
stuff, with a small percentage of dynamic behaviour.  For example, even
if they have lots of CGI etc, a big part of the page (bandwidth-wise)
often tends to be pictures etc.

> And if you really
> just want to serve static objects as fast as possible, a round-robin
> DNS with multiple servers gets you more robustness and a solution that
> scales above Ethernet speeds.

That works if you have a _completely_ static setup.  Which is one common
thing to have, but at the same time it is certainly not what most people
want to have.

> Would we just be doing this to look good against NT in webstones?

We want to do that too.  I don't think it's only that, though.  The
apache people get some impressive numbers out of Linux, but when I talk
to Dean Gaudet I also very often get the feeling that in order to get
better numbers they have to do really bad things, and those things are
going to slow them down in many circumstances.

One thing is actually the latency of setting up a small transfer.  This
sounds unimportant, but it's actually fairly important in order to do
well under load: the lower your latency, the less likely you are to get
into the bad situation where you have lots of outstanding requests, and
while you serve those you keep getting new requests at the same rate, so
past a certain load you never make any progress.

That's one reason I don't like mmap() - it has horrible latency.  mmap
under linux is fast, but it's really slow compared to what _could_ be
done.

Similarly, "read()+write()" implies using user-space buffers, which
implies a certain amount of memory management and certainly bad
utilization of memory that could be better used for caching something.

And web serving is one of the things a lot of people want.  And if they
make their judgements by benchmarks, we'd better be good at them.  Never
discount benchmark numbers just because you don't like the benchmark: I
much prefer to go by real numbers than by "feeling".

I know some people who, every time they see somebody beating Linux at a
benchmark, claim that "the benchmark is meaningless, under real load the
issues are different".  That's a cop-out.  If NT is better than Linux at
something, we'd better look out or have a _really_ good explanation..
And I think webstone is "real enough" that we can't really explain it
away.

(I'm not saying NT is faster - I don't actually know the numbers.  But I
don't want to be in the situation that it could be faster).
Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
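To illustrate Linus's point that a sendfile() which accepts any output
descriptor would also subsume "cp": with such a call (again using the
HP-UX-style prototype from earlier in the thread as a stand-in, declared
here only as an assumption), a file copy needs no user-space buffer at all:

/* Sketch: "cp" built on a hypothetical sendfile() that accepts a regular
 * file as the output descriptor, per Linus's remark above. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

extern ssize_t sendfile(int out, int in, off_t offset, size_t nbytes,
                        const struct iovec *hdtrl, int flags); /* assumed */

int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    struct stat st;
    if (in < 0 || fstat(in, &st) < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0)
        return -1;

    off_t off = 0;
    while (off < st.st_size) {
        ssize_t n = sendfile(out, in, off, (size_t) (st.st_size - off),
                             NULL, 0);
        if (n <= 0)
            return -1;                /* no data moved: give up */
        off += n;
    }
    close(in);
    close(out);
    return 0;
}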