From: Aaron Lehmann <aar...@vitelus.com>
Subject: The stability crisis
Date: 1999/06/29
Message-ID: <fa.jvgopmv.b04ihs@ifi.uio.no>#1/1
X-Deja-AN: 495369892
Original-Date: Tue, 29 Jun 1999 22:08:30 +0000 ( )
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.05.9906292204410.603-100000@vitelus.com>
To: torva...@transmeta.com, a...@lxorguk.ukuu.org.uk, linux-ker...@vger.rutgers.edu
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Linux 2.0.36 was a very stable kernel. I have never experienced a crash with it. However, this does not at all hold true for the 2.2.x series.

During the initial stage of the 2.2 series, it was pretty darn stable. I got about 60 days of uptime out of 2.2.1 until a power failure or a need to mess with hardware or something. (Actually, now I think it was a hard lockup). Back then we knew that 2.2 was not at all as stable as 2.0.36, but we knew it would mature.

WRONG! Linus waited a few months to open the 2.3 branch. A lot of untested patches were making it into the 2.2 series! People like me breathed a sigh of relief when Linus opened up the 2.3 branch. Now we knew that all of the patches would go into 2.3 and 2.2 would become mature and stable like 2.0.36.

But that was only half right. Linus decided to hasten the release of 2.4 to "in the fall", and all of the developers jumped onto the 2.3 kernel, leaving us with a stable kernel which is totally inadequate. 2.2.10 is by far less stable than any operating system I have used excluding MacOS.

During the past _week_ I have had three oopsen using kernel 2.2.9 and 2.2.10. I have never had an oops before this week with the exception of Linux on platforms where the ports are excusably immature and on unstable hardware. Once I found a small bug with a friend in 2.0.x that caused an oops but it wasn't anything major. It was fixed immediately.

All the attention has shifted to 2.3. Most people as well as benchmarkers are using 2.2.10. Helloo??? This is a perfect time for Microsoft to spread FUD since the "stable" branch of Linux is far less stable than even Windows NT. THIS IS NOT GOOD FOR LINUX OR THE PEOPLE WHO USE IT!

Something needs to be done about this fast. I recommend that 2.2.10 be made rock solid. Most features and new device drivers can wait until fall with 2.4. Of course, 2.4 should be made and kept very stable as a 2.5 or 2.9 is opened up immediately.

I hate to bitch about stuff like this but if I were to try to write kernel code I would probably just add more fatal bugs :). Maybe Alan Cox should volunteer to maintain 2.2 :). He did a great job with 2.0. And all kernel hackers out there, PLEASE help make 2.2 more stable.

Speed is a problem that has been dealt with a lot lately, due to the numerous benchmarks. I believe that this is also a priority, but secondary to stability, at least at this level of instability.

Thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: The stability crisis
Date: 1999/06/29
Message-ID: <fa.lgrpp1v.b1auij@ifi.uio.no>#1/1
X-Deja-AN: 495379219
Original-Date: Tue, 29 Jun 1999 15:46:52 -0700 (PDT)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.10.9906291530430.821-100000@penguin.transmeta.com>
References: <fa.jvgopmv.b04ihs@ifi.uio.no>
To: Aaron Lehmann <aar...@vitelus.com>
X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

So why didn't you even include a ksymoops version of the crash? Or a good hardware description? People do try to follow it, but it's not as if I've seen very good reports even from people who say it's obviously bad. And others are completely unable to reproduce the problem, so..

Right now the problem is (a) lack of good data and (b) the fact that there were very few changes between 2.2.7 (which many claim is stable) and 2.2.9 (which many claim is broken). The major changes were actually just reverts of 2.2.8 (which _was_ badly broken due to fs) - the majority by far is actually ARM, Sparc, PPC and alpha merges..

SMP?

MTRR enabled?

gcc version?

Quotas?

Linus

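The questions that close the message above (SMP, MTRR, gcc version) amount to a checklist for a usable report. A minimal sketch of gathering those answers automatically could look like the following. It is an illustration only: the function name `collect_report_context` and the dictionary keys are hypothetical, it assumes a Linux machine with Python 3.7 or later, and it does not cover quotas or the ksymoops-decoded oops itself.

```python
import os
import platform
import shutil
import subprocess


def collect_report_context():
    """Gather basic report context: kernel release, CPU count, MTRR, compiler."""
    info = {"kernel": platform.release(), "machine": platform.machine()}

    # More than one logical CPU suggests an SMP kernel is in use;
    # os.cpu_count() may return None when it cannot tell.
    info["cpus"] = os.cpu_count()

    # Assumption: on x86 Linux, /proc/mtrr is present only when the
    # running kernel was built with MTRR support.
    info["mtrr"] = os.path.exists("/proc/mtrr")

    # Report the first line of `gcc --version`, if gcc is installed at all.
    gcc = shutil.which("gcc")
    if gcc is None:
        info["gcc"] = None
    else:
        out = subprocess.run([gcc, "--version"], capture_output=True, text=True)
        info["gcc"] = out.stdout.splitlines()[0] if out.stdout else None
    return info


if __name__ == "__main__":
    for key, value in collect_report_context().items():
        print(f"{key}: {value}")
```

Pasting this output next to a hardware description would answer most of the questions up front; the oops text still has to be captured and decoded separately.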
From: Aaron Lehmann <aar...@vitelus.com>
Subject: Re: The stability crisis
Date: 1999/06/29
Message-ID: <fa.jvguqev.f06j9o@ifi.uio.no>#1/1
X-Deja-AN: 495392018
Original-Date: Tue, 29 Jun 1999 23:04:10 +0000 ( )
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.05.9906292250220.821-100000@vitelus.com>
References: <fa.lgrpp1v.b1auij@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

I really wish I could report my oopses, but this is a production box and I can't just let it sit there while I write down an oops. The syslog doesn't catch the OOPS except for sometimes the first few lines. Uptime is very important to me, as you have probably noticed from my rant. I don't want to use a serial console because I don't have another machine in the vicinity of 20 feet that would be capable of easily logging kernel messages. I've heard about a new patch that lets the kernel dump oopsen to a floppy, and I'll try it. It scares me that I might accidentally leave a floppy in the drive that actually has data.

As I said in a previous message to linux-kernel, I'd be happy to maintain a bug database if that would be within my realm of comprehension (I don't know very much about the kernel internals...).

The machine is a Cyrix 6x86MX (no SMP) running RedHat 5.1 with most of the packages at either 5.2 or 6.0 versions. MTRR is enabled in the kernel but I haven't used it for anything yet so I would assume that it is not causing problems. I don't run X. No quotas.

[aaronl@vitelus aaronl]$ gcc --version
egcs-2.91.66

On Tue, 29 Jun 1999, Linus Torvalds wrote:

>
>
> So why didn't you even include a ksymoops version of the crash? Or a good
> hardware description? People do try to follow it, but it's not as if I've
> seen very good reports even from people who say it's obviously bad. And
> others are completely unable to reproduce the problem, so..
>
> Right now the problem is (a) lack of good data and (b) the fact that there
> were very few changes between 2.2.7 (which many claim is stable) and 2.2.9
> (which many claim is broken). The major changes were actually just reverts
> of 2.2.8 (which _was_ badly broken due to fs) - the majority by far is
> actually ARM, Sparc, PPC and alpha merges..
>
> SMP?
>
> MTRR enabled?
>
> gcc version?
>
> Quotas?
>
> Linus
>

From: Mark Hull-Richter <ma...@procom.com>
Subject: Re: The stability crisis
Date: 1999/06/30
Message-ID: <fa.c9l1jlv.n7i0p4@ifi.uio.no>#1/1
X-Deja-AN: 495627048
Original-Date: Wed, 30 Jun 1999 07:56:31 -0700
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <377A301F.BA573561@procom.com>
References: <fa.lgrpp1v.b1auij@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <Pine.LNX.4.10.9906291530430.821-100...@penguin.transmeta.com>
X-Accept-Language: en
Content-Type: multipart/mixed; boundary="------------90942A2696FE5ABBEFF44B52"
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Procom Technology, Inc.
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

I think the problem may have something to do with the size of the oops information and how much is left on the screen. I am doing Alpha development here, and virtually every oops I get a) double oopses and b) leaves no traces in dmesg or the log. In this situation, the first oops is long gone before I can even see it, and the second one is more than half gone by the time it's done, which means what I see on the screen is close to useless. At present this is not a sufficiently critical issue for us that we need to dive in and debug them, and when it becomes one I suspect we'll wire up the serial line for a more persistent tracking device (like a serial printer, a la Ingo's suggestion elsewhere on this list).

Just my $.02, and only for a part of the issues Linus notes.

Linus Torvalds wrote:
>
> So why didn't you even include a ksymoops version of the crash? Or a good
> hardware description? People do try to follow it, but it's not as if I've
> seen very good reports even from people who say it's obviously bad. And
> others are completely unable to reproduce the problem, so..
>
> Right now the problem is (a) lack of good data and (b) the fact that there
> were very few changes between 2.2.7 (which many claim is stable) and 2.2.9
> (which many claim is broken). The major changes were actually just reverts
> of 2.2.8 (which _was_ badly broken due to fs) - the majority by far is
> actually ARM, Sparc, PPC and alpha merges..
>
> SMP?
>
> MTRR enabled?
>
> gcc version?
>
> Quotas?
>
> Linus

From: Matthew Vanecek <mev0...@unt.edu>
Subject: Re: The stability crisis
Date: 1999/07/01
Message-ID: <fa.dnc08jv.12umbu@ifi.uio.no>#1/1
X-Deja-AN: 496078407
Original-Date: Thu, 01 Jul 1999 11:04:42 -0500
Sender: owner-linux-ker...@vger.rutgers.edu
Content-Transfer-Encoding: 7bit
Original-Message-ID: <377B919A.897A81A4@unt.edu>
References: <fa.c9l1jlv.n7i0p4@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <Pine.LNX.4.10.9906291530430.821-100...@penguin.transmeta.com> <377A301F.BA573...@procom.com>
X-Accept-Language: en
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: University of North Texas
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Mark Hull-Richter wrote:
>
> I think the problem may have something to do with the size of the oops
> information and how much is left on the screen. I am doing Alpha
> development here, and virtually every oops I get a) double oopses and b)
> leaves no traces in dmesg or the log. In this situation, the first oops
> is long gone before I can even see it, and the second one is more than
> half gone by the time it's done, which means what I see on the screen is
> close to useless. At present this is not a sufficiently critical issue
> for us that we need to dive in and debug them, and when it becomes one I
> suspect we'll wire up the serial line for a more persistent tracking
> device (like a serial printer, a la Ingo's suggestion elsewhere on this
> list).
>
> Just my $.02, and only for a part of the issues Linus notes.
>
> Linus Torvalds wrote:
> >
> > So why didn't you even include a ksymoops version of the crash? Or a good
> > hardware description? People do try to follow it, but it's not as if I've
> > seen very good reports even from people who say it's obviously bad. And

That's pretty much been my experience. 2.2.10 crashes on a regular basis. Certainly more than previous kernels. Why? Who's to say? It leaves behind no information. Nothing in the logs, nothing in dmesg (which changes with each boot up, anyhow). There's no way to try the magic SysRq key, as the keyboard is completely locked up. I can't telnet/ssh/ftp/ping the box, as it evidently stops processing all network requests. In short, there is absolutely no indication, not the slightest oops or byte left over, to even begin to give the inkling of a clue about why the system crashed. So how do you debug that? I don't even know how to cause the crash; usually I get up in the morning, or come home from work, to find the machine all locked up.

This particular machine is an AMD K6-2/350 *not* OC'ed, 64M Ram, Asus P5A mobo, Buslogic BT-932 with a 4.5 Seagate, 24x Panasonic, and a JVC 2010 clone CDR, with a NetGear FA310TX (new one) nic, and an SiS 6326 video card. I use Redhat 6.0, fixed. Currently, I have X (SVGA), Window Maker 0.60.0 (I upgraded, in case it was WM causing lockups), Samba 2.0.4b, knfsd 1.3.2, and my gnome is custom compiled.

OTOH, my masq machine doesn't crash. It's a little 486/120 w/32M Ram, mobo unknown, running a D-Link 220 NIC, an SIIG (Promise chipset) EIDE controller card, an Orchid Fahrenheit video card (rarely used), and a Zoom modem. With the exception of X programs, it's got the same software as my workstation, RH 6.0, samba, knfsd, etc. Just no X.

I suffer, and hope that 2.2.11 will solve my problems with my WS.

--
Matthew Vanecek
Course of Study: http://www.unt.edu/bcis
Visit my Website at http://people.unt.edu/~mev0003
For answers type: perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
*****************************************************************
For 93 million miles, there is nothing between the sun and my shadow except me. I'm always getting in the way of something...

From: br...@worldcontrol.com
Subject: Re: The stability crisis
Date: 1999/07/02
Message-ID: <fa.i6n72nv.6k65h6@ifi.uio.no>#1/1
X-Deja-AN: 496368903
Original-Date: Fri, 2 Jul 1999 02:54:25 -0700
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <19990702025425.B1097@top.worldcontrol.com>
References: <fa.lgrpp1v.b1auij@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <Pine.LNX.4.05.9906292204410.603-100...@vitelus.com> <Pine.LNX.4.10.9906291530430.821-100...@penguin.transmeta.com>
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Mime-Version: 1.0
User-Agent: Mutt/0.96.2i
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Tue, Jun 29, 1999 at 03:46:52PM -0700, Linus Torvalds wrote:
> So why didn't you even include a ksymoops version of the crash? Or a good
> hardware description? People do try to follow it, but it's not as if I've
> seen very good reports even from people who say it's obviously bad. And
> others are completely unable to reproduce the problem, so..
>
> Right now the problem is (a) lack of good data and (b) the fact that there
> were very few changes between 2.2.7 (which many claim is stable) and 2.2.9
> (which many claim is broken). The major changes were actually just reverts
> of 2.2.8 (which _was_ badly broken due to fs) - the majority by far is
> actually ARM, Sparc, PPC and alpha merges..
>
> SMP?
>
> MTRR enabled?
>
> gcc version?
>
> Quotas?
>
> Linus

I'm not the person Linus was addressing, but I've had plenty of oopses with 2.2.1 - 2.2.10 and have not sent any in.

So far as I know there are only two ways to capture the data related to an oops. Write it down with a pencil, or capture it via a serial port on another machine.

The first seems too prone to errors, and the second just isn't realistic for me and my cluster of machines. Too many serial cables going every which way. Or maybe I'm just lazy.

I have a setup which oopses in 5 minutes to a few days when compiled with SMP support. The identical source compiled without SMP runs forever as far as I can tell.

Since all things have previously been discussed on this list, I'm going to let my linux "newbieism" show by asking for a feature which has undoubtedly been asked for before and has undoubtedly been shot down for very legitimate reasons.

I would like my oops'ing systems to send the oops to another system via an ethernet interface.

How about a UDP packet? Nice connectionless protocol. Compile the MAC/IP address into the kernel. Oops occurs, build the UDP packet with the measly 2K oops message in it and send.

--
Brian Litzinger <br...@litzinger.com>
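
The proposal above (one connectionless datagram carrying the roughly 2K oops text to a fixed address) can be sketched in userspace. This is only an illustration of the idea, not kernel code: the function names `open_collector`, `send_oops`, and `receive_oops` are hypothetical, the destination is the loopback address rather than a compiled-in MAC/IP, and the oops text is a placeholder.

```python
import socket

MAX_OOPS_BYTES = 2048  # the "measly 2K" from the message above


def open_collector(host="127.0.0.1", port=0):
    """Bind a UDP socket that waits for oops datagrams; port 0 picks a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_oops(text, dest):
    """Fire one datagram at the collector; no connection, no acknowledgement."""
    payload = text.encode("ascii", errors="replace")[:MAX_OOPS_BYTES]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, dest)


def receive_oops(sock, timeout=2.0):
    """Return (text, sender_address) for the next datagram, or raise socket.timeout."""
    sock.settimeout(timeout)
    data, sender = sock.recvfrom(MAX_OOPS_BYTES)
    return data.decode("ascii", errors="replace"), sender


if __name__ == "__main__":
    collector = open_collector()
    # Placeholder text standing in for a real oops dump.
    send_oops("Oops: 0000\n(example text only)", collector.getsockname())
    print(receive_oops(collector)[0])
    collector.close()
```

Two caveats apply to the design: UDP gives no delivery guarantee, so a lost datagram means a lost oops, and a 2K payload is larger than a standard 1500-byte Ethernet MTU, so it would either be fragmented at the IP layer or need splitting into several datagrams.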