From: hedrick@dumas.rutgers.edu (Charles Hedrick)
Newsgroups: alt.os.linux
Subject: another dead filesystem and that fsck can't fix
Date: 2 Feb 92 20:28:48 GMT
Organization: Rutgers Univ., New Brunswick, N.J.

I'm a victim of what is probably the same problem somebody reported a bit ago: I have a directory that fsck complains doesn't have . and .. at the beginning. Whenever I try to look at anything in it, the kernel panics, claiming it's trying to deallocate 0. fsck reports an error, but doesn't do anything about it. I can probably rebuild the file system, but it's a pain.

By the way, it's now pretty clear that there's a timing problem (a race or something) in the file system or hd code. Basically whenever I am doing file system I/O in two jobs at the same time (e.g. on two screens), I lose. Examples: copying a large file from MSDOS to linux, using mread. At the same time I log in on a different screen, which does a bit of I/O (the login program, init files for bash). The system hung. Or extracting from a large tar file and simultaneously doing ls and du to see how things are progressing.

My disk is fairly fast (it's one of the new Conner IDE disks, which I believe is 8 msec average seek). Perhaps it turns up race conditions not seen with slower disks.

This makes the system sort of dangerous to use, given that fsck won't fix it. Even a way to manually remove the directory would be welcome.

From: rad@merlin.think.com (Bob Doolittle)
Newsgroups: alt.os.linux
Subject: Re: another dead filesystem and that fsck can't fix
Date: 13 Feb 92 13:50:51 GMT
Organization: Thinking Machines Corporation, Cambridge Mass., USA
NNTP-Posting-Host: merlin.think.com
In-reply-to: zuazaga@ucunix.san.uc.edu's message of 4 Feb 92 19:56:05 GMT

In article <Feb.2.15.28.47.1992.19090@dumas.rutgers.edu> hedrick@dumas.rutgers.edu (Charles Hedrick) writes:
>By the way, it's now pretty clear that there's a timing problem (a
>race or something) in the file system or hd code. Basically whenever
>I am doing file system I/O in two jobs at the same time (e.g. on two
>screens), I lose.

As I said in an earlier posting, when I tried to copy partitions via:
"tar cvf - foo bar | (cd blech; tar xf -)"
it hung my system as well. It copied a few files, then the disk stopped being accessed and everything just sat there. Sounds like the same problem.

What tools are folks using to debug kernel problems? There is no adb or even ps yet, so what do you do? dd from /dev/mem and disassemble? kernel printfs (eek!)?

Enquiring minds need to know...

-Bob
------
Bob Doolittle        Thinking Machines Corporation
rad@think.com        (617)234-2734
--
--------------------------------------------------------------------------------
Bob Doolittle        Thinking Machines Corporation
(617) 234-2734       245 First Street
rad@think.com        Cambridge, MA 02142
--------------------------------------------------------------------------------

From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: alt.os.linux
Subject: Re: another dead filesystem and that fsck can't fix
Date: 14 Feb 92 11:00:31 GMT
Organization: University of Helsinki

In article <RAD.92Feb13145051@merlin.think.com> rad@merlin.think.com (Bob Doolittle) writes:
>
>As I said in an earlier posting, when I tried to copy partitions via:
>"tar cvf - foo bar | (cd blech; tar xf -)"
>it hung my system as well. It copied a few files, then the disk stopped
>being accessed and everything just sat there. Sounds like the same
>problem.

Well, yes. I'm still hoping it's the out-of-memory bug (which I have corrected), but I'm looking into the fs as well :(.

>What tools are folks using to debug kernel problems? There is no adb or
>even ps yet, so what do you do? dd from /dev/mem and disassemble? kernel
>printfs (eek!)?

Tools? We don't need no ... :)

Printk's in the kernel is the standard "debugging" trick. If somebody comes up with something better, feel free to post: I'm not too happy about it either, but it's simple.

This doesn't just extend to the kernel: debugging user programs isn't exactly easy under linux either :( - I've resorted to things like

	$ od -hx executable | less

to find errors after a program crash... That's what the debugging info printed after exceptions is there for. Oh, well..

		Linus

From: bir7@leland.Stanford.EDU (Ross Biro)
Newsgroups: alt.os.linux
Subject: Re: another dead filesystem and that fsck can't fix
Date: 15 Feb 92 20:47:54 GMT
Organization: DSG, Stanford University, CA 94305, USA

In article <1992Feb14.110031.2731@klaava.Helsinki.FI> torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds) writes:
>This doesn't just extend to the kernel: debugging user programs isn't
>exactly easy under linux either :( - I've resorted to things like

GDB is almost useable. The current status is that it can set break points, check memory (variables), look at source, and restart breakpoints that were coded into the executable. Currently it cannot restart breakpoints which were set from within GDB. I'm working on it.

Ross Biro bir7@leland.stanford.edu

From: joel@wam.umd.edu (Joel M. Hoffman)
Newsgroups: alt.os.linux
Subject: Re: [file system problem or memory problem?]
Date: 16 Feb 92 21:59:24 GMT
Organization: University of Maryland at College Park
Nntp-Posting-Host: rac2.wam.umd.edu

Many people have reported that Linux crashes during disk-intensive operations, and have speculated that it's either a file system problem (unlikely) or a mem. management problem (more likely, they say). Is it possible that it's a hard-drive problem?

I know that on my machine (a 386 at 25MHz, IDE drive), DJGPP (GCC for DOS) occasionally reports a ``Not Ready Error Reading Drive C:'' which of course is preposterous. It's a fixed disk and is always ready. GNU Emacs (DEMACS) also crashes sometimes during disk access, presumably because it's getting the not ready error and doesn't know what to do about it. Does the kernel check for this?

I don't really know what can be done about the problem. I know that (with DJGPP) by the time the error message pops up, it's too late. The machine has crashed. But perhaps this need not be so.

Alas, like so many other problems, I have yet to find an exact way of replicating the problem. Editing a 4MB binary file with DEMACS usually does it, though....

From: bir7@leland.Stanford.EDU (Ross Biro)
Newsgroups: alt.os.linux
Subject: Re: [file system problem or memory problem?]
Date: 17 Feb 92 06:53:09 GMT
Organization: DSG, Stanford University, CA 94305, USA

In article <1992Feb16.215924.3334@wam.umd.edu> joel@wam.umd.edu (Joel M. Hoffman) writes:
>Many people have reported that Linux crashes during disk-intensive
>operations, and have speculated that it's either a file system problem
>(unlikely) or a mem. management problem (more likely, they say). Is
>it possible that it's a hard-drive problem?
>

	Another data point. I have a 386/20 8 meg with a 330 meg ESDI hard drive. I think there is a hardware problem related to the hard drive, ESIX would periodically hang with the disk-access light on, and sometimes complain about nmi's. These things would always happen when the hard drive was under intensive use. Linux has crashed with the hard drive light on a few times, and with it off many times. One time the crash happened when I had about 800 pages of free memory. I have never had a problem under dos. Perhaps other people are experiencing similar hardware problems. I know the systems are similar.

Ross Biro bir7@leland.stanford.edu

From: joel@wam.umd.edu (Joel M. Hoffman)
Newsgroups: alt.os.linux
Subject: Re: [file system problem or memory problem?]
Date: 17 Feb 92 13:42:57 GMT
Organization: University of Maryland at College Park
Nntp-Posting-Host: rac2.wam.umd.edu

In article <1992Feb17.065309.7827@morrow.stanford.edu> bir7@leland.Stanford.EDU (Ross Biro) writes:
>In article <1992Feb16.215924.3334@wam.umd.edu> joel@wam.umd.edu (Joel M. Hoffman) writes:
>>Many people have reported that Linux crashes during disk-intensive
>>operations, and have speculated that it's either a file system problem
>>(unlikely) or a mem. management problem (more likely, they say). Is
>>it possible that it's a hard-drive problem?
>>
>
> Another data point. I have a 386/20 8 meg with a 330 meg ESDI
>hard drive. I think there is a hardware problem related to the hard
>drive, ESIX would periodically hang with the disk-access light on, and
>sometimes complain about nmi's. These things would always happen
>when the hard drive was under intensive use. Linux has crashed with
>the hard drive light on a few times, and with it off many times. One
>time the crash happened when I had about 800 pages of free memory. I
>have never had a problem under dos. Perhaps other people are

One more point of clarification. When my system would crash, the hard drive light would also stay on. And I never experienced the problem in real mode, only protected.

-Joel