2.2.13aa6 (bugfix release II)
Andrea Arcangeli (andrea@suse.de)
Fri, 17 Dec 1999 16:34:21 +0100 (CET)
The main features of 2.2.13aa6 are:
o Support for 4Gigabyte of RAM (me and Gerhard.Wichert)
o Improved VM for high end machines with enough ram and doing
heavy I/O under high memory pressure (me)
o RAW-IO (also on bigmem) (Stephen C. Tweedie)
o updated with all showstopper/necessary bugfixes discovered in
the 2.2.x kernels over time.
NOTE (2.2.14pre): if you don't need the 4g support and raw-io and your
machine has a workstation load (so you don't do heavy I/O) you should
ignore 2.2.13aa6; I suggest using 2.2.14pre14 plus my
block_dev-fs-corruption patch.
NOTE (raid): if you want to use the latest raid patches
(raid0145-19990824-2.2.11) over 2.2.13aa6 simply apply the raid patch over
2.2.13aa6 and then apply this incremental patch on the resulting kernel:
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14pre11/set_blocksize-1-raid0145-19990824-2.2.11.gz
Then raid will work just fine.
Side note: I am not including the new raid code in 2.2.x because at least
raid0 is just rock solid in the stock 2.2.13 kernel and I don't want to
force people to convert the on-disk format of their raid device in order
to run 2.2.13aa6. People who want to use raid can go with my incremental
raid fix.
Incremental description of 2.2.13aa6:
------ 2.2.13aa6 --------
diff -u --exclude version.gz 2.2.13aa5 2.2.13aa6
Only in 2.2.13aa6: block_dev-fs-corruption-1.gz
block_dev-fs-corruption-1.gz -> fixes fs corruption in the
blockdevice layer (me)
------ 2.2.13aa5 --------
diff -u --exclude version.gz 2.2.13aa4 2.2.13aa5
Only in 2.2.13aa4: buffer-races-2.2.10-A.gz
Only in 2.2.13aa5: buffer-races-2.2.13-3.gz
Only in 2.2.13aa5: ext2-1.gz
buffer-races-2.2.13-3.gz -> includes the
buffer-races-2.2.10-A.gz
features and it also fixes fs
corruption generated by hdparm or
flushb on an active filesystem and
a minor problem in sync_dev (me)
ext2-1.gz -> fixes bugs that may lead to
ext2 fs corruption or
fsync errors (me)
------ 2.2.13aa4 --------
diff -u --exclude version.gz 2.2.13aa3 2.2.13aa4
Only in 2.2.13aa4: inode-recycle-fixes.gz
Only in 2.2.13aa4: java-proc.gz
Only in 2.2.13aa4: signal-race.gz
Only in 2.2.13aa4: syncookies.gz
inode-recycle-fixes.gz -> fixes an inode leakage (me)
syncookies.gz -> fixed syncookie bug (without the
fix at the first synflood
the machine will forbid
connections to all hosts, must
check only the SYN/ACK/FIN
bit and not the data offset
and window of the incoming
packet ;). (Alan Cox)
signal-race.gz -> fixes a race in the send sig path
(David Miller)
java-proc.gz -> reverted the semantic change that
makes a difference between
/proc/00000$$ and /proc/$$, this
allows backwards compatibility of
a misfeature and it _won't_ hurt
security. There's no downside
in reverting the 2.2.13 semantic
change.
------ 2.2.13aa3 --------
diff -u --exclude version.gz 2.2.13aa2 2.2.13aa3
Only in 2.2.13aa3: dcache-hashfn.gz
Only in 2.2.13aa3: fdset-fix.gz
Only in 2.2.13aa2: z-bigmem-2.2.13aa2-6.gz
Only in 2.2.13aa3: z-bigmem-2.2.13aa3-7.gz
z-bigmem-2.2.13aa3-7.gz -> fixed an obvious silly bigmem
bug that would lead
to processes being killed randomly.
(all the credit goes to Leonard N.
Zubkoff)
fdset-fix.gz -> fixed a fdset bug that may lead to
memory corruption and Oopses
(credit goes to
Savochkin Andrey Vladimirovich,
I only backported the 2.3.x patch
to a four liner against 2.2.13)
dcache-hashfn.gz -> use only the dentry noise for
randomizing the dcache hashfn
(all the credit goes to David S.
Miller)
------ 2.2.13aa2 --------
SMP-scheduler-2.2.11-E.gz -> rewrite of reschedule_idle. (me)
buffer-hash.gz -> fixes lowmem box hash size. (me)
buffer-races-2.2.10-A.gz -> fixes race conditions that may lead
to bad things in invalidate_buffers()
and set_blocksize(). (me)
clear-backlog-2.gz -> fixes for a SMP race condition in
the main network backlog handling. (me)
dcache-hash.gz -> dcache hash dynamic (with my
own heuristic). (started from 2.2.13ac1
but then reimplemented by me)
free_page.gz -> cleanup of the __free_pages
interface. (me)
hashed-buffers-2.2.10.gz -> minor fix to increase the debugging
information in the right place. (me)
inode-leak-2.2.10-A.gz -> make sure to not leak memory
by allocating lots of sockets (DoS),
and let the admin know to enlarge
the max-inodes if the admin really
wants more unfreeable memory in the
icache. (me)
kupdate-sigstop-2.2.11-1.gz -> allow kupdate to be stopped via
SIGSTOP (currently it must be stopped
by setting interval to zero via
sysctl). (me)
no-swapout-2.2.10-B.gz -> avoid swapin/swapouts during heavy
I/O (strictly necessary for decent
performance on very I/O and MM loaded
servers). (me)
oom-2.2.12-I.gz -> assorted OOM fixes (deadlocks in
pagein, Alpha SIGBUS fix, avoid
sigkilling iopl() applications (send
a sigterm instead), avoid init
being killed), it's the same
patch merged by Alan into 2.2.14pre2. (me)
pagecache-hash.gz -> pagecache hash dynamic (I think
it's DaveM's work, literally I took it
from 2.2.13ac1). I agree with the
heuristic used. It allocates
num_physpages buckets for the pagecache
and this basically means all the
buckets will be filled supposing a
perfect hash distribution with all the memory
allocated in the cache. (all credits
to David S. Miller)
probe-irq-2.3.14-pre2-1.gz -> avoid a pending irq being mistaken
for a spurious irq. (me)
shrink_all_cache-2.2.10-A.gz -> make sure that big memory boxes will
shrink the cache well enough. (me)
trashing-mem-2.2.10-A.gz -> heuristic to penalize memory hogs,
the system will remain responsive
also during heavy swapout. (me)
version.gz -> set the EXTRAVERSION to aa2 ;)
wait-event-smp-races.gz -> Put the two mb() after setting the
task state as blocking and before
checking whether the event has already happened
(SMP race fix). (me)
wait4-smp-race.gz -> _Critical_ SMP race fix.
Without this one liner each time you
run `ls` from bash, the bash is going
to deadlock in wait4 if you are unlucky
enough. The race is very small
but there are machines under heavy
fork load that reproduced this
race regularly after some days of load.
The SMP race can happen only
with an SMP kernel on SMP hardware. (me)
wakeup_bdflush-2.2.10-A.gz -> avoid deadlocking in wakeup_bdflush
(the run_task_queue() can sleep for
example while running the loop
request function). (me)
z-bigmem-2.2.13aa2-6.gz -> 4GB support on x86. (me and
Gerhard Wichert)
z-bigmem-nodebug.gz -> turn the bigmem code into production
mode.
z-bigmem-rawio-2.2.13aa2-1.gz -> rawio working even with bigmem memory
(I started with rawio from 2.2.13ac1
and SCT's 2.3.x rawio bounce buffers,
all the credits go to Stephen C.
Tweedie)
zmagic-all-blocksize.gz -> allow zmagic binaries to run
also on 4k filesystems (it's the same
fix that went into 2.2.14pre2). (me)
----------------------------------------------------------------------
To go in sync with 2.2.13aa6 you can:
mkdir 2.2.13aa6
cd 2.2.13aa6
wget --retr-symlinks -A\*.gz \
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.13aa6/\*
cd ..
and now you'll have all the interesting patches in the directory
2.2.13aa6.
At this point rename the 2.2.13 sources to 2.2.13aa6:
mv linux-2.2.13 linux-2.2.13aa6
cd linux-2.2.13aa6
and apply all the 2.2.13aa6 patches that you previously downloaded from
the ftp site:
apply-patches.sh ../2.2.13aa6
At this point your tree will be in sync with 2.2.13aa6. Just configure,
recompile and boot the new kernel.
You can find the `apply-patches.sh` bash script I wrote to easily apply
my kernel patches here:
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/apply-patches.sh.gz
There is also a README on how to use it:
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/README.gz
The 2.2.13aa6 kernel is placed here:
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.13aa6/
Have fun! ;)
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: ursus <ur...@usa.net>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 1999/12/20

In newsgroup fa.linux.kernel, Andrea Arcangeli wrote:
> Date: Fri, 17 Dec 1999 16:34:21 +0100 (CET)
> From: Andrea Arcangeli <and...@suse.de>
> Subject: 2.2.13aa6 (bugfix release II)
>
> [...]
> The main features of 2.2.13aa6 are:
>
> o Support for 4Gigabyte of RAM (me and Gerhard.Wichert)
> o Improved VM for high end machines with enough ram and doing
> heavy I/O under high memory pressure (me)
> o RAW-IO (also on bigmem) (Stephen C. Tweedie)
>
> o updated with all showstopper/necessary bugfixes discovered into
> the 2.2.x kernels over the time.
>

Andrea:

Thanks for the updated 2.2.13aa6 patchset, especially that it works
with the raid-0.90 patches cleanly! I've been using Alan Cox's
2.2.13ac3 patches for the raid-0.90 support, but really wanted to run
with your SMP scheduling changes, since they would seem to help
performance/stability with my application (high-load webserver on
dual-PIII machine). Also I was getting errors regarding "Out of
memory" which you have a couple of patches for in aa6 ...

I upgraded a cluster of servers (Compaq 6400R, 2 x PIII-500)
from 2.2.13ac3 to 2.2.13aa6+raid-0.90 (and the incremental
"set_blocksize" patch you kindly provided) and Don Becker's
eepro.c 1.09l (not sure if this is latest?) in hopes I can
finally have a really stable setup ... these had been running
well for about 12 hours, but I just had one of the servers
crash with the following error (seen before under 2.2.13ac3):

  wait_on_bh, CPU 3: (this is the first processor)
  irq: 0 [0 0]
  bh: 1 [0 0]
  <[8010b39d]> <[80150daa]> <[80150d46]> <[8012912b]> \
  <[8012a367]> <[801291a6]> <[8012921f]> <[801092ac]>

I tried to correlate the registers above with System.map:

  8010b360 T synchronize_bh
  8010b3b0 T synchronize_irq

  80150d20 t sock_close
  80150d5c t sock_fasync

  8012910c T __fput
  80129154 T filp_close

  8012a350 T fput
  8012a398 T put_filp

  80129154 T filp_close
  801291b0 T sys_close

  801291b0 T sys_close
  80129238 T sys_vhangup

  80109278 T system_call
  801092b0 T ret_from_sys_call

If I press ALT+SysRq+P, the EIP shows "0010:[<80166671>]"
which appears to be related to functions (from System.map):

  80166660 T tcp_send_delayed_ack
  801666b4 T tcp_send_ack

In some earlier posts I read that "wait_on_bh"
means that the system is waiting on the bottom half
(SMP-specific), so I've edited my /etc/lilo.conf
to add "nosmp noapic", and I'll see if the servers
run stable w/o SMP ... this isn't a real solution
of course.

Any help/pointers/patches would be greatly appreciated.

In an earlier post I mentioned this is part of a larger project to
upgrade about 100 webservers based on 2.0.36 kernel to 2.2.13+ ...
the overall load is 1 billion hits per day currently. This would be
yet another testament to Linux's viability in the enterprise
environment, assuming I can nail down this SMP problem :)

PS: in your directory on the ftp.*.kernel.org mirrors,
I see a patch regarding bh_latency for 2.2.14pre;
does this address the above "wait_on_bh" problem?

Thanks in advance

--
ur...@usa.net
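
Note on the System.map correlation in the message above: resolving a
trace address means finding the last symbol whose start address is not
greater than that address. A minimal C sketch of that lookup follows.
It assumes a table sorted by address and holds only the symbols quoted
in the message; `struct ksym`, `symtab` and `lookup_symbol` are names
made up for this illustration, not part of the kernel or of any tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one System.map line: start address and name. */
struct ksym {
	unsigned long addr;
	const char *name;
};

/* Symbols quoted in the message above, sorted by address. */
static const struct ksym symtab[] = {
	{ 0x80109278UL, "system_call" },
	{ 0x801092b0UL, "ret_from_sys_call" },
	{ 0x8010b360UL, "synchronize_bh" },
	{ 0x8010b3b0UL, "synchronize_irq" },
	{ 0x8012910cUL, "__fput" },
	{ 0x80129154UL, "filp_close" },
	{ 0x801291b0UL, "sys_close" },
	{ 0x80129238UL, "sys_vhangup" },
	{ 0x8012a350UL, "fput" },
	{ 0x8012a398UL, "put_filp" },
	{ 0x80150d20UL, "sock_close" },
	{ 0x80150d5cUL, "sock_fasync" },
	{ 0x80166660UL, "tcp_send_delayed_ack" },
	{ 0x801666b4UL, "tcp_send_ack" },
};

#define NSYMS (sizeof(symtab) / sizeof(symtab[0]))

/*
 * Binary search for the last entry with entry.addr <= addr.
 * Returns its name, or NULL if addr lies below the first symbol.
 */
static const char *lookup_symbol(const struct ksym *tab, size_t n,
				 unsigned long addr)
{
	size_t lo = 0, hi = n;

	assert(tab != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;	/* candidate, look further right */
		else
			hi = mid;
	}
	return lo ? tab[lo - 1].name : NULL;
}
```

Called as lookup_symbol(symtab, NSYMS, 0x8010b39dUL) this yields
synchronize_bh, and 0x80166671 yields tcp_send_delayed_ack, which
matches the reading given in the message. With a partial table the
result is only as good as the neighbouring symbols that are present.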

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 1999/12/21

On 20 Dec 1999, ursus wrote:

>I upgraded a cluster of servers (Compaq 6400R, 2 x PIII-500)
>from 2.2.13ac3 to 2.2.13aa6+raid-0.90 (and the incremental
>"set_blocksize" patch you kindly provided) and Don Becker's

That's a fine kernel ;)

>[..] I just had one of the servers
>crash with the following error (seen before under 2.2.13ac3):

I changed nothing in my aa patches related to the problem you have, so
it's normal you get it as in the 2.2.13ac3 kernel. Your report gives
interesting info, thanks.

>to add "nosmp noapic", and I'll see if the servers
>run stable w/o SMP ... this isn't a real solution

I bet it will be rock solid in UP. This looks like a genuine SMP race
(of course trusting it's not a hardware issue).

>assuming I can nail down this SMP problem :)

We'll nail it down ;).

> I see a patch regarding bh_latency for 2.2.14pre;
> does this address the above "wait_on_bh" problem?

It won't help you, it's performance stuff (and it's not complete yet).

Andrea

From: ur...@usa.net
Subject: Re: [2.2.13aa6 (bugfix release II)]
Date: 1999/12/23

In article <fa.m8ag9hv.vi2...@ifi.uio.no>,
Andrea Arcangeli <and...@suse.de> wrote:

> I changed nothing in my aa patches related to the problem
> you have, so it's normal you get it as in the 2.2.13ac3 kernel.

Andrea:

This explains why I'm still having the hangs :(

> I bet it will be rock solid in UP.
> This looks like a genuine SMP race

I've been testing the same servers with "nosmp noapic" appended
to the bootprompt (via LILO) without success; while I don't see the
wait_on_bh crash anymore, instead the system just hangs without any
errors whatsoever, I'll see a "normal" login prompt at the console
except no characters are echoed and only SysRq [partially] works.
I can't successfully Sync, Unmount via SysRq but reBoot does work.
Also nothing is logged in /var/log/messages regarding the crash.

I also recompiled 2.2.13aa6 for true UniProcessor mode and ran the
UP kernel on the same servers for 2 days, with the same errorless
crashes under heavy network load. I haven't been able to get your
IKD patchset into this kernel (and boot successfully, that is).

If I press ALT+SysRq+P after one of these crashes, the EIP always
points to an address which maps to "timer_bh" (according to
System.map)

Is this the same timer_bh problem William Montgomory was discussing
with you in another recent thread?

> We'll nail it down ;).

Thanks for your and the list-members' assistance ...
Please let me know if I can help in any way, whether to test
patches, provide crash traces, etc.

--
ur...@usa.net

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 2000/01/09

On 20 Dec 1999, ursus wrote:

>I upgraded a cluster of servers (Compaq 6400R, 2 x PIII-500)
>from 2.2.13ac3 to 2.2.13aa6+raid-0.90 (and the incremental
>"set_blocksize" patch you kindly provided) and Don Becker's
>eepro.c 1.09l (not sure if this is latest?) in hopes I can
>finally have a really stable setup ... these had been running
>well for about 12 hours, but I just had one of the servers
>crash with the following error (seen before under 2.2.13ac3):
>
> wait_on_bh, CPU 3: (this is the first processor)
> irq: 0 [0 0]
> bh: 1 [0 0]
> <[8010b39d]> <[80150daa]> <[80150d46]> <[8012912b]> \
> <[8012a367]> <[801291a6]> <[8012921f]> <[801092ac]>
>
>I tried to correlate the registers above with System.map:
>
> 8010b360 T synchronize_bh
> 8010b3b0 T synchronize_irq
>
> 80150d20 t sock_close
> 80150d5c t sock_fasync
>
> 8012910c T __fput
> 80129154 T filp_close
>
> 8012a350 T fput
> 8012a398 T put_filp
>
> 80129154 T filp_close
> 801291b0 T sys_close
>
> 801291b0 T sys_close
> 80129238 T sys_vhangup
>
> 80109278 T system_call
> 801092b0 T ret_from_sys_call
>
>If I press ALT+SysRq+P, the EIP shows "0010:[<80166671>]"
>which appears to be related to functions (from System.map):
>
> 80166660 T tcp_send_delayed_ack
> 801666b4 T tcp_send_ack
>
>In some earlier posts I read that "wait_on_bh"
>means that the system is waiting on the bottom half
>(SMP-specific), so I've edited my /etc/lilo.conf
>to add "nosmp noapic", and I'll see if the servers
>run stable w/o SMP ... this isn't a real solution
>of course.
>
>Any help/pointers/patches would be greatly appreciated.

I think I spotted and fixed the bug that is soft-deadlocking your 2.2.x
compaq cluster (all seems to make sense :). Could you try the below patch
against 2.2.14 (or 2.2.14aa1 or 2.2.13 or 2.2.13aa6)?

--- 2.2.14/net/ipv4/tcp_output.c.~1~	Fri Jan  7 18:19:25 2000
+++ 2.2.14/net/ipv4/tcp_output.c	Sun Jan  9 21:32:04 2000
@@ -1004,7 +1004,7 @@
 	unsigned long timeout;
 
 	/* Stay within the limit we were given */
-	timeout = tp->ato;
+	timeout = (tp->ato << 1) >> 1;
 	if (timeout > max_timeout)
 		timeout = max_timeout;
 	timeout += jiffies;

I uploaded the above patch here too:

ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/delack-timer-1.gz

Have fun!

Andrea
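
Note on the one-line patch above: the shift pair discards the top bit
of the value and keeps the rest. A minimal C sketch of that effect
follows. It assumes a 32-bit `ato` whose bit 31 carries the quickack
flag, which is what the patch and the later discussion imply but which
is not spelled out in this thread; `QUICKACK_FLAG`, `strip_by_shift`
and `strip_by_mask` are names invented for the illustration.

```c
#include <stdint.h>

/* Assumption: the quickack flag is stored in bit 31 of a 32-bit ato. */
#define QUICKACK_FLAG (UINT32_C(1) << 31)

/* The form used in the patch: shift the top bit out, shift a zero back in. */
static uint32_t strip_by_shift(uint32_t ato)
{
	return (uint32_t)(ato << 1) >> 1;
}

/* Equivalent explicit form: clear bit 31 with a mask. */
static uint32_t strip_by_mask(uint32_t ato)
{
	return ato & ~QUICKACK_FLAG;
}
```

Both helpers return the same value for every input; the mask form only
makes the intent visible. Note that in the kernel the variable is an
unsigned long, so the shift form clears bit 31 only where unsigned long
is 32 bits wide, as on x86.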

From: "David S. Miller" <da...@redhat.com>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 2000/01/09

   Date: Sun, 9 Jan 2000 21:43:01 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   I think I spotted and fixed the bug that is soft-deadlocking your
   2.2.x compaq cluster (all seems to make sense :). Could you try the
   below patch against 2.2.14 (or 2.2.14aa1 or 2.2.13 or 2.2.13aa6)?

Wrong, all callers of tcp_send_delayed_ack _guarantee_ that the
quickack bit is clear. Your patch does nothing, put an assert there
if you don't believe me.

Later,
David S. Miller
da...@redhat.com

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 2000/01/10

On Sun, 9 Jan 2000, David S. Miller wrote:

> Date: Sun, 9 Jan 2000 21:43:01 +0100 (CET)
> From: Andrea Arcangeli <and...@suse.de>
>
> I think I spotted and fixed the bug that is soft-deadlocking your
> 2.2.x compaq cluster (all seems to make sense :). Could you try the
> below patch against 2.2.14 (or 2.2.14aa1 or 2.2.13 or 2.2.13aa6)?
>
>Wrong, all callers of tcp_send_delayed_ack _guarantee_ that the
>quickack bit is clear. Your patch does nothing, put an assert

tcp_delack_timer doesn't really guarantee that, see:

	void tcp_delack_timer(unsigned long data)
	{
		struct sock *sk = (struct sock*)data;

		if(!sk->zapped &&
		   sk->tp_pinfo.af_tcp.delayed_acks &&
		   sk->state != TCP_CLOSE) {
			/* If socket is currently locked, defer the ACK. */
			if (!atomic_read(&sk->sock_readers))
				tcp_send_ack(sk);
			else
				tcp_send_delayed_ack(&(sk->tp_pinfo.af_tcp), HZ/10);
		}
	}

You should as well guarantee by design that no timer is pending after
you turn on the quickack bit and before dropping the bh or sock lock. It
seems to me you are guaranteeing that by always sending an ack on the wire
after you set up the quickack bit but it's not trivial to prove and right
now the only explanation for the deadlock reported by urban and that other
people are experiencing is that a delack timer triggers while delayed_acks
is > 0 and the quickack bit is set.

If the quickack bit is set while calling tcp_send_delayed_ack the kernel
will lockup immediately in a way that matches the reports from ursus.

The reason for the deadlock is that the expires field of the timer will be
set in the past and so the timer will reinsert itself in the first heap
slot and so it will continue to reinsert and reexecute it in an infinite
loop -> soft deadlock.

I'd like to also make the timer code robust against these kinds of
subsystem bugs later but right now I am only focused on fixing the
offending code in TCP.

I have to admit that I can't yet see exactly the path that sets the
quickack bit without sending data on the wire but you agree with me that
the tcp_send_delayed_ack function is not interested in the quickack bit
and it's interested only in the real "ato" information, so my patch
is obviously correct and in the worst case it won't change anything. I
believe it will as well fix the lockup fine, and that it's the right
approach to avoid these kinds of subtle mistakes. It makes more sense than
destroying the quickack information before reinserting the delack
timer from the delack timer, no?

BTW, while reading the code I found a lockup-unrelated bug in the delack
handling:

--- 2.2.14/net/ipv4/tcp_input.c	Fri Jan  7 18:19:25 2000
+++ /tmp/tcp_input.c	Mon Jan 10 03:41:10 2000
@@ -1428,6 +1428,7 @@
 	if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
 		/* A retransmit, 2nd most common case.  Force an imediate ack. */
 		SOCK_DEBUG(sk, "retransmit received: seq %X\n", TCP_SKB_CB(skb)->seq);
+		tp->delayed_acks++;
 		tcp_enter_quickack_mode(tp);
 		kfree_skb(skb);
 		return;

Anyway the above fix is not interesting for the real world because it
seems impossible to me to reach such a path in the TCP code (so basically
such a check is useless but after all I like it for completeness of the
function). This is because we call tcp_data only when we know the packet
is in our receive window (otherwise we force an ack by hand prior to
calling tcp_queue).

And really tp->delayed_acks is meaningless as far as I can tell and the
right thing to do is to remove the delayed_acks field completely (this must
be done at least in 2.3.x). Removing it will avoid wasting time in the TCP
code and will decrease half of the delacks code braindamage :).

Ursus, please apply also this patch on the top of my fix in the previous
email, as per David's correct suggestion of putting an assert there. If
you see a printk with the below patch applied we'll have the proof that my
theory about the source of your deadlocks is correct and that my fix made
the difference for you. Without the below patch applied you could think
you are not deadlocking anymore because of luck :).

--- 2.2.14/net/ipv4/tcp_timer.c	Fri Jan  7 18:19:25 2000
+++ /tmp/tcp_timer.c	Mon Jan 10 16:02:14 2000
@@ -173,7 +173,12 @@
 		if (!atomic_read(&sk->sock_readers))
 			tcp_send_ack(sk);
 		else
+		{
+			struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
+			if (tcp_in_quickack_mode(tp))
+				printk(KERN_ERR "quickack bit set!!!!\n");
 			tcp_send_delayed_ack(&(sk->tp_pinfo.af_tcp), HZ/10);
+		}
 	}
 }
 
If my fix doesn't fix the deadlock completely I have really no other
reasonable ideas on what can be going wrong right now. Thinking
thinking...

Comments?

Andrea
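
Note on "set in the past" in the message above: where an expired timer
lands can be modelled in userspace. A minimal sketch, assuming 32-bit
jiffies, a first-level timer vector of 256 slots (TVR_BITS = 8) and
the 2.2.x rule that a timer whose expiry already lies in the past is
queued on the slot currently being run. `first_level_slot` is a
hypothetical helper for this illustration, not a kernel function, and
the higher-level vectors are not modelled.

```c
#include <stdint.h>

#define TVR_BITS 8
#define TVR_SIZE (1 << TVR_BITS)
#define TVR_MASK (TVR_SIZE - 1)

/*
 * Return the first-level slot a timer would be queued on, or -1 when
 * it belongs to a higher-level vector (not modelled here).
 * current_index stands in for tv1.index.
 */
static int first_level_slot(uint32_t expires, uint32_t timer_jiffies,
			    int current_index)
{
	uint32_t idx = expires - timer_jiffies;	/* wraps if in the past */

	if (idx < TVR_SIZE)
		return (int)(expires & TVR_MASK);
	if ((int32_t)idx < 0)
		return current_index;	/* already expired: current slot */
	return -1;
}
```

With timer_jiffies = 1000 (so the current slot is 232), an expiry of
1005 maps to slot 237 while an expiry of 990 maps back to slot 232,
the one being processed. A handler that keeps re-arming itself with
such an expiry is therefore always found again by a loop that re-reads
that slot.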

From: "David S. Miller" <da...@redhat.com>
Subject: Re: [2.2.13aa6 (bugfix release II) ]
Date: 2000/01/11

   Date: Mon, 10 Jan 2000 16:11:28 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   tcp_delack_timer doesn't really guarantee that, see:

Ok, thanks for catching this.

   If the quickack bit is set while calling tcp_send_delayed_ack the
   kernel will lockup immediately in a way that matches the reports
   from ursus.

How? Always in such a case, timeout > max_timeout because this
bit is set and the values are unsigned.

   The reason for the deadlock is that the expires field of the timer
   will be set in the past and so the timer will reinsert itself in
   the first heap slot and so it will continue to reinsert and
   reexecute it in an infinite loop -> soft deadlock.

Not if the timeout>max_timeout test passes, which I think it will.

   I'd like to also make the timer code robust against these kinds of
   subsystem bugs later but right now I am only focused on fixing the
   offending code in TCP.

Agreed, so I want your fix to go in anyways. But I do want to discuss
where the error comes from and why the timeout>max_timeout test does
not prevent it.

   And really tp->delayed_acks is meaningless as far as I can tell and
   the right thing to do is to remove the delayed_acks field
   completely (this must be done at least in 2.3.x). Removing it will
   avoid wasting time in the TCP code and will decrease half of the
   delacks code braindamage :).

It's already done in the patch sets I've been feeding Linus for 2.3.x
It has died in our sources, it will be no more :-)

Later,
David S. Miller
da...@redhat.com
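
Note on the timeout > max_timeout argument in the message above: it
can be checked with a small model of the clamp in
tcp_send_delayed_ack. The sketch below assumes 32-bit unsigned values
and a quickack flag in bit 31, as the discussion implies;
`delack_timeout` and its `strip_flag` switch are invented for the
illustration and only mirror the three lines touched by the patch.

```c
#include <stdint.h>

/*
 * Model of "timeout = ato; if (timeout > max_timeout) timeout =
 * max_timeout;". With strip_flag set, bit 31 is cleared first, as in
 * the proposed patch.
 */
static uint32_t delack_timeout(uint32_t ato, uint32_t max_timeout,
			       int strip_flag)
{
	uint32_t timeout = strip_flag ? (uint32_t)(ato << 1) >> 1 : ato;

	if (timeout > max_timeout)	/* unsigned compare */
		timeout = max_timeout;
	return timeout;
}
```

Under these assumptions an ato of 4 with the flag set gives
max_timeout on the unpatched path, because the flag makes the unsigned
value huge, and 4 on the patched path. That is the point being made in
the reply; the model does not show where the reported lockup comes
from, which the thread leaves open at this stage.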

From: ursus <ur...@usa.net>
Subject: Re: [Re: [2.2.13aa6 (bugfix release II) ]]
Date: 2000/01/10

Andrea Arcangeli <and...@suse.de> wrote:

> please apply also this patch on the top of my fix
> in the previous email as for David [Miller's] correct
> suggestion of putting an assert there. If you see
> a printk with the below patch applyed, we'll have
> [proof that] my theory about the source of your deadlocks
> is correct and that my fix made the difference for you.

ok, I'll apply the previous "delack-timer-1" patch,
as well as the one below. However, can you upload
the below patch to the ftp.*.kernel.org mirrors also,
just so I can ensure the spacings and such are correct.
Thanks.

> --- 2.2.14/net/ipv4/tcp_timer.c Fri Jan 7 18:19:25 2000

As a side note, the machines still have not crashed, so almost
certainly the TCP_DELAY_ACK bug is the culprit, at least in
UniProcessor case. Going to try using SMP kernel again after I'm
sure these patches do the job.

Thanks for your help ...

--
RW <ur...@usa.net>

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: [Re: [2.2.13aa6 (bugfix release II) ]]
Date: 2000/01/10

On 10 Jan 2000, ursus wrote:

>as well as the one below. However, can you upload
>the below patch to the ftp.*.kernel.org mirrors also,

ok, I put it here:

ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/delack-assert-1.gz

>at least in UniProcessor case. Going to try using SMP
>kernel again after I'm sure these patches do the job.

Fine. BTW, assuming TCP does its locking right (either doing lock_sock or
running in atomic bh context) SMP/UP shouldn't matter. And if TCP is
missing a lock_sock a race can trigger also in UP. So hopefully if we get
it fixed on UP we should be fine on SMP too later.

Andrea
From: Andrea Arcangeli <and...@suse.de>
Subject: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/12
Message-ID: <fa.hpj3vlv.752u12@ifi.uio.no>#1/1
X-Deja-AN: 571586341
Original-Date: Wed, 12 Jan 2000 02:00:02 +0100 (CET)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.21.0001120135430.312-100000@alpha.random>
References: <fa.nsvi2rv.1e4401n@ifi.uio.no>
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
To: ursus <ur...@usa.net>
X-Sender: and...@alpha.random
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

I fixed the timer code to be robust against the bad scenario I discovered
in the last few days. The bad scenario is a timer that reinserts itself
with an expires <= jiffies (or more precisely < timer_jiffies). In the
current 2.2.x and 2.3.x this scenario leads to a plain deadlock.

I verified the correctness of the patch in a userspace simulation.

Ursus, could you please check whether you can still deadlock your machine
with this patch against 2.2.14 applied on top of your current tree?

--- 2.2.14/kernel/sched.c	Wed Jan  5 14:16:56 2000
+++ /tmp/sched.c	Wed Jan 12 00:45:15 2000
@@ -535,6 +535,15 @@
 		/* can happen if you add a timer with expires == jiffies,
 		 * or you set a timer to go off in the past
 		 */
+		if ((signed long) idx < -50)
+			/* Nobody should set a timer so insanely in the past or
+			   waiting so many timer interrupts between reading
+			   jiffies and calling the timer code. The timer code
+			   is completly robust against this condition but
+			   a printk may let us know about bugs in the
+			   caller we might not notice otherwise. */
+			printk(KERN_WARNING
+			       "timer inserted in the past, idx = %ld\n", idx);
 		insert_timer(timer, tv1.vec, tv1.index);
 	} else if (idx <= 0xffffffffUL) {
 		int i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
@@ -1124,10 +1133,31 @@
 	tv->index = (tv->index + 1) & TVN_MASK;
 }
 
+/* defer current timers to the next pass */
+static void cascade_current_timers(void)
+{
+	struct timer_list * timer;
+	int index = tv1.index;
+
+	timer = tv1.vec[index];
+	tv1.index = (tv1.index + 1) & TVR_MASK;
+
+	while (timer)
+	{
+		struct timer_list *tmp = timer;
+		timer = timer->next;
+		insert_timer(tmp, tv1.vec, tv1.index);
+	}
+	tv1.vec[index] = NULL;
+}
+
 static inline void run_timer_list(void)
 {
+	long passes;
+
 	spin_lock_irq(&timerlist_lock);
-	while ((long)(jiffies - timer_jiffies) >= 0) {
+	passes = jiffies - timer_jiffies;
+	while (passes-- >= 0) {
 		struct timer_list *timer;
 		if (!tv1.index) {
 			int n = 1;
@@ -1135,17 +1165,21 @@
 				cascade_timers(tvecs[n]);
 			} while (tvecs[n]->index == 1 && ++n < NOOF_TVECS);
 		}
-		while ((timer = tv1.vec[tv1.index])) {
+		timer = tv1.vec[tv1.index];
+		tv1.vec[tv1.index] = 0;
+		while (timer) {
 			void (*fn)(unsigned long) = timer->function;
 			unsigned long data = timer->data;
-			detach_timer(timer);
-			timer->next = timer->prev = NULL;
+			struct timer_list * tmp = timer;
+			timer = timer->next;
+			detach_timer(tmp);
+			tmp->next = tmp->prev = NULL;
 			spin_unlock_irq(&timerlist_lock);
 			fn(data);
 			spin_lock_irq(&timerlist_lock);
 		}
 		++timer_jiffies; 
-		tv1.index = (tv1.index + 1) & TVR_MASK;
+		cascade_current_timers();
 	}
 	spin_unlock_irq(&timerlist_lock);
 }

The same patch also applies cleanly to 2.3.38 if you specify
linux/kernel/timer.c as the file to patch.

Or you can download the patch from here:

	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/timer_bh-deadlock-1.gz
	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.3/2.3.38/timer_bh-deadlock-1.gz

Andrea

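The scenario described in the message above can be reproduced in miniature in userspace. What follows is a minimal sketch, not the kernel code and not Andrea's own simulation: it assumes a single-level wheel, and every name in it (sim_wheel, sim_add_timer, sim_run_slot, the max_runs cap) is hypothetical. It drains the current slot the way the unpatched inner loop does, so a handler that re-arms itself with an expiry that is not in the future lands back in the slot being drained and the loop never sees it empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 256                 /* stands in for the size of tv1 */
#define SLOT_MASK (SLOTS - 1)

struct sim_timer {
	struct sim_timer *next;
	unsigned long expires;
	void (*function)(struct sim_timer *self);
};

struct sim_wheel {
	struct sim_timer *vec[SLOTS];
	unsigned int index;           /* slot currently being drained */
	unsigned long timer_jiffies;  /* tick the wheel is processing */
};

struct sim_wheel wheel;

void sim_reset(void)
{
	memset(&wheel, 0, sizeof(wheel));
}

/* Queue a timer. An expiry that is not in the future goes into the
 * slot that is being drained right now. */
void sim_add_timer(struct sim_timer *t)
{
	long idx = (long)(t->expires - wheel.timer_jiffies);
	unsigned int slot;

	assert(idx < SLOTS);          /* this sketch has no higher-level wheels */
	slot = (idx <= 0) ? wheel.index
			  : (unsigned int)(t->expires & SLOT_MASK);
	t->next = wheel.vec[slot];
	wheel.vec[slot] = t;
}

/* Drain the current slot until it is empty, like the unpatched loop.
 * Returns the number of handlers run, or -1 once max_runs handlers
 * have run and the slot is still not empty (the stand-in for a hang). */
long sim_run_slot(unsigned long max_runs)
{
	long runs = 0;
	struct sim_timer *t;

	while ((t = wheel.vec[wheel.index]) != NULL) {
		if ((unsigned long)runs >= max_runs)
			return -1;
		wheel.vec[wheel.index] = t->next;   /* detach */
		t->next = NULL;
		t->function(t);                     /* may re-arm the timer */
		runs++;
	}
	wheel.timer_jiffies++;
	wheel.index = (wheel.index + 1) & SLOT_MASK;
	return runs;
}

/* Well-behaved handler: re-arms five ticks ahead. */
void sim_rearm_later(struct sim_timer *self)
{
	self->expires = wheel.timer_jiffies + 5;
	sim_add_timer(self);
}

/* Buggy handler: re-arms with a zero timeout. */
void sim_rearm_now(struct sim_timer *self)
{
	self->expires = wheel.timer_jiffies;
	sim_add_timer(self);
}
```

To try it, call sim_reset(), queue one timer whose function is sim_rearm_now with sim_add_timer(), and call sim_run_slot(1000): the cap is reached without the slot ever draining, whereas the same experiment with sim_rearm_later runs the handler exactly once.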
From: Andrea Arcangeli <and...@suse.de>
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/12
Message-ID: <fa.hs37vlv.2l2u1e@ifi.uio.no>#1/1
X-Deja-AN: 571572295
Original-Date: Wed, 12 Jan 2000 02:49:56 +0100 (CET)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.21.0001120239430.822-100000@alpha.random>
References: <fa.hpj3vlv.752u12@ifi.uio.no>
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
To: ursus <ur...@usa.net>
X-Sender: and...@alpha.random
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Wed, 12 Jan 2000, Andrea Arcangeli wrote:

>I fixed the timer code to be robust against the bad scenario I discovered
>in the last days. The bad secnario consists in a timer that reinsert
>itself with an expire <= jiffies (or more precisely < timer_jiffies).

As TyTso suggested, the problem could be ato being zero. I checked, and I
finally spotted the offending bug that was causing the above condition to
happen (second chunk of the patch):

--- 2.2.14-tcp/net/ipv4/tcp_output.c.~1~	Fri Jan  7 18:19:25 2000
+++ 2.2.14-tcp/net/ipv4/tcp_output.c	Wed Jan 12 02:47:32 2000
@@ -1004,7 +1004,7 @@
 	unsigned long timeout;
 
 	/* Stay within the limit we were given */
-	timeout = tp->ato;
+	timeout = (tp->ato << 1) >> 1;
 	if (timeout > max_timeout)
 		timeout = max_timeout;
 	timeout += jiffies;
@@ -1044,6 +1044,8 @@
 			 */
 			if(tcp_in_quickack_mode(tp))
 				tcp_exit_quickack_mode(tp);
+			if (!tp->ato)
+				tp->ato = tp->rto;
 			tcp_send_delayed_ack(tp, HZ/2);
 			return;
 		}

An incoming synack carries no data in the packet, so
tcp_delack_estimator is not called from tcp_ack and ato stays zero.
Then tcp_send_ack (the one we send to put the other end in the
established state) goes oom and queues the delack timer while ato is
still zero. The timer then gets reinserted into the current queue from
run_timer_list and boom!

The fact that an oom condition was necessary to trigger the bug explains
perfectly why it wasn't reproducible on most machines.

	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/delack-timer-2.gz

Andrea

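The arithmetic behind the two hunks above can be checked in isolation. The following is a hedged sketch rather than the kernel function: the names delack_expires_old and delack_expires_patched are hypothetical, HZ is assumed to be 100, ato is assumed to be a 32-bit value whose top bit is a flag (which is what the shift pair in the first hunk suggests), and the rto fallback from the second hunk is folded into the same function for brevity even though the patch applies it in the caller.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100u                            /* assumption: 100 ticks per second */
#define QUICKACK_FLAG (UINT32_C(1) << 31)  /* assumption: top bit of ato is a flag */

/* Unpatched computation: with ato == 0 the result equals "now", so the
 * timer is queued as already expired. */
unsigned long delack_expires_old(uint32_t ato, uint32_t max_timeout,
				 unsigned long now)
{
	unsigned long timeout = ato;

	if (timeout > max_timeout)
		timeout = max_timeout;
	return now + timeout;
}

/* Patched computation: fall back to rto when ato is zero, and drop the
 * flag bit before using ato as a delay. */
unsigned long delack_expires_patched(uint32_t ato, uint32_t rto,
				     uint32_t max_timeout, unsigned long now)
{
	unsigned long timeout;

	assert(max_timeout > 0);
	if (ato == 0)
		ato = rto;                          /* second hunk */
	timeout = (uint32_t)(ato << 1) >> 1;        /* first hunk */
	if (timeout > max_timeout)
		timeout = max_timeout;
	return now + timeout;
}
```

With now = 1000 and max_timeout = HZ/2, the old function returns 1000 for ato == 0, which is exactly the "expires == jiffies" case the timer code cannot tolerate, while the patched one returns a time in the future as long as rto is non-zero.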
From: "David S. Miller" <da...@redhat.com>
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/12
Message-ID: <fa.hacs8ov.17nc6oe@ifi.uio.no>#1/1
X-Deja-AN: 571586340
Original-Date: Tue, 11 Jan 2000 18:40:23 -0800
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <200001120240.SAA02150@pizda.ninka.net>
References: <fa.hs37vlv.2l2u1e@ifi.uio.no>
To: and...@suse.de
Original-References: <Pine.LNX.4.21.0001120239430.822-100...@alpha.random>
X-Authentication-Warning: pizda.ninka.net: davem set sender to da...@redhat.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

   Date: Wed, 12 Jan 2000 02:49:56 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   @@ -1044,6 +1044,8 @@
    			 */
    			if(tcp_in_quickack_mode(tp))
    				tcp_exit_quickack_mode(tp);
   +			if (!tp->ato)
   +				tp->ato = tp->rto;
    			tcp_send_delayed_ack(tp, HZ/2);
    			return;
    		}

Yep, I bet this is it. Good spotting.

Both of these fixes to tcp_output.c are in my tree now.

Later,
David S. Miller
da...@redhat.com

From: ty...@valinux.com
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/14
Message-ID: <fa.ikttrov.2ks7bu@ifi.uio.no>#1/1
X-Deja-AN: 572547617
Original-Date: Wed, 12 Jan 2000 11:23:45 -0800
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <E128TMv-0007md-00@dcl.su.varesearch.com>
References: <fa.ikaud7v.1b50lj0@ifi.uio.no>
To: and...@suse.de
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Phone: (781) 391-3464
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

   Date: Wed, 12 Jan 2000 20:12:54 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   >It adds one conditional inside a rare 'if' case, so it's not a
   >performance issue, and it means that the next time something like this
   >happens, the machine will cleanly panic, and leave a very easy to
   >understand indication of what went wrong.

   I don't like a panic for something we can recover from gracefully,
   while also letting the user see the message even if he was running
   under X 8).

Fine, so make it set a standard timeout and do a printk instead. This is
a "never can happen" situation, right?

	if (!timeout) {
		timeout = tp->rto;
		if (!timeout) {
			printk("Bugcheck: tcp_send_delayed_ack ato and rto are 0");
			timeout = HZ/50;
		}
	}

						- Ted

From: ty...@valinux.com
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/14
Message-ID: <fa.iib9h8v.3609ru@ifi.uio.no>#1/1
X-Deja-AN: 572550514
Original-Date: Wed, 12 Jan 2000 07:59:35 -0800
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <E128QBL-0007kE-00@dcl.su.varesearch.com>
References: <fa.ilr2dov.18l4l35@ifi.uio.no>
To: and...@suse.de
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Phone: (781) 391-3464
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

   Date: Wed, 12 Jan 2000 04:02:35 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   Ok, moving the check inside tcp_send_delayed_ack is fine with me.

   But it should be said that the only other way the delack timer could
   be posted is via __tcp_ack_snd_check. And if __tcp_ack_snd_check is
   using tcp_send_delayed_ack instead of tcp_send_ack before any in-order
   packets with data have arrived (so before ato has been initialized to
   something other than zero), it probably means there's a genuine bug
   in tcp.

True; but my paranoia says that even if there isn't a problem *now*,
there may be later. Which is why I'll suggest one more change to
your patch:

	if (!timeout) {
		timeout = tp->rto;
		if (!timeout)
			panic("Bugcheck: tcp_send_delayed_ack ato and rto are 0");
	}

It adds one conditional inside a rare 'if' case, so it's not a
performance issue, and it means that the next time something like this
happens, the machine will cleanly panic, and leave a very easy to
understand indication of what went wrong.

						- Ted

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/14
Message-ID: <fa.ikreegv.1bl0kr2@ifi.uio.no>#1/1
X-Deja-AN: 572766913
Original-Date: Fri, 14 Jan 2000 15:56:55 +0100 (CET)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.21.0001141552150.2653-100000@alpha.random>
References: <fa.hpj3vlv.752u12@ifi.uio.no>
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
To: ursus <ur...@usa.net>
X-Sender: and...@alpha.random
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Wed, 12 Jan 2000, Andrea Arcangeli wrote:

>+		while (timer) {
> 			void (*fn)(unsigned long) = timer->function;
> 			unsigned long data = timer->data;
>-			detach_timer(timer);
>-			timer->next = timer->prev = NULL;
>+			struct timer_list * tmp = timer;
>+			timer = timer->next;
>+			detach_timer(tmp);
>+			tmp->next = tmp->prev = NULL;
> 			spin_unlock_irq(&timerlist_lock);

If at this point the timer pointed to by "timer" gets detached while "fn"
is running, the machine is going to fail on the next loop iteration. I am
sorry.

>Or you can download the patch from here:
>
>	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/timer_bh-deadlock-1.gz
>	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.3/2.3.38/timer_bh-deadlock-1.gz

Please don't use the two timer_bh patches quoted above (nor the -2
optimized version). Having the timer robust against buggy users is not
necessary, only desirable, so you don't actually need it.
Nevertheless I'll fix the problem soon.

FYI: the delack-timer-3 patch here:

	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/delack-timer-3.gz

seems to fix the popular wait_on_bh deadlock on UP/SMP webservers fine :).

Andrea

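The flaw admitted in the message above is the classic hazard of caching a list's next pointer across a callback. The sketch below illustrates it in userspace under loud assumptions: it uses a singly linked list instead of the kernel's doubly linked one, no locking, and hypothetical names throughout (demo_timer, demo_del_timer, demo_walk_cached_next, demo_walk_reread_head). The second walker is only the shape of the original unpatched loop, shown for contrast; the thread does not say how the problem was eventually fixed.

```c
#include <assert.h>
#include <stddef.h>

struct demo_timer {
	struct demo_timer *next;
	int pending;                 /* 1 while linked on the list */
	int ran;                     /* how many times the handler ran */
	struct demo_timer *cancels;  /* timer this handler deletes, or NULL */
};

/* Stand-in for del_timer(): unlink victim if it is on the list. */
void demo_del_timer(struct demo_timer **head, struct demo_timer *victim)
{
	struct demo_timer **pp = head;

	while (*pp != NULL && *pp != victim)
		pp = &(*pp)->next;
	if (*pp != NULL) {
		*pp = victim->next;
		victim->next = NULL;
		victim->pending = 0;
	}
}

/* Detach a timer and run its "handler", which may delete another timer. */
static void demo_fire(struct demo_timer **head, struct demo_timer *t)
{
	demo_del_timer(head, t);
	t->ran++;
	if (t->cancels != NULL)
		demo_del_timer(head, t->cancels);
}

/* Walk as in the withdrawn hunk: remember the next timer before the
 * handler runs. Returns how many handlers ran on timers that had
 * already been deleted. */
int demo_walk_cached_next(struct demo_timer **head)
{
	int stale_runs = 0;
	struct demo_timer *timer = *head;

	while (timer != NULL) {
		struct demo_timer *tmp = timer;

		timer = timer->next;         /* cached across the handler */
		if (!tmp->pending)
			stale_runs++;        /* deleted, yet still visited */
		demo_fire(head, tmp);
	}
	return stale_runs;
}

/* Walk as in the unpatched loop: re-read the list head every time.
 * Returns the number of handlers run. */
int demo_walk_reread_head(struct demo_timer **head)
{
	int runs = 0;
	struct demo_timer *tmp;

	while ((tmp = *head) != NULL) {
		demo_fire(head, tmp);
		runs++;
	}
	return runs;
}
```

On a list a, b, c where the handler of a deletes b, the cached-next walk still runs b after it was deleted and never reaches c, while the re-reading walk skips b and runs c. In the kernel the deleted timer's memory may already have been reused, which is presumably why the machine "is going to fail" rather than merely misbehave.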
From: ty...@valinux.com
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/15
Message-ID: <fa.i6sti0v.cjc93u@ifi.uio.no>#1/1
X-Deja-AN: 572800041
Original-Date: Tue, 11 Jan 2000 18:03:40 -0800
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <E128D8O-0007ai-00@dcl.su.varesearch.com>
References: <fa.hs37vlv.2l2u1e@ifi.uio.no>
To: and...@suse.de
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Phone: (781) 391-3464
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

   Date: Wed, 12 Jan 2000 02:49:56 +0100 (CET)
   From: Andrea Arcangeli <and...@suse.de>

   As TyTso suggested the problem could be ato to be zero. I checked and I
   finally also spotted the offending bug that was causing the above
   condition to happen (second chunk of the patch):

   --- 2.2.14-tcp/net/ipv4/tcp_output.c.~1~	Fri Jan  7 18:19:25 2000
   +++ 2.2.14-tcp/net/ipv4/tcp_output.c	Wed Jan 12 02:47:32 2000
   @@ -1004,7 +1004,7 @@
    	unsigned long timeout;
    
    	/* Stay within the limit we were given */
   -	timeout = tp->ato;
   +	timeout = (tp->ato << 1) >> 1;
    	if (timeout > max_timeout)
    		timeout = max_timeout;
    	timeout += jiffies;

Note that max_timeout is always a small positive number (HZ/2 or HZ/10),
and timeout is an unsigned long. Hence if the quickack bit is set,
timeout is > max_timeout, and so timeout gets capped to max_timeout.
Without the patch, we simply delay the ack by max_timeout instead
of the current value of ato. Probably not the best, but not a disaster,
either.

   @@ -1044,6 +1044,8 @@
    			 */
    			if(tcp_in_quickack_mode(tp))
    				tcp_exit_quickack_mode(tp);
   +			if (!tp->ato)
   +				tp->ato = tp->rto;
    			tcp_send_delayed_ack(tp, HZ/2);
    			return;
    		}

This fixes the bug, but I'd be much happier if we put a
belt-and-suspenders check in tcp_send_delayed_ack. If there's some
other place which allows tcp_send_delayed_ack() to be called with
tp->ato set to zero, we shouldn't lock up the entire kernel.

So I'd propose adding to tcp_send_delayed_ack() something like
this:

#define min_timeout HZ/50

	if (timeout < min_timeout)
		timeout = min_timeout; /* This prevents an endless kernel loop */

(who me, paranoid?)

						- Ted

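The belt-and-suspenders check proposed above amounts to clamping the delay from both sides. A minimal sketch follows, assuming HZ is 100 and using hypothetical names (clamp_delack_timeout, MIN_DELACK_TIMEOUT) that do not appear in the thread; the macro is parenthesized here so it stays safe inside larger expressions.

```c
#include <assert.h>

#define HZ 100UL                        /* assumption: 100 ticks per second */
#define MIN_DELACK_TIMEOUT (HZ / 50)    /* the proposed floor, 2 ticks here */

/* Keep the delayed-ACK delay within [MIN_DELACK_TIMEOUT, max_timeout].
 * A zero input can then never produce a timer that expires "now". */
unsigned long clamp_delack_timeout(unsigned long timeout,
				   unsigned long max_timeout)
{
	assert(max_timeout >= MIN_DELACK_TIMEOUT);
	if (timeout > max_timeout)
		timeout = max_timeout;
	if (timeout < MIN_DELACK_TIMEOUT)
		timeout = MIN_DELACK_TIMEOUT;
	return timeout;
}
```

The same function also shows the other observation in the message: a value with a high flag bit set compares as a very large unsigned number, so it is simply capped at max_timeout rather than causing harm.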
From: <arij...@valinux.com>
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/15
Message-ID: <fa.lcv2c3v.1r0obru@ifi.uio.no>#1/1
X-Deja-AN: 572878309
Original-Date: Fri, 14 Jan 2000 17:43:21 -0500 (EST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.10.10001141737580.1356-100000@lap.valinuxny.net>
References: <fa.ikreegv.1bl0kr2@ifi.uio.no>
To: Andrea Arcangeli <and...@suse.de>
X-Sender: arij...@lap.valinuxny.net
X-Authentication-Warning: lap.valinuxny.net: arijort owned process doing -bs
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Andrea,

Just to be clear...

Are you saying that the entire timer_bh-deadlock patch is bad?
Or simply the hunk that you refer to below, which begins like this:

@@ -1135,17 +1165,20 @@

ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/timer_bh-deadlock-1.gz

I'm assuming it's the whole patch.

ari

On Fri, 14 Jan 2000, Andrea Arcangeli wrote:

> On Wed, 12 Jan 2000, Andrea Arcangeli wrote:
>
> >+		while (timer) {
> > 			void (*fn)(unsigned long) = timer->function;
> > 			unsigned long data = timer->data;
> >-			detach_timer(timer);
> >-			timer->next = timer->prev = NULL;
> >+			struct timer_list * tmp = timer;
> >+			timer = timer->next;
> >+			detach_timer(tmp);
> >+			tmp->next = tmp->prev = NULL;
> > 			spin_unlock_irq(&timerlist_lock);
>
> If at this point the timer pointed to by "timer" gets detached while "fn"
> is running, the machine is going to fail on the next loop iteration. I am
> sorry.
>
> >Or you can download the patch from here:
> >
> >	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/timer_bh-deadlock-1.gz
> >	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.3/2.3.38/timer_bh-deadlock-1.gz
>
> Please don't use the two timer_bh patches quoted above (nor the -2
> optimized version). Having the timer robust against buggy users is not
> necessary, only desirable, so you don't actually need it.
> Nevertheless I'll fix the problem soon.
>
> FYI: the delack-timer-3 patch here:
>
>	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/delack-timer-3.gz
>
> seems to fix the popular wait_on_bh deadlock on UP/SMP webservers fine :).
>
> Andrea

From: Andrea Arcangeli <and...@suse.de>
Subject: Re: timer_bh robusteness fix against potential deadlocks
Date: 2000/01/15
Message-ID: <fa.jmunc1v.bgikqb@ifi.uio.no>#1/1
X-Deja-AN: 572924322
Original-Date: Sat, 15 Jan 2000 01:46:12 +0100 (CET)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.21.0001150145061.14161-100000@alpha.random>
References: <fa.lcv2c3v.1r0obru@ifi.uio.no>
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
To: arij...@valinux.com
X-Sender: and...@alpha.random
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Fri, 14 Jan 2000 arij...@valinux.com wrote:

>Are you saying that the entire timer_bh-deadlock patch is bad?
>Or simply the hunk that you refer to below, which begins like this:

All of it. You can safely reverse the patch completely because it's not
necessary for now.

>ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.14/timer_bh-deadlock-1.gz
>
>I'm assuming it's the whole patch.

You are correct. The whole patch.

Andrea