From: ur...@usa.net
Subject: 2.2.13ac3 crashes under high network load
Date: 1999/12/15
Message-ID: <837hq1$hat$1@nnrp1.deja.com>
X-Deja-AN: 560862255
X-Http-Proxy: 1.0 x35.deja.com:80 (Squid/1.1.22) for client 199.95.209.163
Organization: Deja.com - Before you buy.
X-Article-Creation-Date: Wed Dec 15 08:00:02 1999 GMT
X-MyDeja-Info: XMYDJUIDray450
Newsgroups: comp.os.linux.networking
X-Http-User-Agent: Mozilla (X11; I; Linux 2.0.32 i586)

Alan C/Dave M/Ingo M/Dan K et al. ...

I'm trying to build an "ultra high-performance" webserver which needs to
handle at least 500 (preferably 1000) sustained hits/second (my entire
webserver farm already handles in excess of 1 billion hits/day, soon to
be 2 billion/day).  Yes, I've read the C10K page at
http://www.kegel.com/c10k.html and tried some of the experimental
webservers, most of which are way too experimental (and lacking
features :) that I need.  The issue isn't that Apache can't cut it, but
rather some kernel (and/or network driver) issue, I think ...

Here is my hardware and software configuration:

Compaq 6400R (rackmount):
  dual PIII-550 MHz processors
  2048 MByte ECC memory
  Intel EtherExpress Pro/100 NIC

Linux 2.2.13:
  with Alan Cox's "2.2.13ac3" super-patch applied
  using Donald Becker's eepro100.c:v1.09l (8/7/99)
  "BIGMEM" option enabled in kernel configuration
  using raid0145 (raidtools-0.90) (in ac3 patchset)
  system on a single 9.1G Ultra-Wide 10K SCSI drive
  web content on RAID-0 across 2 identical UW SCSI drives
  some SysCtl parameters modified via script (see below)

I have the above setup on 10 servers running Apache 1.3.9 (modified with
an updated version of the TOP_FUEL patch) in a cluster, managed by a
Cisco LocalDirector 420.  Under test conditions I can pull content from
Apache fast enough to saturate the 100 MBit/s full-duplex [switched]
interface on each webserver (according to `ab`, anyhow).

HOWEVER, after 2-3 days of uptime, handling on average between 50 and 200
hits/second per server (which means from 100 to 400 and up open
connections per server), these machines start to fail.  Usually this
happens when the number of active TCP/IP connections is high, say about
1000 for a brief period of time.  (I did have to increase the route-cache
limits in /proc/sys/net/ipv4 per a posting from Alexey Kuznetsov in
fa.linux.kernel, else I was getting tons of "dst_cache_overflow" errors.)

The server becomes unreachable (cannot be PING'ed), and on the console
(I finally figured out how to disable the damned screen-blanker/powersave
via a little script during init, also pasted below) I first see some
messages about an INODE already being cleared (RAID-related problems?),
and shortly after, different messages start looping:

  <[8010b37d]> <[80162067]> <[80162140]> \
  <[8016f168]> <[801511d4]> <[80161fa6]>
  wait_on_bh, CPU 3:
  irq:  0 [0 0]
  bh:   1 [0 0]

  (repeats about once per second)

I've compiled the 'Magic SysRq' feature into the kernel, so upon pressing
<ALT> + <SysRq> + P we see the following (there might be a few mistakes,
it was hand-copied!):

  SysRq: Show Regs
  EIP: 0010:[<8010b38c>]  EFLAGS: 00000202
  EAX: 00000001  EBX: fae00e80  ECX: f0aeff14  EDX: aee2c21d
  ESI: 00000000  EDI: f0aee000  EBP: f0aee000
  DS: 0018  ES: 0018
  CR0: 8005003b  CR2: 2aac1000  CR3: 69e5f000

Pressing <ALT> + <SysRq> + M shows the usual, nothing unusual about
memory usage and no swap in use.
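Side note for anyone who wants to poke at a hung box the same way:
compiling in CONFIG_MAGIC_SYSRQ isn't always enough by itself.  If the key
combos don't respond, make sure the runtime switch is on first:

  # enable Magic SysRq key handling (1 = on, 0 = off)
  echo 1 > /proc/sys/kernel/sysrq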
I did some searches on DejaNews about wait_on_bh, and found references
from January and March 1999 from David Miller and Alan Cox saying this was
SMP-related ("wait on bottom half") and fixed in 2.2.3 (?).  Apparently
it's not really fixed :/

On other servers in the same cluster I'm not seeing the "wait_on_bh" error
above, but rather some resource-starvation issue (I think): the console
displays the usual /etc/issue message and a login: prompt, but when I try
to log in (as any user) a message flashes by [too fast to read] and I'm
kicked back to the login prompt again.  I suspect the message is about
"Resource temporarily unavailable".  On one of these machines which would
not let me log in, I could finally log in interactively after reducing the
network load (bled off some traffic).

I went ahead and pressed <ALT> + <SysRq> + P on all the machines stuck at
the login screen, and they were mostly all at EIP "80107a71" (cpu_idle,
according to my System.map).  The output from <ALT> + <SysRq> + M was more
interesting: some of these servers had NEGATIVE "buffer hash" values
(e.g. "buffer hash: -191435", "buffer hash: -178450", etc.).

In all cases I was able to Sync via <ALT> + <SysRq> + S (on some servers I
had to send <ALT> + <SysRq> + E first, though), and Unmount/reBoot via
SysRq also worked after most of the crashes.

For the time being I've edited /etc/lilo.conf and appended "nosmp noapic"
to the active kernel entry to force non-SMP mode (a rough sketch of that
stanza is at the very end of this post, after the proctune script); I'm
going to run the webserver cluster for several days and see if the same
problems occur ...

Any help/patches/advice on the above is greatly appreciated.  Thanks in
advance ...

RW

----------------------------------------------------------------------
/etc/rc.d/init.d/noblank
----------------------------------------------------------------------
#!/bin/sh
#
# Prevents situations where the kernel crashes
# but we cannot see any console error messages
# because the screen was blanked earlier.

IFS=' '
TTYS='1 2 3 4 5 6'

for n in ${TTYS}
do
        # setterm writes the escape sequences to stdout; send them to each VT
        /usr/bin/setterm -blank 0       > /dev/tty${n}
        /usr/bin/setterm -powersave off > /dev/tty${n}
done

exit 0
---------------------------------------------------------------------
/etc/rc.d/init.d/proctune
----------------------------------------------------------------------
#!/bin/bash
#
# /etc/rc.d/init.d/proctune.linuxcare
#
# chkconfig: 345 80 20
# description: SysCtl (proc) tunings from LinuxCare \
#              based on research of Jim Dennis
#
# processname: proctune.linuxcare

# Source function library.
. /etc/rc.d/init.d/functions

#
# See how we were called.
#
case "$1" in
  start)
        echo "Running proctune.linuxcare:"
        #
        echo -n "   /proc/sys/fs/file-max . . . . . . . . . . . "
        echo '16384' > /proc/sys/fs/file-max
        cat /proc/sys/fs/file-max
        #
        echo -n "   /proc/sys/fs/inode-max . . . . . . . . . . . "
        echo '65536' > /proc/sys/fs/inode-max
        cat /proc/sys/fs/inode-max
        #
        echo -n "   /proc/sys/net/ipv4/ip_local_port_range . . . "
        echo "32768 65535" > /proc/sys/net/ipv4/ip_local_port_range
        cat /proc/sys/net/ipv4/ip_local_port_range
        #
        echo -n "   /proc/sys/net/ipv4/route/gc_elasticity . . . "
        echo '2' > /proc/sys/net/ipv4/route/gc_elasticity
        cat /proc/sys/net/ipv4/route/gc_elasticity
        #
        echo -n "   /proc/sys/net/ipv4/route/gc_min_interval . . "
        echo '1' > /proc/sys/net/ipv4/route/gc_min_interval
        #echo '0' > /proc/sys/net/ipv4/route/gc_min_interval
        cat /proc/sys/net/ipv4/route/gc_min_interval
        #
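        # NOTE (added for this writeup, not part of the original script):
        # gc_thresh and max_size below are the /proc/sys/net/ipv4/route
        # knobs mentioned above in connection with "dst_cache_overflow".
        # The rough idea is to raise gc_thresh together with max_size so
        # the route cache isn't forced to garbage-collect constantly; the
        # value below is untested guesswork for this load, hence left
        # commented out:
        #echo '1024' > /proc/sys/net/ipv4/route/gc_thresh
        #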
        #echo -n "   /proc/sys/net/ipv4/route/gc_thresh . . . . . "
        #echo '256' > /proc/sys/net/ipv4/route/gc_thresh
        #echo '512' > /proc/sys/net/ipv4/route/gc_thresh
        #cat /proc/sys/net/ipv4/route/gc_thresh
        #
        echo -n "   /proc/sys/net/ipv4/route/max_size . . . . .  "
        #echo '4096' > /proc/sys/net/ipv4/route/max_size
        echo '8192' > /proc/sys/net/ipv4/route/max_size
        cat /proc/sys/net/ipv4/route/max_size
        #
        echo -n "   /proc/sys/vm/bdflush . . . . . . . . . . . . "
        echo '98 100 128 256 15 500 1884 2 2' > /proc/sys/vm/bdflush
        cat /proc/sys/vm/bdflush
        #
        echo -n "   /proc/sys/vm/buffermem . . . . . . . . . . . "
        echo '90 10 98' > /proc/sys/vm/buffermem
        cat /proc/sys/vm/buffermem
        #
        #echo -n "   /proc/sys/vm/overcommit_memory . . . . . . . "
        #echo '1' > /proc/sys/vm/overcommit_memory
        #cat /proc/sys/vm/overcommit_memory
        #
        echo -n "   /proc/sys/vm/page-cluster . . . . . . . . .  "
        echo '5' > /proc/sys/vm/page-cluster
        cat /proc/sys/vm/page-cluster
        #
        echo -n "   /proc/sys/vm/pagecache . . . . . . . . . . . "
        echo '80 30 95' > /proc/sys/vm/pagecache
        cat /proc/sys/vm/pagecache
        ;;
  stop)
        # nothing to undo
        :
        ;;
  *)
        echo "Usage: /etc/rc.d/init.d/proctune.linuxcare {start|stop}"
        exit 1
        ;;
esac

exit 0
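FWIW, the lilo.conf change mentioned above looks roughly like this; the
image path, label and root device are just placeholders for whatever your
active kernel entry is, the only real change is the append= line (and don't
forget to re-run /sbin/lilo afterwards):

----------------------------------------------------------------------
/etc/lilo.conf  (relevant stanza only, paths/labels are placeholders)
----------------------------------------------------------------------
image=/boot/vmlinuz-2.2.13ac3
        label=linux
        root=/dev/sda1
        read-only
        append="nosmp noapic"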