Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: loke.as.arizona.edu: ckulesa owned process doing -bs Original-Date: Mon, 24 Sep 2001 05:08:49 -0700 (MST) From: Craig Kulesa <ckul...@as.arizona.edu> To: <linux-ker...@vger.kernel.org> Subject: 2.4.10 VM vs. 2.4.9-ac14 (+ ac14-aging) Original-Message-ID: <Pine.LNX.4.33.0109232255250.14107-100000@loke.as.arizona.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Mon, 24 Sep 2001 12:13:00 GMT Message-ID: <fa.k18fuuv.1uksjgg@ifi.uio.no> Lines: 146 Well, things are looking up. The split trees of 2.4 VM seem to be both performing "pretty well" here. Following are some tests and comments about recent kernels that I hope will be vaguely illuminative toward further improvement. Description of tests: - Streaming IO test: dbench, 'dd if=/dev/zero of=dummy.dat bs=1024k count=512' and 'cat dummy.dat > /dev/null' while performing streaming tasks like mp3's and general interactive use. This is obscene, but dirty page overloading needs to be handled at least *acceptably*, without resorting to low-latency or preemptible patches - Common user application test: the idea is to load a mix of applications to drive the system into different kinds of memory loads. [sequential] a) fill dentry/inode caches with slocate b) create lots of anonymous pages w/ a large blank image in GIMP [make sure GIMP's tile cache is set to a high value to test kernel VM and not GIMP's 'temp' file handling] c) loading StarOffice w/file creates lots of disk i/o, stretches VM cache and then allocates lots of user memory... 
[report loading time] d) load suite of apps to drive the system into mild swap *activity* (not just swap allocation in 2.4.9-ac) e) Now that some pages have aged a bit, try to rotate that GIMP image (major use of "older" anon pages and creation of many more) [time the rotation] f) note WHO's paged out w/ ps, log to file g) close all apps sequentially, sorta LIFO, note swap-ins vmstat & periodic dumps from /proc/meminfo and /proc/slabinfo log all statistics throughout the tests Summary of Results: - Test machines ranged from 32 MB to 192 MB, the latter is described here. - 2.4.8 and 2.4.9 were poor, degenerating to _awful_ somewhere in 2.4.10-pre. Example: it was darn near impossible to evict dentry and inode caches in 2.4.8. Also, freshly loaded apps were paged out under load, then repeatedly paged back in, then back out... (poor interaction and/or balancing between the various inactive lists, coupled presumably w/ broken aging). 2.4.8 streaming IO test: failed (stutters, huge gaps in playback) 2.4.8 app test: 45524 kB swapped out; 29638 kB swapped in (cumulative) 28 second StarOffice load time; 10 sec GIMP img rotate - 2.4.10-pre11 changed the nature of the VM problems, but most major issues seem to have been fixed by pre14 (certainly 2.4.10 final). pre11 would spin in kswapd & 'somewhere else' (balance classzone?) -- sometimes loading StarOffice 5.2 would take 50% longer due partly to kswapd; no pages were actually swapped out. Fixed by/before pre14. Even in 2.4.10 final, choice of evicted pages is not always good (many more cumulative swapins than ac14 when apps are closed). Performance is otherwise pretty impressive. 2.4.10 streaming IO test: failed (stutters, frequent gaps in playback) 2.4.10 app test: 30020 kB swapped out; 22308 kB swapped in (cumulative) 22 second StarOffice load time; 6-7 sec GIMP img rotate - 2.4.9-ac1* has pretty consistent, functioning VM. Looks like aging is still mildly broken. 
Performance however is quite excellent for the most part; cache contains the "right pages" and what is paged out is "mostly the right pages". Recent 2.4.9-ac (ac14 tested) had the best streaming I/O interactivity; it also outperforms everything else until lots of anonymous pages have to be allocated in swapcache (esp. when you're talking about a large scientific simulation on a HIGHMEM box; see Dirk Wetter's posts from around 12 July 2001 and Marcelo's comments). 2.4.9-ac14 streaming IO test: passed, skip-less playback (ac14-aging patch results identical) 2.4.9-ac14 app test: 30968 kB swapped out; 12900 kB swapped back in 18 second StarOffice load time; 8 sec GIMP img rotate ac14+aging, app test: 31664 kB swapped out; 14604 kB swapped back in 18 second StarOffice load time; 8 sec GIMP img rotate As above, Rik's latest ac14-aging was tested. It, like ac12-aging, has performed pretty well. I'm not sure that it's doing all the right things in detail. For example, plain ac14 swapped out just as many pages, but swapped fewer of them back in when the apps were closed. Inactive daemons loaded at boot time are among the oldest pages on the system; ac14 swapped them entirely out. 2.4.10 and ac14+aging had similar behavior and only paged them a little (ex. out of 2 MB=SIZE, 0.5 MB was still RSS) and hit loaded 'younger' loaded apps (with big RSS) somewhat harder instead. Not sure if that's right; pure aging should presumably page the unused daemons first, but drawing from big, idle hogs might be more fruitful? The aging patch simplifies the code a bit, and I think that's a good thing. ac14-aging easily collapses the dentry and inode caches under load. This works well here, but others might want to check to see if it's _too_ aggressive. Suspect it's okay... Rik's page launder patch for ac12 was also applied to ac14; it failed the streaming IO test. ac14 and ac14+aging were the only tested kernels to pass. No preemptive kernel patches were applied. 
Comments: I dunno what to think about the split VM trees. The traditional 2.4 VM looks quite good in latest 2.4.9-ac, could stand addn'l careful analysis & pruning. I suspect most of the problems relate to inactive lists interacting/balancing badly with each other, but the overall design seems sensible. Much of it is pretty well documented (even *I* can follow it in some kind of coarse sense) & that effort is deeply appreciated. Andrea's classzone approach reduces inactive list complexity, but I remain confused about the classzone design itself. [Have to look at it more; rather new at this.] I mean, I look at 'traditional' 2.4 VM and wonder why it sometimes doesn't work like it should; in contrast, I look at classzone and wonder how/why it manages to work so well. :) Totally IMHO, my VM wishlist for 2.5 would be to see the return of some aspects of 2.4 VM that got nixed. I liked the overall design, although implementation of inactive-lists/anon-pages needs to be made more maintainable. In particular, so-called 'anonymous' pages *have* to be handled in a more sensible way. Dump them in the active list (?), allocate them in a separate fs from what-will-hopefully-become-swapfs-in- 2.5, or *something*. Improved get_swap_page(), swap_out() & associates probably should be on that list somewhere. But things are looking *much* better now -- a real huge 'thank you' is in order. :) And looking forward to testing patches, and 2.5... Best regards to all, Craig Kulesa Univ. of Arizona, Steward Observatory - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Tue, 25 Sep 2001 20:51:43 +0200 From: Andrea Arcangeli <and...@suse.de> To: Craig Kulesa <ckul...@as.arizona.edu> Cc: linux-ker...@vger.kernel.org Subject: Re: 2.4.10 VM vs. 2.4.9-ac14 (+ ac14-aging) Original-Message-ID: <20010925205143.C8350@athlon.random> Original-References: <Pine.LNX.4.33.0109232255250.14107-100...@loke.as.arizona.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109232255250.14107-100000@loke.as.arizona.edu>; from ckulesa@as.arizona.edu on Mon, Sep 24, 2001 at 05:08:49AM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Tue, 25 Sep 2001 18:53:06 GMT Message-ID: <fa.go19h3v.66i7pb@ifi.uio.no> References: <fa.k18fuuv.1uksjgg@ifi.uio.no> Lines: 14 On Mon, Sep 24, 2001 at 05:08:49AM -0700, Craig Kulesa wrote: > 2.4.10 streaming IO test: failed (stutters, frequent gaps in playback) > 2.4.10 app test: 30020 kB swapped out; 22308 kB swapped in (cumulative) > 22 second StarOffice load time; 6-7 sec GIMP img rotate I'd appreciate if you could repeat the test with vm-tweaks-1 applied to see the difference. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!news.tele.dk! small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: loke.as.arizona.edu: ckulesa owned process doing -bs Original-Date: Wed, 26 Sep 2001 06:38:48 -0700 (MST) From: Craig Kulesa <ckul...@as.arizona.edu> To: <linux-ker...@vger.kernel.org> Subject: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) Original-Message-ID: <Pine.LNX.4.33.0109260617450.3929-100000@loke.as.arizona.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 13:41:27 GMT Message-ID: <fa.l1grl8v.fkk201@ifi.uio.no> Lines: 124 As requested, here are a number of tests of the latest VM patches. Tests are described in a previous post, archived here: http://www.uwsg.indiana.edu/hypermail/linux/kernel/0109.3/0033.html Results: 2.4.10 performance is great compared to 2.4.[7-9], but these tests still seem to point out some room for improvement in the 2.4.10 VM tree. 2.4.10 and 2.4.10(+00_vm-tweaks-1) performed similarly. The vm-tweaks patch improved the swap smoothness, but the number of pages swapped out didn't change measurably, nor did the large number of swap-ins. Clogging the system with dirty pages via 'dd' still causes XMMS to skip badly. Let's push the aging/list-order code more by driving the system a bit harder in step d), namely adding mozilla to the common user application test. We will also stream mp3 audio throughout the entire test. 
2.4.10(+00_vm-tweaks-1)
	48 sec StarOffice load time
	28 sec 2560x2560 GIMP image rotation
	82400 KB swapped out, 92148 KB swapped back in

2.4.9-ac14 + aging
	33 sec StarOffice load time
	25 sec GIMP image rotation
	30072 KB swapped out, 22252 KB swapped back in

2.4.9-ac15 + aging + launder
	33 sec StarOffice load time
	24 sec GIMP image rotation
	57556 KB swapped out, 25900 KB swapped back in

'vmstat 1' sessions for these three cases are available at:
	http://loke.as.arizona.edu/~ckulesa/kernel/

2.4.10+ is clearly working a LOT harder to keep dentry and inode caches
in memory, and is swapping out harder to compensate. The ac14/ac15 tree
frees those caches more freely, and doesn't page application working sets
out so readily.

Let's test this statement by not pre-filling the inode and dentry caches
with 'slocate' and performing the same test:

2.4.10(+00_vm-tweaks)
	26 sec StarOffice load time
	24 sec GIMP image rotation
	48332 KB swapped out, 33521 KB swapped back in

2.4.9-ac14 + aging
	32 sec StarOffice load time
	26 sec GIMP image rotation
	37392 KB swapped out, 11952 KB swapped back in

2.4.9-ac15 + aging + launder
	32 sec StarOffice load time
	22 sec GIMP image rotation
	23884 KB swapped out, 10828 KB swapped back in

2.4.10 does much better this time; in particular the StarOffice loading
that was so plagued by swapouts, pressured by dentry/inode caching last
time, went smoothly. But there's still more paging than with
2.4.9-ac1[4-5].

Let's try one more aging/list-order experiment. Instead of creating a
2560x2560 GIMP image first, then loading StarOffice and many other
applications after (to start swapping, and cause GIMP pages to be
candidates for reaping) -- this time let's load StarOffice first and then
create the GIMP image. This should keep the GIMP image at a 'younger' age
and presumably shouldn't page back into memory (rotation should be
faster). StarOffice may swap itself entirely out however.
2.4.10(+00_vm-tweaks)
	25 sec StarOffice load time
	29 sec GIMP image rotation
	64427 KB swapped out, 77422 KB swapped back in

2.4.9-ac14 + aging
	30 sec StarOffice load time
	24 sec GIMP image rotation
	22147 KB swapped out, 8922 KB swapped back in

2.4.9-ac15 + aging + launder
	31 sec StarOffice load time
	21 sec GIMP image rotation
	17204 KB swapped out, 8224 KB swapped back in

The 2.4.10 behavior surprised me. The GIMP pages are younger in memory,
yet the rotation was slowed by swapin & swapout activity -- slower than
before. Plus more StarOffice pages were swapped out, so it had to be paged
back in in order to close the application. I'm puzzled.

The ac14/ac15 behavior was closer to what I expected; the GIMP pages were
young and unswapped, only the earliest StarOffice pages had to be
recalled.

These are samples of rather 'ordinary' loads which 2.4.10 needs some work
handling; the ac15 tree is doing a better job with this particular set
right now (ac15 tree also doesn't skip XMMS with the creation of lots of
dirty pages via 'dd'). But all three kernels tested kept the user
interface relatively responsive, which is an improvement over previous
2.4 releases. Very cool.

A note on page_launder(). ac14 has the smoothest swapping, with small
chunks laundered at a time. ac14+aging and ac15+aging+launder both swap
out huge (10-20 MB) chunks at a time. Admittedly, the user interface is
responsive and XMMS doesn't skip a beat, but most of the 60 MB of actual
swapout in the first test in ac15+stuff came from only THREE lines of
'vmstat 1' output. Otherwise there was no swapout activity.

Best regards, and thanks for the excellent work!

Craig Kulesa
Steward Observatory, Univ. of Arizona
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!news.tele.dk! small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Wed, 26 Sep 2001 16:03:47 +0200 From: Andrea Arcangeli <and...@suse.de> To: Craig Kulesa <ckul...@as.arizona.edu> Cc: linux-ker...@vger.kernel.org Subject: Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) Original-Message-ID: <20010926160347.F27945@athlon.random> Original-References: <Pine.LNX.4.33.0109260617450.3929-100...@loke.as.arizona.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109260617450.3929-100000@loke.as.arizona.edu>; from ckulesa@as.arizona.edu on Wed, Sep 26, 2001 at 06:38:48AM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 14:05:38 GMT Message-ID: <fa.ebjq3mv.1mgutbl@ifi.uio.no> References: <fa.l1grl8v.fkk201@ifi.uio.no> Lines: 14 On Wed, Sep 26, 2001 at 06:38:48AM -0700, Craig Kulesa wrote: > in memory, and is swapping out harder to compensate. The ac14/ac15 tree 2.4.10 is swapping out more also because I don't keep track of which pages are just uptodate on the swap space. This will be fixed as soon as I teach get_swap_page to collect away from the swapcache mapped exclusive swap pages. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!195.158.233.21!news1.ebone.net! news.ebone.net!news1.fra.nextra.com!news2.oke.nextra.no!nextra.com! uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Wed, 26 Sep 2001 11:23:44 -0300 (BRST) From: Rik van Riel <r...@conectiva.com.br> X-X-Sender: <r...@imladris.rielhome.conectiva> To: Andrea Arcangeli <and...@suse.de> Cc: Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org> Subject: Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) In-Reply-To: <20010926160347.F27945@athlon.random> Original-Message-ID: <Pine.LNX.4.33L.0109261123070.19147-100000@imladris.rielhome.conectiva> X-spambait: aardv...@kernelnewbies.org X-spammeplease: aardv...@nl.linux.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 14:25:28 GMT Message-ID: <fa.q4q8otv.1ulom87@ifi.uio.no> References: <fa.ebjq3mv.1mgutbl@ifi.uio.no> Lines: 26 On Wed, 26 Sep 2001, Andrea Arcangeli wrote: > On Wed, Sep 26, 2001 at 06:38:48AM -0700, Craig Kulesa wrote: > > in memory, and is swapping out harder to compensate. The ac14/ac15 tree > > 2.4.10 is swapping out more also because I don't keep track of which > pages are just uptodate on the swap space. This will be fixed as soon > as I teach get_swap_page to collect away from the swapcache mapped > exclusive swap pages. Wouldn't that be easier to do from try_to_swap_out() ? cheers, Rik -- IA64: a worthy successor to i860. 
http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardv...@nl.linux.org (spam digging piggy)
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Wed, 26 Sep 2001 16:49:35 +0200 From: Andrea Arcangeli <and...@suse.de> To: Rik van Riel <r...@conectiva.com.br> Cc: Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org Subject: Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) Original-Message-ID: <20010926164935.J27945@athlon.random> Original-References: <20010926160347.F27...@athlon.random> <Pine.LNX.4.33L.0109261123070.19147-100...@imladris.rielhome.conectiva> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33L.0109261123070.19147-100000@imladris.rielhome.conectiva>; from riel@conectiva.com.br on Wed, Sep 26, 2001 at 11:23:44AM -0300 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 14:52:43 GMT Message-ID: <fa.eb3m4nv.1l0qsbj@ifi.uio.no> References: <fa.q4q8otv.1ulom87@ifi.uio.no> Lines: 26 On Wed, Sep 26, 2001 at 11:23:44AM -0300, Rik van Riel wrote: > On Wed, 26 Sep 2001, Andrea Arcangeli wrote: > > On Wed, Sep 26, 2001 at 06:38:48AM -0700, Craig Kulesa wrote: > > > in memory, and is swapping out harder to compensate. The ac14/ac15 tree > > > > 2.4.10 is swapping out more also because I don't keep track of which > > pages are just uptodate on the swap space. This will be fixed as soon > > as I teach get_swap_page to collect away from the swapcache mapped > > exclusive swap pages. > > Wouldn't that be easier to do from try_to_swap_out() ? Of course that's a possibility but then we'd have to duplicate it in all other get_swap_page callers, see? 
And I think it fits much better hidden in get_swap_page: the semantics of
get_swap_page() are "give the caller a newly allocated swap entry". So
IMHO it is its own business to discard our "optimizations" to generate a
free swap entry in case all swap was already allocated.

Andrea
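[Archive note] The design argument above can be illustrated with a toy
userspace model. This is only a sketch under stated assumptions, not the
kernel's get_swap_page(): every name here (toy_swap_area,
toy_get_swap_slot, the three slot states) is hypothetical. It assumes
that a "cache-only" slot is one kept purely as an optimization because
the page is also up to date in RAM, so the allocator itself may reclaim
it once no free slot is left, and no caller needs to know about that.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SWAP_SLOTS 4

/* Hypothetical slot states; none of these names exist in the kernel. */
enum toy_slot_state {
	TOY_FREE,	/* slot is unallocated */
	TOY_CACHE_ONLY,	/* page is also up to date in RAM; slot kept as an optimization */
	TOY_IN_USE	/* the slot holds the only copy of the data */
};

struct toy_swap_area {
	enum toy_slot_state slot[TOY_SWAP_SLOTS];
};

/*
 * "Give the caller a newly allocated swap entry": prefer a free slot,
 * and only when none is left discard a cache-only slot. Returns the
 * slot index, or -1 if every slot holds the only copy of its data.
 */
static int toy_get_swap_slot(struct toy_swap_area *area)
{
	int i;

	assert(area != NULL);

	/* First pass: a genuinely free slot. */
	for (i = 0; i < TOY_SWAP_SLOTS; i++) {
		if (area->slot[i] == TOY_FREE) {
			area->slot[i] = TOY_IN_USE;
			return i;
		}
	}

	/* Second pass: give up the optimization to make room. */
	for (i = 0; i < TOY_SWAP_SLOTS; i++) {
		if (area->slot[i] == TOY_CACHE_ONLY) {
			area->slot[i] = TOY_IN_USE;
			return i;
		}
	}

	return -1;
}
```

Because the fallback lives inside the allocator, any number of callers
get the same behavior without duplicating the reclaim logic, which is
the point being made about placing it in get_swap_page rather than in
try_to_swap_out().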
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! skynet.be!skynet.be!news.algonet.se!algonet!newsfeed1.uni2.dk! news.net.uni-c.dk!uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: ping.us.dell.com: robert owned process doing -bs Original-Date: Wed, 26 Sep 2001 13:17:29 -0500 (CDT) From: Robert Macaulay <robert_macau...@dell.com> X-X-Sender: <rob...@ping.us.dell.com> Reply-To: Robert Macaulay <robert_macau...@dell.com> To: Andrea Arcangeli <and...@suse.de> cc: Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org> Subject: Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) In-Reply-To: <20010926164935.J27945@athlon.random> Original-Message-ID: <Pine.LNX.4.33.0109261310340.23259-100000@ping.us.dell.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 18:19:09 GMT Message-ID: <fa.kqf6llv.o7oqpt@ifi.uio.no> References: <fa.eb3m4nv.1l0qsbj@ifi.uio.no> Lines: 46 We've tried the 2.4.10 with vmtweaks2 on out machine with 8GB RAM. It was looking good for a while, until it just stopped. Here is what was happening on the machine. I was ftping files into the box at a rate of about 8MB/sec. This continued until all the RAM was in the cache column. This was earlier in the included vmstat output. The I started a dd if=/dev/sde of=/dev/null in a new window. All was looking good until it just stopped. I captured the vmstat below. vmstat continued running for about 1 minute, then it died too. What other info can I provide? 
 2  0  0   4148   3612  36088 7946652   0   0 15488    64 10216 23346   0  11  88
 2  0  1   4148   6424  36100 7944288   0   0 11526    40  7107 15848   0  18  82
 1  1  1   4132   5452  36112 7945444   0   0 11642  6208  7531 16983   0  17  83
 2  1  1   4132   4972  36128 7946100   0   0 14272 11904 10651 24330   0  13  87
 3  0  0   4132   4480  36144 7946588   0   0 13120  6760 11007 25144   0  12  88
 0  1  0   4132   5312  36160 7944964   0   0 15616     0  9935 22793   0  10  89
 0  3  1   4132   2924  36168 7947052   0   0  6727 11010  5049 11226   0  26  74
 0  2  2   4132   2668  36168 7946396   0   0  1666  8598  2230  4598   0  11  89
 0  2  2   4132   3776  36168 7946396   0   0     0     0   159     5   0   0 100
 0  2  2   4132   3768  36168 7946396   0   0     0     0   121     5   0   0 100
 0  2  2   4132   3760  36168 7946396   0   0     0     0   126     4   0   0 100
 0  2  2   4132   3756  36168 7946396   0   0     0     0   139     4   0   0 100
 0  2  2   4132   3756  36168 7946396   0   0     0     0   148     5   0   0 100
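[Archive note] The stall in the vmstat capture above can be picked out
mechanically. The following is a minimal sketch, not part of the original
report. It assumes the 16 columns are, in order, r b w swpd free buff
cache si so bi bo in cs us sy id (the usual procps layout of that era;
the message itself carries no header row). The struct, the function
names, and the "blocked tasks but zero block I/O" heuristic are all
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One vmstat sample; the field order is an assumption, see the note above. */
struct vmstat_row {
	long r, b, w;			/* procs: runnable, blocked, swapped */
	long swpd, free, buff, cache;	/* memory, in KB */
	long si, so;			/* swap in / swap out */
	long bi, bo;			/* blocks in / blocks out */
	long in, cs;			/* interrupts, context switches */
	long us, sy, id;		/* cpu percentages */
};

/* Parse one whitespace-separated row; returns 1 on success, 0 otherwise. */
static int parse_vmstat_row(const char *line, struct vmstat_row *row)
{
	assert(line != NULL && row != NULL);

	return sscanf(line,
		      "%ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
		      &row->r, &row->b, &row->w,
		      &row->swpd, &row->free, &row->buff, &row->cache,
		      &row->si, &row->so, &row->bi, &row->bo,
		      &row->in, &row->cs,
		      &row->us, &row->sy, &row->id) == 16;
}

/* Hypothetical heuristic: tasks are blocked, yet no block I/O is moving. */
static int row_looks_stalled(const struct vmstat_row *row)
{
	return row->b > 0 && row->bi == 0 && row->bo == 0;
}
```

Applied to the capture, the last five rows (two blocked tasks, bi and bo
both zero, CPU 100% idle) would be flagged, while the earlier rows with
heavy bi/bo traffic would not.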
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Wed, 26 Sep 2001 20:36:51 +0200 From: Andrea Arcangeli <and...@suse.de> To: Robert Macaulay <robert_macau...@dell.com> Cc: Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org Subject: Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff) Original-Message-ID: <20010926203651.Q27945@athlon.random> Original-References: <20010926164935.J27...@athlon.random> <Pine.LNX.4.33.0109261310340.23259-100...@ping.us.dell.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="qMm9M+Fa2AknHoGS" Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109261310340.23259-100000@ping.us.dell.com>; from robert_macaulay@dell.com on Wed, Sep 26, 2001 at 01:17:29PM -0500 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Wed, 26 Sep 2001 18:40:32 GMT Message-ID: <fa.ecj24fv.1ngutj7@ifi.uio.no> References: <fa.kqf6llv.o7oqpt@ifi.uio.no> Lines: 167 --qMm9M+Fa2AknHoGS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Sep 26, 2001 at 01:17:29PM -0500, Robert Macaulay wrote: > We've tried the 2.4.10 with vmtweaks2 on out machine with 8GB RAM. It was > looking good for a while, until it just stopped. Here is what was > happening on the machine. > > I was ftping files into the box at a rate of about 8MB/sec. This continued > until all the RAM was in the cache column. This was earlier in the > included vmstat output. The I started a dd if=/dev/sde of=/dev/null in a > new window. > > All was looking good until it just stopped. I captured the vmstat below. 
> vmstat continued running for about 1 minute, then it died too. What other > info can I provide? the best/first info in this case would be sysrq+T along with the system.map. You may want to give a spin also to the patch in the attached email. thanks, Andrea --qMm9M+Fa2AknHoGS Content-Type: message/rfc822 Content-Disposition: inline Date: Wed, 26 Sep 2001 16:45:42 +0200 From: Andrea Arcangeli <and...@suse.de> To: "Oleg A. Yurlov" <k...@spylog.com> Cc: linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Linus Torvalds <torva...@transmeta.com>, Marcelo Tosatti <marc...@conectiva.com.br>, Rik van Riel <r...@conectiva.com.br> Subject: Re: 2.4.10aa1 - 0-order allocation failed. Message-ID: <20010926164542.I27...@athlon.random> References: <1601012257268.20010926180...@spylog.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1601012257268.20010926180...@spylog.com>; from k...@spylog.com on Wed, Sep 26, 2001 at 06:07:48PM +0400 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc On Wed, Sep 26, 2001 at 06:07:48PM +0400, Oleg A. 
Yurlov wrote: > > Hi, Andrea, > > We have next problem on our servers: > > Sep 26 11:22:39 sol kernel: __alloc_pages: 0-order allocation failed (gfp=0x20/0) > Sep 26 11:22:39 sol kernel: f048dd94 e02ab000 00000000 00000020 00000000 00000020 00000020 e298f820 > Sep 26 11:22:39 sol kernel: e298f844 00000001 e030a56c e030a6c4 00000020 00000000 e01382be 00000000 > Sep 26 11:22:39 sol kernel: e013874a e013488c 00000000 e298f820 00000202 e298f898 00000202 00000246 > Sep 26 11:22:39 sol kernel: Call Trace: [put_dirty_page+122/132] [flush_old_exec+234/572] [sys_ustat+212/268] [kill_super+232/352] [unix_gc+394/748] > Sep 26 11:22:39 sol kernel: [Unused_offset+27374/99203] [Unused_offset+12842/99203] [call_spurious_interrupt+14521/27705] [Unused_offset+43342/99203] [call_spurious_interrupt+14615/27705] [call_spurious_interrupt+16483/27705] > Sep 26 11:22:39 sol kernel: [Unused_offset+90704/99203] [ipgre_rcv+233/636] [ipgre_rcv+503/636] [fcntl_getlk+327/624] [do_invalid_TSS+43/96] > Sep 26 11:22:39 sol kernel: __alloc_pages: 0-order allocation failed (gfp=0x20/0) > Sep 26 11:22:39 sol kernel: f048ddd4 e02ab000 00000000 00000020 00000000 00000020 00000020 e298f820 > Sep 26 11:22:39 sol kernel: e298f844 00000001 e030a56c e030a6c4 00000020 00000000 e01382be 00000000 > Sep 26 11:22:39 sol kernel: e013874a e013488c 00000000 e298f820 00000202 e298f898 00000202 00000246 > Sep 26 11:22:39 sol kernel: Call Trace: [put_dirty_page+122/132] [flush_old_exec+234/572] [sys_ustat+212/268] [kill_super+232/352] [unix_gc+394/748] > Sep 26 11:22:39 sol kernel: [Unused_offset+27374/99203] [call_spurious_interrupt+13905/27705] [call_spurious_interrupt+17048/27705] [Unused_offset+90704/99203] [ipgre_rcv+233/636] [ipgre_rcv+503/636] > Sep 26 11:22:39 sol kernel: [fcntl_getlk+327/624] [do_invalid_TSS+43/96] the system.map is wrong but this should be harmless, just a notice (if you do the reverse lookup to find the address and you resolve the right symbols we could make sure of that). 
For driver writers (since it could be on topic with those GFP_ATOMIC
failures): as I suggested to the SG folks, make sure never to use
GFP_ATOMIC in normal kernel context; if you want low latency use GFP_NOIO
instead. GFP_NOIO can schedule (so you must release all the spinlocks
first) but it will never block on I/O, so it will provide a small latency
too, _but_ it will be able to shrink the clean cache, so it is very
unlikely to fail unless you have lots of dirty or mapped cache in RAM.

> Also, we see next in process status:
>
> USER       PID %CPU  %MEM   VSZ        RSS TTY    STAT START  TIME COMMAND
> vz         927  0.0 625.1 43900 4267034752 ?      S    08:10  0:00 hits
> vz        1030  0.0 625.1 43900 4267034752 ?      S    08:11  0:00 hits
> vz        4561  1.3 625.1 45948 4267034724 ?      S    10:48  0:00 hits
> root      4564  0.0   0.0  1460        548 pts/2  S    10:48  0:00 grep hits
> vz        4566  0.0 625.1 45948 4267034724 ?      S    10:48  0:00 hits

Ben sent the fix for this one [Linus, you can find it on l-k if you
weren't cc'ed] (it was a missing check in the tlb shootdown smp fixes)
but it's only a cosmetic issue, so really don't worry about it :)

> After these errors we see some uninterruptable processes (with flag D in
> process status), gdb say that function "fdatasync" called and no returned...
> Soft reboot not work.
>
> Server has 2 CPUs (Pentium III Katmai), 2Gb RAM, 2Gb swap, Hardware
> RAID (Mylex DAC960PTL1 PCI RAID Controller).
>
> Any ideas ?

Yes, you have highmem. Last night I spent one hour on the traces from Bob
(btw, many thanks for the helpful report Bob!) and the first suspect is
the recent GFP_NOHIGHIO logic. Despite Bob's traces not obviously showing
this, I think I can see a potential problem with writepage with regard to
the GFP_NOHIGHIO logic (I just checked that 2.4.9ac15 has the same issue
too, see the CAN_DO_FS definition, so this shouldn't have been introduced
recently).

This should fix it, and please also apply vm-tweaks-2 posted to l-k a few
minutes ago.
--- 2.4.10aa1/mm/vmscan.c	Sun Sep 23 22:16:22 2001
+++ vm/mm/vmscan.c	Wed Sep 26 16:34:30 2001
@@ -392,7 +384,7 @@
 			int (*writepage)(struct page *);
 
 			writepage = page->mapping->a_ops->writepage;
-			if ((gfp_mask & __GFP_FS) && writepage) {
+			if ((gfp_mask & __GFP_FS) && ((gfp_mask & __GFP_HIGHIO) || !PageHighMem(page)) && writepage) {
 				ClearPageDirty(page);
 				page_cache_get(page);
 				spin_unlock(&pagemap_lru_lock);

And if the above patch still doesn't help can you just apply this below
patch to disable the NOHIGHIO logic all together, just to make sure we're
looking in the right place?

--- 2.4.10aa1/mm/highmem.c.~1~	Sun Sep 23 21:11:43 2001
+++ 2.4.10aa1/mm/highmem.c	Wed Sep 26 16:38:34 2001
@@ -328,7 +328,7 @@
 	struct page *page;
 
 repeat_alloc:
-	page = alloc_page(GFP_NOHIGHIO);
+	page = alloc_page(GFP_NOIO);
 	if (page)
 		return page;
 	/*
@@ -366,7 +366,7 @@
 	struct buffer_head *bh;
 
 repeat_alloc:
-	bh = kmem_cache_alloc(bh_cachep, SLAB_NOHIGHIO);
+	bh = kmem_cache_alloc(bh_cachep, SLAB_NOIO);
 	if (bh)
 		return bh;
 	/*

Of course also make sure that a SYSRQ+e or SYSRQ+i doesn't relieve the
machine and allows to kill the D tasks :).

thanks!
Andrea

--qMm9M+Fa2AknHoGS--
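[Archive note] The effect of the one-line vmscan.c change above is easier
to see as a truth table. The following is a userspace sketch only: the
SKETCH_* bit values are made up for illustration (the real __GFP_FS and
__GFP_HIGHIO values live in the kernel headers), and the two helper
functions are hypothetical stand-ins for the old and new condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_GFP_HIGHIO	0x1u
#define SKETCH_GFP_FS		0x2u

/* Condition before the patch: only __GFP_FS and a writepage op are needed. */
static bool may_writepage_old(unsigned int gfp_mask, bool has_writepage)
{
	return (gfp_mask & SKETCH_GFP_FS) && has_writepage;
}

/*
 * Condition after the patch: a highmem page is additionally skipped
 * unless the caller also allows high-memory I/O.
 */
static bool may_writepage_new(unsigned int gfp_mask, bool page_is_highmem,
			      bool has_writepage)
{
	return (gfp_mask & SKETCH_GFP_FS) &&
	       ((gfp_mask & SKETCH_GFP_HIGHIO) || !page_is_highmem) &&
	       has_writepage;
}
```

Under these assumptions the two conditions differ in exactly one case: a
caller that passes the FS bit without the HIGHIO bit and meets a highmem
page. The old test would start writepage on it; the new one leaves the
page alone.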
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 00:13:21 +0200 From: Andrea Arcangeli <and...@suse.de> To: Robert Macaulay <robert_macau...@dell.com> Cc: Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br>, Linus Torvalds <torva...@transmeta.com> Subject: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928001321.L14277@athlon.random> Original-References: <20010926164935.J27...@athlon.random> <Pine.LNX.4.33.0109261310340.23259-100...@ping.us.dell.com> <20010926203651.Q27...@athlon.random> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010926203651.Q27945@athlon.random>; from andrea@suse.de on Wed, Sep 26, 2001 at 08:36:51PM +0200 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Thu, 27 Sep 2001 22:15:08 GMT Message-ID: <fa.eais27v.1lgovri@ifi.uio.no> References: <fa.ecj24fv.1ngutj7@ifi.uio.no> Lines: 104 On Wed, Sep 26, 2001 at 08:36:51PM +0200, Andrea Arcangeli wrote: > On Wed, Sep 26, 2001 at 01:17:29PM -0500, Robert Macaulay wrote: > > We've tried the 2.4.10 with vmtweaks2 on out machine with 8GB RAM. It was > > looking good for a while, until it just stopped. Here is what was > > happening on the machine. > > > > I was ftping files into the box at a rate of about 8MB/sec. This continued > > until all the RAM was in the cache column. This was earlier in the > > included vmstat output. 
> > Then I started a dd if=/dev/sde of=/dev/null in a
> > new window.
> >
> > All was looking good until it just stopped. I captured the vmstat below.
> > vmstat continued running for about 1 minute, then it died too. What other
> > info can I provide?
>
> the best/first info in this case would be sysrq+T along with the system.map.

Ok, both your trace and Bob's trace show the problem clearly. Thanks to
both for the helpful feedback btw.

The deadlock happens because of a collision between write_some_buffers()
and the GFP_NOHIGHIO logic. The deadlock was not introduced in the vm
rewrite; it was introduced with the nohighio logic.

The problem is that we lock a batch of buffers, and later, after they're
all locked, we start writing them via write_locked_buffers.

The deadlock happens in the middle of write_locked_buffers when we hit a
highmem buffer: while allocating with GFP_NOHIGHIO we end up doing
sync_page_buffers on any page that isn't highmem, and such a page can
incidentally hold one of the next buffers in the array, which we
previously locked in write_some_buffers but which isn't in the I/O queue
yet (so we'll wait forever, since it depends on us to be written).

Robert just confirmed that dropping the NOHIGHIO logic fixes the problem.

So the fix is either:

1) to drop the NOHIGHIO logic like my test patch did
2) or to keep track of which buffers we must not wait on while releasing
   ram

I'll try approach 2) in the untested patch below (the nohighio logic makes
sense so I'd prefer not to drop it). Robert and Bob, can you give it a
spin on the highmem boxes and check if it helps? I suggest testing it on
top of 2.4.10+vm-tweaks-2.
--- 2.4.10aa2/fs/buffer.c.~1~	Wed Sep 26 18:45:29 2001
+++ 2.4.10aa2/fs/buffer.c	Fri Sep 28 00:04:44 2001
@@ -194,6 +194,7 @@
 		struct buffer_head * bh = *array++;
 		bh->b_end_io = end_buffer_io_sync;
 		submit_bh(WRITE, bh);
+		clear_bit(BH_Pending_IO, &bh->b_state);
 	} while (--count);
 }

@@ -225,6 +226,7 @@
 		if (atomic_set_buffer_clean(bh)) {
 			__refile_buffer(bh);
 			get_bh(bh);
+			set_bit(BH_Pending_IO, &bh->b_state);
 			array[count++] = bh;
 			if (count < NRSYNC)
 				continue;
@@ -2519,7 +2521,9 @@
 	int tryagain = 1;

 	do {
-		if (buffer_dirty(p) || buffer_locked(p)) {
+		if (unlikely(buffer_pending_IO(p)))
+			tryagain = 0;
+		else if (buffer_dirty(p) || buffer_locked(p)) {
 			if (test_and_set_bit(BH_Wait_IO, &p->b_state)) {
 				if (buffer_dirty(p)) {
 					ll_rw_block(WRITE, 1, &p);
--- 2.4.10aa2/include/linux/fs.h.~1~	Wed Sep 26 18:51:25 2001
+++ 2.4.10aa2/include/linux/fs.h	Fri Sep 28 00:01:54 2001
@@ -217,6 +217,7 @@
 	BH_New,		/* 1 if the buffer is new and not yet written out */
 	BH_Async,	/* 1 if the buffer is under end_buffer_io_async I/O */
 	BH_Wait_IO,	/* 1 if we should throttle on this buffer */
+	BH_Pending_IO,	/* 1 if the buffer is locked but not in the I/O queue yet */

 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -277,6 +278,7 @@
 #define buffer_mapped(bh)	__buffer_state(bh,Mapped)
 #define buffer_new(bh)		__buffer_state(bh,New)
 #define buffer_async(bh)	__buffer_state(bh,Async)
+#define buffer_pending_IO(bh)	__buffer_state(bh,Pending_IO)

 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)

Thanks,
Andrea
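The idea behind the BH_Pending_IO bit can be modelled outside the kernel.
The sketch below is a simplified user-space illustration, assuming a toy
structure in place of struct buffer_head; struct toy_bh, classify_bh() and
count_blocking() are hypothetical names and do not exist in the kernel. It
only models the decision made in the patched sync_page_buffers loop: a
buffer that is locked but not yet queued is skipped rather than waited on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the buffer_head state bits discussed in the patch. */
struct toy_bh {
	bool dirty;
	bool locked;
	bool pending_io;   /* locked by the writer, not yet in the I/O queue */
};

enum bh_action {
	BH_SKIP,       /* pending: do not wait, its owner has not queued it */
	BH_BUSY,       /* dirty or locked: write it out or wait for it */
	BH_FREEABLE    /* clean and unlocked */
};

/* Hypothetical model of the per-buffer decision in the patched loop. */
enum bh_action classify_bh(const struct toy_bh *bh)
{
	if (bh->pending_io)
		return BH_SKIP;
	if (bh->dirty || bh->locked)
		return BH_BUSY;
	return BH_FREEABLE;
}

/* Count the buffers the reclaim path could end up blocking on. */
size_t count_blocking(const struct toy_bh *bhs, size_t n)
{
	size_t blocking = 0;

	for (size_t i = 0; i < n; i++)
		if (classify_bh(&bhs[i]) == BH_BUSY)
			blocking++;
	return blocking;
}
```

In this model a locked buffer with pending_io set is never counted as
blocking, which is the property the patch relies on to avoid waiting on
buffers that the waiter itself still has to submit.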
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Original-Date: Thu, 27 Sep 2001 16:16:11 -0700 (PDT) From: Linus Torvalds <torva...@transmeta.com> To: Andrea Arcangeli <and...@suse.de> cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] In-Reply-To: <20010928001321.L14277@athlon.random> Original-Message-ID: <Pine.LNX.4.33.0109271605550.25667-100000@penguin.transmeta.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Thu, 27 Sep 2001 23:17:51 GMT Message-ID: <fa.ncm16mv.fn2g8d@ifi.uio.no> References: <fa.eais27v.1lgovri@ifi.uio.no> Lines: 45 On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > > The deadlock happens in the middle of write_locked_buffers when we hit > an highmem buffer, so while allocating with GFP_NOHIGHIO we end doing > sync_page_buffers on any page that isn't highmem, but that incidentally is one of the > other next buffers in the array that we previously locked in > write_some_buffers but that aren't in the I/O queue yet (so we'll wait > forever since they depends on us to be written). Interesting, indeed.. 
However, your patch is racy:

> --- 2.4.10aa2/fs/buffer.c.~1~	Wed Sep 26 18:45:29 2001
> +++ 2.4.10aa2/fs/buffer.c	Fri Sep 28 00:04:44 2001
> @@ -194,6 +194,7 @@
>  		struct buffer_head * bh = *array++;
>  		bh->b_end_io = end_buffer_io_sync;
>  		submit_bh(WRITE, bh);
> +		clear_bit(BH_Pending_IO, &bh->b_state);

No way can we clear the bit here, because the submit_bh() may have caused
the buffer to be unlocked and IO to have completed, and it is no longer
"owned" by us - somebody else might have started IO on it and we'd be
clearing the bit for the wrong user.

I would suggest a totally different approach: make the "can we wait for
existing buffer heads" condition a GFP bit the same way the HIGHIO thing
is a GFP bit, and just not set it for GFP_NOHIGHIO.

Thinking about it, I think GFP_NOIO also implies "we must not wait for
other buffers", because that could deadlock for _other_ things too, like
loop and NBD (which use NOIO to make sure that they don't recurse - but
that should also imply not waiting for themselves). The GFP_xxx approach
should fix those deadlocks too.

		Linus
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Original-Date: Thu, 27 Sep 2001 16:18:58 -0700 (PDT) From: Linus Torvalds <torva...@transmeta.com> To: Andrea Arcangeli <and...@suse.de> cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] In-Reply-To: <Pine.LNX.4.33.0109271605550.25667-100000@penguin.transmeta.com> Original-Message-ID: <Pine.LNX.4.33.0109271618120.25667-100000@penguin.transmeta.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Thu, 27 Sep 2001 23:20:51 GMT Message-ID: <fa.nalr6uv.dncg00@ifi.uio.no> References: <fa.ncm16mv.fn2g8d@ifi.uio.no> Lines: 59 On Thu, 27 Sep 2001, Linus Torvalds wrote: > > Thinking about it, I think GFP_NOIO also implies "we must not wait for > other buffers", because that could deadlock for _other_ things too, like > loop and NBD (which use NOIO to make sure that they don't recurse - but > that should also imply not waiting for themselves). The GFP_xxx approach > should fix those deadlocks too. Ie the patch would be something like the attached.. 
		Linus

------
diff -u --recursive --new-file v2.4.10/linux/fs/buffer.c linux/fs/buffer.c
--- v2.4.10/linux/fs/buffer.c	Wed Sep 26 11:53:42 2001
+++ linux/fs/buffer.c	Thu Sep 27 16:13:47 2001
@@ -2522,7 +2373,7 @@
 				ll_rw_block(WRITE, 1, &p);
 				tryagain = 0;
 			} else if (buffer_locked(p)) {
-				if (gfp_mask & __GFP_WAIT) {
+				if (gfp_mask & __GFP_WAITBUF) {
 					wait_on_buffer(p);
 					tryagain = 1;
 				} else
diff -u --recursive --new-file v2.4.10/linux/include/linux/mm.h linux/include/linux/mm.h
--- v2.4.10/linux/include/linux/mm.h	Sun Sep 23 11:41:01 2001
+++ linux/include/linux/mm.h	Thu Sep 27 16:13:35 2001
@@ -550,16 +550,17 @@
 #define __GFP_IO	0x40	/* Can start low memory physical IO? */
 #define __GFP_HIGHIO	0x80	/* Can start high mem physical IO? */
 #define __GFP_FS	0x100	/* Can call down to low-level FS? */
+#define __GFP_WAITBUF	0x200	/* Can we wait for buffers to complete? */

 #define GFP_NOHIGHIO	(__GFP_HIGH | __GFP_WAIT | __GFP_IO)
 #define GFP_NOIO	(__GFP_HIGH | __GFP_WAIT)
-#define GFP_NOFS	(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO)
+#define GFP_NOFS	(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF)
 #define GFP_ATOMIC	(__GFP_HIGH)
-#define GFP_USER	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_FS)
-#define GFP_HIGHUSER	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_FS | __GFP_HIGHMEM)
-#define GFP_KERNEL	(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_FS)
-#define GFP_NFS		(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_FS)
-#define GFP_KSWAPD	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_FS)
+#define GFP_USER	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)
+#define GFP_HIGHUSER	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS | __GFP_HIGHMEM)
+#define GFP_KERNEL	(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)
+#define GFP_NFS		(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)
+#define GFP_KSWAPD	(             __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)

 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */
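The effect of the proposed __GFP_WAITBUF bit on each allocation mask can
be checked with a few lines of user-space C. This is a sketch under stated
assumptions, not the kernel header: the values of __GFP_IO, __GFP_HIGHIO,
__GFP_FS and __GFP_WAITBUF come from the hunk above, the values used for
__GFP_WAIT and __GFP_HIGH are assumed because the hunk does not show them,
and may_wait_on_buffer() is a hypothetical helper standing in for the
patched test in fs/buffer.c.

```c
#include <stdbool.h>

#define __GFP_WAIT    0x10   /* assumed value, not shown in the hunk */
#define __GFP_HIGH    0x20   /* assumed value, not shown in the hunk */
#define __GFP_IO      0x40
#define __GFP_HIGHIO  0x80
#define __GFP_FS      0x100
#define __GFP_WAITBUF 0x200  /* new bit proposed in the patch */

/* Masks as they read after the patch is applied. */
#define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
#define GFP_NOIO     (__GFP_HIGH | __GFP_WAIT)
#define GFP_NOFS     (__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF)
#define GFP_ATOMIC   (__GFP_HIGH)
#define GFP_KERNEL   (__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)

/*
 * Hypothetical helper: after the patch, waiting on a locked buffer is
 * gated on __GFP_WAITBUF instead of __GFP_WAIT.
 */
bool may_wait_on_buffer(unsigned int gfp_mask)
{
	return (gfp_mask & __GFP_WAITBUF) != 0;
}
```

GFP_NOHIGHIO and GFP_NOIO both carry __GFP_WAIT, so the old test allowed
them to wait on a locked buffer; with the new bit left out of those two
masks, may_wait_on_buffer() is false for them and true for GFP_NOFS and
GFP_KERNEL.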
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 01:37:30 +0200 From: Andrea Arcangeli <and...@suse.de> To: Linus Torvalds <torva...@transmeta.com> Cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928013730.Y14277@athlon.random> Original-References: <Pine.LNX.4.33.0109271605550.25667-100...@penguin.transmeta.com> <Pine.LNX.4.33.0109271618120.25667-100...@penguin.transmeta.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109271618120.25667-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Thu, Sep 27, 2001 at 04:18:58PM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Thu, 27 Sep 2001 23:39:25 GMT Message-ID: <fa.eb2s2nv.1l0ovb3@ifi.uio.no> References: <fa.nalr6uv.dncg00@ifi.uio.no> Lines: 21 On Thu, Sep 27, 2001 at 04:18:58PM -0700, Linus Torvalds wrote: > > On Thu, 27 Sep 2001, Linus Torvalds wrote: > > > > Thinking about it, I think GFP_NOIO also implies "we must not wait for > > other buffers", because that could deadlock for _other_ things too, like > > loop and NBD (which use NOIO to make sure that they don't recurse - but > > that should also imply not waiting for themselves). The GFP_xxx approach > > should fix those deadlocks too. 
>
> Ie the patch would be something like the attached..

well this approach is much less fine-grained... but yes, it would fix the
deadlock.

Andrea
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!148.122.208.68!news2.oke.nextra.no! nextra.com!uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 01:47:20 +0200 From: Andrea Arcangeli <and...@suse.de> To: Linus Torvalds <torva...@transmeta.com> Cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928014720.Z14277@athlon.random> Original-References: <20010928001321.L14...@athlon.random> <Pine.LNX.4.33.0109271605550.25667-100...@penguin.transmeta.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109271605550.25667-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Thu, Sep 27, 2001 at 04:16:11PM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Thu, 27 Sep 2001 23:48:55 GMT Message-ID: <fa.eais2vv.1lgouj0@ifi.uio.no> References: <fa.ncm16mv.fn2g8d@ifi.uio.no> Lines: 34 On Thu, Sep 27, 2001 at 04:16:11PM -0700, Linus Torvalds wrote: > > On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > However, your patch is racy: > > > --- 2.4.10aa2/fs/buffer.c.~1~ Wed Sep 26 18:45:29 2001 > > +++ 2.4.10aa2/fs/buffer.c Fri Sep 28 00:04:44 2001 > > @@ -194,6 +194,7 @@ > > struct buffer_head * bh = *array++; > > bh->b_end_io = end_buffer_io_sync; > > submit_bh(WRITE, bh); > > + clear_bit(BH_Pending_IO, &bh->b_state); > > No way can we 
> clear the bit here, because the submit_bh() may have caused
> the buffer to be unlocked and IO to have completed, and it is no longer
> "owned" by us - somebody else might have started IO on it and we'd be
> clearing the bit for the wrong user.

Moving clear_bit just above submit_bh will fix it (Robert, please make
this change before testing it), because if we block in submit_bh in the
bounce, then we won't deadlock on ourselves because of the pagehighmem
check, and all previous non-pending bh are ok too (only the next ones are
problematic, and they're still marked pending_IO so we can't deadlock on
them).

So you can reconsider my approach: the design of the fix was ok, it was
just a silly implementation error.

Andrea
Path: archiver1.google.com!newsfeed.google.com!sn-xit-02!supernews.com! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Original-Date: Thu, 27 Sep 2001 17:03:49 -0700 (PDT) From: Linus Torvalds <torva...@transmeta.com> To: Andrea Arcangeli <and...@suse.de> cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] In-Reply-To: <20010928014720.Z14277@athlon.random> Original-Message-ID: <Pine.LNX.4.33.0109271700001.32086-100000@penguin.transmeta.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 00:05:53 GMT Message-ID: <fa.na5j5dv.d74hg6@ifi.uio.no> References: <fa.eais2vv.1lgouj0@ifi.uio.no> Lines: 27 On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > > Moving clear_bit just above submit_bh will fix it (please Robert make > this change before testing it), because if we block in submit_bh in the > bounce, then we won't deadlock on ourself because of the pagehighmem > check We won't block on _ourselves_, but we can block on _two_ people doing it, and blocking on each others requests that are blocked waiting on a bounce buffer. Both will have one locked buffer, both will be waiting for the other person unlocking that buffer, and neither will ever make progress. You could clear that bit _after_ the bounce buffer allocation, I suspect. 
But I also suspect that it doesn't matter much, and as I can imagine
similar problems with GFP_NOIO and loopback etc (do you see any reason why
loopback couldn't deadlock on waiting for itself?), I think the GFP_XXX
thing is the proper fix.

		Linus
Path: archiver1.google.com!newsfeed.google.com!sn-xit-02!supernews.com! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 02:08:10 +0200 From: Andrea Arcangeli <and...@suse.de> To: Linus Torvalds <torva...@transmeta.com> Cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928020810.C14277@athlon.random> Original-References: <20010928001321.L14...@athlon.random> <Pine.LNX.4.33.0109271605550.25667-100...@penguin.transmeta.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109271605550.25667-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Thu, Sep 27, 2001 at 04:16:11PM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 00:09:50 GMT Message-ID: <fa.ea2u1vv.1k0uvjm@ifi.uio.no> References: <fa.ncm16mv.fn2g8d@ifi.uio.no> Lines: 27 On Thu, Sep 27, 2001 at 04:16:11PM -0700, Linus Torvalds wrote: > Thinking about it, I think GFP_NOIO also implies "we must not wait for > other buffers", because that could deadlock for _other_ things too, like > loop and NBD (which use NOIO to make sure that they don't recurse - but > that should also imply not waiting for themselves). The GFP_xxx approach > should fix those deadlocks too. I don't understand very well your point about GFP_NOIO, GFP_NOIO is a no brainer, loop/NDB etc.. 
all of them are safe, since GFP_NOIO prevents us from reaching
sync_page_buffers in the first place.

The only tricky case is GFP_NOHIGHIO, which can get there on lowmem pages,
since the pagehighmem logic only protects it against recursing on highmem
pages. So only callers that lock highmem buffers, then non-highmem
buffers, and then start the I/O on the highmem ones are subject to the
highmem deadlock. The only place that does so seems to be
write_some_buffers.

However, I could agree if you're worried that other places do it too; but
if they do, we could teach them to use the pending_IO information as well,
so we could be more fine-grained with my approach.

Andrea
Path: archiver1.google.com!newsfeed.google.com!sn-xit-02!supernews.com! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 02:11:15 +0200 From: Andrea Arcangeli <and...@suse.de> To: Linus Torvalds <torva...@transmeta.com> Cc: Robert Macaulay <robert_macau...@dell.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928021115.D14277@athlon.random> Original-References: <20010928014720.Z14...@athlon.random> <Pine.LNX.4.33.0109271700001.32086-100...@penguin.transmeta.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33.0109271700001.32086-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Thu, Sep 27, 2001 at 05:03:49PM -0700 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 00:12:37 GMT Message-ID: <fa.ea3826v.1k0kvro@ifi.uio.no> References: <fa.na5j5dv.d74hg6@ifi.uio.no> Lines: 38 On Thu, Sep 27, 2001 at 05:03:49PM -0700, Linus Torvalds wrote: > > On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > > > > Moving clear_bit just above submit_bh will fix it (please Robert make > > this change before testing it), because if we block in submit_bh in the > > bounce, then we won't deadlock on ourself because of the pagehighmem > > check > > We won't block on _ourselves_, but we can block on _two_ people doing it, If other people waits for us it's ok (if they waits it means 
they're not using GFP_NOIO and they're also not using GFP_NOHIGHIO).

We cannot wait on two other people doing it, since those would be highmem
pages and the pagehighmem check forbids that.

> and blocking on each others requests that are blocked waiting on a bounce
> buffer. Both will have one locked buffer, both will be waiting for the
> other person unlocking that buffer, and neither will ever make progress.
>
> You could clear that bit _after_ the bounce buffer allocation, I suspect.

I don't think it's necessary.

> But I also suspect that it doesn't matter much, and as I can imagine
> similar problems with GFP_NOIO and loopback etc (do you see any reason why
> loopback couldn't deadlock on waiting for itself?), I think the GFP_XXX
> thing is the proper fix.

GFP_NOIO is a no-brainer, it cannot go wrong; see the other email.

Andrea
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!news-feed.ifi.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Thu, 27 Sep 2001 20:51:42 -0300 (BRST) From: Rik van Riel <r...@conectiva.com.br> X-X-Sender: <r...@imladris.rielhome.conectiva> To: Andrea Arcangeli <and...@suse.de> Cc: Linus Torvalds <torva...@transmeta.com>, Robert Macaulay <robert_macau...@dell.com>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] In-Reply-To: <20010928013730.Y14277@athlon.random> Original-Message-ID: <Pine.LNX.4.33L.0109272050570.19147-100000@imladris.rielhome.conectiva> X-spambait: aardv...@kernelnewbies.org X-spammeplease: aardv...@nl.linux.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 01:23:51 GMT Message-ID: <fa.q3qkolv.1vlkm00@ifi.uio.no> References: <fa.eb2s2nv.1l0ovb3@ifi.uio.no> Lines: 24 On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > well this approch is much less finegrined... I'd consider that a feature. Undocumented subtle stuff tends to break within 6 months, sometimes even due to changes made by the same person who did the original subtle trick. cheers, Rik -- IA64: a worthy successor to i860. 
http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardv...@nl.linux.org (spam digging piggy)
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> Original-Date: Fri, 28 Sep 2001 03:26:55 +0200 From: Andrea Arcangeli <and...@suse.de> To: Rik van Riel <r...@conectiva.com.br> Cc: Linus Torvalds <torva...@transmeta.com>, Robert Macaulay <robert_macau...@dell.com>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] Original-Message-ID: <20010928032655.H14277@athlon.random> Original-References: <20010928013730.Y14...@athlon.random> <Pine.LNX.4.33L.0109272050570.19147-100...@imladris.rielhome.conectiva> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Pine.LNX.4.33L.0109272050570.19147-100000@imladris.rielhome.conectiva>; from riel@conectiva.com.br on Thu, Sep 27, 2001 at 08:51:42PM -0300 X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 01:28:34 GMT Message-ID: <fa.ec3a2fv.1m0mv3j@ifi.uio.no> References: <fa.q3qkolv.1vlkm00@ifi.uio.no> Lines: 18 On Thu, Sep 27, 2001 at 08:51:42PM -0300, Rik van Riel wrote: > On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > > > well this approch is much less finegrined... > > I'd consider that a feature. Undocumented subtle stuff > tends to break within 6 months, sometimes even due to > changes made by the same person who did the original > subtle trick. by the same argument we could drop the NOHIGHIO logic in first place. 
Andrea
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist Newsgroups: fa.linux.kernel Return-Path: <linux-kernel-ow...@vger.kernel.org> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Original-Date: Thu, 27 Sep 2001 18:28:48 -0700 (PDT) From: Linus Torvalds <torva...@transmeta.com> To: Rik van Riel <r...@conectiva.com.br> cc: Andrea Arcangeli <and...@suse.de>, Robert Macaulay <robert_macau...@dell.com>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br> Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)] In-Reply-To: <Pine.LNX.4.33L.0109272050570.19147-100000@imladris.rielhome.conectiva> Original-Message-ID: <Pine.LNX.4.33.0109271827001.3101-100000@penguin.transmeta.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-ow...@vger.kernel.org Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Organization: Internet mailing list Date: Fri, 28 Sep 2001 01:30:42 GMT Message-ID: <fa.o9q3d7v.g485rd@ifi.uio.no> References: <fa.q3qkolv.1vlkm00@ifi.uio.no> Lines: 15 Note that if you do end up applying my suggested patch for testing, you also need to add __GFP_WAITBUF to SLAB_LEVEL_MASK in <linux/slab.h> otherwise the slab allocator will be really unhappy the first time it sees any normal allocation.. (Ie very early at boot). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
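Why the slab allocator objects to the new bit can be illustrated with a
mask check. The sketch below is a user-space model under assumptions: the
composition of the pre-patch SLAB_LEVEL_MASK, the values of __GFP_WAIT and
__GFP_HIGH, and the helper slab_flags_rejected() are all assumed for
illustration and are not taken from the thread or copied from
<linux/slab.h>. It only shows the general pattern of rejecting any flag
bit that falls outside an allow-list mask.

```c
#include <stdbool.h>

#define __GFP_WAIT    0x10   /* assumed value */
#define __GFP_HIGH    0x20   /* assumed value */
#define __GFP_IO      0x40
#define __GFP_HIGHIO  0x80
#define __GFP_FS      0x100
#define __GFP_WAITBUF 0x200  /* bit added by the suggested patch */

/* Assumed shape of the allow-list before and after adding the new bit. */
#define SLAB_LEVEL_MASK_OLD \
	(__GFP_WAIT | __GFP_HIGH | __GFP_IO | __GFP_HIGHIO | __GFP_FS)
#define SLAB_LEVEL_MASK_NEW (SLAB_LEVEL_MASK_OLD | __GFP_WAITBUF)

/* GFP_KERNEL and GFP_NOIO as defined once the suggested patch is applied. */
#define GFP_KERNEL \
	(__GFP_HIGH | __GFP_WAIT | __GFP_IO | __GFP_HIGHIO | __GFP_WAITBUF | __GFP_FS)
#define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)

/*
 * Hypothetical stand-in for the allocator's sanity check: returns true
 * when flags contain a bit that the level mask does not allow.
 */
bool slab_flags_rejected(unsigned int flags, unsigned int level_mask)
{
	return (flags & ~level_mask) != 0;
}
```

Under these assumptions every ordinary GFP_KERNEL allocation is rejected
by the old mask as soon as __GFP_WAITBUF is part of GFP_KERNEL, which is
consistent with a failure very early at boot; extending the mask with the
new bit makes the same allocation acceptable again.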
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: ping.us.dell.com: robert owned process doing -bs
Original-Date: Thu, 27 Sep 2001 21:12:25 -0500 (CDT)
From: Robert Macaulay <robert_macau...@dell.com>
X-X-Sender: <rob...@ping.us.dell.com>
Reply-To: Robert Macaulay <robert_macau...@dell.com>
To: Andrea Arcangeli <and...@suse.de>
cc: Linus Torvalds <torva...@transmeta.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br>
Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)]
In-Reply-To: <20010928014720.Z14277@athlon.random>
Original-Message-ID: <Pine.LNX.4.33.0109272108400.29056-100000@ping.us.dell.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Sep 2001 02:14:56 GMT
Message-ID: <fa.kpv6ktv.tn0q1k@ifi.uio.no>
References: <fa.eais2vv.1lgouj0@ifi.uio.no>
Lines: 24

On Thu, 27 Sep 2001, Andrea Arcangeli wrote:
>
> Moving clear_bit just above submit_bh will fix it (please Robert make
> this change before testing it), because if we block in submit_bh in the
> bounce, then we won't deadlock on ourself because of the pagehighmem
> check, and all previous non-pending bh are ok too, (only the next are
> problematic, and they're still marked pending_IO so we can't deadlock on
> them).
>

It worked. The box did not freeze. We can try Linus' patch as well if needed. I had actually applied it and rebooted before the warning, and as predicted, it froze very early in the boot process.

Thanks Andrea.
I'll see if we can repeat the 0-page alloc again.

Robert
Path: archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu! news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no! ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: Fri, 28 Sep 2001 04:24:17 +0200
From: Andrea Arcangeli <and...@suse.de>
To: Robert Macaulay <robert_macau...@dell.com>
Cc: Linus Torvalds <torva...@transmeta.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, linux-ker...@vger.kernel.org, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br>
Subject: Re: highmem deadlock fix [was Re: VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)]
Original-Message-ID: <20010928042417.J14277@athlon.random>
Original-References: <20010928014720.Z14...@athlon.random> <Pine.LNX.4.33.0109272108400.29056-100...@ping.us.dell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.33.0109272108400.29056-100000@ping.us.dell.com>; from robert_macaulay@dell.com on Thu, Sep 27, 2001 at 09:12:25PM -0500
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Sep 2001 02:26:02 GMT
Message-ID: <fa.ea3g2fv.1k0sv3j@ifi.uio.no>
References: <fa.kpv6ktv.tn0q1k@ifi.uio.no>
Lines: 19

On Thu, Sep 27, 2001 at 09:12:25PM -0500, Robert Macaulay wrote:
> Thanks Andrea. I'll see if we can repeat the 0-page alloc again.

Ok, it is possible the 0-page alloc failed because NOHIGHIO was disabled; Linus's fix, being less fine-grained than mine, could also lead more easily to 0-page alloc failures. However, a failing bounce allocation is not important since we have the reserved pool for those allocations. Not having to use the reserved pool only allows a higher amount of I/O in parallel.
This is why I said we could have dropped the NOHIGHIO logic in the first place if we wanted to go the non-fine-grained way.

Andrea
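[Editor's illustration] A minimal user-space sketch of the reserved-pool idea described above: try the normal allocator first, and only fall back to a small pre-filled reserve when that fails. Everything here is hypothetical (the `sketch_` names, the pool size, the `normal_fails` switch that simulates an allocation failure) and is not the kernel's bounce-buffer code; it only shows why a failed normal allocation is survivable while the finite reserve caps how many buffers can be in flight at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_POOL_SIZE 4    /* hypothetical size of the reserve */
#define SKETCH_PAGE_SIZE 4096 /* hypothetical buffer size */

struct sketch_pool {
    void *pages[SKETCH_POOL_SIZE];
    int nr_free;
};

/* Fill the reserve up front. Returns 0 on success, -1 if malloc fails. */
static int sketch_pool_init(struct sketch_pool *pool)
{
    pool->nr_free = 0;
    for (int i = 0; i < SKETCH_POOL_SIZE; i++) {
        void *p = malloc(SKETCH_PAGE_SIZE);
        if (!p)
            return -1;
        pool->pages[pool->nr_free++] = p;
    }
    return 0;
}

/*
 * Allocate a bounce buffer. normal_fails simulates the regular allocator
 * failing; *from_reserve reports which path supplied the buffer.
 */
static void *sketch_alloc_bounce(struct sketch_pool *pool, int normal_fails,
                                 int *from_reserve)
{
    void *p = normal_fails ? NULL : malloc(SKETCH_PAGE_SIZE);

    *from_reserve = 0;
    if (p)
        return p;
    if (pool->nr_free > 0) {
        *from_reserve = 1;
        return pool->pages[--pool->nr_free];
    }
    /* Reserve exhausted: a real implementation would wait for in-flight
     * I/O to complete and hand its buffers back before retrying. */
    return NULL;
}

/* Return a buffer: refill the reserve first, free anything beyond it. */
static void sketch_free_bounce(struct sketch_pool *pool, void *page)
{
    assert(page != NULL);
    if (pool->nr_free < SKETCH_POOL_SIZE)
        pool->pages[pool->nr_free++] = page;
    else
        free(page);
}
```

When the normal path succeeds, the reserve is untouched and any number of buffers can be outstanding; when it fails, at most `SKETCH_POOL_SIZE` buffers can be in use at a time, which is the sense in which avoiding the reserve "only allows a higher amount of I/O in parallel".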
Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu! newsfeeds.belnet.be!news.belnet.be!news.tele.dk!small.news.tele.dk! 194.213.69.151!news.algonet.se!algonet!newsfeed1.uni2.dk!news.net.uni-c.dk! uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: ping.us.dell.com: robert owned process doing -bs
Original-Date: Fri, 28 Sep 2001 09:02:18 -0500 (CDT)
From: Robert Macaulay <robert_macau...@dell.com>
X-X-Sender: <rob...@ping.us.dell.com>
Reply-To: Robert Macaulay <robert_macau...@dell.com>
To: Andrea Arcangeli <and...@suse.de>
cc: Linus Torvalds <torva...@transmeta.com>, Rik van Riel <r...@conectiva.com.br>, Craig Kulesa <ckul...@as.arizona.edu>, <linux-ker...@vger.kernel.org>, Bob Matthews <bmatth...@redhat.com>, Marcelo Tosatti <marc...@conectiva.com.br>
Subject: LILO causes segmentation fault and panic [was Re: highmem deadlock fix]
In-Reply-To: <20010928042417.J14277@athlon.random>
Original-Message-ID: <Pine.LNX.4.33.0109280859280.30080-100000@ping.us.dell.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Sep 2001 14:05:03 GMT
Message-ID: <fa.klfim5v.s7gr9n@ifi.uio.no>
References: <fa.ea3g2fv.1k0sv3j@ifi.uio.no>
Lines: 73

Not sure if this is 100% related to the latest patch, but after we had our 0-order allocation failures, I ran lilo to switch to a new kernel, and it panicked. It's never done this before, so it might be related.

Robert

ksymoops 2.4.3 on i686 2.4.10-aaStuff. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.10-aaStuff/ (default)
     -m linux-2.4.10/System.map (specified)

Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01cf820, System.map says c015a2b0.
Ignoring ksyms_base entry
invalid operand: 0000
CPU:    3
EIP:    0010:[<c012fb27>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: cc403648   ebx: cc403638   ecx: 000003f0   edx: 00000000
esi: cc403638   edi: 000003f0   ebp: 00000246   esp: e3659ea0
ds: 0018   es: 0018   ss: 0018
Process lilo (pid: 6666, stackpage=e3659000)
Stack: 00000000 e3658000 e3658000 00001a0b c024d6bb c011bec6 00001a0b cc403638
       cc403640 cc403638 00000246 c0130191 cc403638 000003f0 e3658000 00001a0b
       c024d6bb 00001a0b c0340018 ea03f400 fffffff4 c03415a0 00000000 f89166b6
Call Trace: [<c011bec6>] [<c0130191>] [<f89166b6>] [<c0193332>] [<c0141936>] [<c0138656>]
   [<c014ce9c>] [<c013855d>] [<c014481e>] [<c0138894>] [<c010710b>
Code: 0f 0b f7 c7 00 10 00 00 0f 85 10 02 00 00 b8 00 e0 ff ff 21

>>EIP; c012fb26 <kmem_cache_grow+16/240>   <=====
Trace; c011bec6 <sys_waitpid+16/20>
Trace; c0130190 <kmalloc+150/180>
Trace; f89166b6 <[ide-cd]ide_cdrom_open+36/80>
Trace; c0193332 <ide_open+d2/100>
Trace; c0141936 <blkdev_open+76/d0>
Trace; c0138656 <dentry_open+e6/190>
Trace; c014ce9c <dput+1c/160>
Trace; c013855c <filp_open+4c/60>
Trace; c014481e <getname+5e/a0>
Trace; c0138894 <sys_open+34/c0>
Trace; c010710a <system_call+32/38>
Code;  c012fb26 <kmem_cache_grow+16/240>
00000000 <_EIP>:
Code;  c012fb26 <kmem_cache_grow+16/240>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012fb28 <kmem_cache_grow+18/240>
   2:   f7 c7 00 10 00 00         test   $0x1000,%edi
Code;  c012fb2e <kmem_cache_grow+1e/240>
   8:   0f 85 10 02 00 00         jne    21e <_EIP+0x21e> c012fd44 <kmem_cache_grow+234/240>
Code;  c012fb34 <kmem_cache_grow+24/240>
   e:   b8 00 e0 ff ff            mov    $0xffffe000,%eax
Code;  c012fb38 <kmem_cache_grow+28/240>
  13:   21 00                     and    %eax,(%eax)

1 warning issued. Results may not be reliable.