From: Rik van Riel <r...@conectiva.com.br>
Subject: TODO list for new VM
Date: 2000/09/16
Message-ID: <linux.kernel.Pine.LNX.4.21.0009160544000.1519-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 670474882
Approved: n...@nntp-server.caltech.edu
X-To: linux...@kvack.org
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux-ker...@vger.kernel.org, Linus Torvalds <torva...@transmeta.com>, Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel

Hi,

Here is the TODO list for the new VM. The only thing really needed for 2.4 is the OOM handler; the page->mapping->flush() callback is also really wanted by the journaling filesystem folks.

The rest are mostly extras that would be nice. These things won't be pushed for inclusion unless they turn out to be really trivial to implement, give high performance in the cases they're supposed to affect, and have a highly localised influence... (sorry folks, but for 2.4 I'll be really conservative)

---> TODO list for the new VM <---

for kernel 2.4, necessary:
- out of memory handling [integrate the OOM killer, 10 minutes work]

for kernel 2.4, really wanted:
- page->mapping->flush() callback in page_launder(), for easier integration with journaling filesystems and maybe the network filesystems [about 30 minutes of work on the VM side]

for kernel 2.4, wanted:
- include Ben LaHaise's code, which moves readahead to the VMA level; this way we can do streaming swap IO, complete with drop_behind()
- code to make the "knee" smoother: currently the system keeps eating memory from the cache up to a certain point and then starts to swap a lot; it would be nice to smooth this curve a bit
- thrashing control, maybe process suspension with some forced swapping?

for kernel 2.5:
- physical->virtual reverse mapping, so we can do much better page aging with fewer CPU usage spikes
- better IO clustering for swap (and filesystem) IO
- move all the global VM variables, lists, etc. into the pgdat struct for better NUMA scalability
- (maybe) some QoS things, as far as they are major improvements with minor intrusion

regards,

Rik
--
"What you're running that piece of s*** Gnome?!?!"
                -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/    http://www.surriel.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
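To make the page->mapping->flush() item concrete, here is a minimal user-space sketch of how a page_launder()-style loop might hand a dirty page to an optional per-mapping hook and fall back to a default writeback path otherwise. Everything in it is an assumption for illustration: the struct layouts, `mapping_ops`, `generic_writepage()`, `journal_flush()` and `launder_one()` are hypothetical and do not reproduce the 2.4 kernel's definitions or the callback's eventual signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model: these structs only mimic the rough
 * shape of the kernel's page and mapping objects. */
struct page;

struct mapping_ops {
    /* Optional filesystem hook (hypothetical): returns 0 once the
     * filesystem has taken responsibility for cleaning the page. */
    int (*flush)(struct page *page);
};

struct address_space {
    const struct mapping_ops *ops;
};

struct page {
    struct address_space *mapping;
    int dirty;
    int written_generic;  /* set when the default path wrote the page */
};

/* Stand-in for the VM's default writeback path. */
static int generic_writepage(struct page *page)
{
    page->written_generic = 1;
    page->dirty = 0;
    return 0;
}

/* Example hook for a hypothetical journaling filesystem. A real one
 * might commit its journal first; here it only marks the page clean. */
static int journal_flush(struct page *page)
{
    page->dirty = 0;
    return 0;
}

static const struct mapping_ops journal_ops = { journal_flush };

/* One step of a page_launder()-style loop: prefer the filesystem's
 * flush() hook when one is registered, else use the default path. */
static int launder_one(struct page *page)
{
    if (!page->dirty)
        return 0;
    if (page->mapping && page->mapping->ops && page->mapping->ops->flush)
        return page->mapping->ops->flush(page);
    return generic_writepage(page);
}
```

The point of the indirection is that the VM stays ignorant of journaling rules: it only asks the mapping to clean the page, and a filesystem that needs ordering guarantees can enforce them inside its own hook.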
From: Rik van Riel <r...@conectiva.com.br>
Subject: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021447430.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743861
X-To: linux-ker...@vger.kernel.org
X-cc: linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>, Linus Torvalds <torva...@transmeta.com>

[MM TODO list, updated for October 2000]
---

Here is the TODO list for the new VM. The only things really needed for 2.4 are the OOM handler and a fix for the highmem deadlock. The page->mapping->flush() callback is really wanted by the journaling filesystem folks.

The rest are mostly extras that would be nice. These things won't be pushed for inclusion unless they turn out to be really trivial to implement, give high performance in the cases they're supposed to affect, and have a highly localised influence... (sorry folks, but for 2.4 I'll be really conservative)

---> TODO list for the new VM <---

for kernel 2.4, necessary:
- out of memory handling [integrate the OOM killer, 10 minutes work]
- fix the highmem deadlock, where the swapper cannot create low memory bounce buffers OR swap out low memory because it has consumed all resources [old bug, already reported with 2.4.0-test6, probably before]

for kernel 2.4, really wanted:
- page->mapping->flush() callback in page_launder(), for easier integration with journaling filesystems and maybe the network filesystems [about 30 minutes of work on the VM side]

for kernel 2.4, wanted:
- maybe rebalance the swapper a bit... we do page aging now, so maybe refill_inactive_scan() / shm_swap() and swap_out() need to be rebalanced a bit

for kernel 2.5 (maybe available as a patch for 2.4???):
- physical->virtual reverse mapping, so we can do much better page aging with fewer CPU usage spikes
- better IO clustering for swap (and filesystem) IO
- move all the global VM variables, lists, etc. into the pgdat struct for better NUMA scalability
- (maybe) some QoS things, as far as they are major improvements with minor intrusion
- thrashing control, maybe process suspension with some forced swapping?
- include Ben LaHaise's code, which moves readahead to the VMA level; this way we can do streaming swap IO, complete with drop_behind()

regards,

Rik
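The physical->virtual reverse mapping item is easier to picture with a toy data structure. The sketch below is a hypothetical user-space model, not kernel code: each page keeps pointers to the page table entries that map it, so page aging can test and clear referenced bits by visiting only those entries instead of scanning every process's page tables. `page_add_rmap()`, `page_referenced()`, `age_page()`, the fixed `MAX_MAPPERS` bound and the aging rule (add 3 when used, halve when idle) are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_MAPPERS 4  /* hypothetical fixed bound, keeps the sketch simple */

struct pte {
    int referenced;  /* models the hardware "accessed" bit */
};

struct page {
    struct pte *mappers[MAX_MAPPERS];  /* physical -> virtual links */
    int nr_mappers;
    int age;
};

/* Record that a page table entry maps this page (e.g. at fault time).
 * Returns -1 when the fixed-size table is full. */
static int page_add_rmap(struct page *page, struct pte *pte)
{
    if (page->nr_mappers == MAX_MAPPERS)
        return -1;
    page->mappers[page->nr_mappers++] = pte;
    return 0;
}

/* Visit only this page's mappers: test and clear each referenced bit,
 * returning how many mappings had used the page since the last scan. */
static int page_referenced(struct page *page)
{
    int i, count = 0;

    for (i = 0; i < page->nr_mappers; i++) {
        if (page->mappers[i]->referenced) {
            page->mappers[i]->referenced = 0;
            count++;
        }
    }
    return count;
}

/* Aging rule assumed for the sketch: age up when used, halve when idle. */
static void age_page(struct page *page)
{
    if (page_referenced(page))
        page->age += 3;
    else
        page->age /= 2;
}
```

The cost of one aging step is proportional to the number of mappings of that page, which is why a reverse map promises smoother CPU usage than periodically walking all page tables.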
From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.10.10010021117540.828-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 676743872
X-To: Rik van Riel <r...@conectiva.com.br>
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>

Why do you apparently ignore the fact that page-out write-back performance is horribly crappy because it always starts out doing synchronous writes?

I pointed out previously in a private email that page_launder() must be buggy as it stands now, and you seem to have completely ignored that part (and the test program that shows 1MB/s writeout speeds due to it).

The whole _point_ of the new VM was performance. Without that, the new VM is pointless, and discussing TODO features is equally pointless.

                Linus
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021524140.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743875
X-To: Linus Torvalds <torva...@transmeta.com>
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>

On Mon, 2 Oct 2000, Linus Torvalds wrote:

> Why do you apparently ignore the fact that page-out write-back
> performance is horribly crappy because it always starts out
> doing synchronous writes?

Because it is fixed in the patch I mailed yesterday?

regards,

Rik
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021531360.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743870
X-To: Linus Torvalds <torva...@transmeta.com>
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>

On Mon, 2 Oct 2000, Rik van Riel wrote:
> On Mon, 2 Oct 2000, Linus Torvalds wrote:
>
> > Why do you apparently ignore the fact that page-out write-back
> > performance is horribly crappy because it always starts out
> > doing synchronous writes?
>
> Because it is fixed in the patch I mailed yesterday?

One small warning though. Please don't apply that patch yet, because I fixed 3 more small problems today. I'll send you an updated patch...

- the compile warnings are fixed
- in try_to_free_pages(), we forgot to set PF_MEMALLOC in current->flags (oops)
- in grow_buffers(), in case we cannot get a buffer head, we must unlock the page

A patch against 2.4.0-test9-pre8 with these 3 changes will be on its way once I've tested it a bit...

regards,

Rik
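The missing PF_MEMALLOC in try_to_free_pages() is an instance of a common pattern: the reclaiming task marks itself so that any allocation it makes while freeing memory does not recurse back into reclaim. Below is a hypothetical user-space model of that pattern; `struct task`, `current_task`, the `_sim` functions, the depth counters and the flag value are all assumptions for illustration, not 2.4 code.

```c
#include <assert.h>

/* Flag value chosen for this sketch only. */
#define PF_MEMALLOC 0x00000800UL

struct task {
    unsigned long flags;
};

static struct task current_task;  /* stands in for the kernel's current */
static int reclaim_depth;         /* current nesting of reclaim calls */
static int max_reclaim_depth;     /* deepest nesting observed */

static int alloc_page_sim(void);

/* Model of try_to_free_pages(): set PF_MEMALLOC while reclaiming so
 * that allocations made by the reclaim code itself do not re-enter
 * reclaim. The previous state of the flag is restored on exit. */
static int try_to_free_pages_sim(void)
{
    unsigned long saved = current_task.flags & PF_MEMALLOC;
    int freed;

    current_task.flags |= PF_MEMALLOC;
    if (++reclaim_depth > max_reclaim_depth)
        max_reclaim_depth = reclaim_depth;
    freed = alloc_page_sim();  /* e.g. writeback needing a buffer */
    reclaim_depth--;
    current_task.flags = (current_task.flags & ~PF_MEMALLOC) | saved;
    return freed;
}

/* Model allocator under memory pressure: ordinary callers enter
 * reclaim; PF_MEMALLOC callers are served from a reserve instead. */
static int alloc_page_sim(void)
{
    if (!(current_task.flags & PF_MEMALLOC))
        return try_to_free_pages_sim();
    return 1;
}
```

If the `|= PF_MEMALLOC` line is removed, the two functions call each other without bound, which is roughly the kind of recursion the missing flag invites. Saving and restoring the old flag state, rather than clearing it unconditionally, keeps the helper safe to call from a context that already had PF_MEMALLOC set.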
From: Matthew Dillon <dil...@apollo.backplane.com>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/04
Message-ID: <linux.kernel.200010050108.SAA83892@apollo.backplane.com>#1/1
X-Deja-AN: 677710359
X-To: Rik van Riel <r...@conectiva.com.br>
X-Cc: Linus Torvalds <torva...@transmeta.com>, linux-ker...@vger.kernel.org, linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>

:On Mon, 2 Oct 2000, Rik van Riel wrote:
:> On Mon, 2 Oct 2000, Linus Torvalds wrote:
:>
:> > Why do you apparently ignore the fact that page-out write-back
:> > performance is horribly crappy because it always starts out
:> > doing synchronous writes?
:>
:> Because it is fixed in the patch I mailed yesterday?
:
:One small warning though. Please don't apply that patch
:yet because I fixed 3 more small problems today. I'll
:send you an updated patch...
:...
:regards,
:
:Rik

My experience with FreeBSD's asynchronous paging is that you have to carefully limit the number of I/Os you queue at once. More specifically, you have to limit the seek load that the async pageouts place on the system.

From the point of view of user processes in the system, the performance curve looks like a bell, while paging performance looks like a log curve (increasing performance with diminishing returns). If you queue too few pages (degenerating into synchronous paging), you get low paging performance and high user process performance, but you can't clean pages fast enough in a heavily loaded system. If you queue too many pages at once, you get high paging performance (with diminishing returns) and low user process performance, due to the seek load you've placed on the disk. Excessive seeking from pageouts will ruin the disk's performance from the point of view of other processes in the system.

FreeBSD has a sysctl variable called vm.max_page_launder which limits the number of pages the pageout daemon will queue for I/O at once. The default is 32. Values between 16 and 32 were found to hit the sweet spot of the curve best. Values lower than 16 reduced system performance because potentially contiguous pageouts would get split (causing more seeking rather than less when mixed with I/O initiated by user processes), and values higher than 32 reduced user process performance due to the additional seeking from the queued pageouts.

The sysadmin can adjust the value to effectively give paging more or less priority. A smaller number reduces paging performance but increases system performance for other processes (though anything less than 4 will reduce performance for everyone). A higher number increases paging performance at the cost of system performance for other processes. Virtually all FreeBSD installations that I know about leave the sysctl variable alone.

Note that the performance bell holds true whether you sort disk requests or not; the whole bell simply moves up or down on the graph.

There are a number of things that can be done to mitigate the seeking issue, which I discussed with Rik a few months ago. The gist of it, though, is that there is a trade-off between page-in and page-out performance depending on how you cluster swap allocation. FreeBSD clusters swap allocations to optimize page-out performance at the cost of page-in performance, and that seems to work very well under heavy system loads.

                                        -Matt
                                        Matthew Dillon
                                        <dil...@backplane.com>
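Dillon's advice to bound the number of queued pageouts can be sketched as one pageout pass with a per-pass cap. This is a minimal user-space model under stated assumptions: the page states, `launder_pass()` and the `max_page_launder` variable are hypothetical names (the variable only mirrors the sysctl's name and the default of 32 Dillon quotes), not FreeBSD's actual pageout daemon code. Dirty pages beyond the cap are simply left for a later pass.

```c
#include <assert.h>
#include <stddef.h>

enum page_state { PG_CLEAN, PG_DIRTY, PG_WRITEBACK };

/* Hypothetical tunable named after FreeBSD's vm.max_page_launder;
 * 32 is the default quoted in the mail. */
static int max_page_launder = 32;

/* One pass of a hypothetical pageout daemon over the inactive list.
 * Clean pages are counted as immediately reclaimable; dirty pages get
 * an async write queued, but at most max_page_launder per pass, so the
 * disk is not flooded with seeks. Returns the number of writes queued. */
static int launder_pass(enum page_state *pages, size_t n, size_t *reclaimable)
{
    int queued = 0;
    size_t i;

    *reclaimable = 0;
    for (i = 0; i < n; i++) {
        switch (pages[i]) {
        case PG_CLEAN:
            (*reclaimable)++;
            break;
        case PG_DIRTY:
            if (queued < max_page_launder) {
                pages[i] = PG_WRITEBACK;  /* async I/O started */
                queued++;
            }
            break;  /* over the cap: leave it for a later pass */
        case PG_WRITEBACK:
            break;  /* already in flight */
        }
    }
    return queued;
}
```

With 50 dirty pages and the default cap, the first pass queues 32 writes and the next pass queues the remaining 18. Lowering the cap trades page-cleaning throughput for fewer outstanding seeks, which is the knob the sysctl exposes; the model does not try to capture the seek costs themselves.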