Message-ID: <3D3500AA.131CE2EB@zip.com.au>
Date: Wed, 17 Jul 2002 07:30:06 +0200
From: Andrew Morton <a...@zip.com.au>
X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.19-pre9 i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: [patch 1/13] minimal rmap
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: robo...@news.nic.it
X-Mailing-List: linux-kernel@vger.kernel.org
Approved: robo...@news.nic.it (1.20)
NNTP-Posting-Host: a.27.anti-phl.bofh.it
Newsgroups: linux.kernel
Organization: linux.*_mail_to_news_unidirectional_gateway
Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
	cyclone.bc.net!news.mailgate.org!bofh.it!robomod
X-Original-Cc: lkml <linux-ker...@vger.kernel.org>
X-Original-Date: Tue, 16 Jul 2002 22:29:14 -0700
X-Original-Sender: linux-kernel-ow...@vger.kernel.org
X-Original-To: Linus Torvalds <torva...@transmeta.com>
Lines: 1759

This is the "minimal rmap" patch, written by Rik, ported to 2.5 by
Craig Kulesa.

Basically,

before:

	When the page reclaim code decides that it has scanned too many
	unreclaimable pages on the LRU it does a scan of process virtual
	address spaces for pages to add to swapcache. ptes pointing at
	the page are unmapped as the scan proceeds. When all ptes referring
	to a page have been unmapped and it has been written to swap the
	page is reclaimable.

after:

	When an anonymous page is encountered on the tail of the LRU we use
	the rmap to see whether it has been referenced lately. If it has
	not, add it to swapcache. When the page is again encountered on
	the LRU, if it is still unreferenced then try to unmap all ptes
	which refer to it in one hit, and if it is clean (ie: on swap)
	then free it.

The rest of the VM - list management, the classzone concept, etc. -
remains unchanged.

There are a number of things which the per-page pte chain could be
used for. Bill Irwin has identified the following.

(1)  page replacement no longer goes around randomly unmapping things

(2)  referenced bits are more accurate because there aren't several ms
	or even seconds between finding the multiple ptes mapping a page

(3)  reduces page replacement from O(total virtually mapped) to
	O(physical)

(4)  enables defragmentation of physical memory

(5)  enables cooperative offlining of memory for friendly guest instance
	behavior in UML and/or LPAR settings

(6)  demonstrable benefit in performance of swapping which is common in
	end-user interactive workstation workloads (I don't like the
	word "desktop"). cf. Craig Kulesa's post wrt. swapping performance

(7)  evidence from 2.4-based rmap trees indicates approximate parity
	with mainline in kernel compiles with appropriate locking bits

(8)  partitioning of physical memory can reduce the complexity of page
	replacement searches by scanning only the "interesting" zones
	(implemented and merged in 2.4-based rmap)

(9)  partitioning of physical memory can increase the parallelism of page
	replacement searches by independently processing different zones
	(implemented, but not merged in 2.4-based rmap)

(10) the reverse mappings may be used for efficiently keeping pte cache
	attributes coherent

(11) they may be used for virtual cache invalidation (with changes)

(12) the reverse mappings enable proper RSS limit enforcement
	(implemented and merged in 2.4-based rmap)

The code adds a pointer to struct page, consumes additional storage for
the pte chains and adds computational expense to the page reclaim code
(I measured it at 3% additional load during streaming I/O). The
benefits which we get back for all this are, I must say, theoretical
and unproven. If it has real advantages (or, indeed, disadvantages)
then why has nobody demonstrated them?

There are a number of things remaining to be done:

1: Demonstrate the above advantages.

2: Make it work with pte-highmem (Bill Irwin is signed up for this)

3: Don't add pte_chains to non-shared pages optimisation (Dave McCracken's
   patch does this)

4: Move the pte_chains into highmem too (Bill, I guess)

5: per-cpu pte_chain freelists (Rik?)

6: maybe GC the pte_chain backing pages. (Seems unavoidable. Rik?)

7: multithread the page reclaim code. (I have patches).

8: clustered add-to-swap. Not sure if I buy this. anon pages are
   often well-ordered-by-virtual-address on the LRU, so it "just
   works" for benchmarky loads. But there may be some other loads...

9: Fix bad IO latency in page reclaim (I have lame patches)

10: Develop tuning tools, use them.

11: The nightly updatedb run is still evicting everything.

Patch .

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
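
The "after" behaviour described in the message above can be illustrated with a small user-space sketch. It is not code from the patch: every name here (sketch_pte_t, sketch_pte_chain, sketch_add_rmap and so on) is hypothetical, a pte is modelled as a plain integer with two invented flag bits, and locking, swapcache and dirty-page handling are left out entirely. It only shows why a per-page chain of ptes lets reclaim check the referenced bits and unmap every mapping of a page in one pass, without scanning process address spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for illustration; not the kernel's types. */
typedef unsigned long sketch_pte_t;
#define SKETCH_PTE_PRESENT    0x1UL
#define SKETCH_PTE_REFERENCED 0x2UL

/* One link per pte that maps the page. */
struct sketch_pte_chain {
    struct sketch_pte_chain *next;
    sketch_pte_t *ptep;
};

/* Models the extra pointer that the patch description says struct page gains. */
struct sketch_page {
    struct sketch_pte_chain *pte_chain;
};

/* Record that *ptep maps this page. Returns 0 on success, -1 if out of memory. */
int sketch_add_rmap(struct sketch_page *page, sketch_pte_t *ptep)
{
    struct sketch_pte_chain *pc = malloc(sizeof(*pc));

    if (pc == NULL)
        return -1;
    pc->ptep = ptep;
    pc->next = page->pte_chain;
    page->pte_chain = pc;
    return 0;
}

/* Test and clear the referenced bit in every pte mapping the page.
 * Returns how many ptes had the bit set. */
int sketch_page_referenced(struct sketch_page *page)
{
    int referenced = 0;

    for (struct sketch_pte_chain *pc = page->pte_chain; pc != NULL; pc = pc->next) {
        if (*pc->ptep & SKETCH_PTE_REFERENCED) {
            *pc->ptep &= ~SKETCH_PTE_REFERENCED;
            referenced++;
        }
    }
    return referenced;
}

/* Clear every pte that maps the page and free the chain ("in one hit").
 * Returns the number of ptes unmapped. */
int sketch_unmap_all(struct sketch_page *page)
{
    int unmapped = 0;
    struct sketch_pte_chain *pc = page->pte_chain;

    while (pc != NULL) {
        struct sketch_pte_chain *next = pc->next;

        *pc->ptep = 0;
        free(pc);
        pc = next;
        unmapped++;
    }
    page->pte_chain = NULL;
    return unmapped;
}

/* Two ptes map one page; only the first has been referenced.
 * Encodes the three results as first*100 + second*10 + unmapped. */
int sketch_demo(void)
{
    sketch_pte_t a = SKETCH_PTE_PRESENT | SKETCH_PTE_REFERENCED;
    sketch_pte_t b = SKETCH_PTE_PRESENT;
    struct sketch_page page = { NULL };

    if (sketch_add_rmap(&page, &a) != 0 || sketch_add_rmap(&page, &b) != 0) {
        sketch_unmap_all(&page);
        return -1;
    }
    int first = sketch_page_referenced(&page);   /* one pte had the bit set */
    int second = sketch_page_referenced(&page);  /* first pass cleared it */
    int unmapped = sketch_unmap_all(&page);      /* both ptes are cleared */

    assert(page.pte_chain == NULL);
    return first * 100 + second * 10 + unmapped;
}
```

Calling sketch_demo() walks the two passes over the LRU that the message describes: the first check finds a reference, the second finds none, and the page's two mappings are then removed together. The cost mentioned in the message is visible too: one chain element is allocated per mapping, plus a pointer in every page.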

Date: Wed, 17 Jul 2002 10:30:12 +0200
From: Russell King <r...@arm.linux.org.uk>
Subject: Re: [patch 1/13] minimal rmap
Message-ID: <20020717092446.A4329@flint.arm.linux.org.uk>
References: <3D3500AA.131CE2EB@zip.com.au>
In-Reply-To: <3D3500AA.131CE2EB@zip.com.au>; from akpm@zip.com.au on Tue, Jul 16, 2002 at 10:29:14PM -0700
X-Original-Cc: Linus Torvalds <torva...@transmeta.com>,
	lkml <linux-ker...@vger.kernel.org>
X-Original-Date: Wed, 17 Jul 2002 09:24:46 +0100
X-Original-To: Andrew Morton <a...@zip.com.au>

On Tue, Jul 16, 2002 at 10:29:14PM -0700, Andrew Morton wrote:

I'm puzzling over this difference:

> --- /dev/null	Thu Aug 30 13:30:55 2001
> +++ 2.5.26-akpm/include/asm-arm/proc-armv/rmap.h	Tue Jul 16 21:59:40 2002
>...
> +static inline void pgtable_add_rmap(pte_t * ptep, struct mm_struct * mm, unsigned long address)
> +{
> +	struct page * page = virt_to_page(ptep);
> +
> +	page->mm = mm;
> +	page->index = address & ~((PTRS_PER_PTE * PAGE_SIZE) - 1);
> +}

and

> --- /dev/null	Thu Aug 30 13:30:55 2001
> +++ 2.5.26-akpm/include/asm-generic/rmap.h	Tue Jul 16 21:59:40 2002
> +static inline void pgtable_add_rmap(struct page * page, struct mm_struct * mm, unsigned long address)
> +{
> +#ifdef BROKEN_PPC_PTE_ALLOC_ONE
> +	/* OK, so PPC calls pte_alloc() before mem_map[] is setup ... ;( */
> +	extern int mem_init_done;
> +
> +	if (!mem_init_done)
> +		return;
> +#endif
> +	page->mapping = (void *)mm;
> +	page->index = address & ~((PTRS_PER_PTE * PAGE_SIZE) - 1);
> +}

Note that the ARM one seems to be using page->mm but everything else
uses page->mapping.

Also, this comment:

> + * ARM is different since hardware page tables are smaller than
> + * the page size and Linux uses a "duplicate" one with extra info.
> + * For rmap this means that the first 2 kB of a page are the hardware
> + * page tables and the last 2 kB are the software page tables.

is no longer true for 2.5 (although it is still true for 2.4.)

-- 
Russell King (r...@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html
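
Both quoted versions of pgtable_add_rmap() compute page->index with the same mask, so a worked example of that arithmetic may help. The sketch below is an illustration only: the constants are assumed values (4 kB pages and 1024 ptes per page table, as on classic 32-bit x86), not something stated in the thread, and sketch_pte_address() is a guess at how such a base could later be combined with a pte's slot number to recover a virtual address. All names are hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration (4 kB pages, 1024 ptes per page table);
 * other architectures and configurations differ. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_PTRS_PER_PTE 1024UL

/* Same expression as in the quoted patch: round the address down to the
 * start of the region covered by one page table (4 MB with these values). */
unsigned long sketch_pgtable_base(unsigned long address)
{
    return address & ~((SKETCH_PTRS_PER_PTE * SKETCH_PAGE_SIZE) - 1);
}

/* Hypothetical inverse step: given the stored base and the index of a pte
 * within its page table, rebuild the virtual address that pte maps. */
unsigned long sketch_pte_address(unsigned long base, unsigned long slot)
{
    assert(slot < SKETCH_PTRS_PER_PTE);
    return base + slot * SKETCH_PAGE_SIZE;
}
```

With these assumed constants the mask clears the low 22 bits, so sketch_pgtable_base(0x0804a123UL) gives 0x08000000UL and sketch_pgtable_base(0xbffff000UL) gives 0xbfc00000UL. Slot 0x4a of the first table then maps 0x0804a000UL, the page containing the original address.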

Date: Wed, 17 Jul 2002 14:20:08 +0200
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: [patch 1/13] minimal rmap
In-Reply-To: <20020717092446.A4329@flint.arm.linux.org.uk>
Message-ID: <Pine.LNX.4.44L.0207170908130.12241-100000@imladris.surriel.com>
References: <20020717092446.A4329@flint.arm.linux.org.uk>
X-Original-Cc: Andrew Morton <a...@zip.com.au>,
	Linus Torvalds <torva...@transmeta.com>,
	lkml <linux-ker...@vger.kernel.org>
X-Original-Date: Wed, 17 Jul 2002 09:10:08 -0300 (BRT)
X-Original-To: Russell King <r...@arm.linux.org.uk>

On Wed, 17 Jul 2002, Russell King wrote:

> I'm puzzling over this difference:
>
> > --- /dev/null	Thu Aug 30 13:30:55 2001
> > +++ 2.5.26-akpm/include/asm-arm/proc-armv/rmap.h	Tue Jul 16 21:59:40 2002

Then I guess I messed up the ARM rmap.h for 2.5. I knew it
had to be different than the 2.4 one somehow and was under
the impression that you changed the pagetable layout in 2.5
to have "4 kB page tables" with 2 kB hardware and 2 kB software
page tables in the same page.

The page->mm thing is a stupid, stupid typo. I guess akpm
didn't have an ARM machine for testing, either ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Date: Wed, 17 Jul 2002 14:30:08 +0200
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: [patch 1/13] minimal rmap
In-Reply-To: <3D3500AA.131CE2EB@zip.com.au>
Message-ID: <Pine.LNX.4.44L.0207170914060.12241-100000@imladris.surriel.com>
References: <3D3500AA.131CE2EB@zip.com.au>
X-Original-Cc: Linus Torvalds <torva...@transmeta.com>,
	lkml <linux-ker...@vger.kernel.org>
X-Original-Date: Wed, 17 Jul 2002 09:21:50 -0300 (BRT)
X-Original-To: Andrew Morton <a...@zip.com.au>

On Tue, 16 Jul 2002, Andrew Morton wrote:

> The rest of the VM - list management, the classzone concept, etc
> remains unchanged.

> 5: per-cpu pte_chain freelists (Rik?)

Will look into this soon.

> 6: maybe GC the pte_chain backing pages. (Seems unavoidable. Rik?)

And probably into this, if it turns out that we're wasting
too much memory in no-longer-used pte_chains in real workloads,
which will probably happen ;)

> 7: multithread the page reclaim code. (I have patches).

Rmap for 2.4 also has some code which could be used for this.

> 8: clustered add-to-swap. Not sure if I buy this. anon pages are
>    often well-ordered-by-virtual-address on the LRU, so it "just
>    works" for benchmarky loads. But there may be some other loads...

Benchmarky loads without a working set probably aren't all
that suitable for evaluating page replacement. VM (and general
caching) works _because_ of the working set property.

Does anybody know of a working set simulator we could use
to test things like this?

> 9: Fix bad IO latency in page reclaim (I have lame patches)
>
> 10: Develop tuning tools, use them.
>
> 11: The nightly updatedb run is still evicting everything.

That's the "minimal" part of "minimal rmap" ;))

Ed Tomlinson has some code for 11), which should be mergeable
soon. In combination with changed page replacement priorities
we'll be able to make sure updatedb won't evict everything.

The importance of rmap here is making sure we're _able_ to do
this kind of tuning instead of tweaking the same magic knobs
we've (unsuccessfully) tweaked in the last 8 years.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/
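
The working set remark above can be made concrete with a toy model. This is not the simulator being asked for, and nothing in it comes from the thread: it is a minimal strict-LRU cache of 8 frames (an arbitrary size) fed two made-up reference traces, with every name hypothetical. It only illustrates why a load with no working set says little about page replacement quality.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FRAMES 8      /* arbitrary cache size for the illustration */
#define SKETCH_TRACE_LEN 64  /* arbitrary trace length */

/* Run a strict LRU cache over a trace of page numbers; return the hit count. */
size_t sketch_lru_hits(const unsigned *trace, size_t n)
{
    unsigned frames[SKETCH_FRAMES];  /* frames[0] is the most recently used */
    size_t used = 0, hits = 0;

    assert(n == 0 || trace != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t pos = used;           /* "not found" until proven otherwise */

        for (size_t j = 0; j < used; j++) {
            if (frames[j] == trace[i]) {
                pos = j;
                break;
            }
        }
        if (pos < used)
            hits++;                  /* hit: page moves to the front */
        else if (used < SKETCH_FRAMES)
            pos = used++;            /* miss with a free frame */
        else
            pos = SKETCH_FRAMES - 1; /* miss: overwrite the LRU entry */

        /* Shift the more recent entries down and put the page in front. */
        for (size_t j = pos; j > 0; j--)
            frames[j] = frames[j - 1];
        frames[0] = trace[i];
    }
    return hits;
}

/* A load with a working set: the same 4 pages touched over and over. */
size_t sketch_working_set_hits(void)
{
    unsigned trace[SKETCH_TRACE_LEN];

    for (size_t i = 0; i < SKETCH_TRACE_LEN; i++)
        trace[i] = (unsigned)(i % 4);
    return sketch_lru_hits(trace, SKETCH_TRACE_LEN);
}

/* A streaming load: every reference touches a new page. */
size_t sketch_streaming_hits(void)
{
    unsigned trace[SKETCH_TRACE_LEN];

    for (size_t i = 0; i < SKETCH_TRACE_LEN; i++)
        trace[i] = (unsigned)i;
    return sketch_lru_hits(trace, SKETCH_TRACE_LEN);
}
```

With these traces the working-set load hits on 60 of 64 references (only the first touch of each of the 4 pages misses), while the streaming load never hits, whatever the replacement policy does. A realistic simulator would need real traces and a model of the kernel's actual LRU lists; this sketch assumes neither.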