From: David Grothe <d...@gcom.com> Subject: get_vm_area and GFP_KERNEL Date: 1998/10/22 Message-ID: <fa.dvn9shv.n6ou9q@ifi.uio.no>#1/1 X-Deja-AN: 403997034 Original-Date: Thu, 22 Oct 1998 12:33:27 -0500 Sender: owner-linux-ker...@vger.rutgers.edu Content-Transfer-Encoding: 7bit Original-Message-ID: <362F6C67.96A22EE6@gcom.com> To: Linux Kernel <linux-ker...@vger.rutgers.edu>, Alan Cox <a...@lxorguk.ukuu.org.uk> Content-Type: text/plain; charset=us-ascii X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu Organization: Gcom, Inc MIME-Version: 1.0 Reply-To: d...@gcom.com Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Hello: The routine get_vm_area calls kmalloc passing GFP_KERNEL. Get_vm_area is called from vremap (2.0 kernels) and ioremap (2.1 kernels). These routines are used to map physical memory (as in PCI device registers) to kernel address space. The problem: If these routines are called from a bottom half task queue routine then kmalloc thinks that it is being called from interrupt level without GFP_ATOMIC and prints a message to that effect. Would any harm be done by changing this? These would be the patches: Kernel version 2.0.36: --- vmalloc.c Wed Jun 3 17:17:50 1998 +++ vmalloc.new.c Thu Oct 22 12:25:41 1998 @@ -239,7 +239,7 @@ void *addr; struct vm_struct **p, *tmp, *area; - area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL); + area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_ATOMIC); if (!area) return NULL; addr = (void *) VMALLOC_START; Kernel version 2.1.120: --- vmalloc.c Fri Sep 18 15:43:33 1998 +++ vmalloc.new.c Thu Oct 22 12:24:35 1998 @@ -155,7 +155,7 @@ unsigned long addr; struct vm_struct **p, *tmp, *area; - area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL); + area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_ATOMIC); if (!area) return NULL; addr = VMALLOC_START; To the casual observer, it looks safe. Will someone make these modifications to the kernel source? Thanks, Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: "Benjamin C.R. LaHaise" <b...@kvack.org> Subject: Re: get_vm_area and GFP_KERNEL Date: 1998/10/22 Message-ID: <fa.k0m9mtv.f5i1is@ifi.uio.no>#1/1 X-Deja-AN: 404034403 Original-Date: Thu, 22 Oct 1998 15:41:37 -0400 (EDT) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.95.981022152401.20122A-100000@as200.spellcast.com> References: <fa.dvn9shv.n6ou9q@ifi.uio.no> To: David Grothe <d...@gcom.com> X-Sender: b...@as200.spellcast.com Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Thu, 22 Oct 1998, David Grothe wrote: ... > The problem: If these routines are called from a bottom half task queue > routine then kmalloc thinks that it is being called from interrupt level > without GFP_ATOMIC and prints a message to that effect. But who is calling ioremap or vremap from a bottom half? That isn't permitted! Device drivers are supposed to setup their mappings ahead of time, right? > Would any harm be done by changing this? Yes, reliability of vmalloc/ioremap would be degraded, whereas many of the callers expect the system to make a reasonable attempt to perform the operation (we all know about the sound & floppy driver transient failures due to DMA). If such a hack must be added, it should use something like the gfp_any macro. -ben - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: David Grothe <d...@gcom.com> Subject: Re: get_vm_area and GFP_KERNEL Date: 1998/12/30 Message-ID: <fa.dso5pfv.l1c2q5@ifi.uio.no>#1/1 X-Deja-AN: 426982918 Original-Date: Tue, 29 Dec 1998 15:48:01 -0600 Sender: owner-linux-ker...@vger.rutgers.edu Content-Transfer-Encoding: 7bit Original-Message-ID: <36894E11.C5334911@gcom.com> References: <fa.k0m9mtv.f5i1is@ifi.uio.no> To: "Benjamin C.R. LaHaise" <b...@kvack.org> Original-References: <Pine.LNX.3.95.981022152401.20122A-100...@as200.spellcast.com> Content-Type: text/plain; charset=us-ascii X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu Organization: Gcom, Inc MIME-Version: 1.0 Reply-To: d...@gcom.com Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Benjamin C.R. LaHaise wrote: > On Thu, 22 Oct 1998, David Grothe wrote: > > ... > > The problem: If these routines are called from a bottom half task queue > > routine then kmalloc thinks that it is being called from interrupt level > > without GFP_ATOMIC and prints a message to that effect. > > But who is calling ioremap or vremap from a bottom half? That isn't > permitted! Device drivers are supposed to setup their mappings ahead of > time, right? Not in my case. I have drivers that execute queued requests from a bottom half routine. The driver accepts configuration information from a user-space utility that runs long after boot time. The configuration information specifies such hardware parameters as IRQ and shared memory address. I don't understand what the problem is here. When get_vm_area is called with GFP_KERNEL and it finds itself "at interrupt level", it simply changes the priority parameter to GFP_ATOMIC and proceeds with the request. The allocation routines seem to have code in them to protect critical sections from being interrupted. So why have the restriction on when device memory can be mapped? Or why not create a variant of ioremap/vremap with a "priority" parameter so that I can have some control over this? Thanks, Dave > > > > Would any harm be done by changing this? > > Yes, reliability of vmalloc/ioremap would be degraded, whereas many of the > callers expect the system to make a reasonable attempt to perform the > operation (we all know about the sound & floppy driver transient failures > due to DMA). If such a hack must be added, it should use something like > the gfp_any macro. If so, then why does get_vm_area simply change the value of the priority parameter and proceed with the request? -- Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: David Grothe <d...@gcom.com> Subject: get_vm_area needs spin lock Date: 1998/12/31 Message-ID: <fa.een5qhv.n681ac@ifi.uio.no>#1/1 X-Deja-AN: 427334243 Original-Date: Wed, 30 Dec 1998 10:27:57 -0600 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <368A548D.23BBF8DE@gcom.com> References: <fa.k0m9mtv.f5i1is@ifi.uio.no> To: "Benjamin C.R. LaHaise" <b...@kvack.org> Original-References: <Pine.LNX.3.95.981022152401.20122A-100...@as200.spellcast.com> Content-Type: multipart/mixed; boundary="------------D18E65256A1F34E347650765" X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu Organization: Gcom, Inc MIME-Version: 1.0 Reply-To: d...@gcom.com Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu This is a multi-part message in MIME format. --------------D18E65256A1F34E347650765 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Kernel hackers: As a result of some correspondence concerning the routine get_vm_area, I have been taking a close look at that routine. It seems to me that in an SMP environment this routine needs spin lock protection around the loop that runs down the linked list of areas. If two or more CPUs are executing this routine simultaneously the list could get tangled. I am enclosing a patch for 2.2.0pre1 (mm/vmalloc.c) that I think fixes this problem. The patch is enclosed as a MIME attachment to avoid line-wrapping problems with cutting/pasting into an e-mail. I put in spin lock protection for all cases of traversing the linked list. Someone who knows the code better could probably find shorter spans of code to protect. This problem would not be present in a 2.0 kernel since the SMP technique there is to protect the entire kernel with one lock. -- Dave --------------D18E65256A1F34E347650765 Content-Type: text/plain; charset=us-ascii; name="vmalloc.c.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="vmalloc.c.patch" --- vmalloc.c.old Fri Nov 20 13:43:19 1998 +++ vmalloc.c Wed Dec 30 09:50:14 1998 @@ -10,6 +10,7 @@ #include <asm/uaccess.h> static struct vm_struct * vmlist = NULL; +static spinlock_t vmlist_spinlock; static inline void free_area_pte(pmd_t * pmd, unsigned long address, unsigned long size) { @@ -152,28 +153,34 @@ struct vm_struct * get_vm_area(unsigned long size) { unsigned long addr; + unsigned long save_flags; struct vm_struct **p, *tmp, *area; - area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL); + area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_ATOMIC); if (!area) return NULL; addr = VMALLOC_START; + spin_lock_irqsave(&vmlist_spinlock, save_flags); for (p = &vmlist; (tmp = *p) ; p = &tmp->next) { if (size + addr < (unsigned long) tmp->addr) break; - if (addr > VMALLOC_END-size) + if (addr > VMALLOC_END-size) { + spin_unlock_irqrestore(&vmlist_spinlock, save_flags); return NULL; + } addr = tmp->size + (unsigned long) tmp->addr; } area->addr = (void *)addr; area->size = size + PAGE_SIZE; area->next = *p; *p = area; + spin_unlock_irqrestore(&vmlist_spinlock, save_flags); return area; } void vfree(void * addr) { + unsigned long save_flags; struct vm_struct **p, *tmp; if (!addr) @@ -182,14 +189,17 @@ printk("Trying to vfree() bad address (%p)\n", addr); return; } + spin_lock_irqsave(&vmlist_spinlock, save_flags); for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) { if (tmp->addr == addr) { *p = tmp->next; + spin_unlock_irqrestore(&vmlist_spinlock, save_flags); vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size); kfree(tmp); return; } } + spin_unlock_irqrestore(&vmlist_spinlock, save_flags); printk("Trying to vfree() nonexistent vm area (%p)\n", addr); } @@ -217,11 +227,13 @@ struct vm_struct *tmp; char *vaddr, *buf_start = buf; unsigned long n; + unsigned long save_flags; /* Don't allow overflow */ if ((unsigned long) addr + count < count) count = -(unsigned long) addr; + spin_lock_irqsave(&vmlist_spinlock, save_flags); for (tmp = vmlist; tmp; tmp = tmp->next) { vaddr = (char *) tmp->addr; if (addr >= vaddr + tmp->size - PAGE_SIZE) @@ -245,5 +257,6 @@ } while (--n > 0); } finished: + spin_unlock_irqrestore(&vmlist_spinlock, save_flags); return buf - buf_start; } --------------D18E65256A1F34E347650765-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: torva...@transmeta.com (Linus Torvalds) Subject: Re: get_vm_area needs spin lock Date: 1998/12/31 Message-ID: <fa.i1u35ev.1dmee9r@ifi.uio.no>#1/1 X-Deja-AN: 427474451 Original-Date: 31 Dec 1998 06:54:13 GMT Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <76f72l$aqi$1@palladium.transmeta.com> References: <fa.een5qhv.n681ac@ifi.uio.no> To: linux-ker...@vger.rutgers.edu Original-References: <Pine.LNX.3.95.981022152401.20122A-100...@as200.spellcast.com> <368A548D.23BBF...@gcom.com> X-Authentication-Warning: palladium.transmeta.com: bin set sender to n...@transmeta.com using -f X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu Organization: Transmeta Corporation, Santa Clara, CA Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu In article <368A548D.23BBF...@gcom.com>, David Grothe <d...@gcom.com> wrote: > >As a result of some correspondence concerning the routine get_vm_area, I >have been taking a close look at that routine. It seems to me that in >an SMP environment this routine needs spin lock protection around the >loop that runs down the linked list of areas. If two or more CPUs are >executing this routine simultaneously the list could get tangled. This should never be the case: vmalloc() should not be called without holding the kernel lock. Whether that is actually the case right now is not something I have verified, but I would be surprised if it isn't. As such, 2.1.x is in the same position as 2.0.x in this regard. Has there actually been problems that caused you to look at this? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: get_vm_area needs spin lock Date: 1999/01/01 Message-ID: <fa.iq0je6v.1dgmph7@ifi.uio.no>#1/1 X-Deja-AN: 427607064 Original-Date: Thu, 31 Dec 1998 18:56:14 +0100 (CET) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.981231183721.658B-100000@laser.bogus> References: <fa.een5qhv.n681ac@ifi.uio.no> To: David Grothe <d...@gcom.com> X-Sender: and...@laser.bogus Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-ker...@vger.rutgers.edu X-PgP-Public-Key-URL: http://e-mind.com/~andrea/aa.asc Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Wed, 30 Dec 1998, David Grothe wrote: > I am enclosing a patch for 2.2.0pre1 (mm/vmalloc.c) that I think fixes > this problem. The patch is enclosed as a MIME attachment to avoid There is no need for irqsave since vmalloc can be used only from a normal kernel context (noirq and nobh). And there's no reason for GFP_ATOMIC I think. So I changed a bit your patch this way: Index: linux/mm/vmalloc.c diff -u linux/mm/vmalloc.c:1.1.1.2 linux/mm/vmalloc.c:1.1.1.1.2.3 --- linux/mm/vmalloc.c:1.1.1.2 Fri Nov 27 11:19:11 1998 +++ linux/mm/vmalloc.c Thu Dec 31 18:55:11 1998 @@ -10,6 +10,7 @@ #include <asm/uaccess.h> static struct vm_struct * vmlist = NULL; +static spinlock_t vmlist_spinlock; static inline void free_area_pte(pmd_t * pmd, unsigned long address, unsigned long size) { @@ -158,17 +159,21 @@ if (!area) return NULL; addr = VMALLOC_START; + spin_lock(&vmlist_spinlock); for (p = &vmlist; (tmp = *p) ; p = &tmp->next) { if (size + addr < (unsigned long) tmp->addr) break; - if (addr > VMALLOC_END-size) + if (addr > VMALLOC_END-size) { + spin_unlock(&vmlist_spinlock); return NULL; + } addr = tmp->size + (unsigned long) tmp->addr; } area->addr = (void *)addr; area->size = size + PAGE_SIZE; area->next = *p; *p = area; + spin_unlock(&vmlist_spinlock); return area; } @@ -182,14 +187,18 @@ printk("Trying to vfree() bad address (%p)\n", addr); return; } + spin_lock(&vmlist_spinlock); for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) { if (tmp->addr == addr) { *p = tmp->next; - vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size); + spin_unlock(&vmlist_spinlock); + vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), + tmp->size - PAGE_SIZE); kfree(tmp); return; } } + spin_unlock(&vmlist_spinlock); printk("Trying to vfree() nonexistent vm area (%p)\n", addr); } @@ -222,6 +231,7 @@ if ((unsigned long) addr + count < count) count = -(unsigned long) addr; + spin_lock(&vmlist_spinlock); for (tmp = vmlist; tmp; tmp = tmp->next) { vaddr = (char *) tmp->addr; if (addr >= vaddr + tmp->size - PAGE_SIZE) @@ -245,5 +255,6 @@ } while (--n > 0); } finished: + spin_unlock(&vmlist_spinlock); return buf - buf_start; } Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/