Subject: demand paging: proposal
Date: Sat, 16 Nov 91 16:28:57 PST
From: pmacdona@sol.UVic.CA (Peter MacDonald)
To: linux-activists@joker.cs.hut.fi

I finally bit the bullet. Yup, I blew away my dos partition and put Linux on it. Now I would like to start a project. I consider virtual consoles to be high priority, but until init/login etc. is done, there is probably no use starting that. Thus I am proposing to look at demand paging from the file system. If Linus agrees to consider adding it to linux when it is done, and nobody successfully shoots this proposal down, I will start tuit suite. If someone else wants to help (or do all of it) let me know.

I have broken it down into phases to clarify understanding, not necessarily to imply they might be released in this order (if ever). If you think this is a house of cards, let me know ASAP.

Proposed Design:

Phase 1:
- Upon loading an executable, create a map, stored in the process, that locates all its blocks on disk. Do not look at the fs again.
- Load only the first 4K page and execute.
- Upon a code page fault, load the required 4 blocks into ram.
- Make no attempt to lock the file image (count on seg violation?)

Phase 2:
- Attempt to share executable images in ram (shared text).

Phase 3:
- Attempt to implement the sticky bit, to pin an executable in memory once loaded.
- Find a way to flush it (all) from memory when done.

Phase 4:
- Attempt to manage working sets in memory if data requirements exceed available ram (down to ~15%).

Phase 5:
- Paging (writing) data to a partition or fixed-size file.
- Locking the paged image file.

Issues:
- Allocating/deallocating memory for the program maps.
- Enable/disable paging when booting from shoelace?
- Do not use working sets with pinned pages?
- File locks held in ram only?
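Phase 1 of the proposal (resolve every disk block of the executable once at exec time, then satisfy each 4K code fault from four 1K blocks) can be sketched roughly as follows. This is a hypothetical, user-space illustration: the struct name, the 256-block limit, and the function names are all made up for the example, not taken from any real kernel.

```c
#define PAGE_SIZE       4096
#define BLOCK_SIZE      1024
#define BLOCKS_PER_PAGE (PAGE_SIZE / BLOCK_SIZE)
#define MAX_EXEC_BLOCKS 256   /* hypothetical limit: a 256K executable */

/* Hypothetical per-process map built once when the executable is
 * loaded: every disk block is resolved up front, so the filesystem
 * is never consulted again (the proposal's "do not look at fs again"). */
struct exec_map {
    int nblocks;                            /* blocks in the image */
    unsigned short block[MAX_EXEC_BLOCKS];  /* on-disk block numbers */
};

/* Given a faulting code address, return the index (into the map) of
 * the first of the four 1K blocks backing that 4K page, or -1 if the
 * address lies past the end of the image. */
int first_block_of_page(const struct exec_map *m, unsigned long addr)
{
    int page = addr / PAGE_SIZE;
    int blk  = page * BLOCKS_PER_PAGE;
    return (blk < m->nblocks) ? blk : -1;
}
```

A fault handler would then read blocks `blk` through `blk+3` into a fresh page and map it in; only the first page needs to be read before the program starts.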
Subject: alternative to paging
Date: Sat, 16 Nov 91 17:07:24 PST
From: pmacdona@sol.UVic.CA (Peter MacDonald)
To: linux-activists@joker.cs.hut.fi

I just downloaded sed, tr etc. from TSX-11 and are they ever big! An alternative to the demand paging scenario I just posted would solve another, probably more pressing, problem.

 9k : size of minix executable of diff
36k : size of gcc compiled executable of diff
 4k : size of gcc compiled diff.o

It won't be too long before my disk flow'th over. Not to mention ram requirements. Can anyone give me any advice on implementing dynamically linked shared libraries? I need to figure out:

- should the entire shared library be kept in ram? (simplest?)
- how best to map this into each process's code space?
- how to modify the linker and runtime loader?
Subject: Re: demand paging: proposal
Date: Sat, 16 Nov 91 22:47:00 -0500
From: tytso@ATHENA.MIT.EDU (Theodore Ts'o)
To: pmacdona@sol.UVic.CA
Cc: linux-activists@joker.cs.hut.fi
In-Reply-To: Peter MacDonald's message of Sat, 16 Nov 91 16:28:57 PST
Reply-To: tytso@athena.mit.edu

From: pmacdona@sol.UVic.CA (Peter MacDonald)
> - Attempt to implement the sticky bit, to pin an executable
>   in memory once loaded.

Everyone should note that this is not the original meaning of the sticky bit, in the BSD 4.3 sense. In the BSD sense, what the sticky bit means is that after the program exits, its entry in the text table is not purged and its swap space is not reclaimed. The program was _not_ locked into memory, and would get swapped out if demands were placed on the VM system.

The rationale behind this is that in a time-sharing environment, programs like GNU emacs are almost always in use by *someone*, and in the case where the last person finishes running emacs, the program should not be flushed from the text cache, swap space, and ram, in the hope that another user would start an emacs before the unused program got swapped out to disk. In that case, the user would win since no disk I/O would be needed. However, in a single-user environment, the value of the sticky bit is rather dubious, and Project Athena workstations don't even bother using it for this reason.

That being said, I'm not too sure that pinning an executable into memory is a good idea. Unless you have gobs and gobs of memory, you wouldn't be able to "sticky bit" more than one or two programs before running out of physical memory, and in the meantime, locking down large amounts of memory would increase the amount of VM thrashing when other programs are running. A better idea might be to not flush the text table entry once its ref count goes to zero, but rather to wait until the last of its text pages is paged out (which will happen sooner or later, because there is no program using the text).
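The lazy text-table flush Ted describes here (keep the entry alive at ref count zero until the pager reclaims its last resident page) can be sketched as a small state machine. This is a hypothetical illustration, not code from any BSD or Linux text table; the struct and function names are invented for the example.

```c
/* Hypothetical text-table entry.  Instead of flushing the entry the
 * moment the last process exits, it stays valid (and a re-exec can
 * reuse its resident pages) until normal pageout has removed the
 * last page -- so the system adapts to whatever is actually reused. */
struct text {
    int ref;            /* processes currently running this text */
    int resident_pages; /* pages of it still in core */
    int valid;          /* entry still usable for a fast re-exec */
};

void text_release(struct text *t)   /* last user exits */
{
    if (--t->ref == 0 && t->resident_pages == 0)
        t->valid = 0;   /* nothing left in core: flush now */
    /* otherwise keep the entry; pageout will finish the job */
}

void text_pageout(struct text *t)   /* pager reclaims one page */
{
    if (--t->resident_pages == 0 && t->ref == 0)
        t->valid = 0;   /* last page gone and nobody running it */
}

int text_reusable(const struct text *t) { return t->valid; }
```

A process that execs the same image while `text_reusable()` still holds wins exactly the pages that have not yet been reclaimed.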
However, if the text is used by a process before the last pages are reclaimed, then you win, because some number of pages won't need to be brought back in from disk. Thus, this system will do the right thing for someone who has enough memory to keep gcc, make, etc. in core at the same time during a build. The advantage of this scheme is that it automatically adapts to whatever text image is being repeatedly used --- you don't need to sticky-bit a limited number of processes. For this reason, it is also a much fairer system.

------------------------------------

> 9k : size of minix executable of diff
> 36k : size of gcc compiled executable of diff
> 4k : size of gcc compiled diff.o

> It won't be too long before my disk flow'th over. Not to mention
> ram requirements.

The culprit here is the floating point library (i.e., the soft floats); even if your program isn't using floats, printf() drags large portions of it in. The bottom line is that every integer-only executable will be roughly 10-20k bigger than it has to be. That's most of the problem, anyway; most of the rest of it is due to things like gmalloc.o, which is 4979 bytes (malloc.o on the vax is 1176 bytes).

There are a couple of hack solutions, which I perhaps shouldn't mention because they might actually get implemented :-) One way to do it would be to include the soft-float library in the kernel, and map those pages read-only at the top of the address space for every process. This has the advantage that you don't need to run a linker at run-time, since the addresses for the soft-float library would be well-known constants. (You'd probably put a jump table at the tippy-top, and make all the run-time routines reference the jump table.)
This is very ugly, and it is only (barely) justifiable because (1) every single program is going to need the floating point library, and (2) it is unlikely that the FP library will change often, so requiring a kernel recompile to replace the FP library isn't quite as nasty as it first seems. It is definitely not something that you would want to use for sharing copies of the dbm library or the X toolkit. The only reason I mention it is that doing shared libraries for real would probably entail a lot more work. :-)

-----------------------------------

I've noticed that there is a bug in the system call waitpid() --- the parent process is always told that the exit code for the child process is zero. I've already sent a patch to Linus privately, but if anyone else is interested in the patch, it is available on TSX-11.MIT.EDU, in /pub/linux/patches/exit.c.patch .

I've also uploaded tar files containing the binaries and libraries of Bison and Flex to TSX-11, nic.funet.fi, and tupac-amaru. Bison and Flex are yacc and lex replacements from the GNU project. Flex passes all but the COMPRESSION=-Cf and COMPRESSION=-CF regression tests --- however, it looks like it's not a fault of flex, but rather of GCC. It looks like gcc is not compiling the resulting (very large) scan.c files correctly. For normal usage, however, flex (and gcc) should work just fine. Enjoy!

- Ted
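Ted's jump-table idea from earlier in this message (soft-float routines mapped read-only at a well-known address, with a table of entry points at the "tippy-top" so user stubs never need a run-time linker) can be sketched in miniature. Everything here is hypothetical -- the names, the operation indices, and the use of an ordinary function-pointer array standing in for the kernel-mapped pages.

```c
/* Stand-in for soft-float routines the kernel would map read-only
 * into every process.  Integer ops substitute for real FP here. */
typedef long (*fp_op)(long, long);

enum { FP_ADD = 0, FP_MUL = 1, FP_NOPS };

static long soft_add(long a, long b) { return a + b; }
static long soft_mul(long a, long b) { return a * b; }

/* In the real scheme this table would sit at a well-known constant
 * address at the top of the address space; routines inside the
 * library can then move (or the kernel can swap the library out for
 * a new build) without relinking any user program. */
static fp_op fp_jump_table[FP_NOPS] = { soft_add, soft_mul };

/* What a user-side stub compiles down to: an indirect call through
 * a constant slot, no run-time linker involved. */
long fp_call(int op, long a, long b)
{
    return fp_jump_table[op](a, b);
}
```

The design point is that the table, not the routines, is the fixed interface: only the slot addresses are well-known constants.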
Subject: more on 486/33 weirdness
Date: Sat, 16 Nov 91 23:11:47 PST
To: linux-activists@joker.cs.hut.fi
From: John T Kohl <jtkohl@cs.berkeley.edu>
Reply-To: jtkohl@cs.berkeley.edu

Here are more details on my problems with booting in turbo mode:

system config:
  486DX/33, 256K cache, 8MB main memory
  American Megatrends, Inc. BIOS
  real-time clock
  1 3.5" floppy (A:), 1 5.25" floppy (B:)
  1 200MB IDE drive (Conner): C:, /dev/hd2, /dev/hd3, D: partitions (in that order)
  SVGA (Orchid ProDesigner IIs/1MB)

I boot off the 3.5" floppy, with the root partition on /dev/hd2 and/or /dev/hd3.

Sample errors when in "turbo" (33MHz) mode:

<startup stuff>
Partition table Ok.
general protection: 0000
EIP     0008:0000CBB3
EFLAGS  0010206
ESP     0300:0001D210
fs: 0010
base 04000000, limit 000A0000
Pid: 1, process number 1
f3 a5 50 e8 7d fb ff ff 83 c4 ...

or

general protection: 67b4
EIP     10008:00000004
EFLAGS  00010206
ESP     0020:000067B6
fs: 0010
base 04000000, limit 000A0000
Pid: 1, process number 1
07 20 00 00 27 30 00 00 07 40

Sometimes I get two "general protection" errors (same process, usually).

When I toggle the turbo on & off while running, I don't observe any change in the running times of programs (but I can discern differences under MS-DOS 5.0), even when I run programs which try to time themselves. Any suggestions?
Subject: Re: more on 486/33 weirdness
Date: Sun, 17 Nov 91 17:20:38 WST
From: nicholas@cs.uwa.oz.au (Nicholas Yue)
To: jtkohl@cs.berkeley.edu, linux-activists@joker.cs.hut.fi

G'day,

I had a similar observation when using the 386/387 assembly. I had wanted a mathlib badly, so I tried to compile the code from DJ's GPP (for MS-DOS). The maths routines are written mostly in assembly language. I compiled the C and assembly code, made libm.a, and ranlib'ed the thing. All without problem. I then proceeded to write a small C program with a sin() and a tan() call. That is when I got

device not available: 0000
EIP: 000f:00000060
EFLAGS: 00010246
ESP: 0017:03FFFED0
fs: 0010
base: 10000000, limit: 04000000
Stack: 00000044 66666666 40026666 03fffe48
Pid: 2123, process nr: 4
dd 44 24 04 d9 fe df e0 9e 7b

A wild guess: does it have anything to do with the fact that on the 486, the maths coprocessor is already incorporated onto the chip itself?

BTW, I am trying to compile the BSD Reno maths library as a temporary measure. Anybody with warnings, advice, or questions, please e-mail me at nicholas@cs.uwa.oz.au
Subject: demand-loading etc
Date: Sun, 17 Nov 1991 12:36:42 +0200
From: Linus Benedict Torvalds <torvalds@cc.helsinki.fi>
To: Linux-activists@joker.cs.hut.fi

pmacdona@sol.UVic.CA (Peter MacDonald):
> Thus I am proposing to look at demand paging from the file system.
> If Linus agrees to consider adding it to linux when it is done,
> and nobody successfully shoots this proposal down, I will start
> tuit suite.
>
> Proposed Design:
>
> Phase 1:
> [deleted]

You don't start small, do you :-). If I /agree/ to add it to linux? If anybody implements paging, he's going to get 2 extra copies of linux for free. How's that for an offer?

Seriously, adding demand-loading should be relatively easy. I wouldn't suggest going past the filesystem, even for saving the block numbers somewhere (and a bit-map won't do it, the block numbers must be ordered). Having the inode pointer in the process table entry (and not releasing it before an exit), and using that to find the blocks, wouldn't be too hard. A bit slower, but conceptually much easier. Then you can use the routines already on hand (map() etc).

Note that the "relatively easy" must be taken with a grain of salt: you have to add the routines to the paging unit etc. etc. No major problems at least. Also, I'd like to keep linux simple even at the cost of some speed, as otherwise it grows until nobody really understands it. I'm kinda proud of my mm: it's not many lines of code (although it's not very clear code).

Re: sticky bits, shared text...

I don't like sticky bits (in the meaning that they lock something into memory). I doubt it's really that useful on a small machine that is essentially single-user. It's easy to grow the cache to 6M or more if you have the memory, and currently I don't see much unnecessary disk I/O in a heavy 'make'. Besides, sticky bits are hard to keep track of. Shared text and sticky bits have another shared problem: right now linux allows writes to the code segment.
This isn't a very big problem, as the changes to 'mm' are minimal, but you'd have to check that the code segment is a multiple of 4096 (I /think/ I made ld do this, but I'm not sure). The biggest problem, however, is the amount of data you have to keep track of. You'd have to add a lot of structures to know which pages are in which executable etc. I don't think it's worth it, especially if real paging (with a partition of its own) is implemented.

tytso@ATHENA.MIT.EDU (Theodore Ts'o):
> > 9k : size of minix executable of diff
> > 36k : size of gcc compiled executable of diff
> > 4k : size of gcc compiled diff.o
>
> > It won't be too long before my disk flow'th over. Not to mention
> > ram requirements.
>
> The culprit here is the floating point library (i.e., the soft floats);
> even if your program isn't using floats, printf() drags large portions
> of it in. The bottom line is that every integer-only executable will be
> roughly 10-20k bigger than it has to be. That's most of the problem,
> anyway; most of the rest of it is due to things like gmalloc.o,
> which is 4979 bytes. (malloc.o on the vax is 1176 bytes).

Yes. The right way to do this is to add a floating-point emulator in the kernel, even if shared libraries are added. This is a real b**ch. I'll probably take a look at the djgpp package, but I'd really rather do it myself (it's an interesting project). I don't really have the time though, so ...

> [bug in waitpid]

Yes, I got the fix, but for once I had already corrected the bug. Very silly thing that came into the system with the "great reordering", when I changed the process tables. Luckily it isn't very bad. Another related bug was in crt0.o, which always exits with a code of zero if main exits with a "return xx".
John T Kohl <jtkohl@cs.berkeley.edu>:
> [486/33 bootup problem]
>
> When I toggle the turbo on & off while running, I don't observe any
> change in running times of programs (but I can discern differences under
> MS-DOS 5.0), even when I run programs which try to time themselves.

This would suggest to me that you have one of those braindead "slowdown" devices which aren't really completely hardware, but rely on some software solution. At least some Gateway 2000s had this, which meant that they couldn't run well in protected mode under Windows either (while accessing floppies or something like that). I don't really know what to do about it, as I don't know the hardware well enough (I've learnt a lot about PCs in the 11 months I've had one, but still...). I assume linux is running at full speed all the time?

nicholas@cs.uwa.oz.au (Nicholas Yue):
> G'day,
> I had a similar observation when using the 386/387 assembly. I had wanted
> a mathlib badly so I made try to compile the code from DJ's GPP (for MS-DOS).
> The maths routine are written mostly in assembly language. I compiled the
> C and assembly code and made libm.a, ranlib the things. All without problem.
> I then proceeded to write a small C program with a sin() and tan() call.
> That is when I had
>
> device not available: 0000
> EIP: 000f:00000060
> EFLAGS: 00010246
> ESP: 0017:03FFFED0
> fs: 0010
> base: 10000000. limit: 04000000
> Stack: 00000044 66666666 40026666 03fffe48
> Pid: 2123, process nr: 4
> dd 44 24 04 d9 fe df e0 9e 7b

This should happen only on a 386 system with no 387. "dd 44..." is some math instruction (fld? whatever). However, linux depends on the BIOS to set the %cr0 bits for a coprocessor, and doesn't test for one itself, so even with a 387 you might get this error if your BIOS doesn't set it up correctly. If you have a 387 but this error still happens, contact me; I guess I'll have to add the test routine to linux.

Linus

PS.
I now have a kernel that senses the video card (thanks to Galen Hunt), and the current version of fsck tries to correct things. I'm working on a "mkfs" that also senses bad blocks, as bad blocks have wreaked havoc on at least one drive (instant mess). I have a simple "fdisk" which doesn't try to change the partition tables, but at least it tells you which devices you can use. Anything else that would ease installation?
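Linus's demand-loading suggestion earlier in this message -- keep the executable's inode in the process table for the life of the process and translate file offsets to disk blocks on each fault with the filesystem's existing mapping routine, instead of a precomputed map -- might look roughly like this. The struct and function names echo the era's Minix-fs conventions but are a hypothetical user-space mock, not the real kernel code.

```c
#define BLOCK_SIZE 1024

/* Hypothetical minimal inode: just enough to map file blocks
 * to disk blocks for this illustration. */
struct m_inode {
    const int *zone_of_block;  /* file block -> disk block table */
    int nblocks;
};

/* Stand-in for the filesystem's block-mapping routine:
 * file block number -> disk block number, 0 if out of range/hole. */
static int bmap(const struct m_inode *ino, int file_block)
{
    if (file_block < 0 || file_block >= ino->nblocks)
        return 0;
    return ino->zone_of_block[file_block];
}

/* On a code-page fault, look up the disk block backing the faulting
 * address through the inode kept in the process table.  Slower than
 * a saved per-process map (one mapping walk per fault), but needs no
 * extra structures -- "a bit slower, but conceptually much easier". */
int disk_block_for_fault(const struct m_inode *exec_inode,
                         unsigned long fault_addr)
{
    return bmap(exec_inode, (int)(fault_addr / BLOCK_SIZE));
}
```

The inode must not be released until exit, so the mapping stays valid even if the file is unlinked while the program runs.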
Subject: Re: demand-loading etc
Date: Sun, 17 Nov 91 22:47:43 -0500
From: tytso@ATHENA.MIT.EDU (Theodore Ts'o)
To: Linux-activists@joker.cs.hut.fi
In-Reply-To: [84]
Reply-To: tytso@athena.mit.edu

> The right way to do this is to add a floating-point emulator in
> the kernel, even if shared libraries are added. This is a real b**ch.
> I'll probably take a look at the djgpp package, but I'd really rather do
> it myself (it's an interesting project). I don't really have the time
> though, so ...

Did you mean implementing a 387/487 emulator, or something specific for the gcc soft-float routines? I was wondering what sort of speed hit you would take (in either case) if each floating point operation required a trap to the kernel. That's why my previous suggestion was to map certain pages into the process's address space, so that calling the FP routines wouldn't require a context switch.

I was thinking, however, that another, possibly more elegant solution would be to assign shared libraries (including the FP routines) to a segment which would be visible to all processes. Then all the stub routines would need to do is a far call to a predefined segment. What do people think?

- Ted

P.S. Having the kernel emulate 387 instructions would still be neat; I was just wondering if it would be too slow for normal operations.
Subject: protection violations
Date: Sun, 17 Nov 91 21:06:33 PST
From: pmacdona@sol.UVic.CA (Peter MacDonald)
To: linux-activists@joker.cs.hut.fi

Now that I have my hard drive set up, I keep getting protection violations. It mostly happens when using gcc. It is along the lines of:

protection violation: 0000

The hex dump at the bottom is often different. Like, ls began with "c3". But this also happens when I boot up sometimes. Or when I use ls. But then other commands like em work. I have no 80387. Could this just be a problem with the BIOS not setting up the %cr0 flags correctly? Basically, I haven't even been able to compile the hello world program yet! Any ideas? Is it time to get an 80387 (or is an 80287 sufficient)?
Subject: Include files
Date: Mon, 18 Nov 91 10:58:05 CET
From: Wolfgang Thiel <UPSYF173%DBIUNI11.BITNET@FINHUTC.hut.fi>
Reply-To: UPSYF173%comparex.hrz.uni-bielefeld.de@FINHUTC.hut.fi
To: linux-activists@joker.cs.hut.fi

Hi,

which include files should be used? The ones in include.tar.Z or those in the kernel/mm/fs .tar file?

Wolfgang
Subject: Re: demand-loading etc
Date: Mon, 18 Nov 91 17:09:17 +1100
From: Bruce Evans <bde@runx.oz.au>
To: Linux-activists@joker.cs.hut.fi

Linus:
> Re: sticky bits, shared text...
> I don't like sticky bits (in the meaning that they lock something into
> memory). I doubt it's really that useful on a small machine, that is
> essentially single-user. It's easy to grow the cache to 6M or more if

Shared text helps a lot with recursive commands. I'm surprised linux doesn't already have it. Fork is most naturally done by sharing text and making it copy-on-write or no-write.

> you have the memory, and currently I don't see much unnecessary disk I/O
> in a heavy 'make'. Besides - sticky bits are hard to keep track of.

The cache I/O that you don't see hurts (there should be a LED for it :). Building the library under Minix-386 with cached text takes 25% longer when the shell is bash. The overhead is actually for something else - copying 40K of data from the cache and zeroing 200K of bss. Under another version of Minix with copy-on-access forks, building the library takes another 10% longer. There is only slightly less copying, because a lot of text gets copied, plus more overhead from page faults. The other thing that hurts without cached text is that heavily-used programs will be duplicated in the disk cache and as text. Perhaps mapping the disk cache to text instead of copying it would be almost as good as managing text separately.

[big program sizes]
>> The culprit here is the floating point library (i.e., the soft floats);
>> even if your program isn't using floats, printf() drags large portions

> Yes. The right way to do this is to add a floating-point emulator in
> the kernel, even if shared libraries are added. This is a real b**ch.

You would need a printf-emulator in the kernel :-(. The floating point emulation itself shouldn't be that big.

> I'll probably take a look at the djgpp package, but I'd really rather do
> it myself (it's an interesting project).
> I don't really have the time

The djgpp emulator (in 32-bit C) is 14+ times slower than my library routines (in 32-bit asm) (the emulator in Turbo C++ is only 7 times slower :-). It takes a large amount of code to do the emulation compared with doing the guts of the library.

> John T Kohl <jtkohl@cs.berkeley.edu>
>> [486/33 bootup problem]
> This would suggest to me that you have one of those braindead "slowdown"
> devices which aren't really completely hardware, but due to some
> software solution. At least some Gateway-2000's had this, which meant

This doesn't explain why the slow mode worked for booting. Perhaps the fast mode is done in software ;-). I guess the bug is really in the BIOS+linux with a hot interrupt.

> nicholas@cs.uwa.oz.au (Nicholas Yue)
>> I then proceeded to write a small C program with a sin() and tan() call.
>>
>> device not available: 0000
> This should happen only on a 386-system with no 387. "dd 44..." is some
> math instruction (fld? whatever). However, linux depends on the BIOS to
> set the %cr0 bits for a coprocessor, and doesn't test for one itself, so
> even with 387 you might get this error if you BIOS doesn't set it up
> correctly.

The error seems normal. You are lucky to get it instead of a crash. My 1987 Award BIOS and 1990 AMI BIOS can be relied on to set up the %CR0 bits _incorrectly_ (Award always clears them, on a machine without an x87). After booting, AMI on my 486 has set %CR0 to 0x10. The relevant bits are:

0x01: MP (Math Present): should be set (WRONG)
0x02: EM (Emulation): should be clear to use the x87/486 - this is what gives the "device not available" trap, so linux must be setting it
0x04: ET (Extension Type): should be set (always set by 486 h/w, so the BIOS gets this one right :)
0x08: NE (Numeric Error): 486 s/w should use this to get error reporting via exception 16 instead of the crufty IRQ 13 necessary for 286-386/287-387 systems

Perhaps Linus meant the coprocessor bits in the parameter RAM.
These had a reputation for being unreliable for distinguishing between 287s and no coprocessor, but I think they are OK for 387s.

Bruce
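Bruce's %CR0 bit list can be written down as code. The bit masks are the architectural 386/486 definitions he quotes; the checking function is an illustrative sketch (invented for this example) encoding his point: for real x87 use you want MP set and EM clear, and a BIOS value like his AMI's 0x10 fails both.

```c
/* Architectural %CR0 low bits on the 386/486 (per Bruce's list). */
#define CR0_MP 0x01  /* Math Present */
#define CR0_EM 0x02  /* Emulation: math insns trap "device not available" */
#define CR0_ET 0x04  /* Extension Type (hardwired set on the 486) */
#define CR0_NE 0x08  /* Numeric Error via exception 16, not IRQ 13 */

/* Hypothetical check: can math instructions reach a real x87 with
 * this %CR0 value?  MP must be set and EM must be clear; otherwise
 * every FP instruction faults, as in the reports above. */
int cr0_fpu_usable(unsigned long cr0)
{
    return (cr0 & CR0_MP) && !(cr0 & CR0_EM);
}
```

A kernel that does not trust the BIOS would probe for the coprocessor itself and then set MP/EM (and, on a 486, NE) accordingly rather than test a value like this.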
Subject: answering as best I can
Date: Mon, 18 Nov 1991 13:57:12 +0200
From: Linus Benedict Torvalds <torvalds@cc.helsinki.fi>
To: Linux-activists@joker.cs.hut.fi

tytso@ATHENA.MIT.EDU (Theodore Ts'o):
> Did you mean implementing a 387/487 emulator, or something specific for
> the gcc soft-float routines? ...

Real emulation.

> ... I was wondering what sort of speed hit you
> would take (in either case) if each floating point operation required a
> trap to the kernel. That's why my previous suggestion had suggested
> mapping certain pages into the process's address space, so that
> calling the FP routines wouldn't require a context switch.

System calls under linux never require a context switch. In fact, context switches are extremely rare: they happen ONLY when one process stops running and another one starts. The floating point exceptions would be slow, but that's mainly because they would have to decode the effective addresses etc. Not very much fun, but somewhat interesting.

Still, there are a lot of reasons to have an FP emulator in the kernel. If somebody wants fast floating point, he'd better get a 387, but I'd like to be able to support all programs on all machines independent of the 387. Currently the library is soft-float, which means that you cannot reliably use it with a program that has been compiled with "-m80387". Big drawback, as is the possibility that someone has a program that uses the 387 and half the beta-testers cannot use it.

> I was thinking, however, that another, possibly more elegant solution
> would be to assign shared libraries (including the FP routines) to a
> segment which would be visible to all processes. Then all the stub
> routines would need to do is to do a far call to a predefined segment.
> What do people think?

This is the preferred solution: it's simple and easy to add to the kernel. The routines in libc.a would just be stubs calling the "library segment". No problem, except for the maths.
Also, I don't find the math instructions that time-critical: they are relatively few in most programs. If you do number-crunching, you have a 387 anyway (they are quite inexpensive nowadays - I got one just to be able to test the kernel routines).

Ari Lemmke <arl@sauna.cs.hut.fi>:
> lib contains new *.s (sig_restore.s and crt0.s) files;
> without those it would be hard to create sw for Linux.
>
> bin has some new useful utilities, like kermit.

Kermit has a problem with ^C. I didn't even try to fix this, as I didn't know if it should be fixed. Anybody know? It traps it nicely, and exits, but somehow I expected kermit to ignore ^C when in terminal mode. Oh well, I ported it so that I could download files, and it works for that.

pmacdona@sol.UVic.CA (Peter MacDonald):
> Now that I have my hard drive set up, I keep getting protection
> violations. It mostly happens when using gcc. It is along the
> lines of:
>
> protection violation: 0000
>
> The hex dump at the bottom is often different. Like ls began with "c3 ".
>
> But this also happens when I boot up sometimes. Or use ls. But
> then other commands like em work. I have no 80387.

It's not the lack of an 80387 (and don't get a 287 - I'm not sure I can support it... working on it). It seems more like a corrupted filesystem, but I have been known to be wrong (sometimes ;-). I'll post the (new) fsck with binaries to nic some time today, and they might show up sometime. Still even more beta than the system :-).

The "general protection violation" is a general error: it happens on most programming errors (or if you try to use minix binaries, or if the executable file is corrupted). You can see where it happens by looking at the EIP line, "EIP: segm:address": segm=0008 means a kernel problem, segm=0017 means it happened in the user program. Note that kernel problems don't necessarily mean that the bug was in the kernel: it might be a user program that gave a bad pointer to a system call.
Indeed, the protection violation is so general that it might be anything: an unexpected interrupt (but they should be trapped, so this isn't that probable) or something like that...

Wolfgang Thiel <UPSYF173@DBIUNI11.BITNET>:
> Hi,
> which include files should be used? The ones in include.tar.Z or
> those in the kernel/mm/fs .tar file?
> Wolfgang

I use the ones in include.tar.Z, but they should really be identical. This isn't absolutely true, but if there are differences, they should be minimal. I'll have to consolidate them (no big problem now that I have no minix include files I can mess up with).

Bruce Evans <bde@runx.oz.au>:
> Shared text helps a lot with recursive commands. I'm surprised linux
> doesn't already have it. Fork is most naturally done by sharing text and
> making it copy-on-write or no-write.

Oh, shared text works fine after a fork (try running 10-20 bashes inside each other), but not when executing a new program that is already in memory. It should be easy to check for (especially once demand-loading is in place), so this too will eventually be implemented (so that bash doesn't have to be loaded completely if it is executed from within some other program).

Linus (torvalds@kruuna.helsinki.fi)

PS. Small bug report: signals weren't correctly reported as the cause of exiting, but they are now (in my version). 287 coprocessors weren't noticed at all (this was the problem with nicholas), and I'm not sure the code uses them correctly now either, but at least it tries... Coprocessor errors now correctly cause a SIGFPE, no longer the debugging info. Likewise "not-present" errors. Also, variable-speed serial connections are now implemented, and seem to work (testing from kermit). Direct reads/writes to a >64M partition should also work now (thanks to who-ever. My mind is going... )
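Linus's rule of thumb for reading the EIP line (segm=0008 is the kernel, segm=0017 the user program) follows directly from the 386 selector format, which is index<<3 | TI<<2 | RPL. Decoding it confirms the rule: 0x0008 is GDT entry 1 at privilege 0 (early Linux's kernel code segment), and 0x0017 is LDT entry 2 at privilege 3 (the per-process user code segment). The functions below are an illustrative sketch of that decoding, not kernel code.

```c
/* Fields of a 386 segment selector: table index, table indicator
 * (0 = GDT, 1 = LDT), and requested privilege level 0-3. */
struct selector {
    int index;
    int ldt;
    int rpl;
};

struct selector decode_selector(unsigned short sel)
{
    struct selector s;
    s.index = sel >> 3;        /* descriptor table entry number */
    s.ldt   = (sel >> 2) & 1;  /* TI bit: which table */
    s.rpl   = sel & 3;         /* privilege level */
    return s;
}

/* Linus's rule of thumb in one line: privilege level 0 means the
 * fault's EIP was in kernel code. */
int selector_is_kernel(unsigned short sel)
{
    return (sel & 3) == 0;
}
```

So in the crash dumps above, `EIP 0008:...` places the fault in the kernel (possibly handed a bad user pointer), while `EIP: 0017:...` places it in the user program itself.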