Linux/PowerPC Kernel

From: broth...@coho.halcyon.com (Joseph L. Brothers)
Subject: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/07
Message-ID: <3tk1r3$l8o@news1.halcyon.com>#1/1
X-Deja-AN: 105826807
organization: Northwest Nexus, Inc. - Professional Internet Services
keywords: linux powerpc
newsgroups: comp.os.linux.development.system,comp.sys.powerpc
summary: available

7 July 1995

Source code is now available for Linux 1.2 on a PowerPC platform.  The
following files are available for anonymous ftp from

    ftp://liber.stanford.edu/pub/linuxppc/

-rw-r--r-- 1 ftp   999  3421159 Jul  7 07:48 binutils.tar.gz
-rw-r--r-- 1 ftp   999  8415240 Jul  7 07:55 linux-ppc.tar.gz
-rw-r--r-- 1 ftp   999  4680379 Jul  7 08:00 linux-ppc.update-950626.tar.gz
-rw-r--r-- 1 ftp   999  4347391 Jul  7 08:06 linux-ppc.update-950705.tar.gz
-rw-r--r-- 1 ftp   999  1834475 Jul  7 08:08 powerpc-lib.tar.gz
-rw-r--r-- 1 ftp   999    10663 Jul  7 08:08 powerpc-lib.update-950626.tar.gz
-rw-r--r-- 1 ftp   999    58500 Jul  7 08:08 simppc-linux.tar.gz
-rw-r--r-- 1 ftp   999   268578 Jul  7 08:09 tools.tar.gz

This announcement obsoletes the current FAQ and the Linux Project Map
entry for Linux/PowerPC.  Updates will be announced when available.

This release of Linux/PowerPC is fragmentary and cannot recompile
itself.  It boots only on a Motorola 1603 board.  It has minimal
capabilities, including ability to boot via Ethernet to a ramdisk file
system, run single-user, execute the rc shell, and run a few simple
utilities like ls.

The extended binutils-2.5.2 included with this release implements
ELF on this platform.  Patches to gcc-2.7.0 adequate to cross-compile
this release of Linux/PowerPC are forthcoming and will be announced
on linux-...@vger.rutgers.edu when they can be ftp'd.

The Motorola 1603 board is the only platform currently supported.  Work
continues on Apple NuBus PowerMacs, Motorola Ultra and PowerStack and
IBM RS6000 PowerPC platforms.  Others are welcome.  Contributions to
Linux/PowerPC can be made via the linux-...@vger.rutgers.edu mailing
list or by anonymous ftp to

    ftp://liber.stanford.edu/pub/incoming

I am not the developer of this software, nor its archivist (source
supervision is still in debate), but I will attempt to respond to email
as time permits.  Linus Torvalds will get first priority for his
suggestions, naturally.

Many thanks are due to the people who have put code, time, documentation,
effort, hardware and inspiration into Linux, the tools and this port. 

-Joseph
-- 
broth...@halcyon.com           Coordinator         Linux/PowerPC Project

From: phi...@cs.wits.ac.za (Philip Machanick)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/13
Message-ID: <philip-1307951424390001@macadamia.cs.wits.ac.za>#1/1
X-Deja-AN: 106175695
references: <3tk1r3$l8o@news1.halcyon.com>
organization: Computer Science Dept., Wits
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3tk1r3$...@news1.halcyon.com>, broth...@coho.halcyon.com
(Joseph L. Brothers) wrote:

> Many thanks are due to the people who have put code, time, documentation,
> effort, hardware and inspiration into Linux, the tools and this port. 

I agree -- and look forward to a Mac version. The intel version puts not
only Microsloth to shame (soft target) but also every commercial UNIX
system I've used.
-- 
Philip Machanick                   phi...@cs.wits.ac.za
Department of Computer Science, University of the Witwatersrand
2050 Wits, South Africa
phone 27(11)716-3309  fax 27(11)339-7965

From: mhhen...@mobius02.math.uwaterloo.ca (Mark Hendriks)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/16
Message-ID: <DBsGpy.27w@undergrad.math.uwaterloo.ca>#1/1
X-Deja-AN: 106331167
sender: n...@undergrad.math.uwaterloo.ca (news spool owner)
references: <3tk1r3$l8o@news1.halcyon.com> 
<philip-1307951424390001@macadamia.cs.wits.ac.za>
organization: University of Waterloo
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

From: phi...@cs.wits.ac.za (Philip Machanick)
>> Many thanks are due to the people who have put code, time, documentation,
>> effort, hardware and inspiration into Linux, the tools and this port. 

> I agree -- and look forward to a Mac version. The intel version puts not
> only Microsloth to shame (soft target) but also every commercial UNIX
> system I've used.

You're lucky FreeBSD guys don't seem to hang around on comp.sys.powerpc.
If you thought Mac users were religious...

 
#----------------------------- Mark H. Hendriks -----------------------------#

	Networking: It's not who you know, it's not what you know;
		    It's who you know how to contact.

My Real Name: mhhendr...@undergrad.math.uwaterloo.ca

From: h...@alumni.EECS.Berkeley.EDU (Jeffrey Hsu)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/19
Message-ID: <3ui5m7$rru@agate.berkeley.edu>#1/1
X-Deja-AN: 106428035
references: <3tk1r3$l8o@news1.halcyon.com> 
<philip-1307951424390001@macadamia.cs.wits.ac.za> 
<DBsGpy.27w@undergrad.math.uwaterloo.ca>
organization: University of California, Berkeley
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <DBsGpy....@undergrad.math.uwaterloo.ca>,
Mark Hendriks <mhhen...@mobius02.math.uwaterloo.ca> wrote:
>From: phi...@cs.wits.ac.za (Philip Machanick)
>>> Many thanks are due to the people who have put code, time, documentation,
>>> effort, hardware and inspiration into Linux, the tools and this port. 
>
>> I agree -- and look forward to a Mac version. The intel version puts not
>> only Microsloth to shame (soft target) but also every commercial UNIX
>> system I've used.
>
>You're lucky FreeBSD guys don't seem to hang around on comp.sys.powerpc.
>If you thought Mac users were religious...

In general FreeBSD fans do not go around touting the merits of their
operating system on every single newsgroup in the world like rabid linux
fans.  I single handedly try to compensate for the rest of the FreeBSD
crowd.

Wait a minute---what am I saying?  Merits of linux?  That's like "merits
of Windows on top of DOS."  Millions of people use it every day, but
they have never looked under the hood to see what a gross hack on top
of a gross hack Windows really is.  The answer to people who ask me,
"Why not linux?" is "You should know better."

							Jeffrey

From: j...@dostoevsky.ucr.edu (Joe Sloan)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/27
Message-ID: <3v6tac$i4k@galaxy.ucr.edu>#1/1
X-Deja-AN: 107029415
references: <3tk1r3$l8o@news1.halcyon.com> 
<philip-1307951424390001@macadamia.cs.wits.ac.za> 
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu>
organization: University of California at Riverside
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3ui5m7$...@agate.berkeley.edu>,
Jeffrey Hsu <h...@alumni.EECS.Berkeley.EDU> wrote:

>Wait a minute---what am I saying?  Merits of linux?  That's like "merits
>of Windows on top of DOS."  Millions of people use it every day, but
>they have never looked under the hood to see what a gross hack on top
>of a gross hack Windows really is.  The answer to people who ask me,
>"Why not linux?" is "You should know better."
>
>							Jeffrey

c'mon, get a life - linux hackers don't go around knocking BSD, 
so why are you trying to start a flamefest here?

Linux is a good thing, and so are the BSD variants. 

If you don't like linux, don't use it, but when you make postings 
like this, you sound like an ignorant and opinionated 13 year old.

Sounds to me like you are just a bit jealous at the impressive growth
of the linux movement. Why should you be? Microsoft is your enemy, not
linux.

--
 Joe Sloan     j...@engr.ucr.edu   http://dostoevsky.ucr.edu
     Win95? No, none for me, thanks - I'm already running Linux...
 Microsoft is not the answer - Microsoft is the question; the answer is NO!

From: Gerry S Hayes <sumn...@CMU.EDU>
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/28
Message-ID: <gk6GFrW00YUwADPEIh@andrew.cmu.edu>#1/1
X-Deja-AN: 107029485
references: <3tk1r3$l8o@news1.halcyon.com> 
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<DC86oE.JJH@pell.com>
organization: Sophomore, Mathematics, Carnegie Mellon, Pittsburgh, PA
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

I'm interested, now.  What things do you see as broken about Linux?  I
am well aware of certain things that still need to be implemented
before Linux can claim to be a complete Unix implementation (real file
descriptor passing, accounting & quotas, make sure /proc is secure)
and that kernel threads really need to be added at some point in the
(near?) future, but this is just indicative of the fact that Linux is
a work in progress.  It certainly has advantages over *BSD (better
scheduler and file system, vastly superior non-X console) and used to
be more POSIX compliant (is this still the case?).  OTOH, *BSD seem to
have done a much better job integrating security into the standard
distribution (default shadow passwds, non-broken ftpd, no glaring
holes like /proc).
   However, I think that development of Linux is currently aimed at
fixing some of these (admittedly major) problems.  What I'm interested
in is any glaring limitations in the design of Linux that would
require nasty hacks to fix or anything that the Linux community can't
reasonably expect to fix in a clean manner in the next couple of
years.  I've only sat through one NetBSD installation and played with
it for a couple hours, so I could be very mistaken about some of the
above.  Please correct me if so; I'm really quite interested in this
matter.  

TTFN,

  Sumner

From: h...@alumni.EECS.Berkeley.EDU (Jeffrey Hsu)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/28
Message-ID: <3va10b$klc@agate.berkeley.edu>#1/1
X-Deja-AN: 107029533
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu>
organization: University of California, Berkeley
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

  In article <3v6tac$...@galaxy.ucr.edu> ,
  Joe Sloan <j...@dostoevsky.ucr.edu>  wrote:
  >>Wait a minute---what am I saying?  Merits of linux?  That's like "merits
  >>of Windows on top of DOS."  Millions of people use it every day, but
  >>they have never looked under the hood to see what a gross hack on top
  >>of a gross hack Windows really is.  The answer to people who ask me,
  >>"Why not linux?" is "You should know better."
  > 
  > c'mon, get a life - linux hackers don't go around knocking BSD, 

I've read so many articles from Linux users who have never tried
BSD and still knock it that I know this statement is simply not true.
The reputation of rabid linux fans is well deserved.

  > so why are you trying to start a flamefest here?

Your memory is faulty.  I didn't start this.

  > Linux is a good thing, and so are the BSD variants. 

Good is relative.  Windows on top of DOS is a great improvement over
just using DOS.  Linux is a great improvement over Windows.  It is
not a great improvement or BSD or real SVR4 unix.  Why settle for
a look-alike when you can have the real thing?  (In case that's
not obvious, that's a rhetorical question.  I know all the standard
replies.)

  > when you make postings 
  > like this, you sound like an ignorant and opinionated 13 year old.

How did you know my age?  I am opinionated, but that's my right.
Ignorant on linux, BSD, and SVR3 and SVR4 internals, however, I am not.

  > Sounds to me like you are just a bit jealous at the impressive growth
  > of the linux movement. Why should you be? Microsoft is your enemy, not
  > linux.

Microsoft is indeed the evil empire.  But Linux is not far behind.  It
is often noted in business school that in technology, it's not always
the best solution that wins, but usually the second best solution.  On
the larger scale, this is true of Microsoft versus Unix, as you have
noted.  But it's also true of BSD versus Linux.

							Jeffrey

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/28
Message-ID: <3vbn86$178@fido.asd.sgi.com>#1/1
X-Deja-AN: 107029472
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Jeffrey Hsu (h...@alumni.EECS.Berkeley.EDU) wrote:
:   > Linux is a good thing, and so are the BSD variants. 

: Good is relative.  Windows on top of DOS is a great improvement over
: just using DOS.  Linux is a great improvement over Windows.  It is
: not a great improvement or[sic] BSD or real SVR4 unix.  Why settle for
: a look-alike when you can have the real thing?  (In case that's
: not obvious, that's a rhetorical question.  I know all the standard
: replies.)
:
: How did you know my age?  I am opinionated, but that's my right.
: Ignorant on linux, BSD, and SVR3 and SVR4 internals, however, I am not.
:
: 							Jeffrey

Hmm.  You sound like you really know what you are talking about.  I'm
probably not as educated as you since all I've done is be part of a 6
person team that ported svr3 to a supercomputer, added networking to
SCO Unix, spent 6 years in Sun's kernel group as a POSIX, file system,
VM, and all around perf grunt, diffed and analyzed all of the diffs of
*all* of 386BSD and 4.4lite against SunOS 4.x, ditto for *BSD against
version 6 Unix, ditto for *BSD against 32v Unix, wrote OS measurement
tools and used those tools to compare the performance of most of the
commercial and free operating systems, and wrote up results of my
work.

You probably have a much more extensive background, given your more
strident opinions on the OS qualities.  Perhaps you want to elaborate
on your qualifications?

In my lowly opinion, the BSD crowd is deluded as to the quality of
that kernel.  There isn't much difference between Linux and *BSD from a
user level point of view.  They run the same apps - Linux probably runs 
more apps.

From a kernel point of view, you would think that *BSD would be better
since SunOS is derived from *BSD.  Don't bet on it.  Sun had legions of
nerds that cared passionately about the kernel structure, architecture,
and implementation during the 3.x and 4.x days (I count myself among 
those nerds and am proud of it.  Probably did my best work there.)

*BSD, on the other hand, is nothing to write home about.  It has a
medium quality vnode layer and bogus VM layer that was borrowed from
MACH.  It has the same features but not the same architecture as
SunOS.  If you don't have a unified page and buffer cache, you are just
toying with the problem.

Linux isn't much better but it is improving at the rate at which you
can educate the people that actually make changes to the kernel.  That
seems to be the limiting factor in Linux.  *BSD is embroiled in politics.

Face it, Linux is where the cool work is happening.  Beat 'em or join 'em,
but don't just sit there and whine.
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/31
Message-ID: <3vhoe9$8pt@fido.asd.sgi.com>#1/1
X-Deja-AN: 107187183
references: <3tk1r3$l8o@news1.halcyon.com> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vc6qu$mb6@agate.berkeley.edu>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Jeffrey Hsu (h...@alumni.EECS.Berkeley.EDU) wrote:
:   In article <3vbn86$...@fido.asd.sgi.com>,
:   Larry McVoy <l...@slovax.engr.sgi.com> wrote:
:   > BSD has a medium quality vnode layer
: Strange, John H. seems to think his vfs architecture is better than
: the one in SVR4.  Go reference the Ficus papers for details.

I've read all of those papers, including John's thesis.  His vfs
architecture was implemented in SunOS, which has a Sun designed and
implemented VFS and VM layer that are light years ahead of anything in
*BSD.

John's architecture is built on top of the Sun VFS/VM layer.  Ask John
how he felt about getting his stuff to work in *BSD.

:   > and bogus VM layer that was borrowed from MACH.
:   > It has the same features but not the same architecture as
:   > SunOS.  If you don't have a unified page and buffer cache, you are just
:   > toying with the problem.

: Yeah, no arguments there about the bogus MACH vm interface.  However,
: lots of work has gone into the vm layer in FreeBSD since then and now
: it does have a page and buffer cache.  What else should we improve
: on in BSD?

Make the buffer cache/page cache be one and the same.  I.e.,  mmapped files
and read/write I/O are consistent.

Basically,  all I/O needs to happen through getpage()/putpage() interfaces
with read/write implemented as warts on top of this.  If you do it this
way, the kernel implements read/write as

	m = mmap(file_loc_into_kernel_space)
	bcopy(m, user_buffer, n)

and the kernel can/may take page faults on itself.  This is how SunOS works.
It means mmap is the only way you really look at pages; user or kernel.  
Talk to McKusick, he knows the BSD VM is gunk.
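
The two-line pseudocode above can be tried out from user space.  What
follows is only a sketch of the idea (map the file, then copy out of the
mapping), not SunOS kernel code: read_via_mmap() is a made-up helper
name, and it maps the whole file from offset 0 to sidestep mmap's
page-alignment rule for offsets.  Any page faults are taken inside the
memcpy(), which is the point being made.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a real API): copy up to n bytes of `path`,
 * starting at byte offset `off`, into buf by mapping the file and
 * copying out of the mapping.  Returns the number of bytes copied,
 * 0 at or past end of file, or -1 on error. */
ssize_t read_via_mmap(const char *path, off_t off, void *buf, size_t n)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (off < 0 || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (off >= st.st_size) {            /* nothing left to read */
        close(fd);
        return 0;
    }

    size_t avail = (size_t)(st.st_size - off);
    if (n > avail)
        n = avail;                      /* short read at end of file */

    /* Map the whole file; the pages are faulted in lazily. */
    void *m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the fd */
    if (m == MAP_FAILED)
        return -1;

    memcpy(buf, (const char *)m + off, n);   /* the "bcopy" step */
    munmap(m, (size_t)st.st_size);
    return (ssize_t)n;
}
```

Calling read_via_mmap(path, 0, buf, sizeof buf) on an ordinary file
should fill buf much as read() would, without read() ever being called.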

:   > Linux isn't much better
:   > but it is improving at the rate at which you
:   > can educate the people that actually make changes to the kernel.  That
:   > seems to be the limiting factor in Linux.  *BSD is embroiled in politics.

: Wrong.  Since the BSD source is freely available and the source is
: documented in Stevens networking book and McKusick's upcoming
: revision of "The Design and Implementation of 4.3BSD", the same
: education process applies to BSD as well.  As for politics, there are
: still fewer competing BSD distributions than there are linux versions.
: So much for a cohesive front.

There are at least 3 different kernel efforts, all uncoordinated in BSDland.
In Linuxland, you just send your diffs to Linus.  End of story.  Completely
cohesive - there is one maintainer of the kernel source.  
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/31
Message-ID: <3vhq0a$8pt@fido.asd.sgi.com>
X-Deja-AN: 107187201
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Terry Lambert (te...@cs.weber.edu) wrote:
: Hi Larry!  8^).

Hi Terry!  I sent Caldera marketing looking to hire you.  No kidding....

: Larry is former chief architect at Sun and is now one of the
: top guys at SGI.

No tongue in cheek here :-)  I was/am an engineer at both companies.
Perhaps I got a little more latitude than your average engineer but
I wouldn't say I was more important than anyone else.  Nice that you
think so highly of me, though....
: lm@neteng (Larry McVoy) wrote:
: ]
: ] Jeffrey Hsu (h...@alumni.EECS.Berkeley.EDU) wrote:
: ]  

: And despite the IPX support from the guys at Caldera (Hi Jim,
: Brian, Ron, guys!  I hear you stole Bryce from Novell!), the
: networking is still lacking.

I think you are correct in terms of stability (maybe).  I doubt it in
terms of performance.  The last time I looked at the BSD stack it was
dog slow.  Sun's STREAMS stack was about at parity with the BSD stack 
and lemme tell you, man, that ain't nothin' to write home about.

: ] From a kernel point of view, you would think that *BSD would
: ] be better since SunOS is derived from *BSD.  Don't bet on it.
: ] Sun had legions of nerds that cared passionately about the
: ] kernel structure, architecture, and implementation during the
: ] 3.x and 4.x days (I count myself among those nerds and am
: ] proud of it.  Probably did my best work there.)

: But that doesn't mean that SunOS doesn't have warts as well.  In
: particular, the file system multithreading on the SMP version
: was rather abysmal, even if the code *was* massively cleaned
: up.  

Yup.  You wouldn't believe some of the things they wanted to do to the
filesystem to "multi thread it".  I'm not going to air dirty laundry
here, but suffice to say that I threw up my hands in disgust and went
over to the hardware side of the company rather than take part in
that.

When I say "SunOS" I mean SunOS 4.x; when we talk about 5.x, I
typically refer to it disdainfully as that Solaris thingy.  I don't do
Solaris.  Not for any price.  Too gross for me.

: The Unisys SVR4 port actually did this the "right" way,
: since the Sun preemption point/VOP/VFSOP mapping wasn't well
: documented externally, it was a bitch to reverse engineer the
: interfaces (having done it).

I can believe that.

: I have some horror stories about the serial and pty drivers that'd
: curl your toes.  8-).

Blame STREAMS.  That's where it went south.

: ] *BSD, on the other hand, is nothing to write home about.  It
: ] has a medium quality vnode layer

: This would be John Heidemann's work from Ficus, that was rolled
: into BSD 4.4 Lite and supports vnode stacking ala the Usenix
: paper (Rosenthal went wrong, IMO, by stacking on a vnode basis).

No, that's not what I meant at all.  John's work is coolness (and done,
originally, in SunOS 4.x).

The BSD vnode layer has some fundamental flaws.  Some are tied up in
the VM/VFS interfaces.

The Sun VFS/VM layers have some cool interactions that only become useful
as you try and depend on the architecture to do new things.  A great 
example would be a cache consistent remote file system.  Good luck doing
that in *BSD - you don't get the call back when you want it.  In SunOS,
there is a rule: only a VFS gets to move pages in/out of the page cache.
In *BSD, the page cache is managed by the VM system, and the buffer cache
is managed by the VFS, and mmap is crow barred into the middle of this.

In SunOS, all page state changes are handled by the vnode->{get,put}page()
functions.  Even read/write I/O caching happens through these functions.
There is *no* buffer cache.  Everything is a page.  When you want to 
get a page  you do this:

	trap
	as_fault
	seg_fault
	get_page

		if (page_find(page)) {
			page_enter(page);
			return;
		}

		/* start the I/O */

		/* wait for I/O */

		page_enter(page)
		return;

page_find() asks the VM system to find the page.  seg_fault could have done
that directly, saving a function call.  Baaaaad choice - the VFS wouldn't
have been told that you have the page and only the VFS can manage cache
consistency.
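
A toy model in C may make the rule concrete.  It is a sketch under loud
assumptions: the struct layout, the counters and the toy_ names are
invented for illustration and are not the SunOS sources; only the names
seg_fault, page_find and page_enter are borrowed from the outline above.
What it shows is that seg_fault() never peeks into the cache itself, so
the file system's getpage routine hears about every fault, hit or miss.

```c
#include <stdbool.h>
#include <string.h>

#define NPAGES 8                 /* size of the toy page cache */

struct vnode;

/* Per-file-system operations; only these may touch the page cache. */
struct vnodeops {
    int (*getpage)(struct vnode *vp, unsigned pgno);
};

struct vnode {
    const struct vnodeops *ops;
    bool cached[NPAGES];         /* is page pgno resident?            */
    int  lookups;                /* faults the file system was told of */
    int  io_count;               /* faults that needed "disk" I/O      */
};

/* Ask the VM whether the page is already resident. */
static bool page_find(const struct vnode *vp, unsigned pgno)
{
    return vp->cached[pgno];
}

/* Make the page resident (stand-in for mapping it to the faulter). */
static void page_enter(struct vnode *vp, unsigned pgno)
{
    vp->cached[pgno] = true;
}

/* Toy getpage: the file system sees the fault even on a cache hit. */
static int toy_getpage(struct vnode *vp, unsigned pgno)
{
    if (pgno >= NPAGES)
        return -1;
    vp->lookups++;
    if (page_find(vp, pgno)) {
        page_enter(vp, pgno);
        return 0;
    }
    vp->io_count++;              /* start the I/O, wait for the I/O */
    page_enter(vp, pgno);
    return 0;
}

static const struct vnodeops toy_ops = { toy_getpage };

void toy_vnode_init(struct vnode *vp)
{
    memset(vp, 0, sizeof *vp);
    vp->ops = &toy_ops;
}

/* The VM layer always goes through the vnode, never around it. */
int seg_fault(struct vnode *vp, unsigned pgno)
{
    return vp->ops->getpage(vp, pgno);
}
```

Faulting on the same page twice bumps lookups twice but io_count only
once, which is the bookkeeping a cache consistent file system needs.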

This is fairly obscure stuff, but talk to people that have tried to do
cache consistent file systems and they will all tell you that SunOS 
had/has the only VM/VFS model that works right.

This has little to do with John's work, which, as I have said, is way cool,
but it is stacked on top of the vnode layer that I was referring to.

: At least in FreeBSD, the VM layer has been rewritten by David
: Greenman and John Dyson (mostly -- correct me if I'm wrong here),
: with input and small patches from others.

Cool.

: ]  It has the same features but not the same architecture as
: ] SunOS.

: Just that it's not necessarily bad that it's not the same as
: SunOS.  What comes out of the SMP work (that is currently
: running, albeit at low grain parallelism) will probably change
: it again.

Nah, forget the SMP stuff.  The SunOS SMP stuff isn't interesting to me.
I'm talking about the VM/VFS interfaces and architecture.

: ] Linux isn't much better but it is improving at the rate at
: ] which you can educate the people that actually make changes
: ] to the kernel.  That seems to be the limiting factor in Linux.
: ] 
: ] *BSD is embroiled in politics.

: Not true.  You are probably referring to the NetBSD/FreeBSD "split"
: (as opposed to the Slackware/Yggdrasil "split" in Linux).

He, he, he.  Here's the deal.  As long as *BSD is covered by the BSD
copyright, the current "owners" of a source base can dictate policy,
whatever.  They have ownership rights that [may] have little or nothing
to do with their contributions to the source base.  They can form a
company and choose what they want to ship in terms of source.  This
environment fosters "power" and politics.  I want no part of it - I
watched that happen in Sun when McNealy and the marketing idiots
decided what was best for Sun.

Linux has a copyright that makes the supposed "Slackware/Yggdrasil split"
meaningless.  All of the code is free and has to stay free.  That means 
that the only people with any power are the people that are fixing the
code.  The code is "owned" by whoever understands it enough to work on
it.

BSD is a rat hole, because of its copyright.  Anyone who has been
around for a while is perfectly aware that the code can get locked up.
Linux code can never be locked up.  Never again will I become dependent
on code that might go away.  I have years of investment into a source
base that Sun is trying to throw away.  Why should I invest my effort
and talent into another source base that someone else could productize,
lock up, and eventually throw away?  No thanks.  I'll put my cards in
the Linux basket - there are enough other people that understand the
legalities and they are doing the same thing.  

: ] Face it, Linux is where the cool work is happening.  Beat 'em
: ] or join 'em, but don't just sit there and whine.

: 8-).  I pick "Beat 'em".  But only because you didn't give "work
: with 'em on projects of mutual interest" as a third option.

Hey, that third choice is fine by me as long as Linux gets copylefted
versions of the "projects of mutual interest".  Did you know that you
can release the same exact code under an infinite number of copyrights?
So you can shlep your work into *BSD under the BSD copyright and into
Linux under the GPL, and everyone is happy.  Completely legal.  Done all
the time.

:                                         Terry Lambert

Good to chat again, Terry.  It's been a while.  I guess I'm coming out
of my slump.
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/31
Message-ID: <DCKxqJ.BJB@info.swan.ac.uk>#1/1
X-Deja-AN: 107281741
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vc6qu$mb6@agate.berkeley.edu>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vc6qu$...@agate.berkeley.edu> h...@alumni.EECS.Berkeley.EDU 
(Jeffrey Hsu) writes:
>education process applies to BSD as well.  As for politics, there are
>still fewer competing BSD distributions than there are linux versions.
>So much for a cohesive front.

There is only one Linux kernel; distributions are a different matter and mostly
powered by commercial aims (which is good). All the distributions of any
relevance follow a single file system standard. 

The FreeBSD approach of '_the_ distribution' hasn't suited the practicalities
of Linux distribution and making CD's. There are for example Linux
distributions built for specialist jobs (like Xdenu which is an Xterminal
distribution).

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/07/31
Message-ID: <3vj3s3$eps@fido.asd.sgi.com>#1/1
X-Deja-AN: 107281756
references: <3tk1r3$l8o@news1.halcyon.com> <3v6tac$i4k@galaxy.ucr.edu> 
<3va10b$klc@agate.berkeley.edu> <3vbn86$178@fido.asd.sgi.com> 
<3vc6qu$mb6@agate.berkeley.edu> <3vhoe9$8pt@fido.asd.sgi.com> 
<DCKF9B.H08@cunews.carleton.ca>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Mike Shaver (sha...@neon.ingenia.com) wrote:
: Larry McVoy (lm@neteng) wrote:
: : I've read all of those papers, including John's thesis.  His vfs
: : architecture was implemented in SunOS, which has a Sun designed and
: : implemented VFS and VM layer that are light years ahead of anything in
: : *BSD.

: Any thoughts on Solaris' VM stuff?

It's basically the SunOS 4.x stuff ported to SVR4.  And SMP threaded.  There
have been a lot of problems with it as Sun's engineers have learned about the
joys and agonies of MP threading kernel software.  It sucked rocks when I
left, I understand that it is better now, but I'm not really qualified to
say since I no longer work on or use Solaris (never really did, he he).

: (I won't ask you to comment on IRIX', at least not 5.1's.)

Grumble.  Thank you for sparing me on that topic.  

: : : education process applies to BSD as well.  As for politics, there are
: : : still fewer competing BSD distributions than there are linux versions.
: : : So much for a cohesive front.

: : There are at least 3 different kernel efforts, all uncoordinated in BSDland.
: : In Linuxland, you just send your diffs to Linus.  End of story.  Completely
: : cohesive - there is one maintainer of the kernel source.  

: For the most part, that's true.
: There are at least 3 different versions of the SMP/MT kernels, I 
: think.

: Of course, all involved have the goal of re-integrating their mods into
: the One True Linux, via Linus.

I'm hoping that the SMP stuff will go into a different source base, long 
term.  SMP completely ruins your uniprocessor performance.
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <DCMM0H.Ms9@info.swan.ac.uk>#1/1
X-Deja-AN: 107281753
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3vbn86$178@fido.asd.sgi.com> <3vc6qu$mb6@agate.berkeley.edu> 
<3vhoe9$8pt@fido.asd.sgi.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vhoe9$...@fido.asd.sgi.com> l...@slovax.engr.sgi.com writes:
>Basically,  all I/O needs to happen through getpage()/putpage() interfaces
>with read/write implemented as warts on top of this.  If you do it this
>way, the kernel implements read/write as
>
>	m = mmap(file_loc_into_kernel_space)
>	bcopy(m, user_buffer, n)
>
>and the kernel can/may take page faults on itself.  This is how SunOS works.
>It means mmap is the only way you really look at pages; user or kernel.  
>Talk to McKusick, he knows the BSD VM is gunk.

Doesn't the kernel taking faults on itself doing a write() via mmap make
O_APPEND semantics even more exciting than normal?  Also, wouldn't it be more
sensible at that point to put the mmap/memcpy in userspace and abolish the
write system call totally?
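
For reference, the O_APPEND behaviour in question can be seen with plain
write(): every write through an O_APPEND descriptor lands at the current
end of file, whichever descriptor issued it.  A mapping has a fixed
length, so an mmap-based write would have to extend the file first.  The
sketch below only demonstrates the write() side; append_twice() is a
hypothetical helper, not anything from SunOS or Linux.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: open `path` twice with O_APPEND and write through
 * both descriptors in turn.  Returns the final file size, or -1 on
 * error.  The file is created or truncated first. */
off_t append_twice(const char *path)
{
    struct stat st;
    off_t size = -1;

    int a = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (a < 0)
        return -1;
    int b = open(path, O_WRONLY | O_APPEND);
    if (b < 0) {
        close(a);
        return -1;
    }

    /* Each descriptor has its own file offset, yet no write below
     * overwrites another: O_APPEND repositions to end of file first. */
    int ok = write(a, "one\n", 4) == 4
          && write(b, "two\n", 4) == 4
          && write(a, "three\n", 6) == 6;

    if (ok && fstat(a, &st) == 0)
        size = st.st_size;

    close(a);
    close(b);
    return size;
}
```

With O_APPEND the three writes give a 14-byte file; drop the flag and
the second descriptor starts at offset 0 and overwrites the first record.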

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: Terry Lambert <te...@cs.weber.edu>
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vkk18$scd@park.uvsc.edu>
X-Deja-AN: 107281785
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu> <3vhq0a$8pt@fido.asd.sgi.com>
organization: Utah Valley State College, Orem, Utah
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

lm@neteng (Larry McVoy) wrote:
] : And despite the IPX support from the guys at Caldera (Hi Jim,
] : Brian, Ron, guys!  I hear you stole Bryce from Novell!), the
] : networking is still lacking.
] 
] I think you are correct in terms of stability (maybe).  I doubt it in
] terms of performance.  The last time I looked at the BSD stack it was
] dog slow.  Sun's STREAMS stack was about at parity with the BSD stack 
] and lemme tell you, man, that ain't nothin' to write home about.

The slowness in the TCP/IP in BSD has actually been fixed using
the Vegas patches (see the papers at the X-kernel site at the
University of Arizona).  It does a nice job of resolving all
the messiest issues.

The streams stuff, I can agree with.  The biggest issue that made
the UnixWare version of NetWare for UNIX only 6-8% faster than
NetWare running on the exact same hardware (yes, this is true)
was the stack latency.  And fully 20% of that was going from DDI/DKI
drivers to streams versions of the ODI drivers.

Streams really, really sucks, when you get down to it, and the
major place it sucks is the tasking model.  It's actually possible
to fix, but only if you have kernel preemption and a thread
context to run it in without appealing to user space (basically
the same thing that makes Chorus faster than Mach).

It is possible to fix streams, though not probable.

] When I say "SunOS" I mean SunOS 4.x; when we talk about 5.x, I
] typically refer to it disdainfully as that Solaris thingy.  I don't do
] Solaris.  Not for any price.  Too gross for me.

I do that too.  Pisses Solaris-lovers right off.  8^).

] The BSD vnode layer has some fundamental flaws.  Some are tied up in
] the VM/VFS interfaces.
] 
] The Sun VFS/VM layers have some cool interactions that only become useful
] as you try and depend on the architecture to do new things.  A great 
] example would be a cache consistent remote file system.  Good luck doing
] that in *BSD - you don't get the call back when you want it.  In SunOS,
] there is a rule: only a VFS gets to move pages in/out of the page cache.
] In *BSD, the page cache is managed by the VM system, and the buffer cache
] is managed by the VFS, and mmap is crow barred into the middle of this.
] 
] In SunOS, all page state changes are handled by the vnode->{get,put}page()
] functions.  Even read/write I/O caching happens through these functions.
] There is *no* buffer cache.  Everything is a page.  When you want to 
] get a page  you do this:
] 
] 	trap
] 	as_fault
] 	seg_fault
] 	get_page
] 		
] 		if (page_find(page)) {
] 			page_enter(page);
] 			return;
] 		}
] 
] 		/* start the I/O */
] 
] 		/* wait for I/O */
] 
] 		page_enter(page)
] 		return;
] 
] page_find() asks the VM system to find the page.  seg_fault could have done
] that directly, saving a function call.  Baaaaad choice - the VFS wouldn't
] have been told that you have the page and only the VFS can manage cache
] consistency.

I don't know what source you've been reading, but here's some code
from ufs_vnops.c that implements bread...

/* 
 * Calculate the logical to physical mapping if not done already,
 * then call the device strategy routine.
 */
int
ufs_strategy(ap)
        struct vop_strategy_args /* { 
                struct buf *a_bp;
        } */ *ap;
{
        register struct buf *bp = ap->a_bp;
        register struct vnode *vp = bp->b_vp;
        register struct inode *ip;
        int error;

        ip = VTOI(vp);
        if (vp->v_type == VBLK || vp->v_type == VCHR)
                panic("ufs_strategy: spec");
        if (bp->b_blkno == bp->b_lblkno) {
                error = VOP_BMAP(vp, bp->b_lblkno, NULL, &bp->b_blkno, NULL);
                if (error) {
                        bp->b_error = error;
                        bp->b_flags |= B_ERROR;
                        biodone(bp);
                        return (error);
                }
                if ((long)bp->b_blkno == -1)
                        vfs_bio_clrbuf(bp);
        }
        if ((long)bp->b_blkno == -1) {
                biodone(bp);
                return (0);
        }
        vp = ip->i_devvp;
        bp->b_dev = vp->v_rdev;
        VOCALL (vp->v_op, VOFFSET(vop_strategy), ap);
        return (0);
}

This is exactly what you're suggesting, isn't it?

] : Just that it's not necessarily bad that it's not the same as
] : SunOS.  What comes out of the SMP work (that is currently
] : running, albeit at low grain parallelism) will probably change
] : it again.
] 
] Nah, forget the SMP stuff.  The SunOS SMP stuff isn't interesting
] to me.  I'm talking about the Vm/VFS interfaces and architecture.

The SMP stuff in the BSD camps, at least, is most likely going
to consist of 95% interface cleanup that ought to be done anyway;
you're right that John's stuff was pounded into 4.4.  As a reason
for code cleanup, it's as good as any other reason.  8-).

As far as implementation details are concerned, I believe that
all of the work that has been done so far is compile time optioned
in -- that is, it's not at the expense of the UP code, since you
can continue to make a UP kernel.

The problem with taking all the synchronization points out is that
it loses you kernel preemption, which you need for RT and for
support of idiot device drivers without buzz loops (the floppy
controller based QIC-40/80 tape drives are examples of hardware
that needs long duration buzz-loops in the current Solaris, BSD,
UnixWare, and Linux code).

] BSD is a rat hole, because of its copyright.  Anyone who has been
] around for a while is perfectly aware that the code can get locked up.
] Linux code can never be locked up.  Never again will I become dependent
] on code that might go away.  I have years of investment into a source
] base that Sun is trying to throw away.  Why should I invest my effort
] and talent into another source base that someone else could productize,
] lock up, and eventually throw away.  No thanks.  I'll put my cards in
] the Linux basket - there are enough other people that understand the
] legalities and they are doing the same thing.  

A company's internal politics are very different from politics
in a freely available source base.  Sun has it bad, and so does
Novell.  Both are at the stage where they ought to become holding
companies, but neither really wants to.  They're at the fifth
plateau of business growth.

The thing about a company is that if they decide something, you
lose your livelihood if you buck the decision.

If Jordan, for instance, since he has the keys to the machine
room, I think, decided to take FreeBSD private tomorrow, and
locked down access, he'd have a hell of a time doing it.  The
sources are out there.  No matter what improvements you make
to a proprietary product, it's hard to compete with "free".
How would you do it?  Sell for less?

The distinction, I think is that what gets locked up is the
people doing the locking-up's work after the point of the
lockup.  What that means is that if you depended on the source,
you can still depend on it.  Nothing has changed there.

What you *can't* depend on is subsequent changes to the privatized
source.  In all fairness, this is something you can't depend on
anyway in any case.  Jordan (I'm gonna get killed for picking on
him!) could just as easily get a job with a company that won't
let him work on BSD at all, like the job I had when USL bought
Novell (well, it felt like it).  His patches to the code after
that point stay on his home machine.  Just like my shared library
and streams implementations did.  Further, the work done outside
is forever after encumbered.  So even if Jordan quit and went to
work for a less anal company, he'd have to redo all the work
from scratch and be able to show his steps to prevent it from
becoming a problem for him.

In this scenario, Jordan's contributions have become privatized,
even though it was against Jordan's will.  Further, it really
matters not at all whether the source he was hacking was BSD or
Linux or Jordache (Jordan's personal VMS clone ;-)).

Since GPL'ed code *can* be released under alternate license by
its author, it has every possibility of becoming privatized the
same way, though the number of people who can take and use the
code this way is limited.

Ask Linus if he's ever been approached to license Linux under
alternate terms.  I'd be interested in his answer.  8-).

] : ] Face it, Linux is where the cool work is happening.  Beat 'em
] : ] or join 'em, but don't just sit there and whine.
] 
] : 8-).  I pick "Beat 'em".  But only because you didn't give "work
] : with 'em on projects of mutual interest" as a third option.
] 
] Hey, that third choice is fine by me as long as Linux gets
] copylefted versions of the "projects of mutual interest".  Did
] you know that you can release the same exact code under an
] infinite number of copyrights?  So you can shlep your work into
] *BSD under the BSD copyright and into Linux under the GPL, and
] everyone is happy.  Completely legal.  Done all the time.

Yeah; that's the math emulator and the AIC-7770 sequencer code
terms for both Linux and BSD.  The recent spate of FSSTND doc
hacking is another example.

The problem is that there isn't really a lot of this going on,
and the default licenses seem too restrictive in one direction.

I'd be really, really happy if, for instance, the device drivers
for non-boot critical devices in Linux were distributed under LGPL
instead of GPL.  The LGPL makes the same requirements in terms of
patch availability, etc..

As it is, you can load BSD devices as kernel modules in Linux and
distribute them as modules that way, but the same is not true
under BSD, since loading a module is linking it into the kernel.
To comply, BSD would have to GPL their kernel.

Anyway, I don't think things are as grim as you make them out to
be...

] Good to chat again, Terry.  It's been a while.  I guess I'm
] coming out of my slump.

Downhill (picking up speed) out of a slump is the best time
there is.  8-).

                                        Terry Lambert
                                        te...@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From: n...@trout.sri.MT.net (Nate Williams)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vluc3$b80@helena.MT.net>#1/1
X-Deja-AN: 107281794
references: <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vc6qu$mb6@agate.berkeley.edu> 
<DCKxqJ.BJB@info.swan.ac.uk>
organization: SRI Intl. - Montana Operations
reply-to: "Nate Williams" <n...@sneezy.sri.com>
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <DCKxqJ....@info.swan.ac.uk>,
Alan Cox <iia...@iifeak.swan.ac.uk> wrote:
>In article <3vc6qu$...@agate.berkeley.edu> h...@alumni.EECS.Berkeley.EDU 
>(Jeffrey Hsu) writes:
>>education process applies to BSD as well.  As for politics, there are
>>still fewer competing BSD distributions than there are linux versions.
>>So much for a cohesive front.
>
>There is only Linux kernel
             ^^^ (one?)

What about Linux-alpha, linux-m68k, linux-PPC, Amiga-Linux and the like?
They are all completely different from one another, and although they
share a large part of the code, each is at a different stage of
modification.

Heck, there are even two different ALPHA kernels, one from the folks at DEC,
another by Linus himself.

And then there's the 'stable and development' branches, which for the
most part == 'dead' and 'fixing old/adding new bugs' branches.  For the
most part one person (Linus) does the integration, but there are still
lots of different versions of the same kernel distributed at the same
time.

>The FreeBSD approach of '_the_ distribution' hasn't suited the practicalities
>of Linux distribution and making CD's. There are for example Linux
>distributions built for specialist jobs (like Xdenu which is an Xterminal
>distribution).

But it could easily be suited to these, given the tools that have
developed for 'packaging' different parts of the system.  All you need
for an Xterminal is a minimal base system plus a minimal X system.  This
can easily be part of 'two' mini-distributions inside of the base
distribution.

Nate
-- 
n...@sneezy.sri.com    | Research Engineer, SRI Intl. - Montana Operations
n...@trout.sri.MT.net  | Loving life in God's country, the great state of
work #: (406) 449-7662 | Montana.  Wanna go fishing?  Send me email, and we'll
home #: (406) 443-7063 | setup something.

From: n...@trout.sri.MT.net (Nate Williams)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vlva1$bdl@helena.MT.net>#1/1
X-Deja-AN: 107281795
references: <3tk1r3$l8o@news1.halcyon.com> <DC86oE.JJH@pell.com> 
<3va18t$knq@agate.berkeley.edu> <gk6GFrW00YUwADPEIh@andrew.cmu.edu>
organization: SRI Intl. - Montana Operations
reply-to: "Nate Williams" <n...@sneezy.sri.com>
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

In article <gk6GFrW00YUwADP...@andrew.cmu.edu>,
Gerry S Hayes  <sumn...@CMU.EDU> wrote:
>I'm interested, now.  What things do you see as broken about Linux?  I
>am well aware of certain things that still need to be implemented
>before Linux can claim to be a complete Unix implementation (real file
>descriptor passing, accounting & quotas, make sure /proc is secure)
>and that kernel threads really need to be added at some point in the
>(near?) future, but this is just indicative of the fact that Linux is
>a work in progress.

As well as FreeBSD.

>It certainly has advantages over *BSD (better
>scheduler and file system, vastly superior non-X console) and used to
>be more POSIX compliant (is this still the case?).

First of all, the scheduler in FreeBSD is much better under high loads
than the scheduler under Linux.  Contact Matt Dillon @best.com who
*used* to run Linux (Dillon's cron) for information on it.  Second, the
FS in FreeBSD is the *fastest* of *ALL* of the free Unixes.  And, if you
want to turn off synchronous meta-data writing you're free to, which will
give you the speedy 'rm -rf *' speeds that ext2fs is so famous for.

Finally, it appears that you've only installed NetBSD, and naively
assumed the default console driver is the same.  The only thing that the
Linux console has is the mouse support, which IMHO doesn't belong in the
kernel but that's another story.  Virtual consoles (as many as you have
memory for), graphic/text switching, loadable keymaps, etc.. are all part
of the standard FreeBSD console driver.

>years.  I've only sat through one NetBSD installation and played with
>it for a couple hours, so I could be very mistaken about some of the
>above.

Give FreeBSD a try.  The installation tools are quite advanced (better
than some of the Linux distributions in many folks' minds), and we've done
a much better job of making it more usable with the addition of a large
number of easily installable 'ports'.

Don't judge all *BSDs by one. :-)

Nate
-- 
n...@sneezy.sri.com    | Research Engineer, SRI Intl. - Montana Operations
n...@trout.sri.MT.net  | Loving life in God's country, the great state of
work #: (406) 449-7662 | Montana.  Wanna go fishing?  Send me email, and we'll
home #: (406) 443-7063 | setup something.

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vm27k$pkj@fido.asd.sgi.com>#1/1
X-Deja-AN: 107281802
references: <3tk1r3$l8o@news1.halcyon.com> <DC86oE.JJH@pell.com> 
<3va18t$knq@agate.berkeley.edu> <gk6GFrW00YUwADPEIh@andrew.cmu.edu> 
<3vlva1$bdl@helena.MT.net>
followup-to: comp.sys.powerpc,comp.os.linux.development.system
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

Nate Williams (n...@trout.sri.MT.net) wrote:
: >It certainly has advantages over *BSD (better
: >scheduler and file system, vastly superior non-X console) and used to
: >be more POSIX compliant (is this still the case?).

: First of all, the scheduler in FreeBSD is much better under high loads
: than the scheduler under Linux.  

Used to be so, is no longer so.  See linux v1.3.13.  It context switches
faster than *any* commercial or free unix that I have measured.  I've
measured IBM's top of the line, alphas, HP snakes, Suns, SGIs, and a
bunch of PCs.  I get 10 usecs context switches while the system is multi
user, running vmstat 2, X windows, etc, on a 100mhz 486.  What do you get?
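
For anyone who wants to answer that with a number of their own, here is a
rough sketch of one way to measure it: bounce a byte between two processes
over a pair of pipes and divide the elapsed time by the number of
hand-offs.  This is an assumption about method, not the benchmark used for
the 10 usec figure; ctxsw_usecs() is a made-up name, and the result
includes pipe read/write overhead, so it is an upper bound.  On a
multiprocessor the two processes may also land on different CPUs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Pass a one-byte token parent -> child -> parent `rounds` times.
 * Each round trip needs two hand-offs, so the return value is
 * elapsed / rounds / 2 in microseconds, or a negative value on error.
 */
double ctxsw_usecs(int rounds)
{
    int p2c[2], c2p[2];
    struct timespec t0, t1;
    char tok = 'x';
    int done = 0;

    assert(rounds > 0);
    if (pipe(p2c) < 0)
        return -1.0;
    if (pipe(c2p) < 0) {
        close(p2c[0]); close(p2c[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(p2c[0]); close(p2c[1]);
        close(c2p[0]); close(c2p[1]);
        return -1.0;
    }
    if (pid == 0) {                     /* child: echo every token back */
        close(p2c[1]);
        close(c2p[0]);
        while (read(p2c[0], &tok, 1) == 1 && write(c2p[1], &tok, 1) == 1)
            ;
        _exit(0);
    }

    close(p2c[0]);
    close(c2p[1]);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < rounds &&
           write(p2c[1], &tok, 1) == 1 && read(c2p[0], &tok, 1) == 1)
        done++;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(p2c[1]);                      /* child sees EOF and exits */
    close(c2p[0]);
    waitpid(pid, NULL, 0);
    if (done < rounds)
        return -1.0;

    double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e3;
    return usecs / rounds / 2.0;
}
```

Run it with a large round count (tens of thousands) on an otherwise idle
machine, and again under load, before comparing systems.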

: Second, the
: FS in FreeBSD is the *fastest* of *ALL* of the free Unixes.  And, if you
: want to turn off synchronous meta-data writing you're free to, which will
: give you the speedy 'rm -rf *' speeds that ext2fs is so famous for.

Two points:

	1.  Post the data that proves your point.  I'm a file system
	person, I put the clustering into Sun's version of FFS (which
	the BSD crowd then copied), I've been an advocate of the BSD
	FFS for years.  However, I believe that the Linux ext2 fs is
	faster and better in all respects.  If you have *DATA* that
	shows this not to be the case, then post.

	2.  Do this.  Turn off the sync meta update in FFS.  Untar a
	big directory _into_ the file system and power off the machine
	in the middle.  Now do the same with Linux.  Please run fsck
	under script and post the outputs.  That's what convinced me that
	Linux was better.  Go do it and report back to us.
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vltjk$kvp@fido.asd.sgi.com>#1/1
X-Deja-AN: 107281813
references: <3vbn86$178@fido.asd.sgi.com> <3vc6qu$mb6@agate.berkeley.edu> 
<3vhoe9$8pt@fido.asd.sgi.com> <DCMM0H.Ms9@info.swan.ac.uk>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Alan Cox (iia...@iifeak.swan.ac.uk) wrote:
: In article <3vhoe9$...@fido.asd.sgi.com> l...@slovax.engr.sgi.com writes:
: >Basically,  all I/O needs to happen through getpage()/putpage() interfaces
: >with read/write implemented as warts on top of this.  If you do it this
: >way, the kernel implements read/write as
: >
: >	m = mmap(file_loc_into_kernel_space)
: >	bcopy(m, user_buffer, n)
: >
: >and the kernel can/may take page faults on itself.  This is how SunOS works.
: >It means mmap is the only way you really look at pages; user or kernel.  
: >Talk to McKusick, he knows the BSD VM is gunk.

: Doesn't the kernel taking faults on itself doing a write() via mmap make
: O_APPEND semantics even more exciting than normal?  Also, wouldn't it be more
: sensible at that point to put the mmap/memcpy in userspace and abolish the
: write system call totally?

: Alan

Damn, you're smart.  Seriously, you put your finger right on one of the yucky
issues of doing things via mmap.  O_APPEND is one case and writing past EOF 
is a modification of the O_APPEND case. 

The answer to the O_APPEND thing is the reason (at least one, I'll tell you 
another in a minute) that write()/read() are still in the kernel.  The
code I posted yesterday was dramatically simplified.  The real code looks
more like

		ufs_rdwr()
		{
			if (doing_write) {
				/*
				 * This will allocate the blocks
				 */
				bmap_write(inode, off, length);

				/* other stuff */
			}

			map the area in question

			bcopy()
		}

There are some problems with this approach:  if you do the allocate
and the I/O fails, your file has already been extended; I don't think
SunOS truncated it back down.
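
The same allocate, map, copy ordering can be shown from user space.  The
following is only a sketch under assumptions: mmap_append() is a made-up
name, ftruncate() stands in for bmap_write(), and the file descriptor must
be open O_RDWR.  It is not atomic; another writer can move EOF between the
fstat() and the ftruncate(), which is the O_APPEND problem in miniature.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* O_RDWR and friends, for callers */
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical append through a mapping: extend the file, map the new
 * tail, copy into it.  Returns 0 on success, -1 on error.
 */
int mmap_append(int fd, const void *buf, size_t n)
{
    struct stat st;

    if (n == 0)
        return 0;
    if (fstat(fd, &st) < 0)
        return -1;
    off_t old_size = st.st_size;

    /* Step 1: allocate.  A mapping cannot grow a file by itself. */
    if (ftruncate(fd, old_size + (off_t)n) < 0)
        return -1;

    /* Step 2: map the area in question, starting on a page boundary. */
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);
    off_t base = old_size - (old_size % pagesize);
    size_t skew = (size_t)(old_size - base);
    size_t maplen = skew + n;

    char *m = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, base);
    if (m == MAP_FAILED) {
        /* Undo the extension on failure (best effort). */
        if (ftruncate(fd, old_size) < 0) {
            /* nothing more we can do here */
        }
        return -1;
    }

    /* Step 3: the bcopy. */
    memcpy(m + skew, buf, n);

    int rc = msync(m, maplen, MS_SYNC);     /* push the pages out */
    if (munmap(m, maplen) < 0)
        rc = -1;
    return rc;
}
```

Note that the truncate-on-failure path only covers a failed mmap(); an I/O
error reported later by msync() leaves the file extended, which is the
failure mode described above.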

You need interfaces that let you pass the write(2) length parameter
all the way down to bmap().  You want this because the lower level
routines, as well as the stuff in the middle, can do more efficient
work if they know how much they need to do.  Sun sort of screwed up
their implementation because they didn't pass the lengths all the way 
through.  All the way means all the way.  So if you do a 
write(fd, buf, sizeof(buf)) I want *every* routine that gets involved
in that I/O to know that "sizeof(buf)" is the length of I/O that is
going to happen.

The information frequently gets lost because of old buffer cache 
interfaces that wanted to work on a block at a time or a set of
blocks.

Another reason you want read/write in the kernel - signal semantics
during I/O are sort of weird.  Ideally, you would like your I/O to
complete atomically - all of it or none of it.   If there is an error,
it's easier to figure out what to do (or block the error) in the kernel.

Another reason that read/write are in the kernel: remember the mapping
you have to do to get the bcopy going?  SunOS cached those mappings
in a special kernel cache called segmap.  So if you did silly stuff like
read a byte at a time, the kernel didn't go through all of the work to
set up the mapping each time; it had it cached.

The funny thing that I never understood about SunOS was that the 
segmap cache was not available to users that mapped files, that
went through segvn.  I think the reason was that segmap was faster
because the kernel could make some assumptions that it couldn't
make in the user mapping case.  Personally, I never liked it but
never thought about it hard enough to figure out if you could merge
them.

That's enough for today.   If you are interested in this topic
(perhaps we ought to have a new thread title) and want to discuss
it some more, a couple of thoughts:  (a) I can make a mailing 
alias at slovax and we could take it off line, (b) if you
send mail to my archives server, you can get the SunOS VM papers.
I'd suggest reading them, they are quite useful as background.
To get these:

	% Mail archi...@slovax.engr.sgi.com
	Subject: SunOS.*
	^D

and you'll get a bunch of postscript docs. They are definitely interesting
reading.

Cheers,
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/01
Message-ID: <3vm1i6$pkj@fido.asd.sgi.com>
X-Deja-AN: 107281818
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu> 
<3vhq0a$8pt@fido.asd.sgi.com> <3vkk18$scd@park.uvsc.edu>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Terry Lambert (te...@cs.weber.edu) wrote:
: lm@neteng (Larry McVoy) wrote:
: ] I think you are correct in terms of stability (maybe).  I doubt it in
: ] terms of performance.  The last time I looked at the BSD stack it was
: ] dog slow.  Sun's STREAMS stack was about at parity with the BSD stack 
: ] and lemme tell you, man, that ain't nothin' to write home about.

: The slowness in the TCP/IP in BSD has actually been fixed using
: the Vegas patches (see the papers at the X-kernel site at the
: University of Arizona).  It does a nice job of resolving all
: the messiest issues.

Perhaps I should have been more specific.  When I measure networking
performance, I'm interested in the following on widely available
wires such as 10 and 100 Mbit ethernet:

	.  CPU cycles per 1 byte packet (stack overhead)
	.  CPU cycles per byte (checksum / bcopy overhead)
	.  Round trip times for UDP & TCP "hot potato" (stack latency)
	.  Obtainable bandwidth
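
As an illustration of the third item only, here is a sketch of a UDP "hot
potato" timer.  It is an assumption about how one might measure it, not
the tool behind any numbers in this thread: udp_hot_potato_usecs() and
udp_loopback_socket() are made-up names, both ends live in one process,
and everything runs over 127.0.0.1, so it exercises the stack but not a
driver or a wire.  A real comparison needs two hosts on an ethernet.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on an ephemeral port; fill in *addr. */
static int udp_loopback_socket(struct sockaddr_in *addr)
{
    struct timeval tv = { 1, 0 };       /* don't hang on a lost packet */
    socklen_t len = sizeof(*addr);
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;
    if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        bind(s, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(s, (struct sockaddr *)addr, &len) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Mean round trip in usecs for a 1-byte packet, or negative on error. */
double udp_hot_potato_usecs(int rounds)
{
    struct sockaddr_in a_addr, b_addr;
    struct timespec t0, t1;
    char byte = 'p';
    int ok = 1;

    assert(rounds > 0);
    int a = udp_loopback_socket(&a_addr);
    int b = udp_loopback_socket(&b_addr);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds && ok; i++) {
        ok = sendto(a, &byte, 1, 0,
                    (struct sockaddr *)&b_addr, sizeof(b_addr)) == 1
          && recv(b, &byte, 1, 0) == 1
          && sendto(b, &byte, 1, 0,
                    (struct sockaddr *)&a_addr, sizeof(a_addr)) == 1
          && recv(a, &byte, 1, 0) == 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(a);
    close(b);
    if (!ok)
        return -1.0;

    double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e3;
    return usecs / rounds;
}
```

Dividing the result by the CPU clock period gives a crude cycles-per-packet
figure for the first item, with the caveat that loopback skips the driver.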

I went and read the Vegas paper (cs.arizona.edu:xkernel/Papers/vegas.ps)
which I hope documents the "Vegas patches".  This paper is addressing
the (important) issue of congestion on the internet.  It does a great
job of showing how to get more bandwidth out of the net.  

However, it does not address any of the performance issues that I
mention above.  And the last time I measured NetBSD it sucked.  Linux
was better.  *BSD may well be a better Internet TCP but for local area
LANs it is pretty poor.  Not exactly anything to write home about.

: Streams really, really sucks, when you get down to it, and the
: major place it sucks is the tasking model.

I'm glad to see you say that.  I know you were working on a STREAMS
framework; I'm hoping that you will join in the cause of explaining to
others that it is a broken architecture.  You can't add more layers 
when what you want is the ability to send/receive a packet in 10 usecs
of CPU time.

And if you think that the 10 usecs number is unobtainable please note
that people are doing that now with special hardware only "networks" and
we will be forced to use that stuff if we don't get our TCP act
together.  Not a pretty picture.

: It is possible to fix streams, though not probable.

What he said.

: ] When I say "SunOS" I mean SunOS 4.x; when we talk about 5.x, I
: ] typically refer to it disdainfully as that Solaris thingy.  I don't do
: ] Solaris.  Not for any price.  Too gross for me.

: I do that too.  Pisses Solaris-lovers right off.  8^).

he, he, he, he.  :-)

: ] The BSD vnode layer has some fundamental flaws.  Some are tied up in
: ] the VM/VFS interfaces.
:
: I don't know what source you've been reading, but here's some code
: from ufs_vnops.c that implements bread...
:
:                 error = VOP_BMAP(vp, bp->b_lblkno, NULL, &bp->b_blkno, NULL);

You are looking at the BSD code.  The SunOS/SVR4 (they are the same thing)
VFS interface has no VOP_BMAP().  That is a botch.  The interfaces that
do I/O are VOP_GETPAGE(), VOP_PUTPAGE().  bmap is not something that 
should be exposed in the VFS.

Take a gander at the SunOS papers on my archive server.  You'll see
the difference right away.  

: ] Nah, forget the SMP stuff.  The SunOS SMP stuff isn't interesting
: ] to me.  I'm talking about the Vm/VFS interfaces and architecture.

: The SMP stuff in the BSD camps, at least, is most likely going
: to consist of 95% interface cleanup that ought to be done anyway;

OK, cool.  But SMP is a bad thing to do to your kernel.  I believe that
as the vendors strive for more performance, you will see kernel
architectures that start looking more like QNX than an SMP kernel.  SMP
kernels do not scale.  I know you don't believe me because the horizon
you are seeing is 2-8 processor Pentiums.  Suffice it to say that is
interesting only to the low end; obviously, the "workstation" vendors
have to move past that.  They are all figuring out that SMP is a lose.

: ] BSD is a rat hole, because of its copyright.  Anyone who has been
: ] around for a while is perfectly aware that the code can get locked up.
: ] Linux code can never be locked up.  Never again will I become dependent
: ] on code that might go away.  I have years of investment into a source
: ] base that Sun is trying to throw away.  Why should I invest my effort
: ] and talent into another source base that someone else could productize,
: ] lock up, and eventually throw away.  No thanks.  I'll put my cards in
: ] the Linux basket - there are enough other people that understand the
: ] legalities and they are doing the same thing.  

: What you *can't* depend on is subsequent changes to the privatized
: source.  In all fairness, this is something you can't depend on
: anyway in any case.  
:
: Since GPL'ed code *can* be released under alternate license by
: its author, it has every possibility of becoming privatized the
: same way, though the number of people who can take and use the
: code this way is limited.
:
: Ask Linus if he's ever been approached to license Linux under
: alternate terms.  I'd be interested in his answer.  8-).

I think this is the crucial point:  with BSD, any company can choose to
take it private and you no longer get bug fixes unless you pay for them
and you can't see the source to those bug fixes.  Tell the truth - wouldn't
you really like to have rights to the SunOS 4.x kernel.  You would, whether
you think so or not.  It's the nicest kernel around.

With Linux, Linus doesn't have a choice.  The Linux kernel is GPLed,
yes, but owned by Linus, not.  There are lots of people that
contributed to that kernel.  You would have to get all of them to give
you a version of the code with a different copyright.  Good luck, it
ain't gonna happen.

: The problem is that there isn't really a lot of this going on,
: and the default licenses seem too restrictive in one direction.

Well, I for one, wouldn't mind seeing a BSD networking stack in Linux.
I would be happy to attempt to broker a trade between Linux and BSD
of the networking code for some set of device drivers.  I spoke with
McKusick about this and he said there is no way that the BSD copyright
is going to change.  That means that the BSD code will never be used in
the Linux kernel - you can't put a GPL on top of the BSD copyright.

--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/02
Message-ID: <DCoGIz.FGK@info.swan.ac.uk>#1/1
X-Deja-AN: 107281742
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu> 
<3vhq0a$8pt@fido.asd.sgi.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vhq0a$...@fido.asd.sgi.com> l...@slovax.engr.sgi.com writes:
>I think you are correct in terms of stability (maybe).  I doubt it in
>terms of performance.  The last time I looked at the BSD stack it was
>dog slow.  Sun's STREAMS stack was about at parity with the BSD stack 
>and lemme tell you, man, that ain't nothin' to write home about.

Not stability. The gaps left in 1.2 are completeness issues. Good things to
have it doesn't cope with - like PAWS, and window scaling and clean support
for variable length headers. 1.3.x has some of these now (and at the moment
the stability problems to go with it as you would expect 8)).

On most of my benchmarks with 1.2.x Linux and BSD come out fairly level (BSD
tends to win on raw tcp burst speed, Linux on multiple parallel streams
etc). Not a lot in it. [I ignore loopback here btw because Linux loopback
is far less optimal than BSD.. it's a special case not worth optimising out to
the detriment of the rest of the performance]

Since then Linus has rewritten the scheduler to be very very much faster and
various people - not just me, it's a team effort (notably Jorge Cwik, Tom May
and Arnt Gulbrandsen in this case) - have sped up the checksums a lot, added
a single pass over memory for copy from user space, checksum and fragment, got
cache aligned IP headers, added copy and checksum support and eliminated spare
copies. I've not looked hard at FreeBSD 2.0.5 over 2.0 other than to note 
things like finally fixing the raw socket bug etc.
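
The single-pass idea is easy to show in portable C.  What follows is a
sketch, not the Linux code (which is hand-tuned per architecture):
copy_and_csum() and inet_csum() are illustrative names, and the checksum
is the standard 16-bit one's complement Internet checksum from RFC 1071.
The point is that each byte is touched once instead of once for the copy
and again for the checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide one's complement sum to 16 bits and complement it. */
static uint16_t csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Reference version: checksum only, as a separate pass over the data. */
uint16_t inet_csum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t sum = 0;

    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];    /* big-endian 16-bit word */
    if (len)
        sum += (uint32_t)p[0] << 8;             /* odd byte, zero padded */
    return csum_fold(sum);
}

/* Single pass: copy src to dst and checksum it in the same loop. */
uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;

    assert(len == 0 || (dst != NULL && src != NULL));
    for (; len > 1; s += 2, d += 2, len -= 2) {
        d[0] = s[0];
        d[1] = s[1];
        sum += ((uint32_t)s[0] << 8) | s[1];
    }
    if (len) {
        d[0] = s[0];
        sum += (uint32_t)s[0] << 8;
    }
    return csum_fold(sum);
}
```

A production version would work a word or more at a time and handle
alignment; the byte-wise loop here trades speed for being obviously
correct on any byte order.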

Both Linux & BSD networking could be massively faster still - Van Jacobson's
quoted figures and snippets of code on his work more than prove that. Even
sadder - in both cases none of what needs doing is new invention; it's old hat,
and nobody has got around to doing all of these things.

[ObFlameWar] Streams is NOT the answer 8)

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: torva...@cc.Helsinki.FI (Linus Torvalds)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/02
Message-ID: <3vna83$f2s@kruuna.helsinki.fi>#1/1
X-Deja-AN: 107281851
sender: torva...@cc.helsinki.fi
references: <3va10b$klc@agate.berkeley.edu> 
<3vc6qu$mb6@agate.berkeley.edu> <DCKxqJ.BJB@info.swan.ac.uk> 
<3vluc3$b80@helena.mt.net>
content-type: text/plain; charset=ISO-8859-1
organization: University of Helsinki
mime-version: 1.0
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vluc3$...@helena.mt.net>,
Nate Williams <n...@sneezy.sri.com> wrote:
>In article <DCKxqJ....@info.swan.ac.uk>,
>Alan Cox <iia...@iifeak.swan.ac.uk> wrote:
>>In article <3vc6qu$...@agate.berkeley.edu> h...@alumni.EECS.Berkeley.EDU (Jeffrey Hsu) writes:
>>>education process applies to BSD as well.  As for politics, there are
>>>still fewer competing BSD distributions than there are linux versions.
>>>So much for a cohesive front.
>>
>>There is only Linux kernel
>             ^^^ (one?)
>
>What about Linux-alpha, linux-m68k, linux-PPC, Amiga-Linux and the like.
>They are all completely different from one another, and although they
>share a large part of the code, each is at a different stage of
>modification.

Actually, linux-alpha and linux-i386 share the same source tree.  There
is one extra axp patch (a very small one), but that is due to some OSF/1
binary compatibility stuff that I haven't decided on how to integrate
cleanly yet. 

The others are indeed not yet integrated, to a large degree due to
drivers that I don't want to put in the official tree yet (and the fact
that aside from the axp, only the 68k port is actually stable). 

>Heck, there are even two different ALPHA kernels, one from the folks at DEC,
>another by Linux`s himself.

No longer.  The DEC people are using my kernel these days (their binary
distribution isn't up-to-date yet, but that's similar to Slackware not
using 1.3.14 currently ;-)

>And then there's the 'stable and development' branches, which for the
>most part == 'dead' and 'fixing old/adding new bugs' branches.  For the
>most part one person (Linus) does the integration, but there are still
>lots of different versions of the same kernel distributed at the same
>time.

So? I fail to see the relevance of this.  Yes, you can get old versions
of linux, and yes, they are even encouraged for sites that want
stability, but that's just a normal result of open development.  Sun has
SunOS 4.1.3 (and I know people who won't upgrade from that for the same
reasons some people want to use the 1.2.x linux kernels), and various
Solaris 2.x releases.  Does that mean that they have several different
source trees?

		Linus

From: ghud...@mit.edu (Greg Hudson)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/02
Message-ID: <3vnorm$oje@senator-bedfellow.MIT.EDU>#1/1
X-Deja-AN: 107423515
references: <3tk1r3$l8o@news1.halcyon.com> <DC86oE.JJH@pell.com> 
<3va18t$knq@agate.berkeley.edu> <gk6GFrW00YUwADPEIh@andrew.cmu.edu> 
<3vlva1$bdl@helena.MT.net> <3vm27k$pkj@fido.asd.sgi.com>
organization: Massachvsetts Institvte of Technology
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

Larry McVoy (lm@neteng) wrote:
: 	2.  Do this.  Turn off the sync meta update in FFS.  Untar a
: 	big directory _into_ the file system and power off the machine
: 	in the middle.  Now do the same with Linux.  Please run fsck
: 	under script and post the outputs.  That's what convinced me that 
: 	Linux was better.  Go do it and report back to us.

I'm willing to believe that the FFS filesystem comes out worse than
the Linux filesystem, but what does that prove?  You shouldn't be
turning off synchronous meta-data updates in your filesystem.  (It
might be enough of a performance boost in a news spool that it will
save some administrators some money, but this is explicitly a mode
where reliability is NOT a design goal.)  Last I checked, under normal
conditions Linux ext2 is not as careful as FFS about keeping the
filesystem consistent during writes, so a spontaneous reboot is more
likely to damage a Linux filesystem than a NetBSD filesystem.  This is
certainly my experience in practice.

While I am here, I find your theory about code copyrights causing BSD
politics intriguing, but it doesn't seem very likely.  A few of the
conflicts have been over violation of code copyright, but the vast
majority seems to have been good old personal abrasiveness and
immaturity.  If you were right about your theory, you'd expect to find
more conflict between BSDi and the non-commercial BSD camps, but in
fact the rivalry is between NetBSD and FreeBSD.

For that matter, by your reasoning about copyrights, it should be
surprising that there are free BSD variants at all, since BSD has
been "productized and locked up" by Ultrix, SunOS, BSDi, and who knows
how many other companies.  I don't think you can identify rat-hole
projects by looking at whether they use a Berkeley-style or GPL
copyright; the real danger is that no one will continue to maintain
the (free) code base, and that can happen whether or not it's a GPL'd
product.

One further factual correction from a post made by someone other than
Larry: NetBSD does include pcvt, a console similar to Linux's virtual
terminals.  It's not the default console as shipped because the core
team doesn't believe that virtual terminal functionality really
belongs in the kernel (they're right, but then, they don't have a
user-space alternative, so I'm not sure if I agree with them there).

From: r...@informatik.uni-koblenz.de (Ralf Baechle)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/02
Message-ID: <3vocjt$5r1i@info4.rus.uni-stuttgart.de>#1/1
X-Deja-AN: 107423573
distribution: world
references: <3tk1r3$l8o@news1.halcyon.com> <3v6tac$i4k@galaxy.ucr.edu> 
<3va10b$klc@agate.berkeley.edu> <3vbn86$178@fido.asd.sgi.com> 
<3vc6qu$mb6@agate.berkeley.edu> <3vhoe9$8pt@fido.asd.sgi.com> 
<DCKF9B.H08@cunews.carleton.ca> <3vj3s3$eps@fido.asd.sgi.com>
organization: Uni Koblenz, Germany.
reply-to: r...@waldorf-gmbh.de
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vj3s3$...@fido.asd.sgi.com>, lm@neteng (Larry McVoy) writes:

|> I'm hoping that the SMP stuff will go into a different source base, long 
|> term.  SMP completely ruins your uniprocessor performance.

I think this is a problematic approach.  Linux is currently ported to various
architectures.  Linux is being ported to SMP.  Linux will be ported to cover
non MMU systems.  Having one single source tree would make the development
more consistent; the lack of this consistency is often a big problem, given
how the development of Linux is spread over the world.  Anyway - we'll find
a solution.

  Ralf

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/03
Message-ID: <DCqMsw.68H@info.swan.ac.uk>#1/1
X-Deja-AN: 107423544
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3vc6qu$mb6@agate.berkeley.edu> <DCKxqJ.BJB@info.swan.ac.uk> 
<3vluc3$b80@helena.MT.net>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vluc3$...@helena.MT.net> "Nate Williams" <n...@sneezy.sri.com> writes:
>What about Linux-alpha, linux-m68k, linux-PPC, Amiga-Linux and the like.
>They are all completely different from one another, and although they
>share a large part of the code, each is at a different stage of
>modification.

I suppose somehow magically every *BSD developer's keypresses are
interlocked. You could have fun debating real issues instead of sniping.

>Heck, there are even two different ALPHA kernels, one from the folks at DEC,
>another by Linux`s himself.

He's called Linus actually

Two development strands - they are merged in together and part of one
distribution (same as the i386/mips/sparc development set live in). The 68K
kernel has been merged into 1.2.x.

The only oddity at the moment is the ARM port - because it was someone's
student project and he needed to do his initial port against a known point.
This initial release is 1.1.59. Porting against a single chosen base then
upgrading the port to match is called 'Good software engineering practice'

>And then there's the 'stable and development' branches, which for the
>most part == 'dead' and 'fixing old/adding new bugs' branches.  For the
>most part one person (Linus) does the integration, but there are still
>lots of different versions of the same kernel distributed at the same
>time.

As with any other software... they are called 'revisions'. That's different
to NetBSD/FreeBSD/386BSD which are not revisions but very different versions
of the same base that may one day meet again.

>developed for 'packaging' different parts of the system.  All you need
>for an Xterminal is a minimal base system plus a minimal X system.  This
>can be easily be part of 'two' mini-distributions inside of the base
>distribution.

On what media? You may not want to put them inside the main distributions
when you are shipping floppy disk sets. Nor does it nicely cope with
different system sets for different goals.

[Insert smileys if you have American-made sarcasm detectors that don't
 do UK formatted sarcasm....]

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: Terry Lambert <te...@cs.weber.edu>
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/03
Message-ID: <3vpfld$9fc@park.uvsc.edu>
X-Deja-AN: 107423609
references: <3tk1r3$l8o@news1.halcyon.com>  
<DBsGpy.27w@undergrad.math.uwaterloo.ca> <3ui5m7$rru@agate.berkeley.edu> 
<3v6tac$i4k@galaxy.ucr.edu> <3va10b$klc@agate.berkeley.edu> 
<3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu> 
<3vhq0a$8pt@fido.asd.sgi.com> <3vkk18$scd@park.uvsc.edu> 
<3vm1i6$pkj@fido.asd.sgi.com>
organization: Utah Valley State College, Orem, Utah
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

lm@neteng (Larry McVoy) wrote:
] Perhaps I should have been more specific.  When I measure networking
] performance, I'm interested in the following on widely available
] wires such as 10 and 100 Mbit ethernet:
] 
] 	.  CPU cycles per 1 byte packet (stack overhead)
] 	.  CPU cycles per byte (checksum / bcopy overhead)
] 	.  Round trip times for UDP & TCP "hot potato" (stack latency)
] 	.  Obtainable bandwidth
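
One of the measurements listed above, the UDP "hot potato" round trip, can be
approximated along these lines. This is only a rough sketch under loud
assumptions: it runs over loopback rather than a real 10 or 100 Mbit wire, and
interpreter overhead will dominate, so the number says little about the
kernel stack itself. The function name and its parameters are made up for
illustration.

```python
import socket
import threading
import time


def udp_round_trip(count=100, payload=b"x"):
    """Return the mean round-trip time in seconds for `count` UDP echoes.

    Hypothetical helper: a thread echoes each datagram straight back
    over 127.0.0.1 while the caller times the send/receive pairs.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    addr = server.getsockname()

    def echo():
        for _ in range(count):
            data, peer = server.recvfrom(2048)
            server.sendto(data, peer)  # pass the "hot potato" back

    worker = threading.Thread(target=echo, daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)             # do not hang forever on a lost packet
    start = time.perf_counter()
    for _ in range(count):
        client.sendto(payload, addr)
        client.recvfrom(2048)
    elapsed = time.perf_counter() - start

    worker.join()
    client.close()
    server.close()
    return elapsed / count


if __name__ == "__main__":
    print("mean round trip: %.1f usec" % (udp_round_trip() * 1e6))
```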

Bruce Evans is the guy to ask about this.  Unfortunately, I don't
frequently save benchmarks, since I feel them to both be highly
subjective and highly unstable over time.

I remember FreeBSD being *very* fast on benchmarks using DEC
100Mbit cards (the cheapest 100Mbit hub you can get today is a
FreeBSD box -- not necessarily the most efficient).

] However, it does not address any of the performance issues that I
] mention above.  And the last time I measured NetBSD it sucked.  Linux
] was better.  *BSD may well be a better Internet TCP but for local area
] LANs it is pretty poor.  Not exactly anything to write home about.

I guess I don't understand this: you are placing emphasis on
latency over throughput?

I would think that pool retention time reflects only on how big
a pool you need to maintain to handle a particular level of
traffic.

As far as latency is concerned, I think it is an issue, but I'm
prejudiced, having worked on request/response architectures like
Novell NetWare.  Request/response is actually a pretty stupid
way to do things and has more to do with not having room for a
cache in a 640k DOS box or a 512k Mac box than anything else.

I would not hold out latency as a larger issue than throughput.

] : Streams really, really sucks, when you get down to it, and the
] : major place it sucks is the tasking model.  
] 
] I'm glad to see you say that.  I know you were working on a STREAMS
] framework; I'm hoping that you will join in the cause of explaining to
] others that it is a broken architecture.  You can't add more layers 
] when what you want is the ability to send/receive a packet in 10 usecs
] of CPU time.

With the system call overhead at ~20uS of user time with a 2k
copy, you are going to have a hard time achieving your latency
goal out of hardware.  Even Native NetWare, which uses hand crafted
assembly code for its read/write path and directly uses disk buffers
as network buffers, hits 6uS, and that's with fore-knowledge of
a fixed packet size and isn't doing anything useful with the
packet.  To get a read turned around, you are talking in excess
of ~420uS, ~160uS of which are ODI driver latency.

And NetWare is the grail (or golden calf) to beat.

[ ... the BSD unified buffer cache code ... ]

] You are looking at the BSD code.  The SunOS/SVR4 (they are the same thing)
] VFS interface has no VOP_BMAP().  That is a botch.  The interfaces that
] do I/O are VOP_GETPAGE(), VOP_PUTPAGE().  bmap is not something that 
] should be exposed in the VFS.

This botch is codified in the interface, along with vnode locking
and some other issues.  And I claim that It Will Be Fixed.

] Take a gander at the SunOS papers on my archive server.  You'll see
] the difference right away.  

I've read the SunOS papers, believe me.  SunOS is far from perfect
in this respect.  The APPEND mode crap notwithstanding.  The
biggest botch by far in BSD is the UNIX (POSIX) domain socket
support.

The biggest botch in the SunOS code (to give equal time) is the
aioread/aiowrite/aiowait/aiocancel support, and that's mostly
because synchronicity in that respect should be handled in a user
library as a flag to the real call interface.

] OK, cool.  But SMP is a bad thing to do to your kernel.  I believe
] that as the vendors strive for more performance, you will see
] kernel architectures that start looking more like QNX than an
] SMP kernel.

OK. Let's first agree that SMP is a dumb idea period for a Von
Neumann machine, and the only way to go for a dataflow machine.

SMP is limited in applicability to multiprocessing and to tasks
which are intrinsically parallelizable, like fluid dynamics, and
which would better benefit from Dataflow in any case.

Where is the SMP market?  The SMP market is people who need more
processing power than is currently available, and are willing to
pay for it.

] SMP kernels do not scale.  I know you don't believe me because
] the horizon you are seeing is 2-8 processor Pentiums.

Not true.  My current horizon is 64 processor PPC 604 boxes from
a company in Germany, or 32 processor PPC boxes from Motorola.

But both these horizons are way under the Connection Machine or
the GoodYear (JPL's 64k processor box) level.

] Suffice it to say that is interesting only to the low end;
] obviously, the "workstation" vendors have to move past that.
] They are all figuring out that SMP is a lose.

I don't know if I believe that.  Intel clone and Apple both think
they are selling workstations.

SMP doesn't scale on V.N. hardware designs past the point of
diminishing returns on scheduling arbitration, which if you
are as smart as Sequent, is 185% compounded.

Dynix takes great pains in their VM to avoid synchronization;
in most ways, it's superior to SunOS's SLAB allocation.

] I think this is the crucial point:  with BSD, any company can choose to
] take it private and you no longer get bug fixes unless you pay for them
] and you can't see the source to those bug fixes.

I think the reverse, the inability to unencumber a GPL'ed code base
that has had multiple authors contributing under the GPL terms, is
a strike against cooperative large scale projects with dual release
for licensing.

I honestly don't think you'd be able to "take BSD private" like
you are suggesting.  I'd have to have evidence of one product
that has been taken private successfully this way.

] Tell the truth - wouldn't you really like to have rights to
] the SunOS 4.x kernel.  You would, whether you think so or not.
] It's the nicest kernel around.

You bet your ass I would; they've solved some problems that I'd
like to avoid solving again.

By the same token, I'd like the SVR4 2.0 ES/MP kernel sources too;
the current sources have solved DOS driver usage issues and, whether
you think it's a good thing or not, SMP issues.  And I'd like to
avoid redoing that work again as well.

Especially, I'd like the kernel pieces for VM386() (as opposed to
VM86()) to let me run Windows programs by running Windows.

I'd like the bus drivers for everything but the serial ports in
VMS, and their DECNet code, especially LAT.  I really, really
want their HSC code for cluster controllers.

And I want Dell's install tools.

So I wouldn't want *just* the SunOS code; probably, I rank it
further down than the others, actually, since I believe that time
is the only barrier to replicating the work (not so in some of
the other cases).

] With Linux, Linus doesn't have a choice.  The Linux kernel is GPLed,
] yes, but owned by Linus, not.  There are lots of people that
] contributed to that kernel.  You would have to get all of them to give
] you a version of the code with a different copyright.  Good luck, it
] ain't gonna happen.
] 
] : The problem is that there isn't really a lot of this going on,
] : and the default licenses seem too restrictive in one direction.
] 
] Well, I for one, wouldn't mind seeing a BSD networking stack in Linux.
] I would be happy to attempt to broker a trade between Linux and BSD
] of the networking code for some set of device drivers.  I spoke with
] McKusick about this and he said there is no way that the BSD copyright
] is going to change.  That means that the BSD code will never be used in
] the Linux kernel - you can't put a GPL on top of the BSD copyright.

Say you broker this trade.  How do you free up the GPL from all
of the different authors on the driver set to get a BSD release?

I don't think it's horribly feasible.  8-(.

					Terry Lambert
                                        te...@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From: torva...@cc.Helsinki.FI (Linus Torvalds)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/03
Message-ID: <3vptji$mbm@kruuna.helsinki.fi>
X-Deja-AN: 107423615
sender: torva...@cc.helsinki.fi
references: <3tk1r3$l8o@news1.halcyon.com> <3vlva1$bdl@helena.MT.net> 
<3vm27k$pkj@fido.asd.sgi.com> <3vnorm$oje@senator-bedfellow.mit.edu>
content-type: text/plain; charset=ISO-8859-1
organization: University of Helsinki
mime-version: 1.0
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

In article <3vnorm$...@senator-bedfellow.mit.edu>,
Greg Hudson <ghud...@mit.edu> wrote:
>Larry McVoy (lm@neteng) wrote:
>: 	2.  Do this.  Turn off the sync meta update in FFS.  Untar a
>: 	big directory _into_ the file system and power off the machine
>: 	in the middle.  Now do the same with Linux.  Please run fsck
>: 	under script and post the outputs.  That's what convinced me that 
>: 	Linux was better.  Go do it and report back to us.
>
>I'm willing to believe that the FFS filesystem comes out worse than
>the Linux filesystem, but what does that prove?  You shouldn't be
>turning off synchronous meta-data updates in your filesystem.  (It
>might be enough of a performance boost in a news spool that it will
>save some administrators some money, but this is explicitly a mode
>where reliability is NOT a design goal.)  Last I checked, under normal
>conditions Linux ext2 is not as careful as FFS about keeping the
>filesystem consistent during writes, so a spontaneous reboot is more
>likely to damage a Linux filesystem than a NetBSD filesystem.  This is
>certainly my experience in practice.

I've said this before, and I guess I'll say this again.

	BSD "synchronous" filesystem updates are braindamaged.

	BSD people touting it as a feature are WRONG. It's a bug.

Synchronous meta-data updates are STUPID:

 (a) it's bad for performance
 (b) it's bad for filesystem stability

(a) is obvious, and even BSD people will agree to that.  But (b) is not
as obvious, and BSD people mostly say "Huh?"

In short, updating meta-data synchronously almost guarantees that the
filesystem structure will be up-to-date after a crash, but it will _not_
guarantee that the actual file data will be up-to-date.  In fact, it
will often result in a filesystem that "fsck" thinks is perfectly ok,
_despite_ the fact that you have corruption. 
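
The claim above can be pictured with a toy model, sketched here under heavy
assumptions: one new file needs three writes (a data block, an inode, a
directory entry), the machine "crashes" after some number of them, and
fsck_complaints stands in for what a structure checker can see. All names are
hypothetical and nothing here is ext2 or FFS code.

```python
# Two possible write orders for creating one file (toy model only).
METADATA_FIRST = ["inode", "dirent", "data"]
DATA_FIRST = ["data", "inode", "dirent"]


def crash_after(order, writes_done):
    """Apply the first `writes_done` writes of `order`, then 'lose power'."""
    # The block the new file will use still holds whatever was there before.
    disk = {"data": b"old garbage", "inode": None, "dirent": None}
    new = {"data": b"new contents", "inode": "->data", "dirent": "->inode"}
    for key in order[:writes_done]:
        disk[key] = new[key]
    return disk


def fsck_complaints(disk):
    """Structural problems only: a checker cannot judge file contents."""
    problems = []
    if disk["dirent"] is not None and disk["inode"] is None:
        problems.append("directory entry points at unwritten inode")
    if disk["inode"] is not None and disk["dirent"] is None:
        problems.append("orphaned inode")
    return problems


def contents_ok(disk):
    """A file that is visible by name must hold the bytes that were written."""
    visible = disk["dirent"] is not None and disk["inode"] is not None
    return (not visible) or disk["data"] == b"new contents"


if __name__ == "__main__":
    for name, order in (("metadata first", METADATA_FIRST),
                        ("data first", DATA_FIRST)):
        for k in range(len(order) + 1):
            d = crash_after(order, k)
            print(name, k, fsck_complaints(d), contents_ok(d))
```

In this model the metadata-first order has a crash point where the checker
reports nothing and the visible file holds stale bytes, while the data-first
order at worst leaves an orphaned inode, which is a structural problem a
checker can at least notice.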

In fact, the way to get a stable filesystem is to do the updates exactly
reverse to the way BSD does it: write out the data blocks first, _then_
write out the meta-data.  The problem with this approach is you end up
with a partial ordering in which to write the data, and ordering it
isn't trivial. 

Doing synchronous meta-data updates is a kludge to make fsck not
complain as much about corrupted filesystems.  It doesn't fix the
problem, it only fixes some of the symptoms.  Touting that as a Good
Thing (tm) is idiocy, IMNSHO (you'll feel safe because fsck doesn't tell
you anything is wrong). 

What makes the BSD approach even more stupid is the fact that the
meta-data inconsistencies are the one thing fsck _can_ fix, so trying to
keep meta-data up-to-date is in some respect a complete waste of time. 

It's much better to instead concentrate on making a better fsck, as fsck
is run only once at bootup (and often not even then as most bootups will
be from a clean filesystem) than to take the performance hit at
run-time.  That's the approach the linux filesystems take (well, at
least the ext2fs filesystem: most other filesystems have a rather stupid
version of fsck). 

Of course, if filesystem integrity is important for you, you don't want
to use the linux ext2fs.  That isn't what I'm trying to claim.  What I'm
saying is that ffs isn't really better in this regard.  If you want
filesystem consistency, you have to use some kind of journalling
filesystem. 

Alternately you can make a unix-type filesystem and do the disk updates
the _right_ way: data blocks first, then indirect blocks (starting from
the lowest level indirected blocks), then the inode, and finally the
directory entry (and going in the opposite direction when you're
deleting a file).  Note that you don't need to do any of these updates
synchronously: you only have to make them in the right order. 
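
The ordering described above is a partial order, so one way to sketch it is
as a dependency graph that gets topologically sorted before flushing. This is
a minimal illustration using Python's standard graphlib module (3.9+); the
helper names and the "file:part" labels are hypothetical, and a real buffer
cache would track dependencies per block rather than per file.

```python
from graphlib import TopologicalSorter


def creation_deps(name):
    """Map each pending write for file `name` to what must hit disk first."""
    return {
        f"{name}:indirect": {f"{name}:data"},      # data blocks first
        f"{name}:inode": {f"{name}:indirect"},     # then indirect blocks
        f"{name}:dirent": {f"{name}:inode"},       # inode before the name
    }


def flush_order(*files):
    """Return one valid order in which to write out all pending blocks."""
    deps = {}
    for name in files:
        deps.update(creation_deps(name))
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in flush_order("a", "b"):
        print(step)
```

Writes belonging to unrelated files have no edges between them, so the
scheduler is free to interleave or batch them; only the chain within each
file is constrained. For deletion the edges would point the other way, as
the post says.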

The sad thing is that the FFS approach is not just _wrong_, it's also
slower than the right way (the partial ordering will still allow quite a
lot of re-ordering among non-related updates, so you probably can get
reasonably close to the completely asynchronous case).  Linux doesn't do
it right either, but at least linux doesn't take the performance hit for
no gain. 

		Linus

From: Terry Lambert <te...@cs.weber.edu>
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/03
Message-ID: <3vrnof$ni5@park.uvsc.edu>
X-Deja-AN: 107423551
references: <3tk1r3$l8o@news1.halcyon.com> <3vlva1$bdl@helena.MT.net> 
<3vm27k$pkj@fido.asd.sgi.com> <3vnorm$oje@senator-bedfellow.mit.edu> 
<3vptji$mbm@kruuna.helsinki.fi>
organization: Utah Valley State College, Orem, Utah
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

torva...@cc.Helsinki.FI (Linus Torvalds) wrote:
] I've said this before, and I guess I'll say this again.
] 
] 	BSD "synchronous" filesystem updates are braindamaged.
] 
] 	BSD people touting it as a feature are WRONG. It's a bug.
] 
] Synchronous meta-data updates are STUPID:
] 
]  (a) it's bad for performance
]  (b) it's bad for filesystem stability
] 
] (a) is obvious, and even BSD people will agree to that.  But (b) is not
] as obvious, and BSD people mostly say "Huh?"

Usually, I agree with Linus.  Not this time.

I'll say more than "huh", I'll say "you're wrong".

] In short, updating meta-data synchronously almost guarantees that the
] filesystem structure will be up-to-date after a crash, but it will _not_
] guarantee that the actual file data will be up-to-date.  In fact, it
] will often result in a filesystem that "fsck" thinks is perfectly ok,
] _despite_ the fact that you have corruption. 

You are talking about the difference between corrupted data *in*
a file system and corrupted *file system structure* here.

The meta-data writes (they don't have to be synchronous, they just
have to be ordered -- USL's code does this, patent pending) are
essential to the maintenance of POSIX semantics and to guarantee
a consistent file system structure.

] In fact, the way to get a stable filesystem is to do the updates exactly
] reverse to the way BSD does it: write out the data blocks first, _then_
] write out the meta-data.  The problem with this approach is you end up
] with a partial ordering in which to write the data, and ordering it
] isn't trivial. 

That particular problem is, in fact, trivial.

] What makes the BSD approach even more stupid is the fact that the
] meta-data inconsistencies are the one thing fsck _can_ fix, so trying to
] keep meta-data up-to-date is in some respect a complete waste of time. 

File names are directory entries, not attributes of files.  Having
the meta-data "fixed" but the file in lost+found is hardly what
I'd consider correct behaviour.

] Of course, if filesystem integrity is important for you, you don't want
] to use the linux ext2fs.  That isn't what I'm trying to claim.  What I'm
] saying is that ffs isn't really better in this regard.  If you want
] filesystem consistency, you have to use some kind of journalling
] filesystem. 

You want log structuring, not just journalling.  Journalling does
not guarantee integrity; it only allows you to more quickly identify
where it's bad.  Log structuring (ie: keeping multiple file system
event based -- non idempotent -- transactions atomic) is the only
real solution, and even then the ability to roll transactions
forward rather than back is the *most* important aspect.

This is because the statefulness of the user program is separate
from the statefulness of the file system data, and the way you
are using "corrupt" implies that a database transaction that is
made up of two or more file system transactions must roll forward
or backward as a unit.

I have to note here that not even Veritas (VXFS) can guarantee this;
the closest thing is NetWare, and it operates by divorcing the
client of the file system from the operating system entirely.

A UNIX application that used both a log structured file system
that allowed you to roll transactions forward AND a Tuxedo-like
system for transaction idempotence separate from the file system
structure maintenance also qualifies.

You can build similar systems on VMS and IBM systems because they
synchronously perform operations, have record structured file
systems, and don't have file data caching (by default; don't
tell me about VMS 6.x file caching services, I was around when
they were being written, mostly to support the product that we
wrote and DEC OEM'ed: Pathworks for VMS (NetWare)).

] Alternately you can make a unix-type filesystem and do the disk updates
] the _right_ way: data blocks first, then indirect blocks (starting from
] the lowest level indirected blocks), then the inode, and finally the
] directory entry (and going in the opposite direction when you're
] deleting a file).  Note that you don't need to do any of these updates
] synchronously: you only have to make them in the right order. 

With the further caveat that they not overwrite the original top
level until all substructure data is on disk, and consistency
dictates that that update go to a different block as well.

Congratulations.  You've invented log structuring.

] The sad thing is that the FFS approach is not just _wrong_, it's also
] slower than the right way (the partial ordering will still allow quite a
] lot of re-ordering among non-related updates, so you probably can get
] reasonably close to the completely asynchronous case).  Linux doesn't do
] it right either, but at least linux doesn't take the performance hit for
] no gain. 

Except that Delayed Ordered Writes are patented, or at least patent
pending.  And they only help concurrency if you trust your cache
to disk, which you can only do if you use certain equipment (notably
SCSI drives that use their rotational energy to complete a cached
write) and have intimate knowledge of cylinder boundaries on the
translated geometry drives most modern systems use.  Which is to
say it requires at least SCSI II or better, and even then, you
have pretty much obviated the need for reordering the writes in the
code.  Unless you are satisfying SMP constraints, and even then,
that's really an implementation choice more than a requirement.

                                        Terry Lambert
                                        te...@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From: ghud...@mit.edu (Greg Hudson)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/03
Message-ID: <3vqjh3$qup@senator-bedfellow.MIT.EDU>#1/1
X-Deja-AN: 107423505
references: <3tk1r3$l8o@news1.halcyon.com> <3vlva1$bdl@helena.MT.net> 
<3vm27k$pkj@fido.asd.sgi.com> <3vnorm$oje@senator-bedfellow.mit.edu> 
<3vptji$mbm@kruuna.helsinki.fi>
followup-to: comp.sys.powerpc,comp.os.linux.development.system
organization: Massachvsetts Institvte of Technology
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

Linus Torvalds (torva...@cc.Helsinki.FI) wrote:
: Synchronous meta-data updates are STUPID:

:  (a) it's bad for performance
:  (b) it's bad for filesystem stability

: (a) is obvious, and even BSD people will agree to that.  But (b) is not
: as obvious, and BSD people mostly say "Huh?"

Not only is your argument not obvious, it's not solid.

You have claimed that a filesystem is corrupt if the data in the files
it contains is not up to date.  By doing this, you have redefined
"corruption" to mean much more than the usual "violation of the
invariants of filesystem structure"; you instead refer to violation of
a much larger and more vague invariant: that the data in the filesystem
should be "up to date."

This invariant may seem obvious and well-defined to you if you're only
thinking in terms of creation of small files by "cp" and "tar xf", but
you won't be able to maintain your "up to date" invariant in the case of

	* Large files which are opened once and written to over a long
	  period of time
	* Large files which are appended to over time
	* Databases which are opened and modified regularly

If you're going to do any sort of write caching for the above classes
of files, you will not ever be able to guarantee that their contents
are "up to date," and you will always risk "corruption."  A journaling
or log-structured filesystem won't help; either you write the data to
disk before returning from write() or you risk a power failure
happening before the data blocks hit the disk.

Good applications and systems are designed to tolerate failures at any
point during the file update, as long as the underlying filesystem
remains consistent so that individual filesystem operations (file
creation, single-block writes, deletion, renaming, etc.) are atomic.
Unless I've completely misunderstood your argument, you seem to be
advocating a futile attempt at removing the need for application-level
failure atomicity by trying to create files only as "up to date"
entities.
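
A common application-level pattern in the spirit of the paragraph above is to
build the new contents in a temporary file and then rename it over the old
one, so that a crash leaves either the old file or the new one. The sketch
below assumes rename is atomic on the underlying filesystem, which is exactly
the kind of single-operation guarantee the post relies on; atomic_write is a
hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace `path` with `data` so readers see the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename does not
    # cross filesystems.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push the data blocks out before the rename
        os.replace(tmp, path)      # the one step that makes the update visible
    except BaseException:
        # On any failure, leave the original file alone and drop the temp.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "spool.txt")
        atomic_write(target, b"first version\n")
        atomic_write(target, b"second version\n")
        with open(target, "rb") as f:
            print(f.read())
```

This does not help the long-lived append or database cases listed earlier in
the post; those need their own logging or recovery scheme on top.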

All that said, you may be quite right about synchronous meta-data writes
being unnecessary for stability.  I'm not really qualified to debate the
necessity of such a design decision.  It remains the case that, in my
experience, FFS achieves performance equal to or better than that of
ext2fs and significantly better reliability.

From: torva...@cc.Helsinki.FI (Linus Torvalds)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/04
Message-ID: <3vsfdv$3nn@kruuna.helsinki.fi>
X-Deja-AN: 107423512
sender: torva...@cc.helsinki.fi
references: <3tk1r3$l8o@news1.halcyon.com> 
<3vnorm$oje@senator-bedfellow.mit.edu> <3vptji$mbm@kruuna.helsinki.fi> 
<3vrnof$ni5@park.uvsc.edu>
content-type: text/plain; charset=ISO-8859-1
organization: University of Helsinki
mime-version: 1.0
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

[ Duh.  This probably should be moved to "comp.filesystems.advocacy",
  but we don't seem to have that group here ]

In article <3vrnof$...@park.uvsc.edu>,
Terry Lambert  <te...@cs.weber.edu> wrote:
>
>You are talking about the difference between corrupted data *in*
>a file system and corrupted *file system structure* here.

It doesn't really matter much, is my opinion.  Corruption is corruption,
and if you're worried about one, you should be worried about the other. 

And fsck _can_ generally fix a corrupted file system structure, so
arguably that is the less interesting case.  Admittedly, when this isn't
true, _then_ you're really in deep waters, but to be really secure you
need special hardware and other algorithms anyway and then you either do
backups or you don't use ext2fs.  OR ffs. 

>The meta-data writes (they don't have to be synchronous, they just
>have to be ordered -- USL's code does this, patent pending) are
>essential to the maintenance of POSIX semantics and to guarantee
>a consistent file system structure.

POSIX semantics? Across a system crash? What POSIX standards are you
referring to?

>] In fact, the way to get a stable filesystem is to do the updates exactly
>] reverse to the way BSD does it: write out the data blocks first, _then_
>] write out the meta-data.  The problem with this approach is you end up
>] with a partial ordering in which to write the data, and ordering it
>] isn't trivial. 
>
>That particular problem is, in fact, trivial.

"trivial" is doing synchronous writes.  Ordering it while getting good
asynchronous performance does take some thought.  You still have to
handle cases like one block containing several logically independent
inodes or directory entries, etc. 
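
The partial ordering itself can be sketched as a dependency graph that
is flushed in topological order.  This is only an illustration in
Python: the buffer names are hypothetical, and it deliberately ignores
the hard case just mentioned, where one disk block holds several
independent inodes or directory entries.

```python
from graphlib import TopologicalSorter

# Hypothetical dirty buffers for one newly created file.  Each key may be
# written only after every buffer in its set is already on disk.
must_follow = {
    "indirect_block": {"data_block_0", "data_block_1"},
    "inode": {"indirect_block"},
    "directory_entry": {"inode"},
}


def flush_order(deps):
    """Return one write order that respects every 'X after Y' constraint."""
    return list(TopologicalSorter(deps).static_order())


def respects(order, deps):
    """Check that each buffer appears after all buffers it depends on."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[before] < position[after]
               for after, befores in deps.items()
               for before in befores)
```

Nothing here is synchronous: the writes can be queued and issued
lazily, as long as the queue is drained in an order that respects the
constraints.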

I don't think Kirk McKusick is stupid, so why did he use synchronous
writes for FFS if ordering is such a trivial matter? (I'm sure others
were centrally involved, but Kirk is the only one I know of, no offence
intended). 

>File names are directory entries, not attributes of files.  Having
>the meta-data "fixed" but the file in lost+found is hardly what
>I'd consider correct behaviour.

Sure.  Which is why I claim that disk ordering is the right answer. 
What I'm arguing against is people who think that the BSD synchronous
writes are a good thing.  I think the BSD approach makes a very bad
tradeoff of speed vs "security".  Personal opinion, but it's one of my
"buttons" (you should hear me preach about spl-levels ;-)

>] Alternately you can make a unix-type filesystem and do the disk updates
>] the _right_ way: data blocks first, then indirect blocks (starting from
>] the lowest level indirected blocks), then the inode, and finally the
>] directory entry (and going in the opposite direction when you're
>] deleting a file).  Note that you don't need to do any of these updates
>] synchronously: you only have to make them in the right order. 
>
>With the further caveat that they not overwrite the original top
>level until all substructure data is on disk, and consistency
>dictates that that update go to a different block as well.

Actually, I'd rather see the hardware do the "different block" part.. 
With the logic for bad block remapping on most SCSI disks, this probably
wouldn't be that much different (keep an "in transit" block as well). 
You need special hardware to be secure anyway, so going out of your way
in the OS doesn't sound like a good tradeoff.  (well..  You don't _need_
special hardware, you _can_ do it all in software, but I happen to think
that in some respects hardware will be the cheaper solution). 

(and even if you do an all-software solution, you at least want hardware
that doesn't do any caching or re-ordering for you, so in some respects
you have to rely on the hardware in any case). 

>Congratulations.  You've invented log structuring.

Pat.pend.. 

What it all boils down to is that getting 100% security takes a _lot_ of
work, and some special hardware.  Which brings us back to the original
question - which would you rather see:

 - synchronous meta-data updates
	- 95% secure
	- 10-100 times slower for some (very common) operations
 - asynchronous updates
	- 94% secure
	- 10-100 times faster for some (very common) operations

I just find it ludicrous to take a _huge_ performance hit for doing
something that doesn't really give you much, certainly not security (and
I can come up with some scenarios where it's much worse to do
synchronously, although they probably aren't the common ones ;-). 

		Linus

From: n...@trout.sri.MT.net (Nate Williams)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/04
Message-ID: <3vscvm$iq4@helena.MT.net>#1/1
X-Deja-AN: 107423617
references: <3vc6qu$mb6@agate.berkeley.edu> <DCKxqJ.BJB@info.swan.ac.uk> 
<3vluc3$b80@helena.MT.net> <DCqMsw.68H@info.swan.ac.uk>
organization: SRI Intl. - Montana Operations
reply-to: "Nate Williams" <n...@sneezy.sri.com>
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <DCqMsw....@info.swan.ac.uk>,
Alan Cox <iia...@iifeak.swan.ac.uk> wrote:
>In article <3vluc3$...@helena.MT.net> "Nate Williams" <n...@sneezy.sri.com> 
writes:
[ Assertion that there is only *one* Linux kernel ]
>>What about Linux-alpha, linux-m68k, linux-PPC, Amiga-Linux and the like?
>>They are all completely different from one another, and although they
>>share a large part of the code, each is at a different stage of
>>modification.
>
>I suppose somehow magically every *BSD developer's keypresses are
>interlocked.

Actually, in NetBSD each developer's code is very much intertwined.  Each
of the distributions is directly related to the base code, so when one
developer modifies kern/vm_foo_foo_foo.c all of the other folks have to
modify their code to suit.

>>Heck, there are even two different ALPHA kernels, one from the folks at DEC,
>>another by Linux`s himself.
>
>He's called Linus actually

Whoops, typo.

>>And then there's the 'stable and development' branches, which for the
>>most part == 'dead' and 'fixing old/adding new bugs' branches.  For the
>>most part one person (Linus) does the integration, but there are still
>>lots of different versions of the same kernel distributed at the same
>>time.
>
>As with any other software..they are called 'revisions'. That's different
>to NetBSD/FreeBSD/386BSD which are not revisions but very different versions
>of the same base that may one day meet again.

But these *revisions* are (supposedly) developed separately.  That makes
them different kernels.

I guess Dell's SysV4 kernel is a different 'revision' from the Unixware
kernel using your analogy.  The 1.1 code tree was significantly
different from the 1.2 tree; the Dell tree was probably less different
from the Unixware tree than that.

Nate

-- 
n...@sneezy.sri.com    | Research Engineer, SRI Intl. - Montana Operations
n...@trout.sri.MT.net  | Loving life in God's country, the great state of
work #: (406) 449-7662 | Montana.  Wanna go fishing?  Send me email, and we'll
home #: (406) 443-7063 | setup something.

From: h...@wsrdb.wsr.ac.at (Peter Holzer)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/04
Message-ID: <3vte22$90k@wsrcom.wsr.ac.at>#1/1
X-Deja-AN: 107554600
references: <3tk1r3$l8o@news1.halcyon.com> <3vlva1$bdl@helena.MT.net> 
<3vm27k$pkj@fido.asd.sgi.com> <3vnorm$oje@senator-bedfellow.mit.edu> 
<3vptji$mbm@kruuna.helsinki.fi> <3vqjh3$qup@senator-bedfellow.MIT.EDU>
organization: WSR, Vienna, Austria
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

ghud...@mit.edu (Greg Hudson) writes:

>Linus Torvalds (torva...@cc.Helsinki.FI) wrote:
>: Synchronous meta-data updates are STUPID:

>:  (a) it's bad for performance
>:  (b) it's bad for filesystem stability

>: (a) is obvious, and even BSD people will agree to that.  But (b) is not
>: as obvious, and BSD people mostly say "Huh?"

>Not only is your argument not obvious, it's not solid.

I'll follow it up with an example, maybe it's solid then.

>You have claimed that a filesystem is corrupt if the data in the files
>it contains is not up to date.  By doing this, you have redefined
>"corruption" to mean much more than the usual "violation of the
>invariants of filesystem structure"; you instead refer to violation of
>a much larger and more vague invariant: that the data in the filesystem
>should be "up to date."

Nobody was talking about "up to date". It is obvious that any
write-buffering introduces the possibility of losing data. Nothing will
prevent that.  But a problem that the BSD FS has is that files can end
up containing data which was never written to them. If one of my files
suddenly contains pieces of the shadow password file I would certainly
call that filesystem corruption.

Consider the following scenario:

Somebody invokes passwd, which creates a temporary file with the
new password, then moves the temporary file to /etc/shadow. The
old inode and old blocks are freed.

I write a file, which I happened to be editing for the last few hours,
to disk. The file happens to get the blocks previously allocated to
/etc/shadow. The meta information is immediately written to disk, the
new data stays in memory until the next sync.

Before that sync somebody pulls the plug.

The system will reboot, fsck will check the file system and find
a file which looks perfectly ok. 

I invoke vi and find to my amazement the contents of /etc/shadow in
my file.
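
The scenario can be modelled with a toy disk in Python.  Everything
here is hypothetical (block number 7, the byte strings, the function
names); it only illustrates the order of events described above, not
any real filesystem code.

```python
def write_metadata_first(disk, block, data, crash_before_sync):
    """The inode reaches the disk at once; the data waits for the next sync."""
    inode_blocks = [block]          # on-disk inode now claims the block
    if not crash_before_sync:
        disk[block] = data          # data is written only if the sync happens
    return inode_blocks


def read_after_reboot(disk, inode_blocks):
    """Whatever sits in the blocks the inode points to is 'the file'."""
    return b"".join(disk[b] for b in inode_blocks)


# Block 7 was freed when the old shadow file was replaced, but the old
# bytes are still physically there.
disk = {7: b"old /etc/shadow contents"}
inode = write_metadata_first(disk, 7, b"my edited text", crash_before_sync=True)
seen = read_after_reboot(disk, inode)
```

With crash_before_sync=True the inode is valid as far as fsck can
tell, yet reading the file returns the previous owner's bytes.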

>If you're going to do any sort of write caching for the above classes
>of files, you will not ever be able to guarantee that their contents
>are "up to date," and you will always risk "corruption."  A journaling
>or log-structured filesystem won't help; either you write the data to
>disk before returning from write() or you risk a power failure
>happening before the data blocks hit the disk.

A journalling file system will maintain a consistent state. That is,
after a crash it will have the same state as it had at some instant
before the crash. If you wrote first A and then B before the crash, it might
contain neither, only A, or both. It cannot contain only B.
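
As a rough Python sketch of that property, assume a journal is just an
ordered list of records and that a crash can cut it off after any
number of complete records.  The function names are hypothetical.

```python
def recover(journal, records_on_disk):
    """Replay, in order, the journal records that reached the disk."""
    state = []
    for record in journal[:records_on_disk]:
        state.append(record)
    return tuple(state)


def reachable_states(journal):
    """Every state recovery can produce: one per prefix of the journal."""
    return [recover(journal, n) for n in range(len(journal) + 1)]


states = reachable_states(["A", "B"])
```

Because recovery only ever replays a prefix, the state holding B
without A is never among the results.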

>Good applications and systems are designed to tolerate failures at any
>point during the file update, as long as the underlying filesystem
>remains consistent so that individual filesystem operations (file
>creation, single-block writes, deletion, renaming, etc.) are atomic.
           ^^^^^^^^^^^^^^^^^^^
This is exactly what is not atomic in BSD FFS. The block is immediately
claimed for the file, but the data is written some time later. 

>Unless I've completely misunderstood your argument
[...]

You did. (Unless I completely misunderstood Linus :-)

	hp
--
   _  | Peter Holzer | Systemadministrator WSR | h...@wsr.ac.at
|_|_) |-------------------------------------------------------
| |   |  ...and it's finished!  It only has to be written.
__/   |         -- Karl Lehenbauer

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/04
Message-ID: <3vtssk$g29@fido.asd.sgi.com>#1/1
X-Deja-AN: 107554624
references: <3vbn86$178@fido.asd.sgi.com> <3vd021$a2o@park.uvsc.edu> 
<3vhq0a$8pt@fido.asd.sgi.com> <DCoGIz.FGK@info.swan.ac.uk>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Alan Cox (iia...@iifeak.swan.ac.uk) wrote:
: copies. I've not looked hard at FreeBSD 2.0.5 over 2.0 other than to note 
: things like finally fixing the raw socket bug etc.

I'm planning on doing this in hopes of stealing anything I can use from
FreeBSD for IRIX.  I've heard some good things about FreeBSD networking -
has anyone verified the numbers?

: Both Linux & BSD networking could be massively faster still - Van Jacobson's
: quoted figures and snippets of code on his work more than prove that. Even 
: sadder - in both cases none of what needs doing is new invention; it's old hat
: that nobody has got around to doing all of these things.

Be careful about Van's claims - Van is a brilliant man and has made
immense contributions to the field.  However, when he talks about 5 usec
TCP he is really talking about the TCP processing and he is leaving out
the bcopy, the checksum, the driver overhead, and the interrupt
dispatch costs.  

If it takes, say 50 usecs to receive a packet, it isn't unreasonable to
say that the TCP processing is 10% of that.  What Van's numbers never
reflect is the entire cost of receiving that packet.  He focusses on
the protocol part.  He's right that that part should be trivial.  I'm a
little unhappy that he leaves the impression that if you used his stack
(which no one can get a copy of) then you would have 5 usec packet times
too - it simply isn't true and he knows it.  I think he is doing it out 
of frustration with bloated systems that have not only expensive times
in everything but the TCP part, but have also bloated out the TCP
processing.

: [ObFlameWar] Streams is NOT the answer 8)

Well, hey, that is great to hear.  Last I checked, I thought Linux' stack 
was going to move towards STREAMS.  Can I rest easy that that is no longer
the plan?
--
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: lm@neteng (Larry McVoy)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/04
Message-ID: <3vttkg$g29@fido.asd.sgi.com>#1/1
X-Deja-AN: 107554626
distribution: world
references: <3tk1r3$l8o@news1.halcyon.com> <3v6tac$i4k@galaxy.ucr.edu> 
<3va10b$klc@agate.berkeley.edu> <3vbn86$178@fido.asd.sgi.com> 
<3vc6qu$mb6@agate.berkeley.edu> <3vhoe9$8pt@fido.asd.sgi.com> 
<DCKF9B.H08@cunews.carleton.ca> <3vj3s3$eps@fido.asd.sgi.com> 
<3vocjt$5r1i@info4.rus.uni-stuttgart.de>
followup-to: comp.os.linux.development.system,comp.sys.powerpc
organization: Silicon Graphics Inc., Mountain View, CA
reply-to: l...@slovax.engr.sgi.com
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

Ralf Baechle (r...@informatik.uni-koblenz.de) wrote:
: In article <3vj3s3$...@fido.asd.sgi.com>, lm@neteng (Larry McVoy) writes:
:  
: |> I'm hoping that the SMP stuff will go into a different source base, long 
: |> term.  SMP completely ruins your uniprocessor performance.

: I think this is a problematic approach.  Linux is currently ported to various
: architectures.  Linux is being ported to SMP.  Linux will be ported to cover
: non MMU systems.  Having one single source tree would make the development
: more consistent; this consistency is often a big lack in the spread development
: of Linux over the world.  Anyway - we'll find a solution.

I've seen two approaches.  Let me explain them and you can decide what you 
think is best.

Sun's approach:

	Sun multi threaded their kernel.  Processor interrupts may occur at
	any time, you don't have to disable interrupts around critical
	sections, everything is covered by locks.  This allows them to have
	a fully preemptable kernel which some people think is important.

	Their uniprocessor kernel and MP kernel are basically identical. 
	They have to use the locks even in the UP case because of preemption.

	Pros: preemption, MP performance.  By MP performance, I mean this:
		Sun still sells uniprocessors.  Given that the UP machines
		have to run the locks, Sun is forced to optimize that code
		pretty carefully.  If they didn't, Solaris would be even 
		slower than it already is.  All of that optimization pays
		off on MP systems, you get more efficient use of each cpu.

	Cons: you are running through those locks in many cases where you
		don't need them.  Those locks cost.  There are zillions
		of them.  Sun tried inlining them at one point - it was a
		disaster that never saw the light of day.  The reason?
		There were so many locks that it reduced the effectiveness
		of the instruction cache to nil.

SGI's approach:

	SGI's kernels are ifdef-ed for MP/UP.  They use different locking
	strategies in each.

	Obviously, this leads to just the opposite tradeoffs from Sun.

	Pros: uniprocessor systems are *fast*.  If you can get your work
		done on a UP, I'd use an SGI before I'd use a Sun.

	Cons: Maintaining what is essentially two different source bases,
		albeit via ifdefs.

My opinion (tm):

	I don't like either of these approaches.  I think both of them have
	drawbacks that make them fairly undesirable.  

	So what do you do?  Well, it's a hard question.  I sort of have a
	vague idea in my head about this that is not necessarily realizable.
	The best way to understand it would be to look at a network of 
	QNX systems and imagine that your network was your backplane (or
	some mix of backplanes and networks!) and think about how you would 
	more tightly integrate that.  You need to think about processors
	and kernels as having a one to one binding.  This makes it crucial
	that your kernel be: 1) small, 2) modular, 3) have fast kernel to
	kernel IPC.

	I'll admit that this idea is extreme.  I'll predict that in 10
	years this is how all MPP and most MP systems work.  You won't
	really recognize the kernel.  Not all kernels will be equal,
	some will be just glorified process schedulers and message
	systems.
---
Larry McVoy			(415) 390-1804			 l...@sgi.com

From: ghud...@athena.mit.edu (Greg Hudson)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/05
Message-ID: <3vuep2$7vg@senator-bedfellow.MIT.EDU>#1/1
X-Deja-AN: 107554644
references: <3tk1r3$l8o@news1.halcyon.com> <3vptji$mbm@kruuna.helsinki.fi> 
<3vqjh3$qup@senator-bedfellow.MIT.EDU> <3vte22$90k@wsrcom.wsr.ac.at>
organization: Massachvsetts Institvte of Technology
newsgroups: comp.sys.powerpc,comp.os.linux.development.system

In article <3vte22$...@wsrcom.wsr.ac.at>, Peter Holzer <h...@wsrdb.wsr.ac.at> 
wrote:
>Nobody was talking about "up to date". It is obvious that any
>write-buffering introduces the possibility of losing data.

Linus used the phrase "up to date," which confused me (and Terry, I think)
into believing that Linus was harping on the likelihood of creating zero-
length files instead of creating files whole.

That misunderstanding corrected, I find it disturbing that NOBODY ACTUALLY
LOOKED AT THE FFS CODE before making bogus claims about its algorithms.
Especially Linus.

Take a look at my most recent post; I'll explain how it applies to your
example:

>Somebody invokes passwd, which creates a temporary file with the
>new password, then moves the temporary file to /etc/shadow. The
>old inode and old blocks are freed.
>
>I write a file, which I happened to be editing for the last few hours,
>to disk. The file happens to get the blocks previously allocated to
>/etc/shadow. The meta information is immediately written to disk, the
>new data stays in memory until the next sync.
>
>Before that sync somebody pulls the plug.

What actually happens in the FFS is this: when you write the file you just
edited, you invoke the system calls open(), write(), write(), ..., close().
At open() time, the inode and directory entry for the new file are created
and synchronously written to disk (the inode refers to no data at this
time).  As you write your data, the blocks are allocated and added to the
inode structure, but no write is scheduled for the inode.  Then the next
filesystem sync operation comes along.  It goes file-by-file, notices that
your inode has changed, schedules writes for the referenced data blocks,
and *then* schedules a write for the updated inode.
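
A toy model of that sync order in Python, checking every point at
which the power could fail.  This follows only the description in the
paragraph above, not the FFS sources; the block numbers, dictionary
layout and function names are all hypothetical.

```python
def sync_file(disk, mem_inode, mem_data):
    """Yield after each disk write so the caller can 'crash' in between."""
    for block in mem_inode["blocks"]:
        disk["data"][block] = mem_data[block]                # data blocks first
        yield
    disk["inode"] = {"blocks": list(mem_inode["blocks"])}    # inode last
    yield


def inode_is_safe(disk, mem_data):
    """True if every block the on-disk inode references holds the file's data."""
    return all(disk["data"].get(b) == mem_data[b]
               for b in disk["inode"]["blocks"])


def check_all_crash_points(stale):
    mem_inode = {"blocks": [3, 4]}
    mem_data = {3: b"new-3", 4: b"new-4"}
    # The empty inode was written synchronously at open() time.
    disk = {"inode": {"blocks": []}, "data": dict(stale)}
    results = [inode_is_safe(disk, mem_data)]
    for _ in sync_file(disk, mem_inode, mem_data):
        results.append(inode_is_safe(disk, mem_data))
    return results
```

At every crash point the on-disk inode either references nothing or
references blocks that already hold the new data, so stale contents of
a previous file are never exposed.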

By contrast, ext2 pays no attention to when inodes and data blocks are
written, so you might very well find /etc/shadow in your file after the
power has been restored.

(If you have a controller which reorders disk writes and doesn't finish
cached writes on power failures, then you need a filesystem more
sophisticated than either ext2 or FFS to ensure consistency, scheduling
synchronization points with as many independent writes between them as
possible.  FFS will still work better than ext2 probabilistically with
such a controller, though.)

From: G Dinesh Dutt <b...@htilbom.ernet.in>
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/05
Message-ID: <3vv3ua$cpc@senator-bedfellow.MIT.EDU>#1/1
X-Deja-AN: 107560928
distribution: world
sender: n...@athena.mit.edu
organization: The Internet
reply-to: b...@htilbom.ernet.in
newsgroups: comp.os.linux.development.system

This post was definitely illuminating. I was harbouring the impression, based
on the discussion so far, that ext2 writes the data blocks first and then the
inode, and that FFS does the reverse. The reverse certainly doesn't make sense,
I thought. 

    Greg> What actually happens in the FFS is this: when you write the file you
    Greg> just edited, you invoke the system calls open(), write(), write(),
    Greg> ..., close().  At open() time, the inode and directory entry for the
    Greg> new file are created and synchronously written to disk (the inode
    Greg> refers to no data at this time).  As you write your data, the blocks
    Greg> are allocated and added to the inode structure, but no write is
    Greg> scheduled for the inode.  Then the next filesystem sync operation
    Greg> comes along.  It goes file-by-file, notices that your inode has
    Greg> changed, schedules writes for the referenced data blocks, and *then*
    Greg> schedules a write for the updated inode.

    Greg> By contrast, ext2 pays no attention to when inodes and data blocks
    Greg> are written, so you might very well find /etc/shadow in your file
    Greg> after the power has been restored.

But going by what Linus writes, ordering is being done in ext2. So which is
it? I know I need to look at the code, but I am so swamped with "pay" work
that I only get weekends to do anything at all. Also, going by the behaviour of
FFS as outlined by Greg above and by what I know of ext2, the main difference
between the two is that when a new file is created and the plug is pulled while
I am adding data, the FFS will show that a file exists (albeit empty) in the
directory whereas ext2 may not (I am assuming ext2 orders writes) ? The
performance penalty spoken about comes because of guaranteeing the presence of
this file ? A 10-100 times slower performance penalty for this seems a little
high. 

Dinesh
###############################################################################
"Whatever you can do, or dream you can, begin it. Boldness has genius, power
and magic in it." - Goethe.
G. Dinesh Dutt,					email : b...@htilbom.ernet.in
Hinditron Tektronix Instruments Ltd.,		voice : 8217649/8212262
SDF-2, Unit 63-A, SEEPZ, Andheri (east), Bombay - 400096.
###############################################################################

From: torva...@cc.Helsinki.FI (Linus Torvalds)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/05
Message-ID: <4004di$c3r@kruuna.helsinki.fi>#1/1
X-Deja-AN: 107560874
sender: torva...@cc.helsinki.fi
references: <3vv3ua$cpc@senator-bedfellow.mit.edu>
content-type: text/plain; charset=ISO-8859-1
organization: University of Helsinki
mime-version: 1.0
newsgroups: comp.os.linux.development.system

[ Blush.  I have to admit that when people said that FFS does
  synchronous meta-data updates, I thought they were _synchronous_ which
  is stupid, but now I'm told that FFS actually does order the updates
  asynchronously to a large degree.  Paint me rednosed.  That's what you
  get when you just read documentation, not sources.  ]

In article <3vv3ua$...@senator-bedfellow.mit.edu>,
G Dinesh Dutt  <b...@htilbom.ernet.in> wrote:
>
>But going by what Linus writes, ordering is being done in ext2.

Nope, the ext2 filesystem does not order anything.  In fact, it does
some very limited ordering which I consider wrong, but some people like
(it essentially does stupid things with meta-data updates, which may or
may not help). 

What I claimed was that ordering is the _right_ way to handle this.  I
didn't say linux did the right thing in this regard, just the stupid but
_fast_ thing ;-/

>I am adding data, the FFS will show that a file exists (albeit empty) in the
>directory whereas ext2 may not (I am assuming ext2 orders writes) ? The
>performance penalty spoken about comes because of guaranteeing the presence of
>this file ? A 10-100 times slower performance penalty for this seems a little
>high. 

The performance difference can be pretty noticeable for file
creation/deletion.  Very noticeable indeed for untarring file trees, for
example.  That's where the FFS is more-or-less completely synchronous,
while linux will happily buffer the directory/inode updates. 

Especially noticeable on my alpha when I use OSF/1, and untar the linux
sources..  My i486 is finished in about half the time.  (Damn.  I need
to get X for Linux/axp, so that I can throw away OSF/1 completely). 

Anyway, FFS seems to do the right thing for actual file updates,
certainly much better than what linux does.  It seems it's mainly the
actual creation that is so slow.  So I guess this is a public retraction
of about half of what I've said.. 

		Linus

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/07
Message-ID: <DCxtwE.M1z@info.swan.ac.uk>#1/1
X-Deja-AN: 107681708
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3vhq0a$8pt@fido.asd.sgi.com> <DCoGIz.FGK@info.swan.ac.uk> 
<3vtssk$g29@fido.asd.sgi.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3vtssk$...@fido.asd.sgi.com> l...@slovax.engr.sgi.com writes:
>the protocol part.  He's right that that part should be trivial.  I'm a
>little unhappy that he leaves the impression that if you used his stack
>(which no one can get a copy of) then you would have 5 usec packet times
>too - it simply isn't true and he knows it.  I think he is doing it out 
>of frustration with bloated systems that have not only expensive times
>in everything but the TCP part, but have also bloated out the TCP
>processing.

Van's initial postings on this were part of the 'TCP is too complex to scale
to future network speeds' flame war. That I think has something to do with
the figures quoted. In the Linux case the TCP header processing paths 
are too long - much too long.

>Well, hey, that is great to hear.  Last I checked, I thought Linux' stack 
>was going to move towards STREAMS.  Can I rest easy that that is no longer
>the plan ?

It's never been my plan directly. I tried building a setup where the layers
were built as a set of trees of protocols so that any protocol layer knew the
worst case head/tail room below it, so you could use linear buffers. That
got as far as NET3.026+Linux 1.1.64 and I just could not make it fast
enough even then before trying to put all the funny stream head mode support
on it.

I tried 8)

Streams is elegant so if someone one day can do streams as fast or faster
than the current Linux stack I'll be glad to listen.

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
<A href="file:/dev/mouse">Click here to disable mouse.</A>

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/07
Message-ID: <DCxx44.Mo6@info.swan.ac.uk>#1/1
X-Deja-AN: 107681735
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3ui5m7$rru@agate.berkeley.edu> <3v6tac$i4k@galaxy.ucr.edu> 
<3va10b$klc@agate.berkeley.edu>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <3va10b$...@agate.berkeley.edu> h...@alumni.EECS.Berkeley.EDU 
(Jeffrey Hsu) writes:
>Microsoft is indeed the evil empire.  But Linux is not far behind.  It
>is often noted in business school that in technology, it's not always
>the best solution that wins, but usually the second best solution.  On
>the larger scale, this is true of Microsoft versus Unix, as you have
>noted.  But it's also true of BSD versus Linux.

You neglected to include a detailed summary of the technology issues
preferably with quotes to benchmarks, peer reviewed papers and mathematical
proofs.

Nothing less is going to have any effect on either side 8)

Alan

-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
<A href="file:/dev/mouse">Click here to disable mouse.</A>

From: bh...@klondike.winternet.com (Brian Hurt)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/13
Message-ID: <bhurt.808332816@winternet.com>#1/1
X-Deja-AN: 108079141
references: <3tk1r3$l8o@news1.halcyon.com> <3v6tac$i4k@galaxy.ucr.edu> 
<3va10b$klc@agate.berkeley.edu> <3vbn86$178@fido.asd.sgi.com> 
<3vc6qu$mb6@agate.berkeley.edu> <3vhoe9$8pt@fido.asd.sgi.com> 
<DCKF9B.H08@cunews.carleton.ca> <3vj3s3$eps@fido.asd.sgi.com> 
<3vocjt$5r1i@info4.rus.uni-stuttgart.de> <3vttkg$g29@fido.asd.sgi.com>
organization: StarNet Communications, Inc
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

lm@neteng (Larry McVoy) writes:

>Ralf Baechle (r...@informatik.uni-koblenz.de) wrote:
>: In article <3vj3s3$...@fido.asd.sgi.com>, lm@neteng (Larry McVoy) writes:
>:  
>: |> I'm hoping that the SMP stuff will go into a different source base, long 
>: |> term.  SMP completely ruins your uniprocessor performance.

I'm not so sure of this.

>	SGI's kernels are ifdef-ed for MP/UP.  They use different locking
>	strategies in each.

There are locks and there are locks.  On PCs there are two ways to disable
interrupts: at the CPU (set or clear the interrupt flag) and at the PIC.
In the class of semaphores, there is everything from simple test and set
instructions (which only take a pair of memory references to execute)
to full blown semaphores with queues and automatic blocking and all
the bells and whistles, which can take thousands of instructions to
deal with (including lots of those test and set instructions).  DEC
had the right idea: use the right locking mechanism in the right place.

Now, in a uniprocessor environment, disabling interrupts makes a lot of
sense- especially at the CPU level.  Setting a bit in a CPU register
is very fast (hmm, my 386SX manual claims 8 clock cycles in protected
mode).  In a MP environment, disabling interrupts at the CPU is nigh
on worthless, and doing it at the PIC is of questionable intelligence
(if the two interrupts don't need to access the same data structure,
why shouldn't they both happen?).  This leads to the simple solution
of writing the whole kernel as SMP and then ifdefing the calls to 
more complex semaphore types to simple enable or disable interrupt
instructions.  This doesn't allow reentrant UP kernels, but it works.

The only places the source trees would separate would be in the
semaphore code itself (and the companion include files).

The main problem is that this requires more intelligent developers-
they have to understand MP and deal with it.  At least at the kernel/
device driver level.

Brian Hurt

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
Date: 1995/08/14
Message-ID: <DDAuA1.KH3@info.swan.ac.uk>#1/1
X-Deja-AN: 108137984
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3vocjt$5r1i@info4.rus.uni-stuttgart.de> 
<3vttkg$g29@fido.asd.sgi.com> <bhurt.808332816@winternet.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system,comp.sys.powerpc

In article <bhurt.808332...@winternet.com> bh...@klondike.winternet.com 
(Brian Hurt) writes:
>to full blown semaphores with queues and automatic blocking and all
>the bells and whistles, which can take thousands of instructions to
>deal with (including lots of those test and set instructions).  DEC
>had the right idea: use the right locking mechanism in the right place.

The basic semaphores on PC are about 10 instructions (because the cache is
MESI you do a non locked spin on the semaphore variable until it's free, then
try to get it with a locked test and set). The performance nasty on the
Pentium at least is the swapping out of a page, because you have to
interrupt all other processors to do a TLB flush and wait for a reply from
each before you can continue and issue the page to another process.
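
The spin-then-lock pattern in the first sentence is usually called
test-and-test-and-set.  A sketch of it in Python, for illustration
only: a threading.Lock stands in for the CPU's locked instruction, and
the class and function names are hypothetical.  Real kernel code would
be a handful of machine instructions, not this.

```python
import threading
import time


class TTASLock:
    """Test-and-test-and-set spinlock, modelled in Python."""

    def __init__(self):
        self._held = False
        self._atomic = threading.Lock()   # stand-in for a locked instruction

    def _test_and_set(self):
        with self._atomic:                # atomically fetch old value, set flag
            old = self._held
            self._held = True
            return old

    def acquire(self):
        while True:
            while self._held:             # plain read: the non locked spin
                time.sleep(0)             # let other Python threads run
            if not self._test_and_set():  # locked op only when it looked free
                return

    def release(self):
        self._held = False


def hammer(lock, counter, rounds):
    for _ in range(rounds):
        lock.acquire()
        counter[0] += 1                   # critical section
        lock.release()


def run(threads=4, rounds=200):
    lock, counter = TTASLock(), [0]
    workers = [threading.Thread(target=hammer, args=(lock, counter, rounds))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter[0]
```

The point of the inner loop is that waiting processors only read the
variable, so they do not fight over the cache line until the lock
actually looks free.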

>mode).  In a MP environment, disabling interrupts at the CPU is nigh
>on worthless, and doing it at the PIC is of questionable intelligence

It's useful for internal protection, and when you need to do things like
undisturbed block I/O (eg IDE drives).

>The main problem is that this requires more intelligent developers-
>they have to understand MP and deal with it.  At least at the kernel/
>device driver level.

At the moment the SMP project kernel changes the scheduler, adds a single
spinlock on kernel entry/exit and interrupt entry/exit and passes slave
timer interrupts from the CPU getting the timer interrupt to the slaves 
and finally supports two other IPI messages - one to flush the TLBs and
one to jam the target processor (for a panic, halt, etc.)

Nothing clever, just a dumb start on the job.

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
<A href="file:/dev/mouse">Click here to disable mouse.</A>