2.2.18pre2aa2 and patches for 2.2.18pre3

Date	Wed, 6 Sep 2000 21:52:07 +0200 (CEST)
From	Andrea Arcangeli <>
Subject	2.2.18pre2aa2 and patches for 2.2.18pre3

The main features of 2.2.18pre2aa2 are:

o Support for 4Gigabyte of RAM on IA32 (me and Gerhard Wichert)
o Support for 2T of RAM on alpha (me)
o Improved VM for high end machines with enough ram and doing
heavy I/O under high memory pressure plus fixes
for the MAP_SHARED oom by killing kpiod. (me)
o RAW-IO (doable with bigmem enabled too). (Stephen C. Tweedie)
o SMP scheduler improvements. (me and partly from 2.3.x contributed
by Ingo Molnar)
o LFS (>2G file on 32bit architectures)
o misc fixes

Detailed description of 2.2.18pre2aa2 follows.

---------------------------------------------------------------------------
00_4_min_percent-1
Increase the min percent of the buffer cache and page cache to 4%.
(it wouldn't matter if the VM algorithms were better). (me)

00_IO-wait-2.gz

Avoid suprious unplug of the I/O queue. (me)

00_SMP-scheduler-2.2.14-G.gz

Better SMP reschedule_idle. (partly backported from 2.3.x, 2.3.x
version was contributed by Ingo Molnar)

00_account-failed-buffer-tries-1

Account also the failed buffer tries during shrink_mmap. (me)

00_alpha-epoch-1

Put in sync the RTC parsing done by the kernel at boot
while setting the system time, with the parsing done
by the RTC driver during initialization. (Jan-Benedict Glaw
<jbglaw@lug-owl.de> fixed it in 2.4.x)

00_alpha-read-unlock-SMP-race-1

SMP race fix in the read_unlock implementation of alpha. (me)

00_buf-run_task_queue.gz

Avoid suprious unplug of the I/O queue. (me)

00_delack-timer-5.gz

Additional paranoid stuff in TCP delayed acks code (if somebody
can still reproduce lockups under heavy TCP load, the
delack-timer-5 patch can be the solution). In 2.2.15 only what it
was obviously wrong is been fixed, but at the time I did
delack-timer-5 I was convinced the stuff in 2.2.15 _might_ not be
enough, so if you have still lockups under 2.2.15 let us know.
delack-timer-5 is reported to definitely fix the TCP lockup. (me)

00_elv-simple-latency-1

It simplify the elevator algorithm to remove the careful code
that was accounting also the place in the queue where the
request was inserted while setting its initial latency.
This changed makes the systems much less interactive, but
it fixes the tiotest numbers of course still avoiding indefinite
starvation as it was happeneing before the elevator rewrite. (me,
tested also by Jens Axboe)

00_inode-cleanup-3.gz

Minor inode stuff cleanedup. (me)

00_java-proc.gz

Revertd the semantic change that make differences between
/proc/00000$$ and /proc/$$, this allows backwards compatibilty of
a misfeature and it _won't_ hurt security. There's no downside
in reverting the 2.2.13 semantic change.
00_kupdate-sigstop-2.2.11-1.gz

Allows kupdate to be stopped via SIGSTOP (currently it must be
stopped by setting interval to zero via sysctl). This returns
useful in the airplane where you do killall -STOP
kupdate... STOP kupdate at your own risk of course ;) (me)

00_mremap-waste-virtual-stack-space-1.gz

Avoids mremap to grow in virtual addresses (and so in turn
it avoids to waste virtual address space).

00_nanosleep-4.gz

Provide nanosleep usec resolution so that a signal flood doesn't
hang glibc folks that correctly trust the rem field to resume
the nanosleep after a syscall interruption. (without the patch
nanosleep resolution is instead 10msec on IA32 and around 1msec on
alpha) (me)

00_overcommit-1.gz

Make sure to not understimate the available memory (the cache and
buffers may be under the min percent) (me)

00_set_rtc_mss-SMP-race-1.gz

Fix set_rtc_mss SMP race between timer irq and rtc irq. (me)

00_silent-stack-overflow-4.gz

Avoid stack to silenty overflow on the heap (make life easier
to track down userspace issues). Perfect approch is not possible
even using special LDT for the stack segment. The approch in the
patch will work 99% of the time and it enforces a GAP between heap
and stack of 1 page as default (configurable via sysctl, if
turned to zero the feature is disabled completly). (me)

00_slow-gtod-SMP-race-1.gz

Fixes an SMP race in slow gettimeofday. (me)

00_stod-lost_ticks-1.gz

Check lost_ticks in settimeofday to be more precise.

00_tsc-calibration-non-compile-time-1.gz

TSC calibration must be dynamic and not a compile time thing
because gettimeofday is dynamic and it depends on the TSCs
to be in sync. (me)

00_version

Change to the Makefile EXTRAVERSION.

10_bigmem-2.2.18pre2-18.bz2

BIGMEM production code (IA32 and alpha). This one fixes
a memory leak during swap (almost unreproducibe, found it
while reading the sources only) and some alpha-bigmem
bug. It's now possible to generate core dumps larger
than 2G (the previous limitation was due a few ext2
bugs and a binfmt_elf.c bug). I just includes the large_shm
patch so it should be possible to allocate more than 2g in
one shm segment. (me and Gerhard Wichert)

13_bigmem-rawio-2.2.17pre19aa1-3.gz

SCT's rawio bigmem capable. (Stephen C. Tweedie)

14_bigmem-rawio-lvm-2.2.15pre13aa1-6.gz

LVM compatible with the 2.3.x one. (Heinz Mauelshagen)

30_rtclight-2.2.15pre13aa1-1.gz

Needed by alpha SMP. (won't be needed anymore in 2.4.x)

40_lfs-2.2.18pre2aa2-15.bz2

LFS: >2G file support for 32bit archs. This also includes
the new getdents64 and fcntl64 syscalls in sync with the
2.4.0-test8-pre5 user<-> kernel API. It also fixes some bug
present in the previous 2.2.x LFS patches.

50_robust-initrd-2

Relocate the initrd image ASAP (btw, on alpha ASAP is sooner
than on IA32), to avoid corrupting it while allocating the
boot memory (like the mem_map_t array). This is good for
robusteness and it's mainly interesting when bigmem
is enabled.

60_VM-global-2.2.18pre2aa1-6.gz

Kills kpiod removing the OOM while paging out MAP_SHARED
segments during a low memory condition. Fixes
mmap00[123] on 8mbyte of RAM, and I bet I fixes as well
Alan's DBMS OOM troubles. The way I could remove
kpiod is to add a per-process fs_locks information
and to check that information instead of deadlocking
into the semaphore or into the superblock lock.

Fixes in a alternate (more efficient) way the GFP race that
is currently handled by the free_before_allocate logic.
The alternate way is to have a per-process freelist
and to refill the global queue only once the tasks
allocate something for itself.

Accounts protected buffers as unfreeable (fixes
OOM with big ramdisk in memory).

Rewrite the sync_page_buffers logic. Now it's a per-bh
bitflag information that tell us if a dirty buffer is hanging
around for a too long time w/o becoming freeable (so we block
on such buffer only once necessary and never when
there's no high pressure). If there's no high pressure and
__GFP_IO is set we try to write with WRITEA that doesn't block
but that allows us to fill the I/O queue with the
less interesting buffers in the VM. Never waits on the
buffer to become unlocked (that's too aggressive and that was
necessary only to workaround the pagingout of MAP_SHARED segments
that wasn't able to do proper write throttling due the always
async kpiod).

Rewrite the async buffer flushing wakeup points ala 2.4.x way.

Avoids swain/swapout while writing heavily to disk (no-swapout).

Adds lowlatency control checks in shrink_mmap and swap_out.

The same VM-global-patch backported to clean 2.2.18pre2 is
here (the only difference is that it includes the above
account-buffer-tries and it doesn't remove the suprious
tq_disk run during readahead that isn't a VM related change):

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre2/VM-global-2.2.18pre2-6.bz2

70_cond-sched-1

Implements conditional_schedule(). (Ingo Molnar)

71_copy-user-reschedule-2

Add low latency schedule entry points into the copy_user(), this fixes
read(2)/write(2) hangs. (Originally implemented by Ingo Molnar
in the lowlatency patch, but this patch is not equivalent, it's
a very subset and it's different the reason I put the
conditional schedule _after_ the copy is because we want the
higher timeslice possible once we return to userspace to avoid
live locks during swap)
---------------------------------------------------------------------------
The 2.2.18pre2aa1 patch is here:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.18pre2aa2.bz2

As usual all the separate patches are in:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.18pre2aa2/

and they can be applyed in `ls` order to generate the 2.2.18pre2aa2
kernel.

All the patches that I propose for inclusion into 2.2.18pre3 are here
(the directory is called 2.2.18pre2 because the patches are against
2.2.18pre2).

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/proposed/v2.2/2.2.18pre2/

They are all orthogonal and they can be applied in any order regardless of
each other.

Andrea

PS. If you'll run with more than 4giga of RAM on alpha, you'll need also
to fix procps if you want to read the correct MM values with `free`, `top`
and `vmstat`. procps takes all the memory information with byte
granularity in `int` words (and not long words). I fixed procps this way:

ftp://ftp.suse.com/pub/people/andrea/procps/procps-2.0.2-bigmem-1.gz

PPS. 2.2.18pre2 doesn't included riskious stuff compared to 2.2.17, thus
don't worry applying that pre-patch. In case you need to be
based on 2.2.17 for whatever reason the 2.2.18pre2aa2.bz2 patch will
just apply cleanly to 2.2.17 too.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Date	Thu, 7 Sep 2000 10:09:39 +0200 (CEST)
From	Boszormenyi Zoltan <>
Subject	Re: 2.2.18pre2aa2 and patches for 2.2.18pre3

On Wed, 6 Sep 2000, Andrea Arcangeli wrote:
> The 2.2.18pre2aa1 patch is here:
> 
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.18pre2aa2.bz2

It contains code in arch/i386/mtrr.c that looks pretty much like
my "64 bit MTRR" patch that was posted on lkml some time last year
and makes use of the full 36 bit MTRR address and size on Intel and
44 bits on AMD Athlon. BTW my patch contained some new ioctls
which exports this capability to user space...

Which of the smaller patches contain this code?
Your comments do not describe this particular piece of code.

Regards,
Zoltan Boszormenyi <zboszor@mail.externet.hu>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Date	Thu, 7 Sep 2000 14:49:30 +0200 (CEST)
From	Andrea Arcangeli <>
Subject	Re: 2.2.18pre2aa2 and patches for 2.2.18pre3

On Thu, 7 Sep 2000, Boszormenyi Zoltan wrote:

>It contains code in arch/i386/mtrr.c that looks pretty much like
>my "64 bit MTRR" patch that was posted on lkml some time last year
>and makes use of the full 36 bit MTRR address and size on Intel and
>44 bits on AMD Athlon. BTW my patch contained some new ioctls
>which exports this capability to user space...

Correct, it handles the full 36bit MTRR. I received the patch from David
Wragg <dpw@doc.ic.ac.uk> a few months ago claiming it was fixing a problem
with bigmem (however bigmem handle at max 4G, thus I guess it would fix
problems also for the stock kernel on machines with more than 4G, but of
course if you have more than 4G and you run 2.2.x you're for sure using
bigmem :).

While sending me the patch David Wragg told me:

	"... Mark Vojkovich at VA Linux found the problem. ..."

Then I checked the patch with the reference manual at hand, it was correct
and I included it into the bigmem patch as a bugfix.

>Which of the smaller patches contain this code?

bigmem.

>Your comments do not describe this particular piece of code.

I'll try to rember to mention this particular bugfix in bigmem in the
future. If you were the author let me know and I'll add your name (however
you should check with David, they may have implemented it indipendently).
I sure _want_ to give credits to people who deserves them so I appreciate
the clarification.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/