Porting BSD Unix Through the GCC
by John Gilmore
GNU's Bulletin
June, 1989

I have ported the University of California at Berkeley's latest Unix sources through the GNU C Compiler. In the process, I made Berkeley Unix more compatible with the draft ANSI C standard, made many programs less machine-dependent and less compiler-dependent, and tested GCC.

Berkeley Unix has set the standard for high powered Unix systems for many years, and continues to offer an improved alternative to AT&T Unix releases. However, Berkeley's C compiler is based on an old version of PCC, the Portable C Compiler from AT&T. By merging GCC into the Berkeley release, we provided ANSI C compatibility, better optimization, and improved compiler maintenance. The GNU project gained an important test case for GCC, and a strong collaborator in the free software movement.

The project was conceived by John Gilmore, and aided by Keith Bostic and Mike Karels of Berkeley, and Richard Stallman of FSF. I did most of the actual porting, while Keith and Mike provided machine resources, collaborated on major decisions, and arbitrated the style and content of the changes to Unix. Richard provided quick turnaround on compiler bug fixes and problem solving.

We are producing a Unix source tree which can be compiled by both the old and the new compilers. Rather than introducing new `#ifdef''s, we are rewriting the code so that it does not depend on the features of either compiler. Whenever we have to make a change, we are moving in the direction of ANSI C, POSIX compatibility, and machine independence.

We have used GCC releases 1.15 through 1.35. I did four complete "passes" over the Unix source tree; each involved running "make clean; make" on the entire source tree, and examining 500K to 800K of resulting output. I'd fix as many errors as I could, testing small parts of the source tree in the process, then merge my changes back into the master sources and rebuild the whole thing again.

The errors fell into two general categories: language changes in ANSI C, and non-portable code. In some cases it was hard to tell the difference.

The major ANSI C problem was the generation of character constants in the preprocessor. Excessive use of this now-obsolete feature in system header files caused us to change about 10 include files and about 45 source modules. Another preprocessor problem was that ANSI C uses a different syntax for token concatenation; we rewrote pieces of five modules to avoid having to concatenate tokens.

ANSI C clarified the rules for the scope of names declared `extern'. We moved extern declarations around, or added global function declarations, in more than 38 files to handle this. Nine programs used new ANSI keywords, such as `signed' or `const', as identifiers; we picked new identifiers. Eleven modules used typedefs as formal parameter names, or used `unsigned' with a typedef.

The worst non-portable construct we found in the Unix sources was the use of pointers with member names that aren't right for the pointer type. Fixing this problem caused a lot of work, because we had to figure out what each untyped or mistyped pointer was really being used for, then fix its type, and the references to it. We changed 5 modules due to this, and abandoned one program, efl, which would have required too much work to fix.

Another problem was caused by using CPP as a macro processor for assembler source. We circumvented this problem by making the assembler source acceptable to both old-CPP and ANSI CPP.

A major problem was `asm' constructs in C source. Some programs were written in C with intermixed assembler code, producing a mess when compiled with anything but the original compiler. Other routines, such as compress, drop in an `asm' here or there as an optimization. Still more modules, including the kernel, run a sed script over the assembler code generated by the C compiler, before assembling and linking it.
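
To make the character-constant problem described earlier concrete, here is a minimal sketch in C. The names `OLD_CTRL', `CTRL', and `ctrl_code' are chosen for illustration and are not quoted from the Berkeley sources, and the values assume an ASCII character set. It shows one plausible shape of the rewrite: the macro stops building a character constant itself and the caller supplies one instead.

```c
/*
 * The traditional Unix preprocessor substituted macro parameters inside
 * character constants, so this definition, called as OLD_CTRL(x),
 * produced ('x' & 037).  An ANSI preprocessor leaves quoted text alone:
 * every call expands to ('c' & 037), whatever the argument is.
 * (Hypothetical macro name, for illustration only.)
 */
#define OLD_CTRL(c) ('c' & 037)

/*
 * A form that both kinds of preprocessor expand the same way: the
 * caller writes the character constant, as in CTRL('x').
 */
#define CTRL(c) ((c) & 037)

/* Control code for a letter, e.g. ctrl_code('x') for control-X. */
int ctrl_code(int letter)
{
    return CTRL(letter);
}
```

With an ANSI preprocessor, `OLD_CTRL(x)' and `OLD_CTRL(d)' both yield 3, the control code for `c', while `CTRL('x')' yields 24. A rewrite of this kind has to touch both the header that defines the macro and every module that calls it, which is consistent with the counts of include files and source modules given above.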
We eliminated as many uses of `asm' as we could, and turned others into assembler language subroutines in `.s' files. Both the Pascal and Lisp interpreters used heavy hacking with sed scripts; each of these took several days to fix.

We fixed three programs that used multi-character constants; two were clearly errors. Fifteen programs tried to declare functions or variables, while omitting both the type and storage class; we added `int' to the declarations. In two modules this diagnosed errors caused by use of `;' where `,' was intended. Changes to the rules for parsing declarations made us fix five modules, and declaration bugs in six more were caught by GCC's improved error checking. Fifteen programs had miscellaneous pointer usage bugs fixed. GCC caught bugs in five modules caused by misunderstood sign extension. Five or ten other miscellaneous bugs were caught and fixed.

We are pleased with the results so far. Most of the Unix code compiled without problems, and the parts which we have executed are free from code generation bugs. The worst of the ANSI C changes only required roughly fifty modules to be changed, and there were only two problems of this magnitude. A total of twenty bugs in GCC have been found so far, and most of them are now fixed. We expected several times this many bugs; the compiler is in better shape than any of us expected.

Many minor problems and nit incompatibilities with ANSI C have been removed from the Unix sources. Far fewer user programs should require attention when doing a BSD Unix port now. However, we did not attempt to make Berkeley Unix fully ANSI C compliant. In particular, we kept preprocessor comments (`#endif FOO') as well as machine-specific `#define''s (`#ifdef vax'). GCC supports these features even though ANSI C does not.

Unfinished work remains. The BSD kernel has not yet been ported to GCC, though it has been syntax-checked.
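
The sign-extension bugs mentioned earlier are not itemized in this article. The sketch below shows one common form of the mistake, offered as an assumption about the kind of code involved rather than as an example taken from the sources. The names `MARKER', `is_marker_sign_extended', and `is_marker_portable' are hypothetical.

```c
/* Hypothetical byte value with the high bit set. */
#define MARKER 0xE9

/*
 * Mistaken form: the character is promoted to int before the
 * comparison.  A signed character holding the bit pattern 0xE9 is
 * negative, so after sign extension it never equals 233.  `signed char'
 * is spelled out here so the result does not depend on whether plain
 * `char' is signed on a given machine.
 */
int is_marker_sign_extended(signed char c)
{
    return c == MARKER;
}

/*
 * Portable form: convert to unsigned char first, so the promoted
 * value is always in the range 0..255.
 */
int is_marker_portable(char c)
{
    return (unsigned char)c == MARKER;
}
```

On the usual two's-complement machines, `is_marker_sign_extended((signed char)0xE9)' returns 0 while `is_marker_portable((char)0xE9)' returns 1. Code of the first kind can appear to work on a machine where plain `char' is unsigned and then fail when moved to one where it is signed.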
Optimization of the kernel will cause problems until `volatile' declarations are used in all the right places. Pieces of the Portable C Compiler are still used inside lint, f77, and pc. Various sources still need their `setjmp' calls fixed so that only volatile variables depend on keeping their values after a `longjmp'.

Our changes will be available to recipients of Berkeley's next software distribution, whenever that is. We will also make diffs available to others involved in porting Unix to ANSI C.

Future projects include building a complete set of ANSI C and POSIX compatible include files and libraries (including function prototypes), and converting the existing sources to use them. An eventual goal is to produce a fully standard-conforming Unix system---not only in the interface provided to users, but with sources which will compile and run on any standard-conforming compiler and libraries.

The success of this collaboration between GNU and Berkeley has encouraged further cooperation. The GNU project is working to provide reimplementations of System V features that Berkeley Unix lacks, such as improved shells and make commands. In return, Berkeley has released much of its software to the public, eliminating the AT&T license requirement for programs that AT&T did not supply. A large set of "freed" BSD software is available by uucp or ftp from `uunet.uu.net' in the subdirectory `bsd-sources', as well as on the GNU Compiler tape and the UUNET tapes.

Copyright 1989