Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!linus!philabs!prls!pyramid!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Case sensitive file names Message-ID: <5860@ut-sally.UUCP> Date: Thu, 2-Oct-86 02:59:13 EDT Article-I.D.: ut-sally.5860 Posted: Thu Oct 2 02:59:13 1986 Date-Received: Fri, 3-Oct-86 05:35:41 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 53 Approved: j...@sally.utexas.edu From cbosgd!cbosgd.ATT.COM!m...@seismo.CSS.GOV Wed Oct 1 16:55:45 1986 Date: Mon, 29 Sep 86 12:33:36 edt From: m...@cbosgd.att.com (Mark Horton) Message-Id: <8609291633.AA10479@cbosgd.ATT.COM> Newsgroups: mod.std.unix Subject: Case sensitive file names OK, here's a new topic. File names. I note that the committee recently decided that all file names in conforming systems must be case sensitive, for example, makefile and Makefile must be different files. (I've forgotten where I read this, it was probably Communixations.) I think this is a mistake. UNIX is the only major operating system that treats things like file names, logins, host names, and commands as case sensitive. The net effect of this is that users get confused, since they have to get the capitalization right every time. To avoid confusion, everybody always just uses lower case. So there are few, if any, benefits from a two-case system, and any time anyone tries to do something that isn't pure lower case, it causes confusion for somebody and often breaks some program. Another problem is that emulations on other operating systems, such as VMS or MS DOS, will become impossible without drastic changes to their file systems. Given the problems in the above paragraph, plus politics as usual, I think it is unlikely that other systems will be changed to have case sensitive file systems. 
After all, it's not like it was easiest to make the VMS filesystem case insensitive - that took extra effort on their part. I think it's a mistake to move in the direction of requiring other operating systems to become case sensitive. If anything, motion in the other direction might be of more benefit. Note: I am NOT suggesting that UNIX should have a case insensitive filesystem that maps everything to UPPER CASE like MS DOS. There is nothing wrong with mapping everything to lower case, for example. It's also reasonable to leave the case alone, but ignore case in comparisons. There is also probably a good argument for keeping it case sensitive (after all, there are probably 5 or 6 people out there who really need both makefile and Makefile, or both mail and Mail, for some reason that escapes me at the moment.) But I think it would be a mistake to require other systems to change if they are to support a POSIX emulation on top of them. (On the other hand, it may be reasonable to expect other operating systems to support more general file name lengths and character sets, rather than things like the MS DOS 8+3 convention. But in practice, this may be too painful to fix.) Mark Horton Volume-Number: Volume 7, Number 11
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!tektronix!teklds!cae780!amdcad!amd!intelca!qantel!lll-lcc! lll-crg!rutgers!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <5865@ut-sally.UUCP> Date: Thu, 2-Oct-86 12:08:21 EDT Article-I.D.: ut-sally.5865 Posted: Thu Oct 2 12:08:21 1986 Date-Received: Fri, 3-Oct-86 07:55:03 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 53 Approved: j...@sally.utexas.edu From @SUMEX-AIM.ARPA:MRC@PANDA Thu Oct 2 05:09:39 1986 Date: Thu 2 Oct 86 01:59:26-PDT From: Mark Crispin <MRC%PA...@SUMEX-AIM.Stanford.EDU> Subject: Re: Case sensitive file names To: std-unix%ut-sally.U...@SALLY.UTEXAS.EDU In-Reply-To: <5860@ut-sally.UUCP> Postal-Address: 1802 Hackett Ave.; Mountain View, CA 94043-4431 Phone: +1 (415) 968-1052 Message-Id: <12243533720.7.MRC@PANDA> I would like to add a loud "Bravo!" to Mark Horton's message! The present case sensitivity of the Unix filesystem is a real drag, and something that has regularly and reliably caused me problems when working in a heterogenous environment. As far as I can tell, the only individuals who actually *like* case sensitivity in Unix are the high-schoolish hackers who think it's really cute to write programs with separate -1, -l, -I, and -L switches. I think that the most reasonable proposal is to do a free case match on input, so that "more foobar" is the same as "More FooBar", etc. On output, you first do a free case match to see if there is an extant file and if so preserve the case of that file. In other words, if I overwrite FooBar but specify foobar or FOOBAR, the file is still called FooBar. Otherwise, use whatever case the user specifies. Renaming would always use the case the user specifies, so the user can rename foobar to FooBar, etc. 
Now, if I can convince you guys to do this for usernames, I will take back at least 50% of the nasty things I've ever said about Unix. Golly gee, it would be nice to be MRC or Crispin, not "mrc" or "crispin"... Another way of doing it is how TOPS-20 does it. TOPS-20's filesystem isn't *really* case independent. All lowercase characters are coerced into upper case, so if I say foobar.txt it becomes FOOBAR.TXT in the actual filename. This is both from the user interface and from the filename lookup system call. It is, however, possible for any of the 128 ASCII characters to be in a filename, provided that the "oddball" characters are quoted using CTRL/V. In other words, a FooBar.Txt file is possible on TOPS-20, but only by F<^V>o<^V>oB<^V>a<^V>r.T<^V>x<^V>t. For once, I don't favor the TOPS-20 way of doing things. TOPS-20's scheme is alright if you started with case independence to begin with, but I don't think it would fit in well into Unix, and certainly not without a major flag day. I hope that my suggestion above could fit in with only minimal inconvenience. I found on TOPS-20 that no serious user used case-sensitive filenames. Everybody appreciated the case-insensitivity of the interface, even though it took the form of coercing to upper case. My experience also suggests that case sensitivity is a pain in the a**; I tried writing a major utility in Interlisp using mixed case function and variable names and eventually gave up when most of my errors turned out to be case errors. It's *so* much easier to keep the shift lock key down... -- Mark -- ------- Volume-Number: Volume 7, Number 12
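The "free case match" rule proposed in this article (match without regard to case on input, keep the existing spelling when overwriting, take the user's spelling on create and rename) can be sketched in a few lines. This is only an illustration under stated assumptions: the class name CasePreservingDirectory and its methods are hypothetical, the directory is an in-memory table rather than a real filesystem, and plain ASCII lower-casing is assumed to be an adequate fold.

```python
class CasePreservingDirectory:
    """Toy directory: names match regardless of case, stored spelling is kept."""

    def __init__(self):
        # folded name -> (stored spelling, contents)
        self._entries = {}

    @staticmethod
    def _fold(name):
        # Assumption: lower-casing is a good enough fold for ASCII names.
        return name.lower()

    def read(self, name):
        # Input: free case match.
        _stored, data = self._entries[self._fold(name)]
        return data

    def write(self, name, data):
        # Output: if a match exists, keep its spelling; else use the caller's.
        key = self._fold(name)
        stored = self._entries[key][0] if key in self._entries else name
        self._entries[key] = (stored, data)
        return stored

    def rename(self, old, new):
        # Rename always takes the spelling the user typed.
        old_key, new_key = self._fold(old), self._fold(new)
        if old_key not in self._entries:
            raise FileNotFoundError(old)
        if new_key != old_key and new_key in self._entries:
            raise FileExistsError(new)
        _stored, data = self._entries.pop(old_key)
        self._entries[new_key] = (new, data)

    def listing(self):
        return sorted(stored for stored, _data in self._entries.values())
```

With this sketch, writing "foobar" over an existing "FooBar" leaves the entry named "FooBar", while rename("foobar", "fooBAR") changes the stored spelling. The TOPS-20 approach described above would instead upper-case the name once, at creation time.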
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!sri-spam!rutgers!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <5875@ut-sally.UUCP> Date: Fri, 3-Oct-86 13:56:07 EDT Article-I.D.: ut-sally.5875 Posted: Fri Oct 3 13:56:07 1986 Date-Received: Sat, 4-Oct-86 07:22:45 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 64 Approved: j...@sally.utexas.edu From im4u!...@prophet.bbn.com Fri Oct 3 04:42:00 1986 Message-Id: <8610030928.AA14794@im4u.UTEXAS.EDU> Date: Thu, 2 Oct 86 12:43:49 EDT From: Dan Franklin <im4u!...@prophet.bbn.com> To: "Guest Moderator, John B. Chambers" <std-unix%ut-sally.U...@im4u.UTEXAS.EDU> Subject: Re: Case sensitive file names I can see that it will be hard to emulate POSIX filenames on top of an operating system such as MS-DOS or VMS, but the benefits of changing the POSIX spec must be weighed against the costs. Suppose we changed the spec so that it permitted a POSIX implementor to provide either a case-sensitive or case-insensitive filesystem, their choice (which I think is what Mark is proposing). There are three groups of people who will be affected: those who write POSIX emulators, those who write programs for POSIX, and those who *use* POSIX and its programs. The last group will be the largest and most important by far; the emulator writers will be the smallest group. So how would users be affected? It might benefit them, because case-insensitivity might really be better than case-sensitivity. However, in the absence of a controlled study, let's assume the null hypothesis: that it makes no big difference. More than "proof by assertion" is needed! Regardless of which is really better, some users will probably benefit because they will be used to other operating systems providing case-insensitivity, particularly MS-DOS. 
However, if we really make it an implementor's choice, users will be hurt by the fact that each POSIX system they encounter will be different. In fact, this system-to-system difference will probably cause more problems than optional case insensitivity would solve. What about people who write POSIX programs? They will lose. To the extent that POSIX permits two possible underlying filesystems, a truly portable POSIX program will have to be prepared for either one. For many programs it may not matter what the FS looks like, but if it does matter, it will mean extra work. Finally, there are all those emulator writers. They might find it easier; then again, they might not. If I were going to do an emulator on top of MS-DOS, then (since I don't work for Microsoft) I would probably use the existing filesystem just as a base to build the POSIX filesystem, almost the way UNIX builds a named hierarchical filesystem space out of inodes. Going to case insensitivity wouldn't help me a bit, because of the other limitations Mark mentioned. It might help Microsoft, because they could change the 8+3 convention at the same time. But unless they were willing to do that, it wouldn't help them either. VAX-VMS might be easier, but again there are other problems I would have to solve. Case-insensitivity would help me some, but I'd still have a lot of work ahead of me. But arguments regarding emulator-writing are beside the point. No matter what POSIX does on this, it will always be possible to write a POSIX emulator on top of an existing operating system. So the ease of *using* the system must take precedence over the ease of writing it. For the reasons above, I believe that making case-insensitivity an *option* would be a bad idea. Changing the spec to *insist* on case-insensitivity might be a good idea, but it would cause enough problems w.r.t. existing UNIX systems that it ought to be very strongly motivated. To start with: is it really much easier for people to use such a system? 
Dan Franklin Volume-Number: Volume 7, Number 14
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <5913@ut-sally.UUCP> Date: Sun, 5-Oct-86 18:25:25 EDT Article-I.D.: ut-sally.5913 Posted: Sun Oct 5 18:25:25 1986 Date-Received: Mon, 6-Oct-86 05:41:41 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 49 Approved: j...@sally.utexas.edu Date: Fri, 3 Oct 86 23:56:26 edt From: m...@cbosgd.att.com (Mark Horton) Subject: Re: Case sensitive file names >Finally, there are all those emulator writers. They might find it easier; >then again, they might not. If I were going to do an emulator on top of >MS-DOS, then (since I don't work for Microsoft) I would probably use the >existing filesystem just as a base to build the POSIX filesystem, almost >the way UNIX builds a named hierarchical filesystem space out of inodes. >Going to case insensitivity wouldn't help me a bit, because of the other >limitations Mark mentioned. It might help Microsoft, because they could >change the 8+3 convention at the same time. But unless they were willing >to do that, it wouldn't help them either. VAX-VMS might be easier, but >again there are other problems I would have to solve. Case-insensitivity >would help me some, but I'd still have a lot of work ahead of me. I'm not concerned very much about the amount of work the emulator writer has to do, but I am concerned about the quality of the resulting emulation. If I'm a user of an emulator which is written on an otherwise-reasonable case insensitive filesystem (VMS comes to mind) which emulates case sensitivity, then apparent POSIX filenames will bear little resemblance to real native filenames. 
Either there's an external table somewhere not unlike the UNIX directory/inode # tables, or else file names are somehow encoded into longer native filenames. I'm living with the latter kind of system now (Sun's PC/NFS, which makes UNIX filesystems look like DOS filesystems) and the contortions it has to go through to fit ordinary UNIX file names into DOS filenames are a serious inconvenience. The former kind of system makes it impossible to access native files from inside the POSIX environment, unless someone is awfully clever. On the other hand, if case insensitive is an option for the emulator, then two possibilities occur: (1) the vendor of the native operating system can otherwise upgrade their filesystem to allow a clean POSIX implementation (maybe they will arrange that their native OS conforms directly to POSIX; wouldn't you consider it strongly if the market starts to demand POSIX compatibility?) and (2) True UNIX systems have the option to evolve to case insensitive, should a study be done and the world conclude that insensitive is better. I agree that a study should be done; I have my own intuitive feelings on the subject, and there is quite a collection of operating systems out there that went to extra work to be case insensitive, they can't all be wrong, can they? But by all means, this would make a great human factors study for somebody. Mark Volume-Number: Volume 7, Number 18
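To make the "file names are somehow encoded into longer native filenames" case concrete, here is a minimal sketch of one such encoding. It is a made-up scheme for illustration, not what PC/NFS or any real emulator does: the escape character "$" and the functions encode_name and decode_name are assumptions, and only ASCII names are handled.

```python
ESCAPE = "$"  # hypothetical escape character, assumed legal in native names


def encode_name(posix_name):
    """Map a case-sensitive ASCII name onto a single-case native name."""
    if not posix_name.isascii():
        raise ValueError("this sketch handles ASCII names only")
    out = []
    for ch in posix_name:
        if ch == ESCAPE:
            out.append(ESCAPE + ESCAPE)  # literal escape character
        elif ch.isupper():
            out.append(ESCAPE + ch)      # mark an upper-case letter
        else:
            out.append(ch.upper())       # lower case is the unmarked default
    return "".join(out)


def decode_name(native_name):
    """Invert encode_name, whatever case the native system reports."""
    out = []
    i = 0
    while i < len(native_name):
        ch = native_name[i]
        if ch == ESCAPE:
            if i + 1 >= len(native_name):
                raise ValueError("dangling escape character")
            out.append(native_name[i + 1].upper())  # escaped letter is upper case
            i += 2
        else:
            out.append(ch.lower())
            i += 1
    return "".join(out)
```

Under this scheme "makefile" and "Makefile" become "MAKEFILE" and "$MAKEFILE", so both can coexist, but "README" becomes "$R$E$A$D$M$E", which illustrates the complaint: the native names grow longer and stop resembling what the user typed.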
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Case sensitive file names Message-ID: <5914@ut-sally.UUCP> Date: Sun, 5-Oct-86 18:26:23 EDT Article-I.D.: ut-sally.5914 Posted: Sun Oct 5 18:26:23 1986 Date-Received: Mon, 6-Oct-86 05:42:02 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 25 Approved: j...@sally.utexas.edu Date: Sat, 4 Oct 86 04:19:12 CDT From: dutoit!...@research.UUCP Subject: Case sensitive file names The suggestion that POSIX be required (worse, permitted) to conflate cases in file names is utterly loony. We have enough portability problems already in reconciling System V with 4.x without trying to make Unix compatible with MS-DOS. It is granted that Stu Feldman committed a rare lapse of taste in accepting both `makefile' and `Makefile' (thus dooming everyone to typing `cat ?akefile') and that Fowler apparently compounded the distinction to the point of felony by encouraging both kinds of ?akefiles to exist and have different meanings. Nevertheless, neither the possibility of silliness in choosing file name conventions nor the dubious advantages of permitting Unix to be embedded in other systems are relevant; what is important is that such a subtle yet central change would be certain to make transport of programs and of files more onerous. This is not a wise thing for an endeavor devoted to promoting portability. Dennis Ritchie Volume-Number: Volume 7, Number 19
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <5915@ut-sally.UUCP> Date: Sun, 5-Oct-86 18:31:52 EDT Article-I.D.: ut-sally.5915 Posted: Sun Oct 5 18:31:52 1986 Date-Received: Mon, 6-Oct-86 05:42:22 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 51 Approved: j...@sally.utexas.edu Date: Sat, 4 Oct 86 16:54:37 PDT From: hoptoad!...@lll-crg.ARPA (John Gilmore) Subject: Re: Case sensitive file names > From: m...@cbosgd.att.com (Mark Horton) > Another problem is that emulations on other operating systems, > such as VMS or MS DOS, will become impossible without drastic > changes to their file systems. I think we should eliminate the hierarchical file system too (-:). After all, VM/370 doesn't use it, nor does CP/M. It would be too hard to emulate. (Thank Bog that MSDOS and the Mac added the feature, and that Atari and Amiga started that way, or somebody might actually take me seriously!) We could consider getting rid of devices-as-files, though -- there's an idea that none of those people have picked up :-). > After all, it's not like it was easiest to make the VMS filesystem > case insensitive - that took extra effort on their part. Their feeling it was worth the work for VMS doesn't make it right for Unix. > I think it's a mistake to move in the direction of requiring other > operating systems to become case sensitive. Nobody is requiring anything of any other operating system. We're defining a *new* operating system here. My impression was that the "new operating system" was supposed to look very much like the set of features-in-common to the various Unix operating systems. If we are trying to standardize an environment that will run under other operating systems, somebody better tell us quick. 
I thought the "Portable Operating System" stuff was just a legalese hack because we can't use the trademarked name "Unix". Was I wrong?

> But I think
> it would be a mistake to require other systems to change if they
> are to support a POSIX emulation on top of them. (On the other hand,
> it may be reasonable to expect other operating systems to support
> more general file name lengths and character sets, rather than things
> like the MS DOS 8+3 convention. But in practice, this may be too
> painful to fix.)

Either they will implement POSIX compatibility or they won't. If we define POSIX systems to be case insensitive, MSDOS would not qualify anyway, since you can't use an arbitrary 14-character file name. VMS would have problems with files whose names contained [, ], or colon, etc. So they will have to provide some form of file name translation, and they should handle the case issue at the same time they handle the length and allowable character set issues.

Volume-Number: Volume 7, Number 20
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!sri-spam!rutgers!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Moderator, John Quarterman) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <5929@ut-sally.UUCP> Date: Mon, 6-Oct-86 18:55:36 EDT Article-I.D.: ut-sally.5929 Posted: Mon Oct 6 18:55:36 1986 Date-Received: Tue, 7-Oct-86 03:41:10 EDT Organization: IEEE 1003 Portable Operating System for Computer Environments Committee Lines: 21 Approved: j...@sally.utexas.edu The discussion has been interesting and has brought up some topics, such as what case insensitivity means in non-English languages, that many of the readers were evidently unaware of. However, it's getting a bit out of hand. IEEE P1003.1 is interested in promoting portability of applications by defining a UNIX-like operating system interface. Any major change from a feature of *every* variant of UN*X, such as case-sensitive file names (really, filenames as uninterpreted byte strings), needs major justification before being considered. So further assertions of the form "I want it because I like it" are not of interest. It would be most interesting to see the results of a survey on user reaction to case sensitivity or insensitivity, but this newsgroup isn't the place to conduct such a survey, and it's not clear that the results would be relevant to 1003.1 anyway (what does case mean in Japanese or Finnish)? So, unless you've got something new to say on this subject, please let's go on to something else. Volume-Number: Volume 7, Number 27
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!rutgers!husc6!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Moderator, John Quarterman) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6018@ut-sally.UUCP> Date: Thu, 16-Oct-86 11:47:50 EDT Article-I.D.: ut-sally.6018 Posted: Thu Oct 16 11:47:50 1986 Date-Received: Thu, 16-Oct-86 21:59:52 EDT References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 101 Approved: j...@sally.utexas.edu

[ *sigh* Below you will find two examples of proof by emotion, one for case sensitivity, one for case insensitivity. Now that we have one on each side together like this, how about let's either use facts and arguments or go on to another subject? Below the second example there is a somewhat new point, marked by another interjection from the moderator. -mod ]

From: seismo!mcvax!gec-mi-at.co.uk!adam Date: Thu, 16 Oct 86 09:29:20 -0100 Organization: Marconi Instruments Ltd., St. Albans, Herts, UK

>I would like to add a loud "Bravo!" to Mark Horton's message! The present
>case sensitivity of the Unix filesystem is a real drag....

No NO nO NO nO No no! Case sensitivity is a bonus. If you can't handle it, it's your problem. I've worked with both case-sensitive, -preserving and -insensitive systems, and I prefer them in that order.

-Adam.

From: pyramid!lll-crg!nike!ucbcad!ucbvax!excelan!donp (Don Provan) Date: Wed, 15 Oct 86 09:58:48 pdt

This is a good example of why people coming from other operating systems so often dislike UNIX. Two people pointed out what is clearly a bug in UNIX which particularly upsets them. Many people responded that it was a feature. Hrumph!

[ Below is the new point. -mod ]

If you're so concerned about correctly handling foreign languages, why don't you start by handling English correctly? In English, "Make" and "make" are considered identical.
Capitalization rarely has an effect on meaning. Yet in UNIX, "Makefile" and "makefile" are two different files with different "meanings". Where are your *NEW* users that are going to understand this sudden departure from a rule of their native tongue?

[ The point is wrong. Capitalization is significant in English: internet and Internet do not have the same meaning, nor do john and John (for readers outside the States, perhaps I should point out that john with no capital refers to a toilet). The distinction applies not only to proper names but also in Emphasis and in syntax at the beginning of sentences. -mod ]

I am not sufficiently versed in foreign languages to understand the issues concerning capitalization there. It sounds like in some cases the rules of what letters are equivalent (such as "A" and "a" in English) might require tailoring. If you're going to support foreign languages in a meaningful way, i assume you're going to make lots of other modifications, too. For example, "Makefile" would need to have a different name, right? (I suppose the UNIX utilities themselves already have names far enough removed from English so that they're no problem. What *does* "ls" stand for, anyway?)

[ As a moderately good reader of French and Spanish, I believe I can state that the same sort of capitalization conventions exist in them as in English, but with different details as to when capitalization is appropriate. The lexical details also differ: the capital of ll (a single letter in Spanish) is usually Ll, except when it's LL; in French, whether an e with an acute accent still has an accent in its capital E form depends on whether you're in France, Belgium, Quebec, Louisiana, etc. I understand Greek is an interesting language: there are several kinds of lower case forms of some letters, to be used in different places in a word (beginning, middle, end). Similar distinctions exist in Arabic.
And, as several people have pointed out, case isn't meaningful in Chinese, Korean, or Japanese kanji. Also, the number of bytes used to encode a character changes with the language, and multiple languages should be supportable on the same system (in Japan, they commonly use English, Japanese in romaji, and Japanese in kanji; in Scandinavian countries I suspect they have a lot of English interspersed with the national language in technical literature). In most European countries, UNIX command names are used unchanged, and Makefile does not in fact have a different name. Would some Europeans care to comment? -mod ]

Having done a lot of case insensitive work, i've always felt that the UNIX case sensitivity was from laziness. If i were to be charitable, i might go so far as to call it a shortcut.

[ See Doug Gwyn's previous article for a good explanation of why file names are case sensitive (or, rather, byte streams uninterpreted by the kernel) in UNIX (see Barry Shein's article for a good explanation of why some other systems are case insensitive). In places where there was a reason for case insensitivity (e.g., to match mail standards), it has been done. -mod ]

But it's ridiculous to say it makes more sense or it makes UNIX easier for new users or it allows UNIX to support foreign languages.

[ "Ridiculous" is not an argument. -mod ]

don provan

Volume-Number: Volume 7, Number 62
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!decwrl!amdcad!lll-crg!seismo!ut-sally!std-unix From: std-u...@ut-sally.UUCP Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6029@ut-sally.UUCP> Date: Fri, 17-Oct-86 12:35:48 EDT Article-I.D.: ut-sally.6029 Posted: Fri Oct 17 12:35:48 1986 Date-Received: Fri, 17-Oct-86 21:20:24 EDT References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP> Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 73 Approved: j...@sally.utexas.edu From: cbosgd!cbosgd.ATT.COM!m...@ucbvax.berkeley.edu (Mark Horton) Date: Fri, 17 Oct 86 11:20:32 edt Organization: AT&T Medical Information Systems, Columbus Don Provan raises some interesting questions about foreign languages. In general, I think we know how to do a case insensitive comparison appropriately, by extending a function (I think it's called strcoll, but I don't have my X3J11 draft handy) defined in ANSI C; the function is like strcpy, but the destination buffer gets a translation of the string that will collate properly when a lexicographic comparison like strcmp is used. If we extend this function to also translate to one case (as appropriate) and allow each country to define its own function, it's technically possible to ignore case. Whether it's fast enough for the UNIX filesystem is unclear, although this problem is not restricted to UNIX. I think it would be interesting to hear what other, case-insensitive operating systems do about these issues. What do MS DOS, or VM/CMS, or VMS, or whatever, do with their case insensitive file names in Europe, or Japan, or wherever? If the answer is that file names are restricted to use the same character set as in the USA, and that extra letters are disallowed, then we need to know how well this is accepted by the users on other systems. Maybe it's good enough.
Do users in other countries often create files whose names contain extra letters? If they try, does the shell get in the way if their letter happens to be "|", for example? If the answer is that other operating systems have forced other countries to put up with Americanisms, and that POSIX is an opportunity to break new ground by handling other languages properly, then by all means let's do it right. This might require 8 bit characters in file names, for example.

Incidentally, I've seen it claimed here that UNIX allows arbitrary byte streams in file names. Perhaps this is the intent, but in reality the UNIX filesystem is far from a transparent path. There are lots of restrictions, some of which are:

    The slash character is special.

    The null character is special.

    Sequences of more than 14 chars not containing a slash are either illegal or only significant to 14 chars or significant to 256 chars, depending on the version of UNIX.

    Characters with the 8th bit turned on are not allowed.

    Since many commands take names beginning with "-" as flags, file names beginning with "-" don't always work.

    Since the shell treats many of the punctuation characters specially, file names containing space, #, $, &, *, (, ), [, ], ;, ', ", \, |, <, >, and ? do not always work properly. Even if you quote them, the shell strips off the quotes, so that if multiple layers of shell are involved (for example, uux) it still fails.

Because some of these problems only affect certain uses of the filesystem (whether or not you go through the shell, whether or not you're going through a command that takes arguments) it's not unusual for casual users to create a file and then have trouble using, renaming, or even removing it. I recall that removing a file whose 8th bit has been set is a frequent topic on net.unix. If the filesystem were really transparent, the designers of /proc would not have had to encode process ID's in ASCII digits, they could have directly used the binary representation.
It's for these reasons that I feel that a conservative UNIX user should restrict themselves to certain "reasonable" filename conventions; basically using only lower case letters, digits, and a few safe punctuation characters such as . and - in their filenames. Just because it's possible to put a space in a file name doesn't make it a good idea. Mark Volume-Number: Volume 7, Number 67
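The "reasonable" filename convention recommended in this article can be written down as a small checker. This is a sketch under assumptions: the function name filename_problems is hypothetical, the safe punctuation set is limited to the two characters the article names (. and -), and the 14-character limit is the strictest of the limits the article mentions.

```python
import string

# Assumption: only the characters named in the article count as "safe".
SAFE_CHARS = set(string.ascii_lowercase + string.digits + ".-")
MAX_LENGTH = 14  # strictest limit mentioned; newer systems allow more


def filename_problems(name, max_length=MAX_LENGTH):
    """Return a list of reasons a name breaks the conservative convention."""
    problems = []
    if not name:
        problems.append("empty name")
    if len(name) > max_length:
        problems.append("longer than %d characters" % max_length)
    if name.startswith("-"):
        # Many commands would parse this as a flag.
        problems.append("leading '-' may be taken as a flag")
    unsafe = sorted(set(name) - SAFE_CHARS)
    if unsafe:
        # Covers slash, null, 8-bit, shell punctuation and upper case alike.
        problems.append(
            "characters outside the conservative set: %r" % "".join(unsafe)
        )
    return problems
```

An empty list means the name follows the convention; "makefile" passes, while "-rf", "Foo Bar", and any name over 14 characters are each reported with a reason.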
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Moderator, John Quarterman) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6036@ut-sally.UUCP> Date: Fri, 17-Oct-86 19:09:37 EDT Article-I.D.: ut-sally.6036 Posted: Fri Oct 17 19:09:37 1986 Date-Received: Sat, 18-Oct-86 00:30:44 EDT References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP> <6029@ut-sally.UUCP> Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 29 Approved: j...@sally.utexas.edu From: mordor!...@sally.utexas.edu (John Bruner) Reply-To: j...@s1-c.arpa Date: Fri, 17 Oct 86 14:39:08 PDT Organization: S-1 Project, LLNL It seems to me that there are three alternatives. POSIX can specify that conforming implementations must be case sensitive, must be case insensitive, or may be either case sensitive or case insensitive. If a conforming system must be case insensitive, then UNIX doesn't conform. If UNIX is to be included in the set of POSIX-compatible systems, then case sensitivity must be permitted. If a conforming system may be case sensitive or case insensitive, then a lot of programs won't be portable. Ignore for the moment all existing UNIX code and consider new program development. I believe that programmers on one kind of system won't bother with the library routines that are used to compare and/or convert mixed-case names to monocase. It doesn't matter what people "ought" to do. A well-known example of this effect is 4.2BSD. The source code is full of variables that should be declared "long" but -- since on the VAX "long" and "int" are identical -- are not. In the same way, optional case sensitivity will spawn code that only runs correctly in the environment where it was written. Therefore, I believe that case sensitivity must be retained, and it should not be made optional. Volume-Number: Volume 7, Number 68
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!decwrl!pyramid!ut-sally!std-unix From: std-u...@ut-sally.UUCP Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6049@ut-sally.UUCP> Date: Mon, 20-Oct-86 05:13:29 EDT Article-I.D.: ut-sally.6049 Posted: Mon Oct 20 05:13:29 1986 Date-Received: Mon, 20-Oct-86 21:40:36 EDT References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP> <6029@ut-sally.UUCP> <6036@ut-sally.UUCP> Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 65 Approved: j...@sally.utexas.edu From: cbosgd!cbosgd.ATT.COM!m...@ucbvax.berkeley.edu (Mark Horton) Date: Sun, 19 Oct 86 23:11:35 edt Organization: AT&T Medical Information Systems, Columbus >If a conforming system may be case sensitive or case insensitive, >then a lot of programs won't be portable. Ignore for the moment >all existing UNIX code and consider new program development. I >believe that programmers on one kind of system won't bother >with the library routines that are used to compare and/or convert >mixed-case names to monocase. It doesn't matter what people "ought" >to do. A well-known example of this effect is 4.2BSD. The source >code is full of variables that should be declared "long" but -- >since on the VAX "long" and "int" are identical -- are not. In the >same way, optional case sensitivity will spawn code that only runs >correctly in the environment where it was written. > >Therefore, I believe that case sensitivity must be retained, and >it should not be made optional. I'm sorry, but I don't buy this argument. It seems to be based on the assumption that case insensitivity will be implemented by the use of subroutines for case-insensitive operations, with a different user interface from that available today. I think such an implementation is silly, even if other operating systems may do it that way. I'm talking about file names only. 
I do not advocate even considering making all of the user interfaces in UNIX case insensitive. While it might have once been a good idea to design them that way, I feel it's far too late for someone to decree that all the upper and lower case keys in, say, vi must be equivalent.

I think it's a given that existing code won't be rewritten to use new interfaces, even if we come up with a wonderful way to do it. Vi still uses raw terminfo, even though curses would have been much easier and better. Also, there are lots of binaries out there that can't even be recompiled. Any solution to this problem must be in the kernel, or possibly in libc underneath such subroutines as open, unlink, and chmod (if you have shared libraries or full source to recompile), or it won't work all the time.

The obvious implementation is for the code in the kernel, when mapping a filename to an inode number, to do a case-insensitive comparison when checking each filename element in a directory. This would be pretty simple to add, although issues such as speed and international variations would probably require a clever case-insensitive comparison, possibly using a country-specific case mapping table with some flags or other hacks to deal with single-multiple glyph mappings like SS to ess-tset. There might even be a performance GAIN if creation of a directory entry included calculating an appropriate hash value, which is also stored in the directory and used for initial comparisons.

I see no need to map everything to lower case when creating the directory entry. Let the entries be in mixed case; this allows more readable names. I don't know what to do about sorting (e.g. in the shell or ls) - it might be case sensitive or insensitive sorting, and good arguments can probably be made for both.

The behavior I'm concerned about is that, if the user types, say, "mail" and there's a command "Mail" in the search path, it should still work.
If the file "FooBar" exists and the user cats "foobar", because somebody read that name over the phone, it should find it. Mark Volume-Number: Volume 7, Number 72
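As an illustration only, the lookup Mark describes (entries kept in the case they were created with, matched without regard to case, with a stored hash used as a cheap first comparison) might be sketched as follows. This is a toy in-memory model in Python, not kernel code; the `Directory` class, the `fold` helper, and the use of `str.casefold()` as a stand-in for a country-specific mapping table are all assumptions made for the sketch, not anything proposed in the thread.

```python
from typing import List, Optional, Tuple


def fold(name: str) -> str:
    # Assumption: str.casefold() stands in for the "country-specific case
    # mapping table" from the post. It folds the German sharp s to "ss".
    return name.casefold()


class Directory:
    """Hypothetical toy directory: names keep the case they were created with."""

    def __init__(self) -> None:
        # Each entry is (hash of folded name, name as created, inode number).
        self.entries: List[Tuple[int, str, int]] = []

    def lookup(self, name: str) -> Optional[int]:
        """Return the inode whose name matches `name` ignoring case, or None."""
        folded = fold(name)
        wanted = hash(folded)
        for stored_hash, stored_name, inode in self.entries:
            # Compare the stored hash first; fold and compare the full
            # name only when the hashes agree.
            if stored_hash == wanted and fold(stored_name) == folded:
                return inode
        return None

    def create(self, name: str, inode: int) -> None:
        """Add an entry, refusing names that differ from an existing one only in case."""
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((hash(fold(name)), name, inode))

    def names(self) -> List[str]:
        """List names exactly as they were created (what ls would show)."""
        return [stored_name for _, stored_name, _ in self.entries]


if __name__ == "__main__":
    d = Directory()
    d.create("FooBar", 7)
    print(d.lookup("foobar"), d.names())
```

In this model "FooBar" read over the phone as "foobar" resolves to the same entry, while a listing still shows the mixed-case name. The cost is that "ReadMe" and "README" can no longer coexist in one directory.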
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!decvax!ucbvax!ucbcad!nike!rutgers!seismo!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Moderator, John Quarterman) Newsgroups: mod.std.unix Subject: Re: case sensitive filenames Message-ID: <6107@ut-sally.UUCP> Date: Sun, 26-Oct-86 01:19:05 EST Article-I.D.: ut-sally.6107 Posted: Sun Oct 26 01:19:05 1986 Date-Received: Sun, 26-Oct-86 07:17:13 EST References: <5860@ut-sally.UUCP> Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 77 Approved: j...@sally.utexas.edu From: mcken...@sri-unix.arpa (Paul E. McKenney) Date: Thu, 23 Oct 86 17:27:21 pdt Organization: SRI, Menlo Park, CA.

Ok, how about a compromise proposal?

Keep roughly the same case-sensitivity in the kernel interface that exists now. This means that (for example) 'unlink("abc")' and 'unlink("ABC")' will remove two different files.

Keep the normal shell interface for filenames. This means that (again, for example) 'rm abc' and 'rm ABC' will again remove two different files.

Make escape completion case insensitive. (Escape completion is used in some versions of BSD 4.x csh, perhaps elsewhere also. It allows a user to type the first part of a filename (or command name) and then hit ESC. The system will complete the filename as best it can. If it cannot unambiguously determine the filename from the part given by the user, it will beep after having supplied as much of the filename as it can without problems with ambiguity. There is also usually a feature that allows the user to display all filenames that match what he has typed so far -- control-D serves this function in some variants of BSD 4.2 csh.)
In other words, if a user types 'rm abc<ESC>' (where <ESC> represents the ESC key), and there is a file named 'ABC', and there is no other file that matches the pattern '[aA][bB][cC]', the shell (-not- the kernel) will backspace over the 'abc' and overwrite it with 'ABC' so that the command line will look as if the user had typed 'rm ABC'. The user may then hit RETURN if he wishes to execute the command, or he may further edit the command line (using his usual backspace/delete, etc. characters).

This escape-mapping facility should be supplied in a library routine so that application programs can easily act the same way. It would be nice if such a function could work with keywords, hostnames, etc. as well as filenames.

This proposal has the following advantages:

o It does not impact existing software (addition of the case-insensitive ESC does not add any functionality, it just makes it easier on users).

o It answers Mark Horton's 'filename-over-the-phone' problem <6...@ut-sally.UUCP> (just tell the user to type 'foobar<ESC>').

o It gives users from a case-insensitive environment a helpful tool to ease their transition (let's face it -- if it is different than whatever you are used to, it ain't friendly -- regardless of whether you are used to case sensitivity, case insensitivity, or hieroglyphics).

o It removes the need for the millions and millions of 'upper()' calls in application code mentioned by Dan Libes <5...@ut-sally.UUCP> (although the additional code to do good escape-completion is far from trivial!).

o It removes the need for 'isfsense()' or 'isflegal()' (Chris Lent, <5...@ut-sally.UUCP>) since all implementations could use the same definition of legal characters in a pathname. Note that 'isflegal()' is still useful for programs that are trying to be portable across different operating systems.

This proposal leaves the following two issues unresolved:

o Whether the eighth bit on characters within a filename should be significant.
The developers of BSD 4.[23] must have had some good reason for making it insignificant, but the only reason that comes to mind is that most terminals cannot easily specify the eighth bit (just like some older terminals cannot easily specify lower case!).

o Whether there should be some escaping mechanism to allow slash ("/") and ASCII NUL in a filename. I cannot think of a reason for allowing this that seems worth the trouble -- any comments?

Paul E. McKenney
mcken...@sri-unix.arpa
{pyramid,rutgers,ucbvax!hplabs}!sri-unix!mckenney

Volume-Number: Volume 7, Number 89
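The case-insensitive escape completion proposed above could look roughly like the sketch below. It is written in Python for brevity and is not taken from csh or any real shell; the function name `complete`, its return convention, and the restriction to ASCII names are assumptions made for the example. The kernel interface is untouched: only the text on the command line is rewritten.

```python
import os
from typing import List, Tuple


def complete(typed: str, names: List[str]) -> Tuple[str, bool]:
    """Return (replacement text, unique) for the word typed before ESC.

    `unique` is False when the shell should beep because the match is
    missing or ambiguous. Assumes ASCII names, so lower() keeps lengths.
    """
    matches = [n for n in names if n.lower().startswith(typed.lower())]
    if not matches:
        return typed, False  # nothing to complete
    if len(matches) == 1:
        # Unambiguous: overwrite what was typed with the real spelling,
        # e.g. 'abc' becomes 'ABC'.
        return matches[0], True
    # Ambiguous: extend as far as every match agrees, ignoring case.
    folded_common = os.path.commonprefix([m.lower() for m in matches])
    exact_common = os.path.commonprefix(matches)
    if len(exact_common) == len(folded_common):
        # All matches spell the shared prefix identically, so show it.
        return exact_common, False
    # Matches disagree on case inside the shared prefix: keep the
    # user's own spelling and append the extra characters in lower case.
    return typed + folded_common[len(typed):], False


if __name__ == "__main__":
    print(complete("abc", ["ABC", "xyz"]))
    print(complete("foo", ["FooBar", "FooBaz"]))
```

With this rule the 'filename-over-the-phone' case reduces to typing 'foobar' and ESC, and the command that is finally executed still carries the exact, case-sensitive name.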
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!watmath!clyde!rutgers!sri-spam!mordor!lll-crg!seismo!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Moderator, John Quarterman) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6226@ut-sally.UUCP> Date: Tue, 4-Nov-86 12:36:22 EST Article-I.D.: ut-sally.6226 Posted: Tue Nov 4 12:36:22 1986 Date-Received: Wed, 5-Nov-86 06:23:59 EST Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 54 Approved: j...@sally.utexas.edu From: ch...@mimsy.umd.edu (Chris Torek) Date: Tue, 4 Nov 86 07:33:44 EST

We seem to have three proposals:

CS: Case sensitive file systems. This is what all major Unix variants (V6, V7, SysIII, SysV, 2BSD, and 4BSD) now support.

CC: Case coercive file systems (file names forced to all upper or all lower case).

CR: Case retaining but otherwise insensitive file systems (new names are created according to the given case; matches are not case sensitive).

I sincerely hope that no one is seriously suggesting POSIX adopt CC: no one seems to like such systems much. That leaves CS and CR.

The case for CR appears to be that those who have used both CS and CR prefer CR. This may be true; I have seen no studies, but the anecdotes do seem to favour it. I have used such a system, and did not think it so wonderful, but for the sake of argument, let us assume that CR really is objectively better than CS---so much so that 5BSD and System V Release N+1 will have CR style file systems. Fine.

But as I understand it, POSIX is intended to be an interface specification for something that resembles `Unix' (whatever `Unix' may be). If that is indeed the case, the only sensible choice is CS, for, as I noted above, this is what all major Unix variants *do*. *They all agree:* file names are case sensitive. Should we make standard something that no one uses? I say no!
When 5BSD and Release N+1 come out, then we can create a new standard to describe these wonderful new systems, but until then, let us write something that describes what we have now. I believe that the first standard for *anything* that already exists should describe the existing implementations, at least wherever they agree. Afterward, feel free to invent new improved standards, so as to foist progress upon vendors. Indeed, it might not be a bad idea to publish two standards virtually simultaneously: That Which Is, and That Which Should Be. But list first That Which Is. [ There really are (or at least were) two discussions going on here: one about what should be in POSIX, the other about what UNIX should do. I haven't seen any recent arguments that POSIX should do anything but reflect what UNIX currently does, i.e., case sensitive file names (really file names as uninterpreted byte streams). -mod ] -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: ch...@mimsy.umd.edu Volume-Number: Volume 8, Number 34
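For readers comparing the three categories in Chris Torek's article (CS, CC, CR), the difference comes down to two questions: what name gets recorded when a file is created, and when do two names refer to the same file. The Python sketch below encodes just those two questions. The function names `stored_name` and `same_file` are invented for the illustration, and CC is arbitrarily shown forcing lower case (the article allows either).

```python
def stored_name(policy: str, name: str) -> str:
    """Name recorded in the directory when a file is created."""
    if policy == "CC":
        # Case coercive: the given case is thrown away (lower case here).
        return name.lower()
    # CS and CR both record the name exactly as given.
    return name


def same_file(policy: str, a: str, b: str) -> bool:
    """Do names `a` and `b` refer to the same directory entry?"""
    if policy == "CS":
        # Case sensitive: names are compared as given.
        return a == b
    # CC and CR: case is ignored when matching.
    return a.lower() == b.lower()


if __name__ == "__main__":
    for policy in ("CS", "CC", "CR"):
        print(policy,
              stored_name(policy, "Makefile"),
              same_file(policy, "Makefile", "makefile"))
```

Only CS lets makefile and Makefile coexist. CC and CR both merge them, and differ only in whether a listing shows the name as the creator typed it.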
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!ut-sally!std-unix From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers) Newsgroups: mod.std.unix Subject: Re: Case sensitive file names Message-ID: <6412@ut-sally.UUCP> Date: Fri, 21-Nov-86 16:11:54 EST Article-I.D.: ut-sally.6412 Posted: Fri Nov 21 16:11:54 1986 Date-Received: Fri, 21-Nov-86 21:49:42 EST Organization: IEEE P1003 Portable Operating System for Computer Environments Committee Lines: 29 Approved: j...@sally.utexas.edu References: >From bu-cs!...@harvard.UUCP Wed Nov 19 07:19:28 1986 Date: Tue, 18 Nov 86 21:35:03 EST From: bu-cs!bu-cs.BU.EDU!...@harvard.UUCP (Barry Shein)

The problem with a file system where you cannot have ReadMe and README is that you are throwing away possibilities. This also means that I cannot have tmp01234A, tmp01234B, ... , tmp01234a, ...

I fear that although many people have applications that are small and have small requirements, they should not place restrictions on those with large requirements. Use your imagination: consider MasterCard's data base for a moment, or some of the multi-library catalog systems people are building. They may need (and have machines that have no trouble with) many thousands of files whose names may serve as primary keys (why not, it's one way to guarantee write-through on update...)

Next they'll be telling us we should only allow 16-bit ints because any number larger than 16 bits is hard to type in and error prone anyhow.

I still suggest the use of 'stty lcase' if that's what you want (alias run 'stty -lcase; \!* ; stty lcase' :-)

-Barry Shein, Boston University

Volume-Number: Volume 8, Number 58