Was DOS copied from CP/M?

Background
Gary Kildall started Intergalactic Digital Research (later shortened to Digital Research or just DRI) and created the first microcomputer operating system, CP/M, used on many hobbyist personal computers before Apple and IBM introduced their machines. But Microsoft captured the microcomputer OS market, and with it the market for software applications, with its MS-DOS that came out years later for the IBM PC. For decades, a rumor has persisted that DOS was illegally copied from CP/M and that the fortune accumulated by Bill Gates rightfully belonged to Gary Kildall.

Several years ago, I did a forensic comparison of the binary code for MS-DOS to the source code for CP/M. I could find no signs of copying, and wrote a journal paper[9] and an article[10] about my examination. Since that time, the Computer History Museum in Mountain View, California has received the source code for MS-DOS 2.0 from Microsoft, along with permission to make it public. The museum also received the source code for MS-DOS 1.1 from Tim Paterson, the original developer of DOS, whom Microsoft contracted to write MS-DOS. Comparing source code is more accurate than comparing binary code, which can produce false negatives, so I decided to perform another comparison to put the question to rest for good.

In addition to source code, I examined whether the DOS commands were copied from CP/M and whether the DOS system calls were copied from CP/M. These issues have also been discussed and debated for decades.

Finally, I will discuss whether DRI could have had a legitimate copyright claim against Microsoft.

The History
In 1980, IBM started a “skunk works” project in Boca Raton, Florida to create a personal computer. This independent development group within IBM decided that they would focus on the hardware and partner with one of the small microcomputer companies already producing and selling programs. Thirty-six years ago, in August 1980, IBM executives flew to Bellevue, Washington to meet 24-year-old Bill Gates, who ran Microsoft, a company selling a very successful version of the BASIC programming language. Microsoft didn’t have an operating system, so Gates sent them to see his friend Gary Kildall at DRI in Pacific Grove, California, who had CP/M.[1][2][3][4][5]

At this point there are several versions of the story. In one version, Kildall and his team, described by some as a bunch of hippies, didn’t trust “Big Brother” IBM. Avoiding the meeting, Kildall took off in his plane for a joyride[1]. The IBM execs were met by Kildall’s wife and business partner Dorothy who refused to sign IBM’s non-disclosure agreement (NDA), a standard business document that would have kept the discussion secret. After several hours of quibbling over the NDA, not even getting to the essential negotiation, the IBM executives got frustrated and left[1][3][4][5].

In another version of the story, Kildall and DRI employee number 1, Tom Rolander, went off in Kildall’s plane to deliver software to a customer and put their chief negotiator Dorothy in charge[3]. Dorothy felt the NDA was too restrictive and their attorney Gerry Davis advised her to wait for Kildall to return[1][3]. Kildall returned later that day but again accounts differ as to whether he signed the NDA or even participated in discussions with IBM[1].

In any case, no deal was signed. The IBM negotiators flew back to Seattle that day and again met with Gates, still in need of an operating system. Gates decided to acquire the rights to QDOS from Seattle Computer Products for $75,000[6][7][8], and hired its author, Tim Paterson, to modify it into MS-DOS for licensing to IBM as PC-DOS.

The IBM PC became a huge success and Microsoft displaced DRI as the leading microcomputer operating system company. Kildall maintained that QDOS, and subsequently MS-DOS, had been directly copied from CP/M and thus infringed on his copyright. DRI attorney Gerry Davis claimed that forensic experts had proven that MS-DOS had been copied from CP/M and infringed DRI’s copyright, but that DRI decided not to go to court[1].


Cleaning the Code
I had access to several versions of CP/M source code, but the most complete early version was a low resolution, dot matrix printout of version 1.3. I had to perform a number of time-consuming steps to get usable code.

First I had to remove things that were not source code, including stamps on each page indicating that the code is copyright by Digital Research in 1976. I had to cut out the stamps from each document page image and replace any underlying text that I could identify. I also manually cut out line numbers on the left margins and memory maps that weren’t part of the actual source code.

Some of the code ran off the printed page. Usually these were comments, which didn’t affect the functionality of the code but might have contained potential clues to copying. Unfortunately, anything printed off the page was lost forever.

I then performed OCR scanning on each page image and did several passes of manual corrections where the OCR didn’t produce good results, usually because the printouts weren’t clear.

I found a number of places where strange strings of characters showed up or unusual instructions that weren’t documented anywhere. I initially assumed these were some kind of deprecated instructions and searched online, put questions on various CP/M groups, and even asked Tom Rolander, DRI’s first employee. No one could identify these strange instructions. When I eventually saw a pattern, I realized that these were printer glitches causing extraneous letters to print at random times, and the printer to spew out random strings at other times. Carefully eliminating these superfluous characters, I ended up with legitimate source code that ran through an assembly language parser without errors.
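As a rough illustration of this kind of cleanup, the sketch below strips left-margin listing numbers and flags lines whose opcode is not recognized, so that they can be reviewed by hand. It is not the procedure I actually used. The function names, the sample listing, and the short `KNOWN_OPS` list (a small subset of Intel 8080 mnemonics and directives) are all assumptions made for the example.

```python
import re

# Illustrative subset of Intel 8080 mnemonics and assembler directives;
# a real check would need the assembler's complete list.
KNOWN_OPS = {"MOV", "MVI", "LXI", "LDA", "STA", "JMP", "JZ", "JNZ",
             "CALL", "RET", "PUSH", "POP", "ORG", "EQU", "DB", "DW", "DS"}

def strip_margin_number(line):
    """Remove a leading listing line number such as '  42 '."""
    return re.sub(r"^\s*\d+\s+", "", line)

def suspect_lines(lines):
    """Return (index, code) for lines whose opcode is not recognized."""
    flagged = []
    for i, raw in enumerate(lines):
        # Drop the margin number, then everything after the comment marker.
        code = strip_margin_number(raw).split(";", 1)[0].strip()
        if not code:
            continue  # blank or comment-only line
        tokens = code.split()
        if tokens[0].endswith(":"):
            tokens = tokens[1:]  # skip a leading label of the form NAME:
        if tokens and tokens[0].upper() not in KNOWN_OPS:
            flagged.append((i, code))
    return flagged

# Hypothetical OCR output; the third line imitates a printer glitch.
listing = [
    "  10 START:  LXI  SP,STACK   ; set up stack",
    "  11         MVI  A,0",
    "  12         QXZV@ B,C      ; printer glitch",
    "  13         RET",
]
for index, text in suspect_lines(listing):
    print(index, text)
```

A check like this only narrows down where to look. Labels written without a colon would be flagged too, so every flagged line still needs a human decision.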

Source Code Comparison
For the code comparison, I used the forensic techniques that I’ve developed at my consulting company Zeidman Consulting over the past 15 years or so, as well as the CodeSuite® tool from my software company Software Analysis and Forensic Engineering and followed the procedure that I’ve written about in my textbook on software forensics[11].

Searching for Clues
The first part of the process is to search for certain clues in the source code including the string “copyright,” company names, programmer names and initials, and any other relevant terms that can be thought of. You might be surprised how many times a copyright notice for company A can be found in the source code for company B because the code was copied.
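A clue search of this kind can be sketched in a few lines. The example below is a minimal, case-insensitive term search and not the CodeSuite implementation; the function name `find_clues` is an assumption, and the sample lines are taken from the DOS excerpts quoted in this article.

```python
import re

def find_clues(lines, terms):
    """Return (line_number, term, text) for every case-insensitive hit."""
    patterns = [(t, re.compile(re.escape(t), re.IGNORECASE)) for t in terms]
    hits = []
    for number, text in enumerate(lines, start=1):
        for term, pattern in patterns:
            if pattern.search(text):
                hits.append((number, term, text.strip()))
    return hits

source = [
    "STOSB ;Set it to zero (CP/M programs set low byte)",
    "MOV AL,BYTE PTR [BX]",
    "JZ FILEOFJ ;CPM EOF",
]
for hit in find_clues(source, ["copyright", "CP/M", "CPM"]):
    print(hit)
```

Searching for both spellings matters here, because “CP/M” and “CPM” do not match each other as plain strings.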

Interestingly, a search for the terms “CP/M” and “CPM” produced some results in the DOS source code:

; 1.12 10/09/81 Zero high half of CURRENT BLOCK after all (CP/M programs don't)

STOSB ;Set it to zero (CP/M programs set low byte)

STOSB ;Set it to zero (CP/M programs set low byte)

STOSB ; Set it to zero (CP/M programs set low byte)

My research and my reading of the code led me to believe that the code above has something to do with the file system. Because it discusses differences between DOS and CP/M, it’s interesting, but not proof that the code was copied from CP/M. However, I also found the following reference to CP/M in the DOS code:

XOR AX,AX   ; zero extent, etc for CPM

DOCHAR:
  MOV     AL,BYTE PTR [BX]
  CMP     AL,1AH             ;^Z?
  JZ      FILEOFJ            ;CPM EOF
  CMP     AL,0DH             ;CR?
  JNZ     NOTCR
  MOV     [COLPOS],0

JZ FILEOFJ  ;CPM EOF

The CP/M file system used fields called “extents” to keep track of files in directories. The sizes of CP/M files were stored in sectors of 128 bytes each. If a file filled up less than the 128 bytes of the last sector, the other bytes were filled with an ASCII Control-Z character as an end-of-file (EOF) marker.
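The padding convention described above can be sketched as follows. This is a simplified model of the idea and not code from either operating system; the function names are assumptions made for the example.

```python
RECORD_SIZE = 128   # CP/M stores file lengths in 128-byte sectors
CTRL_Z = 0x1A       # ASCII Control-Z, the end-of-file marker

def pad_to_records(data: bytes) -> bytes:
    """Pad text with Control-Z up to the next 128-byte boundary."""
    remainder = len(data) % RECORD_SIZE
    if remainder == 0:
        return data
    return data + bytes([CTRL_Z]) * (RECORD_SIZE - remainder)

def read_text(stored: bytes) -> bytes:
    """Return the text up to, but not including, the first Control-Z."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]

text = b"HELLO\r\n"
stored = pad_to_records(text)
print(len(text), len(stored))
print(read_text(stored) == text)
```

The `CMP AL,1AH` test in the DOS excerpt above performs the same check as `read_text`: it stops when it meets the Control-Z byte.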

DOS had a different way of keeping track of files, by recording file sizes in bytes, and so no EOF marker was needed. The code above seems to indicate that MS-DOS could read CP/M files and had special code to do so, but my initial research showed that CP/M files were incompatible with DOS. Was this a clue to copying?

The answer is no. Further research showed that very early versions of DOS were designed to read and write CP/M files. The code I found confirms that compatibility. Eventually that compatibility was dropped from DOS. This is not a sign of copying.

Source Code Correlation
The next part of the process is to run the CodeMatch® function of CodeSuite on the two sets of code. CodeMatch divides source code into elements: statements, comments, strings, identifiers, and instruction sequences. It then compares these elements to find matches or partial matches in the two sets of code, calculating correlation scores for pairs of files.
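To give a feel for element-based comparison, here is a minimal sketch that splits assembly lines into statements, comments, and identifiers and scores each kind of element separately. It is not the CodeMatch algorithm: it uses a plain Jaccard similarity as a stand-in score, ignores strings and instruction sequences, and makes no attempt to separate mnemonics and register names from programmer-chosen names. All function names are assumptions.

```python
import re

def split_elements(lines):
    """Split assembly lines into statements, comments, and identifiers."""
    statements, comments, identifiers = set(), set(), set()
    for line in lines:
        code, _, comment = line.partition(";")
        code = " ".join(code.split()).upper()        # normalize whitespace and case
        comment = " ".join(comment.split()).lower()
        if code:
            statements.add(code)
            # Word boundaries keep hex constants such as 1AH from being
            # misread as the identifier AH.
            identifiers.update(re.findall(r"\b[A-Z_][A-Z0-9_]*\b", code))
        if comment:
            comments.add(comment)
    return {"statements": statements, "comments": comments,
            "identifiers": identifiers}

def jaccard(a, b):
    """Shared elements divided by all distinct elements (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def compare(file_a, file_b):
    """Return one similarity score per element kind."""
    ea, eb = split_elements(file_a), split_elements(file_b)
    return {kind: jaccard(ea[kind], eb[kind]) for kind in ea}

file_a = ["CMP AL,1AH ;^Z?", "JZ FILEOFJ ;CPM EOF"]
file_b = ["cmp  al,1ah ; end of file?", "JNZ NOTCR"]
print(compare(file_a, file_b))
```

Scoring each element kind separately matters because a match in comments or identifiers can survive even when the statements themselves have been rewritten.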

Finding a correlation between the source code files for two different programs does not necessarily mean that illicit behavior occurred. There are six reasons for correlation between two different programs, only one of which is copying. These reasons are:

  1. Third-Party Source Code . Both programs incorporate code from third parties.
  2. Code Generation Tools . Both programs were developed using the same code generation tool.
  3. Commonly Used Identifier Names . Both programs contain identifier names that are commonly taught in schools or commonly used by programmers, particularly in certain industries.
  4. Common Algorithms . Both programs use an algorithm that is taught in many programming classes or is found in a popular programming textbook.
  5. Common Author . Both programs were developed independently but written by the same programmer.
  6. Copied Code (Authorized or Plagiarized) . If none of the other reasons account for the correlation, then the code was copied from one program to another. If the copying wasn’t authorized by the original owner, then it’s plagiarism.

It’s important to remember that any of the first five kinds of correlation could also have resulted from copied code, but in those cases copying can’t be reasonably proven. If some correlation can only be explained by copying, then that is proof of copying, and then it makes sense to look at the surrounding code, and the previously eliminated correlations, to determine the extent of the copying.

The process for filtering out correlation due to reasons other than copying uses the SourceDetective® function of CodeSuite to search the Internet for other references to matching program elements. This filtering process is illustrated in Figure 1. If an element is found in two programs but nowhere else on the Internet, then it’s very likely due to copying.


Figure 1. Filtering process to find copying (Source: Bob Zeidman)
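The filtering idea can be sketched as a ranking step followed by a threshold. The example below does not reimplement SourceDetective and performs no Internet search. The hit counts are supplied by hand, and both the sample elements and their counts are invented purely for illustration.

```python
def rarest_first(matches, hit_counts):
    """Order matching elements from fewest to most Internet hits."""
    return sorted(matches, key=lambda m: hit_counts.get(m, 0))

def filter_candidates(matches, hit_counts, max_hits=0):
    """Keep elements found in both programs but (almost) nowhere else."""
    return [m for m in rarest_first(matches, hit_counts)
            if hit_counts.get(m, 0) <= max_hits]

# Hypothetical matching elements and hypothetical hit counts.
matches = ["JMP START", "XCHG AX,BX", "UNUSUAL_LABEL_X1"]
hits = {"JMP START": 120000, "XCHG AX,BX": 3400, "UNUSUAL_LABEL_X1": 0}

print(rarest_first(matches, hits))
print(filter_candidates(matches, hits))
```

Anything that survives the filter is only a candidate. It still has to be examined in context, since a rare element can appear in two unrelated routines.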

DOS was written in assembly code but CP/M had parts written in assembly code and other parts written in PL/M code. I needed to do two comparisons, DOS assembly code to CP/M assembly code and DOS assembly code to CP/M PL/M code.

Table 1 shows the search scores for the rarest matching statements in the two programs. As you can see, the first statements in the table are found in very few places on the Internet, which could indicate copying, but examining the code showed that these statements in the two different programs are part of very different routines performing very different functions, indicating that they are not signs of copying.


Table 1. Statements and hits on the Internet (Source: Bob Zeidman)

Table 2 and Table 3 show the rarest matching comments, strings, and identifiers. Other than the first comment, all the matches are fairly common and provide no signs of copying. Also, the code surrounding that first comment in the two programs comprises very different routines performing very different functions, indicating that it is not a sign of copying.


Table 2. Comments and strings and hits on the Internet (Source: Bob Zeidman)


Table 3. Identifiers and hits on the Internet (Source: Bob Zeidman)

My conclusion is that DOS source code was not copied from CP/M source code. The small number of correlations between DOS source code and CP/M source code can all be explained by reasons other than copying.

Command Comparison
The commands for DOS and CP/M are given in Table 4 along with those of VMS, the operating system from Digital Equipment Corporation for the VAX computer that was released in 1977[12] and Apple DOS that was released in 1978[13].


Table 4. MS-DOS, CP/M, and VMS commands (Source: Bob Zeidman)

The commands were not copied; they were simple, descriptive terms that were common to other operating systems such as VMS and Apple DOS.

System Calls
The comments from the CP/M source code[1] and the DOS source code for implementing system calls are shown in Table 5. Programs running on DOS and CP/M used different code to perform system calls, and the code to implement the system calls was very different in the two programs. However, the numbers for system calls 0 through 5, 9 through 11, 13 through 23, 25, and 26 represented identical functions[2].


Table 5. CP/M and DOS system calls (Source: Bob Zeidman)

The DOS system calls were definitely copied from the CP/M system calls. Given the quantity of identical numbers representing identical functions, it is clear that Tim Paterson referenced the CP/M manual when writing DOS.
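To make the quantity concrete, the short sketch below expands the ranges listed above and counts them. The ranges come straight from the text; the variable and function names are assumptions.

```python
# Inclusive ranges of system call numbers that represent identical
# functions in CP/M and DOS, as listed in the text above.
MATCHING_RANGES = [(0, 5), (9, 11), (13, 23), (25, 25), (26, 26)]

def expand(ranges):
    """Turn inclusive (first, last) pairs into a flat list of numbers."""
    numbers = []
    for first, last in ranges:
        numbers.extend(range(first, last + 1))
    return numbers

matching = expand(MATCHING_RANGES)
unmatched = [n for n in range(27) if n not in matching]

print(len(matching))   # 22
print(unmatched)       # [6, 7, 8, 12, 24]
```

That is 22 of the 27 numbers from 0 through 26, which is far more than could plausibly arise from independent numbering.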

Footnotes
[1] I removed the code to make the similarity clearer, because the code in both programs is very different.

[2] Based on the code comments and research into DOS and CP/M. It’s possible that other system calls also use identical numbers, but the functions of the system calls are not clearly described.

Was There Copyright Infringement?
Here are my conclusions about copying. And because many people are interested in whether DRI could have brought a copyright lawsuit against Microsoft, I will tie in my conclusions with that possibility. Keep in mind that while I have extensive experience in copyright law, I’m not a lawyer and the law is constantly changing.

Source Code
There was no copied source code and so there was no copyright infringement of the code.

Commands
Commands were not copied, but even so, the commands are not copyrightable because they are simple and descriptive of the functionality. Only creative expression that is not simple description and not functional can be copyrighted.

System Calls
The system call numbers were copied. While a list of numbers is not by itself creative and thus not copyrightable, a list of numbers that arbitrarily represent specific functions is creative and thus copyrightable. Furthermore, DRI appears to have indicated its copyright by putting a copyright notice on the CP/M Interface Guide[14] that describes the system calls.

On the other hand, Microsoft could have prevailed by showing that it was a fair use to copy the system calls. According to copyright law, fair use is determined by the following factors[15]:

  1. the purpose and character of the use, including whether such use is for nonprofit educational purposes;
  2. the nature of the copyrighted work, especially whether it benefits the public;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.

It’s clear that the copying would not pass the first two factors. DOS was a commercial product sold at a profit, and it would be hard to argue that the copying served a public benefit. Therefore, to defeat a copyright infringement charge, Microsoft would have had to show that the amount of copyrighted material copied into DOS was minimal and that copying the CP/M system calls did not, by itself, cause DRI any financial harm.

I believe that DRI could have brought a legitimate copyright claim against Microsoft, but that Microsoft would have had a good chance of avoiding liability by claiming a fair use defense.

The Zeidman Challenges
I’m confident in my conclusion, so I’ve decided to offer two cash rewards to back it up. The first Zeidman Challenge is an offer of $100,000 reward to anyone who can use accepted forensic techniques to prove that Microsoft copied MS-DOS source code from DRI’s CP/M source code. The second Zeidman Challenge is an offer of $100,000 reward to anyone who can demonstrate or find source code for a secret function in MS-DOS that prints Gary Kildall’s name or a copyright notice for DRI, as was claimed by science fiction author and computer pundit Jerry Pournelle to John C. Dvorak on the podcast This Week in Tech (TWiT) on October 15, 2006[16]. The award details and specific criteria will be announced shortly.

References
