Advertisement

Creating an operating system (OS) in assembly language

February 05, 2016

Max The Magnificent-February 05, 2016

One of our younger community members -- we'll call him Ryan (because that's his name) -- is desperately eager to learn more about all aspects of electronics. Every now and then, an email will "ping" its way into my Inbox with a new question. The problem is that there's rarely an easy answer, because there are typically myriad underlying assumptions and potential areas for confusion.

Recently, Ryan has turned his attention to computers. Just a few minutes ago, for example, he sent me an email asking: "Can an operating system be created in assembly language? Also, what is the need for assembly in these days of languages like C/C++ and Java when memory is so cheap?"

Now, I must admit that I was tempted to send a short, quick, and easy (on me) answer, but I remember how confusing things were for me at the beginning -- plus I'm a bit obsessive-compulsive about this sort of thing -- so I responded as follows; I would be interested to hear your thoughts as to the way I presented all of this.

Hi Ryan -- as usual, there's more to your question than meets the eye LOL. Let's start with the fact that a processor (we'll focus on microprocessors/MPUs or microcontrollers/MCUs, but this applies to everything up to mainframe computers) ultimately executes what are called "machine code instructions." These are basically numbers representing instructions or data, where number 'x' might indicate an "Add" instruction, number 'y' might indicate a "Compare" instruction, and so forth.

It is possible for someone to write a program directly in machine code -- but this is very laborious, time-consuming, and prone to error. It wasn't so bad in the very early days of computing when programs were really short and the computers themselves supported limited instruction sets and addressing modes, but it soon grew harder as computers increased in complexity.

The next level up is to use assembly language, in which instructions are written using mnemonic names and you can declare labels and suchlike. In the really early days, you might write your program in assembly with pencil and paper and then hand-translate it into machine code. Later, you would use an assembler program to take your assembly source code and translate it into machine code.

This actually provides a great example of "bootstrapping"; i.e., pulling oneself up by one's bootstraps. Here's the way it would go. You would define a very, very simple assembly language as a pencil-and-paper exercise. Then you would write a very rudimentary assembler in your assembly language -- again using a pencil and paper -- and you would hand-assemble this rudimentary assembler into machine code.

This is where things get interesting. You would also write a very simple text editor (probably of a type known as ca "Line Editor") in your assembly language as a pencil-and-paper exercise, and you would hand-assemble this simple editor into machine code.

Now you are in the position to capture simple programs in your assembly language using your text editor (in machine code) running on your computer, and then assembly them into machine code using the assembler program (in machine code) running on your computer.

One of the first things you might do would be to use your simple editor to capture the source code for a slightly more sophisticated editor in your simple assembly language, and to then run this through your rudimentary assembler. Alternatively, you might capture the source code for a slightly more sophisticated assembler in your simple assembly language and run this through your rudimentary assembler. And so it would go, looping round and round using your existing editor and assembler to create bigger, better, easier-to-use versions with more functions and features.

Now let's consider a modern language like C, for example. As you know, we use an editor program to create our source code in C, then we run this code through a compiler program, and -- ultimately -- we are presented with a machine code ("executable") equivalent.

The first question we need to ask is: "Where did the compiler program come from?" In fact, if you were the person who created the compiler, then the first version (a very simple C compiler) would have been written in assembly language and run through your assembler program to generate the machine-code version of the compiler. This first version of your C compiler might only support a subset of the C language -- just enough to get you up-and-running.

Once you had your first simple C compiler, you would probably hand-translate your original assembly code version of the compiler into your supported C subset equivalent, and then used your C compiler to compile its own source code in C and generate a new version of itself in machine code, just to make sure that everything was tickety-boo.

After this, you would start to loop around creating new, more sophisticated versions of your C compiler, where each one was compiled into machine code using the previous version. Once again, you would be "pulling yourself up by your bootstraps." (For the purpose of our discussions thus far, we're assuming we go directly from assembly to C. In fact, there were other languages like FORTRAN in between, and it's more than likely that someone might have created a FORTRAN compiler in assembly, and then used FORTRAN to capture the first C compiler, but let's not wander off into the weeds.)

Another interesting point is that, when someone creates a C compiler, they typically don't use it to generate machine code directly. Instead, the C compiler generates its output in assembly language source code, and then an assembler is used to take this source code and assemble it into machine code. All of this usually takes place "behind the scenes" or "under the hood," so most people don't even think about it, but expert users (including the compiler writers) often look at critical parts of the code generated in assembly language to see if they need to refine their C source code to better guide the compiler.

Are we having fun yet (LOL)? OK, let's say we want to create a program that does something-or-other; perhaps it's a game of Breakout. We could create this program in assembly language or C. In the past, expert programmers would say that they knew lots of cunning tricks, so they could create smaller, faster programs in assembly language than could be generated using a C compiler. This was probably true, but it would have taken them a lot longer to do it; also, modern C compilers are very sophisticated and they typically know the same cunning tricks as the programmers.

Be this as it may, we create our game in assembly or C, and then we assemble or compile it into a machine code (executable) equivalent. If we power-up the processor and point it as the first instruction of our program, then we say our program is running on the "bare metal" and we talk about this as "bare metal programming." This is still the way things are done for a lot of small embedded systems, like the microcontroller programs running on home thermostats, for example.

But what happens if we want to quickly and easily switch between multiple programs. Or suppose we want to have multiple programs running at the same time (or, at least, appearing to run at the same time). Take your home computer, for example, you might have a web browser open and a Word document and an Excel document. In reality, the processor is time-slicing things -- it goes so fast that it can look at your mouse to see if you've moved it -- then switch to look at your keyboard to see if you've clicked a key -- then switch to one of the open programs to see if it needs to do anything -- then switch to the next program -- and so on and so forth. It does all of this so quickly that it appears to you, the user, that multiple things are happening simultaneously (let's keep things simple and assume a single processor/core and no multithreading).

The point is that, in this case, we need to have a higher level program called an operating system (OS). We can think of the OS like the conductor of an orchestra, telling the horn section (one program) when to play and then telling the string section (another program) when to take over.

And how is the OS created? Well, these little scamps come in all shapes and sizes. At one end of the spectrum we have incredibly large and complex monsters like Windows -- at the other end we might have something much, much simpler beavering away in a small embedded system. It's certainly possible to create a rudimentary OS in assembly language, but more-sophisticated systems would be created in C (or an equivalent high-level language).

Returning to your original question, in the case of very low-level systems based on 8-bit microcontrollers, there are still quite a few older programmers who program exclusively in assembly language. There are also things like the PicoBlaze family of soft 8-bit processor cores used in Xilinx FPGAs that can only be programmed in assembly language because no one ever bothered creating a C compiler for them.

When it comes to higher-level applications, some programmers may still hand-craft performance-critical functions in assembly language, but -- generally speaking -- this is becoming less and less common. Finally, for the moment, the underlying mechanisms behind languages like Python and Java are a whole different ball game involving things like "byte code" and "virtual machines," so I think we'll leave those for another day.

Good grief! Poor old Ryan. Even my head hurts now. This all seems so simple if you gloss over the underlying mechanisms and just say things like: "You capture your program in C and then use a compiler to translate it into an executable," but it becomes a lot more convoluted as you delve deeper into the morass.

On the other hand, I personally think this stuff is really interesting. I sometimes ponder what would happen if I were to stumble through a rift in time and space and I ended up having to develop a computer system (hardware and software) from the ground up. Hey, I'm sure we all worry about things like this... right?

So, what do you think of my explanation above? Is this all "so-so soup" to you, or are there elements of this discussion that you never actually thought about before? Also, is there anything I misrepresented or missed out or that you think I should have covered in more detail?

Loading comments...