The MCU guy’s introduction to FPGAs: The Software

This is a follow-on to my earlier column: The MCU guy's introduction to FPGAs — Hardware. As I noted in that column, a lot of my friends are highly experienced embedded design engineers, but many of them come from a microcontroller (MCU) background, in which case they often have only a vague idea as to what an FPGA is and what it does. When pressed, they might say something like “You can configure an FPGA to do different things,” but they really have no clue as to what's inside an FPGA or how one might be used in a design.

Similarly, MCU-based designers have typically heard about hardware description languages (HDLs) like Verilog and VHDL; they understand that hardware design engineers use these languages to capture the design intent; but… that's typically about the extent of their knowledge. Thus, in this column we are going to consider the FPGA equivalent to MCU software.

The FPGA design flow
Before we start, we need to remind ourselves that FPGAs contain special configuration cells. Some of these configuration cells will be used to define the functionality of the programmable elements, while others will be used to make or break connections between different sections of interconnect, thereby connecting the various entities within the device. So, the question is, once we have decided what tasks we wish our FPGA to perform, how do we (a) capture our design's intent, (b) convert this representation into something the FPGA can use, and (c) load the design into the FPGA?

On the one hand, this is really quite simple, conceptually speaking. On the other hand, all of this stuff can be a little tricky to wrap one's brain around the first time one is exposed to it, so let's take things step-by-step. Speaking of which, let's start by taking a step back, as it were. Let's suppose we wish to create a program to run on a traditional microcontroller as depicted on the left-hand side of the diagram below. In this case, we could capture our intent in some programming language like C or C++, and then we would compile this into a machine-code (executable) file to be run on the microcontroller. The final step would be to load the machine code file into the microcontroller's memory and then instruct the microcontroller to start executing the program. Note that our program might accept input from the outside world — perhaps reading a data file or monitoring whatever is entered on a keyboard. The program will also generate some form of output — perhaps writing to a log file or displaying graphics on a screen.


Comparison of MCU program creation and FPGA design creation flows.

Now let's consider the equivalent FPGA flow. For the purposes of these discussions, let's assume that we are working with a very simple FPGA that contains only fundamental programmable fabric in the form of programmable blocks containing lookup tables (LUTs) and registers, along with programmable interconnect. We start by capturing our design intent using a special hardware description language (HDL), such as Verilog or VHDL. Next, we run the synthesis tool, which takes our high-level HDL description and translates it into the resources that will be used in the FPGA (that is, the LUTs, registers, and so forth). You can think of the synthesis tool as being the hardware designer's version of a compiler — it transforms the high-level input into a low-level equivalent. The output is a configuration file that defines the values that need to be loaded into the FPGA's configuration cells. Finally, we load the configuration file into the FPGA, thereby “programming” the FPGA. When the FPGA is running, it may accept external control and data signals presented to its input pins and it may generate values that it presents to the outside world on its output pins.
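Just to make this a little less abstract, here's a hedged sketch of what that first “capture the design intent” step might look like in Verilog. The module itself (a 4-bit adder with a registered output) is pure invention on my part, not any design discussed in this column:

// A minimal, hypothetical example of design capture in Verilog:
// a 4-bit adder with a registered output. Synthesis would map the
// '+' operator onto LUTs and 'sum' onto register elements in the
// programmable fabric.
module add_reg (
  input  wire       clk,
  input  wire [3:0] a,
  input  wire [3:0] b,
  output reg  [3:0] sum
);
  always @(posedge clk)
    sum <= a + b;   // on each rising clock edge, register the sum
endmodule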

Of course, the above offers an extremely simplified view of things. In the case of the microcontroller flow, for example, the compiler would either be one that was specifically intended for use with a single MCU, or the user would have to specify the target MCU so as to receive the appropriate machine code instructions. Furthermore, there are usually some intermediate steps that are “hidden under the hood.” Without the user knowing it, for example, a compiler typically generates an intermediate file in assembly language or pseudo-code, and it is this intermediate file that is subsequently assembled into the machine code output file.

Similarly, in the case of the FPGA flow, we would have to inform the synthesis tool as to the target FPGA (vendor, device family, specific member of that device family, etc.). Also, the synthesis tool outputs an intermediate file that allocates resources like “this function goes into a LUT,” but it doesn’t actually specify which LUT to use in the FPGA. A variety of techniques, including place-and-route algorithms, are subsequently employed in order to generate the final configuration file.

HDLs versus programming languages
Let's begin with the fact that we describe traditional programming languages as being sequential in nature, while we say that HDLs are inherently concurrent, but what does this actually mean? Consider the following snippet of programming language code:


Pseudo-code example of a programming language.

Note that this is not intended to represent any actual programming language — it's just my own pseudo-language to illustrate the point (this way I cannot be accused of making any mistakes). When a processor executes this program, it does so in a sequential manner. We can think of this as executing one line of source code at a time. We start by declaring two integer variables called intA and intB. At some stage in the program we load an integer value of 3 into intA and a value of 6 into intB.

The next part is the bit we are interested in — the part where we load intA with a copy of the contents of intB, and then we load intB with a copy of the contents of intA. STOP! Did you notice the portion of the previous sentence where I said “…and then we load…”? This is the crucial point. This is where we realize that we are absolutely comfortable with the fact that the conventional programming language is sequential in nature. The result, in this example, is that both intA and intB end up containing a value of 6.
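For what it's worth, here's how this might be fleshed out in real C (a hedged sketch — the variable names come from my pseudo-language; everything else is just standard C):

#include <stdio.h>

int main(void)
{
    int intA, intB;     /* declare our two integer variables */

    intA = 3;           /* load 3 into intA */
    intB = 6;           /* load 6 into intB */

    intA = intB;        /* intA is now 6... */
    intB = intA;        /* ...and THEN intB is loaded with intA's
                           new value, so intB is also 6 */

    printf("intA = %d, intB = %d\n", intA, intB);   /* prints 6 and 6 */
    return 0;
}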

Now let's consider an equivalent example in a pseudo-HDL as shown below. We commence by declaring two 4-bit registers. Next, we assign values to these registers. And then… well, what do you think happens next?


Pseudo-code example of an HDL.

The simple answer is that we can visualize both of these statements as being executed concurrently (at the same time), which means that regA ends up containing 6 (the original contents of regB) while regB ends up containing 3 (the original contents of regA).
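Once again, my pseudo-HDL isn't any real language, but — as a hedged sketch — here's how the same thing might be expressed in Verilog, where nonblocking assignments inside a clocked block genuinely take effect concurrently:

// On a rising clock edge, both nonblocking ('<=') assignments
// sample their right-hand sides first and update their targets
// afterwards, so the two registers exchange values: regA ends up
// with 6 and regB ends up with 3. The initial block is a
// simulation convenience standing in for the load circuitry
// we're not showing here.
module swap (input wire clk, output reg [3:0] regA, regB);
  initial begin
    regA = 4'b0011;   // 3
    regB = 4'b0110;   // 6
  end
  always @(posedge clk) begin
    regA <= regB;     // these two statements take effect
    regB <= regA;     // concurrently, not one after the other
  end
endmodule

Interestingly enough, if we had used Verilog's blocking assignments (regA = regB; regB = regA;) inside the clocked block, the statements would have executed one after the other, and we would have ended up with both registers containing 6 — exactly like the sequential C version.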

This is the aspect of HDLs that people with a traditional programming background — like software developers — find so difficult to wrap their brains around. By comparison, so long as they haven't been “corrupted” by any previous programming experiences, most hardware design engineers instinctively understand the way in which HDLs work, but what do I mean by “work” in this context? Well, read on…

Simulation and synthesis
In the case of a traditional software program, we understand that the program itself is just words and symbols. It doesn’t actually do anything until it is compiled into machine code that is run on a processor. So what about our HDL design code? In fact, there are two main things we typically do with our HDL: simulation and synthesis as illustrated below:


Simulation and synthesis.

Let’s start with simulation. Our HDL represents the functionality we eventually wish to implement in an FPGA. If we had a real FPGA on a test bench, then — assuming it was already powered-up and loaded with its configuration file — we could apply stimulus signals to its inputs and observe any response signals at its outputs. In the simulation world, we use a testbench to specify the required stimulus and the expected responses, both of which we can store, analyze, and display using a variety of techniques (the above illustration reflects a waveform display on a computer screen).
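As a hedged sketch of the idea, a Verilog testbench for the little swap module shown earlier might look something like this — the testbench generates the clock stimulus and displays the response, with no FPGA in sight:

// A hypothetical testbench for the swap example: it instantiates
// the design under test, wiggles the clock, and prints the
// registers' values whenever they change.
`timescale 1ns/1ns
module swap_tb;
  reg clk = 0;
  wire [3:0] regA, regB;

  swap dut (.clk(clk), .regA(regA), .regB(regB));

  always #5 clk = ~clk;   // stimulus: a clock with a 10 ns period

  initial begin
    $monitor("time=%0t  regA=%b  regB=%b", $time, regA, regB);
    #25 $finish;          // watch a couple of clock edges, then stop
  end
endmodule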

As an aside, in the case of the early logic simulators, a separate WDL (waveform description language) was used to specify the stimulus and expected response signals. Today, languages like Verilog and VHDL can be used to specify both the design and the testbench, but we digress…

We will return to the logic simulator in a moment, but first let's consider the synthesis part of the picture. The synthesis tool understands what we are trying to “say” with our HDL, so — assuming we're still working with our original example — it will generate a configuration file that will implement something like the following circuit in the FPGA:


Things can happen concurrently in hardware.

Remember that regA and regB are actually 4-bit entities — I've just drawn them as shown above for simplicity. Also, I haven’t shown any circuitry that would allow us to load values like 3 (binary 0011) and 6 (binary 0110) into these registers. The main thing is to not allow ourselves to become side-tracked by these minor details, but to instead focus on the fact that — in the real (hardware) world — events can indeed happen simultaneously (concurrently). In this case, the internal delays associated with the registers would allow each register to load the value on its input before the value on its output started to change. Thus, following an active edge on the clock signal, each register will end up containing the value that was previously stored in the other register.

Now, as promised, let's return to the simulator. When you come to think about it, the simulator is a program that is itself written in a sequential language like C or C++. The simulator's task is to “fake concurrency” in its sequential world. It does this by keeping track of queues of pending events and future events, which is why this type of tool is formally known as an “event-driven simulator.”

In the case of our example circuit shown above, when the original triggering event (a rising edge on the clock signal) occurs, the simulator will look at whatever value is currently being presented to the inputs of regA (the current output from regB) and will schedule an event to load this value into regA and change the value on its outputs at some future time (the actual time depends on the delays the designer has associated with the register). Similarly, the simulator will look at whatever value is currently being presented to the inputs of regB (the current output from regA) and will schedule an event to load this value into regB and change the value on its outputs at some future time. The point is that these changes now form their own events that will be actioned at the appropriate time(s) in the future.
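If it helps, here's a toy C sketch of this idea (entirely my own concoction — real event-driven simulators are vastly more sophisticated). The trick is that every event scheduled for a given time is computed from the current state before any of the updates are applied, which is how a sequential program produces the same result as truly concurrent hardware:

#include <stdio.h>

/* A toy event queue: each event says "at this time, load this
   value into this register." Both events below were scheduled by
   sampling the registers' CURRENT values, which is what lets a
   sequential program fake concurrency. */
typedef struct { int time; int *target; int value; } event_t;

int main(void)
{
    int regA = 3, regB = 6;

    /* Rising clock edge at t=10: sample both register inputs now
       and schedule the output changes for t=12 (a 2 ns delay). */
    event_t queue[2] = {
        { 12, &regA, regB },   /* regA will load the old regB (6) */
        { 12, &regB, regA },   /* regB will load the old regA (3) */
    };

    /* Advance simulated time and action the pending events. */
    for (int i = 0; i < 2; i++)
        *queue[i].target = queue[i].value;

    printf("regA = %d, regB = %d\n", regA, regB);   /* 6, 3 */
    return 0;
}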

Software functions/procedures versus hardware blocks/modules
As we've just discussed, programming languages like C and C++ are very different in nature to hardware description languages (HDLs) like Verilog and VHDL. Let's explore these differences in a little more detail…

Let's begin by noting that — if we wished — on the software side of things we could simply create a single, honking big program. By this, I mean a single file containing line-after-line of code that may include loops and if-then-else type statements, but that doesn’t make any use of procedure or function calls.


A honking big program (software) and a honking big circuit (hardware).

Similarly, on the hardware side of the fence, we could use our HDL to describe a single honking big circuit/design. In reality, of course, we tend not to do this sort of thing on either side of the fence, because it makes life difficult in so many ways…

Taking a stroll on the software side…
Just one more point before we really plunge into the mire — we are going to keep things as simple as possible. On this basis, we will consider only a single-threaded program running on a single processor core — we are not going to worry about multi-core systems, threaded applications, operating systems time-sharing multiple programs, or any other weird and wonderful scenarios.

The term “subroutine” is used to refer to a small portion of code that performs a specific task and that is relatively independent of the main body of code. The main routine (program) calls the subroutine as and when it is required. These days, the term “subroutine” is predominantly associated with assembly languages and languages like BASIC. In the case of languages like C and C++, it is more common to use the terms “procedure” and “function” (we don't need to worry about the differences between these constructs here). Older programmers often think “subroutine” but say “procedure” and “function,” while younger programmers typically view the world only in terms of procedures and functions. Having said this, for a number of folks, even the terms “procedure” and “function” are going by the wayside, being replaced by “object,” but we digress…

Consider a program that makes use of four procedures that we will call pA, pB, pC, and pD. A very simplistic way of viewing things (ignoring loops and suchlike) is that the “arrow of time” — by which I mean the order of program execution — points from top to bottom as illustrated below:


A software program in C/C++ boasting four procedures.

Once again, we're keeping things simple here. In reality, the main program may call procedures and functions multiple times from different parts of the program, passing in different values each time. Also, procedures and functions can call other procedures and functions (sometimes they even call themselves recursively), but we really don’t want to get sidetracked by any of these scenarios, because we have other fish to fry…

One important point to note is that the use of procedures and functions offers a “divide and conquer” approach to programming. We really don’t need to go into all of the advantages conveyed by the use of procedures and functions here — suffice it to say that it's a lot easier to create, debug, analyze, and verify a bunch of small procedures and functions than it is a single honking big program. Also, if created with re-use in mind, the same procedures and functions can be subsequently deployed in multiple applications.

But the really big point of which we need to remind ourselves is that C/C++ programs are sequential in nature. This means that one instruction is executed after another. Similarly, it means that only one procedure or function can be executing at any particular time. In the case of our earlier example, let's assume that the main program calls procedure pA and passes in a 32-bit binary value. Let's further assume that procedure pA does something to this value — perhaps something as simple as shifting it one bit to the left — and then hands the result (and control) back to the main program.

The main program then calls procedure pB and passes in this new value. Procedure pB performs some new action on the value and then hands the result (and control) back to the main program. And so on and so forth for procedures pC and pD. Let's make one further assumption, which is that the main program is itself a big loop that keeps on cycling around calling procedures pA, pB, pC, and pD in turn. In this case (ignoring the actions/overhead associated with the main program), we can visualize the way in which this application executes as illustrated below:


Visualizing the way in which our program executes.
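Before we cross over to the hardware side of the fence, here's a hedged C sketch of this arrangement (pA shifts left, as discussed; the operations I've assigned to pB, pC, and pD are pure invention on my part):

#include <stdint.h>
#include <stdio.h>

/* Only one of these procedures can be executing at any time. */
static uint32_t pA(uint32_t v) { return v << 1;   }  /* shift left  */
static uint32_t pB(uint32_t v) { return v + 1;    }  /* made-up ops */
static uint32_t pC(uint32_t v) { return v ^ 0xFF; }  /* for pB..pD  */
static uint32_t pD(uint32_t v) { return v >> 2;   }

int main(void)
{
    uint32_t value = 42;

    for (int pass = 1; pass <= 4; pass++) {   /* the big main loop */
        value = pA(value);   /* pA runs to completion, and only    */
        value = pB(value);   /* then does pB get control, and so   */
        value = pC(value);   /* on around and around the loop      */
        value = pD(value);
        printf("after pass %d: value = %u\n", pass, value);
    }
    return 0;
}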

Perambulating on the wild (hardware) side…
Hardware description languages (HDLs) also have the concept of procedures and functions, but we tend to consider these in a somewhat different context. For the purposes of these discussions, we will think in terms of functional units that we will refer to as “blocks” or “modules.” (In the case of VHDL, each of these blocks would be formed from the combination of an “entity” and an “architecture,” but we don’t need to worry about that here.)

Let's assume that we've used our HDL to create four blocks bA, bB, bC, and bD. These blocks have the same functionality as their counterpart procedures pA, pB, pC, and pD in our software implementation. By this I mean that if procedure pA is presented with a 32-bit binary value and shifts that value one bit to the left in software, then block bA will do the same thing in hardware. A graphical representation of this could be as follows:


A hardware design in HDL boasting four blocks (plus a top-level block).

The top-level block in HDL is equivalent to the main() function (or method, or whatever you want to call it) in C/C++. In the case of the software, the main() function's role is to orchestrate everything and to call the procedures and functions in the correct order. By comparison, in the case of our hardware representation, the role of the top-level block is to “instantiate” the sub-blocks and to connect them all together as required.
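In Verilog, for example, that top-level role might look something like the following hedged sketch (the block names come from our example; the port names and the operations performed by bB, bC, and bD are my own inventions):

// The top-level block doesn't "call" anything -- it instantiates
// blocks bA through bD and wires them together. Because each block
// registers its output on the common clock, the chain forms the
// pipeline discussed below.
module top (
  input  wire        clk,
  input  wire [31:0] data_in,
  output wire [31:0] data_out
);
  wire [31:0] a2b, b2c, c2d;   // the inter-block connections

  bA uA (.clk(clk), .d(data_in), .q(a2b));
  bB uB (.clk(clk), .d(a2b),     .q(b2c));
  bC uC (.clk(clk), .d(b2c),     .q(c2d));
  bD uD (.clk(clk), .d(c2d),     .q(data_out));
endmodule

// Block bA does the same job as procedure pA: shift left one bit.
// Blocks bB through bD follow the same registered pattern.
module bA (input wire clk, input wire [31:0] d, output reg [31:0] q);
  always @(posedge clk) q <= d << 1;
endmodule

module bB (input wire clk, input wire [31:0] d, output reg [31:0] q);
  always @(posedge clk) q <= d + 1;
endmodule

module bC (input wire clk, input wire [31:0] d, output reg [31:0] q);
  always @(posedge clk) q <= d ^ 32'hFF;
endmodule

module bD (input wire clk, input wire [31:0] d, output reg [31:0] q);
  always @(posedge clk) q <= d >> 2;
endmodule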

STOP! When we are talking about programming languages like C/C++ and HDLs like Verilog/VHDL, and when we then start throwing terms like “software” and “hardware” around, this is an area where a lot of people become confused, so let's take a deep breath and pause for a moment's reflection…

When we use the term “software” in the context of C/C++, it's important to remember that a C/C++ source code program is just text, and this text may either be printed out on a piece of paper or stored in a machine-readable file on a computer. It's only when the C/C++ source code is passed through a compiler and converted into equivalent machine code — and when that machine code is loaded into the computer's memory and the processor is instructed to execute it — that it actually does anything useful.

Similarly, when we use the term “hardware” in the context of an HDL, it's important to remember that a Verilog/VHDL source code representation is just text, which — once again — may either be printed out on a piece of paper or stored in a machine-readable file on a computer. It's only when the Verilog/VHDL source code is passed through a synthesis engine and converted into an equivalent bit configuration file (which we might think of as a gate-level netlist) — and when that bit configuration file is loaded into an FPGA — that it actually does anything useful.

And, just to remind ourselves, a simulator is a program that reads in a Verilog/VHDL source code description and uses this to create a virtual representation of the way in which the hardware would work in the real world. Although the simulator is itself written in a sequential language like C or C++, it's created in such a way as to “fake concurrency” in its sequential world.

The point is that when I see an image like the one below, I think of it in a variety of different ways. For example, I can visualize it as being HDL source code, with a top-level block instantiating sub-blocks (which could themselves instantiate sub-sub-blocks, and so on and so forth). Alternatively, I can visualize it as being a collection of simple integrated circuits containing logic gates and registers (like the old 74-series TTL devices) all mounted on a circuit board. And, of course, I can visualize it as a bunch of functional blocks — each comprising a collection of logic gates and registers — being implemented inside an FPGA.


A hardware design in HDL boasting four blocks (plus a top-level block).

Let's assume that each of our functional blocks contains registers on their outputs, and that all of these registers are driven by a common clock as illustrated above. In this case, we have a classic case of pipelining. The first active clock edge loads whatever value is being presented to the inputs of the device into block bA, after which the inputs to the device may change. The next active clock edge loads the outputs from block bA into block bB and — at the same time — loads the new value being presented to the device's inputs into block bA. The third clock edge loads the outputs from bB into bC, the outputs from bA into bB, and the new inputs to the device into bA — all at the same time. And so it goes. The resulting actions can be visualized as illustrated below:


This hardware example provides a classic case of pipelining.

What this means is that, once the pipeline has been fully “primed” (which occurs on active clock edge #4), every new clock causes the design to load whatever new value is being presented to its inputs and — at the same time — to present a new value to the outside world on its outputs.

There are two really, REALLY big points to wrap one's brain around here. The first is that only one procedure or function can be active at any particular time in C/C++ (in the ensuing executable machine code, that is); by comparison, all of the functional blocks are going to be active all of the time in Verilog/VHDL (in the ensuing simulation or FPGA implementation, that is).

The second important point is that, although a C/C++ procedure or function can be called multiple times from different places in the main program, that procedure or function still only exists in one place at one time (no, I don’t want to ponder the metaphysical aspects of recursion here). For example, consider a software application that calls procedure “pA” from two separate points in the main program as illustrated below:


Calling a procedure from multiple locations in the main software (C/C++) program.

By comparison, if a higher-level block in Verilog/VHDL instantiates multiple instances of a sub-block, then each of those instantiations is an independent “thing” in its own right. For example, consider a top-level block that contains two instantiations of a sub-block called bA. Four simple scenarios off the top of my head are as illustrated below:


Instantiating a block multiple times in the same hardware (Verilog/VHDL) design.
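As a hedged Verilog sketch of the simplest of these scenarios (reusing the bA module from the earlier sketch), here are two independent copies of bA working on two different data streams at the same time:

// Unlike two calls to a C procedure, these two instantiations of
// bA are two physically separate pieces of hardware, both active
// at the same time on different data.
module top_two (
  input  wire        clk,
  input  wire [31:0] in1, in2,
  output wire [31:0] out1, out2
);
  bA first_bA  (.clk(clk), .d(in1), .q(out1));
  bA second_bA (.clk(clk), .d(in2), .q(out2));
endmodule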

And just to make sure that there's no confusion (I don't want anyone to think that you can only have multiple instantiations of one type of block), let's ponder one final example showing three instantiations of block bA and two instantiations of block bB as illustrated below:


Another example of instantiating blocks multiple times.

Phew! So what do you think? If you are new to Verilog and VHDL and FPGAs, does this clear up any confusion you may have been having? Alternatively, if you are already familiar with HDLs, do you think my comparisons to regular programming languages like C/C++ cover all of the bases, or have I missed anything?
