Queueing Theory for Dummies


How do you size a buffer or message queue? Queueing theory provides the answer.

From time to time when I teach classes on software development for embedded systems, I'm asked how to figure out the maximum number of messages that will queue up in a message queue. When I answer, “Oh, that's just queueing theory,” I invariably hear an uncomfortable nervous sort of laughter. Everyone is thinking “Sure! I'm gonna stop my project for a semester so I can take a course in queueing theory just to figure out how long to make my program's message queue. Where's this guy been living?”

Similar questions come up about the capacity of linked lists and ring buffers, or the number of buffers in a buffer pool, or even the assignment of dual-port RAM to buffer descriptors inside of PowerQUICC communications processors. These questions impact directly the amount of memory that must be allocated to the resource at issue, and the capacity of an embedded system to handle the ebb and flow of data.

Sometimes engineers use empirical methods to estimate the required capacity. In one method, they divvy up all available RAM in their development target board, assigning it to their stacks, their queues, their linked lists, and buffer pools. Hopefully the hardware team has supplied more than enough RAM to accomodate all of these structures. Before running the application software, test software “paints” these structures with a standard but improbable content, for example, patterns of 0xAAAAAAAA, 0x55555555, or 0xDEADDEAD. Then the application is run for a while and later the structures are examined to see how much of the standard “paint” is left. The area where “paint” remains presumably hasn't been used by the application, and most of it can probably be eliminated.

In another empirical technique, many RTOS-sensitive debuggers/profilers and “object browsers” can measure the maximum number of messages in a queue at any time during an experimental run of application software. Some will even show you graphs of queue filling as a function of time. Similar graphs can be obtained for stacks and buffer pools. The high point of the graph will tell you how much of the capacity your software actually used. But is this number a good measure of the required capacity for a given message queue, stack, or buffer pool? Perhaps not, since seconds or minutes after the end of the experimental run of the software, its memory needs might increase dramatically had it continued to run. And this information would not be captured by any empirical method.

So empirical methods cannot always be relied upon even if fudge factors, like adding 30% extra capacity, are included.

A third approach, based on mathematics rather than empirical tests, may give us a better handle to the answers to our capacity questions. This article will present a simplified peek into queueing theory that may be just enough to answer some of our software design questions and let us know if more queueing theory is the way to get more questions answered.

Software message queues will be used as our example here, although similar considerations could apply to linked lists, buffer pools, ring buffers, and stacks.

Model of a message queue

A message queue is a set of memory locations that provide temporary storage for data that are being passed from process to process (or to/from interrupt service software). Each set of data is called a message. When the queue already has messages and a new message is stored, the old messages remain and the new message is added to the queue. When a message is taken from the queue, no memory of that message remains in the queue. Computer science gurus call this non-destructive write/destructive read.

Let's use a simplified model of a queue that uses the analogy of an olive oil bottling plant. A funnel is used to pour olive oil into bottles. The funnel has a wide opening at its top, so that it can hold lots of olive oil (or software messages by analogy). The funnel has a narrow opening at its bottom, so that the rate of flow of olive oil out of the funnel is limited (by analogy to a software process that can receive and process messages slowly). See Figure 1.

The rate at which olive oil is poured into the funnel (or the rate at which messages are sent into the message queue) is named P(t), the production rate. The rate at which olive oil comes out of the bottom of the funnel (or the rate at which messages are consumed from the message queue) is named C(t), the consumption rate. Both P(t) and C(t) are functions of time; in other words, the flows in and out can vary over time.

When does a message queue help?

If the production rate is always less than the consumption rate, P(t) * C(t), then everything put into the funnel or queue always flows out immediately. No backup of olive oil (or queue of messages) ever develops. Thus, an olive oil funnel or message queue is unnecessary in this case.

On the other hand, if the production rate is always larger than the consumption rate, P(t) > C(t), then more oil is always being put in than can be taken out. Eventually, olive oil will fill up the entire funnel and begin to spill out of the top. By analogy, messages will fill up the entire message queue and will have no additional place to be stored in the message queue. In this situation, a message queue will be of no use since it's overflowing and messages are being lost. See Figure 2.

So it sounds as if olive oil funnels and message queues are not awfully useful. And in fact both are useful only in one limited situation: if the production rate is greater than the consumption rate for a short time T, P(t) > C(t) for times t within T, then an olive oil funnel or a message queue makes sense. This short time T may be called a burst. The idea is that the funnel or queue can absorb only a temporary excess of production over consumption during an occasional burst. And so a funnel or a queue is only useful if the production rate is greater than consumption rate for a limited burst time only. These bursts should be spread out in time.

Many software designers forget this limitation of message queues and other data buffering schemes, and sometimes overuse them. Ask yourself if your queues, ring buffers, or buffer pools will be used to store information that is arriving in bursts. Otherwise they will fail (by explosion) if asked to store information that will relentlessly be arriving at a faster pace than it can be handled.

Constant pace burst calculation

We can begin with the simplification of making the consumption rate C and the production rate P constants. Of course, the constant production rate must be limited to burst time T. For now, let's keep it at zero at other times.

At the beginning of a burst, the message queue should be empty. During the burst, messages are produced and put into the queue; at the same time, some messages are taken out of the queue and consumed. Since everything is linear in this simple situation, the number of messages in the queue gradually grows as the burst proceeds. By the end of the burst, P*T messages have been produced and put in the queue, and C*T messages have been taken and consumed from the queue. So the number of messages remaining in the queue at the end of the burst is: L = (P–C)*T.

Since the queue gets longer as time advances through the burst, the length of the queue at the end of the burst is the maximum length of the queue during the burst. So L = (P–C)*T is the maximum length of the queue. In other words, it is the required size of the queue.

But there's a catch here. We're not done with our constant-pace situation yet! If another burst begins shortly after the end of the first burst, the funnel will overflow; the message queue will not have capacity for the second burst. In fact, the previous formula is only valid if a new burst begins after the funnel/queue has been totally emptied of content from the previous burst. This will take some time. We'll call that the emptying time, E. We can calculate it as follows. At the end of the burst, the production rate P goes to zero, but the consumption rate C continues at its constant value. At that time, the queue contains L = (P–C)*T messages. So the emptying time is calculated by: E = L/C, or E = (P/C–1)*T.

The emptying time E could actually be much longer than the burst time, if the P/C ratio is big. It is a “quiet time” that allows the message queue to recover and get ready for a new burst.

Example: constant pace burst calculation in a psychology experiment

The application in this example is a device that will detect changes in a person's brain waves in response to colors and images that are flashed before the subject's eyes. Brain waves, also called electoencephalographic (EEG) signals, are weak electrical signals that can be detected by an array of electrodes placed in a “halo” around a person's head.

Whenever a new color or image is flashed before the subject's eyes, hardware begins to sample the EEG signals at a 2ms/sample rate. Sampling then continues for 1.5 seconds. In software, each sample consists of an array of bytes, one byte per electrode. The entire array for a sample is put into a single message that is put into a message queue. See Figure 3. Further processing of the EEG signals takes five milliseconds per sample.

When the EEG signal messages are stored in a message queue, with one message per data sample, the required length of the message queue is calculated as follows:

P = 500 samples per second, during a burst
C = 200 samples per second, during and after a burst
T = 1.5 seconds, length of a burst

resulting in:

L = (500–200)*1.5 = 450

The required length of the message queue is then precisely 450 messages.

Of course, the previous calculation of the length of the queue is valid only if the queue is completely emptied before another burst begins. In the brain wave device example, this means that:

E = L/C = 450 messages/200 messages per second = 2.25 seconds

In other words, a new burst must not begin until at least 2.25 seconds after the end of the previous burst.

This sort of limitation can impose significant constraints on a system. In our brain wave device, it means that a new color or image must not be flashed before the subject's eyes more frequently than once every 3.75 seconds (burst time plus recovery time). Let's hope that the subjects of the brain wave experiments have the patience for this!

Variable pace length formula
Queueing formulas get much more complicated when message production and consumption rates are no longer constant. In preparation for working with a much more complex equation, some new symbols need to be defined.

The burst period T begins at time t1 and ends at time t2. A time inside of this burst interval will be called Tx.

The rate of production of messages for the queue is now a function of time, called P(t). The rate of consumption of messages is now also a function of time, called C(t).

Since this more general case is non-linear, we can no longer rely on the assumption that the queue will be longest at the end of a burst. So we need to proceed in two steps. In the first, we need to figure out the length of the queue at any time during a burst. In the second, we need to see at what point during the burst the queue was longest. This would then be the length of the queue we need to prepare in our software. In the general case, the point at which the queue is longest could be in the middle of the burst or two-thirds of the way through a burst or anywhere else.

For the first step, here is an equation which tells us the actual length of the queue of messages L(Tx) at any time Tx during burst T:

By looking at L(Tx), we can see how the length of the message queue varies over time during the burst of messages. (Caution: This formula assumes that the length of the queue L(Tx) does not go negative anytime during the burst. If yours does, then you may need to sign up for a full queueing theory course right now!)

Now for the second step: the issue of greatest interest is the worst-case length of the message queue. We can't assume that this worst-case length occurs at the end of the burst. It might occur at any time during the burst, if P(t) and C(t) fluctuate wildly enough. In fact, all that can be said is that the worst-case length of the message queue is:

Lqueue = MAXIMUM [L(Tx)]

for any Tx in T. This is the overall queue length that is needed for the message queue.

Example: variable pace queue

The formulas I presented are pretty frightening for those of us who've forgotten our days in calculus class (or who'd like to forget them). So here's a way to avoid the calculus: work with graphs of the functions, and calculate the areas under the curves instead of integrating the equation.

Figure 4 shows graphs of message production rate and consumption rate as functions of time. To figure out L(Tx), the length of the queue at time Tx , you just need to get the area of the P(t) curve up to time Tx and the area of the C(t) curve up to time Tx. The difference between these two areas is L(Tx), the length of the queue at time Tx. (Kids can do this by putting a ruler vertically on the two graphs at time Tx, and then figuring out the areas to the left of the ruler and under each of the two curves.)

By doing this for a few different values of the time Tx you can create a plot of L(Tx) as a function of time Tx. That's the bottom curve in Figure 4. This curve shows you how the number of messages in the queue varies as a function of time. You can continue the curve after the end of the message production burst, and thus see the queue empty out. (In fact, in the variable-pace general case, the limitation to well-defined “bursts” can be relaxed quite a bit.)

Then all that remains is to find the high point of the curve of L(Tx). That high point represents the longest length of the message queue at any time in the burst. That's the overall queue length that's needed for your message queue.

Where can we go from here?

If any of the stuff you've seen here might be useful for your next embedded project, then you really ought to consider delving further into queueing theory. For example, message production rates P(t) can often be modeled mathematically with functions like Poisson distributions. And message consumption rates C(t) can frequently be modeled with functions like exponential distributions. Then the monster integral I presented can be solved pretty neatly and Lqueue pops right out.

In real-time systems, the amount of time messages spend in queues and in the entire system are of interest. How long will it take your messages to get through the queue when variables P(t) and C(t) are creating varying levels of congestion in the queue? What then would be the shortest and longest transit delays of your messages?

If issues such as these come up in your embedded work, then it may well be worth biting the bullet and signing up for a semester of queueing theory at your local university extension. Hopefully, this primer will suffice for the rest of us.

David Kalinsky is director of customer education at Enea OSE Systems. He is a popular lecturer and seminar leader on technologies for embedded software in North America, Europe, and Israel. David has built high-tech training programs for a number of companies on aspects of software engineering for the development of real-time and embedded systems. Before that, he helped design many embedded medical and aerospace systems. David holds a PhD in nuclear physics from Yale and can be reached by e-mail at .

Figures

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.