# How to size message queues

**The statistical nature of message arrival and processing times makes sizing message queues difficult. This down-to-earth primer on queueing theory will make such analysis seem downright easy. **

After reading my overview of queueing theory (“Queueing Theory for Dummies,” April 2001, p. 63), many very smart readers asked for a precise formula for calculating queue lengths and delays when the rates at which queued messages are sent and processed are not exactly known. In this article, we'll wade deeper into “classical” queueing theory to learn these formulas and how to use them.

To understand these concepts, we'll examine systems with soft real-time behavior that can be described statistically, and we'll use the M/M/1 queue model.

An M/M/1 queue is the simplest kind of statistical queueing model. The first M means *memoryless arrival time probability function.* In other words, the arrival times of subsequent messages into the queue don't depend on when previous messages arrive. The second M is *memoryless service time probability function,* which means the processing times of subsequent messages don't depend on the processing times of previous messages. The 1 indicates we're focusing on a single queue feeding its messages to a single processor of messages. We'll consider only M/M/1 queues in this article.

**Message arrival**

You may think that memoryless arrival times imply you can't form any assumptions about the next message's arrival time. For example, if the last 25 messages arrived at a queue about 10ms apart, then the next message could just as easily arrive 10,000 years from now as it could 10ms from now. But that would be unrealistic. So, for this article, we'll talk about a much more orderly kind of random behavior in which interarrival times are randomly distributed within a known average number of arrivals per unit of time. This is called a Poisson distribution.

The Poisson distribution is widely used to describe discrete, rare events for which the time between successive arrivals is randomly distributed. For instance, it can describe traffic accident rates, cancer rates in large populations, radioactive decay, or interrupt arrival rates. Things become quite orderly if you look at them through the Poisson distribution.

To use the Poisson distribution, you need to ask how many of these random events you expect will occur in a given time interval, based on the average number of such events per time interval. Sometimes fewer than the average number of events will occur, sometimes more. The Poisson distribution expresses this mathematically. If *E* is the average number of arrivals per time interval, then the probability *P* (*N* ) of *N* arrivals in a time interval is:

for *N* = {0, 1, 2, 3, . . .}.

Don't worry. We won't derive this formula, nor use it directly. We just need to understand what it means, which we can easily do by studying the graph in Figure 1, in the which *E* =1. In other words, on average we're having one event (or message arrival at our queue) per time interval. As you can see, the probability is 33% that zero events will occur in any given time interval. There's also a 33% probability that exactly one event will occur in any given time interval. The probability is smaller that a larger number of events will occur in a given time interval.

**Figure 1: Poisson distribution of message arrival rates for E =1**

Similar graphs can be obtained for other values of *E, * which typically falloff on either side of a peak near value *E.* The falloff is gradual for *N* > *E,* but steep for *N* <>*E* (N cannot go negative).

We'll assume that messages arrive into our queues according to such Poisson distributions, and we'll work with the inverse of *E,* which we'll call I*,* the mean interarrival time: *I* = 1/*E.*

**Message processing**

Once a message has arrived in a queue, it stays in the queue until all messages ahead of it have been taken off the queue and individually processed. It's not hard to model message processing times because they can also be randomly distributed if you can provide an estimate of the average processing time. So, that's what we'll do.

Because these are the same assumptions in a Poisson distribution, we can use the same math for message processing times. When we talk about message processing times, however, it's convenient to express the formulas a bit differently. If *A* is the average message processing time, then the probability *P* (*T* ) that any given message will consume actual processing time is:

for *T* = {0,1,2,3, . . .}.

This view of random timing is called an exponential distribution. Don't worry, we won't derive this formula either. You just need to understand Figure 2.

**Figure 2: Exponential distribution of message processing times**

For above-average message processing times, the curve extends out to infinity. Some of our messages may have excruciatingly long processing times, though the probability is low that this will happen to any given message. A large number of messages will cluster in the small range below average.

This patternÂ—lots of easy-to-process messages with a few occasional hard-to-process messagesÂ—describes the reality within many embedded systems. So, we'll use exponential distributions to describe message processing times. The average message processing time *A* will appear again and again as we continue.

**To queue or not to queue**

If message arrival times and message processing times are both random, then how could a queue of significant length ever develop? If the arrival times were correlated with the processing times, a queue might be very short in the worst case. But message arrival and processing times do not normally correlate in any way. In other words, a cluster of messages with very short interarrival times might well contain a lot of messages with very long processing times. And that's when a queue of significant length would develop.

In fact, if the mean interarrival time *I* is shorter than the average processing time *A,* the queue will grow without bounds, overflow, and become useless. Hence, in this random statistical world of ours, we must have *I* >*A* for a queue to be useful.

Another way of expressing this is to introduce a new parameter *R,* defined as *R* =*A* /*I * or equivalently *R* =*A* **E. R* is the ratio of average processing time to mean interarrival time. *R* must be less than one for a queue not to overflow uselessly. In other words, for a queue to be useful, messages must arrive on average more slowly than they can be processed on average. That makes good sense.

**Useful formulas**

It can be shown (although we won't go through mathematical proofs here) that the average length of a queue will be:

(1)

The average time that a message spends in the queue will be:

(2)

And if you divide the *T _{avg} * formula by the

*L*formula, you'll get:

_{avg}(3)

Ultimately, the length of a queue is the product of the average rate of arrival of messages multiplied by the average time they stay in the queue. This is called *Little's Theorem,* and it's analogous to the famous “rate * time = distance” formula of high school physics, but applied to queues.

Why doesn't Little's Theorem involve the average amount of time it takes to process a message? Wouldn't that be more intuitive? In fact, A is very much involved in Little's Theorem, because A is a major ingredient in determining the average amount of time a message spends in the queue, *T _{avg} .*

A final useful formula is the probability that there will be at least *K* messages in the queue at a given time:

(4)

Simply stated, the probability of a queue becoming longer goes down geometrically with its length (since *R* is smaller than one by definition).

The remainder of this article shows how the equations I've mentioned can be used in designing embedded systems and software.

**Delay**

Say we've got a task that produces messages and puts them into a queue with average interarrival time of *I* = 25ms. Another task fetches messages from the queue and processes them with mean processing time *A* = 20ms. Messages may be delayed as shown in Figure 3.

**Figure 3: A message may be delayed by sitting in a queue**

What will be the average delay of a message in the queue? We can use Formula 2 to calculate delay. The average time that a message spends in a queue will be:

A message will on average be delayed a tenth of a second while it is in this queue.

We can also try out Formula 1 on this example. The average length of the queue will be:

Little's Theorem confirms the average length:

**Overflow**

For our second example, suppose we've got a task that's producing messages and depositing them into a queue at random times, but with an average message arrival time of *I* = 30ms. Another task fetches and processes messages from this queue. The queue is limited to a maximum of five messages; it may overflow as shown in Figure 4.

**Figure 4: A fixed size queue may overflow**

What is the highest average message processing time that will limit queue overflow to less than 0.0001%? Let's use Formula 4:

Since *R* = *A* / *I* and *I* = 30ms, we get *R* ^{6} = *A* ^{6} / (0.030)^{6} <>^{-6} . The average processing time must be less than 3ms if we are to limit overflow losses in our capacity-5 message queue to less than 0.0001%.

What would be the average queue length under these conditions? *R* = 0.003 / 0.030 = 0.1, so *L _{avg} * = 0.1 / 0.9 = 0.11 messages. There's a lot less than one message in the queue on average! Most of the capacity of the queue is hardly ever used, though it's still required to limit queue overflow to less than 0.0001%.

**Interrupts**

Now say we've got a hardware device that delivers interrupt requests to a processor at random times, but with an average time between interrupts of *I* = 25ms. The interrupt service routine (ISR) takes average processing time A to handle an interrupt. Interrupts cannot be queued up before they're processed; hence, they're lost if the processor is overloaded.

What is the value of A that will limit the interrupt overloading losses to less than 1%? Here we don't really have a queue. Instead, interrupt-related information is stored in hardware until the ISR handles it or another interrupt arrives. This is like a hardware-based queue with capacity-1. The arrival of a second interrupt while an earlier one is still queued constitutes an overloading loss.

We'll once again use Formula 4:

And since *R* = *A* / *I* and *I* = 25ms, we get *R* ^{2} = *A* ^{2} / (0.025)^{2} < 0.01,="" resulting="" in="">*A* <>(0.01)* 0.025 seconds. Thus, the average processing time must be less than 2.5ms if we're to limit interrupt overloading losses to less than 1%.

**Interrupt queues**

Finally, say that interrupts from a hardware device are arriving sporadically, with average interarrival time *I* = 400s. For these interrupts, the mean data processing time is *A* = 300s.

What will be the average response time for the processing of these interrupts? The average time that a message spends waiting to be processed (using Formula 2) will be *T _{avg} * = 300 / (1–300/400) = 1.2ms. This is the average time until software begins to process its responses to these interrupts.

A result such as this raises a number of system and software design questions:

- Can our application live with a response time that's so much longer than the average rate of interrupts and so much longer than the interrupt's data processing time?
- Where are the interrupt-related data going to be stored while they're waiting to be processed?
- Will this storage be implemented in software or hardware?

Let's try to answer the second question by using Formula 1. If interrupts were to queue up, then the average length of the queue, *L _{avg} ,* would be three messages. (You can also get this result using Little's Theorem.) So, we're going to need some queueing of interrupts or interrupt-related data to handle this hardware device.

Some modern processors, such as network processors, have message queueing capability for on-chip hardware interfaces built right into their silicon. But if we choose to implement this queue in software, we're going to have to redesign the ISR to avoid the kind of interrupt overloading losses we discussed earlier.

Rather than doing all of the data processing (*A* = 300s) right inside the ISR, we're going to need to break apart the software into a brief ISR and a separate deferred procedure call or task that will handle the bulk of the interrupt-related data processing. The ISR can pass messages (one per interrupt) to a task through a message queue, as shown in Figure 5.

**Figure 5: Redesigned software for handling interrupts**

**Queueing up**

From the examples, you can see how a quick calculation grounded in queueing theory can bring about a major rethinking of a software design. If you'd like to learn more about M/M/1 queueing, or you think your queues are more complex than M/M/1 queues, there are many ways to delve deeper into queueing theory, a well-developed branch of mathematics. You'll find a list of references at the end of this article.

Despite its name, queueing “theory” has lots of practical uses.

**David Kalinsky** is director of customer education at OSE Systems. David has built high-tech training programs for a number of companies. Before that, he helped design many embedded medical and aerospace systems. David holds a PhD in nuclear physics from Yale and can be reached by e-mail at .

**For further reading**

Daigle, John N. *Queueing Theory for Telecommunications.* Addison-Wesley, New York, 1992.

Kleinrock, Leonard. *Queueing Systems, Vol. 1: Theory.* John Wiley & Sons, New York, 1975.

Laplante, Phillip A. *Real-Time Systems Design and Analysis, 2nd Edition.* IEEE Press, New York, 1997.

**Queueing Theory Books Online**

Links to books and course notes available for free online.

www2.uwindsor.ca/~hlynka/qonline.html