Just how random is the Arduino's random() function? - Embedded.com

# Just how random is the Arduino’s random() function?

As part of my on-going Caveman Diorama project, we've now reached the stage where we need to consider how to implement the fire around which a miniature version of yours truly and his cavemen companions will be relaxing, mulling over the events of the day, and commenting on the quality of the mammoth burgers we'll doubtless be munching.

I've actually been pondering this poser in the back of my mind for the past couple of months (see *What's a cool algorithm to simulate hot flickering flames?*), but now it's time to leap into action with gusto and abandon.

It probably won't surprise you to learn that I'm currently planning on using an Arduino microcontroller and a bunch of Adafruit's NeoPixels as the basis for my fire effect. One reason for this is that I happen to have quantities of both of these little rascals close to hand here in my office. Having said this, I may change direction if it turns out we cannot achieve sufficient realism, but we'll see how we go.

We'll consider the actual nitty-gritty details of the fire implementation in a future column. For the moment, one thing we know we're going to need is a good random (or pseudo-random) number generator on which to base our fabulously flickering effects.

One way to create random numbers is to build a hardware random number generator (HRNG), also known as a true random number generator (TRNG), based on some physical process that generates low-level, statistically random “noise” signals, such as thermal noise, the photoelectric effect, and other quantum phenomena.

The HRNG/TRNG level of super-randomness is great for applications like cryptography, but it's probably overkill for what I'm doing, not least because I don't want to spend the time building one.

Of course, the Arduino does come equipped with a handy-dandy `random()` function, which is just what I need… assuming that this little scamp does indeed generate random numbers. Call me an old fuddy-duddy if you will, but this is the sort of thing that is easy to take for granted, only to be unpleasantly surprised somewhere down the line.

Suppose, for example, that I wish the various NeoPixel elements forming my fire to randomly switch between red, orange, and yellow (we'll also want to vary their brightness and duration, but let's not get ahead of ourselves). Let's further suppose that we want to weight things such that red is the predominant color, followed by orange, followed by yellow.

One way to do this would be to generate a series of random numbers between, say, 1 and 1,000, and then to say that any number from 1 to 500 equates to red, any number from 501 to 800 equates to orange, and any number from 801 to 1,000 equates to yellow. The idea is that, over time, each LED will spend 50% of its time red, 30% of its time orange, and 20% of its time yellow.
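The mapping itself boils down to a couple of comparisons. Here is a minimal sketch, assuming a hypothetical `LedColor` enum and `pickColor()` helper that are not part of the original project. On the board, its argument would come from a call such as `random(1, 1001)`, because Arduino's `random(min, max)` includes `min` but excludes `max`.

```cpp
#include <cassert>

// Hypothetical color set for one flame LED (illustrative names only)
enum class LedColor { Red, Orange, Yellow };

// Map a value in the range 1..1000 to a weighted color:
//   1..500    -> red    (500 values, 50%)
//   501..800  -> orange (300 values, 30%)
//   801..1000 -> yellow (200 values, 20%)
LedColor pickColor(long value)
{
    assert(value >= 1 && value <= 1000);  // caller must supply 1..1000

    if (value <= 500)
    {
        return LedColor::Red;
    }
    else if (value <= 800)
    {
        return LedColor::Orange;
    }
    else
    {
        return LedColor::Yellow;
    }
}
```

Changing the weighting is then just a matter of moving the two thresholds (500 and 800).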

But this only works if the `random()` function is truly — or, at least, reasonably — random. We'd all feel slightly silly if we ended up seeing only a tiny amount of yellow, and we subsequently discovered that the vast majority of our “random” (1 to 1,000) numbers actually resided predominantly in the 300 to 700 range.
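One quick numerical check for this kind of bunching (not something the column itself does) is to sort the values into ten equal-width bins and compute Pearson's chi-square statistic against a flat distribution. The sketch below assumes values in the range 1 to 1,000 and uses hypothetical helper names. As a rough guide, with ten bins (nine degrees of freedom), a statistic above about 21.7 would arise by chance only around 1% of the time from a truly uniform source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long kRange = 1000;  // samples are expected in 1..kRange
constexpr int kBins = 10;      // ten equal-width bins of 100 values each

// Count how many samples land in each bin: 1..100, 101..200, ..., 901..1000.
std::array<int, kBins> binCounts(const std::vector<long>& samples)
{
    std::array<int, kBins> counts{};  // zero-initialized
    for (long s : samples)
    {
        assert(s >= 1 && s <= kRange);
        counts[static_cast<std::size_t>((s - 1) * kBins / kRange)]++;
    }
    return counts;
}

// Pearson chi-square statistic against a uniform expectation of
// total / kBins samples per bin. Larger values mean a less even spread.
double chiSquare(const std::array<int, kBins>& counts, std::size_t total)
{
    const double expected = static_cast<double>(total) / kBins;
    double chi2 = 0.0;
    for (int observed : counts)
    {
        const double diff = observed - expected;
        chi2 += diff * diff / expected;
    }
    return chi2;
}
```

A "random" sequence that mostly resided in the 300 to 700 range would leave several bins nearly empty and push the statistic far past that threshold.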

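To put `random()` to the test, I threw together a simple program that loops 1,000 times, creating X-Y pairs in which each X and Y value is a randomly generated integer between 1 and 1,000. The core of this code is as follows:

```cpp
// Generate 1,000 X-Y pairs, each coordinate in the range 1 to 1,000.
// Arduino's random(min, max) returns a long from min (inclusive) up to
// max (exclusive), so random(1, 1001) yields values from 1 to 1000.
for (int i = 0; i < 1000; i++)
{
    long x = random(1, 1001);
    long y = random(1, 1001);

    // Print one comma-separated pair per line to the Serial Monitor
    Serial.print(x);
    Serial.print(",");
    Serial.println(y);
}
```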

You can Click Here to see the full code (such as it is). If you do, you might care to note that I'm trying to follow the guidelines espoused in the Embedded C Coding Standard by Michael Barr (or, at least, the guidelines I find to be most applicable to what I'm doing).

I then copied my 1,000 comma-separated X-Y pairs out of the Arduino IDE's Serial Monitor window and saved them in a Notepad `.txt` file. You can Click Here to access these values if you want to play with them yourself (see also the discussions below).
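If you'd rather analyze the saved file with your own code than with a plotting tool, the first step is reading the pairs back in. This is a minimal host-side C++ sketch (the `readPairs()` name is hypothetical). It expects one `x,y` pair per line, as the Serial Monitor displays them, and quietly skips blank or malformed lines, which can sneak in when copying text out of the Serial Monitor window.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read "x,y" lines into a vector of (x, y) pairs.
// Lines that don't parse as two comma-separated integers are skipped.
std::vector<std::pair<long, long>> readPairs(std::istream& in)
{
    std::vector<std::pair<long, long>> pairs;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        long x = 0;
        long y = 0;
        char comma = '\0';

        // Extraction stops at a trailing '\r', so Windows line endings are fine
        if (fields >> x >> comma >> y && comma == ',')
        {
            pairs.emplace_back(x, y);
        }
    }
    return pairs;
}
```

For example, with `#include <fstream>` added, `std::ifstream file("pairs.txt"); auto pairs = readPairs(file);` would load the data, where `pairs.txt` is a placeholder file name.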

So, now that I have my 1,000 comma-separated X-Y pairs, what do I want to do with them? Well, the obvious first-pass option is to generate a scatter plot and observe the result. What we would like to see is points spread fairly evenly across our 1,000 × 1,000 canvas. What we don't want to see is something that looks like the following, for example:

(Source: Max Maxfield / Embedded.com)

Actually, even the above would be (fractionally) better than seeing something like the following, which would suggest that our random number generator is feeding different subsets of values into our `x` and `y` variables, a result that would certainly make our poor old noggins ache.

(Source: Max Maxfield / Embedded.com)
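Non-uniform patterns like these, where whole regions of the canvas are empty or overcrowded, would also stand out numerically. The following sketch (hypothetical helper names; the 10 × 10 grid is an arbitrary choice) tallies the pairs into 100 cells of 100 × 100 units each and computes Pearson's chi-square statistic against an even spread. With 1,000 pairs, each cell should hold about 10 points, and the result can be compared against a chi-square table for 99 degrees of freedom.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr long kRange = 1000;  // coordinates are expected in 1..kRange
constexpr int kCells = 10;     // 10 x 10 grid of 100 x 100-unit cells

using Grid = std::array<std::array<int, kCells>, kCells>;

// Map a coordinate in 1..kRange to a cell index in 0..kCells-1
std::size_t cellIndex(long coord)
{
    assert(coord >= 1 && coord <= kRange);
    return static_cast<std::size_t>((coord - 1) * kCells / kRange);
}

// Tally X-Y pairs into the grid
Grid gridCounts(const std::vector<std::pair<long, long>>& pairs)
{
    Grid grid{};  // all cells start at zero
    for (const auto& p : pairs)
    {
        grid[cellIndex(p.first)][cellIndex(p.second)]++;
    }
    return grid;
}

// Pearson chi-square statistic against total / 100 points per cell
double gridChiSquare(const Grid& grid, std::size_t total)
{
    const double expected = static_cast<double>(total) / (kCells * kCells);
    double chi2 = 0.0;
    for (const auto& row : grid)
    {
        for (int observed : row)
        {
            const double diff = observed - expected;
            chi2 += diff * diff / expected;
        }
    }
    return chi2;
}
```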

Of course, my next consideration was performing the actual plot itself. I had a quick Google (while no one was looking) and ended up using an online tool called plotly (https://plot.ly). Although the user interface ended up being a tad less intuitive than one might hope, I was certainly happy with the first-pass results, as illustrated below:

(Source: Max Maxfield / Embedded.com)

Well, I must admit that this is looking rather encouraging. Having said this, my chum Ivan (the "Moustache of Knowledge") just wandered into my office and observed that it might be worth examining the temporal and spatial relationships between the X-Y pairs in more detail. For example, is the average distance between the members of medium-sized groups of X-Y pairs constant across the whole canvas? Also, if we were to consider medium-sized groups of X-Y pairs whose members are generated close to each other in time, then are the members of those groups also distributed spatially across the whole canvas, or are they clustered and clumped together?
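Ivan's first question can be given a rough numerical answer. The sketch below (hypothetical names; the group size is an arbitrary choice) splits the pairs, in the order they were generated, into consecutive groups and reports the mean distance between members of each group. For points scattered uniformly over a square of side L, the expected distance between two random points is about 0.5214 L, so on the 1,000 × 1,000 canvas the group means should fluctuate around 521. Groups whose means come out markedly lower would suggest that values generated close together in time are also clumping together in space.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Split the pairs, in generation order, into consecutive groups of
// groupSize and return the mean pairwise Euclidean distance within each
// group. Leftover pairs that don't fill a complete group are ignored.
std::vector<double> groupMeanDistances(
    const std::vector<std::pair<long, long>>& pairs, std::size_t groupSize)
{
    assert(groupSize >= 2);  // a distance needs at least two points

    std::vector<double> means;
    for (std::size_t start = 0; start + groupSize <= pairs.size(); start += groupSize)
    {
        const std::size_t end = start + groupSize;
        double sum = 0.0;
        std::size_t count = 0;

        // Visit every unordered pair (i, j) within the group exactly once
        for (std::size_t i = start; i < end; i++)
        {
            for (std::size_t j = i + 1; j < end; j++)
            {
                const double dx = static_cast<double>(pairs[i].first - pairs[j].first);
                const double dy = static_cast<double>(pairs[i].second - pairs[j].second);
                sum += std::hypot(dx, dy);
                count++;
            }
        }
        means.push_back(sum / static_cast<double>(count));
    }
    return means;
}
```

For example, `groupMeanDistances(pairs, 50)` would return 20 group means for the 1,000 pairs.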

Remember that you can Click Here to access my 1,000 X-Y pairs. What we need is someone who has access to sophisticated plotting and analysis software and knows how to use it. Are you that person?

## 12 thoughts on “Just how random is the Arduino’s random() function?”

1. cdhmanning says:

“Randomness is quite hard to measure or visualise. Like most statistical stuff it is only after considerable training that you learn how little you know. I knew a lot more about statistics before I went to university than I did after passing second y

2. Clive"Max"Maxfield says:

“I know what you mean about learning how little I know — with regard to Apple's “random shuffle” not meeting user expectations — off the top of my head I'd guess that users didn't like two tracks that were close together on an album to be played close

3. WRynczuk says:

“I recommend these resources. They're quite descriptive and concise. http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html”

4. Clive"Max"Maxfield says:

“Very interesting — thanks for sharing”

5. cdhmanning says:

“Yup that was exactly the issue. People complained when they heard two tracks from the same CD within half an hour: “but it is supposed to be random…” Ultimately good enough is good enough. Different applications have different definitions of good e

6. Clive"Max"Maxfield says:

“As you say, pseudo-random is great when it comes to repeatability, which is necessary for many applications like simulation and place-and-route and stuff. If you want to “mix things up a bit,” you can always use a HRNG to generate a truly random seed fo

7. Clive"Max"Maxfield says:

“I hear that the Spotify Shuffle is very poor indeed, to the extent that some tracks will repeat multiple times while others haven't yet played at all. As you may recall, one way to get around this is to use an external randomizing function, as discus

8. cdhmanning says:

“That's pretty much what I do (except not using a HRNG). I run the simulations from within a shell script loop which calls its RNG to set up the seed in the simulation. By recording the inputs and the seed, I can then completely repeat the simulation

9. Steve_B says:

“I once did this exercise on the random function in Fortran IV on an IBM 1130 (which dates the exercise…), and discovered that there was a significant relationship between successive numbers coming out of its typical linear-congruential generator. If yo

10. Clive"Max"Maxfield says:

“This is great input. Generally speaking, I think that the majority of today's random()-type functions are reasonably robust. The main reason I posted this column is to alert younger engineers to the fact that one shouldn't take anything for granted.”

11. cybernion says:

“Your x,y plot is an attempt to visually see if you can tell your sample apart from a true random one. While your two “bad” examples are possible (with random numbers, anything is possible), they are extremely unlikely to come from a true random source.