From time, wisdom - Embedded.com


Jack discovers that when the only tool you have is a hammer, every problem looks like a nail.

Ever notice that whatever seems to be in your mind also seems to be going on all around you? Whenever I've moved to a new town, suddenly it's in the news. (When I moved to Florida, the Tampa Bay Buccaneers went to the Super Bowl; when I moved to Arizona, the Arizona Diamondbacks went to the World Series.)

Some years ago, I drove a 1980 Datsun 510; my wife drove a 1974 AMC Hornet. Suddenly it seemed that about every fifth car we encountered was either a Datsun or a Hornet.

At the time, I thought there were just lots of those cars on the road. After all, they were good cars, and after 10 years or so, the cream tends to rise to the top. I reasoned that those who drove the cars for all those years had realized their value and determined to keep them forever.

More recently, I've come to understand that it's simply a matter of selective filtering of the data presented to our senses. It's like the “cocktail party effect.” Despite a din of interference from other conversations, your brain filters out the noise and focuses on the conversation of interest. Likewise, after years of exposure to the distinctive lines of my wife's Hornet, my brain tended to filter out other cars and key in on the ones that fit the pattern, like a baby duck spotting his mother from 100 yards away.

For me, it's the same with attitudes or mental perspectives. I find that once I'm thinking along certain lines, once I've settled into a certain mindset, I see examples of things appropriate to that mindset everywhere. A couple of years ago, I was thinking about function minimization, and suddenly every problem I encountered was begging for a solution that involved minimization. At another time, it was language parsing. At yet another, digital filters.

Lately, it seems to be philosophy. I'm coming around to the point of view that there are things in this world more important than the latest Extreme Programming technique, the latest modeling language, or the latest GUI-based IDE. Far more important are the fundamental truths that transcend the problem of the day. The best tools in the world won't save you from stupidity, and the best methods in the world can be misapplied.

In my previous column, I voiced Crenshaw's first and second laws:

  • Law 1: There's an infinity of ways to make a simple problem seem difficult; only a handful to make a difficult problem seem simple
  • Law 2: KISS (Keep it Simple, Sydney)

(Given one more, could my laws become as famous as Asimov's three laws of robotics? Nah.)

I'm coming to understand the obvious: such principles apply universally, not just to software design but to life. As I look around me from this new perspective, I see myriad examples where breaking the laws led to disaster and adhering to them led to happiness.

The Microsoft ticker
More about these ideas later, but first, I'd like to update you on the claim I discussed in February (“Time to Re-Evaluate Windows CE?” Feb. 2005, p. 9) that there's danger lurking in the Microsoft Win32 function GetTickCount( ). It's a function that's used in all versions of Windows, including Windows CE.

To refresh your memories, Richard M. Smith first brought the issue to my attention. The best reference on Smith's site is currently at www.computerbytesman.com/security/comair.htm. Although this reference is really about the more recent software failure at Comair, it also refers to the earlier LA-area software failure.

The problem revolves around a 32-bit counter, which is updated by the operating system every millisecond. The function GetTickCount( ) returns a DWORD with the current value. A little arithmetic will show that the counter wraps around through zero once every 49.7 days, give or take.
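To make that arithmetic concrete, here is a small C sketch that computes how long a counter of a given width takes to wrap for a given tick period. The function name rollover_days is just an illustrative choice; the only assumption is that the counter steps through all 2^n values, one per tick.

```c
#include <stdint.h>

/* Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms. */
#define MS_PER_DAY (24.0 * 60.0 * 60.0 * 1000.0)

/* Days for a counter of the given width to step through all 2^bits values
   and wrap back to where it started, ticking once every tick_ms milliseconds.
   Valid for bits < 64. */
static double rollover_days(unsigned bits, double tick_ms)
{
    double counts = (double)(UINT64_C(1) << bits);
    return counts * tick_ms / MS_PER_DAY;
}
```

With a 32-bit counter and a 1 ms tick, rollover_days(32, 1.0) comes to about 49.71 days, which is where the magic number comes from.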

As I mentioned last time, when I first heard of this problem, I was prepared to write it off as the result of a misunderstanding of how counters work. I mean, how hard can it be to bump a counter? What can go wrong?

Apparently a lot. The more I dig into this story, the more it seems something funny is going on inside Win32. It's a little hard to separate errors in the system call from errors in its use, but there does seem to be a hint of fire beyond the smoke. And beyond the fire lies our old friend, a simple problem made difficult.

Your first hint that things aren't right should be the fact that the counter returned by GetTickCount( ) doesn't count up, but down. At system startup, it's initialized to 0xffffffff. Since initializing to a non-zero value is (marginally) harder than initializing to zero, and counting a clock down seems less intuitive than counting up, the count down suggests that someone expected something odd to happen when the count rolls over. At the very least, it's not an example of the KISS principle.

Add to that the knowledge that when you build a debug image of Windows CE, the counter is set to roll over in two minutes instead of 49.7 days, presumably so you can verify that your software works properly when the dreaded rollover occurs.

To add fuel to the fire, consider this statement from Microsoft's support group, article #823273, November 6, 2004:

Symptoms
The Rpcss.exe process consumes 60 percent or more of CPU time, and system performance and network performance are affected. This symptom typically occurs 49.7 days after the server is started.

Cause
This problem occurs because a call to the GetTickCount timer function causes the function to overflow 49.7 days after the server is started.

I presume that Microsoft used its own function GetTickCount( ) from within the code for the remote procedure call (RPC). It could still be argued that the fault lies in the RPC code, not the function itself. But things are starting to look suspicious. That ominous phrase, “causes the function to overflow,” leaves one wondering just what happens when it does overflow. Clearly, in the case of RPC at least, the “overflow” condition seems to be permanent.

Or consider this one, from article 216641, December 20, 2004. Note that this article doesn't say anything about the use of the function GetTickCount( ). The problem seems to be present in Windows itself.

Symptoms
After 49.7 days of continuous operation, your Windows-based computer may stop responding (hang).

Cause
This problem can occur because of a timing algorithm in the Vtdapi.vxd file

If there were any doubt left, this tidbit, from reader Tom Cannon, should clinch the deal:

I recently ran into [the GetTickCount( ) problem] in a Windows 2000 Excel/VBA application. The event triggered runtime error 6—overflow (if my memory serves me correctly). If activated, the debugger points to the named function.

The problem is, we keep seeing references to a condition called “overflow,” but we aren't quite sure what it means in this context. If we were dealing with a microprocessor programmed in assembly language, we'd know that the processor has hardware flags for overflow, carry, and, for that matter, zero. All would be set temporarily as the counter rolled through zero, but they'd only last for one or two instructions. Furthermore, if we didn't care what their values were, we wouldn't have to check them.

The term “overflow,” as used in the context of Windows, seems to imply that some “sticky bit,” a software error flag, gets set as the counter reaches zero and, once set, persists forever. Certainly the behavior of the RPC code suggests this model.

Question: If the counter counts down to zero, does it get stuck there? Does the code refuse to allow it to roll back to 0xffffffff? If so, that's a serious bug, and it reflects a profound misunderstanding of how timers work.
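For contrast, here is a minimal C sketch of the two behaviors. The first function is all a down-counter needs: C defines unsigned arithmetic modulo 2^32, so decrementing zero simply yields 0xffffffff, with no flag to set or clear. The sticky_counter type is purely hypothetical; it models the failure speculated about above and is not taken from any Windows source.

```c
#include <stdint.h>
#include <stdbool.h>

/* Correct behavior: a 32-bit down-counter just wraps from 0 to 0xFFFFFFFF.
   Unsigned arithmetic is modulo 2^32, so no overflow flag is involved. */
static uint32_t tick_down(uint32_t count)
{
    return count - 1u;
}

/* Hypothetical broken model: a "sticky" overflow flag that, once set,
   freezes the counter at zero forever. Illustration only. */
typedef struct {
    uint32_t count;
    bool overflowed;        /* sticky: never cleared once set */
} sticky_counter;

static void sticky_tick(sticky_counter *c)
{
    if (c->overflowed)
        return;             /* stuck: the count never changes again */
    if (c->count == 0) {
        c->overflowed = true;
        return;             /* refuses to roll back to 0xFFFFFFFF */
    }
    c->count--;
}
```

After a few ticks starting from 1, the sticky version sits at zero with its flag set, while tick_down(0) happily returns 0xFFFFFFFF and keeps going.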

Counting backwards?
If this confusion over the behavior of GetTickCount( ) isn't enough to keep you awake nights, consider this one, from reader Bill Roman:

What I've observed on some platforms, though, is that GetTickCount() will sometimes return values that do not increase monotonically; the clock occasionally runs backwards by a few ticks.

Backwards? Sigh.

You can find a lot more information on this topic on the Web. Google on GetTickCount and 49.7, and see what you find.

Bill brings up a very important point, one that leads to what appears to be the key to the whole thing. In Windows CE, certain low-level functions must be written, or at least customized, for the underlying hardware. As with many operating systems, Windows CE requires a board support package (BSP), which must be written by the user for a given hardware configuration. This software presumably includes the interrupt handler that decrements the counter.

Bill adds, “Among other things, a BSP is usually responsible for providing a clock tick of some sort. You'd think it would be hard to get this wrong: Just count the ticks, thanks. But as I recall (this problem was a while ago, and I was only peripherally involved in debugging it) WinCE has a mechanism the BSP can use when the processor has been in a low-power mode and the BSP has not been delivering clock interrupts (like because the CPU was shut down). The BSP is supposed to inform WinCE of how many ticks passed in that state.”

I just have one question: If the BSP knows how many ticks have passed, why do we need the counter in Windows? Why do we need an interrupt at all? Why not just read the BSP timer each time GetTickCount( ) is called?
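One way to do what the question suggests is sketched below in C. It assumes a free-running hardware timer that keeps counting through low-power modes; HW_TIMER_HZ, read_hw_timer(), and get_tick_count_on_demand() are hypothetical names, and the timer read is simulated so the code runs on a desktop machine.

```c
#include <stdint.h>

/* Hypothetical: frequency of a free-running hardware timer that keeps
   counting even in low-power modes (32.768 kHz is a common RTC crystal). */
#define HW_TIMER_HZ 32768u

/* Hypothetical stand-in for the BSP's timer read; here it just returns
   a simulated value so the sketch can run on a host. */
static uint64_t simulated_hw_count;

static uint64_t read_hw_timer(void)
{
    return simulated_hw_count;
}

/* Derive the millisecond count on demand: no tick interrupt and no separate
   software counter to keep in sync. Truncating to uint32_t gives the same
   modulo-2^32 rollover a 32-bit millisecond counter would have. */
static uint32_t get_tick_count_on_demand(void)
{
    uint64_t count = read_hw_timer();
    return (uint32_t)(count * 1000u / HW_TIMER_HZ);
}
```

The design choice here is that the hardware counter is the single source of truth; the millisecond value is computed from it whenever someone asks, so there is nothing to catch up after a sleep.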

Here's another problem. Recall that the counter is a 32-bit integer. If we're using a 32- or 64-bit CPU, all's well and good. But what if it's an 8- or 16-bit CPU? In that case, we have a problem. The software is going to have to maintain the counter in two words or four bytes, and manage them separately as needed. But however the task is done, the process of incrementing (decrementing) the counter absolutely must look like an atomic action to the rest of the system. When GetTickCount( ) fetches that counter, it must know that all bytes are self-consistent. No fair giving a result that's partly the old count and partly the new. Whatever the software is doing to alter those counter bytes, it had darned well better disable interrupts as it does so. If it doesn't, it's thoroughly broken.
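The torn read is easy to demonstrate. The C sketch below is a host-side simulation, not BSP code: it keeps a hypothetical 32-bit count in two 16-bit halves, calls a stand-in ISR between the two fetches to mimic an interrupt arriving at the worst moment, and shows the fix of reading both halves with interrupts disabled. disable_interrupts() and enable_interrupts() are hypothetical placeholders for whatever primitives the real CPU or BSP provides.

```c
#include <stdint.h>

/* Hypothetical 32-bit tick count stored as two 16-bit halves,
   as it might be on a 16-bit CPU. Starts at 0x00010000. */
static volatile uint16_t tick_hi = 0x0001;
static volatile uint16_t tick_lo = 0x0000;

/* Stand-in for the timer ISR: decrement the count, borrowing from the
   high half when the low half rolls from 0x0000 to 0xFFFF. */
static void timer_isr(void)
{
    if (tick_lo-- == 0)
        tick_hi--;
}

/* Hypothetical interrupt-control hooks; no-ops in this host simulation. */
static void disable_interrupts(void) {}
static void enable_interrupts(void) {}

/* Unsafe read. The interrupt_here callback simulates an interrupt that
   arrives between the two fetches, mixing old and new halves. */
static uint32_t read_ticks_unsafe(void (*interrupt_here)(void))
{
    uint32_t hi = tick_hi;
    if (interrupt_here)
        interrupt_here();
    uint32_t lo = tick_lo;
    return (hi << 16) | lo;
}

/* Safe read: with interrupts off, both halves come from the same count. */
static uint32_t read_ticks_safe(void)
{
    disable_interrupts();
    uint32_t value = ((uint32_t)tick_hi << 16) | tick_lo;
    enable_interrupts();
    return value;
}
```

Starting from 0x00010000, the unprotected read returns 0x0001FFFF, which is neither the old count nor the new one (0x0000FFFF). It's off by 65,536 ticks, more than a minute at 1 ms per tick.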

The more I look at this problem, the more I realize that there are at least four ways to foul up the simple operation of decrementing a counter. The problem could be the improper and wholly unnecessary use of a sticky overflow flag in the operating system proper. It could be a problem in higher-level functions in the Microsoft-supplied software, like RPC and Vtdapi. Or it could be the non-atomic count, which is always fatal.

Or it could be a problem created by us users. We'll talk about that one some more in the June issue.
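One common way user code can get this wrong, offered here as an illustration rather than as a preview of the June column, is comparing raw counter values instead of intervals. The C sketch below (all names hypothetical) shows that unsigned subtraction gives the right elapsed time across a single rollover, for either counting direction, while a naive deadline comparison fails.

```c
#include <stdint.h>
#include <stdbool.h>

/* Elapsed ticks for a counter that counts UP. Unsigned subtraction is
   modulo 2^32, so the result is right across one rollover. */
static uint32_t elapsed_up(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Elapsed ticks for a counter that counts DOWN. */
static uint32_t elapsed_down(uint32_t start, uint32_t now)
{
    return start - now;
}

/* Naive timeout test (up-counter): breaks near rollover, because
   start + timeout can wrap to a small number. */
static bool timed_out_naive(uint32_t now, uint32_t start, uint32_t timeout)
{
    return now >= start + timeout;
}

/* Rollover-safe version: compare the interval, not raw counter values.
   Valid while the interval is shorter than one full rollover period. */
static bool timed_out_safe(uint32_t now, uint32_t start, uint32_t timeout)
{
    return elapsed_up(start, now) >= timeout;
}
```

With start = 0xFFFFFFF0 and a 100-tick timeout, the naive test reports a timeout only 5 ticks later, because start + timeout wraps to 0x54; the interval-based test does not.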

Finally, I should note that most of the code I've seen on the Internet showing examples of how to use GetTickCount( ) is using the function to time a given region of code and measure its execution time. I submit that this usage is, in itself, a violation of the KISS principle.

You don't need system calls to the operating system to tell you how long a given code sequence takes. Use a stopwatch!

What's that you say? The code's too fast for that? Instead of writing code to call GetTickCount( ) twice, computing the interval, and converting it to seconds, how about running the code n times, timing the whole run with the stopwatch, and dividing by n? All it takes is these lines:

   for (i = 0; i < n; ++i) {
      /* code under test here */
   }

Isn't that easier?
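In C, a slightly fuller version of that loop might look like this. The sketch assumes a hypothetical function code_under_test() standing in for your code; the volatile variable keeps an optimizing compiler from discarding the work, which would leave you timing an empty loop or nothing at all. Start the stopwatch, call run_for_stopwatch(n), stop the stopwatch, and divide.

```c
/* Hypothetical stand-in for the code being timed. Writing to a volatile
   object keeps the compiler from optimizing the work (and loop) away. */
static volatile unsigned long sink;

static void code_under_test(void)
{
    sink += 1;
}

/* Run the code n times between starting and stopping the stopwatch. */
static void run_for_stopwatch(long n)
{
    for (long i = 0; i < n; ++i) {
        code_under_test();
    }
}

/* Convert the stopwatch reading to time per pass, in microseconds. */
static double per_iteration_us(double stopwatch_seconds, long n)
{
    return stopwatch_seconds * 1e6 / (double)n;
}
```

For example, if 10,000,000 passes take 12.5 seconds on the stopwatch, per_iteration_us(12.5, 10000000) gives 1.25 µs per pass. That figure includes the loop's own overhead, so pick n large enough that the run lasts several seconds and your reaction time on the stopwatch doesn't matter.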

Philosophy 101
New topic. As most of you know, I'm big on teaching people new things (and learning them myself). When I'm not writing this column or working my day job, you can usually find me with my nose buried in a book, trying to soak up more knowledge.

In the past, I've been told that I have a knack for teaching things in a way that makes them seem simple. I guess it must be true. But there have also been times when I felt that I simply wasn't getting through. That was the thrust of the columns with titles like “Getting It.” More and more, what I'm learning is that there's a huge difference between understanding something and really understanding it in the deep, fundamental sense.

Back in college, as I studied to be an engineer, I remember that we tried to memorize every formula in the textbook. It didn't help us much on quizzes, because it's not enough to have the formula; you have to know what to do with it.

I remember hearing some of my fellow students saying, “Once I get out of school, I don't need to remember the formula. I just have to know which book to find it in.” Ever hear that? How do you feel, knowing that our compatriots are now out there, building bridges, skyscrapers, and ICBMs?

After I switched to physics, I could forget trying to memorize formulas. There were simply too many. Fortunately, I learned an important lesson, which is that if you really understand the subject, it's easier to derive a formula from first principles than to remember it. Even on quizzes, where time was of the essence, I could still derive formulas and apply them while my fellow students were trying to remember which memorized formula worked for which problem.

Lately, however, I've noticed an interesting phenomenon. I see some folks—folks with lots of experience and impeccable credentials—referring to textbooks entirely too often to suit my tastes. They don't seem to want to trust their own skills.

Why is this? Partly, it could simply be a matter of available time, though again, it's always been my opinion that deriving a formula from first principles is not only better than memorizing it, but faster, especially when you factor in the time to implement and test. By understanding the problem at the deepest level, I can tell whether the software implementation is right and giving reasonable results. My formula-memorizing colleagues have more trouble, because the formula alone gives them no hint as to what's a reasonable result.

We can speculate as to the cause of this phenomenon, but I do have a theory. It's called fear.

No, I'm not talking about fear as in fear of sharks, or fear of falling. Nothing so powerful and totally unsubtle. The fear I'm talking about is simply fear of screwing up. If I derive an equation, I could have made a mistake. If I copy it out of a book, and there's a mistake in it, it's not my fault.

What I've learned, over the years, is that the fears that cause me the most troubles are the ones I'm not even aware of. I may think I'm chugging along just fine, but in reality, I'm carefully avoiding certain activities or thoughts, because they make me uncomfortable.

One of my many heroes is the Nobel laureate in physics, Richard Feynman. The thing I admire most about him is that, from all indications, he never seemed to be afraid of anything in his life. From safecracking to bongo drums to samba band to quantum electrodynamics, if he was curious about something—anything—he simply set out to learn it. I'm reasonably sure that the thought, “Maybe I won't be able to do this” never even crossed his mind. Wouldn't that be great?

What does this have to do with you, you ask? Well, remember my columns on the equation I call the Rosetta stone, which connects continuous-time systems to discrete-time systems? It's:

$$z = e^{hD}$$
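For readers meeting it for the first time, here is the short derivation, assuming the meanings used in the earlier columns: h is the time step, D is the derivative operator d/dt, and z is the shift operator that advances a function by one step, so that z f(t) = f(t + h). Expanding f(t + h) in a Taylor series and collecting powers of hD gives the exponential series:

```latex
\begin{aligned}
z\,f(t) &= f(t+h) \\
        &= f(t) + h\,Df(t) + \frac{h^2}{2!}D^2 f(t) + \frac{h^3}{3!}D^3 f(t) + \cdots \\
        &= \sum_{n=0}^{\infty} \frac{(hD)^n}{n!}\, f(t) = e^{hD} f(t),
\end{aligned}
\qquad\text{so}\qquad z = e^{hD}.
```

Read in the other direction, hD = ln z, which is what lets you carry operations between the continuous-time and discrete-time worlds.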

I love this equation (perhaps you've noticed?). I love it because I can do all kinds of nifty things with it. Practical, useful things. I've tried to explain it to you folks more than once, with limited success.

Whose fault is that? Well, in the long run, it has to be mine, right? Either I can explain it well or I can't. And looking back, I've seen that I didn't explain it very well, because I took too much for granted. I assumed that, when I said things like “continuous time” you'd know what that meant. Or when I derived it from the Taylor series, I assumed that you knew what that meant.
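The Taylor-series connection is also easy to check numerically. The C sketch below applies the operator series for e^{hD} term by term to a polynomial. A polynomial is chosen as a convenience: its derivatives eventually vanish, so the series ends after deg + 1 terms and should reproduce p(t + h) up to rounding. The names poly_eval, shift_by_series, and MAX_DEG are hypothetical.

```c
#define MAX_DEG 8   /* largest polynomial degree this sketch handles */

/* Evaluate p(t) = c[0] + c[1] t + ... + c[deg] t^deg by Horner's rule. */
static double poly_eval(const double *c, int deg, double t)
{
    double y = 0.0;
    for (int k = deg; k >= 0; --k)
        y = y * t + c[k];
    return y;
}

/* Apply e^{hD} to p as the series sum over n of (h^n / n!) D^n p(t).
   Requires deg <= MAX_DEG. For a polynomial, D^n p is zero once n > deg. */
static double shift_by_series(const double *c, int deg, double t, double h)
{
    double d[MAX_DEG + 1];          /* coefficients of the current derivative */
    for (int k = 0; k <= deg; ++k)
        d[k] = c[k];

    double sum = 0.0;
    double coeff = 1.0;             /* h^n / n!, starting at n = 0 */
    for (int n = 0; n <= deg; ++n) {
        sum += coeff * poly_eval(d, deg - n, t);
        /* Differentiate in place: the t^(k+1) term becomes (k+1) t^k. */
        for (int k = 0; k < deg - n; ++k)
            d[k] = (double)(k + 1) * d[k + 1];
        coeff *= h / (double)(n + 1);
    }
    return sum;
}
```

For p(t) = 3t^3 - 2t + 1, t = 1.5, and h = 0.25, the series and the direct evaluation of p(1.75) both come out to 13.578125, up to rounding.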

As you might guess, I'm building up to having another go at the topic. I had promised several readers that I'd show them how to implement a PID controller using this formula, and I want to. But I need your help.

I need you to resolve to overcome fear and hang on with me. In the words of Maggie Thatcher, “Don't go all wobbly on me.”

Jack Crenshaw is a senior software engineer at Spectrum-Astro and the author of Math Toolkit for Real-Time Programming, from CMP Books. He holds a PhD in physics from Auburn University. E-mail him at .
