Using clock margining for system test boundary stability and early failure prediction
Like many of you, I distinctly remember that the PC clone platforms of the 1980s and early '90s included an important button labeled "turbo." I loved to push the turbo button and watch the display numbers change. Many times the numbers made no sense, but what did it really matter as long as they changed when I pushed the button?
Pushing the turbo button made me feel better, thinking that somehow I was on the edge of computational performance and getting more than my money's worth when purchasing a $2500 desktop system. I also knew that should I ever suspect system instability, I could always return to "normal" mode to ensure full system stability.
Frankly, I never did operate in "normal" mode, and neither did anyone else. A quick walk around the office revealed that everyone had the turbo button set. Of course, the thrill of turbo-mode operation was a double-edged sword: it was continually blamed for system crashes, accompanied by the incessant fear of liquefying the CPU into a blob of molten silicon should the fan ever fail.
The turbo mode of yesterday is often referred to in today's nomenclature as overclocking. Perhaps a new name will heal old wounds. The fundamental concept has not changed; the direction is always pushing the envelope of computational speed along the boundary between stability (usability) and instability. When we think of overclocking, we naturally gravitate to the PC experience.
Although many may consider it a hindrance, if we look analytically at the overclocking experience, is it possible that it can become a tool capable of revealing system weaknesses? Is it possible that, through a structured design of experiments, the weakest logic link might be forced to reveal itself?
Further on this point, can overclocking help characterize failure so that a more robust system avoids catastrophic failure upon crossing the instability threshold? Through structured analysis, is it possible that overclocking will accurately expose the boundary between stability and instability in a system? Are there other hidden treasures in our analysis, such as early detection of failures caused by aging?
If overclocking serves to push a system to the stability edge, then what can be said for the complement of overclocking: underclocking? We often think of overclocking as predominantly attacking setup time.
Underclocking, then, might attack its complement in our system, hold time. Implementing the concept of over- or underclocking requires a baseline condition, referenced in this article as the system's "nominal" response. The system designer establishes the nominal response, where specifications are based on manufacturer stipulations normally provided through component datasheets.
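
To make the margining idea concrete, here is a minimal sketch in Python of a clock-margining sweep around a nominal baseline. It is purely illustrative: set_clock_frequency() and run_stress_test() are hypothetical hooks standing in for whatever clock-generator control and functional test suite a particular system provides, and the nominal frequency and step size are assumed values, not figures from the article.

```python
# Sketch of a clock-margining sweep around a nominal frequency.
# set_clock_frequency() and run_stress_test() are hypothetical placeholders
# for a system's PLL/clock-generator control and its functional test suite.

NOMINAL_HZ = 100_000_000  # assumed nominal clock, e.g. from a component datasheet


def set_clock_frequency(freq_hz: float) -> None:
    """Placeholder: program the clock generator/PLL to freq_hz."""
    raise NotImplementedError


def run_stress_test() -> bool:
    """Placeholder: run the functional test suite; return True on pass."""
    raise NotImplementedError


def find_stability_boundary(direction: int, step_pct: float = 1.0,
                            max_steps: int = 50) -> float:
    """Step the clock away from nominal until the first failure is observed.

    direction=+1 overclocks (stressing setup-time margin);
    direction=-1 underclocks (probing the complementary direction).
    Returns the last passing frequency, i.e. the observed stability edge.
    """
    last_good = NOMINAL_HZ
    for step in range(1, max_steps + 1):
        freq = NOMINAL_HZ * (1 + direction * step * step_pct / 100.0)
        set_clock_frequency(freq)
        if not run_stress_test():
            break
        last_good = freq
    set_clock_frequency(NOMINAL_HZ)  # always restore the nominal baseline
    return last_good


# Example use: margin in both directions and log the edges over time;
# a shrinking window relative to earlier measurements could hint at aging.
# upper_edge = find_stability_boundary(+1)
# lower_edge = find_stability_boundary(-1)
```

Repeating such a sweep periodically and trending the measured edges against the nominal baseline is one plausible way to turn margining into the early failure indicator discussed above.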

