Achieving zero defects in auto electronics systems - Embedded.com

Achieving zero defects in auto electronics systems

As the amount of electronic content in automobiles continues to increase, tighter control over the IC parts going into modern autos is needed so that defects per million (DPM) rates are driven down, field returns and warranty issues related to electronic components are minimized, and liability is decreased.

The Automotive Electronics Council AEC-Q001 specification recommends a general method for using part average testing (PAT) to remove abnormal parts from populations and thus improve the quality and reliability of parts from a supplier.

PAT is a methodology that prescribes finding test results that fall outside six sigma from the population mean for a given wafer, lot or group of parts being tested. Any test result outside the six-sigma limit for a given device is considered an outlier and removed from the population; parts that fail the PAT limits are not shipped to the customer.

There is intense pressure to improve reliability and bring the defect rate down, especially now that many important safety functions such as braking, traction control, and dynamic and active stability control are governed by semiconductors.

While dedicated to improving the quality of shipped parts, suppliers are also trying to minimize the impact of applying these specifications to their yields. Since manufacturing costs continue to drop and test costs remain relatively flat, the margins on devices shrink as test cost becomes a greater component of manufacturing cost.

Major yield hits simply cannot be tolerated; suppliers must thoroughly evaluate their test process to find candidate tests and iteratively refine the candidate list until they zero in on a good target.

Without sophisticated analysis and simulation tools, suppliers will be applying these specifications without a good understanding of what they mean to their supply chain, or applying them blindly and missing critical tests. The result is shipping at the same DPM rates with the guarantee that the devices were tested using a specification like PAT. In such a case, the guarantee is meaningless and reliability suffers.

Some suppliers seem to think that performing PAT at wafer probe is good enough, but studies show countless issues with that approach. Using PAT at wafer probe is the first quality gate, but the rest of the downstream manufacturing process adds potential variability from a myriad of sources, variability that causes more PAT outliers at package test.

If suppliers truly want to ship the highest quality parts, they will need to perform PAT at both probe and final test, and their customers should drive this approach. One look at the data conclusively supports that PAT must be performed at both quality gates.

One method in the PAT process is to analyze recent data from several lots to establish static PAT limits for each test of interest. These limits are calculated as the Robust Mean ± Six Sigma and are normally incorporated into the test program as the upper specification limit (USL) and the lower specification limit (LSL). Static PAT limits must be reviewed and updated at least every six months.
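The static limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes the robust mean is the median and the robust sigma is the interquartile range divided by 1.35 (the usual normal-distribution scaling), and the function names `robust_stats` and `static_pat_limits` are hypothetical.

```python
import statistics

def robust_stats(values):
    """Return (robust_mean, robust_sigma) for a list of test results.

    Assumption: robust mean = median (Q2) and
    robust sigma = (Q3 - Q1) / 1.35, the IQR scaled to match the
    standard deviation of a normal distribution.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q3 - q1) / 1.35

def static_pat_limits(values, k=6.0):
    """Return (lower, upper) static PAT limits as robust mean +/- k * robust sigma."""
    mean, sigma = robust_stats(values)
    return mean - k * sigma, mean + k * sigma
```

In practice `values` would be the pooled results for one test across several recent lots, and the returned pair would be the limits reviewed on the six-month cycle described above.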

The preferred method is to calculate dynamic PAT limits for each lot or wafer. The dynamic PAT limits are normally tighter than the static PAT limits and weed out any outliers not in the normal distribution.

The important distinction is that dynamic PAT limits are calculated on a wafer or lot basis, thus the limits are continuously changing based on the performance of the material for that wafer or lot. Dynamic PAT limits are calculated as the Mean ± (N * Sigma) or Median ± (N * Robust Sigma) and cannot be less than the LSL or greater than the USL specified in the test program.
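As a rough sketch of the Mean ± (N * Sigma) form, the following Python function computes dynamic limits for one wafer or lot and clamps them so they never fall outside the specification limits. The function name and the default multiplier of 6 are assumptions chosen for illustration.

```python
import statistics

def dynamic_pat_limits(values, lsl, usl, n_sigma=6.0):
    """Return (lpl, upl) for one wafer or lot.

    The limits are mean +/- n_sigma * sigma, clamped so that the
    lower limit is never below LSL and the upper never above USL.
    """
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    lpl = max(lsl, mean - n_sigma * sigma)
    upl = min(usl, mean + n_sigma * sigma)
    return lpl, upl
```

Because the inputs are the results from the current wafer or lot only, calling this once per wafer yields limits that move with the material, which is the distinction drawn above.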

Figure 1: Any values outside the dynamic PAT limits but within the LSL and USL are considered outliers, which are usually designated as failures.

Calculated dynamic PAT limits are designated as the lower PAT limit (LPL) and the upper PAT limit (UPL) in Figure 1. Any values outside the dynamic PAT limits but within the LSL and USL are considered outliers.

These outliers are usually designated as failures and are binned out to a special outlier software or hardware bin. Keeping track of the calculated PAT limits for a given wafer or lot and the number of outliers detected per test is important for traceability at a later date.
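The binning decision and the traceability record might look like the sketch below. The bin numbers, the function names and the shape of the summary record are all hypothetical; a real test program would use its own bin map and data log format.

```python
from collections import Counter

# Hypothetical bin numbers, for illustration only.
PASS_BIN, FAIL_BIN, OUTLIER_BIN = 1, 2, 99

def bin_result(value, lsl, usl, lpl, upl):
    """Classify one measurement against spec limits, then PAT limits."""
    if value < lsl or value > usl:
        return FAIL_BIN        # ordinary specification failure
    if value < lpl or value > upl:
        return OUTLIER_BIN     # inside spec but outside PAT limits
    return PASS_BIN

def summarize(test_name, values, lsl, usl, lpl, upl):
    """Build a per-test traceability record for one wafer or lot."""
    bins = Counter(bin_result(v, lsl, usl, lpl, upl) for v in values)
    return {
        "test": test_name,
        "lpl": lpl,
        "upl": upl,
        "outliers": bins[OUTLIER_BIN],
        "spec_fails": bins[FAIL_BIN],
    }
```

Storing one such record per test and per wafer or lot preserves both the limits that were applied and the outlier count.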

Real-time implementation
There are two main and very different schools of thought when it comes to implementing PAT in production: real-time PAT and statistical post-processing (SPP). Suppliers must ask themselves if they want to institutionalize two different approaches for probe and final test or if a single solution for PAT makes more sense.

Real-time PAT relies on calculating dynamic PAT limits and making binning decisions in real-time as parts are tested, without affecting test time. This requires a dynamic real-time engine that can handle complicated data streams for monitoring and sampling.

Likewise, this process requires a robust statistical engine capable of taking test data and performing the necessary calculations to generate new limits, passing the new limits and binning information into the test program, all the while monitoring the entire process to ensure stability and control. Real-time processing works for both probe and final test and handles baseline outliers as required by suppliers.

Statistical post-processing produces the same end result, processing statistics from device test and making binning decisions after a lot has been completed. However, because binning decisions are made after a lot has been processed, post-processing can only be used for wafer probe since the test and binning results must be tied to a specific device so that it can be re-binned.

At package test, there is no way to connect test and binning results to a particular device because there is no tracking mechanism or serialization once parts have been packaged. SPP also requires 100 percent data logging of test results so that decisions can be made, increasing IT infrastructure needs and slowing test times. Since results are post-processed, SPP handles baseline outliers as part of the general population of devices in a lot.

Both methods allow for powerful algorithms to be run against the test and binning results, like regional PAT and other failure patterns. One example of regional PAT is the proverbial “good guy in a bad neighborhood” where one good (passing) die on a wafer is surrounded by multiple failing die.

In an effort to reduce the DPM for automotive devices, most suppliers want to discard this “good guy in a bad neighborhood” because studies show that it is highly likely that this passing device will fail prematurely.
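A simple version of the “good guy in a bad neighborhood” rule can be sketched as a scan over a wafer map. The representation (a dictionary keyed by x-y coordinates), the eight-neighbor window and the threshold of six failing neighbors are all assumptions for illustration; production regional PAT rules are typically more elaborate.

```python
def good_die_in_bad_neighborhood(wafer_map, min_failing_neighbors=6):
    """Return sorted coordinates of passing die surrounded by failures.

    wafer_map maps (x, y) -> True for a passing die, False for a
    failing die. Positions missing from the map (wafer edge) are
    not counted as failures.
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    flagged = []
    for (x, y), passed in wafer_map.items():
        if not passed:
            continue  # only passing die can be downgraded
        failing = sum(
            1 for dx, dy in offsets
            if wafer_map.get((x + dx, y + dy)) is False
        )
        if failing >= min_failing_neighbors:
            flagged.append((x, y))
    return sorted(flagged)
```

The flagged coordinates would then be re-binned to an outlier bin, which at wafer probe is possible because each result is tied to a die location.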

Consider a power management device made for automotive use. We have pulled historical test data into an analysis tool and dug deeply into the device's parametric data to discover which tests are good candidates for PAT. Some tests are better than others, more suited to PAT, or more critical to the functioning of a device. If all tests for a device are selected, the yield hit will be quite unacceptable.

The problem with some tests is that they are simply not “stable” enough to be measured against the PAT standard. The source of variability may be inherent in the device itself, may come from the test process (e.g. an instrument in a piece of ATE incapable of producing granular measurement), or may have been introduced during the packaging process. These tests are simply not in statistical control and cannot be meaningfully measured against PAT limits.

A baseline is used to establish dynamic PAT limits on a given wafer or lot. For example, on a wafer containing 1,000 die, a baseline of 100 representative die would be an appropriate statistical sample of that wafer.

Once the baseline is reached, several important tasks are performed before dynamic PAT limits can be applied in the real-time environment. A normality check is performed for each of the selected tests. If the data is normally distributed, the standard deviation is calculated using the “normal” method, but if the data is not normally distributed, the standard deviation is calculated using the “robust” method.
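The choice between the two sigma methods can be illustrated as follows. The article does not say which normality check is used, so this sketch substitutes a hand-written Jarque-Bera statistic (based on skewness and kurtosis) compared against 5.991, the 5 percent critical value of a chi-square distribution with two degrees of freedom. The robust sigma is again assumed to be the interquartile range divided by 1.35, and all function names are hypothetical.

```python
import statistics

def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical; nothing to test
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

def choose_sigma(values, jb_critical=5.991):
    """Return (method, sigma) for a baseline sample.

    Assumption: data is treated as normal when the Jarque-Bera
    statistic does not exceed jb_critical; otherwise the robust
    sigma (IQR / 1.35) is used.
    """
    if jarque_bera(values) <= jb_critical:
        return "normal", statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return "robust", (q3 - q1) / 1.35
```

The Jarque-Bera approximation is weak for small samples, so a baseline of around 100 die, as in the example above, is a more reasonable input than a handful of parts.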

Figure 2: Some tests may simply not be 'stable' enough to be measured against the PAT standard.

Forward to implementation
The dynamic PAT limits for each selected test must be calculated and stored in memory for use on subsequent tests. The original LSL and USL are unchanged and used to detect test failures according to the original test program pass/fail binning.

Calculations to identify outliers in the baseline for the selected tests are performed. At probe test, the x-y coordinates are saved for processing at end of wafer. At package test, the baseline devices are binned to a “baseline” bin. If outliers are detected in the baseline, these devices can then be identified for re-test.

After the baseline is reached, each selected test is checked against the dynamic PAT limits and binned accordingly in real-time for each device. Devices that fail the PAT limits drop into a unique “outlier” software or hardware bin, which identifies them as PAT outlier devices post-test.
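The real-time flow for a single test can be sketched as a small state machine: collect a baseline, compute the limits once, then bin every later device as it arrives. The class name, the bin labels and the decision to keep specification failures out of the baseline are assumptions made for this illustration.

```python
import statistics

class RealTimePat:
    """Streams results for one test and bins each device as it arrives."""

    def __init__(self, lsl, usl, baseline_size=100, n_sigma=6.0):
        self.lsl, self.usl = lsl, usl
        self.baseline_size = baseline_size
        self.n_sigma = n_sigma
        self.baseline = []
        self.limits = None  # (lpl, upl) once the baseline is reached

    def process(self, value):
        """Return 'fail', 'baseline', 'pass' or 'outlier' for one device."""
        # The original LSL/USL always decide ordinary failures.
        if value < self.lsl or value > self.usl:
            return "fail"
        if self.limits is None:
            # Assumption: only in-spec devices contribute to the baseline.
            self.baseline.append(value)
            if len(self.baseline) == self.baseline_size:
                mean = statistics.fmean(self.baseline)
                sigma = statistics.stdev(self.baseline)
                self.limits = (
                    max(self.lsl, mean - self.n_sigma * sigma),
                    min(self.usl, mean + self.n_sigma * sigma),
                )
            return "baseline"
        lpl, upl = self.limits
        return "pass" if lpl <= value <= upl else "outlier"
```

Devices labeled `baseline` correspond to the parts sent to the “baseline” bin at package test; they still need an outlier check of their own once the limits exist.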

The best part of a real-time system is the potential for real-time process feedback while devices are being tested, triggering actions and alerting personnel to issues immediately. For example, a trigger may fire if the total number of outliers in the baseline exceeds a user-entered threshold, alerting the test operator that the baseline-binned parts at package test should be retested.
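Such a trigger could be as simple as the check below, run once the dynamic limits are available. The function names and the returned record are hypothetical; how the alert reaches the operator is outside the scope of this sketch.

```python
def baseline_outliers(baseline, lpl, upl):
    """Return the positions of baseline results outside the PAT limits."""
    return [i for i, v in enumerate(baseline) if v < lpl or v > upl]

def check_baseline_trigger(baseline, lpl, upl, max_outliers):
    """Flag a retest when baseline outliers exceed a user-entered threshold."""
    outliers = baseline_outliers(baseline, lpl, upl)
    return {
        "outlier_indices": outliers,
        "retest_baseline_bin": len(outliers) > max_outliers,
    }
```

At probe, the indices could be mapped back to saved x-y coordinates; at package test, only the retest flag is actionable because individual parts are not serialized.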

In a real-time environment, outliers in the statistical baseline must be post-processed after the lot has been run. In the SPP environment, this is not an issue, since the entire population of data is processed at the same time, after the lot has finished. Handling of outliers in baseline devices is important, even if the actual number of outliers is usually small.

While both PAT solutions offer advantages, the fastest road to meeting reliability requirements without disturbing the test process is via real-time, active quality management that is based on sound statistical methodologies.

Scott Bibbee is director of marketing and co-founder of Pintail Technologies.
