
Problems With Field Failure Analysis September 29, 2014

Posted by Tim Rodgers in Process engineering, Product design, Quality.

We hate to hear from a customer that there’s been a problem with their product, but sometimes it happens in spite of our efforts to create robust design and manufacturing processes, and in spite of whatever testing, inspection, and audits we perform. We want the customer to know that we care, and we typically try to make it right by replacing or repairing the defective unit. We’d also like to rebuild confidence and assure the customer that we know what went wrong, and that we’ve implemented corrective action to prevent that problem from recurring.

Of course that last part assumes that we actually do know what went wrong, and that we understand the root cause and how to eliminate it. That’s why we investigate field failures. We want that failed unit back in our hands so we can take it apart and figure out what happened. We want to know the operating and environmental conditions, and we want to trace the manufacturing history back to individual parts. We’re looking for clues that can help us determine why this unit failed when others didn’t. We apply disciplined problem solving techniques and develop a convincing analysis that we can bring back to the customer.

That sounds good, but I don’t think it’s all that easy. There are real problems when you try to use field failures to improve product quality. Here are two examples:

1. When I was at Foxconn in 2009-11 we built inkjet printers for Hewlett-Packard and reached a peak production volume of 3 million units a month. These were consumer electronics products that were primarily shipped to a worldwide network of distribution centers and retail channels. Customer returns were received by HP or a third party, and a very small number of units made their way back to Foxconn for field failure analysis (FFA).

“No trouble found” (NTF) was always the leading “cause” on the FFA Pareto chart. Many of the returned printers were poorly packaged and poorly handled, making it hard to know where damage might have occurred. We never experienced a “class failure” that affected a large population of printers. The small percentage of returns left us wondering if we were looking at a statistically significant finding, or just a bunch of outliers that would be expected from any complex design and manufacturing process. If we did find a broken part or an error in assembly (both very rare), we would send an alert to the appropriate supplier or production area, but generally the value of FFA hardly seemed to be worth the effort.

2. The other situation was in my last position, where we built solar inverters that were typically installed as part of a large construction project to support a vast “solar farm” of photovoltaic panels for power generation. Inverter production was relatively low volume, and inverters were often customized to meet the specific requirements of the customer. A field failure after installation was addressed by a team of mobile repair technicians who would be dispatched to the site.

The repair team’s performance was measured in part on how quickly they could get the inverter up and running again, and they weren’t going to spend much time trying to understand the root cause of the failure. Their standard approach was to swap out components or subsystems, guided by their technical training, past failures and data logs, until they found a combination that worked. Sometimes the replaced parts would be returned for analysis, but never the entire inverter. Returned parts were either damaged so badly by electrical or thermal failure that they could not be analyzed, or else they passed all testing against their design specifications. We struggled to determine whether any problem was due to defective parts, poor workmanship in assembly, or a product design that unexpectedly subjected the part to conditions that were beyond the part’s design capabilities.
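The question from the first example, whether a handful of returns is a statistically significant finding or just a bunch of outliers, can at least be framed numerically. What follows is a minimal sketch, not a description of what we actually did: it assumes you have some baseline failure rate you consider normal for a given failure mode, and it asks how surprising the observed number of returns would be under a Poisson model with that rate. The function name and every number in it (units shipped, baseline rate, observed returns) are hypothetical placeholders, not data from either company.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    A small value suggests the observed count is hard to explain
    by the assumed baseline rate alone.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    # Accumulate P(X = 1) ... P(X = observed - 1)
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # All values below are hypothetical, chosen only for illustration.
    units_shipped = 100_000      # assumed population behind the returns
    baseline_rate = 50e-6        # assumed "normal" rate for this failure mode
    observed_returns = 12        # assumed count of returns with this failure mode

    expected_returns = units_shipped * baseline_rate
    p_value = poisson_tail(observed_returns, expected_returns)
    print(f"expected {expected_returns:.1f}, observed {observed_returns}, "
          f"P(X >= observed) = {p_value:.4f}")
```

A calculation like this is only as good as its inputs. If only a small and unknown fraction of failed units ever comes back, or if the baseline rate is a guess, the result says little, which is really the point of both examples above.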

I don’t think these are unusual situations. FFA often must contend with small numbers of returned units and the possibility that they may be compromised by poor handling at the “crime scene.” Data logs from the failed unit may be available to help understand the operating conditions before the failure, but that depends on whether the customer will grant access. A broken part without the larger system doesn’t tell you much, but often that’s all you get. Even if the analysis does get to a root cause in manufacturing or the supply chain, it’s too easy to dismiss it as an outlier. Given all these limitations, why do we put our faith in FFA as part of our quality management system? Is this just to make our customer feel better?


