
Are You Looking For Root Cause, Or Someone to Blame? August 15, 2016

Posted by Tim Rodgers in Management & leadership, Operations, Process engineering, Quality.

When I worked as a quality manager in my first career I was often required to investigate quality failures to determine the cause. There were times when it was pretty easy to figure out, but in an uncontrolled business environment it can be hard to identify a simple dependent relationship between cause and effect. There are usually multiple contributing factors. Sometimes a small thing (the cause) can become a big thing when it’s overlooked (another cause).

Most of the other managers I worked with didn’t have much patience with the complexities of root cause analysis. They wanted a simple, actionable outcome: this is the cause, and if we eliminate it then this problem will never happen again (right?), so let’s eliminate the cause. The people who are impacted by a quality failure want answers, and they want to feel confident that the business has taken decisive and effective action. They don’t want to endure an extended period of uncertainty and exposure to risk while the business figures out what to do to prevent recurrence.



Quality Decisions in Hindsight July 25, 2016

Posted by Tim Rodgers in Management & leadership, Operations, Organizational dynamics, Process engineering, Product design.

For the last several years there’s been at least one high-profile case of quality failure that captures the attention of the business press for months at a time. Through late 2015 and early 2016 we’ve been watching to see if air-bag supplier Takata, iconic auto maker Volkswagen, and fast food chain Chipotle will survive their highly-publicized quality missteps. There’s always a lot of apologizing to the public, and a commitment to conduct internal investigations to identify and eliminate the causes of field failures. Senior management and boards of directors scramble to regain the trust of their customers.

I’m not at all surprised by the frequency of these events. What surprises me is that these events don’t happen more often. We should expect to continue to hear about similar catastrophic quality problems from otherwise reputable companies despite all the talk about six sigma and customer satisfaction, and despite all the investments in quality improvement programs. It’s the nature of business.


The Battle Over Discrepant Material January 19, 2016

Posted by Tim Rodgers in Quality, Supply chain.

Quality issues have been on my mind a lot lately, specifically some of the more frustrating things that I’ve had to deal with during my career as a quality manager. In my last job my team was responsible for managing the discrepant material review (DMR) process for our US-based factory.

For those who are unfamiliar, the DMR process is how most factories deal with raw materials or other inputs that have been identified as possibly defective and unsuitable for use. Incoming materials that don’t pass visual inspection or other testing are supposed to be sequestered so they can’t go into production. Later, the DMR process is used to determine what to do with that material. The choices are usually: use the material as-is (accept the deviation), rework or repair it, return it to the supplier, or scrap it.


Problems With Field Failure Analysis September 29, 2014

Posted by Tim Rodgers in Process engineering, Product design, Quality.

We hate to hear from a customer that there’s been a problem with their product, but sometimes it happens, in spite of our efforts to create robust designs and manufacturing processes, and in spite of whatever testing, inspection, and audits we perform. We want the customer to know that we care, and we typically try to make it right, either by replacing or repairing the defective unit. We’d also like to rebuild confidence and assure the customer that we know what went wrong, and that we’ve implemented corrective action to prevent that problem from recurring.

Of course that last part assumes that we actually do know what went wrong, and that we understand the root cause and how to eliminate it. That’s why we investigate field failures. We want that failed unit back in our hands so we can take it apart and figure out what happened. We want to know the operating and environmental conditions, and we want to trace the manufacturing history back to individual parts. We’re looking for clues that can help us determine why this unit failed when others didn’t. We apply disciplined problem solving techniques and develop a convincing analysis that we can bring back to the customer.

That sounds good, but there are problems when you’re trying to use field failures to improve product quality, and I don’t think it’s all that easy. Here are two examples:

1. When I was at Foxconn in 2009-11 we built inkjet printers for Hewlett-Packard and reached a peak production volume of 3 million units a month. These were consumer electronics products that were primarily shipped to a worldwide network of distribution centers and retail channels. Customer returns were received by HP or a third party, and a very small number of units made their way back to Foxconn for field failure analysis (FFA).

“No trouble found” (NTF) was always the leading “cause” on the FFA Pareto chart. Many of the returned printers were poorly packaged and poorly handled, making it hard to know where damage might have occurred. We never experienced a “class failure” that affected a large population of printers. The small percentage of returns left us wondering if we were looking at a statistically significant finding, or just a bunch of outliers that would be expected from any complex design and manufacturing process. If we did find a broken part or an error in assembly (both very rare), we would send an alert to the appropriate supplier or production area, but generally the value of FFA hardly seemed to be worth the effort.
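The Pareto tally at the heart of FFA is simple to sketch. The findings and counts below are entirely hypothetical, with “no trouble found” in the lead as described above:

```python
from collections import Counter

# Hypothetical FFA findings for a batch of returned units -- the counts
# are made up for illustration, with "NTF" (no trouble found) leading.
findings = (["NTF"] * 62 + ["shipping damage"] * 15 + ["broken part"] * 4
            + ["assembly error"] * 3 + ["firmware fault"] * 2)

counts = Counter(findings)
total = sum(counts.values())

# Pareto ordering: most frequent finding first, with cumulative percentage.
cumulative = 0
for cause, n in counts.most_common():
    cumulative += n
    print(f"{cause:16s} {n:3d}  {100 * cumulative / total:5.1f}%")
```

When the top bar is NTF, the chart is telling you more about the limits of the analysis than about the product.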

2. The other situation was my last position where we built solar inverters that were typically installed as part of a large construction project to support a vast “solar farm” of photovoltaic panels for power generation. Inverter production was relatively low volume, and inverters were often customized to meet the specific requirements of the customer. A field failure after installation was addressed by a team of mobile repair technicians who would be dispatched to the site.

The repair team’s performance was measured in part on how quickly they could get the inverter up and running again, and they weren’t going to spend much time trying to understand the root cause of the failure. Their standard approach was to swap out components or subsystems, guided by their technical training, past failures and data logs, until they found a combination that worked. Sometimes the replaced parts would be returned for analysis, but never the entire inverter. Returned parts were either damaged so badly by electrical or thermal failure that they could not be analyzed, or else they passed all testing against their design specifications. We struggled to determine whether any problem was due to defective parts, poor workmanship in assembly, or a product design that unexpectedly subjected the part to conditions that were beyond the part’s design capabilities.

I don’t think these are unusual situations. FFA often must contend with small numbers of returned units and the possibility that they may be compromised by poor handling at the “crime scene.” Data logs from the failed unit may be available to help understand the operating conditions before the failure, but that depends on whether the customer will grant access. A broken part without the larger system doesn’t tell you much, but often that’s all you get. Even if the analysis does get to a root cause in manufacturing or the supply chain, it’s too easy to dismiss it as an outlier. Given all these limitations, why do we put our faith in FFA as part of our quality management system? Is this just to make our customer feel better?

The Danger of Quick Fixes September 17, 2014

Posted by Tim Rodgers in Management & leadership, Process engineering, Quality.

I think it’s fair to say that most people make better decisions when they have more time. With more time we can collect more data, consult with people who have more experience, and weigh the alternatives before choosing a course of action. In the specific case of problem solving, we can propose alternate root causes and perform experiments to verify the cause before implementing a solution. This kind of disciplined approach helps ensure that the problem doesn’t reoccur.

The thing is that in business we rarely have enough time, or all the time we wish we had. All of us make small daily decisions about how to spend our time and resources based on external priorities and internal heuristics. Some of us have jobs in rapidly-changing or unstable environments, with periodic crises that need management attention. Unresolved situations create ambiguity in the organization, and ultimately they cost money, and this cost creates pressure to do something quickly. There’s an emotional and perceptual component as well: it “looks better” when we’re doing “something” instead of sitting and thinking about it. After all, “you can always fix it later.”

Of course “fixing it later” comes at its own cost, but that’s often underestimated and underappreciated. It’s tempting to implement a quick fix while continuing to investigate the problem. It takes the pressure off by addressing the organization’s need for action, which is both good and bad. The danger is that the quick fix becomes the de facto solution when the urgency is removed and we become distracted by another problem. The quick fix can also bias subsequent root cause analysis, especially if it appears to be effective in the short term.

Please note that I’m not suggesting that every decision or problem solving effort requires more time and more inputs. I’m not advocating “analysis paralysis.” We’re often faced with situations where we have to work with incomplete and sometimes inaccurate data that may not represent the true problem. Sometimes a quick fix is exactly what’s needed: a tourniquet to stop the bleeding. However, corrective action is not the same as preventive action. If we want better decisions and better long-term outcomes, let’s not forget that a quick fix is a temporary measure.

Seeing the Forest to Improve Quality May 14, 2014

Posted by Tim Rodgers in Process engineering, Product design, Quality, Supply chain.

A few weeks ago I listened to a presentation by a quality engineer who gave an overview of his company’s processes for measuring and improving first-pass yield (FPY). He started with a fairly standard graph showing the trend in FPY over time, and later presented a detailed breakdown of the individual defects found at the end-of-line testing of the product. It was a typical Pareto analysis, with a problem solving focus on the defects that occurred most frequently.

All very straightforward and by-the-book, but it seemed to me that there was something missing. Certainly part of our job in quality is to address the problems that occur most often, but it should also be about detecting trends and implementing preventive action, not just corrective action.

I asked the speaker if he had tried to classify the individual defects into categories of some kind (Answer: no). In this case, for a hardware product, one simple classification scheme would be to group the defects by root cause, such as design, workmanship, supplier, or test procedure. A large number of defects in one root cause category would indicate the need for a more generalized problem solving approach that prevents different, but similar, defects from occurring in the future.

You might get there eventually, if you ask enough “whys” during the corrective action root cause analysis, but too often this results instead in a localized fix to a specific defect. We don’t see the forest for the trees, and we end up chasing individual defects instead of addressing the real causes. It may look good to reduce or eliminate defects one-at-a-time, but in quality we should be working toward defect prevention, and that requires more detailed analysis.
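As a sketch of the kind of categorization I asked the speaker about, consider the defect log below. The defect names and category tags are hypothetical; the point is that grouping defects by root-cause category can surface a systemic problem that individual defect counts hide:

```python
from collections import Counter

# Illustrative end-of-line defect log: each defect is tagged with a
# root-cause category (the specific defects are made up for this sketch).
defects = [
    ("connector not seated", "workmanship"),
    ("cracked housing", "supplier"),
    ("missing screw", "workmanship"),
    ("out-of-spec resistor", "supplier"),
    ("false test failure", "test procedure"),
    ("solder bridge", "workmanship"),
    ("tolerance stack-up", "design"),
]

# Individually the defects look scattered, but grouping by category
# can reveal a dominant root cause that deserves a systemic fix.
by_category = Counter(category for _, category in defects)
dominant, count = by_category.most_common(1)[0]
print(f"Dominant category: {dominant} ({count} of {len(defects)} defects)")
```

Here three unrelated-looking defects share a workmanship cause, which points toward training or process controls rather than three one-off corrective actions.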

What Are Individual Accomplishments Within a Team Environment? April 7, 2014

Posted by Tim Rodgers in Management & leadership, Process engineering, Project management.

The other day I responded to a question on LinkedIn about whether performance reviews were basically worthless because we all work in teams and individual accomplishments are hard to isolate. It’s true that very few jobs require us to work entirely independently, and our success does depend in large part on the performance of others. But, does that really mean that individual performance can’t be evaluated at all?

If I assign a specific task or improvement project to someone, I should be able to determine whether the project was completed, although there may be qualifiers about schedule (completed on-time?), cost (within budget?), and quality (all elements completed according to requirements?). However, regardless of whether the task was completed or not, or if the results weren’t entirely satisfactory, how much of that outcome can be attributed to the actions of a single person? If they weren’t successful, how much of that failure was due to circumstances that were within or beyond their control? If they were successful, how much of the credit can they rightfully claim?

I believe we can evaluate individual performance, but we have to consider more than just whether tasks were completed or if improvement occurred, and that requires a closer look. We have to assess what got done, how it got done, and the influence of each person who was involved. Here are some of the considerations that should guide individual performance reviews:

1. Degree of difficulty. Some assignments are obviously more challenging with a higher likelihood of failure. Olympic athletes get higher scores when they attempt more-difficult routines, and we should credit those who have more difficult assignments, especially when they volunteer for those challenges.

2. Overcoming obstacles and mitigating risks. That being said, simply accepting a challenging assignment isn’t enough. We should look for evidence of assessing risks, taking proactive steps to minimize those risks, and making progress despite obstacles. I want to know what each person did to avoid trouble, and what they did when it happened anyway.

3. Original thinking and creative problem solving. Innovation isn’t just something we look for in product design. We should encourage and reward people who apply reasoning skills based on their training and experience.

4. Leadership and influence. Again, this gets to the “how.” Because the work requires teams and other functions and external partners and possibly customers, I want to know how each person interacted with others, and how they obtained their cooperation. Generally, how did they use the resources available to them?

5. Adaptability. Things change, and they can change quickly. Did this person adapt and adjust their plans, or perhaps even anticipate the change?

This is harder for managers when writing performance reviews, but not impossible. It requires that we monitor the work as it’s being done instead of evaluating it only after it’s completed, and that we recognize the behaviors that we value in the organization.

Trends and Early Indicators March 24, 2014

Posted by Tim Rodgers in Management & leadership, Process engineering, Quality.

Everybody understands that it’s better to be proactive and avoid problems rather than be reactive and respond after the problem has surfaced. In quality we try to shift from fixing defects to defect prevention. In strategic planning and project management we identify risks, assess their impact, and develop mitigation plans. If we could know in advance that something bad is about to happen, we could surely avoid it.

And of course that’s the problem: it’s hard to accurately predict the future. We may have identified a serious risk, but we underestimated its likelihood. We knew there was a good chance it would happen, but we couldn’t predict when it would happen. We put a plan in place to reduce the risk, but we had no way of knowing if the plan worked until it was too late.

Control charts are a great tool for monitoring a process. Once you’ve established process stability and eliminated special causes, the process will operate within a range of variability defined by common causes. Rules based on probability and statistical significance help determine when the process is starting to drift from stability, which gives the process owners time to investigate and eliminate the cause.
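A minimal sketch of the simplest control-chart rule, flagging a point beyond the three-sigma limits. The baseline measurements are made up, and a real individuals chart would typically derive its limits from the moving range rather than the sample standard deviation:

```python
import statistics

# Establish limits from a stable baseline: the control limits come from
# the process's own common-cause variation (mean +/- 3 sigma).
# These measurements are hypothetical, for illustration only.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 9.7, 10.0, 10.3, 9.9]
mean = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma

def out_of_control(x):
    """Simplest Shewhart rule: a single point beyond either 3-sigma limit."""
    return x > ucl or x < lcl

new_points = [10.1, 9.8, 11.2]  # the last point drifts well above the UCL
flags = [out_of_control(x) for x in new_points]
print(flags)
```

Additional run rules (e.g., several consecutive points on one side of the mean) catch slower drifts before a point ever crosses a limit.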

That’s great for measurable processes that are repeated frequently, but there are a lot of business processes that are neither. We can’t identify, much less eliminate, the causes of variability. You can wait until the process is complete, measure its effectiveness, and make improvements before the next iteration, but that’s still reactive, and it can be expensive when we miss the target. We need tools to predict the outcome before the process is complete so we can make course corrections as necessary.

We need leading indicators to determine if we’re on-track or heading off the cliff. In project management you can look at schedule, task completion, earned value, and budget trends. Risk planning can include triggers that provide early warning (i.e., if this happens, then we know we’re OK / we’re in trouble). Products and software can be designed to enable early testing of high-risk subsystems and interfaces, and manufacturing process parameters that impact critical performance requirements (determined from FMEA or PFMEA) can be monitored. We will always rely on judgment and experience to minimize risk, but if we don’t implement warning systems we might as well use a crystal ball.
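As one sketch of a leading-indicator trigger, the earned-value check below uses hypothetical project figures and an assumed warning threshold of 0.9 for the schedule and cost performance indices:

```python
# Illustrative leading-indicator check using earned-value metrics.
# All figures are hypothetical; the 0.9 trigger threshold is an assumption.
planned_value = 100_000  # budgeted cost of work scheduled to date
earned_value = 82_000    # budgeted cost of work actually completed
actual_cost = 95_000     # actual cost of the work completed

spi = earned_value / planned_value  # schedule performance index
cpi = earned_value / actual_cost    # cost performance index

THRESHOLD = 0.9  # "if this dips below 0.9, we know we're in trouble"
warnings = [name for name, value in [("schedule", spi), ("cost", cpi)]
            if value < THRESHOLD]
print(warnings)
```

The value of a trigger like this is that it fires mid-project, while there’s still time to act, instead of at the post-mortem.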

Adding Value With Less March 13, 2014

Posted by Tim Rodgers in Management & leadership, strategy.

One of the most common complaints I hear from managers and individual contributors is that they never have the resources they need to get the job done. The schedule, deliverables, or both are impossibly unrealistic because they’ve been denied the budget, the hiring authority, or the access to the internal staff that they really need. When they fail to achieve their objectives, it’s because upper management (or whoever has the authority to approve their requests) got in their way.

In fact, they’re probably right: upper management may have been directly responsible for refusing their request for more time or resources, but were they given any reason to do otherwise? People often present these decisions as an equivalency between results and resources. “I can complete this if you give me that.” But, have they presented a convincing argument that supports that equivalency? Have they presented other options, or explained the risks of operating with less-than-adequate resources?

Put yourself in the position of the person who controls the resources. Their best-case scenario is that you will be able to do the job within the schedule with no additional cost beyond what has already been budgeted. There’s going to be some natural resistance to any request for more resources (or at least there should be if they’re managing within a budget), and the burden of proof is on the requestor.

The mistake that people make is framing this as a binary choice: either they get everything they ask for, or they’re doomed to failure. As a manager, I’m generally open to multiple options. I want to know what can be done, and what the risks are, at a variety of “price points.” I want to brainstorm about pros and cons, priorities, and alternatives that may not be obvious. It’s this kind of collaborative problem solving that leads to better decisions and adds value in an organization. It also helps the team understand and appreciate the constraints that the business is operating within, which builds commitment. Finally, making tradeoffs and learning how to get things done with less are important skills that strengthen the organization.

Statistics and the Translation Problem January 31, 2014

Posted by Tim Rodgers in Management & leadership, Process engineering, Quality.

Folks who are trained in statistical analysis methods can become frustrated when working with untrained folks, and the feeling is often mutual. Business leaders are looking for clear guidance, and all the talk about probabilities, confidence intervals and “the results fail to reject the null hypothesis” doesn’t sound very helpful. I believe that in most cases the problem isn’t with the answer given by the statistical analysis, it’s with the question.

This reminds me of the “word problem” problem that causes a lot of people to give up on math in school. They can solve x+2=3, but they can’t figure out when train A will pass train B when their speeds and locations are given. It’s actually a translation problem, an inability to express the proper mathematical representation of the question that’s posed. If you can express the question correctly the rules of algebra will get you the right answer, but it’s hard to get there.
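To make that translation step concrete, here is one made-up version of the train problem worked through: express each train’s position as a function of time, set the positions equal, and solve. The speeds and head start are assumptions chosen for illustration:

```python
# Hypothetical setup: train A travels at 60 mph; train B starts 30 miles
# ahead on the same track, moving 45 mph in the same direction.
# Positions: A(t) = 60t and B(t) = 30 + 45t, so A passes B when
# 60t = 30 + 45t, i.e. (60 - 45) * t = 30.
speed_a, speed_b, head_start = 60.0, 45.0, 30.0

t = head_start / (speed_a - speed_b)  # hours until A draws level with B
position = speed_a * t                # miles from A's starting point
print(t, position)
```

The algebra is trivial once the question is written down correctly; the hard part, as with statistics, is the translation.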

In inferential statistics there are usually two translations required: from the real-world question to a statistical question, and then from the statistical answer to a real-world answer (or interpretation of the results). If people aren’t happy with the results of the analysis, it could be because they (1) asked the wrong statistical question, or (2) didn’t understand the statistical answer.

A little education can go a long way here. At minimum, those who pose the questions and interpret the answers need to understand the concepts of populations, samples, and using sample data to infer the properties of the population. And, those with the statistical training need to develop a good understanding of the business to translate the initial real-world question into an appropriate analysis.
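Here is a sketch of the two translations using a made-up example: the real-world question “did the new process reduce the defect rate?” becomes a one-sided two-proportion z-test on samples, and the p-value is then translated back into a business answer. The sample counts and the 0.05 significance level are assumptions:

```python
import math

# Real-world question: did the new process reduce the defect rate?
# Statistical question: is the new sample's defect proportion significantly
# lower than the old sample's? (one-sided two-proportion z-test)
defects_old, n_old = 40, 1000  # hypothetical sample from the old process
defects_new, n_new = 22, 1000  # hypothetical sample from the new process

p_old, p_new = defects_old / n_old, defects_new / n_new
p_pool = (defects_old + defects_new) / (n_old + n_new)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
z = (p_old - p_new) / se

# One-sided p-value from the standard normal CDF (via the error function).
p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Translation back to the real world: a small p-value means the observed
# improvement is unlikely to be common-cause sampling noise alone.
improved = p_value < 0.05
print(round(z, 2), round(p_value, 4), improved)
```

The business answer isn’t “we reject the null hypothesis”; it’s “the improvement is probably real, and here’s how confident we are.”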
