
When the Sample Size Is Small

June 15, 2012

Posted by Tim Rodgers in Process engineering, Quality.

Are things getting better, getting worse, or staying the same? We ask these questions all the time in business and we look at numbers as an objective way to help us understand what’s going on. Often this is a situation where we’ve taken some kind of action that was intended to cause an improvement in a key performance metric. We want to know if the action was successful. We typically collect data or otherwise measure the system after the changes have been implemented. If the performance metric improved, we celebrate — satisfied that our intervention has “worked.” If the metric doesn’t improve, we go back to the drawing board and try again.

That all sounds pretty straightforward and obvious, but in many cases we’re comparing a small number of measurements, sometimes as few as two (one “before” and one “after”), and drawing conclusions about whether the difference between those numbers represents a real change. A statistician would put it this way: we’re trying to determine whether the “before” and “after” states represent two different populations. Can the difference in the measurements be attributed to different populations, or is it due to natural, common-cause variation within the same population? The abbreviated form of this question is: “Is the difference statistically significant?”

Generally speaking, you need a large number of measurements (a large sample size) to determine whether the difference between two means is statistically significant at a given confidence level. The statistical models and methodology are well established: a t-test compares two means, and analysis of variance (ANOVA) extends the same idea to more than two groups. The smaller the sample size, the harder it is to draw any definitive conclusion with high confidence, and a sample size of one (one measurement “before,” one measurement “after”) is virtually meaningless, statistically speaking.
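To make the sample-size point concrete, here is a minimal sketch in Python. It uses an exact permutation test rather than ANOVA, simply because it needs nothing beyond the standard library and makes no assumption about the shape of the distribution. The function name and the measurement values are made up for illustration. The idea: if “before” and “after” really come from the same population, then the labels are arbitrary, so we can ask how often a random relabeling of the pooled data produces a difference in means at least as large as the one we observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(before, after):
    """Two-sided exact permutation test on the difference of means.

    Returns the fraction of all possible relabelings of the pooled data
    whose absolute difference in means is at least as large as observed.
    Hypothetical helper for illustration; only practical for small samples,
    because it enumerates every possible split.
    """
    pooled = list(before) + list(after)
    n_before = len(before)
    observed = abs(mean(after) - mean(before))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_before):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(mean(group_b) - mean(group_a))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# One measurement before, one after (made-up values).
p_single = permutation_p_value([10.0], [11.0])

# Five measurements before, five after (made-up values).
p_five = permutation_p_value(
    [10.1, 9.8, 10.3, 10.0, 9.9],
    [10.9, 11.2, 10.8, 11.1, 11.0],
)
```

With one measurement on each side, there are only two ways to label the data and both give the same difference, so `p_single` is 1.0 no matter how far apart the two numbers are. With five clearly separated measurements on each side, only 2 of the 252 possible relabelings are as extreme as what was observed, so `p_five` is about 0.008.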

In the factory where I worked in China we were building literally millions of products each month, so it wasn’t difficult to collect a large number of measurements representing the before and after states. Customer or employee surveys conducted before and after can be designed with a large enough sample size to draw meaningful conclusions.

However, in many cases it’s impossible to collect enough data to prove conclusively that our measurements of the after state represent a different population. This is important because the systems we’re working with are typically complex and are not characterized by direct cause-and-effect relationships. The change that we make in order to make things better may have no effect at all, or may even have the opposite of the intended effect. Sometimes a change may not have a favorable impact on the system right away because of slack or delays in the system.

The point is this: we can’t get carried away by a single data point, or even the first few data points. They don’t necessarily prove anything. All we can do is keep taking measurements over time to see if the change still has the intended, directional effect, while staying open to the possibility that the improvement isn’t real or lasting.
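One rough way to think about “keep taking measurements” is to count how many consecutive readings land on the improved side of the old baseline. The sketch below is only an illustration under strong assumptions that are mine, not part of any formal method described here: the baseline is a known typical value (such as the “before” median), each new reading is independent, and if nothing really changed each reading is equally likely to fall on either side. Under those assumptions, the chance of k readings in a row all landing on the improved side is 0.5 to the power k. The function names are hypothetical.

```python
def chance_of_run(k):
    """Probability that k independent readings all fall on the improved
    side of the baseline if nothing actually changed (assumes a 50/50
    split per reading)."""
    return 0.5 ** k


def readings_needed(alpha=0.05):
    """Smallest run length whose chance probability is at or below alpha."""
    k = 1
    while chance_of_run(k) > alpha:
        k += 1
    return k


def improved_run_length(baseline, readings, higher_is_better=True):
    """Count consecutive readings, from the first one taken after the
    change, that fall on the improved side of the baseline. Stops at the
    first reading that does not."""
    run = 0
    for value in readings:
        improved = value > baseline if higher_is_better else value < baseline
        if not improved:
            break
        run += 1
    return run


one_point = chance_of_run(1)      # a single "better" reading
needed = readings_needed(0.05)    # run length needed at the 5% level
run = improved_run_length(10.0, [10.4, 10.2, 9.9, 10.6])  # made-up data
```

A single better reading happens half the time by chance alone (`one_point` is 0.5), and under these assumptions it takes five in a row (`needed` is 5) before chance drops below 5%. In the made-up series, the run stops at two because the third reading falls back below the baseline.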


