02/10

Arbitron’s Small Sample Problem – It’s Worse Than You Think

It’s not news that Arbitron has a problem with small samples in PPM markets. Every broadcaster in these markets has bumped into this problem at one time or another, usually with either delightful or devastating ratings consequences.

Let’s look at this problem from a different perspective:

Forget ratings and consider cancer rates in the US. Did you know that the lowest cancer rates occur in the least populated communities? Aha, obviously it’s the quality of life in these sparse communities that contributes to their greater health, right?

Now what if I were to tell you that the highest cancer rates also occur in the least populated communities?

Oops.

Both findings hold, each for a different set of small communities, but neither reflects a real difference in health. Both are artifacts of what statisticians and broadcasters alike call the “small sample problem.”

Small samples yield extreme results more often than large samples do. Every broadcaster knows this, of course, but the impact is far worse than you think.
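To see how the cancer-rate paradox arises, here is a minimal simulation sketch. It assumes every community shares exactly the same true rate (a hypothetical 5%) and differs only in size; the community sizes and counts are made-up illustrations, not real cancer data. Ranking the observed rates shows that both the lowest and the highest rates tend to come from the small communities, purely from sampling noise.

```python
import random


def simulate_rates(populations, true_rate, seed=0):
    """Return (population, observed_rate) for each hypothetical community.

    Every community shares the same true_rate, so any spread in the
    observed rates comes purely from sampling noise.
    """
    rng = random.Random(seed)
    results = []
    for pop in populations:
        # Each resident independently "has the condition" with probability true_rate.
        cases = sum(1 for _ in range(pop) if rng.random() < true_rate)
        results.append((pop, cases / pop))
    return results


if __name__ == "__main__":
    # Hypothetical: 200 small towns of 50 people and 200 large towns of 5,000.
    pops = [50] * 200 + [5000] * 200
    ranked = sorted(simulate_rates(pops, true_rate=0.05), key=lambda r: r[1])
    # Which community sizes produced the 10 lowest and 10 highest observed rates?
    print("10 lowest rates come from sizes:", sorted({p for p, _ in ranked[:10]}))
    print("10 highest rates come from sizes:", sorted({p for p, _ in ranked[-10:]}))
```

Swap "community" for "station" and "residents" for "meters" and the same logic applies: the fewer meters behind a number, the more likely it lands at an extreme.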

Daniel Kahneman discusses this issue in some depth in his book Thinking, Fast and Slow:

Imagine a large urn filled with marbles. Half the marbles are red, half are white. Jack draws 4 marbles on each trial, Jill draws 7. They both record each time they observe a homogeneous sample—all white or all red. If they go on long enough, Jack will observe such extreme outcomes more often than Jill—by a factor of 8.

By a factor of 8! Extreme (and erroneous) results are 8 times more likely with a sample of 4 than with a sample of 7. And how many meters are tuned to your station right now?
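The factor of 8 follows from simple arithmetic. Assuming independent draws from a 50/50 urn (large enough that removing a few marbles doesn't change the mix), a sample of n is homogeneous with probability 2 × (1/2)^n: 1/8 for Jack's 4 marbles and 1/64 for Jill's 7. The sketch below computes that exactly and, as a sanity check, estimates it by simulation; the function names are illustrative only.

```python
import random
from fractions import Fraction


def p_homogeneous(n, p_red=Fraction(1, 2)):
    """Exact probability that n independent draws are all red or all white."""
    p_white = 1 - p_red
    return p_red ** n + p_white ** n


def simulate_homogeneous(n, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = [rng.random() < 0.5 for _ in range(n)]  # True = red
        if all(draws) or not any(draws):
            hits += 1
    return hits / trials


jack = p_homogeneous(4)  # 2 * (1/2)**4
jill = p_homogeneous(7)  # 2 * (1/2)**7
print(jack, jill, jack / jill)  # 1/8 1/64 8
print(simulate_homogeneous(4), simulate_homogeneous(7))
```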

But it gets worse.

Because PPM meters are distributed in household-sized clusters, meters in the same home tend to be exposed to the same radios. When that happens, the small sample problem is exaggerated still further, since the “4 marbles” you draw may well come from the same household and the same radio environment. That is, 4 act more like 1. This takes the small sample error and elevates it to an altitude so high the truth will resemble tiny ants on the ground below.
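One standard way statisticians put a number on "4 act more like 1" is Kish's design effect for cluster samples: with clusters of m respondents and an intra-cluster correlation ρ, the effective sample size is n / (1 + (m − 1)ρ). The sketch below assumes that formula is a reasonable model for PPM household clusters; the household size and correlation values are hypothetical illustrations, not Arbitron figures.

```python
def effective_sample_size(n, cluster_size, icc):
    """Kish's approximation: n_eff = n / (1 + (m - 1) * icc).

    n            -- number of meters contributing to the estimate
    cluster_size -- meters per household (m), hypothetical, must be >= 1
    icc          -- intra-household correlation of listening (0..1), hypothetical
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("icc must be between 0 and 1")
    design_effect = 1 + (cluster_size - 1) * icc
    return n / design_effect


# Four meters, all from one hypothetical 4-person household:
for icc in (0.0, 0.5, 1.0):
    print(icc, effective_sample_size(4, cluster_size=4, icc=icc))
# 0.0 4.0  -> independent listeners: four meters count as four
# 0.5 1.6
# 1.0 1.0  -> identical listening: four meters carry the information of one
```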

Anyone can see this problem vividly displayed in PPM ratings for online radio streams. Where such streams appear in the ratings, it’s easy to compare these “ratings” with the actual metrics for your stream’s listeners in your own MSA and see that there is approximately zero correlation between the two numbers. Any agency that prefers a small-sample estimate over a 100% accurate census measurement of all listeners should have its head examined. What does it say about the veracity of our message and our messengers when we’re using numbers we know to be flat-out false?

Look, there are a lot of good people working hard for positive outcomes at Arbitron.  That is a fact.

But statistics are also a fact.  Arbitron has offered the radio industry the best electronic measurement we can afford.  Arbitron has done this knowing precisely the statistical consequences of what that money can buy.  We look to Arbitron for a standard of precision that we and our advertising partners can trust, no matter how much or how little we pay for the measurement technology.

Without trust, without confidence, our ratings system will be too easily questioned, and our agency friends will use those questions to bludgeon us on rate, assuming they don’t veer off to media where metrics have a greater ring of truth.

Broadcasters need a solution to this problem or we will pay the price with our advertisers.
