Thursday, 6 December 2012

The Sensitivity of Tests

Today's junkmail brought an invitation to have someone ultrasound my arteries to test for heart disease.  This annoys me on many levels, not least the crass advertising gimmicks used.  I'm not planning to participate because I don't think it will tell me anything useful, and I won't reconsider until they talk to me in the language of science.

What I'm after is the Positive Predictive Value (PPV) and Negative Predictive Value (NPV) of this test for... well, whatever potential future event they're promising me that I don't need to worry about.  As I'm not worried to begin with, I can spend that £150 on cake instead.

I'll explain PPV and NPV, and the related diagnostic terms Specificity and Sensitivity, using something far more important to modern life - testing for witches.

Of course: The absolute test for whether someone is a witch or not is if he or she burns in hellfire for all eternity after dying.  For many people, that test doesn't happen soon enough, so there's the more conventional test of whether a witch floats in water or not.  Despite many centuries of refinement, even the best techniques aren't a perfect match for being Satan's future boyfriend/girlfriend.

There are two things that a test needs to be for it to be worthwhile: It needs to tell us what's true is true, and it needs to tell us what's false isn't true.  In our witchcraft example, the suspect needs to float if he/she's a witch and sink if he/she's not a witch.  Unfortunately there are consistently cases of known witches drowning and saints floating on water, so we need to measure how effective this test is.

We call the truthiness of a test's "yes" its specificity.  A high specificity means that innocents rarely float, so if the answer's yes then it probably means yes, while a low specificity means that plenty of innocents float too, so a yes on its own tells us little.  We can give specificity a figure by dividing the number of times that a true negative occurred by every time an innocent is tested (which is the true negatives and the false positives - the innocents that drowned and the innocents who floated).

The counterpart of specificity is sensitivity.  A high sensitivity means that witches rarely drown, so if the answer's no then he's very unlikely to be a witch, while if the sensitivity is low then even the fact of drowning means that he could well have been a witch all along.  We can give sensitivity a figure by dividing the number of times a true positive occurred by every time a true witch is tested (the true positives and the false negatives - the witches that floated and the witches that drowned).
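Using the standard definitions, the two figures can be worked out from a simple tally. This is only a sketch: the counts below are invented for illustration and do not come from any real trial, and the function names are my own.

```python
# Hypothetical tally of trial outcomes, invented purely for illustration.
# "Positive" means the suspect floats; "negative" means the suspect sinks.
witches_floated = 45     # true positives
witches_drowned = 5      # false negatives
innocents_floated = 90   # false positives
innocents_drowned = 860  # true negatives


def sensitivity(true_pos, false_neg):
    """Share of true witches that the test catches (they float)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of innocents that the test clears (they sink)."""
    return true_neg / (true_neg + false_pos)


sens = sensitivity(witches_floated, witches_drowned)
spec = specificity(innocents_drowned, innocents_floated)

print(f"Sensitivity: {sens:.3f}")
print(f"Specificity: {spec:.3f}")
```

Note that each figure only looks at one group: sensitivity ignores the innocents entirely, and specificity ignores the witches.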

The use of these measures is in deciding whether a test is worthwhile or not, especially because there is a tendency that the more specific the test, the less sensitive it is - particularly when the test gives a range of answers and the cut-off is drawn somewhere along the scale.  Take blood pressure as an example: The lower you draw the cut-off for hypertension, the more people you're going to include who could potentially have complications of blood pressure (increasing sensitivity - catching the right ones) but the more you include who will end up not having complications (decreasing specificity - catching the wrong ones).
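The trade-off can be seen by sliding a cut-off along a scale and recomputing both figures each time. The readings below are made up for illustration (they are not clinical data), and `rates_at` is a hypothetical helper name.

```python
# Invented systolic readings in mmHg, for illustration only.
with_complications = [135, 142, 150, 158, 165, 171, 180]
without_complications = [110, 118, 124, 129, 136, 141, 152]


def rates_at(cutoff):
    """Return (sensitivity, specificity) when readings >= cutoff count as positive."""
    true_pos = sum(1 for bp in with_complications if bp >= cutoff)
    false_neg = len(with_complications) - true_pos
    true_neg = sum(1 for bp in without_complications if bp < cutoff)
    false_pos = len(without_complications) - true_neg
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


for cutoff in (130, 140, 150, 160):
    sens, spec = rates_at(cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these particular numbers, each step up in the cut-off loses some sensitivity and gains some specificity. Real data will be messier, but the direction of the trade is the same.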

Now instead of looking at a test in terms of whether it will judge a True Witch to be a witch or an innocent, we can look at it the other way - to what degree does a positive mean Witch?

Positive Predictive Value (PPV) uses the maths differently to evaluate whether we have to burn everyone who floats.  PPV is calculated by dividing the number of Floating True-Witches by all the times that someone Floats.  A high number means that it's unlikely an innocent has been falsely labelled Witch, while a low number means that many of those who float are innocent and we just can't tell.

Meanwhile Negative Predictive Value (NPV) looks at the people who sink.  Are we right to mourn the person who sank, or could they have been a witch all along?  NPV involves dividing the number of Sinking Innocents by all those that sank.  High values mean we can have confidence in the immortal soul of the innocents who die passing the test, whereas low values mean that they could have been a witch all along.
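The two predictive values can be sketched from a tally in the same way. The counts are invented for illustration, in a village where witches are assumed to be rare (50 witches among 1,000 suspects); the function names are my own.

```python
# Hypothetical tally, invented purely for illustration.
# "Positive" means the suspect floats; "negative" means the suspect sinks.
witches_floated = 45     # true positives
witches_drowned = 5      # false negatives
innocents_floated = 90   # false positives
innocents_drowned = 860  # true negatives


def positive_predictive_value(true_pos, false_pos):
    """Share of everyone who floats that really is a witch."""
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(true_neg, false_neg):
    """Share of everyone who sinks that really is innocent."""
    return true_neg / (true_neg + false_neg)


ppv = positive_predictive_value(witches_floated, innocents_floated)
npv = negative_predictive_value(innocents_drowned, witches_drowned)

print(f"PPV: {ppv:.3f}")
print(f"NPV: {npv:.3f}")
```

In this made-up village the test catches nine witches in ten and clears roughly nine innocents in ten, yet only a third of those who float are witches, because there are so many more innocents around to float by mistake. That is why I want the PPV and NPV rather than a vague promise of accuracy.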

Wikipedia has nice tables showing how this all fits together.

We live in an imperfect world, and understanding the statistics of testing allows us to determine whether imperfect tests are worthwhile, particularly when the consequences of a false result are as severe as drowning or burning.

==Update 26/12/12==
I've now created a nice one-page PDF showing the relationships between the different measures, to aid my fellow witchfinders in their quests:

Witchcraft testing