
False Negatives Can Be a Matter of Life and Death

Algorithms will repeat our mistakes unless we know what we're missing.

Photo: Ground truth. Photographer: Bulent Kilic/AFP/Getty Images

It’s easier to know when you’ve got things right than to understand exactly how you got things wrong. Therein lies an insight that could save a lot of lives.

Consider relationships. You can know whether you’re happy with your partner, but you can never be sure you shouldn’t have dated someone else entirely. Or say you own a company. Good and bad employees are easy to recognize: Some work hard and are happy, others shirk or quit soon after their expensive training ends. But it’s much harder to know whether you’ve passed over good candidates, because you never gave them a chance to prove themselves.

Recognizing such false negatives matters a lot, particularly in the world of big data. Suppose you’re creating a hiring algorithm. If you train it to recognize good employees based on the track records of people you’ve hired, it will mimic your biases and miss the potentially good qualities of all the people you rejected. In other words, if you don’t know what mistakes your process is making, your algorithm will be doomed to repeat them -- and possibly even expand on them, because there will be nothing to stop it.
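The blind spot is easy to see in a toy simulation. The sketch below uses entirely made-up numbers -- the variables `skill` and `pedigree` and the hiring rule are hypothetical, not drawn from any real hiring system: judged only by the records it keeps, a screen that filters on pedigree looks like it works, while ground truth on the rejected pool shows how many good candidates it threw away.

```python
# A minimal sketch, with invented data, of why false negatives stay invisible
# without ground truth: a screen that filters on pedigree rather than skill
# looks fine by its own records, because the good candidates it rejected
# never get a chance to prove themselves.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

skill = rng.normal(size=n)       # what actually predicts success on the job
pedigree = rng.normal(size=n)    # visible but mostly irrelevant credential
would_succeed = skill + 0.3 * rng.normal(size=n) > 0   # outcome if given the chance

# A biased historical screen: hires mostly on pedigree, barely on skill.
hired = pedigree + 0.2 * skill > 1.5

# What the company's own records can show: the hired look better than average.
print(f"success rate among hired:  {would_succeed[hired].mean():.2f}")
print(f"success rate overall:      {would_succeed.mean():.2f}")

# What only ground truth on the rejected can reveal: the false negatives.
missed = would_succeed & ~hired
print(f"good candidates rejected:  {missed.sum()} "
      f"({100 * missed.sum() / would_succeed.sum():.0f}% of all good candidates)")
```

Train an algorithm on the first two lines of output and it will happily conclude the pedigree screen works; only the last line, which requires observing the people who were never hired, exposes the false negatives.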

Same goes for higher-stakes situations. The New York Times recently reported that the Pentagon has been undercounting civilian casualties from airstrikes against the Islamic State in Iraq by a factor of more than 31. Clearly, the military needs a mechanism for recognizing such false negatives -- and Times reporters Azmat Khan and Anand Gopal demonstrated how to do it by visiting the sites of airstrikes, conducting interviews and counting the dead. This is what researchers call ground truth: the deep study of errors through direct observation.

Seeking ground truth can be expensive, but it’s worth the effort -- and not just because innocent civilians are dying and we’re telling ourselves lies. It’s also important because the same casualty data are likely being used to train the most lethal of algorithms: artificial intelligence attached to weapons. Such autonomous weapons are being pushed as a new, unbiased approach to warfare that removes human error. But of course they will not be unbiased if they are trained on information that makes them blind to civilian casualties.

We need to invest in much more extensive, expensive and time-consuming ground truth if we are going to develop high-stakes algorithms that depend crucially on good information. For airstrikes, that will mean doing a much better job of counting the dead.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

To contact the author of this story:
Cathy O'Neil at cathy.oneil@gmail.com

To contact the editor responsible for this story:
Mark Whitehouse at mwhitehouse1@bloomberg.net
