Our anonymous data is not so anonymous
A few "anonymous" data points and boom - there you are. And you have a lot of "anonymous" data out there.
I enjoy a solid detective story. There’s something glorious about the way a keen mind can connect the dots: tracks in the dirt, a dark sedan captured on the security footage of a corner store, the nervous behavior of someone close to the crime. Even the most adept criminal cannot escape the trail of clues he or she unwittingly leaves behind. All it takes is the right detective.
In fact, when it comes to tracking down a person using nothing but their online data, a grad student and $20 should do it.
And I’m not even talking about tracking down a criminal. I’m talking about tracking down you and me: regular ol’ folks who maybe once left a department store with an undergarment we didn’t pay for (but that was decades ago, and we’ve lived with the shame ever since) and who otherwise try to do the right thing.
My point: We’re not as anonymous online as we’d like to believe.
Sure, the websites and apps we use love to brag about how anonymous our data is. Maybe they’re even trying to make it so. But the truth is, it doesn’t take much to connect the data dots and identify just about anyone. Even us regular, non-crime-committing folk.
I wrote a couple weeks ago about how a Catholic newsletter got information about a priest’s private life by accessing and analyzing location data gleaned from his smartphone. That information prompted the priest’s immediate resignation from his role as top administrator for the U.S. Conference of Catholic Bishops. Regardless of what you think about what his data revealed, the larger point is that his data, which he no doubt believed was anonymous, was not.
That’s not news to people who’ve been looking at data for decades. Latanya Sweeney discovered how easy it is to connect the dots back in 2002. Last year, The Markup shared how Sweeney, a graduate student in computer science at MIT in 2002, used voter registration records and insurance company health records to ID former Massachusetts Governor William Weld.
Long story short: In 1996, then-Governor Weld was taken to the hospital after collapsing at a public event. The hospital created a medical record detailing his care, the tests they ran, his diagnosis, and his prescription. That medical record was presumed private, much like you or I presume our medical records are private. But the Governor’s health insurer opted to sell its health records to researchers. Using those health records and voter registration records, which Sweeney purchased for $20, she matched up the data and narrowed the field to six possible patient records that could be Weld’s. Three of the six were men, and only one of those men shared Weld’s zip code.
“Sweeney found that 87 percent of the U.S. population could be identified by just three data points: zip code, date of birth, and gender,” writes Sara Harrison on The Markup.
Read the full story: When Is Anonymous Not Really Anonymous?
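For the curious, the matching trick Sweeney used can be sketched in a few lines of Python. This is only an illustration, not her actual method or data: every record below is made up, and the field names (record_id, zip, dob, sex) and the link_records function are my own assumptions. The idea is simply to line up two lists on the three shared data points and keep the health records that point to exactly one voter.

```python
from collections import defaultdict

# The three data points from the study; the field names are illustrative assumptions.
QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def link_records(health_records, voter_records, keys=QUASI_IDENTIFIERS):
    """Map each health record id to a voter name when exactly one voter matches."""
    # Group voter names by their combination of quasi-identifier values.
    voters_by_key = defaultdict(list)
    for voter in voter_records:
        voters_by_key[tuple(voter[k] for k in keys)].append(voter["name"])

    matches = {}
    for record in health_records:
        candidates = voters_by_key.get(tuple(record[k] for k in keys), [])
        # Only an unambiguous match (a single candidate) re-identifies someone.
        if len(candidates) == 1:
            matches[record["record_id"]] = candidates[0]
    return matches


# Entirely fictional data: "anonymous" health records with no names attached...
health_records = [
    {"record_id": "H1", "zip": "02138", "dob": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"record_id": "H2", "zip": "02139", "dob": "1950-01-15", "sex": "M", "diagnosis": "condition B"},
    {"record_id": "H3", "zip": "02140", "dob": "1982-06-30", "sex": "F", "diagnosis": "condition C"},
]

# ...and a public voter list that does have names.
voter_records = [
    {"name": "Voter One", "zip": "02138", "dob": "1950-01-15", "sex": "M"},
    {"name": "Voter Two", "zip": "02140", "dob": "1982-06-30", "sex": "F"},
    {"name": "Voter Three", "zip": "02140", "dob": "1982-06-30", "sex": "F"},
]

matches = link_records(health_records, voter_records)
print(matches)
```

In this toy example only record H1 gets a name attached. H2 has no matching voter, and H3 matches two voters, so it stays ambiguous. The more data points two lists share, the fewer ambiguous cases are left.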
Twenty-one years later, the amount of data that exists about each of us goes far beyond what existed in 2002. There are ever more records and ever more ways to combine multiple data sets to drill down to a single individual, even if those data sets appear anonymous on the surface.
“The data is typically stripped of the most obvious identifying information like a name, email or cell number,” writes Justin Sherman on WIRED. “However, it still contains information that could reveal the person behind it, such as a device ID, an IP address or an advertising identifier. With the right outside information or a third-party service, so-called anonymous data can be de-anonymized.”
Read the full story: Big Data May Not Know Your Name. But It Knows Everything Else
Why does this matter to any of us?
Sara Harrison of The Markup writes, “Deanonymized health data could be used by insurers to discriminate against patients. Anonymized web browsing data has been combined with publicly available information from Twitter to re-identify who did which searches. Location data could be used to track people’s movements, monitor where they pray, who they see, or whether they’re involved in political groups.”
That’s all a very big deal.
Your name doesn’t have to be attached to your data for people and companies to know it’s yours. (But let’s be honest, they probably have your name, too.)
And data brokers are happy to sell it.
Big Data May Not Know Your Name. But It Knows Everything Else - WIRED
How smartphone data can be used to learn secrets - The Washington Post
A priest’s phone location data outed his private life. It could happen to anyone - The Washington Post
Unique in the Crowd: The privacy bounds of human mobility - Nature