Why do we need common data?

Today, personal data is being used to make decisions at every level of modern life. Electoral politics, tax policy, health care, finance, and education have all been radically changed by the capacity to collect and store personal data on an unprecedented scale. Private companies collect even more data than the government, trying not only to figure out what book you will buy next, but also how many calories you are willing to consume in a single cookie. We, as individuals, also have some access to personal data, which we use when we peruse online hotel reviews or check house sales in our neighborhood. Decisions are being made for us, and occasionally by us, based on enormous collections of personal data.

The free flow of information has always been a central principle of democracy. Yet we as a society are not truly scrutinizing whether all this data is being collected in a manner consistent with our democratic principles. Information is quickly becoming the most valuable resource in our modern economy, but even as scandals arise and fears of privacy invasion grow, we are not asking ourselves the truly difficult questions: Who has access to my personal information? What is it being used for? Is it being used for me or against me?

Perhaps the most important question of all is, “How could this valuable resource benefit all of us and not just the government agencies, institutions, and private corporations who hold it now?”

The Common Data Project seeks to answer these questions and propose a new way of thinking about personal data that could revolutionize the way we see ourselves and our society.

Who has access to my personal data? How did they get it?

Personal information is being collected from us every day. Although prescient fiction writers have warned us to watch out for Big Brother, the government, as it turns out, is not what we need to be wary of. The data from the census and tax returns are dwarfed by the quantity and quality of data that private corporations can access, buy, and sell to one another. Not only do companies store our name, address, and credit card information, they track what we buy and even what we type into search engines.

Despite the sensitive nature of this information, it is collected mostly as a byproduct of transactions. Insurance companies, pharmaceutical companies, banks, and ordinary retailers all collect information, but those who provide online services have a particular ability to capture every action of their customers for aggregation and analysis. They bait individuals with a free service and depend on general apathy to get them to agree to a lot of fine print in privacy policy statements and “End User License Agreements.” These agreements are essentially meaningless because no one, not even the service provider, understands what the implications of agreeing to their terms will be.

These documents include vague clauses that refer to collecting data about the customer’s use of the service, and sometimes cast even broader nets, collecting data about general usage of the user’s computer or the information they access over the internet. Most include clauses that state the agreement may change at any time and without any notice. Every effort is made to minimize specifics about what exactly is being collected, both to minimize liability around inaccuracies and because over-specificity might garner undesired attention and scrutiny.

What do we get in return for our personal data? Right now, nothing. We may be able to seek some penalties for misuse, but only after misuse has occurred and only after we prove there were damages. This is because we don’t own our personal information; it is not considered our property. Yet our data is valuable enough to companies that buying and selling it is big business.

What is my data used for? What’s wrong with the status quo?

Karl Rove made headlines for using data on wine versus beer preferences in electoral campaigns, but personal data is used by the government to shape less-sexy policies that nevertheless affect all of us. “The proposed tax cut will help the middle class”; “the proposed fare hike will not affect many transit riders”; “this community needs x amount of federal dollars”—all these claims are made based on personal data, whether collected from the U.S. Census, surveys, or other datasets. On an even larger scale, private companies use personal data every day. Private companies mainly use it to try to sell us more things, but that can mean something obvious, like Amazon making book recommendations, or something less obvious, like using Nielsen ratings to determine how much advertising should cost.

None of these uses is necessarily nefarious. However, there are several significant problems with the status quo. First, there is a great deal of information out there that is personal, valuable, and yet completely inaccessible to the people who created it. Access to information is a crucial condition for democracy, which is why we zealously guard our freedoms of speech, press, and association. That some of the largest collection and analysis of information in the world is being done without our explicit participation has serious implications beyond the relatively simple crime of identity theft. The potential harm to our democracy is much graver than any digital divide that is occurring because of socioeconomic disparities in the availability of laptops.

Second, many of the decisions that impact our lives are being made based on flawed or limited data. Private corporations are limited by law from gathering certain kinds of data, and current methods of surveying people are unreliable, yet millions of dollars, both private and public, are spent every day based on inaccurate data.

Third, as bad as the situation is, it could very easily get worse. The status quo cannot be sustained on its current course. Consumers are increasingly aware of how much of their information is being collected, especially when a government laptop is stolen, or when Yahoo helps the Chinese government identify an online dissident. Even private corporations are unhappy with the status quo. They are limited in their ability to create high-quality, accurate, and detailed personal data sets, and they know the government is likely to squash their ability to collect even the data they currently collect.

If this happens, we as a society will never find out what we might have achieved with this valuable resource.

How could opening up access to personal data benefit all of us?

For the first time in human history, we not only have the technical capacity to crunch a lot of numbers, we are inadvertently generating a wealth of numbers (primarily as a byproduct of the internet boom) that pushes data analysis well beyond abstracted mathematical models and into the realm of approximating the richly textured and self-contradictory mess of reality.

An individual with access to anonymized aggregates from similar individuals would have a much more powerful tool than a Google search to determine what kind of medical treatment is best, what kind of investment strategies make the most sense, or what a house is really worth (to them). That individual would not have to rely only on the advice of doctors being paid by pharmaceutical companies, financial advisers with interest in the companies whose stocks they sell, or shady real estate developers.

On a broader level, a large database of anonymized, aggregated data would allow a public health official to determine how best to use limited resources to fight disease. A researcher could analyze, rather than theorize, whether government tax policies have the effects legislators claim. Cutting-edge technology for anonymization and encryption could allow government agencies to be transparent, as mandated by law, without jeopardizing individuals’ privacy. We could actually create programs that work and eliminate those that don’t.

We are at the brink of something huge and utterly unprecedented. In the same way Gutenberg couldn’t fully anticipate how society would be revolutionized by his printing press, we cannot anticipate how the world could change through smart, thoughtful collection, sharing, and analysis of personal information. The technology has raced ahead of our ability to understand its potential. But private companies have already realized that incredible potential exists, and it’s time for the public to realize this, too.

What can we do?

We are at a crucial moment in time. Consumers are concerned about invasions of privacy and identity theft. Corporations are at the limit of what they can collect, and fear data collection will be shut down altogether. Private and public sector interests are aligned in that we all know the status quo cannot continue. We thus have the opportunity to work together to make change now, by shifting the discussion from “privacy versus information” to “privacy and information.”

The Common Data Project seeks to foster awareness and dialogue around privacy issues and to promote a solution that returns ownership and control of personal data to individuals without shutting down the full potential of information sharing for society as a whole. To that end, CDP is currently working on creating new standards and policies that would enable the creation of a “datatrust,” a new kind of data store that allows users and institutions to securely and privately share the information they choose to share. Ultimately, however, the Common Data Project actively seeks any and all solutions that would effectively solve the data and privacy problems facing us today. The Common Data Project pledges to be transparent and open in the policies it promotes and invites participation from individuals, researchers, agencies, and private companies.

Continue on to our White Paper.