White Paper

A snapshot of our thinking as of December 2008.

If you haven't already, you may want to begin by reading: What motivates us?

The “Information Age” has delivered on its promise of more information and more access to information for more people than ever before, “digital divide” notwithstanding. At the same time, the boundaries of publicly accessible information have shifted to include not only people and activities in the public domain, but private lives as well. You can Google the names of your favorite celebrity’s children. You can also Google the names of any casual acquaintance's children, not to mention where your acquaintance lives, for how long he’s lived there, how much he paid for his house and how much he owes in property taxes each year.

Yet despite our powerful and at times frightening ability to collect and access information, there is valuable information-sharing that is not happening. Researchers and government regulators want to analyze mortgage records to figure out what caused the subprime mortgage crisis. Academics and policy wonks want to peruse IRS records so they can analyze whether government tax policies do what they claim. The FDA wants to track any unforeseen risks to patients after drugs are approved. Patients desperate for a cure want researchers to share their results and quickly.

Even individuals yearn to share more information. The users of PatientsLikeMe have demonstrated there are many who see a real value in sharing detailed medical information as they seek effective treatment for Parkinson’s, multiple sclerosis, and HIV/AIDS by examining others’ experiences in excruciating detail. Users of Mint and other online financial services similarly want to take advantage of new information-sharing tools that the “socially aware” web 2.0 is only beginning to provide, even as they’re not quite sure what will happen to their personal information.

We have the technology, we have the capacity. The only thing stopping us is the important issue of privacy.

The problem is partly technical: even when data is “scrubbed” of Personally Identifiable Information like name, date of birth, and Social Security number, it is statistically possible to identify individuals, especially when databases are combined, i.e., shared. Other methods of anonymization sacrifice data fidelity for privacy, making data less specific and therefore less useful. Unlike other forms of property, data can also be easily replicated and distributed multiple times. Even when institutions and individuals want to share information, they have reason to fear that the receiver may then turn around and share or use the information in unauthorized ways, or even merely store it so irresponsibly it becomes vulnerable to attack.
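To make the re-identification risk concrete, here is a minimal sketch of how combining two data sets can undo “scrubbing.” Everything in it is a hypothetical illustration: the records, the field names, and the choice of ZIP code, birth year, and sex as the linking fields are assumptions, not data or methods from any real source.

```python
# Toy, invented data: a "scrubbed" medical table (no names, no SSNs)
# and a separate public list that does carry names.
scrubbed_medical = [
    {"zip": "10001", "birth_year": 1961, "sex": "F", "diagnosis": "multiple sclerosis"},
    {"zip": "10001", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"zip": "10002", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public_roll = [
    {"name": "A. Example", "zip": "10001", "birth_year": 1961, "sex": "F"},
    {"name": "B. Example", "zip": "10001", "birth_year": 1975, "sex": "M"},
    {"name": "C. Example", "zip": "10002", "birth_year": 1975, "sex": "M"},
    {"name": "D. Example", "zip": "10002", "birth_year": 1975, "sex": "M"},
]

# Fields assumed to appear in both data sets.
LINKING_FIELDS = ("zip", "birth_year", "sex")


def linking_key(record):
    """Return the combination of linking fields for one record."""
    return tuple(record[name] for name in LINKING_FIELDS)


def reidentify(scrubbed, public):
    """Attach a name to each scrubbed row whose linking key matches exactly one person."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(linking_key(person), []).append(person["name"])

    matches = {}
    for row in scrubbed:
        names = names_by_key.get(linking_key(row), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches[names[0]] = row["diagnosis"]
    return matches


print(reidentify(scrubbed_medical, public_roll))
# {'A. Example': 'multiple sclerosis', 'B. Example': 'asthma'}
```

The third medical row stays anonymous only because two people in the public list share its linking fields. That is also why coarsening the data (the loss of fidelity mentioned above) can restore privacy: it makes more records look alike.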

But a significant part of the problem is also the current culture of data collection. Existing privacy policies are meant more to protect companies from liability than to protect individuals’ privacy rights. Although users do not yet fully understand what’s happening, they are increasingly aware that data collection is a one-way street. While businesses buy, sell, and share sensitive, personal information, the individuals from whom the information was collected cannot even access their own information. And as recent data leaks have shown, many who are collecting and storing data now cannot be trusted to do so securely. As a result, the debate between privacy advocates and businesses has been framed as a zero-sum game: privacy or information.

The Common Data Project believes that we can and should advocate solutions that promote both privacy and information-sharing. We seek to change the culture of data collection from one where businesses and other data collectors have all the control to one where individual users are secure enough in their privacy to become active participants and consumers of data. Through a broad strategy of public education, development of new standards, and support for related technologies, CDP hopes to achieve the right balance between privacy rights and the benefits of information-sharing.

Below are descriptions of ongoing CDP projects, which we hope will provide interesting and valuable solutions.

A New Breed of Privacy Policies: Raising Standards

Right now, information-sharing is hampered by the legitimate concerns many have about their privacy rights. If more information-sharing is to happen, privacy standards must be raised, not merely maintained. Individuals and institutions must gain new confidence that their privacy will be protected. CDP therefore believes that an important first step is to develop new standards that describe what should be happening, not just what is happening in data collection.

For example, a baseline qualification for getting certified by a privacy compliance organization such as TRUSTe is simply to have a written privacy policy in place. Although a written policy is certainly a necessary component of privacy protection, providing credit for merely having a written policy does not push service providers to go beyond existing standards. Currently, many privacy policies do not even cover all traffic to a website, as they disclaim responsibility for the practices of their partners and/or third-party advertisers. A more comprehensive standard that requires policies to cover all traffic to a site would be a simple first step towards raising the bar on best practices. Although few companies now meet this standard, declaring it to be a possibility would change the scope of public discourse. The promotion of such a standard would inform consumers that privacy policies are not now all-inclusive, while companies willing to meet the standard would be able to signal more clearly how they are different from their competitors.

Additional areas that would benefit from higher standards include:

  1. How much notice is required when the terms of a privacy policy change;
  2. How long data is stored;
  3. How explicitly companies describe how data is used;
  4. How data is secured and anonymized before it is shared with 3rd parties in order to provide an “appropriate” level of protection.

New, improved standards also need to up the ante on how much control and access individuals have to data that is being collected about them:

  5. User access to collected data;
  6. User control over whether data is shared, with whom and for what purpose;
  7. User control over the “level of anonymization” applied to data before it is shared;
  8. User control over whether data can be combined with other data sets;
  9. User control over availability of data for public secondary use.

There are certainly many challenges we face in developing and promoting the application of such standards, but we believe this is an essential conversation to have as we as a society work to reconcile the goals of privacy and information-sharing.

A New Kind of Data Dispensary: A Datatrust

The development of new standards is not only crucial for public education, it is also crucial for the kind of confidence that organizations and individuals need if they are to share information with each other. CDP believes that these standards, as they are developed, could form the basis for a new kind of data-sharing mechanism, a third-party nonprofit “datatrust” with the capacity to securely store and anonymize data while maintaining data fidelity and individual and organizational control over shared data.

As data collection and data-driven decision-making become the norm, the desire and need to get data that others possess but won’t share is becoming increasingly common. One such impasse of late involves subprime mortgage data that consumer advocacy groups and government regulators want (need) and that credit-rating agencies possess but refuse to hand over, because doing so would violate standing privacy agreements with borrowers.

The possibility of secure, anonymized data-sharing through a trusted intermediary would eliminate this privacy barrier and allow organizations to work together in ways previously not possible. With a datatrust, credit-rating agencies could hand their data over to the datatrust with a set of requirements for who may access the data and under what circumstances. Those granted entry would not gain direct access to the data itself, but to anonymized aggregates resulting from targeted queries. Examples of additional “circumstances” credit agencies could specify include: To what extent is the data anonymized? What range of queries is permissible? Can the data be combined with other data sets for more comprehensive analysis? Presumably, such conditions would be heavily negotiated between providers and consumers of sensitive data.
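As a rough sketch of this arrangement, the code below models a datatrust that holds records on behalf of a provider, checks each requester against the provider’s terms, and returns only aggregate counts. All class names, field names, and the small-group suppression rule are assumptions made for illustration. Suppressing small groups is one simple safeguard and does not by itself guarantee anonymity.

```python
from dataclasses import dataclass


@dataclass
class AccessTerms:
    """Conditions a data provider attaches to a deposit (hypothetical structure)."""
    allowed_requesters: frozenset  # who may query at all
    groupable_fields: frozenset    # which fields may be used for aggregation
    min_group_size: int = 5        # aggregates over fewer records are withheld


class Datatrust:
    """Holds records privately and answers only permitted aggregate queries."""

    def __init__(self, records, terms):
        self._records = list(records)  # never returned to requesters directly
        self._terms = terms

    def count_by(self, requester, field_name):
        """Return record counts per value of field_name, subject to the terms."""
        if requester not in self._terms.allowed_requesters:
            raise PermissionError(f"{requester} may not query this data set")
        if field_name not in self._terms.groupable_fields:
            raise PermissionError(f"grouping by {field_name!r} is not permitted")

        counts = {}
        for record in self._records:
            value = record[field_name]
            counts[value] = counts.get(value, 0) + 1

        # Withhold any group too small to hide an individual in.
        return {v: n for v, n in counts.items() if n >= self._terms.min_group_size}


# Invented example data standing in for a provider's mortgage records.
loans = [
    {"borrower": "p1", "region": "north"},
    {"borrower": "p2", "region": "north"},
    {"borrower": "p3", "region": "north"},
    {"borrower": "p4", "region": "south"},
]
terms = AccessTerms(frozenset({"regulator"}), frozenset({"region"}), min_group_size=2)
trust = Datatrust(loans, terms)

print(trust.count_by("regulator", "region"))  # {'north': 3}
```

Here the single “south” record is withheld because it falls below the minimum group size, and a query grouped by borrower, or one from an unlisted requester, raises a PermissionError. In practice the terms would be the product of the negotiation described above.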

In the long run, CDP hopes that its support for a datatrust and related technologies will increase access to personal information by researchers and the general public. In the same way the holdings of a public library benefit the public, CDP hopes aggregate personal information, properly anonymized and secured, will become a valuable resource available to the general public.

A New Way to Enforce Privacy Policies: A Personal “Gatekeeper”

With the increasing ubiquity of online computing, there are few opportunities to feel complete confidence in the privacy of personal computers. Certainly every move in online services is tracked, but even local applications are increasingly “network-aware” and at the very least make periodic connections to remote servers for maintenance purposes. To guarantee privacy, we would need to erect impermeable walls that essentially take personal computing offline in order to enforce a zero-tolerance policy against “leaky” network connections. However, doing so would render most of the software we use either dysfunctional or useless.

In the best-case scenario, individuals could better safeguard their privacy by defining and enforcing their own privacy policies. Therefore, CDP is researching the potential for a Personal Gatekeeper application that would allow users to dictate the terms by which their data is used. The terms of such agreements would be derived from the standards CDP is working to develop, as described above. Of course, many unanswered questions remain in determining how network connections are monitored and terms enforced without placing undue burden on individuals.

Gatekeeping software would:

  1. Provide users with a clear interface to understanding and setting contract terms;
  2. Reject data requests that violate those terms;
  3. Alert individuals when errant requests have been made;
  4. Help users understand the significance of the data they possess; and
  5. Expose for inspection the contents of the data transmissions currently taking place “under the hood,” obscured by vague EULAs and technical jargon.

Ultimately, CDP seeks to explore solutions that redefine the interactions between those who possess personal information and those who clamor to gain access to it, and to change who is empowered to set the terms for access.
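The core gatekeeping behaviors listed above (rejecting requests that violate the user’s terms, alerting the user, and keeping transmissions open to inspection) might look something like the following sketch. It is not a description of any existing CDP software: the class names, the idea of keying terms by data field and purpose, and the example requesters are all assumptions. It also sidesteps the open question of how real network connections would be monitored.

```python
from dataclasses import dataclass


@dataclass
class PrivacyTerms:
    """The user's own policy: data field -> set of purposes the user permits."""
    permitted: dict


@dataclass
class DataRequest:
    """One attempt by an application or service to obtain a piece of data."""
    requester: str
    field_name: str
    purpose: str


class Gatekeeper:
    def __init__(self, terms):
        self.terms = terms
        self.alerts = []  # messages shown to the user about rejected requests
        self.log = []     # every request and its outcome, open to inspection

    def handle(self, request):
        """Return True if the request complies with the user's terms, else False."""
        allowed_purposes = self.terms.permitted.get(request.field_name, set())
        allowed = request.purpose in allowed_purposes
        self.log.append((request, allowed))
        if not allowed:
            self.alerts.append(
                f"Rejected {request.requester}: wanted {request.field_name} "
                f"for {request.purpose}"
            )
        return allowed


terms = PrivacyTerms(permitted={"email": {"account recovery"}})
gate = Gatekeeper(terms)

gate.handle(DataRequest("mail-app", "email", "account recovery"))  # complies
gate.handle(DataRequest("ad-network", "email", "advertising"))     # violates terms
print(gate.alerts)
```

Anything not explicitly permitted is refused by default, which reflects the aim of letting the individual, rather than the data collector, set the terms for access.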

New Technologies

The successful operation of the datatrust would depend, in part, on the development of promising new technologies for anonymization; quantifying and tracking "privacy expenditures" incurred by data requests; managing the reputation of datatrust users; and defining, monitoring, and enforcing "individual" privacy policies. CDP plans to bring together those researching new technologies with those in search of immediate, practical applications in the hope of bridging the gap between theoretical technology and usable, tested solutions.
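One way to picture tracking "privacy expenditures" is a simple ledger in which every data request carries a cost and each user has a fixed allowance. The sketch below assumes exactly that. How a request's cost would actually be measured is one of the open research questions, so the numeric costs, the per-user budget, and the class name are all hypothetical.

```python
class PrivacyLedger:
    """Tracks how much of a fixed privacy allowance each user has spent."""

    def __init__(self, budget_per_user):
        self.budget_per_user = budget_per_user
        self.spent = {}  # user -> total cost of the requests granted so far

    def remaining(self, user):
        """Return the allowance the user has left."""
        return self.budget_per_user - self.spent.get(user, 0.0)

    def charge(self, user, cost):
        """Record a request if the user can afford it; return whether it was granted."""
        if cost <= 0:
            raise ValueError("cost must be positive")
        if cost > self.remaining(user):
            return False  # request refused: allowance exhausted
        self.spent[user] = self.spent.get(user, 0.0) + cost
        return True


ledger = PrivacyLedger(budget_per_user=1.0)
print(ledger.charge("researcher", 0.5))   # True
print(ledger.charge("researcher", 0.5))   # True
print(ledger.charge("researcher", 0.25))  # False
```

Once a user's allowance is spent, further requests are refused, which caps how much any one party can learn through repeated queries.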