"Licensing" Personal Information

A Thought Experiment To Remix Creative Commons Licenses for Personal Information: What, Why, and Challenges to Making It Work

Let us walk you through a thought experiment to try and imagine the ways in which licenses for personal information might work or not work.

  1. What would a Creative Commons-type license for personal information look like?

  2. Why do we need Creative Commons-style licenses for personal information?

  3. What are the challenges and how would you deal with them?

Creative Commons1, in creating its licenses2, did a very sexy thing. It didn’t repeal the Sonny Bono Copyright Term Extension Act, and it didn’t change technology. Yet it managed to shift the social norm around intellectual property. It’s now cool to share. And they did this, not by forcing people to give up their rights, but by offering a set of choices by which those rights can be exercised in a way that encourages collaboration and ultimately benefits the public.

Imitation being the sincerest form of flattery, we at CDP have been playing around with the idea of creating personal information licenses, a la Creative Commons. Right now, we live in a paradoxical world where 1) people have little control over how their information is used and reused, and 2) lots of valuable, fascinating raw data is locked up because of the danger of violating privacy. Big corporations get a lot of value out of their data-mining; researchers and regular individuals, not so much. Modern privacy problems aren’t exactly analogous to modern intellectual property problems, but we think Creative Commons-type licenses could have a lot to offer in addressing these two issues. We’re certainly not the first3 to think along these lines, but we want to add our voice to the ongoing discussion.


1. What would a Creative Commons-type license for personal information look like?

WHAT CHOICES WOULD THE LICENSES OFFER?
Imagine a set of licenses with a specific, pre-determined set of choices. Anyone who wants to signal their willingness to make their personal information available to the public could choose among these licenses and display the chosen license prominently wherever their information is provided, whether it’s an online forum, a social network, or even a personal website or blog.

The choices could include the following:

Notification

  • First ask my permission before using the information

  • Tell me that you are going to use my information.

  • I don’t care.

Commercial / Non-commercial Use

  • First ask my permission before using the information

  • Tell me that you are going to use my information.

  • I don’t care.

Level of Privacy

  • If I’ve provided any identifiers, strip my information of classic identifiers (to be enumerated; most likely name, email address, etc.), though with no guarantee that this equals “anonymous.”

  • If I have not provided any identifiers, do not try to re-identify me.

  • [Intermediary option of better anonymization, should the technology develop.]

  • I don’t care.
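
To make the menu above concrete, here is one way the choices might be written down in machine-readable form. This is only a sketch for the thought experiment, not a proposed standard: the names Permission, Privacy, PersonalInfoLicense, and apply_privacy, the "pil:" tag format, and the short list of "classic identifiers" are all our own placeholders, and the intermediary anonymization option is left out because the technology for it doesn't exist yet.

```python
from dataclasses import dataclass
from enum import Enum


class Permission(Enum):
    """The three options listed under both Notification and Commercial Use."""
    ASK_FIRST = "First ask my permission before using the information"
    TELL_ME = "Tell me that you are going to use my information"
    DONT_CARE = "I don't care"


class Privacy(Enum):
    """The privacy levels from the list above (intermediary option omitted)."""
    STRIP_IDENTIFIERS = "Strip classic identifiers; no guarantee of anonymity"
    NO_REIDENTIFICATION = "Do not try to re-identify me"
    DONT_CARE = "I don't care"


# Assumption: a stand-in for the identifiers a real license would enumerate.
CLASSIC_IDENTIFIERS = frozenset({"name", "email"})


@dataclass(frozen=True)
class PersonalInfoLicense:
    notification: Permission
    commercial_use: Permission
    privacy: Privacy

    def tag(self) -> str:
        """Return a short label (hypothetical format) to display next to the data."""
        parts = (self.notification, self.commercial_use, self.privacy)
        return "pil:" + "/".join(p.name.lower() for p in parts)


def apply_privacy(record: dict, license_: PersonalInfoLicense) -> dict:
    """Return a copy of the record, minus classic identifiers if the license asks."""
    if license_.privacy is Privacy.STRIP_IDENTIFIERS:
        return {k: v for k, v in record.items() if k not in CLASSIC_IDENTIFIERS}
    return dict(record)


if __name__ == "__main__":
    chosen = PersonalInfoLicense(
        notification=Permission.TELL_ME,
        commercial_use=Permission.ASK_FIRST,
        privacy=Privacy.STRIP_IDENTIFIERS,
    )
    print(chosen.tag())
    print(apply_privacy({"name": "Pat", "email": "pat@example.org", "condition": "asthma"}, chosen))
```

Note that stripping the listed fields is exactly as weak as the license language admits: it removes the obvious identifiers and promises nothing about whether what remains is truly anonymous.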

WHAT KIND OF “PERSONAL INFORMATION” COULD BE LICENSED?

The license could be attached to any personal information the individual has gathered and displayed. It could apply to:

Specifics of a medical condition, as shared on an online forum.

An individual’s profile information on Facebook, MySpace, or other social networking site.

An individual’s personal website and/or blog.  


As these examples make clear, we’re not talking about slapping a license on “all personal information” about a person in the abstract universe, but about placing a license on specific bits of data collected and displayed by an individual online. A set of information, a dataset, even arguably a database. It’s an open question, what might be “licensable,” what might even be worth licensing.

Which brings us to the question, is it worth licensing information that’s already out there, in public view? Would a license end up restricting rather than enabling more information sharing?

So what if one person uploads a dataset on her blog, making it public, and then says it’s available for reuse? How does that make the world a better place?

2. Why do we need Creative Commons-style licenses for personal information?

It’s possible that although personal information licenses, a la Creative Commons, wouldn’t solve all data-collection problems today, they could shape and shift the debate in several important ways.

CREATE A PROACTIVE WAY FOR PEOPLE TO TAKE CONTROL OF THEIR INFORMATION.
Right now, we as users generally are told, “Take it or leave it.” We can agree with the terms of use that govern the use of our personal information, or not. A few companies are trying to offer more choices—Firefox has a “Private Browsing” option, Google offers some choices in what interests are tracked.4 But a user almost never gets a choice in how his or her information is used once it’s collected. A set of licenses could be a way to assert control instead of waiting for the choices to be offered. As many privacy advocates have noted, it’s problematic that most privacy choices are offered as an opt-out rather than an opt-in. A set of licenses would create a way to “opt-in” before being asked. Even if the licenses turned out to be difficult to enforce, if the licenses became popular and widespread, it would be harder to ignore that people do have preferences that are not being considered or honored.

CREATE A GRASSROOTS WAY FOR PEOPLE TO ACTIVELY SHARE THEIR INFORMATION FOR CAUSES THEY EXPLICITLY SUPPORT.

We’ve all seen campaigns that are organized around human-interest stories, true stories about real people that are meant to humanize a campaign and give it urgency. The current healthcare debate, for example, inspired a host of organizations to ask people to “share their stories,” the Obama administration’s site being one of the best-organized ones.5

It had the following "Submission Terms":

"By submitting your story, you agree that the story, along with any pictures or video you submit along with the story (the "Submission"), is non-confidential and may be freely used and disclosed, in whole or in part and in any manner or media, by or on behalf of Democratic National Committee ("DNC") in support of health care reform.

You acknowledge that such use will be without acknowledgment or compensation to you.

You grant DNC a perpetual, irrevocable, sublicensable, royalty-free license to publish, reproduce, distribute, display, perform, adapt, create derivative works of and otherwise use the Submission."

Despite the all-or-nothing language, the Obama site was still able to solicit a great number of stories. But the terms underscore a perennial problem for lesser-known organizations. How do people trust an organization with their stories?

A more decentralized set of licenses could allow people to essentially tag their information across the internet and flag that it’s been provided in support of a specific cause, without giving their stories explicitly to another organization. Individuals could also choose to tag their information in support of specific research projects.

The licenses could be an organizing tool, a way for organizations or people without established reputations to gather useful information without asking people to sign away the rights to their stories. Or the licenses could be a research tool, enabling new forms of data collection. Already, sociologists are exploring the possibilities of broadening research beyond the couple hundred subjects that can be managed through more traditional methods. Matt Killingsworth, a graduate student in psychology at Harvard, created an iPhone application that allows research subjects in a study on happiness to rate their happiness in real time, rather than through recollection with an interviewer later.6

Would the existence of standard licenses for sharing personal information make organizing around real stories easier? Could it make personal information-based research easier? Could it make people who support such causes or research, but are uncertain about existing privacy guarantees, more willing to try? We think it’s certainly worth exploring.

MAKE SHARING COOL (AND GOOD)

Creative Commons is not without controversy, but almost everyone would agree that what the organization did manage to do was make sharing work cool. The licenses created an easy way for people who shared the same view of intellectual property to band together and display their commitment. They also made it easier to advertise and sell this ethos of IP to others.

We wonder if a set of licenses for sharing personal information might not be able to do the same. We want to promote sharing information as a virtue, a civic act of generosity, and a way to enable all of us to have more information for decisions. We want donating information to feel like donating blood.

RAISE THE BAR ON USE OF PERSONAL INFORMATION IN RESEARCH, MARKETING, AND OTHER CONTEXTS

It may seem like we’re encouraging less use and reuse of information by imagining a system where people put licenses on information they already make public (see screenshots from the first post.) But what the licenses would make clear, which is not clear now, is that there is a difference between something being put out for the public, for general use and enjoyment, and something being put out for someone else’s reuse, gain, and potential profit. Those who use the license would be signaling clearly their willingness to make their information available for research and other public uses.

About a year ago, researchers at the Berkman Center for Internet & Society at Harvard released a dataset of Facebook profile information for an entire class of college students at “an anonymous, northeastern American university.” As Michael Zimmer pointed out on his blog, however, the dataset was hardly “anonymous.”7 He was quickly able to deduce that the university in question was Harvard. Although some have argued that some of these profiles were already “public,” Zimmer argues (and we agree) that having a public profile does not equal consent to being a research subject:

This leads to the second point: just because users post information on Facebook doesn’t mean they intend for it to be scraped, aggregated, coded, dissected, and distributed. Creating a Facebook account and posting information on the social networking site is a decision made with the intent to engage in a social community, to connect with people, share ideas and thoughts, communicate, be human. Just because some of the profile information is publicly available (either consciously by the user, or due to a failure to adjust the default privacy settings), doesn’t mean there are no expectations of privacy with the data. This is contextual integrity 101.

By creating a license that allows people to clearly signal when they do consent to being “scraped, aggregated, coded, dissected, and distributed,” we would also make clearer that when people don’t clearly signal their consent, that consent cannot be assumed.

ULTIMATELY CREATE NEW SCENARIOS IN WHICH LICENSES CAN BE USED

So far, the scenarios I’ve outlined in which a license could be applied are where information is being displayed openly, as on a website. But the licenses could eventually apply to more closed systems, where the individual’s decision to share data is not itself public.

CDP is working on building a datatrust, a new kind of institution and trusted entity to store sensitive, personal information and make it publicly accessible for research. Individuals and institutions could choose to donate data to the datatrust, knowing that they are contributing to public knowledge on a range of issues. CDP will likely use a system of licenses that allow each data donor to pre-determine his or her preferences on how their data is accessed rather than a single “terms of use” that applies to everyone, take it or leave it.

Similarly, if the licenses were to become popular, other organizations and companies that collect information from their members or account holders would be under pressure to offer these set choices or licenses when people sign up for accounts that require them to provide personal information.

3. What are the challenges and how would you deal with them?

We’ll admit it—we’re not sure how this will work. Yes, sharing information could be cool! People could exercise choices! Companies could be pressured to offer similar choices! But there are certainly obstacles and challenges to creating a system of personal information licenses for common use. We want to identify them and address them, hopefully as a community of people committed to creating more options in the way data is collected and shared.

PERSONAL INFORMATION ISN’T PROPERTY—WHY DO YOU WANT TO PROPERTIZE IT?

The short answer is, we don’t. We’re well aware that there is a history of academic debate on this issue, pro and con, over whether making personal information personal property would make it easier to protect individual privacy. Although the issues are certainly interesting, we don’t want to step into that debate and we don’t think we have to for the licenses we’re imagining.

First, let’s examine how personal information is viewed today. I can’t own a fact. I can’t own the fact that I’m 32, but I can have copyright in an essay in which I state I am 32 and I can have copyright in a database that includes the fact I am 32 if I’m creative in building the structure of that database (in the U.S.).8 We can understand the reasoning behind this. We want to live in a world where facts are “free” to be used and reused without any need to pay a licensing fee.

But the simple declaration, “You can’t own a fact” doesn’t begin to describe the many ways in which people are collecting data, selling it, renting it, and otherwise making money off of it. When a company sells a mailing list, it may not “own” the fact that I live at XYZ Avenue in Brooklyn, but it certainly is using it to its advantage. Why, then, should the fact that I can’t own the fact of where I live keep me from sharing that data as I like and trying to control it in new ways?

The digital revolution is forcing us to think beyond property/not property. Facts have become valuable even when they're not technically "owned" by anyone. I haven’t come up with some snappy new terms to use, but the issue should no longer be defined solely around "property/not property."

Some new businesses seem to be working off this model. BlueKai9 and KindClicks10, while collecting personal information for market research, provide individuals with a way of stating their preferences and monetizing their data. KindClicks, for example, allows individuals who contribute data to then donate the money they make off their data to the charity of their choice. BlueKai collects data through cookies, but provides a link on its site by which users can see what information has been gathered about them. Those who want to opt out can. Those who choose to participate in BlueKai’s registry can then choose to donate a portion of their “earnings” to charity.

We don’t actually want to model these companies’ ways of valuing data. CDP’s mission isn’t to make sure that everybody gets a dollar here or a dollar there every time their data is accessed. To us, the value of such information is immense to the public and yet not easily measurable in dollars. But we do want to explore the idea that we could just take control of our data and obtain value from it, even if it’s the non-monetary, social value of providing something useful to the public.

HOW WOULD THESE LICENSES BE ENFORCEABLE? WHAT ABOUT EXISTING TERMS OF USE ON ONLINE FORUMS, SOCIAL NETWORKS?

This is a big question. I’m not sure what kind of dataset could be licensable and the extent to which a license could cover facts within that dataset. Could the license really encourage new forms of sharing if there was no way to prevent people from using individual facts within that dataset outside of the license terms? How useful would such a license be?

Arguably, Creative Commons licenses are not easily enforced, although case law is moving toward recognizing that they are enforceable. In 2006, a district court in Amsterdam upheld the terms of a Creative Commons license,11 while the U.S. Court of Appeals for the Federal Circuit ruled in 2008 that holders of open source licenses, like Creative Commons licenses, are able to seek injunctive relief for copyright infringement, rather than merely seeking relief for violation of a contract.12 But the vast majority of people using photographs, art, and other work outside the terms of the license do it with impunity. Most CC license holders never find out their Flickr photo was used outside the license terms, and most wouldn’t have the resources to do anything about it even if they did find out. Yet CC licenses have still managed to impact societal norms on intellectual property.

Personal information licenses may still have an effect, then, on societal norms about how information is collected and shared regardless of how much the licenses are litigated. Even the process of litigation may help us as a society have a smarter conversation about current practices.

As to the objection that the licenses wouldn't work in the face of existing terms of use for social networks and other sites -- the fact that I might not be able to "license" my own information that I put myself on Facebook just underscores why creative, proactive, even aggressive strategies might be necessary.

WHY WOULD YOU ENCOURAGE PEOPLE TO PUT THEIR PERSONAL INFORMATION OUT IN PUBLIC? ISN’T IT IRRESPONSIBLE TO ENCOURAGE PEOPLE TO PROVIDE INFORMATION THAT COULD INCREASE RISK OF IDENTITY THEFT AND FRAUD?

I don’t want to dismiss this concern off-hand. But as my father likes to say, everything in life has good and bad. There are many things we do that are risky, and we try our best to minimize those risks, both as a society when we pass laws and as individuals when we take more particular, personal measures. Driving is a very dangerous activity. It is also a very valuable one. Many governments have decided to legislatively require the wearing of seatbelts. Many of us personally make the decision to practice other safe driving techniques that aren’t legally required.

We think it’s imperative that we, as a society, think hard and carefully about how to minimize the risks of personal information being used, collected, and exchanged. Creative Commons-style licenses for personal information sharing may or may not be the best way to address today’s privacy problems. I'm curious to hear if you think the risks outweigh the benefits and why. But to shut down the idea solely because the risk exists -- that is not going to help push the conversation forward.

Conclusion

Licenses for making personal information more widely available for research and public use—would they work? Maybe, maybe not. Worth exploring? Most definitely.



FOOTNOTES
1 http://creativecommons.org/
2 http://creativecommons.org/about/licenses
3 There are others who are interested in applying Creative Commons principles to the issue of privacy and data, such as the group described here: http://www.securitycatalyst.com/creative-commons-for-privacy/. Their focus is a little different from ours, as they’re more focused on creating simple concepts for privacy rather than data-sharing.
4 As described in our blog post in March 2009, Google’s behavioral targeting ad program included features that long-time privacy advocates, including EFF, found worthy of praise. Google now will show which interests are being tracked and users can opt out of being tracked category by category. http://blog.myplaceinthecrowd.org/2009/03/27/transparent-google/
5 Available at http://stories.barackobama.com/healthcare/.
6 Described at http://bits.blogs.nytimes.com/2009/07/29/if-youre-happy-and-you-know-it-tell-your-phone/.
7 http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.
8 More information available here: http://sciencecommons.org/resources/faq/databases#canicc
9 http://www.bluekai.com/index.html
10 https://kindclicks.com/
11 http://www.groklaw.net/article.php?story=20060316052623594/
12 http://creativecommons.org/press-releases/entry/8838/