The Datatrust Privacy Guarantee: Protecting the Datatrust from Compelled Disclosure

  1. Introduction: Why we need to understand privacy rights.

  2. A quick overview of federal privacy law.

  3. Implications for data collectors today.

  4. Implications for the datatrust.

1. Introduction: Why we need to understand privacy rights.

What’s more personal to you? Papers you have in your home or documents you store in the cloud? The prescription drugs you keep in your medicine cabinet or medical data stored at your pharmacy? The questions you ask your spouse in bed or the questions you type into your search engine?

In the U.S., the Fourth Amendment protects us from unlawful search and seizure, meaning that the police can’t just come barging into your home unless they have “probable cause” that you have committed a crime.

But now so much of what we consider “personal” is in the form of data, i.e., bits that can be easily replicated, shared, and stored in places very far from our homes, cars, or any other physical space that is protected by the Fourth Amendment.

It's not just a question of, “What can the government find out about you from Facebook?”

What can the government find out about you from third parties? From companies and businesses that have your data without your consent or even your knowledge?

Pharmacies sell prescription data that includes you; cellphone-related businesses sell data that includes you. So much of the data economy involves companies and businesses that don’t necessarily have you as a customer, and thus have even less incentive to protect your interests.

Some of this data is anonymized, some of it is not. But even data that’s supposedly de-identified or anonymized isn't actually private. We know that such data can be combined with another dataset to re-identify people.
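To make the re-identification risk concrete, the Python sketch below performs a simple linkage attack. Both datasets, every name in them, and the field names `zip`, `birth_date`, and `sex` are hypothetical, invented for illustration: one is an “anonymized” pharmacy extract with names removed, the other a public roster, such as a voter list, that still carries names. Joining the two on the fields they share is enough to put names back on some of the “anonymous” records.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: names stripped, quasi-identifiers kept.
deidentified = [
    {"zip": "10027", "birth_date": "1975-03-14", "sex": "F", "drug": "sertraline"},
    {"zip": "10027", "birth_date": "1982-11-02", "sex": "M", "drug": "atorvastatin"},
    {"zip": "94110", "birth_date": "1990-07-21", "sex": "F", "drug": "metformin"},
]

# Hypothetical public roster (for example, a voter list) that includes names.
public_roster = [
    {"name": "Alice Smith", "zip": "10027", "birth_date": "1975-03-14", "sex": "F"},
    {"name": "Bob Jones", "zip": "10027", "birth_date": "1982-11-02", "sex": "M"},
    {"name": "Carol Lee", "zip": "60614", "birth_date": "1990-07-21", "sex": "F"},
]

# Fields that appear in both datasets and together narrow down who someone is.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build the join key from the quasi-identifiers shared by both datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, roster):
    """Attach a name to each anonymous record whose key matches exactly one person."""
    names_by_key = defaultdict(list)
    for person in roster:
        names_by_key[key(person)].append(person["name"])
    matches = []
    for record in anonymous:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record["drug"]))
    return matches


print(reidentify(deidentified, public_roster))
# [('Alice Smith', 'sertraline'), ('Bob Jones', 'atorvastatin')]
```

The third record stays unmatched only because no one in this small roster shares its combination of fields.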

And we at the Common Data Project, in creating a datatrust, are stepping right into the middle of all of this.

We seek to create a new kind of institution, a datatrust, where organizations can safely release personal data without compromising individual privacy. Like any organization or business that stores a lot of personal information, we know that law enforcement officials may end up very interested in the data that we have. We need to understand how existing laws and proposed reforms might apply to us, and to be thoughtful and creative about what it means to protect privacy rights today. We don’t have all the answers, but we need to start with defining the questions.

2. A Quick Overview of Federal Privacy Law

Basic Fourth Amendment Rights

(Much of the information below is from the Electronic Frontier Foundation’s clear and easy-to-understand “Surveillance Self-Defense Project.”)

Right now, the U.S. government’s ability to simply grab documents from your home is limited by the Fourth Amendment of the Constitution, which protects you from “unreasonable searches and seizures.” This is why on cop shows, you see police showing search warrants when they show up to search your house.

A search warrant requires the police to demonstrate “probable cause,” a reasonable belief that a person has committed a crime. The police must apply to a judge for the search warrant, specifying what will be searched and seized, and why they believe they have probable cause, such as a tip from an informant.

This requirement does not only apply to your home. It may apply to your office, your hotel room, your car, anywhere that you might have a reasonable expectation of privacy.

But none of this generally applies to information you give to a third party, such as Facebook.

Or, more precisely, it does not apply to almost any “third party.” As EFF explains,

“…you will often have no Fourth Amendment protection in the records that others keep about you, because most information that a third party will have about you was either given freely to them by you, thus knowingly exposed, or was collected from other, public sources. It doesn’t necessarily matter if you thought you were handing over the information in confidence, or if you thought the information was only going to be used for a particular purpose.”

So what about financial records? Telephone records? Aren’t medical records protected?

Congress has passed special laws around certain classes of personal data, including some information collected by third parties like Facebook.

Electronic Communications: The Electronic Communications Privacy Act of 1986

The Electronic Communications Privacy Act of 1986 (codified in part at 18 USC chapter 119) creates special protections for electronic communications, such as telephone calls. ECPA prevents the police from placing wiretaps willy-nilly.

The ECPA also has provisions for email, but because the law was passed in 1986, the provisions related to Internet communications are both outdated and unclear.

The Stored Communications Act, which is part of ECPA, governs what the government can access from communications service providers, such as your cellphone company, your Internet service provider, or an email provider like Gmail. (It gives weaker protection to data held by “remote computing services,” companies that are not communications service providers but provide data storage services, and websites or search engines that store information but do not provide communications services may fall outside it entirely. More on this below.)

Some communications, like emails and voicemails, receive the strongest protection. The government can access emails, voicemails and other communications content stored by communications service providers ONLY IF the following conditions are met.

  • If the email or voicemail message is unopened or unlistened to, AND has been in storage for 180 days or less, the government must obtain a search warrant, though it need not notify you.

  • If you’ve opened or listened to the email or voicemail message, OR it has been stored unopened for more than 180 days, the government can use a special court order or a subpoena to access it. Both are easier to get than a search warrant, though in these cases you must be notified.

The Ninth Circuit interprets the law differently: even an opened email can still be “in electronic storage,” so the government must get a warrant if the email has been stored for 180 days or less. That means you get a little more protection if you're in the Ninth Circuit, which covers Alaska, Arizona, California, Hawaii, Idaho, Montana, Nevada, Oregon, and Washington. Elsewhere, the above provisions apply.
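The stored-content rules above amount to a small decision procedure. The Python sketch below encodes them as this section summarizes them, including the Ninth Circuit’s reading; it is a simplified illustration rather than a statement of the law, and the function and parameter names are hypothetical.

```python
def process_required_for_content(opened: bool, days_in_storage: int,
                                 ninth_circuit: bool = False) -> str:
    """Return the legal process needed to compel stored email or voicemail
    content from a communications service provider, following the rules as
    summarized in this section (hypothetical, simplified helper)."""
    recent = days_in_storage <= 180
    # Unopened messages 180 days old or less need a warrant everywhere; in the
    # Ninth Circuit, opened messages still in storage get the same treatment.
    if recent and (not opened or ninth_circuit):
        return "search warrant (no notice required)"
    # Opened messages, or anything stored longer than 180 days.
    return "court order or subpoena (notice required)"


print(process_required_for_content(opened=False, days_in_storage=30))
# search warrant (no notice required)
print(process_required_for_content(opened=True, days_in_storage=30))
# court order or subpoena (notice required)
print(process_required_for_content(opened=True, days_in_storage=30, ninth_circuit=True))
# search warrant (no notice required)
print(process_required_for_content(opened=False, days_in_storage=200))
# court order or subpoena (notice required)
```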

Basic subscriber information from your communications providers can be obtained with just a subpoena. Such information includes your

  • Name.

  • Address.

  • The length of time you've used that phone or Internet company.

  • Phone records, including telephone number and local and long distance telephone connection records.

  • Internet records, including when you signed on and off of the service, the length of each session, and the IP address that the ISP assigned to you for each session.

  • Information on how you pay your bill, including any credit card or bank account number the ISP or phone company has on file.

Who you communicate with, including email addresses, IP addresses, and how much data was exchanged, as well as web addresses of pages you visit, can be obtained by court order. A court order is harder to get than a subpoena but easier to get than a search warrant.

None of the above applies to companies or organizations that are not "communications providers."

The government has argued that records kept by search engines and other websites, because they are not “communications service providers,” can be obtained without a search warrant, court order, or subpoena.

Data stored with “remote computing services,” meanwhile, can be obtained with only a subpoena, regardless of how old it is. The government is supposed to notify you, but the law makes it easy for law enforcement officials to delay notice until after they’ve gotten your data. However, data that’s stored on your desktop computer cannot be accessed without a search warrant. Thus, the law distinguishes, rather artificially, between data that’s stored on a desktop computer and data that’s stored in the cloud.

The protections described above also do not apply to businesses and schools that provide email services for employees and students, because those services are not offered to the public.

A coalition of businesses and advocacy groups, called Digital Due Process, has proposed that the law be changed. Specifically, with regard to electronic communications and data storage, the coalition has proposed:

“A governmental entity may require an entity covered by ECPA (a provider of wire or electronic communication service or a provider of remote computing service) to disclose communications that are not readily accessible to the public only with a search warrant issued based on a showing of probable cause, regardless of the age of the communications, the means or status of their storage or the provider’s access to or use of the communications in its normal business operations.”

In other words, communications that most people consider private, including data stored in the cloud, can only be disclosed if law enforcement officials present a warrant issued on the basis of probable cause, whether or not the email's been opened or is more than 180 days old.
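Putting the categories in this section side by side shows how much the level of protection depends on who holds the data, and what the Digital Due Process proposal would change. The Python sketch below is a hypothetical lookup table built only from the rules as summarized above; the category labels are shorthand rather than statutory terms, and categories the quoted proposal does not address are assumed to stay unchanged.

```python
from enum import Enum


class Process(Enum):
    """Legal process, roughly from least to most demanding."""
    NONE_CLAIMED = "none (per the government's argument)"
    SUBPOENA = "subpoena"
    COURT_ORDER = "court order"
    WARRANT = "search warrant based on probable cause"


# Least demanding process that suffices today, as summarized in this section.
# Category labels are hypothetical shorthand, not statutory terms.
CURRENT = {
    "unopened email held 180 days or less": Process.WARRANT,
    "opened or older email held by a communications provider": Process.SUBPOENA,
    "basic subscriber information": Process.SUBPOENA,
    "records of whom you communicate with": Process.COURT_ORDER,
    "data stored with a remote computing service": Process.SUBPOENA,
    "records kept by a search engine or website": Process.NONE_CLAIMED,
    "data on your own desktop computer": Process.WARRANT,
}

# Categories the quoted Digital Due Process language would move to a warrant:
# communications held by a communications or remote computing provider.
# Everything else is assumed unchanged, since the quote does not address it.
PROPOSED_CHANGES = {
    "opened or older email held by a communications provider": Process.WARRANT,
    "data stored with a remote computing service": Process.WARRANT,
}


def required_process(category: str, proposed: bool = False) -> Process:
    """Look up the least demanding process that suffices for a category."""
    if proposed and category in PROPOSED_CHANGES:
        return PROPOSED_CHANGES[category]
    return CURRENT[category]


for category in CURRENT:
    now = required_process(category).value
    later = required_process(category, proposed=True).value
    print(f"{category}: {now} -> {later}")
```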

Other Relevant Categories of Data

Electronic communications records are the most obvious analogue to the kind of data that could be stored in our datatrust. There are other categories of data and information that receive special privacy protections under U.S. law that may not be directly applicable to data in the datatrust, but could be useful in helping us understand what kind of protections we should be advocating for.

Financial Records: The Right to Financial Privacy Act

The Right to Financial Privacy Act of 1978 created a statutory counterpart to Fourth Amendment protection for financial information, so that federal government agencies cannot obtain an individual’s financial records from a financial institution without an appropriate warrant or subpoena.

Research and Statistical Data

Medical research participants are protected from having their identifying information disclosed to law enforcement officials in several ways.

When federal agencies, such as the Centers for Disease Control, collect personal data, they are authorized to do so by the Public Health Service Act (42 USC 242k). Section 308(d) of the Act (42 USC 242m), the Privacy Act of 1974 (5 USC 552a), and the Confidential Information Protection and Statistical Efficiency Act (PL 107-347) prohibit the disclosure of that information without the individual’s consent. (Technically, the Privacy Act permits disclosure to law enforcement officials in certain situations, but the Confidential Information Protection and Statistical Efficiency Act states that data that is collected exclusively for statistical purposes must be used only for statistical purposes.)

Other researchers, who are not affiliated with federal agencies, can apply to the CDC for a Certificate of Confidentiality. Such certificates “protect against compulsory legal demands, such as court orders and subpoenas, for identifying information or identifying characteristics of a research participant.” Any project that collects personally identifiable sensitive information, and that has been approved by an Institutional Review Board, is eligible for a Certificate. Federal funding is not required. The information that is protected includes “name, address, social security or other identifying number, fingerprints, voiceprints, photographs, genetic information or tissue samples, or any other item or combination of data about a research participant which could reasonably lead, directly or indirectly by reference to other information, to identification of that research subject.”

3. Implications for Data Collectors Today

Large corporations are kept busy with requests from law enforcement officials for personal information, made via subpoena, court order, or warrant.

Google has published statistics on the number of government requests for user data it has received.

Any business or organization potentially has to deal with such requests, but online businesses that collect a lot of personal information clearly have more data, making them particularly tempting for law enforcement. A policeman may walk into a local grocery store and ask questions about what you buy; the police can ask a search engine company what you search for.

Many companies, including Microsoft, Comcast, Facebook, and MySpace, have created documents that describe what is available to law enforcement. These documents were recently published online, albeit not necessarily with the companies’ enthusiastic consent. What the companies gloss over in their privacy policies is outlined clearly in these documents. Reading them can be surprising, as they state pretty starkly how much information is available to the government. But the companies are complying with existing laws, and the frustration of dealing with outdated laws has led many of them to join the Digital Due Process coalition.

As troubling as all this might be, the relationship between a user and Facebook is at least relatively straightforward. The user knows his or her data has been placed in Facebook, and legislation could be updated relatively easily to protect his or her expectation of privacy in that data.

More complicated is the situation in which the government seeks information from a party that does not have a direct relationship with the user.

Increasingly, the companies that have data about you aren’t even the companies you initially transacted with.

For example, if I am a customer of Company A, and Company A gives “anonymized” data that includes me to Company B, and the government seeks that data from Company B, how are my rights implicated? Could the government seek that kind of data and avoid getting even a subpoena? What kind of recourse would I have? What if I am not identifiable in that dataset, but could be identified if it were crossed with other data? What if I’m not even the suspect they’re looking for? I might still care that the government has data on me. How would even the reforms proposed by the Digital Due Process coalition deal with this reality?

This isn’t a hypothetical question. Gawker recently published a story about a vulnerability in AT&T’s website that exposed the email addresses of iPad owners, and the FBI has since asked Gawker to retain documents related to the story. The FBI in this case may not be looking for a suspect among these email addresses, but the privacy rights of those individuals are arguably at issue should Gawker hand the addresses over to the FBI.

A recent Third Circuit case suggested that privacy rights could reside in a legal entity, and not just an individual. In AT&T v. Federal Communications Commission, the court ruled that AT&T was protected by an exemption in the Freedom of Information Act (FOIA) that applies to “unwarranted invasions of personal privacy.”

However, building up protections around corporate privacy rights may not be the best way to protect individual privacy interests when data is held by corporations.

Public Citizen, EFF, and other groups have filed an amicus brief in the government's appeal to the Supreme Court, arguing that FOIA is not meant to protect these kinds of corporate interests, as that reading would greatly limit public access to information in situations such as “records about safety violations at a coal mine, environmental problems at an offshore oil rig, filthy conditions at a food manufacturing plant, financial shenanigans at an investment bank.”

But the case suggests that privacy rights, as personal as they are, may not be sufficiently protected by focusing solely on the individual.

4. Implications for the Datatrust

The datatrust can likely expect that it, like any organization holding large amounts of data, will receive requests from law enforcement for that data.

We will have to have a clearly stated policy regarding how we deal with such requests. We’ll have to understand how existing law applies to us, and the extent to which we can lawfully push to protect the individual privacy rights of people whose data is in the datatrust.

How ECPA might apply

Right now, it’s unclear precisely how ECPA would apply to the datatrust. It most likely would not be considered an electronic communications service provider. It could be considered a remote computing service, in which case the datatrust could be compelled to disclose data by a subpoena. If the law were changed in accordance with Digital Due Process’s recommendation, it’s possible that any personally identifying data stored by the datatrust would be available to the government only with a search warrant issued based on a showing of probable cause.

At the same time, it’s unclear whether ECPA, now or after reform, would clearly apply to a service for organizations rather than individuals. Many businesses already traffic in personal information, sometimes anonymized, sometimes not, and the same uncertainty that surrounds them would apply to the datatrust. This is an area we will have to research.

Potential defenses to law enforcement requests

One way the datatrust could deal with law enforcement requests for data would be to limit the amount of data it actually holds. Right now, it’s not clear what kind of personally identifying information the datatrust will retain, and thus, what we might be compelled to disclose. We will have registered users, which means we will have names of organizations. The data will be raw, which may or may not include names and/or other identifying information.

Would it be possible, technologically, to store data in ways that limit the amount of data the datatrust can actually disclose? If we’re a privacy query+filter, could we avoid storing information that could be subpoenaed? Could we use technology to organize the data in some way so as to limit the data available to government?
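One family of techniques that fits the “query+filter” idea is differential privacy: rather than handing out records, the service answers aggregate questions with calibrated random noise. Whether the datatrust will work this way is an open question here, so treat the Python sketch below as an illustration under stated assumptions: a simple counting query, a hypothetical privacy parameter `epsilon`, and made-up records. Laplace noise with scale 1/epsilon suffices for a count because adding or removing one person changes the count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer 'how many records satisfy predicate?' with epsilon-differential privacy.

    Adding or removing one person's record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: 100 of these 300 are flagged as diabetic.
records = [{"id": i, "diabetic": i % 3 == 0} for i in range(300)]

# Seeded for reproducibility; a real deployment would need an unpredictable,
# cryptographically secure source of randomness.
rng = random.Random(42)
answer = noisy_count(records, lambda r: r["diabetic"], epsilon=0.5, rng=rng)
print(f"noisy count: {answer:.1f}")  # near 100, but deliberately not exact
```

Noise protects what the datatrust releases, not what it stores: if raw records sit on its servers, they could still be subpoenaed, which is why the question of what to retain in the first place matters as much as how queries are answered.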

Could we (and should we) create a new class of data to protect?

We at the Common Data Project are working on creating a datatrust because we believe a new kind of institution is needed to enable the public to use data as powerfully as corporations and government agencies do. Although we are not a lobbying organization, we do believe that we as a society need to acknowledge data as a new kind of resource and come up with new legislation if necessary.

We've talked about how a datatrust will be like a bank or a credit union for personal data. People increasingly store and back up their data in the cloud because the cloud makes their data easier to access and use. It's arguably similar to the shift from people storing cash under their mattresses to depositing it in a bank. If so, data might warrant privacy protection on par with financial records. Are we ready to talk about data privacy in a context that includes individuals' own use of data, and not just Google's and Facebook's use of data?

Privacy protections for statistical information currently cover government agencies and, to the extent they can obtain discretionary Certificates of Confidentiality, non-government researchers. It may be possible to create a similar type of document to protect institutions like the datatrust that seek to provide data as a public resource, for statistical research and analysis. Could such protection apply to an institution rather than a specific study? Would this require some formalization of what a datatrust is, in statutory or regulatory terms?