Kerry-McCain Privacy Bill: What it got right, what's still missing.

May 11

At long last, we have a bill to talk about. Its official name is the "Commercial Privacy Bill of Rights Act of 2011" and it was introduced by Senators Kerry and McCain.

I was pleasantly surprised by how well many of the concepts and definitions were articulated, especially given some of the vague commentary that I had read before the bill was officially released.[pullquote]Perhaps most importantly, the bill acknowledges that de-identification doesn't work, even if it doesn't make a lot of noise about it.[/pullquote]

More generally though, there is a lot that is right about this bill, and it cannot be dismissed as an ill-conceived, knee-jerk reaction to the media hype around privacy issues.

For readers who are interested, I have outlined some of the key points from the bill that jumped out at me, as well as some questions and clarifications. Before getting to that, however, I'd like to make three suggestions for additions to the bill.

Transparency, Clear Definitions and Public Access

Lawmakers should legislate more transparency into data collection; they should define what it means to render data "not personally identifiable;" and they should push for commercial data to be made available for public use.

Legislators should look for opportunities to require more transparency of companies and organizations collecting data by establishing new standards for "privacy accounting" practices.

Doing so will encourage greater responsibility on the part of data collectors and provide regulators with more meaningful tools for oversight. Some examples include:

Companies collecting data should be required to identify outside contractors they hire to perform data-related services. Currently in the bill, companies are liable for their contractors when it comes to privacy and security issues. However, liability alone is a stick; we also need a carrot that encourages companies to keep closer track of who has access to sensitive data and for what purposes. A requirement to publicly account for that information is the best way to encourage more disciplined internal accounting practices.
Data collectors should publicly and specifically state what data they are collecting in plain English. Most privacy policies today are far too vague and high-level because companies don't want to be limited by their own policies.

For example, the following is taken from the Google Toolbar Privacy Policy:

"Toolbar's enhanced features, such as PageRank and Sidewiki, operate by sending Google the addresses and other information about sites at the time you visit them." (Italics mine.)

This raises the question: what exactly is covered by "other information"? How long I remain on a page? Whether I scroll down to the bottom of the page? What personalized content shows up? What comments I leave? The passwords I type in? These are all reasonable examples of the level of specificity at which Google could be more transparent about what data it collects. None of these items is too technical for the general user to understand, and at this granularity I don't believe such a list would be terribly onerous to keep up to date. We should be able to find a workable middle ground that gives users of online services a more specific idea of what data is being collected about them without overwhelming them with technical detail.

Legislators Need to Establish Meaningful Standards for Anonymization

After describing the spirit of the regulations, the bill delegates certain tasks that are either too detailed or too dynamic for the legislation itself to "rulemaking proceedings." One such task is defining the requirements for providing adequate data security. I would like to add an additional, critical task to the responsibilities of those proceedings:

They must define what it means to "render not personally identifiable" (Sec 202a5A) or "anonymise" (sec 701-4) data.

Without a clear legal standard for anonymization, the public will continue to be misled into believing that "anonymous" means their data is no longer linkable to their identity. In fact, there can only ever be degrees of anonymity, because complete anonymity does not exist. This is a problem we have been struggling with as well.[pullquote]Our best guess at a good way to approach a legal definition would be to build up a framework around acceptable levels of risk and require companies and organizations collecting data to quantify the amount of risk they incur when they share data, which is actually possible with something like differential privacy.[/pullquote]
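To make "quantifying risk" a little more concrete, here is a minimal sketch of one common differential privacy technique, the Laplace mechanism, applied to a simple counting query. It is an illustration only, not anything the bill specifies: the names `PrivacyBudget`, `noisy_count`, and `laplace_noise` are hypothetical, and the epsilon values are arbitrary. The idea is that each released answer spends a measurable amount of a fixed privacy budget (epsilon), so the total risk incurred by sharing data can be tracked as a number.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyBudget:
    """Hypothetical ledger that tracks how much epsilon has been spent."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject a valid charge.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


def noisy_count(records, predicate, epsilon, budget, rng=None):
    """Answer 'how many records match?' with epsilon-differential privacy.

    A counting query changes by at most 1 when one person's record is added
    or removed (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    """
    rng = rng or random.Random()
    budget.charge(epsilon)  # record the risk before releasing anything
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up records purely for illustration.
    people = [{"age": a} for a in (23, 35, 41, 52, 67, 29, 38)]
    budget = PrivacyBudget(total_epsilon=1.0)
    answer = noisy_count(people, lambda p: p["age"] >= 40, epsilon=0.5, budget=budget)
    print("noisy count:", answer, "| epsilon spent:", budget.spent)
```

A smaller epsilon means more noise and less risk per answer. A regulator could, in principle, set the acceptable total, and the ledger would then show when a data collector has used it up.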

Legislators Should Push for Public Access

[pullquote]Entities that collect data from the public should be required to make it publicly available, through something like our proposal for the datatrust.[/pullquote]Businesses of all sorts have, with the advent of technology, become data businesses. They live and die by the data that they come by, though little of it was given to them for the purposes it is now used for. That doesn't mean we should delete the data, or stop them from gathering it - that data is enormously valuable.

It does mean that the public needs a datastore to compete with the massive private sector data warehouses. The competitive edge that large datasets give the entities that hold them is gigantic, and no amount of notice and security can address that imbalance while so little granular data is available in the public realm.

Now for a more detailed look at the bill.

Key Points of the Bill

The bill is about protecting Personally Identifiable Information (PII), which it correctly disambiguates to mean both the unique identifying information itself AND any information that is linked to that identifier.
Though much of the related discussion in the media talks about the bill in terms of its impact on tracking individuals on the internet, the bill is about all commercial entities, online or off.
"Entities" must give notice to users about collecting or using PII - this isn't particularly shocking, but what may be more complicated will be what constitutes "notice".
Opt-out for individuals is required for use of information that would otherwise be considered an unauthorized use. (This is a nice thought, but the list of exceptions to the unauthorized use definition seems to be very comprehensive - if anyone has a good example of use that would "otherwise be unauthorized" and is thus addressed by this point, I would be interested to hear it.)
Opt-out for individuals is also required for the use of an individual's covered information by a third-party for behavioral advertising or marketing. (I guess this means that a news site would need to provide an opt-out for users that prevents ad-networks from setting cookies, for example?)
Opt-in for individuals is required for the use or transfer of sensitive PII (a special category of PII that could cause the individual physical or economic harm, in particular medical information or religious affiliations) for uses other than handling a transaction (does serving an ad count as a transaction? - this is not defined), fighting fraud or preventative security. Opt-in is also required if there is a material change to the previously consented uses and that use creates a risk of economic or physical harm.
Entities need to be accountable for providing adequate security/protection for the PII that they store.
Entities can use the PII that they collect for an enumerated list of purposes, though by my reading the list covers just about any purpose related to their business.
Entities can't transfer this data to other entities without explicit user consent. Entities may not combine de-identified data with other data "in order to" re-identify it. (It is unclear what happens if they combine the data without intending to re-identify it, but the combination has the same effect.)
Entities are liable for the actions of the vendors they contract PII work to.
Individuals must be able to access and update the information entities have about them. (The process of authenticating individuals to ensure they are updating their own information will be a hard nut to crack, and ironically may potentially require additional information be collected about them to do so.)

It's hard to disagree with the direction of the above points - all are ideas that seem to be doing the right thing for user privacy. However, there are some hidden issues, some of which may be my misunderstanding, but some of which definitely require clarifying the goal of the bill.

Clarifications/Questions

1. Practical Enforcement - While the bill specifies fines and indicates that various rulemaking groups will be created to flesh out the practical implications of the bill, it's not clear how the new law will actually change the status quo when it comes to enforcement of privacy rules. With no filing and accounting requirements for data collectors to demonstrate that they are actually complying, the FTC will have no way of "being alerted" when they break the rules, outside of blatant violations such as completely failing to give end users notice about the use of PII. Instead, the FTC will be operating blindly, wholly dependent on whistleblowers for any view into the reality of day-to-day data collection practices.

2. Meaningful Notice and Consent - While the bill lays out specific scenarios where "proper notice" and "explicit [individual] consent" will be required, there is no further explication of what "proper notice" and "explicit consent" should consist of.

Today, "proper notice" for online services consists of providing a lengthy legal document that is almost never read, and even more rarely fully understood, by individuals. In the same vein, "explicit consent" is when those same individuals "agree" to the terms laid out in the lengthy document they didn't read.[pullquote]We need guidelines that provide formatting and placement requirements for notice and consent, much the way the FDA actually designed "Nutrition Facts" labels for food packaging.[/pullquote]

3. Regulating Ad Networks - In the bill's attempt to distinguish between third parties (which require separate notice) and business partners (which do not), it remains unclear which category ad networks belong to.

Ads served up directly by the New York Times on nytimes.com should probably be considered an integral part of the NYT site.

However, should Google AdWords be handled in the same way? Or are they really third-party advertisers that should be required to provide users with separate notice before they can set and retrieve cookies?

More disturbingly, the bill seems to imply that online services gain an all-inclusive free pass to track you wherever you go on the web as soon as you "establish a business relationship," what EFF is calling the "Facebook loophole." This means that by signing up for a Gmail account, you are also agreeing to Google AdWords tracking what you read on blogs and what you buy online.

This is, of course, how privacy agreements work today. But the ostensible goal of this bill is to close such loopholes.

A Step In The Right Direction

The Kerry-McCain Privacy Bill is undeniable evidence of significant progress in public awareness of privacy issues. However, in the final analysis, the bill in its current form is unlikely to practically change how businesses collect, use and manage sensitive personal data.


Alex Selkirk