4. What is their data retention policy and what does it say about their commitment to privacy?

Data retention has been a controversial issue for many years, with American companies not measuring up to the European Union’s more stringent requirements. But for us, it obscures what’s really at stake and often confuses consumers.

For many privacy advocates, limiting the amount of time data is stored reduces the risk of it being exposed. The theory, presumably, is that sensitive data is like toxic waste, and the less we have of it lying around, the better off we are. But that theory, as appealing as it is, doesn’t address the fact that our new abilities to collect and store data are incredibly valuable, not just to major corporations, but to policymakers, researchers, and even the average citizen. Focusing on this issue of data retention hasn’t necessarily led to better privacy protections. In fact, it may be distracting us from developing better solutions.

In the past year, Google and Yahoo! announced major changes to their data retention policies. These were not promises to delete data, however, but promises to “anonymize” it after 9 months and 6 months, respectively. As discussed previously, neither company defines precisely what the word “anonymize” means. According to the Electronic Frontier Foundation, Yahoo! is still retaining 24 of the 32 bits of users’ IP addresses. As the Executive Director of the Electronic Privacy Information Center (EPIC) stated, “That is not provably anonymous.” Yet most mainstream media headlines focused only on Yahoo!’s claim of shorter data retention. The article in which the above quote appeared sported the headline: “Yahoo! Limits Retention of Personal Data.”
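To make concrete what keeping 24 of an address's 32 bits means, here is a minimal sketch in Python. Neither company has published its actual method, so this rests on assumptions: that the retained bits are the leading ones (the first three of the four numbers in an IPv4 address), and that the rest are simply zeroed out. The function names and the sample address are made up for illustration.

```python
import ipaddress


def truncate_ipv4(address: str, kept_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping the first kept_bits.

    Assumption: "anonymizing" means masking the trailing bits. The source
    does not say how any company actually does it.
    """
    if not 0 <= kept_bits <= 32:
        raise ValueError("kept_bits must be between 0 and 32")
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.IPv4Network(f"{address}/{kept_bits}", strict=False)
    return str(network.network_address)


def candidates_remaining(kept_bits: int = 24) -> int:
    """Count how many distinct addresses share the same retained prefix."""
    return 2 ** (32 - kept_bits)


# Hypothetical address from the range reserved for documentation.
print(truncate_ipv4("192.0.2.137"))
print(candidates_remaining())
```

Under these assumptions, a truncated address still narrows a user down to one of at most 256 possible addresses, which helps explain why EPIC was not willing to call the result provably anonymous.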

Interestingly, the debate around data retention has also focused primarily on three large Internet companies: Google, Yahoo!, and Microsoft. Even though companies like eBay and Amazon also retain significant amounts of data on their users, there hasn’t been any public clamor for Amazon to delete its data as soon as possible. Certainly, the volume and breadth of data Amazon collects pale in comparison to what Google has access to, and some might argue that search queries are more “private” than the books one chooses to buy. But most people still probably wouldn’t want their purchase histories on Amazon to be revealed willy-nilly.

A different take on why data retention has not become a major issue for Amazon, whose privacy policy does not address it at all, is that Amazon does a better job of showing how its data collection can be useful to its users.

Every item view shows what others have considered buying, what others have ended up buying, and what else you might like. In contrast, Google, Yahoo!, and Microsoft have yet to vividly demonstrate why collecting and retaining data makes their services better. Perhaps if they did, they would be under less pressure to delete their data as soon as possible.

When I look at a search engine like Ixquick, which is trying to build a reputation for privacy by not storing any information, I’m even less convinced that deleting all the data is a sustainable solution. Ixquick is a metasearch engine, meaning that it pulls its results from other search engines. It’s not a replacement for Google or Yahoo! for everyone. It feels more like a handy tool for someone who wants to know his search queries aren’t being tracked than a model that other search engines could end up following.

If data deletion by all search engines is the goal, the example to hold up can’t be a search engine that relies on other non-deleting search engines!

At the same time, despite all the controversy around data retention, this issue isn’t even addressed in the privacy policies of these three large Internet companies. Google addressed it in a separate FAQ section, while Yahoo! addressed it in a press release and on its blog. Microsoft said in December 2008 that it would cut its data retention time from 18 months to six if its major competitors did the same. But this information was not in the privacy policy itself. Among the other companies we looked at, Wikipedia, Ask Network, Craigslist, and WebMD did address the question of data retention in at least some limited way in their policies. No information could be found readily on the sites of eBay, AOL, Amazon, New York Times Digital, Facebook, and Apple.

What exactly do we want to keep private? At the same time, what information do we want to have? What is the best way to balance these interests? These are the questions we should be asking, not “How long is Yahoo! going to keep my data?”

Questions we asked of each company.

  1. What data collection is happening that is not covered by the privacy policy?
  2. How do they define “personal information”?
  3. What promises are being made about sharing information with third parties?
  4. What is their data retention policy and what does it say about their commitment to privacy?
  5. What privacy choices do they offer to the user?
  6. What input do users have into changes to the policy’s terms?
  7. To what extent do they share the data they collect with users and the public?
