Did the NYTimes Netflix Data Graphic Reveal the Netflix Preferences of Individual Users?
Slate has an interesting slant on the New York Times graphic everyone's been raving about -- the most popular Netflix movies by zip code all over the country. It really is great and fun to play with, but as Slate points out, some of the zip codes with rather anomalous lists may be pointing to individual users. For example, 11317 has this top-ten list:
- Wall-E
- Indiana Jones and the Temple of Doom
- Oz: Season 3: Disc 1
- Watchmen
- The Midnight Meat Train
- Man, Woman, and the Wall
- Traffic
- Romancing the Stone
- Crocodile Dundee 2
- Godzilla's Revenge
11317 is the zip code for LaGuardia Airport, which doesn't have any residents. That means this list may very well represent the Netflix rental habits of a small group or even a single subscriber who has his or her DVDs mailed there.
Slate finds some other zip codes that may represent a single subscriber, but doesn't point out the privacy problem here, despite the fact that Netflix is already in hot water over its data releases.
We've said a lot about what "anonymization" means and what a privacy guarantee should include, so I won't say more here. Instead, I just want to point out that the Slate article helps illustrate the problem PINQ is trying to avoid. As Tony points out in his post, PINQ won't give you answers that would be changed by the presence of a single record. Of course, because PINQ gives aggregate answers, you wouldn't be asking questions phrased exactly as, "What are the top ten most popular Netflix movies for 11317?" But if you tried to ask, "How many people in 11317 had viewed 'The Midnight Meat Train'?", PINQ would add enough noise that you would never know that the single person using LaGuardia Airport as an address had viewed it.
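To make the noisy-count idea concrete, here is a minimal Python sketch of a count query with Laplace noise, the standard way to hide one record's contribution to a count. It is not PINQ's actual API (PINQ is a C#/LINQ library). The function name `noisy_count`, the `epsilon` parameter, and the rental records are all hypothetical, chosen only for illustration.

```python
import random

def noisy_count(records, predicate, epsilon, rng=random):
    """Count records matching `predicate`, then add Laplace noise of scale 1/epsilon.

    Hypothetical helper for illustration only; not PINQ's interface.
    Adding or removing one record changes the true count by at most 1,
    so noise of scale 1/epsilon is enough to mask any single record.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent exponentials with rate epsilon
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

# Made-up rental records as (zip_code, title) pairs.
rentals = [
    ("11317", "The Midnight Meat Train"),
    ("11317", "Wall-E"),
    ("10001", "Wall-E"),
]

def is_target(record):
    return record == ("11317", "The Midnight Meat Train")

rng = random.Random(0)  # seeded only so the sketch is repeatable
with_subscriber = noisy_count(rentals, is_target, epsilon=0.1, rng=rng)
without_subscriber = noisy_count(rentals[1:], is_target, epsilon=0.1, rng=rng)
print(with_subscriber, without_subscriber)
```

With a small `epsilon` such as 0.1, the noise is typically on the order of 10, far larger than the difference of 1 between the two true counts, so a single answer says almost nothing about whether the LaGuardia subscriber's record is in the data. Counts over large populations remain useful because the same noise is small relative to the total.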