Can differential privacy be as good as tossing a coin?
At the end of my last post, I had reasoned my way to understanding how differential privacy is capable of doing a really good job of erasing almost all traces of an individual in a dataset, no matter how much "external information" you are armed with and no matter how pointed your questions are.
Now, I'm going to attempt to explain why we can't quite clear the final hurdle to truly and completely eradicate an individual's presence from a dataset.
- If coins are actually weighted such that one side is just ever-so-slightly heavier than the other side.
- And such a coin is spun by a platonically balanced machine.
- And the coin falls with the heads side facing up.
- And I only get one "spin" to decide which side is heavier.
- Then probabilistically (by an extremely slim margin), I'm better off claiming that the tails side is heavier.
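To put a number on that slim margin, here is a minimal sketch of the bet as a Bayes' rule calculation. The 51% figure and the function name are illustrative assumptions, not measurements: it assumes the heavier side lands face down with probability `p_heavy_down`, and that before the spin I consider either side equally likely to be the heavy one.

```python
def posterior_tails_heavier(p_heavy_down: float, prior_tails_heavier: float = 0.5) -> float:
    """Probability that tails is the heavier side, given the coin landed heads up.

    p_heavy_down is a hypothetical parameter: the chance that the heavier
    side ends up facing down after a spin.
    """
    # If tails is heavier, heads lands up exactly when the heavy side lands down.
    likelihood_if_tails_heavier = p_heavy_down
    # If heads is heavier, heads lands up only when the heavy side lands up.
    likelihood_if_heads_heavier = 1.0 - p_heavy_down

    evidence = (likelihood_if_tails_heavier * prior_tails_heavier
                + likelihood_if_heads_heavier * (1.0 - prior_tails_heavier))
    return likelihood_if_tails_heavier * prior_tails_heavier / evidence


if __name__ == "__main__":
    # One spin with an assumed 51/49 bias nudges my belief from 50% to about 51%.
    print(posterior_tails_heavier(0.51))
```

With a perfectly fair coin (`p_heavy_down = 0.5`) the spin tells me nothing and the posterior stays at 50%. Any bias at all, however tiny, tips the bet.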
Translate this slightly weighted coin toss example into the world of differential privacy and PINQ and we have an explanation for why complete non-discernibility is also non-possible.
I have a question. I know ahead of time that the only two valid answers are 0 and 1. PINQ gives me 1.7.
Probabilistically, I'm better off betting that 1 is the real answer.
In fact, PINQ doesn't even have to give me an answer so close to the real answer. Even if I were to ask my question with a lot of noise, if PINQ says -10,000,000,374, then probabilistically, I'm still better off claiming that 0 is the real answer. (I'd be a gigantic fool for thinking I've actually gotten any real information out of PINQ to help me make my bet. But lacking any other additional information, I'd be an even gigantic-er fool to bet in the other direction, even if only by a virtually non-existent slim margin.)
The only answer that would give me absolutely zero "new information" about the "real answer" is 0.5 (where the two distribution curves for 0 and 1 intersect). An answer of 0.5 makes no implications about whether 0 or 1 is the "real answer." Both are equally likely. 50/50 odds.
But most of the time...and I really mean most of the time, PINQ is going to give me an answer that implies either 0 or 1, no matter how much noise I add.
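The three cases above (1.7, the absurdly negative answer, and 0.5) can be checked with a small sketch. It assumes the noise is drawn from a Laplace distribution centered on the real answer, and that I start with no preference between 0 and 1. The function names and the `scale` parameter are made up for illustration and are not PINQ's API.

```python
import math


def log_odds_for_one(noisy_answer: float, scale: float) -> float:
    """Log of P(noisy_answer | real answer is 1) / P(noisy_answer | real answer is 0).

    Assumes Laplace noise with the given scale, whose density is
    exp(-|x - mean| / scale) / (2 * scale). The normalizing constants cancel,
    so only the two distances matter.
    """
    return (abs(noisy_answer - 0.0) - abs(noisy_answer - 1.0)) / scale


def prob_real_answer_is_one(noisy_answer: float, scale: float) -> float:
    """Posterior probability that the real answer is 1, starting from a 50/50 prior."""
    return 1.0 / (1.0 + math.exp(-log_odds_for_one(noisy_answer, scale)))


if __name__ == "__main__":
    print(prob_real_answer_is_one(1.7, scale=1.0))              # leans toward 1
    print(prob_real_answer_is_one(-10_000_000_374, scale=1e9))  # leans toward 0, barely
    print(prob_real_answer_is_one(0.5, scale=1.0))              # no lean at all
```

Cranking up `scale` (more noise) shrinks the lean toward 0 or 1, but under these assumptions it only reaches exactly 50/50 when the noisy answer is exactly 0.5.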
Does this matter? you ask.
It's easy to argue that if PINQ gives out answers that imply the "real answer" over "the only other possible answer" by a margin of, say, 0.000001%, who could possibly accuse us of false advertising if we claimed to guarantee total non-discernibility of individual records?
(As it turns out, coin tosses aren't really a 50/50 proposition. They're actually more of a 51/49 proposition. So perhaps the way you would answer the "Does it matter?" question depends on whether you'd be the kind of person to take "The Strategy of Coin Flipping" seriously.)
Nevertheless, a real problem arises when you try to actually draw a definitive line in the sand about when it's no longer okay for us to claim total non-discernibility in our privacy guarantee.
If 50/50 odds are the ideal when it comes to true and complete non-discernibility, then is 49/51 still okay? 45/55? What about 33/67? That seems like too much. 33/67 means that if the only two possible answers are 0 and 1, PINQ is going to be about twice as likely to give me an answer that implies 1 as it is to give me an answer that implies 0.
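In differential privacy, the knob that draws this line is usually called epsilon: the guarantee caps the ratio between the probabilities of seeing any given answer under two neighboring datasets at e^epsilon. As a rough sketch, and assuming I start from an even 50/50 prior so that the betting odds equal that probability ratio, each of the splits above can be turned into the smallest epsilon that would permit it. The helper name is hypothetical.

```python
import math


def epsilon_for_odds(less_likely: float, more_likely: float) -> float:
    """Smallest epsilon whose e^epsilon ratio allows this split of the odds.

    Assumes an even prior between the two candidate answers, so the
    posterior odds are exactly the ratio of the two likelihoods.
    """
    return math.log(more_likely / less_likely)


if __name__ == "__main__":
    for low, high in [(50, 50), (49, 51), (45, 55), (1, 2)]:
        print(f"{low}/{high} odds -> epsilon of about {epsilon_for_odds(low, high):.3f}")
```

The ideal 50/50 split corresponds to an epsilon of exactly zero, which is the "complete non-discernibility" that stays out of reach. A two-to-one split corresponds to epsilon = ln 2.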
Yet still I wonder, does this really count as discernment?
Technically speaking, sure.
But what if discernment in the real world can really only happen over time with multiple tries?
Say I ask a question and I get 4 as an answer. Rationally, I can know that a "real answer" of 1 is twice as likely to yield a PINQ answer of 4 as a "real answer" of 0. But I'm not sure that makes a whole lot of sense when viewed through the lens of human psychology.
After all, there are those psychology studies that show that people need to see 3 options before they feel comfortable making a decision. Maybe it takes "best out of 3" for people to ever feel like they can "discern" any kind of pattern. (I know I've read this in multiple places, but Google is failing me right now.)
Here's psychologist Dan Gilbert on how we evaluate numbers (including odds and value) based on context and repeated past experience.
These two threads on the difference between the probability of a coin landing heads n times versus the probability of the next coin landing heads after it has already landed heads n times further illustrate how context and experience cloud our judgement around probabilities.
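To see what "multiple tries" would do to the rational version of the bet, here is a sketch that asks the same question several times and adds up the evidence. It assumes independent Laplace noise on every answer, a 50/50 starting belief, and a noise scale of 1/ln 2, chosen so that an answer of 4 is exactly twice as likely under a real answer of 1 as under 0. All names here are illustrative.

```python
import math


def prob_one_after(noisy_answers: list[float], scale: float) -> float:
    """Posterior probability that the real answer is 1 after several noisy answers.

    Assumes each answer carries independent Laplace noise with the given
    scale, so the per-answer log-odds simply add up.
    """
    total_log_odds = sum(
        (abs(answer - 0.0) - abs(answer - 1.0)) / scale
        for answer in noisy_answers
    )
    return 1.0 / (1.0 + math.exp(-total_log_odds))


if __name__ == "__main__":
    scale = 1.0 / math.log(2)  # assumed scale giving a 2-to-1 ratio per answer
    for tries in (1, 2, 3):
        # Worst case for privacy: every answer comes back as 4.
        print(tries, prob_one_after([4.0] * tries, scale))
```

Under these assumptions, one answer leaves me at two-to-one odds, but three answers that all point the same way compound to eight-to-one. "Best out of 3" may be where a slim lean starts to feel like a pattern.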
If my instincts are correct, what does all this mean for our poor, beleaguered privacy guarantee?