From now on, make sure that the vanity searches originate from a different IP address than the how-to searches regarding killing, maiming, and dead people…

The AOL search data scandal is a welcome wake-up call. It is useful to remember that even a common, theoretically harmless internet activity can be used to correlate normally segmented parts of a person’s identity. Data that people believe goes no farther than from the chair to the keyboard gets published, and one more illusion of privacy goes out like the baby with the bathwater.

Ever heard the abbreviation “TMI”? It means “Too Much Information”. Generally it applies when somebody volunteers embarrassing and/or revealing information over and above what is necessary in the context of the conversation, resulting in discomfort and/or disgust on the part of the conversational partner.

We are, as a population, entering the age of TMI. Scores of people (including myself) are busily doing data entry: their thoughts, biographies, portraits, proclivities, and personal habits are being eagerly keyed in. Some of them are bright enough to do so pseudonymously or anonymously. Regardless of how they do it, it seems to me that there is no guarantee whatsoever that their anonymity or pseudonymity, or even their expectation of freedom from web-crawler indexing, will stand the test of time.

I think most people, whether they are aware of it or not, still believe in security by obscurity. Sure, if somebody worked hard, it might be possible to work out that Mary Smith is “concernedParent” on motherhoodparenting.com and also “naughtyGirl” on sexymommies.com, but why would anyone think to correlate those identities? Such a correlation today takes an active effort, and it’s difficult to conceive of why anyone would even bother.

Of course, security by obscurity definitely didn’t work for Thelma Arnold. Her identity was extracted from the AOL dataset and other publicly available information for no other reason than that it could be. I’ll bet the phone calls from reporters were an unwelcome shock. Even data that cannot quite identify someone personally today might do so cumulatively later. Perhaps the AOL dataset links “concernedParent” to “naughtyGirl”, and a year later a different body of data manages to link “concernedParent” to Mary Smith. At that point the link between Mary Smith and “naughtyGirl” is there for the harvesting. Imagine what will happen if anyone on the internet figures out who user #17556639 is. Even worse, imagine if they get it wrong.

So when will the right body of correlatable personal data come together to reveal your secrets? Maybe never. Maybe tomorrow. Just because something is obscure now doesn’t mean it will stay that way forever. And once an internet search for Mary Smith turns up “naughtyGirl”, it won’t go away. That is the problem with a TMI situation: the damage cannot be undone.

All I can think to hope for is a partial solution: mutual assured TMI. If everyone has as much dirt as everyone else, the dirt might become less significant. At least the excusable indiscretions might be overlooked. MySpace, you might save us yet….