Tuesday, March 20, 2007

Analytics and Invasion of Privacy

Lately, I have seen a string of articles on the concerns around the Department of Homeland Security's ADVISE program (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement). People are worried about invasion of privacy and about DHS acting like a peeping Tom.

A similar set of concerns arose around the DoD and Microsoft deal to analyze electronic patient records.

“Dr. Deborah Peel, chairwoman of the Patient Privacy Rights Foundation, views the patient information not as a goldmine ripe for exploitation but as a collection of personal and sensitive health information that needs to be zealously guarded and only accessed with express consent by the patient.”

This blog raises an interesting point:

“data mining by definition compromises the privacy of people represented in that database - if your personal information is included, there is no way to opt out.”

The least that can be said is that the concerns are valid. That said, here is what I think about the problem of privacy invasion:

1. What information is private information? When a customer signs up for something like a loyalty card and agrees to give information about their demographic profile, income, tastes & preferences, this is voluntary sharing of information. What they also realize is that their purchase behavior can be tracked through the card (that’s how they earn loyalty points, which are redeemable for tangible benefits). This kind of information, in my view, is not private information in the strictest sense (for the organization which has collected this data painstakingly).

2. What could be called privacy intrusion? Even though Google has tried to make some changes, what it usually does (I am talking about simple examples like ads next to your emails when you are accessing a Gmail account) can be called fairly intrusive.

3. Can private information be kept private? If we don’t associate a piece of information with a named individual in a database, and keep the name or identifier as a random number instead, the insights from the analysis describe a profile: individuals with attributes a, b and c are more likely to behave in a certain manner. At this stage, we still don’t know who has attributes a, b or c. This step is critical to upholding user privacy.

4. What can organizations do? As a third-party analytics services provider, we must realize that data security standards are absolutely non-negotiable. This requires:

a) working only with masked data,

b) removing information that helps identify an individual as far as possible,

c) maintaining high security standards while transferring/porting data,

d) creating an onsite-offshore delivery model in which the team works onsite for some time and builds master data tables, so that data security concerns are alleviated.

5. How big is the problem? Well, as any analytics provider will tell you, the real value of information does not lie in “who”; it lies in “what”, “how” and “why”! Once an organization has answered these three critical questions, “who” is the final step of the strategic game plan, and can be answered at a group level rather than at an individual level.
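To make the masking and group-level ideas above a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the field names (`name`, `age_band`, `region`, `redeemed`), the sample rows and the helpers `mask_records` and `rate_by_profile` are all made up for this example and do not come from any real system.

```python
import secrets
from collections import defaultdict


def mask_records(records, id_field="name"):
    """Return copies of the records with the identifying field replaced by a random token."""
    tokens = {}  # real identifier -> random token; never shipped with the masked data
    masked = []
    for rec in records:
        real_id = rec[id_field]
        if real_id not in tokens:
            tokens[real_id] = secrets.token_hex(8)
        row = {k: v for k, v in rec.items() if k != id_field}
        row["pid"] = tokens[real_id]
        masked.append(row)
    return masked


def rate_by_profile(records, attrs, outcome):
    """Share of records with a truthy outcome, per combination of attribute values."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [hits, total]
    for rec in records:
        profile = tuple(rec[a] for a in attrs)
        counts[profile][0] += 1 if rec[outcome] else 0
        counts[profile][1] += 1
    return {profile: hits / total for profile, (hits, total) in counts.items()}


# Made-up loyalty-card rows, for illustration only.
customers = [
    {"name": "Alice", "age_band": "30-39", "region": "north", "redeemed": True},
    {"name": "Bob", "age_band": "30-39", "region": "north", "redeemed": False},
    {"name": "Carol", "age_band": "50-59", "region": "south", "redeemed": True},
]

masked = mask_records(customers)
rates = rate_by_profile(masked, ["age_band", "region"], "redeemed")
```

Here `rates` maps each (age band, region) profile to its redemption rate, and no row in `masked` carries a name. A random identifier on its own does not guarantee anonymity if the remaining attributes are specific enough to single someone out, which is why removing identifying information matters as well.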

Having said this, projects like ADVISE are bound to create a fair bit of skepticism around the way private information will be treated, and around the impact of landing in the group of false positives (being identified as a terrorist when you are not one!).
