Wednesday, March 21, 2007
More on Sports
I saw an article in Viewpoint, the journal of Marsh & McLennan, which again talks about the maths of sports.
Some of my previous posts on this topic are here - 1, 2 and 3
Tuesday, March 20, 2007
Analytics and Invasion of Privacy
Lately, I have seen a string of articles on the concerns around the Department of Homeland Security's A.D.V.I.S.E. program (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement). People are worried about invasion of privacy and about DHS acting like a Peeping Tom.
A similar set of concerns arose around the DoD and Microsoft deal for analyzing Electronic Patient Records.
“Dr. Deborah Peel, chairwoman of the Patient Privacy Rights Foundation, views the patient information not as a goldmine ripe for exploitation but as a collection of personal and sensitive health information that needs to be zealously guarded and only accessed with express consent by the patient.”
“data mining by definition compromises the privacy of people represented in that database - if your personal information is included, there is no way to opt out.”
1. What information is private information? When a customer signs up for something like a loyalty card and agrees to give information about their demographic profile, income, tastes and preferences, this is voluntary sharing of information. They also realize that their purchase behavior can be tracked on the card (that’s how they earn loyalty points, which are redeemable against tangible benefits). In my view, this kind of information is not private in the strictest sense, at least as far as the organization that painstakingly collected it is concerned.
2. Can private information be kept private? If we don’t associate a piece of information with a named individual in a database, and keep the name or identifier as a random number, the analysis insights at the end of the day give me a profile: individuals with attributes a, b and c are more likely to behave in a certain manner. At this stage, we still don’t know who has attributes a, b or c. This step is critical to upholding user privacy.
3. What could be called privacy intrusion? Even though Google has tried to bring in some changes, what it usually does (I am talking about simple examples like ads next to your emails when you are accessing a Gmail account) can be called fairly intrusive.
4. What can organizations do? As a third-party analytics services provider, we must realize that data security standards need to be absolutely non-negotiable. This requires:
a) working only with masked data,
b) removing information that helps identify an individual, as far as possible,
c) maintaining high security standards while transferring/porting data,
d) creating an onsite-offshore delivery model in which the team works onsite for some time and builds master data tables, so that data security concerns are alleviated.
5. How big is the problem? Well, as any analytics provider will tell you, the real value of information is not in “who”, it lies in “what”, “how” and “why”! Once an organization has answered these three critical questions, “who” is the final step of the strategic gameplan, and can be answered at a group level, rather than individual level.
Having said this, projects like ADVISE are bound to create a fair bit of skepticism around the way private information will be treated, and around the impact of landing in the group of false positives (being identified as a terrorist when you are not one!).
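To make the "masked data" idea a little more concrete, here is a minimal sketch of replacing the identifier with a random number before the data goes to the analyst. The field names (`name`, `age_band`, `spend`) and the sample rows are made up for illustration, and real masking would need to handle far more than one column.

```python
import secrets

def mask_records(records, id_field="name"):
    """Swap the identifying field for a random token.

    Returns the masked rows plus the key map. The key map is the only
    link back to real individuals, so it should stay with the data owner.
    """
    key_map = {}  # real identifier -> random token
    masked = []
    for row in records:
        real_id = row[id_field]
        if real_id not in key_map:
            key_map[real_id] = secrets.token_hex(8)  # random, not derived from the name
        masked_row = {k: v for k, v in row.items() if k != id_field}
        masked_row["masked_id"] = key_map[real_id]
        masked.append(masked_row)
    return masked, key_map

# Hypothetical loyalty-card rows, purely illustrative.
records = [
    {"name": "Asha", "age_band": "30-39", "spend": 120},
    {"name": "Ravi", "age_band": "40-49", "spend": 80},
    {"name": "Asha", "age_band": "30-39", "spend": 45},
]
masked, key_map = mask_records(records)
```

The analyst can still see that the first and third rows belong to the same customer, which is all that profiling needs, without ever seeing who that customer is.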
Wednesday, March 14, 2007
Analytically Sports... Continued!
Going back to some of my earlier posts on use of analytics in sports - at Diamond Analytics Blog and here itself, what Fractal has already managed to do is a proof of concept.
The factors that I don't see them looking at are the location, playground, weather, batting order, bowling order, etc., which do have a big impact on performances.
> Under overcast conditions, the chances of Indian batsmen holing out to the wicketkeeper go up significantly.
> The chances of genuine swing bowlers running through the side on grassy pitches are high.
> On flat tracks, against minnows, on subcontinent kinda pitches, batsmen have a field day.
These are examples of hypotheses that can be tested using data.
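As a rough sketch of what "testing with data" could look like for the first hypothesis, here is a simple two-proportion z-test. The dismissal counts are invented purely for illustration (I have not pulled any real scorecards), and a proper model would also control for venue, opposition and so on.

```python
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """One-sided z-test that the rate in group A exceeds the rate in group B."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: dismissals caught behind, out of all dismissals.
overcast_caught, overcast_total = 30, 100
clear_caught, clear_total = 20, 100

z, p_value = two_proportion_z(overcast_caught, overcast_total,
                              clear_caught, clear_total)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

A small p-value would suggest that the overcast effect is more than noise; a large one would mean the data does not back the hunch yet.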
It would be interesting to see how teams can use a model like this to decide team composition, plan batting orders, etc.!
Monday, March 12, 2007
Complex Data to Complex Knowledge
Dell Zhang quoted the challenging problems in Data Mining research [ICDM ‘05’].
It will be interesting to touch upon each of these problems in greater detail. However, for now, the most interesting bit is 4. Mining Complex Knowledge from Complex Data. That is what defines the heart of data mining for me.
Mining – From its origins in the extraction of minerals, mining has traditionally implied extraction of extremely valuable stuff from the earth. Wikipedia says any material that cannot be grown from agricultural processes, or created artificially in a laboratory or factory, is usually mined. What is implicit here is the application of intelligence for achieving this feat.
What organizations are increasingly finding it difficult to do is to revisit the (apparently) already mined data and come up with new strategies. And when we say already mined, most organizations find it difficult to let go of the semi-cooked analysis that might have been done to meet the immediate requirements of marketing executives breathing down the necks of analytics departments.
Complex - A complex is a whole that comprehends a number of parts, especially one with interconnected or mutually related parts. [Wikipedia]
For most organizations today, integrating parts of information to see the bigger picture is the new challenge. Strategies are no longer being formed at the department level, and there is a greater need for departments to come together for an integrated strategy. A perfect example would be the need for the IT, Marketing, Customer Services and Products teams to work together for an end-to-end customer offering.
Data to Knowledge is the heart of analytics and there can be a host of tools used for traversing the distance.
Like every problem-solving exercise, Data Mining and Analytics is an extremely structured exercise involving a series of rigorous steps:
- Business Understanding – involving setting the context and defining the problem to be solved.
- Data Understanding – which involves getting a sense of the data that is available, that can be made available, and that needs to be available for solving the problem
- Data Preparation – One of the most important and rigorous steps of an analytics project, this involves bringing various data elements together and creating a data story. Understanding linkages between various data sources, their integrations and disintegrations, tying them with the problem objective to create new variables, vintage of data, changing shape and design of data capture at the enterprise level are all seemingly tedious but life-saving checkpoints!
- Modeling/Segmentation/Solutioning - This is the point where the wheat is separated from the chaff. Having got your data together, can you use the appropriate statistical and analytical techniques such as cluster analysis, regression, neural networks, et al. to solve the problem at hand? The solutions here range from simple reporting dashboards to complex algorithms that are not easy to explain.
- Validation & Deployment – A true romantic movie is never over unless all the things have fallen into place. We need to be able to establish beyond doubt that the results are accurate. Predictive modeling projects have been known to use advanced validation techniques such as coefficient blasting, in-time and out-of-time validation, sensitivity analysis, bootstrapping, etc. Deployment faces a different set of challenges in being able to replicate the solution on a production server for ongoing maintenance and reporting.
- The key stakeholder buy-in – This is a step that everyone overlooks as part of the analytics lifecycle. However, this step has nothing much to do with analytics apart from making sure that the first 5 steps are correct to the last dot and cross and are well documented for everyone’s reference.
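Of the validation techniques named above, bootstrapping is the easiest to show in a few lines. This is only a sketch under my own assumptions: the holdout results are invented (1 means the model got a record right, 0 means it got it wrong), and `bootstrap_ci` is a helper written for this illustration, not a library function.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_ci(values, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(values)
    # Resample with replacement, recompute the statistic each time, then sort.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical holdout results: 80 correct predictions out of 100.
hits = [1] * 80 + [0] * 20
lower, upper = bootstrap_ci(hits, mean)
print(f"accuracy = {mean(hits):.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

The width of the interval is the useful part: a headline accuracy number means a lot less if resampling the same holdout set moves it around by ten points.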
That’s where the sermon of Rabbi Amit gets over.
Thursday, March 8, 2007
Acquisitions - Offloading Offshore Analytics
WNS is acquiring Marketics, an offshore analytics firm founded by some ex-P&G CMK senior professionals. People like Shankar Maruwada, Ramki, etc. are extremely smart people. A $65MM cash deal for its 200 people, with $30MM upfront and a $35MM earnout in a year, does show the smartness of the deal.
Isn't this somewhat of an action replay of how Inductis was taken over by EXL (another large BPO company in India)? Did things really change at Inductis? Not a lot. There were a lot of positive reinforcements, with a few bad things. The worst being the uncertainty around people's careers.
The best being a sudden n-fold increase in sales staff. The to-be-debated item on the roster was "can BPO sales guys sell analytics projects, which are so driven by specialized skillsets?"
My friends at Marketics (like The Other Side) - I wonder what they are feeling now.
That aside, why do I think this is an important news item -
1. The role of offshore analytics has gone up tremendously. Examples: Inductis has been doing it since 2001, and companies such as Marketics, Modelytics, MarketRx, Absolute data, Zeus Associates, etc. have been catering to the analytics needs of companies across the globe.
2. A lot of these companies have done tremendously well as start-ups and have managed to build the first level of growth. However, to hit the next level, most of them will need a higher amount of funding. That comes either through PE firms, VC firms and investors, who may want to fool around with the way the company is being run, or through buyouts like the Inductis one, where the provider continues to work as an independent company. Either way, the acquisition is more often than not driven by the need for more funds.
3. Most of these companies were set up by young, ambitious individuals who spotted an opportunity in their respective consulting/core organizations (Pharma folks starting MarketRx, MMG consultants starting Inductis, P&G folks starting Marketics). A lot of these people are probably looking for big money (one of the drivers of entrepreneurial fire is money). It's simply a question of timing your investment. Well, sell high!
4. Does the scale of the game change for the offshore analytics providers?
Well... let's keep talking about these!
What's with going solo?
Kevin Hillstrom of MineThatData has also decided to go solo. No need to tell the readers who Kevin is and the analytical depth with which he writes.
Now, let's Mine This Data, pun unintended! :)
A. The analytics market is growing insanely. There is need and space for a large number of such strong SMEs as Kevin and Avinash.
B. A thorough understanding of analytics is a fairly complex skillset, and rare. Anyone can come and talk data and profiling and dashboarding and modeling. But there aren't too many people who understand the complete data analytics process well. Covering everything from the vision, method, depth and technology for data acquisition to the expansive business application of analytical frameworks, while staying in sync with the IT and business strategy of the firm, is no mean task.
C. Most importantly - Intellectual bankruptcy. The number of analytics blogs that have come up, with people writing on specific subjects (Avinash on web analytics) to people writing on all analytics subjects (Kevin), is proof enough for me that the bubbling energy amongst all these intellectuals needs a vent. While blogging helps them think more, and beyond the scope of their day to day assignments, going solo implies they are ready to get dirty once again.
What do we have then - Market <> Skillset <> Desire! What comes out of it at the end - Pure Magic. All the best Avinash, Kevin!
Avinash - if the quality of your posts goes down, I will start throwing hate mail at you! :)