Monday, March 12, 2007

Complex Data to Complex Knowledge

Dell Zhang quoted the challenging problems in Data Mining research [ICDM ’05].

It will be interesting to touch upon each of these problems in greater detail. For now, however, the most interesting one is number 4, “Mining Complex Knowledge from Complex Data”. That is what defines the heart of data mining for me.

Mining – From its origins in the extraction of minerals, mining has traditionally implied extracting extremely valuable stuff from the earth. Wikipedia says any material that cannot be grown from agricultural processes, or created artificially in a laboratory or factory, is usually mined. What is implicit here is the application of intelligence to achieve this feat.

What organizations increasingly find difficult is revisiting (apparently) already mined data and coming up with new strategies. And by already mined, we mean that most organizations find it hard to let go of the semi-cooked analysis that was done to meet the immediate requirements of marketing executives breathing down the necks of analytics departments.

Complex - A complex is a whole that comprehends a number of parts, especially one with interconnected or mutually related parts. [Wikipedia]

For most organizations today, integrating pieces of information to see the bigger picture is the new challenge. Strategies are no longer formed at the department level, and there is a greater need for departments to come together around an integrated strategy. A perfect example is the need for the IT, Marketing, Customer Services and Products teams to work together on an end-to-end customer offering.

Data to Knowledge is the heart of analytics, and a host of tools can be used to traverse the distance.

Like every problem-solving exercise, Data Mining and Analytics is an extremely structured exercise involving a series of rigorous steps:

  1. Business Understanding – setting the context and defining the problem to be solved.
  2. Data Understanding – getting a sense of the data that is available, that can be made available, and that needs to be available to solve the problem.
  3. Data Preparation – One of the most important and rigorous steps of an analytics project, this involves bringing various data elements together and creating a data story. Understanding the linkages between data sources, their integrations and disintegrations, tying them to the problem objective to create new variables, the vintage of the data, and the changing shape and design of data capture at the enterprise level are all seemingly tedious but life-saving checkpoints!
  4. Modeling/Segmentation/Solutioning – This is the point where the wheat is separated from the chaff. Having got your data together, can you use the appropriate statistical and analytical techniques, such as cluster analysis, regression, neural networks, etc., to solve the problem at hand? The solutions here range from a simple reporting dashboard to complex algorithms that are not easy to explain.
  5. Validation & Deployment – A true romantic movie is never over until everything has fallen into place. We need to be able to establish beyond doubt that the results are accurate. Predictive modeling projects have been known to use advanced validation techniques such as coefficient blasting, in-time and out-of-time validation, sensitivity analysis, bootstrapping, etc. Deployment faces a different set of challenges: the solution has to be replicated on a production server for ongoing maintenance and reporting.
  6. The key stakeholder buy-in – This is a step that everyone overlooks as part of the analytics lifecycle. It has little to do with analytics itself, apart from making sure that the first five steps are correct down to the last dotted i and crossed t and are well documented for everyone’s reference.
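
To make the bootstrapping mentioned in step 5 a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the function name `bootstrap_interval`, the accuracy metric, and the tiny made-up set of holdout labels are all hypothetical, and a real project would plug in its own model scores and metric.

```python
import random


def bootstrap_interval(actual, predicted, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for classification accuracy.

    Hypothetical helper for illustration; not taken from any library.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(actual)

    # 1 where the prediction matched the actual label, 0 otherwise.
    hits = [int(a == p) for a, p in zip(actual, predicted)]
    point = sum(hits) / n

    # Resample the holdout records with replacement and re-score each time.
    scores = []
    for _ in range(n_resamples):
        sample = [hits[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)

    # Read the interval off the sorted resampled scores (percentile method).
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Made-up holdout labels and model predictions, purely for demonstration.
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]

point, lower, upper = bootstrap_interval(actual, predicted)
print(f"accuracy={point:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

A wide interval is a hint that the holdout sample is too small to say much about accuracy, which is exactly the kind of thing worth knowing before the results go in front of stakeholders.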

That’s where the sermon of Rabbi Amit ends.
