My last blog argued that more people could be benefiting from predictive analytics, which raises the question of where to start if you are not doing it already.
Is it just a case of buying a predictive analytics software tool and “plugging it in”? Probably not. Nor do you necessarily need to hire expensive contractors or consultants, unless the need justifies it.
As you might expect, part of the predictive analytics process is deciding what the potential cost/benefit of the activity would be, and indeed whether predictive analytics as an approach has any chance of meeting the business or research objectives you have in mind. The good news is that there are some pointers to help make those decisions.
A process template for undertaking a predictive analytical project
As I mentioned last time, there is significant overlap between the types of analytical activity described as “Data Mining” and those termed “Predictive Analytics”. For that reason we feel that the processes for executing the former (for which there are existing process blueprints) are currently the best templates for undertaking the latter.
There are probably 2 leading process models at the time of writing.
1. SAS have their own, called SEMMA (Sample, Explore, Modify, Model, Assess).
2. A cross-industry consortium including DaimlerChrysler, SPSS, NCR Teradata and others developed CRISP-DM (Cross-Industry Standard Process for Data Mining).
Generally speaking these models cover the same ground, and are not unlike many consulting project engagement models. We tend to use CRISP-DM as the basis for our work for two main reasons:
1. Our sense is that more collaborative thought has gone into producing a more detailed template.
2. It is somewhat broader, in that it covers the business objectives and the ultimate application (deployment) of the outcomes (e.g. models or scores) in more detail.
The CRISP-DM model has six steps, usually drawn as a cycle in the standard CRISP-DM diagram. Simply put, they are:
1. Business understanding
Starting with a business goal (or goals), e.g. reducing the rate at which my customers are defecting, this is the crucial step in which we take those objectives and begin to evaluate the business context before embarking (or in some cases deciding not to embark) on the analytical process.
2. Data understanding
The second understanding step is to audit and investigate the data in the various data sources which can potentially provide the grist for the analysis. This covers both a top-level analysis of the metadata and a deeper exploratory analysis of the data itself.
We typically see the two understanding steps as part of the same phase. Once complete you should be in a position to evaluate what is likely to be achievable from a modelling perspective. Most often we find that there is enough potential to continue with the modelling, though in many cases the project may turn out to be somewhat different to the original expectations. In a small minority of cases there may not be that potential, or we may feel that it is too costly/risky to undertake the prospective analysis.
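To make the data audit concrete, here is a minimal Python sketch of the kind of first pass we mean, assuming the data arrives as CSV. The sample extract and the `audit` function are hypothetical illustrations, not part of CRISP-DM itself: the sketch counts present and missing values per column and gives a range and mean for numeric fields, which is often enough to tell early on whether a source is worth pursuing.

```python
import csv
import io
import statistics

# Hypothetical extract from one data source; in practice this would be read from a file or database.
SAMPLE = """customer_id,age,monthly_spend,region
1,34,120.5,North
2,,80.0,South
3,51,,North
4,29,200.0,
"""


def audit(rows):
    """Summarise each column: present and missing counts, plus min/mean/max for numeric
    columns or the number of distinct values for everything else."""
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r[col] for r in rows if r[col] not in ("", None)]
        summary = {"present": len(values), "missing": len(rows) - len(values)}
        try:
            nums = [float(v) for v in values]
        except ValueError:
            nums = []  # at least one value is non-numeric, so treat the column as categorical
        if nums:
            summary.update(min=min(nums), mean=statistics.mean(nums), max=max(nums))
        else:
            summary["distinct"] = len(set(values))
        report[col] = summary
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = audit(rows)
for column, summary in report.items():
    print(column, summary)
```

On the sample extract this flags one missing value each in `age`, `monthly_spend` and `region`, the sort of gap that feeds straight into the preparation step.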
3. Data preparation
This is the necessary, but arguably the least interesting, step (unless you like this kind of thing!). Sometimes described as the ETL (Extract, Transform and Load) stage, this is where we beat the data into shape, getting it into an appropriate format for the analytical tool(s) chosen in the understanding phase. In our experience this step takes a bigger chunk of the project than one might expect.
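As a rough illustration of extract, transform and load, the following Python sketch assumes two hypothetical CSV extracts (a customer table and a transaction log with its own key name). It aggregates the transactions to one row per customer, joins them onto the customer table, and writes the result as a single flat CSV file, a format most analytical tools can import. The field names are invented for the example; real projects usually involve many more sources and far messier keys, which is where the time goes.

```python
import csv
import io

# Two hypothetical source extracts: a customer table and a transaction log.
CUSTOMERS = """id,name,signup_date
101,Alice,2005-03-14
102,Bob,2006-01-02
"""
TRANSACTIONS = """customer,amount
101,25.00
101,40.50
102,12.25
"""


def extract(text):
    """Read one CSV extract into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(customers, transactions):
    """Aggregate transactions per customer and join them onto the customer table.

    The name field is deliberately dropped because it is not needed for modelling.
    """
    totals, counts = {}, {}
    for t in transactions:
        key = t["customer"].strip()
        totals[key] = totals.get(key, 0.0) + float(t["amount"])
        counts[key] = counts.get(key, 0) + 1
    flat = []
    for c in customers:
        key = c["id"].strip()
        flat.append({
            "customer_id": key,
            "signup_year": int(c["signup_date"][:4]),
            "n_transactions": counts.get(key, 0),
            "total_spend": round(totals.get(key, 0.0), 2),
        })
    return flat


def load(rows, out):
    """Write the flat table as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


table = transform(extract(CUSTOMERS), extract(TRANSACTIONS))
buffer = io.StringIO()
load(table, buffer)
print(buffer.getvalue())
```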
4. Modelling
This step, together with the closely linked evaluation step that follows, is the crux of the process. This is where we apply one or more (usually several) appropriate modelling techniques to the data. We shall talk more about user interfaces to models and software tools later, but model selection “usually” requires some expertise to identify techniques that fit the shape of the data (e.g. some modelling techniques assume that the input data are normally distributed).
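One example of checking the shape of the data before choosing a technique: a strongly right-skewed variable such as spend can be log-transformed before being fed to a method that assumes normality. The Python sketch below is a minimal illustration under assumptions of our own: it uses a simple moment-based skewness, an arbitrary threshold of 1.0 (a common rule of thumb rather than a fixed standard), and hypothetical spend figures.

```python
import math
import statistics


def skewness(xs):
    """Moment-based skewness: about 0 for symmetric data, positive when there is a long right tail."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def prepare_for_normal_model(xs, threshold=1.0):
    """Log-transform a strictly positive, strongly right-skewed variable; otherwise leave it alone.

    The 1.0 threshold is an assumed rule of thumb, not a standard.
    """
    if min(xs) > 0 and skewness(xs) > threshold:
        return [math.log(x) for x in xs], "log"
    return list(xs), "none"


# Hypothetical monthly spend figures with a long right tail.
spend = [10, 12, 15, 20, 30, 60, 200]
transformed, method = prepare_for_normal_model(spend)
print(method, round(skewness(spend), 2), round(skewness(transformed), 2))
```

The transform reduces the skew rather than eliminating it; whether that is good enough is exactly the kind of judgement the modelling step calls for.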
5. Evaluation
Quite simply: how well does the model perform? This may be as straightforward as looking at the percentage accuracy of the model’s predictions against an unseen test (“holdout”) sample. Or it could mean evaluating how the model performs in more sophisticated scenarios related to profitability, or to the risk of investigating too many non-fraudulent credit card transactions and annoying too many loyal customers.
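The difference between raw accuracy and a cost-weighted view can be shown in a few lines of Python. This is a sketch rather than a prescribed method: the holdout labels, the two sets of fraud predictions and the cost figures (20 per false alarm, 500 per missed fraud) are all hypothetical. The point is that a model with lower accuracy can still be the cheaper one to run once errors are weighted by their business impact.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split them into training and unseen test ("holdout") sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[: len(shuffled) - n_test], shuffled[len(shuffled) - n_test:]


def accuracy(actual, predicted):
    """Fraction of holdout cases the model got right."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def error_cost(actual, predicted, cost_false_alarm, cost_missed_fraud):
    """Total cost of a model's mistakes, weighting each error type by its business impact."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if p and not a:
            cost += cost_false_alarm  # investigated a genuine transaction and annoyed a loyal customer
        elif a and not p:
            cost += cost_missed_fraud  # a fraudulent transaction went through
    return cost


# Hypothetical holdout labels (1 = fraud) and predictions from two candidate models.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
model_b = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

for name, predicted in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(actual, predicted), error_cost(actual, predicted, 20, 500))
```

Here model B is less accurate (0.7 against 0.8) but its errors cost 60 rather than 1000, because it trades three cheap false alarms for catching both frauds.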
6. Deployment
The whole point of the exercise is to apply the results of the analytical process in a way which creates benefit going forward. Broadly speaking, this takes two forms.
1. It could be about simply making decisions based on the insight generated, e.g. deciding to open a new retail outlet in a location which scores highly from the perspective of market potential.
2. Or it could be about integrating the results data (e.g. propensity scores), or even the model itself, in a way which automates operational actions. For example, we might embed an online advertising click-fraud detection model in our web analytics process to send or report alerts when potentially malevolent transactions are generated.
Or we might simply generate a list of new customers who were scored highly by a model that predicts lifetime value, but who we believe need to be engaged early in their lifecycle to meet that potential. Such a list can drive call centre actions or marketing campaigns (and the model may also indicate which is more appropriate).
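The lifetime-value example can be as simple as filtering and ranking scored records. This Python sketch assumes a hypothetical lifetime-value model has already written a predicted value back against each customer; the thresholds (a predicted value of at least 1000 and no more than three months since signup) are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    months_since_signup: int
    predicted_lifetime_value: float  # score written back by a hypothetical lifetime-value model


def engagement_list(customers, min_value=1000.0, max_months=3):
    """Recently acquired customers with high predicted lifetime value, highest value first."""
    selected = [
        c for c in customers
        if c.predicted_lifetime_value >= min_value and c.months_since_signup <= max_months
    ]
    return sorted(selected, key=lambda c: c.predicted_lifetime_value, reverse=True)


customers = [
    Customer("C001", 1, 1500.0),
    Customer("C002", 2, 800.0),    # new, but low predicted value
    Customer("C003", 12, 5000.0),  # valuable, but no longer early in the lifecycle
    Customer("C004", 0, 2500.0),
]
call_list = engagement_list(customers)
for c in call_list:
    print(c.customer_id, c.predicted_lifetime_value)
```

The resulting list could then be passed to a call centre or campaign system in whatever format that system expects.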
One important note is that, as the CRISP-DM diagram shows, this isn’t necessarily a linear process. For example, we might find gaps in the data during the preparation step that lead us to re-evaluate our understanding or objectives, or we might find that the data we are modelling has issues which can be resolved through new transformations (e.g. imputing missing values) as part of a new preparation step.
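Imputation is one of the simpler transformations to add in a repeat preparation step. A minimal sketch in Python, assuming missing values arrive as `None`: fill each gap with the median of the observed values, and keep a flag recording which rows were imputed so that a later model can test whether missingness itself is predictive. Median filling is only one option; the right choice depends on the data.

```python
import statistics


def impute_median(values):
    """Fill missing values (None) with the median of the observed values.

    Also returns a flag per row recording whether it was imputed.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


ages, age_missing = impute_median([34, None, 51, 29])
print(ages, age_missing)
```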
It may look quite heavy, but it doesn’t have to be. In our experience the process model can range from:
- Small scale: I have a spreadsheet with the last 5 years’ sales in it; I wonder if I can predict sales for the next 3 months?
In a sense this puts the cart before the horse: having the data sparks a potential business objective we might not otherwise have thought about. The more explicit objective could be to meet sales targets. The whole exercise (up to deployment at least) could take less than a day.
- To the larger scale: Can I identify new customers who have the potential to be the most profitable in the future, against a business objective of growing customer profitability?
This example starts with the business objective in the regular way, but may require more convoluted merging of data from various databases, potentially relating to customer acquisition from different channels, products, divisions, countries, etc. A project of this kind could take weeks, and sometimes months, to complete.
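For the small-scale case, even a very simple method can give a first answer. The Python sketch below uses a seasonal naive forecast scaled by year-on-year growth: each of the next three months is predicted as the same month last year, multiplied by the ratio of the last twelve months’ sales to the twelve before that. This is our illustrative choice of technique, assuming monthly figures in date order; the history is made up, and the method ignores holidays and other effects a proper model would need to handle.

```python
def forecast_next_months(monthly_sales, horizon=3, season=12):
    """Seasonal naive forecast scaled by year-on-year growth.

    Each future month is forecast as the same month one season earlier, multiplied by
    the ratio of the latest season's total sales to the season before it.
    """
    if len(monthly_sales) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    if not 1 <= horizon <= season:
        raise ValueError("horizon must be between 1 and one season")
    recent = sum(monthly_sales[-season:])
    previous = sum(monthly_sales[-2 * season:-season])
    growth = recent / previous
    # For future month h (0-based), the same month last year sits at index -season + h.
    return [monthly_sales[-season + h] * growth for h in range(horizon)]


# Hypothetical five years (60 months) of sales: a seasonal pattern that grows 10% a year.
history = [(100 + 10 * (m % 12)) * 1.1 ** (m // 12) for m in range(60)]
print([round(x, 1) for x in forecast_next_months(history)])
```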
What happens in practice?
I’ll use the next few blogs to walk through the process model and give some pointers to what we’ve found to be important on engagements. I’ll try to be candid enough to identify areas where we’ve had success (or otherwise), where things can go wrong, and where we think the methodology has limitations.
This entry was posted on 17 Oct 2006 by John McConnell.
Filed under:
- Applied Insights Blog
- Predictive analytics
- Data mining