In my last article I had a quick look at some of the tools in the analyst's toolkit other than the web analytics system itself. These included Business Intelligence or OLAP tools, visualisation tools, and statistical analysis and data mining tools. This week I want to take a deeper look at the use (and possible abuse) of statistical analysis and data mining techniques.
Statistical analysis and data mining cover a wide variety of approaches, methodologies and techniques that might be useful for the web analyst. They can be broadly classified as follows:
- Statistical analysis
- Classification techniques
- Clustering and segmentation methodologies
- Forecasting
- Text analysis
It’s probably best to start with a note of caution. There’s a saying “If you torture the data long enough, it will tell you anything you want it to”. These kinds of data analysis techniques can be very powerful and they can be used to uncover nuggets of gold in your data. They also need to be used carefully. The analyst needs to ensure that the results are robust, reliable and above all make sense. Data mining is as much an art as it is a science.
Simple statistical analysis techniques such as frequencies and histograms can reveal interesting patterns in your data. I’ve written before about the dangers of using average metrics such as “average pages per visit”, as they hide interesting differences in behaviour. Worse than that, they can actually be misleading.
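To make the point about averages concrete, here is a minimal sketch in Python. The visit data is invented purely for illustration, and the variable names are my own assumptions. It compares the mean with the median and prints a rough text histogram of pages per visit.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical pages-per-visit counts for ten visits (invented for illustration).
pages_per_visit = [1, 1, 1, 1, 1, 1, 2, 2, 15, 25]

average = mean(pages_per_visit)        # pulled upwards by the two long visits
typical = median(pages_per_visit)      # the middle of the distribution
frequencies = Counter(pages_per_visit) # how many visits had each page count

print(f"mean = {average}, median = {typical}")

# A crude text histogram: one '#' per visit.
for pages, visits in sorted(frequencies.items()):
    print(f"{pages:>3} pages | {'#' * visits}")
```

In this made-up sample most visits are a single page, yet the average suggests something quite different. That gap is exactly what a frequency table or histogram exposes.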
Often in the work we do, we will spend a lot of time initially carrying out exploratory analysis looking at the patterns and distributions in the data. It’s time well spent. It gives you a feel for what is going on below the topline metrics and also helps later when you begin to look at the results of other analytical techniques. As a marketing analyst you need to have a sense of how the data is made up, how the topline metrics are constructed and where they come from. For example, you may find that there are some extreme values or “outliers” that might affect your results and so need to be dealt with in some way or another.
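One common convention for spotting extreme values is the interquartile range (IQR) rule. It is only one of several possible approaches and not necessarily the one you would choose for your own data. The sketch below assumes a small invented list of order values and flags anything more than 1.5 times the IQR beyond the quartiles.

```python
from statistics import quantiles

# Hypothetical order values (invented for illustration).
order_values = [20, 22, 25, 27, 30, 31, 35, 38, 40, 400]

# quantiles(..., n=4) returns the three quartile cut points.
q1, _, q3 = quantiles(order_values, n=4)
iqr = q3 - q1

# The 1.5 * IQR fences are a rule of thumb, not a hard statistical law.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in order_values if v < lower_fence or v > upper_fence]
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Whether a flagged value should be removed, capped or kept is a judgement call that depends on what it represents.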
With statistical analysis you may want to compare different groups of visitors or customers. For example, you might look at whether the repeat order rate is higher amongst some groups of customers than others. You can apply statistical tests to see whether any differences are statistically significant or whether they might simply be down to the variability in the data. Significance testing can be important in experiments such as A/B tests to ensure that “A” really is better or worse than “B” before making any changes to the site.
There are many different types of “classification” techniques including regression analysis, often used in credit scoring, as well as Artificial Intelligence approaches including neural networks. The class of techniques that I want to take a look at today is the use of “decision trees”. There are a number of different algorithms in this type of technique including CHAID, CART and QUEST. These algorithms essentially do the same thing in different ways and that is to assign the data records (such as visitors or customers) into groups of interest based upon the other variables that you have on the record.
For example, you may have records on customers that split them into two groups: “single order customers” and “repeat customers”. You may then also have a whole string of other data on those customers and you are interested in understanding the key characteristics that distinguish someone who orders once from someone who goes on to order again. Decision tree methods will look at all the other variables and determine which one is the most important factor in determining the difference between a single order shopper and a repeat order shopper. The process is then repeated again and again until it has determined what all the significant factors are, in order of priority.
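As a rough illustration of a significance test for an A/B experiment, here is a sketch of a two-proportion z-test using only the Python standard library. The function name and the conversion figures are assumptions made up for the example, and the test relies on the usual normal approximation, so it is only suitable for reasonably large samples.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical results: 200 of 5,000 visitors convert on A, 260 of 5,000 on B.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")

# The same kind of gap on only 100 visitors per version is far less convincing.
z_small, p_small = two_proportion_z_test(4, 100, 6, 100)
print(f"z = {z_small:.2f}, p = {p_small:.4f}")
```

The second call shows why sample size matters: a similar-looking difference in rates can easily be down to chance when there are only a few visitors in each group.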
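The "find the most important variable first" step can be sketched in a few lines of Python. This is a simplified illustration in the spirit of CART, which scores splits by reduction in Gini impurity. It is not a full implementation of CHAID, CART or QUEST, and the customer records and field names are invented for the example.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity: 0 for a pure group, 0.5 for an even two-way mix."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def impurity_reduction(records, feature, target):
    """How much splitting on `feature` reduces impurity in `target`."""
    parent = gini([r[target] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r[target])
    weighted = sum(len(g) / len(records) * gini(g) for g in groups.values())
    return parent - weighted

def rank_features(records, features, target):
    """Rank candidate variables by how well they separate the target."""
    scores = {f: impurity_reduction(records, f, target) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical customer records (invented; not the data from any real study).
customers = [
    {"opted_in": True,  "large_first_order": True,  "repeat": True},
    {"opted_in": True,  "large_first_order": False, "repeat": True},
    {"opted_in": True,  "large_first_order": False, "repeat": True},
    {"opted_in": True,  "large_first_order": True,  "repeat": False},
    {"opted_in": False, "large_first_order": True,  "repeat": False},
    {"opted_in": False, "large_first_order": False, "repeat": False},
    {"opted_in": False, "large_first_order": True,  "repeat": True},
    {"opted_in": False, "large_first_order": False, "repeat": False},
]

ranking = rank_features(customers, ["opted_in", "large_first_order"], "repeat")
for feature, score in ranking:
    print(f"{feature}: impurity reduction = {score:.3f}")
```

A real decision tree algorithm would then split the records on the top-ranked variable and repeat the ranking within each branch, stopping when no remaining split is worthwhile.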
The great thing about decision trees is that the output is very visual and relatively easy to understand. They can get a bit big and cumbersome though especially if you are dealing with a lot of variables. Decision Tree techniques have been used for years in direct marketing work to determine which type of people are most likely to respond to mailings, so that companies can cut down on mailing costs.
In online marketing, mailing costs aren’t such a big issue as they are in the offline world, but we have used techniques like decision trees in other areas to understand what factors influence visitors to do something or not. In the example above of single order customers vs repeat order customers, we did a piece of work where we looked at many potential factors that included:
- the size of the first order
- the number of visits to the website after the first order
- the product category of the first order
- the product categories browsed after the first order
- whether they were opted in to the email newsletter
- how many newsletters they had received
- the timing of the newsletters after the first order
We found that the most important factor in determining whether someone went on to order again after their first order (out of all the ones we examined) was that someone had opted into the email newsletter and had received a newsletter within 5 days of that first order. Vital input into a retention marketing programme.
Decision tree techniques are also useful for profiling and understanding different segments of visitors or customers. Segmentation techniques are what I will be looking at in the next part of this series.
Till then…
This entry was posted on 8 Jun 2006 by Neil Mason.


