<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Applied Insights</title>
	<link>http://www.applied-insights.co.uk</link>
	<description>Creating customer insight through data</description>
	<pubDate>Fri, 07 Nov 2008 14:54:06 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.3</generator>
	<language>en</language>
			<item>
		<title>Seeing … or Not Seeing</title>
		<link>http://www.applied-insights.co.uk/news/2008/08/30/seeing-%e2%80%a6-or-not-seeing/</link>
		<comments>http://www.applied-insights.co.uk/news/2008/08/30/seeing-%e2%80%a6-or-not-seeing/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 09:43:48 +0000</pubDate>
		<dc:creator>John McConnell</dc:creator>
		
	<category>Applied Insights Blog</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2008/08/30/seeing-%e2%80%a6-or-not-seeing/</guid>
		<description><![CDATA[When we think about how to evaluate a predictive model the first thing we typically think of is how accurately does that model predict against the (unseen) test data. More often than not though when we develop models our business/research customers want more than that. They want to know how the algorithm got to the [...]]]></description>
			<content:encoded><![CDATA[<p>When we think about how to evaluate a predictive model, the first thing we typically consider is how accurately that model predicts against the (unseen) test data. More often than not, though, when we develop models our business/research customers want more than that. They want to know how the algorithm arrived at its predictions, i.e. they want to understand the model.</p>
<p>The more transparent predictive methods don&#8217;t just predict; they also reveal the patterns that underlie the predictions. This has two main benefits:</p>
<ol>
<li>Subject Matter Experts (SMEs) – typically on the business/research side – can assess the model&#8217;s validity by viewing these patterns, for example as rules or formulae. This way they can see if the inherent relationships make sense. Do they see any potential anomalies in the data that we didn&#8217;t pick up when we previously explored it?</li>
<li>And of course the patterns themselves may reveal useful insights. We often find specific segments of interest; demographic groups who have a higher propensity to convert through a given channel, or re-purchasers who have short, but potentially interesting and valuable, buying cycles.</li>
</ol>
<p>The bottom line is that when we can see what a model is doing we can glean much more from it than the likelihood that the outcome of interest (convert, attrite, default, etc.) will happen.</p>
<p>To be frank, most of our projects are like this. This is where Decision Tree methods often win out, because the output lets us visually explore the data both to understand the model and to examine other potential patterns of interest. They may not necessarily give us the most accurate predictions, but often the SMEs care more about understanding than predicting. This is a classic trade-off in predictive analytics (PA).</p>
<p>There are exceptions to this. The alternative view is that accuracy is paramount and it could be that the winning model is opaque. Neural Network models are a case in point. Depending on the software you are using you might see a ranked list of fields which contribute to the prediction along with the prediction itself and perhaps an associated confidence level. Even if the final network is displayed it doesn&#8217;t necessarily explain much more.</p>
<p>For the most part these are the two typical scenarios. However, we are currently designing a 3<sup>rd</sup> type – where opaqueness is the main objective (together with an acceptable level of predictive accuracy of course). We&#8217;re talking to a government department who don&#8217;t want to have to send sensitive data out and who don&#8217;t want our models to reveal any of that information either. So the gist of our approach is that we&#8217;ll develop black-box models on our data and let the department deploy those models on their own database. They&#8217;ll give us addresses and predictive scores in return, but in so doing we won&#8217;t know why a particular address was selected.</p>
<p>Anyone living in the UK will understand the political backdrop to this as there have been various high profile cases of data going AWOL (<a href="http://www.timesonline.co.uk/tol/news/uk/crime/article4583747.ece">here is the latest one</a>). We are hoping that a somewhat unorthodox application of Predictive Analytics might help the UK government provide a valuable public service without further compromising the confidentiality of its citizens. There&#8217;s many a slip twixt the cup and the lip mind you … we&#8217;ll keep you posted …
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2008/08/30/seeing-%e2%80%a6-or-not-seeing/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>An Introduction to Predictive Analytics, London, 22nd May 2008</title>
		<link>http://www.applied-insights.co.uk/news/2008/04/30/an-introduction-to-predictive-analytics-london-22nd-may-2008/</link>
		<comments>http://www.applied-insights.co.uk/news/2008/04/30/an-introduction-to-predictive-analytics-london-22nd-may-2008/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 11:03:47 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Conferences and presentations</category>
	<category>Past conferences</category>
	<category>Segmentation</category>
	<category>Forecasting</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2008/04/30/an-introduction-to-predictive-analytics-london-22nd-may-2008/</guid>
		<description><![CDATA[Neil and John ran a one day workshop in Predictive Analytics in association with the Emetrics Marketing Optimisation summit on 22nd May at the Hotel Russell in London. A course outline is below.
Please contact us if you would be interested in joining one of our courses or developing a customised in-house training session on predictive [...]]]></description>
			<content:encoded><![CDATA[<p>Neil and John ran a one day workshop in Predictive Analytics in association with the Emetrics Marketing Optimisation summit on 22<sup>nd</sup> May at the Hotel Russell in London. A course outline is below.</p>
<p>Please <a title="Contact us" href="http://www.applied-insights.co.uk/about/contact-us/">contact us</a> if you would be interested in joining one of our courses or developing a customised in-house training session on predictive analytics.</p>
<h2>Predictive Analytics - course outline</h2>
<p>An Introduction to Data Mining and Predictive Analytics is a one day workshop covering the foundations of this innovative marketing analytics discipline. During the course of the day you will gain a thorough familiarisation with some of the key principles and methodologies of data mining and predictive analytics and learn how to apply them to common marketing problems such as:</p>
<ul>
<li>How can I predict campaign response?</li>
<li>How do I segment my website visitors or customers?</li>
<li>How can I anticipate possible customer defections?</li>
</ul>
<p>In this one day interactive course we will cover the following topics:</p>
<h2><span style="color: #4f81bd"><em>Introduction:<br />
</em></span></h2>
<ul>
<li>What is data mining and how is that different to predictive analytics?</li>
<li>How organisations are currently using data mining and predictive analytics across their businesses and to solve particular marketing problems</li>
</ul>
<h2><span style="color: #4f81bd"><em>Processes and implementation<br />
</em></span></h2>
<ul>
<li>How to go about a data mining/predictive analytics project</li>
<li>An overview of a standard industry process (CRISP-DM)</li>
</ul>
<h2><span style="color: #4f81bd"><em>Methods and applications<br />
</em></span></h2>
<ul>
<li>
<div>An overview of the main types of data mining and predictive analytics applications:</div>
<ul>
<li>Forecasting</li>
<li>Segmentation</li>
<li>Classification</li>
</ul>
</li>
<li>
<div>An introduction to the main methodologies such as:</div>
<ul>
<li>Time-series forecasting</li>
<li>Regression analysis</li>
<li>Decision trees (CHAID, CART and so on)</li>
<li>Cluster analysis</li>
<li>Neural networks</li>
</ul>
</li>
<li>
<div>Case studies and examples of how these techniques are used and deployed in both online and offline marketing in areas such as:</div>
<ul>
<li>Retention modelling</li>
<li>Conversion propensity modelling</li>
<li>Visitor segmentation</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2008/04/30/an-introduction-to-predictive-analytics-london-22nd-may-2008/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Web Analytics Congress, Maarssen, The Netherlands, May 2008</title>
		<link>http://www.applied-insights.co.uk/news/2008/04/29/web-analytics-congress-maarssen-the-netherlands-may-2008/</link>
		<comments>http://www.applied-insights.co.uk/news/2008/04/29/web-analytics-congress-maarssen-the-netherlands-may-2008/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 13:25:43 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Conferences and presentations</category>
	<category>Past conferences</category>
	<category>Analytics strategy</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2008/04/29/web-analytics-congress-maarssen-the-netherlands-may-2008/</guid>
		<description><![CDATA[At this year&#8217;s annual Web Analytics Congress in Holland, Neil delivered a keynote presentation on Marketing Optimisation and Predictive Analytics.

]]></description>
			<content:encoded><![CDATA[<p>At this year&#8217;s annual <a title="Dutch Web Analytics Congress" href="http://www.webanalyticscongres.nl/index.aspx" target="_blank">Web Analytics Congress</a> in Holland, Neil delivered a keynote presentation on Marketing Optimisation and Predictive Analytics.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2008/04/29/web-analytics-congress-maarssen-the-netherlands-may-2008/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Emetrics Marketing Optimization Summit, San Francisco, May 2008</title>
		<link>http://www.applied-insights.co.uk/news/2008/04/29/emetrics-marketing-optimization-summit-san-francisco-may-2008/</link>
		<comments>http://www.applied-insights.co.uk/news/2008/04/29/emetrics-marketing-optimization-summit-san-francisco-may-2008/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 13:10:49 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Conferences and presentations</category>
	<category>Future conferences</category>
	<category>Segmentation</category>
	<category>Forecasting</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2008/04/29/emetrics-marketing-optimization-summit-san-francisco-may-2008/</guid>
		<description><![CDATA[At this year’s Emetrics Summit in San Francisco, Neil will be presenting a session in the &#8220;Advanced Analytics Track&#8221; entitled “Cutting through the NOISE: Applications of data mining and predictive analytics”.
The presentation will be looking at the application of techniques such as segmentation and propensity modelling to better understand website visitor behaviour.

]]></description>
			<content:encoded><![CDATA[<p>At this year’s Emetrics Summit in San Francisco, Neil will be presenting a session in the &#8220;<a title="EMetrics San Fransisco" href="http://www.emetrics.org/2008/sanfrancisco/track_advanced_web_analytics.php" target="_blank">Advanced Analytics Track</a>&#8221; entitled “Cutting through the NOISE: Applications of data mining and predictive analytics”.</p>
<p>The presentation will be looking at the application of techniques such as segmentation and propensity modelling to better understand website visitor behaviour.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2008/04/29/emetrics-marketing-optimization-summit-san-francisco-may-2008/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Internet Marketing Conference, Stockholm, November 2007</title>
		<link>http://www.applied-insights.co.uk/news/2007/11/23/internet-marketing-conference-stockholm-november-2007/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/11/23/internet-marketing-conference-stockholm-november-2007/#comments</comments>
		<pubDate>Fri, 23 Nov 2007 16:53:09 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Conferences and presentations</category>
	<category>Future conferences</category>
	<category>Forecasting</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/11/23/internet-marketing-conference-stockholm-november-2007/</guid>
		<description><![CDATA[A return visit to Stockholm by Applied Insights this year. This time John will be giving a presentation at the Internet Marketing Conference on &#8220;Predictive Analytics - Why Bother?&#8221;. He&#8217;s also on a panel on the subject of Testing and Analysis and has been roped in to moderating a panel session on Web Analytics. Should [...]]]></description>
			<content:encoded><![CDATA[<p>A return visit to Stockholm by Applied Insights this year. This time John will be giving a presentation at the <a title="IMC" href="http://www.internetmarketingconference.com/" target="_blank">Internet Marketing Conference</a> on &#8220;Predictive Analytics - Why Bother?&#8221;. He&#8217;s also on a panel on the subject of Testing and Analysis and has been roped in to moderating a panel session on Web Analytics. Should be interesting&#8230;
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/11/23/internet-marketing-conference-stockholm-november-2007/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Emetrics Marketing Optimization Summit, Washington DC, October 2007</title>
		<link>http://www.applied-insights.co.uk/news/2007/10/23/emetrics-marketing-optimization-summit-washington-dc-october-2007/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/10/23/emetrics-marketing-optimization-summit-washington-dc-october-2007/#comments</comments>
		<pubDate>Tue, 23 Oct 2007 16:29:20 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Conferences and presentations</category>
	<category>Past conferences</category>
	<category>Forecasting</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/10/23/emetrics-marketing-optimization-summit-washington-dc-october-2007/</guid>
		<description><![CDATA[At this year&#8217;s Emetrics Summit in Washington DC, Neil presented a paper entitled &#8220;Cutting through the NOISE: Applications of data mining and predictive analytics&#8221;.
You can download a copy of the presentation here&#8230;

]]></description>
			<content:encoded><![CDATA[<p>At this year&#8217;s Emetrics Summit in Washington DC, Neil presented a paper entitled &#8220;Cutting through the NOISE: Applications of data mining and predictive analytics&#8221;.</p>
<p>You can download a copy of the presentation <a id="p151" title="Neil Mason Emetrics DC 2007.pps" href="http://www.applied-insights.co.uk/wp-admin/Neil%20Mason%20Emetrics%20DC%202007.pps">here&#8230;</a>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/10/23/emetrics-marketing-optimization-summit-washington-dc-october-2007/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Predictive analytics Part 2</title>
		<link>http://www.applied-insights.co.uk/news/2007/10/19/predictive-analytics-part-2/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/10/19/predictive-analytics-part-2/#comments</comments>
		<pubDate>Fri, 19 Oct 2007 22:29:05 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Articles</category>
	<category>Forecasting</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
	<category>Web analytics</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/10/19/predictive-analytics-part-2/</guid>
		<description><![CDATA[In part one of this series, I examined visitor segmentation, a data-mining technique. Now, let&#8217;s look at how data mining can be used to understand important visitor behavior over time.
Quite often when we use Web analytics systems, we focus on what visitors do during a particular visit. The classic conversion funnel is a good example [...]]]></description>
			<content:encoded><![CDATA[<p>In part one of this series, I examined visitor segmentation, a data-mining technique. Now, let&#8217;s look at how data mining can be used to understand important visitor behavior over time.</p>
<p>Quite often when we use Web analytics systems, we focus on what visitors do during a particular visit. The classic conversion funnel is a good example of this trend. Most Web analytics systems look at the conversion funnel in the context of a single visit; that is, they report on how people got to page A, then B, then C, and so on within a single visit. This information is useful because it helps identify potential process areas that need improvement. But if we think about those times when a visitor might make multiple visits to a site before a conversion, the classic conversion funnel might not give you a true perspective on what&#8217;s happening.</p>
<p>Take the example of buying car insurance online. In the U.K., it&#8217;s a very competitive business. Consumers typically shop around for quotes and go for the best value proposition. As a result, it&#8217;s very unlikely people will arrive on a site and buy car insurance on their first visit. Maybe they&#8217;ll arrive from a search engine, check out the proposition, and bookmark the site for future reference. Maybe later they&#8217;ll come back, get a quote, and leave to compare it to other quotes. Hopefully they&#8217;ll return to complete the policy application process, and a sale is made.</p>
<p>A generic conversion funnel analysis will contain an amalgam of all three types of behavior: research, quote, purchase. As a result, you&#8217;re not seeing a true reflection of your ability to convert opportunity into value unless you analyze visitor behavior over sequences of visits, rather than just within the single visit.</p>
<p>If you work with Web analytics data, you know it&#8217;s hard enough to understand what&#8217;s going on when examining a person&#8217;s behavior in a single visit. Analyzing behavior over multiple visits adds complexity. Here, data mining and predictive analytical techniques come into play.</p>
<p>If we accept (as in the car insurance example) that conversion is often a multivisit process, we must understand the process&#8217;s key drivers over time if we are to influence that visitor&#8217;s behavior. We must find out what behaviors over multiple visits are most likely to lead to a successful outcome.</p>
<p>Using a decision-tree technique like CHAID can help you understand how different visitor behaviors over multiple visits may increase or decrease the likelihood of converting a browser into a buyer. CHAID, which is highly visual, shows factors that influence conversion in a tree diagram in the order they influence people.</p>
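<p>As a rough illustration of how a CHAID-style method might decide which factor comes first in the tree, the sketch below ranks candidate factors by their Pearson chi-square statistic against the conversion outcome. The factor names and counts are hypothetical, and a full CHAID implementation also merges categories and adjusts significance levels, which this sketch leaves out.</p>

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count if the factor and the outcome were independent.
            expected = row_totals[r] * col_totals[c] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are the two levels of a factor (yes, no),
# columns are (converted, did not convert).
factors = {
    "returned within 7 days": [[40, 60], [10, 90]],
    "arrived from search": [[26, 74], [24, 76]],
}

# The factor with the largest statistic is the strongest candidate for the first split.
ranked = sorted(factors, key=lambda name: chi_square(factors[name]), reverse=True)
print(ranked)
```

<p>With these made-up counts the timing factor separates converters from non-converters far more strongly than the traffic source, so it would sit nearer the top of the tree.</p>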
<p>) can help you understand how different visitor behaviors over multiple visits may increase or decrease the likelihood of converting a browser into a buyer. CHAID, which is highly visual, shows factors that influence conversion in a tree diagram in the order they influence people.) can help you understand how different visitor behaviors over multiple visits may increase or decrease the likelihood of converting a browser into a buyer. CHAID, which is highly visual, shows factors that influence conversion in a tree diagram in the order they influence people.) can help you understand how different visitor behaviors over multiple visits may increase or decrease the likelihood of converting a browser into a buyer. CHAID, which is highly visual, shows factors that influence conversion in a tree diagram in the order they influence people.As with the segmentation approach described in part one, data must be in the right shape before an analysis is started. That requires extracting and summarizing data to key activities and events in each visit of the visitor lifecycle. I often think that data mining and predictive analytics are part art, part science. The art requires possessing the right data in the right format for algorithms to provide meaningful and useful results. In these days of automated analytics, anyone can produce a model. It&#8217;s a question of whether the model is good or not.</p>
<p>In working with these techniques, we commonly find there are a small number of highly influential conversion drivers over multiple visits. Naturally those drivers vary from site to site, but the importance of time is usually one thing they have in common. The time between the first and second visit, the second and third visit, and so on, is quite often a good predictor of the subsequent outcome.</p>
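<p>To make the idea of time between visits concrete, here is a minimal sketch that derives the gap in days between consecutive visits for each visitor. The visitor ids and dates are invented for illustration, and the dates are assumed to be sorted already.</p>

```python
from datetime import date

# Hypothetical visit dates per visitor, already sorted in time order.
visit_dates = {
    "v1": [date(2007, 9, 1), date(2007, 9, 4), date(2007, 9, 10)],
    "v2": [date(2007, 9, 2), date(2007, 9, 20)],
    "v3": [date(2007, 9, 5)],
}

def days_between_visits(dates):
    """Return the gap in days between each consecutive pair of visits."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

# One list of gaps per visitor; a single-visit visitor gets an empty list.
gaps = {visitor: days_between_visits(dates) for visitor, dates in visit_dates.items()}
print(gaps)
```

<p>Gaps like these could then be fed into a model as candidate predictors alongside other per-visitor attributes.</p>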
<p>As the need to tune the online marketing processes continues, organizations must add capabilities to their analytics tool kit. Data-mining and predictive analytical techniques are firmly established within other marketing disciplines. Perhaps their time is now coming in the online world.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/10/19/predictive-analytics-part-2/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Predictive Analytics Part 1</title>
		<link>http://www.applied-insights.co.uk/news/2007/10/05/predictive-analytics-part-1/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/10/05/predictive-analytics-part-1/#comments</comments>
		<pubDate>Fri, 05 Oct 2007 22:20:01 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Articles</category>
	<category>Segmentation</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
	<category>Web analytics</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/10/05/predictive-analytics-part-1/</guid>
		<description><![CDATA[In my last article I outlined my belief that what we call “web analytics” is becoming a more diverse and complex field. What we have traditionally considered to be web analytics has been the analysis of site behavioural data captured, processed and reported on by a proprietary system designed to do just that. But as the [...]]]></description>
			<content:encoded><![CDATA[<p>In my last article I outlined my belief that what we call “web analytics” is becoming a more diverse and complex field. What we have traditionally considered to be web analytics has been the analysis of site behavioural data captured, processed and reported on by a proprietary system designed to do just that. But as the online channel evolves and becomes more complex, the tools to help us understand what’s happening must also evolve and become more complex. In some areas, such as in the case of social media, this may mean the development of new tools. In other areas it may mean the application of old tools to this new channel.<br />
One of the areas that we work in a great deal is in the use of <a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank">data mining</a> and predictive analytical techniques. I first got started in this area about 15 years ago when at ACNielsen using these types of methodologies to help clients to try and figure out which half of their advertising money they were wasting. I have a book on my bookshelf that was published 25 years ago on the use of model building techniques in marketing. So the techniques aren’t new but what is relatively new is the systematic use of these techniques in the online marketing space.</p>
<p>I think that there are some reasons for this. Historically our main focus has been on managing the vast volumes of data and wrestling out of the web analytics systems a few numbers that told us how well we were doing and that we could do something about. Also, in the past, the natural organic growth in the channel has meant that we have not been faced with the need to scramble for market share and to fully optimise our business processes. And to some extent, we have not been asking the right questions. This is now changing. We understand our few numbers and we want to know more. The online world is far more competitive and we are beginning to ask questions that go beyond the limits of our traditional analytical tool set. Questions like:</p>
<ul>
<li>“How do I understand the effects different marketing channels have on generating sales?”</li>
<li>“What does the purchase lifecycle look like over multiple visits and how can I optimise it?&#8221;</li>
<li>“How should I be segmenting my audience or customers, to improve the effectiveness of my marketing activity?”</li>
</ul>
<p>To answer these types of questions we are going to have to start to organise the data in different ways and we need to bring in some different tools. First of all we need to integrate our data so that we can see different aspects of the acquisition, conversion and retention processes in one place. Secondly we need to aggregate our data so that it focuses on the visitor or customer rather than the click or the visit. Thirdly we need to cut through the noise in the data using more sophisticated analytical techniques to get at the key insights. Let me give you an example of what I mean.</p>
<p>We all know that different types of people come to our websites for different reasons and to do different things. If I treat everyone the same, I am being sub-optimal in my decision making about how I allocate marketing funds and about how I manage the user experience. I need to segment my audience so that I can market to these different groups more effectively. However, I can’t do that on the basis of how they behave on the website alone; I also need to understand their demographics, their intentions, their aspirations and their opinions. So I need to integrate my hard core behavioural data with profiling and attitudinal data drawn from other data sources like surveys.</p>
<p>Next, I am interested in the behaviour of visitors over multiple visits rather than what they do in a single visit. So I need to aggregate the data so that I have a record of the behaviour of different visitors over a period of time. Also I probably need to summarise the data and create additional attributes which describe aspects of that behaviour over time such as number of visits made, number of conversion events, types of conversion events and so on.</p>
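<p>As a minimal sketch of this aggregation step, the snippet below rolls visit-level records up to one summary row per visitor, with the attributes mentioned above (visits made, conversion events, types of conversion event). The record layout, ids and event names are assumptions made purely for illustration.</p>

```python
from collections import defaultdict

# Hypothetical visit-level records: (visitor_id, visit_date, conversion_event or None).
visits = [
    ("v1", "2007-09-01", None),
    ("v1", "2007-09-04", "quote"),
    ("v1", "2007-09-10", "purchase"),
    ("v2", "2007-09-02", None),
    ("v2", "2007-09-20", "quote"),
    ("v3", "2007-09-05", None),
]

def summarise_by_visitor(records):
    """Roll visit-level rows up to one summary row per visitor."""
    summary = defaultdict(lambda: {"visits": 0, "conversions": 0, "event_types": set()})
    for visitor_id, _visit_date, event in records:
        row = summary[visitor_id]
        row["visits"] += 1
        if event is not None:
            row["conversions"] += 1
            row["event_types"].add(event)
    return dict(summary)

visitor_summary = summarise_by_visitor(visits)
for visitor_id in sorted(visitor_summary):
    print(visitor_id, visitor_summary[visitor_id])
```

<p>In practice this kind of roll-up would more likely be done in a database or an analytics package, but the shape of the output is the same: one row per visitor rather than one row per visit.</p>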
<p>Finally, I need to analyse the data to identify interesting and meaningful segments of visitors. In all likelihood I will have quite a large and noisy dataset where I won’t be able to see the forest for all the trees. Traditional querying and reporting techniques are unlikely to be an effective method of identifying the patterns; I need to use something that will find the patterns in the data for me. In this case I decide to use cluster analysis. The cluster analysis process looks for groups of visitors in the data, where the people within the groups have something in common but what they have in common is different from group to group. What I have to do then is interpret that data to understand what it is the visitor segments have been clustered on and decide whether these are meaningful and useful segments that I can do something with. This process may yield some surprising results and enable me to think about the audience in a way that I had not previously considered. I may find patterns and relationships in the data that I would never have found using traditional analysis techniques.</p>
<p>So using data mining and predictive analytical techniques will allow organisations to unlock more value from their data but it requires a different approach to managing your data, different tools and different skills. Next time I will look at another application of data mining and predictive analytics: understanding the important factors that affect someone’s propensity to buy something during the purchase lifecycle.</p>
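<p>The article does not say which clustering algorithm is used, so the sketch below assumes plain k-means as one common choice. It groups hypothetical visitors described by two made-up attributes (number of visits, number of conversion events); the starting centroids are picked by hand to keep the example deterministic.</p>

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return assignments, centroids

# Hypothetical visitor attributes: (number of visits, number of conversion events).
visitors = [(1, 0), (2, 0), (1, 1), (8, 3), (9, 4), (10, 3)]
labels, centres = kmeans(visitors, centroids=[visitors[0], visitors[3]])
print(labels)
```

<p>The algorithm only produces the groups. Deciding what each group means, and whether it is a useful segment, is still the interpretation step described above.</p>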
<p>Till then…
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/10/05/predictive-analytics-part-1/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>The increasing diversity and complexity of “Web Analytics”</title>
		<link>http://www.applied-insights.co.uk/news/2007/09/21/the-increasing-diversity-and-complexity-of-%e2%80%9cweb-analytics%e2%80%9d/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/09/21/the-increasing-diversity-and-complexity-of-%e2%80%9cweb-analytics%e2%80%9d/#comments</comments>
		<pubDate>Fri, 21 Sep 2007 22:17:40 +0000</pubDate>
		<dc:creator>Neil Mason</dc:creator>
		
	<category>Articles</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
	<category>Web analytics</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/09/21/the-increasing-diversity-and-complexity-of-%e2%80%9cweb-analytics%e2%80%9d/</guid>
		<description><![CDATA[This week I’ve been starting to get my head around the presentation that I will be giving at the Emetrics Summit in Washington in October. I was looking at the agenda and was struck by the vast breadth of material being presented over a full three days, with 11 different tracks and 3 workshops. I [...]]]></description>
			<content:encoded><![CDATA[<p>This week I’ve been starting to get my head around the presentation that I will be giving at the <a href="http://emetrics.org/2007/washingtondc" target="_blank">Emetrics Summit</a> in Washington in October. I was looking at the agenda and was struck by the vast breadth of material being presented over a full three days, with 11 different tracks and 3 workshops. I consider myself to be a bit of an Emetrics veteran (this will be my eighth) and it used to be considered exciting when we split up onto separate tables for an hour or so in the conference room to discuss different topics. Now we can go to a whole track on it and not see each other for 3 days except at the networking events.</p>
<p>The Emetrics Summit is a bellwether of the web analytics industry. It’s not just the growth in the size of the conference that reflects the dynamics in the industry but also the increased diversity and complexity of the subject matter and the content. At the Washington Summit there are tracks on subjects ranging from Marketing Optimisation to Public Sector measurement to Web 2.0 analytics. This shows how the industry is developing in lots of different directions. Effectively, what we are seeing is the emergence of different disciplines within what we call “web analytics”, and I believe we will also see the emergence of different specialists within these disciplines. I expect it won’t be too long before practitioners and consultants within the space find that they cannot cover all the ground and, in common with other marketing services industries (i.e. market research, direct marketing, PR etc.), we will see specialisation increase.</p>
<p>For example, take the development of Web 2.0 and social media. The Web Analytics Association has recently set up a separate committee to look at this whole area, as “traditional” approaches to web analytics are not suited to measuring and understanding the impact of this evolving medium. It’s likely that, as social media continues to develop, different measurement tools will evolve, perhaps requiring different skill sets to analyse and interpret the data. I draw a parallel with the market research industry that I worked in for a few years; you had people who were essentially skilled in “quantitative” analysis and those that were specialists in “qualitative” analysis. Few could do both well.</p>
<p>The track that I am speaking at in Washington is another case in point. The “Statistical Success” track is a new track to the Summit. Whilst that sounds pretty scary even to a bunch of web analysts, again it’s an indicator of the development and maturity of the industry. The track includes a number of presentations that talk about the use of various statistical and advanced analytical techniques in evaluating online marketing performance. This is relatively new to web analytics but it’s not new to consumer analytics. The direct marketing industry, for example, has been using advanced analytical techniques such as regression analysis, decision trees and so on for years to predict likely response. The market research industry has been using techniques such as cluster analysis to identify and understand different consumer segments.</p>
<p>Now these techniques are being used to help understand more fully different aspects of visitor behaviour on websites and the effectiveness of online marketing campaigns. Techniques such as multi-variate testing and behavioural targeting are statistical processes that have been productized and packaged up into services by companies such as Optimost, Offermatica, Touch Clarity and the like.</p>
<p>What we are also seeing is statistical analysis, data mining and predictive analytics being deployed in an ad-hoc way by analysts skilled in these techniques using tools such as SAS, SPSS, KXEN and the like. These packages have long been an essential tool of the “offline” marketing analyst and now they are finding their way into the online marketing analyst’s tool box as well. In my presentation at Emetrics I will be looking at these advanced analytical techniques in more detail and how they can be applied in online marketing analytics. Since not all of you are going to be making it to Washington (I assume), it’s something that I’m also going to be covering here over the coming weeks.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/09/21/the-increasing-diversity-and-complexity-of-%e2%80%9cweb-analytics%e2%80%9d/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>How to do Predictive Analytics - Part 5</title>
		<link>http://www.applied-insights.co.uk/news/2007/03/15/how-to-do-predictive-analytics-part-5/</link>
		<comments>http://www.applied-insights.co.uk/news/2007/03/15/how-to-do-predictive-analytics-part-5/#comments</comments>
		<pubDate>Thu, 15 Mar 2007 15:29:49 +0000</pubDate>
		<dc:creator>John McConnell</dc:creator>
		
	<category>Applied Insights Blog</category>
	<category>Analytics strategy</category>
	<category>Predictive analytics</category>
	<category>Data mining</category>
		<guid isPermaLink="false">http://www.applied-insights.co.uk/news/2007/03/15/how-to-do-predictive-analytics-part-5/</guid>
		<description><![CDATA[Steps 4 and 5 – Modelling and Evaluation, The Theory
Now for the serious stuff (or the fun stuff depending on your inclination!). Of course the modelling phase is at the core of a predictive analytic effort. CRISP rightly separates modelling and evaluation into separate steps which emphasises the importance of the latter. However they are [...]]]></description>
			<content:encoded><![CDATA[<p><span lang="EN-GB"><strong><span lang="EN-GB">Steps 4 and 5 – Modelling and Evaluation, The Theory<br />
</span></strong></span><span lang="EN-GB">Now for the serious stuff (or the fun stuff depending on your inclination!). Of course the modelling phase is at the core of a predictive analytic effort. <a title="CRISP Home" target="_blank" href="http://www.crisp-dm.org/">CRISP</a> rightly separates modelling and evaluation into separate steps which emphasises the importance of the latter. However they are intrinsically linked and we will consider them both together here.<br />
</span><span lang="EN-GB" /><span lang="EN-GB">As this is really the central issue I’ll break it into 2 parts. Let’s talk about the theory of how we go about it, and then in the next blog entry I’ll try to “make it real” with a practical example.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB">Just to recap on how we got here. Starting with a research or business objective we’ve garnered enough understanding to embark on a predictive exercise. Furthermore we’ve explored the data and found predictive potential. More than likely we’ve uncovered enough relationships in the data as we explored it to indicate that patterns exist which will allow us to predict the outcome(s) of interest.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB"><strong>So who can do this?<br />
</strong></span><span lang="EN-GB">Traditionally predictive modelling has been the domain of the expert: the statistician, mathematician, econometrician, the numerate researcher or the more expert “analyst”, etc. This is still largely the case today but we are seeing increasing signs of analytical democratisation.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB">Some of the contemporary tools discussed below do require less expertise to develop models today because of smarter user interfaces. There has been a move to more automated algorithms like decision trees, where the analyst does not need to know as much about the data structures or the requirements/assumptions of the algorithm to specify the <em>correct</em> analysis. More traditional statistical methods, like regression for example, do require the analyst to understand the technique well enough to specify the right settings/options and to follow certain rules about the data, e.g. that the input variables are not too highly correlated (i.e. “multicollinear”). Methods and algorithms from the world of Artificial Intelligence, e.g. neural nets and decision trees, are generally more tolerant of different data patterns and have fewer options for the analyst to worry about.</span></p>
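<p><span lang="EN-GB">To make the “not too highly correlated” rule concrete, here is a minimal sketch of a pairwise correlation check on the input variables. The variable names, the data and the 0.9 cut-off are all assumptions made up for illustration; the right threshold depends on the technique and the data.</span></p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(inputs, threshold=0.9):
    """Return (name_a, name_b, r) for input pairs whose |r| is at or above threshold."""
    names = sorted(inputs)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(inputs[a], inputs[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical customer inputs, made up for illustration.
inputs = {
    "monthly_spend": [10, 20, 30, 40, 50],
    "annual_spend": [120, 240, 360, 480, 600],  # exactly 12 x monthly_spend
    "tenure_months": [5, 3, 8, 1, 6],
}
print(correlated_pairs(inputs))
```

<p><span lang="EN-GB">Here the two spend variables carry the same information, so only that pair is flagged and one of them would normally be dropped before fitting a regression.</span></p>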
<p><span lang="EN-GB" /><span lang="EN-GB">Nevertheless it is rarely the case that we can just “press the button” without a certain level of expertise in the analytical tool and/or the handling of data. But with a few days’ training most business and research users should be able to run models even in the most advanced tools. </span><span lang="EN-GB">More specifically developed Analytical Applications can often provide a higher level of accessibility to deeper analytical methods for a broader, less expert audience.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB"><strong>And how?&#8230;<br />
</strong></span><span lang="EN-GB">For heavy duty predictive modelling the analyst will typically have an arsenal of predictive tools and algorithms at his/her disposal. We’ll revisit the various tools/platforms later but the vendors who probably offer the most are <a title="SAS Home" target="_blank" href="http://www.sas.com/">SAS</a> and <a title="SPSS Home" target="_blank" href="http://www.spss.com/">SPSS</a>, though there are some relatively new entrants making headway such as <a title="KXEN home" target="_blank" href="http://www.kxen.com/">KXEN</a>, <a title="Salford Systems Home" target="_blank" href="http://www.salford-systems.com/">Salford Systems</a> and <a title="Think Analytics home" target="_blank" href="http://www.thinkanalytics.com/">Think Analytics</a>. See the Gartner <a title="Gartner Quadrant" target="_blank" href="http://www.gartner.com/DisplayDocument?id=488171">Magic Quadrant for Customer Data Mining</a> for one view of the landscape of predictive software tools.<br />
</span><span lang="EN-GB"><br />
</span><span lang="EN-GB" /><span lang="EN-GB">In the last step we spent some time ensuring that the data was in the right shape for this step. Hence, in the simplest sense the modelling process itself is just about defining the input and output variable(s) of interest and building and evaluating multiple models.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB"><strong>Which method to choose?<br />
</strong></span><span lang="EN-GB">In part of course this will depend on what you have available. If you only have Excel then, without purchasing an add-on like <a title="XLMiner home" target="_blank" href="http://www.resample.com/xlminer/">XLMiner</a>, you have access to the models available in the Excel statistical pack. As I mentioned in earlier blogs if you are entering the predictive arena for the first time you may want to consider some of the freely available software, particularly <a title="R Project Home" target="_blank" href="http://www.r-project.org/">R</a>. The caveat to this is that, as I write, you need to be able to learn the R language to drive the models. I am not currently aware of any particular user interfaces that help accelerate the usage. Despite that initial technical hurdle R does offer a very impressive range of modelling algorithms. </span><span lang="EN-GB">Alternatively you may have one, or more, of the toolsets from the Gartner Quadrant mentioned earlier.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB">We should probably try as many of the appropriate candidate models as time allows. Some – particularly those that come from classical statistics (see the earlier point) – may not be appropriate because of the shape of the data and so may be ruled out. Going in, especially with new data, it is usually difficult to know which type of model will give us the best predictions. From experience, analysts may like to start with methods they know have produced the best models with what feels like similar data.</span></p>
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB"><strong>So what is a model?<br />
</strong></span><span lang="EN-GB">The different types of algorithms construct models in different styles but at the most abstract level a model defines a pattern, or relationship, between the input variables and the output (outcome) variables. A [<strong>Statistical</strong>] regression model, for example, will use a mathematical formula to achieve this. A <strong>Decision Tree/Rule induction</strong> model will produce a tree or a set of rules to characterise the relationship, whereas a <strong>Neural Network</strong> model will typically build a more opaque view of the relationships by connecting an abstract network of nodes, links and weights to encapsulate the underlying pattern.</span></p>
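<p><span lang="EN-GB">As a rough illustration of the difference in style, the sketch below writes the same kind of churn model once as a formula and once as a set of rules. The input names, coefficients and cut-offs are hypothetical and hand-picked, not fitted to any data; a real tool would learn them from the training data.</span></p>

```python
import math

def regression_model(tenure_months, complaints):
    """Formula style: a logistic regression scores churn with one equation."""
    # Hypothetical coefficients, not fitted to any real data.
    z = 0.5 - 0.1 * tenure_months + 0.8 * complaints
    return 1.0 / (1.0 + math.exp(-z))

def rule_model(tenure_months, complaints):
    """Rule style: a decision tree reads as nested if/else rules."""
    # Hypothetical cut-offs, chosen only to show the shape of a rule set.
    if complaints >= 2:
        return "churn"
    if tenure_months >= 12:
        return "stay"
    return "churn"

# A long-standing customer with no complaints.
print(round(regression_model(24, 0), 3))
print(rule_model(24, 0))
```

<p><span lang="EN-GB">The rules can be read straight off the page, while the formula needs a little interpretation. A neural network would be harder still to read, which is the opacity mentioned above.</span></p>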
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB"><strong>The core train/test process<br />
</strong></span><span lang="EN-GB">One of the beauties of predictive analytics is the way in which we construct a simple experimental structure which allows us to test (validate) models on unseen data. The empirical approach, if it is done properly, gives us a pretty good approximation to how the models will perform when deployed in a live setting on new data. For example, let’s say we have a data set from a period in time when we know which customers churned or stayed. We would typically model a customer’s likelihood to churn on a subset (60% say) of that data and then test it on the other 40% to see how well the model predicts churn. If the accuracy is good enough (and that depends on the success criteria that we defined) then … if all other things are equal and we had constructed a representative enough data mining table … then we would expect similar results if we use the model going forward in a live setting. </span><span lang="EN-GB" /><span lang="EN-GB">Usually this means that we randomly split the data into two subsets:</span><span lang="EN-GB" /></p>
<ul>
<li>
<div><span lang="EN-GB">The training subset is the one used to build the model (the 60% in the churn modelling scenario described above).<br />
</span></div>
</li>
<li>
<div><span lang="EN-GB">The testing subset is the one we use to evaluate the model (the 40% for the above scenario). This second set is used to effectively simulate what we want to do in practice (when we deploy); that is to use our model to accurately predict the outcome(s) of interest.</span></div>
</li>
</ul>
<p><span lang="EN-GB">We do this because the true test of a model is not how well it can predict the outcome when it already knows it (which is the case with the training subset), but how well it can predict the outcome when it doesn’t know what the outcome is.</span></p>
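<p><span lang="EN-GB">The train/test process can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the single input (number of complaints) is made up, and the “model” is a deliberately simple one-threshold rule standing in for whatever algorithm you would really use.</span></p>

```python
import random

def split_train_test(rows, train_fraction=0.6, seed=42):
    """Randomly split rows into a training subset and a testing subset."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(rows, threshold):
    """Share of rows where 'complaints >= threshold' matches the known churn outcome."""
    hits = sum((complaints >= threshold) == churned for complaints, churned in rows)
    return hits / len(rows)

def fit_threshold(train):
    """Pick the complaints threshold with the best accuracy on the training subset."""
    candidates = sorted({complaints for complaints, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Synthetic data, made up for illustration: (complaints, churned).
rng = random.Random(0)
rows = []
for _ in range(200):
    complaints = rng.randint(0, 5)
    # Customers with more complaints churn more often, with some noise.
    churned = complaints / 6 > rng.random()
    rows.append((complaints, churned))

train, test = split_train_test(rows)   # 60% to build, 40% to evaluate
threshold = fit_threshold(train)       # the model only ever sees the training subset
print("threshold:", threshold)
print("training accuracy:", round(accuracy(train, threshold), 3))
print("testing accuracy:", round(accuracy(test, threshold), 3))
```

<p><span lang="EN-GB">The testing accuracy is the figure to report, because it is the only one measured on data the model has not seen.</span></p>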
<p><span lang="EN-GB" /><span lang="EN-GB"><strong>So how good is my model (really)?<br />
</strong></span><span lang="EN-GB">Until now we have only considered how accurate the model is by considering what percentage of the time it gets the prediction right, e.g. when predicting churners. In practice of course this is only part of the evaluation process. We may find, for example, that our model is good at finding low value fraud (of which there is likely to be more, and hence our overall percentage of correct predictions is higher) … but that the more valuable transactions which hurt us more are missed. One way to address this could be to focus the modelling on the cases that matter, e.g. create a subset which focuses on the valuable minority while still being sufficiently representative to be deployable. Either way our evaluation of candidate models, and hence the models we might continue to develop and refine, should be led by model evaluations which include all the factors that we really care about. These are often around the cost/benefit of the actions that the model would have us take in the field to act on its predictions. This is where more involved simulations enable us to make more meaningful assessments of the future impact of a model.</span></p>
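<p><span lang="EN-GB">A small worked example shows how accuracy and value can disagree. The eight transactions and the two sets of model flags below are entirely made up for illustration, and “value of missed fraud” is just one simple stand-in for a fuller cost/benefit calculation.</span></p>

```python
# Hypothetical test-set transactions, made up for illustration:
# (value, is_fraud, model_a_flag, model_b_flag)
transactions = [
    (20,   True,  True,  False),
    (25,   True,  True,  False),
    (30,   True,  True,  True),
    (15,   False, False, False),
    (40,   False, False, False),
    (35,   False, False, True),
    (5000, True,  False, True),
    (60,   False, False, False),
]

def accuracy(rows, flag_index):
    """Share of transactions where the model's flag matches the true fraud label."""
    hits = sum(row[flag_index] == row[1] for row in rows)
    return hits / len(rows)

def missed_fraud_value(rows, flag_index):
    """Total value of fraudulent transactions the model failed to flag."""
    return sum(row[0] for row in rows if row[1] and not row[flag_index])

for name, flag_index in (("A", 2), ("B", 3)):
    print(name, accuracy(transactions, flag_index), missed_fraud_value(transactions, flag_index))
```

<p><span lang="EN-GB">Model A is right 87.5% of the time but misses the one fraud worth 5000. Model B is right only 62.5% of the time but misses just 45 in value. Which is “better” depends on the success criteria, which is exactly why accuracy alone is not enough.</span></p>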
<p><span lang="EN-GB" /><span lang="EN-GB" /><span lang="EN-GB">Next we will take a real life example to better illustrate how this step can work in practice…<br />
</span>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.applied-insights.co.uk/news/2007/03/15/how-to-do-predictive-analytics-part-5/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>
