May 03, 2011

Our research indicates that interest in predictive analytics and data mining has never been higher, and that organizations are increasingly turning to the technology to take their BI capabilities to the next level (i.e., the ability to predict who their best customers will be, which customers are likely to churn, which suppliers perform best, etc.). Moreover, organizations are not only using predictive analytics to analyze structured data; they are also applying text mining and analysis tools to analyze unstructured (text-based) information. (Please do let us know your opinion on the use of text mining and analysis by taking our survey at .)

A number of factors are driving the use of data mining and predictive analytics. The most general is that organizations have a great deal of data to analyze — so much data, in fact, that the term “big data” is now in vogue. Naturally, companies want to capitalize on this valuable resource by applying predictive analytics to gain additional insights they can use to optimize their BI, marketing, and performance management practices. Much of this data is unstructured: the Web, contact centers, surveys, maintenance logs, sensors, and consumer social media sites are all contributing to the exploding volume of unstructured data that almost every organization in every industry is generating.

The price-performance of big data processing has also improved, so implementing predictive analytics applications is no longer so cost prohibitive. Trends more specific to predictive analytics include the continual improvement of tools and the increased options organizations now have for adopting the technology — in particular, cloud-based platforms and software as a service (SaaS) offerings (more on this in a moment).

Data-mining software and practices have also evolved. The technology has been around for 15 years or so, and practitioners have accumulated considerable experience implementing applications across a range of domains and industries. Vendors, consultancies, and other developers have captured some of this expertise and best practice in the form of packaged models, methods, and templates, which has helped reduce some of the difficulties associated with building applications. I’m not saying that data mining or predictive modeling is by any means easy; however, these developments do help take some of the gamble out of model building, testing, and implementation.

Data-mining functionality has also advanced to the point where it is increasingly found embedded in other applications. This has helped make the technology friendlier, alleviating some of the need for users to possess in-depth knowledge of it. For example, we are increasingly seeing predictive analytics functionality “exposed” to the general user via an intuitive GUI based on the familiar spreadsheet interface (as opposed to the complex interfaces associated with yesterday’s data-mining workbenches). Such friendly interfaces, when coupled with built-in workflows, provide the necessary “hand holding” for the less-technical business user who might otherwise be leery of attempting to use the powerful capabilities predictive analytics offers. Analytic workflows guide the user through every step of the predictive analytics process — from accessing, exploring, and preparing the data, to automating the creation of models, applying them to solve specific business problems, and generating and sharing reports based on the findings. In fact, if the technology is embedded correctly and exposed via a well-designed interface, the business user need not even know that he or she is using data mining.
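To make the access-prepare-model-apply workflow concrete, here is a minimal, self-contained sketch in plain Python. The customer records, the churn rule, and the simple threshold “model” are all invented for illustration; a real guided workbench would pull records from a warehouse and use far richer models.

```python
# A toy version of the guided predictive analytics workflow:
# access -> prepare -> model -> apply. All data here is invented.

# Step 1: "access" -- in practice this would come from a data warehouse.
customers = [
    {"id": 1, "calls_to_support": 9, "monthly_spend": 20.0, "churned": True},
    {"id": 2, "calls_to_support": 1, "monthly_spend": 80.0, "churned": False},
    {"id": 3, "calls_to_support": 7, "monthly_spend": 25.0, "churned": True},
    {"id": 4, "calls_to_support": 2, "monthly_spend": 90.0, "churned": False},
]

# Step 2: "prepare" -- derive one numeric feature per customer.
def prepare(c):
    return c["calls_to_support"] / max(c["monthly_spend"], 1.0)

# Step 3: "model" -- choose the feature threshold that best separates
# churners from non-churners in the historical records.
def fit_threshold(records):
    scores = sorted(prepare(c) for c in records)
    midpoints = [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    def accuracy(t):
        return sum((prepare(c) >= t) == c["churned"] for c in records)
    return max(midpoints, key=accuracy)

# Step 4: "apply" -- score a new customer against the fitted model.
threshold = fit_threshold(customers)
new_customer = {"calls_to_support": 8, "monthly_spend": 22.0}
print(prepare(new_customer) >= threshold)  # → True (flags likely churn)
```

A well-designed interface hides steps 2 and 3 entirely; the business user only sees step 4, which is the point of the embedding trend described above.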

Another trend, which holds considerable potential but is just beginning, is the advent of cloud-based data-mining applications and services. The availability of predictive analytics in the cloud is significant: it makes the technology — one that has typically been hard to get started with due to its complexity, cost, skill set requirements, and processing needs — accessible to end-user organizations that otherwise might be unable, or unwilling, to attempt to use it. Thus, I don’t think it is a stretch to say that cloud-based predictive analytics could lead to a surge in adoption of the technology. If you haven’t done so already, you really should check out the SaaS predictive analytics offerings available from Predixion Software, in2Clouds, and Quiterian.

A more favorable attitude toward text mining and analysis has also emerged, and organizations now seem to find the technology more approachable. Our research shows that organizations are no longer content to use text analysis/mining tools in a standalone manner; they have now advanced their data integration capabilities to the point that they are blending unstructured data into their data warehouses to support corporate data analysis efforts.

The ability to manage and analyze unstructured data is important if organizations want to integrate data from social media into their BI and data warehousing systems. Today, this remains mostly the realm of the major Internet players, such as Google, Yahoo!, and Facebook, rather than more traditional enterprises. But this, too, I believe will soon change as organizations develop a better understanding of how to analyze social media (i.e., which consumer trends to look for and how to find them) and how to apply the findings, for example, to optimize online and more traditional marketing efforts and to support product development.

While social media analysis is destined to play an increasingly important role, we are also seeing text mining applied to the unstructured data found in maintenance logs, sensor systems, and other operational sources in order to support predictive maintenance on machinery, electronics, and other complex equipment. For example, major aircraft and equipment manufacturers — ranging from providers of helicopters to makers of earth-moving equipment and process control systems — are working on text-mining-based predictive maintenance efforts.

Following are some real examples of how organizations are applying predictive analytics to a variety of applications. (I’d very much appreciate your help in learning how your organization is using — or not using — text mining and analysis. You can find our survey at .)

Optimizing Customer Channel Interactions at Belgacom

European telecom provider Belgacom offers a good example of how telecom providers can gain a competitive edge by using predictive analytics to optimize their marketing and customer interaction processes across various channels.

At Belgacom every interaction in the customer lifecycle – from acquisition to cross promotions and retention – is optimized with predictive models developed using KXEN software. This allows the company to present personalized actions or offers to customers—regardless of whether the interaction takes place in the call center, at a retail store, or on the company’s website.

Basically, predictive analytics allows Belgacom to deliver the right offer to the right customer at the right time, getting the most out of its marketing euros and earning a higher return on marketing investment. Predictive models also help uncover previously unseen customer insights. For example, Belgacom detected that more older customers were subscribing to digital TV services. Using this insight, marketing was able, at an early stage, to modify its digital TV offers, for example by including free full installation by a technician.

One key challenge for Belgacom was that its former data-mining solutions did not integrate easily with the company’s data warehouse: developers had to spend a lot of time translating models into a format the warehouse could use. In what offers a good example of how predictive analytics software has matured, the latest version of KXEN “talks” with the company’s Teradata data warehouse right out of the box, which helped cut model deployment time significantly. In addition, all models are refreshed periodically (e.g., monthly or weekly), so the business never has to settle for out-of-date models.

RTL Nederland and Social Media Mining

More and more consumers are using social media sites like Facebook, Myspace, and Twitter to express their opinions on a multitude of products and services. I find this application especially interesting because it provides a good example of how a company can use text mining and predictive analytics to analyze social media and obtain useful feedback to enhance decision making; in this case, decisions about changes to television programs intended to increase their popularity with the viewing public.

RTL Nederland, a Netherlands-based entertainment company, has teamed up with European market research firm InSites Consulting. Their aim is to apply predictive analytics software from IBM/SPSS to obtain market insight from social media sources about viewers’ opinions of TV shows that have aired.

InSites uses text analytics to gain insight into the unstructured data that is found in publicly available user-generated comments (i.e., “online buzz”) on social media sites (e.g., Facebook, Twitter, etc.). By capturing viewer insights on certain programs, RTL Nederland is able to gather timely feedback from viewers on the reality competition TV programs “X Factor” and “So You Think You Can Dance.”

RTL uses this viewer feedback (pertaining to the judging panel, the show’s theme, the choice of music and the candidates, etc.) to help it make decisions pertaining to changes in the shows’ formats. According to RTL representatives, when approaching the final episodes of these reality shows, the online buzz about the shows’ candidates can increase by about 400%. This provides an extensive source of information on viewer likes and dislikes.

For example, RTL decided to change the voting procedure in the middle of the live shows of “So You Think You Can Dance.” Sentiment analysis (of user-generated comments on social media sites) showed an increase in the positive buzz, indicating that viewers liked the adaptation of the show’s new format. Other viewer requests were taken into account as well. For instance, because a number of viewers commented on the lack of information and visuals pertaining to what happens behind the scenes during filming, “X Factor” candidates were given a camcorder to shoot a typical day in the life of the show. This amateur video footage was then posted on the show’s Web site for fans to watch. The result was more positive buzz.
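As a rough illustration of how sentiment is extracted from user-generated comments, here is a toy lexicon-based scorer. The word lists and sample comments are invented; InSites’ actual tooling (built on IBM/SPSS text analytics) is far more sophisticated, handling negation, Dutch-language text, and so on.

```python
# Toy lexicon-based sentiment scoring over "online buzz" comments.
# Word lists and comments are invented for illustration only.
POSITIVE = {"love", "great", "amazing", "better"}
NEGATIVE = {"boring", "hate", "worse", "unfair"}

def sentiment(comment: str) -> int:
    """Return +1 (positive), -1 (negative), or 0 (neutral) for one comment."""
    words = set(comment.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)

comments = [
    "love the new voting format",
    "the judging felt unfair tonight",
    "great episode, better than last week",
]
# The share of positive comments is the "positive buzz" tracked over time.
positive_share = sum(sentiment(c) > 0 for c in comments) / len(comments)
print(round(positive_share, 2))  # → 0.67
```

Tracking this share before and after a format change is, in essence, how the positive-buzz increase described above would be measured.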

This project involved analyzing a large number of online conversations. According to InSites reps, they analyzed the sentiment of over 71,000 online conversations pertaining to “X Factor.” This provided a comprehensive means of measuring attitudes indirectly and quickly adapting the show to better respond to viewer needs, thus keeping the audience more involved and helping increase viewer ratings.

Predictive Analytics for Selective Marketing at Colruyt

Belgian retailer Colruyt faced a problem confronting most retailers when it comes to formulating a marketing strategy: How to make their promotions stand out from the overwhelming amount of junk mail received by consumers from a multitude of retailers? “If a customer goes through a flyer only to find that there are no products of interest, they will feel that we have wasted their time,” said Bart Van Roost, Head of Colruyt’s Analytics Department. “And as a result, they may not bother opening our promotional envelopes in the future.”

Colruyt searched for a way to ensure that customers would look at their flyer every two weeks. They concluded that the best way to achieve this would be for each household to only receive promotional coupons for products they are interested in. To make this marketing strategy practical, Colruyt uses predictive analytics software from SAS to forecast which promotional items are most likely to result in purchases by each individual household. Initial results show an increase in the use of coupons, as well as higher average spending by each household.

To ensure that each household gets the most appropriate coupons, Colruyt developed predictive models that can calculate purchasing probabilities based on past customer behavior, as well as on household and demographic information stored in the Colruyt database. (With 1.6 million “extra-card” holders and 11,000 products on offer, this involves managing and exploiting a huge amount of data.) Based on these findings, it selects 30 promotional coupons that each household is likely to use. Coupons are chosen from among the 400 products that are on offer during each promotional period.

The predictive modeling application also takes marketing restrictions and rules into account. For example, each household will only receive up to eight promotions within the same product category. This is to avoid any household receiving only promotional coupons for drinks, cosmetics or any other particular product category.
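The selection logic — rank products by purchase probability, take the top 30, and cap any one category at eight — can be sketched as a greedy pass over the scored products. The product names, categories, and probabilities below are invented; Colruyt’s actual scoring models are built with SAS.

```python
# Sketch of coupon selection under a per-category cap, Colruyt-style.
# Scores, products, and categories are hypothetical.
from collections import Counter

def select_coupons(scored_products, max_coupons=30, max_per_category=8):
    """scored_products: list of (product_id, category, purchase_probability)."""
    chosen, per_category = [], Counter()
    # Walk products from highest to lowest predicted probability.
    for pid, cat, prob in sorted(scored_products, key=lambda p: -p[2]):
        if len(chosen) == max_coupons:
            break
        if per_category[cat] < max_per_category:
            chosen.append(pid)
            per_category[cat] += 1
    return chosen

# Tiny example: three drinks score highest, but a cap of 2 (for
# demonstration) forces variety into the selection.
products = [
    ("cola", "drinks", 0.9), ("beer", "drinks", 0.8),
    ("water", "drinks", 0.7), ("soap", "cosmetics", 0.6),
]
print(select_coupons(products, max_coupons=3, max_per_category=2))
# → ['cola', 'beer', 'soap']
```

The same pass scales directly to the real parameters (400 products on offer, 30 coupons, cap of 8) for each of the 1.6 million households.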

Colruyt’s use of predictive analytics has resulted in an increase in customer loyalty and retail spending, and a decrease in printing and mailing costs.

Cutting Cardiac Surgery Mortality Rates at Sequoia Hospital

Sequoia Hospital, based in Redwood City, Calif., USA, uses predictive models, developed with IBM-SPSS software, to improve patient care and reduce the mortality rate in cardiac surgeries by more than 50%.

Predictive models analyze a cardiac database of more than 10,000 patients, including demographics, types of surgeries, risk factors and outcomes. The findings are used to inform doctors and recommend crucial pre- and postoperative procedures that help reduce complications and extend the length and quality of patients’ lives.

Use of predictive analytics supports the latest advancement in evidence-based medicine: integrating and analyzing existing information from various sources, including healthcare databases, medical precedents, and actual medical cases. Combining this information with an individual patient’s condition, medical history, and ailments allows Sequoia doctors to better counsel patients on the best strategy for care at a given point in time.

Sequoia performs procedures ranging from stents and catheterizations to valve replacements, angioplasties, and coronary bypass surgeries. Cardiac surgeons use the analyses to better determine if and when a patient is an appropriate candidate for surgery, and how best to manage the case. The models take into account individual patient factors, such as age, weight, current state of health, previous surgeries, and number of procedures required, and compare this data against similar national and local cases. In this manner, surgeons are able to understand potential outcomes by type of procedure and risk factors and provide customized recommendations for patients. For example, in a matter of seconds, the models can inform a cardiac surgeon of the mortality risk for an 80-year-old patient with renal insufficiency who needs an aortic valve replacement. This helps the surgeon, patient, and family clearly understand the risk factors involved and make a fully informed decision on whether it is best to proceed with surgery or wait.
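The underlying idea — estimate a patient’s risk from the outcomes of the most similar historical cases — can be illustrated with a toy nearest-cases calculation. The case records, the two-factor similarity metric, and the numbers are all invented; Sequoia’s actual models, built with IBM-SPSS, use a 10,000-patient database and many more factors.

```python
# Toy case-based risk estimate: mortality rate among the k most
# similar historical cases. All data here is invented for illustration.
import math

# Historical cases: (age, renal_insufficiency 0/1, died 0/1)
cases = [
    (78, 1, 1), (82, 1, 0), (80, 1, 1), (65, 0, 0),
    (70, 0, 0), (84, 1, 1), (60, 0, 0), (79, 0, 0),
]

def risk(age, renal, k=4):
    """Observed mortality rate among the k most similar past cases."""
    def distance(case):
        # Crude similarity: age scaled to decades, renal flag as-is.
        return math.hypot((case[0] - age) / 10, case[1] - renal)
    nearest = sorted(cases, key=distance)[:k]
    return sum(c[2] for c in nearest) / k

# Risk for an 80-year-old with renal insufficiency, as in the example.
print(risk(80, 1))  # → 0.75
```

The appeal of this style of estimate is exactly what the article describes: it is computed in seconds, and it is directly explainable to the patient and family in terms of comparable past cases.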

Prior to the use of predictive analytics, an analysis could take up to two weeks to develop. Now, the software provides flexibility, mobility and customized analysis allowing doctors to get results almost instantaneously. This type of prospective risk management and personalized approach has enabled Sequoia to earn the highest ranking over five years for mortality and complications according to the Society of Thoracic Surgeons, which tracks and evaluates procedures and outcomes for hospitals nationwide.

The software also creates the evidence required to guide Sequoia’s clinical pre- and postoperative procedures. For instance, the models revealed that an anticoagulant drug often given to patients after a heart attack dramatically increases the chances of serious postoperative bleeding. Based on that information, Sequoia was able to put a protocol in place to stop the drug at least five days prior to surgery to allow the patient’s platelets to recover and significantly reduce bleeding events.


Organizations across a broad range of industries are applying data mining and predictive analytics as a way to utilize the vast stores of data (both structured and unstructured) they are collecting, and to increase the effectiveness of their BI efforts.  I expect that we will see steady growth in the use of data mining and predictive analytics over the next two years, just as we have seen over the past 10 years or so.

I’d like to get your opinion on the use of text mining and analysis. I encourage you to take our survey (see ). As has always been our policy, responses will remain confidential; they will be aggregated to determine overall corporate adoption trends. I will present my findings in upcoming Cutter Consortium research.

Curt Hall, Senior Consultant
Business Intelligence Practice


  3 Responses to “Interest in Data Mining and Predictive Analytics Grows”

  1. hey Curt,
    Interesting article! I have read your reports for years and find they are often clear and give concise descriptions of the nuts and bolts of BI. I am curious: how do you see MapReduce and Hadoop fitting into data mining and predictive analytics?

    Second, I work with sensor data, a field that is growing by leaps and bounds. In this area, which I realize is different from BI, we are seeing the exponential side of data growth, with a real need for tools that can help us zero in on the areas of data that are relevant. Are there any tools developed in BI that you see relating to this?

    The tools for analytics differ somewhat in that our data set changes over time. Many times the data remains relatively similar, and then there can be great shifts. Being able to detect this sort of thing is very important. Would love your comments on this.

    Many thanks,

  2. Companies are mainly using MapReduce (or its open source implementation, Hadoop) to process high volumes of unstructured data. It is very well suited to applications where it is necessary to analyze all the data (instead of a subset). Thus, it’s no surprise that Hadoop’s primary use has been to sift through log files maintained on thousands of Web servers to extract data for reporting purposes.

    But you can use MapReduce/Hadoop for a lot of other applications, including data collection and transformation (preprocessing), image recognition and analysis, text analysis, social network analysis (Facebook, etc.), data mining and machine learning, and large-scale data movement.

    I don’t know the particulars of your application, but you should definitely investigate Hadoop for your sensor data analysis needs; for sensor data you really need to analyze all the data. Also, with Hadoop, the data does not need to be structured in advance.
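    For readers unfamiliar with the pattern, here is a minimal in-process sketch of the two phases Hadoop runs at scale over server logs: map each log line to a (key, 1) pair, then reduce by key. The log format is invented; a real Hadoop job would distribute both phases across a cluster.

```python
# Minimal in-process illustration of the map/reduce pattern.
# Log lines and their format are invented for this sketch.
from collections import defaultdict

def map_phase(log_lines):
    # Emit (status_code, 1) for each request line.
    for line in log_lines:
        status = line.split()[-1]
        yield status, 1

def reduce_phase(pairs):
    # Sum the counts for each key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "GET /index.html 200",
]
print(reduce_phase(map_phase(logs)))  # → {'200': 2, '404': 1}
```

    The same two-phase shape applies to sensor data: the map phase can filter or flag readings of interest, and the reduce phase aggregates them, which is what makes Hadoop a fit for "analyze all the data" workloads.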

  3. I just returned from the 2011 Gartner Business Intelligence Summit in Los Angeles, which was well attended. One recurring theme was data mining; another, heard often, was the improvement of predictive analysis capabilities and their movement into the hands of analysts and managers in end-user departments. I was there with Quiterian, which has such a product, and it was gratifying to see that the features in our new release 3.0, announced at the conference, were so frequently mentioned as the direction BI is taking … straight into the hands of end-users ready to do their own analysis.
