Mar 09, 2010

Data warehousing, BI, and analytics in general are undergoing profound changes as the always-on business environment redefines organizational requirements for how data is collected, processed, and used. So much data is generated so quickly that it has sparked the advent of new (or, if you’re in an argumentative mood, at least revised) data management and data analysis technologies and practices, which are garnering increasing attention from organizations seeking to take advantage of their huge data assets.

I frequently talk to people whose organizations are drowning in data. Web data acquired from personalization, advertising, marketing, and other CRM activities is swamping many organizations. Contact and call center interactions especially have led to explosive growth in corporate data. The same can be said of data acquired from surveys, viral marketing efforts, and other customer-input systems, as well as from scanner-based applications (supermarket, supply chain, RFID, etc.). Soon, you can expect to add data gleaned from social media sites, such as Facebook and Twitter, to the mix. Organizations are also turning their attention to the huge streams of process and performance-related data continuously emanating from distributed messaging systems, BPM platforms, databases, and other enterprise applications as sources to be monitored, aggregated, and analyzed.

On the technology front, new or improved data management and analysis tools and practices are finding their way into corporate data warehousing environments. One of the most visible is the high-performance analytic database. Companies are turning to these databases to alleviate processing performance and management problems (for example, less-frequent tuning to get the required analytic performance and the ability to scale to meet increasing data volumes and usage). For example, OfficeMax implemented ParAccel to augment its data warehousing environment, accelerating the complex query processing associated with its market-basket analysis.

Last year, we heard considerable talk about MapReduce and its open source implementation, Hadoop. MapReduce and Hadoop are used to write programs that can be parallelized and run at massive scale on commodity hardware. They are designed primarily for analyzing big data sets, particularly for applications that require analyzing entire data sets: for example, text mining, click-stream analysis, photos, maps, graph analysis (e.g., finding the shortest path between items), RFID and sensor data, behavioral analytics, machine learning, and some forms of statistical analysis. Though they are still used primarily by Internet-based companies, there is some talk that they are beginning to find their way into end-user organizations’ DW and BI environments, particularly for postprocessing of data for ETL operations.
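For readers who have not seen the MapReduce pattern, here is a minimal single-process sketch in Python of the three stages (map, shuffle, reduce) applied to a toy click-stream. It is only an illustration of the programming model, not Hadoop's API: the function names, the record layout, and the sample data are all assumptions made up for this example, and a real job would distribute each stage across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: turn one (user, page) click record into (page, 1) pairs."""
    _user, page = record  # hypothetical record layout
    yield (page, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Made-up sample data: (user, page) click events.
clicks = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")]
page_counts = run_job(clicks)
print(page_counts)
```

Because each map call looks at one record and each reduce call looks at one key, both stages can in principle be spread over many commodity machines, which is what makes the model attractive for whole-data-set analyses.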

One of the more interesting developments has been the rise of Complex Event Processing (CEP) and stream analytic processing systems. CEP has sparked corporate interest because it holds the promise of enabling companies to increase operational efficiency by providing a means to identify and interpret the effect of seemingly unrelated events taking place across the organization and then notify the appropriate stakeholders in near real time. Stream analytics, or stream computing, holds the promise of opening up a whole new world of real-time analytic applications. Stream analytics can analyze hundreds or thousands of simultaneous data streams (such as stock prices, retail sales, weather reports, voice, video, and sensor readings) and deliver nearly instantaneous analyses, which can take the form of dashboard displays and visualizations for human analysis. The results can also drive business processes, trigger other applications, or be stored in databases for further (offline) analysis.
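To make the idea of analyzing data as it arrives a little more concrete, here is a small Python sketch of one common building block: a sliding-window average over a single stream that raises an alert when the average crosses a threshold. The class name, the window size, the threshold, and the price values are all hypothetical, and commercial CEP and stream engines do far more (many streams, event correlation, time-based windows); treat this as a toy illustration only.

```python
from collections import deque


class SlidingWindowMonitor:
    """Keeps the last `window_size` values and checks their average."""

    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # old values fall off automatically
        self.threshold = threshold

    def observe(self, value):
        """Add one value; return (current average, whether it breaches the threshold)."""
        self.window.append(value)
        avg = sum(self.window) / len(self.window)
        return avg, avg > self.threshold


def analyze(stream, window_size=3, threshold=100.0):
    """Process values one at a time and collect (position, average) alerts."""
    monitor = SlidingWindowMonitor(window_size, threshold)
    alerts = []
    for position, value in enumerate(stream):
        avg, breached = monitor.observe(value)
        if breached:
            alerts.append((position, avg))
    return alerts


# Made-up stock prices arriving one at a time.
prices = [98.0, 99.0, 101.0, 104.0, 107.0, 95.0]
alerts = analyze(prices)
print(alerts)
```

In a real deployment the alert list would be replaced by whatever the analysis is meant to drive: a dashboard update, a call to another application, or a write to a database for later offline analysis.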

We’re also seeing core breakthrough technology developments. One of the most important involves the incorporation of flash memory storage with data warehousing and analytic databases. Some folks are viewing flash memory storage as one of the most important hardware developments in a long time, because its addition to data warehousing and analytic databases serves to alleviate the bottleneck of reading and writing data to disk, thus helping to speed up processing.

But, while all these technology advances are important, one of the most significant developments pertains to how organizations now view data warehousing, BI, and analytics, in general. In fact, I would argue that there has been something of a sea change in corporate attitudes, with corporate leaders today increasingly seeing data warehousing and BI as strategic applications. They no longer view their data warehouses and BI environments as useful only for generating reports or for analysts to conduct research or exploratory analyses. Rather, in addition to such “old school” data analysis applications, organizations are increasingly utilizing analytics to drive and/or optimize a broad range of systems (both analytical and operational) — everything from personalization, customer scoring, and basket analysis to fraud prevention, text analysis, and predictive analytics.

So the big question we all want answered is: to what extent are organizations actually adopting these technologies and practices? And what issues are they encountering in doing so?


  7 Responses to “Drowning in Data? Strategic Analytics Throws a Lifeline”

  1. Is it about drowning in data or drowning in interruptions caused by the data? As data analysis and pattern recognition become more common, the alerts from false positives or improper context can be a real problem. We all know that, once interrupted, it can take a long time before we get back to our former level of productivity.

  2. […] The Cutter Blog » Blog Archive » Drowning in Data? Strategic Analytics Throws a Lifeline Though Hadoop primarily used by internet cos now finding its way into enterprise DW & BI (tags: […]

  3. Hi Curt,

    Thanks for noticing OfficeMax’s efforts. If you’re going to Enterprise Data World on March 17th, be sure to see OfficeMax presenting their market basket analytic implementation in “Turning Data into Merchandising Reality”

    Kim Stanick

  4. A great article that follows on the heels of a number of business oriented publications about the “data glut”:

    The Economist:
    Harvard Business Review, Blogs:
    The Wall Street Journal:

    Let me offer some suggestions on resources if your readers are interested:

    In order for an analytics approach to be successful there needs to be a level of confidence in the data infrastructure:

    Confidence that the data is complete, correct, and timely.
    Confidence that the context from the source of the data is appropriate; “fit for use”.
    Confidence that the human resources have the skills, subject matter expertise, and tools to build a story from the patterns uncovered through BI in a warehouse context.

    More than anything else, I think that the ability for an organization to inspire new ideas through data analytics, rather than just providing evidence that an idea has worked, begins an organization down a path to more maturity and innovation.

    DAMA International is a non-profit professional organization. We have created a body of knowledge that can help organizations as well as individual data management professionals build a foundation confidently. The DAMA Guide to the Data Management Body of Knowledge is a resource equivalent to what PMI has created for the PMBOK but around data.

    The Data Warehousing Institute is an organization that provides specialty education, training, certification, news, and research for executives and information technology (IT) professionals worldwide with a data warehouse/BI focus.

    Best of luck, and bravo on hitting a very timely nail on the head!

  5. Kim, I will be at EDW and will try and check out the Office Max presentation.

  6. I agree completely with your post. In fact, our research has consistently shown over many years that data quality remains one of the biggest (if not the biggest) roadblocks encountered by organizations with their DW and analytics efforts. In order to get it right, it’s essential that an organization have the correct “plumbing” in place to support their BI and analytics efforts…

  7. I’d have to agree with your point about how organizations view data warehousing these days. Many industries today rely on that data for analytic purposes, surveys for example. No matter where technology has led us today, it is still critical for us to consider whether the data collected and/or stored will actually be reliable in the future.
