Data warehousing, BI, and analytics in general are undergoing profound changes, driven by an always-on business environment that is redefining organizational requirements for how data is collected, processed, and used. So much data is generated so quickly that it has sparked new (or, if you’re in an argumentative mood, at least revised) data management and data analysis technologies and practices, which are beginning to attract increasing attention from organizations seeking to take advantage of their huge data assets.
I frequently talk to people whose organizations are drowning in data. Web data acquired from personalization, advertising, marketing, and other CRM activities is swamping many organizations. Contact and call center interactions especially have led to explosive growth in corporate data. The same can be said of data acquired from surveys, viral marketing efforts, and other customer-input systems, as well as from scanner-based applications (supermarket, supply chain, RFID, etc.). Soon, you can expect to add data gleaned from social media sites, such as Facebook and Twitter, to the mix. Organizations are also turning their attention to the huge streams of process and performance-related data continuously emanating from distributed messaging systems, BPM platforms, databases, and other enterprise applications as sources to be monitored, aggregated, and analyzed.
On the technology front, new or improved data management and analysis tools and practices are finding their way into corporate data warehousing environments. One of the most visible is the high-performance analytic database. Companies are turning to these databases to address processing performance and management issues: less-frequent tuning to achieve the required analytic performance, the ability to scale to meet increasing data volumes and usage, and so on. For example, OfficeMax implemented ParAccel to augment its data warehousing environment, accelerating the complex query processing associated with its market-basket analyses.
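To give a feel for the workload, here is a minimal Python sketch of the computation at the heart of market-basket analysis: counting how often pairs of items appear together in the same transaction. The sample baskets and item names are invented for illustration; at retail scale, this pairwise counting amounts to the kind of self-join-heavy query processing that analytic databases like the one above are built to accelerate.

```python
from itertools import combinations
from collections import Counter

# Each basket is the set of items in one sales transaction (toy data).
baskets = [
    {"pens", "paper", "toner"},
    {"paper", "toner"},
    {"pens", "paper"},
]

# Count how often each pair of items is purchased together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs suggest items to promote or shelve together.
print(pair_counts.most_common(2))
# [(('paper', 'pens'), 2), (('paper', 'toner'), 2)]
```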
Last year, we heard considerable talk about MapReduce and its open source implementation, Hadoop. MapReduce and Hadoop are used to write programs that can be parallelized and run at massive scale on commodity hardware. They are designed primarily for analyzing big data sets, particularly for applications that must process entire data sets: for example, text mining, click-stream analysis, photos, maps, graph analysis (e.g., finding the shortest path between items), RFID and sensor data, behavioral analytics, machine learning, and some forms of statistical analysis. Though they are still used primarily by Internet-based companies, there is some talk that they are beginning to find their way into end-user organizations’ DW and BI environments, particularly for postprocessing of data for ETL operations.
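For readers who have not seen the programming model, here is a minimal, single-machine Python sketch of the two phases that give MapReduce its name, using word counting as a stand-in for something like click-stream token analysis. The function names and sample data are illustrative only; in Hadoop, the grouping ("shuffle") step that the dictionary simulates here is what the framework distributes across commodity machines.

```python
from collections import defaultdict

# Map phase: turn each input record into (key, value) pairs.
def map_phase(record):
    for word in record.split():
        yield (word.lower(), 1)

# Reduce phase: combine all values that share the same key.
def reduce_phase(key, values):
    return (key, sum(values))

def run_mapreduce(records):
    # A real framework shuffles pairs across many machines; a
    # dictionary stands in for that distributed grouping step here.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return [reduce_phase(key, values) for key, values in groups.items()]

logs = ["checkout checkout search", "search checkout"]
print(run_mapreduce(logs))  # [('checkout', 3), ('search', 2)]
```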
One of the more interesting developments has been the rise of Complex Event Processing (CEP) and stream analytic processing systems. CEP has sparked corporate interest because it holds the promise of enabling companies to increase operational efficiency by providing a means to identify and interpret the effects of seemingly unrelated events taking place across the organization and then to notify the appropriate stakeholders in near real time. Stream analytics, or stream computing, holds the promise of opening up a whole new world of real-time analytic applications. Stream analytics can analyze hundreds or thousands of simultaneous data streams (stock prices, retail sales, weather reports, voice, video, sensor readings) and deliver nearly instantaneous analyses, which can take the form of dashboard displays and visualizations for human analysis. The results can also drive business processes, trigger other applications, or be stored in databases for further (offline) analysis.
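To illustrate the idea, here is a minimal Python sketch of a single continuous query of the kind a CEP or stream-processing engine would run: a sliding-window average over one stream with a threshold alert. The class name, window size, and threshold are invented for illustration; a real engine evaluates many such queries, typically expressed declaratively, across thousands of streams at once.

```python
from collections import deque

class StreamMonitor:
    """Toy continuous query: alert when the average of the last
    `window` readings on a stream exceeds `threshold`."""

    def __init__(self, window, threshold):
        self.readings = deque(maxlen=window)  # sliding window
        self.threshold = threshold

    def on_event(self, value):
        self.readings.append(value)
        avg = sum(self.readings) / len(self.readings)
        if len(self.readings) == self.readings.maxlen and avg > self.threshold:
            return f"ALERT: window average {avg:.1f} exceeds {self.threshold}"
        return None

# Feed the monitor a stream of, say, per-second transaction latencies.
monitor = StreamMonitor(window=3, threshold=100)
for latency in [80, 95, 110, 150, 160]:
    alert = monitor.on_event(latency)
    if alert:
        print(alert)
```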
We’re also seeing breakthroughs in core technology. One of the most important is the incorporation of flash memory storage into data warehousing and analytic databases. Some view flash storage as one of the most important hardware developments in a long time because it alleviates the bottleneck of reading and writing data to disk, thereby speeding up processing.
But while all these technology advances are important, one of the most significant developments pertains to how organizations now view data warehousing, BI, and analytics in general. In fact, I would argue that there has been something of a sea change in corporate attitudes, with corporate leaders today increasingly seeing data warehousing and BI as strategic applications. They no longer view their data warehouses and BI environments as useful only for generating reports or for analysts conducting research or exploratory analyses. Rather, in addition to such “old school” data analysis applications, organizations are increasingly using analytics to drive and/or optimize a broad range of systems, both analytical and operational: everything from personalization, customer scoring, and basket analysis to fraud prevention, text analysis, and predictive analytics.
Alas, the big question we all want answered is this: To what extent are organizations actually adopting these technologies and practices? And what issues are they encountering in doing so?