All the focus on big data is missing the point. Yes, high performance computing architectures let us analyze very large data sets. And yes, that is interesting and helpful. But let’s go with a thought experiment here. Imagine the following:
- Real-time data feeds from all source systems;
- Incremental, multi-generational real-time data feeds and data storage so all prior versions of data are accessible;
- The end of batch processing, nightly loads, ETL, and other boring data-preparation work;
- All queries you can dream of (well, maybe 98% of them) running in less than a second;
- All the rest of the queries running in minutes, not hours, and yes, even crazy Cartesian products, intended or not;
- The ability to construct a hierarchy of models that users can then interrogate themselves, with queries on those models running in a few seconds;
- Unlimited build-out scalability to handle (yes) big data;
- Accessing the data using many different query techniques (SQL, XQuery, MDX, or whatever else you like provided it is popular enough);
- All this data commingled with transactional data in a replicated-everywhere environment that has the highest fault tolerance possible;
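Those "crazy Cartesian products" are usually accidents: drop a join condition and every row of one table pairs with every row of the other. A minimal sketch of how that happens, using hypothetical table names and an in-memory SQLite database:

```python
import sqlite3

# Hypothetical tables invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Intended query: one row per order.
good = conn.execute(
    "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id"
).fetchall()

# Unintended Cartesian product: the join condition is missing, so every
# order pairs with every customer -- 3 x 2 = 6 rows instead of 3.
# On billion-row tables this is the query that runs for hours.
bad = conn.execute("SELECT o.id, c.name FROM orders o, customers c").fetchall()

print(len(good), len(bad))  # 3 6
```

In the imagined fast-data world even this blunder finishes in minutes rather than hours, which is why bad-query sprawl stays on the problem list below.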
If the data management time for all this disappeared, what would be the effect on businesses? Good question.
What remains are the following problems:
- Improving data quality. Speed may kill, but poor data quality kills adoption;
- Understanding the data structure and organization;
- Understanding the data itself;
- Preventing bad query (and bad model) sprawl. Shoddy thinking, embedded in a reusable data model, can infect many minds;
If data preparation time disappears in this new world, all of these problems improve significantly. The faster people see and interact with data, the faster they understand it, the better they model it, and the faster they remove data quality problems. Data's ubiquity increases, and its cost decreases, a lot. This drop in the cost of information would raise the value of the humans who model the data.

It would also put enormous pressure on business decision-making processes. Some processes would be far too slow to actually use fast data. Others might be fast enough to use it, yet still not benefit. Just because you have useful data at the speed of thought does not mean you should act on it. The rate of opportunity and change in markets (or, for market leaders, the rate of change you wish to inflict upon the market) may not need to be as fast as fast data. On the other hand, if all companies had equally fast data, would the rate of change in markets accelerate, leading to time-compression wars? As if we didn't have those already!
The emergence of fast data would also mark the merging of transactional and analytic data systems. Users would have the convincing illusion that they were querying the source system. It would mark the end of data warehousing as a useful concept, too; instead, we will be left with oodles of data modeling to do. Disaster recovery might disappear as well, since fail-in-place architectures with copies of data in many places obviate the need for hot sites, warm sites, cold sites, and stinky sites. Except for speed-of-light problems (transactions limited by distance because of the time it takes signals to travel), data that can tolerate a little asynchronicity (and the majority of business data falls into this category) can be quickly absorbed in this architecture. Undoubtedly, a lot of application and system software would need to be rewritten, since most of it treats all data as if it were a speed-of-light problem.
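Tolerating "a little asynchronicity" is essentially eventual consistency: a replica applies the primary's change log on its own schedule, so reads may briefly lag but soon converge. A minimal sketch, with all class and variable names invented for illustration:

```python
import queue

class Primary:
    """Authoritative copy: applies writes immediately and logs them."""
    def __init__(self):
        self.data = {}
        self.log = queue.Queue()  # change log shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.put((key, value))

class Replica:
    """Remote copy: applies the primary's log on its own schedule."""
    def __init__(self, primary):
        self.data = {}
        self.primary = primary

    def apply_pending(self):
        # Drain whatever log entries have arrived so far.
        while not self.primary.log.empty():
            key, value = self.primary.log.get()
            self.data[key] = value

primary = Primary()
replica = Replica(primary)

primary.write("revenue", 100)
stale = replica.data.get("revenue")   # None: the replica lags behind
replica.apply_pending()
fresh = replica.data.get("revenue")   # 100: the replica has converged
```

The brief window where the replica returns a stale value is the asynchronicity most business data can live with; only the latency-sensitive transactional core remains a true speed-of-light problem.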
Does this sound at all compelling? And when will such a world exist? I hope soon. I am tired of long-running database processes still ravaging our systems and punishing analysts.
When big data becomes really-freaking-fast data, then the game changes.