It still amazes me how many enterprise data warehousing/business intelligence (DW/BI) projects struggle, often to the point of paralysis, with the “Inmon/Kimball” debate. The impasse turns on whether a DW/BI program should insist on routing all information through a complex third normal form (3NF) data layer or take it straight to a user-intelligible star schema repository from which it can be reported more or less directly. It’s easy to fault the 3NF layer not only for more than doubling the complexity, expense, and data latency of a DW/BI project, but also for delivering no direct value to the project sponsors and their stakeholders. On the other hand, projects that deliver data immediately to star schemas can quickly become complex themselves as the scope of the warehouse grows. As the conformed stars scale out, they too end up requiring enormous reengineering efforts whenever the underlying business requirements change.
The truth is, neither of these architectural paradigms can be agile, because both result in inflexible juggernauts that defy economical impact analysis when new features are needed. Both approaches scale into installations so inscrutable and fragile that it becomes cheaper simply to meet new requirements with new applications than to update the existing assets. Both the Inmon and Kimball approaches leave companies struggling with “legacy” warehouses.
One of the most promising solutions to this juggernaut problem lies in data architectures that go beyond traditional 3NF. Warehousing teams have achieved more robust designs when they push their target schemas past fourth and fifth normal forms into a variety of “hypernormal forms” (HNF). These forms include “data vault” forms, associative strategies, and even C. J. Date’s new sixth normal form, in which every attribute of a source schema is shredded into something akin to a key-value pair when stored in the target system. A year of recent fieldwork showed me that hypernormalized approaches do indeed yield warehouse designs that are far more robust in the face of changing requirements.
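To make the idea of shredding more concrete, here is a minimal Python sketch, assuming a simple in-memory representation in which each source attribute gets its own narrow table of (key, load timestamp, value) rows. The function `shred_into`, the `customer` example, and the `<entity>_<attribute>` table-naming scheme are hypothetical illustrations, not the design of any particular tool or of sixth normal form as formally defined.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, DefaultDict, List, Tuple

Row = Tuple[Any, datetime, Any]      # (entity key, load timestamp, attribute value)
Store = DefaultDict[str, List[Row]]  # hypothetical table name -> rows


def shred_into(store: Store, entity: str, key: Any,
               record: dict, load_ts: datetime) -> None:
    """Shred one wide source record into one narrow table per attribute.

    Each table is named '<entity>_<attribute>' (a hypothetical naming scheme)
    and holds (key, load_ts, value) rows, much like key-value pairs keyed
    by the business key and the load time.
    """
    for attribute, value in record.items():
        if value is None:
            # Only known values are recorded; a missing value is simply absent.
            continue
        store[f"{entity}_{attribute}"].append((key, load_ts, value))


store: Store = defaultdict(list)
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
shred_into(store, "customer", 42,
           {"name": "Acme Ltd", "city": "Denver", "credit_limit": 50000}, t1)

# A later load carries a changed city and a brand-new attribute: the change
# appends a row to one table, and the new attribute lands in a new table,
# so the existing attribute tables keep their shape.
t2 = datetime(2024, 2, 1, tzinfo=timezone.utc)
shred_into(store, "customer", 42, {"city": "Boulder", "segment": "SMB"}, t2)

for table in sorted(store):
    print(table, len(store[table]))
```

The second load hints at why such schemas can absorb change locally: nothing that already exists has to be altered to accommodate the new `segment` attribute.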
Several tool suites and communities of practice have demonstrated the commercial viability of HNF for large projects with tight budgets and even tighter deadlines. What I particularly appreciate is how these new solutions make agile data warehousing genuinely possible by supporting and accelerating incremental warehouse delivery patterns.
In terms of engineering advantages, HNF yields target schemas that are what I call “three-way robust”:
- Teams can add small increments of functionality without undertaking massive reengineering.
- Modifications to existing database tables have only local impacts.
- As the model grows, the cost of scaling the warehouse increases less than linearly.
My fieldwork’s most remarkable discovery, however, was how the most advanced hypernormalized tools truly enable the fourth tenet of the agile data warehousing manifesto, where we focus on “evolving data models over incrementing application code.” Some tools can take business models and automatically generate 90% of a team’s integration, presentation, and semantic layers. By leveraging such technology, warehouse developers can focus on the remaining 10%, where the business rules are complex and hand coding adds the greatest value. Such tools make the warehouse project and the entire organization agile, because IT and the business can now collaboratively model a small sliver of the warehouse, generate a user-tangible result, and evaluate it together. With the distraction and time lost to coding reduced to a minimum, IT can address additional requirements step by step, evolving the information users can touch, keeping the business constantly involved as a partner, and rapidly zeroing in on the exact operational insights the current business challenge demands.
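The commercial generators work very differently from one another, but the core idea of deriving schema from a business model can be sketched. The Python below assumes a hypothetical model format (entity names mapped to a business key and a list of attributes) and emits data-vault-style hub and satellite DDL as SQL strings. The `Entity` class, the `generate_ddl` function, the `hub_`/`sat_`/`_hk` naming, the `load_ts` and `record_source` columns, and the placeholder column types are all illustrative assumptions, not the output of any real product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A hypothetical business-model entity: its business key and descriptive attributes."""
    business_key: str
    attributes: List[str] = field(default_factory=list)


def generate_ddl(model: Dict[str, Entity]) -> List[str]:
    """Generate data-vault-style DDL: one hub and one satellite per entity.

    Column types are placeholders; a real generator would take them from the model.
    """
    statements = []
    for name, entity in model.items():
        # Hub: one row per business key, plus load metadata.
        statements.append(
            f"CREATE TABLE hub_{name} (\n"
            f"  {name}_hk CHAR(32) PRIMARY KEY,\n"
            f"  {entity.business_key} VARCHAR(100) NOT NULL,\n"
            "  load_ts TIMESTAMP NOT NULL,\n"
            "  record_source VARCHAR(50) NOT NULL\n"
            ");"
        )
        # Satellite: descriptive attributes, historized by load timestamp.
        attr_cols = "".join(f"  {attr} VARCHAR(255),\n" for attr in entity.attributes)
        statements.append(
            f"CREATE TABLE sat_{name} (\n"
            f"  {name}_hk CHAR(32) NOT NULL REFERENCES hub_{name} ({name}_hk),\n"
            "  load_ts TIMESTAMP NOT NULL,\n"
            f"{attr_cols}"
            f"  PRIMARY KEY ({name}_hk, load_ts)\n"
            ");"
        )
    return statements


model = {"customer": Entity("customer_number", ["name", "city"])}
for stmt in generate_ddl(model):
    print(stmt)
```

In this sketch, evolving the model (say, adding an attribute to `customer`) changes only the generated satellite while the hub comes out identical, which loosely mirrors the “evolve the model, regenerate the code” workflow described above.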
So the days of endless Inmon/Kimball debates are over. New tools and practices have moved the pivotal question far beyond “star schema versus third normal form.” DW/BI project kickoffs should now center on a different question: “Which hypernormalized modeling technique and supporting toolset will work best for our organization?” The astounding improvements in programmer productivity and customer satisfaction that this new generation of solutions allows are what demand this change in mindset.