Efforts to rework an existing data model come down to the same question asked when reworking most existing systems: retrofit what exists or start over with a clean slate?
It’s a non-trivial question, and the answer, whatever it is, can greatly affect the ultimate success of the effort. This article relates some lessons learned from a successful data model project that illustrates the benefits of careful planning and continual evaluation. The project was not without problems, but the experience is representative of many data model efforts where resources, priorities, and budgets are all under pressure: in other words, the situation at almost every company.
The incumbent data model (built as part of a consulting engagement with my firm) had served its purpose well over the previous couple of years, structuring data from various sources in a way that laid a logical and efficient foundation for generating reports and extracts and for supporting analysts’ investigations into the data. However, as part of continuing improvement, and to rely increasingly on the data warehouse as the central source of our extracts, a plan was created to support a new extract from the data warehouse. This required integrating data from a new source, which introduced new concepts and complexity that the current model could not support as-is. When assumptions about underlying concepts need to be revisited, the result is often extensive rework of a data model. In this case the conceptual gaps in the incumbent model were known, but because the model fit the original use cases, dedicating time and resources to remedy those gaps became worthwhile only once the business need arose.
Taking this new opportunity to fully define and clarify the business concepts and to revisit the technical approach helped to create a more robust solution that could support many new imports and extracts in the future. However, as happens in many cases, the amount of effort required and the overall impact ran up against the business pressure to just “get it done.” This led to a few headaches along the way that probably could have been avoided. But that’s data modeling, and IT, in the real world. Given a bit more time for planning, impact analysis, prototyping, and testing/re-evaluation, the process we used to find and implement the best solution could have been smoother. With that background as context, here are a few insights and lessons learned for future efforts:
Find the right solution
It sounds simplistic and obvious, but continuing to ask and answer the basic questions about business reasons and desired outcomes always matters. Asking these questions leads to understanding the business goals involved, and that leads to defining the concepts used for modeling. A great technical solution is not enough without knowing the actual problem(s) to be solved, and the solution won’t last unless the concepts are understood well enough to judge whether the model can accommodate new requirements or new data.
In this working example, the existing data model had one provider table containing both physicians (individuals) and facilities (groups). Additionally, different data sources defined facilities, practices, and practice units differently, but in previous use cases they could be treated the same. With the new data source and extract requirements, these concepts needed to be better delineated. Spending a few weeks working alongside the business team to clarify these and other concepts led to a clearer understanding of the data, and that led to the new model being structured more logically. Building a new basic conceptual model answered many questions and brought previously hidden assumptions to light, so that design decisions could be made in a more deliberate and informed manner. Overall, spending the time to understand the data from a business perspective helped generate more confidence in the new model’s ability to reflect the real world, and ultimately in its ability to handle future requirements and to provide clearer insight to those using it.
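As a rough illustration of the kind of delineation described above, the sketch below splits a combined provider record set into separate individual and organization concepts. It is only a sketch under stated assumptions: the class names, the field names, and the `provider_type` discriminator are all hypothetical, and the article does not describe the actual table layouts used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyProvider:
    """Hypothetical row from a single table mixing people and facilities."""
    provider_id: int
    name: str
    provider_type: str  # assumed discriminator: "physician" or "facility"


@dataclass(frozen=True)
class Practitioner:
    """An individual, such as a physician."""
    practitioner_id: int
    name: str


@dataclass(frozen=True)
class Organization:
    """A group, such as a facility or practice."""
    organization_id: int
    name: str


def split_providers(rows):
    """Separate legacy provider rows into practitioners and organizations."""
    practitioners = []
    organizations = []
    for row in rows:
        if row.provider_type == "physician":
            practitioners.append(Practitioner(row.provider_id, row.name))
        elif row.provider_type == "facility":
            organizations.append(Organization(row.provider_id, row.name))
        else:
            # Surface unknown types rather than silently guessing a concept.
            raise ValueError(f"Unrecognized provider type: {row.provider_type!r}")
    return practitioners, organizations
```

Raising on an unrecognized type is a deliberate choice in this sketch: it pushes ambiguous cases back to the business conversation instead of burying an assumption in the load logic.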
Evaluation never stops
When choosing a data model design pattern, it’s important to understand both the problems and the potential solutions the design pattern offers. Critical evaluation is key. Patterns are what they are because they’ve been proven over time to solve certain problems. Know how and why they solve those problems and how that matches up (or doesn’t) with the goals and objectives of the project; don’t blindly follow trends. And beyond finding what fits for the sections of the model being redesigned, be sure to evaluate how well that method will fit with the rest of the data model. The success of following a pattern also goes beyond the initial choice. It’s important during the implementation process for those involved to understand how and why it works, since there are always further, more detailed decisions to be made. Take time to make sure there’s a thorough understanding of how the design works and how to implement it in the full context of the project. And don’t forget that the goal is more than just getting something that works. Even when following patterns, there is an art to technical design, so don’t be afraid to take a step back to see the big picture, evaluate it with a fresh perspective, or account for unnoticed issues.
In our real-life example, a portion of the data warehouse was switched to a data vault pattern. This choice was made because of the data vault’s flexibility when importing new sources, since the data warehouse was constantly growing. However, there was a learning curve to understanding the data vault methodology in contrast to the original 3NF model. It took time on the development side to figure out how best to make use of it, work out the kinks, and integrate it into the rest of the data warehouse. It also took time for testers and business users to become familiar with the new approach, and for some to buy into it. As a result, some of a data vault’s benefits, such as the increased opportunity to load data in parallel, were not leveraged initially. In the end the benefits of the new data vault were fully realized, but in hindsight, taking the time to be sure that everyone had a solid understanding of both the purpose and the technicalities of the methodology would have made the process smoother.
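For readers new to the pattern, the sketch below hints at why a data vault lends itself to parallel loading. It assumes the hash-key approach commonly associated with Data Vault 2.0, where a hub key is derived deterministically from the business key; the article does not say whether the project used hash keys, and the function, column, and source names here are hypothetical.

```python
import hashlib


def hash_key(business_key: str) -> str:
    """Derive a deterministic key from a business key (assumed approach)."""
    normalized = business_key.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hub_row(record: dict, record_source: str, load_ts: str) -> dict:
    """Build the hub row: the business key and its hash, nothing descriptive."""
    return {
        "provider_hk": hash_key(record["provider_id"]),
        "provider_id": record["provider_id"],
        "record_source": record_source,
        "load_ts": load_ts,
    }


def satellite_row(record: dict, record_source: str, load_ts: str) -> dict:
    """Build the satellite row: descriptive attributes tied to the same hash."""
    return {
        "provider_hk": hash_key(record["provider_id"]),
        "name": record["name"],
        "record_source": record_source,
        "load_ts": load_ts,
    }


# Both rows are computed from the source record alone, so neither load
# has to wait for the other to generate or look up a key.
record = {"provider_id": "P-100", "name": "Example Clinic"}
hub = hub_row(record, "source_a", "2020-01-01T00:00:00Z")
satellite = satellite_row(record, "source_a", "2020-01-01T00:00:00Z")
```

Because the satellite row does not depend on a key generated during the hub load, the two loads can in principle run side by side, which is the kind of benefit that was not leveraged at first in this project.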
Plan early and often
Data migration planning is often left until late in the project lifecycle, but it shouldn’t be. Many data migration plans also underestimate the effort and impact involved. Be aware of that impact. Issues with foreign keys, data marts, reports, and extracts almost always create more work than is accounted for. In this example, there were many sections of the data warehouse linking back to the section that was being reworked, and many extracts and reports built on top of it. Taking an inventory of the changes to be made up front helps to provide a better estimate of the time and resources needed to make those changes.
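One way to start such an inventory is to record which objects read from which, then walk that map outward from the tables being reworked. The sketch below is a minimal illustration under assumptions: the `reads_from` mapping and every object name in it are hypothetical, and in practice the mapping would have to be assembled from foreign keys, view definitions, and report or extract queries.

```python
from collections import deque


def downstream_objects(changed, reads_from):
    """Return every object that depends, directly or indirectly, on `changed`.

    `reads_from` maps an object name to the set of objects it reads from.
    """
    # Invert the mapping so each object points at the things that read it.
    dependents = {}
    for obj, sources in reads_from.items():
        for source in sources:
            dependents.setdefault(source, set()).add(obj)

    changed = set(changed)
    affected = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected and dependent not in changed:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical warehouse objects, for illustration only.
reads_from = {
    "claims_fact": {"provider"},
    "provider_mart": {"provider", "claims_fact"},
    "monthly_extract": {"provider_mart"},
    "member_report": {"member"},
}
impacted = downstream_objects({"provider"}, reads_from)
```

Here `impacted` contains `claims_fact`, `provider_mart`, and `monthly_extract`, but not `member_report`, which never touches the reworked table.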
Also, migrating data piece by piece (e.g. source by source) delivers results sooner than doing a single data migration for everything, but it can get complicated. With a piecemeal approach, it is important to know when to pull data from each model and how to merge the data together. In this example, while it was initially difficult to resolve issues downstream, the migration and testing benefits of having both the new and old models available outweighed the complexities involved in working with side-by-side models.
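The merge rule for side-by-side models can be as simple as "read migrated sources from the new model and everything else from the old one." The sketch below shows that rule and nothing more; the row shape, the `source` field, and the `migrated_sources` set are hypothetical, and the article does not describe how the project actually combined the two models.

```python
def combined_rows(old_rows, new_rows, migrated_sources):
    """Combine rows from both models without double-counting any source.

    Sources listed in `migrated_sources` are read only from the new model;
    all other sources are read only from the old model.
    """
    from_new = [row for row in new_rows if row["source"] in migrated_sources]
    from_old = [row for row in old_rows if row["source"] not in migrated_sources]
    return from_new + from_old


# Hypothetical data: source "a" has been migrated, source "b" has not.
old_rows = [{"source": "a", "id": 1}, {"source": "b", "id": 2}]
new_rows = [{"source": "a", "id": 1}]
merged = combined_rows(old_rows, new_rows, migrated_sources={"a"})
```

Keeping the list of migrated sources in one place means downstream extracts can switch over one source at a time, and a source can be switched back if testing turns up a problem.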
As with most IT work, it is critical to analyze, test, assess, re-evaluate, and address issues early and often. Creating time in any project to iterate through problems and to get users involved increases the chances of overall success. Not doing so is asking for trouble. In the end, data model work in the real world of stressed timelines, resources, and budgets is not easy, but with careful planning and communication, it can be just a little bit easier.
Originally published in Digital Insurance Magazine