Building a Robust Data Infrastructure: The Role of Data Modelling in Modern Data Stacks

Data warehouse modelling is often the first step in setting up a modern data stack, and it’s critical to design an architecture that supports the data models you want to develop. Too often, people jump straight into coding intricate transformations without considering how best to arrange the tables, schemas, and databases in their warehouse. To succeed, design your data warehouse with your models in mind from the very start of the modelling process.

The emergence of ideas like the modern data stack, and the accompanying spike in demand for data transformation tools, is no coincidence.

A growing number of businesses are realizing the benefits of fully owning their data stack. Automated pipelines have removed complexity, but they have also removed the organization’s involvement at crucial points in the process: we don’t know how data is transformed as it passes through a “black box,” and when it comes out the other side, it may not be arranged in a way that actually serves the business. Aggregated data, for instance, can be crucial for businesses that depend on two-sided markets (or SaaS platforms).

What does data warehouse modelling involve?

Data engineers first make sure that the website’s frontend and backend processes, and the many technologies in use, are recording data accurately: to have raw data to ingest into your warehouse, you must make sure it is being captured. Rather than working directly with the data itself, which is typically an analytics function, data engineers ensure the right processes are in place so analysts can access the data they need. Throughout, it is important to follow data modelling best practices.

Analytics engineers concentrate on building dbt data models that transform that raw data. Depending on the organization and the size of the data team, they may also be responsible for structuring the data warehouse. Because the analytics engineer works so closely with the transformed data, it usually makes sense for them to choose the structure for the models they are developing. And because dbt and data warehouse modelling are so closely related, it is often most convenient for the same person to handle both.
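One way to encode that structure, assuming a standard dbt project, is to route each folder of models to its own warehouse schema in `dbt_project.yml` (the project and schema names below are just examples):

```yaml
# dbt_project.yml (excerpt)
# Each layer of models lands in its own schema,
# so access can later be granted per layer.
models:
  my_project:
    staging:
      +schema: staging
      +materialized: view
    intermediate:
      +schema: intermediate
    core:
      +schema: core
      +materialized: table
```

With a layout like this, the warehouse mirrors the model layers, and permissions can follow the same boundaries.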

The data analyst then uses the analytics engineer’s models to build reports and dashboards for business stakeholders. Self-service analytics platforms, which offer interactive data analysis, put these in front of front-line decision makers. These stakeholders typically represent several business functions, including growth, product, sales, and marketing, and they use the visuals to decide, depending on their domain, what kind of campaign to run, how to retain customers, or which goods to restock next.

Three types of data models for your data warehouse

So what exactly does data warehouse modelling involve? The rise of dbt has popularized three primary types of data models. Each type serves a distinct function and sits in a different place in the warehouse. We’ll walk through what each one is for and where it belongs in your data warehouse.

Base Models 

Base models are views placed directly on top of your raw data; dbt now refers to them as staging models. They handle basic casting and column renaming so you can keep your standards consistent across many data sources. In these models you decide which timestamp types to use, how to name date fields, which case to use for column names (snake or camel), and how to establish primary keys. Data analysts then reference base models in their reports and dashboards rather than the raw data tables.
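As a rough sketch, a staging model built on a hypothetical raw `orders` table might look like this (the `shop` source name and all column names are illustrative):

```sql
-- models/staging/stg_orders.sql
-- A minimal staging model: rename columns to snake_case,
-- cast types, and establish a consistent primary key.
with source as (

    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id          as order_id,     -- primary key
        customerId  as customer_id,  -- camelCase -> snake_case
        cast(orderTotal as numeric(10, 2)) as order_total,
        cast(createdAt  as timestamp)      as created_at  -- standardized timestamp
    from source

)

select * from renamed
```

Every downstream model then references `stg_orders` instead of the raw table, so naming and typing decisions live in exactly one place.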

Intermediate Models

Intermediate models are crucial when using a tool like dbt to keep your data transformations modular. They are meant to speed up the execution of your data models and make it easier for analytics engineers to debug more intricate ones.

Intermediate models sit between base and core models, serving as a bridge between the two. Even though the analytics engineer who built them is typically the only one who accesses them, they are essential to the transformation process: they let the same code run once and be referenced downstream, rather than being repeated across numerous core data models.
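For illustration, an intermediate model might compute an aggregation once so that several core models can reference the result instead of repeating the logic (the model and column names here are hypothetical):

```sql
-- models/intermediate/int_orders_per_customer.sql
-- Aggregates once; multiple core models can ref() this model
-- instead of each repeating the same group-by logic.
select
    customer_id,
    count(order_id)  as order_count,
    sum(order_total) as lifetime_value
from {{ ref('stg_orders') }}
group by customer_id
```

If the aggregation logic ever changes, it changes in one file, and every core model that references it picks up the fix.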

These models reference only base models, never raw data or core models. Because they are a tool rather than a source or core model, they are not meant for analysts to use in their visualizations. Fortunately, data warehouses such as Snowflake and Google BigQuery make it easy to restrict access to specific schemas so that intermediate models are not available for final analyses or reporting.
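In Snowflake, for example, this can be as simple as granting the analysts’ role access to the core schema only, and never issuing grants on the intermediate schema (the database, schema, and role names below are placeholders):

```sql
-- Analysts can query core models only; no grant is ever
-- issued on the intermediate schema, so it stays invisible.
grant usage on schema analytics.core to role analyst_role;
grant select on all tables in schema analytics.core to role analyst_role;
grant select on future tables in schema analytics.core to role analyst_role;
```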

Core Models

Your core models generate the fully transformed datasets that business stakeholders and data analysts consume; they are the finished product of your transformation process. Core models build on base models and intermediate models to create a final dataset. They can be straightforward dimension tables that join related base models, or complex data models with layered sub-logic.
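A core model might then join a staging model with an intermediate model into a stakeholder-facing table (again, the model and column names are illustrative):

```sql
-- models/core/customers.sql
-- Final, fully transformed dataset for analysts and dashboards.
select
    c.customer_id,
    c.first_name,
    c.last_name,
    coalesce(o.order_count, 0)    as order_count,
    coalesce(o.lifetime_value, 0) as lifetime_value
from {{ ref('stg_customers') }} as c
left join {{ ref('int_orders_per_customer') }} as o
    on c.customer_id = o.customer_id
```

Note that the core model only ever calls `ref()` on staging and intermediate models, never on raw tables, which keeps the dependency graph flowing in one direction.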