Data Modeling

What is Data Modeling?

Data modeling is the process of taking in data inputs and business processes and formalizing these inputs into a cohesive model of the data. This cohesive model is built to support business processes and is optimized to be straightforward to manipulate and scalable with the requirements of the underlying organization.

Why do we do Data Modeling?

Data modeling is important in setting up a software business system. While data relationships can be defined on-the-go in a software system, this can lead to a lot of issues down-the-line, with inconsistency and slowness plaguing many software systems today. Further, while technical debt is expected in any software system, data modeling helps to reduce the accumulation of that technical debt by requiring the team to be thoughtful about how all the data will interrelate, the common data access requirements, and business key performance indicators (KPIs) that the data modeling must help with.

Who does Data Modeling?

Data modeling is a full team effort, combining business processes into a full-fledged model. That being said, technical team members, such as software engineers, typically lead the process, since the diagraming and optimizing of the data modeling process can be a highly technical endeavor that requires knowledge of relational database systems, computer hardware, and software optimization techniques.

What are the Steps in Data Modeling?

There are many steps in the data modeling process and each company will accomplish the data modeling process a bit differently. At a high-level, the steps are typically as follows:

Conceptual Data Model

This is the first step in the data modeling process. Generally, this process lacks enough detail to actually implement a database architecture, but provides high-level details on table relationships. In many cases, this step of the process defines the objects needed, the relationships between those objects (one-to-many, one-to-one, etc), and parent-child relationships between data.

Conceptual Data Model
Conceptual Data Model

Logical Data Model

This is the intermediary step, which typically describes the table structure in more detail, but doesn’t dive into a particular technology. For example, a logical model may show a relational database diagram, but doesn’t show it for a particular product such as Snowflake or Postgres.

Logical Data Model
Logical Data Model

Physical Data Model

This is the final step in the process, in which an actual RDMS and/or hardware is selected and implemented, where the model represents the full implementation process.

Physical Data Model
Physical Data Model

What are Entity–Relationship Model (ERM) diagrams?

An entity-relationship model (ERM) is a system for defining the entities and relationships between those entities. An entity is an object that can be defined independently from other pieces of data. For example, an entity could be something like ‘customer’ or ‘restaurant’. The relationships between those entities could be something like ‘parent’, ‘child’, ‘contains’, etc. An ERM will often include attributes for individual tables and the foreign keys for other tables as well. The diagram will typically have a rectangle for each entity and arrows drawn between the entities with relationship labels across these arrows.

ERM Diagram
ERM Diagram

What Questions Should I Ask Prior to Data Modeling?

What business processes and workflows are we looking to streamline, improve, or start with this data modeling?

This question is important to best understand why a data modeling process is being undertaken in the first place. Data modeling should be done in pursuit of streamlining, improving, or starting business processes. It is good to get the team on the same page when it comes to the underlying reason for taking on a big project.

How are these business processes done today, or ideally executed in the future?

If your team is looking to do data modeling to take over an existing process or to create a new one, it is great to understand what the end-to-end workflow is going to look like once the project is done. Before data modeling, it is important to understand what the success criteria is for the project.

How are we going to continually gather this data and externalize this data to the team?

It is important to understand how this system will not be a black box, but rather tightly integrated into the existing organization.

What are the independent objects within my data?

Once we understand the data and goals of the project, we can begin to think about how to actually do the underlying modeling of the data, understanding the distinct objects and entities we are working with.

What are my data access requirements?

For this question, it is best to understand how often each object may be accessed, and the speed requirement for accessing that data. If any data needs to be accessed often, or the data needs to be accessible quickly, this can inform the structure of the data model, as the data model may need to be optimized for speed and simplicity for these data structures in particular. If using SQL, it is best to index or pre-aggregate this data to ensure good performance.

What off-the-shelf software should we be using to store data and access data?

In the end of the process, it can be expensive to build custom solutions in-house. In general, an external databasing system and embedded analytics platform may be purchased to create an end-to-end data modeling workflow. Understanding the commercial offerings on the market will help with the data modeling process.

Can I do Data Modeling for NoSQL?

Yes! Data modeling is a generic term for organizing data to support business processes as part of an information systems initiative. While a lot of data modeling literature centers around the relational database model, the same principles apply to NoSQL. While NoSQL is handy for having a flexible structure, it is still very important to understand data access requirements for an organization and structure the NoSQL data in a way to match those needs.

Can Data Modeling Benefit non-technical Individuals?

Absolutely! Data modeling is done to benefit the goals of the underlying organization for which the data modeling was done. To that end, data modeling should coincide with the organization’s underlying business processes. By codifying these business processes into a data model, an organization can run a more streamlined operation, leading to efficiency. Further, formalizing business processes into a data model can help clarify processes across the organization and find shortcomings in the process when an end-to-end workflow needs to be configured.

Can AI and Large Language Models Help with Data Modeling?

AI can be extremely helpful in the data modeling process. Data modeling is a cumbersome process for taking raw input and requirements and outputting a coherent and cohesive interrelationship between entities and processes. With the rise of large Language Models (LLMs), data modeling can easily be streamlined. Especially as these models become multimodal, it is not hard to imagine inputting a PDF of business processes and conceptual data and receiving a formal ERM as an output in a matter of seconds. While this technology has not been implemented end-to-end at this point, it is only a matter of time before an offering like the one described is available as a commercial offering.

Data Modeling Conclusion

Data modeling is an important part of any organization that has software engineering resources. With the rise of SQL database systems, data modeling has become a common practice in most businesses that require a software element to their day-to-day operations. As AI continues to evolve, data modeling best practices are likely to continue to improve, as more organizations will have access to fast, knowledgeable assistants to carry out a cumbersome, technical, and complicated data modeling process. This article hopefully gave a good primer to practitioners and beginners alike on how data modeling works.

Related terms:

No items found.