What is data mesh?
Data mesh in simple terms is a relatively new data management approach with the goal of bringing data closer to the business. Technically speaking, data mesh refers to a modern distributed architecture and set of principles for data management. There’s a lot more to a data mesh approach than just technology and architectural principles – data mesh is first and foremost a way of thinking and organizing.
This is the first in a series of blog posts that takes a closer look at what data mesh is, what its benefits are and when organizations should consider a data mesh approach. The following posts focus more on the role of technology and data mesh teams, culture and how to get started with data mesh implementation. We have summarized all these resources in a single extensive guide.
Introduced by Zhamak Dehghani, data mesh has lately been one of the most actively discussed topics related to data platform thinking and data architectures. It takes an approach derived from domain-driven design, and introduces it into the data world. The very same approach has already disrupted the software industry and pushed it to move from monolithic solutions to microservice-based architectures, and from centralized IT teams to local domain teams. Now the same shift is happening to the predetermined, organization-wide data platform architecture and the centralized data team.
Data mesh has gained popularity and traction as organizations have run into organizational and architectural shortcomings that lead to data solutions that are suboptimal for the business and limit the business value gained from data.
What are the benefits of data mesh?
Data mesh represents a paradigm shift in the way we think about and manage data platforms and architectures. It means moving away from a centralized unified data platform governed by one massive data platform team, and towards decentralized independent domain teams. These teams are responsible for managing, owning and serving the data within their domain as products. This sort of decentralized data management approach has several benefits, including:
- Domain teams are independent in their prioritization and can choose the technology that fits their needs
- Business understanding of the data sits as close to development as possible
- Data is viewed as a product, which supports the interoperability of data
An efficient data mesh implementation radically decreases lead times and gives business domains the ability to quickly prioritize and make decisions that are relevant to them. It makes data available throughout the organization while at the same time providing freedom on the technological level. The key benefit of data mesh is not primarily technical, but rather organizational and cultural.
How does data mesh address the limitations of current platform thinking?
Current data platform thinking has certain problems, which Dehghani refers to as architectural failure modes. The data mesh approach addresses these problems by shifting the way we think about data. The problems and the corresponding responses are summarized below.
- On the platform level, centralizing all data onto one platform becomes problematic for large enterprises that have an extensive and rapidly changing variety of data sources and use cases.
In the data mesh approach, the platform problems are addressed by shifting the way we think and organize around data. Data is viewed as a product, and each domain handles and serves data related to their area of business.
- On the technical level, responding to new needs in a centralized data platform requires changes in the whole data pipeline, which makes it difficult to stay agile and responsive.
According to data mesh architectural principles, domains are responsible for their data products as well as their quality. These are offered to other domains through predefined interfaces such as APIs or flat files. Even though domains have their own solutions, they can share the same infrastructure. This can create cost benefits with economies of scale. As domains own the data products, they can quickly respond to new needs with their own prioritization.
- On the team level, solving data requests centrally leads to long response times, because a disconnected central team cannot fully understand the needs of the business or of other teams. Long lead times may suffocate innovative prototyping and learning.
In decentralized data management, domain teams are able to focus on their data products, bring in new data sources, and further develop solutions that they understand and are able to prioritize from the business perspective. Data mesh thereby fosters data-driven innovation by allowing greater autonomy and flexibility for data owners.
- On the competence level, data experts in a centralized data platform team become too specialized in their area of expertise, and may create platform level bottlenecks due to the difficulty of finding specific data engineering talent.
Data mesh decentralizes both data ownership and data skills by distributing these among cross-functional domain teams. In a decentralized data management model such as data mesh, experts’ skillsets are broader and enable easier rotation of technical specialists between different data products. Domains also have the ability to match the needed competence profiles to their specific needs.
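The idea above, that a domain owns a data product and its quality and offers it to other domains through a predefined interface such as an API or a flat file, can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than part of any data mesh specification or tool: the `DataProduct` class, its methods, the `sales` domain and the `orders_daily` product are all assumed names.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product owned by a single domain team."""
    name: str
    owner_domain: str
    columns: tuple  # the predefined interface: column names consumers rely on
    rows: list = field(default_factory=list)

    def publish(self, records):
        """Quality gate owned by the domain: reject records that break the interface."""
        for record in records:
            if set(record) != set(self.columns):
                raise ValueError(f"record does not match interface: {record}")
        self.rows.extend(records)

    def as_records(self):
        """API-style access: return copies so consumers cannot mutate the source."""
        return [dict(row) for row in self.rows]

    def as_csv(self):
        """Flat-file access: the same data rendered as CSV text."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self.columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


# Assumed example: the sales domain publishes its own product.
orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    columns=("order_id", "amount_eur"),
)
orders.publish([
    {"order_id": 1, "amount_eur": 20.0},
    {"order_id": 2, "amount_eur": 35.5},
])
```

Another domain would only call `orders.as_records()` or `orders.as_csv()` and would never touch the internal storage, so the owning domain could change how it produces the data without breaking consumers, as long as the published columns stay the same.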
To truly understand the challenges and benefits as well as how data mesh addresses these, I recommend reading the original posts by Dehghani:
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data Mesh Principles and Logical Architecture
What kinds of organizations is data mesh for?
Current data platform architecture built on a centralized data lake and/or data warehouse is not going out of existence – and that’s not the goal of data mesh. A centralized data platform with a specialized team usually works well for small and medium-sized enterprises and organisations whose data landscape is not constantly changing, or whose business domains are relatively simple.
But, Dehghani argues, when an organisation grows in size, its data domains also become more diverse and new data sources are introduced at a fast pace. That’s when earlier technical architectures and ways of organizing start to cause unnecessary friction and slow down the response to data-related needs.
Data mesh or data lake?
Comparing data mesh to a data lake is largely futile as the terms are conceptually very different from each other. Data lakes are data storage repositories, which store, organize, protect and offer data, while data mesh is a set of principles for decentralized data management. The main objective of both, in essence, is to offer faster time to analytical insights and increase the business value of analytics.
In a data mesh architecture, data is usually distributed across and queried from domain-owned data stores, some of which are data lakes. Data lakes and data mesh can therefore co-exist and complement each other.
Does organization size matter in the context of data mesh?
It is impossible to draw a clear line for when earlier approaches and data architectures become ineffective. Organization size in itself is an unreliable metric: depending on the core business and organization structure, even large companies can remain efficient with a centralised data platform if data isn’t at the heart of their business.
Instead, we could consider evaluating the size of the IT team – or, more specifically, the number of people working as enablers of the data platform – but this metric will not be able to produce a straight answer either.
A more natural approach, then, would be to consider whether the size of your data platform, and of the team supporting it, slows down the cycle of innovation and turns the platform into a bottleneck.
This can manifest in different ways:
- Continuously longer lead times, despite a growing team size, as the team is disconnected and lacks domain knowledge
- Data solutions appearing outside the centralized data platform, as the business starts building its own capabilities because it does not trust the centralized team to fulfill its needs
- Temporary solutions being needed to integrate new data sources, as the business doesn’t want to wait for the official way to use the data platform
Does centralized data architecture provide enough flexibility?
Centralized architecture is usually designed for a few main types of data sources and use cases. Over time, the architecture is often expanded beyond these main pipeline types, but the additions require compromises and diverging architectural decisions that either add technical complexity or fall short of what an optimal solution could deliver.
This can be evaluated by asking whether your data platform is able to fulfill all requested use cases, and still maintain an acceptable level of technical complexity for your data platform team.
A suboptimal setup can display several symptoms:
- Some of the data platform components are only understood by a small part of the team
- Data sources need a major amount of preprocessing before they can be loaded onto the data platform
- All incoming requests are forced to follow a limited number of patterns for loading or offering data
Public examples of companies that utilize data mesh
As digital transformation spreads across society and organizations become more data-advised, we should no longer try to fool ourselves into thinking that the current data platform model will scale for everyone.
Finding public examples of companies that already utilize data mesh is still a difficult task apart from a few exceptions:
- Intuit has recently shared a breakdown of their data mesh strategy
- JPMorgan Chase has discussed their experiences (we have also shared our own tips on how to get started with data mesh implementation)
- Zalando presented their data mesh story at the Spark+AI Summit
- HSBC has explicitly mentioned data mesh in their data strategy
These references to data mesh implementations still represent the early stages of utilizing the paradigm, and questions about the sustainability, maintainability or ROI of the approach remain unanswered. But while these questions can remain open for quite some time, it is necessary to begin discussing and hypothesizing which types of organizations could achieve the biggest productivity boost with the data mesh approach.
Summarizing thoughts about the business value of data mesh
The small number of public, real-life examples makes it difficult to recognize the business value hidden in the new paradigm. The drive for change in data platform thinking has to come from identifying and addressing the symptoms of the current state, and analyzing whether the value proposition of data mesh can help alleviate them.
The approach is not something that small and medium-sized enterprises can really find valuable at full scale, as it can easily lead to a more siloed platform and an ungoverned, tangled architecture. But even for smaller organizations, certain aspects such as product thinking and end-to-end data engineering competence can help enhance the productivity of the data platform as a whole.
Want to read more about my thoughts on Data Mesh? Check out the blog The role of technology in data mesh architecture.
Data mesh in a nutshell
What is data mesh?
Data mesh represents a paradigm shift in the way we think about and manage data platforms and architectures. Data mesh has lately been one of the most actively discussed topics related to data platform thinking and data architectures.
What is data mesh architecture?
Data mesh architecture as a term may be a bit misleading, as data mesh is first and foremost a way of thinking and organizing with the goal of getting value from analytical data at scale. Data mesh architecture requires custom solutions that are already achievable technologically, yet the challenge with data mesh architectures is often more organisational than technological.