Introduced by Zhamak Dehghani, data mesh has lately been one of the most actively discussed topics related to data platform thinking and data architectures. It takes an approach derived from domain-driven design, and introduces it into the data world. The very same approach has already disrupted the software industry and pushed it to move from monolithic solutions to microservice-based architectures, and from centralized IT teams to local domain teams.
In this blog post, we summarize the pain points data mesh aims to solve, how this happens, and what kinds of organizations might find value in utilizing the paradigm.
What is the core idea of data mesh, and what benefits does it have?
Data mesh represents a paradigm shift in the way we think about and manage data platforms and architectures. It means moving away from a centralized unified data platform governed by one massive data platform team, and towards decentralized independent domain teams. These teams are responsible for managing, owning and serving the data within their domain as products.
Current data platform thinking has certain problems – referred to as architectural failure modes by Dehghani:
On the platform level, centralizing all data onto one platform becomes problematic for large enterprises that have an extensive and rapidly changing variety of data sources and use cases.
On the technical level, responding to new needs requires changes in the whole data pipeline, which makes it difficult to stay agile and responsive.
On the team level, the centralized solving of data requests leads to long response times due to disconnected teams that cannot understand the needs of the business or of other teams. Long lead times may suffocate innovative prototyping and learning.
On the competence level, data experts become too specialized in their area of expertise, and may create platform-level bottlenecks due to the difficulty of finding specific data engineering talent.
In the data mesh approach, these problems are addressed by shifting the way we think about data:
On the platform level, data is viewed as a product, and each domain handles and serves data related to their area.
On the technical level, domains are responsible for their data products as well as their quality. These are offered to other domains through APIs. Even though domains have their own solutions, they share the same infrastructure.
On the team level, the domain teams are able to focus on their data products, bring in new data sources, and further develop solutions that they understand and are able to prioritize from the business perspective.
On the competence level, data experts’ skillsets are broader and enable easier rotation of technical specialists between different data products.
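To make the "data as a product" idea above more concrete, here is a minimal, hypothetical Python sketch of a domain-owned data product contract. The names (`DataProduct`, `quality_checks`, `serve`) are purely illustrative — data mesh does not prescribe any particular API — but the sketch shows the key point: the owning domain team publishes a schema and quality guarantees, and consumers only ever see data that honors that contract.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataProduct:
    """A domain-owned data product: the domain team defines the schema,
    the quality checks, and the serving interface for its own data."""
    domain: str                        # owning domain team, e.g. "sales"
    name: str                          # product name exposed to consumers
    schema: dict[str, type]            # published output schema (the contract)
    quality_checks: list[Callable[[dict], bool]] = field(default_factory=list)

    def serve(self, records: list[dict]) -> list[dict]:
        """Return only records that match the schema and pass every check."""
        valid = []
        for record in records:
            if set(record) != set(self.schema):
                continue  # off-contract record: never leaks to consumers
            if all(check(record) for check in self.quality_checks):
                valid.append(record)
        return valid

# The sales domain publishes its product with a schema and a quality rule.
orders = DataProduct(
    domain="sales",
    name="daily_orders",
    schema={"order_id": int, "total_eur": float},
    quality_checks=[lambda r: r["total_eur"] >= 0],
)

served = orders.serve([
    {"order_id": 1, "total_eur": 19.90},
    {"order_id": 2, "total_eur": -5.0},   # fails the quality check
    {"order_id": 3},                      # missing field, off-contract
])
```

The design choice worth noticing is that quality enforcement lives inside the product, owned by the domain team — not in a centralized pipeline downstream.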
Efficient data mesh implementation provides the benefits of radically decreasing lead times and giving business domains the ability to quickly prioritize and make decisions that are relevant to them. It makes data available throughout the organization while at the same time providing freedom on the technological level. The key benefit is not primarily technical, but rather organizational and cultural.
To truly understand the challenges and benefits as well as how data mesh addresses these, I recommend reading the original posts by Dehghani:
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data Mesh Principles and Logical Architecture
What kinds of organizations is data mesh for?
The current data platform architecture, built on a centralized data lake and/or data warehouse, is not going away – and that's not the goal of data mesh. A centralized data platform with a specialized team usually works well for small and medium-sized enterprises and for organizations whose data landscape is not constantly changing, or whose business domains are relatively simple.
But, Dehghani argues, when an organization grows in size, its data domains also become more diverse and new data sources are introduced at a fast pace. That's when preceding technical architectures and ways of organizing start producing unnecessary friction and slowness in responding to data-related needs.
Public examples of companies that utilize data mesh
As digital transformation spreads across society and organizations become more data-driven, we should no longer try to fool ourselves into thinking that the current data platform model will scale for everyone.
Finding public examples of companies that already utilize data mesh remains difficult, apart from a few exceptions:
- Intuit has recently shared a breakdown of their data mesh strategy
- JPMorgan Chase has discussed their experiences on implementing data mesh
- Zalando presented their data mesh story at the Spark+AI Summit
- HSBC has explicitly mentioned data mesh in their data strategy
These references to data mesh implementations still represent the early stages of utilizing the paradigm, and questions about the sustainability, maintainability or ROI of the approach remain unanswered. But while these questions can remain open for quite some time, it is necessary to begin discussing and hypothesizing which types of organizations could achieve the biggest productivity boost with the data mesh approach.
Does organization size matter in the context of data mesh?
It is impossible to draw a clear line for when preceding approaches and data architectures become ineffective. Organization size in itself is a cumbersome metric – depending on the core business and organization structure, even large companies can remain efficient with a centralized data platform if it isn't at the heart of their business.
Instead, we could consider evaluating the size of the IT team – or, more specifically, the number of people working as enablers of the data platform – but this metric will not be able to produce a straight answer either.
A more natural approach, then, would be to consider whether the size of your data platform, and the team supporting it, slows down the cycle of innovation and turns into a bottleneck.
This can manifest in different ways:
- Continuously longer lead times, despite a growing team size, as the team is disconnected and lacks domain knowledge
- The appearance of data solutions separate from the centralized data platform, as the business starts building its own capabilities due to a lack of trust that the centralized team can fulfill its needs
- Needing temporary solutions to integrate new data sources, as the business doesn't want to wait for the official way to utilize the data platform
Does centralized data architecture provide enough flexibility?
Centralized architecture is usually designed for a few main types of data sources and use cases. Over time, the architecture is often expanded beyond these main pipeline types, but such expansions require compromises and diverging architectural decisions that either add technical complexity or fall short of an optimal solution.
This can be evaluated by asking whether your data platform is able to fulfill all requested use cases, and still maintain an acceptable level of technical complexity for your data platform team.
A suboptimal setup can display several symptoms:
- Some of the data platform components are only understood by a small part of the team
- Data sources need a significant amount of preprocessing before they can be loaded onto the data platform
- All incoming requests are forced to follow a limited number of patterns for loading or offering data
Summarizing thoughts about the business value of data mesh
The small number of real-life examples and the lack of companies publicly utilizing data mesh make it difficult to recognize the business value hidden in the new paradigm. The drive for change in data platform thinking has to start with identifying and addressing the symptoms of the current state, and analyzing whether the value proposition of data mesh can help alleviate them.
The approach is not something that small and medium-sized enterprises can really find valuable at full scale, as it can easily lead to a more siloed platform and an ungoverned, tangled architecture. But even for smaller organizations, certain aspects such as product thinking and end-to-end data engineering competence can help enhance the productivity of the data platform as a whole.
Want to learn more about enhancing your organization’s productivity with data? Take a look at our new book Growth Reinvented: Turn Your Data and Artificial Intelligence into Money!