This article discusses the concepts, planning, and design approaches involved in microservices and DDD, and walks through splitting a monolithic application into multiple DDD-based microservices.

Definition of microservices

The “micro” in microservices suggests the size of a service, but size alone is not what makes an application a microservice. When teams move to a microservices-based architecture, their goal is to increase agility: to deploy functionality autonomously and frequently.

Therefore, it is difficult to give a simple definition of the microservices architectural style. I like Adrian Cockcroft’s brief definition of microservices: “a service-oriented architecture composed of loosely coupled elements that have bounded contexts.”

Although this is a useful high-level design heuristic, the microservices architecture has characteristics that set it apart from earlier service-oriented architectures. Drawing on earlier writing on the subject, we can summarize the characteristics a microservices architecture should have:

  1. Services are well defined around business contexts rather than arbitrary technical abstractions;
  2. They hide implementation details and expose functionality through intention-revealing interfaces;
  3. Services do not share their internal structures, such as databases, beyond their boundaries;
  4. Services are resilient and recover quickly from failures;
  5. Teams function independently and can release changes autonomously;
  6. Teams embrace a culture of automation: automated testing, continuous integration, and continuous delivery.

In short, we can summarize this architectural style as follows:

A loosely coupled service-oriented architecture in which each service is encapsulated within a well-defined bounded context, enabling fast, frequent, and reliable delivery of applications.

Domain-driven design and bounded contexts

The power of microservices lies in clearly defining their responsibilities and the boundaries between them. The goal is to establish high cohesion inside a boundary and low coupling across boundaries; that is, things that tend to change together should stay together. As with many real-life problems, this is easier said than done: businesses evolve and assumptions change. The ability to refactor is therefore another key consideration when designing a system.

In our view, domain-driven design (DDD) is the key, an essential tool for designing microservices, whether we are breaking down a monolith or starting a new project from scratch. Domain-driven design, best known from Eric Evans’ work, is a set of ideas, principles, and patterns that help us design software systems based on an underlying model of the business domain. Developers and domain experts work together to create business models in a shared, ubiquitous language. They then bind those models to the systems where they make sense and establish collaboration agreements between those systems and the teams that work on them. More importantly, they design the conceptual contours, or boundaries, between the systems.

Microservice design draws inspiration from these concepts, as all of these principles contribute to building modular systems that can change and evolve independently.

Before we go any further, let’s take a quick look at some basic DDD terms. A complete overview of domain-driven design is beyond the scope of this article.

Domain: Represents what an organization does, such as retail or e-commerce.

Subdomain: An organizational unit or line of business within the organization. A domain consists of multiple subdomains.

Ubiquitous Language: The language used to express the models. In the example below (Figure 1), Item is a model, and it is part of the ubiquitous language of each subdomain. Developers, product managers, domain experts, and business stakeholders agree on the language and use it in their artifacts (code, product documentation, and so on).

Figure 1: Subdomains and bounded contexts in an e-commerce domain

Bounded Context: Domain-driven design defines a bounded context as “the setting in which a word or statement appears that determines its meaning.” In short, a model is meaningful only within its boundary. In the example above (Figure 1), Item means something different in each context. In the Catalog context, an Item is a product available for sale; in the Cart context, it is an item the customer has added to their cart; and in the Fulfillment context, it is a warehouse item to be shipped to the customer. The models differ: each has a distinct meaning and may carry different attributes. By separating these models and isolating them within their respective boundaries, we can express each one freely and without ambiguity.

Note: It is important to understand the difference between a subdomain and a bounded context. Subdomains belong to the problem space, which is how the business sees the problem; bounded contexts belong to the solution space, which is how we implement the solution to the problem. In theory, each subdomain may have multiple bounded contexts, although we strive for one bounded context per subdomain.
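To make this concrete, here is a minimal sketch of how the same term, Item, can be modeled differently in two bounded contexts. The package, class, and field names are our own illustration, not from the article:

```java
// File: catalog/Item.java -- in the Catalog context, an Item is a product for sale.
package catalog;

import java.math.BigDecimal;

public class Item {
    private final String productId;
    private final String title;
    private final BigDecimal listPrice; // merchandising attributes matter here

    public Item(String productId, String title, BigDecimal listPrice) {
        this.productId = productId;
        this.title = title;
        this.listPrice = listPrice;
    }
}

// File: cart/Item.java -- in the Cart context, an Item is something the customer added.
package cart;

import java.math.BigDecimal;

public class Item {
    private final String productId;     // same ubiquitous term, different model
    private final int quantity;         // quantity matters here, not merchandising
    private final BigDecimal unitPrice; // the price captured when it was added

    public Item(String productId, int quantity, BigDecimal unitPrice) {
        this.productId = productId;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }
}
```

Because each model lives behind its own boundary, the Cart team can evolve its Item freely without breaking the Catalog.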

How do microservices and bounded contexts relate?

Now, where do microservices fit in? Does each bounded context map to a single microservice? Not necessarily. Let’s see why. In some cases, the scope, or contour, of a bounded context can be quite large.

Figure 2: Bounded context and microservices

Consider the example above. The Pricing bounded context contains three distinct models: Price, Priced Items, and Discounts. They are responsible for the price of a catalog item, for computing the total price of a list of items, and for applying discounts, respectively. We could create a single system containing all of these models, but it would become an unreasonably large application. As mentioned earlier, each data model has its own invariants and business rules. Over time, if we are not careful, the system can turn into a big ball of mud with blurred boundaries and overlapping responsibilities, possibly taking us right back where we started: a monolithic application.

Another way to model this system is to separate, or group, related models into individual microservices. In DDD, these models (Price, Priced Items, and Discounts) are known as aggregates. An aggregate is a self-contained model composed of related entities. The state of an aggregate can be changed only through its published interface, and the aggregate guarantees consistency, so its invariants always hold.

Formally, an aggregate is a cluster of associated objects treated as a single unit for the purpose of data changes. External references are restricted to one member of the aggregate, designated as the aggregate root. A set of consistency rules applies within the aggregate’s boundary.
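As a minimal sketch of these rules (the class and method names are ours, not from the article), an aggregate root that permits state changes only through its published interface and re-establishes its invariant on every change might look like this:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Aggregate root: external code may reference only PricedItems,
// never its internal LineItem objects directly.
public class PricedItems {
    private final List<LineItem> lines = new ArrayList<>();
    private BigDecimal total = BigDecimal.ZERO;

    // The only way to change state: a published, intention-revealing method.
    public void addItem(String productId, int quantity, BigDecimal unitPrice) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        lines.add(new LineItem(productId, quantity, unitPrice));
        recalculateTotal(); // invariant: total always equals the sum of the lines
    }

    public BigDecimal total() {
        return total;
    }

    private void recalculateTotal() {
        total = lines.stream()
                .map(LineItem::subtotal)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Internal entity; never exposed outside the aggregate boundary.
    private record LineItem(String productId, int quantity, BigDecimal unitPrice) {
        BigDecimal subtotal() {
            return unitPrice.multiply(BigDecimal.valueOf(quantity));
        }
    }
}
```

External code never touches a LineItem directly; it can only go through the root, which is what makes the invariant enforceable.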

Figure 3: Microservices in the Pricing context

Again, there is no need to model every aggregate as a separate microservice. That is how the services (aggregates) in Figure 3 turned out, but it is not a rule. In some cases it makes sense to host multiple aggregates in a single service, especially when we do not yet fully understand the business domain. The important thing to note is that consistency is guaranteed only within a single aggregate, and that an aggregate must be modified only through its published interface. Violating these rules raises the risk of the application turning into a big ball of mud.

Context mapping: An approach to precisely demarcating microservice boundaries

Another essential tool, also from domain-driven design, is context mapping. A monolithic application is usually composed of many models that are tightly coupled: the models may know each other’s implementation details, and changing one model may have side effects on another. Identifying these models (in this case, aggregates) and their relationships is critical when breaking up a monolith, and context maps help us do exactly that. They are used to identify and define the relationships between bounded contexts and between aggregates. In the example above, bounded contexts define the boundaries of the models (Price, Discounts, and so on); context maps define the relationships between these models and across contexts. Once these dependencies are identified, we can determine the right collaboration model between the teams that implement the services.

A full exploration of context mapping is beyond the scope of this article, so we will illustrate it with an example. The figure below shows the various applications that handle payments for an e-commerce order.

The Cart context is responsible for authorizing payment when an order is placed online; the Order context handles post-fulfillment payment processes, such as settlement; and the Contact Center context handles exceptions, such as payment retries and changes to the payment method used for an order. For simplicity, assume that all of these contexts are implemented as separate services and that each encapsulates the same Payment model. These models are logically identical: they all follow the same ubiquitous domain language of payment method, authorization, and settlement. They simply belong to different contexts.

Notice, too, that the same model is duplicated across the different contexts, all of which integrate directly with the single payment gateway and perform the same operations on it.

Figure 4: An incorrectly defined context map

Redefine service boundaries: Map the aggregates to the correct contexts

The design above (Figure 4) has some obvious problems. The Payment aggregate is part of multiple contexts, which makes it impossible to enforce invariants and consistency across the various services, to say nothing of the concurrency issues between them. For example, what happens if the contact center changes the payment method associated with an order while the order service is trying to settle against the previously submitted payment method? Note also that any change in the payment gateway forces changes in multiple services, and potentially in multiple teams, because they all share these contexts.

With some adjustments, aligning the aggregates with the right contexts, we can represent these subdomains much better (Figure 5). Quite a lot changes.

Let’s look at the changes:

  1. The Payment aggregate has a new home: the payment service. This service also shields the payment gateway from the other services that need payment functionality. Because a single bounded context now owns a single aggregate, its invariants are easy to enforce, and all transactions occur within the boundary of one service, avoiding any concurrency issues.
  2. The Payment aggregate uses an anti-corruption layer (ACL) to isolate the core domain model from the payment gateway’s data model, which is typically provided by a third party and subject to change (see the sketch after Figure 5). In a future article, we will dig into application design based on the ports-and-adapters pattern. The ACL typically contains adapters that transform the payment gateway’s data model into the Payment aggregate’s data model.
  3. The cart service invokes the payment service through a direct API call, because the cart service must complete payment authorization while the customer is checking out on the site.
  4. Note the interaction between the order service and the payment service: the order service emits a domain event (more on this later in the article), and the payment service listens for that event and completes settlement of the order.
  5. The contact center service may own many aggregates, but only its Orders aggregate matters for this use case. When the payment method changes, that service emits an event, and the payment service responds by voiding the charge on the previously used credit card and charging the new one.

Figure 5: The redefined context map
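As a minimal sketch of the anti-corruption layer from point 2 (the gateway response shape and field names here are hypothetical), an adapter translates the third party’s model into the payment context’s own model, so gateway changes stay contained in one place:

```java
import java.math.BigDecimal;

// Third-party payment gateway's model (hypothetical shape; subject to change).
record GatewayChargeResponse(String txn_ref, String status_code, long amount_cents) {}

// The payment context's own domain model.
record PaymentAuthorization(String transactionId, boolean approved, BigDecimal amount) {}

// Anti-corruption layer: the only place that knows both models.
class PaymentGatewayAdapter {
    PaymentAuthorization toDomain(GatewayChargeResponse response) {
        return new PaymentAuthorization(
                response.txn_ref(),
                "00".equals(response.status_code()),           // gateway's "approved" code
                BigDecimal.valueOf(response.amount_cents(), 2) // cents -> dollars
        );
    }
}
```

If the gateway renames a field or changes its status codes, only the adapter changes; the Payment aggregate and its callers are untouched.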

Typically, monolithic or legacy applications contain many aggregates with overlapping boundaries. Creating a context map of these aggregates and their dependencies helps us see the contours of the new microservices we might extract from the monolith. Remember, the success of a microservices architecture rests on low coupling between aggregates and high cohesion within them.

It is also worth noting that the bounded context itself is a fine unit of cohesion. Even if a context contains multiple aggregates, the entire context and its aggregates can be combined into a single microservice. We find this heuristic especially useful for domains that are still somewhat obscure, such as a line of business the organization is just entering. We may not know enough to draw the right separation boundaries, and decomposing aggregates prematurely can lead to expensive refactoring. Imagine having to merge two databases into one, data migration and all, because we discovered too late that two aggregates belong together. But make sure the aggregates are sufficiently isolated behind interfaces so that they do not know each other’s intricate details.

Event storming: Another technique for identifying service boundaries

Event storming is another essential technique for identifying the aggregates (and hence microservices) in a system. It is a very useful tool both for decomposing a monolith and for designing a complex ecosystem of microservices. We have used it to break down a complex application and intend to write a separate article about our experience with event storming. Here we give only a quick, high-level overview.

In short, event storming is a brainstorming session among the teams that work on an application (in this case, the monolith) to identify the various domain events and processes that occur in the system. The teams also identify the aggregates, or models, that these events affect, and any downstream effects the events may cause. As the teams brainstorm, they surface overlapping concepts, ambiguous domain language, and conflicting business processes. They group related models, redefine aggregates, and identify duplicated processes. As the exercise progresses, the bounded contexts these aggregates belong to become clear. Event storming sessions work well when all the teams are in one room (physical or virtual) and map out the events, commands, and processes on a Scrum-style whiteboard. By the end of the exercise, the usual outcomes are:

  1. A redefined list of aggregates; these may become the new microservices
  2. The domain events that need to flow between these microservices
  3. The commands invoked directly by users or by other applications

Here is a sample board we produced at the end of an event storming workshop. Agreeing on the right aggregates and bounded contexts is a great collaborative activity for the teams. Beyond that, the teams leave the session with a shared understanding of the domain, a shared language, and precise service boundaries.

Figure 6: An event storming board

Communication between microservices

As a quick recap: a monolithic application hosts multiple aggregates within a single process boundary, so the consistency of those aggregates can be managed inside that boundary. For example, when a customer places an order, we can decrement the item’s inventory and email the customer, all in one transaction: every operation either succeeds or fails together. But once we break up the monolith and scatter the aggregates across different contexts, we end up with dozens or even hundreds of microservices, and processes that previously ran inside the single boundary of one application are now spread across many distributed systems. Achieving transactional integrity and consistency across all of those systems is very hard, and it comes at the expense of availability.

Microservices are distributed systems, so the CAP theorem applies to them: a distributed system can deliver only two of the three guarantees of consistency, availability, and partition tolerance (the C, A, and P in CAP). In real-world systems, partition tolerance is non-negotiable: networks are unreliable, virtual machines can go down, latency between regions can worsen, and so on.

That leaves a choice between availability and consistency, and we know that sacrificing availability is rarely acceptable in a modern application either.

Figure 7: CAP theorem

Design your applications for eventual consistency

If we tried to build transactions spanning multiple distributed systems, we would end up with a monolith again, and this time the worst kind: a distributed monolith. If any one system became unavailable, the whole process would become unavailable, often resulting in poor customer experiences, broken promises, and so on. Moreover, changes to one service would tend to require changes to another, leading to complex and expensive deployments. We are therefore better off designing around our use cases and tolerating slight inconsistency in exchange for availability. For the example above, we can make all the processes asynchronous and thereby eventually consistent. The email can be sent asynchronously, independently of the other processes; if a promised item later turns out to be unavailable in the warehouse, it can be back-ordered, or we can stop accepting orders for the item beyond a certain threshold.
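As a minimal in-process sketch of making the email step asynchronous (the types here are our own illustration; a real system would publish through a durable message broker so events survive a crash):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Domain event emitted after the order transaction commits (hypothetical shape).
record OrderPlaced(String orderId, String customerEmail) {}

class DomainEventPublisher {
    private final List<Consumer<OrderPlaced>> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void subscribe(Consumer<OrderPlaced> listener) {
        listeners.add(listener);
    }

    // Listeners run asynchronously: a slow or failing email step no longer
    // blocks or rolls back order placement.
    void publish(OrderPlaced event) {
        for (Consumer<OrderPlaced> listener : listeners) {
            executor.submit(() -> listener.accept(event));
        }
    }
}
```

The order service commits its own transaction and then publishes OrderPlaced; the email listener (and any other subscriber) reacts on its own schedule, so the flow stays available even when one downstream step is slow or down.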

Occasionally we run into a scenario that seems to demand a strong, ACID-style transaction spanning two aggregates in different process boundaries. That is a strong signal to revisit those aggregates and consider combining them into one. Event storming and context mapping help identify these dependencies early, before the aggregates are split across process boundaries. Merging two microservices back into one is costly, and we should strive to avoid it.

Support for event-driven architectures

Microservices can publish significant changes that occur in their aggregates, called domain events, and any service interested in those changes can subscribe to them and act within its own domain. This approach avoids behavioral coupling (one domain does not dictate what other domains should do) and temporal coupling (the successful completion of a process does not depend on all the systems being available at the same time). It does, of course, mean the systems will only be eventually consistent.

Figure: Event-driven architecture

In the example above, the order service publishes an event: the order has been canceled. The services subscribed to that event then carry out their respective domain functions: the payment service issues a refund, the inventory service adjusts the item’s stock, and so on. To keep this integration reliable and resilient, note the following:

  1. The producer should guarantee at-least-once delivery of the event. If a failure occurs along the way, there must be a fallback mechanism that re-emits the event.
  2. Consumers should consume events idempotently: if the same event is delivered again, it must cause no side effects on the consumer’s side. Events may also arrive out of order; consumers can use a timestamp or version-number field to detect duplicates and stale events, as sketched below.
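As a minimal sketch of the consumer side (the event shape and in-memory store are our own illustration, not from the article), a handler can track the latest version processed per order so that redelivered or out-of-order events are ignored:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A domain event as it might arrive from a broker (hypothetical shape).
record OrderCanceled(String eventId, String orderId, long version) {}

class RefundEventHandler {
    // Tracks the latest version handled per order. In production this would
    // live in durable storage, committed atomically with the refund itself.
    private final Map<String, Long> handledVersions = new ConcurrentHashMap<>();

    void onOrderCanceled(OrderCanceled event) {
        Long lastSeen = handledVersions.get(event.orderId());
        // Idempotency + ordering: skip duplicates and stale (out-of-order) events.
        if (lastSeen != null && event.version() <= lastSeen) {
            return;
        }
        issueRefund(event.orderId());
        handledVersions.put(event.orderId(), event.version());
    }

    private void issueRefund(String orderId) {
        System.out.println("Refunding order " + orderId);
    }
}
```

With this in place, the producer can safely re-emit events after a failure; a duplicate delivery simply falls through the version check.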

Event-based integration is not always possible; the nature of some use cases rules it out. Look at the integration between the cart service and the payment service. It is synchronous, and it deserves scrutiny. It involves behavioral coupling: the cart service invokes a REST API on the payment service and instructs it to authorize payment for the order. It also involves temporal coupling: the payment service must be available for the cart service to accept the order. Such coupling reduces the autonomy of both contexts and may introduce unwanted dependencies. There are a few ways to avoid the coupling, but with each of them we give up the ability to provide the customer immediate feedback:

  1. Turn the REST API into an event-based integration. This option may not be available if the payment service exposes only a REST API
  2. Have the cart service accept the order immediately, and let a batch job pick the order up and invoke the payment service API
  3. Have the cart service emit a local event and then invoke the payment service API

Combining any of these approaches with retries yields a design that is more resilient to failures and to the unavailability of the upstream dependency (the payment service); see the sketch below. For example, the synchronous integration between the cart and payment services can fall back to event-based or batch-driven retries when a call fails. This does affect the customer experience: the customer may have entered incorrect payment details, and we cannot keep them online while we process the payment offline. Recovering failed payments may also increase costs for the business. But in most cases, making the cart service resilient to the payment service’s failure or unavailability does more good than harm; if we cannot collect the payment offline, we can notify the customer, for instance. In short, there are trade-offs among user experience, resilience, and operating cost, and it is wise to weigh them when designing a system.
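Here is a minimal sketch of option 3 combined with the retry fallback described above. The client interface and queue are illustrative stand-ins; a production system would use a durable outbox or broker and a real HTTP client:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class CartCheckout {
    // Stand-in for the payment service's REST client.
    interface PaymentClient {
        void authorize(String orderId); // throws RuntimeException on failure
    }

    private final PaymentClient paymentClient;
    // Local "event" record; in production this would be durable (outbox table, broker).
    private final Queue<String> pendingAuthorizations = new ArrayDeque<>();

    CartCheckout(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
    }

    void checkout(String orderId) {
        // Record the intent locally first, then attempt the synchronous call.
        pendingAuthorizations.add(orderId);
        try {
            paymentClient.authorize(orderId);
            pendingAuthorizations.remove(orderId); // success: nothing left to retry
        } catch (RuntimeException e) {
            // Payment service unavailable: accept the order anyway. A retry job
            // drains pendingAuthorizations later and notifies the customer if
            // the payment ultimately cannot be collected.
            System.out.println("Payment deferred for order " + orderId);
        }
    }
}
```

The cart stays available when the payment service is down, at the cost of the trade-offs described above: the customer learns about a failed payment later rather than immediately.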

Avoid orchestrating services to meet callers’ specific data needs

An anti-pattern seen in any service-oriented architecture is services catering to a particular caller’s access patterns. Typically this happens when the caller’s team works closely with the service provider’s team. In a monolithic application, such teams often create a single API that crosses the boundaries of different aggregates, tightly coupling those aggregates together. Consider an example: the order details pages of the web and mobile applications need to display an order’s details and its refund processing details on a single page. In the monolith, an order GET API (assuming REST) queries the orders and refunds together, merges the two aggregates, and sends a composite response to the caller. Because the aggregates live in the same process boundary, this adds little overhead, and the caller gets all the data it needs in a single round trip.

If orders and refunds belong to different contexts, however, the data no longer lives within a single microservice or aggregate boundary. One option for preserving the same functionality for the caller is to make the order service responsible for invoking the refund service and building the composite response. This approach causes the following problems:

  1. The order service now integrates with another service purely to support callers that need refund data alongside order data. The order service is less autonomous now, because any change in the Refunds aggregate may trigger a change in the Orders aggregate.
  2. The order service has taken on another integration, and therefore another failure mode to consider: if the refund service goes down, can the order service still send partial data, and can the caller handle the failure gracefully?
  3. If the caller needs a change in order to fetch more data from the Refunds aggregate, two teams must now make the change in lockstep.
  4. Followed across a platform, this pattern produces a complex web of dependencies between domain services, all catering to caller-specific access patterns.

Backends for frontends (BFFs)

One way to mitigate this risk is to let the caller’s team manage the orchestration across the various domain services. After all, the caller knows its own access patterns best and has full control over changes to them. This approach decouples the domain services from the presentation layer, letting them focus on core business processes. However, if the web and mobile applications start calling individual services directly, instead of the one composite API they had with the monolith, they pay a performance overhead: multiple calls over lower-bandwidth networks, the work of processing and merging data from different APIs, and so on.

Instead, we can use a pattern called backend for frontend (BFF). In this design pattern, a backend service created and managed by the consumers, in this case the web and mobile teams, takes responsibility for integrating multiple domain services purely to deliver the frontend customer experience. The web and mobile teams can now design data contracts around their own use cases; they can even use GraphQL instead of REST APIs to gain the flexibility to query exactly what they need. It is important that these services are owned and operated by the consumer teams, not by the teams providing the domain services. Frontend teams can now optimize for their own needs: a mobile application can request smaller payloads and make fewer round trips, and so on. Below is the revised flow, in which the BFF service invokes both the Order and Refund domain services for its use case; a sketch follows the figure.

Figure: A backend for frontend (BFF)
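As a minimal sketch of a BFF endpoint (the service interfaces and record shapes are our own illustration, not from the article), composing the two domain services into one frontend-shaped response might look like this:

```java
// Thin clients for the two domain services (illustrative interfaces).
interface OrderService  { Order getOrder(String orderId); }
interface RefundService { Refund getRefund(String orderId); }

record Order(String orderId, String status) {}
record Refund(String orderId, String refundStatus) {}

// The response is shaped for the order details page, not for any domain model.
record OrderDetailsView(String orderId, String orderStatus, String refundStatus) {}

class OrderDetailsBff {
    private final OrderService orders;
    private final RefundService refunds;

    OrderDetailsBff(OrderService orders, RefundService refunds) {
        this.orders = orders;
        this.refunds = refunds;
    }

    // One call from the web/mobile client; the BFF fans out to the domain services.
    OrderDetailsView getOrderDetails(String orderId) {
        Order order = orders.getOrder(orderId);
        String refundStatus;
        try {
            refundStatus = refunds.getRefund(orderId).refundStatus();
        } catch (RuntimeException e) {
            refundStatus = "unavailable"; // degrade gracefully if refunds is down
        }
        return new OrderDetailsView(order.orderId(), order.status(), refundStatus);
    }
}
```

Because the consumer team owns this service, it can reshape OrderDetailsView whenever the page changes, without asking the order or refund teams for anything.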

It also pays to build BFF services early, before splitting too many services out of the monolith. Otherwise, either the domain services must support cross-domain orchestration, or the web and mobile applications must call multiple services directly from the frontend. Both options lead to performance overhead, throwaway work, and reduced autonomy between teams.

By Chandra Ramalingam

Translator: Liu Yameng

Planner: Tian Xiaoxu