• Dropbox Reveals Atlas – a Managed Service Orchestration Platform
  • Author: Eran Stiller
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Hoarfroster
  • Proofreader: 5Reasons, Kamly, Husiyu

Dropbox has launched Atlas, a managed service orchestration platform

In a recent blog post, Dropbox announced Atlas, a platform that aims to provide the benefits of a service-oriented architecture while minimizing the operational cost of owning a service.

The goal of Atlas is to support small, self-contained pieces of functionality and to spare product teams the overhead of managing a full service, including capacity planning, alert setup, and so on. Atlas offers an experience similar to serverless systems (such as AWS Fargate) while being backed by services that are provisioned automatically behind the scenes. According to authors Naphat Sanguansin and Utsav Shah, the team evaluated off-the-shelf solutions for running the platform. However, to reduce migration risk and keep engineering costs low, they decided to keep hosting the services on the same deployment orchestration platform as the rest of Dropbox.

Atlas is being built to replace Metaserver, Dropbox’s central Python monolith. The development of Atlas is a multi-year effort that is still ongoing. Atlas currently serves more than 25 percent of the monolith it is intended to replace. The authors share a key takeaway from the migration:

One of the most important things we’ve learned over the years is that well-thought-out code composition is critical early in a project’s life cycle. Otherwise, technical debt and code complexity compound very quickly. Breaking up import cycles and decomposing Metaserver (…) was probably the most strategically effective part of the project, as it prevented new code from adding to the problem while also simplifying our code.

They point out that many previous attempts to improve Metaserver failed because of the size and complexity of the codebase. This time, they devised the execution plan for Atlas around stepping stones rather than milestones. The idea is that each incremental step delivers value on its own, even if the next part of the project fails for whatever reason. A key example of this strategy is the improvement to the monolith’s code structure, which is valuable with or without Atlas. In addition, many of the enhancements developed for Atlas will be ported back into Metaserver to further increase the project’s value.

Before and after (image source: Dropbox)

The design of Atlas involves key work in three areas: componentization, orchestration, and operationalization. To improve componentization, Atlas introduces Atlasservlets as logical, minimal groupings of HTTP routes. “In preparation for Atlas, we worked with the product teams to assign an Atlasservlet to every route in Metaserver, resulting in more than 200 Atlasservlets across more than 5,000 routes,” they said. Each Atlasservlet is assigned an owner who has sole ownership of it. Also, to break up the Metaserver codebase, they had to break most of its Python import cycles, a process that took several years.
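Import cycles of the kind mentioned above can be found mechanically by treating modules as nodes in a dependency graph and searching for a back edge. The following is a minimal sketch of that idea, not Dropbox’s tooling: the function name `find_import_cycle`, the hand-written dependency map, and the `metaserver.*` module names are all hypothetical.

```python
from typing import Dict, List, Optional


def find_import_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of module names, or None if there is none.

    `deps` maps a module name to the modules it imports.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {module: WHITE for module in deps}
    path: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GRAY
        path.append(module)
        for imported in deps.get(module, []):
            state = color.get(imported, WHITE)
            if state == GRAY:
                # Back edge: `imported` is already on the current path, so we have a cycle.
                return path[path.index(imported):] + [imported]
            if state == WHITE:
                cycle = visit(imported)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in list(deps):
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Hypothetical module names, for illustration only.
cyclic = {
    "metaserver.files": ["metaserver.auth"],
    "metaserver.auth": ["metaserver.users"],
    "metaserver.users": ["metaserver.files"],
    "metaserver.util": [],
}
# Breaking the cycle: metaserver.users no longer imports metaserver.files.
acyclic = dict(cyclic, **{"metaserver.users": ["metaserver.util"]})

print(find_import_cycle(cyclic))
print(find_import_cycle(acyclic))
```

A check like this could run in continuous integration so that new code cannot reintroduce a cycle once it has been removed, which matches the authors’ point about preventing new code from adding to the problem.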

To improve orchestration, each Atlasservlet in Atlas runs as its own cluster. This provides isolation by default, because a misbehaving route can only affect other routes in the same Atlasservlet. It also allows code to be pushed independently for each Atlasservlet. In addition, Dropbox decided to standardize on gRPC. To keep serving HTTP traffic, they use the gRPC-JSON transcoder provided by Envoy, which they use as the proxy and load balancer in front of the Atlasservlets.

HTTP transcoder (image source: Dropbox)
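The grouping and isolation model described above can be illustrated with a small data structure: every route belongs to exactly one Atlasservlet, every Atlasservlet has one owner, and every Atlasservlet maps to its own cluster. This is only a sketch under assumptions. The class names, the `atlas-<name>` cluster naming scheme, the team names, and the routes are hypothetical and do not come from Dropbox.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Atlasservlet:
    name: str
    owner: str  # the single owning team (hypothetical names below)
    routes: List[str] = field(default_factory=list)

    @property
    def cluster(self) -> str:
        # Assumption: one cluster per Atlasservlet, named after it.
        return f"atlas-{self.name}"


class RouteTable:
    """Maps each HTTP route to exactly one Atlasservlet."""

    def __init__(self) -> None:
        self._by_route: Dict[str, Atlasservlet] = {}

    def register(self, servlet: Atlasservlet) -> None:
        # Reject the whole registration if any route is already owned elsewhere.
        taken = [route for route in servlet.routes if route in self._by_route]
        if taken:
            raise ValueError(f"routes already assigned: {taken}")
        for route in servlet.routes:
            self._by_route[route] = servlet

    def cluster_for(self, route: str) -> str:
        return self._by_route[route].cluster

    def blast_radius(self, route: str) -> List[str]:
        """Routes sharing a cluster with `route`, which a misbehaving `route` could affect."""
        return sorted(self._by_route[route].routes)


table = RouteTable()
table.register(Atlasservlet("home", "web-platform", ["/home", "/home/recents"]))
table.register(Atlasservlet("sharing", "sharing-team", ["/share", "/share/link"]))

print(table.cluster_for("/share/link"))
print(table.blast_radius("/home"))
```

Because the lookup is keyed by route, a problem in `/home` is confined to the routes returned by `blast_radius("/home")`, and each Atlasservlet can be deployed without touching the others.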

On the operational side, according to the authors, “the secret sauce of Atlas is the managed experience.” Much of this work relies on automated canary analysis and autoscaling. The former lets the system automatically check every code change and push before it reaches production, while the latter eliminates much of the need for capacity planning.

Canary analysis (image source: Dropbox)
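To make the two operational mechanisms concrete, here is a minimal sketch of what an automated canary check and a proportional autoscaling rule might look like. The article does not describe Dropbox’s actual metrics, thresholds, or scaling policy, so everything here is an assumption: the `Metrics` type, the error-rate comparison, the default thresholds, and the utilization target are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_passes(baseline: Metrics, canary: Metrics,
                  max_increase: float = 0.01, min_requests: int = 100) -> bool:
    """Pass only if the canary saw enough traffic and its error rate is
    at most `max_increase` above the baseline's error rate."""
    if canary.requests < min_requests:
        return False  # not enough data to judge, so do not promote
    return canary.error_rate <= baseline.error_rate + max_increase


def desired_replicas(current: int, utilization: float, target: float = 0.5) -> int:
    """Proportional rule: resize so that utilization moves back toward `target`."""
    return max(1, math.ceil(current * utilization / target))


baseline = Metrics(requests=10_000, errors=50)
healthy_canary = Metrics(requests=500, errors=3)
broken_canary = Metrics(requests=500, errors=25)

print(canary_passes(baseline, healthy_canary))
print(canary_passes(baseline, broken_canary))
print(desired_replicas(current=10, utilization=0.75))
```

A production system would typically compare several signals (latency, resource usage, crash counts) over a time window rather than a single error rate, but the gate has the same shape: a push is promoted only when the canary does not look worse than the baseline.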
