Translator's note: write good code, test it thoroughly, communicate clearly with your teammates, release gradually (gray release), and make sure production has monitoring and a one-click on/off switch.

  • Original: A Framework for Shipping High Quality Software
  • Translator: Fundebug

To keep the text readable, this article is a free translation rather than a literal one, and the source code has been heavily modified. The copyright of this article belongs to the original author, and the translation is for study only.

Software engineers and technical leads share a common goal: to deliver high-quality products on time. But ambitious requirements, tight deadlines, and unaccounted-for technical debt often derail the priorities that development teams have set. Software quality still matters: without it, a stream of bugs keeps the team busy and makes an already tight schedule even more stressful.

This blog post suggests a model for steadily releasing high-quality software.


I built up this framework while working on a very challenging project. My goal was to make a web application globally accessible, scalable, and reliable. The goal sounds simple, but it took many months to accomplish:

  • Scale from a single server to a cluster of servers
  • Make platform-level changes to the base services across all microservices
  • Always work with different teams to gain insights

All service deployments need to be seamless or the service will go down.

What would Donald Rumsfeld do?

Who's Rumsfeld?

Donald Rumsfeld was an American of German descent who graduated from Princeton University in 1954 and began serving in the U.S. Navy in the same year. He was elected to the U.S. House of Representatives from Illinois in 1962 and was re-elected three times. In 1969, he resigned from the House and joined President Richard Nixon's administration as director of the Office of Economic Opportunity and assistant to the President. He served as U.S. Ambassador to NATO in 1973 and was appointed White House Chief of Staff by President Gerald Ford in 1974. In 1975, he was appointed Secretary of Defense, becoming the youngest Secretary of Defense in U.S. history. In 2001, he became the oldest Secretary of Defense in history.

At the age of 83, former Defense Secretary Donald Rumsfeld said, "Having spent my life in business, war, and politics, I decided to try to develop a mobile game app."

He is not a professional software developer, but the following quote of his is on point:

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.

This is a simplified version of the Johari window from psychology. Applied to software development, it looks like the following table:

                             | Developers know        | Developers don't know
Other developers know        | Everybody knows        | I don't know, you know
Other developers don't know  | I know you don't know  | No one knows

Everybody knows

Requirements, discovered bugs, user needs, and so on are things we all know. However, a feature implemented in code does not necessarily behave as expected. For example, code that has not been tested can fall into the "no one knows" category until you have a clear idea of how it works and have documented it clearly.

It's one thing to think that your code behaves the way you wrote it; it's another to prove that it does. Unit tests, functional tests, and even step-by-step manual tests increase what you know and reduce what you don't.
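As a minimal sketch of what "proving it" can look like, here is a small unit test in Python. The function `apply_discount` and its rounding rule are hypothetical and are not taken from the original article; they only illustrate how a test turns an assumption ("it rounds down") into something the whole team knows.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, rounded down to whole cents.

    Hypothetical example function, used only to have something to test.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_rounds_down_to_whole_cents(self):
        # 999 * 0.9 = 899.1, which is rounded down to 899
        self.assertEqual(apply_discount(999, 10), 899)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)
```

Saved to a file, these tests can be run with `python -m unittest <file>`. Each test also documents one piece of behavior that would otherwise live only in the author's head.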

I know you don't know / I don't know, you know

I’m going to combine these two because they’re related.

  1. I know you don’t know

    This covers what a developer knows but the teammates they work with do not. A good example is when a new API is created to replace old functionality, but another developer keeps using the old API. Another example is changing the behavior of some shared code.

  2. I don’t know, you know

    This covers what a developer does not know but other developers on the team do. For example, a seemingly small update to a core component can force a chain of code rewrites by other teammates. Or there may be some odd quirk that only a few engineers know about.

Clear communication is important! It's even okay to overcommunicate! Email frequently, review designs frequently, and keep in touch with your users. If you're going to make big design changes, then as a developer, team leader, or manager you need to spend a lot of time talking to your users and understanding how they use the system. User interviews help you improve the design and prevent future problems in advance.
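Part of this communication can live in the code itself. For the "new API replaces old API" case above, one common approach is to keep the old function working while it announces its replacement. The sketch below uses Python's standard `warnings` module; the names `fetch_user` and `fetch_user_v2` are hypothetical and only stand in for an old and a new API.

```python
import warnings


def fetch_user_v2(user_id: int) -> dict:
    """New API (hypothetical): returns a dict with explicit keys."""
    return {"id": user_id, "name": f"user-{user_id}"}


def fetch_user(user_id: int) -> tuple:
    """Old API (hypothetical): still works, but tells callers to migrate."""
    warnings.warn(
        "fetch_user() is deprecated; use fetch_user_v2() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this function
    )
    user = fetch_user_v2(user_id)
    return (user["id"], user["name"])
```

Python hides `DeprecationWarning` by default outside of `__main__`, so a team that relies on this would typically run its tests with `python -W error::DeprecationWarning`, which turns every remaining call to the old API into a failure.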

No one knows

This is the most challenging part. No model can prepare for unforeseen events well enough to make it as if they never happened. Unknown unknowns include hacking, data loss, theft, sabotage, software bugs, and so on. Fortunately, we can quantify how well we cope using two metrics: 1. mean time to repair; 2. mean time to detect.

The best thing to do is to keep both times as small as possible. Compare the business impact of data corruption that takes 5 minutes to repair with corruption that takes an hour.
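To make the two metrics concrete, here is a small sketch that computes them from incident records. The `Incident` class and the sample timestamps are made up for illustration, and the sketch assumes that time to repair is measured from detection to the moment the fix is live (some teams measure it from the start of the incident instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime   # when the problem began
    detected: datetime  # when an alarm or a person noticed it
    resolved: datetime  # when the fix was live


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average delay between the start of an incident and its detection."""
    seconds = mean([(i.detected - i.started).total_seconds() for i in incidents])
    return timedelta(seconds=seconds)


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Average delay between detection and resolution (assumed definition)."""
    seconds = mean([(i.resolved - i.detected).total_seconds() for i in incidents])
    return timedelta(seconds=seconds)


# Hypothetical sample data: one quick incident and one slow one.
INCIDENTS = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 10, 35)),
    Incident(datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 15), datetime(2024, 1, 2, 15, 45)),
]

print(mean_time_to_detect(INCIDENTS))  # 0:10:00
print(mean_time_to_repair(INCIDENTS))  # 1:00:00
```

Tracking these two numbers release after release shows whether monitoring and release tooling are actually improving.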

  1. Mean time to detect

    A monitoring system is essential. It should cover basic metrics (memory, CPU, disk read/write rates, and so on), HTTP response status codes (500, 400, and so on), and other signals. In the best case, a problematic release immediately triggers an alarm that is sent to the administrator, which ensures a timely response and a speedy recovery. (You can use Fundebug to monitor production bugs in real time.)

  2. Mean time to repair

    Put a switch on risky features so that if one of them causes serious problems after a release, it can be turned off immediately to limit the damage.

    In addition, you need an agile release process. If it takes you two days to ship a fix, your mean time to repair is at least two days. Optimize the entire release process.
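A feature switch of the kind described above can be very small. The sketch below is one possible shape, not the original author's implementation: the `FeatureSwitches` class, the JSON flag file, and the `new_checkout` flag are all hypothetical. It re-reads the file on every check so that editing the file takes effect without a redeploy; a real system would more likely cache the flags or read them from a configuration service.

```python
import json
from pathlib import Path


class FeatureSwitches:
    """Reads on/off flags from a JSON file each time a flag is checked."""

    def __init__(self, path: Path):
        self._path = path

    def is_enabled(self, name: str) -> bool:
        try:
            flags = json.loads(self._path.read_text())
        except (OSError, ValueError):
            # Fail closed: if the file is missing or malformed,
            # the risky feature stays off.
            return False
        return isinstance(flags, dict) and flags.get(name) is True


def render_checkout(switches: FeatureSwitches) -> str:
    """Hypothetical call site: fall back to the old path when the flag is off."""
    if switches.is_enabled("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"
```

With this shape, turning a misbehaving feature off is a one-line change to the flag file, which keeps the time to repair at minutes rather than the length of a release cycle.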

Anything else?

Before releasing a core update, be sure to ask yourself these questions:

  1. Is there adequate logging and monitoring to make debugging easy?
  2. Is there a production switch for each risky feature? If something goes wrong, how long will it take to turn the feature off?
  3. After a feature is released, if something goes wrong, are there enough metrics to help analyze it?
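For the monitoring and metrics questions in this checklist, the simplest useful signal is often the share of failing HTTP responses. The sketch below is an assumption-laden illustration, not a description of any particular monitoring product: the `ErrorRateAlarm` class, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class ErrorRateAlarm:
    """Fires when the share of 5xx responses among the last `window`
    requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self._recent = deque(maxlen=window)  # True for each 5xx response
        self._threshold = threshold

    def record(self, status_code: int) -> bool:
        """Record one response and return True if the alarm should fire."""
        self._recent.append(status_code >= 500)
        if len(self._recent) < self._recent.maxlen:
            return False  # not enough data yet to judge the rate
        return sum(self._recent) / len(self._recent) > self._threshold
```

In a real deployment, `record` would be called from request middleware, and a `True` result would page whoever is on call instead of just returning a value.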

One release method is recommended here: gray release, that is, rolling a change out to a small share of users first and widening it gradually.
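One common way to pick the users for a gray release is to hash each user into a stable bucket and enable the feature for the first few buckets. The sketch below shows the idea; the function name `in_gray_release`, the 100-bucket scheme, and the feature name are illustrative assumptions, not something the original article specifies.

```python
import hashlib


def in_gray_release(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls into the first `percent` of 100 buckets.

    The bucket depends only on the feature name and the user ID, so a user
    keeps the same answer across requests, and raising `percent` only ever
    adds users to the release.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Hypothetical usage: start at 5%, then widen once the metrics look healthy.
if in_gray_release("user-42", "new_checkout", 5):
    pass  # serve the new code path here
```

Setting `percent` to 0 turns the feature off for everyone, so the same check doubles as an emergency switch.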








