Part I: Introduction to DevOps

  • The Three Ways of DevOps (the three-step approach to work): flow, feedback, and continuous learning and experimentation

A brief history of DevOps

  • DevOps builds on Lean, the Theory of Constraints, the Toyota Production System, resilience engineering, learning organizations, safety culture, human factors, and other bodies of knowledge. It also draws on high-trust management cultures, servant leadership, and organizational change management. Applying the most trusted of these principles together to the IT value stream produces the results associated with DevOps

The Lean movement

  • Two key tenets of Lean are that lead time (the time it takes to turn raw materials into finished products) is one of the best predictors of quality, customer satisfaction, and employee happiness, and that working in small batches is one of the best predictors of short lead times

  • Lean principles focus on creating value for the customer through systems thinking: establishing constancy of purpose, embracing scientific thinking, creating flow and pull (rather than push), assuring quality at the source, leading with humility, and respecting every individual

The Agile Manifesto

  • An important principle of the Agile Manifesto is to “deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale,” which emphasizes small batch sizes and incremental releases

  • Mike Rother, author of Toyota Kata, concluded that most of the Lean community had failed to grasp the improvement kata, which he regards as the core of Lean


Chapter 1 Agile, Continuous Delivery, and the Three Ways


1.1 Manufacturing value stream

  • A fundamental concept in Lean is the value stream

  • It is defined as “the sequence of activities an organization undertakes to deliver on a customer request,” or “the sequence of activities required to design, produce, and deliver a product or service to a customer, including the dual flows of information and material”

1.2 Technical value stream

  • In DevOps, we typically define a technical value stream as “the process required to transform a business idea into a technology-driven service that delivers value to customers.”

  • It is not enough to deliver quickly. Deployments must also happen without chaos or disruption, such as service outages, performance degradation, or information security and compliance failures

Focus on deployment lead time

  • Deployment lead time is a subset of the value stream and the focus of this book. It starts when an engineer (from Development, QA, IT Operations, or Infosec) commits a change to version control, and it ends when the change is running successfully in production, providing value to the customer and generating useful feedback and monitoring data

  • The goal is to adopt a pattern where testing and operations are synchronized with design and development, resulting in a faster value stream and higher quality. This synchronous pattern can only be achieved if the work tasks are in small batches and quality is built into every part of the value stream

  • Define lead time and processing time

  • Lead time and processing time (sometimes called touch time or task time) are two common measures of value stream performance

  • The lead time clock starts when the request is made and stops when the work is completed. The processing time clock starts only when work on the request actually begins, so it excludes the time the work spends waiting in queue (the original illustrates this with a figure)
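The relationship between the two measures can be made concrete with a small calculation. The following is a minimal sketch, not taken from the book: the function name `lead_and_process_time` and the three timestamps (`created`, `started`, `finished`) are hypothetical, and it assumes the work is processed without interruption once it starts.

```python
from datetime import datetime, timedelta


def lead_and_process_time(
    created: datetime, started: datetime, finished: datetime
) -> tuple[timedelta, timedelta, timedelta]:
    """Return (lead_time, process_time, queue_time) for one work item.

    Hypothetical helper: assumes a single uninterrupted processing
    period between `started` and `finished`.
    """
    if not created <= started <= finished:
        raise ValueError("expected created <= started <= finished")
    lead_time = finished - created      # clock starts when the request is made
    process_time = finished - started   # clock starts when work actually begins
    queue_time = lead_time - process_time  # time spent waiting in queue
    return lead_time, process_time, queue_time


# Example: a ticket created Monday 09:00, picked up Wednesday 09:00,
# and finished Wednesday 13:00.
lead, process, queue = lead_and_process_time(
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 3, 9, 0),
    datetime(2024, 1, 3, 13, 0),
)
print("lead time:   ", lead)
print("process time:", process)
print("queue time:  ", queue)
```

In this example most of the lead time is queue time, which is the typical situation the notes describe: the work itself is short, but it waits a long time before anyone touches it.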

  • Common scenario: deployment lead time of several months
  • This is especially common where test and production environments take a long time to obtain and where delivery relies heavily on manual testing and multiple approval processes

  • When deployment lead times are long, heroics are needed at almost every stage of the value stream to make up for it

  • Our goal: minute-level deployment lead times

  • In the DevOps ideal, developers receive fast, continuous feedback on their work, can develop, integrate, and validate code quickly and independently, and can have it deployed to production (by themselves or by others)

  • In order to achieve these goals more easily, architecture design needs to be optimized through modularity, high cohesion, and low coupling

Focus on the rework indicator: %C/A

  • The third key metric in the technical value stream is percent complete and accurate (%C/A). This metric reflects the quality of the output of each step in the value stream
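A minimal sketch of how %C/A might be computed for one step. It assumes the common value-stream-mapping convention that %C/A is the share of work items the downstream step could use as received, without correcting them or sending them back; the function name and the counts are hypothetical.

```python
def percent_complete_and_accurate(items_received: int, items_usable_as_is: int) -> float:
    """Return %C/A for one value stream step.

    Assumption: %C/A is the percentage of items that the downstream
    consumer could use as received, with no rework or clarification.
    """
    if items_received <= 0:
        raise ValueError("items_received must be positive")
    if not 0 <= items_usable_as_is <= items_received:
        raise ValueError("items_usable_as_is must be between 0 and items_received")
    return 100.0 * items_usable_as_is / items_received


# Example: QA received 40 builds from Development and had to send 10 back.
print(percent_complete_and_accurate(items_received=40, items_usable_as_is=30))
```

A low value at one step signals rework: the downstream team spends its time fixing or returning work instead of moving it forward.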

1.3 The Three Ways: the basic principles of DevOps

  • The Phoenix Project presents the Three Ways as the principles from which DevOps behaviors and patterns are derived

  • Step 1 (the First Way): Make work flow quickly from left to right, from Development to Operations. To maximize flow, make work visible and reduce batch sizes and the intervals of work. Speeding up flow through the technical value stream shortens the lead time needed to meet the demands of internal and external customers, especially the time needed to deploy code to production. This improves the quality and throughput of work and makes the organization more competitive
  • Step 2 (the Second Way): Apply fast, continuous feedback from right to left at every stage. This not only creates a safer system of work, but also allows problems to be detected and resolved before they turn into catastrophic failures
  • Step 3 (the Third Way): Build a generative, high-trust culture that supports a dynamic, disciplined, and scientific approach to experimentation. By actively taking risks, the organization learns from its failures as well as its successes

1.4 Summary

  • The value stream concept
  • An important metric: lead time
  • The Three Ways (the principles underpinning DevOps)

Chapter 2 The First Way (step 1): The principles of flow

  • The First Way is about creating a fast, smooth flow of work from Development to Operations in order to deliver value to customers

2.1 Make work visible

  • The content of work in technology is invisible, which is a significant difference from the manufacturing value stream
  • Because work can be handed off with the click of a button, teams can bounce it back and forth on the basis of incomplete information. Problems pass downstream completely unnoticed until the product is late for the customer or the application fails in production
  • To see where work is flowing, queuing, or stalling, make the work as visible as possible. Visual work boards are a good way to do this: on a kanban board or sprint planning board, paper or electronic cards represent the tasks

  • By having all the work in each work center in a queue and visually displayed, it is easier for stakeholders to prioritize the work from a global perspective

2.2 Limit work in process (WIP)

  • Technology workers are easily interrupted because the consequences of an interruption are invisible to everyone. For example, an engineer assigned to several projects at once has to switch back and forth between tasks and pays the cost of re-establishing context, cognitive rules, and goals on every switch

  • When work is managed on a kanban board, multitasking can be limited by setting a WIP limit for each column or work center, that is, an upper limit on the number of cards allowed in that column

  • For example, set the WIP limit for testing to 3. When the test column already holds three cards, no new card may be added until one card is completed or one of the three is moved back to the previous queue (the column to its left)

  • Dominica DeGrandis, one of the leading experts on using kanban in DevOps, points out that “controlling queue length (WIP) is a very powerful management tool because it is one of the most important factors affecting lead times ...”

  • When WIP is limited, you may find that you have nothing to do because you are waiting on someone else. It is tempting to start new work (on the grounds that doing something is better than doing nothing), but it is better to find out what is causing the wait and help resolve it

2.3 Reducing the batch size

  • Another key to smooth, fast flow is to work in small batches
  • An important lesson from Lean is that small batches should be pursued consistently in order to shorten lead times and improve quality. In theory, the smallest batch is single-piece flow, in which each operation processes one unit of product at a time

  • Large-batch releases create sudden, high levels of WIP and massive disruption in all downstream work centers, resulting in poor flow and poor quality
  • In development (or DevOps) processes, the batch size is the unit of work product that moves between stages. For software, the most visible batch is code: each time engineers check in code, they batch up a certain amount of work
  • In the technical value stream, single-piece flow can be achieved through continuous deployment, where each change committed to version control is integrated, tested, and deployed to production

2.4 Reduce the number of handoffs

  • In the technical value stream, if deployment lead times are measured in months, it is usually because it takes hundreds or even thousands of operations to deploy code from a version control system to production

  • Work has to wait in queue whenever it depends on resources shared between different value streams, such as a centralized operations team

  • Even in the best of circumstances, some information or knowledge is inevitably lost with each handoff

  • To reduce these problems, either try to reduce the number of handoffs, automate most of the operations, or restructure the organization so that the team can independently provide value to the customer without having to rely on others

2.5 Continually identify and elevate constraints

  • In any value stream there is always a direction of flow and always a constraint, and any improvement that does not address that constraint is an illusion
  • There are five focusing steps for dealing with the constraint
  • Identify the system's constraint
  • Decide how to exploit the system's constraint
  • Subordinate everything else to the above decision
  • Elevate the system's constraint
  • If the constraint has been broken, go back to the first step, but do not let inertia become the system's constraint
  • In a DevOps transformation that shortens lead times from months or quarters to minutes, the constraint typically moves through the following stages
  • Environment creation: If creating production or test environments always takes weeks or months, on-demand deployment is not possible. The solution is to make environments fully self-service and available on demand, so that teams can create them automatically whenever they need them
  • Code deployment: If deploying code takes weeks or longer (for example, 1,300 manual, error-prone steps involving up to 300 engineers per deployment), on-demand deployment is not possible. The solution is to automate the deployment process as far as possible so that any developer can deploy on demand
  • Test setup and execution: If every code deployment takes two weeks to set up test environments and data sets, and another four weeks to run all the regression tests manually, on-demand deployment is not possible. The solution is to automate the tests so that deployments can be performed safely, and to run them in parallel so that testing keeps pace with code development
  • Tightly coupled architecture: If the architecture is tightly coupled, on-demand deployment is not possible, because every code change requires engineers to get permission from a change review board. The solution is to create a loosely coupled architecture so that developers can make changes safely and autonomously, which increases productivity
  • Once all of these constraints are broken, the next constraint is likely to be Development or the product owners

2.6 Eliminate hardships and waste in the value stream

  • Categories of waste and hardship

  • Partially done work: Any work in the value stream that has not been completed (for example, requirements documents or change orders that have not yet been reviewed) and work sitting in a queue (for example, work orders waiting for QA review or for a server administrator)

  • Extra processes: Additional work performed during delivery that adds no value for the customer, such as documents that are never used by the downstream work center, or reviews and approvals that add no value to the output

  • Extra features: Features built during delivery that neither the organization nor the customer needs (for example, “gold plating”)

  • Task switching: When people are assigned to multiple projects and value streams, they have to switch contexts

  • Waiting: Delays between pieces of work caused by contention for resources

  • Motion: The effort required to move information or materials between work centers. For example, people who need to communicate frequently but are not colocated cannot work closely together. Handoffs also create motion waste, because they require additional communication to resolve ambiguities

  • Defects: Incorrect, missing, or unclear information, materials, or products, which take effort to identify and resolve

  • Nonstandard or manual work: Reliance on nonstandard or manual work from others, such as using servers, test environments, and configurations that cannot be rebuilt automatically and repeatably

  • Heroics: Individuals and teams are put in a position where they must perform unreasonable acts for the organization to achieve its goals (for example, submitting hundreds of work orders overnight for a software release)

  • Our goal is to make these wastes and hardships visible, wherever heroics are needed, and to systematically reduce or eliminate them in order to achieve fast flow


Chapter 3 The Second Way (step 2): The principles of feedback

  • The First Way describes the principles that enable work to flow quickly from left to right in the value stream. The Second Way describes the principles that enable fast, continuous feedback from right to left at every stage

3.1 Work safely in complex systems

  • A defining characteristic of a complex system is that no single person can see the system as a whole and understand how all the parts fit together
  • Another characteristic of complex systems is that doing the same thing twice does not necessarily produce the same result
  • Four conditions make it possible to work more safely in complex systems
  • Complex work is managed so that problems in design and operations are revealed
  • Problems are solved collaboratively, so that new knowledge is built quickly
  • New local knowledge is applied globally throughout the organization
  • Leaders keep developing other people who have these capabilities

3.2 Discover problems in time

  • Establish fast feedback and feedforward loops wherever work is performed. This includes creating automated build, integration, and test processes that detect, as early as possible, code changes that could introduce defects
  • We also need to establish a comprehensive monitoring system to monitor the health of service components in the production environment so that unexpected service conditions can be quickly detected. Monitoring systems can also help us measure if we are off target and radiate the results across the value stream so we can see how our actions affect the rest of the system

3.3 Swarm and solve problems to build new knowledge

  • Detecting a problem is not enough. Once a problem arises, we must also swarm it, mobilizing everyone involved to solve it

  • The goal of swarming is to contain the problem before it spreads, and then to diagnose and treat it so that it cannot recur

  • In doing so, everyone involved gains a deeper understanding of how to manage the system, and the inevitable ignorance of the early stages is turned into knowledge

  • Swarming is necessary for the following reasons:

  • It prevents the problem from being passed downstream, where the cost and effort of fixing it grow exponentially and technical debt accumulates

  • It prevents the work center from starting new work, which might introduce new errors into the system

  • If the problem is not resolved, the work center may hit the same problem again in the next operation (say, 55 seconds later), which makes the repair more expensive

  • Swarming seems to run against conventional management practice, because it deliberately lets a local problem disrupt operations as a whole. However, it is this participation by everyone that makes learning possible

  • The PDCA cycle: Plan, Do, Check, Act

  • Catastrophic accidents can be nipped in the bud only when everyone swarms small problems as early as possible

  • To get fast feedback in the technical value stream, we must create the equivalent of an Andon cord and the swarming response that goes with it

  • When the Andon cord is pulled, we come together to solve the problem and stop starting new work until it is solved. This gives fast feedback to everyone in the value stream (especially the person who caused the failure) and lets us isolate and diagnose the problem quickly, before it grows more complex and cause and effect become obscured

3.4 Guarantee quality at source

  • Examples of ineffective quality control
  • Requiring another team to complete tedious, error-prone, manual tasks that the team needing the work could have automated and run itself
  • Requiring approval from busy people who are far from the actual work, forcing them to make decisions without understanding the work or its potential impact, or simply to rubber-stamp the approval
  • Writing large volumes of documentation of questionable detail that is out of date soon after it is written
  • Pushing large batches of work to the operations team and expert committees for approval and processing, and then waiting for a response
  • Automate as many as possible of the quality checks usually performed by QA and information security staff. Automated tests run on demand, so developers do not have to request or schedule a test run from the test team. This lets developers test their own code quickly and even deploy their changes to production themselves

3.5 Optimize for downstream work centers

  • According to Lean, we must design for two types of customers: the external customer (who most likely pays for the service we deliver) and the internal customer (who receives and processes the work immediately after us). Our most important customer is our next step downstream

3.6 Summary

  • Establishing rapid feedback mechanisms is critical to achieving high quality, reliability, and security in the technology value stream. To do this, identify problems as they occur, work together to solve them and build new knowledge, control quality at source, and constantly optimize for downstream work centers

Chapter 4 The Third Way (step 3): The principles of continuous learning and experimentation

4.1 Establish learning organization and safety culture

  • Three types of corporate culture
  • Pathological: Pathological organizations are characterized by large amounts of fear and threat. For political reasons, people often hide or distort information to protect their own interests. In such organizations, failures and accidents are often covered up
  • Bureaucratic: Bureaucratic organizations are characterized by rigid rules and processes, and each department tends to look after only its own “turf.” In such organizations, failures are processed through a system of judgment, which results in either punishment or leniency
  • Generative: Generative organizations are characterized by actively seeking and sharing information so that the organization can better achieve its mission. In such organizations, responsibility is shared across the whole value stream, and failures lead to reflection and genuine root cause inquiry

  • When accidents and failures occur, focus on how to redesign the system to prevent recurrence rather than on blaming people
  • For example, hold a blameless post-mortem after each incident to reach an objective account of why and how it happened and to agree on the best countermeasures for improving the system

4.2 Institutionalize the improvement of daily work

  • Even more important than daily work is the continuous improvement of daily work

  • Improve daily work by explicitly reserving time for it, including time to pay down technical debt, fix defects, and refactor and improve code and environments. This time can be reserved within each development cycle, or scheduled as kaizen blitzes in which engineers form teams to work on problems of their own choosing

  • In the technical value stream, making the system of work safer also helps us find and address potential risks. For example, we may start by holding blameless post-mortems only for incidents that affect customers, and over time extend them to weaker signals of potential risk

4.3 Transform local discovery into global optimization

  • Once results are achieved locally, they should be shared with others in the organization so that more people can benefit from them. Our goal is to take this tacit knowledge (that is, knowledge that is difficult to convey through documentation or communication) and translate it into explicit knowledge so that others can take this expertise and apply it in practice

  • The U.S. Navy's Naval Reactors program (NR) is well known for its strict adherence to standardized procedures. An incident report must be written for any deviation from procedure or normal operation so that experience accumulates. Procedures and system designs are continually updated on the basis of these lessons, no matter how weak the failure signal or how small the risk

  • The result: when a new crew goes to sea, it benefits immediately from the collective knowledge accumulated over time

  • In the technical value stream, we should build global knowledge through similar mechanisms. For example, turn all incident reports into a searchable knowledge base so that teams can use them to solve similar problems, and create an organization-wide shared source code repository so that everyone can easily use the whole organization's code, libraries, and configurations. These mechanisms turn individual expertise into collective knowledge that serves the rest of the organization

4.4 Inject resilience patterns into daily work

  • In the technical value stream, we can deliberately introduce tension into our systems by continually seeking to reduce deployment lead times, increase test coverage, shorten test execution times, and even re-architect when necessary, in order to improve developer productivity and reliability

4.5 Leaders reinforce a learning culture

  • Leaders can use the following questions to help and coach the person conducting an experiment
  • What was your last step, and what happened?
  • What did you learn from it?
  • What is your current condition?
  • What is your next target condition?
  • What obstacle are you working on now?
  • What is your next step?
  • What outcome do you expect?
  • When can we check?
  • This approach, in which leaders help front-line workers see and solve problems in their daily work, is at the core of the Toyota Production System, learning organizations, the improvement kata, and high-reliability organizations

4.6 Summary

  • The Third Way creates a learning organization: it builds high trust and collaboration across functional boundaries, accepts that “failures will always occur in complex systems,” and encourages people to talk about problems so that a safe system of work can be built

4.7 Part I Summary

  • Part I discussed the Three Ways that underpin a DevOps transformation: the principles of flow, feedback, and continuous learning and experimentation

Books

  • The Toyota Formula: Transforming What We Think about Leadership and Management
  • The Fifth Discipline: The Art and Practice of the Learning Organization
  • Explore! Understanding Exploratory Software Testing
  • Kanban Approach: The Road to Successful Incremental Change in Technology Enterprises
  • Lean Thinking
  • Toyota Routine
  • Lean Enterprise
  • Walking Management