Decoupling, architecture, and teams
This article discusses the relationship between the organization of code and the organization of people in software development. I discuss why software and teams do not scale easily, what lessons we can learn from biology and the Internet, and how decoupling both software and teams can overcome these scaling problems.
The discussion is based on my 20 years of experience building large software systems, but I also draw on the research data presented in _Accelerate_, by Nicole Forsgren, Jez Humble, and Gene Kim, to support most of the assertions I make here. It is highly recommended reading.
Software and software teams cannot scale
It is a very common story: the first version of a product, perhaps written by one or two people, seems to come together easily. It may offer limited functionality, but it is written quickly and meets the customer’s requirements. Customer communication is great because customers usually talk directly to the developers. Defects are fixed quickly and new features are added easily. After a while, though, things slow down. Version 2.0 takes a little longer than expected, bugs become harder to fix, and new features no longer seem so easy to roll out. The natural response is to add developers to the team, yet each additional person seems to reduce per-person productivity. As the software ages and grows in complexity, the rate at which it can be changed seems to shrink. In extreme cases, organizations find themselves running on software that is very expensive to maintain and seemingly impossible to change. There are negative economies of scale. The problem is that nothing has to “go wrong” for this to happen; it is so common that one could almost call it a “natural” property of software.
Why is that? There are two reasons, one related to the code and one related to the team: neither scales well.
As a code base grows, it becomes more and more difficult for any individual to understand. The limits of human cognition are fixed; a single person can maintain a detailed mental model of a small system, but beyond a certain scale the system becomes larger than the scope of any individual’s cognition. Once a team grows beyond about five people, it is nearly impossible for one person to stay up to date with every part of the system. When no one understands the complete system, fear prevails. In a large, tightly coupled system it is hard to know the impact of any significant change because the consequences are not localized. Instead of factoring out commonality and creating abstractions and generalizations, developers learn to make changes with the smallest possible footprint, duplicating code rather than reusing it. This feeds back into system complexity, further amplifying the negative trends. Developers feel no ownership of code they do not really understand and are reluctant to refactor it. Technical debt grows. It also makes for unpleasant, unsatisfying work and encourages “talent evaporation,” where the best developers, those who can most easily find work elsewhere, move on.
Teams cannot scale either. As a team grows, communication becomes more difficult. A simple combinatorial formula comes into play: C = n(n-1)/2, where n is the number of people and C is the number of communication channels.
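To make the growth concrete, here is a minimal sketch (my own illustration, not from the original article) that tabulates the channel count for a few team sizes:

```python
def communication_channels(n: int) -> int:
    """Number of pairwise communication channels in a team of n people: C = n(n-1)/2."""
    return n * (n - 1) // 2

# Print the coordination burden for increasing team sizes.
for size in (2, 5, 10, 20, 50):
    print(f"{size:>2} people -> {communication_channels(size):>4} channels")
```

A team of five has ten channels to maintain; a team of fifty has 1,225.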
As the team grows, its communication and coordination overhead rises quadratically. Beyond a certain size it is difficult for a team to remain a coherent unit; even without any management intervention, human groups naturally tend to split, and informal subgroups form. Peer-to-peer communication becomes difficult and is naturally replaced by emerging leaders and top-down communication. Team members change from being equal stakeholders in the system to being directed workers. Motivation suffers and the sense of ownership is lost through the diffusion-of-responsibility effect.
Management often intervenes at this stage, formalizing new sub-teams and a management structure to organize them. But whether the groups are formal or informal, large organizations find it hard to keep people motivated and engaged.
It is not fair to blame these scaling problems on unskilled developers or poor management. Scaling problems are a “natural” property of growing and aging software; unless you spot them early, recognize the inflection points, and work to mitigate them, they will appear, and they are hard to undo. New software teams are constantly being created, the amount of software in the world keeps growing, and most software is small, so a successful and growing product is usually built by a team with no prior experience of large-scale software development. It is unrealistic to expect them to recognize the inflection point and know how to respond when problems of scale start to bite.
Learn from nature
I recently read Geoffrey West’s excellent book _Scale_, which discusses the mathematics of scale in biological and socio-economic systems. His thesis is that all large, complex systems obey basic scaling laws. It is a fascinating read and highly recommended. For the purposes of this discussion, I want to focus on his observation that many biological and social systems scale remarkably well. Take the basic mammalian body plan: we share the same cell types, bone structure, nervous and circulatory systems as all other mammals, yet the difference in size between a mouse and a blue whale is roughly 10^7. How does nature use the same basic materials and plans for organisms of such vastly different sizes? The answer seems to be that evolution discovered fractal branching networks. Think of a tree: each small part of the tree looks like a small tree. The same is true of our circulatory and nervous systems, which are fractal branching networks where a small part of your lung or your blood vessels looks like a scaled-down version of the whole.
Can we take these ideas from nature and apply them to software? I think there are important lessons here. If we can build large systems out of smaller parts that themselves look like complete systems, then there is a chance we can avoid the pathologies that affect most software as it grows and ages.
Are there existing software systems that can scale successfully by multiple orders of magnitude? The obvious answer is the Internet, a global software system with millions of nodes. Subnets do look and work like smaller versions of the entire Internet.
The properties of decoupled software
The ability to decouple software components from the larger system is the core technique for successful scaling. The Internet is fundamentally a decoupled software architecture. This means that every node, service, or application on the network has the following properties:
• It obeys shared communication protocols.
• It shares state only through explicit contracts with other nodes.
• Its implementation does not need to be understood by the nodes that communicate with it.
• It is versioned and deployed independently.
The Internet can scale because it is a network of nodes that communicate through a well-defined set of protocols. Nodes share their state only by protocol, and the implementation details of a node do not need to be understood by the nodes it communicates with. The global Internet is not deployed as a single system; each node is individually versioned and deployed. Nodes come and go independently of each other. Internet protocol compliance is the only thing that really matters to the system. Who builds each node, when it is created or removed, how it is versioned, and the particular technologies and platforms it uses are irrelevant to the Internet as a whole. This is what we call decoupled software.
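As a concrete illustration of sharing state only through an explicit, versioned contract, here is a minimal sketch in Python. The message type `OrderPlaced` and its fields are hypothetical names invented for this example; the point is that consumers depend only on the published schema and the agreed protocol, never on the producer’s internals.

```python
import json
from dataclasses import dataclass, asdict

# A hypothetical, explicitly versioned contract that one node publishes.
# Consumers depend only on this schema and the agreed transport protocol;
# they know nothing about the producing node's internal implementation.
@dataclass(frozen=True)
class OrderPlaced:
    schema_version: str  # evolved independently of any consumer's release cycle
    order_id: str
    total_cents: int

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(payload: str) -> "OrderPlaced":
        return OrderPlaced(**json.loads(payload))

# Producer side: serialize state into the shared contract.
message = OrderPlaced(schema_version="1.0", order_id="A-123", total_cents=4999).to_json()

# Consumer side: all it needs is the contract, not the producer's code.
received = OrderPlaced.from_json(message)
print(received.order_id, received.total_cents)
```

Because the contract carries its own version, the producing team can change its internals, or even rewrite the service, without coordinating releases with its consumers.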
The attributes of decoupled teams
We can scale software teams by following similar principles:
• Each sub-team should look like a complete, small software organization.
• The team’s internal processes and communication should not be a concern outside the team.
• How the team implements its software should not matter outside the team.
• The team should communicate with the wider organization only about external concerns: common protocols, the functionality delivered, service levels, and resources.
Small software teams are more efficient than large software teams, so we should break up large teams into smaller teams. The lesson of nature and the Internet is that sub-teams should look like a single, small software organization. How small? Ideally, one to five people.
It is important that each team looks like a small, independent software organization; other ways of carving up teams are less effective. It is often tempting to split large teams by function, so we end up with a team of architects, a team of developers, a team of DBAs, a team of testers, a deployment team, and an operations team, but this solves none of the scaling problems discussed above. Every piece of functionality has to pass through each of these teams, and if you want to avoid waterfall project management you will be working iteratively, so the communication boundaries between these functional teams become a major barrier to effective and timely delivery. These teams are not decoupled, because they need to share significant internal details in order to work together. Their incentives also differ: development teams are typically rewarded for delivering features, test teams for quality, and support teams for stability. These diverging interests lead to conflict and poor delivery. Why should the development team care about logging if they never have to read the logs? Why should the test team care about delivery when they are rewarded for quality?
Instead, we should organize teams around decoupled software services that support a business function or a logical group of functionality. Each sub-team should design, code, test, deploy, and support its own software. The individual team members are more likely to be generalists than specialists, because a small team needs to share these roles. They should focus on automating as much of the process as possible: automated testing, deployment, and monitoring. Teams should choose their own tools and decide for themselves how to build their systems. While the protocols the services use to communicate must be decided at the organizational level, the choice of tools used to implement the services should be delegated to the teams. This fits very well with the DevOps model of software organization.
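As a small sketch of the kind of automation such a team can own end to end, the snippet below runs a contract-level smoke test against the team’s own deployed service. The endpoint URL and the response shape are assumptions made up for this example, not anything prescribed by the article.

```python
import json
import sys
import urllib.request

# Hypothetical health endpoint owned by the team; the URL and the expected
# response shape are assumptions for illustration only.
HEALTH_URL = "https://orders.example.internal/health"

def smoke_test(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the service answers its public health contract."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
            return response.status == 200 and body.get("status") == "ok"
    except (OSError, ValueError):
        # Network failure or malformed response both count as a failed check.
        return False

if __name__ == "__main__":
    # Fail the team's own deployment pipeline if the contract check fails.
    sys.exit(0 if smoke_test() else 1)
```

A check like this can gate the team’s deployment pipeline, so releasing and verifying the service never requires another team’s involvement.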
The degree of autonomy a team enjoys is a reflection of its degree of decoupling from the wider organization. Ideally, the organization should care only about the functionality, and ultimately the business value, that the team provides, and the cost of the resources the team needs in order to provide it.
The role of the software architect is important in this style of organization. Rather than focusing on the specific tools and technologies that teams use, or micromanaging the internal architecture of individual services, architects should focus on the protocols and interactions between services and on the health of the system as a whole.
Inverse Conway: the software organization should model the target architecture
How can we keep decoupled software and decoupled teams aligned? Conway’s law states:
“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
This is the observation that the architecture of a software system will reflect the team structure of the organization that builds it. We can “hack” Conway’s law by reversing it: organize our teams to reflect the architecture we want. With this in mind, we should align our decoupled teams with our decoupled software components. Should this be a one-to-one relationship? I think that is ideal, although a single small team does seem able to provide several decoupled software services; the scaling inflection point for teams appears to be larger than the one for software, so this kind of organization can work. It is important, however, that the software components remain decoupled, each with its own release and deployment story, even when they share a team. If the team grows too large we will want to split it, and being able to hand individual services to different teams is a major benefit. We cannot do that if the services are tightly coupled or share a process, version control repository, or deployment pipeline.
We should avoid having multiple teams work on the same component. This is an anti-pattern, and in some ways it is worse than one oversized team working on an oversized code base, because the communication barriers between the teams lead to an even greater loss of ownership and control.
The communication requirements between decoupled teams building decoupled software are minimized. To use the Internet as an example again, an API provided by another company can often be consumed without any direct communication at all, so long as it is simple and well documented. Communication between teams should never need to touch on internal processes or implementation; it should be about the functionality delivered, service levels, and resources.
An organization of decoupled teams building decoupled software should be easier to manage than the alternatives. The wider organization should focus on giving each team clear goals and requirements in terms of functionality and service levels. Resource requirements should come from the team, and the organization can use them to measure return on investment.
Decoupled teams build decoupled software
Decoupling software and decoupling teams are key to building a high-performing software organization. My anecdotal experience supports this view. I have worked in organizations where teams were segregated by software function or software layer, or even by customer, and I have also worked in large, chaotic teams on a single shared code base. All of these suffered from the scaling problems described above. My happiest experiences have always been when my team was a complete unit, independently building, testing, and deploying decoupled services. But you do not have to rely on my anecdotal evidence; _Accelerate_ (mentioned above) has the survey data to back it up.
Original source:
Mikehadlow.blogspot.com/2018/11/dec…