Preface

New technologies are emerging all the time. Over the past decade we have seen many exciting ones: new frameworks, languages, platforms, programming models, and more. They have dramatically improved developers’ working environments and shortened time to market for products and projects. However, as people who have worked on the front lines of the software industry for many years, we have to face the fact that the initial joy of adopting a new technology fades rapidly as a project’s lifecycle grows. No matter how glamorous the initial choice, six months or a year later, as long as the project is still active and the business keeps expanding and demanding more features, the same problems gradually surface: builds are slow, new features are painful to ship, new team members ramp up slowly, documentation goes stale, and so on.

How does architectural corruption occur in long-running projects? Why can’t common object-oriented technologies solve this problem? How do you slow architectural decay?

This article tries to explain all this and to propose corresponding solutions. It assumes the reader has considerable development experience, at least a year on the same project; those responsible for the evolution of architecture and products in their organizations should find inspiration here.

Architecture

Architecture is a word that keeps cropping up in various guises in various contexts. From Wikipedia entries alone we hear about plugin architecture, database-centric architecture, Model-View-Controller (MVC), service-oriented architecture (SOA), the three-tier model, model-driven architecture (MDA), and so on. Strangely enough, the bigger the words, the more painful they tend to be for the actual developer. SOA was a fine idea, but in its day it mainly left developers with the vendors’ illusion of “common data types”; MDA never even got the chance to become the next CASE tool to be joked about.

Before moving on, ask yourself: have these big words actually benefited you in the long run? The more utilitarian question is: as a front-line developer, have you ever had a good experience working on a long-term project?

The evolution of technology and lingering pain

The growth of enterprise applications took off about a decade ago. Since the Microsoft ASP and LAMP (Linux, Apache, MySQL, PHP) era, enterprise applications of every kind have migrated to the browser. After ten years of development, the field has blossomed into several camps. Unlike in the past, a technology today is not just a programming language: each comes with its own programming routines, best practices, methodologies, and community. The current mainstream camps are:

  • Rails

  • Java EE platform. It is worth mentioning that the Java VM has become a hosting platform in its own right, with Scala and JRuby among its most active and eye-catching languages

  • LAMP platform. Linux/MySQL/Apache haven’t changed much, while the PHP community has taken many cues from the Rails community, and many better development frameworks have emerged

  • Microsoft .NET platform

  • Django

There’s no reason not to be excited about these new technologies. They solve many of the problems that existed before they appeared, and their websites boast all kinds of productivity slogans: create a blog app in 15 minutes, a 2-minute quick tutorial, and so on. They are far easier to learn than the “teach yourself in 21 days” technologies of old.

Time to pour some cold water: the problems posed at the beginning of this article haunt every one of these technologies. A high-performing 10-person Ruby on Rails team went from 2-minute to 2-hour build times within six months; one of our previous Microsoft .NET 3.5 (C# 3.0) projects generated 20,000 lines of code and took more than half an hour to build; some of our customers, maintaining a 10-year-old Java code base, struggle to keep their technology stack (Spring, Hibernate, Struts, and so on) current and face the dilemma of keeping 72 projects open in Eclipse just to compile. Because compiling and packaging took too long, they removed most of the unit tests, a huge quality risk.

If you’ve ever worked on a long-term project, you know that this kind of pain never seems to be fundamentally resolved by any framework. The new generation of frameworks addresses most of the obvious problems, but they can do little about the ones you face over the long haul.

Step by step: how does architecture decay?

No matter how colorfully architects of any era describe architecture, a project under development rarely amounts to more than this:



Basic Architecture

Some basic guidelines:

  • To reduce coupling, the system should be layered in an appropriate way. By far the most battle-tested layering is MVC + Service.

  • To provide basic access, some basic, platform-level APIs should be introduced. A framework like Spring fills this role.

  • Use AOP to slice out cross-cutting business-level concerns, such as logging and permissions.

  • You also need databases, continuous integration servers, and corresponding environment-independent build scripts and database migration scripts to keep your project building.
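
The AOP guideline above can be illustrated without any framework. The sketch below uses JDK dynamic proxies, the same mechanism Spring relies on for interface-based AOP, to weave logging around a hypothetical OrderService; all of the names are made up for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class LoggingAspect {
    // Hypothetical business interface.
    public interface OrderService {
        String place(String item);
    }

    // Wrap any OrderService so every call is logged on entry and exit,
    // without touching the business code itself.
    public static OrderService withLogging(OrderService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            System.out.println("enter " + method.getName());
            Object result = method.invoke(target, args);
            System.out.println("exit  " + method.getName());
            return result;
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                handler);
    }
}
```

The business code never mentions logging: the cross-cutting concern lives in one place and is applied uniformly, which is exactly what keeps it from being scattered across the code base as the project grows.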

Phase 1

An architecture that satisfies these guidelines is initially very pleasant. The frameworks described in the previous section all fit it. Development at this stage is fast: the IDE opens quickly, features are finished quickly, the team is small, and communication flows. Everyone is happy: the technology is new, and the architecture is simple, clear, and effective.

Phase 2

The good times did not last long.

Soon your boss (or client, whatever) has a package of ideas to implement. The work proceeds in an orderly fashion: more features are added, and more team members join. New features are developed in the same way as before; the new team members are delighted with the clarity of the structure and follow it scrupulously. It won’t take long, maybe three months or less, before your codebase looks like this:



After normal development

You may soon sense that something is wrong, though it is hard to say exactly what. The common response revolves around refactoring: vertically related code is extracted into new projects, and horizontally shared code is extracted into a project called Common or Base.

No matter what type of refactoring you do, some changes keep creeping in (perhaps just at different speeds). The build inevitably gets longer: what started as a minute or two becomes several minutes, then a dozen. By refactoring the build script and removing unnecessary parts, you get the build back down to a few minutes; satisfied, you move on.

Phase 3

More features, more members. Build times grow again. The IDE slows as more code is loaded, and there is much more communication: no one knows all the code anymore. At some point a conscientious programmer tries to refactor some repetitive logic, finds that too much code is involved, much of it business logic he doesn’t understand, and gives up. As more people give up in the same way, the code base bloats until no one can figure out how the system works.

The system slides slowly into chaos. The process takes far longer than this description suggests, with some back and forth along the way, but from what I have observed, within a year or so, no matter what framework or architecture is applied, it seems inexorably fated.

Common solutions

We have not been sitting on our hands. Capable colleagues around me have tried various solutions as the problems emerged. The common ones are as follows:

Upgrade your work environment

Nothing motivates developers like a computer that keeps up with the times. Upgrade your developers’ machines at least every three years; moving to the best configuration of the day can dramatically increase productivity and morale. Conversely, developing on outdated, slow machines is not only an objective loss of efficiency but also breeds psychological slackness in developers.

The upgraded working environment includes not only the computer but also the workspace. A good, communicative space (and working style) helps problems get noticed and reduced. Cubicle partitions are not suited to development.

Build in phases

In general, the build sequence is: build locally to make sure everything works, then commit and wait for continuous integration. A local build becomes unbearable beyond 5 minutes; most of the time you want that feedback to be as short as possible. At the beginning of a project you tend to run all the steps: compile all the code, run all the tests. As the project cycle lengthens and code accumulates, the build takes longer and longer. When, after several attempts, refactoring the build script can no longer speed things up, a “phased build” becomes the overwhelming choice: through proper splitting and layering, run only specific steps locally at a time, such as particular tests or the parts that need rebuilding; then commit and let the continuous integration server run everything. This lets the developer move on.

Distributed build

Even with fast local builds, teams that adopt phased builds soon find CI server build times increasingly unsatisfactory. Getting feedback half an hour after each commit is unacceptable. A variety of distributed build technologies have appeared. Beyond the capabilities of common CI servers themselves, many teams have built their own, typically distributing code across multiple machines to compile and run tests in parallel. This approach holds up over long periods: when builds get slow, you can significantly reduce build time simply by adjusting the distribution strategy and adding machines to the cluster.

Use JRebel or Spork

Some newer tools can significantly speed up developers’ work. JRebel lets changes to Java code take effect as soon as they are saved, cutting out the modify, save, recompile, redeploy cycle. Spork starts a server that caches the code RSpec tests depend on, so the environment does not have to be reloaded on every run, greatly increasing efficiency.

What exactly is the problem?

Each of the above solutions solves part of the problem well, within a specific window of time. However, a year or two (or more) into the project, teams still end up with longer build times, slower development, confusing code, an opaque architecture, and a hard ramp-up for newcomers. What is the crux of the problem?

People like simplicity. But the claim seems closer to a lie: not many teams manage to keep things simple for long. People prize simplicity precisely because it is hard to achieve, not because they don’t want it. Most people know that software development is not like labor-intensive industries, where more people means more output. As The Mythical Man-Month points out, adding people to a project increases output while also creating chaos. In the short term the team can absorb the mess in various ways; but in the long run, as the team changes (new people join, old hands leave) and memory naturally fades, the code base spirals out of control. The chaos can no longer be digested, yet the project does not stop: new features keep arriving, and the architecture corrodes day by day.

Human understanding has limits; requirements and features do not. There are always more features today than yesterday, and this version always has more than the last. Over long stretches of development it is normal to forget earlier code, and normal to forget certain conventions. It is normal to make small, unintentional errors, and just as normal for them to go unnoticed in a large code base. These small inconsistencies and errors accumulate over time and eventually become uncontrollable.

Few people notice that scale is the root cause of architectural decay. Because cause and effect are discontinuous in time and space, we fail to learn from the process and instead repeat the tragic cycle over and over again.

The solution

The ultimate goal of any solution is to limit the size of the project before chaos sets in, before it exceeds what we can comprehend. That is not easy. Most teams are under considerable delivery pressure, and most business users do not realize that piling unrestrained demands onto a project or product only drives it toward collapse. Look at Lotus Notes to see how convoluted and unwieldy a product can end up. We mainly discuss technical solutions here; on the business side, you must also stay constantly alert to growing demands.

Adopt new technology

This is probably the cheapest and easiest option. New technologies are often created to solve specific problems, and they embody collections of experience. Learning and understanding them can spare a team much of the experience previously required to reach certain technical goals. Like the hero of a wuxia novel who, through some chance encounter, suddenly gains years of another master’s inner power, these new technologies can quickly relieve a team of particular pain points.

There are plenty of examples to prove the point. Before Spring, developers basically had to follow the practices in the J2EE patterns literature to build their systems. A few simple frameworks helped with parts of the process, but overall there was a great deal of manual work in things that seem basic today: database connections, exception management, system layering, and so on. Once Spring arrived, you no longer had to spend much effort to get a well-layered system with most of the infrastructure in place. That helps keep the code base small and eliminates a class of low-level bugs.

Rails is another extreme example. Rails brings not only ease of development but also years of accumulated Linux deployment experience. Database migration, Apache + FastCGI, or Nginx + Passenger: techniques that once seemed arcane become trivial in Rails and can be handled by anyone with a little command-line knowledge.

No single organization can produce all these new technologies. As software practitioners, we therefore need to keep an eye on the technical community. Working behind closed doors only accelerates architectural rot, especially when the open source community already has established solutions for what you are inventing in-house. Behind those seemingly shiny products lie countless failures and hard-won successes.

We had a project that, realizing the requirements might shift toward a key-value document store, took the bold step of implementing a NoSQL-like database inside SQL Server using the XML capabilities of SQL Server 2008. It was a new invention, and its creators were excited at first to finally do something different. But as the project progressed, more and more requirements emerged: migration support, monitoring, administrative tooling, documentation, performance, and so on. It gradually became clear that these capabilities closely mirrored what the popular MongoDB already provided; MongoDB had solved most of the problems. By this point the code base was fairly large, and this part of it baffled many team members; after a year, only about two people knew how it was implemented. Had we adopted MongoDB early on, the team could have avoided most of that work.

It’s worth noting that arrogant developers are often impatient with new technology, or lack the patience to understand a new technology’s capabilities and limitations. Every product targets a particular problem domain, and beyond it a new technology is often not mature enough to cope. Developers need to keep reading, thinking, and experimenting to validate those problem domains. Superficial dabbling is not a good attitude, and it prevents new technology from taking hold within the team.

The adoption of new technology usually happens at particular moments in a project or product’s life: at the beginning, or at a specific pain point. Day to day, developers still need to keep an eye on the code base. Next, refactoring toward physically isolated components offers another answer to an ever-growing code base.

1. Refactor into physically isolated components

The trend is obvious: there is always more demand on the same product. Last year we had 100 features; this year we have 200. Last year there were 100,000 lines of code; this year perhaps 200,000. Last year two gigabytes of memory worked just fine; this year it looks like it must be doubled. Last year there were 15 developers; this year there are 30. Last year the build took 15 to 20 minutes at most; this year it takes an hour, and it is distributed.

Someone will notice the design problems in the code and diligently refactor it. Someone will notice the slow build and keep improving build times. But few notice that the code base itself is getting bigger, which is the root of the problem. Many common strategies are merely organizational: dividing the code base into functional modules (feature A, B, C) or into layers (persistence layer, presentation layer). Yet all these split-out projects still sit in the developer’s workspace; however the projects are organized, developers must open them all to compile and run. I once saw a team that needed 120 projects open in Visual Studio; I have personally had to open 72 projects in Eclipse just to compile.

The solution is to physically isolate these components. Just as teams use libraries such as Spring, Hibernate, ASP.NET MVC, or ActiveRecord without putting their source code into the workspace to compile, they can organize stable units of their own code into corresponding libraries, tag a version, and reference the binary directly.

There are different solutions on different technology platforms. The Java world has long had Maven repositories, which manage different versions of JARs and their history well. .NET, regrettably, has nothing truly mature in this area, though with Maven as a model it would not be too hard for a team to build its own (MSBuild integration might be the difficult part). The Ruby/Rails world has the famous gem/Bundler system: instead of dumping independent modules into rails/lib, organize them as new gems and reference them as dependencies (you will need to host a gem repository within the team).
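
In the Maven case, a module that has been pulled out and published to the team’s internal repository is then consumed as an ordinary versioned binary. A minimal sketch of the consumer side (the group, artifact, and version here are hypothetical placeholders):

```xml
<!-- In the consuming project's pom.xml: reference the extracted module
     as a versioned binary instead of compiling its source in the workspace. -->
<dependency>
    <groupId>com.example.billing</groupId>
    <artifactId>billing-core</artifactId>
    <version>1.4.2</version>
</dependency>
```

Bumping the version number then becomes the module’s release cycle, independent of the applications that use it.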

At the same time, the code base itself needs a major overhaul. The previous structure might have looked as follows (we use SVN as the example because of its explicit trunk/branches/tags layout; the same idea applies to Git or Mercurial):



The original library structure

After improvement, it will look like the following figure:



Improved library structure

Each module has its own code base, its own separate upgrade and release cycle, and even its own documentation.

The scheme looks easy to understand, but in practice it is fraught with difficulties. After a team has been running for a long time, few people keep track of the dependencies between modules. Once you decide to break things apart, analyzing the dependencies among dozens or hundreds of existing projects can be tricky. The simplest heuristic is to check the commit history: if a module has had no commits in, say, the last three months, it is basically ready to be pulled out as a binary dependency.

Many open source products have emerged through exactly this process, Spring among them (see Expert One-on-One J2EE Design and Development, in which Rod Johnson lays out the thinking behind Spring). Once the team starts to think this way and revisits the code base every so often, the core code base stays under control and you end up with a set of well-designed, stable components.

2. Separate modules into separate processes

The solutions above share one core principle: always keep the core code base within the team’s capacity to understand it. Done well, this goes a long way toward preventing architectural corruption driven by code size. But it only addresses isolation at the static level of the system. As more and more modules are isolated, the system needs more and more dependencies to run. At run time these dependencies come in two kinds: one is like Spring, Hibernate, or Apache Commons, the foundation the system runs on, which must be present at run time; the other is relatively independent business functionality, such as cache access or an online store’s payment module.

For the second kind of dependency, go a step further: put it in a separate process. In today’s larger systems, login and logout have typically already been removed from the application, handed to an SSO scheme or simply proxied to another login system. The LiveJournal team came up with memcached when they realized that reading and writing the cache could run in a separate process (rather than inside the application, as with EhCache). In one of our previous projects we found that the payment module was completely self-contained, so we isolated it into a new, always-running system with no user interface that processed payment requests over REST. In another publishing project we found that, at the business level, writing and editing a report was actually separate from publishing it, even though there was class-level reuse; we ended up making report publishing a resident service with which the rest of the system interacts via MQ messages.
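
The MQ-based split in the last example can be sketched in miniature. In this sketch a BlockingQueue stands in for the message broker and a background thread stands in for the resident publishing service; in the real system these would be separate processes connected by an actual MQ, and all names here are hypothetical:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Miniature sketch of the report-publishing split: the main application
// only enqueues a message; a resident consumer does the actual publishing.
// In production the queue would be a real broker and the consumer a
// separate process.
public class ReportPublishing {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> published = new CopyOnWriteArrayList<>();

    // Resident "publishing service" loop, run on its own thread.
    public Thread startConsumer() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String report = queue.take();   // block until a message arrives
                    published.add("published: " + report);
                }
            } catch (InterruptedException e) {
                // interrupted: treat as shutdown signal
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Called from the editing side of the system: fire and forget.
    public void submit(String report) {
        queue.add(report);
    }

    public List<String> published() {
        return published;
    }
}
```

The editing side only enqueues a message and moves on; publishing can be slow, restarted, or scaled independently without the editing side ever noticing.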

This solution should be easy to understand. Unlike solution 1, it is more about thinking of the system in business terms. Since a module runs in a separate process, some code may be duplicated across processes; Spring, for example, routinely exists in two unrelated projects at once, yet many people become squeamish when one of their own business components appears in two processes of the same project. (As an aside: the same cleanliness obsession exists in OSGi environments.) A word of caution: processes are completely separated, physically and at run time. The architecture must be thought of in terms of processes, not simply physical structure.

The shift from a single-process to a multi-process architectural model is not easy either; the architect must consciously reinforce the practice. The popular .NET and Java worlds tend to lump everything into one process, while in the Linux world, Rails and Django strike a better balance by coordinating best-of-breed external processes, memcached being the classic example. Moreover, as multicore machines become the norm, splitting into multiple processes exploits the extra cores easily and substantially, without struggling to exploit them at the programming-language level.

3. Form a highly decoupled platform + applications

Now take a longer view. Imagine we are building a system like Kaixin001, Facebook, or Renren. What they have in common is access to an almost unlimited number of third-party apps, from simple ones like buying and selling friends to all kinds of flashy social games. Miraculously, this requires neither that third-party developers use the same technology platform nor that the host provide unlimited computing power: most of the architecture is under the third-party developers’ control.

This is not hard to implement in enterprise applications. Here is the secret: when a user opens a third-party application through Facebook, Facebook calls the third-party application in the background, sending the current user’s information (and friends’ information) via HTTP POST to the service URL the application registered, then renders the application’s HTML result into the current page. In a sense, the technique is essentially a server-side mashup (see the InfoQ article for details).



Facebook App architecture
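
The flow just described can be sketched with nothing but the JDK: the platform POSTs the user’s information to the app’s registered callback URL and embeds the returned HTML fragment in its own page. The class below pairs that platform-side call with a stub third-party app for demonstration; every name and URL is hypothetical:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Sketch of the server-side mashup: the platform POSTs the current user's
// info to the third-party app's callback URL and embeds the returned HTML
// fragment in its own page.
public class PlatformMashup {

    // Platform side: call the app, then wrap its fragment in the host page.
    public static String renderPage(String appCallbackUrl, String userJson) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(appCallbackUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(userJson.getBytes(StandardCharsets.UTF_8));
        }
        String fragment;
        try (InputStream in = conn.getInputStream();
             Scanner sc = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            fragment = sc.hasNext() ? sc.next() : "";
        }
        return "<html><body><div id=\"app\">" + fragment + "</div></body></html>";
    }

    // Stub third-party app: receives the user info, returns an HTML fragment.
    public static HttpServer startStubApp() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/canvas", exchange -> {
            byte[] body = "<p>hello from the app</p>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The platform and the app share nothing but HTTP and an agreed payload, which is why each side can be built, deployed, and upgraded on a completely different stack.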

The advantage of this architecture is that it is extremely distributed. What appears to be a uniform system is actually composed of several small applications with very low coupling and completely different technical architectures. They do not need to be deployed on the same machine and can be developed, upgraded, and optimized separately. The breakdown of one application does not affect the operation of the entire system; The self-updating of each application has no impact on the overall system.

This is not an ultimate solution; it applies only under certain conditions: the system is very large, consisting of several subsystems; the user interface is basically uniform; and the subsystems are only weakly related. Under those premises, this architecture is worth considering. Abstract out the small amount of genuinely shared, valid information and pass it between systems via HTTP POST; beyond that, each system can be developed and deployed independently, even tailored for application access. Otherwise, millions of lines of code pile up in one system, developers gradually lose control of it, and architectural corrosion becomes only a matter of time.

For example, a bank’s financial system may consist of more than ten subsystems: salary, assets, reporting, and so on, each with relatively independent and complex functions. If the whole system is split along these lines, a single point can be optimized without restarting the entire application, and within each application developers can apply familiar technical solutions to a smaller body of code, reducing the potential for architectural corrosion.

Conclusion

There are no bad architectures; change makes them so

I visit a lot of teams. At the beginning of many projects, they spend a great deal of time choosing the technology stack, the architecture, even the IDE. Like a child choosing a favorite toy, whatever the process, the team happily makes its choice and believes that choice is not wrong. And so it is: real architectural challenges rarely exist at the start of a project. The hard part is that, over time, people forget; new people arrive who must understand old code while building new features; every jump in code volume strains the architecture, making new features harder to introduce, newcomers slower to ramp up, builds longer, and so on. Can this alert teams to seek structural solutions rather than ad hoc ones?

About documentation

Many people say that Agile doesn’t promote documentation. They say documentation is hard to write. They say developers can’t document. So there’s no documentation.

Strangely, that’s not what I see. People who write good programs can write good prose. ThoughtBlogs is overwhelmingly populated by programmers, and many of them write very well.

Documentation in projects is often pitifully sparse, and new people arrive confused. Strangely enough, a newcomer can learn to use a tool like RSpec or JBehave in a day or two from its documentation, yet the team’s own project has none.

Setting aside the fact that the project keeps running and delivering features, I believe a large, unstable code base is the main source of documentation’s rapid decay. If we shrink the code base by following the solutions described above, then standalone modules or applications have the chance to carry more distinct value at a smaller scale. Think of today’s Rails 3 or Spring: they often have over 20 third-party dependencies, yet we don’t find them hard to understand. The most important reason is that, with dependencies isolated, each module has its own documentation to learn from.

The same can be said for enterprise projects.

Create an ecosystem of applications, not a single project

Features are constantly being added to the same product. No surprise there. In light of our earlier analysis, however, we should rethink this common sense. Do we keep building an ever larger, slower, lifeless product, or decompose it organically into a vibrant ecosystem of parts with distinct dependencies? Everyone involved (business users, architects, developers) needs to step back from short-term thinking and focus on creating a sustainable ecosystem of applications.