Understanding how software engineering works at Google gives us a higher-level vantage point from which to judge whether our own projects are robust enough.

The code base

Most of Google’s code lives in a single source code repository that is accessible to all engineers within Google. The notable exceptions are Chrome and Android, which use separate code bases.

As of January 2015, Google’s repository held 86 terabytes of data in 1 billion files, including 9 million source code files containing 2 billion lines of code. It had a history of 35 million commits, with an average of 40,000 commits per working day.

Any Google employee can browse, download, build, and modify any of the code in their own workspace, but a change can only be committed to the repository with the approval of the code’s owners.

Most development takes place at the head of the repository rather than on branches. After a change is made, automated systems run the tests and, typically within minutes, notify the author and the code reviewers if any of them fail.

Each subtree (directory) of the code base can have its own file listing the “code owners” of that subtree, and a change to that subtree must be approved by one of them. Typically, all members of a project team are owners of the subtrees the team works on.
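
To make the ownership model concrete, here is a minimal sketch of looking up the owners of a file. The in-memory `owners_files` mapping and the `find_owners` function are hypothetical, and so is one rule the text does not state: a directory without its own owners file falls back to the nearest ancestor that has one.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for the per-directory owners files:
# maps a directory (relative to the repository root, "." for the root)
# to the set of user ids listed in that directory's owners file.
OwnersFiles = dict[str, set[str]]


def find_owners(file_path: str, owners_files: OwnersFiles) -> set[str]:
    """Return the owners responsible for file_path.

    Assumption: a directory without its own owners file falls back to
    the nearest ancestor directory that has one.
    """
    directory = PurePosixPath(file_path).parent
    # Check the file's own directory first, then each ancestor up to the root.
    for candidate in (directory, *directory.parents):
        owners = owners_files.get(str(candidate))
        if owners is not None:
            return owners
    return set()  # no owners file anywhere on the path
```

With `{".": {"root"}, "search": {"alice", "bob"}, "search/ranking": {"carol"}}`, a file under `search/ranking/` resolves to `{"carol"}`, while a file under `search/index/` falls back to `{"alice", "bob"}`.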

Build system

Google uses a distributed build system called Blaze, which provides standard commands for building and testing any code in the repository. Because every project uses the same build tool, any Google engineer can build and test any software at any time, which makes it easy to work across projects.

Programmers write “BUILD” files that tell Blaze how to build their software. For Go code, BUILD files can be generated automatically.

Each build step must be “hermetic”: it may depend only on its declared inputs. Distributing the build is what enforces correct dependency declarations, because only the declared inputs are sent to the machine that runs the build step. Anything undeclared is simply not there.

Because build steps are hermetic, their results are deterministic. This is what allows the build system to cache build results safely, and it means an engineer can check out an old version, rebuild it, and get exactly the same binary.

Build results, including intermediate results, are cached in the cloud. When another build request needs the same result, the build system reuses the cached copy instead of rebuilding it.
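
Hermeticity, determinism, and caching fit together as follows. The sketch below is a toy model, not Blaze's implementation: a build step receives only its declared inputs, the cache key is a digest of the command plus those inputs, and because the step is deterministic its output can be reused whenever the key matches. `ActionCache`, `action_key`, and `fake_compile` are hypothetical names, and a plain in-process dict stands in for the shared cloud cache.

```python
import hashlib
from typing import Callable

# A build step sees only its declared inputs: a mapping from path to file contents.
Inputs = dict[str, bytes]
Step = Callable[[Inputs], bytes]


def action_key(command: str, inputs: Inputs) -> str:
    """Digest of the command plus every declared input, taken in sorted
    order so the key does not depend on declaration order."""
    h = hashlib.sha256()
    h.update(command.encode())
    for path in sorted(inputs):
        h.update(b"\0" + path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    """Toy, in-process stand-in for a shared cache of build results."""

    def __init__(self) -> None:
        self._results: dict[str, bytes] = {}
        self.executions = 0  # number of steps that ran to completion

    def run(self, command: str, step: Step, workspace: Inputs, declared: list[str]) -> bytes:
        # Hermetic execution: copy only the declared files into the step's
        # sandbox. A step that reads an undeclared file gets a KeyError.
        sandbox = {path: workspace[path] for path in declared}
        key = action_key(command, sandbox)
        if key not in self._results:
            # Caching is safe only because the step is deterministic:
            # the same inputs always produce the same output.
            result = step(sandbox)
            self._results[key] = result
            self.executions += 1
        return self._results[key]


def fake_compile(inputs: Inputs) -> bytes:
    """Hypothetical deterministic 'compiler': output depends only on its input."""
    return b"OBJ(" + inputs["lib.c"] + b")"
```

Editing an undeclared file leaves the key unchanged, so the cached result is reused; editing a declared input produces a new key and forces a rebuild.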

Incremental rebuilds are fast. The build system stays resident in memory, so on a rebuild it only needs to analyze the files that have changed since the last build.
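
One way to picture an incremental rebuild is as a long-lived object that remembers file digests and the dependency graph between builds. The minimal sketch below assumes changes are detected by re-hashing file contents (a real system might rely on file-system notifications instead) and then follows reverse dependency edges to find every target that must be rebuilt. `IncrementalBuilder` and the `//pkg:name` target labels are illustrative, not Blaze APIs.

```python
import hashlib
from collections import deque


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class IncrementalBuilder:
    """Keeps file digests and the dependency graph in memory between builds."""

    def __init__(self, deps: dict[str, list[str]]) -> None:
        # deps maps each target to the files or targets it directly depends on;
        # we store the reverse edges so we can walk from a file to its dependents.
        self.reverse: dict[str, set[str]] = {}
        for target, direct in deps.items():
            for dep in direct:
                self.reverse.setdefault(dep, set()).add(target)
        self.last_digests: dict[str, str] = {}

    def changed_files(self, files: dict[str, bytes]) -> set[str]:
        current = {path: digest(data) for path, data in files.items()}
        changed = {p for p, d in current.items() if self.last_digests.get(p) != d}
        changed |= self.last_digests.keys() - current.keys()  # deleted files
        self.last_digests = current
        return changed

    def targets_to_rebuild(self, files: dict[str, bytes]) -> set[str]:
        # Breadth-first walk over reverse edges: every target that depends on
        # a changed file, directly or transitively, is dirty.
        dirty: set[str] = set()
        queue = deque(self.changed_files(files))
        while queue:
            node = queue.popleft()
            for dependent in self.reverse.get(node, ()):
                if dependent not in dirty:
                    dirty.add(dependent)
                    queue.append(dependent)
        return dirty
```

The first call treats every file as changed; after that, touching a header rebuilds only the library and the binaries that depend on it.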

Presubmit checks. Google has automated tools that run a standard suite of checks when a code review is started and when a change is being prepared for commit to the repository.
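
The text does not list which checks Google actually runs, so the two checks below are hypothetical examples. The sketch shows only the shape of a presubmit runner: every check runs against the change, so the author sees all problems at once, and the change can be submitted only if all checks pass.

```python
from dataclasses import dataclass
from typing import Callable

# A change is modeled as a mapping from file path to its new contents.
Change = dict[str, str]


@dataclass
class CheckResult:
    name: str
    passed: bool
    message: str = ""


def no_do_not_submit(change: Change) -> CheckResult:
    """Hypothetical check: reject files containing a DO NOT SUBMIT marker."""
    bad = sorted(p for p, text in change.items() if "DO NOT SUBMIT" in text)
    return CheckResult("no_do_not_submit", not bad, ", ".join(bad))


def no_trailing_whitespace(change: Change) -> CheckResult:
    """Hypothetical check: reject lines that end in spaces or tabs."""
    bad = sorted(
        p for p, text in change.items()
        if any(line != line.rstrip() for line in text.splitlines())
    )
    return CheckResult("no_trailing_whitespace", not bad, ", ".join(bad))


STANDARD_CHECKS: list[Callable[[Change], CheckResult]] = [
    no_do_not_submit,
    no_trailing_whitespace,
]


def run_presubmit(
    change: Change, checks: list[Callable[[Change], CheckResult]] = STANDARD_CHECKS
) -> list[CheckResult]:
    """Run every check rather than stopping at the first failure."""
    return [check(change) for check in checks]


def can_submit(results: list[CheckResult]) -> bool:
    return all(r.passed for r in results)
```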

Code review

Google has built a web-based code review tool. Authors request reviews through it, and reviewers view the diffs in the browser and leave comments. When an author sends a review request, the tool automatically emails the reviewers a link to the review page.

Every change to the source code must be reviewed by at least one other engineer. If the author is not an owner of the code being changed, one of its owners must also review and approve the change.
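
The two review rules above can be written as a single predicate. This is a minimal sketch with a hypothetical function name. It assumes the owners of the touched code have already been looked up as one set, which simplifies the case of a change spanning several differently owned directories.

```python
def review_requirements_met(author: str, approvers: set[str], owners: set[str]) -> bool:
    """Check the two review rules for a single change.

    author:    who wrote the change
    approvers: who has approved it in the review tool
    owners:    owners of the code the change touches (assumed already looked up)
    """
    # Rule 1: at least one engineer other than the author must review it.
    other_reviewers = approvers - {author}
    if not other_reviewers:
        return False
    # Rule 2: if the author is not an owner, an owner must be among the reviewers.
    if author not in owners and not (other_reviewers & owners):
        return False
    return True
```

An owner still needs a second pair of eyes, while a non-owner needs that second reviewer to be an owner (or an additional owner approval).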

The tool can automatically suggest suitable reviewers, although authors are free to choose their reviewers themselves.

Google encourages engineers to keep each change small, breaking larger work into a series of smaller changes that are easier to review. Changes of 30 to 99 lines are generally considered “medium”, changes of more than 300 lines are marked “large”, and changes of 1,000 to 1,999 lines are marked “huge”.

Testing

Unit testing is mandatory and widely practiced at Google, and integration and regression tests are also common. An automated tool measures test coverage, and the results can be viewed in the code browser.

Load testing before deployment is always expected. Teams should produce tables or graphs showing how key metrics, especially latency and error rate, vary under increasing load.
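
Here is a small sketch of turning raw load-test results into the kind of table described above. The `Request` record, the nearest-rank percentile method, and the choice of p50 and p99 as the reported latencies are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool  # False if the request returned an error


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(results_by_qps: dict[int, list[Request]]) -> list[tuple[int, float, float, float]]:
    """One row per load level: (qps, p50 latency, p99 latency, error rate)."""
    rows = []
    for qps in sorted(results_by_qps):
        reqs = results_by_qps[qps]
        latencies = [r.latency_ms for r in reqs]
        error_rate = sum(not r.ok for r in reqs) / len(reqs)
        rows.append((qps, percentile(latencies, 50), percentile(latencies, 99), error_rate))
    return rows


def format_table(rows: list[tuple[int, float, float, float]]) -> str:
    lines = [f"{'qps':>6} {'p50 ms':>8} {'p99 ms':>8} {'errors':>7}"]
    for qps, p50, p99, err in rows:
        lines.append(f"{qps:>6} {p50:>8.1f} {p99:>8.1f} {err:>7.2%}")
    return "\n".join(lines)
```

Running the same service at several request rates and printing one row per rate makes it easy to spot the load at which tail latency or the error rate starts to climb. The same rows can feed a plotting tool.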

Bug tracking

Google tracks bugs with a tool called Buganizer. Some teams assign bugs directly to individual engineers, while others triage and assign them in regular meetings.

Programming languages

Google has four officially approved programming languages, C++, Java, Python, and Go, and engineers are strongly encouraged to use one of them. Limiting the number of languages increases code reuse and makes collaboration between teams easier.

Each language has a style guide to keep code uniform. There is also a company-wide “readability” training process in which experienced engineers coach other engineers in writing readable, idiomatic code, and code reviews include a separate review for readability.

Interoperation between languages is handled mainly with Protocol Buffers, a data description language developed by Google. Like XML, it serializes structured data, and it is used both for data storage and for communication protocols.
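
Unlike XML, Protocol Buffers serialize to a compact binary format. The sketch below hand-encodes a single integer field to show what that format looks like, following the publicly documented wire encoding: integers are base-128 "varints," and each field is prefixed by a key of `(field_number << 3) | wire_type`. It covers only non-negative integers with wire type 0 and is purely illustrative. Real code uses classes generated by `protoc` and the official runtime libraries.

```python
WIRE_TYPE_VARINT = 0


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, lowest group first,
    high bit set on every byte except the last. Non-negative ints only."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: the key (field number + wire type), then the value."""
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)
```

For example, `encode_varint_field(1, 150)` yields the three bytes `08 96 01`. XML would need a named tag around the text "150" to carry the same value.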

Debugging tools and performance analysis tools

Google’s servers are linked with common libraries that provide tools for debugging running servers. When a server crashes, a stack trace is automatically dumped to a log file. There is also a web interface for debugging, where you can inspect incoming and outgoing RPCs, change command-line flag values, and view resource consumption, profiling data, and more.
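
Google's own servers do this in their shared C++ libraries. As a rough Python analogue, not Google's code, a process can install a hook that writes the full stack trace of any uncaught exception to its log. The logger name `"server"` and the helper names are assumptions.

```python
import logging
import sys
import traceback

log = logging.getLogger("server")


def log_uncaught_exception(exc_type, exc_value, exc_tb) -> None:
    """Write the full stack trace of an uncaught exception to the server log."""
    trace = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
    log.critical("server crashed with an uncaught exception:\n%s", trace)


def install_crash_logging(log_path: str) -> None:
    """Send log output to log_path and log any uncaught exception there."""
    logging.basicConfig(filename=log_path, level=logging.INFO)
    # sys.excepthook is called for exceptions that go uncaught in the main
    # thread (worker threads use threading.excepthook instead).
    sys.excepthook = log_uncaught_exception
```

Calling `install_crash_logging("server.log")` at startup means a crash leaves its traceback in `server.log` instead of only on a terminal that nobody is watching. For hard faults such as segmentation faults, Python's standard `faulthandler` module serves a similar purpose.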

Release engineering

Most of Google’s project teams do not have dedicated release engineers. Release work is done by the team’s regular software engineers.

Most software is released frequently: usually weekly or every two weeks, and for some teams even daily. Releases therefore have to be automated. Frequent releases keep engineers motivated and increase overall velocity, because they allow more iterations, which bring more feedback and more chances to act on it.

Launch approval

Before any change goes live and becomes available to users, it needs approval from a number of people outside the project team. These approvals cover legal compliance, privacy, security, reliability, business requirements, and more.

Google has an internal launch approval tool that tracks the required reviews and approvals. The tool can be customized so that different products follow different review and approval processes.
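
The behavior described above, a per-product checklist of required approvals that must all be granted before launch, can be sketched as follows. `LaunchApprovalTool`, `Launch`, and the default checklist are hypothetical. The default simply reuses the approval areas listed in the text.

```python
from dataclasses import dataclass, field

# Hypothetical default checklist, taken from the approval areas listed above.
DEFAULT_REVIEWS = {"legal", "privacy", "security", "reliability", "business"}


@dataclass
class Launch:
    product: str
    approvals: set[str] = field(default_factory=set)


class LaunchApprovalTool:
    def __init__(self, policies: dict[str, set[str]]) -> None:
        # Per-product customization: which reviews each product's launches need.
        self.policies = policies

    def required(self, product: str) -> set[str]:
        # Products without a custom policy fall back to the default checklist.
        return self.policies.get(product, DEFAULT_REVIEWS)

    def approve(self, launch: Launch, review: str) -> None:
        if review not in self.required(launch.product):
            raise ValueError(f"{review!r} is not a required review for {launch.product}")
        launch.approvals.add(review)

    def pending(self, launch: Launch) -> set[str]:
        return self.required(launch.product) - launch.approvals

    def can_launch(self, launch: Launch) -> bool:
        return not self.pending(launch)
```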

Postmortems

After a major service incident, the people involved write a postmortem (fault summary) document. It describes the incident, including its title, summary, impact, timeline, root cause, failed components, and action items. The postmortem focuses on the problem and how to avoid it in the future, not on the people involved or on punishing those responsible.
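
A postmortem template can be as simple as a record with one field per section listed above. The sketch below is one possible layout, not Google's actual template. Note that the timeline records events in the system rather than the names of the people involved, which fits the focus on the problem instead of on blame.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Fields mirror the sections listed above; the layout is an assumption."""
    title: str
    summary: str
    impact: str
    timeline: list[tuple[str, str]]  # (timestamp, what happened in the system)
    root_cause: str
    failed_components: list[str]
    action_items: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [self.title, "", f"Summary: {self.summary}", f"Impact: {self.impact}", "", "Timeline:"]
        lines += [f"  {ts}  {event}" for ts, event in self.timeline]
        lines += [
            "",
            f"Root cause: {self.root_cause}",
            f"Failed components: {', '.join(self.failed_components)}",
            "",
            "Action items:",
        ]
        lines += [f"  - {item}" for item in self.action_items]
        return "\n".join(lines)
```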

Frequent code rewrites

Google encourages frequent code rewrites: most software is rewritten every few years. Rewrites let teams streamline the product, adopt the latest technology, and remove features that are no longer useful, and they transfer knowledge and ownership to newer team members, which helps keep them motivated.