Original author: Yunpeng, Basic Technology, Baidu App technology team
Preface
Baidu App has grown from a single search tool into a comprehensive content consumption platform driven by two engines, search and the Feed stream, and its complexity is far beyond what it once was. As a super App with more than 100 million daily active users, it has a huge business scale and more than 1,000 technical staff working on it. The client supports the mainstream mobile technologies and involves nearly 100 business teams, with complex technical forms, nearly 300 components and millions of lines of code. The resulting engineering problems are a great challenge for the technical team.
As the project grew, many small problems piled up: non-standard component management, long compile times, merge conflicts in project files, Xcode's incomplete compile isolation by default, and so on. As a result, developers spent a lot of time on the development environment rather than on development. The tools currently popular in the industry offer relatively weak support for large-scale projects; in practice they always impose some constraints, and the ideal state is hard to reach.
EasyBox was created to provide a modern, efficient and elegant R&D toolchain for super Apps.
This article shares some of our practical engineering experience from the perspective of the toolchain.
Overview
EasyBox consists of three main parts: a project assembler (Installer), a multi-repository management tool (MGit) and a binary management tool (LFS). They are responsible, respectively, for building the workspace (component dependency analysis, project generation and composition), managing source repositories, and managing binaries.
The multi-repository management tool clones the source code of the required repositories, and the binary management tool downloads the binary packages. The assembler then generates the corresponding projects from the component description files, composes the layers, and finally generates the Workspace.
Practice
EasyBox came out of a process of steadily deepening our engineering practice. Everyone understands divide and conquer, but engineering is not simply a matter of tearing a project apart and reassembling it. The goal is to give the project a rational structure, so that developers can quickly understand the architecture and get into development rather than spending too much time on the development environment, which in turn improves efficiency in coding, testing and later stages. Along the way we kept optimizing three areas: standardizing component management and use, strengthening engineering capability, and improving compilation speed. Together these became EasyBox's distinctive advantages.
1. Standardize component management and use
Components (source or binary) are described by a BoxSpec, which is the single source of truth for dependencies and API management. Combined with compile isolation, this draws component boundaries clearly. Component version numbers strictly follow the Semantic Versioning specification. Before a new version of a component is released, it must pass a series of checks, such as API-change analysis in continuous integration.
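The release check described above can be pictured with a small sketch. It is not EasyBox's actual implementation: the function names are hypothetical, the public API is modeled simply as a set of signature strings, and the special rules Semantic Versioning applies to 0.x versions are ignored.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def required_bump(old_api, new_api):
    """Return the smallest version bump that the API change calls for.

    old_api / new_api are sets of public signature strings (an assumed,
    simplified model of what an API analysis step would produce).
    """
    if old_api - new_api:      # something was removed or changed: breaking
        return "major"
    if new_api - old_api:      # only additions: backward compatible
        return "minor"
    return "patch"             # no API change at all


def is_release_allowed(old_version, new_version, old_api, new_api):
    """Check that the proposed version number matches the API change."""
    old, new = parse_version(old_version), parse_version(new_version)
    if new <= old:
        return False           # versions must strictly increase
    bump = required_bump(old_api, new_api)
    if bump == "major":
        return new[0] > old[0]
    if bump == "minor":
        return new[:2] > old[:2]   # major or minor must have increased
    return True
```

For example, adding a method while only raising the patch number would be rejected, because an API addition requires at least a minor bump.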
Broken windows theory: if undesirable behavior in an environment is left unchecked, it invites imitation and even escalation.
Binary distribution and interface distribution of components are both built on top of source distribution. This keeps releases secure, makes it easy to trace back to the source, and allows different kinds of binaries to be released from the same node (commit) when needed. Interface distribution is used to swap out components across the product matrix. Release descriptions and binaries are managed by a dedicated file server, and binaries are not allowed into source repositories, to avoid repository bloat. Source and binary management are described in more detail below.
2. Strengthen engineering capability
2.1 Component independence and isolation
Unlike most other tools in the industry, EasyBox takes the approach of splitting each component into a separate project, while also placing the build products of different components in different directories to ensure complete build isolation.
This works around two long-standing Xcode pitfalls:
- OC/C/C++ files in the same project can access each other even if they belong to different Targets with no dependency between them (Swift does not allow this).
- Xcode automatically adds the folder that holds the build products (BUILD_PRODUCTS_DIR) to FRAMEWORK_SEARCH_PATHS, and by default the build products of every project in the same Workspace go into the same folder.
Both pitfalls break compile isolation. Because components may be compiled in a different order on each machine, the same code can build on one developer's computer and fail on another's.
Xcode project files are widely disliked, and merge conflicts in them are a nightmare when many people collaborate, so we replaced hand-maintained project files with a component configuration file. Each component is configured with a BoxSpec file (similar to CocoaPods' PodSpec). The project structure is mapped from the component's actual directory layout, an xcfilelist is generated to maintain the list of resources configured for the component, and the project file is generated from these. The generated project file is not tracked by Git, which avoids merge conflicts.
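As a rough illustration of deriving a file list from the directory layout, the sketch below walks a component directory and writes one path per line. The set of extensions, the function names and the `$(SRCROOT)/` prefix are all assumptions made for the example; the article does not describe what EasyBox actually writes.

```python
import os

# Assumed set of source extensions; a real tool would read this from the spec.
SOURCE_EXTENSIONS = {".h", ".m", ".mm", ".c", ".cpp", ".swift"}


def collect_sources(component_dir):
    """Return sorted source paths relative to component_dir, using '/'."""
    found = []
    for root, dirs, files in os.walk(component_dir):
        dirs.sort()  # make traversal order deterministic
        for name in files:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                rel = os.path.relpath(os.path.join(root, name), component_dir)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def write_filelist(component_dir, output_path):
    """Write one '$(SRCROOT)/<path>' line per source file and return the paths."""
    paths = collect_sources(component_dir)
    with open(output_path, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write("$(SRCROOT)/" + path + "\n")
    return paths
```

Because the list is regenerated from the directory on every run, nothing needs to be merged by hand when two people add files to the same component.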
Compile isolation brings many benefits. Component boundaries have to be clear, and whether hundreds of components compile together is no longer a matter of luck. It limits the scope of each component change, which helps components build in different environments (Apps) and be reused across multiple applications, and it means a component has predictable dependencies when it is exported.
In addition, combined with interface publishing and the protocol-based decoupling done during componentization, an exported component can carry only the interfaces of the components it depends on (not their implementations), which makes it easy to strip out or replace those dependencies.
2.2 Separation and management of multiple warehouses
Centralized code management makes the main repository more and more bloated, and code permissions become hard to manage, which is very painful for a large team. Code can easily be added to a component by others without its owner knowing, and combined with a certain rate of change among components the situation only gets worse.
The principle of increasing entropy: in nature, everything tends toward disorder.
To solve this, we split each component into its own repository to achieve physical isolation, which lets us tighten commit permissions and ensures that every component has an owner. This also plays a vital role in reuse across products and is a prerequisite for building a middle platform (Zhongtai). In practice, to avoid repositories that are too fine-grained, some repositories hold several components, but this does not affect the overall picture.
Splitting into multiple repositories does add complexity, and developers inevitably have to operate on several repositories. Using Git directly is then costly and error-prone, so we designed a multi-repository management tool, MGit, shared by Android and iOS. Unlike Repo, the source repository management tool of the Android system, MGit keeps most of Git's commands and usage, ensures that multi-repository operations are carried out safely, and warns before risky operations. Developers simply use mgit in place of the git command, which preserves most developers' habits while letting them operate on many repositories safely and conveniently. We also adapted Gerrit's topic mechanism to submit changes across repositories as a group, so that they are merged atomically and the automatic packaging mechanism keeps working.
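The idea of forwarding one Git command to many repositories, with a confirmation step before risky ones, might look roughly like the sketch below. This is not MGit's code: the function names and the list of "risky" subcommands are hypothetical, and the only real Git feature relied on is `git -C <path>`, which runs a command inside the given directory.

```python
# Hypothetical list of subcommands treated as risky; the article does not list them.
RISKY_SUBCOMMANDS = {"reset", "clean", "rebase"}


def plan_git_commands(repo_paths, git_args):
    """Build one `git -C <repo> <args...>` command line per repository."""
    return [["git", "-C", path, *git_args] for path in repo_paths]


def needs_confirmation(git_args):
    """Return True when the subcommand is on the assumed risky list."""
    return bool(git_args) and git_args[0] in RISKY_SUBCOMMANDS


def fan_out(repo_paths, git_args, run, confirm):
    """Run the same git arguments in every repository.

    `run` executes one command (for example subprocess.run) and `confirm`
    asks the user whether a risky operation should proceed.
    """
    if needs_confirmation(git_args) and not confirm(git_args):
        return []  # the user declined, so nothing is executed
    return [run(command) for command in plan_git_commands(repo_paths, git_args)]
```

Passing `run` and `confirm` in as parameters keeps the sketch testable without touching real repositories.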
One of the core problems with multiple repositories is keeping component nodes (commits) in sync. When the source of different components sits at mismatched nodes, builds break or features misbehave. This can be avoided with the "same-name branch principle": all repositories under development stay on a branch of the same name, and, together with version-based dependency management for the other components, this ensures that every component node matches. The rule is easy to remember and cheap to enforce in a large team.
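A check for the same-name branch principle can be very small. The sketch below assumes the current branch of each repository has already been collected into a dictionary (for instance with `git rev-parse --abbrev-ref HEAD`); the function name and the fallback of treating the most common branch as the expected one are illustrative choices, not something the article specifies.

```python
from collections import Counter


def find_branch_mismatches(repo_branches, expected=None):
    """Return {repo: branch} for repositories not on the expected branch.

    repo_branches maps a repository name to its current branch name.
    If `expected` is omitted, the most common branch is assumed to be
    the one everyone should be on.
    """
    if not repo_branches:
        return {}
    if expected is None:
        expected = Counter(repo_branches.values()).most_common(1)[0][0]
    return {
        repo: branch
        for repo, branch in repo_branches.items()
        if branch != expected
    }
```

An empty result means every repository under development is on the same branch.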
- Repository nesting: for historical reasons, when we first split out repositories we created each one at the component's original location. This ill-considered initial design caused a lot of trouble, because the .gitignore differs between branches and the file tracking state became confused. We later laid the source repositories out side by side (flat) to avoid the problem.
2.3 Dynamic division and construction of hierarchy
Large projects tend to adopt a layered architecture, and its benefits are clear: the software architecture is easier to understand, conventions are easier to establish, and one-way access between layers helps reduce engineering complexity. Layered design is essentially the open-closed principle in practice. That principle often guides our architecture design, making software easier to extend and limiting the impact of each change. The idea is to divide the software into components and organize them into layers so that lower-layer components are not affected by changes in upper-layer components.
So, unlike other tools, EasyBox supports layering by design. We want EasyBox to act not only as a package manager but also to help architects standardize the whole project. As the team grows, unreasonable dependencies become more of a problem, and many things can no longer be solved by announcing a rule and shouting about it; we prefer hard rules that rule out the obvious problems. Layered design also lets new team members quickly understand the architecture, keeps reminding everyone of the importance of system boundaries, and puts constraints on dependencies.
“Legislation” is far more straightforward than “ethics”.
Constraints prevent many problems. For example, suppose a PM asks for a view's display event to be queried. Someone who lacks a deep understanding of componentization, or who is simply lazy, can easily do the query directly in the UI library, which causes serious problems for later reuse of the component.
(Upper-layer components can access lower-layer components, but not the other way around.)
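The one-way access rule lends itself to an automatic check. In the sketch below, layers are numbered from the bottom (0) upward and a dependency is a violation when it points to a higher layer. The data layout, the function name, and the decision to allow dependencies within the same layer are assumptions for illustration; the article does not say how EasyBox treats same-layer dependencies.

```python
def find_layer_violations(layers, dependencies):
    """Return sorted (component, dependency) pairs that point upward.

    layers:       {component: layer index}, where 0 is the bottom layer.
    dependencies: {component: iterable of components it depends on}.
    Same-layer dependencies are allowed here (an assumption).
    """
    violations = []
    for component, deps in dependencies.items():
        for dep in deps:
            if layers[dep] > layers[component]:
                violations.append((component, dep))
    return sorted(violations)
```

Running such a check when the workspace is assembled turns the layering rule from a convention into a hard constraint.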
In a composite build, components and subcomponents are all linked directly into the final product (App or Dynamic Framework), because merging static libraries into one another is risky: duplicate symbols, for example, produce only a warning and are then stripped automatically.
- If the App must support iOS 8, Apple limits the size of the main executable binary, so the bottom layer can be built as a dynamic library to reduce that size. In that case some C/C++ components need force_load when they are called across layers. In practice, wrap such libraries in OC wherever possible so that upper-layer business code does not use them directly, and use force_load as little as possible to avoid increasing package size.
3. Improve compilation speed
3.1 Component binary
As the business expanded, the amount of code in Baidu App surged and compilation slowed dramatically. Compiling the main business alone (excluding dozens of business teams and heavyweight third-party libraries such as FFmpeg and OpenCV) took nearly 20 minutes on a 13-inch Retina MacBook Pro, so we adopted binarization. A build cluster packages components into binaries and uploads them to the file server. During development, only the components being worked on are kept as source; all others are used as binaries. With binarization, a typical full compile (1 to 3 components in development mode) takes under 2 minutes on the same machine, incremental compiles are noticeably faster, and caching the generated project files greatly reduces the number of full compiles.
The host project is configured with Boxfile, Boxfile.overlay and Boxfile.local, with priority Boxfile.local > Boxfile.overlay > Boxfile. The overlay and local files share the same format. Both are temporary configuration files used during joint development and debugging to switch components between binary and source. The difference is that the overlay file is tracked by Git and used for collaborative development and continuous integration packaging, while the local file is not tracked by Git and is used only for local debugging. When a component is switched from binary back to development mode and the repository branch does not exist, the branch is created from the node of the component's current version (not from master), so that the branch starts from a matching node.
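The priority order Boxfile.local > Boxfile.overlay > Boxfile amounts to a layered merge. The sketch below models each file as a dictionary of per-component settings; the keys `version`, `mode` and `branch` are hypothetical stand-ins, since the article does not show the real Boxfile syntax.

```python
def resolve_components(boxfile, overlay=None, local=None):
    """Merge per-component settings; later layers override earlier ones.

    Each argument maps a component name to a dict of settings.
    Order of precedence (lowest to highest): boxfile, overlay, local.
    """
    resolved = {}
    for layer in (boxfile, overlay or {}, local or {}):
        for name, settings in layer.items():
            merged = dict(resolved.get(name, {}))  # copy, never mutate inputs
            merged.update(settings)
            resolved[name] = merged
    return resolved
```

With this shape, a developer can flip one component to source mode in the local file without touching what the rest of the team sees in the tracked files.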
- Binarization places strong demands on the stability of component interfaces. The interface layer should be as stable as possible; pay particular attention to changes in declarations such as macros and enumerations, which can invalidate the binaries of other components. Interfaces should be extended incrementally wherever possible, with old interfaces marked as deprecated. It is important to have a monitoring mechanism that ensures version numbers are correct. We use a Clang plug-in to monitor API changes and verify components before they are released.
Another big problem with binarization is that it makes debugging harder. Java, JS and other languages have a mature Source Map mechanism to make up for debugging problems after packaging, but this is rare for OC/Swift. Anyone who has used Carthage knows that it lets you step into source code from the project, but only for binaries compiled locally, and the source can only be reached by single-stepping into it. What we wanted was:
Binary packages are compiled and packaged by the cluster. Local development uses these binaries, and the source code is automatically imported according to the project configuration to complete the source mapping. The source does not take part in compilation, yet breakpoint debugging still works.
A breakpoint is a coordinate plus a trigger condition: breakpoint = source file location + line number + trigger condition.
In LLDB, run the breakpoint list command to view breakpoint information.
Use the dwarfdump command to view the debug information in a binary.
- The LLVM documentation describes other ways to map sources: 1. At run time, LLDB's source-map setting can change where source files are looked up, but this is clearly inconvenient to use. 2. For dynamic libraries, source mapping can be configured through a plist file, but in practice dynamic libraries seriously hurt startup time and package size, and each component usually exists as a static library, so we did not adopt this approach.
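Whatever mechanism performs it, source mapping boils down to rewriting the path prefix recorded in the debug information to the location of the source on the developer's machine, which is what LLDB's `target.source-map` setting does at run time. The sketch below shows only that prefix rewrite; the function name and the example paths are hypothetical, and it says nothing about how EasyBox wires the mapping into the project.

```python
def remap_source_path(recorded_path, mappings):
    """Rewrite the first matching path prefix, leaving other paths unchanged.

    recorded_path: path stored in the binary's debug information.
    mappings:      list of (old_prefix, new_prefix) pairs, tried in order.
    A prefix only matches on a whole path component, so '/build/agent'
    does not match '/build/agentX/...'.
    """
    for old_prefix, new_prefix in mappings:
        old = old_prefix.rstrip("/")
        if recorded_path == old or recorded_path.startswith(old + "/"):
            return new_prefix.rstrip("/") + recorded_path[len(old):]
    return recorded_path
```

A breakpoint set on the local file then resolves to the same file-and-line coordinate that the cluster-built binary recorded under its own build path.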
Binarization brought a qualitative leap in compilation speed. Along the way we also optimized a number of details.
3.2 Clang Module Cache (header file lookup cache)
Baidu App currently has nearly 300 components of varying size with very complex dependencies among them. When components call one another, the same headers are imported repeatedly by different files, which makes header lookup in the preprocessing stage time-consuming. As Apple recommends, we build most components as frameworks. For non-framework cases, a static library can also benefit from the Clang Module Cache by generating a module map; see the WWDC 2018 session Behind the Scenes of the Xcode Build Process, which I won't repeat here.
Note that the Clang Module Cache is not cleared by Xcode's Clean, and it has occasionally caused strange bugs. When that happens, delete the ModuleCache.noindex folder under DerivedData. Later Xcode updates have alleviated this issue.
3.3 Resource compilation cache
To reduce package size, most image resources in the Baidu App project use xcassets. When the App is packaged, all xcassets have to be compiled and merged into a single CAR file by actool. Because there are many image resources and actool does no caching, each xcassets compile took nearly a minute on a 13-inch Retina MacBook Pro. Since incremental source compilation is inherently fast, this extra minute has a significant impact on overall build time. Our solution is to keep a backup copy of the resources with rsync, use it to detect whether the xcassets have changed, and re-trigger compilation only when the resources or build conditions change.
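The same "skip the step when nothing changed" idea can be sketched with a content fingerprint instead of an rsync backup. This is a deliberately swapped-in technique for illustration, not the approach described above: the directory is hashed, the hash is compared with the one stored from the previous run, and recompilation is requested only on a mismatch. The function names and the stamp file are hypothetical, and the stamp file is assumed to live outside the asset directory.

```python
import hashlib
import os


def fingerprint_directory(root):
    """Return a SHA-256 hex digest over relative paths and file contents."""
    digest = hashlib.sha256()
    for current, dirs, files in os.walk(root):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            path = os.path.join(current, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()


def needs_recompile(root, stamp_path):
    """Compare the current fingerprint with the stored one and update it.

    Returns True when the assets changed (or no stamp exists yet).
    For simplicity the stamp is written immediately; a real build step
    would write it only after the asset compile succeeds.
    """
    current = fingerprint_directory(root)
    previous = None
    if os.path.exists(stamp_path):
        with open(stamp_path, encoding="utf-8") as handle:
            previous = handle.read().strip()
    if current == previous:
        return False
    with open(stamp_path, "w", encoding="utf-8") as handle:
        handle.write(current)
    return True
```

Hashing every file costs some I/O on each build, so for very large asset catalogs comparing sizes and modification times first may be a reasonable shortcut.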
Concepts
In the process of practicing EasyBox, we gradually established some abstract concepts, which played an important role in our engineering process.
Rule of law first
Over time a project accumulates all kinds of conventions. With a large team and many conventions, announcing them in a group chat no longer works. We should enforce them through tools wherever possible to make sure they are actually followed. Conventions should also be as simple and memorable as possible, with tool support to minimize the cost of compliance.
Restriction Management (compile isolation)
Component boundaries should be clear, and component dependencies and APIs should be under control. This helps components build in different environments (Apps) and be reused across products, and gives developers predictable dependencies when a component is exported. Conversely, when boundaries are fuzzy, many problems surface once a component really has to be exported, and pulling out one piece drags a tangle of others along with it.
Surface problems early
Problems that tools can detect should be exposed as early as possible. If a problem can be caught before the code is merged, it should not wait until continuous integration. The earlier a problem is found, the cheaper it is to fix, which avoids rework during development.
Comparison with the industry
The most widely used toolchain in the industry is CocoaPods, a classic choice for small projects. Its support for large projects, however, is relatively weak: compile isolation and doing away with hand-maintained project files are at odds with each other (the recently released 1.7 beta has begun to support fully isolated independent projects, but it is still at a very early stage), there is no support for layering, and it offers no help with multi-repository or binary management either. Yet these capabilities are almost essential for large-scale engineering.
Technically, EasyBox's design philosophy differed greatly from CocoaPods from the outset, so instead of following the common industry practice of adapting CocoaPods, we implemented a new toolchain directly on top of Xcodeproj.
Future work
EasyBox has kept evolving since its launch, and it still has imperfections. In the future it will seamlessly integrate third-party open source libraries that are already adapted to other package managers. It is currently used by the Baidu App, Xifan, Kanduo and other teams, and will be rolled out to more. We also plan to open-source the whole solution.
Conclusion
In recent years the pursuit of super Apps in China has caused client projects to balloon, and the resulting engineering problems have become especially serious. As the saying goes, to do good work one must first sharpen one's tools. A large team's investment in its development toolchain is often invisible to users, but the benefits to developers are very clear: unified component conventions keep maintenance and handover costs relatively low; complete physical and compile isolation makes component boundaries clear and gives every component an owner; layering makes the architecture clearer and lowers the cost of getting started; and, without adding to developers' debugging cost, compile time was cut to under 2 minutes (roughly a 90% reduction), which also greatly sped up App packaging.
We built this modern, efficient and elegant R&D toolchain from scratch, tearing down and replacing the scattered R&D tools that Baidu App (iOS) used before. The toolchain also plays an important supporting role in reusing components across products and in building the middle platform. Tools, however, cannot fix problems in the code itself. Thanks to the componentization effort (to be covered in a separate article), the elephant that is Baidu App was able to put on this armor smoothly in a short time, and a once-stumbling elephant can now dance with vigor and agility.
EasyBox mainly takes care of configuration and the development environment, acting as a strong hub between the steps before and after it. Later it will connect to testing, release and integration to form a standardized, closed-loop R&D workflow: a one-stop R&D middle platform that combines management, iteration, output, integration and other functions. If you are interested, stay tuned for subsequent articles.
References
- semver.org/
- llvm.org/docs/Source…
- lldb.llvm.org/use/symbols…
- github.com/apple/swift…
- developer.apple.com/videos/play…
- cocoapods.org/
- gerrit.googlesource.com/git-repo/
- gerrit-review.googlesource.com/Documentati…