Client developers know that installation package size is one of an App's basic experience metrics. This article introduces some of the explorations and attempts Douyin has made to optimize the size of its installation package.

This article takes about 8 minutes to read and gives you an overview of package size optimization. It covers:

  • How the App Store's limits on installation packages have evolved, and what an App gains from spending time on iOS package size;

  • How to analyze an installation package;

  • How to measure, offline, the impact of a change on the App Store download size;

  • Some common package size optimization methods;

  • Some coding habits that affect package size.

Part 1. What is the impact of package size growth

To discuss the impact of installation package size on iOS, we first need to know Apple's limits on it. Most apps are released through the App Store; a small number, such as internal tools, are distributed with enterprise certificates. Apps released through the App Store benefit from the package optimizations the App Store provides.

App Store support for package size optimization

For packages released through the App Store, Apple offers a number of optimizations that are not available to packages distributed with enterprise certificates.

After an App's installation package is built and uploaded to App Store Connect, App Store Connect generates variants of it for different devices and system versions. When users download the App from the App Store, they download only the variant for their own device.

The variants differ by the device's processor architecture (arm64, armv7), screen scale (@2x, @3x), and iOS version.

Of course, this also makes it hard to quantify, from a package built offline, the final impact on download size.

App Store and App Store Connect limits on package size

App Store OTA download size limit

To keep users from exceeding their carrier data plans, Apple caps the size of apps that can be downloaded from the App Store over cellular data. This is the OTA download size limit. Its history:

  • In September 2017, the limit was raised from 100MB to 150MB;
  • In late May 2019, Apple relaxed the limit to 200MB;
  • Since the release of iOS 13, users on iOS 13 or later can download apps larger than 200MB over cellular data, subject to a user setting whose default is to ask for permission when the app is over 200MB. Users on versions earlier than iOS 13 still cannot download such apps over cellular data.

Apple __TEXT segment size limit

Apple's limit on the __TEXT segment of the executable is even stricter: an App that exceeds it will not pass App Store review. Simplified: if you still support iOS 8 devices, the __TEXT segment of the main binary is limited to 60MB per architecture (with 1MB counted as 1000KB, not 1024KB). Once iOS 8 support is dropped, the binaries of all architectures in the installation package must together stay under 500MB.
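The 1000-versus-1024 detail matters when a binary is close to the limit. The following is a minimal sketch of an offline check, not an official tool: the function name is hypothetical, and it assumes that 1KB is also counted as 1000 bytes, which the text above does not state. The __TEXT size would come from inspecting a single-architecture slice of the main binary.

```python
# Hypothetical helper: headroom of one architecture slice's __TEXT segment
# against the 60MB limit that applies while iOS 8 is still supported.
# Assumption: 1MB = 1000KB (from the article) and 1KB = 1000 bytes (not stated there).
TEXT_LIMIT_BYTES = 60 * 1000 * 1000


def text_segment_headroom(text_size_bytes):
    """Return the bytes left before the limit; negative means the limit is exceeded."""
    return TEXT_LIMIT_BYTES - text_size_bytes


if __name__ == "__main__":
    made_up_size = 58_500_000  # illustrative value, not a measured one
    print(text_segment_headroom(made_up_size))       # 1500000
    # A segment of "60MB" measured in 1024-based units is already over the limit.
    print(text_segment_headroom(60 * 1024 * 1024) < 0)  # True
```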

Impact of installation package size growth

As long as the App Store download size grows within the OTA download limit, it has little impact on metrics such as user acquisition and retention. Once the OTA download limit is exceeded, the overall metrics are hit significantly. Earlier statistics, from when the limit was 150MB and larger apps could not be downloaded over cellular data at all, showed an impact of about 10% on new users. After iOS 13 the figure should be below 10% because the restriction was relaxed. These numbers are for reference only and cannot be generalized: first-install scenarios differ between types of apps. For example, life service and travel apps see more cellular downloads than audio/video and game apps.

Secondly, for apps that still need to support iOS 8 and below, exceeding the __TEXT segment size limit seriously affects review and the release schedule. It can be worked around, for example by splitting code out into dynamic libraries, but such measures make the overall installation package larger.

In addition to Apple's limits, package size growth usually comes with somewhat slower startup, more code logic, and lower R&D efficiency. Overly complex code also raises the risk that a code change hurts stability and degrades basic experience such as performance. Package size is therefore not an isolated metric but an indirect indicator of an App's health.

Part 2. Composition of the installation package and how to analyze the installation package

Composition of the installation package

When the installation package is built with Archive and then unzipped, it usually has the following structure:

  • Payload
    • TheApp.app
    • OnDemandResources
  • Symbols

The content that affects download and install size is concentrated in the .app. After decompression, the files that account for most of the .app's size are:

  • The main binary (a Mach-O file with the same name as the .app);
  • Frameworks (dynamic libraries introduced by the App itself);
  • PlugIns (App Extensions, which at heart are still standalone executables);
  • xxx.lproj (native localization resources);
  • Various resource bundles.

Installation Package Analysis

Analyzing an installation package shows how the executable files and resources are used and what state the package is currently in, so you can work out where optimization will give the highest ROI. The hard part of package size analysis is measuring how a change to an offline package affects the download size.

Because Apple adjusts the installation package after it is uploaded to App Store Connect, changes in the offline package do not map directly to changes in the real download size. These adjustments include:

  • App Slicing, which splits the binary by architecture;
  • Assets.car images keep only the size variants and compression algorithm that the device needs;
  • FairPlay encryption of the binary's __TEXT segment, which leaves it with a compression ratio of about 1 (Apple removed this encryption from the download variants for iOS 13+ devices).

So although removing images from Assets.car may reduce the package by 10MB in an offline evaluation, the effect on the final download size may be much less than 10MB. Adding 2MB of code, on the other hand, really does increase the final download size by about 2MB.

Executable file analysis

Executable files take up a large part of the installation package. Their size depends not only on the code but also on the compile and link parameters used during the build, the build machine's environment, the Xcode version, and so on. Executable files are usually analyzed through the LinkMap.

The LinkMap contains the executable's architecture information, its section table, all linked object files, and the size of each symbol in each file. In short, the LinkMap tells you what is in the executable, and this data helps target optimizations.
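As an illustration of the kind of analysis a LinkMap allows, here is a minimal sketch that sums symbol sizes per object file. The sample text is a made-up excerpt that follows the usual layout of a LinkMap file, and the function name is hypothetical; a real tool would read the file the linker writes and would likely group results by library or component.

```python
import re
from collections import defaultdict

# A tiny, made-up excerpt laid out like a LinkMap file.
SAMPLE_LINKMAP = """\
# Path: /tmp/Demo.app/Demo
# Arch: arm64
# Object files:
[  0] linker synthesized
[  1] /tmp/build/AppDelegate.o
[  2] /tmp/build/libNetwork.a(Request.o)
# Sections:
# Address\tSize    \tSegment\tSection
0x100004000\t0x00000030\t__TEXT\t__text
# Symbols:
# Address\tSize    \tFile  Name
0x100004000\t0x00000010\t[  1] -[AppDelegate init]
0x100004010\t0x00000008\t[  2] _send_request
0x100004018\t0x00000018\t[  2] _build_request
"""

OBJECT_RE = re.compile(r"^\[\s*(\d+)\]\s+(.+)$")
SYMBOL_RE = re.compile(r"^0x[0-9A-Fa-f]+\s+0x([0-9A-Fa-f]+)\s+\[\s*(\d+)\]\s+(.+)$")


def size_per_object_file(linkmap_text):
    """Sum the sizes (in bytes) of live symbols for each object file."""
    names = {}
    sizes = defaultdict(int)
    section = None
    for line in linkmap_text.splitlines():
        if line.startswith("# Object files:"):
            section = "objects"
        elif line.startswith("# Sections:"):
            section = "sections"
        elif line.startswith("# Symbols:"):
            section = "symbols"
        elif line.startswith("# Dead Stripped Symbols:"):
            section = "dead"
        elif line.startswith("#"):
            continue  # other header or column-title lines
        elif section == "objects":
            match = OBJECT_RE.match(line)
            if match:
                names[int(match.group(1))] = match.group(2)
        elif section == "symbols":
            match = SYMBOL_RE.match(line)
            if match:
                index = int(match.group(2))
                sizes[names.get(index, f"[{index}]")] += int(match.group(1), 16)
    return dict(sizes)


if __name__ == "__main__":
    for path, size in sorted(size_per_object_file(SAMPLE_LINKMAP).items()):
        print(size, path)
```

Sorting the result by size is usually the first step in deciding which library or module to look at.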

Part 3. Common package size optimization methods

Optimization of the executable part

Rename some sections to bypass FairPlay encryption of the __TEXT segment:

Although Apple removed encryption of the executable's __TEXT segment on iOS 13+, exceeding the OTA download limit is mainly a problem on devices running earlier iOS versions. So you can move parts of the executable from the __TEXT segment into another segment to bypass encryption and improve the compression ratio. In addition, decrypting __TEXT pages as they are loaded during startup is relatively costly, so moving sections also improves startup time. Sections that are currently known to be safe to move include:

__TEXT,__cstring
__TEXT,__const
__TEXT,__gcc_except_tab
__TEXT,__objc_methname
__TEXT,__objc_classname
__TEXT,__objc_methtype

You can add the following parameters to OTHER_LDFLAGS to move:

$(inherited)
-Wl,-rename_section,__TEXT,__cstring,__RODATA,__cstring
-Wl,-rename_section,__TEXT,__const,__RODATA,__const
-Wl,-rename_section,__TEXT,__gcc_except_tab,__RODATA,__gcc_except_tab
-Wl,-rename_section,__TEXT,__objc_methname,__RODATA,__objc_methname
-Wl,-rename_section,__TEXT,__objc_classname,__RODATA,__objc_classname
-Wl,-rename_section,__TEXT,__objc_methtype,__RODATA,__objc_methtype
-w
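Because every flag follows the same pattern, the list can be generated rather than typed by hand. This is only a small sketch with a hypothetical helper name; the section list and the __RODATA target segment are taken from the flags above.

```python
# Sections listed above; the destination segment __RODATA matches the flags above.
SECTIONS = [
    "__cstring",
    "__const",
    "__gcc_except_tab",
    "__objc_methname",
    "__objc_classname",
    "__objc_methtype",
]


def rename_section_flags(sections, src_segment="__TEXT", dst_segment="__RODATA"):
    """Build one -Wl,-rename_section flag per section, keeping the section name."""
    return [
        f"-Wl,-rename_section,{src_segment},{name},{dst_segment},{name}"
        for name in sections
    ]


if __name__ == "__main__":
    # Value to paste into OTHER_LDFLAGS.
    print(" ".join(["$(inherited)"] + rename_section_flags(SECTIONS)))
```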

Symbol table stripping: the App's main binary does not need to expose symbol information externally, and exposed symbol names also pose some risk to the App's overall security. You typically strip all symbols by setting STRIP_STYLE=all. However, inspecting the export trie with:

objdump -exports-trie /path/to/MyApp.app/MyApp

can show that symbols are still exported. Setting EXPORTED_SYMBOLS_FILE=/path/to/emptyfile.txt, so that the list of exported symbols is an empty file, removes them.

For dynamic libraries that you build yourself, you only need to keep the undefined symbols and the global symbols; everything else can be stripped with STRIP_STYLE=non-global.

Multiple copies of the same code in multiple executables: sometimes a large library is pulled in just to implement a simple function. The library is statically linked into the executable, so if several executables use it, there will be several copies of it.

For example, on iOS an Extension is a standalone executable. When an Extension wants to send a network request, it has traditionally depended directly on the business layer's network framework, which means that framework ends up in multiple binaries.

Dead code stripping: after the build completes, code in static languages such as C and C++, and some constant definitions, that is never used is marked as dead code. With DEAD_CODE_STRIPPING=YES, this dead code is not packaged into the installation package. In the LinkMap, these symbols are marked as <<dead>>.

Compile-time optimization parameters: GCC_OPTIMIZATION_LEVEL defines the optimization level clang compiles with. By default, Xcode uses -O0 for the Debug configuration and -Os for Release.

Since Xcode 11, Xcode has offered a more aggressive size optimization level, -Oz. It reduces code size by recognizing identical code sequences across functions within a single compilation unit. Each such sequence is wrapped in a compiler-generated ("outlined") function, and every original occurrence is replaced with a call to that outlined function. This reduces duplicate copies of the same code, but it also deepens the call stack of function calls, which affects performance. The performance impact is slight; for details, see the following figure. Douyin's binary shrank by about 6MB after -Oz was enabled.
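To make the idea of outlining concrete, here is a toy model in Python. It is not how the compiler implements it: instructions are plain strings, only one fixed-length sequence is outlined, and details a real outliner must handle (such as preserving the link register) are ignored. All names and the sample instructions are made up for illustration.

```python
from collections import Counter

# Three made-up functions that share the same three-instruction sequence.
FUNCTIONS = {
    "f": ["ldr x0, [x19]", "add x0, x0, #1", "str x0, [x19]", "ret"],
    "g": ["mov x19, x1", "ldr x0, [x19]", "add x0, x0, #1", "str x0, [x19]", "ret"],
    "h": ["ldr x0, [x19]", "add x0, x0, #1", "str x0, [x19]", "mov x0, #0", "ret"],
}


def outline_once(functions, length):
    """Replace the most frequent repeated run of `length` instructions with a call."""
    windows = Counter(
        tuple(body[i:i + length])
        for body in functions.values()
        for i in range(len(body) - length + 1)
    )
    if not windows:
        return dict(functions)
    sequence, count = windows.most_common(1)[0]
    if count < 2:
        return dict(functions)  # nothing is repeated, so nothing to outline

    result = {}
    for name, body in functions.items():
        new_body, i = [], 0
        while i < len(body):
            if tuple(body[i:i + length]) == sequence:
                new_body.append("bl OUTLINED_FUNCTION_0")
                i += length
            else:
                new_body.append(body[i])
                i += 1
        result[name] = new_body
    # The shared sequence now exists once, in its own function.
    result["OUTLINED_FUNCTION_0"] = list(sequence) + ["ret"]
    return result


def instruction_count(functions):
    return sum(len(body) for body in functions.values())


if __name__ == "__main__":
    outlined = outline_once(FUNCTIONS, 3)
    print(instruction_count(FUNCTIONS), "->", instruction_count(outlined))  # 14 -> 12
```

The saving grows with the number of occurrences, while each occurrence pays for one extra call, which is the size-versus-performance trade-off described above.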

Note that in ARC code, once the call to objc_retainAutoreleasedReturnValue is outlined, objects that did not need to go into the autorelease pool end up being placed in it. For code that already has problems, this can make things worse, for example BAD_ACCESS crashes caused by delayed release, or memory growth because objects wrapped in @autoreleasepool are released late. So this needs to be tested when -Oz is enabled.

In the comparison, -Oz is enabled on the left and -Os on the right:

As you can see, the following code is moved into an outlined function:

mov        x29, x29
bl         imp___stubs__objc_retainAutoreleasedReturnValue

At the call site, this becomes a call to the outlined function:

bl         _OUTLINED_FUNCTION_0

mov x29, x29 does nothing in itself; it serves as a marker in the Objective-C runtime's optimization of autoreleased objects. When the compiler sees that an autoreleased return value is passed to objc_retainAutoreleasedReturnValue, it inserts this marker at the call site.

The usual code for creating an object and returning it is:

return objc_autoreleaseReturnValue(ret);

When objc_autoreleaseReturnValue runs, it checks for this marker at the return address. If the marker is present, the caller will retain the returned object immediately, so there is no need to put it into the autorelease pool: the runtime simply sets a flag in thread-local storage and hands the object back with its reference count still at +1. objc_retainAutoreleasedReturnValue sees the flag and skips the extra retain, so one autorelease and one retain are optimized away together. This is only an optimization detail, though; semantically the object should still be treated as autoreleased. With -Oz enabled, the marker is moved into an outlined function, the optimization no longer applies, and the object is placed in the autorelease pool and released late, which causes some memory problems.
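The interaction between the marker and outlining can be modeled in a few lines. This is a deliberately simplified toy, not the runtime's code: the caller is a list of instruction strings, the "return address" is an index into that list, and the function and variable names are hypothetical.

```python
MARKER = "mov x29, x29"

# Caller compiled with -Os: the marker sits right after the call.
OS_CALLER = [
    "bl _makeObject",
    "mov x29, x29",
    "bl _objc_retainAutoreleasedReturnValue",
]

# Caller compiled with -Oz: marker and retain call were moved into an outlined function.
OZ_CALLER = [
    "bl _makeObject",
    "bl _OUTLINED_FUNCTION_0",
]


def autorelease_return_value(caller_code, return_index, pool):
    """Toy model of the check done when an autoreleased value is returned."""
    if caller_code[return_index] == MARKER:
        # Caller will retain immediately: skip the autorelease/retain pair.
        return "handed over at +1"
    # Marker not found at the return address: fall back to the pool.
    pool.append("obj")
    return "autoreleased"


if __name__ == "__main__":
    pool = []
    print(autorelease_return_value(OS_CALLER, 1, pool), pool)
    print(autorelease_return_value(OZ_CALLER, 1, pool), pool)
```

In the second call the instruction at the return address is the call to the outlined function, so the object lands in the pool even though the caller would have retained it right away.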

Link-time optimization parameters: LLVM provides link-time optimization (LTO), which is controlled by the Link-Time Optimization setting in the project's build settings.

The following options are available:

No: link-time optimization is disabled;

Monolithic: generates a single LTO file that is regenerated on every link, with no caching and high memory consumption; the parameter is LLVM_LTO=YES;

Incremental: generates multiple LTO files incrementally, with low memory consumption; the parameter is LLVM_LTO=YES_THIN.

LTO is not recommended for local debugging or for time-sensitive build pipelines, because it adds a lot of build time.

Note that although LTO is a link-time optimization, the compile step still has to take part. A .a built with LTO enabled essentially contains LLVM bitcode, whereas a .a built without LTO contains machine code directly, and linking it as-is cannot be optimized by LTO.

With LTO enabled, code duplicated across compilation units is merged at link time, and the linker generates separate object files with the .lto.o suffix. In particular, there are significant gains for Objective-C runtime structures such as the string literals of method signatures and Protocol structures. Enabling -Oz and LTO together leaves only one copy of each outlined function in the whole binary, which maximizes the reduction in installation package size.

Enabling this option is recommended if your project uses Protocols heavily. For Douyin, -Oz plus LTO reduced the binary size by 18MB.

Resource file section optimization:

Removal of unused resources: scan the string constants in the code and the names of the bundled resources. Their set difference gives the resources that are unused, which can then be dealt with.
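The set difference can be sketched as follows. This is a minimal illustration with hypothetical names and made-up sample data: it treats every string literal in the sources as a potential resource name and ignores scale suffixes and file extensions. Resource names assembled at runtime (for example with string formatting) would be missed, so results need a manual check before anything is deleted.

```python
import os
import re

STRING_LITERAL_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
SCALE_SUFFIX_RE = re.compile(r"@[23]x$")


def base_name(resource_file):
    """Normalize a file name, e.g. icon_home@2x.png -> icon_home."""
    stem, _extension = os.path.splitext(resource_file)
    return SCALE_SUFFIX_RE.sub("", stem)


def unused_resources(resource_files, source_texts):
    """Return resource files whose normalized name never appears as a string literal."""
    referenced = set()
    for text in source_texts:
        for literal in STRING_LITERAL_RE.findall(text):
            referenced.add(base_name(literal))
    return sorted(f for f in resource_files if base_name(f) not in referenced)


if __name__ == "__main__":
    resources = ["icon_home@2x.png", "icon_home@3x.png", "bg_old.png", "sound.caf"]
    sources = [
        '[UIImage imageNamed:@"icon_home"];',
        'NSString *path = @"sound.caf";',
    ]
    print(unused_resources(resources, sources))  # ['bg_old.png']
```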

Dynamic resources: You can put some low-frequency resources in the cloud, and then go to the cloud to obtain them on demand after app installation.

ODR solution: Apple provides On-Demand Resources (ODR) to help reduce the size of the initial download. Try this solution when something must ship with the installation package for review reasons but can be delivered after installation. Of course, resources in ODR must also meet the App Store review standards, otherwise there is a risk of rejection.

Resource compression: when a resource really has to be built into the installation package, it should be as small as is acceptable. For example, built-in video and audio resources can be compressed by lowering their resolution, bit rate, and so on. The native iOS localization scheme takes up a lot of space, so you can consider developing a more compact one.

Optimization of images in Assets.car: when Assets.car is compiled, some images are sometimes selected and stitched together into one large image (ZZZZPackedAsset) to make image loading more efficient. A small image packed into the large one becomes a reference by offset.

It is recommended to put frequently used, small images in Assets.car, which ensures optimal loading and rendering speed. Large images, such as background images that are thousands of pixels wide and high, greatly increase the installation package size when placed in Assets.car. It is recommended to use WebP for background images and other PNG images larger than 100KB. Compared with PNG, WebP has a better image compression algorithm and produces smaller files with no visible difference in image quality.

During the build, Xcode reprocesses images with its own compression algorithms, which is why losslessly compressing images beforehand does not reduce package size. For images in Assets.car without translucency, 70% lossy compression is a good way to keep the image sharp while making it smaller. For images with translucency, 70% lossy compression introduces noise in the translucent areas, so it is best to have a designer confirm the compressed image.

We reverse-engineered Assets.car and found that Xcode uses different compression algorithms for the same image depending on the device and iOS version, which is also why images have different sizes on different devices at download time.

The compression algorithms Xcode currently uses are lzfse, palette_img, deepmap2, deepmap_lzfse, and zip.

Take the iPhone X as an example:

iOS 11.x: the compression algorithms are lzfse and zip.

iOS 12.0.x to iOS 12.4.x: the compression algorithms are deepmap_lzfse and palette_img.

iOS 13.x: the compression algorithm is deepmap2.

In terms of compression ratio: lzfse < palette_img ~= deepmap_lzfse < deepmap2.

If ASSETCATALOG_COMPILER_OPTIMIZATION=space is set in Build Settings, images that would be compressed with lzfse on lower iOS versions are compressed with zip instead, which reduces the size of images on devices running iOS 11.x and below. The compression algorithms for other iOS versions are not affected by this setting.
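When estimating download size per device, these observations can be encoded as a small lookup. The sketch below only restates the iPhone X findings listed above; the function name is hypothetical, and iOS versions that the list does not cover are rejected rather than guessed.

```python
def asset_compression(ios_major, optimize_for_space=False):
    """Compression algorithms observed for iPhone X variants, per the list above.

    optimize_for_space models ASSETCATALOG_COMPILER_OPTIMIZATION=space.
    """
    if ios_major == 11:
        # With the space setting, lzfse images are compressed with zip instead.
        return ["zip"] if optimize_for_space else ["lzfse", "zip"]
    if ios_major == 12:
        return ["deepmap_lzfse", "palette_img"]
    if ios_major == 13:
        return ["deepmap2"]
    raise ValueError(f"iOS {ios_major} is not covered by the observations above")


if __name__ == "__main__":
    for major in (11, 12, 13):
        print(major, asset_compression(major), asset_compression(major, True))
```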

Cleanup of useless code:

Unused code can generally be screened in two ways: static and dynamic. The static approach scans the source, hooks into the build process, or analyzes the final product to determine which code is not used. The dynamic approach relies on instrumentation or runtime information to find out which code is never executed. Because Objective-C is so dynamic, the dynamic approach is much more accurate than the static one when the sample size is large enough.

Static screening:

An easy way to find unused Objective-C classes is to dump the final product with otool and take the difference between the __objc_classlist and __objc_classrefs sections.

For code written in static languages such as C or C++, the basic logic is fixed at compile time, so the compiler marks unused code as dead code and keeps it out of the installation package. Objective-C, however, is a typical dynamic language: much of its logic is decided at runtime, so static scanning has a relatively large error. In Douyin's first static screening, only 24% of the classes reported as unused really were unused (64 hits out of 264 samples).
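Once the class names have been extracted from the two sections, the screening itself is a set difference. The sketch below assumes that extraction has already been done (parsing the otool dump is not shown) and uses made-up class names; the second function just reproduces the accuracy figure quoted above.

```python
def unused_class_candidates(class_list, class_refs):
    """Classes that are defined (classlist) but never referenced (classrefs)."""
    return sorted(set(class_list) - set(class_refs))


def precision(confirmed_unused, candidates):
    """Share of reported candidates that turned out to be really unused."""
    return confirmed_unused / candidates


if __name__ == "__main__":
    # Hypothetical names standing in for the contents of the two sections.
    defined = ["FeedCell", "LegacyBanner", "ShareHelper", "BaseViewController"]
    referenced = ["FeedCell", "ShareHelper"]
    print(unused_class_candidates(defined, referenced))
    print(f"{precision(64, 264):.0%}")  # 24%
```

Every candidate still has to be checked against the cases listed next before it is deleted.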

The main problems introduced by Objective-C's dynamic features include:

  • Classes that are actually used but are reported as unused:

    • A class that really is not referenced elsewhere, but relies on its own +load, +initialize, or __attribute__((constructor)) logic being called at startup
    • A class that is invoked dynamically through strings
    • Abstract base classes, base classes, and the like, which are considered unused
    • A class that is referenced only by code generated dynamically at runtime
    • A class dedicated to handling notifications
    • Model classes such as MTLModel, whose values are assigned through the runtime messaging mechanism and therefore are not counted in classrefs
    • Typical DI scenarios, where a class declares conformance to a Protocol and callers invoke its methods only through the Protocol
  • Classes that are actually unused but are reported as used:

    • An object is referenced by another object, but that other object is itself unused. In this case the class of the referenced object is missed

Dynamic screening:

Line-level code coverage based on instrumentation:

An instrumentation scheme based on gcov or LLVM profiling can collect coverage data at run time to guide the removal of unused code. Its limitations are also obvious, however: instrumentation degrades the size and performance of the binary itself, and a build instrumented this way cannot pass review and go live, so data can only be collected offline.

A lightweight, runtime-based "class coverage" solution:

When +initialize is invoked on an Objective-C class for the first time, the runtime marks the class as initialized. The state is stored in bit 29 of the flags field of the metaclass's data and can be read with flags & RW_INITIALIZED. The way to read this value changed in iOS 14; see the WWDC session Advancements in the Objective-C Runtime.

#define RW_INITIALIZED (1<<29)
bool isInitialized() {
   return getMeta()->data()->flags & RW_INITIALIZED;
}

The reported data gives a true picture of how classes are used online. It can be used not only to delete unused code but also to identify low-usage scenarios. If a scenario is low-frequency but necessary, consider implementing it with cross-platform technology, which has less impact on native package size. If it belongs to a feature with low penetration, consider retiring the feature.
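On the analysis side, the reports can be aggregated along these lines. This is a sketch under stated assumptions: the report format (one dictionary of class name to flags value per device), the function names, and the class names are all hypothetical; only the RW_INITIALIZED bit comes from the snippet above.

```python
RW_INITIALIZED = 1 << 29


def is_initialized(flags):
    """True if the class had +initialize invoked, per the flag bit above."""
    return bool(flags & RW_INITIALIZED)


def usage_rate(reports):
    """Fraction of reports in which each class was initialized."""
    seen, used = {}, {}
    for report in reports:
        for name, flags in report.items():
            seen[name] = seen.get(name, 0) + 1
            used[name] = used.get(name, 0) + (1 if is_initialized(flags) else 0)
    return {name: used[name] / seen[name] for name in seen}


def never_initialized(reports):
    """Classes that were not initialized in any report: candidates for removal."""
    return sorted(name for name, rate in usage_rate(reports).items() if rate == 0)


if __name__ == "__main__":
    reports = [
        {"FeedViewController": RW_INITIALIZED | 0x1, "LegacyShareHelper": 0x1, "RareSettingsPage": 0x1},
        {"FeedViewController": RW_INITIALIZED, "LegacyShareHelper": 0, "RareSettingsPage": RW_INITIALIZED},
    ]
    print(never_initialized(reports))
    print(usage_rate(reports))
```

Classes with a rate of zero are deletion candidates, while classes with a low but non-zero rate point at the low-frequency scenarios discussed above.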

All the above schemes have been implemented on Douyin. There are also some optimizations that could not be implemented due to engineering and historical reasons, listed below for your reference:

  • Segment compression for dynamic libraries. Some segments of the dynamic library are compressed into a file and loaded into memory manually when the dynamic library is loaded, sacrificing some startup performance.

Part 4. Some coding habits and recommendations that affect package size

Because of how Objective-C works, compilation generates by-products such as class structures, method signatures, and protocol structures. In a large, complex project, these often make the installation package grow. So, where it does not compromise the code, try some package-size-friendly coding habits. Small savings add up, and in the long run they have a positive effect on package size.

Class Method vs C function

We usually expose basic, generic functionality as utility classes and implement it with class methods. But a class method requires the class structure to be generated at compile time, and the method is called through objc_msgSend. A C function avoids this overhead. For private functions used only inside your own component, C functions are recommended.

Property vs IVAR

Objective-C automatically generates setter and getter methods for a class's properties. If a property is private to the class, we can use an ivar instead of a property to reduce the package size overhead. Note that the property must be kept when its getter is used for lazy loading or its setter is used for some other side effect.

Reduce the use of blocks

A Block is a special Objective-C object, and representing a Block object takes up some binary space. So in scenarios where Blocks are not necessary, such as chained calls to navigation Blocks, removing the Block-based implementation reduces package size.

Reduce the number of method arguments & function arguments

When we call a function, the parameters passed in are stored in an argument list. Calls with many parameters therefore have a large impact on package size, especially for methods such as network requests, which often take 7 or 8 parameters and are called from many places. So when designing an external API with more than 3 parameters, try to pass them by constructing an object instead.

Do not use multiple lines for frequently used macros

This problem is common in componentized dependency injection, logging, and similar scenarios. Once a three-line macro has been expanded at thousands of call sites, the total size it consumes is frightening.

Try to avoid copying and pasting code

If componentization is not done well, or a large App contains some closed-loop middle-platform business scenarios, the codebase will contain a lot of duplicate code, some of it with only the prefix or namespace changed. Consider scanning the project's source with PMD and unifying the duplicates.
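To show what such a scan looks for, here is a deliberately simple stand-in, not PMD's algorithm: it compares windows of whitespace-normalized lines and reports those that occur in more than one place. The function name and sample files are made up, and copies in which a prefix or namespace was changed would not be caught by this exact-match approach.

```python
from collections import defaultdict


def duplicate_blocks(files, window=3):
    """Find runs of `window` non-blank lines that appear in more than one place.

    files maps a path to its source text. Positions are indexes into the
    file's non-blank lines, not original line numbers.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for i in range(len(lines) - window + 1):
            seen[tuple(lines[i:i + window])].append((path, i))
    return {block: places for block, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    files = {
        "a.m": "int x = load();\nx = x * 2;\nsave(x);\nreturn x;\n",
        "b.m": "log();\n\nint x = load();\nx = x * 2;\nsave(x);\n",
    }
    for block, places in duplicate_blocks(files).items():
        print(places, block)
```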


Join us

We are the team responsible for R&D of the Douyin client's basic capabilities and for exploring new technologies. We focus on engineering and business architecture, R&D tools, build systems, and related areas, supporting rapid business iteration while ensuring R&D efficiency and engineering quality for a very large team. We continuously work on performance and stability, striving to deliver the best possible basic experience to hundreds of millions of users around the world.
