Background

A large app package seriously hurts the user's first-install experience. On devices running a version earlier than iOS 13, an app larger than 200 MB cannot be downloaded over a cellular network. On iOS 13 or later, users must turn on the cellular download switch in Settings before they can download an app larger than 200 MB over a cellular network.

Optimization directions

This article slims the app down in several ways: binary files, compiler and linker parameters, dead stripping, deleting unused image resources, deleting unused code, and dynamic resource delivery. In total, the Jingxi app was reduced by 51 MB.

1. Bitcode optimization

Apps that contain Bitcode and are uploaded to App Store Connect are compiled and linked on the App Store. Including Bitcode allows Apple to re-optimize your app binary in the future without you submitting a new version of the app. In Xcode, Bitcode is enabled by default. If your app includes Bitcode, all the other binaries it uses (such as frameworks and libraries) must also include Bitcode.

Bitcode is an intermediate representation produced during compilation, before assembly is generated:

Underlying compilation process:

2. Link Time Optimization (LTO)

Link Time Optimization (LTO) means performing inter-module optimization at link time. At compile time, Clang emits LLVM bitcode instead of an object file. The linker recognizes these bitcode files and calls LLVM during linking to generate the final objects that make up the executable. All input bitcode files are loaded and merged into a single module. In other words, LLVM's LTO hands the LLVM IR to the linker so that whole-program analysis and optimization can be performed during linking.

LTO has two modes:
  • Full LTO combines all the LLVM IR code from each individual object file into one large module, then optimizes it and generates machine code as usual.
  • Thin LTO keeps modules separate but imports functions from other modules as needed. This lets optimization and machine code generation run in parallel.

The advantage of LTO over compiling the whole program at once is that part of the compilation still runs in parallel. With full LTO (-flto=full), only semantic analysis runs in parallel, while optimization and machine code generation are done in a single thread. With ThinLTO (-flto=thin), every step except the global analysis step runs in parallel. As a result, ThinLTO is much faster than full LTO or one-time compilation.
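The Python sketch below makes ThinLTO's "import functions as needed" step more concrete. It is a simplified model, not LLVM's implementation. Each module contributes a summary that maps every function to an instruction count and its callees. A module imports a function defined elsewhere only if it calls it directly and the function is no larger than a threshold. The module summaries, the sizes, and the threshold value are all hypothetical. The call graph is borrowed from the a.c / main.c example that follows, with both files treated as bitcode. Real ThinLTO builds a combined summary index and uses more elaborate heuristics.

```python
# Simplified ThinLTO-style import decision (illustrative model, not LLVM's code).
# summaries: {module: {function: (instruction_count, [callee names])}}

def compute_imports(summaries, threshold=100):
    """Return {module: sorted list of functions it would import}."""
    # Combined index: which module defines each function, and how big it is.
    owner = {fn: mod for mod, fns in summaries.items() for fn in fns}
    size = {fn: info[0] for fns in summaries.values() for fn, info in fns.items()}

    imports = {}
    for mod, fns in summaries.items():
        wanted = set()
        for _count, callees in fns.values():
            for callee in callees:
                defined_elsewhere = owner.get(callee) not in (None, mod)
                # Only small functions from other modules are worth importing.
                if defined_elsewhere and size[callee] <= threshold:
                    wanted.add(callee)
        imports[mod] = sorted(wanted)
    return imports


# Hypothetical sizes; the call graph mirrors the a.c / main.c example.
SUMMARIES = {
    "a.o": {"foo1": (10, ["foo3"]), "foo2": (2, []), "foo3": (5, ["foo4"])},
    "main.o": {"main": (2, ["foo1"]), "foo4": (3, [])},
}

print(compute_imports(SUMMARIES))               # {'a.o': ['foo4'], 'main.o': ['foo1']}
print(compute_imports(SUMMARIES, threshold=4))  # {'a.o': ['foo4'], 'main.o': []}
```

Each module's import list is decided from the summaries alone. After that, every module can be optimized and compiled to machine code independently, which is where ThinLTO's parallelism comes from.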

The compiler and linker parameters used are:
Clang:
-flto=<value>: sets the LTO mode, full or thin. The default is full.
-lto_library <path>: specifies the location of the library that performs LTO. When performing link time optimization, the linker automatically loads libLTO.dylib, or the library at the specified path.
Set the Xcode Build Setting to:

Let's analyze it with an example:
--- a.h ---
extern int foo1(void);
extern void foo2(void);
extern void foo4(void);

--- a.c ---
#include "a.h"

static signed int i = 0;

void foo2(void) {
  i = -1;
}

static int foo3() {
  foo4();
  return 10;
}

int foo1(void) {
  int data = 0;

  if (i < 0)
    data = foo3();

  data = data + 42;
  return data;
}

--- main.c ---
#include <stdio.h>
#include "a.h"

void foo4(void) {
  printf("Hi\n");
}

int main() {
  return foo1();
}
Run the following in the terminal:
1. Compile a.c to a bitcode file: clang -flto -c a.c -o a.o
2. Compile main.c to a regular object file: clang -c main.c -o main.o
3. Link with LTO: clang -flto a.o main.o -o main
With LTO, the link proceeds as follows:
  1. The linker first reads all object files in order and collects symbol information. Here, some of these are bitcode files masquerading as object files.
  2. Next, the linker resolves symbols using the global symbol table: it finds undefined symbols, replaces weak symbols, and so on.
  3. Based on the resolution results, the linker tells the LTO library (libLTO.dylib by default) which symbols are needed. The linker then calls the optimizer and the code generator. These return an object file created by merging the bitcode files and applying various optimization passes. Finally, the linker updates its internal global symbol table.
  4. The linker continues to run until the executable is generated.
The whole optimization sequence of LTO is as follows:
  1. First, read a.o (a bitcode file) and collect symbol information. The linker recognizes foo1(), foo2(), and foo4() as global symbols.
  2. Read main.o (a real object file) and collect the symbols it uses. main.o uses foo1() and defines foo4().
  3. After symbol resolution, the linker finds that foo2() is not used anywhere and passes this information to LTO. Once foo2() is removed, nothing can set i to -1, so i is always 0. The branch in foo1() that calls foo3() is therefore never taken, and foo3() can be deleted as well.
  4. With symbol processing done, the results are passed to the optimizer and the code generator, and the optimized a.o is merged into main.o.
  5. The symbol table information in main.o is updated, and linking continues until the executable is generated.
Symbol table information for the resulting executable file main:

After linking, the only functions we defined ourselves that remain are main, foo1, and foo4. But foo4 is not used anywhere, so why was it not removed? Because LTO only optimizes the bitcode handed to it, based on the symbols that the native object files require. foo4 is defined in main.o, a native object file outside LTO's view, so LTO cannot delete it. To drop foo4, you need another feature: dead stripping.
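The LTO walkthrough above can be reproduced with a toy Python model. It is an illustration under stated assumptions, not how ld64 or libLTO are implemented. The functions are described by hand. The hypothetical `writes_i` flag marks the only function that stores to the static variable i. A call edge guarded by `i < 0` is treated as dead once no remaining function can write i, which stands in for LLVM's constant propagation. The model ends where the text does: foo2 and foo3 disappear, while foo4 survives because it lives in the native main.o.

```python
# Toy model of the a.c / main.c LTO example (illustrative only).
# Bitcode functions are visible to the LTO optimizer; native ones are opaque.
BITCODE = {  # a.o
    "foo1": {"linkage": "global", "calls": [("foo3", "i < 0")]},
    "foo2": {"linkage": "global", "calls": [], "writes_i": True},
    "foo3": {"linkage": "static", "calls": [("foo4", None)]},
}
NATIVE = {  # main.o, a real object file
    "main": {"calls": ["foo1"]},
    "foo4": {"calls": []},
}


def lto_link(bitcode, native):
    """Return the function symbols left in the linked executable."""
    # Symbol resolution: bitcode symbols referenced by native objects must stay.
    preserved = {callee for fn in native.values() for callee in fn["calls"]}
    # Internalize everything else (static functions are already internal).
    internal = {name for name, info in bitcode.items()
                if info["linkage"] == "static" or name not in preserved}

    live = dict(bitcode)
    changed = True
    while changed:
        changed = False
        # Simplified constant propagation: i starts at 0 and only a live
        # function with writes_i can set it to -1.
        i_may_be_negative = any(info.get("writes_i") for info in live.values())
        referenced = set()
        for info in live.values():
            for callee, guard in info["calls"]:
                if guard == "i < 0" and not i_may_be_negative:
                    continue  # the guarded call is folded away
                referenced.add(callee)
        # Delete internal functions that nothing references any more.
        for name in list(live):
            if name in internal and name not in referenced:
                del live[name]
                changed = True
    # LTO only shrinks the bitcode part; native functions pass through as-is.
    return set(live) | set(native)


print(sorted(lto_link(BITCODE, NATIVE)))  # ['foo1', 'foo4', 'main']
```

The loop runs to a fixed point. Removing foo2 first makes the `i < 0` branch dead, and only then does foo3 become unreferenced. This mirrors steps 3 and 4 of the sequence.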

3. Dead Strip optimization

The linker’s -dead_strip argument does the following:
Remove functions and data that are unreachable by the entry point or exported symbols.

Simply put, this removes functions and code that cannot be reached from the entry point or from any exported symbol. foo4 is exactly such a case, so -dead_strip removes this unused code.
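Conceptually, -dead_strip is a reachability analysis. The Python sketch below is a simplified model: the real linker works on finer-grained atoms, including data, not just whole functions. It walks the call graph from the roots, meaning the entry point plus any exported symbols, and keeps only what it reaches. The call graph is a hypothetical post-LTO state of the example, in which foo4 no longer has any caller.

```python
def dead_strip(call_graph, roots):
    """Return the set of symbols reachable from roots; everything else is stripped."""
    reachable = set()
    stack = list(roots)
    while stack:
        symbol = stack.pop()
        if symbol in reachable:
            continue
        reachable.add(symbol)
        # Anything this symbol references is also live.
        stack.extend(call_graph.get(symbol, []))
    return reachable


# Post-LTO state of the example: foo3 is gone, so nothing calls foo4.
CALL_GRAPH = {"main": ["foo1"], "foo1": [], "foo4": []}

kept = dead_strip(CALL_GRAPH, roots=["main"])
print(sorted(kept))                    # ['foo1', 'main']
print(sorted(set(CALL_GRAPH) - kept))  # ['foo4'], i.e. foo4 is stripped
```

If foo4 were exported (passed as an extra root), it would be kept. This is why the man page text mentions exported symbols alongside the entry point.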

Extending this to dynamic libraries: when building a dynamic library, you can use -mark_dead_strippable_dylib:

Specifies that the dylib being built can be dead strip by any
client.  That is, the dylib has no initialization side effects.
So if a client links against the dylib, but never uses any symbol
from it, the linker can optimize away the use of the dylib.

If none of a dynamic library's symbols are used, the linker automatically drops the dependency on that library. As a result, the app will not crash at launch because the library cannot be found at its path. You can also pass -dead_strip_dylibs when linking the app to get the same effect.
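The decision described above can be sketched in Python as a simplified model; the library and symbol names are made up. A dylib's load command is dropped when the dylib supplies none of the symbols the app uses, provided that either the dylib was built with -mark_dead_strippable_dylib or the app is linked with -dead_strip_dylibs. Otherwise the load command is kept. Real linkers also consider details such as re-exported libraries, which this sketch ignores.

```python
def kept_dylibs(used_symbols, dylibs, dead_strip_dylibs=False):
    """Return names of dylibs that still get a load command.

    dylibs: {name: (exported symbols, built with -mark_dead_strippable_dylib)}
    """
    kept = []
    for name, (exports, dead_strippable) in dylibs.items():
        supplies_symbols = bool(exports & used_symbols)
        removable = dead_strippable or dead_strip_dylibs
        if supplies_symbols or not removable:
            kept.append(name)
    return kept


DYLIBS = {
    "libNetwork.dylib": ({"_request"}, False),
    "libUnusedA.dylib": ({"_a"}, True),   # marked dead-strippable
    "libUnusedB.dylib": ({"_b"}, False),  # not marked
}
USED = {"_request"}

print(kept_dylibs(USED, DYLIBS))                          # ['libNetwork.dylib', 'libUnusedB.dylib']
print(kept_dylibs(USED, DYLIBS, dead_strip_dylibs=True))  # ['libNetwork.dylib']
```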

strip: removes the specified symbols. By default, Xcode only runs strip to remove the corresponding symbols when you archive.
strip -x: removes all but global symbols (used for dynamic libraries).

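# Strip non-global symbols (strip -x) from dynamic frameworks in Release builds.
# ${destination} is expected to be set by the surrounding build script: the
# directory the frameworks were copied into.
strip_x() {
  if [[ "$CONFIGURATION" == "Release" ]]; then
    strip -x "$1"
  fi
}

# -print0 / read -d '' keeps paths containing spaces intact.
find "${destination}" -name "*.framework" -print0 | while IFS= read -r -d '' a_framework; do
  fw_name="$(basename -s .framework "$a_framework")"
  binary="${a_framework}/${fw_name}"
  # Static frameworks are skipped; only dynamic libraries get strip -x.
  if [[ "$(file "$binary")" == *"dynamically linked shared library"* ]]; then
    strip_x "$binary"
  fi
done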

strip -S: removes debug symbols (used for static libraries).
strip (no options): removes all symbols except those used in the indirect symbol table (used for apps).

Set Strip Style to All Symbols for Release builds.

  • All Symbols: removes all symbols; usually enabled in the main project.
  • Non-Global Symbols: removes symbols that are not global. Global symbols are retained, and debug symbols are also removed. Symbols needed for relocation at link time are not removed. This option is recommended for static and dynamic libraries.
  • Debug Symbols: removes debug symbols. After they are removed, breakpoint debugging is no longer possible.
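The three Strip Style options can be summarized as a filter over the symbol table. The Python sketch below models the behavior described above, not the strip tool itself. The style keys are named after Xcode's STRIP_STYLE values (all, non-global, debugging). Each symbol is tagged as debug, local (non-global), or global. A separate flag records whether the symbol is still needed for dynamic linking, for example because it is referenced through the indirect symbol table. The symbol names are hypothetical.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str
    kind: str                     # "debug", "local", or "global"
    needed_by_dyld: bool = False  # e.g. referenced via the indirect symbol table


def apply_strip_style(symbols, style):
    """Return the symbols kept for a given Strip Style (simplified model)."""
    if style == "debugging":   # Debug Symbols: drop only debug entries
        return [s for s in symbols if s.kind != "debug"]
    if style == "non-global":  # Non-Global Symbols: keep global symbols only
        return [s for s in symbols if s.kind == "global"]
    if style == "all":         # All Symbols: keep only what dynamic linking needs
        return [s for s in symbols if s.needed_by_dyld]
    raise ValueError(f"unknown strip style: {style}")


SYMBOLS = [
    Symbol("main.m", "debug"),
    Symbol("_helperFunction", "local"),
    Symbol("_OBJC_CLASS_$_MyView", "global"),
    Symbol("_objc_msgSend", "global", needed_by_dyld=True),
]

for style in ("debugging", "non-global", "all"):
    print(style, [s.name for s in apply_strip_style(SYMBOLS, style)])
```

Each step down the list keeps a subset of the previous one. This is why All Symbols suits the main app, while libraries that other code links against should keep their global symbols.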

Strip Linked Product: removes unneeded symbol information from the linked product. This option takes effect only when Deployment Postprocessing is set to YES.

Strip Debug Symbols During Copy: removes debug symbols from libraries, resources, or extensions that are copied into the app bundle. Set it to YES for Release.

Strip Swift Symbols: removes all Swift symbols from the corresponding target. This option is also enabled by default. It applies to the Swift symbols in the target's files.

4. Code Generation Options

Optimization levels available under Code Generation:

1. None [-O0]

At this setting, the compiler aims to keep compilation cheap and to make debugging behave as expected. Statements are independent. If the program stops at a breakpoint on a line, we can assign a new value to any variable or point the program counter at any statement in the method, and the result is exactly what the source code would produce.

2. Fast [-O1]: compile time and memory use increase slightly for large functions:

At this setting, the compiler tries to reduce code size and execution time, but does not perform optimizations that require a lot of compile time. In Apple's compiler, strict aliasing, block reordering, and inter-block scheduling are disabled by default when optimizing. This level provides a good debugging experience, better stack usage, and better code quality than None [-O0].

3. Faster [-O2]: the compiler performs all supported optimizations that do not involve a space-time trade-off:

This is a higher-performance optimization level than Fast [-O1]. At this setting, the compiler does not perform loop unrolling, function inlining, or register renaming. Compared with Fast [-O1], this setting increases compile time, degrades the debugging experience, and may increase code size, but it improves the performance of the generated code.

4. Fastest [-O3]: enables all optimizations supported by Faster [-O2], plus register renaming:

This is a higher-performance optimization level than Faster [-O2]. It instructs the compiler to optimize the performance of the generated code while ignoring its size, which can produce larger binaries. It also degrades the debugging experience.

5. Fastest, Smallest [-Os]: provides maximum performance without significantly increasing code size:

This setting enables all optimizations in Faster [-O2] that do not increase code size, and it performs further optimizations to reduce code size. The resulting code is smaller than with Fastest [-O3]. It also degrades the debugging experience compared with Fast [-O1].

6. Fastest, Aggressive Optimizations [-Ofast]: this level performs even more aggressive optimizations than Fastest, Smallest [-Os]:

This setting enables all optimizations in Fastest [-O3], plus aggressive optimizations that may violate strict language standards but do not affect well-behaved code. This level degrades the debugging experience and can increase code size.

7. Smallest, Aggressive Size [-Oz]:

Like -Os, but it instructs the compiler to optimize only for code size and ignore performance, which can make the code slower.

5. Delete unused code

Dead code stripping removes unused code at link time for statically bound languages such as C, C++, and Swift. It does not work for Objective-C, because dynamic dispatch prevents the linker from telling which classes and methods are unused.

Code cleanup approach:

We need to clean up unused classes, unused protocols, and unused methods, and we took an indirect approach. First, we use the Mach-O file to find the classes, protocols, and methods that have no references. Next, we use the LinkMap file to assign them to the owning business modules. Each business team then confirms its items, and they are deleted only after being confirmed unused.

  • Unreferenced classes = __objc_classlist - (__objc_classrefs + __objc_superrefs)
  • Unreferenced methods = (instance methods of __objc_classlist - __objc_selrefs) + (class methods of __objc_classlist - __objc_selrefs)
  • Unreferenced protocols = __objc_protolist - protocol_list of the classes in (__objc_classrefs + __objc_superrefs)

Project practice

We used the existing tool Snake directly; it only needs a simple setup:

  • Copy snake to a directory such as $HOME/custom-tool/bin.
  • Open ~/.bash_profile (vi ~/.bash_profile) and add the line export PATH=$HOME/custom-tool/bin/:$PATH at the top of the file. Save the configuration and exit.
  • Run source ~/.bash_profile.
  • The Snake tool now takes effect. Find unreferenced classes, methods, and protocols:
snake -l app_name-LinkMap.txt app_name.app/app_name -c > app_name_unref_class.txt

snake -l app_name-LinkMap.txt app_name.app/app_name -s > app_name_unref_selector.txt

snake -l app_name-LinkMap.txt app_name.app/app_name -p > app_name_unref_protocol.txt


6. Objective-C Direct Methods

Direct methods have the look and feel of regular methods, but the behavior of C functions. When a direct method is called, it calls its underlying implementation directly, rather than through objc_msgSend.

Purpose of the direct attribute

Reduce binary size:
  • Removes the Objective-C metadata of the affected methods
  • Less glue code when calling Objective-C methods, so fewer instructions
Improve method call efficiency:
  • Less glue code when calling Objective-C methods, so fewer instructions

What the direct attribute applies to, and how to declare it:

  • Method: __attribute__((objc_direct))
  • Class implementation, category, or extension: __attribute__((objc_direct_members))
  • Property: @property (direct)

In practice, converting more than 2,000 methods to direct methods reduced the binary by roughly 200 KB.

7. Delete unused images from the project

Use the LSUnusedResources tool (tool download address) and import the project into it to find unused images.

Confirm the reported images manually before deleting them. Note: some images whose names are spliced together at runtime are also reported as unused, so they need a second manual check before deletion.
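The Python sketch below shows why spliced image names need a second check. It is a simplified model of an unused-resource scan, not the LSUnusedResources algorithm. An image counts as used only if its exact name appears in an Objective-C string literal. Any unused name that shares a prefix with a format string such as @"icon_%d" is moved to a manual-review list. The source snippets and image names are hypothetical.

```python
import re

# Objective-C string literals such as @"logo" or @"icon_%d".
STRING_LITERAL = re.compile(r'@"([^"]*)"')


def find_unused_images(image_names, sources):
    """Split unreferenced image names into (likely unused, needs manual review)."""
    literals = set()
    for text in sources:
        literals.update(STRING_LITERAL.findall(text))
    unused = {name for name in image_names if name not in literals}
    # A format string like @"icon_%d" means names may be spliced at runtime,
    # so any unused image sharing its prefix must be confirmed by hand.
    prefixes = {lit.split("%", 1)[0] for lit in literals if "%" in lit}
    needs_review = {name for name in unused
                    if any(p and name.startswith(p) for p in prefixes)}
    return unused - needs_review, needs_review


SOURCES = [
    '[UIImage imageNamed:@"logo"];',
    'NSString *name = [NSString stringWithFormat:@"icon_%d", index];',
]
IMAGES = {"logo", "icon_1", "icon_2", "banner_old"}

likely_unused, review = find_unused_images(IMAGES, SOURCES)
print(sorted(likely_unused))  # ['banner_old']
print(sorted(review))         # ['icon_1', 'icon_2']
```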

8. Compress images

  • Put all @2x and @3x images in the project into an asset catalog. Apple then uses App Slicing to deliver only the images each device model needs, and compresses them.
  • Compress images lossily. A small number of images can be compressed directly on TinyPNG.
  • A large number of images should be batch-compressed with a script. The Python compression script is as follows:
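import os

import tinify

tinify.key = "YOUR_API_KEY"  # placeholder: your own TinyPNG API key

def compressImages(uncompress_images):
    # Each entry is a dict with 'path' (full file path) and 'name' (file name).
    for pngDict in uncompress_images:
        imagePath = pngDict['path']
        # Upload the image to TinyPNG for compression.
        source = tinify.from_file(imagePath)
        # Write the result to the same directory under the same name. to_file()
        # overwrites the original only after the compressed data is fetched,
        # so a failed request does not lose the image.
        source.to_file(os.path.join(os.path.dirname(imagePath), pngDict['name']))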

Note: Python’s Tinify library only compresses 500 images per month per free key, so apply for several more keys if you have many images to compress

9. Delete duplicate images from the project

Use a script to detect duplicate images in the project. With modular development, many images end up duplicated across components. Use the script to find identical images in different components and delete the duplicates. The components should then load each image from a single component. The size savings are substantial.

Principle: using the SSIM algorithm, two images whose similarity exceeds 99% are considered the same image. The Python comparison code:

import numpy as np
from PIL import Image
from skimage import img_as_float, io
from skimage.metrics import structural_similarity

similarValue = 0.99  # SSIM above 99% counts as the same image

def similarImg(image_path1, image_path2):
    """Return 1 if the two images are considered the same image, else 0."""
    try:
        img1 = Image.open(image_path1)
        img2 = Image.open(image_path2)
        # Images with different dimensions are never treated as duplicates.
        if img1.size != img2.size:
            return 0
        im1_t = np.atleast_3d(img_as_float(io.imread(image_path1)))
        im2_t = np.atleast_3d(img_as_float(io.imread(image_path2)))
        # psnr_val = peak_signal_noise_ratio(im1_t, im2_t)
        # channel_axis=-1 replaces the deprecated multichannel=True argument
        # (scikit-image 0.19 and later).
        ssim_val = structural_similarity(im1_t, im2_t, win_size=11, gaussian_weights=True,
                                         channel_axis=-1, data_range=1.0,
                                         K1=0.01, K2=0.03, sigma=1.5)
    except Exception:
        # Unreadable images, mismatched channel counts, or images smaller
        # than the SSIM window are treated as different.
        return 0
    return 1 if ssim_val > similarValue else 0
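similarImg compares a single pair of images, so finding duplicates across components still requires grouping the pairs. Below is a minimal Python sketch that accepts any pairwise predicate, such as similarImg. The stand-in predicate and the file paths in the usage example are hypothetical. The sketch greedily compares each image with the first member of every existing group, which is O(n²) comparisons in the worst case. Bucketing images by pixel size first would cut this down, since similarImg already returns 0 for images with different dimensions.

```python
import os


def group_duplicates(paths, is_similar):
    """Group image paths that the predicate judges to be the same image.

    is_similar(path_a, path_b) returns truthy for duplicates (e.g. similarImg).
    Only groups with two or more members are returned.
    """
    groups = []
    for path in paths:
        for group in groups:
            if is_similar(group[0], path):  # compare with the group's first image
                group.append(path)
                break
        else:
            groups.append([path])
    return [group for group in groups if len(group) > 1]


# Stand-in predicate for the demo: treat equal file names as identical images.
def same_name(a, b):
    return os.path.basename(a) == os.path.basename(b)


PATHS = [
    "ComponentA/Assets/arrow.png",
    "ComponentB/Assets/arrow.png",
    "ComponentB/Assets/close.png",
    "ComponentC/Assets/arrow.png",
]
for group in group_duplicates(PATHS, same_name):
    print(group)  # the three arrow.png copies, one per component
```

Each returned group is a candidate for keeping a single copy in one component and deleting the others.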

10. Dynamic resource delivery

Remove built-in resources such as React Native (RN) bundles and H5 pages from the app, and download them dynamically from a server instead. Likewise, remove images that can be downloaded on demand and load them over the network. The payoff is high.

11. Follow-up planning

Monitor the package size of every app version and check whether each increase is reasonable. Only by watching every version closely can we keep the package from suddenly ballooning.

IPA package increment analysis:

Compare the resource files in the IPA packages of two versions, and output a table of package size increments per resource. The Python comparison script is as follows:

import os

def filesizes_in_path(path):
    # Helper assumed by the original script (its body was not shown):
    # maps every file's full path under `path` to its size in bytes.
    sizes = {}
    for root, _, files in os.walk(path):
        for name in files:
            fullpath = os.path.join(root, name)
            sizes[fullpath] = os.path.getsize(fullpath)
    return sizes

def dir_compare(path1, path2):
    files_in_path1 = filesizes_in_path(path1)
    files_in_path2 = filesizes_in_path(path2)
    # Relative sub-paths present in either version; strip only the leading
    # root prefix (str.replace would also hit repeats later in the path).
    filesubpaths = {p[len(path1):] for p in files_in_path1}
    filesubpaths |= {p[len(path2):] for p in files_in_path2}
    all_sizes = []
    for filesubpath in filesubpaths:
        size1 = files_in_path1.get(path1 + filesubpath, 0)  # 0 = file added
        size2 = files_in_path2.get(path2 + filesubpath, 0)  # 0 = file removed
        all_sizes.append([filesubpath, size1, size2, size2 - size1])
    # Largest increase first.
    all_sizes.sort(key=lambda x: x[3], reverse=True)
    print('filepath, {}, {}, increase(Byte)'.format(path1, path2))
    for a_size in all_sizes:
        print('{}, {}, {}, {}'.format(*a_size))

The comparison results are as follows:

Component module monitoring:

Analyze the binary size of each component to find the components that grew, and output the size increase of each component at component granularity.

Class file analysis:

Within each component that grew, analyze the increase of each class file, and output the classes with large increases at class granularity.

Linkmap analysis tool: Linkmap
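Component-level and class-level monitoring can both be driven by the LinkMap file. The Python sketch below assumes the usual ld64 LinkMap text layout. An `# Object files:` section lists `[index] path` entries. It is followed by a `# Symbols:` section whose lines carry an address, a hex size, and an object-file index. The sketch does three things:

  • Sums symbol sizes per object file (roughly one per class).
  • Groups archive members such as libHome.a(HomeVC.o) by library (one per component).
  • Ranks the size changes between two builds.

It is not the Linkmap tool mentioned above, and the sample paths and sizes are made up.

```python
import os
import re
from collections import defaultdict

# Assumed LinkMap line shapes.
OBJECT_LINE = re.compile(r"^\[\s*(\d+)\]\s+(.+)$")  # "[  2] /path/libA.a(Foo.o)"
SYMBOL_LINE = re.compile(r"^0x[0-9A-Fa-f]+\s+(0x[0-9A-Fa-f]+)\s+\[\s*(\d+)\]")
ARCHIVE_MEMBER = re.compile(r"^(.*)\(([^()]+)\)$")   # "libA.a(Foo.o)"


def parse_linkmap(text):
    """Return {object file path: total size in bytes of its symbols}."""
    objects, sizes, section = {}, defaultdict(int), None
    for line in text.splitlines():
        if line.startswith("#"):
            if line.startswith("# Object files:"):
                section = "objects"
            elif line.startswith("# Symbols:"):
                section = "symbols"
            elif line.startswith(("# Sections:", "# Dead Stripped Symbols:")):
                section = None  # not counted
            continue  # other "#" lines are column headers
        if section == "objects":
            match = OBJECT_LINE.match(line)
            if match:
                objects[int(match.group(1))] = match.group(2)
        elif section == "symbols":
            match = SYMBOL_LINE.match(line)
            if match:
                path = objects.get(int(match.group(2)), "?")
                sizes[path] += int(match.group(1), 16)
    return dict(sizes)


def component_of(path):
    """'.../libHome.a(HomeVC.o)' -> 'libHome.a'; a plain .o keeps its own name."""
    match = ARCHIVE_MEMBER.match(path)
    return os.path.basename(match.group(1) if match else path)


def group_sizes(file_sizes, key):
    """Sum per-object sizes under key(path), e.g. per component."""
    totals = defaultdict(int)
    for path, size in file_sizes.items():
        totals[key(path)] += size
    return dict(totals)


def increments(old, new):
    """Return [(name, size change in bytes)], largest increase first."""
    names = set(old) | set(new)
    rows = [(name, new.get(name, 0) - old.get(name, 0)) for name in names]
    return sorted(rows, key=lambda row: row[1], reverse=True)


OLD = """# Object files:
[  1] /build/main.o
[  2] /build/libHome.a(HomeVC.o)
# Symbols:
0x100004000  0x00000020  [  1] _main
0x100004020  0x00000040  [  2] -[HomeVC viewDidLoad]
"""
NEW = """# Object files:
[  1] /build/main.o
[  2] /build/libHome.a(HomeVC.o)
[  3] /build/libCart.a(CartVC.o)
# Symbols:
0x100004000  0x00000020  [  1] _main
0x100004020  0x00000100  [  2] -[HomeVC viewDidLoad]
0x100004120  0x00000010  [  3] -[CartVC viewDidLoad]
"""

old_files, new_files = parse_linkmap(OLD), parse_linkmap(NEW)
# Component granularity first, then class (object file) granularity.
print(increments(group_sizes(old_files, component_of), group_sizes(new_files, component_of)))
print(increments(old_files, new_files))
```

The first call flags libHome.a (+192 bytes) as the component that grew most. The second call narrows the growth down to HomeVC.o inside it.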
