Preface

When we talk about our applications, one term comes up often: startup time. We tend to be more familiar with what happens after the program starts, because we wrote that code ourselves. What happens before it starts is the system's job. That brings us to one of the core components of iOS: dyld. It is closely tied to startup time, and from the original dyld through to the current dyld 3, Apple has kept shortening startup time and improving system performance.

  • Startup time: the time that passes before the main function starts;
  • Launch closure: all the information needed to launch the program. For example: which dylibs does the program use? At which offsets in them are the various symbols found? Where are their code signatures? And so on;

Resources

Watch the original video

Main text

Shorten startup time – reduce code

The less code there is, the less time it takes to run. So use fewer dylibs and embed fewer dylibs; where possible, prefer the system libraries. Likewise, declaring fewer classes and methods and running fewer initializers shortens startup time.

Shorten startup time – write more Swift

Write projects in Swift, because it avoids many of the pitfalls of C, C++, and Objective-C.

  • Swift has no static initializers of the kind that run before main;

  • Swift does not allow certain kinds of misaligned data structures. When data of different sizes is laid out in memory without alignment, the CPU has to keep switching access widths as it reads, which costs performance, so misaligned structures would otherwise need to be fixed up when the program starts. Swift avoids this pitfall, which is one reason performance improves and startup time drops;

  • Swift code is more compact;
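To make the idea of alignment concrete, here is a small sketch using Python's struct module, which can describe a C-style record either with the platform's native alignment or tightly packed. It only illustrates what padding for alignment means; it is not Swift, and it does not show how Swift lays out its own types. The record (one byte followed by one unsigned int) is made up for the example.

```python
import struct

# One unsigned byte followed by one unsigned int.
# "@" uses the platform's native alignment, so padding is inserted
# after the byte to keep the int on its natural boundary.
aligned_size = struct.calcsize("@BI")

# "=" uses standard sizes with no alignment padding (a packed layout).
packed_size = struct.calcsize("=BI")

print(aligned_size, packed_size)
```

The packed form is smaller, but the int inside it no longer sits on its natural boundary, which is the kind of layout the bullet above describes as costly to read.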

dyld

The original dyld shipped with NeXTStep 3.3 in 1996; before that, NeXT used static binaries. NeXTStep had its own proprietary extensions, so on early versions of Mac OS X developers wrote third-party wrappers to support standard Unix software. The wrappers did not match the expected semantics exactly, so there were edge cases that did not work, and they were slow. The original dyld was also written before large C++ dynamic libraries were in widespread use, which caused further slowdowns.

C++ features:

  • C++ initializer ordering works well in a static environment but can degrade performance in a dynamic one. A large C++ code base therefore leaves the dynamic linker with a lot of work to do, which slows it down.

To solve this series of problems, Apple's developers came up with a number of approaches. One of them was prebinding.

With prebinding, the system tries to find a fixed address for every dylib on the system and for our application. The dynamic loader then tries to load everything at those addresses. If that succeeds, it edits the binaries so that they contain the precomputed addresses, and the next time they are loaded at the same addresses no further work is needed.

That is, the addresses are assigned and recorded in the binaries the first time the program is loaded; on later launches the recorded addresses are simply reused. This avoids repeated work and greatly speeds up the program's startup.

The drawback is that the binaries are edited at launch. That is still a lot of work, and it is also a security problem.

This is what led to dyld 2.

dyld 2

dyld 2 better supports C++

dyld 2 is a complete rewrite of dyld that shipped as part of Mac OS X Tiger. It correctly supports C++ initializer semantics: the Mach-O format was extended and dyld was updated to give efficient C++ library support. dyld 2 also has full native dlopen and dlsym implementations. At the same time, it improved some functionality and the security of the platform.
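As a small illustration of the dlopen/dlsym interface, the sketch below uses Python's ctypes module, which on POSIX systems loads libraries through dlopen and resolves symbol names through dlsym. It assumes a Unix-like host (macOS or Linux) where the C library is already loaded into the process; it exercises the platform loader from the outside and is not dyld's own code.

```python
import ctypes

# CDLL(None) asks the platform loader for a handle covering the symbols
# already loaded into this process (the dlopen(NULL, ...) case on POSIX).
process = ctypes.CDLL(None)

# Attribute access looks the symbol up by name, as dlsym would.
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"dyld")
print(length)
```

The function is found by name at run time rather than at build time, which is the service dlopen and dlsym provide to a program.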

dyld 2 reduces prebinding

Whereas the earlier dyld prebound every dylib on the system as well as our applications, dyld 2 only edited the system libraries, which reduced the amount of prebinding work.

dyld 2 adds architectures and platforms

dyld 2 added support for many architectures and platforms, such as x86, x86_64, arm, and arm64, along with a number of variants of these. iOS, tvOS, and watchOS all had to be supported by dyld 2 as well.

The benefit is that a physical device runs only the arm64 code path and does not need to load the contents for other architectures. Execution can be handled differently for each architecture.

dyld 2 enhances security

First, code signing and ASLR (address space layout randomization) were added. With ASLR, images are loaded at a different address on every run, which makes them harder to hook.
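The effect of a randomized load address can be sketched as a toy model: a random, page-aligned "slide" is chosen at launch and added to every address recorded in the file. The page size, the symbol names, and the addresses below are all made up for illustration; this shows the arithmetic only, not how dyld or the kernel actually picks the slide.

```python
import random

PAGE_SIZE = 0x4000  # assumed page size, for illustration only

def choose_slide(rng, max_pages=0x1000):
    """Pick a random, page-aligned slide for this launch."""
    return rng.randrange(1, max_pages) * PAGE_SIZE

def rebase(pointers, slide):
    """Adjust every recorded address by the slide chosen at load time."""
    return {name: addr + slide for name, addr in pointers.items()}

# Hypothetical addresses as recorded in the file, relative to the
# preferred base address.
on_disk = {"_main": 0x100003F00, "_helper": 0x100004120}

rng = random.Random()
slide = choose_slide(rng)
in_memory = rebase(on_disk, slide)

for name in on_disk:
    print(name, hex(on_disk[name]), "->", hex(in_memory[name]))
```

Every address moves by the same amount, so distances inside the image are preserved while the absolute addresses differ from one run to the next.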

dyld 2 adds checks on fields in the Mach-O header

These are important bounds checks that guard against malicious, malformed binaries.

dyld 2 makes greater use of the shared cache

Prebinding was eliminated and replaced by the shared cache. The shared cache is a single file that contains most of the system's dylibs, so they can be optimized together.

The shared cache also rearranges all TEXT segments and all DATA segments and rewrites the entire symbol table to reduce its size, so that each process only needs to map a small number of regions.

The shared cache allows binary segments to be packed together, which saves a significant amount of RAM: roughly 500 MB to 1 GB at run time.

The shared cache is effectively a dylib prelinker;

The shared cache also pre-builds data structures that dyld and Objective-C use at run time, so less loading work is needed when a program starts. This improves system performance.

The dyld 2 startup process

  • First, parse the program's Mach-O file and work out which libraries it needs (recursively);
  • Then map all the Mach-O files into the address space;
  • Then perform the symbol lookups and copy the results into the corresponding pointers;
  • Then bind and rebase: because the base address is random, pointers have to be adjusted to it. The random base address is there to increase security;
  • Finally, run all the initializers and get ready to execute the main function.
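The recursive first step can be sketched as a depth-first walk over a dependency table. This is a toy model: the image names and their dependencies are invented, and a real loader reads them from load commands in each Mach-O file rather than from a dictionary.

```python
# Hypothetical dependency table: image name -> dylibs it links against.
DEPENDENCIES = {
    "MyApp": ["UIKit", "Foundation"],
    "UIKit": ["Foundation", "libobjc"],
    "Foundation": ["libobjc", "libSystem"],
    "libobjc": ["libSystem"],
    "libSystem": [],
}

def load_order(root, deps):
    """Walk the dependency graph depth-first so that each image is listed
    after everything it depends on. The seen set also guards against
    dependency cycles."""
    order, seen = [], set()

    def visit(image):
        if image in seen:
            return
        seen.add(image)
        for dep in deps[image]:
            visit(dep)
        order.append(image)

    visit(root)
    return order

order = load_order("MyApp", DEPENDENCIES)
print(order)
```

In this model the resulting list is also a safe order for the last step: running initializers from the front of the list means a library's initializers run before those of anything that depends on it.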

dyld 3

dyld 3 is a brand new dynamic linker. It is now the default on macOS and will completely replace dyld 2.

  • dyld 3 improves performance;
  • dyld 3 enhances security;
  • dyld 3 is easier to test.

dyld 3 performance improvements

To speed up startup, most of dyld was moved out of the process, so that it is now just a normal background daemon that can be tested with standard testing tools. This also sets the stage for further performance improvements.

In addition, only a small part of dyld remains in the process, which reduces the program's attack surface. With less code in the process, it runs faster and therefore starts faster.

In dyld 3, the results of Mach-O parsing and symbol lookups are written to disk. dyld 3 consists of three parts:

  • An out-of-process Mach-O parser and compiler;
  • An in-process engine that executes launch closures;
  • A launch closure caching service;

Most launches use the cache and never need to invoke the out-of-process Mach-O parser or compiler. Launch closures are simpler than Mach-O: they are memory-mapped files that do not need complicated parsing, which makes startup faster.

dyld 3 enhances security

Some security features were added to dyld 2, but it is hard to retrofit security onto an existing design and keep up with real-world attacks.

In the past, because dyld parsed Mach-O headers and searched for dependencies inside the process, an attacker could use a tampered Mach-O header, change a search path, or insert a library in the right place to break the original program.

So now most of the work on Mach-O files is done outside the process, in the background daemon.

Symbol lookup is an expensive operation, and its results are well suited to caching: within a given library, a symbol is always at the same offset unless a software update or some other change replaces the library on disk.

With those results cached, dyld 3 does not need to parse Mach-O headers or look up symbols at launch, both of which are time-consuming, so startup is faster.

dyld 3 is easier to test

Previously, existing test frameworks could not be used to test the dyld code, because they depend on low-level features of the dynamic linker and on inserting their own libraries into the process. This made it hard to test dyld's security and performance.

dyld 3 launch closure cache service

macOS builds the launch closures for system applications directly into the shared cache, so the closures are mapped in together with the cache that every dylib is already launched from. This makes program startup faster still.

For third-party programs, the corresponding launch closure is built when the app is installed or updated.

  • dyld 3 is fully compatible with dyld 2.

dyld 3 symbol resolution

dyld has to resolve every symbol a program uses, which takes a lot of resources, so the results should be cached. Resolving everything on every launch of an existing program would take a long time. To deal with this, Apple's systems use a mechanism called lazy symbol resolution.

  • Lazy symbol resolution: by default, a function pointer for a library function such as printf does not point at printf itself. It initially points at a function in dyld that returns a pointer to printf. So the first call to printf enters dyld, which finds printf and completes the call; from the second call onward, printf is called directly.
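The mechanism can be modelled in a few lines. In this toy sketch each slot in a pointer table starts out holding a stub; the first call goes through the stub, which performs the lookup, patches the slot, and forwards the call. The class, the lookup function, and the fake printf are all invented for illustration and do not mirror dyld's real data structures.

```python
# Toy "library": symbol name -> implementation.
LIBRARY = {"printf": lambda text: f"printed: {text}"}

lookups = []  # records every time the slow lookup path runs

def lookup(name):
    """Stand-in for the expensive symbol search done by the linker."""
    lookups.append(name)
    return LIBRARY[name]

class LazyPointerTable:
    """Each slot starts out pointing at a binder stub, not the real function."""

    def __init__(self, names):
        self.slots = {name: self._make_stub(name) for name in names}

    def _make_stub(self, name):
        def stub(*args):
            target = lookup(name)      # first call: find the real function
            self.slots[name] = target  # patch the slot for later calls
            return target(*args)       # then complete the original call
        return stub

    def call(self, name, *args):
        return self.slots[name](*args)

table = LazyPointerTable(["printf"])
first = table.call("printf", "hello")
second = table.call("printf", "again")
print(first, second, lookups)
```

Both calls return the expected result, but the lookup runs only once: after the first call the slot holds the real function.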

dyld 2 lazy symbol resolution

  • Symbol lookup is expensive;
  • Each symbol is looked up the first time it is called;
  • Missing symbols cause a crash the first time they are called;

dyld 3 symbol resolution

  • All symbol lookups are cached. Because every symbol has already been computed and cached, binding them when the program starts costs almost nothing, so it is very fast;
  • Because all symbols are resolved up front, it is possible to check that every symbol is present and to fail at launch if one is missing, rather than crash later while the program is in use;
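The second point, failing up front instead of on first use, can be sketched as follows. The export table, the offsets, and the missing symbol name are hypothetical; the sketch only shows the control flow of resolving everything before the program runs.

```python
# Hypothetical export table: library -> {symbol: offset}.
EXPORTS = {"libSystem": {"printf": 0x1A40, "malloc": 0x2B80}}

class MissingSymbol(Exception):
    pass

def resolve_all(imports, exports):
    """Resolve every imported symbol before the program runs.
    Raises MissingSymbol immediately if any symbol cannot be found."""
    resolved = {}
    for library, name in imports:
        try:
            resolved[name] = (library, exports[library][name])
        except KeyError:
            raise MissingSymbol(f"{name} not found in {library}") from None
    return resolved

good = resolve_all([("libSystem", "printf"), ("libSystem", "malloc")], EXPORTS)

try:
    resolve_all([("libSystem", "printf"), ("libSystem", "frobnicate")], EXPORTS)
    failed_early = False
except MissingSymbol:
    failed_early = True

print(good, failed_early)
```

With lazy resolution the bad import would go unnoticed until the first call; here it is reported before any program code has run.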

The dyld 3 launch process

  • Resolve all search paths and environment variables;
  • Parse the Mach-O binaries;
  • Then perform the symbol lookups;
  • Use these results to create the launch closure and write it to disk;
  • Validate that the launch closure is correct;
  • Then map in the dylibs;
  • Jump to main.
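The split between building a closure once and validating it on each launch can be sketched as a small cache. This is a toy model under loud assumptions: the closure is a JSON file, validity is checked with a hash of the inputs, and the file name, symbol names, and offsets are invented. None of this reflects the real closure format or the real validity checks.

```python
import hashlib
import json
import os
import tempfile

def fingerprint(images):
    """Hash the inputs the closure was built from (toy stand-in for
    checking that nothing changed on disk)."""
    blob = json.dumps(images, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def build_closure(images):
    """The expensive part: do all the symbol lookups once."""
    bindings = {sym: [lib, off]
                for lib, symbols in images.items()
                for sym, off in symbols.items()}
    return {"fingerprint": fingerprint(images), "bindings": bindings}

def launch(images, cache_path):
    """The cheap part: validate a cached closure, or rebuild it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            closure = json.load(f)
        if closure["fingerprint"] == fingerprint(images):
            return closure, "cache hit"
    closure = build_closure(images)
    with open(cache_path, "w") as f:
        json.dump(closure, f)
    return closure, "rebuilt"

images = {"libSystem": {"printf": 6720, "malloc": 11136}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MyApp.closure")
    _, first = launch(images, path)
    _, second = launch(images, path)
    images["libSystem"]["printf"] = 6784  # simulate a library update
    _, third = launch(images, path)

print(first, second, third)
```

The first launch builds the closure, the second reuses it, and the third rebuilds it because the simulated library update invalidates the cached copy.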

Recommended reading: Exploring iOS internals: the dyld program loading process.