Alitech:


Abstract: Aspects relies on the message forwarding process of OC, which carries a noticeable performance cost. The author uses C++ together with libffi to build the trampoline function and implements an iOS AOP framework, Lokie. Compared with Aspects, which is widely used in the industry, performance is significantly improved. This article shares the implementation ideas behind Lokie in detail.

Preface

Looking back over a career of more than ten years, the languages I still know well are mainly ASM, C, C++, OC, JS, Lua, Ruby, and Shell; the others I largely forgot once I stopped using them. Languages change quickly, but the underlying ideas stay much the same. In recent years I feel that languages are vaguely converging: once a feature proves useful, it shows up in one language after another. So I am not dogmatic about which language to use, with C/C++ as the exception :)

Lokie

Most of my work is in OC, Ruby, Shell, and so on. For some time I had been trying to find a suitable AOP framework for iOS. Aspects is one of the best-known AOP libraries in the iOS industry, but its performance is relatively poor. The trampoline function in Aspects relies on OC message forwarding and NSInvocation, and both are known to be expensive. According to one set of test data, an NSInvocation call costs roughly 100 times as much as a normal message send. In practice, Aspects is only suitable where a hooked method is called no more than about 1,000 times per second. There are other libraries with better performance, but they do not support multithreaded scenarios, and adding locks brings back a significant performance cost.

After searching without finding a library that suited me, I decided to write one myself. And so Lokie was born.

Lokie's design is based on two principles: efficiency and thread safety. For efficiency, Lokie is written in C++, with C++14 as the standard. Compared with C++98, C++14 offers considerable performance gains thanks to features such as move semantics, perfect forwarding, rvalue references, and multithreading support. In addition, instead of relying on OC message forwarding and NSInvocation, Lokie uses libffi to build the core trampoline function, so the performance cost is removed at the design level. Thread synchronization also uses a lightweight, lock-free CAS technique, which reduces the cost of synchronization.
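To make the CAS idea concrete, here is a minimal sketch in C11 of a guard flag built on compare-and-swap. It is not taken from Lokie's source; the type cas_lock and the functions below are hypothetical names chosen for illustration, and Lokie's real synchronization may look quite different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard flag: 0 = free, 1 = held. Not Lokie's actual type. */
typedef struct {
    atomic_int state;
} cas_lock;

static void cas_lock_init(cas_lock *l) {
    assert(l != NULL);
    atomic_init(&l->state, 0);
}

/* One CAS attempt: succeeds only if state was 0 and we swapped in 1. */
static bool cas_try_lock(cas_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the CAS succeeds; no kernel mutex is involved. */
static void cas_lock_acquire(cas_lock *l) {
    while (!cas_try_lock(l)) {
        /* busy-wait; a production version might yield or back off here */
    }
}

/* Publish our writes and mark the flag free again. */
static void cas_lock_release(cas_lock *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

A caller would wrap a short critical section, such as updating a hook table, between cas_lock_acquire and cas_lock_release. The design trade-off is that spinning is cheap when contention is rare and sections are short, but wasteful otherwise.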

According to measurements on real devices, taking the iPhone 7P as an example, one million calls cost about 6 s with Aspects, while the overhead of Lokie in the same scenario is only about 0.35 s. The improvement is significant.

I am impatient and like to look at the code first, so I will start with the open source address of Lokie: Github

Those of you who like to read code can go and look at it first.

The Lokie header file is very simple, as shown below with just two methods and an enumeration of LokieHookPolicy.

    #import <Foundation/Foundation.h>

    typedef enum : NSUInteger {
        LokieHookPolicyBefore = 1 << 0,
        LokieHookPolicyAfter = 1 << 1,
        LokieHookPolicyReplace = 1 << 2,
    } LokieHookPolicy;

    @interface NSObject (Lokie)
    + (BOOL)Lokie_hookMemberSelector:(NSString *)selecctor_name
                           withBlock:(id)block
                              policy:(LokieHookPolicy)policy;
    + (BOOL)Lokie_hookClassSelector:(NSString *)selecctor_name
                          withBlock:(id)block
                             policy:(LokieHookPolicy)policy;
    - (NSArray *)lokie_errors;
    @end

The two hook methods take the same parameters and provide aspect support for class methods and instance methods respectively.

  • Selector name: the name of the selector you are interested in; it can usually be obtained with the NSStringFromSelector API.

  • Block: the code to execute. The parameters and return value of the block are discussed later.

  • Policy: specifies whether the block runs before the original method, after it, or replaces it.

Monitoring results

Let us take a scenario to see what Lokie can do. Say we want to monitor the life cycle of every page to check that it is normal.

For example, suppose the view controller base class in the project is called BasePageController and its designated initializer is @selector(initWithConfig:).

We put the test code in application:didFinishLaunchingWithOptions:. AOP is that flexible: when the app starts, we hook the start and end of the life cycle of every BasePageController object. Cool, right?

    Class cls = NSClassFromString(@"BasePageController");

    [cls Lokie_hookMemberSelector:@"initWithConfig:"
                        withBlock:^(id target, NSDictionary *param) {
                            NSLog(@"%@", param);
                            NSLog(@"Lokie: %@ is created", target);
                        }
                           policy:LokieHookPolicyAfter];

    [cls Lokie_hookMemberSelector:@"dealloc"
                        withBlock:^(id target) {
                            NSLog(@"Lokie: %@ is dealloc", target);
                        }
                           policy:LokieHookPolicyBefore];

The parameter list of the block deserves a note. The first parameter is always id target, the object the selector is sent to; the remaining parameters match those of the selector. For example, initWithConfig: takes one argument of type NSDictionary *, so for initWithConfig: we pass ^(id target, NSDictionary *param); dealloc takes no arguments, so its block is ^(id target). In other words, inside the block callback you have the current object and the arguments the method was called with, which is usually enough information.

When you replace the original method with LokieHookPolicyReplace, the return type of the block must match that of the original method. With the other two flags the block has no return value; use void.

In addition, we can hook the same method multiple times, like this:

    Class cls = NSClassFromString(@"BasePageController");

    [cls Lokie_hookMemberSelector:@"viewDidAppear:"
                        withBlock:^(id target, BOOL ani) {
                            NSLog(@"LOKIE: this code runs before viewDidAppear: is called");
                        }
                           policy:LokieHookPolicyBefore];

    [cls Lokie_hookMemberSelector:@"viewDidAppear:"
                        withBlock:^(id target, BOOL ani) {
                            NSLog(@"LOKIE: this code runs after viewDidAppear: is called");
                        }
                           policy:LokieHookPolicyAfter];

As you may have noticed, if we record a timestamp in the before block and another in the after block, it is easy to get the execution time of a method.

These two simple examples are only a starting point; AOP is very powerful for monitoring and logging.
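The before, after, and replace policies can be pictured with a small plain-C model. This is only a sketch of the dispatch order a trampoline might follow, under the assumption that observers run in registration order; hook_slot, hook_add_observer, hook_replace, and hook_invoke are hypothetical names and none of this is Lokie's code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);        /* signature of the hooked function */
typedef void (*observer_fn)(int);  /* before/after callbacks return void */

/* Hypothetical flags mirroring the three LokieHookPolicy values. */
enum { HOOK_BEFORE = 1 << 0, HOOK_AFTER = 1 << 1, HOOK_REPLACE = 1 << 2 };

#define MAX_OBSERVERS 4

typedef struct {
    int_fn original;
    int_fn replacement;                 /* NULL unless replaced */
    observer_fn before[MAX_OBSERVERS];
    observer_fn after[MAX_OBSERVERS];
    size_t n_before, n_after;
} hook_slot;

/* Register a before or after observer; returns 0 on success, -1 otherwise. */
static int hook_add_observer(hook_slot *s, observer_fn fn, int policy) {
    assert(s != NULL && fn != NULL);
    if (policy == HOOK_BEFORE && s->n_before < MAX_OBSERVERS) {
        s->before[s->n_before++] = fn;
        return 0;
    }
    if (policy == HOOK_AFTER && s->n_after < MAX_OBSERVERS) {
        s->after[s->n_after++] = fn;
        return 0;
    }
    return -1;
}

/* The HOOK_REPLACE case: the replacement must share the original signature. */
static void hook_replace(hook_slot *s, int_fn fn) {
    s->replacement = fn;
}

/* What a trampoline would do on every call to the hooked function. */
static int hook_invoke(hook_slot *s, int arg) {
    for (size_t i = 0; i < s->n_before; i++) s->before[i](arg);
    int_fn body = s->replacement ? s->replacement : s->original;
    int result = body(arg);
    for (size_t i = 0; i < s->n_after; i++) s->after[i](arg);
    return result;
}

/* Demo callbacks used to exercise the slot. */
static int calls_seen;
static void count_call(int arg) { (void)arg; calls_seen++; }
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }
```

With a slot whose original is square, registering count_call once with HOOK_BEFORE and once with HOOK_AFTER makes each hook_invoke bump calls_seen twice while still returning the square. A timing hook follows the same shape: the before observer stores a timestamp and the after observer subtracts it from the current time.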

Implementation principle

Lokie's AOP implementation is built on the iOS runtime mechanism, with a trampoline function created by libffi at its core. So here is a brief introduction to the iOS runtime as well; this part is probably familiar to many readers.

OC Runtime has several basic concepts: SEL, IMP, and Method.

SEL

    typedef struct objc_selector *SEL;
    typedef id (*IMP)(id, SEL, ...);

    struct objc_method {
        SEL method_name;
        char *method_types;
        IMP method_imp;
    };
    typedef struct objc_method *Method;

The objc_selector structure is interesting, because I could not find a definition for it in the source code. But its implementation can be inferred by reading the code. In objc-sel.m, there are two functions, as follows:

    const char *sel_getName(SEL sel) {
        if (!sel) return "<null selector>";
        return (const char *)(const void *)sel;
    }

As sel_getName shows, a SEL is simply cast to a const char * to obtain its name. The second function, __sel_registerName, goes the other way and builds a SEL from a C string:

    static SEL __sel_registerName(const char *name, int copy);

    //! Inside __sel_registerName, the SEL is obtained directly from const char *name
    ...
    if (!result) {
        result = sel_alloc(name, copy);
    }
    ...

    //! sel_alloc
    static SEL sel_alloc(const char *name, bool copy) {
        selLock.assertWriting();
        return (SEL)(copy ? strdupIfMutable(name) : name);
    }

So we can basically assume that objc_selector should be defined something like this:

    typedef struct {
        char selector[XXX];
        void *unknown;
        ...
    } objc_selector;

To speed up lookup, selectors are found through a hash of the string key, which is faster than comparing the strings themselves.

    //! Hash algorithm in objc4-208
    static CFHashCode _objc_hash_selector(const void *v) {
        if (!v) return 0;
        return (CFHashCode)_objc_strhash(v);
    }

    static __inline__ unsigned int _objc_strhash(const unsigned char *s) {
        unsigned int hash = 0;
        for (;;) {
            int a = *s++;
            if (0 == a) break;
            hash += (hash << 8) + a;
        }
        return hash;
    }

    //! Hash algorithm in objc4-723
    static unsigned _mapStrHash(NXMapTable *table, const void *key) {
        unsigned hash = 0;
        unsigned char *s = (unsigned char *)key;
        /* unsigned to avoid a sign-extend */
        /* unroll the loop */
        if (s) for (; ; ) {
            if (*s == '\0') break;
            hash ^= *s++;
            if (*s == '\0') break;
            hash ^= *s++ << 8;
            if (*s == '\0') break;
            hash ^= *s++ << 16;
            if (*s == '\0') break;
            hash ^= *s++ << 24;
        }
        return xorHash(hash);
    }

    static INLINE unsigned xorHash(unsigned hash) {
        unsigned xored = (hash & 0xffff) ^ (hash >> 16);
        return ((xored * 65521) + hash);
    }

I think the reason for declaring objc_selector at all is to emphasize that SEL and const char * are different types.
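The idea that a selector is essentially a uniqued C string can be sketched in a few lines of C. The code below is a toy model, not the runtime's implementation: toy_sel, toy_sel_register, and the fixed-size bucket table are assumptions made for illustration, and only the hash loop is copied from the objc4-208 excerpt above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef const char *toy_sel;   /* hypothetical stand-in for SEL */

#define TOY_BUCKETS 64

typedef struct toy_node {
    char *name;
    struct toy_node *next;
} toy_node;

static toy_node *toy_table[TOY_BUCKETS];

/* Same loop as the objc4-208 _objc_strhash quoted in the article. */
static unsigned int toy_strhash(const unsigned char *s) {
    unsigned int hash = 0;
    for (;;) {
        int a = *s++;
        if (a == 0) break;
        hash += (hash << 8) + a;
    }
    return hash;
}

/* Return the unique pointer for name, registering a private copy on first use. */
static toy_sel toy_sel_register(const char *name) {
    assert(name != NULL);
    unsigned int b = toy_strhash((const unsigned char *)name) % TOY_BUCKETS;
    for (toy_node *n = toy_table[b]; n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0) return n->name;
    }
    toy_node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    size_t len = strlen(name) + 1;
    n->name = malloc(len);
    if (n->name == NULL) { free(n); return NULL; }
    memcpy(n->name, name, len);
    n->next = toy_table[b];
    toy_table[b] = n;
    return n->name;
}
```

Because every distinct name maps to exactly one pointer, two selectors can be compared with == instead of strcmp, which is the practical benefit of registering names once.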

IMP

IMP is defined as follows:

    #if !OBJC_OLD_DISPATCH_PROTOTYPES
    typedef void (*IMP)(void /* id, SEL, ... */);
    #else
    typedef id _Nullable (*IMP)(id _Nonnull, SEL _Nonnull, ...);
    #endif

OBJC_OLD_DISPATCH_PROTOTYPES was added after LLVM 6.0. The objc_msgSend(id self, SEL op, ...) prototype can be used only if Enable Strict Checking of objc_msgSend Calls is set to NO in the build settings. This is why some of you get an error like the following when calling objc_msgSend directly:

Too many arguments to function call, expected 0, have 2

An IMP is a function pointer: the entry address of the instructions that are finally executed when a method is called.

objc_method is the key, and it is also the cornerstone of method swizzling in OC. An objc_method ties together the function address, the function signature, and the function name. When a method is actually executed, the runtime finds the IMP through the selector name. Likewise, we can meet special requirements by replacing, at run time, the IMP that a selector name maps to.
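The relationship between name, signature, and IMP, and the effect of swapping the IMP, can be modeled in plain C. This is a simplified sketch: toy_method, toy_find, and toy_set_imp are hypothetical names, the IMP takes a single int instead of (id, SEL, ...), and the type string is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_imp)(int);   /* hypothetical, simplified IMP */

typedef struct {
    const char *name;    /* plays the role of method_name */
    const char *types;   /* plays the role of method_types */
    toy_imp imp;         /* plays the role of method_imp */
} toy_method;

/* Look a method up by selector name, as the runtime does before calling it. */
static toy_method *toy_find(toy_method *methods, size_t count, const char *name) {
    assert(methods != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(methods[i].name, name) == 0) return &methods[i];
    }
    return NULL;
}

/* Swap in a new implementation; return the old one so it can still be called. */
static toy_imp toy_set_imp(toy_method *m, toy_imp new_imp) {
    toy_imp old = m->imp;
    m->imp = new_imp;
    return old;
}

/* Two implementations with the same signature, used to show the swap. */
static int add_one(int x) { return x + 1; }
static int add_ten(int x) { return x + 10; }
```

After toy_set_imp(m, add_ten), every caller that goes through the table reaches add_ten, while the returned pointer still reaches add_one. Keeping the old pointer is what lets a hook call through to the original behavior.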

Message sending mechanism

With those three concepts in place, let’s move on to messaging mechanisms. We know that when you send a message to an object, there’s a key function called objc_msgSend, and we’ll talk a little bit about what’s going on in that function.

    //! Declaration of objc_msgSend
    id objc_msgSend(id self, SEL op, ...);

This function is written in assembly, with a separate implementation for each hardware architecture. Implementations probably differ between versions, including function names (the version I looked at is objc4-208).

The first thing objc_msgSend does is check whether self, the receiver of the message, is nil. If it is, the function returns immediately and does nothing, which is why sending a message to nil does not crash. Next, it looks for the corresponding Method in the cache via self->isa->cache and, on a hit, jumps straight to Method->method_imp. On a miss, it moves on and calls a function named _class_lookupMethodAndLoadCache.

This function is defined as follows:

    IMP _class_lookupMethodAndLoadCache(Class cls, SEL sel) {
        ...
        if (methodPC == NULL) {
            //! Class and superclasses do not respond -- use forwarding
            smt = malloc_zone_malloc(_objc_create_zone(), sizeof(struct objc_method));
            smt->method_name = sel;
            smt->method_types = "";
            smt->method_imp = &_objc_msgForward;
            _cache_fill(cls, smt, sel);
            methodPC = &_objc_msgForward;
        }
        ...
    }

When neither the class nor its superclasses respond to the selector, the IMP becomes _objc_msgForward and message forwarding begins. The forwarding mechanism has three stages: dynamic method resolution, the backup receiver, and full message forwarding.
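The lookup path described above (nil check, cache, method list, then a forwarding IMP that is also cached) can be mimicked with a toy dispatcher in C. Everything here is a hypothetical model for illustration: the real objc_msgSend is assembly, handles superclasses and arbitrary signatures, and uses a hashed cache rather than the linear arrays used below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object toy_object;
typedef int (*toy_imp)(toy_object *self, const char *sel);

typedef struct { const char *sel; toy_imp imp; } toy_entry;

#define TOY_SLOTS 4

typedef struct {
    toy_entry methods[TOY_SLOTS]; size_t n_methods;
    toy_entry cache[TOY_SLOTS];   size_t n_cached;
} toy_class;

struct toy_object { toy_class *isa; int value; };

/* Stand-in for _objc_msgForward: reached when nothing responds. */
static int toy_forward(toy_object *self, const char *sel) {
    (void)self; (void)sel;
    return -1;
}

/* Cache first, then the method list; fall back to forwarding and cache the result. */
static toy_imp toy_lookup(toy_class *cls, const char *sel) {
    assert(cls != NULL && sel != NULL);
    for (size_t i = 0; i < cls->n_cached; i++) {
        if (strcmp(cls->cache[i].sel, sel) == 0) return cls->cache[i].imp;
    }
    toy_imp imp = toy_forward;
    for (size_t i = 0; i < cls->n_methods; i++) {
        if (strcmp(cls->methods[i].sel, sel) == 0) { imp = cls->methods[i].imp; break; }
    }
    if (cls->n_cached < TOY_SLOTS) {
        cls->cache[cls->n_cached].sel = sel;
        cls->cache[cls->n_cached].imp = imp;
        cls->n_cached++;
    }
    return imp;
}

/* Messaging nil returns 0 instead of crashing. */
static int toy_msg_send(toy_object *self, const char *sel) {
    if (self == NULL) return 0;
    return toy_lookup(self->isa, sel)(self, sel);
}

/* A sample method implementation. */
static int toy_get_value(toy_object *self, const char *sel) {
    (void)sel;
    return self->value;
}
```

Sending "value" to an object whose class lists toy_get_value returns the stored value and fills one cache slot; sending an unknown selector lands in toy_forward, and that decision is cached too, just as the excerpt caches _objc_msgForward.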

Trampoline function implementation

Next, let us briefly look at how to implement a trampoline function that forwards a C function call, from the assembly perspective. We use the x86 instruction set as the example; other architectures work similarly.

From an assembly point of view, the most direct way to redirect a function is to insert a JMP instruction. In the x86 instruction set, each instruction has its own length. The JMP instruction used here is 5 bytes long: one byte of opcode followed by a 4-byte relative offset. Say we have two functions, A and B, and we want calls to A to be forwarded to B; a JMP instruction can clearly do that. The next problem is how to calculate the relative offset between the two functions. Think of it this way: when the CPU executes the JMP, the effect is IP = IP + 5 + relative offset.

To explain this more directly, let’s look at the following assembly function (don’t worry if you’re unfamiliar with assembly, this function doesn’t do anything, it just does a jump).

You can follow along by writing jump_test.s, which defines a function that does nothing.

It is equivalent to the C function void jump_test() { return; }.

    .global _jump_test
    _jump_test:
        jmp jlable
        #! To test the JMP offset, manually add some nops
        nop
        nop
        nop
    jlable:
        rep; ret

Next, we create a C file: in this file, we call the jump_test function we just created.

    #include <stdio.h>

    extern void jump_test();

    int main() {
        jump_test();
    }

Finally, we compile and link. We create a build.sh to generate the executable file portal.

    #!/bin/sh
    cc -c -o main.o main.c
    as -o jump_test.o jump_test.s
    cc -o portal main.c jump_test.o

We load the portal file we just generated into LLDB and set a breakpoint on the function jump_test.

    lldb ./portal
    b jump_test
    r

On my machine the jump addresses are as follows. Your addresses may differ from mine, but that does not affect the analysis.

    Process 22830 launched: './portal' (x86_64)
    Process 22830 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
        frame #0: 0x0000000100000f9f portal`jump_test
    portal`jump_test:
    ->  0x100000f9f <+0>: jmp    0x100000fa7               ; jlable
        0x100000fa4 <+5>: nop
        0x100000fa5 <+6>: nop
        0x100000fa6 <+7>: nop

At this point in the demo, we can see what we were looking for from the assembly perspective.

The current IP is 0x100000f9f, and the jlable label used in the assembly has been resolved to the target address 0x100000fa7. We know that the new IP is the current IP plus an offset, and that the JMP instruction is 5 bytes long, as explained earlier. So the following relationship holds:

new_ip = old_ip + 5 + offset;

Put the address from the LLDB in, and it becomes:

0x100000fa7 = 0x100000f9f + 5 + offset ==> offset = 3.

Looking back at the assembly code, there are three NOPs, each one byte long, and the jump lands just after them, so an offset of 3 is exactly what we expect. With that verified, we can rearrange the equation to get offset = new_ip - old_ip - 5. Once we know the addresses of function A and function B, it is easy to compute the operand of the JMP.

Now it is clear how to jump to B whenever A is called. Suppose we have a C API and want every call to it to land in our own custom function instead: we overwrite the first few bytes of the API with a JMP into our function. The first byte is the JMP opcode, and the next four bytes are the computed offset.

Finally, here is a complete example that packages the assembly analysis above into C code.
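The offset arithmetic can be checked without touching any executable memory. The sketch below computes the rel32 operand and writes the 5 instruction bytes into an ordinary buffer; the helper names jmp_rel32_offset and jmp_rel32_encode are hypothetical, and the code assumes the two addresses are within 2 GB of each other so the offset fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5  /* 1 opcode byte + 4 offset bytes */

/* offset = new_ip - old_ip - 5, truncated to the 32 bits the instruction holds. */
static int32_t jmp_rel32_offset(uint64_t old_ip, uint64_t new_ip) {
    return (int32_t)(uint32_t)(new_ip - old_ip - JMP_REL32_LEN);
}

/* Write opcode 0xE9 followed by the offset in little-endian byte order. */
static void jmp_rel32_encode(uint64_t old_ip, uint64_t new_ip,
                             uint8_t out[JMP_REL32_LEN]) {
    assert(out != 0);
    uint32_t off = (uint32_t)jmp_rel32_offset(old_ip, new_ip);
    out[0] = 0xE9;
    out[1] = (uint8_t)(off & 0xFF);
    out[2] = (uint8_t)((off >> 8) & 0xFF);
    out[3] = (uint8_t)((off >> 16) & 0xFF);
    out[4] = (uint8_t)((off >> 24) & 0xFF);
}
```

Feeding in the addresses from the LLDB session, jmp_rel32_offset(0x100000f9f, 0x100000fa7) yields 3 and the encoded bytes are E9 03 00 00 00. A backward jump produces a negative offset, stored in two's complement.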

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <mach/mach.h>

    int new_add(int a, int b) {
        return a + b;
    }

    int add(int a, int b) {
        printf("my_add org is called! \n");
        return 0;
    }

    // 5-byte JMP rel32: one opcode byte plus a 4-byte relative offset
    typedef struct {
        uint8_t jmp;
        uint32_t off;
    } __attribute__((packed)) tramp_line_code;

    void dohook(void *src, void *dst) {
        // make the first 5 bytes of src writable
        vm_protect(mach_task_self(), (vm_address_t)src, 5, 0, VM_PROT_ALL);

        tramp_line_code jshort;
        jshort.jmp = 0xe9;
        jshort.off = (uint32_t)(long)dst - (uint32_t)(long)src - 0x5;
        // overwrite the start of src with the jump to dst
        memcpy(src, (const void *)&jshort, sizeof(tramp_line_code));

        // restore read and execute protection
        vm_protect(mach_task_self(), (vm_address_t)src, 5, 0, VM_PROT_READ | VM_PROT_EXECUTE);
    }

    int main() {
        dohook(add, new_add);
        int c = add(10, 20);
        //! prints "res is 30"
        printf("res is %d\n", c);
        return 0;
    }

Compile script (macOS):

    gcc -o portal ./main.c

Run ./portal, and "res is 30" is displayed.

At this point, the function call has been successfully forwarded.
