The original [address] (segmentfault.com/a/119000002…

Summary: Aspects relies on OC's message forwarding process, which carries a performance cost. In this article, the author uses C++ and libffi to design the core trampoline function and implements an iOS AOP framework called Lokie. It is significantly faster than Aspects, the well-known framework in the industry. This article shares the implementation ideas behind Lokie.

Preface

Before I knew it, I had been doing this for ten years; it went by in a blink. The languages I am currently comfortable with are ASM/C/C++/OC/JS/Lua/Ruby/Shell and so on; the others I have mostly forgotten since I stopped using them. Languages change fast, but they are broadly alike. In recent years they all seem to be converging: once a feature proves good, you soon find it in other languages. So I am not attached to any particular language, although C/C++ is a religion :)

Lokie

I use OC, Ruby, and Shell for most of my work. For some time I had been looking for a suitable AOP framework for iOS. Aspects is the best-known one in the iOS industry, but its performance is poor. Its trampoline function relies on OC's message forwarding process, and the original function is invoked through NSInvocation. As we know, both are expensive: NSInvocation is roughly 100 times slower than a normal message send. In fact, Aspects is only suitable for methods that are called no more than 1,000 times per second. There are, of course, other libraries with better performance, but they do not support multithreaded scenarios, and they suffer significant performance losses once locks are added.

I searched and searched but found no library that fit, so I decided to write one myself. Lokie was born.

Lokie has only two basic design principles: efficiency and thread safety. To meet the efficiency principle, Lokie is written in C++, using C++14 as the standard. Compared with C++98, C++14 offers features such as move semantics, perfect forwarding, rvalue references, and multithreading support, which bring significant performance gains. In addition, we drop the reliance on OC message forwarding and NSInvocation and use libffi to build the core trampoline function, which removes the performance bottleneck at the design level. Finally, thread synchronization uses lightweight CAS-based lock-free techniques instead of heavier locks, which reduces synchronization overhead.
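To make the CAS idea concrete, here is a minimal C11 sketch of lock-free updates built on compare-and-swap. Lokie's own synchronization code is not shown in this article, so `hook_slot`, `try_claim`, and `bump` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-method slot: state 0 = free, 1 = hook installed. */
typedef struct {
    atomic_int state;
    atomic_uint calls;
} hook_slot;

/* Claim the slot exactly once without taking a mutex.
 * Returns true only for the single caller whose CAS changed 0 -> 1. */
bool try_claim(hook_slot *slot) {
    assert(slot != NULL);
    int expected = 0;
    return atomic_compare_exchange_strong(&slot->state, &expected, 1);
}

/* Lock-free increment written as an explicit CAS retry loop. */
unsigned bump(hook_slot *slot) {
    assert(slot != NULL);
    unsigned old = atomic_load(&slot->calls);
    /* On failure, `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(&slot->calls, &old, old + 1)) {
    }
    return old + 1;
}
```

If two threads race on `try_claim`, only one of them sees `true`; the loser can simply back off instead of blocking, which is what keeps the overhead low.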

Here is some performance data from a real device. On an iPhone 7P, one million calls cost about 6 s with Aspects, while Lokie's overhead in the same scenario is only about 0.35 s. Judging by this test data, the improvement is significant.

I'm the impatient type and like to read the code first, so here is Lokie's open source address:

Github.com/alibaba/Lok…

Those of you who like to look at code can go and have a look.

The Lokie header file is very simple: just two hook methods and the LokieHookPolicy enumeration, as shown below.

#import <Foundation/Foundation.h>
typedef enum : NSUInteger {
    LokieHookPolicyBefore = 1 << 0,
    LokieHookPolicyAfter = 1 << 1,
    LokieHookPolicyReplace = 1 << 2,
} LokieHookPolicy;

@interface NSObject (Lokie)
+ (BOOL) Lokie_hookMemberSelector:(NSString *) selecctor_name
                           withBlock: (id) block
                              policy:(LokieHookPolicy) policy;

+ (BOOL) Lokie_hookClassSelector:(NSString *) selecctor_name
                                  withBlock: (id) block
                                     policy:(LokieHookPolicy) policy;

-(NSArray*) lokie_errors;
@end

Both methods take the same parameters and provide aspect (AOP) support for class methods and instance methods respectively.

  • selecctor_name: the name of the selector you are interested in. We can usually get it via the NSStringFromSelector API.
  • block: the code to execute. The arguments and return value of the block are discussed later.
  • policy: specifies whether the block runs before the selector, after it, or replaces the original method entirely.

Monitoring results

Let's take one scenario to see what Lokie can do. Say we want to monitor the life cycle of every page to check that pages are created and released properly.

Suppose the base view controller class in the project is called BasePageController and its designated initializer is @selector(initWithConfig:).

We simply put the test code in application:didFinishLaunchingWithOptions:. AOP really is that freewheeling! Isn't it cool that, during app initialization, we start monitoring the beginning and end of the life cycle of every BasePageController object?

Class cls = NSClassFromString(@"BasePageController");
[cls Lokie_hookMemberSelector:@"initWithConfig:"
                    withBlock:^(id target, NSDictionary *param){
                        NSLog(@"%@", param);
                        NSLog(@"Lokie: %@ is created", target);
                    } policy:LokieHookPolicyAfter];

[cls Lokie_hookMemberSelector:@"dealloc"
                    withBlock:^(id target){
                        NSLog(@"Lokie: %@ is dealloc", target);
                    } policy:LokieHookPolicyBefore];

The argument list of the block is worth a note. The first argument is always id target, the object the selector is sent to, and the remaining arguments match those of the selector. For example, initWithConfig: takes one argument of type NSDictionary *, so the block we pass for initWithConfig: is ^(id target, NSDictionary *param). dealloc takes no arguments, so its block is ^(id target). In other words, inside the block callback you get the current object and the arguments the method was called with, which is basically all the information you need.

The return value is also easy to understand. When you replace the original method with LokieHookPolicyReplace, the return type of the block must match that of the original method. With the other two policies the block returns nothing, so use void.

We can also hook the same method multiple times, like this:

Class cls = NSClassFromString(@"BasePageController");
[cls Lokie_hookMemberSelector:@"viewDidAppear:"
                    withBlock:^(id target, BOOL ani){
                        NSLog(@"LOKIE: this code runs before viewDidAppear is called");
                    } policy:LokieHookPolicyBefore];

[cls Lokie_hookMemberSelector:@"viewDidAppear:"
                    withBlock:^(id target, BOOL ani){
                        NSLog(@"LOKIE: this code runs after viewDidAppear is called");
                    } policy:LokieHookPolicyAfter];

Note that if we record a timestamp in the before hook and another in the after hook, measuring the execution time of a method becomes trivial.


These two simple examples are just a starting point. AOP is very powerful for monitoring and logging.

Implementation principle

The AOP implementation is built on the iOS Runtime mechanism, with a trampoline function created by libffi at its core. So I will first talk a little about the iOS Runtime. Many of you are probably familiar with this part.

OC Runtime has several basic concepts: SEL, IMP, Method.

SEL

typedef struct objc_selector  *SEL;
typedef id  (*IMP)(id, SEL, ...);

struct objc_method {
    SEL method_name;
    char *method_types;
    IMP method_imp;
} ;
typedef struct objc_method *Method;

objc_selector is an interesting structure: I could not find its definition anywhere in the source code. But we can guess how objc_selector is implemented by reading the code around it. In objc-sel.m there are two functions, the first of which is:

const char *sel_getName(SEL sel)
{
    if (!sel) return "<null selector>";
    return (const char *)(const void *)sel;
}

As we can see, sel_getName simply casts the SEL to a const char *. The second function is __sel_registerName:

static SEL __sel_registerName(const char *name, int copy);

//! In __sel_registerName, the SEL is obtained directly from const char *name
...
if (!result) {
    result = sel_alloc(name, copy);
}
...

static SEL sel_alloc(const char *name, bool copy)
{
    selLock.assertWriting();
    return (SEL)(copy ? strdupIfMutable(name) : name);
}

Looking at this, we can reasonably guess that the definition of objc_selector looks something like this:

typedef struct {
    char selector[XXX];
    void *unknown;
    ...
} objc_selector;

To improve efficiency, selector lookups hash the string into a key, which is faster than comparing the strings directly.

//! Hash used for selectors
static CFHashCode _objc_hash_selector(const void *v) {
    if (!v) return 0;
    return (CFHashCode)_objc_strhash(v);
}

static __inline__ unsigned int _objc_strhash(const unsigned char *s) {
    unsigned int hash = 0;
    for (;;) {
        int a = *s++;
        if (0 == a) break;
        hash += (hash << 8) + a;
    }
    return hash;
}

//! Hash used by NXMapTable for string keys
static unsigned _mapStrHash(NXMapTable *table, const void *key) {
    unsigned hash = 0;
    unsigned char *s = (unsigned char *)key;
    /* unsigned to avoid a sign-extend */
    /* unroll the loop */
    if (s) for (; ; ) {
        if (*s == '\0') break;
        hash ^= *s++;
        if (*s == '\0') break;
        hash ^= *s++ << 8;
        if (*s == '\0') break;
        hash ^= *s++ << 16;
        if (*s == '\0') break;
        hash ^= *s++ << 24;
    }
    return xorHash(hash);
}

static INLINE unsigned xorHash(unsigned hash) {
    unsigned xored = (hash & 0xffff) ^ (hash >> 16);
    return ((xored * 65521) + hash);
}

As for why objc_selector exists at all, I think the intent is simply to make SEL a distinct type from const char *.
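The registration idea above can be sketched in plain C. This is a toy intern table, not the runtime's actual code: `toy_register_name`, `str_hash`, and `BUCKETS` are names invented for illustration, and the hash merely copies the shape of `_objc_strhash`. The point is that once a name is registered, equal selector names share one pointer, so later comparisons are pointer comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 64

typedef struct entry {
    char *name;
    struct entry *next;
} entry;

static entry *table[BUCKETS];

/* Same shape as _objc_strhash: hash = hash * 257 + byte. */
unsigned str_hash(const char *s) {
    unsigned hash = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        hash += (hash << 8) + *p;
    return hash;
}

/* Return the unique pointer for `name`, registering a copy on first use.
 * Returns NULL only if allocation fails. */
const char *toy_register_name(const char *name) {
    assert(name != NULL);
    unsigned b = str_hash(name) % BUCKETS;
    for (entry *e = table[b]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->name;              /* already registered */
    size_t n = strlen(name) + 1;
    entry *e = malloc(sizeof *e);
    char *copy = malloc(n);
    if (!e || !copy) {
        free(e);
        free(copy);
        return NULL;
    }
    memcpy(copy, name, n);
    e->name = copy;
    e->next = table[b];                  /* push onto the bucket's chain */
    table[b] = e;
    return e->name;
}
```

The string comparison happens only at registration time; after that, two registered names are equal exactly when their pointers are equal.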

IMP

IMP is defined as follows:

#if !OBJC_OLD_DISPATCH_PROTOTYPES
typedef void (*IMP)(void /* id, SEL, ... */ );
#else
typedef id _Nullable (*IMP)(id _Nonnull, SEL _Nonnull, ...);
#endif

OBJC_OLD_DISPATCH_PROTOTYPES was added after LLVM 6.0. If Enable Strict Checking of objc_msgSend Calls is set to NO in the build settings, you can call objc_msgSend(id self, SEL op, ...) directly. With strict checking enabled, calling objc_msgSend with arguments makes the compiler report the following error:

Too many arguments to function call, expected 0, have 2

IMP is a function pointer that is the execution instruction entry for the final method call.

objc_method is critical: it is the cornerstone that lets OC perform method swizzling at runtime. objc_method binds the function address, the function signature, and the function name together, so that when a method is actually executed the runtime can find the corresponding IMP by selector name. In the same way, we can meet special requirements by replacing the IMP behind a selector name at runtime.
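The relationship between name, signature, and IMP can be illustrated with a small plain-C model. This is only a sketch under simplifying assumptions: `toy_method`, `find_method`, and `replace_imp` are hypothetical names, selectors are ordinary C strings, and the single-entry table stands in for a class's method list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy IMP: receives the object, the selector name, and one int argument. */
typedef int (*toy_imp)(void *self, const char *sel, int arg);

typedef struct {
    const char *name;   /* plays the role of method_name  */
    const char *types;  /* plays the role of method_types */
    toy_imp imp;        /* plays the role of method_imp   */
} toy_method;

int double_impl(void *self, const char *sel, int x) { (void)self; (void)sel; return 2 * x; }
int triple_impl(void *self, const char *sel, int x) { (void)self; (void)sel; return 3 * x; }

/* One-entry method list; "i@:i" = returns int, takes id, SEL, int. */
toy_method methods[] = {
    { "double:", "i@:i", double_impl },
};

/* Look a method up by selector name; NULL if the class has no such method. */
toy_method *find_method(const char *name) {
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].name, name) == 0)
            return &methods[i];
    return NULL;
}

/* Swap in a new implementation and hand back the old one,
 * so the caller can still invoke the original behaviour. */
toy_imp replace_imp(const char *name, toy_imp new_imp) {
    toy_method *m = find_method(name);
    assert(m != NULL);
    toy_imp old = m->imp;
    m->imp = new_imp;
    return old;
}
```

Callers keep using the same selector name; only the function pointer stored behind it changes, which is the essence of swizzling.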

Message sending mechanism

With these three concepts in mind, let’s move on to the message sending mechanism. We know that there is a key function called objc_msgSend when sending a message to an object. Let’s talk a little bit about what happens in this function.

//! objc_msgSend function definition
id objc_msgSend(id self, SEL op, ...);

Internally, this function is written in assembly, with a separate implementation for each hardware architecture. Function names and implementation details may differ between versions (the version I looked at was objc4-208).

The first thing objc_msgSend does is check whether the receiver self is nil; if it is, the function returns immediately. That is why sending a message to nil does not crash. Next, it looks up the selector in the method cache of the class reached through self->isa; on a hit it jumps straight to method->method_imp. On a miss it moves on to the next step and calls a function named _class_lookupMethodAndLoadCache.

This function is defined as follows:

IMP _class_lookupMethodAndLoadCache(Class cls, SEL sel)
{
    ...
    if (methodPC == NULL) {
        // Class and superclasses do not respond -- use forwarding
        smt = malloc_zone_malloc(_objc_create_zone(), sizeof(struct objc_method));
        smt->method_name = sel;
        smt->method_types = "";
        smt->method_imp = &_objc_msgForward;
        _cache_fill(cls, smt, sel);
        methodPC = &_objc_msgForward;
    }
    ...
}
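The dispatch flow described above (nil check, cache lookup, full lookup, forwarding fallback) can be modelled in a few lines of C. This is a toy model and not the runtime's code: every `toy_` name is hypothetical, selectors are plain strings compared with strcmp, and the cache is a tiny fixed array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_obj toy_obj;
typedef long (*toy_imp)(toy_obj *self, const char *sel);
typedef struct { const char *sel; toy_imp imp; } toy_slot;

typedef struct {
    toy_slot cache[4];        /* filled lazily on first lookup */
    size_t cached;
    const toy_slot *methods;  /* the class's full method list */
    size_t count;
} toy_class;

struct toy_obj { toy_class *isa; long value; };

/* Stand-in for _objc_msgForward: reached when nobody responds. */
long toy_forward(toy_obj *self, const char *sel) { (void)self; (void)sel; return -1; }

/* A sample method that returns the object's value. */
long toy_get_value(toy_obj *self, const char *sel) { (void)sel; return self->value; }

/* Slow path: scan the method list, fall back to forwarding, fill the cache.
 * The selector pointer is stored as-is, so it must outlive the class. */
toy_imp toy_lookup_and_cache(toy_class *cls, const char *sel) {
    toy_imp imp = toy_forward;
    for (size_t i = 0; i < cls->count; i++)
        if (strcmp(cls->methods[i].sel, sel) == 0) { imp = cls->methods[i].imp; break; }
    if (cls->cached < 4) {
        cls->cache[cls->cached].sel = sel;
        cls->cache[cls->cached].imp = imp;
        cls->cached++;
    }
    return imp;
}

long toy_msg_send(toy_obj *self, const char *sel) {
    assert(sel != NULL);
    if (self == NULL) return 0;                 /* messaging nil is harmless */
    toy_class *cls = self->isa;
    for (size_t i = 0; i < cls->cached; i++)    /* fast path: cache hit */
        if (strcmp(cls->cache[i].sel, sel) == 0)
            return cls->cache[i].imp(self, sel);
    return toy_lookup_and_cache(cls, sel)(self, sel);
}
```

As in the excerpt above, an unrecognized selector is cached together with the forwarding entry point, so repeated sends of the same unknown selector skip the slow path.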

The message forwarding mechanism (dynamic method resolution, the backup receiver, and full message forwarding) is something many interviewers like to ask about :). I assume everyone is familiar with this part, so I will not repeat it here.

Implementing a trampoline function

Next, we briefly show, from an assembly point of view, how to implement a trampoline function that forwards calls at the C function level. We use the x86 instruction set as the example; other architectures work in a similar way.

From an assembly point of view, the most direct way to redirect a function is to insert a JMP instruction. Every instruction in the x86 instruction set has its own length. The JMP form used here is 5 bytes long: a one-byte opcode followed by a 4-byte relative offset. Suppose we have two functions, A and B. If we want a call to B to be forwarded to A, the JMP instruction is exactly what we need. The remaining problem is how to compute the relative offset between the two functions. Think of it this way: when the CPU executes the JMP, it performs IP = IP + 5 + relative offset.

To explain this more directly, let’s look at the following assembler function (don’t worry if you’re not familiar with assembly, this function does nothing but jump).

You can also do this with me, starting with a jump_test.s that defines a function that doesn’t do anything.

Its C equivalent is void jump_test(){ return; }.

.global _jump_test
_jump_test:
    jmp jlable
    nop
    nop
    nop
jlable:
    rep; ret

Next, we create a C file: in this file, we call the jump_test function we just created.

#include <stdio.h>
extern void jump_test();
int main(){
    jump_test();
}

Finally, to compile and link, we create a build.sh that generates the executable file portal.

#!/bin/sh
cc -c -o main.o main.c
as -o jump_test.o jump_test.s
cc -o portal main.o jump_test.o

We use LLDB to load the portal file we just generated and set a breakpoint on the function jump_test.

lldb ./portal
b jump_test
r

On my machine the output is as follows. Your addresses may differ from mine, but that does not affect the analysis.

Process 22830 launched: './portal' (x86_64)
Process 22830 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f9f portal`jump_test
portal`jump_test:
->  0x100000f9f <+0>: jmp    0x100000fa7               ; jlable
    0x100000fa4 <+5>: nop
    0x100000fa5 <+6>: nop
    0x100000fa6 <+7>: nop

At this point in the demo, we managed to see something we wanted from an assembly perspective.

First of all, the current IP is 0x100000f9f, and the label jlable used in our assembly has been resolved to the target address 0x100000fa7. As we know, the new IP is computed by adding the offset to the current IP, and, as explained earlier, the JMP instruction is 5 bytes long. So we get the following relationship:

new_ip = old_ip + 5 + offset;

Add the address obtained from the LLDB and it becomes:

0x100000fa7 = 0x100000f9f + 5 + offset ==> offset = 3.

Looking back at the assembly code, we used three NOPs, one byte each, so the jump lands exactly past the three NOPs. With this simple check done, we rearrange the equation to offset = new_ip - old_ip - 5. Once we know the addresses of functions A and B, it is easy to work out the operand of the JMP.
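The formula can be checked with a short C sketch that builds the 5 bytes of a JMP for a given pair of addresses without executing anything. `encode_jmp` and `decode_jmp` are hypothetical helper names; the range check is an addition of this sketch, reflecting the fact that the offset field is a signed 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5

/* Fill out[0..4] with "E9 <rel32>" so that a jump placed at `src` lands on `dst`.
 * Returns false if the distance does not fit in a signed 32-bit offset. */
bool encode_jmp(uint64_t src, uint64_t dst, uint8_t out[JMP_REL32_LEN]) {
    /* offset = new_ip - old_ip - 5, computed modulo 2^64 then read as signed */
    int64_t rel = (int64_t)(dst - src - JMP_REL32_LEN);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return false;
    uint32_t u = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;                      /* JMP rel32 opcode */
    out[1] = (uint8_t)(u & 0xff);       /* offset stored little-endian */
    out[2] = (uint8_t)((u >> 8) & 0xff);
    out[3] = (uint8_t)((u >> 16) & 0xff);
    out[4] = (uint8_t)((u >> 24) & 0xff);
    return true;
}

/* Inverse: given the jump bytes stored at `src`, compute where it lands. */
uint64_t decode_jmp(uint64_t src, const uint8_t code[JMP_REL32_LEN]) {
    assert(code[0] == 0xE9);
    uint32_t u = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                 ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    /* new_ip = old_ip + 5 + offset (sign-extended) */
    return src + JMP_REL32_LEN + (uint64_t)(int64_t)(int32_t)u;
}
```

Feeding in the addresses from the LLDB session, `encode_jmp(0x100000f9f, 0x100000fa7, buf)` yields the bytes E9 03 00 00 00, which matches the offset of 3 derived above.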

At this point the idea behind a function jump is clear: we want a call to A to actually end up in B. For example, given a C API, we want every call to that API to land in our own custom function instead. To do so, we overwrite the first few bytes of the API with a JMP straight into our custom function. Of those five bytes, the first is the JMP opcode and the remaining four are the offset we calculated.

Finally, here is a complete example that puts the assembly analysis and the C code together.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <mach/mach.h>

int new_add(int a, int b) {
    return a + b;
}

int add(int a, int b) {
    printf("my_add org is called! \n");
    return 0;
}

typedef struct {
    uint8_t jmp;
    uint32_t off;
} __attribute__((packed)) tramp_line_code;

void dohook(void *src, void *dst) {
    // Make the first 5 bytes of src writable
    vm_protect(mach_task_self(), (vm_address_t)src, 5, 0, VM_PROT_ALL);

    tramp_line_code jshort;
    jshort.jmp = 0xe9;  // JMP opcode
    jshort.off = (uint32_t)(long)dst - (uint32_t)(long)src - 0x5;
    // Overwrite the start of src with the jump to dst
    memcpy(src, (const void *)&jshort, sizeof(tramp_line_code));

    // Restore read + execute protection
    vm_protect(mach_task_self(), (vm_address_t)src, 5, 0, VM_PROT_READ | VM_PROT_EXECUTE);
}

int main() {
    dohook(add, new_add);
    int c = add(10, 20);
    //! add has been redirected to new_add
    printf("res is %d\n", c);
    return 0;
}

Build command (macOS):

gcc -o portal ./main.c

Run the following command:

./portal

The command output is:

res is 30

At this point, the function call has been successfully forwarded.