YYKit series source code analysis article:

  • YYText source code analysis: CoreText and asynchronous drawing
  • YYModel source code analysis: focus on performance
  • YYCache source code analysis: highlights
  • YYImage source code analysis: image processing skills
  • YYAsyncLayer source code analysis: asynchronous drawing
  • YYWebImage source code analysis: thread processing and caching strategy

Introduction

Performance optimization has always been a big part of iOS development, and interface fluency optimization is critical because it directly affects the user experience. From the most familiar and simple UIKit framework to CoreAnimation, CoreGraphics, CoreText, and even OpenGL, the optimizations seem endless and test the developer’s skill.

YYAsyncLayer is an asynchronous drawing library written by ibireme. Although it totals only about 300 lines, the code quality is high and it embodies a lot of optimization thinking that is worth learning from.

Many people who study good source code fall into the trap of reading it without truly understanding it.

Instead of taking the code at face value, we should ask why the author wrote it that way. The API surface is easy to read, so it shouldn’t be the main goal of reading source code. The depth of the questions you ask naturally determines your height as a developer.

Source code based on version 1.0.0.

I. Framework overview

The YYAsyncLayer library’s code is very clear, consisting of just a few files:

YYAsyncLayer.h (.m)
YYSentinel.h (.m)
YYTransaction.h (.m)
  • The YYAsyncLayer class inherits from CALayer and encapsulates asynchronous drawing logic on top of it for ease of use.
  • The YYSentinel class is a counting class designed to keep track of the most recent layout request identifiers so that unnecessary drawing logic can be discarded in a timely manner to reduce overhead.
  • The YYTransaction class is a transaction class that captures a timing callback to the main thread runloop to handle asynchronous draw events.

This may confuse some readers, but that’s fine; the code details are dissected later. For now, just get a general impression of the framework.

A quick look at the source code shows that the framework is used simply through YYAsyncLayer, a subclass of CALayer. (You need to implement the delegate method specified by the YYAsyncLayer class to manage the whole drawing process; see the framework’s README for detailed usage.)

II. Why asynchronous drawing?

1. The essence of interface stuttering

Every time the display on an iOS device draws a frame, a VSync signal is sent and the frame buffer is switched (iOS devices use double buffering + VSync). While the display reads and draws the GPU-rendered data from one frame buffer, the system also notifies the app, through mechanisms such as CADisplayLink, that it may submit results to the other, idle frame buffer. The CPU then computes the app’s layout and hands it to the GPU for rendering, and the GPU submits the rendered result to the frame buffer; when the next VSync arrives, the buffers are switched again…… (PS: the description above is the author’s understanding; see ibireme’s article on keeping the iOS interface smooth.)

When a VSync arrives to switch the frame buffers, if the idle frame buffer has not received a submission from the GPU, the switch is aborted and the device’s display system discards that drawing, causing a dropped frame.

Therefore, whenever either the CPU or the GPU fails to submit its result to the frame buffer in time, a frame is dropped. Optimizing interface fluency is essentially reducing the number of dropped frames (iOS devices target roughly 60 FPS), which means reducing CPU and GPU pressure to improve performance.

2. UIKit performance bottleneck

Most UIKit component drawing happens on the main thread and requires the CPU. When too many elements need to be drawn at once, or the components are too complex, the CPU comes under pressure and frames are easily dropped. (Text controls are the main culprit: laying out and drawing large amounts of text is quite complicated.)

3. UIKit alternatives: CoreAnimation or CoreGraphics

Of course, the preferred optimization is the CoreAnimation framework. Most of CALayer’s attributes are drawn by the GPU (hardware level) without any CPU (software level) drawing. CAShapeLayer (polygon rendering), CATextLayer (text rendering), and CAGradientLayer (gradient rendering) are all efficient and very practical.

Now consider the CoreGraphics framework, which is CPU-based software drawing. Implementing the CALayerDelegate method -drawLayer:inContext: (which UIView’s -drawRect: wraps) requires allocating a memory-hungry drawing context, and at the same time CALayer (or its subclass) must create an equally large contents. CPU-based software drawing also has to transmit its result to the device’s display system via IPC (interprocess communication). Note that this context must be erased and its memory reallocated on every redraw.

Whether it’s context creation, memory reallocation on redraw, or IPC, all carry significant performance costs. CoreGraphics performance is therefore relatively poor, and in daily development we should avoid drawing with it directly on the main thread when possible. In general, assigning a CGImage directly to CALayer’s contents, or using CALayer’s derived classes, covers most needs and makes full use of hardware support. Image processing, of course, can also be entrusted to the GPU.

4. The possibilities of multi-core devices

From the above, you can see CoreGraphics’ poor performance. The good news is that devices on the market are no longer single-core, which means time-consuming tasks can be handed to background threads, with the main thread responsible only for scheduling and display.

PS: thread performance on multi-core devices is discussed in the source code analysis below.

The CoreGraphics framework can draw content into a bitmap using an image context, and this can be done off the main thread. So when there are n drawing tasks, multiple threads can be opened to draw asynchronously in the background; when drawing succeeds, the resulting bitmap is assigned to CALayer’s contents property back on the main thread.

This is the core idea of the YYAsyncLayer framework, which has other highlights that will be explained later.

Although asynchronous drawing by multiple threads consumes a lot of memory, for performance-sensitive interfaces, interaction fluency can be greatly improved as long as engineers control the peak memory. Optimization often trades space for time; as the saying goes, you can’t have your cake and eat it too. This also shows that real-world optimization should be targeted, not done by blindly following trends.

III. YYSentinel

This class is very simple:

.h
@interface YYSentinel : NSObject
@property (readonly) int32_t value;
- (int32_t)increase;
@end

.m
@implementation YYSentinel { int32_t _value; }
- (int32_t)value { return _value; }
- (int32_t)increase { return OSAtomicIncrement32(&_value); }
@end

The -increase method atomically increments value via OSAtomicIncrement32().

OSAtomicIncrement32() is an atomic increment function and is thread-safe. In everyday development, if you need to keep integer variables thread-safe, you can use the functions in the OSAtomic family, which tend to perform better than various locks and read elegantly. (Note that on modern systems OSAtomic has been deprecated in favor of the standard atomics.)

The actual use of this class will be explained later.

IV. YYTransaction

YYTransaction looks a lot like the system’s CATransaction in that both are “transactions,” but in reality they are very different. From CATransaction’s support for nesting, we can guess that it manages tasks with a stack structure, whereas YYTransaction manages tasks with a set.

What YYTransaction does is record a series of events and invoke them at the right moment. As for why, you first need to understand what it does; eventually you will see 😁.

1. Submit tasks

YYTransaction has two properties:

@interface YYTransaction()
@property (nonatomic, strong) id target;
@property (nonatomic, assign) SEL selector;
@end
static NSMutableSet *transactionSet = nil;

Simple: a method receiver (target) and a method (selector). A YYTransaction is really just a task, and the global transactionSet stores these tasks. The commit method does nothing more than initial configuration and adding the task to the set.

2. The right time to call back

static void YYTransactionSetup() {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        transactionSet = [NSMutableSet new];
        CFRunLoopRef runloop = CFRunLoopGetMain();
        CFRunLoopObserverRef observer;
        observer = CFRunLoopObserverCreate(CFAllocatorGetDefault(),
                                           kCFRunLoopBeforeWaiting | kCFRunLoopExit,
                                           true,      // repeat
                                           0xFFFFFF,  // after CATransaction(2000000)
                                           YYRunLoopObserverCallBack, NULL);
        CFRunLoopAddObserver(runloop, observer, kCFRunLoopCommonModes);
        CFRelease(observer);
    });
}

An Observer is added to the main thread’s RunLoop, listening for kCFRunLoopBeforeWaiting and kCFRunLoopExit, i.e., it is called back when the main RunLoop is about to go to sleep or to exit. The observer’s priority is 0xFFFFFF, which places it after CATransaction (priority 2000000). (PS: I didn’t find official documentation confirming CATransaction’s priority.)

As you can see, the author uses a low-priority observer to hook into the main RunLoop and performs the asynchronous drawing after the important logic (i.e., the drawing tasks managed by CATransaction) has been processed, which is also a priority trade-off.

Here’s what happens inside the callback:

static void YYRunLoopObserverCallBack(CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    if (transactionSet.count == 0) return;
    NSSet *currentSet = transactionSet;
    transactionSet = [NSMutableSet new];
    [currentSet enumerateObjectsUsingBlock:^(YYTransaction *transaction, BOOL *stop) {
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warc-performSelector-leaks"
        [transaction.target performSelector:transaction.selector];
#pragma clang diagnostic pop
    }];
}

At a glance, it simply executes each task in the set. Note that transactionSet is first swapped out for a fresh set, so tasks committed from inside the callbacks are deferred to the next RunLoop cycle.

3. Customizing the hash

The YYTransaction class overrides the hash algorithm:

- (NSUInteger)hash {
    long v1 = (long)((void *)_selector);
    long v2 = (long)_target;
    return v1 ^ v2;
}

The default hash for an NSObject is its memory address. Here the author XORs the memory addresses of _selector and _target, meaning that as long as both addresses are the same, the hash is the same.

What’s the point of this?

There is a collection mentioned above:

static NSMutableSet *transactionSet = nil;

NSSet, like set types in other languages, is a hash-based collection that cannot contain duplicate elements, and duplicates are judged by hash (and equality). Here the hash of a YYTransaction depends on the memory addresses of _selector and _target, which means two things:

  1. For the same YYTransaction instance, as long as either the _selector or the _target memory address differs, it counts as two distinct values in the set.
  2. For different YYTransaction instances whose _selector and _target memory addresses are both the same, they count as a single value in the set.

Readers familiar with hashing will find this natural; so what business purpose does it serve?

Quite simply, it avoids repeated method calls. Events added to transactionSet are iterated when the RunLoop is about to sleep or exit; within one RunLoop cycle, a call with the same receiver (_target) and the same method (_selector) is considered a duplicate.

Here’s an example:

In YYTextView of YYText, the main purpose is to load the custom draw logic into transactionSet and then execute it uniformly when Runloop is about to end. The priority of Runloop callback avoids competing with system draw logic for resources. NSSet is used to merge multiple draw requests for a Runloop cycle into one.

V. YYAsyncLayer

@interface YYAsyncLayer : CALayer
@property BOOL displaysAsynchronously;
@end

YYAsyncLayer inherits from CALayer and exposes a property that toggles asynchronous drawing on or off.

1. Initial configuration

- (instancetype)init {
    self = [super init];
    static CGFloat scale; //global
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        scale = [UIScreen mainScreen].scale;
    });
    self.contentsScale = scale;
    _sentinel = [YYSentinel new];
    _displaysAsynchronously = YES;
    return self;
}

Here, the contentsScale of YYAsyncLayer is set to the screen’s scale (the ratio of physical pixels to logical pixels), so that the display resolution of different devices is fully used to draw a clear image. However, if contentsGravity is set to a stretching mode, Core Animation will override contentsScale.

A YYSentinel instance is also created.

@2x and @3x images

In fact, iPhone 4 and later devices have a scale of 2 or more, which means at least two physical pixels for each logical point. That’s why many designers will only export @2x and @3x assets for you, not @1x.

The @2x and @3x images are Apple’s mechanism for optimizing display: a device with scale 2 prefers the @2x image, and a device with scale 3 prefers the @3x image. This means CALayer’s contentsScale must match the device’s scale to achieve the desired effect (the same logical size on different devices).

Fortunately, UIView and UIImageView handle the contentsScale of their internal CALayer by default, so you don’t explicitly configure the contentsScale unless you use CALayer and its derivatives directly.

Overriding the drawing methods

- (void)setNeedsDisplay {
    [self _cancelAsyncDisplay];
    [super setNeedsDisplay];
}
- (void)display {
    super.contents = super.contents;
    [self _displayAsync:_displaysAsynchronously];
}

You can see two methods: -_cancelAsyncDisplay, which cancels drawing (its implementation is analyzed later), and -_displayAsync:, the core asynchronous drawing method.

2. The YYAsyncLayerDelegate protocol

@protocol YYAsyncLayerDelegate <NSObject>
@required
- (YYAsyncLayerDisplayTask *)newAsyncDisplayTask;
@end
@interface YYAsyncLayerDisplayTask : NSObject
@property (nullable, nonatomic, copy) void (^willDisplay)(CALayer *layer);
@property (nullable, nonatomic, copy) void (^display)(CGContextRef context, CGSize size, BOOL(^isCancelled)(void));
@property (nullable, nonatomic, copy) void (^didDisplay)(CALayer *layer, BOOL finished);
@end

YYAsyncLayerDisplayTask manages a drawing task. willDisplay and didDisplay are callbacks fired before and after drawing; most important is display, the block you must implement and put your business drawing logic inside.

The delegate is essentially a bridge between the framework and the business, but this design is a bit redundant in my opinion; interacting with the business directly through delegate methods, rather than through an intermediate class, might feel more natural.

3. The core logic of asynchronous drawing

With some code removed:

- (void)_displayAsync:(BOOL)async {
    __strong id<YYAsyncLayerDelegate> delegate = self.delegate;
    YYAsyncLayerDisplayTask *task = [delegate newAsyncDisplayTask];
    ...
    dispatch_async(YYAsyncLayerGetDisplayQueue(), ^{
            if (isCancelled()) return;
            UIGraphicsBeginImageContextWithOptions(size, opaque, scale);
            CGContextRef context = UIGraphicsGetCurrentContext();
            task.display(context, size, isCancelled);
            if (isCancelled()) {
                UIGraphicsEndImageContext();
                dispatch_async(dispatch_get_main_queue(), ^{
                    if (task.didDisplay) task.didDisplay(self, NO);
                });
                return;
            }
            UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
            UIGraphicsEndImageContext();
            if (isCancelled()) {
                dispatch_async(dispatch_get_main_queue(), ^{
                    if (task.didDisplay) task.didDisplay(self, NO);
                });
                return;
            }
            dispatch_async(dispatch_get_main_queue(), ^{
                if (isCancelled()) {
                    if (task.didDisplay) task.didDisplay(self, NO);
                } else {
                    self.contents = (__bridge id)(image.CGImage);
                    if (task.didDisplay) task.didDisplay(self, YES);
                }
            });
    });
    ...
}

For now, don’t worry about how the YYAsyncLayerGetDisplayQueue() function obtains an asynchronous queue, or about the early-exit checks that isCancelled() performs; these are covered later.

Stripped down, the core code is even smaller:

- (void)_displayAsync:(BOOL)async {
    ...
    dispatch_async(YYAsyncLayerGetDisplayQueue(), ^{
        UIGraphicsBeginImageContextWithOptions(size, opaque, scale);
        CGContextRef context = UIGraphicsGetCurrentContext();
        task.display(context, size, isCancelled);
        UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();
        dispatch_async(dispatch_get_main_queue(), ^{
            self.contents = (__bridge id)(image.CGImage);
        });
    });
    ...
}

Now it’s clear: a bitmap context is created on an asynchronous thread, the task’s display block is called to do the (business) drawing, and a bitmap is produced. Finally, back on the main queue, the CGImage is assigned to YYAsyncLayer’s contents; after GPU rendering, it is submitted to the display system.

4. End useless drawing in time

For the same YYAsyncLayer, it is very likely that a new drawing request arrives while the current drawing task is still unfinished; the current task is then useless yet would continue to consume CPU (and GPU) resources. This scenario mainly occurs during fast list scrolling, where the view-reuse mechanism makes redraw requests very frequent.

To solve this problem, the author uses repeated checks to end useless drawing in time. If you look at the source (or the core logic above), you will notice a frequent operation:

if (isCancelled()) { ... }

Look at the implementation of this code block:

YYSentinel *sentinel = _sentinel;
int32_t value = sentinel.value;
BOOL (^isCancelled)(void) = ^BOOL() {
    return value != sentinel.value;
};

This is where the YYSentinel counting class comes in. A local variable value captures the count when the current drawing pass begins, so other threads changing the global _sentinel cannot affect the captured value. If the captured value no longer equals the latest sentinel.value, the current drawing task has been abandoned and the logic must return promptly.

So, when do we change this count?

- (void)setNeedsDisplay {
    [self _cancelAsyncDisplay];
    [super setNeedsDisplay];
}
- (void)_cancelAsyncDisplay {
    [_sentinel increase];
}

Obviously, when a redraw request is submitted, the counter increments by one.

😁 I have to say, this is a genuinely exciting optimization technique.

5. Asynchronous thread management

I have removed the code that checks whether the YYDispatchQueuePool library exists; it is a queue-management wrapper extracted by the author, with the same idea as the code below.

static dispatch_queue_t YYAsyncLayerGetDisplayQueue() {
// Maximum number of queues
#define MAX_QUEUE_COUNT 16
    static int queueCount;
    static dispatch_queue_t queues[MAX_QUEUE_COUNT];
    static dispatch_once_t onceToken;
    static int32_t counter = 0;
    dispatch_once(&onceToken, ^{
        // Point 1: the number of serial queues equals the number of processors
        queueCount = (int)[NSProcessInfo processInfo].activeProcessorCount;
        queueCount = queueCount < 1 ? 1 : queueCount > MAX_QUEUE_COUNT ? MAX_QUEUE_COUNT : queueCount;
        // Point 2: create serial queues and set their priority
        if ([UIDevice currentDevice].systemVersion.floatValue >= 8.0) {
            for (NSUInteger i = 0; i < queueCount; i++) {
                dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_USER_INITIATED, 0);
                queues[i] = dispatch_queue_create("com.ibireme.yykit.render", attr);
            }
        } else {
            for (NSUInteger i = 0; i < queueCount; i++) {
                queues[i] = dispatch_queue_create("com.ibireme.yykit.render", DISPATCH_QUEUE_SERIAL);
                dispatch_set_target_queue(queues[i], dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0));
            }
        }
    });
    // Point 3: poll the queues
    int32_t cur = OSAtomicIncrement32(&counter);
    if (cur < 0) cur = -cur;
    return queues[(cur) % queueCount];
#undef MAX_QUEUE_COUNT
}
Point 1: The number of serial queues is equal to the number of processors

First, understand the difference between concurrency and parallelism: parallel implies concurrent, but concurrent does not imply parallel. On a single-core device, the CPU runs different threads by switching context frequently, fast enough that it looks like “parallel” processing, but we can only call this concurrent, not parallel. For example: you are running a 100-meter race against two other people; you keep switching lanes, while the other two each stay in their own lane, and all three of you reach the finish line at the same time. Viewing each lane as a task, the other two process their tasks in parallel, while you process yours merely concurrently.

So an n-core device can execute at most n tasks truly in parallel at any moment; in other words, at most n threads can run without competing with one another for CPU resources.

When the number of threads exceeds the number of processor cores, supposedly “parallel” threads actually compete for the same core’s resources, and the frequent context switching itself consumes processor time.

Therefore the author’s view is that threads beyond the number of processor cores bring no speed advantage; matching the core count keeps things easy to manage while maximizing processor utilization.

A serial queue, moreover, runs at most one thread at a time. In this framework the author uses as many serial queues as there are processor cores and distributes asynchronous tasks among them by polling, which effectively reduces thread-scheduling overhead.

Point 2: Create serial queues and set priorities

On iOS 8.0 and above, the queues are created with QoS class QOS_CLASS_USER_INITIATED, one level below QOS_CLASS_USER_INTERACTIVE (the class used for user interaction).

On systems below 8.0, dispatch_set_target_queue() targets each queue at the default-priority global queue, giving it DISPATCH_QUEUE_PRIORITY_DEFAULT priority (since the queues themselves are created serial, their tasks still execute one at a time).

The main queue can be assumed to run at QOS_CLASS_USER_INTERACTIVE or higher, so these serial queues have a lower priority than the main queue, which prevents the framework’s threads from competing with the main thread for resources.

The corresponding relationship between the priorities of the two types is as follows:

 *  - DISPATCH_QUEUE_PRIORITY_HIGH:         QOS_CLASS_USER_INITIATED
 *  - DISPATCH_QUEUE_PRIORITY_DEFAULT:      QOS_CLASS_DEFAULT
 *  - DISPATCH_QUEUE_PRIORITY_LOW:          QOS_CLASS_UTILITY
 *  - DISPATCH_QUEUE_PRIORITY_BACKGROUND:   QOS_CLASS_BACKGROUND
Point 3: Polling the queues

The local static variable counter is incremented with the atomic function OSAtomicIncrement32(), and a queue is then selected round-robin by taking the modulus.

Note the guard if (cur < 0) cur = -cur;: when cur overflows, it wraps around to a negative number (negative values are stored in two’s complement, i.e., the bitwise inverse of the positive value plus one), so the sign is flipped back before taking the modulus.

Why n serial queues instead of one concurrent queue?

One might wonder why we need n serial queues for scheduling rather than a single concurrent queue.

Because a concurrent queue cannot precisely control the number of threads, it may create too many, causing the CPU to schedule threads too frequently and hurting interactive performance.

It might be tempting to control concurrency with a semaphore (dispatch_semaphore_t), but that only limits the number of concurrent tasks, not threads, and isn’t elegant to use. With serial queues, you know exactly how many threads at most can be created; everything is under control.

This is the core idea of YYKit for thread processing.

Conclusion

I don’t know whether you now sense how much ground YYAsyncLayer’s 300 or so lines of code cover. Learning from good source code demands, and teaches, a lot of surrounding knowledge in the process of understanding it; that is precisely the value of good source code.

Take a dive into the art of code.