Preface

Traditional mobile crawlers are generally based on a WebView: JS is injected to obtain cookies after login, and the server then uses a headless browser with those cookies to simulate the logged-in state and crawl data.

This approach is simple and effective, but for sites with anti-crawling measures (IP restrictions, detection of emulators or abnormal environments) crawling becomes difficult or even impossible.

Driven by business needs, the mobile crawler has evolved through four stages:

  • Cookie crawler (early sites had no anti-crawling mechanisms)
  • Cookie + local crawl (some sites added anti-crawling and could no longer be crawled from the server, so the HTML is fetched and parsed locally)
  • Cookie + local crawl + screenshot OCR (for heavily protected services that cannot be collected through a WebView login, the flow jumps to the target APP and screenshots are run through OCR)
  • TensorFlow + BroadCast Extension (capture the on-screen content in real time through screen recording, and use TensorFlow image recognition to extract the key pages, bypassing anti-crawl mechanisms)

What is BroadCast Upload Extension?

The BroadCast Upload Extension first appeared in iOS 10; at that time only in-app broadcast was supported, that is, recording the current APP.

At WWDC 2018 Apple updated this extension with ReplayKit 2, which supports iOS system broadcast, i.e. recording the whole iOS interface rather than a single APP; however, the recording had to be started from Control Center, which was still somewhat inconvenient.

iOS 12 introduced the RPSystemBroadcastPickerView class, which lets a button inside the APP bring up the Control Center screen-recording picker.
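For reference, a minimal sketch of bringing up the picker from an in-app button; the frame, the method name and the extension bundle identifier below are placeholders, not the demo's code:

#import <ReplayKit/ReplayKit.h>

// Adds the system broadcast picker button to a container view (iOS 12+)
- (void)addBroadcastPickerToView:(UIView *)container {
    if (@available(iOS 12.0, *)) {
        RPSystemBroadcastPickerView *picker =
            [[RPSystemBroadcastPickerView alloc] initWithFrame:CGRectMake(0, 0, 60, 60)];
        // Restrict the picker to our own upload extension (placeholder bundle id)
        picker.preferredExtension = @"com.example.app.BroadcastUpload";
        picker.showsMicrophoneButton = NO;
        [container addSubview:picker];
    }
}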

Client flow:

Architecture:

  • **Screen recording plug-in:** the iOS native screen recording extension; grabs the latest screen frame and passes it to the middleware.
  • **Middleware:** keeps the latest frame passed by the plug-in and is responsible for processing frame objects and sending messages (heartbeat requests, frame requests).
  • **APP:** classifies the received image objects and matches the result against the crawl steps (whether server OCR is required, what the next key page is).

The plug-in side keeps pushing the latest video frame to the middleware. After the previous POST request completes, the middleware takes the latest frame, converts it into an image object and sends it. A setting field in the configuration file controls the delay between the completion of one POST request and the next image conversion.
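A sketch of that request-driven loop is shown below; the helper names (imageFromLatestBuffer, postImage:completion:, frameDelay) are illustrative, not the demo's actual API:

// After each POST completes, wait the configured delay, then convert and
// send whatever frame is the newest at that moment.
- (void)sendLatestFrame {
    UIImage *image = [self imageFromLatestBuffer]; // conversion shown in the middleware section
    if (image == nil) {
        return;
    }
    [self postImage:image completion:^{
        // frameDelay comes from the configuration file (assumed field name)
        dispatch_time_t when = dispatch_time(DISPATCH_TIME_NOW,
                                             (int64_t)(self.frameDelay * NSEC_PER_SEC));
        dispatch_after(when, dispatch_get_main_queue(), ^{
            [self sendLatestFrame];
        });
    }];
}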

After a frame has been converted into an image it is compressed (TensorFlow also scales the image down before detection, so that work is done here in advance) to keep POST requests small and fast.

To keep the memory footprint as low as possible (a screen recording extension can use at most about 50 MB before it is killed), the frequency of image conversion is throttled, image requests are compressed, and the image conversion is wrapped in an @autoreleasepool.
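A minimal sketch of that memory-conscious conversion step, using the buffer-to-image converter shown later in the middleware section; the method name and the JPEG quality of 0.5 are assumptions, not the demo's values:

// Convert the latest frame inside an @autoreleasepool so the large temporary
// image objects are released immediately, then compress it for the POST body.
- (NSData *)compressedDataFromSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    NSData *jpegData = nil;
    @autoreleasepool {
        UIImage *image = [MXSampleBufferManager getImageWithSampleBuffer:sampleBuffer];
        jpegData = UIImageJPEGRepresentation(image, 0.5); // assumed quality; tune for size vs. accuracy
    }
    return jpegData;
}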

Screen recording plug-in side

The screen recording plug-in uses the iOS system's BroadCast Upload Extension, which can be added via Project → Targets → + → Application Extension.

After the broadcast target is added to the project, a SampleHandler class is generated to receive callbacks from the system screen recording.

#import "sampleHandler. h" @interface SampleHandler() @end@implementation SampleHandler (void) broadcastStartedWithSetupInfo: (NSDictionary < > nsstrings *, NSObject * *) setupInfo {NSLog (@ "APP screen start"); } / / record the screen switch iOS APP > 11.2 (void) broadcastAnnotatedWithApplicationInfo applicationInfo (NSDictionary *) { NSLog(@" Switch APP in recording "); } // broadcastFinished - (void)broadcastFinished {NSLog(@" broadcastFinished "); ProcessSampleBuffer :(CMSampleBufferRef)sampleBuffer withType:(RPSampleBufferType)sampleBufferType { Switch (sampleBufferType) {case RPSampleBufferTypeVideo: / / record the screen image information callback / / here will get image information is passed to the middle class to handle break; Case RPSampleBufferTypeAudioApp: callback / / / / audio recording screen information Handle audio sample buffer for app audio break; Case RPSampleBufferTypeAudioMic: / / screen to record voice input information callback / / Handle audio sample buffer for mic audio break; default: break; } } @endCopy the code

This part does very little: it simply passes the captured video frame to the middleware class and refreshes the last frame the middleware has saved.

The middleware

The middleware is responsible for quite a lot: state synchronization, image conversion, and keeping the memory footprint down.

  • Convert CMSampleBufferRef to a UIImage object
  • Control the frequency of buffer-to-image conversion
  • Send the image to the main APP via HTTP request
  • Send a heartbeat packet to tell the APP that the plug-in is alive (a sketch follows the code below)
//
//  MXSampleBufferManager.m
//  Converts a CMSampleBufferRef into a UIImage object
//
//  Created by Joker on 2018/9/18.
//  Copyright © 2018 Scorpion. All rights reserved.
//

#import "MXSampleBufferManager.h"
#import <VideoToolbox/VideoToolbox.h>

#define clamp(a) (a > 255 ? 255 : (a < 0 ? 0 : a))

@implementation MXSampleBufferManager

+ (UIImage *)getImageWithSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, 0);

    size_t width  = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

    // Screen recording frames are biplanar YCbCr: plane 0 is luma, plane 1 is chroma
    uint8_t *yBuffer = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
    size_t yPitch = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
    uint8_t *cbCrBuffer = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 1);
    size_t cbCrPitch = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 1);

    int bytesPerPixel = 4;
    uint8_t *rgbBuffer = malloc(width * height * bytesPerPixel);

    for (int y = 0; y < height; y++) {
        uint8_t *rgbBufferLine  = &rgbBuffer[y * width * bytesPerPixel];
        uint8_t *yBufferLine    = &yBuffer[y * yPitch];
        uint8_t *cbCrBufferLine = &cbCrBuffer[(y >> 1) * cbCrPitch];

        for (int x = 0; x < width; x++) {
            int16_t y  = yBufferLine[x];
            int16_t cb = cbCrBufferLine[x & ~1] - 128;
            int16_t cr = cbCrBufferLine[x | 1] - 128;

            uint8_t *rgbOutput = &rgbBufferLine[x * bytesPerPixel];

            // Approximate BT.601 YCbCr -> RGB conversion
            int16_t r = (int16_t)roundf(y + cr * 1.4);
            int16_t g = (int16_t)roundf(y + cb * -0.343 + cr * -0.711);
            int16_t b = (int16_t)roundf(y + cb * 1.762);

            rgbOutput[0] = 0xff;
            rgbOutput[1] = clamp(b);
            rgbOutput[2] = clamp(g);
            rgbOutput[3] = clamp(r);
        }
    }

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(rgbBuffer, width, height, 8, width * bytesPerPixel,
                                                 colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaNoneSkipLast);
    CGImageRef quartzImage = CGBitmapContextCreateImage(context);
    UIImage *image = [UIImage imageWithCGImage:quartzImage];

    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    CGImageRelease(quartzImage);
    free(rgbBuffer);
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);

    return image;
}

@end
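The heartbeat mentioned in the list above could look roughly like this; the timer property, the interval, the port and the endpoint are assumptions, not the demo's values:

// Periodically POST a small request so the main APP knows the plug-in is alive.
- (void)startHeartbeat {
    self.heartbeatTimer = [NSTimer scheduledTimerWithTimeInterval:5.0
                                                           target:self
                                                         selector:@selector(sendHeartbeat)
                                                         userInfo:nil
                                                          repeats:YES];
}

- (void)sendHeartbeat {
    // Hypothetical endpoint on the main APP's local HTTP server
    NSURL *url = [NSURL URLWithString:@"http://127.0.0.1:8080/heartbeat"];
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    request.HTTPMethod = @"POST";
    [[[NSURLSession sharedSession] dataTaskWithRequest:request
                                     completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
        // The main APP resets its liveness timer whenever a heartbeat arrives
    }] resume];
}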

The main APP side

The main APP side handles the more business-related operations:

  • Run an HTTP server to receive requests (GCDAsyncSocket)
  • Feed the received image to TensorFlow for recognition and output the classify result
  • Match the classify result against the fetched configuration to decide the next step, and upload target key frames to the server for OCR (a sketch of this matching follows the list)
  • Detect whether the screen recording plug-in is still alive
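A rough sketch of the step matching; the step/config structure, the confidence threshold and the helper names are hypothetical, not taken from the demo:

// Decide what to do with a classified frame, based on the current step pulled
// from the server-side configuration.
- (void)handleClassifiedFrame:(UIImage *)image label:(NSString *)label confidence:(float)confidence {
    if (confidence < 0.8 || ![label isEqualToString:self.currentStep.expectedLabel]) {
        return; // not the key page we are waiting for, drop the frame
    }
    if (self.currentStep.needsServerOCR) {
        [self uploadImageForOCR:image]; // hypothetical upload helper
    }
    self.currentStep = self.currentStep.nextStep; // advance to the next expected key page
}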

How is the model trained?

Reference link: codelabs.developers.google.com/codelabs/te… The training project provided on the official site can be used to train a simple model. The demo project ships with a trained WeChat model.

Use effect

Demo

Github.com/yushengchu/…