Background

On August 9, 2018, Black Hat USA was held at the Mandalay Bay Convention Center in Las Vegas. With a 21-year history, this annual event is recognized as one of the world’s top information security conferences.

As a security team with years of hands-on experience and technical depth in China’s security field, Meituan Security Research Institute was once again invited to take part. With an acceptance rate below 20% for submitted talks, Meituan security engineer Ju Zhu and his partner presented “Art of Dancing with Shackles: Best Practice of App Store Malware Automatic Hunting System” on the international stage of Black Hat USA 2018.

Speaker

Ju Zhu currently works in the Basic Research Group of the Information Security Center, part of Meituan’s Basic Research Platform. He has more than 8 years of security research experience, including about 6 years of research on advanced threats such as 0-day and n-day exploits and vulnerability discovery. Ju Zhu has long focused on using automated systems to hunt for advanced threats in the wild. He has been credited with multiple CVEs and has been acknowledged repeatedly by Google, Apple, Facebook, and other vendors. He has been invited to speak at top security conferences in China and abroad, such as Black Hat and CSS.

Talk overview

Apple’s iOS is one of the most secure of the popular operating systems, which also makes it an important target and research subject for hackers. Attacks are difficult, but a successful one has very high commercial value, so iOS is “favored” by hackers.

Because of Apple’s distinctive business model, the closed-source nature of iOS, and the importance Apple attaches to security, iOS has a thorough security design that puts it well ahead of other operating systems. As a result, the security community has struggled to automate the large-scale capture of advanced threats on the platform. Defensive detection products, in turn, cannot obtain sufficient permissions, or even enough information, to detect deep-level attacks when end users face genuine APT attacks such as PEGASUS.

After in-depth research, this talk made a breakthrough in this direction. We designed a system that automatically collects application samples at scale, and built a low-cost, scalable automatic sandbox analysis system on clusters of Raspberry Pi devices. Together, these form an APT capture and analysis system that combines automatic sample collection with automatic security analysis.

First, let’s take a look at the overall architecture of the system.

Overall System Architecture

The iOS malware hunting system is divided into two distinct parts:

  • The first part, the App Crawl system, mainly collects newly released and existing apps from the App Store. Because they are also sources of infection, we collect applications from third-party app stores and even public malware repositories (such as VirusTotal) to enrich our malware database. In addition to apps, other potentially malicious file types (such as profiles) are also collected. (For in-the-wild iOS profile attacks, see my “Death Profile” presentation at Black Hat Asia 2018.)

  • The other part, the sandbox analysis system, dynamically tracks application behavior and correlates the behavior logs through a rule-based decision engine to produce a final verdict. The sandbox comes in several types: Frida-based real iOS devices, user-mode emulation on ARM servers (such as Raspberry Pi), and full-system emulation in VMs.

System structure

Specifically, the whole system is mainly composed of five modules.

  1. Automatic Crawl system: Automatically crawls apps from all app markets, including the App Store and third-party markets. Automatic login, purchase, and download were implemented through reverse analysis.
  2. App Crack system: Decrypts apps downloaded from the App Store so that the sandbox can analyze their dynamic behavior.
  3. Sandbox analysis system: Goes beyond the traditional sandbox design based on real iOS devices by adding a Raspberry Pi mode and a QEMU mode. These low-cost, scalable clusters dynamically monitor application runtime behavior such as File, Network, XPC, IOKit, and Profiled.
  4. Dynamic behavior tracking system: Collects the monitoring logs produced by samples running in the sandbox system.
  5. Decision engine system: Built on the open source Nools system, it judges sample behavior from the monitoring logs, either in real time or offline.

So how do these modules work together?

System operation process

  • First, the automatic crawler system constructs the login, purchase, and download requests, fetches the application from the iTunes server, and sends it to the Crack system. The Crack system then removes Apple’s DRM and generates IPA files that can run on jailbroken devices and emulators.
  • Next, to provide the IPA runtime environment and sandbox analysis system, we introduced two solutions. The first is the traditional approach of analyzing applications on a real jailbroken device. The second is the new approach of running and analyzing applications on Raspberry Pi-based emulator clusters.
  • Finally, a customized build on the open source Frida framework dynamically tracks the behavior of each IPA application, and the decision engine checks whether the IPA is malicious and whether an APT attack may be present.
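
The flow above can be pictured as a chain of stages that each take a sample and hand it to the next. The following is only a minimal sketch of that shape: `Sample`, `crawl`, `crack`, `run_in_sandbox`, `decide`, and `analyze` are hypothetical names, and every stage is a stub standing in for the real component (no store access, decryption, or tracing happens here).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    app_id: str
    encrypted: bool = True  # store downloads start out DRM-encrypted
    behavior_log: List[dict] = field(default_factory=list)
    verdict: str = "unknown"


def crawl(app_id: str) -> Sample:
    """Stand-in for login, purchase, and download from the store."""
    return Sample(app_id=app_id)


def crack(sample: Sample) -> Sample:
    """Stand-in for DRM decryption; here it only flips a flag."""
    sample.encrypted = False
    return sample


def run_in_sandbox(sample: Sample) -> Sample:
    """Stand-in for dynamic tracing; records one made-up file event."""
    if sample.encrypted:
        raise ValueError("sample must be decrypted before sandbox analysis")
    sample.behavior_log.append({"type": "file", "path": "/tmp/example"})
    return sample


def decide(sample: Sample) -> Sample:
    """Toy rule (an assumption): flag any sample that installed a profile."""
    types = {event["type"] for event in sample.behavior_log}
    sample.verdict = "suspicious" if "profile_install" in types else "clean"
    return sample


def analyze(app_id: str) -> Sample:
    # Crawl first, then pass the sample through each later stage in order.
    sample = crawl(app_id)
    for stage in (crack, run_in_sandbox, decide):
        sample = stage(sample)
    return sample


if __name__ == "__main__":
    print(analyze("com.example.app").verdict)
```

Keeping each stage as a function with the same signature makes it easy to swap the sandbox stage between a real device and an emulator without touching the rest of the chain.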

Below, we describe how each module works in turn.

First, crawling App Store apps is essentially gray-box reverse engineering of the iTunes protocol.

Our research showed that crawling the App Store through iTunes on a PC host involves the following basic steps:

  • The first step is to fetch metadata for the target application, such as its name, category, and size.
  • The second step is to log in with an Apple ID, purchase the product, use an iTunes-authorized PC to satisfy the download requirements, and save the application to local disk. Here we have to use a number of techniques to get past the App Store’s anti-crawling mechanisms.
  • The last step is to crack the downloaded application. Every app on the App Store is packed (encrypted) by Apple, which prevents security vendors from performing static and dynamic analysis, so the runtime memory of the target application must be dumped to recover the plaintext code.

Therefore, based on the above process, we can design the following architecture.

Automatic Crawl & Crack system architecture

The architecture covers crawling, Apple ID login, PC authorization, iOS device authorization, IPA signing, and cracking after installation. In effect, it is an automated system built around the iTunes Store applications.

The App Meta Information Crawler obtains application details, including download URLs and price information. The app download crawler then automatically downloads apps from these URLs. The downloaded applications are sent to jailbroken devices for decryption, and the decrypted output is used for later static and dynamic analysis.
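
As a rough illustration of the metadata step, the sketch below queries Apple’s public iTunes Search API lookup endpoint instead of the reverse-engineered iTunes protocol described in this article, so it is a stand-in and not our crawler. The function names are hypothetical, the JSON field names are those of the public API as I understand it, and that API does not expose the download URL that the real crawler obtains.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

LOOKUP_URL = "https://itunes.apple.com/lookup"  # public lookup endpoint


def build_lookup_url(app_id: int, country: str) -> str:
    """Build a per-region lookup URL; each storefront is queried separately."""
    return f"{LOOKUP_URL}?{urlencode({'id': app_id, 'country': country})}"


def parse_lookup_response(body: str) -> List[dict]:
    """Reduce a lookup response to the fields a crawler would store."""
    payload = json.loads(body)
    records = []
    for item in payload.get("results", []):
        size = item.get("fileSizeBytes")  # delivered as a string
        records.append({
            "app_id": item.get("trackId"),
            "name": item.get("trackName"),
            "category": item.get("primaryGenreName"),
            "size_bytes": int(size) if size else None,
            "price": item.get("price"),
        })
    return records


def fetch_metadata(app_id: int, country: str = "us") -> List[dict]:
    """Perform the network request (not exercised offline)."""
    with urlopen(build_lookup_url(app_id, country), timeout=10) as resp:
        return parse_lookup_response(resp.read().decode("utf-8"))
```

Separating URL construction and response parsing from the network call keeps the parsing logic checkable against saved responses.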

Automatic Crawl system

The Crawl system consists of three parts.

  1. Meta information crawler: The App Store is regionally restricted, i.e. an Apple ID from region A cannot download apps from region B. We therefore designed different spiders for different regions. The metadata obtained includes the App ID, download address, icon, and other basic information.
  2. App download crawler: By reverse analyzing multiple binaries and communication protocols, we construct the Apple ID login and purchase requests, which lets us automatically download the same IPA file that the iTunes client would download.
  3. Import DRM data: The IPA file downloaded above cannot be installed and run directly, because it lacks the Sinf file, a DRM data file containing authorization information. Apple keeps only one copy of each application. When a user purchases the app, the server dynamically generates the DRM information and returns it in the purchase response. iTunes or the App Store is then responsible for repackaging the DRM data into the IPA file. We can therefore simply save the Sinf data obtained earlier and write it into the downloaded IPA file.
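
The DRM import in step 3 comes down to adding one file to a ZIP archive, since an IPA is a ZIP file. The sketch below assumes the conventional location `Payload/<App>.app/SC_Info/<executable>.sinf`; the function names are hypothetical, and the Sinf bytes are simply passed in by the caller.

```python
import io
import zipfile
from typing import Iterable


def find_app_dir(names: Iterable[str]) -> str:
    """Return the 'Payload/<Name>.app' bundle directory inside an IPA."""
    for name in names:
        parts = name.split("/")
        if len(parts) >= 2 and parts[0] == "Payload" and parts[1].endswith(".app"):
            return "/".join(parts[:2])
    raise ValueError("no Payload/*.app bundle found")


def import_sinf(ipa_bytes: bytes, sinf_data: bytes, executable: str) -> bytes:
    """Return a copy of the IPA with the Sinf file written into SC_Info/."""
    src = zipfile.ZipFile(io.BytesIO(ipa_bytes))
    out_buf = io.BytesIO()
    with src, zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        app_dir = find_app_dir(src.namelist())
        # Assumed path layout; adjust if the target app differs.
        sinf_path = f"{app_dir}/SC_Info/{executable}.sinf"
        for info in src.infolist():
            if info.filename != sinf_path:  # drop any stale copy
                dst.writestr(info, src.read(info.filename))
        dst.writestr(sinf_path, sinf_data)
    return out_buf.getvalue()
```

The archive is rewritten and not appended to, so an existing Sinf entry is replaced instead of duplicated.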

Apps downloaded from the App Store are encrypted. This makes it difficult to analyze their behavior on jailbroken devices and emulators, so we also need to decrypt the IPA. Now let’s look at the technical essentials of the Crack system.
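
One simple way to tell whether a binary still needs decryption is to read the `cryptid` field of its Mach-O encryption load command: a non-zero value marks the encrypted range as still protected, and decryption tools commonly reset it to 0 after dumping. The sketch below is an illustration and not the Crack system itself; it handles only thin 64-bit little-endian Mach-O files, and `read_cryptid` is a hypothetical helper name.

```python
import struct
from typing import Optional

MH_MAGIC_64 = 0xFEEDFACF          # 64-bit Mach-O magic
LC_ENCRYPTION_INFO = 0x21         # 32-bit encryption info command
LC_ENCRYPTION_INFO_64 = 0x2C      # 64-bit encryption info command
HEADER_SIZE_64 = 32               # sizeof(struct mach_header_64)


def read_cryptid(macho: bytes) -> Optional[int]:
    """Return cryptid, or None if the binary has no encryption info command."""
    if len(macho) < HEADER_SIZE_64:
        raise ValueError("file too short for a Mach-O header")
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _rsv = struct.unpack_from(
        "<8I", macho, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O")
    offset = HEADER_SIZE_64
    for _ in range(ncmds):
        # Every load command starts with (cmd, cmdsize).
        cmd, cmdsize = struct.unpack_from("<2I", macho, offset)
        if cmd in (LC_ENCRYPTION_INFO, LC_ENCRYPTION_INFO_64):
            # Followed by cryptoff, cryptsize, cryptid.
            _off, _len, cryptid = struct.unpack_from("<3I", macho, offset + 8)
            return cryptid
        if cmdsize < 8:
            raise ValueError("malformed load command")
        offset += cmdsize
    return None
```

Fat (multi-architecture) binaries would need an extra step to pick a slice before calling this helper.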

App Crack system

This is DRM protection at work: if a user’s account has never been logged in on an iOS device, apps purchased with that account cannot run on that device. Once a user logs in to an account on a device, Apple treats the device as authorized by that user, and any app purchased with that account can be installed and run. We need to automate this login step.

By reverse engineering this setup, we found that StoreServices.framework manages the account information. We then wrote a tweak that uses undocumented APIs to implement the Apple ID login process.

With a large number of samples in hand, the next task is static and dynamic analysis. Static analysis tooling is already mature in the industry (MachOView, for example), so we will not cover it here. Dynamic analysis today is mainly based on Frida.

Frida is a powerful, portable hooking framework that supports both mobile platforms (such as iOS and Android) and desktop platforms (such as macOS). More importantly, hook points can be controlled from a script (such as JavaScript) without extra configuration or compilation, which has made it the most popular dynamic analysis framework. Of course, these setups all rely on real devices.

Next, let’s look at the traditional sandbox system built on real iOS devices.

Sandbox analysis system

Traditional real-device sandbox system (iOS devices)

As the figure above shows, the workflow of a Frida-based real-device (iOS) sandbox system has three main steps:

  • First, configure Frida on the iOS devices for behavior tracking.
  • Next, the Frida controller module triggers a sample run on the iOS device, or any other action (e.g., installing a profile or visiting a website in a browser), and tracks the system behavior of interest.
  • Finally, the behavior logs are collected on the host and become the input to the decision engine system, which judges sample behavior in real time or offline as needed.

Although Frida has long been the mainstream choice for dynamic app detection, it hits a serious bottleneck when a large number of samples or cases must be analyzed, because doing so requires a large investment in real iOS devices. Cost and scalability become fatal problems, so we replaced the real devices with low-cost Raspberry Pi boards and successfully implemented virtualization and clustering on them.
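
The last step, collecting behavior logs on the host, might look like the sketch below. It assumes a log format that this article does not specify: one JSON object per line with hypothetical `sample` and `type` keys. The helper `collect_events` is likewise an illustrative name.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_events(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group JSON-lines behavior events by the sample that produced them."""
    by_sample: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupted lines
        if not isinstance(event, dict):
            continue
        sample = event.get("sample")
        if sample is None or "type" not in event:
            continue  # an event must name its sample and its type
        by_sample[sample].append(event)
    return dict(by_sample)
```

Grouping by sample keeps the per-app event order intact, which matters when later rules depend on sequences of behavior.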

Raspberry Pi-based iOS virtual machine

On the virtualization side, we implemented a dynamic loader that loads iOS executables, and re-implemented the system libraries and frameworks that iOS executables need in order to keep running.

This makes it easy to dynamically monitor Mach-O behavior and submit the logs to the decision engine, which determines whether the application is malware or even part of an APT attack chain.

To reuse existing servers and simplify maintenance, the system can also be ported to run in QEMU.

With this “low-cost hardware emulator” design, large numbers of samples can be scanned automatically every day, which saves cost and improves scalability and detection efficiency.

Such an efficient sandbox analysis system inevitably produces a large volume of analysis logs, so we need a high-performance, real-time rule-based decision engine to make the final judgment.
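
To make the clustering idea concrete, the sketch below hands samples to whichever node is idle and collects the results. It is only an assumption about how such a dispatcher could be organized: `scan_on_cluster` and the `analyze(node, sample)` callback are hypothetical, and threads stand in for whatever transport the real cluster uses.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def scan_on_cluster(
    samples: Sequence[str],
    nodes: Sequence[str],
    analyze: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run analyze(node, sample) for every sample, one sample per node at a time."""
    if not nodes:
        raise ValueError("at least one node is required")
    idle: "queue.Queue[str]" = queue.Queue()
    for node in nodes:
        idle.put(node)

    def job(sample: str):
        node = idle.get()  # block until a node is free
        try:
            return sample, analyze(node, sample)
        finally:
            idle.put(node)  # hand the node back even if analysis fails

    # One worker thread per node, so at most len(nodes) samples run at once.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(pool.map(job, samples))
```

Under this scheme, adding capacity means adding entries to `nodes`, which is the scalability property the cluster design is after.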

Decision engine system

Nools is a rule engine based on the Rete algorithm and implemented in JavaScript. It supports real-time evaluation over continuous log input, and rules written for it are flexible and portable, which keeps our sample detection highly available.
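
To show what rule-based judgment over behavior facts looks like, here is a deliberately naive forward-chaining loop. It is neither Nools nor a Rete implementation (it re-checks every rule on every pass), and the `Rule` class, `infer` function, fact names, and the two example rules are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Set[str]], bool]  # tested against the known facts
    conclusion: str                        # fact asserted when the rule fires


def infer(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.condition(known):
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical rules: facts are behavior types seen in a sample's log.
RULES = [
    Rule("escalate", lambda f: {"suspicious_profile", "xpc"} <= f, "malicious"),
    Rule("profile", lambda f: {"profile_install", "network"} <= f, "suspicious_profile"),
]

if __name__ == "__main__":
    print(sorted(infer({"profile_install", "network", "xpc"}, RULES)))
```

Because conclusions are added back to the fact set, one rule can build on another regardless of the order in which the rules are listed. A Rete-based engine avoids the repeated re-checking done here by caching partial matches.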

Conclusion

Until now, the industry has had no good practice for automated hunting of advanced threats across large volumes of iOS samples. We have demonstrated the feasibility of an advanced threat hunting system built on automatic sample capture, automatic sandbox security analysis, and iOS virtualization. The resulting volume of sample detection cases and logs also lays the groundwork for introducing AI systems in the future.

About Meituan security

Most of the core developers in the security department of Meituan-Dianping Group have years of practical experience in the Internet and security fields. Many colleagues have helped build security systems at large Internet companies, and the team includes global security operations staff with offensive and defensive experience at the scale of millions of IDC machines. The department also has CVE “diggers”, speakers invited to top international conferences such as Black Hat, and, of course, many operations colleagues.

At present, the security department of Meituan-Dianping covers penetration testing, Web protection, binary security, kernel security, distributed development, big data analysis, and security algorithms, as well as the formulation of global compliance and privacy protection strategies. We are building an adaptive security system that spans million-scale IDC infrastructure and a mobile office network with hundreds of thousands of connected terminals. This system is built on a zero-trust architecture and spans a variety of cloud infrastructures, covering the network layer, virtualization/container layer, server software layer (kernel/user mode), language virtual machine layer (JVM/JS V8), Web application layer, and data access layer. On top of this, we can build an automatic security event awareness system based on big data and machine learning. Our goal is to build the industry’s most advanced built-in security architecture and defense-in-depth system.

With the rapid development of Meituan-Dianping and its growing business complexity, the security department faces more opportunities and challenges. We hope to put more security projects that represent industry best practice into production, offer a broad platform on which more security practitioners can grow, and create more opportunities to explore emerging areas of security.
