Discussion on intrusion detection and protection strategies of large Internet enterprises

Preface

How do you know whether your business has been hacked? Is it that nobody has attacked you, or that you simply lack the visibility to detect it yet? Intrusion detection is a serious challenge that every large Internet enterprise must face. The more valuable a company is, the greater the threat of intrusion. Even Yahoo, one of the founding giants of the Internet, still suffered a massive data theft around the time of its acquisition. Security is no small matter: once an Internet company is successfully intruded upon, the consequences can be unimaginable.

Because attack and defense is an adversarial contest, this article will not describe specific intrusion detection models, algorithms, or strategies. Readers hoping to copy a ready-made detection strategy may be disappointed. We will, however, share part of our operational thinking. We welcome advice from peers; if it helps those who come after us, so much the better, and everyone is welcome to discuss it with us.

Definition of intrusion

Typical intrusion scenarios:

A hacker in some faraway place remotely controls a target’s laptop, mobile phone, server, or network device over the network, and then reads the target’s private data at will or uses functions of the target system. This includes, but is not limited to, using the phone’s microphone to eavesdrop on the target, using the camera to spy on the target, using the target device’s computing power for mining, and using the device’s network capacity to launch DDoS attacks. Another example is cracking the password of a service to access sensitive data or to take control of door-access systems or traffic lights. All of these are classic intrusion scenarios.

We can define intrusion as follows: hackers, without authorization, control and use our resources (including but not limited to reading and writing data, executing commands, and controlling resources) for various purposes. In the broad sense, a hacker exploiting a SQL injection vulnerability to steal data, obtaining the ISP account password for a target domain name in order to tamper with DNS and point it at a defacement page, or finding the target’s social accounts and taking unauthorized control of virtual assets on Weibo, QQ, or email, all fall within the scope of intrusion.

Intrusion detection for enterprises

In most cases, the scope of enterprise intrusion detection is narrower: it generally refers to hackers taking control of PCs, systems, servers, and networks (including the office network and the production network).

The most common way for hackers to control host assets such as PCs and servers is to execute commands through a shell. The act of obtaining a shell is called GetShell.

For example, a hacker can obtain a WebShell through an upload vulnerability in a Web service, or execute commands or code directly through an RCE vulnerability (an RCE environment is effectively a shell in disguise). Another typical approach is to plant a Trojan backdoor by some means and then use the shell function built into the Trojan to control the target remotely.

Intrusion detection can therefore focus on the act of GetShell and on the malicious behavior that follows a successful GetShell. To expand their gains, hackers mostly use the shell for reconnaissance, for searching out and stealing data, and for lateral movement against other internal targets; these behaviors, which differ from those of legitimate users, can also serve as important features.

Some peers, and some commercial products, like to report the external scanning, attack probing, and attempts that occur before GetShell, euphemistically calling this “situational awareness” and telling the enterprise that someone is “trying to attack”. In my view, the practical value is small. Many enterprises, Meituan included, are under attack by “unidentified” parties essentially all the time. Knowing that someone is “trying” to attack has little practical value beyond consuming effort if we cannot act on it effectively or raise effective warnings about it.

Once we accept that being “attacked” is the normal state of affairs, we should solve problems within that normal state: which hardening strategies can be used, and which of them can be operated as routine. A strategy that cannot be operated as routine, for example one that needs a large group of people working overtime in a temporary surge, will most likely fade away before long. In the end there is little fundamental difference between having done it and not.

Web attacks such as SQL injection and XSS that do not directly lead to GetShell are, for now, left out of the narrow sense of “intrusion detection”. We suggest classifying them under “vulnerabilities”, “threat perception”, and similar fields, to be discussed separately. Of course, when SQL injection, XSS, or another entry point is used to GetShell, we still focus on the key point of GetShell itself and do not care which vulnerability served as the entrance.

“Intrusion” and “mole”

A scenario close to intrusion is the “mole”, that is, the malicious insider. Intrusion itself is a means: GetShell is only the starting point, and the hacker’s goal is to control resources and steal data afterwards. A mole, by contrast, already holds legitimate authority and can legally access sensitive assets, but disposes of these resources illegally for purposes other than work, including making copies, transferring and leaking data, and tampering with data for profit.

A mole’s behavior is outside the scope of “intrusion detection”. It is generally managed and audited from the perspective of internal risk control, for example through separation of duties and dual review. Data security products such as DLP can also assist, but they are not covered in detail here.

Sometimes hackers know that employee A has access to the target assets, so they attack A and use A’s access to steal data. This is still classified as “intrusion”, since A is not a malicious mole. If we cannot catch the attack at the moment the hacker compromises A, or cannot distinguish the hacker stealing data through A’s account from employee A’s normal data access, then intrusion detection has also failed.

The nature of intrusion detection

As mentioned above, intrusion means that hackers can operate our assets without our consent, with no restriction on the means. Finding the difference between intrusion behavior and legitimate normal behavior, and separating the two, is what “intrusion discovery” means. In terms of algorithmic models, this is a labeling (classification) problem: intrusion versus non-intrusion.

Unfortunately, “black” samples of intrusion are very rare, so it is difficult to train a supervised intrusion detection model on a large amount of labeled data and let it discover the patterns of intrusion. Detection strategy developers therefore often need to invest a lot of time in refining more precise expression models, or spend even more effort constructing “intrusion-like” simulated data.

A classic example is WebShell detection. Security practitioners can search GitHub for publicly available WebShell samples, but these number fewer than 1,000, far short of the millions of samples that machine learning training tends to demand. Moreover, in terms of technique, the sample sets on GitHub contain many near-duplicate samples generated by a single technique, while samples of some evasion techniques are missing altogether. Training of this kind, which tries to let an AI learn WebShell features from “a large number of samples” and tell them apart, is therefore unlikely to be perfect in principle.

At this point, classifying the known samples by technique and extracting more precise expression models is what is called traditional feature engineering. Traditional feature engineering is often regarded as inefficient, repetitive work, but its results are usually fairly stable: each technical feature added reliably catches one class of WebShell. Constructing large numbers of malicious samples, by contrast, carries the halo of machine learning and AI but often struggles in real environments: automatically generated samples can hardly capture what a WebShell really is, and mostly describe the characteristics of the algorithm that generated them.

On the other hand, what distinguishes an intrusion is whether the behavior itself is “authorized”, and authorization in itself carries no obvious distinguishing feature. Therefore, if hardening can converge legitimate access onto a limited set of channels, and those channels are strongly differentiated, the cost of intrusion detection drops greatly. For example, strictly authenticate every access source: whether it is a natural person, an API, or an application, it must hold a legitimate ticket. When tickets are issued, authentication and authorization are performed along multiple dimensions depending on the situation, and IAM then records and monitors the scope each ticket can access. This also produces more low-level logs that can feed an abnormal-access model.

This full-life-cycle risk control model is also the premise and foundation on which Google’s BeyondCorp perimeterless network is implemented.

Therefore, there are two main ideas of intrusion detection:

Pattern matching based on “black” features (for example, WebShell keyword matching).
Anomaly comparison against a baseline model generated from historical business behavior (anything that is not “white” is treated as “black”). If the historical business behavior does not converge enough, use hardening to make it converge, and then pick out the minority of abnormal, non-compliant behaviors.
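
The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two regular expressions are hypothetical toy signatures (real rule sets are far larger and are deliberately not published here), and the baseline is simply the set of (host, process) pairs seen in history.

```python
import re
from collections import Counter

# Hypothetical "black" signatures for illustration only.
BLACK_PATTERNS = [
    re.compile(r"eval\s*\(\s*\$_(POST|GET|REQUEST)", re.IGNORECASE),
    re.compile(r"Runtime\.getRuntime\(\)\.exec\s*\(\s*request\.getParameter"),
]

def matches_black_feature(text):
    """Idea 1: flag text that matches any known-bad pattern."""
    return any(p.search(text) for p in BLACK_PATTERNS)

def build_baseline(history, min_count=1):
    """Idea 2, step 1: the baseline is every event seen at least min_count times."""
    counts = Counter(history)
    return {event for event, n in counts.items() if n >= min_count}

def find_anomalies(baseline, observed):
    """Idea 2, step 2: anything outside the baseline is treated as suspect."""
    return [event for event in observed if event not in baseline]

if __name__ == "__main__":
    print(matches_black_feature('<?php eval($_POST["c"]); ?>'))
    baseline = build_baseline([("web01", "nginx"), ("web01", "php-fpm")])
    print(find_anomalies(baseline, [("web01", "nginx"), ("web01", "bash")]))
```

The baseline approach only works when legitimate behavior is narrow enough to enumerate, which is why the text pairs it with hardening.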

Intrusion detection and attack vector

Depending on the target, the attack surface exposed to hackers differs, and the methods hackers can use may be completely different. For example, the attack and defense methods for breaking into our PCs and laptops differ greatly from those for breaking into servers deployed in a machine room or in the cloud.

For a specific “target”, the channels through which it can be accessed form a finite set, and so do the paths through which it can be attacked. The combination of an “attack method” and the “attack surface of the target” is called an “attack vector”.

Therefore, when discussing the effectiveness of an intrusion detection model, we must first specify the attack vector and collect the corresponding logs (data) for each attack path before we can build the corresponding detection model. For example, a data set of shell commands from SSH logins cannot be used to detect WebShell behavior, and data collected from network traffic cannot reveal whether a hacker typed any commands in the shell after logging in over SSH.
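
One way to make this concrete is to write down, for each attack vector, the data sources needed to observe it, and then check what is actually being collected. The sketch below assumes a hypothetical mapping; the vector names and source names are illustrative and not taken from any product.

```python
# Hypothetical mapping from attack vector to the data sources needed to observe it.
REQUIRED_SOURCES = {
    "webshell_upload": {"web_access_log", "web_dir_file_events"},
    "ssh_password_guessing": {"ssh_auth_log"},
    "commands_after_ssh_login": {"shell_command_audit"},
}

def coverage_gaps(collected_sources):
    """Return {attack_vector: sorted missing sources} for vectors we cannot fully observe."""
    collected = set(collected_sources)
    gaps = {}
    for vector, needed in REQUIRED_SOURCES.items():
        missing = needed - collected
        if missing:
            gaps[vector] = sorted(missing)
    return gaps

if __name__ == "__main__":
    # Network flow data alone says nothing about commands typed after an SSH login.
    print(coverage_gaps({"ssh_auth_log", "network_flow"}))
```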

Accordingly, if a company claims to have built a good APT detection model without naming specific scenarios, it is obviously “boasting”.

Intrusion detection must therefore first enumerate the attack vectors, collect data for each subdivided scenario (HIDS + NIDS + WAF + RASP + application-layer logs + system logs + PC……), and then build detection models that fit the company’s actual data characteristics. Different companies’ technology stacks, data scales, and exposed attack surfaces all have a significant impact on the models. For example, many security practitioners are particularly good at WebShell detection under PHP, but in a Java-based company……

Common methods of intrusion and response

Without a good understanding of the common methods of hacking, it is hard to defend in a targeted way, and sometimes we even fall into the trap of “political correctness”. For example, the penetration testing team says: “We performed action A and you did not detect it, so your detection does not work.” In reality, that scene may not form a complete intrusion chain, and failing to detect that one action may not affect the overall effectiveness of intrusion detection. Professional experience is needed to judge how much harm each attack vector does to the company, how to rank vectors by probability of occurrence, and what the costs and benefits of addressing each one are.

Here is a brief introduction to the classic flow found in hacking tutorials (refer to the kill chain model for the complete process):

Before intruding on a target, hackers may not know enough about it, so the first step is usually reconnaissance (“casing the target”): gathering information to deepen their understanding. For example, hackers need to know what assets the target has (domain names, IPs, services), what state each is in, whether there are known vulnerabilities, who manages them (and how they are legitimately managed), and what known leaks exist (such as passwords in social-engineering databases of leaked credentials)……

Once reconnaissance is complete, skilled hackers prepare “attack vectors” tailored to the characteristics of each asset and verify their feasibility one by one. Common attack methods and defense suggestions are listed below.

High-risk Service Intrusion

Every public service is a “high-risk” service, because known attack methods may exist for the protocol or for the open-source component that implements it (advanced attackers may even hold the corresponding 0day). As long as you are valuable enough, hackers have enough motivation and resources to dig. So the moment you expose a high-risk service to the Internet, open to everyone, you have opened the door to hackers.

For example, SSH, RDP, and similar operations-management services are designed for administrators: anyone who knows the password or key can log in to the server and complete the intrusion. Hackers may obtain credentials by guessing passwords (combining leaked social-engineering databases, web searches, and brute-force cracking). This kind of attack is so common that hackers long ago built fully automated worm-like tools that scan the entire Internet. If a newly purchased cloud host is set up with a weak password, it is often infected by such a worm within minutes, simply because there are so many automated attackers.

Perhaps your password is very strong, but that is no reason to keep the service exposed to the Internet. We should restrict these ports so that only our own IPs (or an internal bastion host) can reach them, completely eliminating the possibility of hackers intruding through them.

Similarly, MySQL, Redis, FTP, SMTP, MSSQL, Rsync, and so on are services we use ourselves to manage servers, databases, and files, and none of them should be open to the Internet without restriction. Otherwise, worm-like attack tools can break into our services within minutes, or even encrypt our data outright and demand a Bitcoin ransom.

Some high-risk services also have RCE (remote command execution) vulnerabilities. Once their ports are open, hackers can use ready-made exploits to GetShell directly and complete the intrusion.

Prevention suggestion: Doing intrusion detection for each high-risk service individually is costly, because there are many such services and they may share no common features. Converging the attack entrances through hardening is therefore more cost-effective. Prohibiting all high-risk ports from being open to the Internet can reduce the probability of intrusion by more than 90%.
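
A hardening rule like this is only useful if it is checked continuously. The following is a minimal audit sketch, assuming we already have an inventory of listening sockets as (host, bind address, port) tuples; how that inventory is collected is outside the sketch. The port list follows the services named above with their conventional default ports, and treating a bind on the unspecified address (0.0.0.0) as possibly exposed is a deliberately conservative assumption, since real exposure also depends on firewalls and on whether the host has a public interface.

```python
import ipaddress

# Conventional default ports of the services named in the text.
HIGH_RISK_PORTS = {
    21: "FTP", 22: "SSH", 25: "SMTP", 873: "Rsync",
    1433: "MSSQL", 3306: "MySQL", 3389: "RDP", 6379: "Redis",
}

def may_be_internet_facing(bind_ip):
    """Conservative guess: loopback and private binds are internal, everything else may be exposed."""
    addr = ipaddress.ip_address(bind_ip)
    if addr.is_loopback:
        return False
    if addr.is_unspecified:  # 0.0.0.0 or :: means "all interfaces"
        return True
    return not addr.is_private

def find_exposed_high_risk(listeners):
    """listeners: iterable of (host, bind_ip, port). Returns (host, port, service) findings."""
    findings = []
    for host, bind_ip, port in listeners:
        if port in HIGH_RISK_PORTS and may_be_internet_facing(bind_ip):
            findings.append((host, port, HIGH_RISK_PORTS[port]))
    return findings

if __name__ == "__main__":
    inventory = [
        ("db01", "0.0.0.0", 6379),    # Redis on all interfaces: flagged
        ("db02", "127.0.0.1", 6379),  # loopback only: fine
        ("ops01", "10.0.0.5", 22),    # private address: fine
    ]
    print(find_exposed_high_risk(inventory))
```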

Web intrusion

With high-risk ports hardened, many of the attack methods in a hacker’s knowledge base stop working. But Web services are the dominant form of service for modern Internet companies and cannot all be shut down. As a result, vulnerabilities in dynamic Web services built on PHP, Java, ASP, ASP.NET, Node, CGI programs written in C, and so on have become the main entrance for hackers.

For example, hackers can upload a WebShell directly through an upload function, execute a remote WebShell (or code) through a file inclusion function, use a code execution function as a shell entrance to run arbitrary commands, or upload a malicious sample to a feature that parses images or videos and trigger a vulnerability in the parsing library……

Application security under Web services is a specialized field (Dao Ge even wrote a book on it, “White Hat on Web Security”), and the specific attack and defense scenarios and countermeasures are well developed. Since all of these intrusions enter through Web services, the intrusion behaviors share some commonalities, and it is relatively easy to find differences between a hacker’s GetShell and normal business behavior.

For detecting traces of intrusion on Web services, we can collect WAF logs, access logs, auditd system call records or shell commands, and network-level response data, and extract from them the characteristics of attacks that succeeded. We suggest concentrating effort on these areas.
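
As one illustration of a post-GetShell trace, a Web server process that suddenly starts an interactive shell is often worth a look. The sketch below assumes process-creation events have already been parsed (for example from system call audit records) into dictionaries with hypothetical `host`, `parent`, and `child` keys; the two process-name sets are illustrative. Some businesses legitimately spawn shells from Web code, so in practice such a rule would need a baseline or whitelist.

```python
# Illustrative process names; real deployments would tune these to their own stack.
WEB_PARENTS = {"nginx", "httpd", "apache2", "php-fpm", "java", "node"}
SHELL_CHILDREN = {"sh", "bash", "dash", "zsh"}

def suspicious_spawns(events):
    """Return events where a Web-serving process spawned a shell."""
    return [
        e for e in events
        if e["parent"] in WEB_PARENTS and e["child"] in SHELL_CHILDREN
    ]

if __name__ == "__main__":
    events = [
        {"host": "web01", "parent": "php-fpm", "child": "bash"},
        {"host": "web01", "parent": "sshd", "child": "bash"},
        {"host": "web02", "parent": "java", "child": "java"},
    ]
    for e in suspicious_spawns(events):
        print(f"{e['host']}: {e['parent']} spawned {e['child']}")
```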

Zero-day intrusion

Judging from leaked toolkits, years ago the NSA already held 0day weapons that could directly attack services such as Apache and Nginx. This means an adversary may not need to care how our code and services are written at all: one 0day and they have GetShell.

For intrusion detection, however, this is not frightening: whatever vulnerability the adversary uses as the entrance, the shellcode it uses and its subsequent behavior still share common traits. Whether Apache is attacked through a 0day or a PHP page is exploited through a low-level coding flaw, the intrusion behavior may look exactly the same, and the same intrusion detection model can cover both.

It may therefore be more worthwhile to focus on the hacker’s GetShell entry and the behavior that follows than on the vulnerability entrance. Of course, specific vulnerabilities should still be followed up in practice to verify that the resulting behavior matches expectations.

Office Terminal Intrusion

Most APT reports describe hackers starting with people (office terminals): for example, sending an email that tricks us into opening it, taking control of our PC, observing and browsing for a long time, and then roaming the intranet after obtaining our legitimate credentials. Most of these reports therefore focus on the Trojans the hackers used and the code similarities within a malware family. Most anti-APT products and solutions likewise work at the system-call level on office terminals, using similar methods to detect the behavior of AV-evading Trojans.

Accordingly, a combination of EDR products, an email security gateway, behavior auditing at the office network egress, and the sandbox of an APT product can collect the corresponding data and support a similar intrusion detection model. The most important point is that hackers like to target key internal infrastructure, including but not limited to AD domain controllers, mail servers, password management systems, and permission management systems. Once these are captured, the hacker effectively becomes the “God” of the intranet and can do whatever they want. For the company, key infrastructure therefore deserves dedicated attack-and-defense hardening; Microsoft has even published a dedicated hardening white paper on AD attack and defense.

Basic principles of intrusion detection

A model that cannot have every alarm followed up thoroughly is equivalent to an invalid model. If alarms did fire before the defenders discovered the intrusion, but there were too many to follow up or to investigate thoroughly, the alarms are only “hindsight” and amount to having no detection capability. This is why security operators facing products that raise tens of thousands of alarms a day often feel helpless.

We have to suppress some of the similar alarms that recur so that we can focus on closing the loop on each remaining one. This creates whitelists, which means missed detections, so false negatives in a model are inevitable.

Since any single model will have false negatives, we must build multiple models along multiple dimensions so that they correlate and form defense in depth. Suppose static text analysis of WebShells is bypassed by hackers through obfuscation; malicious calls can still be monitored in the runtime environment by RASP. This lets us accept missed detections from individual models while still retaining overall detection capability.

Since every single-scenario model has false positives and false negatives, we need to weigh “cost-effectiveness” when deciding which scenarios to cover. For example, some obfuscated WebShells can be written to look so much like business code that even the human eye can hardly tell them apart, so pursuing an endless arms race in text analysis is a poor-value decision. Detecting them through RASP is more cost-effective and more feasible.

It is not easy to know all of a hacker’s attack methods, nor is it possible to build a strategy for every one of them (resources are always scarce). For key businesses, therefore, hardening is required (and its effectiveness must be monitored regularly) so that the paths open to hackers converge as far as possible and confrontation happens only at key links. At the very least, the core business should have a last line of defense.

Based on the above principles, we can see that we may never achieve 100% detection of intrusions at any single point, but by combining several methods we can make it very hard for an attacker to bypass every point.
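
A rough back-of-the-envelope calculation shows why layering pays off. The sketch below assumes, purely for illustration, that the layers miss independently of each other; real detection layers are often correlated, so the result should be read as an optimistic bound rather than a measurement. The per-layer rates are made-up numbers.

```python
from math import prod

def combined_detection_rate(layer_rates):
    """Probability that at least one layer detects, assuming independent layers."""
    return 1 - prod(1 - rate for rate in layer_rates)

if __name__ == "__main__":
    # Hypothetical per-layer rates, e.g. static text analysis, RASP, host behavior.
    layers = [0.7, 0.6, 0.5]
    print(f"single best layer: {max(layers):.2f}")
    print(f"all layers combined: {combined_detection_rate(layers):.2f}")
```

Under these assumptions, three mediocre layers together outperform a single layer pushed from 70% to 90%, which is the economic argument the text makes against chasing 100% at one point.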

When the boss or the blue army (the internal attack team) challenges us over a missed detection at a single point, pouring endless effort into that single point for the sake of “political correctness”, trying to push its detection rate to 100%, is most of the time an attempt to build a “perpetual motion machine”: a pure waste of manpower and resources that yields no real benefit. Spending the saved resources on laying out more links of defense in depth, cost-effectively, will clearly work better.

The mainstream form of intrusion detection products

Intrusion detection ultimately relies on data to build models. To detect WebShells, for example, we first identify the Web directories and then run text analysis on the files in them, which requires a collector. A detection model based on shell commands needs to capture every shell command, which may involve hooking system calls or hijacking the shell. Detection based on network IP reputation or traffic payloads, or content inspection at a mail gateway, may require a device deployed at the network boundary that collects traffic in bypass mode.

Some integrated solutions collect logs from many kinds of sensors, gather them into a SOC or SIEM, and then hand them to a big data platform for comprehensive analysis. Intrusion detection products in the industry can therefore be roughly divided into the following forms:

Host-based intrusion detection system (HIDS): After hackers compromise a host, the operations they perform on it may leave traces in logs, processes, commands, and network connections. A product that collects and analyzes these traces on the host is called a host-based intrusion detection system, HIDS for short.
- Typical products: OSSEC, Ivy Cloud, Security Knight, and Security Dog. Google also recently released an Alpha version of a similar product, Cloud Security Command Center. Some APT vendors also deploy a sensor or agent on the host, for example FireEye.
Network detection: Most attack vectors deliver a payload to the target over the network, or the protocol used to control the target has strong characteristics, so identification at the network layer has an advantage.
- Typical products: from Snort to commercial NIDS/NIPS products; in the APT space, products such as FireEye’s NX.
Centralized log storage and analysis: Products of this kind let hosts, network devices, and applications send their logs to a unified back end, where the various logs are analyzed together to determine whether the multiple steps of one intrusion path can be correlated. For example, the Web access log of host A shows scanning and attack attempts, then an unfamiliar process and network connection appear at the host layer, and finally host A makes lateral movement attempts against other hosts on the intranet.
- Typical products: LogRhythm, Splunk, and other SIEM products.
APT sandbox: Sandbox products are closer to a cloud version of advanced antivirus software. They run a sample in a simulated environment and observe its behavior, compensating for the weak static features of unknown samples. Because this requires actually simulating execution, the performance overhead is high, and in the early days it was considered a solution with poor cost-effectiveness. However, malicious files find it harder to hide their behavior than to evade feature matching, so sandboxes have now become a core component of APT products. Unknown samples obtained from network traffic, terminal collection, suspicious files extracted from servers, email attachments, and so on can all be submitted to the sandbox and run to determine whether they are malicious.
- Typical products: FireEye, Palo Alto, Symantec, Microstep.
Terminal intrusion detection products: For mobile devices there are currently no real products, and they are not particularly necessary. On PCs, antivirus software is the first necessity: if it can detect malicious programs, it can prevent intrusion to some extent. But antivirus software may be bypassed by advanced 0days and AV-evading Trojans. Borrowing the idea of HIDS on servers, the concept of EDR was born: in addition to local logic, the host collects more data and sends it to the back end for comprehensive analysis and coordinated response. Some say the next generation of antivirus software will include EDR capabilities, but at present the two are still sold separately.
- Typical products: antivirus software such as Bit9, SEP, Symantec, Kaspersky, and McAfee. EDR products are not enumerated here; Tencent iOA and Ali Alilang can play a similar role to some extent.

Evaluation index of intrusion detection effect

First, actively discovered intrusion cases / all intrusion cases = active discovery rate. This is certainly the most intuitive metric. The troublesome part is the denominator: many real intrusions never appear in the denominator if there is no external feedback and we did not detect them ourselves, so the measured discovery rate is always inflated. Who can guarantee that every current intrusion has been found? (In practice, as long as there are enough intrusions, whether reported to the SRC or surfacing as big news on the dark web, the intrusions that are objectively known can be put into the denominator, and an active discovery rate can always be calculated.)

In addition, real intrusions are in fact low-frequency events; if a large Internet enterprise were intruded upon hundreds of times a year, something would be badly wrong. So if no real intrusion case occurs for a long time, this metric does not change for a long time either, and it cannot show whether intrusion detection capability is improving.

Therefore, we generally introduce two indicators to observe:

Active discovery rate in blue army exercises
Coverage of known scenarios

High-frequency exercises in which the blue army actively attacks can make up for the low frequency of real intrusion events. But the attack methods the blue army has mastered are often limited too, and after many exercises its methods and scenarios may be exhausted. And if the defenders have not yet built detection for some scenario, having the blue army repeat the same move 100 times merely adds 100 undetected cases and gives the builders no further help. The coverage rate of known attack methods is therefore also a useful evaluation metric.
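
The two indicators can be computed as simple ratios. The sketch below uses hypothetical scenario names and made-up exercise records; it also reproduces the situation described above, where one uncovered technique repeated 100 times drags the exercise discovery rate down while the scenario coverage figure stays informative.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

def active_discovery_rate(cases):
    """cases: list of (technique, detected) pairs from real intrusions or exercises."""
    return ratio(sum(1 for _, detected in cases if detected), len(cases))

def known_scenario_coverage(known_scenarios, covered_scenarios):
    """Share of known attack scenarios for which a detection capability has been built."""
    known = set(known_scenarios)
    return ratio(len(known & set(covered_scenarios)), len(known))

if __name__ == "__main__":
    # One detected exercise plus the same uncovered technique repeated 100 times.
    exercises = [("webshell_upload", True)] + [("reverse_shell", False)] * 100
    print(active_discovery_rate(exercises))
    print(known_scenario_coverage(
        ["webshell_upload", "reverse_shell", "ssh_password_guessing"],
        ["webshell_upload", "ssh_password_guessing"],
    ))
```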

The intrusion detection team concentrates on assessing the priority of known attack methods and covering them quickly, and should apply its own professional judgment as to how far each one needs to be built out (see the “cost-effectiveness” principle among the basic principles of intrusion detection).

Declaring that intrusion detection capability has been built for a scenario requires meeting some basic acceptance criteria:

In this scenario, the daily average number of work orders is less than X and the peak is less than Y. At present, the daily average of all scenarios is
The same event generates an alert only the first time it occurs.
The model is able to learn from false positives.
Alarms are readable (clear risk description, key information, handling guide, and auxiliary information or indexes that support a verdict). Alarms in raw key-value form are discouraged; natural language describing the core logic and response process is recommended.
There is clear documentation and a self-test report (just like delivering an R&D product, product documentation and a self-test process are guarantees of quality).
There is a blue army live-exercise acceptance report for the scenario.
Calling WeChat, SMS, or similar interfaces to send alarms is not recommended (the difference between an alarm and an event is that an event can be closed-loop, while an alarm is only a reminder). A unified alarm-event framework can manage events effectively, ensure they are closed out, and provide long-term basic operational data such as stop-loss efficiency and false positive counts and rates.

The strategy developer’s documentation should state which situations the current model can detect and which it will fail to alert on (this tests the developer’s understanding of the scenario and of their own model). Based on that judgment, the developer can give the strategy’s maturity a self-assessed score, roughly estimated anywhere from 0 to 100. A single scenario can rarely reach 100, but that does not matter, because the marginal cost of going from 80 to 100 may be very high. Rather than pursuing the extreme, it is better to look at the whole picture and move on quickly to the next scenario.

If a scenario with a less-than-perfect score is frequently hit by real attacks and no cross-cutting strategy compensates for it, the self-assessment conclusion may need to be reviewed and the acceptance criteria raised. At the very least, cases actually encountered in the work should be solved first.

Key elements that affect intrusion detection

When discussing the factors that affect intrusion detection, we can briefly look at which failures have in the past prevented the defenders from actively discovering an intrusion:

Missing data that the model depends on: for example, HIDS was not deployed on the machine, the agent hung, the data reporting process dropped data or had a bug, or data was lost along the back-end transmission link.
A bug in the strategy script, or the script never started (in effect we had lost the detection capability of that strategy).
No strategy had been built yet (very often we only discover the scenario after the intrusion, before any strategy exists for it).
Insufficient sensitivity or maturity of the strategy (for example, the scan did not reach the threshold, or the WebShell used obfuscation to evade detection).
Errors in the basic data the model relies on, leading to a wrong judgment.
The alarm fired successfully, but the engineer handling emergency response misjudged it or did not follow up, or the auxiliary information was not enough to reach a verdict, so no action was taken.

So for an intrusion to be caught, the intrusion detection system must in fact run with high quality and high availability over the long term. This is highly specialized work that exceeds the ability and willingness of most security engineers. We therefore recommend assigning dedicated operations staff to own the following goals:

Completeness of data collection (end-to-end reconciliation across the whole link).
Every strategy running normally at all times (automated probe testing and monitoring).
Accuracy of basic data.
Usability of the work order operations platform and of the auxiliary tools for tracing and forensics.
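
The first goal, completeness of data collection, can be checked by reconciling the asset inventory against the hosts that are actually reporting. The sketch below is a minimal illustration: it assumes a hypothetical inventory of hostnames and a map of each host’s last agent heartbeat, and the ten-minute silence threshold is an arbitrary example value.

```python
from datetime import datetime, timedelta

def find_silent_hosts(inventory, last_heartbeat, now, max_silence=timedelta(minutes=10)):
    """Return inventory hosts that never reported or have been silent longer than max_silence."""
    silent = []
    for host in sorted(inventory):
        seen = last_heartbeat.get(host)
        if seen is None or now - seen > max_silence:
            silent.append(host)
    return silent

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    inventory = {"web01", "web02", "db01"}
    heartbeats = {
        "web01": now - timedelta(minutes=1),  # healthy
        "web02": now - timedelta(hours=3),    # agent probably hung
        # db01 never reported: agent not deployed
    }
    print(find_silent_hosts(inventory, heartbeats, now))
```

The same pattern extends to the second goal: periodically injecting a harmless test event per strategy and alerting when the expected alarm does not appear.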

Some readers may wonder: isn’t the effectiveness of the model the key factor affecting intrusion detection? Why is the list all these miscellaneous things?

In fact, the intrusion detection system of a large Internet enterprise may process hundreds of terabytes of data a day, or even more. It involves dozens of business modules and runs on hundreds of machines. In sheer scale it is no smaller than the entire data center of some small and medium-sized enterprises. A system this complex needs professional support from SRE, QA, and other roles if it is to be kept at high-availability standards over the long term. If we rely only on individual security engineers, it is hard for them to research attack and defense while also looking after the quality of basic data, the availability and stability of services, the discipline of changes at release time, the various operational metrics, and timely response to operational failures. The end result is that intrusions well within the system’s detection capability always seem to “just happen” to go undetected because of some accident.

Therefore, the author believes that the operational quality of most security teams is so poor that the contest never even reaches the round where strategy (technology) decides the outcome. Of course, once resources are invested and this supporting work keeps up, intrusion detection does need strategy.

At that point the questions become: with so many attack methods, why cover this scenario first? Why is building up to a certain level enough to meet current needs? Why choose to detect some samples and give up on countering others?

These seemingly subjective choices are a real test of professional judgment. They also make it easy to be labeled as “lacking a sense of responsibility” in front of leadership: you are finding excuses for difficulties instead of finding ways to reach the goal. Hackers have attacked with this technique many times, so why is it still unsolved? That technique is admittedly within view, so why must it wait until next year?

How to find APT?

An APT is an advanced persistent threat. Being advanced means that the Trojan is likely to be “kill-free” (undetectable by antivirus software or ordinary signatures), that the vulnerabilities exploited are advanced (even a target hardened to the teeth may not keep the attacker out), and that the attack techniques are advanced too (attack scenarios we may never have seen).

So APT is practically synonymous with an intrusion that cannot be detected. Yet the industry still has APT detection products. How do they do it?

For “kill-free” Trojans, they use sandboxes plus manual analysis. Even though this is inefficient, they still try to reach a verdict on each sample and quickly synchronize the resulting IOCs (threat intelligence) to other customers, so that a single detection gives every customer worldwide the same awareness.
They use anomaly detection models to identify suspicious IP communication relationships and payloads that they do not recognize. Of course, once something is flagged, operations staff still have to follow up carefully before a verdict can be reached.
For advanced attack methods, they still assume that hackers deliver them through known techniques such as spear phishing and watering-hole attacks. They then collect logs from email attachments, PC endpoints and other stages, and use UEBA (user and entity behavior analytics) to look for actions that deviate from a user’s normal behavior.

So what about us? The author has no good method for finding the legendary “kill-free” Trojan either. But we can extract features from the samples and behaviors produced by known attack frameworks (such as Metasploit and Cobalt Strike). We can also assume that a hacker has already taken control of a machine: when the hacker then tries to move laterally, we have models that identify the host’s lateral movement behavior.
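To make the idea of a lateral movement model concrete, here is a deliberately simple sketch. It is not the model used in practice. It assumes internal connection logs are available as (source, destination, port) tuples and that a baseline of previously seen peers exists for each host; the port list, the threshold and the function name are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Remote-administration ports that lateral movement frequently uses
# (SSH, SMB, RDP). The choice of ports is an illustrative assumption.
ADMIN_PORTS = {22, 445, 3389}

Conn = Tuple[str, str, int]  # (source host, destination host, destination port)


def lateral_movement_suspects(
    baseline: Dict[str, Set[str]],
    window: Iterable[Conn],
    min_new_peers: int = 3,
) -> List[Tuple[str, List[str]]]:
    """Flag hosts that reach at least `min_new_peers` internal hosts they
    have never contacted before, over admin ports, within one window."""
    new_peers: Dict[str, Set[str]] = defaultdict(set)
    for src, dst, port in window:
        if port not in ADMIN_PORTS:
            continue  # only admin protocols are considered here
        if dst in baseline.get(src, set()):
            continue  # src has talked to dst before: treated as normal
        new_peers[src].add(dst)
    return [
        (src, sorted(peers))
        for src, peers in sorted(new_peers.items())
        if len(peers) >= min_new_peers
    ]


if __name__ == "__main__":
    baseline = {"app-01": {"db-01"}}
    window = [
        ("app-01", "db-01", 22),     # known peer, ignored
        ("app-01", "app-02", 22),
        ("app-01", "app-03", 445),
        ("app-01", "app-04", 3389),
        ("app-02", "db-01", 443),    # not an admin port, ignored
    ]
    print(lateral_movement_suspects(baseline, window))
```

A rule this simple would be noisy on jump hosts and deployment servers, so a real model would need an allow list and richer features. The point is only that behavior after the initial compromise leaves a trace even when the Trojan itself does not.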

The author believes that no method in the world finds APTs 100% of the time. However, we can wait for the APT team to make a mistake. As long as our defense is deep enough and the information asymmetry is large enough, it is very hard for them to avoid tripping every one of our alarm bells.

Even if the attackers succeed, having to carefully avoid all of the detection logic puts psychological pressure on them, which may slow their approach to the target and stretch out the timeline. And if they make a mistake during that time, it is our turn.

All of the high standards described earlier, including high coverage, low false positives, following every alert through to the end, and an attitude of digging until the bottom is reached, exist for this moment. The sense of achievement that comes from catching an opponent worthy of respect is something to savor.

Therefore, I hope that all peers working on intrusion detection can stick with it. Even after hearing “wolf!” cried countless times, we should still meet the next alert with the utmost respect for the opponent (the alerts have let me down a thousand times, yet I still await each one like a first love).

The right way to use AI in intrusion detection

For the last two years, it seems no story is complete without AI. However, as the concept of AI has become popular, many people have also put traditional data mining and statistical analysis, including classification, prediction, clustering and association algorithms, under the AI label.

In fact, AI is a modern method that delivers very practical results in many places. Take WebShell text analysis: sorting thousands of samples into dozens of technique types can take a very long time, and building a model for each type one by one takes far longer (yes, in this scenario feature engineering really is a long-drawn-out job).

With AI, however, once the data is labeled, training and parameter tuning quickly yield a model that is not too overfitted in the lab environment and can be put into production quickly. With some experience, the whole thing can be done in 1-2 months.

In this scenario, AI as a modern method can greatly improve efficiency. The problem is that, as mentioned above, malicious samples of hacker attacks, WebShell samples included, are often extremely scarce, so they cannot describe the full characteristics of hacker intrusions. The output of AI, in both false positive rate and false negative rate, is therefore strongly affected by the training method and the input samples. We can use AI, but we must not hand the job over to it entirely.

A common phenomenon in the security field is that turning a scenario into a labeled problem is harder than solving the labeled problem with a mathematical model. In such cases, security experts usually need to go first and algorithm experts follow, rather than leaving the algorithm experts to work alone.

For a specific attack scenario, how the corresponding intrusion data is collected, how the difference between intrusion actions and normal behavior is reasoned about, and how features are extracted often determine the final effect of the model. The features set the upper limit of the effect, and the algorithm only determines how close the model gets to that limit.
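As a hedged illustration of what feature extraction can look like for WebShell text, the sketch below computes three handcrafted features from a source file. The feature set, the list of function names and the helper names are assumptions chosen for the example, not the features of any real detection model.

```python
import math
import re
from collections import Counter
from typing import Dict

# Function names often abused by PHP WebShells. The list is illustrative
# only; a real feature set would come from a security expert's analysis.
SUSPICIOUS_CALLS = ("eval", "assert", "system", "base64_decode")

_CALL_RE = re.compile(r"\b(" + "|".join(SUSPICIOUS_CALLS) + r")\s*\(")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; encoded payloads tend to score high."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(source: str) -> Dict[str, float]:
    """Turn one source file into a small numeric feature vector."""
    lines = source.splitlines() or [""]
    return {
        "entropy": shannon_entropy(source),
        "longest_line": float(max(len(line) for line in lines)),
        "suspicious_calls": float(len(_CALL_RE.findall(source))),
    }


if __name__ == "__main__":
    sample = "<?php eval(base64_decode($_POST['x'])); ?>"
    print(extract_features(sample))
```

Whatever algorithm consumes these vectors can only separate samples as well as the features allow, which is why the choice of features is where security expertise enters.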

The author once saw a case in which an AI team produced a WebShell model that performed excellently in the lab, with a false positive rate of 1/1,000,000. When first deployed to production, however, it produced an average of 6,000 alerts per day, which made it impossible to operate, and it also missed many real cases. These problems were gradually resolved as the security team and the AI engineers worked together, but the model still never managed to replace the original feature-engineered model.
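The gap between the two numbers in this case can be made explicit with a little arithmetic. The daily scan volume below is a purely hypothetical assumption (the case does not state it), and the sketch treats all 6,000 alerts as false positives, which is an upper bound rather than a measured fact.

```python
from fractions import Fraction


def expected_false_alarms(fpr: Fraction, daily_scans: int) -> Fraction:
    """False alarms per day that a given false positive rate predicts."""
    return fpr * daily_scans


def implied_fpr(daily_false_alarms: int, daily_scans: int) -> Fraction:
    """False positive rate implied by an observed daily false-alarm count."""
    return Fraction(daily_false_alarms, daily_scans)


LAB_FPR = Fraction(1, 1_000_000)   # from the case described above
DAILY_ALARMS = 6000                # from the case described above
ASSUMED_DAILY_SCANS = 100_000_000  # hypothetical volume, for illustration only

if __name__ == "__main__":
    predicted = expected_false_alarms(LAB_FPR, ASSUMED_DAILY_SCANS)
    production = implied_fpr(DAILY_ALARMS, ASSUMED_DAILY_SCANS)
    print("alarms/day predicted by the lab rate:", predicted)
    print("production rate relative to lab rate:", production / LAB_FPR)
```

Under the assumed volume, the lab rate predicts 100 false alarms per day, so 6,000 alerts would correspond to a production false positive rate about 60 times higher. Put differently, the lab rate could only account for 6,000 daily alerts if the system scanned 6 billion files per day. A rate measured on lab samples says little about production until it is combined with the real data distribution and volume.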

There are many products and articles in the industry about applying AI, but unfortunately most of them only scratch the surface and have never been proven operationally in a real environment. Once we hold them to the standards described earlier, we find that AI, while a good thing, is definitely still a work in progress. Real operations often need traditional feature engineering and AI running in parallel, along with continuous iteration.

The future will surely belong to AI, but for now, however much intelligence there is, it may take just as much manual work up front. We are willing to keep exploring this road and to share more with our peers along the way.

About Meituan security

Most of the core developers in the Meituan Security department have years of practical experience in the Internet and security fields. Many team members have taken part in building the security systems of large Internet companies, and many are globally experienced security operations specialists with offensive and defensive experience at million-level IDC scale. The department also has CVE experts, including speakers invited to top international conferences such as Black Hat, and of course, many beautiful management girls.

At present, Meituan Security covers penetration testing, Web protection, binary security, kernel security, distributed development, big data analysis, security algorithms, and global compliance and privacy protection strategies. We are building an adaptive security system for million-level IDC scale and a mobile office network with hundreds of thousands of connected terminals. The system is built on a zero-trust architecture and spans a variety of cloud infrastructures, covering the network layer, the virtualization/container layer, the server software layer (kernel/user mode), the language virtual machine layer (JVM/JS V8), the Web application layer, the data access layer, and so on. On top of it, we can build an automated security event awareness system based on big data and machine learning. We strive to build the industry’s most advanced built-in security architecture and defense-in-depth system.

With the rapid development of Meituan and the growing complexity of its business, the security department faces more opportunities and challenges. We hope to put into practice more security projects that represent the industry’s best practices, to provide a broad platform on which more security practitioners can grow, and to offer more opportunities to explore emerging areas of security.

【 A small advertisement 】

The Meituan Security department is recruiting for Web & binary attack and defense, backend & system development, machine learning & algorithms, and other roles. If you would like to join us, please send your resume to zhaoyan17@meituan.com

Specific job information can refer to here

Meituan Security Emergency Response Center MTSRC homepage: security.meituan.com