Today, with the rapid development of the Internet, UGC/PGC platforms such as live-streaming platforms, content communities and video websites are springing up like mushrooms. But they also bring huge security risks, and content moderation has become the most critical firewall for these companies. Companies hire legions of content moderators to deal with objectionable content ranging from pornography to violence to crime. Among these jobs, the porn moderator is probably the most mysterious and the one that most sets people's imaginations running.

Mention porn moderators and people tend to smile: looking at porn all day and earning a high income, who wouldn't want that? But is the job really that desirable from the practitioner's point of view? The Ali Gather Security editor interviewed Tang Qiu of the Alibaba Security multimedia algorithm team, a veteran "old driver" in charge of Alibaba Content Security (Ali Green Net). How did he master his "driving skills" after years of fighting on the front line?

From Porn Moderator to “Porn Moderator”

Tang Qiu told us that, as the job has developed, the identity of the porn moderator has changed.

The first generation of porn moderators, as you might expect, identified pornographic images and videos with their own eyes. The job is far from easy: they have to review a huge number of images and videos every day. Beyond the physical toll of long working hours, watching so much pornography, including distorted and extreme material, can cause psychological damage that affects their sex lives and marriages.


The second generation of "porn moderators" emerged as the cost of manual review kept rising and pornography ran rampant on the Internet.

This second generation evolved from human moderators into intelligent machine moderators. Using artificial intelligence, deep learning and big-data sampling, a model is trained on tens of thousands of normal and pornographic images to produce an intelligent porn-detection model.

Steps for building the intelligent porn-detection model

Of the steps shown above, setting the standards and labeling the data are harder than training the model, because the real world is complex and different people often perceive the same picture differently.

For example, how should a picture of a woman in swimwear be judged if the background is a beach, and what if the setting is indoors? And how should portraits of children be judged?

Setting these standards means working under a lot of pressure: a failure can bring public backlash and penalties from regulators. The operations and algorithm colleagues on the Alibaba Content Security team discussed and revised the first version several times, and during the subsequent labeling they supplemented it repeatedly as new problems came up, until the standard became stable.

Alibaba Content Security's intelligent porn detection is very simple to use: you submit an image or video, and the model returns a score between 0 and 100. The score is a non-linear indicator of how likely the content is to be sexually explicit. Images scoring 99 or above are almost certainly pornographic and can be handled automatically by the machine; scores from 50 to 99 require manual review; scores below 50 are considered normal, because the range from 50 up contains more than 99 percent of pornographic images.
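The score-to-action rule described above can be sketched as a small routing function. This is a minimal illustration, assuming a score on the 0 to 100 scale the article describes; the function name `route_image` and the action labels are hypothetical and are not part of any Alibaba Content Security API. Because the article's "50 to 99" and "99 or above" ranges both mention 99, the sketch assumes a score of exactly 99 is handled automatically.

```python
def route_image(score: float) -> str:
    """Map a 0-100 likelihood score to a moderation action.

    Thresholds follow the article; the returned labels are hypothetical.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 99:
        return "auto_handle"    # almost certainly pornographic: machine acts alone
    if score >= 50:
        return "manual_review"  # uncertain band: send to a human moderator
    return "pass"               # below 50: treated as normal


# Usage with a few hypothetical scores.
for s in (12, 50, 87.5, 99, 100):
    print(s, route_image(s))
```

The wide middle band reflects the trade-off in the text: the machine only acts on its own when it is nearly certain, while the low cutoff of 50 catches over 99 percent of pornographic images at the price of more human review.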

Alibaba Content Security image recognition results

Describing how the initial sample set was built, Tang Qiu shared some figures: nearly 2,000 websites, more than 60 million suspected pornographic images, and more than 13 million high-quality labeled images. He says this data is the most important building block of intelligent porn detection.

“Old Driver” Teaches “New Driver”

So here's a question: as the Chinese saying goes, once the apprentice has learned the trade, the master starves. Will that happen here?

"Teaching the apprentice actually makes the master's work easier. What's more, master and apprentice can learn from each other and improve together," Tang Qiu told the Ali Gather Security editor. Combining intelligent machine review with manual review is currently the mainstream approach to content moderation among Chinese Internet companies.

Intelligent review

Intelligent recognition can process hundreds of millions of images every day, which not only saves companies a great deal of labor cost but also greatly improves recognition accuracy. It can also detect content in audio, video, text and live streams, covering risks such as pornography, violent and terrorist content, politically sensitive content and advertising. Every video or post that is published is reviewed by both machines and humans.
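To see how machines and humans might share the load, here is a minimal sketch that sorts a batch of model scores into the three bands of the 0 to 100 scoring rule and reports what fraction would reach a human reviewer. The thresholds (99 and 50) come from the article; the function names `triage` and `manual_share` and the example scores are hypothetical.

```python
from collections import Counter
from typing import Iterable

AUTO_THRESHOLD = 99.0    # article: 99 or above is handled automatically
REVIEW_THRESHOLD = 50.0  # article: 50 to 99 goes to manual review


def triage(scores: Iterable[float]) -> Counter:
    """Count how many items fall into each band of a machine-first pipeline."""
    buckets: Counter = Counter()
    for s in scores:
        if s >= AUTO_THRESHOLD:
            buckets["auto"] += 1      # machine handles it alone
        elif s >= REVIEW_THRESHOLD:
            buckets["manual"] += 1    # queued for a human moderator
        else:
            buckets["pass"] += 1      # treated as normal
    return buckets


def manual_share(scores: Iterable[float]) -> float:
    """Fraction of items that a human reviewer must look at."""
    buckets = triage(scores)
    total = sum(buckets.values())
    return buckets["manual"] / total if total else 0.0


# Usage with a hypothetical batch of scores.
batch = [3, 12, 47, 55, 80, 99.5, 100, 20]
counts = triage(batch)
print(counts["auto"], counts["manual"], counts["pass"])  # 2 2 4
print(manual_share(batch))                               # 0.25
```

In this framing the human workload is simply the share of items landing in the middle band, so a model that pushes fewer items into that band directly reduces how much manual review is needed.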

To explore more intelligent review features, you can try the Ali Gather Security content security service for free (http://jaq.alibaba.com/green). Enterprise users can integrate it through a single access point at low cost, and it connects seamlessly with Alibaba Cloud products such as OSS and ECS. It supports content security for Alibaba's core businesses such as Taobao and Alipay. Weibo, Panda TV and Alipay currently all use Alibaba's content security capabilities, spanning social networking, live streaming, finance and other industries.

Manual review as a supplement

Although machine recognition keeps getting more accurate and can handle most review work, it still has limitations compared with manual review. It is hard for machines to simulate a normal user's experience, to understand the meaning behind content, and to make accurate "human judgments". In addition, in today's booming live-streaming and video industry, some content remains hard for machines to detect, so manual review is still needed.

However, as intelligent technology advances and recognition becomes more efficient, the share of manual review will keep shrinking. Artificial intelligence may eventually free traditional porn moderators and content reviewers entirely, and the job will finally evolve into its third generation.