Founded in March 2012, Toutiao is only four years old. Its engineering team has grown from about a dozen engineers to more than 200. Its product line has expanded from Neihan Duanzi (a jokes app) to Toutiao (Today’s Headlines), Today’s Special Sale, Today’s Movies, and other products.

I. Product background

Toutiao is a personalized news and information app. Here are Toutiao’s current figures (compiled from internal and public data):

  • 500 million registered users

150 million in May 2014, 300 million in May 2015, and 500 million in May 2016, almost doubling each year.

  • 48 million daily active users

Daily active users numbered 10 million in 2014 and 30 million in 2015.

  • 500 million page views (PV) per day

500 million article views and 100 million video views. Page requests exceed 3 billion.

  • Users spend more than 65 minutes in the app

1. Article capture and analysis

Toutiao collects about 10,000 original news articles every day from major news websites and local websites, along with some novels, blog posts, and other articles. Writing a crawler is not difficult for engineers.

Toutiao then manually filters out sensitive articles. In addition, Toutiao has a number of original articles in its selection queue.

Next, we run text analysis on each article, such as classification, tagging, and topic extraction, and compute values such as the region the article or news item relates to, its popularity (heat), and its weight.
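
As a rough illustration of this analysis step, here is a minimal sketch that tags an article by term frequency and computes a time-decayed heat value. The stopword list, the function names, and the half-life formula are all assumptions made for the example; the source does not describe Toutiao's actual algorithms.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def extract_tags(text: str, top_n: int = 3) -> list[str]:
    """Return the top_n most frequent non-stopword terms as tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def heat_score(clicks: int, age_hours: float, half_life_hours: float = 12.0) -> float:
    """Clicks discounted by exponential time decay (an assumed formula)."""
    return clicks * math.pow(0.5, age_hours / half_life_hours)


article = "Storm hits the coast. The storm closes coast roads and schools."
print(extract_tags(article))   # ['coast', 'storm', 'closes']
print(heat_score(1000, 12.0))  # 500.0
```

A production pipeline would use trained classifiers rather than raw term counts, but the shape is the same: each article goes in, and a set of labels and scores comes out.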

2. User modeling

Once a user starts using Toutiao, their action logs are analyzed in real time. The following tools are used to collect and transport the logs:

– Scribe

– Flume

– Kafka

We mine users’ interests by learning from every action they take. The main tools used are:

– Hadoop

– Storm

The resulting user model data is stored in MySQL/MongoDB (with read-write separation) and cached in Memcached/Redis, as in most architectures.
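
The flow from action logs to a stored user model can be sketched as follows. This is only an illustration under stated assumptions: plain dictionaries stand in for MySQL/MongoDB and Memcached/Redis, and the action weights and class name are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights per action type; the source does not give real values.
ACTION_WEIGHTS = {"click": 1.0, "share": 3.0, "dislike": -2.0}


class UserModelStore:
    """Cache-aside store: dicts stand in for MySQL/MongoDB and Memcached/Redis."""

    def __init__(self) -> None:
        self.database: dict[str, dict[str, float]] = {}  # persistent store stand-in
        self.cache: dict[str, dict[str, float]] = {}     # cache stand-in

    def update_from_actions(self, user_id: str, actions: list[tuple[str, str]]) -> None:
        """Fold (action, tag) log entries into the user's interest weights."""
        model = defaultdict(float, self.database.get(user_id, {}))
        for action, tag in actions:
            model[tag] += ACTION_WEIGHTS.get(action, 0.0)
        self.database[user_id] = dict(model)
        self.cache.pop(user_id, None)  # invalidate so readers do not see stale data

    def get(self, user_id: str) -> dict[str, float]:
        """Read from the cache first and fall back to the database on a miss."""
        if user_id not in self.cache:
            self.cache[user_id] = dict(self.database.get(user_id, {}))
        return self.cache[user_id]


store = UserModelStore()
store.update_from_actions(
    "u1", [("click", "sports"), ("share", "sports"), ("dislike", "finance")]
)
print(store.get("u1"))  # {'sports': 4.0, 'finance': -2.0}
```

Invalidating the cache on write, rather than updating it, keeps the write path simple at the cost of one extra database read after each update.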

As the number of users keeps growing, the machine cluster that processes user models has grown as well, to about 7,000 machines before 2015. The user recommendation model includes the following dimensions:

1 User subscriptions

2 Tags

3 Breaking up some articles so that they are pushed spread out rather than clustered

At this point, recommendations have to be made continuously.
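
The three dimensions above can be combined in a small sketch: score candidates by subscription and tag match, then break up the ranked list so that one channel does not fill the feed. The scoring formula, the round-robin strategy, and the field names are assumptions for illustration, not Toutiao's actual model.

```python
from collections import defaultdict, deque


def score(article: dict, subscriptions: set[str], tag_weights: dict[str, float]) -> float:
    """Subscription bonus plus the user's weights for the article's tags."""
    bonus = 1.0 if article["channel"] in subscriptions else 0.0
    return bonus + sum(tag_weights.get(tag, 0.0) for tag in article["tags"])


def break_up(ranked: list[dict]) -> list[dict]:
    """Round-robin across channels so adjacent items come from different channels."""
    queues: dict[str, deque] = defaultdict(deque)
    for article in ranked:
        queues[article["channel"]].append(article)
    result = []
    while any(queues.values()):  # an empty deque is falsy
        for queue in queues.values():
            if queue:
                result.append(queue.popleft())
    return result


articles = [
    {"id": "a1", "channel": "sports", "tags": ["football"]},
    {"id": "a2", "channel": "sports", "tags": ["tennis"]},
    {"id": "a3", "channel": "tech", "tags": ["ai"]},
]
subscriptions = {"sports"}
tag_weights = {"football": 2.0, "tennis": 0.5, "ai": 1.5}

ranked = sorted(articles, key=lambda a: score(a, subscriptions, tag_weights), reverse=True)
feed = [a["id"] for a in break_up(ranked)]
print(feed)  # ['a1', 'a3', 'a2']
```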

3. “Cold start” for new users

Toutiao “identifies” a new user by phone model, operating system, OS version, and so on. In addition, when users log in through a social account such as Sina Weibo, Toutiao builds a preliminary “portrait” of them from their friends, followers, microblog posts, reposts, and comments.

The main parameters for analyzing users are as follows:

– Following and follower relationships

– Relationships

– User tags

In addition to phone hardware, Toutiao also analyzes the apps that users have installed. For example, it combines phone model and app analysis to distinguish between Xiaomi, Samsung, and Apple users, and it also looks at the user’s browser bookmarks. Toutiao captures users’ actions on the app’s channels in real time. The analysis also includes the channels the user subscribes to, such as movies, jokes, and merchandise.
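
A cold-start portrait built from these signals might look like the sketch below. The app-to-interest mapping, the weights, and the field names are hypothetical; the source only says that device information, installed apps, and subscribed channels are used.

```python
# Hypothetical mapping from installed-app categories to starting interest tags.
APP_HINTS = {
    "stock_trading": "finance",
    "mobile_game": "games",
    "video_player": "movies",
}


def cold_start_portrait(device: dict, installed_apps: list[str], subscribed: list[str]) -> dict:
    """Build an initial interest profile for a user with no reading history."""
    interests: dict[str, float] = {}
    for app in installed_apps:
        tag = APP_HINTS.get(app)
        if tag is not None:
            # Installed apps are treated as a weak signal (assumed weight 0.5).
            interests[tag] = interests.get(tag, 0.0) + 0.5
    for channel in subscribed:
        # An explicit subscription is treated as a stronger signal (assumed weight 1.0).
        interests[channel] = interests.get(channel, 0.0) + 1.0
    return {
        "device_segment": f'{device["brand"]}/{device["os"]}',
        "interests": interests,
    }


portrait = cold_start_portrait(
    {"brand": "xiaomi", "os": "android"},
    ["mobile_game", "video_player", "unknown_app"],
    ["movies", "jokes"],
)
print(portrait)
```

Once real reading behavior accumulates, these starting weights would be replaced by the interests learned from action logs.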

4. Recommendation system

The recommendation system, also known as the recommendation engine, is a core part of Toutiao’s technical architecture. It comes in two types, automatic and semi-automatic:

1 Automatic recommendation system

– Automatically selects candidate articles

– Automatically matches users, for example by locating the user’s address and extracting user information

– Automatically generates push tasks

This requires an efficient, highly concurrent push system that can reach hundreds of millions of users.

2 Semi-automatic recommendation system

– Automatically selects candidate articles

– Recommends according to the user’s actions on and off the site

On the technical side, Toutiao’s channels are divided into category channels, interest-tag channels, keyword channels, text analysis, and so on, each handled by a relatively independent development team. There are currently more than 300 classifiers, and new user models keep being added. The existing user models stay in service and are not retired.

Originally, content came mainly from crawling articles on other platforms and then deduplicating them. The volume was on the order of millions of articles a year, which is not very large. The bulk of the data comes from collecting user action logs, interests, and user models.
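
The deduplication step mentioned above can be sketched with a content fingerprint. This minimal version only catches articles that are identical after normalizing case and punctuation; the source does not say which technique Toutiao uses, and near-duplicate detection would need something stronger.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct fingerprint, preserving order."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        fp = fingerprint(article)
        if fp not in seen:
            seen.add(fp)
            unique.append(article)
    return unique


crawled = ["Hello, World!", "hello world", "Another story"]
print(deduplicate(crawled))  # ['Hello, World!', 'Another story']
```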

Technical metrics specific to news apps, such as scrolling, whether users finish reading an article, and time spent, need special attention.

5. Data storage

Toutiao uses MySQL or MongoDB for persistent storage plus Memcached (Redis) for caching. The data is split across many databases (including one large in-memory database), and SSD products have also been tried.

Toutiao stores images directly in the database, saves the files in a distributed manner, and serves reads through a CDN.

6. Message push

For users, push notifications provide timely access to information. For operations, they increase user activity. For example, sending pushes can raise Toutiao’s DAU by about 20%, while not sending them lowers DAU by about 10% (2015 data).

ROI metrics to watch after a push: click-through rate and number of clicks. Also monitor the number of app uninstalls and the number of users who disable push.
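
These metrics reduce to simple ratios over the number of delivered pushes. The sketch below uses made-up numbers and a hypothetical `PushStats` class purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class PushStats:
    delivered: int      # pushes that reached a device
    clicks: int         # pushes that were opened
    uninstalls: int     # app uninstalls attributed to the push
    push_disabled: int  # users who turned push off afterwards

    def rate(self, count: int) -> float:
        """Share of delivered pushes; 0.0 when nothing was delivered."""
        return count / self.delivered if self.delivered else 0.0


# Illustrative numbers only, not real Toutiao data.
stats = PushStats(delivered=200_000, clicks=15_000, uninstalls=40, push_disabled=300)
print(f"click-through rate: {stats.rate(stats.clicks):.2%}")        # 7.50%
print(f"uninstall rate:     {stats.rate(stats.uninstalls):.2%}")    # 0.02%
print(f"disable rate:       {stats.rate(stats.push_disabled):.2%}") # 0.15%
```

Tracking the uninstall and disable rates next to the click-through rate shows when a push gains clicks at the cost of losing users.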

Toutiao’s pushes mainly cover breaking and trending news, comments and replies, and notifications that friends from other sites have registered and joined.

In Toutiao, push is also personalized:

– Frequency personalization

– Content personalization

– Regional personalization

– Interest personalization

For example:

By city: a news event in Chaoyang, Liaoning, is sent to local users in Chaoyang.

By interest: news that JD has acquired Yidian is sent to users who are interested in the Internet industry.
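
A minimal sketch of this kind of targeting, assuming each user record carries a city, a set of interests, and a daily push limit (all hypothetical field names):

```python
from typing import Optional


def select_recipients(
    users: list[dict], city: Optional[str] = None, interest: Optional[str] = None
) -> list[str]:
    """Return IDs of users who match the filters and are under their push limit."""
    recipients = []
    for user in users:
        if user["pushes_today"] >= user["daily_limit"]:
            continue  # frequency personalization: this user has had enough today
        if city is not None and user["city"] != city:
            continue  # regional personalization
        if interest is not None and interest not in user["interests"]:
            continue  # interest personalization
        recipients.append(user["id"])
    return recipients


users = [
    {"id": "u1", "city": "Chaoyang", "interests": {"sports"}, "pushes_today": 0, "daily_limit": 3},
    {"id": "u2", "city": "Beijing", "interests": {"internet"}, "pushes_today": 1, "daily_limit": 3},
    {"id": "u3", "city": "Chaoyang", "interests": {"internet"}, "pushes_today": 3, "daily_limit": 3},
]

print(select_recipients(users, city="Chaoyang"))      # ['u1']
print(select_recipients(users, interest="internet"))  # ['u2']
```

User `u3` matches both filters but is skipped because the daily limit has already been reached.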

The tools and options of the push platform must meet the following standards:

– Channel: speed comes first, but the channel must also be controllable, reliable, and resource-efficient

– Push: it should be fast, support policies along different dimensions, be traceable, and offer a developer-friendly interface

– Push operations back end: feedback should be fast, covering timeliness and popularity, and the tools should be convenient to operate

– Operations side: it should be clear whether a recommendation has been confirmed, including the handling of push copy

Therefore, the push back end should provide complete daily data reporting and support A/B testing schemes.
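
One common way to support A/B tests is to assign each user to a bucket by hashing the user ID together with an experiment name, so that the assignment is stable without storing any state. The sketch below shows that generic technique; it is not a description of Toutiao's implementation, and the experiment name is invented.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, buckets: tuple = ("A", "B")) -> str:
    """Deterministically map a user to a bucket for a given experiment."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


# The same user always lands in the same bucket for the same experiment.
first = ab_bucket("user-42", "push_copy_test")
second = ab_bucket("user-42", "push_copy_test")
print(first == second)  # True

# Across many users, the buckets come out roughly balanced.
counts = {"A": 0, "B": 0}
for i in range(1000):
    counts[ab_bucket(f"user-{i}", "push_copy_test")] += 1
print(counts)
```

Including the experiment name in the hash keeps a user's bucket in one experiment independent of their bucket in another.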

Part of the push system runs in Toutiao’s own IDC, which carries a large volume of traffic and consumes a lot of bandwidth. Services such as Alibaba Cloud can be used instead to save costs effectively.

II. Toutiao System Architecture

3. Toutiao microservice architecture

By splitting subsystems, large applications are broken down into small ones, and a common layer is abstracted out for code reuse.



The system uses a typical layered design. The focus is on infrastructure, which is expected to support rapid iteration, disaster recovery, and a range of related work, so that business teams can iterate on the business and adjust the architecture faster.

4. Toutiao’s virtualization PaaS platform planning

The plan has three layers, managed in a unified way through the PaaS platform. It provides common SaaS services and a common app execution engine. The lowest level is the IaaS layer.



IaaS manages all the machines and integrates the public cloud. Some trending events on Toutiao are promoted and pushed nationwide, which requires high network bandwidth. For this we use the public cloud, with a unified abstraction over the types of computing resources we need. The infrastructure takes a service-oriented approach to functions such as logging and monitoring, so that services can use the capabilities it provides without having to deal with the details.

V. Summary

The important parts of Toutiao’s system are:

  • Data generation and acquisition
  • Data transport. Kafka serves as the message bus that connects online and offline systems.
  • Data storage. Data warehouse and ETL (extract, transform, load).
  • Data computation. How efficiently the tables in the data warehouse can be queried is very important, because it directly affects the efficiency of data analysis. Common query engines fall into three categories: batch, MPP, and cube. Toutiao uses all three.
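
To make the difference between the batch and cube styles concrete, here is a toy sketch over an in-memory table. The batch query scans every row at query time, while the cube pre-aggregates every combination of dimensions once so that later queries are single lookups. The table contents and function names are invented for the example, and the MPP style (spreading the scan across many nodes) is not shown.

```python
from collections import defaultdict

# Toy fact table: (day, channel, page_views). Illustrative data only.
rows = [
    ("2016-05-01", "sports", 120),
    ("2016-05-01", "tech", 80),
    ("2016-05-02", "sports", 100),
]


def batch_query(rows, day=None, channel=None) -> int:
    """Batch style: scan all rows for every query; None means 'any value'."""
    return sum(
        views
        for d, c, views in rows
        if day in (None, d) and channel in (None, c)
    )


def build_cube(rows) -> dict:
    """Cube style: pre-aggregate all dimension combinations; None is the roll-up."""
    cube = defaultdict(int)
    for d, c, views in rows:
        for key in ((d, c), (d, None), (None, c), (None, None)):
            cube[key] += views
    return cube


cube = build_cube(rows)
print(batch_query(rows, channel="sports"))  # 220
print(cube[(None, "sports")])               # 220
print(cube[("2016-05-01", None)])           # 200
```

The cube answers faster at query time but has to be rebuilt or updated when new data arrives, which is why the different engine types tend to coexist.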
