Related reading:

Why must distributed have Redis?

Report on the Development of Chinese IT Workers after the age of 35

Internet technology (Java framework, distributed, clustering) dry goods video Daquan, do not look regret! (Free download)

Author: Chen Bo

Profile: Sina Weibo technical expert, author of “Deep Into Distributed Caching”.

Source: freshmanTechnology

Weibo has more than 160 million daily active users and tens of billions of daily visits. In the face of massive access from a large user base, a good architecture and constantly improving cache system plays a very important supporting role.

This article, by Sina Weibo technical expert Chen Bo, is divided into the following four parts to explain in detail how those huge data are presented:

  • Data challenges in the running process of Microblog

  • Feed platform system architecture

  • Cache architecture and evolution

  • Summary and Prospect

Data challenges in the running process of Microblog

Feed platform system architecture

The Feed platform system architecture is divided into five layers:

  • The top layer is the end layer, such as the Web side, the client side, some of the iOS or Android clients used by everyone, as well as some open platforms, some interfaces for third-party access.

  • The next layer is platform access layer. Different pools are mainly used to centrally allocate good resources to important core interfaces. In this way, in case of sudden traffic, better flexibility can be provided to improve service stability.

  • Below that is the platform services layer, mainly Feed algorithms, relationships, and so on.

  • Next comes the middle tier, which provides some services through various intermediaries.

  • The lowest layer is the storage layer.

Feed Timeline

When you brush your micro-blog daily, for example, refresh it on the main site or client endpoint, and you get 10 to 15 micro-blogs. How is this constructed?

After the refresh, the user’s attention relationship will be gained first. For example, if he has a thousand followers, he will get the thousand IDS, and then according to the thousand UUIds, he will get some micro-blogs published by each user.

At the same time, it will get the user’s Inbox, which is the special messages he receives, such as a group of micro-blog, group micro-blog, following relationship, and list of micro-blog followers.

After getting this series of micro-blog list, collect and sort, get the required IDS, and then take the corresponding micro-blog content of each micro-blog ID for these ids.

If the tweets are forwarded, it has an original tweet, which it takes further. Obtain user information through the original micro-blog, further filter these micro-blogs according to the user’s filter words, filter out the micro-blogs that users do not want to see.

According to the above steps to leave the micro-blog, will be further to see, the user of these micro-blog collection, like, do some Flag Settings, but also on these micro-blog various counts, forwarding, comments, like the number of assembly, finally put the ten micro-blog back to the user’s various end.

From this point of view, a user requests a dozen records at a time, and the back-end server probably assembles hundreds or even thousands of data pieces in real time and then returns them to the user.

The whole process depends on the strength of Cache system, so the design of Cache architecture will directly affect the performance of weibo system.

Feed the Cache architecture

Let’s look at the Cache architecture, which is divided into six main layers:

  • The first layer is the Inbox, which is mainly grouped into some tweets, and then directly into some tweets of the group owner. Inbox is less, the main way is to push.

  • The second layer is Outbox. Every user will send regular tweets to its Outbox. According to the number of ids stored in the Cache, the Cache is actually divided into multiple Cache ids. The common Cache ID is about 200, and the long Cache ID is about 2000.

  • The third layer is some relationships, its attention, fans, users.

  • The fourth layer is content, some content of each micro-blog exists here.

  • The fifth layer is some existential judgments, such as whether I like a certain weibo post. Before, some celebrities said that I did not like the micro blog, how can it show that I like, caused some news. And this is a record, she actually liked it at some point but probably forgot.

  • At the bottom there is a larger layer — counting, counting the comments and forwarding of each micro-blog, as well as the number of users’ followers and followers.

Cache architecture and evolution

Simple KV data type

Next, we will focus on the evolution of the Cache architecture of Weibo. When weibo was first launched, we stored it as a simple KV data type.

We mainly adopted hash sharding storage in MC pool. After a few months, we found some problems: some node machines were down or for other reasons, a large number of requests would penetrate the Cache layer and reach DB, resulting in the whole request slowing down, or even DB freezing.

So we quickly changed it and added an HA layer so that even if some node went down or died in the Main layer, the requests would go further through the HA layer, not through the DB layer.

This can ensure that in any case, the whole system hit rate will not be reduced, system service stability has a relatively large improvement.

Now, this is something that’s used a lot in the industry, and then a lot of people say I’m just going to hash, but there are some pitfalls.

For example, I have a node, node 3 breaks down, Main removes it, and some QA of node 3 is distributed to several other nodes. This business is not very large, but it penetrates DB, and DB can still resist.

But if this node 3 recovers, and it’s added, then the access to node 3 comes back, and then node 3 goes down again for network reasons or machine reasons, and some of the requests from node 3 are distributed to other nodes.

At this point, there will be a problem. The data that was written back to other nodes is no longer updated. If it is not removed, there will be mixed data.

In fact, weibo is a square type of business, such as an emergency, a star to find a girlfriend, instant traffic 30%.

After an emergency, a large number of requests will appear on some nodes, which will make these nodes very hot. Even the MC cannot meet such a large number of requests. The MC becomes a bottleneck, slowing down the entire system.

For this reason, we have introduced L1, which is also a Main relationship pool. Each L1 is approximately 1/n, 1/6, 1/8, and 1/10 of the memory of Main. Depending on the number of requests, I will increase by 4 to 8 L1s, so that L1 will be the first to be accessed when all requests come in.

If L1 hits it, it will directly access it, and if not, it will access main-HA layer, so that in the case of some burst traffic, L1 can resist most hot requests.

For the microblog itself, the hotter the new data, the larger the volume with only a small amount of memory added.

To sum up briefly: through the storage of simple KV data type, we are actually MC oriented, the Hash node in the layer does not drift, and Miss penetrates to the next layer to read.

Through multiple sets of L1 read performance improvement, can withstand peak and burst traffic, and the cost will be greatly reduced.

For the read and write strategy, take multi-write, read through layer by layer, if Miss, write back. For the data in it, we initially used Json/ XML. After 2012, we directly used the Protocol Buffer format and compressed some large data with QuickL.

Set-like data

Simple QA data, but what about complex collection data?

For example, if I follow 2000 people and add one more person, it will involve some modification. One way is to take all 2000 ids down to modify, but this will be a lot of bandwidth, machine pressure.

There’s also some paging, and I have 2000 of them, and I just need to fetch a few pages, like page two, which is ten to twenty, and I wonder if I could not fetch all the data back.

There are also some linkage calculations of resources, which will calculate that ABC also follows user D among some people I follow. This involves partial data modification and acquisition, including calculation, which is actually not good for MC.

All kinds of concerns are stored in Redis, through Hash distribution, storage, a group of multiple storage to separate read and write. Redis now has about 30 terabytes of memory and 2-3 trillion requests per day.

My mother always teaches me to form good habits. That’s how you grow up to be a useful person. A good habit is to respect your teacher so that you can grow up to be a useful man. Good habits are to respect teachers, cherish classmates, polite to people; Is not careless, do things do not delay; Or cherish public property, do not waste food. Why is that? Because have good habits, be a person of noble character, know how to respect others, will be respected by others. I’m going to work hard to do that. I have some bad habits, sometimes study very careless, some can do the questions wrong. In life, also very careless, once got up in the morning unexpectedly wear the wrong clothes. I eat slowly and sometimes I have leftovers. I also got up slowly, should have dressed quickly, but I always dilly-dally, slow speed. “I’m going to get rid of these bad habits this semester. Get up in the morning and get dressed quickly, without dragging. Learning is not careless, carefully complete each question. When eating, eat quickly and leave no leftovers. I will start small and gradually develop good habits. I believe I can become a good student of both character and learning! I intend to get rid of these bad habits in this term. Get up in the morning and get dressed quickly, without dragging. Learning is not careless, carefully complete each question. When eating, eat quickly and leave no leftovers. I will start small and gradually develop good habits. I believe I can become a good student of both character and learning!” Before kindergarten, I can’t do anything, even wearing clothes is my mother to put on good clothes for me, going to kindergarten, so can’t, my mother exercise I want to learn to wear clothes by myself. One day, my mother put clothes in front of me and started letting me wear them myself. At the beginning. I cry again and is not to wear, but also throw clothes all over the ground, then sit on the ground began to cry, wait for a long time, she still ignore me, darling I had to put up the clothes to wear, one out of the room door, mother laughed, and look at my dress, sweater and pants are wearing the wrong, I quickly back to the room again to wear it again, the wear well, picked up the coat, But the coat button can not button up, the button can be naughty, as if deliberately against me, I put the button to buttonhole – the root of human evil; Love — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a long sigh, I heard these words uttered from the lips of a consumptive patient: the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a sigh, I heard these words uttered from the lips of a consumptive patient — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a long sigh, I heard the words, “My dear! Dry your tears, the supreme love has opened our eyes and made us admirers of it. Is it,

My mother always teaches me to form good habits. That’s how you grow up to be a useful person. A good habit is to respect your teacher so that you can grow up to be a useful man. Good habits are to respect teachers, cherish classmates, polite to people; Is not careless, do things do not delay; Or cherish public property, do not waste food. Why is that? Because have good habits, be a person of noble character, know how to respect others, will be respected by others. I’m going to work hard to do that. I have some bad habits, sometimes study very careless, some can do the questions wrong. In life, also very careless, once got up in the morning unexpectedly wear the wrong clothes. I eat slowly and sometimes I have leftovers. I also got up slowly, should have dressed quickly, but I always dilly-dally, slow speed. “I’m going to get rid of these bad habits this semester. Get up in the morning and get dressed quickly, without dragging. Learning is not careless, carefully complete each question. When eating, eat quickly and leave no leftovers. I will start small and gradually develop good habits. I believe I can become a good student of both character and learning! I intend to get rid of these bad habits in this term. Get up in the morning and get dressed quickly, without dragging. Learning is not careless, carefully complete each question. When eating, eat quickly and leave no leftovers. I will start small and gradually develop good habits. I believe I can become a good student of both character and learning!” Before kindergarten, I can’t do anything, even wearing clothes is my mother to put on good clothes for me, going to kindergarten, so can’t, my mother exercise I want to learn to wear clothes by myself. One day, my mother put clothes in front of me and started letting me wear them myself. At the beginning. I cry again and is not to wear, but also throw clothes all over the ground, then sit on the ground began to cry, wait for a long time, she still ignore me, darling I had to put up the clothes to wear, one out of the room door, mother laughed, and look at my dress, sweater and pants are wearing the wrong, I quickly back to the room again to wear it again, the wear well, picked up the coat, But the coat button can not button up, the button can be naughty, as if deliberately against me, I put the button to buttonhole – the root of human evil; Love — the source of happiness and light. I’ve been on the stage of these ideas. Suddenly I noticed two figures passing in front of me, sitting on the grass not far away. It was a young man and a young woman walking from the farm. There are farmers’ cottages across the field. After a mournful silence, and with a long sigh, I heard the words uttered by one of the consumptive patients: “My dear! Dry your tears, the supreme love has opened our eyes and made us admirers of it. Is it,

While working with Redis, there are actually some other issues. For example, from the focus relationship, I focus on 2000 Uids, one way is full storage.

However, weibo has a large number of users, some users log in less, some users are very active, so the cost of storing all the users in memory is relatively large.

For example, only active users are stored. If you have not been active for a while, you will be kicked out of Redis and added to the Cache when you are accessed again.

There is a problem here, because Redis works in single-threaded mode, and if it adds a certain UV and focuses on 2000 users, it may scale to 20,000 UIds, and 20,000 UIds are plugged back in and basically Redis is stuck, unable to provide other services.

So we extend a new data structure, 20,000 UIds directly open the end, when writing directly into Redis, the overall efficiency of reading and writing will be very high.

Its implementation is an open array of long, addressed by a Double Hash.

We’ve made some other extensions to Redis, which you may have seen in some of our previous sharing online, to put data into public variables.

The whole upgrade process, we tested 1 GIGAByte to load in 10 minutes, 10 gigabytes to load in more than 10 minutes, and now it’s a millisecond upgrade.

For AOF, we use the rolling AOF, each AOF has an ID, reaches a certain amount and then we scroll to the next AOF.

When the RDB falls to the ground, we will record the AOF file and its location when the RDB is built, and implement full incremental copy through the new RDB and AOF extension mode.

Other data types: count

Then there are some other data types, such as a count, which in fact every Internet company may encounter, for some small and medium sized businesses, in fact MC and Redis are sufficient.

But there are some characteristics of counting in microblog: there are multiple counts for a single Key, such as the number of forwarding, the number of comments, and the number of likes for a microblog; A user has the number of followers, the number of followers, all sorts of numbers.

Because it is a count, its Value size is relatively small. According to its various business scenarios, it is about 2-8 bytes. Generally, 4 bytes is more than 4 bytes.

And then there are about a billion new tweets a day, and the total is even more impressive, and then a request, maybe hundreds of counts have to be returned.

Counter Counter Service

It is possible to take Memcached initially, but it has a problem. If the count exceeds its content capacity, it will cause some counts to be culled and lost after a downtime or restart.

And then there might be a bunch of counts where it’s zero, and then how to save it, whether to save it or not, and that takes up a lot of memory.

Weibo counts billions of times every day, even storing 0 will take up a large amount of memory, if not, it will lead to penetration into DB, which will have an impact on service solubility.

After 2010, we adopted Redis access. With the increasing amount of data, we found that Redis memory payload was still relatively low, requiring at least 65 bytes for a KV.

But we actually need 8 bytes for a count, and then the Value is about 4 bytes, so it’s only 12 bytes valid, and there are 40 more bytes that are wasted.

And that’s just a single KV, but if you have multiple counts for a Key, it’s a lot more wasteful.

For example, four counts, a Key of 8 bytes, four counts each count is 4 bytes, 16 bytes need about 26 bytes, but with Redis about 200 bytes.

Later, through Counter Service developed by ourselves, the memory was reduced to 1/5 to less than 1/15 of Redis, and hot and cold data were separated. Hot data were stored in memory, and if cold data became hot again, it was put into LRU.

RDB and AOF are implemented to realize full incremental replication. In this way, hot data can be stored in a single machine, while cold data can be stored in a single machine.

The entire storage architecture looks like the figure above, with memory on the top and SSD on the bottom. In memory, it is pre-divided into N tables, and each Table is delimited according to the pointer sequence of ID.

If an ID is added to the Table and a new count is added to the Table, a small Table is dumped to the SSD with the new ID at the top of the Table.

Some people ask, if in a certain range, my ID was originally set to count 4 bytes, but weibo is very hot, more than 4 bytes, become a large count how to deal with?

For those exceeding the limit, we store them in Aux dict. For the tables that fall into SSD, we have a special IndAux to access them and copy them through RDB.

Other data types: Existence judgment

In addition to counting, weibo has some business, some existential judgments. For example, if a micro blog shows whether it has been liked, read or recommended, if the user has already read the micro blog, do not show it to him.

Each record is very small, such as Value 1 bit, but the total amount of data is very large.

For example, about 100 million new posts are published on Weibo every day, and the total number of posts read may be tens of billions or hundreds of billions.

How to store it is a big problem, and a lot of it has zero existence. Again, do you want to store the 0 or not?

If so, hundreds of billions of records are saved every day; If not, a large number of requests will eventually pass through the Cache layer to the DB layer, and no DB can handle that much traffic.

We also did some selection: first, we looked directly at whether we could use Redis. A single KV is 65 bytes, a KV can be 8 bytes, and the Value is only 1 bit, so the efficiency of newly added memory per day is very low.

The second kind of Counter Service is newly developed by us. Each KV Value has 1 bit, and I only save 1 BYT. A total of 9 byTs are enough.

In this way, if you add 900G memory every day, you may only be able to save the latest several days. If you save three days, it is almost 3 T fast. The pressure is also quite big, but it is much better than Redis.

Our final solution was to develop the Phantom ourselves, starting with a segmented shared memory, and eventually using only 120GB of ram.

The algorithm is very simple, you can hash each Key N times, and if one of the bits of the hash is 1, you hash it three times, three numbers and set it to 1.

If I hash X2 three times, and then I’m going to see if X1 exists, if I hash X2 three times, if I hash X1 three times, I think it exists; If some hash X3, and its bits are 0, it’s 100 percent certain that it doesn’t exist.

Its implementation architecture is relatively simple, the shared memory is split into different tables in advance, in which the open method of calculation, and then read and write, the landing of the way of AOF+RDB processing.

Because the entire process is stored in shared memory, the data will not be lost when the process is upgraded and restarted. When external access, build Redis protocol, it directly extends the new protocol can access our service.

To summarize: So far, we have looked at high availability, scalability, high component performance in Cache clusters, and especially storage costs. There are other things we haven’t looked at, such as operation and maintenance, and there are now thousands of servers.

Further optimization

As a service

The first solution is to serially manage the entire Cache and the configuration to avoid frequent restarts. In addition, if the configuration is changed, you can directly modify it with a script.

Servitization also introduces Cluster Manager to achieve external management, through an interface to manage, can carry out service verification.

In terms of service governance, capacity expansion and reduction can be achieved, and SLA can also be well guaranteed. In addition, for development purposes, Cache resources can now be masked.

Summary and Prospect

In conclusion, we have optimized and enhanced the microblog Cache architecture from different aspects such as data architecture, performance, storage cost and servitization. Welcome to have research or questions about this peer comments, with us to discuss.

Have you learned anything after reading this article? Please share it with more people


Welcome to “Internet Architect”, we share the most valuable Internet technology dry articles, to help you become a thoughtful full stack architect, we only talk about the Internet, architecture, nothing else! Build the most valuable community of architects.

This public account covers China’s main chief architect, senior architect, CTO, technical director, technical person in charge and other groups. Share the most valuable architectural ideas and content. To build the most valuable circle of architects in China’s Internet circle.

  • Long press the qr code below to quickly follow us

  • If you want to group study, please click on the lower right corner of the “group study” menu into the group