1. Introduction
The iQiyi international website (www.iq.com) provides high-quality video for overseas users. Since its launch, it has supported dozens of country/region sites and delivered a fast, smooth viewing experience to large numbers of users across Southeast Asia.
The international service has a distinctive profile: users access it from outside mainland China, and the back-end servers are also deployed abroad. This creates complex operating conditions: network and security policies differ from country to country, and so does the maturity of each country's network infrastructure. Few Chinese Internet companies have gone overseas before, so the iQiyi international site has been built largely through exploration.
To give overseas users a better experience, iQiyi's back-end team has done a great deal of performance optimization during this period. We would like to record these explorations and share them with our peers.
In this article, we will analyze the highlights in detail, including but not limited to:
- Full-link optimization of web performance;
- A distinctive A/B scheme enabling accurate horizontal data comparison and layer-by-layer progressive optimization;
- Multi-instance local cache synchronization and cache warm-up implemented with Redis's own API;
- Second-level (within seconds) updates of hot dramas;
- A self-developed caching framework that is easy to integrate.
2. Technical research
Caching and asynchrony are often called the two killers of high concurrency, and the typical technical solutions for performance optimization are as follows:
Performance optimization is also systems engineering: it involves the back end, the front end, the network, and all kinds of infrastructure, each of which needs its own optimization. For example, front-end work includes reducing HTTP requests, using the browser cache, enabling compression, CDN acceleration, and so on, while the back end offers even more levers. This article selects the optimization work done by the back-end team of iQiyi International and gives a fairly detailed introduction to the results achieved so far.
Note: when analyzing system performance problems, the following indicators can be used as measures:
- Web front end: FP (First Paint), FCP (First Contentful Paint), etc. First-screen time, the time from when the user opens the page until the browser finishes rendering the first screen, is the most direct indicator of perceived experience and the most widely recognized core metric in the performance field. iQiyi uses Google's Firebase to collect these metrics, reported from the client and analyzed in real time.
- Back end: response time (RT), throughput (TPS), concurrency, etc. Response time is the time the system takes to respond to a request (application latency). For user-facing web services it is a good measure of application performance and is influenced by database queries, RPC calls, network IO, computational complexity, JVM garbage collection, and many other factors. For high-concurrency applications and systems, throughput is a very important metric, closely tied to CPU and memory consumption, downstream interfaces, IO, and so on. These numbers can be pulled from the company's back-end monitoring system.
3. Business background
Before introducing the optimization process, it is worth briefly describing the business characteristics unique to iQiyi International, and the difficulties and challenges they bring.
3-1 Modes and languages
iQiyi International has some special business traits. Outside mainland China there are more than 200 countries and regions. Some are operated together, such as Malaysia and Singapore; others, such as Thailand, are operated independently. iQiyi calls this country-independent business unit a mode (or site). Operations run independently according to each program's copyright region, unlike in mainland China, where everyone sees the same non-personalized recommendations.
Another special trait is multi-language support. Different countries use different languages, and a user's language can change, so iQiyi must maintain content data in dozens of languages.
In addition, on the international site, user attributes and the mode are strongly bound: the user's mode and language are written into cookies and cannot easily be changed.
3-2 Server rendering
Running an international business makes Google SEO a necessity: search results are a major traffic entry point for iQiyi. SEO is itself a huge project that we will not expand on here, but it constrains iQiyi's front-end technology choices, so page content is rendered on the server. Compared with a traditional SPA (single-page application), the main advantages of server-side rendering (SSR) are:
- Better SEO, since search engine crawlers see the fully rendered page directly.
- Faster time-to-content, especially on slow networks or slow devices. The server-rendered markup displays without waiting for all the JavaScript to download and execute, so users see the fully rendered page sooner. This generally means a better user experience, and for applications where time-to-content is directly tied to conversion rate, SSR is critical.
4. Optimization steps
Broadly, other teams have been making technical improvements to the CDN and to server-side page rendering. The core work of the international back-end team lies in front-end caching optimization and back-end service optimization, mainly covering:
- Browser cache optimization
- Compression optimization
- Server cache optimization
4-1 Web caching service
A web page typically renders dozens of programs. Requesting the back-end API every time would make responses far slower, so a cache is a must. But caching has its pros and cons, and doing it well is not easy.
As the saying apocryphally attributed to Lu Xun goes, any technical talk divorced from the business is hooliganism. So in marrying caching to the business, there is a long road ahead.
After the first web version of the international site launched, the simplified architecture was as follows:
Because of the Google SEO requirement, program-related data is rendered on the server. The client browser interacts directly with the front-end SSR servers (with a CDN provider in between), and the front-end Node rendering servers keep a short-lived local cache.
Once this version was live, performance was not ideal. As introduced in the business background, the content served to a user depends on site (country) and language, which are stored in cookies, making strong caching at the CDN impractical. So the architecture was optimized as follows:
As you can see, a web caching layer was added: a back-end Java service responsible for fine-grained caching of pages rendered by the front-end Node servers, backed by a centralized Redis cache. After launch, the cache hit ratio improved substantially.
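To make the fine-grained caching concrete, here is a minimal sketch of how such a page cache key might be composed. The names and key scheme are illustrative assumptions, not iQiyi's production design: one entry per mode (site), language, and page path, since those attributes determine what a user sees.

```java
// Hypothetical key scheme: one cached page per mode (site), language, and path.
public final class PageCacheKey {

    // e.g. of("TH", "th", "/play/xyz") -> "page:TH:th:/play/xyz"
    public static String of(String mode, String lang, String path) {
        return String.join(":", "page", mode, lang, path);
    }
}
```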
4-2 A/B testing
After the back-end web cache went live, we wanted to keep optimizing the service. But back-end optimization proceeds step by step: how do you verify the exact effect of each optimization as quickly as possible? There are generally two dimensions, horizontal and vertical. Vertical means verifying results over time. Firebase, part of Google's cloud platform for application developers (especially full-stack development), provides application back-end services, lets developers stand up a back end quickly and focus on the client, and offers a real-time observable database. With web performance data on the time dimension, you can mark when an optimization went live and watch performance change over time. However, as mentioned above, too many factors affect web performance: the CDN and front-end teams are optimizing at the same time, so the time dimension alone cannot accurately isolate our results.
That leaves horizontal comparison. How is it done?
The answer is still Firebase. Add a second project, project B, and have the web caching service report the optimized traffic to project B. You can then compare the performance of project A and project B directly and accurately. The details are as follows:
Plan B is the grayscale optimization scheme. There are many ways to decide who gets plan B, but the same user must land on the same plan across visits, otherwise the cache would be missed. iQiyi International currently buckets by IP, guaranteeing that a user's plan stays fixed as long as the IP does not change and the grayscale policy is not adjusted. The B that appears in the cache key is this plan B.
The detailed comparison process is shown in the figure below; every subsequent optimization strategy was validated through it:
- The browser requests the back-end service, and the server obtains the client IP address
- Based on the grayscale ratio configured in the configuration center, the request is assigned to plan A or plan B
- If it falls into grayscale plan B, the optimized logic is used
- SSR returns a different Firebase configuration depending on the plan
- Firebase reports the data separately, and the console shows the two performance profiles side by side
- Analyzing and comparing the data yields the optimization result
After this process, horizontal comparison is achieved and performance results can be compared accurately, which makes continuous optimization straightforward.
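As an illustration of the bucketing described above, here is a minimal Java sketch of IP-based grayscale assignment. The hashing scheme and names are assumptions, not the actual implementation; the point is that hashing the client IP against a configured ratio keeps a user's plan stable across visits, so the cache keeps hitting.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: deterministic IP-based bucketing for plan A/B.
public final class GrayscaleRouter {

    /** percentB comes from the configuration center, e.g. 10 = 10% on plan B. */
    public static String plan(String clientIp, int percentB) {
        CRC32 crc = new CRC32();
        crc.update(clientIp.getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % 100;     // stable bucket in [0, 100)
        return bucket < percentB ? "B" : "A";   // same IP -> same plan
    }
}
```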
4-3 Browser cache optimization
With the web page caching service in place, front-end rendered pages are cached for 5 minutes and expire automatically, which triggers a request to the SSR service whose response is written back to the cache.
In the vast majority of cases the page has not been updated and the user may simply be refreshing. Such unchanged data is well suited to the browser's negotiated cache:
Negotiated cache: a caching policy in which the browser and the server cooperate. The browser asks the server about the cached copy to decide whether to resend the request and download the full response, or to use the locally cached resource.
If the server indicates the resource is Not Modified, the request is redirected to the browser cache and the network request returns status code 304. The process is as follows:
After implementing the negotiated cache with ETag, 304 responses accounted for 4% of requests online, and Firebase showed grayscale plan B improving performance by **5%**; web performance improved.
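For reference, here is a minimal sketch of ETag-based negotiated caching, assuming a Spring MVC handler sitting in front of the page cache; all names are illustrative. (Spring also ships ShallowEtagHeaderFilter, which automates this pattern.)

```java
import java.util.Arrays;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical handler: returns 304 Not Modified when the browser's ETag matches.
@RestController
public class PageController {

    @GetMapping("/page/{id}")
    public ResponseEntity<byte[]> page(
            @PathVariable String id,
            @RequestHeader(value = "If-None-Match", required = false) String ifNoneMatch) {

        byte[] body = loadRenderedPage(id);  // from the web caching service
        String etag = "\"" + Integer.toHexString(Arrays.hashCode(body)) + "\"";

        if (etag.equals(ifNoneMatch)) {
            // Browser copy is still valid: empty body, status 304.
            return ResponseEntity.status(304).eTag(etag).build();
        }
        return ResponseEntity.ok().eTag(etag).body(body);
    }

    private byte[] loadRenderedPage(String id) { /* cache lookup elided */ return new byte[0]; }
}
```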
4-4 Compression optimization
In September 2015, Google introduced its lossless compression algorithm Brotli, on the premise that Internet users' time is valuable and should not be consumed by lengthy page loads. Brotli compresses data using the LZ77 algorithm, Huffman coding, and second-order context modeling, and achieves higher compression efficiency than comparable algorithms. Enabling Brotli reduced CDN traffic by a further 20% compared with Gzip compression.
According to Google's published research, Brotli has many strengths; the three most typical are:
- For common web resource content, Brotli compresses 17-25% better than Gzip;
- At compression level 1, Brotli achieves a higher compression ratio than Gzip at level 9 (its highest);
- Brotli maintains very high compression ratios across different kinds of HTML documents.
Our logs also show that most iQiyi users' browsers support br compression. Previously, the back-end services supported Gzip compression, as follows:
As you can see, it is the Nginx layer that performs gzip compression.
In addition, the Redis behind the web caching service stores the compressed content directly and uses a custom serializer; reads and writes are pass-through, reducing CPU consumption. The Redis value is the compressed byte array.
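A minimal sketch of such a pass-through serializer, assuming Spring Data Redis (the class name is illustrative): since the values are already compressed byte arrays, serialize and deserialize simply hand the bytes through, spending no CPU on re-encoding.

```java
import org.springframework.data.redis.serializer.RedisSerializer;

// Pass-through serializer: Redis values are already-compressed byte arrays.
public class RawBytesSerializer implements RedisSerializer<byte[]> {

    @Override
    public byte[] serialize(byte[] value) {
        return value;   // bytes go in exactly as given (already gzip/br)
    }

    @Override
    public byte[] deserialize(byte[] bytes) {
        return bytes;   // bytes come out untouched, ready to send to the client
    }
}
```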
Nginx support for Brotli
Stock Nginx does not support Brotli compression out of the box; it must be recompiled with the Brotli module:
br compression in the web caching service
In HTTP, whether and which compression the client supports is indicated by the Accept-Encoding request header; a client that supports Brotli typically sends something like "gzip, br".
After Nginx supports br compression, the web caching service must cache both kinds of compressed content. The logic is as follows:
As the figure above shows, once a service must support both br and gzip compression, plus grayscale schemes, its business complexity multiplies.
All of this logic lives inside the (back-end) web caching service shown earlier; this service will be optimized further below.
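As a simplified illustration of that variant logic (the key scheme and names are assumptions): the same page is cached once per encoding, and the Accept-Encoding header picks which variant a client receives.

```java
// Hypothetical variant selection: one cache entry per page per encoding.
public final class EncodingVariant {

    /** Clients advertising br (e.g. "gzip, br") get the Brotli variant. */
    public static String pick(String acceptEncoding) {
        if (acceptEncoding != null && acceptEncoding.contains("br")) {
            return "br";
        }
        return "gzip";   // everyone else falls back to the Gzip variant
    }

    // e.g. "page:TH:th:/play/xyz:br" vs "page:TH:th:/play/xyz:gzip"
    public static String cacheKey(String baseKey, String acceptEncoding) {
        return baseKey + ":" + pick(acceptEncoding);
    }
}
```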
After a week of grayscale, the Firebase comparison of plan B against plan A showed that br compression reduced page size by 30% and improved FCP by about 6%.
4-5 Server cache optimization
After the browser cache and compression optimizations, overall web performance improved considerably. The focus of this section is the server-side caching module.
Local cache + Redis two-level cache
For the caching module, a local cache was added first, using Caffeine, a high-performance, high-hit-ratio local caching framework built on the W-TinyLFU algorithm. That produced the following architecture:
As you can see, this is a very common two-level cache. Both the local cache and Redis expire after 5 minutes. The local cache is bounded in space and key count; if a cache key matches the filtering policy, the data must be fetched from Redis.
With the local cache in place, the network IO of requests to Redis is reduced and back-end performance improves.
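A minimal sketch of this two-level read path, assuming Caffeine plus a Redis client, both with the 5-minute TTL described above; the Redis and SSR calls are stubbed placeholders.

```java
import java.time.Duration;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Two-level read path: Caffeine first, then Redis, then SSR on a full miss.
public class TwoLevelPageCache {

    private final Cache<String, byte[]> local = Caffeine.newBuilder()
            .maximumSize(10_000)                      // bounded local cache
            .expireAfterWrite(Duration.ofMinutes(5))  // same TTL as Redis
            .build();

    public byte[] get(String key) {
        return local.get(key, k -> {
            byte[] value = redisGet(k);               // level 2: central Redis
            if (value == null) {
                value = loadFromSsr(k);               // full miss: render via SSR
                redisSet(k, value, Duration.ofMinutes(5));
            }
            return value;
        });
    }

    private byte[] redisGet(String k) { /* elided */ return null; }
    private void redisSet(String k, byte[] v, Duration ttl) { /* elided */ }
    private byte[] loadFromSsr(String k) { /* elided */ return new byte[0]; }
}
```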
Local cache + Redis two-level cache with active refresh
After this scheme ran for a while, the data showed that the hit ratios of the 5-minute local cache and Redis were not high; the results were roughly as follows:
There seemed to be plenty of room to improve the hit ratio. The misses happen because the cache time is too short, so can the expiration time be extended? There are two options:
- Increase the cache expiration time
- Add a background job that actively refreshes entries, effectively extending their lifetime
Option 1 is off the table: 5 minutes is already the maximum staleness the business allows. What about option 2? The first attempt created a deferred refresh task for every cache entry. Once online, the downstream pressure turned out to be enormous and CPU was nearly saturated.
Analysis showed there were simply too many keys: a single page can be split across dozens of keys, so the active-refresh QPS was many times the request traffic itself. A cache that drags down the performance of the system behind it is clearly unacceptable. So how do you raise the hit ratio without hurting the downstream?
Request statistics then showed that most traffic concentrates on channel pages and popular dramas, roughly as follows:
The blue and green areas in the figure are home page visits and popular-content visits. These two types account for more than 50% of traffic; call them hot requests.
Based on these results, the following architecture optimization was made:
As you can see, a refresh-task module has been added. It proactively refreshes hot content while strictly monitoring and limiting its QPS, keeping the page cache valid for a long time. The detailed process is as follows:
- The cache service receives a page request and looks up the cache
- On a miss, the data is fetched from SSR
- The service determines whether this is a hot page
- If it is a hot page, a delayed message is sent to RocketMQ (see the sketch after this list)
- The job service looks up the request headers and body by key and refreshes the cached content
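Here is a sketch of the delayed-message step, assuming the open-source Apache RocketMQ Java client; the topic name and delay level are illustrative. Note that RocketMQ schedules delays in fixed levels rather than arbitrary durations.

```java
import java.nio.charset.StandardCharsets;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;

// Hypothetical scheduler: queue a delayed refresh for a hot page's cache key.
public class HotPageRefreshScheduler {

    private final DefaultMQProducer producer;

    public HotPageRefreshScheduler(DefaultMQProducer producer) {
        this.producer = producer;
    }

    public void scheduleRefresh(String cacheKey) throws Exception {
        Message msg = new Message("hot-page-refresh",          // illustrative topic
                cacheKey.getBytes(StandardCharsets.UTF_8));
        msg.setDelayTimeLevel(9);  // default level 9 = 5 minutes before delivery
        producer.send(msg);        // the job service consumes and re-renders
    }
}
```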
After going live, the cache hit ratio of hot pages reached essentially 100%, and FCP in Firebase improved by another 20%.
Local cache (updatable) + Redis two-level cache with real-time updates
As everyone knows, iQiyi is a video content site. Only by keeping the freshest quality content can it attract more users, and the technical team's job is to provide the support that guarantees that experience.
The caching policy above still has one major problem: a program update can take up to 5 minutes to appear. Sure enough, operations reported that web-side program updates were seriously delayed. Imagine the content team racing to finish subtitles and metadata so an episode can launch at exactly 21:00, only for the web side to show it 5 minutes after the back end publishes it; that is clearly unacceptable.
From a business standpoint, although this is a pure display service (of CRUD it only does R, the Read, with nothing like a transaction system's write volume), about 5% of the content iQiyi displays is strongly update-sensitive and must be refreshed in real time.
But if you simply listen for update messages and refresh the cache, then with multiple instances only the instance that consumes the message updates its local cache; the others stay stale. A broadcast is needed. This is usually built on a message queue such as ActiveMQ, but that introduces yet another third-party middleware, adding business complexity and operational burden.
Investigation showed that Redis implements publish/subscribe through the PUBLISH and SUBSCRIBE commands, offering two mechanisms: subscribing/publishing to channels and to patterns. SUBSCRIBE lets a client subscribe to any number of channels, and every new message published to a channel is delivered to all clients subscribed to it. So Redis's publish/subscribe feature can implement local cache update synchronization.
The cache architecture was changed as follows:
As you can see, compared with the previous design, logic for synchronously updating the local caches has been added, implemented with Redis pub/sub. The process is as follows:
- A service instance receives an update message
- It updates the Redis cache
- It publishes a pub/sub message
- Every instance subscribes, receives the message from Redis, and updates or clears its local cache
This scheme ensures that in a distributed, multi-instance deployment, every instance's local cache is updated, so the latest data reaches the user.
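A minimal sketch of this pub/sub synchronization, assuming the Jedis client alongside a Caffeine local cache; the channel name is illustrative. One instance publishes the updated key, and every subscribed instance evicts its local copy.

```java
import com.github.benmanes.caffeine.cache.Cache;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Each instance subscribes to one channel and evicts local entries on message.
public class LocalCacheSynchronizer extends JedisPubSub {

    private static final String CHANNEL = "cache-sync";  // illustrative name
    private final Cache<String, byte[]> local;

    public LocalCacheSynchronizer(Cache<String, byte[]> local) {
        this.local = local;
    }

    /** Called by the instance that handled the content update (after SET). */
    public static void publishInvalidation(Jedis jedis, String key) {
        jedis.publish(CHANNEL, key);
    }

    /** Run on a dedicated thread per instance: SUBSCRIBE blocks. */
    public static void listen(Jedis jedis, LocalCacheSynchronizer sync) {
        jedis.subscribe(sync, CHANNEL);
    }

    @Override
    public void onMessage(String channel, String key) {
        local.invalidate(key);  // drop the stale local entry on every instance
    }
}
```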
After launch, program updates land within an acceptable window, eliminating the 5-minute delay that caching had introduced.
Tip: since Redis 5.0, the Stream data structure has offered persistent publish/subscribe-style messaging; interested readers can substitute this newer feature.
Local cache (updatable) + Redis two-level cache with real-time updates + cache warm-up
As everyone knows, releases and restarts of back-end services are routine, and the local cache vanishes when a service shuts down. For a period after startup there is a window with no local cache, and that window is a classic disaster zone for cache breakdown. iQiyi International runs much like a startup: iteration demand is heavy and releases are frequent, and each release brings slow requests. Is there room for optimization?
Could the cache gap be avoided by syncing other instances' local caches into a new instance after it starts but before its health check passes? With that idea, the caching feature was updated as follows.
The specific process is as follows:
- A newly started instance publishes an initialization message
- On receiving it, the existing instances take a configurable number of keys via Caffeine's hot-key (hottest entries) policy and publish update messages for them (see the sketch below)
- The new instance receives those messages and fills its local cache from Redis or from the remote service
- This leaves the new instance "warm"
This warm-up ensures the service is warmed before the health check passes and traffic arrives.
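A sketch of step 2 of the warm-up, using Caffeine's eviction policy to snapshot the hottest entries; the messaging plumbing is stubbed and the names are illustrative. Caffeine exposes hottest(n) on its eviction policy, which fits the hot-key idea mentioned above.

```java
import java.util.Map;
import com.github.benmanes.caffeine.cache.Cache;

// An existing instance answers a newcomer's init message with its hottest keys.
public class CacheWarmupResponder {

    public void onInitMessage(Cache<String, byte[]> local, int limit) {
        // hottest(n) returns a snapshot ordered from hottest to coldest.
        Map<String, byte[]> hottest = local.policy().eviction()
                .map(e -> e.hottest(limit))
                .orElse(Map.of());

        for (String key : hottest.keySet()) {
            publishWarmupKey(key);  // e.g. over the same Redis pub/sub channel
        }
    }

    private void publishWarmupKey(String key) { /* elided: PUBLISH key */ }
}
```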
After the warm-up feature was added, the local cache hit ratio in the first minute after startup improved greatly, and the slow requests caused by cold starts all but disappeared.
Local cache (updatable) + Redis two-level cache with real-time updates + cache warm-up + fallback cache
During iteration it became clear that in a period of business growth, with heavy front- and back-end iteration and the operations team constantly working in the admin console, a web page would occasionally become unavailable, with no reliable way to degrade.
Re-evaluating the existing scheme, we saw that the Redis cache data could be given a longer expiration and used as backup data: when SSR is unavailable or errors out and the cache has broken down, the fallback data in Redis can be returned. Its freshness is weaker, but it lets the page render instead of failing outright. After the redesign, the structure was adjusted as follows:
The main two-level caching scheme is unchanged, but the Redis data lives longer. Normal reads still serve data at most 5 minutes old; when the SSR service is degraded, the 24-hour fallback copy is returned instead. This only costs extra Redis storage while greatly improving service availability.
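A minimal sketch of the fallback read path under the scheme just described; the key prefixes are illustrative and the Redis/SSR clients are stubbed. Fresh data is served within 5 minutes as before, while a 24-hour copy is kept purely for degradation.

```java
import java.time.Duration;

// Hypothetical fallback read path: fresh 5-minute copy, 24-hour backup copy.
public class FallbackPageCache {

    public byte[] get(String key) {
        byte[] fresh = redisGet("fresh:" + key);   // normal 5-minute copy
        if (fresh != null) {
            return fresh;
        }
        try {
            byte[] page = loadFromSsr(key);
            redisSet("fresh:" + key, page, Duration.ofMinutes(5));
            redisSet("fallback:" + key, page, Duration.ofHours(24)); // backup copy
            return page;
        } catch (Exception ssrDown) {
            // Degrade: serve the stale-but-renderable backup rather than fail.
            return redisGet("fallback:" + key);
        }
    }

    private byte[] redisGet(String k) { /* elided */ return null; }
    private void redisSet(String k, byte[] v, Duration ttl) { /* elided */ }
    private byte[] loadFromSsr(String k) throws Exception { return new byte[0]; }
}
```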
4-6 Two-level cache toolkit
As the sections above show, a great deal of work went into the server-side two-level cache, and anyone with business experience will recognize that these capabilities can be reused across many businesses: two-level caching, cache synchronization, cache warm-up, active cache refresh, and so on.
So, through secondary development on open-source frameworks, combining Caffeine with Redis's own API, we built a two-level cache toolkit.
More features are under continuous development.
A business that needs these two-level caching features can cover its various caching requirements with a small amount of configuration, without extra development or pulling in more toolkits.
5. Optimization results
Through sustained effort, web performance on our international site has improved substantially. Here is a look at the data:
This is just the FCP figure; the cache hit ratios and service metrics of the back-end services also changed markedly. A study Amazon ran a decade ago showed that every 100 ms cut from page load time brought a 1% increase in revenue; the bar is surely even higher now, so these optimization results are significant.
We have not stopped, though: we are now working on GC optimization and a reactive rework of the back-end services, another major topic in performance optimization, and we look forward to the results.
The authors:
Peter Lee, back-end development, iQiyi Overseas Business Division
Isaac Gao, back-end development manager, iQiyi Overseas Business Division