
In recent years, competition among large e-commerce websites such as Taobao, Tmall, Jingdong and Pinduoduo has become increasingly fierce. Nobody can afford for something to go wrong in the middle of such a battle!

A failure not only brings economic losses; it may also knock a platform out of the first echelon of e-commerce. So what do these platforms do before, during and after a major promotion to make sure it goes smoothly?

Next, let me share my own experience. I worked on the * East platform for three years, went through three 618 promotions and two 11.11 promotions, and took part in supervision at the 618 war-preparation command center (for my own business line).

Before the big promotion: there is a kick-off meeting at which everyone takes an oath. The sense of ceremony is fairly strong, and after taking the oath we feel the weight of the responsibility.

Next, each department or business line makes the technical changes it needs and sorts out its plans for the promotion. What impressed me most is that the plans for core systems must be cross-reviewed, and each business line has its own architects. Fortunately, ours included senior architects from Amazon.

Here are some of the key points I’ve learned from my experiences preparing for and fighting the war.

In short: “three pre, three limit, three drop”.

The three pre

1. Early warning

Early warnings must be accurate, and the monitoring panel must cover the whole process. Although someone is on duty watching the monitoring panel 7*24 during the promotion period, some jitter is inevitably not obvious. Occasional timeouts, for example, do not show up on the panel. We generally configure TP99 and TP999 timeout warnings as well as availability warnings. Problems need to be located quickly through early warnings, because during the rush period querying logs to determine a problem is slow and logs are sparse.

At this point we need to know the system well and be very familiar with the emergency plan. If the alert matches a case in the emergency plan, we carry out the operation defined for that case. If it does not, we have to judge from the system's business whether it has an impact. (In most cases the scheme is automatic downgrade.)

Experience: warnings blew up our phones with constant calls and texts. That happens when the warning settings are unreasonable, so adjust them decisively. Important warning adjustments during the promotion period need to be reported.
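To make the TP99, TP999 and availability warnings concrete, here is a minimal Python sketch. The function names, the thresholds (200 ms, 500 ms, 99.9%) and the sample data are all assumptions for illustration, not the platform's actual monitoring configuration. It shows how a handful of occasional timeouts can stay invisible in the average and in TP99 while still tripping a TP999 warning.

```python
import math

def tp(latencies_ms, percentile):
    """Nearest-rank percentile: the latency that `percentile`% of calls stay within."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    # round() guards against floating-point noise before taking the ceiling
    rank = math.ceil(round(percentile * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]

def check_alerts(latencies_ms, success_count, total_count,
                 tp99_limit_ms=200, tp999_limit_ms=500, min_availability=0.999):
    """Return a list of warning messages; all thresholds are hypothetical."""
    alerts = []
    tp99 = tp(latencies_ms, 99)
    tp999 = tp(latencies_ms, 99.9)
    availability = success_count / total_count
    if tp99 > tp99_limit_ms:
        alerts.append(f"TP99 {tp99} ms exceeds {tp99_limit_ms} ms")
    if tp999 > tp999_limit_ms:
        alerts.append(f"TP999 {tp999} ms exceeds {tp999_limit_ms} ms")
    if availability < min_availability:
        alerts.append(f"availability {availability:.4f} below {min_availability}")
    return alerts

# Made-up sample: 995 fast calls and 5 occasional slow ones
samples = [10] * 995 + [800] * 5
average = sum(samples) / len(samples)   # 13.95 ms, looks healthy
alerts = check_alerts(samples, success_count=1000, total_count=1000)
print(average, tp(samples, 99), tp(samples, 99.9), alerts)
```

In this sample the average and TP99 both look normal, and only the TP999 warning fires, which is the kind of non-obvious jitter the panel alone would not reveal.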

2. Expansion

Half a year after the last big promotion, a lot of new business may have been layered onto the system, and it is hard for the system to keep the performance indexes it had before that promotion. So we have to carry out pressure testing. First, the pressure test must not affect the production environment. Second, the pressure test results must not be distorted.

Experience: to save costs we once compromised on the pressure test and found that the distortion was very serious, so the TP99 and TP999 indexes we had provided to the upstream did not hold. As a result, the upstream system kept timing out.
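As a rough illustration of checking a pressure-test result against the TP99 promised to the upstream, here is a small Python sketch. Everything in it is hypothetical: `fake_service` stands in for the interface under test, and the request count, concurrency and 50 ms promise are made-up numbers. A real pressure test would target an isolated environment that mirrors production closely enough for the numbers not to be distorted.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_service(request_id):
    """Hypothetical stand-in for the interface under test."""
    time.sleep(0.001)
    return request_id

def run_pressure_test(call, total_requests, concurrency):
    """Invoke `call` total_requests times using `concurrency` worker threads.

    Returns the per-request latencies in milliseconds, sorted ascending.
    """
    def timed(request_id):
        start = time.perf_counter()
        call(request_id)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total_requests)))
    return sorted(latencies)

def meets_promise(sorted_latencies_ms, tp99_promise_ms):
    """True if the nearest-rank TP99 of the sorted sample is within the promise."""
    n = len(sorted_latencies_ms)
    rank = max(1, -(-99 * n // 100))  # ceil(0.99 * n), at least 1
    return sorted_latencies_ms[rank - 1] <= tp99_promise_ms

latencies = run_pressure_test(fake_service, total_requests=200, concurrency=20)
print("requests:", len(latencies), "TP99 within promise:", meets_promise(latencies, 50))
```

The point of `meets_promise` is that the figure given to the upstream should come from a measurement like this one, taken under conditions that match the promotion, rather than from an older or cheaper test.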

3. Preheating

Promotions now include a preheating phase, which lets our system warm its cached data in advance. However, the data our system needs to warm may be more than just a few SKUs or a few categories. We need to preheat the full data set in the system; otherwise the system will slow down or even break down.

Experience: there was a relatively unpopular, low-volume category for which we had not considered cache preheating. During the promotion the business side temporarily adjusted the price, and traffic skyrocketed. Fortunately the traffic was all queries. The peak reached thousands of QPS, and we got through because our slave libraries were stable. (Most databases are deployed across two rooms, and each room needs to be highly available, i.e., at least 4 slave libraries.)
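Preheating the full data set rather than only the hot SKUs can be sketched as below. This is a simplified illustration under stated assumptions: the cache is a plain dict, `PRICES` and `load_prices` are hypothetical stand-ins for a read from a slave library, and the batch size is arbitrary. A real system would write to its own cache service and throttle the reads.

```python
def preheat_cache(cache, load_batch, all_skus, batch_size=100):
    """Load every SKU into the cache in batches; return how many entries were warmed."""
    warmed = 0
    for start in range(0, len(all_skus), batch_size):
        batch = all_skus[start:start + batch_size]
        # One batched read per slice keeps the load on the data source bounded
        for sku, value in load_batch(batch).items():
            cache[sku] = value
            warmed += 1
    return warmed

# Hypothetical data source standing in for a slave library
PRICES = {f"sku-{i}": 100 + i for i in range(250)}

def load_prices(skus):
    """Hypothetical batched read: return prices for the SKUs that exist."""
    return {sku: PRICES[sku] for sku in skus if sku in PRICES}

cache = {}
warmed = preheat_cache(cache, load_prices, list(PRICES), batch_size=100)
print("warmed entries:", warmed)
```

Iterating over the complete SKU list, including unpopular categories, is what protects against the case described above, where a cold category suddenly receives promotion traffic.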

I will stop here for now and see what feedback comes in. If you found this useful, please give it a thumbs-up!