Author | White MANna

In the e-commerce era, traffic has become a core competitive asset for enterprises, and activities such as flash sales and limited-time rush purchases have become standard marketing tools. Since Taobao launched the “Double 11” campaign, promotions run by e-commerce platforms and brand owners have mushroomed. When a service must reach a very large audience, availability becomes the key to e-commerce operations and to website operation and maintenance. Faced with the traffic surge that a promotion brings, enterprises must serve massive numbers of users scattered across different regions and countries while keeping the business running stably. Take an e-commerce company with 10 million registered users as an example. During a promotion, it may face nearly 10 million users arriving from different regions at the same time, and system availability will determine whether the promotion succeeds.

For e-commerce websites, slow loading or unavailability often means that the marketing build-up has been wasted, which not only costs tens of millions of yuan in orders but also damages brand reputation. In promotion scenarios such as Double 11, because traffic is so much higher, the social impact of any availability problem is magnified many times over. Therefore, both e-commerce platforms and self-built sites run load tests ahead of time to find performance bottlenecks and plan capacity for large promotions like Double 11. But are load testing and capacity expansion enough? Far from it. Load testing evaluates the performance and capacity of the website mainly from the perspective of the merchant or platform, and it lacks the means to evaluate performance from the user's perspective.

Optimizing such a website is not just a matter of adding IaaS resources; every link on the browsing path of the whole site needs to be optimized and tuned. Without a test tool that simulates large numbers of users in different parts of the world and reproduces the behavior of real users, predicting the performance, bottlenecks, or failure points of such a complex shopping site would be an impossible task.

Take the product pre-sale activity of a well-known large e-commerce website as an example. The goal is to test the performance of the website before the reservation and purchase activity begins, find system bottlenecks, and guide optimization so that the reservation and purchase activity runs smoothly.

This is a global test covering the store page, product detail page, and order page of the website. The performance of each module and of the entire system should be tested. A large number of real users in different parts of the world must be simulated operating at the same time, and page response times checked, to ensure that the system responds promptly wherever users browse from and that no unexpected errors or delays hurt the user experience.

After collecting and consolidating the relevant performance and experience indicators with the help of tools, we begin the analysis. Because the performance and experience data of real users is at the core, the analysis should follow roughly the same path as a real user's request: terminal, network, application, and system, in that order. During the analysis we need a sufficient sample size and our own weighting of how much each indicator affects user experience. Of these layers, we focus on the terminal and the network.

(1) Availability across all regions

Before a big promotion, based on the target market, we select real-user monitoring points on different operators in the important cities of each province, and in overseas cities where needed, and run several rounds of network measurements against the website's landing page address. The measurements evaluate domain names, IPs, and API performance along dimensions such as latency, packet loss rate, and availability, and are combined into an overall availability report. Regions or operators with poor availability then receive focused governance.
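The aggregation behind such an availability report can be sketched as follows. This is only an illustration under stated assumptions: the `Probe` record, the function names, the region and operator labels, and the 99% threshold are all hypothetical and are not part of the product described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Probe:
    region: str        # province or city of the monitoring point (hypothetical labels)
    operator: str      # carrier the monitoring point is connected through
    ok: bool           # True if the landing page request succeeded
    latency_ms: float  # round-trip time; ignored when ok is False


def availability_report(probes):
    """Group probes by (region, operator) and compute availability and mean latency."""
    groups = defaultdict(list)
    for p in probes:
        groups[(p.region, p.operator)].append(p)

    report = {}
    for key, items in groups.items():
        good = [p for p in items if p.ok]
        report[key] = {
            "samples": len(items),
            "availability": len(good) / len(items),
            "avg_latency_ms": sum(p.latency_ms for p in good) / len(good) if good else None,
        }
    return report


def needs_governance(report, min_availability=0.99):
    """Return the (region, operator) pairs whose availability is below the threshold."""
    return sorted(k for k, v in report.items() if v["availability"] < min_availability)


# Made-up sample data, for illustration only.
probes = [
    Probe("Zhejiang", "carrier-a", True, 40.0),
    Probe("Zhejiang", "carrier-a", True, 60.0),
    Probe("Xinjiang", "carrier-b", True, 180.0),
    Probe("Xinjiang", "carrier-b", False, 0.0),
]
report = availability_report(probes)
print(needs_governance(report))
```

Grouping by the (region, operator) pair rather than by region alone matters here, because a problem is often limited to one operator's network within a region.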

(2) User experience evaluation of core-path pages

User experience determines the effect of a promotion. Page opening speed in particular directly determines whether users stay. Research shows that most visitors will leave a web page if it takes 6-8 seconds to open, and 99% of users will leave if it takes 12 seconds. Evaluating user experience before the big promotion is therefore also something we need to focus on.

For user experience, we first sort out the core browsing path of users, and then optimize and govern the pages on that path. Through browsing tasks in cloud dial testing, we can obtain core experience indicators such as first screen time and 100K time for users in different regions and on different operators. The first screen time matters most: every page on the core browsing path must meet the corresponding first screen time requirement.
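One way to check core-path pages against a first screen time requirement is to compare a high percentile of the samples with a budget. The sketch below assumes a 90th percentile and a 2000 ms budget purely for illustration; the article does not state the actual requirement, and the page names and sample values are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pages_over_budget(samples_by_page, budget_ms, pct=90):
    """Return {page: percentile value} for pages whose first screen time exceeds the budget."""
    result = {}
    for page, samples in samples_by_page.items():
        value = percentile(samples, pct)
        if value > budget_ms:
            result[page] = value
    return result


# Hypothetical first screen times in milliseconds, one list per core-path page.
samples = {
    "store": [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700],
    "order": [900, 1000, 1200, 1500, 1800, 2100, 2400, 2700, 3000, 3500],
}
slow = pages_over_budget(samples, budget_ms=2000)
print(slow)
```

A percentile is used instead of the mean so that a minority of slow regions or operators is not hidden by fast samples elsewhere.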

(3) DNS resolution effect evaluation

DNS resolution is one of the most easily overlooked areas, so we focus on DNS governance; the lesson of Facebook's outage not long ago is still fresh. More than 1,000 monitoring points around the world, including real-user monitoring points, send network requests to the target domain names around the clock to help users monitor DNS service availability and resolution performance. DNS dial testing also supports specifying recursive or iterative query methods and the resolution server, so flexible dial test parameters can simulate the access of real users as closely as possible.

After dial test tasks are scheduled, Aliyun dial testing can generate reports of DNS resolution times in different regions and list the details of the DNS request for each dial test, including the A record address, DNS time, and DNS resolution process, which helps users quickly analyze and locate DNS resolution problems. In addition, DNS alarms can be configured so that DNS availability and resolution performance problems are resolved before users notice them and ask for help, improving user satisfaction and reducing economic losses.
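The idea of a single DNS dial test, timing one resolution and recording the returned addresses, can be sketched with the Python standard library. This is not how the product works internally: it uses the local system resolver through `socket.gethostbyname_ex`, so it cannot choose recursive or iterative queries or a specific resolution server. The function names and the 200 ms alert threshold are assumptions for illustration.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.gethostbyname_ex, clock=time.perf_counter):
    """Resolve hostname once and return the addresses, elapsed time in ms, and any error."""
    start = clock()
    try:
        _name, _aliases, addresses = resolver(hostname)
        error = None
    except OSError as exc:  # socket.gaierror and socket.herror are subclasses of OSError
        addresses, error = [], str(exc)
    elapsed_ms = (clock() - start) * 1000
    return {"hostname": hostname, "addresses": addresses, "dns_ms": elapsed_ms, "error": error}


def should_alert(result, max_dns_ms=200):
    """Alert when resolution failed or took longer than the (assumed) threshold."""
    return result["error"] is not None or result["dns_ms"] > max_dns_ms


# Stub resolvers so the example runs without network access.
def fake_ok(hostname):
    return (hostname, [], ["192.0.2.10"])  # 192.0.2.0/24 is reserved for documentation


def fake_fail(hostname):
    raise socket.gaierror("name not resolved")


ok_result = time_dns_lookup("shop.example.com", resolver=fake_ok)
bad_result = time_dns_lookup("shop.example.com", resolver=fake_fail)
print(ok_result["addresses"], should_alert(bad_result))
```

Calling `time_dns_lookup("example.com")` without the `resolver` argument performs a real lookup. The resolver and clock are injectable so that the timing and alert logic can be exercised offline.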

(4) CDN quality monitoring

As websites carry ever richer image and video content, many e-commerce websites use CDN services to solve slow access for different regions and operators, improving loading speed, reducing bandwidth costs, and increasing content availability and redundancy. LastMile (real Internet user) monitoring points are selected in the major countries of North America, Europe, South America, Southeast Asia, and other regions where target users are located, and browser dial test tasks are configured to test the target website.

By analyzing dial test logs, we can see in real time how the CDN performs after deployment: whether the performance of host nodes has improved and whether availability is stable; whether target customers hit the correct host node and whether the match is reasonable; whether CDN nodes are in sync with the origin site; and whether published elements are fully delivered and remain valid over time. The CDN configuration strategy is then adjusted and optimized against these criteria.
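Of these checks, whether CDN nodes are in sync with the origin site is the easiest to illustrate. A minimal sketch, assuming the response bodies for the same URL have already been fetched from the origin and from each edge node, compares content digests. The node names and bodies below are made up, and the article does not say how the product performs this check.

```python
import hashlib


def digest(body: bytes) -> str:
    """SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def find_stale_edges(origin_body, edge_bodies):
    """Return the names of edge nodes whose content differs from the origin."""
    expected = digest(origin_body)
    return sorted(name for name, body in edge_bodies.items() if digest(body) != expected)


# Hypothetical bodies for one resource, as seen from the origin and from three edge nodes.
origin = b"<html>promotion page v2</html>"
edges = {
    "edge-frankfurt": b"<html>promotion page v2</html>",
    "edge-singapore": b"<html>promotion page v2</html>",
    "edge-sao-paulo": b"<html>promotion page v1</html>",  # still serving the old version
}
stale = find_stale_edges(origin, edges)
print(stale)
```

In practice, comparing `ETag` or `Last-Modified` response headers is a cheaper alternative when the origin and CDN both expose them; the full-body digest is used here only because it needs no assumptions about headers.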

Every year on the eve of Double 11, load testing is a must. Continuous full-link load testing uncovers problems, drives optimization, and comprehensively verifies the stability of the business. Cloud dial testing is a perfect complement to full-link load testing: it analyzes the user experience in big promotion scenarios from the user's perspective so that users have a better shopping experience. As the business develops, it will keep evolving and continue to play an irreplaceable role.

About cloud dial testing

As a business-oriented, non-intrusive, cloud native monitoring product, cloud dial testing has become the best choice for this kind of monitoring. Through Aliyun's worldwide service network, it simulates real user behavior and continuously monitors the availability and performance of websites and their networks, services, and API ports around the clock. It locates problems at fine granularity: page element level, network request level, and network link level. Rich monitoring items and analysis models help enterprises find and locate performance bottlenecks and experience blind spots in time, reduce operating risks, and improve service experience and efficiency.

(1) Global monitoring node coverage

More than 200,000 LastMile (LM) monitoring points worldwide, more than 500 IDC monitoring nodes, 400+ operators at home and abroad, and hundreds of thousands of registered members ensure that the monitoring scale keeps up with ever larger business scale.

(2) No embedded code, works out of the box

Zero-intrusion monitoring: just enter the URL and complete a simple configuration, with no R&D support required. A complete site performance analysis report is available within minutes. Multiple purchase modes, including resource packs and pay-as-you-go, meet the needs of operation and maintenance testing.

(3) Business-oriented, with a variety of preset analysis models

The monitoring period is refined to the minute level, and more than 20 monitoring parameter settings in 7 categories are supported. Multiple mainstream protocols are supported, with 7×24 fine-grained real-time fault monitoring, alarming, and performance analysis for sites and service ports. From the end customer's perspective, multi-dimensional analysis by region, operator, and other dimensions, drill-down into the details of a single sample, and a rich set of indicators and chart types make it possible to pinpoint problems, their affected scope, and their root cause intuitively, which reduces analysis time and improves operation and maintenance efficiency. This is truly fine-grained monitoring.

(4) Intelligent alarms and accurate problem location

Real-time alarms are supported for first screen time, overall performance, and availability, with rich alarm policy settings and deep integration with the Alibaba Cloud alarm center to effectively shorten MTTR. Errors can be found at the page element level, and faults can be attributed precisely to a single network request, improving the efficiency of problem location.
