Author: Di Mo | Review & proofreading: Feng Yun | Editing & typesetting: Wen Yan

Introduction

PTS (Performance Testing Service) is a SaaS performance testing tool from Alibaba Cloud. Ten years have passed since it was created to accurately simulate the Double 11 traffic peak. It supports tens of thousands of load testing tasks across the group every year, including Singles' Day, and serves as the advance "validator" of Alibaba's internal Singles' Day technical architecture.

As a SaaS performance testing tool, PTS supports on-demand load testing, providing the ability to generate millions of concurrent users and tens of millions of TPS, and is 100% compatible with JMeter. It provides scenario orchestration, API debugging, custom traffic, traffic recording, and other functions so that load test scripts can be created quickly. With operator nodes covering hundreds of cities in China, it can accurately simulate access to business systems by users of different tiers, helping businesses quickly improve system performance and stability. It has been widely used in retail, finance, online education, and other fields.

Today, PTS capabilities have been upgraded again. The protocol upgrade further expands the range of supported protocols and application scenarios, so you no longer need to worry about load testing different technical architectures. The low-threshold, high-volume self-service load testing capability spares teams the trouble of developing and maintaining their own tooling: click to start a test, and you have self-service load testing at millions of concurrent users. The production-environment write load test has been productized in a safe, non-invasive way: with a simple probe installation, write load tests can run in production, so that no business scenario is "left behind" and system performance and capacity can be evaluated comprehensively and accurately.

New release/upgrade features are as follows:

  1. Supports HTTP/2.
  2. Supports the RTMP/HLS streaming media protocols.
  3. Supports the WebSocket protocol.
  4. Supports the MQTT protocol.
  5. Supports the Spring Cloud/Dubbo microservice protocols.
  6. Self-service load testing with up to 1 million concurrent users.
  7. Safe, non-invasive write load testing in the production environment.

Load testing protocol upgrade

Protocols are the "language" of communication between application systems, and in today's diverse scenarios the protocols used by different types of systems are quietly changing. HTTP, the most widely used transfer protocol, mainly carries text content and has carried the mainstream traffic of the Internet. Faced with today's rich content, HTTP is clearly no longer the only option for technical services. Streaming media protocols do the carrying when you watch video content, while real-time interaction with the server often runs over the WebSocket protocol. The smart watch on your wrist and the smart appliances in your home may keep data synchronized with cloud services through the MQTT protocol. Even for plain text content, the protocol carrying it is evolving from HTTP/1.1 to HTTP/2 and HTTP/3.

For developers and testers, understanding every interaction protocol can be a headache in the face of rapid business iteration. The same is true for load testing: we clearly cannot afford a custom load testing tool for each system. PTS, as a load testing tool, brings the following updates to its protocol support:

  • Supports HTTP/2.
  • Supports the RTMP/HLS streaming media protocols.
  • Supports the WebSocket protocol.
  • Supports the MQTT protocol.
  • Supports the Spring Cloud/Dubbo microservice protocols.

Supports HTTP/2 load testing

Since the release of HTTP/1.x in 1997, our systems have used it to serve content for quite some time. In the past ten years, Internet content and Internet users have exploded, and HTTP/1.x can no longer meet the needs of modern networks. More and more companies have upgraded from HTTP/1.x to HTTP/2 in exchange for better page loading performance and security. You can get a feel for the performance improvement of HTTP/2 in the following image.

The major improvements in HTTP/2 over HTTP/1.1 include the following:

  1. Binary framing.
  2. Header compression.
  3. Multiplexing.
  4. Server push.
  5. Improved security.

As the earlier rendering shows, HTTP/2 performance is significantly better than HTTP/1.x. The key features behind the improvement are binary framing, header compression, and multiplexing. Here is how these three features work.

Binary transport

A binary protocol is more efficient to parse than plain text. In HTTP/2, the transmitted content is broken into frames, and all communication with a given domain is completed on a single connection. The original message structure is broken up: each message consists of one or more frames, which can be sent out of order and reassembled according to the stream identifier carried in the frame header, as shown below:
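To make the framing concrete, here is a small sketch, following the frame layout defined in RFC 7540 rather than any PTS internals, that parses the fixed 9-byte HTTP/2 frame header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a 31-bit stream identifier used for reassembly:

```python
import struct

def parse_frame_header(data: bytes) -> dict:
    """Parse the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    length_hi, length_lo, ftype, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return {
        "length": (length_hi << 16) | length_lo,  # 24-bit payload length
        "type": ftype,                            # e.g. 0x0 DATA, 0x1 HEADERS
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,      # top bit is reserved
    }

# A DATA frame header: 16-byte payload, type 0, END_STREAM flag set, stream 5.
header = bytes([0x00, 0x00, 0x10, 0x00, 0x01]) + (5).to_bytes(4, "big")
print(parse_frame_header(header))  # {'length': 16, 'type': 0, 'flags': 1, 'stream_id': 5}
```

Because every frame carries its stream identifier, a receiver can interleave frames from many streams on one connection and still reassemble each message correctly.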

Header compression

In HTTP/1.x, because the protocol is stateless, every request must carry complete header content, which increases transmission cost. If thousands of request/response messages for a domain carry many identical field values, that is a waste of resources.

Therefore, on top of binary framing, HTTP/2 adds header compression. The HPACK compression algorithm maintains a dictionary on both client and server, with index numbers standing in for repeated strings; compression efficiency can reach 50%–90%. The first request sends all the headers, and subsequent requests send only the differences, improving efficiency:
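The effect can be illustrated without implementing full HPACK (which also uses a static table and Huffman coding). The toy sketch below is a simplification that only shows the core idea: a table shared by both ends lets a later request replace repeated header pairs with small index references, so only changed values travel as literals:

```python
def encode_headers(headers, table):
    """Toy differential encoding: emit an index for a known (name, value) pair,
    otherwise send the literal pair and add it to the shared table.
    This is a simplification of HPACK, for illustration only."""
    out = []
    for pair in headers:
        if pair in table:
            out.append(("index", table.index(pair)))
        else:
            table.append(pair)
            out.append(("literal", pair))
    return out

table = []  # shared between "client" and "server" across requests
first = encode_headers([(":method", "GET"), ("host", "example.com"),
                        ("cookie", "a=1")], table)
second = encode_headers([(":method", "GET"), ("host", "example.com"),
                         ("cookie", "a=2")], table)
print(first)   # three literals: everything is new
print(second)  # two tiny index references; only the changed cookie is literal
```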

Multiplexing

HTTP/2 supports multiplexing. Multiplexing removes the browser's limit on the number of concurrent requests to the same domain and avoids the cost of opening a new TCP connection for each request. In HTTP/2, full request and response multiplexing is achieved through the binary framing described earlier: clients and servers break HTTP messages into individual frames and reassemble them at the other end, as shown in the picture below. The client is sending a stream 5 data frame to the server while the server transmits an interleaved sequence of frames for streams 1 and 3 to the client, so three parallel streams are transmitting data at once.

With HTTP/2's binary framing and multiplexing, the old situation, where a browser's limited pool of persistent TCP connections to one domain forced requests to share connections and a pipeline could only process one response at a time, causing head-of-line blocking, no longer exists. This is why HTTP/2 is so efficient.

Theoretically, HTTP/2 is backward compatible with HTTP/1.x: if the client does not support HTTP/2, the two sides automatically fall back to HTTP/1.x. In a performance testing scenario, however, HTTP/1.x performance differs from HTTP/2, so if the test engine does not support HTTP/2 it degrades to HTTP/1.x. Given that mainstream browsers now support HTTP/2, such a test's results can be biased.

Therefore, PTS has introduced HTTP/2 support. After a scenario is created on the PTS console, the engine negotiates with the server during the test to decide whether to use HTTP/1.x or HTTP/2, with no extra configuration, ensuring the test scenario matches reality.

Supports streaming media protocol load testing

With the rise of live streaming in recent years, Internet content has quietly undergone earth-shaking changes. From the original e-commerce and game live streams to this year's online-education live streams during the epidemic, more and more forms of live streaming, all based on streaming media, are being presented to the public. From a technical point of view, unlike back-end services based on the HTTP protocol, a live streaming system is a new kind of system architecture. How to simulate the scenario of a user watching a video, as well as the accompanying HTTP-based user behavior, has become a new technical problem.

First, let's look at a complete macro model diagram of a live streaming architecture:

From the figure, we can clearly see the three main modules of a live streaming system:

  1. The push (streaming) side.
  2. The streaming media server.
  3. The playback side.

The push side collects the host's audio and video data and pushes it to the streaming media server. The streaming media server converts the pushed data into the specified formats and delivers it so that different players can watch; at present, cloud vendors also provide complete server-side solutions. The playback side, in short, plays the audio and video and presents the content to the user.

As you can see, what connects these three key modules is the streaming media transfer protocol. Generally speaking, a streaming media server architecture does not require a single protocol end to end. The mainstream streaming media protocols are as follows:

PTS already supports the RTMP/HLS protocols. As shown in the following figure, combined with the PTS scenario orchestration capability, it can realistically simulate users watching different videos; combined with the regional customization of the PTS load engine, it can easily simulate the viewing behavior of a large-scale live stream to ensure the stability of the live streaming business.

Supports WebSocket protocol load testing

From the earlier analysis of HTTP, we can see that HTTP is a stateless, connectionless, one-way application layer protocol using a request/response model. Before HTTP/2 was widely rolled out, a request could only be initiated by the client, with the server responding. One drawback of this communication model is that the server cannot actively send messages to the client.

In some real-time scenarios, however, this drawback is unacceptable. Before WebSocket, two methods were usually used to keep information up to date:

  • Ajax polling.
  • Long polling.

Ajax polling is very simple: the browser sends a request every few seconds to ask the server whether there is new information. Long polling is similar, it also polls, but uses a blocking model: after the client initiates a request, the server does not return a response until there is a message; once the response returns, the client immediately establishes a new connection and starts again.
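The difference between the two models can be sketched in a few lines: plain polling returns immediately, often empty-handed, while a long poll blocks until a message arrives or a timeout fires. In this illustration, a `queue.Queue` stands in for the server-side message source:

```python
import queue

def ajax_poll(messages: queue.Queue):
    """Plain polling: return immediately, with or without data."""
    try:
        return messages.get_nowait()
    except queue.Empty:
        return None  # wasted round trip; client asks again in a few seconds

def long_poll(messages: queue.Queue, timeout: float):
    """Long polling: block until a message arrives or the timeout expires."""
    try:
        return messages.get(timeout=timeout)
    except queue.Empty:
        return None  # client would now reconnect and poll again

messages = queue.Queue()
print(ajax_poll(messages))       # None: nothing queued yet
messages.put("price update")
print(long_poll(messages, 1.0))  # "price update": delivered as soon as queued
```

Either way, the client keeps re-issuing HTTP requests, which is exactly the overhead WebSocket eliminates.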

As you can see, both approaches essentially keep making HTTP requests and waiting for the server to process them, without changing the request/response model itself. WebSocket emerged to solve this problem: once the server and client establish a connection, the server can actively push information to the client, ensuring real-time delivery while reducing performance overhead.

In essence, WebSocket is a full-duplex communication protocol over TCP, completely different from HTTP, but its handshake depends on HTTP. By capturing and analyzing packets, careful readers can easily find content like the following:

```
GET /chat HTTP/1.1
Host: server.pts.console.aliyun.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: xxxxxxxxxxxxxxxxxxxx
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Origin: https://pts.console.aliyun.com
```

You can see that each time a WebSocket connection is established, an HTTP request is made during the handshake phase. Through HTTP, the client tells the server the WebSocket version it supports, the subprotocols, the origin, and the host, and the Upgrade header asks the server to upgrade the current HTTP connection to the WebSocket protocol. If the server supports WebSocket, the returned status code must be 101:

```
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: xxxxxxxxxxxxxxxxxxxx
```

With this response, the WebSocket connection is established, and from then on data is transferred entirely according to the WebSocket protocol.

As mentioned earlier, WebSocket is a new protocol created to address the limits of the request/response model for real-time behavior. In practice, we find WebSocket widely used in online games, stock and fund tickers, sports updates, chat rooms, bullet comments, online education, and other scenarios with high real-time requirements.

PTS supports the WebSocket protocol, so these scenarios can quickly verify system performance and capacity through load testing, just like HTTP-based testing scenarios.

Supports MQTT load testing

MQTT is a messaging protocol originally developed at IBM that is now an important part of the Internet of Things. The protocol supports all platforms, can connect almost any networked object to the outside world, and serves as a communication protocol for sensors and actuators.

The MQTT protocol itself does not distinguish between client (device) and server (cloud). In the MQTT model, all client communication is forwarded by an MQTT broker via publish/subscribe. A typical IoT scenario architecture is shown below:

Compared with the HTTP protocol discussed earlier, MQTT has the following features:

  • Low protocol overhead. It is a binary transport protocol whose fixed header can be as short as 2 bytes.
  • Push mode is supported.
  • High tolerance of unstable networks. The MQTT protocol natively supports sessions, so a broken link can be restored automatically and message delivery quality is guaranteed.
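To illustrate the low overhead, here is a sketch, tied to the MQTT 3.1.1 packet format rather than any particular client library, that builds the fixed header by hand; the PINGREQ keep-alive packet really is just 2 bytes:

```python
def mqtt_pingreq() -> bytes:
    """PINGREQ: packet type 12 in the high nibble, no flags, remaining length 0."""
    return bytes([0xC0, 0x00])

def mqtt_fixed_header(packet_type: int, flags: int, remaining_length: int) -> bytes:
    """Build an MQTT 3.1.1 fixed header; the remaining length uses a
    variable-length encoding: 7 bits per byte, high bit set if more follow."""
    header = bytearray([(packet_type << 4) | flags])
    while True:
        byte, remaining_length = remaining_length % 128, remaining_length // 128
        header.append(byte | (0x80 if remaining_length else 0))
        if not remaining_length:
            return bytes(header)

print(mqtt_pingreq().hex())               # c000
print(mqtt_fixed_header(12, 0, 0).hex())  # c000 (same packet built generically)
```

Compare that with the hundreds of bytes of headers a typical HTTP/1.x request carries, and the appeal for battery- and bandwidth-constrained devices is clear.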

Given these features, MQTT fits well into the growing IoT space. Recent data shows the share of the MQTT protocol in the IoT field gradually increasing, even surpassing the traditional HTTP protocol.

Therefore, to meet the load testing needs of IoT scenarios, PTS has launched MQTT load testing, supporting both self-built MQTT services and Alibaba Cloud Message Queue for MQTT. As shown in the following figure, test scenarios can be created quickly on the console:

Supports microservice protocol (Spring Cloud/Dubbo) load testing

In a monolithic application architecture, deployment, operation, and maintenance become slower and more complex as the business expands, and the development process cannot stay agile as headcount grows. Microservice architecture is designed to solve these problems.

Structurally, a microservice architecture splits the functions provided by a single application into multiple services. These services are loosely coupled and call one another over some protocol (RPC, HTTP, etc.), completing the move from a monolithic to a distributed architecture, providing a more flexible way to develop and deploy, and reducing the complexity of development and operations.

The following figure shows a service as an example. After a user's request enters the store-web application over HTTP, back-end services such as store-cart and store-product are called via RPC.

Now consider a scenario: in this microservice architecture, suppose we do not want to send traffic through store-web but instead want to load test back-end services such as store-cart and store-product separately. If the load testing tool does not support the relevant microservice protocols, we cannot test this scenario independently at all. Even if the tool supports some microservice protocols, it must be deployed inside the VPC where the microservices reside, which takes time and effort.

To address these issues, PTS has introduced a new microservice load testing capability that supports mainstream microservice protocols such as Spring Cloud and Dubbo and automatically opens up the user's VPC so that microservices can be tested directly. See the diagram below:

Load generation capability upgrade

PTS's predecessor was Alibaba's full-link stress testing. The original intention of full-link stress testing was to simulate the real scenario of users rushing to Tmall to buy goods at midnight on Double 11. Until 2013, load testing was basically done by simulation in an offline environment. Offline simulation is relatively simple to implement and low risk, and can uncover certain performance problems. Its disadvantage is that the call patterns differ completely from real online traffic, and the authenticity of data and environment cannot be guaranteed, so system performance cannot be evaluated accurately. Offline testing is usually used to check whether a single system has performance bottlenecks and has little reference value for capacity planning. For the system to withstand the midnight peak of Double 11, a more accurate testing mode was needed to evaluate online capacity.

The concept of online load testing was proposed at Alibaba as early as 2010. Through single-machine traffic diversion, we gained for the first time the ability to load test a single machine online and accurately obtain its performance limit. But diversion-based testing is single-machine, and the corresponding capacity planning is evaluated per application. In a large distributed architecture, the per-application capacity approach ignores the overall call relationships and upstream/downstream dependencies: it cannot assess the actual carrying capacity of the whole chain, from user login to completed purchase, across core pages and trade and payment systems. In addition, the data center, network, middleware, and storage are all full of uncertainty. The emergence of full-link stress testing changed this. By modifying the application systems, the online environment can handle normal traffic and test traffic at the same time, supporting cluster read and write load tests without affecting normal user access and obtaining the most realistic data on actual online carrying capacity.

Today, looking back at the special moment of Singles' Day: every year at midnight, users rush to Tmall to buy goods, and from a technical point of view, tens of millions of HTTP requests arrive in the system instantly. The reason Alibaba's systems can withstand such a large flood peak is inseparable from the full-link stress testing rehearsals before Singles' Day.

PTS stands on the shoulders of full-link stress testing and has productized both its massive traffic generation capability and its production-environment write testing capability. PTS can generate nationwide user access traffic at low cost while covering all online test scenarios, including write requests, simulating the most realistic scenario, much like the Double 11 event itself.

Massive traffic generation capability

Facing ever-growing business scale, many users who have built their own load testing platforms share a worry: how to generate the traffic of a very large event. An open-source self-built platform is costly to maintain, and problems with a self-developed engine or its load generators can undermine the test itself.

As shown in the figure above, PTS's on-demand traffic generation supports self-service load tests of up to 1 million concurrent users. Whether you are running a small concurrent test in a daily scenario or need to simulate a very large event, click to generate traffic and none of the above problems apply.

Safe, non-invasive production-environment write testing, productized

As mentioned above, Alibaba's full-link stress testing modifies the application systems so that the online environment can handle normal traffic and test traffic at the same time, supporting online cluster read and write tests without affecting normal user access and obtaining the most realistic data on actual carrying capacity.

The challenges of write testing in the production environment are twofold: ensuring the safety of write tests, so that online data is not polluted, and avoiding excessive intrusion into business code.

Based on years of practical experience with Alibaba full-link stress testing, we summarize the prerequisites for safe write testing in production:

  1. Ensure that test-traffic marks are not lost.

Test traffic must be correctly identifiable at any point. At the traffic entry layer, middleware recognizes the test mark and passes it downstream, ensuring the mark is never lost along the entire link, so that downstream applications and storage also receive it.
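The mark-passing idea can be sketched as follows. The header name `X-Stress-Test` and both helper functions are hypothetical, chosen for illustration; a real agent defines its own mark and wires this into the middleware and RPC framework automatically:

```python
import contextvars

# Hypothetical header name; real probes/agents define their own mark.
STRESS_HEADER = "X-Stress-Test"

# Context variable so the mark follows the request through this process.
stress_flag = contextvars.ContextVar("stress_flag", default=False)

def inbound_middleware(headers: dict) -> None:
    """Entry layer: recognize the mark on incoming traffic, store it in context."""
    stress_flag.set(headers.get(STRESS_HEADER) == "1")

def outbound_headers(headers: dict) -> dict:
    """RPC/HTTP client: re-attach the mark so downstream services see it too."""
    if stress_flag.get():
        headers = {**headers, STRESS_HEADER: "1"}
    return headers

inbound_middleware({STRESS_HEADER: "1"})      # marked request arrives
print(outbound_headers({"Host": "store-cart"}))  # mark travels downstream
```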

  2. Ensure the test process is not interrupted.

Test traffic must be able to complete its calls normally: the whole process is not blocked, and the expected business results are returned. The application layer also needs modification to support the full link: when it identifies the test mark, it needs to bypass validation logic such as parameter checks and security checks, for example mobile number format validation, user status validation, and other special business validation logic.

  3. Ensure the test data does not pollute production data.

Improperly handled test data pollutes normal online business data. A full-link scenario involves multiple read and write paths. To isolate test data, the storage middleware identifies the test mark and writes the data into shadow tables, keeping it separate from real data. To keep the simulation realistic, the base data in the shadow tables (buyers, sellers, goods, shops, etc.) is constructed from real data with a fixed offset, and the migration process applies sampling, filtering, desensitization, and other operations to ensure data safety while keeping the data at a level consistent with real data.
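The shadow-table routing at the storage layer amounts to a table-name rewrite keyed on the test mark. The `__shadow_` prefix and the helper functions below are hypothetical naming for illustration; a real probe configures its own shadow mapping inside the database driver or middleware:

```python
# Hypothetical naming convention; real agents configure their own shadow mapping.
SHADOW_PREFIX = "__shadow_"

def route_table(table: str, is_stress_traffic: bool) -> str:
    """Storage layer: send marked traffic to the shadow table; real traffic is untouched."""
    return SHADOW_PREFIX + table if is_stress_traffic else table

def build_insert(table: str, is_stress_traffic: bool) -> str:
    """Build an INSERT whose target depends only on the traffic mark,
    so business code never changes."""
    target = route_table(table, is_stress_traffic)
    return f"INSERT INTO {target} (sku, qty) VALUES (?, ?)"

print(build_insert("orders", is_stress_traffic=True))   # writes land in the shadow table
print(build_insert("orders", is_stress_traffic=False))  # normal writes are unaffected
```

Because the rewrite happens below the application, the same business code serves both traffic types, which is what makes the approach non-invasive.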

PTS has released a write-testing probe for production environments with all three capabilities built in. You only need to deploy the probe, which supports mainstream middleware, and configure the corresponding rules, without changing any business code. Combined with the PTS traffic generation capability shown below, a write load test can be initiated in the production environment whenever needed.

Conclusion

The capabilities above are newly released in PTS as of the Apsara Conference. Those interested in PTS are welcome to scan the QR code to join the group chat. During the Double 11 carnival, we have launched not only an exclusive JMeter resource pack, but also discounts across the whole product line, starting as low as 0.99 yuan. Welcome to purchase!

Links

Alibaba Cloud PTS: pts.console.aliyun.com/#/overviewp…

PTS resource pack purchase: common-buy.aliyun.com/?commodityC…

Scan the QR code with DingTalk to join the PTS user communication group.

For more information, search WeChat (AlibabaCloud888) to add the Cloud Native assistant!