This article describes a cache-tuning problem we ran into with our HttpDNS service and how we solved it.
Previous article:
Evolution of the design of Xiaomi's automated operations platform
This is a recent business optimization, shared here as a short write-up of the optimization process.
Business overview
Internally the service is called Resolver. It is essentially an HTTPDNS system: it provides domain name resolution over HTTP. Before connecting to a service, a client first obtains an IP address list from Resolver and then connects to the corresponding server using that list, which avoids domain name hijacking and connection degradation.
Resolver uses an nginx + backend architecture. The backend is written in C++, and nginx communicates with it over the FastCGI protocol. A single server usually handles about 7k QPS, peaking above 10k.
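The nginx-to-backend wiring described above can be sketched roughly as follows. This is an illustrative configuration, not the production one: the backend address, port, and location path are assumptions.

```nginx
# Hypothetical sketch of the Resolver front end: nginx forwards
# requests to the C++ backend over FastCGI.
upstream resolver_backend {
    server 127.0.0.1:9000;   # illustrative backend address
}

server {
    listen 80;

    location /resolve {      # illustrative path
        include fastcgi_params;
        fastcgi_pass resolver_backend;
    }
}
```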
The problem
Analyzing the problem
Nginx itself is a non-blocking model, so 10k-level QPS puts very little pressure on nginx. Analysis showed that the large request_time values were caused by large upstream_response_time values; in other words, the C++ backend was slow, which suggested the backend had hit a bottleneck. The developers confirmed this conclusion after reviewing the logs with us, and their first reaction was to ask for more machines.
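The comparison above relies on nginx's built-in timing variables: `$request_time` covers the whole request inside nginx, while `$upstream_response_time` covers only the time spent waiting on the backend. A log format like the following (names are illustrative) makes that gap easy to spot:

```nginx
# Log both timings side by side; if urt dominates rt, the backend
# is the bottleneck rather than nginx itself.
log_format timing '$remote_addr "$request" status=$status '
                  'rt=$request_time urt=$upstream_response_time';

access_log /var/log/nginx/access.log timing;
```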
As operations engineers, we still needed to check whether the service could be optimized by operational means. In essence, a user issues an HTTP request and the service returns an IP address list; the list varies with the URL parameters, but for the same parameters it is essentially guaranteed not to change within one minute. We confirmed the business logic with the developers: request processing does not depend on User-Agent, Referer, cookies, or any other extra parameters. The developers agreed that caching responses for one minute was completely safe.
Solving the problem
Could nginx's cache be used? This is familiar territory for us. Backed by memory, and judging by previous business experience, a good hit ratio can deliver an excellent optimization effect and performance can soar; even with a small hit ratio we would still come out ahead. So we immediately set out to test it.
However, nginx does not talk to this upstream via proxy_pass, which means the most commonly used proxy_cache cannot be used directly. One option was to introduce proxy_pass by adding an extra server block on another port, a solution we had used before; its drawback is that it increases the complexity of the nginx configuration, so we kept it only as a fallback. Reading further, we found fastcgi_cache, the caching mechanism for FastCGI upstreams.
We checked the machines' memory usage: setting aside 5 GB for the cache was no problem at all, and one minute's worth of responses would likely amount to far less than 5 GB, so resources were sufficient. After searching Google and Baidu for the meaning of each parameter, configuring, and testing repeatedly, we arrived at a working configuration that stores the cache data on /dev/shm. We then rolled it out gradually, and the effect was obvious: by the backend's numbers, single-machine capacity could be increased roughly fivefold. A few charts:
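A configuration along these lines (zone name, sizes, and paths are illustrative; the 5 GB cap, /dev/shm location, and 1-minute validity follow the text) could look like:

```nginx
# Cache files live on the tmpfs /dev/shm, capped at 5g; entries
# unused for 1 minute are evicted, matching the 1-minute freshness
# window the developers approved.
fastcgi_cache_path /dev/shm/resolver_cache levels=1:2
                   keys_zone=resolver:64m max_size=5g inactive=1m;

server {
    listen 80;

    location /resolve {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;          # illustrative backend

        fastcgi_cache resolver;
        # Key on method, host, URI and query string only -- safe
        # because processing ignores UA, Referer, and cookies.
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_cache_valid 200 1m;

        # Expose HIT/MISS for hit-ratio monitoring.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

Exposing `$upstream_cache_status` in a header or in the access log is what lets you measure a hit ratio like the 75%-80% reported below.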
The corresponding server memory consumption also matched our expectations, as follows:
Optimization results
Using less than 200 MB of additional server memory, we achieved a hit ratio of 75% to 80% and improved single-machine capacity more than fivefold. The optimization mainly delivered the following results:
1. Saved server resources: requests penetrating to the backend dropped to 1/5, capacity grew fivefold, and a large number of servers were saved.
2. Reduced the load on the C++ backend: each backend server now receives 1/5 of its original request volume.
3. Smoothed out peaks: backend load shows essentially no jitter at peak times, and pressure is much lower.
4. Mitigated 5xx errors: if the backend errors out, nginx returns the last cached result to the user, so the user still gets a resolution list. Screenshot below.
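The stale-on-error behavior in point 4 corresponds to nginx's `fastcgi_cache_use_stale` directive. A plausible setting (the exact condition list is an assumption) is:

```nginx
# When the backend fails or times out, serve the last cached
# response instead of propagating the error to the client.
fastcgi_cache_use_stale error timeout http_500 http_503;
```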
This article was first published on the WeChat public account "Mi Operation and Maintenance".