What data is recorded in the request log
time_local
: Time of the request
remote_addr
: IP address of the client
request_method
: Request method
request_schema
: Request protocol, commonly HTTP or HTTPS
request_host
: Requested domain name
request_path
: Requested path
request_query
: Query parameters of the request
request_size
: Request size
referer
: Source address of the request. For example, if a link to b.com is posted on a.com and a user clicks it to visit b.com, the referer records a.com. This is browser behavior
user_agent
: Information about the client browser
status
: Response status of the request
request_time
: Time taken to process the request
bytes_sent
: Response size
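As a minimal sketch, a single log record with these fields could be parsed as follows. The JSON-lines encoding, the sample values, the `request_time` unit (seconds), and the helper name `parse_record` are assumptions for illustration; the field names come from the list above.

```python
import json
from urllib.parse import parse_qs

# Hypothetical sample record. Field names follow the list above; the values
# and the JSON encoding are assumptions made for this example.
SAMPLE_LINE = json.dumps({
    "time_local": "2021-06-01T12:00:00+08:00",
    "remote_addr": "203.0.113.7",
    "request_method": "GET",
    "request_schema": "https",
    "request_host": "b.com",
    "request_path": "/search",
    "request_query": "q=log+analysis",
    "request_size": 512,
    "referer": "https://a.com/post/1",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "status": 200,
    "request_time": 0.042,  # assumed to be in seconds
    "bytes_sent": 2048,
})


def parse_record(line: str) -> dict:
    """Decode one JSON log line and split its query string into parameters."""
    record = json.loads(line)
    # parse_qs maps each parameter name to a list of decoded values.
    record["query_params"] = parse_qs(record.get("request_query", ""))
    return record


record = parse_record(SAMPLE_LINE)
print(record["request_path"], record["query_params"], record["status"])
```

Keeping the raw `request_query` next to the parsed parameters lets later analysis steps choose whichever form they need.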
Often a load-balancing gateway proxies requests to the actual back-end service. In that case, the request log also contains the following data:
upstream_host
: Host the request is proxied to
upstream_addr
: IP address the request is proxied to
upstream_url
: URL that the proxy forwards to on the upstream service
upstream_status
: Status returned by the upstream service
proxy_time
: Time spent on proxy forwarding
Derived data
The following data can be derived from the client IP address:
- ASN information:
asn_asn
: Autonomous system number. IP addresses are managed by autonomous systems; for example, the China Unicom Shanghai network manages all IP addresses of Shanghai Unicom
as_org
: Autonomous system organization, such as China Mobile or China Unicom
- Geographic location information:
geo_location
: Latitude and longitude
geo_country
: Country
geo_country_code
: Country code
geo_region
: Region (province)
geo_city
: City
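The derivation from an IP address can be sketched as a longest-prefix lookup. In this sketch, the table `PREFIX_TABLE`, the function `enrich_ip`, and every value in the table are hypothetical; a real deployment would load the prefixes from an ASN/GeoIP database. The addresses and AS numbers used are from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical lookup table: network prefix -> derived fields.
# All values are made up; the prefixes are documentation-only ranges.
PREFIX_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), {
        "asn_asn": 64500,
        "as_org": "Example Org A",
        "geo_country": "Exampleland",
        "geo_country_code": "EX",
        "geo_region": "Example Region",
        "geo_city": "Example City",
        "geo_location": (0.0, 0.0),  # (latitude, longitude)
    }),
    (ipaddress.ip_network("198.51.100.0/24"), {
        "asn_asn": 64501,
        "as_org": "Example Org B",
    }),
    (ipaddress.ip_network("198.51.100.128/25"), {
        "asn_asn": 64502,
        "as_org": "Example Org C",
    }),
]


def enrich_ip(remote_addr: str) -> dict:
    """Return the derived fields for the most specific matching prefix."""
    ip = ipaddress.ip_address(remote_addr)
    matches = [(net, info) for net, info in PREFIX_TABLE if ip in net]
    if not matches:
        return {}
    # Longest prefix wins when several networks contain the address.
    _, info = max(matches, key=lambda match: match[0].prefixlen)
    return dict(info)


print(enrich_ip("203.0.113.7"))
print(enrich_ip("198.51.100.200"))
```

A linear scan is enough for a sketch; with a real database of many prefixes, a trie or the database vendor's own reader would be the more likely choice.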
The following information can be parsed from user_agent:
ua_device
: Device used
ua_os
: Operating system
ua_name
: Browser
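To illustrate the idea, here is a deliberately simplistic sketch of extracting these three fields with substring checks. The function name `parse_user_agent` and the category labels are assumptions; real User-Agent strings vary far more than this handles, so production systems usually rely on a maintained parsing library or rule set.

```python
def parse_user_agent(user_agent: str) -> dict:
    """Very rough User-Agent classification based on substring checks."""
    ua = user_agent or ""

    # Browser: order matters, because Edge UAs also contain "Chrome/" and
    # Chrome UAs also contain "Safari/".
    if "Edg/" in ua:
        name = "Edge"
    elif "Chrome/" in ua:
        name = "Chrome"
    elif "Firefox/" in ua:
        name = "Firefox"
    elif "Safari/" in ua:
        name = "Safari"
    else:
        name = "Other"

    # OS: check iPhone/iPad and Android first, since their UAs also
    # mention "Mac OS X" or "Linux".
    if "iPhone" in ua or "iPad" in ua:
        os_name = "iOS"
    elif "Android" in ua:
        os_name = "Android"
    elif "Windows NT" in ua:
        os_name = "Windows"
    elif "Mac OS X" in ua:
        os_name = "macOS"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "Other"

    # Device: a coarse guess; anything not recognised counts as desktop.
    if "iPad" in ua:
        device = "Tablet"
    elif "Mobi" in ua or "iPhone" in ua:
        device = "Mobile"
    else:
        device = "Desktop"

    return {"ua_device": device, "ua_os": os_name, "ua_name": name}


CHROME_WINDOWS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)
print(parse_user_agent(CHROME_WINDOWS))
```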
Data analysis
- PV/QPS: Page views / requests per second
- UV: Number of unique users who access the website. Many websites can be accessed without logging in; in that case, users can be distinguished by the uniqueness of IP + user_agent
- IP count: Number of distinct source IP addresses
- Network traffic: request_size (request size) measures incoming network traffic, and bytes_sent (response size) measures outgoing network traffic
- Source analysis: based on referer data
- Geographic analysis of client requests: based on the geo data derived from IP addresses
- Client device analysis: based on data extracted from user_agent
- Request time statistics: based on request_time data
  - P99, P95, and P90 latency (the time within which that percentage of requests complete; for example, P99 is the time within which 99% of requests complete)
  - Monitoring for abnormally long requests
- Response status monitoring: based on status data
  - Proportion of responses for each status code
  - Number of 5XX server errors
- Business-specific analysis: the path in request_path and the parameters in request_query must be interpreted according to the business, for example:
  - If the album page address is /album/:id, each log entry whose request_path matches it corresponds to one visit to an album
  - If the site search address is /search?q=<keywords>, counting the log entries whose request_path is /search gives the number of searches performed, and tallying the q parameter in request_query reveals the search terms
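The metrics in this list can be sketched in a few lines of Python. This is an illustration under assumptions rather than a reference implementation: the sample records, the helper names `percentile` and `analyze`, the nearest-rank percentile definition, and the `window_seconds` parameter used for QPS are all choices made for this example. In practice these aggregations would typically run in the log store itself.

```python
import math
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical, already-parsed log records; only the fields used below are set.
RECORDS = [
    {"remote_addr": "203.0.113.7", "user_agent": "UA-1", "request_path": "/search",
     "request_query": "q=cats", "request_size": 100, "bytes_sent": 1000,
     "status": 200, "request_time": 0.10},
    {"remote_addr": "203.0.113.7", "user_agent": "UA-2", "request_path": "/album/42",
     "request_query": "", "request_size": 200, "bytes_sent": 3000,
     "status": 200, "request_time": 0.20},
    {"remote_addr": "198.51.100.9", "user_agent": "UA-1", "request_path": "/search",
     "request_query": "q=cats", "request_size": 100, "bytes_sent": 500,
     "status": 502, "request_time": 1.50},
    {"remote_addr": "198.51.100.9", "user_agent": "UA-1", "request_path": "/search",
     "request_query": "q=dogs", "request_size": 100, "bytes_sent": 500,
     "status": 404, "request_time": 0.30},
]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def analyze(records, window_seconds):
    """Compute the basic traffic metrics over a non-empty batch of records."""
    if not records:
        raise ValueError("at least one record is required")
    pv = len(records)
    status_counts = Counter(r["status"] for r in records)
    times = [r["request_time"] for r in records]
    search_terms = Counter()
    for r in records:
        if r["request_path"] == "/search":
            search_terms.update(parse_qs(r["request_query"]).get("q", []))
    return {
        "pv": pv,
        "qps": pv / window_seconds,
        # UV approximated by unique (IP, user agent) pairs, as described above.
        "uv": len({(r["remote_addr"], r["user_agent"]) for r in records}),
        "ip_count": len({r["remote_addr"] for r in records}),
        "traffic_in": sum(r["request_size"] for r in records),
        "traffic_out": sum(r["bytes_sent"] for r in records),
        "p90": percentile(times, 90),
        "p95": percentile(times, 95),
        "p99": percentile(times, 99),
        "status_ratio": {code: n / pv for code, n in status_counts.items()},
        "server_errors": sum(n for code, n in status_counts.items() if 500 <= code <= 599),
        "search_terms": search_terms,
    }


result = analyze(RECORDS, window_seconds=2.0)
print(result)
```

With only four sample records, P90, P95, and P99 all fall on the slowest request; the percentiles only separate once the batch is large enough.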
General architecture
ELK + Kafka is the mainstream industry solution for building a log system. Beats and Logstash collect and transport logs, Kafka buffers logs until they are consumed, Elasticsearch handles data aggregation and analysis, and Grafana and Kibana present the results graphically.
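The decoupling between the stages can be illustrated with a toy, in-process sketch. Nothing here talks to the real components: a standard-library `queue.Queue` stands in for a Kafka topic, and the hypothetical functions `ship` and `consume` stand in for the collector and for a consumer that aggregates data. It only shows why a buffer lets collection and analysis run independently.

```python
import json
import queue
from collections import Counter

# Stand-in for a Kafka topic: collectors write to it, consumers read from it.
buffer = queue.Queue()


def ship(raw_lines, out):
    """Collector stage (the role Beats/Logstash play): forward non-empty lines."""
    for line in raw_lines:
        line = line.strip()
        if line:
            out.put(line)


def consume(inp):
    """Consumer stage: drain the buffer and count records per status code,
    the kind of aggregation the analysis layer would provide."""
    counts = Counter()
    while True:
        try:
            line = inp.get_nowait()
        except queue.Empty:
            break
        counts[json.loads(line)["status"]] += 1
    return counts


ship(['{"status": 200}', "", '{"status": 200}', '{"status": 502}'], buffer)
status_counts = consume(buffer)
print(status_counts)
```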