What data is recorded in the request log

  • time_local: Local time at which the request was received
  • remote_addr: IP address of the client
  • request_method: Request method
  • request_schema: Request protocol, commonly HTTP or HTTPS
  • request_host: Requested domain name
  • request_path: Requested path
  • request_query: Query parameters of the request
  • request_size: Request size
  • referer: The address the request came from. For example, if a link to b.com is posted on a.com and a user clicks it to visit b.com, the referer records a.com. This value is set by the browser
  • user_agent: Provides information about the client browser
  • status: Indicates the response status of the request
  • request_time: Time taken to process the request
  • bytes_sent: Response size
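
As a rough illustration of how several of these fields relate to one another, the sketch below splits a full request URL into scheme, host, path, and query using Python's standard urllib.parse. The URL and the helper name split_request_url are hypothetical; a real gateway usually logs these fields directly rather than deriving them this way.

```python
from urllib.parse import urlsplit, parse_qs


def split_request_url(url: str) -> dict:
    """Split a full URL into the request_* fields listed above."""
    parts = urlsplit(url)
    return {
        "request_schema": parts.scheme,    # "http" or "https"
        "request_host": parts.hostname,    # requested domain name
        "request_path": parts.path,        # path without the query string
        "request_query": parts.query,      # raw query string
        # Decoded query parameters (hypothetical extra field, not in the list above)
        "query_params": parse_qs(parts.query),
    }


# Hypothetical example URL
fields = split_request_url("https://b.com/search?q=log+analysis")
print(fields)
```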

Often a load-balancing gateway is used to proxy requests to the actual back-end service. In this case, the request log also contains the following data:

  • upstream_host: Host the request is proxied to
  • upstream_addr: IP address the request is proxied to
  • upstream_url: URL of the service the request is proxied to
  • upstream_status: Status returned by the upstream service
  • proxy_time: Time spent on proxy forwarding

Derived data

The following data can be derived from the client IP address:

  • ASN information:
    • asn_asn: Autonomous system number. IP addresses are managed by autonomous systems; for example, the China Unicom Shanghai network manages all Shanghai Unicom IP addresses
    • as_org: Autonomous system organization, such as China Mobile or China Unicom
  • Geo location information:
    • geo_location: Latitude and longitude
    • geo_country: Country
    • geo_country_code: Country code
    • geo_region: Region (province)
    • geo_city: City

The following information can be parsed from user_agent:

  • ua_device: Device used
  • ua_os: Operating system
  • ua_name: Browser
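
To give a feel for this derivation, here is a deliberately crude sketch that guesses the three ua_* fields with substring checks. The token tables and the function names first_match and parse_user_agent are assumptions made for illustration; production systems typically rely on a maintained user-agent parsing library, because real user_agent strings are far more varied than this handles.

```python
# Checked in order: Edge and Chrome user agents also contain "Safari/",
# and Edge's also contains "Chrome/", so more specific tokens come first.
BROWSER_TOKENS = [
    ("Edg/", "Edge"),
    ("Chrome/", "Chrome"),
    ("Firefox/", "Firefox"),
    ("Safari/", "Safari"),
]

# iPhone/iPad user agents contain "like Mac OS X" and Android ones contain
# "Linux", so the mobile systems are checked before the desktop ones.
OS_TOKENS = [
    ("Windows", "Windows"),
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]


def first_match(ua: str, tokens: list) -> str:
    """Return the label of the first token found in the user agent."""
    for token, label in tokens:
        if token in ua:
            return label
    return "Other"


def parse_user_agent(ua: str) -> dict:
    return {
        "ua_name": first_match(ua, BROWSER_TOKENS),
        "ua_os": first_match(ua, OS_TOKENS),
        # Very rough heuristic: many mobile browsers include "Mobi" in the string.
        "ua_device": "Mobile" if "Mobi" in ua else "Desktop",
    }


desktop = parse_user_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
phone = parse_user_agent(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
)
print(desktop, phone)
```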

Data analysis

  • PV / QPS: Page views / requests per second
  • UV: Number of unique users who access the website. Many websites can be accessed without logging in; in that case, users can be distinguished by the unique combination of IP + user_agent
  • IP count: Number of distinct source IP addresses
  • Network traffic: incoming traffic is calculated from request_size (request size), and outgoing traffic from bytes_sent (response size)
  • Source analysis: based on the referer data
  • Geo-location analysis of client requests: based on the geo data derived from IP addresses
  • Client device analysis: based on the data extracted from user_agent
  • Request time statistics: based on the request_time data
    • P99, P95, and P90 latency (the time within which the given percentage of requests complete; for example, P99 is the time within which 99% of requests complete)
    • Monitoring of abnormally long requests
  • Response status monitoring: based on the status data
    • Proportion of responses for each status code
    • Number of 5XX server errors
  • Business-specific analysis: the meaning of request_path and request_query depends on the business, for example:
    • If the address of an album page is /album/:id, then a log entry with a matching request_path corresponds to a visit to that album
    • If the address of the site search is /search?q=<keywords>, then counting the log entries whose request_path is /search gives the number of searches, and counting the q parameter in request_query gives the search terms
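
The metrics above can be sketched in a few lines of Python over already-parsed log records. The four sample records and the percentile helper (a simple nearest-rank method) are assumptions made for illustration; at real volumes these calculations would normally run inside the log storage system rather than in a script.

```python
import math
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical parsed log records using the field names from this article.
logs = [
    {"remote_addr": "203.0.113.1", "user_agent": "Chrome", "status": 200,
     "request_time": 0.10, "request_size": 200, "bytes_sent": 1000,
     "request_path": "/search", "request_query": "q=cats"},
    {"remote_addr": "203.0.113.1", "user_agent": "Firefox", "status": 200,
     "request_time": 0.20, "request_size": 300, "bytes_sent": 2000,
     "request_path": "/album/1", "request_query": ""},
    {"remote_addr": "203.0.113.2", "user_agent": "Chrome", "status": 500,
     "request_time": 1.50, "request_size": 100, "bytes_sent": 500,
     "request_path": "/search", "request_query": "q=cats"},
    {"remote_addr": "203.0.113.2", "user_agent": "Chrome", "status": 404,
     "request_time": 0.05, "request_size": 150, "bytes_sent": 300,
     "request_path": "/album/2", "request_query": ""},
]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


pv = len(logs)                                                  # page views
uv = len({(r["remote_addr"], r["user_agent"]) for r in logs})   # IP + user_agent
ip_count = len({r["remote_addr"] for r in logs})                # distinct IPs

inbound_bytes = sum(r["request_size"] for r in logs)            # incoming traffic
outbound_bytes = sum(r["bytes_sent"] for r in logs)             # outgoing traffic

status_ratio = {code: n / pv for code, n in Counter(r["status"] for r in logs).items()}
server_errors = sum(1 for r in logs if 500 <= r["status"] <= 599)

p99 = percentile([r["request_time"] for r in logs], 99)

# Business-specific example: count search terms from /search?q=<keywords>
search_terms = Counter(
    term
    for r in logs
    if r["request_path"] == "/search"
    for term in parse_qs(r["request_query"]).get("q", [])
)

print(pv, uv, ip_count, inbound_bytes, outbound_bytes)
print(status_ratio, server_errors, p99, search_terms)
```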

General architecture

ELK + Kafka is a mainstream industry solution for building a log system. Beats and Logstash collect and ship the logs, Kafka buffers them until they are consumed, Elasticsearch handles data aggregation and analysis, and Grafana and Kibana present the results graphically.
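
As a sketch of what the aggregation step might look like, the snippet below builds an Elasticsearch search request body that computes latency percentiles, a status code breakdown, and a distinct IP count. It assumes the logs are indexed with the field names used in this article and with mappings suitable for aggregation; the aggregation names (latency_percentiles, status_codes, unique_ips) are hypothetical labels, and sending the request to a cluster is left out.

```python
import json

# Request body for an Elasticsearch search; "size": 0 skips the raw hits
# so that only the aggregation results are returned.
aggregation_request = {
    "size": 0,
    "aggs": {
        # P90 / P95 / P99 of the request duration
        "latency_percentiles": {
            "percentiles": {"field": "request_time", "percents": [90, 95, 99]}
        },
        # Number of log entries per response status code
        "status_codes": {"terms": {"field": "status"}},
        # Approximate number of distinct client IP addresses
        "unique_ips": {"cardinality": {"field": "remote_addr"}},
    },
}

print(json.dumps(aggregation_request, indent=2))
```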