The agent layer platform addresses the high availability of MySQL in multi-tenant mode. Because there are many businesses on the agent layer, once something goes wrong, it will affect a large number of businesses, so more energy is invested in quality, and its operation and monitoring has been the focus of consideration. The traceable ability of service problems is also critical. This chapter mainly introduces the slow log platform of the agent layer. Both the business side and the DBA can quickly query the slow log of the system, which helps to locate service problems and system faults.
Generating slow Logs
The proxy layer platform does not rely on slow logs of MySQL, but generates slow logs itself, mainly from the following aspects: After all, the proxy layer is an added layer between services and MySQL, which will have certain performance loss; The agent layer solves the cross-room problem. Write operations can only go to the master library. Therefore, in the case of multi-room deployment, the agent layer will access MySQL from the cross-room. You can decide the format of the slow log and customize the elements you are interested in.
Slow log format:
Time | agent layer IP IP | | the client account | | business Name a room | | target room whether autocommit | | operation mode DB Name IP | | MySQL MySQL Port time | | SQL
The following is an example of log content: 1492013991141 | 10.20.30.40 | test_business | 3.4.5.6 | user | 0 | | | | guangzhou shenzhen false db_test | 6.7.8.9 | 3306 | 22539 | delete from ABC where TTT = ‘ ‘
The logs are very detailed, covering almost every possible element. The author uses plain text format to store full logs for the following purposes:
- You don’t need extra tools to view it at any time;
- Various open source log collection tools support it; To store in plain text, SQL statements must be formatted:
- Remove invisible characters;
- Replace carriage return newline characters with Spaces;
- Truncation of long SQL statements;
Log Platform Architecture
The log platform architecture is as follows:
Graph LR Proxy layer host1-LogStash-->ES Cluster agent layer host2-LogStash-->ES cluster agent layer hosts... LogStash -->ES Cluster agent layer host N-LogStash-->ES cluster --> Kibana ES cluster --> DBMSCopy the code
Log collection
Deploy LogStash on each agent layer host for log collection, according to the specified slow log format, in order to implement
input {
file {
type => “slowlog”
path => [“/data/proxy-layer/slowlog/*”]
}
}
filter {
if [type] == “slowlog” {
ruby {
init => “@kname = [‘time’, ‘host’, ‘business’, ‘clientip’, ‘user’, ‘fromidc’, ‘toidc’,’autcommit’, ‘mode’, ‘db’, ‘mysqlip’, ‘mysqlport’, ‘used’, ‘sql’ ]”
code => “event.remove(‘host’) ; event.append(Hash[@kname.zip(event[‘message’].split(‘|’, @kname.size()))]) “
}
}
date {
match => [“time”, “UNIX_MS”]
target => “@timestamp”
remove_field => [“message”, “path”]
}
}
output {
elasticsearch { hosts => [“es.abc.com:9200”] }
stdout { codec =>rubydebug }
}
Here are a few things to explain the logstash configuration file above:
- If the log type is set to slowlog, indexes starting with slowlog are automatically generated in ES.
- Delete the host field automatically generated by logstash. Because we have too many hosts, the hostname of each host is basically the same, so the host field is basically invalid.
- The IP address of the proxy layer host in the slow log is used as the host field.
- The timestamp field is not generated by Logstash, but by the timestamp in the slow log.
- Delete path field, basically useless;
Slow Log Display
The default setting of the proxy layer is slow log if the value exceeds 1 second. Each service can set its own timeout time in milliseconds. Now SELECT any test service and run SELECT sleep(3).
After the SQL statement completes, we can view the slow log in the DBMS:
I only intercept the SQL used for testing, only for reference.
As for the Kibana display, I won’t go into details here, but there are more detailed instructions online.