I. Finding the problem
- 1. Recently I noticed that an API service written in Go hangs every few days after it starts running: no logs are printed and the API cannot be accessed.
- 2. Besides the API, the service runs several scheduled tasks, and these stop executing as well.
- 3. The service exposes an endpoint for viewing its routes, and that endpoint can still be accessed.
- 4. I checked CPU, memory, and the number of open file handles. (htop is recommended for this; on CentOS it can be installed with yum install htop.)
II. Solving the problem
(1) Reproducing the bug
- The only way to reproduce the bug was to let the service run for a few days and then inspect it.
- Sure enough, the bug showed up again today.
(2) Locating the bug
- 1. The route-viewing endpoint can still be accessed, and the HTTP log for that request is printed.
- 2. The cron jobs do not run, and they print no logs.
- 3. One scheduled task, which prints the status of a third-party API, did not hang.
- In other words, everything that touches the database is blocked, so the investigation should start with the database operations.
(3) Fixing the bug
Since we use GORM in Go, I needed to see the state of the database connection pool, so I added a scheduled task that prints the pool statistics every 5 seconds:
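```go
// c is the service's cron scheduler; db.DB is its GORM handle, and
// db.DB.DB() returns the underlying *sql.DB.
func main() {
	err := c.AddFunc("@every 5s", func() {
		// Dump the connection pool statistics as one JSON line.
		by, _ := json.Marshal(db.DB.DB().Stats())
		log.Println(string(by))
	})
	if err != nil {
		log.Fatalln(err.Error())
	}
}
```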
Stats() returns a sql.DBStats, which is defined in database/sql as follows:
```go
// DBStats contains database statistics.
type DBStats struct {
	MaxOpenConnections int // Maximum number of open connections to the database.

	// Pool Status
	OpenConnections int // The number of established connections both in use and idle.
	InUse           int // The number of connections currently in use.
	Idle            int // The number of idle connections.

	// Counters
	WaitCount         int64         // The total number of connections waited for.
	WaitDuration      time.Duration // The total time blocked waiting for a new connection.
	MaxIdleClosed     int64         // The total number of connections closed due to SetMaxIdleConns.
	MaxLifetimeClosed int64         // The total number of connections closed due to SetConnMaxLifetime.
}
```
- Once this task was in place, I saw that InUse had reached the maximum and WaitCount kept growing, as the JSON output shows:
```
{"MaxOpenConnections":30,"OpenConnections":30,"InUse":30,"Idle":0,"WaitCount":18032,"WaitDuration":66343149623,"MaxIdleClosed":0,"MaxLifetimeClosed":4203}
{"MaxOpenConnections":30,"OpenConnections":30,"InUse":30,"Idle":0,"WaitCount":179968,"WaitDuration":5069116627810,"MaxIdleClosed":0,"MaxLifetimeClosed":5147}
```
- The problem, then, is that connections are taken from the pool and never put back, i.e. they stay occupied.
- Both the API and the jobs perform database operations. What could keep a connection occupied? The first thing I thought of was a transaction that was never committed or rolled back.
- Sure enough, I found the following code, and at that moment I could have kicked myself 🤦‍♂️🤦‍♂️🤦‍♂️
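```go
tx := o.Begin()
transactionIsOk, err := chain.EthChainInstance.GetTransactionReceipt(action.Hash)
if err != nil {
	// BUG: skips to the next action without tx.Commit() or tx.Rollback(),
	// so the connection held by tx never goes back to the pool.
	continue
}
```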
- The fixed code:
```go
tx := o.Begin()
transactionIsOk, err := chain.EthChainInstance.GetTransactionReceipt(action.Hash)
if err != nil {
	tx.Rollback() // end the transaction so its connection is released
	continue
}
```
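- Better still, use defer tx.Commit() or defer tx.Rollback() so the transaction is always ended. Keep in mind that defer runs when the enclosing function returns, not at the end of a loop iteration, so inside a loop like the one above the per-item transaction should live in its own function.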
- After the fix I rebuilt and redeployed the service. Over roughly ten hours of observation it has been running stably:
```
2019/05/08 00:18:11 {"MaxOpenConnections":300,"OpenConnections":13,"InUse":0,"Idle":13,"WaitCount":0,"WaitDuration":0,"MaxIdleClosed":0,"MaxLifetimeClosed":2804}
```
- To sum up: because the transactions were never finished, every open connection stayed InUse. Once the pool reached its maximum number of open connections, all other database operations had to wait for a free connection and could not execute their SQL.
- The end ~