Upgrade process and symptoms
Dubbo version was upgraded from 2.7.4 to 2.7.8. 2. It took more than 3 weeks for the development, test and pre-release environment to run.
The 2020-09-24 15:03:40. 664, dar: N/A, INFO, org. Apache. Zookeeper. ClientCnxn, ClientCnxn. Java, 1299, onConnected Session establishment complete on server xx/x:2181, sessionid = 0xe72dc060e3f1d06, Negotiated timeout = 40000 2020-09-24 15:03:40.666, [TID:N/A], INFO, 参 见, 参 见 org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 66, info [DUBBO] Curator zookeeper client instance initiated successfully, session id is e72dc060e3f1d06, dubbo version: 2.7.8, current host: 10.3.107.40 15:03:53 2020-09-24. 734, dar: N/A, WARN, org. Apache. The zookeeper. ClientCnxn, ClientCnxn. Java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len18594568 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at Org. Apache. Zookeeper. ClientCnxn $SendThread. Run (ClientCnxn. Java: 1141) the 2020-09-24 15:03:55. 540, dar: N/A, WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn [DUBBO] Curator zookeeper connection of session e72dc060e3f1d06 timed out. connection timeout value is 3000, Session expire timeout value is 60000, Dubbo version: 2.7.8, current host: 10.3.107.40 15:03:56 2020-09-24. 165, dar: N/A, INFO, org. Apache. The zookeeper. ClientCnxn, ClientCnxn. Java, 879, Initiating session 2020-09-24 15:03:56.346, [TID:N/A], INFO, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1299, onConnected Session establishment complete on server xx/xx:2181, sessionid = 0xe72dc060e3f1d06, Negotiated A timeout = 40000 2020-09-24 15:04:08. 135, dar: N/A, WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn [DUBBO] Curator zookeeper connection recovered from connection lose, reuse the old session e72dc060e3f1d06, Dubbo version: 2.7.8, current host: 10.3.107.40 15:04:53 2020-09-24. 046, dar: N/A, WARN, org. Apache. The zookeeper. ClientCnxn, ClientCnxn. Java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len18594616 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at Org. Apache. Zookeeper. ClientCnxn $SendThread. Run (ClientCnxn. Java: 1141) the 2020-09-24 15:04:54. 626, dar: N/A, WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn [DUBBO] Curator zookeeper connection of session e72dc060e3f1d06 timed out. connection timeout value is 3000, Session expire timeout value is 60000, Dubbo version: 2.7.8, current host: 10.3.107.40Copy the code
4. And then there are millions of logs like this
The 2020-09-24 15:37:50. 309, dar: N/A, the ERROR, org. Apache. Curator. Framework. Imps. CuratorFrameworkImpl, CuratorFrameworkImpl.java, 694, logError Background operation retry gave up org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:647) at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:604) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)Copy the code
5. O&m receives an alarm, indicating that ZK is down
6. Roll back the application
Question why
There are mainly the following doubts:
- Why is there a problem in the online environment while there is no problem in the other environment? Is there a problem with ZK online? It doesn’t exist
org.apache.dubbo.metadata.report.MetadataReport
What is it for? Why 2.7.4 has no problem and 2.7.8 has a problem?
Investigation process:
- Packet lenxx is out of range!
The 2020-09-24 15:03:53. 734, dar: N/A, WARN, org. Apache. Zookeeper. ClientCnxn, ClientCnxn. Java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len18594568 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)Copy the code
I encountered this problem when operating ZK. At that time, there was a node with a long string of chaotic numbers. When obtaining this node, I reported such an error. I did some research on the Internet,
public static final int packetLen = Integer.getInteger("jute.maxbuffer",4096 * 1024); protected void readLength() throws IOException { int len = incomingBuffer.getInt(); if (len < 0 || len >= ClientCnxn.packetLen) { throw new IOException("Packet len" + len + " is out of range!" ); } incomingBuffer = ByteBuffer.allocate(len); }Copy the code
The jute. Maxbuffer size can be used to fix this problem. But why did you report this error? Because some node is too big? However, there is no such error in version 2.7.4. Could it be caused by something listening on ZK in the new version?
2. Find a critical log in the log center and find that MetadataReport is involved in it
Injection of @DubboReference dependencies is failed; nested exception is java.lang.IllegalStateException: Failed to receive INITIALIZED event from zookeeper, pls. check if url zookeeper://xx:2181/org.apache.dubbo.metadata.report.MetadataReport? client=&dubbo.config-center.root-path=/ is correctCopy the code
In the current version of Dubbo, the metadata center is just a place to store metadata, and there is no complicated logic involved. And we have enabled the metadata center for all applications through the configuration center, and changed the metadata center to Redis. Why is this thing generated here?
3, inMetadataReportInstance
Init method, which involves fetchingMetadataReport
Will eventually lead toZookeeperDynamicConfiguration
Initialize the
4. Why does it not appear in other environments, but only in everyday life? The metadata center is redis configured for all clients in other environments through dubbo Global Configuration center, but not online. This results in the use of the default metadata center, and ZK. Dubbo uses the registry as the metadata center by default when the client is not configured with a metadata center