The author | autumn
Arthas official community is holding an essay call for prizesClick on the submit】
In the RockerMQ Console, messages are slowly queried, taking up to 10 seconds to 5 to 6 seconds to 14+ seconds.
The diagram below:
Why is that? Why is it so time-consuming to query messages?
Current development environment: Windows10, JDK8, RocketMQ is 4.5.2.
There is no problem on other machines, RocketMQ and Console are deployed on Linux installed on VMware, the virtual machine on this machine, and verified to be fine.
So what happened to my machine??
Since it is currently the time spent on the interface, we do not know where the time is spent, so Arthas is used to track the time spent on the down chain.
Using the trace command:
The trace command method calls the path internally and prints the time spent on each node on the method path. The trace command can proactively search the method invocation path corresponding to class-pattern/method-pattern, render and count all performance costs along the entire invocation link, and trace the invocation link.
trace org.apache.rocketmq.console.service.impl.MessageServiceImpl queryMessageByTopic
Copy the code
The main time taken from the current invocation path is: DefaultMQPullConsumer constructor initialization + DefaultMQPullConsumer startup time. So let’s move on to the inside.
Now let’s focus on initialization of the DefaultMQPullConsumer constructor:
trace org.apache.rocketmq.client.consumer.DefaultMQPullConsumer <init>
Copy the code
From the constructor initialization entry, it doesn’t take much time.
Now let’s look at the DefaultMQPullConsumer launch method:
[E] Enable regular expression matching. The default value is wildcard matching
trace -E org.apache.rocketmq.client.consumer.DefaultMQPullConsumer start
Copy the code
trace -E org.apache.rocketmq.client.consumer.DefaultMQPullConsumer <init>|start
Copy the code
It then turns out that the time is mostly spent getting the MQClientInstance instance.
trace org.apache.rocketmq.client.impl.MQClientManager getAndCreateMQClientInstance
Copy the code
trace org.apache.rocketmq.client.ClientConfig cloneClientConfig
Copy the code
Now look at the ClientConfig#cloneClientConfig method:
public ClientConfig cloneClientConfig() {
ClientConfig cc = new ClientConfig();
cc.namesrvAddr = namesrvAddr;
cc.clientIP = clientIP;
cc.instanceName = instanceName;
cc.clientCallbackExecutorThreads = clientCallbackExecutorThreads;
cc.pollNameServerInterval = pollNameServerInterval;
cc.heartbeatBrokerInterval = heartbeatBrokerInterval;
cc.persistConsumerOffsetInterval = persistConsumerOffsetInterval;
cc.unitMode = unitMode;
cc.unitName = unitName;
cc.vipChannelEnabled = vipChannelEnabled;
cc.useTLS = useTLS;
cc.namespace = namespace;
cc.language = language;
return cc;
}
Copy the code
You can see a lot of assignment operations, which you can ignore, just look at new ClientConfig() :
trace org.apache.rocketmq.client.ClientConfig <init>
Copy the code
You can see that the main time is 3-4 seconds, and the main time is this tool class method:
`RemotingUtil#getLocalAddress“`
trace org.apache.rocketmq.remoting.common.RemotingUtil getLocalAddress
Copy the code
So far, the JDK method call has been traced: NetworkInterface#getNetworkInterfaces.
Next to view the JDK function call:
trace --skipJDKMethod false java.net.NetworkInterface getNetworkInterfaces
Copy the code
skipJDKMethod skip jdk method trace, default value true. By default, trace does not contain JDK function calls. If you want to trace JDK functions, you need to explicitly set skipJDKMethod false.
At this time, tracking is not possible, so check and issue according to four tips:
Github.com/alibaba/art…
Github.com/alibaba/art…
Finally, you need to turn on unsafe.
options unsafe true
Copy the code
Open complete.
Once again, you can see the JDK call chain.
At this point, the culprit of the RocketMQ Console’s slow queries has been identified: it takes a long time to retrieve the local interface card. At first I wondered if persistent storage was slow to load (ruled out).
So why does it take so long to call the network interface of the current operating system?
Attention to the java.net.NetworkInterface#getNetworkInterfaces at this time
public static Enumeration<NetworkInterface> getNetworkInterfaces() throws SocketException { final NetworkInterface[] netifs = getAll(); // specified to return null if no network interfaces if (netifs == null) return null; return new Enumeration<NetworkInterface>() { private int i = 0; public NetworkInterface nextElement() { if (netifs ! = null && i < netifs.length) { NetworkInterface netif = netifs[i++]; return netif; } else { throw new NoSuchElementException(); } } public boolean hasMoreElements() { return (netifs ! = null && i < netifs.length); }}; } private native static NetworkInterface[] getAll() throws SocketException;Copy the code
As you can see, JDK functions already call native methods, which are the underlying implementation of the JDK (C/C ++), very closely related to the operating system.
Debug the getNetworkInterfaces method to check which network interfaces exist:
A check found 81! Then look at the local network adapter:
The local Windows server has Wlan, VPN, and VMware network adapters.
In the end, it turned out to be relevant to them. After I disabled the corresponding adapter, I re-called NetworkInterface#getNetworkInterfaces, which took 3+ seconds to a few hundred milliseconds.
In the end, it is a pity that it is still not able to analyze why Windows calls down the native method of nic interface appears so time-consuming. And it must have something to do with my machine, because there is no problem with other machines.
If you want to analyze the reason, you need to have more basic knowledge of C/C ++.
Conclusion:
- Windows may be prone to some abnormal problems, unexpectedly can also give me this ^@^. Fortunately, Windows is generally used sparingly, and Linux(Unix) is fairly stable in deploying middleware such as RocketMQ, except on a large number of development machines.
- Arthas Trace is used to track the path of method calls and the time spent on each step, making it easy to troubleshoot bottlenecks.
- The -e parameter supports regular expression matching. –skipJDKMethod false Supports function calls containing the JDK; Tracing JDK functions, for example. If you cannot find a corresponding class or method, you may need to turn on Unsafe.
Author | fall, focus on Java back-end, distributed, service, system architecture, etc. Individual public number “porter to structure”.