Scenario
We were taking a new interface online and, as a gray (canary) release, first deployed it to a single machine to observe the calls. The interface was called continuously, and after a while that machine started reporting OOM (OutOfMemoryError) exceptions! That day was the deadline, so it was quite exciting.
Finding the problem
Step one: use jps to obtain the process ID of the faulty JVM
jps -l -m
The jps command:
jps -l -m: jps lists the JVM processes on the machine; -l prints the full main class name (or JAR path) of each one, and -m displays the arguments passed to main() when the JVM was started.
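For illustration only, here is a rough Java analogue of jps -l -m. It is a sketch under assumptions, not how jps works: jps discovers JVMs through HotSpot's own instrumentation, whereas this hypothetical JavaProcessLister (Java 9+) just scans the OS process table with ProcessHandle and keeps processes whose executable looks like java.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: approximates `jps -l -m` by scanning OS processes (Java 9+).
class JavaProcessLister {

    // One line per process: "<pid> <command> <arguments...>"
    static List<String> listJavaProcesses() {
        return ProcessHandle.allProcesses()
                // keep processes whose executable path ends with "java" (or "java.exe" on Windows)
                .filter(p -> p.info().command()
                        .map(cmd -> cmd.endsWith("java") || cmd.endsWith("java.exe"))
                        .orElse(false))
                .map(p -> (p.pid() + " " + p.info().command().orElse("?") + " "
                        + String.join(" ", p.info().arguments().orElse(new String[0]))).trim())
                .collect(Collectors.toList());
    }
}
```

Calling JavaProcessLister.listJavaProcesses().forEach(System.out::println) prints one line per Java process; the OS may hide command details of other users' processes, so jps remains the more reliable tool.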
Step two: use jstat to observe the state of the JVM and find the problem
Since it was an OOM exception, we first restarted the machine and then observed how the JVM was running.
We used the jstat -gc pid time command to watch GC activity and found that each YGC (young GC) reclaimed only a little memory: after every YGC, part of the memory was not reclaimed. After several YGCs, the objects that could not be reclaimed were promoted to the old generation, and FGC (full GC) could not reclaim the old generation either. At this point we could basically conclude that this was a memory leak. Next, we took a quick look at the machine's CPU, memory, and disk status.
The jstat command:
jstat (JVM Statistics Monitoring Tool) monitors a running virtual machine and reports its runtime data, such as class loading, memory, garbage collection, and JIT compilation.
jstat -gc pid time: -gc shows the GC statistics of the JVM; pid is the ID of the JVM process to monitor, and time is the refresh interval in milliseconds.
jstat -gccause pid time: -gccause shows the GC statistics plus the cause of the last GC; pid and time are as above.
jstat -class pid time: -class shows the class loading statistics of the JVM; pid and time are as above.
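jstat reads these numbers from outside the target JVM. As an assumption-laden sketch, similar figures can be sampled from inside a JVM through the standard java.lang.management MXBeans; the GcSampler class below is hypothetical and only approximates jstat -gc (per-pool usage instead of jstat's exact columns).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical in-process approximation of `jstat -gc <pid> <interval>`.
class GcSampler {

    // Total collections and accumulated collection time (ms) over all collectors,
    // roughly jstat's YGC + FGC and GCT columns.
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // both getters return -1 when the value is undefined for a collector
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    // One sample: used KB of every memory pool (Eden, Survivor, Old Gen, non-heap pools) plus GC totals.
    static String sampleLine() {
        StringBuilder line = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                line.append(pool.getName()).append('=').append(usage.getUsed() / 1024).append("K ");
            }
        }
        long[] totals = gcTotals();
        return line.append("GC=").append(totals[0])
                .append(" GCT=").append(totals[1]).append("ms").toString();
    }

    // Print `samples` lines, pausing intervalMs between them, like jstat's refresh interval.
    static void poll(int samples, long intervalMs) {
        for (int i = 0; i < samples; i++) {
            System.out.println(sampleLine());
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                return;
            }
        }
    }
}
```

A steadily rising old-generation pool across many GCs is the same signal that jstat showed here.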
A brief introduction to heap GC:
At the start of a minor GC, objects exist only in Eden and in the Survivor space named “From”; the Survivor space named “To” is empty. During the GC, all surviving objects in Eden are copied to “To”, and surviving objects in “From” are handled according to their age.
Objects whose age reaches a certain value (the tenuring threshold, which can be set with -XX:MaxTenuringThreshold) are moved to the old generation, and objects below the threshold are copied to “To”. After this GC, Eden and “From” are empty. “From” and “To” then swap roles: the new “To” is the previous “From”, and the new “From” is the previous “To”. Either way, the Survivor space named “To” is always empty, and every minor GC repeats this process.
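The copying and aging steps above can be sketched as a toy model. This is a simplified illustration, not HotSpot's implementation: the hypothetical SurvivorSimulation tracks objects in plain lists, marks reachability with a live flag, and uses maxTenuringThreshold in place of -XX:MaxTenuringThreshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of young-generation copying (illustrative only, not HotSpot code).
class SurvivorSimulation {

    static final class Obj {
        final String name;
        boolean live = true; // whether something still references the object
        int age = 0;         // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold; // stands in for -XX:MaxTenuringThreshold
    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>(); // always empty between GCs
    final List<Obj> old = new ArrayList<>();

    SurvivorSimulation(int maxTenuringThreshold) { this.maxTenuringThreshold = maxTenuringThreshold; }

    void allocate(Obj o) { eden.add(o); }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // unreachable objects are not copied, so their space is reclaimed
            }
            if (o.age >= maxTenuringThreshold) {
                old.add(o); // old enough: promote to the old generation
            } else {
                o.age++;    // survived one more minor GC
                to.add(o);  // copy into the empty "To" survivor space
            }
        }
        eden.clear();
        from.clear();
        // Swap roles: the filled "To" becomes "From", the emptied "From" becomes the new "To".
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }
}
```

In this model, an object that stays reachable forever (for example, a transaction nobody releases) inevitably ends up in old, which matches what jstat showed.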
Step three: observe the state of the machine to confirm the problem
Run the top -p pid command to get the CPU and memory usage of the process, and check RES, %CPU, and %MEM:
Here’s a quick summary of what the top command shows:
VIRT: virtual memory usage. (1) The amount of virtual memory the process “needs”, including the libraries, code, and data it uses. (2) If a process requests 100 MB of memory but uses only 10 MB, VIRT grows by 100 MB, not by the actual usage.
RES: resident memory usage. (1) The memory the process is currently using, excluding swapped-out pages. (2) It includes memory shared with other processes. (3) If a process requests 100 MB but uses only 10 MB, RES grows by only 10 MB, the opposite of VIRT. (4) For libraries, it counts only the memory of the library files actually loaded.
SHR: shared memory. (1) Even if a process uses only part of a shared library, SHR includes the size of the entire library. (2) The physical memory used by the process can be estimated as RES - SHR. (3) It drops after pages are swapped out.
DATA: (1) Memory occupied by data; if top does not show this column, press f to enable it. (2) It is the data space the program really requires and uses while running.
P.S. If a program occupies a lot of real (resident) memory, it has requested a lot of memory and actually uses much of it. If it occupies a lot of virtual memory, it has requested a lot of space but is not necessarily using it.
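On Linux, top's VIRT and RES columns correspond to the VmSize and VmRSS fields of /proc/&lt;pid&gt;/status. As a Linux-only sketch (the ProcStatus class is hypothetical), the same numbers can be read without top, for example to log them from a script:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Linux-only helper: VmSize ~ top's VIRT, VmRSS ~ top's RES (both in kB).
class ProcStatus {

    // Parse "Key:   12345 kB" lines into key -> kB; lines without a kB value are ignored.
    static Map<String, Long> parseKb(String statusText) {
        Map<String, Long> result = new HashMap<>();
        for (String line : statusText.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts.length == 2 && parts[1].equals("kB")) {
                result.put(line.substring(0, colon), Long.parseLong(parts[0]));
            }
        }
        return result;
    }

    // Read the status file of a process (e.g. the pid found with jps); fails on non-Linux systems.
    static Map<String, Long> read(long pid) throws IOException {
        return parseKb(Files.readString(Path.of("/proc", Long.toString(pid), "status")));
    }
}
```

For example, ProcStatus.read(pid).get("VmRSS") gives the resident size that top reports as RES.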
The machine itself showed no problems, so there was no need to look further: this was a typical memory leak.
Step four: use jmap to get a dump file of the JVM process
We used the jmap -dump:format=b,file=dump_file_name pid command to dump the current heap of the JVM to a file for the analysis that follows.
The jmap command:
jmap (JVM Memory Map) generates heap dump files and can query the finalizer queue and details of the Java heap and permanent generation, such as current usage and which collector is in use. In jmap -dump:format=b,file=dump_file_name pid, format=b selects the binary format, file= specifies the output file name, and pid is the process ID of the JVM.
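On HotSpot JVMs, a comparable binary heap dump can also be written from inside the process through com.sun.management.HotSpotDiagnosticMXBean.dumpHeap. The HeapDumper wrapper below is a hypothetical sketch; the target file must not already exist and should end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical wrapper: in-process counterpart of `jmap -dump:format=b,file=... <pid>` (HotSpot only).
class HeapDumper {

    // live = true dumps only reachable objects (a full GC runs first), similar to jmap -dump:live.
    static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live); // throws IOException if the file already exists
    }
}
```

The resulting file opens in JProfiler like a jmap dump. Starting the JVM with -XX:+HeapDumpOnOutOfMemoryError (and optionally -XX:HeapDumpPath) is another way to capture the heap at the moment of the OOM.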
Next, we rolled back the gray-release machine and started fixing the problem =.=
Solving the problem
Step one: analyze the dump file
Here, we analyzed the dump file with JProfiler, which looks like this:
In the Heap Walker, select Current Object Set; it lists classes ordered by the resources they occupy, from largest to smallest.
[Screenshot: Biggest Objects view. org.janusgraph.graphdb.database.StandardJanusGraph retains 724M; expanding it leads to its openTransactions field, a ConcurrentHashMap.]
Step two: search the source code to locate the problem
Open the project in IntelliJ IDEA, press Shift twice to open the global class search, and type StandardJanusGraph.
In JanusGraph's StandardJanusGraph class, the field is declared as:
private Set<StandardJanusGraphTx> openTransactions;
It is initialized as a Set backed by a ConcurrentHashMap:
openTransactions = Collections.newSetFromMap(
        new ConcurrentHashMap<StandardJanusGraphTx, Boolean>(100, 0.75f, 1));
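To make the initialization above concrete: the constructor arguments are initialCapacity = 100, loadFactor = 0.75f, and concurrencyLevel = 1 (since Java 8 only a sizing hint), and Collections.newSetFromMap returns a Set view whose add/remove write straight through to the map. The OpenTransactionSet class is a hypothetical stand-in using Object instead of StandardJanusGraphTx.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the JanusGraph field, with Object as the element type.
class OpenTransactionSet {
    // initialCapacity = 100, loadFactor = 0.75f, concurrencyLevel = 1
    final ConcurrentHashMap<Object, Boolean> backing = new ConcurrentHashMap<>(100, 0.75f, 1);
    // The Set is a view: each add() stores key -> Boolean.TRUE in `backing`, each remove() deletes it.
    final Set<Object> openTransactions = Collections.newSetFromMap(backing);
}
```

Because the map holds strong references, every element stays reachable from the graph object, and so survives every GC, until someone removes it.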
The StandardJanusGraphTx objects stored in it are, as the name says, transaction objects of the JanusGraph framework. Let's trace back through the code to see when elements are added to this map:
// Find where openTransactions.add() is executed
public StandardJanusGraphTx newTransaction(final TransactionConfiguration configuration) {
    if (!isOpen) ExceptionFactory.graphShutdown();
    try {
        StandardJanusGraphTx tx = new StandardJanusGraphTx(this, configuration);
        tx.setBackendTransaction(openBackendTransaction(tx));
        openTransactions.add(tx); // Note! This adds the transaction to the set above
        return tx;
    } catch (BackendException e) {
        throw new JanusGraphException("Could not start new transaction", e);
    }
}
So newTransaction is the method that creates a transaction. To be sure, let's follow the callers of this method:
public JanusGraphTransaction start() {
    TransactionConfiguration immutable = new ImmutableTxCfg(isReadOnly, hasEnabledBatchLoading,
            assignIDsImmediately, preloadedData, forceIndexUsage, verifyExternalVertexExistence,
            verifyInternalVertexExistence, acquireLocks, verifyUniqueness,
            propertyPrefetching, singleThreaded, threadBound, getTimestampProvider(), userCommitTime,
            indexCacheWeight, getVertexCacheSize(), getDirtyVertexSize(),
            logIdentifier, restrictedPartitions, groupName,
            defaultSchemaMaker, customOptions);
    return graph.newTransaction(immutable); // Note! This calls the newTransaction method shown above
}

// Find the outermost method
public JanusGraphTransaction newTransaction() {
    return buildTransaction().start(); // Calls the start method above
}
When we work with data in the graph database, we create transactions manually: before each query we call something like dataDao.begin(), which ends up calling public JanusGraphTransaction newTransaction().
Finally, a quick look at the source shows that entries are removed from this map when the transaction is committed. The call chain is as follows:
public void closeTransaction(StandardJanusGraphTx tx) {
    openTransactions.remove(tx); // Remove the StandardJanusGraphTx object from the map
}

private void releaseTransaction() {
    isOpen = false;
    graph.closeTransaction(this); // Calls the closeTransaction method above
    vertexCache.close();
}

public synchronized void commit() {
    Preconditions.checkArgument(isOpen(), "The transaction has already been closed");
    boolean success = false;
    if (null != config.getGroupName()) {
        MetricManager.INSTANCE.getCounter(config.getGroupName(), "tx", "commit").inc();
    }
    try {
        if (hasModifications()) {
            graph.commit(addedRelations.getAll(), deletedRelations.values(), this);
        } else {
            txHandle.commit(); // No modifications: just commit the backend transaction handle
        }
        success = true;
    } catch (Exception e) {
        try {
            txHandle.rollback();
        } catch (BackendException e1) {
            throw new JanusGraphException("Could not rollback after a failed commit", e);
        }
        throw new JanusGraphException("Could not commit transaction due to exception during persistence", e);
    } finally {
        releaseTransaction(); // Always runs: removes this transaction from openTransactions
        if (null != config.getGroupName() && !success) {
            MetricManager.INSTANCE.getCounter(config.getGroupName(), "tx", "commit.exceptions").inc();
        }
    }
}
Finally, we found the root cause of the memory leak: the project code called begin on a transaction but never called commit!
Step three: fix the problem and verify the fix
Fix: find the code of the leaking interface, locate every place where a transaction is begun but commit() is never called, and add the commit() call using try-catch-finally so it always runs;
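The leak and the fix can be reproduced in a stripped-down model. Everything here is hypothetical: Graph and Tx only imitate StandardJanusGraph and StandardJanusGraphTx (register on begin, unregister only on commit) and are not JanusGraph classes; handleLeaky and handleFixed stand in for the interface code.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical model of the leak and its fix (not JanusGraph code).
class TxLeakDemo {

    static final class Graph {
        // Same shape as JanusGraph's field: strong references until explicitly removed
        final Set<Tx> openTransactions = Collections.newSetFromMap(new ConcurrentHashMap<>());

        Tx newTransaction() {
            Tx tx = new Tx(this);
            openTransactions.add(tx); // registered on begin, like newTransaction(...)
            return tx;
        }
    }

    static final class Tx {
        private final Graph graph;
        final byte[] caches = new byte[1024]; // stands in for per-transaction state such as caches

        Tx(Graph graph) { this.graph = graph; }

        void commit() {
            graph.openTransactions.remove(this); // unregistered only here, like releaseTransaction()
        }
    }

    // Buggy handler: begins a transaction per request and never commits it.
    static void handleLeaky(Graph graph, Consumer<Tx> query) {
        Tx tx = graph.newTransaction();
        query.accept(tx);
    }

    // Fixed handler: commit() in finally runs even when the query throws.
    static void handleFixed(Graph graph, Consumer<Tx> query) {
        Tx tx = graph.newTransaction();
        try {
            query.accept(tx);
        } finally {
            tx.commit();
        }
    }
}
```

After 1000 calls, handleLeaky leaves 1000 transactions in openTransactions, while handleFixed leaves none, even when a query throws. For write transactions, committing on success and rolling back on failure is a common variant of the same pattern.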
Commit, deploy, release, then gray-release to one machine again and observe: the memory leak was gone and GC returned to normal;
The memory leak was resolved, and the project went live on schedule~
Finally
If you've ever run into a memory leak, feel free to share your story in the comments section.
This article took me longer than I expected: I planned to finish it in 2 hours, but it took a whole afternoon…
Original writing isn't easy; if you found this useful, I hope you'll support it with a like or a comment~
You're also welcome to follow me on Juejin
and to search WeChat for the public account [Originality of Java]
to support the author, who regularly shares what he sees in his work.