Background

A back-end service that I maintain uses the Orika framework to handle mapping between bean objects, with a thin wrapper around Orika to reduce repetitive code. Briefly, Orika dynamically generates bytecode at run time to produce mappers with minimal overhead for mapping between bean objects. Other bean-mapping frameworks include MapStruct, BeanCopier, and so on.
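
For readers new to Orika, here is a minimal usage sketch (the UserEntity and UserDto beans are made up for illustration and are not part of the project):

import ma.glasnost.orika.impl.DefaultMapperFactory

// Hypothetical beans used only for illustration
class UserEntity { var id: Long? = null; var name: String? = null }
class UserDto { var id: Long? = null; var name: String? = null }

fun main() {
    // The factory generates mapper bytecode at run time
    val mapperFactory = DefaultMapperFactory.Builder().build()
    // Register a mapping; byDefault() maps fields with matching names
    mapperFactory.classMap(UserEntity::class.java, UserDto::class.java)
        .byDefault()
        .register()
    val mapperFacade = mapperFactory.mapperFacade
    val dto = mapperFacade.map(UserEntity().apply { id = 1L; name = "test" }, UserDto::class.java)
    println(dto.name)
}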

The problem

After the back-end service had been running in the internal test environment for a while, some interfaces started to respond slowly, taking more than ten seconds on average. As a result, front-end requests timed out and the data could not be refreshed.

Troubleshooting

Preliminary investigation

A preliminary review of the business logic of the affected interfaces suggested that Orika's bean mapping was the time-consuming step. Is the Orika framework itself buggy, or is there a hidden problem in the wrapped Orika utility class?

Further investigation

1) Hypothesis: the bean-mapping time of the encapsulated Orika utility class suddenly becomes longer, possibly because threads are blocking.

Start the service locally and drive it with JMeter: 100 concurrent threads ramped up over one second, looped 5 times. Then use the Arthas tool's thread -b command to check whether the service has any blocked threads.

The test shows that there are indeed blocked threads, and the blocking is caused by Orika's DefaultMapperFactory.registerClassMap method.

2) Analyze the source code of DefaultMapperFactory.registerClassMap

This method contains the main logic by which Orika generates a mapper, and the entire method is guarded by a heavyweight synchronized lock.

public synchronized <A, B> void registerClassMap(ClassMap<A, B> classMap) {
        this.classMapRegistry.put(new MapperKey(classMap.getAType(), classMap.getBType()), classMap);
        // Determine whether the mapper is under construction or has already been built, and rebuild the registry if so
        if (this.isBuilding || this.isBuilt) {
            MappingContext context = this.contextFactory.getContext();
            try {
                if (classMap.getUsedMappers().isEmpty()) {
                    classMap = classMap.copyWithUsedMappers(this.discoverUsedMappers(classMap));
                }
                this.buildMapper(classMap, this.isBuilding, context);
                this.buildObjectFactories(classMap, context);
                this.initializeUsedMappers(classMap);
                this.mapperFacade.factoryModified(this);
            } finally {
                // Release the mapped bean's type, class attributes, and other information from the cache
                this.contextFactory.release(context);
            }
        }
    }

Synchronized locks are implemented based on Java object headers and Monitor.

When multiple threads request the same critical section at the same time, the object monitor uses several states to distinguish the requesting threads. The thread state transitions can be understood from the following figure:

As the thread state flow shows, because a synchronized lock is used, concurrent requests end up waiting for the lock, and this kind of lock is relatively inefficient.
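
To make the effect concrete, here is a small self-contained sketch (not from the project) showing how concurrent calls to a synchronized method serialize and queue, much like concurrent calls to registerClassMap:

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.system.measureTimeMillis

object Registry {
    // Every caller must acquire the same lock, just like registerClassMap
    @Synchronized
    fun register() {
        Thread.sleep(50) // simulate the cost of building a mapper
    }
}

fun main() {
    val pool = Executors.newFixedThreadPool(100)
    val elapsed = measureTimeMillis {
        repeat(100) { pool.execute { Registry.register() } }
        pool.shutdown()
        pool.awaitTermination(1, TimeUnit.MINUTES)
    }
    // Roughly 100 * 50 ms rather than 50 ms: the calls run one after another
    println("elapsed = $elapsed ms")
}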

So we can conclude that the slow responses are indeed caused by threads blocking on the DefaultMapperFactory.registerClassMap method.

But why does execution keep reaching this method and dynamically generating mappers? The encapsulated Orika utility class may be the cause.

3) Analyze the encapsulated MapperUtil utility class

The utility class wraps Orika's low-level operations and caches Orika's mapperFacade.

Full code of the utility class: gitee.com/LuXianSheng…

Analysis shows that in certain special cases the local cache is bypassed (cache penetration), so the code keeps executing classMapBuilder.byDefault().register() to register the mapping. The core code is as follows:

/**
 * Get a custom MapperFacade
 *
 * @param toClass           target class of the mapping
 * @param dataClass         source data class of the mapping
 * @param targetPropertyMap handles inconsistent field names (key: original object property)
 */
private fun <E, T> getMapperFacade(
    toClass: Class<E>,
    dataClass: Class<T>,
    targetPropertyMap: Map<String, String> = emptyMap(),
    excludes: List<String> = emptyList(),
    converterMap: Map<String, String> = emptyMap()
): MapperFacade {
    val mapKey = dataClass.canonicalName + "_" + toClass.canonicalName
    var mapperFacade = CACHE_MAPPER_FACADE_MAP[mapKey]
    // Cache penetration: the cached facade is returned only when excludes and converterMap are both empty
    if (mapperFacade != null && excludes.isEmpty() && converterMap.isEmpty()) {
        return mapperFacade
    }
    var classMapBuilder = MAPPER_FACTORY.classMap(dataClass, toClass)
    excludes.forEach { classMapBuilder = classMapBuilder.exclude(it) }
    // Attribute mapping
    targetPropertyMap.forEach {
        val converterId = converterMap[it.key]
        if (converterId != null) {
            classMapBuilder.fieldMap(it.key, it.value).converter(converterId).add()
        } else {
            classMapBuilder.field(it.key, it.value)
        }
    }
    classMapBuilder.byDefault().register()
    mapperFacade = MAPPER_FACTORY.mapperFacade
    CACHE_MAPPER_FACADE_MAP[mapKey] = mapperFacade
    return mapperFacade
}

The main cause of the problem has been found, but one question remains: why do the interfaces get slower and slower? If the average time to register a mapper stayed roughly constant, there should not be so many threads queuing up. Note that Orika keeps dynamically generating classes and loading them into the JVM. Could the JVM be involved?
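
As a quick sanity check of that suspicion, here is a small experiment (not part of the project; Src and Dst are made-up beans) that re-registers the same class map repeatedly, the way the cache-penetrated utility does, and watches the loaded-class count via the JVM's ClassLoadingMXBean:

import java.lang.management.ManagementFactory
import ma.glasnost.orika.impl.DefaultMapperFactory

class Src { var name: String? = null }
class Dst { var name: String? = null }

fun main() {
    val classLoading = ManagementFactory.getClassLoadingMXBean()
    val factory = DefaultMapperFactory.Builder().build()
    val before = classLoading.totalLoadedClassCount
    // Re-register the same class map many times, as the cache-penetrated utility does
    repeat(100) {
        factory.classMap(Src::class.java, Dst::class.java).byDefault().register()
    }
    val after = classLoading.totalLoadedClassCount
    // If each registration generates new mapper classes, the count grows with the loop
    println("classes loaded during the loop: ${after - before}")
}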

4) JVM check

Collect the GC logs from the running service and analyze them with the GCeasy online analysis tool. The visualized results are as follows:

The figure above mainly shows the JVM memory distribution: the young generation, the old generation, and the metaspace, along with the sizes allocated by the JVM and the peak sizes used while the program ran.

The figure above shows some key JVM metrics, as follows:

  • Throughput: throughput
  • Latency: response time
  • Avg Pause GC Time: average GC pause time
  • Max Pause GC Time: maximum GC pause time

The Interactive Graphs figure above shows several runtime metrics, as follows:

  • Heap after GC: Heap usage after GC
  • Heap before GC: Heap usage before GC
  • GC Duration: GC Duration
  • Reclaimed Bytes: The memory size of the Reclaimed garbage object
  • Young Gen: Young generation heap usage
  • Old Gen: The usage of the Old generation heap
  • Meta Space: Usage of the Meta Space
  • A & P: Heap allocation and promotion at each GC. The red line shows how much young-generation memory (objects) is promoted to the old generation at each GC.

The figure above shows some GC statistics: how much memory was reclaimed in total, how long GC took in total, the average time per GC, and statistics for each individual GC.

The figure above shows information about the causes that triggered GC.

From the data above, we can see that the young generation does not have enough memory and fails to expand, young-generation GC is frequent but reclaims little memory, and the metaspace size is abnormal.

5) Local test validation

The locally installed JDK is 1.8, and the JVM uses the default configuration. Simulate roughly 100 QPS of requests locally and use the jstat -gcutil command to watch the JVM status.


As you can see in the figure above, a GC is triggered about once per second, and GC becomes more frequent as QPS increases; in practice the application should not be doing such short, frequent GCs.

The metaspace usage spikes at the beginning and then grows slowly; the slow growth is mainly due to lock waiting.

6) Analysis

As the number of requests increases, the metaspace usage exceeds its threshold, and the young generation keeps creating new mapper class objects; the more requests arrive, the more frequently GC runs.

Conclusion

In summary, the root cause is the local cache penetration in the encapsulated MapperUtil utility class: the classMap mapper-registration logic is executed over and over, dynamically generating many classes; the synchronized lock causes contention and threads queue up blocked; and the JVM's young generation GCs frequently while the metaspace keeps growing, so responses become slower and slower.
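
One possible way to avoid the cache penetration, shown here only as a sketch and assuming that a given class pair is always mapped with the same excludes and converters, is to include everything that influences the generated mapper in the cache key and register each distinct configuration at most once:

import java.util.concurrent.ConcurrentHashMap
import ma.glasnost.orika.MapperFacade
import ma.glasnost.orika.MapperFactory
import ma.glasnost.orika.impl.DefaultMapperFactory

object SafeMapperUtil {
    private val MAPPER_FACTORY: MapperFactory = DefaultMapperFactory.Builder().build()
    private val CACHE_MAPPER_FACADE_MAP = ConcurrentHashMap<String, MapperFacade>()

    fun <E, T> getMapperFacade(
        toClass: Class<E>,
        dataClass: Class<T>,
        targetPropertyMap: Map<String, String> = emptyMap(),
        excludes: List<String> = emptyList(),
        converterMap: Map<String, String> = emptyMap()
    ): MapperFacade {
        // Include every parameter that influences the generated mapper in the cache key,
        // so the same configuration is registered at most once
        val mapKey = listOf(
            dataClass.canonicalName, toClass.canonicalName,
            targetPropertyMap, excludes, converterMap
        ).toString()
        return CACHE_MAPPER_FACADE_MAP.computeIfAbsent(mapKey) {
            var classMapBuilder = MAPPER_FACTORY.classMap(dataClass, toClass)
            excludes.forEach { classMapBuilder = classMapBuilder.exclude(it) }
            targetPropertyMap.forEach { (key, value) ->
                val converterId = converterMap[key]
                if (converterId != null) {
                    classMapBuilder.fieldMap(key, value).converter(converterId).add()
                } else {
                    classMapBuilder.field(key, value)
                }
            }
            classMapBuilder.byDefault().register()
            MAPPER_FACTORY.mapperFacade
        }
    }
}

Whether this exact shape fits the project depends on how converters are registered and whether the same class pair is ever mapped with different configurations, so treat it only as a direction, not as the project's fix.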

Notes

We have committed Orika test code to Gitee that reproduces the problem described in this article. If you are interested, you can run the analysis and verification locally.

Repository address: gitee.com/LuXianSheng…

JMeter test scripts: placed in the Resource directory of the repository project

Note: the interfaces may respond quickly at first and the problem described in this article may not reproduce immediately, because the repository contains only test code. Adjust the QPS and loop the test, and you will see the interface responses get slower and slower.

Author: slagsea