• Quick to use
  • Making address [https://github.com/liuhuagui/gridexcel]

Apache POI

In business development, we often encounter Excel import and export, and Apache POI is a common API used by Java developers. 【 poi.apache.org/components/…

GridExcel

Universal solution for reading and writing simply Excel based on functional programming and POI EventModel

GridExcel is a universal solution for simple Excel reading and writing based on Java8 functional programming and POI EventModel implementation.

  • Based on POI EventModel, it can reduce memory usage to avoid OOM and frequent FullGC when reading and writing Excel with very large data volume
  • Based on function programming, support associated objects and other complex situations, low learning cost
  • Support for streaming apis, making code writing and understanding easier and more intuitive

EventModel

What is EventModel? In the POI FAQ (FAQ) 】 【 poi.apache.org/help/faq.ht… The official explanation:

The SS eventmodel package is an API for reading Excel files without loading the whole spreadsheet into memory. It does require more knowledge on the part of the user, but reduces memory consumption by more than tenfold. It is based on the AWT event model in combination with SAX. If you need read-only access, this is the best way to do it.

The SS EventModel package is an API for reading Excel files without loading the entire spreadsheet into memory. It does require more knowledge, but reduces memory consumption by more than ten times. It is based on the combination of AWT (Abstract Window Toolkit) Event Model and SAX. If you need read-only access, this is the best way.

The functional programming

Speaking of functional programming, you have to mention Lambda expressions, if the Lambda in Java8 don’t know or understand not deep, we can see the oracle website of this article, “www.oracle.com/technetwork…” Personally, I think this is one of the best Java8 Lambda articles from starter to advanced.

The purpose of functional programming is to achieve code block passing, that is, to pass methods as parameters between methods. To this end, as the Java language evolves, several solutions continue to emerge:

  1. Java 1.0, implemented using the Abstract Window Toolkit (AWT) EventModel, was unwieldy and not feasible
  2. Java 1.1, a Series of Listeners
  3. Then use theThe inner classandAnonymous inner classBut for the most part, they are just used for event handling.
  4. Then it was found that more places would be replacedCode block as object(Actually data) is useful and necessary, but functional programming in Java is still clunky, and it needs to grow.
  5. Until Java 1.7, Java introduced the Java.lang. Invoke package, which provided a new mechanism for dynamically determining target methods (instead of relying solely on bytecode invocation instructions that were hardcoded in the virtual machine) called MethodHandle, which mimics bytecode method pointer calls, Similar to C/C++ Function Pointer and introduces the bytecode instruction invokedynamic for the fifth method call.
  6. It wasn’t until Java 1.8, based on Java 1.7’s invokedynamic bytecode instruction, that Lamda technology was implemented, passing functions as arguments between methods, that Java began to better support functional programming.
  7. Isn’t this already possible with reflection? Reflection API is heavyweight and has low performance.

Note: 5, 6, 7 refer to Understanding the Java Virtual Machine in Depth, edition 2, 8.3.3 Dynamically typed language support.


There are two problems that most API users face in POI usage that GridExcel addresses.

Problem 1. Only simple import and export functions are used, but the data object structure of each service is different, so the processing method needs to be rewritten, which is very troublesome.

The solution

Extract Excel processing logic and encapsulate it into utility classes.

Encapsulation condition

Like most Java apis, POI focuses more on advanced functions such as Formula, Conditional Formatting, Zoom, and so on. These advanced features are rarely used by API users who only do data import and export functions, allowing API users to simply encapsulate the use of POI.

Encapsulation way

Both read and write, we need to solve the mapping between Excel Columns and Java data objects Fields, and pass this mapping to the utility class as a parameter (Map object HashMap or LinkedHashMap).

Columns are easy to understand. They can be ordered numbers or letters, or other strings used as the first row to indicate the meaning of the data in the column.

For Fields, its processing needs to be complex, as follows:

  • An exception occurred while querying the field
  • The value of field or cell is NULL
  • The value of this column may correspond to a field value in the associated object or even in the associated collection
  • The value of a field or cell requires special treatment, for examplevalue == true? 1. To fail;
reflection

The first thing that comes to mind, and that most wrappers use, is the Reflection API. As we learned from the function programming section above, Reflection weight can degrade code performance and not support complex situations well enough.

Reflection + annotation

This approach can better support complex cases, but reflection still degrades performance, annotations create code intrusion into data objects, and there is no doubt a learning cost for other users of the tool class wrapper.

Anonymous inner class

This approach also works well for complex situations, but the syntax for using anonymous inner classes suffers from a “vertical problem” (meaning that the code needs too many lines to express basic concepts) and is too cumbersome. As for performance, it should not be as fast as the direct transfer function.

Function interface (Lambda)

This method is realized based on the bytecode instruction invokeDynamic called by the fifth method. It directly transfers function code blocks, which supports complex situations well, has higher performance, simpler code writing and simpler structure, and has zero invasion to data object code.

Fault 2. A large amount of data is imported or exported by ExcelOut of memoryorFrequent Full GCAnd how to solve it?

The solution

  • Read the Excel – eventmodel
  • Write Excel – streaming. SXSSFWorkbook

The principle of

The use of POI is common to us, and the following two concepts should be familiar:

  • HSSFWorkbook (processing 97(-2007).xls)
  • XSSFWorkbook (processing 2007 OOXML (.xlsx))

There is less exposure to EventModel and Streaming.sxSSFWorkbook, which are POI’s low level apis designed to address memory footprint issues. They can be used to read and write Excel with very large amounts of data. You can also avoid memory overruns or frequent Full GC. 【 poi.apache.org/components/…

  • Eventmodel, which is used to read Excel, does not load the entire Excel into memory. Instead, it allows the user to hand over every bit of information it reads from InputStream to a callback or listener, leaving it up to the user to discard, store, or dispose of it.
  • streaming.SXSSFWorkbook, used to write Excel (is a package of XSSFWorkbook, only support. XLSX), passedThe sliding windowTo implement, only keep the number of Rows allowed to exist in memory by the sliding window, and the excess Rows are written into a temporary file when calledwrite(OutputStream stream)Method writes out content directly from temporary memory to the targetOutputStream.SXSSFWorkbookThere are some limitations to the use of.
    • Only a limited number of rows are accessible at a point in time.
    • Sheet.clone() is not supported.
    • Formula evaluation is not supported

solution

  • Github.com/liuhuagui/g… Programming based on Java functions (Lambda), supporting streaming apis, using environment Java1.8 or higher, cost of learning: Lambda

In fact, the POI official website has given users the use of the sample, and the above tools just do their own encapsulation implementation, so that the use of more convenient.


Quick to use

<dependency>
    <groupId>com.github.liuhuagui</groupId>
    <artifactId>gridexcel</artifactId>
    <version>2.2</version>
</dependency>
Copy the code

GridExcel.java

Gridexcel.java provides a variety of static methods that can be used directly, and examples can be found in the test code (test data and test files are provided) :

  • Github.com/liuhuagui/g…
  • Github.com/liuhuagui/g…

Streaming API

* 1. Enable windowListener and place the service logic in the function. * 2. Do not enable windowListener, use get() method to fetch all data set, do subsequent processing. * 3. ReadFunction (final or effective final) : readFunction (final or effective final) * Note: Input to the readFunction function with EventModel is a collection of cell values for each row List<String>. *@throws Exception
  */
 @Test
 public void readXlsxByEventModel(a) throws Exception {
     InputStream resourceAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("2007.xlsx");
     GridExcel.readByEventModel(resourceAsStream,TradeOrder.class,ExcelType.XLSX)
             .window(2,ts -> System.out.println(JSON.toJSONString(ts)))// It is recommended to execute your own business logic here
             .process(cs ->{
                 TradeOrder tradeOrder = new TradeOrder();
                 tradeOrder.setTradeOrderId(Long.valueOf(cs.get(0)));
                 Consultant consultant = new  Consultant();
                 consultant.setConsultantName(cs.get(3));
                 tradeOrder.setConsultant(consultant);
                 tradeOrder.setPaymentRatio(cs.get(16));
                 return tradeOrder;
             },1);
 }
 /** * Use Streaming UserModel to write data to Excel *@throws Exception
  */
 @Test
 public void writeExcelByStreaming(a) throws Exception {
     GridExcel.writeByStreaming(TradeOrder.class)
             .head(writeFunctionMap())// Mapping object fields to Excel columns
             .createSheet()
             .process(MockData.data())// Simulate data. Set up the business data set here.
             .write(FileUtils.openOutputStream(new File("/excel/test.xlsx")));
 }
Copy the code

ReadExcel

ReadExcelByUserModel

Use user model to read excel file. userModel —

  • Disadvantages: Large memory consumption, all the Excel information will be loaded into the memory for processing.
  • Advantages: Ready-made apis, easier to use and understand.
  • Usage scenario: Can handle small amount of data Excel.
ReadExcelByEventModel

Use eventModel to read excel file. eventModel —

  • Cons: No existing API, complex to use and understand, suitable for mid-to-advanced programmers (one of GridExcel’s goals is to make EventModel easy to use)
  • Advantages: Very small memory footprint. Rather than loading everything into memory at the outset, the processing of the body content (storage, use, discard) is left to the user, who can customize listeners to handle the content.
  • Usage scenario: Excel with large data volume can be processed, avoiding OOM and frequent FullGC

WriteExcel

WriteExcelByUserModel

Use user model to write excel file. userModel —

  • Disadvantages: The resulting Spreadsheets object is stored entirely in memory, so the size of Write Excel is limited by the Heap space size.
  • Advantages: Easier to use and understand.
  • Usage scenario: You can write Excel with small amount of data.
WriteExcelByStreaming

Use apI-compatible Streaming extension of XSSF to write very large excel files. Streaming userModel —

  • disadvantages:
    • Only XSSF is supported.
    • Sheet. Clone () is not supported;
    • Formula evaluation is not supported;
    • Only a limited number of rows are accessible at a point in time.
  • Advantages: Implemented by sliding Windows, only the specified size of rows is kept in memory, the excess is written out to temporary files, and the size of Write Excel is no longer limited by the Heap space.
  • Usage scenario: You can write very large Excel.

Issues

If you have a problem using the tool and need to add or change the function, you can send an Issue to the author: github.com/liuhuagui/g…

  • For example, you want to add style extensions to rows and columns beyond the first line

If you have any questions, please contact the author at [email protected]