“This is the sixth day of my participation in the First Challenge 2022. For details: First Challenge 2022”

EasyExcel profile

A requirement of the project is to upload Excel, read data, and perform a series of operations. I have seen EasyExcel for a long time, but I haven’t used it yet. I will try to use it today.

Java domain parsing, operating Excel is more famous framework Apache POI, JXL, etc. But they all have one serious problem: they consume a lot of memory. This may be fine if your system has low concurrency, but if it does, it will be OOM or JVM full gc frequently.

EasyExcel is an open source Excel processing framework of Alibaba, known for its simplicity and memory saving.

64 MB of memory to read 75 MB of Excel in 1 minute (46W rows and 25 columns) (and of course rush mode is faster, but the memory footprint is a little over 100 MB)

The official explanation: Alibaba, Easy Excel – simple, save in the province of Java parse Excel tools | read Excel

The concept of class

The AnalysisContext core interface is also the context for the entire operation.

ExcelWriter writes to Excel classes

ExcelReader reads Excel classes

A Cell corresponds to a row of data sheet and a single table

Easy to read

Because my project involves reading Excel, I will briefly talk about reading here, which is also from the official example.

Data structure corresponding to each row:

@Data
public class DemoData {
    private String string;
    private Date date;
    private Double doubleData;
}
Copy the code

Listener, each row read is entered into the Invoke callback

// There is an important point that DemoDataListener cannot be managed by Spring. To read Excel, you need to use new method, and then use Spring constructor to pass it in

public class DemoDataListener extends AnalysisEventListener<DemoData> {
    private static final Logger LOGGER = LoggerFactory.getLogger(DemoDataListener.class);
    /** * Store the database every 5 entries, in actual use can be 3000 entries, then clear the list, convenient memory collection */
    private static final int BATCH_COUNT = 5;
    List<DemoData> list = new ArrayList<DemoData>();
    /** * If this is a DAO, of course there is business logic that this can also be a service. Of course this object doesn't work if you don't have to store it. * /
    private DemoDAO demoDAO;
​
    public DemoDataListener(a) {
        // This is demo, so just create a new one. If you are in Spring, use the parameterized constructor below
        demoDAO = new DemoDAO();
    }
​
    /** * If spring is used, use this constructor. Each time you create a Listener, you need to pass in spring-managed classes * *@param demoDAO
     */
    public DemoDataListener(DemoDAO demoDAO) {
        this.demoDAO = demoDAO;
    }
​
    / * * * the each data parsing would call for * *@param data
     *            one row value. Is is same as {@link AnalysisContext#readRowHolder()}
     * @param context
     */
    @Override
    public void invoke(DemoData data, AnalysisContext context) {
        LOGGER.info("Parse to a single data :{}", JSON.toJSONString(data));
        list.add(data);
        // Reach BATCH_COUNT, need to go to store a database, prevent data tens of thousands of data in memory, easy OOM
        if (list.size() >= BATCH_COUNT) {
            saveData();
            // Store the cleanup listlist.clear(); }}/** * when all data is parsed, call **@param context
     */
    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        // Save the data here, too, to ensure that the last remaining data is also stored in the database
        saveData();
        LOGGER.info("All data parsed!");
    }
​
    /** * + store database */
    private void saveData(a) {
        LOGGER.info("{} data, start storing database!", list.size());
        demoDAO.save(list);
        LOGGER.info("Database saved successfully!"); }}Copy the code

Start point of reading data:

/** ** <p>1. Create excel corresponding entity object reference {@linkDemoData} * <p>2. Create excel row - by - line callback listener, see {@linkDemoDataListener} * <p>3
@Test
public void simpleRead(a) {
    // There is an important point that DemoDataListener cannot be managed by Spring. To read Excel, you need to use new method, and then use Spring constructor to pass it in
    // write method 1:
    String fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
    // We need to specify which class is used to read the sheet, and then the first read stream is automatically closed
    EasyExcel.read(fileName, DemoData.class, new DemoDataListener()).sheet().doRead();
​
    // write method 2:
    fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
    ExcelReader excelReader = EasyExcel.read(fileName, DemoData.class, new DemoDataListener()).build();
    ReadSheet readSheet = EasyExcel.readSheet(0).build();
    excelReader.read(readSheet);
    // Do not forget to close this file, temporary files will be created while reading, and the disk will crash
    excelReader.finish();
}
Copy the code

The principle of analytic

Simple understanding: In the process of reading, read a line of data each time, invoke function in the Listener, and send the data to the caller for parsing and processing.

Source tracking:

Using the demo above,

The first step is to make a breakpoint where you read,

The second step is to make a break point at invoke

Look at the order in which the stack is called, and then look at the order in which the code is executed from the bottom of each stack to get a sense of the flow of execution.

Here’s some core code

com.alibaba.excel.analysis.ExcelAnalyserImpl#analysis

This is where you start reading Excel

I don’t pay attention to the process of scanning Excel in the middle, which is basically reading data, which is quite tedious.

com.alibaba.excel.read.processor.DefaultAnalysisEventProcessor#endRow

Here’s what happens after reading a line

The com for dealing with data. Alibaba. Excel. Read. Processor. DefaultAnalysisEventProcessor# dealData

I’m curious about the isData setting, how many rows are in the header and how it’s handled. I didn’t do that in my code.

By default, the first line is the header

Conclusion:

Easyexcel is really simple