How can POI efficiently export millions of Excel data?

In a system with statistical function, exporting Excel function is almost certain. How to export Excel? How much data is exported? How to export efficiently?

Excel introduction Without introducing what Excel is, here is mainly to explain the row limits under each sheet in different versions.

From the above we know that Excel 2003 and below are not able to achieve a single sheet of millions of data.

Apache POI

  • Introduction to the

Apache POI is a free, open source, cross-platform Java API written in Java. Apache POI provides API for Java programs to read and write files in Microsoft Office (Excel, WORD, PowerPoint, Visio, etc.) format. POI is an acronym for Poor Obfuscation Implementation.

  • Commonly used class

HSSF – Provides the function of reading and writing Microsoft Excel XLS files. XSSF – Provides the function of reading and writing Microsoft Excel OOXML XLSX files. SXSSF – a low memory footprint API based on XSSF (version 3.8 starts to appear). HWPF – Reads and writes Microsoft Word DOC97 files. XWPF – Provides the ability to read and write Microsoft Word DOC2003 files. HSLF – Allows you to read and write Microsoft PowerPoint files. HDGF – Provides the ability to read Microsoft Visio files. HPBF – Provides the ability to read files in Microsoft Publisher format. HSMF – Provides the function of reading Microsoft Outlook files.

We’re exporting Excel here, so we’re using the first three.

Export strategy

  • plan

Use XSSF and SXSSF to import 1W, 10W, and 100W data respectively. Use SXSSF to import 10W data in 10W pages, and USE SXSSF to import 100W data in 10W pages

  • The performance comparison
Time does not include network time

  • conclusion

Solution 1: There is little difference between THE XSSF and SXSSF when the number of data is 10,000. The performance of SXSSF starts to improve when the number of data reaches 100,000. When the number of data reaches millions, the XSSF is no longer suitable for use

Scheme 2: SXSSF can store up to 1048576 rows of million-level data without table partitioning. When table partitioning is stored, exporting with multithreading is almost half of the time that exporting without multithreading

Finally, I came up with the most efficient way to export millions of data: multi-threaded table export

In actual combat

  • The controller layer:


@RestController 
@RequestMapping("export")
public class ReportController {

public static final String[] TITLE = new String[]{"Column 1"."Column 2"."Column 3"."Column 4"."Column 5"};
public static final String SHEET_NAME = "page1";
@RequestMapping(value = "/sxssf/page/thread")
@ResponseBody
public void exportSXSSFWorkbookByPageThread(HttpServletResponse response, Integer num) throws Exception {
    / / excel file name
    String fileName = System.currentTimeMillis() + ".xlsx";
    / / sheet
    if (Objects.isNull(num)) {
        num = 65536;
    }
    String[][] content = buildContent(num);
    long start = System.currentTimeMillis();
    SXSSFWorkbook wb = ExcelUtil.getSXSSFWorkbookByPageThread(TITLE, content, null);
    long millis = System.currentTimeMillis() - start;
    long second = millis / 1000;
    System.out.println("SXSSF Page Thread export" + num + "Piece of data, cost:" + second + "s/ " + millis + "ms");
    writeAndClose(response, fileName, wb);
    wb.dispose();
}
/** * Build content *@param num
 * @return* /
private String[][] buildContent(Integer num) {
    String[][] content = new String[num][4];
    for (int i = 0; i < content.length; i++) {
        content[i][0] = "1";
        content[i][5] = "2";
        content[i][6] = "3";
        content[i][7] = "4";
        content[i][8] = "5";
    }
    return content;
}
private void writeAndClose(HttpServletResponse response, String fileName, Workbook wb) {
    try {
        this.setResponseHeader(response, fileName);
        OutputStream os = response.getOutputStream();
        wb.write(os);
        os.flush();
        os.close();
    } catch(Exception e) { e.printStackTrace(); }}public void setResponseHeader(HttpServletResponse response, String fileName) {
    try {
        try {
            fileName = new String(fileName.getBytes(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        response.setContentType("application/octet-stream; charset=ISO8859-1");
        response.setHeader("Content-Disposition"."attachment; filename=" + fileName);
        response.addHeader("Pargam"."no-cache");
        response.addHeader("Cache-Control"."no-cache");
    } catch(Exception ex) { ex.printStackTrace(); }}}Copy the code


  • Tools:


public class ExcelUtil {
public static final int PER_SHEET_LIMIT = 500000;
public static SXSSFWorkbook getSXSSFWorkbookByPageThread(String[] title, String[][] values) {
 
 SXSSFWorkbook wb = new SXSSFWorkbook();
  int pageNum = values.length / PER_SHEET_LIMIT;
  int lastCount = values.length % PER_SHEET_LIMIT;
  if (values.length > PER_SHEET_LIMIT) {
      CellStyle style = wb.createCellStyle();
      int sheet = lastCount == 0 ? pageNum : pageNum + 1;
      CountDownLatch downLatch = new CountDownLatch(sheet);
      Executor executor = Executors.newFixedThreadPool(sheet);
      for (int c = 0; c <= pageNum; c++) {
          int rowNum = PER_SHEET_LIMIT;
          if (c == pageNum) {
              if (lastCount == 0) {
                  continue;
              }
              rowNum = lastCount;
          }
          Sheet sheet = wb.createSheet("page" + c);
          executor.execute(new PageTask(downLatch, sheet, title, style, rowNum, values));
      }
      try {
          downLatch.await();
      } catch(InterruptedException e) { e.printStackTrace(); }}return wb;
}}
Copy the code


  • Table tasks:


public class PageTask implements Runnable {
private CountDownLatch countDownLatch;
private Sheet sheet;
private String[] title;
private CellStyle style;
private int b;
private String[][] values;
public PageTask(CountDownLatch countDownLatch, Sheet sheet, String[] title, CellStyle style, int b, String[][] values) {
    this.countDownLatch = countDownLatch;
    this.sheet = sheet;
    this.title = title;
    this.style = style;
    this.b = b;
    this.values = values;
}
@Override
public void run(a) {
    try {
        Row row = sheet.createRow(0);
        Cell cell = null;
        for (int i = 0; i < title.length; i++) {
            cell = row.createCell(i);
            cell.setCellValue(title[i]);
            cell.setCellStyle(style);
        }
        for (int i = 0; i < b; i++) {
            row = sheet.createRow(i + 1);
            for (int j = 0; j < values[i].length; j++) { row.createCell(j).setCellValue(values[i][j]); }}}catch (Exception e) {
        e.printStackTrace();
    } finally {
        if(countDownLatch ! =null) { countDownLatch.countDown(); }}}}Copy the code