Reading millions of rows efficiently

Following up on the introduction to efficient file writing, I recently took some time to study reading Excel files. In summary, there are two ways POI can read Excel: user mode and event mode.

Many business scenarios still read Excel in user mode, but this mode creates a large number of objects, handles large files poorly, and easily runs out of memory (OOM). In event mode, on the other hand, you have to implement the listeners yourself and handle each kind of event according to your own needs, so it is complicated to use.

For this reason, EasyExcel wraps event-mode parsing for the common Excel formats and provides interfaces that developers can extend, so that parsing Excel is no longer a chore.
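To make the event-mode idea concrete, here is a minimal sketch that uses only the JDK's SAX parser. It is not POI's or EasyExcel's API: the class name, the readRows method, and the simplified XML (row, c and v elements, loosely modeled on the sheet XML inside an .xlsx file) are assumptions for illustration. The point is that the handler reacts to parse events and only ever holds one row at a time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class EventModeSketch {

    /** Streams the XML and returns the cell values of each row (hypothetical helper). */
    static List<List<String>> readRows(String sheetXml) throws Exception {
        List<List<String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private List<String> currentRow; // the only row buffered while parsing
            private StringBuilder text;      // non-null only while inside a <v> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    currentRow = new ArrayList<>();
                } else if ("v".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    currentRow.add(text.toString());
                    text = null;
                } else if ("row".equals(qName)) {
                    // A real reader would hand the row to a consumer here
                    // instead of collecting every row in memory.
                    rows.add(currentRow);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheetData>"
                + "<row><c><v>1</v></c><c><v>2</v></c></row>"
                + "<row><c><v>3</v></c></row>"
                + "</sheetData>";
        System.out.println(readRows(xml)); // prints [[1, 2], [3]]
    }
}
```

User mode corresponds to loading the whole document into an object tree first, which is where the memory pressure on large files comes from.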

Talk is cheap, show me the code.

Usage

pom

<groupId>com.github.dorae132</groupId>
<artifactId>easyutil.easyexcel</artifactId>
<version>1.1.0</version>

Basic usage

Looking at the usage below, doesn't it seem that you only need to care about your business logic?

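// Counter and start time used by the two callbacks below.
final AtomicInteger count = new AtomicInteger();
final long start = System.currentTimeMillis();

ExcelUtils.excelRead(
        ExcelProperties.produceReadProperties("C:\\Users\\Dorae\\Desktop\\ttt\\",
                "append_0745704108fa42ffb656aef983229955.xlsx"),
        new IRowConsumer<String>() {
            @Override
            public void consume(List<String> row) {
                // Business logic goes here; it is called for each parsed row.
                System.out.println(row);
                count.incrementAndGet();
                try {
                    // Simulate a small amount of per-row work.
                    TimeUnit.MICROSECONDS.sleep(100);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag instead of swallowing it.
                    Thread.currentThread().interrupt();
                }
            }
        },
        new IReadDoneCallBack<Void>() {
            @Override
            public Void call() {
                // Called when reading is done.
                System.out.println(
                        "end, count: " + count.get() + "\ntime: " + (System.currentTimeMillis() - start));
                return null;
            }
        },
        3,     // threadCount
        true); // syncCurrentThread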

Custom usage

What? You want to customize the context or add a handler? Look below! All you need to do is implement an Abstract03RecordHandler and register it with the context (see the factory in ExcelVersionEnums).

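public static void excelRead(IHandlerContext context, IRowConsumer rowConsumer, IReadDoneCallBack callBack,
        int threadCount, boolean syncCurrentThread) throws Exception {
    // One barrier party per consumer thread (each ConsumeRowThread is expected
    // to await the barrier when it finishes), plus one for the calling thread
    // when it should block until reading is done.
    int parties = syncCurrentThread ? threadCount + 1 : threadCount;
    CyclicBarrier cyclicBarrier;
    if (callBack != null) {
        // The barrier action runs once, after the last party has arrived.
        cyclicBarrier = new CyclicBarrier(parties, () -> {
            callBack.call();
        });
    } else {
        cyclicBarrier = new CyclicBarrier(parties);
    }
    // Start only threadCount consumers; the extra party is the calling thread.
    for (int i = 0; i < threadCount; i++) {
        THREADPOOL.execute(new ConsumeRowThread(context, rowConsumer, cyclicBarrier));
    }
    // Parse the file on the calling thread, feeding rows to the consumers.
    context.process();
    if (syncCurrentThread) {
        cyclicBarrier.await();
    }
}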
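The thread coordination in excelRead can be tried out in isolation. The following is a rough sketch of the same idea using only the JDK, not the library's ConsumeRowThread: the queue, the END marker, and the class and method names are all hypothetical. The calling thread produces rows, a few workers consume them, and a CyclicBarrier action plays the role of the read-done callback.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierReadSketch {
    // Hypothetical end-of-data marker, one is queued per worker.
    private static final String END = "\u0000END";

    /** Produces rowCount rows on the calling thread and returns the count reported by the done-callback. */
    static int run(int rowCount, int workers) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger consumed = new AtomicInteger();
        AtomicInteger reported = new AtomicInteger(-1);

        // workers + 1 parties: every consumer plus the calling thread.
        // The barrier action runs exactly once, when the last party arrives.
        CyclicBarrier barrier = new CyclicBarrier(workers + 1, () -> reported.set(consumed.get()));

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    // Consume rows until the end marker shows up.
                    while (!END.equals(queue.take())) {
                        consumed.incrementAndGet();
                    }
                    barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // "Parse" on the calling thread: push rows, then one end marker per worker.
        for (int i = 0; i < rowCount; i++) {
            queue.put("row-" + i);
        }
        for (int i = 0; i < workers; i++) {
            queue.put(END);
        }

        // Block until every worker is done and the callback has run.
        barrier.await();
        pool.shutdown();
        return reported.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rows consumed: " + run(1000, 3));
    }
}
```

Note that the number of barrier parties has to equal the number of threads that actually call await, otherwise one of them waits forever.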

Framework structure

The figure shows the overall structure of EasyExcel (if you know design patterns or have read the relevant source code, it should be easy to understand):

  1. Green marks the extensible interfaces.
  2. The upper part is the file writing part, and the lower part is the file reading part.

Conclusion

That wraps up the basic functionality of EasyExcel for now. Everyone is welcome to open an issue. 🍗