This is the 30th day of my participation in the August Text Challenge.
Preface
There are many common ZIP utility libraries. Because ZIP handling is not used very often in our code, the other tools are not discussed here. The project already used Zip4J, so the fix for garbled Chinese file names after decompression described here is specific to Zip4J.
Maven dependency for Zip4J
<dependency>
<groupId>net.lingala.zip4j</groupId>
<artifactId>zip4j</artifactId>
<version>2.6.4</version>
</dependency>
Current state of ZIP decompression
ZIP archives are created with different software on different platforms, and the results generally fall into two types:
- ZIP files created on Windows with WinRAR or similar compression tools: file names are typically encoded in GBK (on Chinese-language systems)
- ZIP files created on Linux or macOS: file names are encoded in UTF-8
For the differences between GBK and UTF-8, see the article "Character encoding: a quick look at ASCII, Unicode, GBK, and UTF-8".
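The mismatch between the two encodings can be reproduced without any ZIP file. The following is a minimal sketch using only the JDK. The class name `EncodingMismatchDemo` and the sample file name are made up for illustration, and it assumes the running JDK provides the GBK charset, as full JDK builds normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    public static final Charset GBK = Charset.forName("GBK");

    /** Decodes raw file-name bytes with the given charset. */
    public static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // Hypothetical file name: two Chinese characters plus ".txt"
        String name = "\u4e2d\u6587.txt";

        byte[] gbkBytes = name.getBytes(GBK);
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);

        // GBK stores each of these Chinese characters in 2 bytes, UTF-8 in 3
        System.out.println("GBK length:   " + gbkBytes.length);   // 8
        System.out.println("UTF-8 length: " + utf8Bytes.length);  // 10

        // Decoding with the charset that produced the bytes restores the name
        System.out.println(decode(gbkBytes, GBK).equals(name));                    // true
        // Decoding GBK bytes as UTF-8 does not
        System.out.println(decode(gbkBytes, StandardCharsets.UTF_8).equals(name)); // false
    }
}
```

The same bytes are fine or garbled depending only on which charset the reader assumes, which is the situation a ZIP library is in when the archive does not say how its file names were encoded.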
Problem description
Previously, the code compressed files on a Linux service and uploaded the archives to a MinIO server. Other services downloaded them through MinIO and decompressed them locally, and archives with Chinese or English file names both worked without problems. A newly added scenario allows compressed packages to be uploaded directly. After decompression, their file names are displayed as garbled characters and the code throws exceptions. A local test class that simulates the scenario produces the same garbled result.
The problem code
The test code
The results
The cause of the garbling is simply a character set mismatch. There are many solutions online, such as setting the character set to GBK or decompressing with the Ant JAR package. Since there is no uniform standard or solution, we can start from the API: check whether the source code exposes any character-set-related settings, find the root cause, and the fix follows naturally.
Zip4J source analysis
The default charset is UTF-8
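The utility class later in this article re-encodes file names as Cp437 before decoding them again. That only works if the garbled name was produced by decoding the raw bytes with a single-byte code page, which loses no information. The sketch below illustrates this with the JDK's own `java.util.zip` classes as a stand-in. It is not Zip4J code, and the class and method names are hypothetical. `IBM437` is the JDK's name for Cp437.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Cp437RoundTripDemo {
    public static final Charset GBK = Charset.forName("GBK");
    public static final Charset CP437 = Charset.forName("IBM437");

    /** Builds an in-memory ZIP whose single entry name is stored as GBK bytes. */
    public static byte[] zipWithGbkName(String entryName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer, GBK)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(1); // one byte of dummy content
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the first entry name, decoding the stored name bytes with the given charset. */
    public static String firstEntryName(byte[] zipBytes, Charset charset) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            return in.getNextEntry().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Turns a Cp437-decoded name back into bytes and decodes those bytes as GBK. */
    public static String repair(String cp437Decoded) {
        return new String(cp437Decoded.getBytes(CP437), GBK);
    }

    public static void main(String[] args) {
        byte[] zip = zipWithGbkName("\u4e2d\u6587.txt");
        String garbled = firstEntryName(zip, CP437);
        System.out.println("Read as Cp437: " + garbled);
        System.out.println("Repaired:      " + repair(garbled));
        System.out.println("Read as GBK:   " + firstEntryName(zip, GBK));
    }
}
```

Reading the archive with the right charset from the start is simpler. The round trip is only useful when the name has already been decoded with Cp437 by the time the calling code sees it.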
Complete utility class
import cn.hutool.core.util.CharsetUtil;
import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.exception.ZipException;
import net.lingala.zip4j.model.FileHeader;
import org.apache.commons.collections.CollectionUtils;
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.*;
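public class MyZipUtils {

    /**
     * Unzips every ZIP package found in the directory and returns all other files.
     *
     * @param filePath  directory path
     * @param unZipList names of ZIP packages that have already been decompressed
     * @return List<File>
     * @throws ZipException
     */
    public static List<File> scanAndUnzipFile(String filePath, Set<String> unZipList) throws ZipException {
        File file = new File(filePath);
        // listFiles() returns null if the path is not a readable directory
        File[] children = file.listFiles();
        List<File> fileList = children == null ? Collections.<File>emptyList() : Arrays.asList(children);
        List<File> fileListNew = new ArrayList<>();
        if (CollectionUtils.isNotEmpty(fileList)) {
            for (File file1 : fileList) {
                // Match the extension only, so names such as "a.zip.bak" are not treated as archives
                if (file1.getName().endsWith(".zip")) {
                    if (unZipList.contains(file1.getName())) {
                        continue;
                    }
                    unZipList.add(file1.getName());
                    ZipFile zipFile1 = new ZipFile(file1);
                    extractAll(filePath, zipFile1);
                    // Rescan the directory so newly extracted files, including nested ZIP packages, are picked up
                    return scanAndUnzipFile(filePath, unZipList);
                }
                fileListNew.add(file1);
            }
            return fileListNew;
        }
        return null;
    }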

    /** Extracts every entry of the ZIP package into filePath, using the corrected file names. */
    public static String extractAll(String filePath, ZipFile zip) {
        zip.setCharset(Charset.forName("utf-8"));
        System.out.println("begin unpack zip file....");
        try {
            zip.getFileHeaders().forEach(v -> {
                // Work out the correctly decoded name before extracting the entry
                String extractedFile = getFileName(v);
                try {
                    zip.extractFile(v, filePath, extractedFile);
                } catch (ZipException e) {
                    System.out.println("Decompression failed: " + extractedFile);
                    e.printStackTrace();
                    return;
                }
                System.out.println("Decompression successful: " + extractedFile);
            });
        } catch (ZipException e) {
            e.printStackTrace();
        }
        System.out.println("unpack zip file success");
        return "success";
    }
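
    public static String getFileName(FileHeader fileHeader) {
        try {
            // There are two main sources of compressed packages: Windows and Linux
            if (fileHeader.isFileNameUTF8Encoded()) {
                // Assumption: with the charset set to UTF-8 in extractAll, a name flagged as UTF-8
                // is already decoded correctly. Round-tripping it through Cp437 would replace
                // Chinese characters with '?', so it is returned unchanged. Verify this against
                // the Zip4J version in use.
                return fileHeader.getFileName();
            } else {
                // No UTF-8 flag: recover the raw bytes via Cp437 and decode them as GBK
                return new String(fileHeader.getFileName().getBytes("Cp437"), CharsetUtil.CHARSET_GBK.name());
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return fileHeader.getFileName();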
    }
}
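The branch on `isFileNameUTF8Encoded()` depends on the archiving tool having set the UTF-8 flag. As a complementary idea, and not something taken from Zip4J, the raw name bytes can be checked with a strict UTF-8 decoder, falling back to GBK when they are not valid UTF-8. This is a heuristic sketch: `FileNameCharsetGuess` and `decodeName` are hypothetical names, and a short GBK name can occasionally also be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FileNameCharsetGuess {
    public static final Charset GBK = Charset.forName("GBK");

    /**
     * Decodes raw file-name bytes as UTF-8 if they form valid UTF-8,
     * otherwise falls back to GBK.
     */
    public static String decodeName(byte[] rawName) {
        try {
            // REPORT makes the decoder throw instead of silently inserting U+FFFD
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(rawName))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the name came from a GBK-based tool
            return new String(rawName, GBK);
        }
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587.txt"; // hypothetical file name
        System.out.println(decodeName(name.getBytes(GBK)).equals(name));
        System.out.println(decodeName(name.getBytes(StandardCharsets.UTF_8)).equals(name));
    }
}
```

Inside the utility class, the raw bytes could be obtained the way the article already does for names without the flag, with `fileHeader.getFileName().getBytes("Cp437")`, under the same assumption that the library decoded them as Cp437.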
Reference documentation
unzip not correct with cjk filename. #45
Garbled chinese character #73