This is the fifth article in a series. The earlier articles are:
Good News for programmers – An introduction to Apache Commons
Good news for Programmers – Apache Commons Lang
Programmer’s Gospel – Apache Commons IO
Good news for programmers – Apache Commons Codec
Apache Commons Compress provides a number of compression- and archive-related utility classes. The latest version of Compress is 1.21, and it requires Java 8 or higher.
Maven coordinates are as follows:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.21</version>
</dependency>
The following is the overall structure:
org.apache.commons.compress
org.apache.commons.compress.archivers
org.apache.commons.compress.changes
org.apache.commons.compress.compressors
org.apache.commons.compress.parallel
org.apache.commons.compress.utils
org.apache.commons.compress.harmony
Only some of the commonly used packages are covered here; readers interested in the rest can browse the source code.
01. Compression
Compression: applying an algorithm to reduce the space a file occupies
Decompression: restoring the file by applying the corresponding inverse algorithm
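To make the two definitions concrete, here is a minimal sketch that compresses a byte array and then restores it. It deliberately uses the JDK’s own java.util.zip.Deflater and Inflater instead of Commons Compress so that it runs without extra dependencies; the class name DeflateRoundTrip and its methods are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative helper (not part of Commons Compress): DEFLATE round trip with JDK classes.
final class DeflateRoundTrip {

    // Compress the input with the DEFLATE algorithm at the given level.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Apply the inverse algorithm to restore the original bytes.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[4096];
        Arrays.fill(original, (byte) 'a');
        byte[] packed = compress(original, Deflater.BEST_COMPRESSION);
        byte[] restored = decompress(packed);
        System.out.println(original.length + " bytes -> " + packed.length + " bytes");
        System.out.println("restored equals original: " + Arrays.equals(original, restored));
    }
}
```

Highly repetitive input such as the array above shrinks a lot; data that is already compressed or random usually does not.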
Compress comes with many compression-related classes. The main ones are as follows:
GzipCompressorOutputStream: compresses “*.gz” files
GzipCompressorInputStream: decompresses “*.gz” files
BZip2CompressorOutputStream: compresses “*.bz2” files
BZip2CompressorInputStream: decompresses “*.bz2” files
XZCompressorOutputStream: compresses “*.xz” files
XZCompressorInputStream: decompresses “*.xz” files
FramedLZ4CompressorOutputStream: compresses “*.lz4” files
FramedLZ4CompressorInputStream: decompresses “*.lz4” files
BlockLZ4CompressorOutputStream: compresses “*.block_lz4” files
BlockLZ4CompressorInputStream: decompresses “*.block_lz4” files
Pack200CompressorOutputStream: compresses “*.pack” files
Pack200CompressorInputStream: decompresses “*.pack” files
DeflateCompressorOutputStream: compresses “*.deflate” files
DeflateCompressorInputStream: decompresses “*.deflate” files
LZMACompressorOutputStream: compresses “*.lzma” files
LZMACompressorInputStream: decompresses “*.lzma” files
FramedSnappyCompressorOutputStream: compresses “*.sz” files
FramedSnappyCompressorInputStream: decompresses “*.sz” files
ZCompressorInputStream: decompresses “*.Z” files
Here’s a quick example
1. gzip
Gzip is a common compression tool on Unix and Linux, and it is also widely used to compress content served by websites. It has concepts such as compression levels, which can be set using GzipParameters. JDK 8 also comes with its own GZIPInputStream and GZIPOutputStream classes, which are used in a similar way.
// gzip compression
// FilenameUtils and IOUtils come from Commons IO
String file = "/test.js";
GzipParameters parameters = new GzipParameters();
parameters.setCompressionLevel(Deflater.BEST_COMPRESSION);
// 3 is the gzip header code for Unix
parameters.setOperatingSystem(3);
parameters.setFilename(FilenameUtils.getName(file));
parameters.setComment("Test file");
parameters.setModificationTime(System.currentTimeMillis());
FileOutputStream fos = new FileOutputStream(file + ".gz");
try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(fos, parameters);
     InputStream is = new FileInputStream(file)) {
    IOUtils.copy(is, gzos);
}
// gzip decompression
String gzFile = "/test.js.gz";
FileInputStream is = new FileInputStream(gzFile);
try (GzipCompressorInputStream gis = new GzipCompressorInputStream(is)) {
    // Metadata read from the gzip header (file name, comment, modification time, ...)
    GzipParameters p = gis.getMetaData();
    File targetFile = new File("/test.js");
    FileUtils.copyToFile(gis, targetFile);
    targetFile.setLastModified(p.getModificationTime());
}
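For comparison, here is a rough sketch of the same round trip using only the JDK’s GZIPOutputStream and GZIPInputStream mentioned above. The class name JdkGzipExample is invented for this example, and it works on in-memory byte arrays instead of files to stay self-contained. Note that the JDK classes do not offer an equivalent of GzipParameters for setting header fields such as the file name or comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative helper (not part of Commons Compress): gzip with JDK classes only.
final class JdkGzipExample {

    // Compress a byte array into gzip format.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(bos)) {
            gzos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip trailer is written when the stream is closed, so read the bytes afterwards
        return bos.toByteArray();
    }

    // Decompress gzip data back into the original bytes.
    static byte[] gunzip(byte[] gz) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = gzis.read(buf)) != -1) {
                bos.write(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "console.log('hello');".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(data));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```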
2. bz2
Bz2 is a common compressed file format on Linux. Files in this format use the “.bz2” suffix and are produced by bzip2, a compression tool with a high compression ratio.
// bz2 compression
String srcFile = "/test.tar";
String targetFile = "/test.tar.bz2";
FileOutputStream os = new FileOutputStream(targetFile);
try (BZip2CompressorOutputStream bzos = new BZip2CompressorOutputStream(os);
     InputStream is = new FileInputStream(srcFile)) {
    IOUtils.copy(is, bzos);
}
// bz2 decompression
String bzFile = "/test.tar.bz2";
FileInputStream is = new FileInputStream(bzFile);
try (BZip2CompressorInputStream bzis = new BZip2CompressorInputStream(is)) {
    File targetFile = new File("/test.tar");
    FileUtils.copyToFile(bzis, targetFile);
}
The other compression algorithms are used in much the same way as bz2, so I won’t give code examples for them here.
02. Archive
Archiving: combining many separate files into a single file, whose total size stays roughly the same
Unpacking: extracting the files from an archive
Compress comes with many archiving-related classes:
TarArchiveOutputStream: archives “*.tar” files
TarArchiveInputStream: unpacks “*.tar” files
ZipArchiveOutputStream: archives and compresses “*.zip” files
ZipArchiveInputStream: unpacks and decompresses “*.zip” files
JarArchiveOutputStream: archives and compresses “*.jar” files
JarArchiveInputStream: unpacks and decompresses “*.jar” files
DumpArchiveInputStream: unpacks “*.dump” files (the dump format is read-only, so there is no matching output stream)
CpioArchiveOutputStream: archives “*.cpio” files
CpioArchiveInputStream: unpacks “*.cpio” files
ArArchiveOutputStream: archives “*.ar” files
ArArchiveInputStream: unpacks “*.ar” files
ArjArchiveInputStream: unpacks “*.arj” files
SevenZOutputFile: archives and compresses “*.7z” files
SevenZFile: unpacks and decompresses “*.7z” files
Among them, ZIP, JAR and 7z support both archiving and compression, and can compress data during the archiving process. Tar, cpio and ar only archive.
Because archives hold many individual files, there is the concept of an ArchiveEntry: each ArchiveEntry represents a directory or file inside an archive. Let’s take a quick look at an example.
1. tar
Tar is a commonly used archiving tool on Unix and Linux systems. It combines multiple files into a single file with the “.tar” suffix.
// tar archiving
// ArrayUtils comes from Commons Lang, FileUtils from Commons IO
public void tar() throws IOException {
    File srcDir = new File("/test");
    String targetFile = "/test.tar";
    try (TarArchiveOutputStream tos = new TarArchiveOutputStream(
            new FileOutputStream(targetFile))) {
        tarRecursive(tos, srcDir, "");
    }
}

// Recursively archive the files and directories under the directory
private void tarRecursive(TarArchiveOutputStream tos, File srcFile, String basePath) throws IOException {
    if (srcFile.isDirectory()) {
        File[] files = srcFile.listFiles();
        String nextBasePath = basePath + srcFile.getName() + "/";
        if (ArrayUtils.isEmpty(files)) {
            // Empty directory: add an entry with no content
            TarArchiveEntry entry = new TarArchiveEntry(srcFile, nextBasePath);
            tos.putArchiveEntry(entry);
            tos.closeArchiveEntry();
        } else {
            for (File file : files) {
                tarRecursive(tos, file, nextBasePath);
            }
        }
    } else {
        TarArchiveEntry entry = new TarArchiveEntry(srcFile, basePath + srcFile.getName());
        tos.putArchiveEntry(entry);
        FileUtils.copyFile(srcFile, tos);
        tos.closeArchiveEntry();
    }
}
// tar unpacking
public void untar() throws IOException {
    InputStream is = new FileInputStream("/test.tar");
    String outPath = "/test";
    try (TarArchiveInputStream tis = new TarArchiveInputStream(is)) {
        TarArchiveEntry nextEntry;
        while ((nextEntry = tis.getNextTarEntry()) != null) {
            String name = nextEntry.getName();
            File file = new File(outPath, name);
            if (nextEntry.isDirectory()) {
                // If it is a directory, create it (including any missing parents)
                file.mkdirs();
            } else {
                // Write the file to its target path
                FileUtils.copyToFile(tis, file);
                file.setLastModified(nextEntry.getLastModifiedDate().getTime());
            }
        }
    }
}
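One thing the unpacking code above does not do is check entry names: it builds the target with new File(outPath, name), so an entry named something like “../x.txt” would be written outside the output directory. Below is a minimal sketch of a guard you might put in front of that line. The class SafeExtract and its resolve method are hypothetical helpers written for this article, not part of Commons Compress.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical helper: resolve an archive entry name inside a target directory
// and reject names that would escape from it.
final class SafeExtract {

    static File resolve(File outDir, String entryName) {
        Path base = outDir.toPath().toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments so the prefix check below is meaningful
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Entry is outside of the target directory: " + entryName);
        }
        return target.toFile();
    }

    public static void main(String[] args) {
        File outDir = new File("out");
        System.out.println(resolve(outDir, "dir/1.txt"));
        try {
            resolve(outDir, "../evil.txt");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In the unpacking loop, the idea would be to call SafeExtract.resolve(new File(outPath), name) instead of constructing the File directly.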
2. 7z
7z is a newer compression format with an extremely high compression ratio.
Main features of the 7z format:
- Open architecture
- High compression ratio
- Strong AES-256 encryption
- Ability to use any compression, conversion or encryption method
- Supports files of up to 16000000000 GB
- Unicode file names
- Solid compression
- Archive header compression
// 7z compression
public void _7z() throws IOException {
    try (SevenZOutputFile outputFile = new SevenZOutputFile(new File("/test.7z"))) {
        File srcFile = new File("/test");
        _7zRecursive(outputFile, srcFile, "");
    }
}

// Recursively compress the files and directories under the directory
private void _7zRecursive(SevenZOutputFile _7zFile, File srcFile, String basePath) throws IOException {
    if (srcFile.isDirectory()) {
        File[] files = srcFile.listFiles();
        String nextBasePath = basePath + srcFile.getName() + "/";
        if (ArrayUtils.isEmpty(files)) {
            // Empty directory: add an entry with no content
            SevenZArchiveEntry entry = _7zFile.createArchiveEntry(srcFile, nextBasePath);
            _7zFile.putArchiveEntry(entry);
            _7zFile.closeArchiveEntry();
        } else {
            for (File file : files) {
                _7zRecursive(_7zFile, file, nextBasePath);
            }
        }
    } else {
        SevenZArchiveEntry entry = _7zFile.createArchiveEntry(srcFile, basePath + srcFile.getName());
        _7zFile.putArchiveEntry(entry);
        byte[] bs = FileUtils.readFileToByteArray(srcFile);
        _7zFile.write(bs);
        _7zFile.closeArchiveEntry();
    }
}
// 7z decompression
public void un7z() throws IOException {
    String outPath = "/test";
    try (SevenZFile archive = new SevenZFile(new File("/test.7z"))) {
        SevenZArchiveEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            File file = new File(outPath, entry.getName());
            if (entry.isDirectory()) {
                file.mkdirs();
            }
            if (entry.hasStream()) {
                // Read the content of the current entry into memory, then write it out
                final byte[] buf = new byte[1024];
                final ByteArrayOutputStream baos = new ByteArrayOutputStream();
                for (int len = 0; (len = archive.read(buf)) > 0;) {
                    baos.write(buf, 0, len);
                }
                FileUtils.writeByteArrayToFile(file, baos.toByteArray());
            }
        }
    }
}
3. ar, arj, cpio, dump, zip, jar
These archive utility classes are used in much the same way as tar, so I won’t give examples for them.
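The put-entry, write, close-entry cycle shown for tar is the same shape that zip uses. As a rough illustration, the sketch below runs that cycle with the JDK’s own java.util.zip classes rather than Commons Compress, purely as a stand-in so that it works without extra dependencies; in Commons Compress, ZipArchiveOutputStream and ZipArchiveEntry play the corresponding roles. The class name JdkZipEntries and the entry names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative stand-in using JDK classes: one entry per directory or file.
final class JdkZipEntries {

    // Build a small zip archive in memory with a directory entry and a file entry.
    static byte[] createZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            // A name ending in "/" marks a directory entry; it carries no content
            zos.putNextEntry(new ZipEntry("docs/"));
            zos.closeEntry();
            // A file entry: open it, write the content, close it
            zos.putNextEntry(new ZipEntry("docs/readme.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Walk the archive entry by entry and collect the entry names.
    static List<String> listEntries(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : listEntries(createZip())) {
            System.out.println(name);
        }
    }
}
```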
03. Modify the archive file
Sometimes we need to modify the files in an archive, for example to add or delete a file or to change a file’s contents. Of course, we could unpack everything, make the changes, and pack it all up again, but that takes more code and, for large archives, a long time. Is there a way to modify the contents of an archive dynamically in code?
The org.apache.commons.compress.changes package provides classes for dynamically modifying the contents of an archive. Let’s look at a simple example.
String tarFile = "/test.tar";
// Write to a temporary file first: opening a FileOutputStream on test.tar itself
// would truncate the archive before it has been read
String tmpFile = tarFile + ".tmp";
try (TarArchiveInputStream tais = new TarArchiveInputStream(new FileInputStream(tarFile));
     TarArchiveOutputStream taos = new TarArchiveOutputStream(new FileOutputStream(tmpFile))) {
    ChangeSet changes = new ChangeSet();
    // Delete "dir/1.txt" from "test.tar"
    changes.delete("dir/1.txt");
    // Delete the "t" directory in "test.tar"
    changes.deleteDir("t");
    // Add the file or replace it if it already exists
    File addFile = new File("/a.txt");
    ArchiveEntry addEntry = taos.createArchiveEntry(addFile, addFile.getName());
    // add() accepts a third argument: true replaces an existing entry (default), false does not
    changes.add(addEntry, new FileInputStream(addFile));
    // Perform the modification
    ChangeSetPerformer performer = new ChangeSetPerformer(changes);
    ChangeSetResults result = performer.perform(tais, taos);
}
// Once both streams are closed, replace the original test.tar with the modified archive
Files.move(Paths.get(tmpFile), Paths.get(tarFile), StandardCopyOption.REPLACE_EXISTING);
04. Other
1. Simple factory
Commons Compress also provides simple factory classes that let users obtain compressor and archive streams dynamically.
// Use the factory to obtain archive streams dynamically
ArchiveStreamFactory archiveFactory = new ArchiveStreamFactory();
String archiveName = ArchiveStreamFactory.TAR;
InputStream tarIn = new FileInputStream("/in.tar");
OutputStream tarOut = new FileOutputStream("/out.tar");
// The implementation classes are chosen dynamically: here ais is actually a
// TarArchiveInputStream and aos is a TarArchiveOutputStream
ArchiveInputStream ais = archiveFactory.createArchiveInputStream(archiveName, tarIn);
ArchiveOutputStream aos = archiveFactory.createArchiveOutputStream(archiveName, tarOut);
// Other business operations
// ------------------------
// Use the factory to obtain compressor streams dynamically
CompressorStreamFactory compressorFactory = new CompressorStreamFactory();
String compressName = CompressorStreamFactory.GZIP;
InputStream gzIn = new FileInputStream("/in.gz");
OutputStream gzOut = new FileOutputStream("/out.gz");
// The implementation classes are chosen dynamically: here cis is actually a
// GzipCompressorInputStream and cos is a GzipCompressorOutputStream
CompressorInputStream cis = compressorFactory.createCompressorInputStream(compressName, gzIn);
CompressorOutputStream cos = compressorFactory.createCompressorOutputStream(compressName, gzOut);
// Other business operations
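If the simple factory idea is unfamiliar, the sketch below shows the pattern in miniature: the caller passes a format name and gets back a stream without naming the concrete class. It is only a toy built on the JDK’s GZIPOutputStream and DeflaterOutputStream and is not how CompressorStreamFactory is actually implemented; the class name SimpleCompressorFactory and its constants are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Toy simple factory (an illustration of the pattern, not the Commons Compress implementation).
final class SimpleCompressorFactory {

    // Format names chosen for this sketch
    static final String GZIP = "gz";
    static final String DEFLATE = "deflate";

    // Pick the concrete stream class from the format name.
    static OutputStream createCompressorOutputStream(String name, OutputStream out) {
        try {
            switch (name) {
                case GZIP:
                    return new GZIPOutputStream(out);
                case DEFLATE:
                    return new DeflaterOutputStream(out);
                default:
                    throw new IllegalArgumentException("Unknown compressor: " + name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OutputStream os = createCompressorOutputStream(GZIP, new ByteArrayOutputStream());
        System.out.println(os.getClass().getSimpleName());
    }
}
```

The benefit is that the format can come from configuration or a file extension at run time while the calling code stays the same.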
2. Decompressing and unpacking together
Most of the examples so far perform a single operation. But what about a file such as “test.tar.gz”, which is both archived and compressed?
It is actually very simple: we don’t need to decompress first and then unpack as two separate steps. We can do both at once simply by wrapping one stream in the other (the decorator pattern in Java IO really is a clever design). Let’s look at a code example.
// Decompress and unpack the test.tar.gz file
String outPath = "/test";
InputStream is = new FileInputStream("/test.tar.gz");
// Decompression happens first, so wrap the file stream in a gzip stream
CompressorInputStream gis = new GzipCompressorInputStream(is);
// Then unpack: wrap the gzip stream in a tar stream
try (ArchiveInputStream tgis = new TarArchiveInputStream(gis)) {
    ArchiveEntry nextEntry;
    while ((nextEntry = tgis.getNextEntry()) != null) {
        String name = nextEntry.getName();
        File file = new File(outPath, name);
        if (nextEntry.isDirectory()) {
            // If it is a directory, create it (including any missing parents)
            file.mkdirs();
        } else {
            // Write the file to its target path
            FileUtils.copyToFile(tgis, file);
            file.setLastModified(nextEntry.getLastModifiedDate().getTime());
        }
    }
}
05. Conclusion
In addition to the utility classes described above, there are other, less commonly used ones that I won’t cover here. If you are interested, you can browse the source code.
Stay tuned as I continue to introduce other useful utility libraries from Apache Commons.