I. IO Streams and Systems

IO is one of the most complex modules in the JDK, and a key reason is that IO operations are closely tied to the operating system kernel. Network programming and file management both rely on IO as well, and both are hard to program. To understand IO streams as a whole, it helps to start from the Linux operating system.

Linux Space Isolation

Linux distinguishes between users, which is common knowledge. At a lower level it also separates two spaces, user and kernel:

  • User space
  • Kernel space

User space has far fewer privileges than kernel space, so the two must interact whenever a privileged operation is needed. When an application deployed on a server requests system resources, the interaction becomes more involved:

User space cannot issue scheduling instructions to the system directly; they must go through the kernel. Likewise, data held in the kernel must be copied into user space before the application can operate on it. This isolation protects the security and stability of the system.

Viewing the parameters

The top command shows a live view of the resources that processes are using:

  • us: percentage of CPU time spent in user space;
  • sy: percentage of CPU time spent in kernel space;
  • id: percentage of CPU time spent idle;
  • wa: percentage of CPU time spent waiting for IO;

The wa indicator is one of the core monitoring items in large-scale file task flows.

IO collaboration process

Now look at the process in Figure [1] above. When the application initiates an IO request, the request flows through each node on the link. Two core concepts are involved:

  • Node interaction mode: synchronous and asynchronous;
  • IO data manipulation: blocking and non-blocking;

These are the terms that come up again and again with IO streams: synchronous/asynchronous IO and blocking/non-blocking IO.

II. IO model analysis

1. Synchronous blocking

This is the simplest way for a user thread to interact with the kernel: each application request is handled by one thread, and the accept and read methods block that thread until the operation completes:

In a conventional client/server architecture this is the basic flow of an IO operation. Under high concurrency it causes serious performance problems and consumes too many resources, because every request ties up a thread.
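
As a rough illustration of where the blocking happens, here is a minimal sketch that runs a one-shot blocking server and a client in the same JVM. The class name `BlockingEchoDemo`, the `echo:` prefix, and the use of an OS-assigned loopback port are assumptions made for the demo, not part of the article's source code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not taken from the article's repository.
class BlockingEchoDemo {

    /** Runs a one-shot blocking server on a free local port and returns its reply. */
    static String roundTrip(String message) throws Exception {
        InetAddress local = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, local)) {
            Thread worker = new Thread(() -> {
                // accept() parks this thread until a client connects.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    // readLine() parks it again until a full line has arrived.
                    out.println("echo:" + in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            worker.start();

            try (Socket socket = new Socket(local, server.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(message);
                String reply = in.readLine(); // the client blocks here as well
                worker.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "echo:hello"
    }
}
```

The worker thread can do nothing else while it sits in accept() or readLine(), which is why a thread-per-request design becomes expensive as the number of concurrent clients grows.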

2. Synchronous non-blocking

This model is an optimization of synchronous blocking IO: the current thread does not block while waiting for the data to become ready, and only waits while the data is being copied:

In this model the thread returns immediately after the request and keeps polling until the data is available. The drawback is obvious: the repeated polling is wasted work. If the thread were instead notified once the data is ready and then completed the follow-up action, a lot of intermediate interaction could be eliminated.
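
The polling behaviour can be sketched without any network code. In the example below an in-process `java.nio.channels.Pipe` stands in for a client connection; the class name `NonBlockingPollDemo`, the payload, and the 10 ms sleep between polls are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a Pipe replaces the socket so the sketch runs in one JVM.
class NonBlockingPollDemo {

    /** Polls a non-blocking channel until the writer closes it, then reports what happened. */
    static String pollUntilDone(long writerDelayMillis) throws Exception {
        Pipe pipe = Pipe.open();
        Pipe.SourceChannel source = pipe.source();
        source.configureBlocking(false); // read() now returns 0 instead of waiting

        Thread writer = new Thread(() -> {
            try (Pipe.SinkChannel sink = pipe.sink()) {
                Thread.sleep(writerDelayMillis); // the data is "not ready" for a while
                sink.write(ByteBuffer.wrap("ready".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        writer.start();

        ByteBuffer buffer = ByteBuffer.allocate(64);
        int emptyPolls = 0;
        int n;
        // -1 means the writer closed its end; 0 means "nothing yet, ask again".
        while ((n = source.read(buffer)) != -1) {
            if (n == 0) {
                emptyPolls++;
                Thread.sleep(10);
            }
        }
        source.close();
        writer.join();

        buffer.flip();
        String payload = StandardCharsets.UTF_8.decode(buffer).toString();
        return payload + " after " + emptyPolls + " empty polls";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pollUntilDone(200));
    }
}
```

The number of empty polls depends on timing, which is exactly the cost the text describes: every poll that returns nothing is work the thread did for no result.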

3. Asynchronous notification mode

In asynchronous mode, blocking is abandoned entirely and the process interacts in stages, much like a conventional integration with a third-party service. When the local service calls the third party and the request is time-consuming, it is executed asynchronously: the third party's first callback confirms that the request can be executed, and the second callback pushes the processing result. This approach can greatly improve performance and save resources when dealing with complex problems:

Asynchronous mode is a major performance improvement, but the corresponding processing mechanism is more complex. Optimization never ends, and NIO refines the IO stream model once more.
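
One way to see the callback style in the JDK itself is `AsynchronousFileChannel` with a `CompletionHandler`. The sketch below is a simplification of the two-callback flow described above: it uses a single completion callback, and the class name `AsyncReadDemo`, the temporary file, and the 1024-byte buffer are assumptions for the demo.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class; reads at most the first 1024 bytes of a file.
class AsyncReadDemo {

    /** Issues the read and returns at once; the handler completes the future later. */
    static CompletableFuture<String> readAsync(Path path) throws IOException {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        CompletableFuture<String> result = new CompletableFuture<>();

        channel.read(buffer, 0, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                // Called on a pool thread once the data has been copied into the buffer.
                buffer.flip();
                result.complete(StandardCharsets.UTF_8.decode(buffer).toString());
                closeQuietly(channel);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                result.completeExceptionally(error);
                closeQuietly(channel);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // Nothing useful to do in a demo.
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("async-demo", ".txt");
        Files.write(file, "hello async".getBytes(StandardCharsets.UTF_8));
        CompletableFuture<String> pending = readAsync(file);
        // The caller is free to do other work here before asking for the result.
        System.out.println(pending.get()); // prints "hello async"
        Files.delete(file);
    }
}
```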

III. File class

1. Basic description

The File class is an abstract representation of file and directory pathnames. It is used to obtain metadata about files on disk, such as file name, size, modification time, and permissions.

Note: File does not operate on a file's content. The content of a file is its data; the information about the file itself is its metadata.

    import java.io.File;
    import java.util.Date;

    public class File01 {
        public static void main(String[] args) throws Exception {
            // 1. Read the specified file and create it if it is missing
            //    (IoParam.BASE_PATH is the project's base path constant)
            File speFile = new File(IoParam.BASE_PATH + "fileio-03.text");
            if (!speFile.exists()) {
                boolean creFlag = speFile.createNewFile();
                System.out.println("create: " + speFile.getName() + "; Result: " + creFlag);
            }
            // 2. Read the specified location and list it if it is a directory
            File dirFile = new File(IoParam.BASE_PATH);
            boolean dirFlag = dirFile.isDirectory();
            if (dirFlag) {
                File[] dirFiles = dirFile.listFiles();
                printFileArr(dirFiles);
            }
            // 3. Delete the file
            if (speFile.exists()) {
                boolean delFlag = speFile.delete();
                System.out.println("delete: " + speFile.getName() + "; Result: " + delFlag);
            }
        }

        private static void printFileArr(File[] fileArr) {
            if (fileArr != null && fileArr.length > 0) {
                for (File file : fileArr) {
                    printFileInfo(file);
                }
            }
        }

        private static void printFileInfo(File file) {
            System.out.println("name: " + file.getName());
            System.out.println("length: " + file.length());
            System.out.println("path: " + file.getPath());
            System.out.println("file: " + file.isFile());
            System.out.println("directory: " + file.isDirectory());
            System.out.println("last modified: " + new Date(file.lastModified()));
            System.out.println();
        }
    }

The case above uses the basic constructor and common methods of the File class (read, check, create, delete). The JDK source is constantly updated, so being able to judge what a class does from its constructors, methods, and comments is a necessary skill for a developer.

The File class lacks two key pieces of descriptive information: type and encoding. Anyone who regularly develops file-handling features knows that these two points are extremely complex and error-prone. The following looks at how to deal with them from the perspective of real-world development.

2. File business scenario

As shown in the figure, a typical file stream task involves conversion among three basic forms (file, stream, and data):

Basic process description:

  • The source file is generated and pushed to the file center;
  • The business node is notified to fetch the file;
  • The business node performs its logical processing;

One obvious problem is that no node can adapt to every file-processing policy, such as type and encoding. In complex scenarios, rule constraints are a common solution: only what falls within the agreed rules is handled.

In the process above, the data body that the source file node sends when it notifies the business node is described as follows:

    public class BizFile {
        /** Task ID */
        private String taskId;
        /** Whether the file is compressed */
        private Boolean zipFlag;
        /** File address */
        private String fileUrl;
        /** File type */
        private String fileType;
        /** File encoding */
        private String fileCode;
        /** Business database */
        private String bizDatabase;
        /** Business table */
        private String bizTableName;
        // All-args constructor and getters omitted
    }

The whole process is encapsulated as a task: task batch, file information, routing to the business database and table, and so on. This information can also be encoded directly in the file name. The processing looks similar to:

    import java.io.File;

    public class File02 {
        public static void main(String[] args) {
            BizFile bizFile = new BizFile("IN001", Boolean.FALSE, IoParam.BASE_PATH,
                    "csv", "utf8", "model", "score");
            bizFileInfo(bizFile);
            // Check the actual file against its description
            File file = new File(bizFile.getFileUrl());
            if (!file.getName().endsWith(bizFile.getFileType())) {
                System.out.println(file.getName() + ": Error...");
            }
        }

        private static void bizFileInfo(BizFile bizFile) {
            logInfo("Task ID", bizFile.getTaskId());
            logInfo("Unzip or not", bizFile.getZipFlag());
            logInfo("File address", bizFile.getFileUrl());
            logInfo("File type", bizFile.getFileType());
            logInfo("File encoding", bizFile.getFileCode());
            logInfo("Business database", bizFile.getBizDatabase());
            logInfo("Business table", bizFile.getBizTableName());
        }

        // Logging helper; its body is not shown in the source, so this one is assumed
        private static void logInfo(String key, Object value) {
            System.out.println(key + ": " + value);
        }
    }

The information in this description body can also be turned into a naming rule, for example number_compression_Excel_encoding_database_table. During business processing, files that do not conform to the convention can then be excluded directly, which reduces data problems caused by abnormal files.
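
A minimal sketch of such a convention check might look like the following. The field order `taskId_zipFlag_type_encoding_database_table`, the `0`/`1` compression flag, the list of allowed types, and the class name `FileNameRule` are all assumptions made for illustration; a real project would define its own rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical rule checker; the layout and allowed values are assumptions.
class FileNameRule {

    private static final int FIELD_COUNT = 6;
    private static final List<String> ALLOWED_TYPES = Arrays.asList("csv", "excel", "txt");

    /** Returns the six fields if the base name follows the convention, otherwise empty. */
    static Optional<String[]> parse(String baseName) {
        String[] parts = baseName.split("_", -1); // -1 keeps trailing empty fields
        if (parts.length != FIELD_COUNT) {
            return Optional.empty();
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return Optional.empty();
            }
        }
        boolean zipOk = parts[1].equals("0") || parts[1].equals("1");
        boolean typeOk = ALLOWED_TYPES.contains(parts[2].toLowerCase());
        return (zipOk && typeOk) ? Optional.of(parts) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("IN001_0_csv_utf8_model_score").isPresent()); // true
        System.out.println(parse("IN001_0_pdf_utf8_model_score").isPresent()); // false
        System.out.println(parse("report").isPresent());                       // false
    }
}
```

Rejecting a file at this stage is cheap, since nothing has been opened or decoded yet.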

IV. Basic stream modes

1. Overall overview

IO stream

Basic coding logic: source file -> input stream -> logic processing -> output stream -> target file;

Streams can be divided into several modes, depending on the perspective:

  • Stream direction: input stream, output stream;
  • Stream data type: byte stream, character stream;

IO streams come in many modes, and the corresponding API design is complex. With an API this large, the practical approach is to grasp the core interfaces, the common implementation classes, and the underlying principles.

Basic API

  • Byte stream: InputStream for input, OutputStream for output; the basic unit of data transfer is the byte;

    • read(): reads the next byte of data from the input stream;
    • read(byte b[]): reads data into a byte array;
    • write(int b): writes the specified byte to the output stream;
    • write(byte b[]): writes the bytes of the array to the output stream;
  • Character stream: Reader for reading, Writer for writing; the basic unit of data transfer is the character;

    • read(): reads a single character;
    • read(char cbuf[]): reads characters into a character array;
    • write(int c): writes a single character;
    • write(char cbuf[]): writes an array of characters;
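
The difference between the two units of transfer shows up as soon as the text contains a multi-byte character. The sketch below reads the same UTF-8 bytes once through an InputStream and once through a Reader; the class name `ByteVsCharDemo` and the sample text are assumptions for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class using in-memory data instead of a file.
class ByteVsCharDemo {

    /** Counts how many read() calls on a byte stream return data. */
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    /** Counts how many read() calls on a UTF-8 character stream return data. */
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "IO" followed by one CJK character, which takes three bytes in UTF-8.
        byte[] data = "IO\u6d41".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // prints 5
        System.out.println(countChars(data)); // prints 3
    }
}
```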

Buffered mode

An IO stream can work in a plain read/write mode, in which data is read and then written directly, or in a buffered mode, in which data is first loaded into a buffer array and each read checks whether the buffer needs to be refilled:

The advantage of buffered mode is clear: reads and writes are served from the buffer and stay efficient, and they are decoupled from the step that fills the buffer. The BufferedInputStream and BufferedReader classes are concrete implementations of this buffering logic.
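
To make the effect visible, the following sketch counts how many read calls actually reach the underlying stream with and without a BufferedInputStream in front of it. `BufferedReadCount` and its nested `CountingInputStream` are hypothetical helpers written for this demo, and the in-memory source stands in for a real file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; the counter wraps the "slow" underlying source.
class BufferedReadCount {

    /** Passes reads through and counts how many calls reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads `size` bytes one at a time and returns the number of underlying calls. */
    static int underlyingCalls(int size, boolean buffered) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(counter) : counter) {
            while (in.read() != -1) {
                // Consume byte by byte, as the simplest copy loop would.
            }
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain:    " + underlyingCalls(10_000, false));
        System.out.println("buffered: " + underlyingCalls(10_000, true));
    }
}
```

Without buffering every single-byte read goes to the source; with buffering the source is only touched when the internal array runs empty, so the second number is far smaller than the first.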

2. Byte streams

API diagram:

Basic byte stream API:

    // Requires: import java.io.*;
    public static void main(String[] args) throws Exception {
        // Source file and target file
        File source = new File(IoParam.BASE_PATH + "fileio-01.png");
        File target = new File(IoParam.BASE_PATH + "copy-" + source.getName());
        // try-with-resources closes both streams when the copy ends
        try (InputStream inStream = new FileInputStream(source);
             OutputStream outStream = new FileOutputStream(target)) {
            // Read into a byte array and write it out
            byte[] byteArr = new byte[1024];
            int readSign;
            while ((readSign = inStream.read(byteArr)) != -1) {
                // Write only the bytes actually read in this pass
                outStream.write(byteArr, 0, readSign);
            }
        }
    }

Buffered byte stream API:

    // Requires: import java.io.*;
    public static void main(String[] args) throws Exception {
        // Source file and target file
        File source = new File(IoParam.BASE_PATH + "fileio-02.png");
        File target = new File(IoParam.BASE_PATH + "backup-" + source.getName());
        // Buffered streams: closing them also flushes and closes the file streams
        try (InputStream bufInStream = new BufferedInputStream(new FileInputStream(source));
             OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(target))) {
            // Read in and write out
            int readSign;
            while ((readSign = bufInStream.read()) != -1) {
                bufOutStream.write(readSign);
            }
        }
    }

Byte stream application scenario: the data is the file itself, such as images, video, and audio.

3. Character stream

API diagram:

Basic character stream API:

    import java.io.*;

    public class IoChar01 {
        public static void main(String[] args) throws Exception {
            // Files to read from and write to
            File readerFile = new File(IoParam.BASE_PATH + "io-text.txt");
            File writerFile = new File(IoParam.BASE_PATH + "copy-" + readerFile.getName());
            // Character input and output streams, closed automatically
            try (Reader reader = new FileReader(readerFile);
                 Writer writer = new FileWriter(writerFile)) {
                // Read and write one character at a time
                int readSign;
                while ((readSign = reader.read()) != -1) {
                    writer.write(readSign);
                }
                writer.flush();
            }
        }
    }

Buffered character stream API:

    // Requires: import java.io.*;
    public static void main(String[] args) throws Exception {
        // Files to read from and write to
        File readerFile = new File(IoParam.BASE_PATH + "io-text.txt");
        File writerFile = new File(IoParam.BASE_PATH + "line-" + readerFile.getName());
        // Buffered character streams, closed automatically
        try (BufferedReader bufReader = new BufferedReader(new FileReader(readerFile));
             BufferedWriter bufWriter = new BufferedWriter(new FileWriter(writerFile))) {
            // Read and write line by line
            String line;
            while ((line = bufReader.readLine()) != null) {
                bufWriter.write(line);
                bufWriter.newLine();
            }
            bufWriter.flush();
        }
    }

Character stream application scenario: the file is a carrier of data, such as Excel, CSV, and TXT files.

4. Encoding and decoding

  • Encoding: converting characters to bytes;
  • Decoding: converting bytes to characters;

    import java.nio.charset.StandardCharsets;

    public class EndeCode {
        public static void main(String[] args) throws Exception {
            // A non-ASCII character is needed for a charset mismatch to show up
            String var = "IO\u6d41";
            // Encode
            byte[] enVar = var.getBytes(StandardCharsets.UTF_8);
            for (byte encode : enVar) {
                System.out.println(encode);
            }
            // Decode with the same charset
            String deVar = new String(enVar, StandardCharsets.UTF_8);
            System.out.println(deVar);
            // Garbled: decode with a different charset
            String messyVar = new String(enVar, StandardCharsets.ISO_8859_1);
            System.out.println(messyVar);
        }
    }

The root cause of garbled text is that different encodings are used in the encoding and decoding stages.
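
The same rule applies to character streams that sit on top of files: the charset used to read has to match the one used to write. The sketch below names the charset explicitly on both sides through OutputStreamWriter and InputStreamReader; the class name `CharsetReadDemo`, the temporary file, and the sample text are assumptions for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; writes a temp file and reads it back with two charsets.
class CharsetReadDemo {

    /** Writes text to the file using the given charset. */
    static void writeAll(Path file, String text, Charset charset) throws IOException {
        try (Writer writer = new OutputStreamWriter(Files.newOutputStream(file), charset)) {
            writer.write(text);
        }
    }

    /** Reads the whole file as text, decoding with the given charset. */
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("charset-demo", ".txt");
        String original = "IO\u6d41"; // two ASCII letters and one CJK character
        writeAll(file, original, StandardCharsets.UTF_8);

        // Same charset on both sides: the text survives.
        System.out.println(original.equals(readAll(file, StandardCharsets.UTF_8)));      // true
        // Different charset: each of the five bytes becomes its own character.
        System.out.println(readAll(file, StandardCharsets.ISO_8859_1).length());         // 5
        Files.delete(file);
    }
}
```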

5. Serialization

  • Serialization: the process of converting an object into a stream;
  • Deserialization: the process of converting a stream back into an object;

    import java.io.*;

    // SerEntity.java
    public class SerEntity implements Serializable {
        private Integer id;
        private String name;

        public SerEntity(Integer id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Seriali01.java
    public class Seriali01 {
        public static void main(String[] args) throws Exception {
            // Serialize the object
            try (ObjectOutputStream objOutStream =
                         new ObjectOutputStream(new FileOutputStream("SerEntity.txt"))) {
                objOutStream.writeObject(new SerEntity(1, "Cicada"));
            }
            // Deserialize the object
            try (ObjectInputStream objInStream =
                         new ObjectInputStream(new FileInputStream("SerEntity.txt"))) {
                SerEntity serEntity = (SerEntity) objInStream.readObject();
                System.out.println(serEntity);
            }
        }
    }

Note: Member objects of reference type must also be serializable, otherwise a NotSerializableException will be thrown.
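
The note can be checked with a small in-memory experiment. In the sketch below, `Address`, `BadUser`, and `GoodUser` are hypothetical classes invented for the demo; `GoodUser` marks its non-serializable member as transient, which is one common way to avoid the exception when the field does not need to be persisted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo classes; none of them come from the article's repository.
class SerializationNoteDemo {

    static class Address {                       // deliberately NOT Serializable
        String city = "Hangzhou";
    }

    static class BadUser implements Serializable {
        private static final long serialVersionUID = 1L;
        Address address = new Address();         // reference-type member that cannot be written
    }

    static class GoodUser implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Cicada";
        transient Address address = new Address(); // skipped during serialization
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            serialize(new BadUser());
        } catch (NotSerializableException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        GoodUser copy = (GoodUser) deserialize(serialize(new GoodUser()));
        System.out.println(copy.name);            // prints "Cicada"
        System.out.println(copy.address == null); // prints "true": transient fields are not restored
    }
}
```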

V. NIO mode

1. Basic concepts

NIO (New IO, commonly read as Non-blocking IO) is a block-oriented data processing mechanism built on the synchronous non-blocking model. A single server thread can handle multiple client requests, which raises IO processing speed considerably. It has three core components:

  • Buffer: maintains an underlying array that stores the data;
  • Channel: supports bidirectional read and write operations;
  • Selector: lets multiple Channels register with it and polls them for readiness;
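
Of the three components, Buffer is the one whose state is easiest to get wrong, because the same object switches between being written to and being read from. The sketch below traces position, limit, and capacity through one write/read cycle; the class name `BufferStateDemo` and the 8-byte capacity are assumptions for the demo.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class that records a buffer's state after each step.
class BufferStateDemo {

    static String state(String step, ByteBuffer b) {
        return step + ": position=" + b.position()
                + " limit=" + b.limit() + " capacity=" + b.capacity();
    }

    static List<String> trace() {
        List<String> log = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(8);
        log.add(state("new", buffer));

        buffer.put((byte) 1).put((byte) 2).put((byte) 3); // write mode: position advances
        log.add(state("put x3", buffer));

        buffer.flip();                                    // switch to read mode
        log.add(state("flip", buffer));

        buffer.get();                                     // read one byte
        log.add(state("get x1", buffer));

        buffer.clear();                                   // ready to be filled again
        log.add(state("clear", buffer));
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        // new:    position=0 limit=8 capacity=8
        // put x3: position=3 limit=8 capacity=8
        // flip:   position=0 limit=3 capacity=8
        // get x1: position=1 limit=3 capacity=8
        // clear:  position=0 limit=8 capacity=8
    }
}
```

Note that clear() only resets the indices; the bytes stay in the array until they are overwritten.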

API use case

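    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class IoNew01 {
        public static void main(String[] args) throws Exception {
            // Source file and target file
            File source = new File(IoParam.BASE_PATH + "fileio-02.png");
            File target = new File(IoParam.BASE_PATH + "channel-" + source.getName());
            // Input byte stream channel
            FileInputStream inStream = new FileInputStream(source);
            FileChannel inChannel = inStream.getChannel();
            // Output byte stream channel
            FileOutputStream outStream = new FileOutputStream(target);
            FileChannel outChannel = outStream.getChannel();
            // Direct channel copy
            // outChannel.transferFrom(inChannel, 0, inChannel.size());
            // Buffer read/write mechanism
            ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
            while (true) {
                // Read data from the channel into the buffer
                int in = inChannel.read(buffer);
                if (in == -1) {
                    break;
                }
                // Switch the buffer from writing to reading
                buffer.flip();
                // write() may take several calls to drain the buffer
                while (buffer.hasRemaining()) {
                    outChannel.write(buffer);
                }
                // Clear the buffer
                buffer.clear();
            }
            outChannel.close();
            inChannel.close();
        }
    }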

The case above shows only the most basic NIO capability, file copying. In network communication, NIO has far more room to show its strengths.
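
For plain file copies, a channel can also hand the transfer to `FileChannel.transferFrom` instead of moving the data through a ByteBuffer by hand. The following is a minimal sketch of that variant; the class name `ChannelTransferDemo` and the temporary files are assumptions, and the loop is there because a single call is allowed to move fewer bytes than requested.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class that copies one file to another through channels.
class ChannelTransferDemo {

    /** Copies source to target and returns the number of bytes transferred. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long done = 0;
            while (done < size) {
                // Reads from in's current position, writes into out at offset `done`.
                long moved = out.transferFrom(in, done, size - done);
                if (moved <= 0) {
                    break; // nothing more could be transferred
                }
                done += moved;
            }
            return done;
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("transfer-src", ".bin");
        Path target = Files.createTempFile("transfer-dst", ".bin");
        byte[] data = new byte[5000];
        Arrays.fill(data, (byte) 7);
        Files.write(source, data);

        System.out.println(copy(source, target));                          // prints 5000
        System.out.println(Arrays.equals(data, Files.readAllBytes(target))); // prints true
        Files.delete(source);
        Files.delete(target);
    }
}
```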

2. Network communication

A single server-side thread can handle multiple client requests by polling the multiplexer to see whether any IO requests are pending. This greatly improves the server's concurrency and significantly reduces resource consumption.

API example: server-side simulation

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.Set;

    public class SecServer {
        public static void main(String[] args) {
            try {
                // Start the service and listen
                ServerSocketChannel socketChannel = ServerSocketChannel.open();
                socketChannel.socket().bind(new InetSocketAddress("127.0.0.1", 8089));
                // Set non-blocking, accept clients
                socketChannel.configureBlocking(false);
                // Open the multiplexer
                Selector selector = Selector.open();
                // Register the server socket with the multiplexer for accept events
                socketChannel.register(selector, SelectionKey.OP_ACCEPT);
                // Multiplexer polling
                ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
                while (selector.select() > 0) {
                    Set<SelectionKey> selectionKeys = selector.selectedKeys();
                    Iterator<SelectionKey> selectionKeyIter = selectionKeys.iterator();
                    while (selectionKeyIter.hasNext()) {
                        SelectionKey selectionKey = selectionKeyIter.next();
                        selectionKeyIter.remove();
                        if (selectionKey.isAcceptable()) {
                            // Accept a new connection
                            SocketChannel client = socketChannel.accept();
                            // Set read non-blocking
                            client.configureBlocking(false);
                            // Register the client with the multiplexer for read events
                            client.register(selector, SelectionKey.OP_READ);
                        } else if (selectionKey.isReadable()) {
                            // Channel readable
                            SocketChannel client = (SocketChannel) selectionKey.channel();
                            int len = client.read(buffer);
                            if (len > 0) {
                                buffer.flip();
                                byte[] readArr = new byte[buffer.limit()];
                                buffer.get(readArr);
                                System.out.println(client.socket().getPort() + " port data: " + new String(readArr));
                                buffer.clear();
                            } else if (len < 0) {
                                // The client closed the connection; release the channel
                                client.close();
                            }
                        }
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

API example: client simulation

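    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    public class SecClient {
        public static void main(String[] args) {
            try {
                // Connect to the server
                SocketChannel socketChannel = SocketChannel.open();
                socketChannel.connect(new InetSocketAddress("127.0.0.1", 8089));
                ByteBuffer writeBuffer = ByteBuffer.allocate(1024);
                String conVar = "[hello-8089]";
                writeBuffer.put(conVar.getBytes());
                writeBuffer.flip();
                // Send the message every 5 seconds
                while (true) {
                    Thread.sleep(5000);
                    // rewind() resets the position but keeps the limit at the message length;
                    // calling clear() here would make later writes send the full 1024 bytes
                    writeBuffer.rewind();
                    socketChannel.write(writeBuffer);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }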

A SelectionKey represents the registration of a Channel with a Selector, and selectedKeys() returns the keys whose Channels are in the ready state.

VI. Source code address

GitHub address: https://github.com/cicadasmile/java-base-parent
Gitee address: https://gitee.com/cicadasmile/java-base-parent