In the last article, we looked at Java’s file stream framework; this article focuses on character streams.
The first thing to be clear about is that byte streams process files on a byte basis, whereas character streams process files on a character-by-character basis.
In fact, a character stream operation is essentially a “byte stream operation” plus “encoding”, wrapped together. Bytes remain the basic unit in both directions. To write a character to a file, the character is first encoded into bytes, and those bytes are then written to the file. To read a character into memory, bytes are first read from the file and then decoded into a character.
It’s important to understand this, and it will determine how you understand character streams in general, so let’s take a look at the API design.
The base class Reader/Writer
Before we can formally study the character stream base class, we need to know how a character is represented in Java.
First, Java represents characters in memory as UTF-16 code units. The charset used to convert characters to and from bytes is a separate matter: the default is platform dependent on older JDKs and UTF-8 since JDK 18. UTF-8 stores a character in 1 to 4 bytes, with the more common characters using fewer bytes.
The char type is defined as two bytes, meaning that for ordinary characters a single char stores one character, while characters from the supplementary character sets need two chars (a surrogate pair) to represent one character.
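To make the difference between chars, characters, and encoded bytes concrete, here is a minimal sketch. The class name CharWidthDemo and its helper methods are made up for illustration; only the String and StandardCharsets calls are JDK APIs.

```java
import java.nio.charset.StandardCharsets;

class CharWidthDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters) in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String cjk = "\u4e2d";            // a common CJK character
        String emoji = "\uD83D\uDE00";    // U+1F600, a supplementary character stored as a surrogate pair
        for (String s : new String[] {ascii, cjk, emoji}) {
            System.out.println(charCount(s) + " char(s), "
                    + codePointCount(s) + " code point(s), "
                    + utf8Length(s) + " UTF-8 byte(s)");
        }
    }
}
```

The ASCII letter takes one char and one UTF-8 byte, the CJK character one char and three bytes, and the supplementary character two chars and four bytes, even though each is a single character.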
Reader is the base class for reading character streams. It provides the most basic character reading operations, as we’ll see.
First look at its constructor:
    protected Object lock;

    protected Reader() {
        this.lock = this;
    }
    protected Reader(Object lock) {
        if (lock == null) {
            throw new NullPointerException();
        }
        this.lock = lock;
    }
Reader is an abstract class, so these constructors exist only to be called by subclasses. They initialize the lock object, which the stream’s operations synchronize on.
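    // Reads a single character; returns -1 at the end of the stream.
    public int read() throws IOException {
        char cb[] = new char[1];
        if (read(cb, 0, 1) == -1)
            return -1;
        else
            return cb[0];
    }
    // Fills the whole array by delegating to the three-argument version.
    public int read(char cbuf[]) throws IOException {
        return read(cbuf, 0, cbuf.length);
    }
    // The only read method that subclasses must implement.
    abstract public int read(char cbuf[], int off, int len) throws IOException;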
The first method reads a single character and returns -1 once the end of the stream is reached. As with InputStream, the return type is int rather than char, and for the same reason: every char value is a valid character, so only a wider type can represent -1 unambiguously.
The second and third methods both read up to a given number of characters into the target array and return how many were actually read. The third is abstract and must be implemented by subclasses; the second simply delegates to it.
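The two reading styles can be sketched as follows. This is only an illustration: the class ReadLoopDemo and its methods are hypothetical names, and StringReader stands in for any Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLoopDemo {
    // Drains a Reader one character at a time using read().
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;                               // int, so that -1 can signal end of stream
        while ((c = reader.read()) != -1) {
            sb.append((char) c);             // safe: c is in the range 0..65535 here
        }
        return sb.toString();
    }

    // Drains a Reader in chunks using read(char[], int, int).
    static String readAllChunked(Reader reader, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[chunkSize];
        int n;                               // number of chars actually read, or -1
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);            // only the first n chars are valid
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("hello")));
        System.out.println(readAllChunked(new StringReader("hello, reader"), 4));
    }
}
```

Because the result is held in an int, even the character '\uFFFF' (65535) is returned as an ordinary value and cannot be confused with the -1 end marker.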
There are also some similar methods:
- public long skip(long n): skips n characters
- public boolean ready(): reports whether the next read is guaranteed not to block
- public boolean markSupported(): reports whether mark and reset are supported
- public void mark(int readAheadLimit): marks the current position so that reset can return to it
- public void reset(): returns to the marked position so that characters can be read again
- abstract public void close(): closes the stream
All of these methods mirror their InputStream counterparts. Reader contains no core implementation of its own, so there is little more to see inside it.
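A small sketch of skip, mark, and reset in action, assuming a StringReader as the concrete Reader (it supports marking). MarkResetDemo and rereadAfterMark are names invented for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class MarkResetDemo {
    // Skips two characters, reads two, then rewinds to the mark and reads one again.
    static String rereadAfterMark(String text) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader r = new StringReader(text)) {
            r.skip(2);                        // jump over the first two characters
            if (!r.markSupported()) {
                throw new IOException("mark/reset not supported by this reader");
            }
            r.mark(16);                       // remember the current position
            out.append((char) r.read());
            out.append((char) r.read());
            r.reset();                        // go back to the marked position
            out.append((char) r.read());      // the same character as the first read
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rereadAfterMark("abcdef"));
    }
}
```

For the input "abcdef" the method collects "c", "d", and then "c" again, since reset moves the position back to where mark was called.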
Writer is the base class for writing character streams. It writes one or more characters to a destination, and the actual write method is likewise abstract and left to subclasses.
The adapter InputStreamReader/OutputStreamWriter
The adapter character streams inherit from the base Reader and Writer classes and are very important members of the character stream architecture. Their main job is to bridge byte streams and character streams: InputStreamReader decodes bytes into characters, and OutputStreamWriter encodes characters into bytes. Let’s start with the read adapter.
First, its core members:
    private final StreamDecoder sd;
StreamDecoder is a decoder that turns byte reads into character reads. It will come up repeatedly below, but we won’t explain it in full here.
Then there is the constructor:
    public InputStreamReader(InputStream in) {
        super(in);
        try {
            sd = StreamDecoder.forInputStreamReader(in, this, (String)null);
        } catch (UnsupportedEncodingException e) {
            throw new Error(e);
        }
    }
    public InputStreamReader(InputStream in, String charsetName)
        throws UnsupportedEncodingException
    {
        super(in);
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        sd = StreamDecoder.forInputStreamReader(in, this, charsetName);
    }
The purpose of both constructors is to initialize the decoder, and both call the static method forInputStreamReader with different arguments.
That method is a typical static factory. In its decompiled form it takes three parameters. var0 and var1 need little explanation: they are the byte stream instance and the adapter instance, respectively.
The var2 argument is the name of a character encoding. If it is null, the default charset is used (platform dependent on older JDKs, UTF-8 since JDK 18).
The method returns an instance of the decoder.
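The effect of the encoding argument can be seen in a short sketch. It uses the InputStreamReader(InputStream, Charset) constructor, a JDK overload not shown above; DecodeDemo and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes raw bytes into a String by reading them through an InputStreamReader.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8); // two characters, six bytes
        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
        // Decoding with a different charset yields one wrong character per byte.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length());
    }
}
```

Passing the charset explicitly, rather than relying on the default, keeps the result independent of the platform the code runs on.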
Almost all of the methods that follow depend on this decoder.
    public String getEncoding() {
        return sd.getEncoding();
    }
    public int read() throws IOException {
        return sd.read();
    }
    public int read(char cbuf[], int offset, int length) throws IOException {
        return sd.read(cbuf, offset, length);
    }
    public void close() throws IOException {
        sd.close();
    }
The decoder’s implementation is fairly complex and we won’t study it in depth here. In general, though, the idea is a “read bytes from the byte stream, then decode” process.
Correspondingly, OutputStreamWriter holds a StreamEncoder instance that encodes characters into bytes.
Beyond that, the remaining operations are no different: you write through a character array, a string, or the lower 16 bits of an int.
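The write side can be sketched in the same way. The example below assumes an in-memory ByteArrayOutputStream as the byte sink so the encoded bytes can be inspected; EncodeDemo and its methods are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Encodes a string by writing it through an OutputStreamWriter.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, charset)) {
            w.write(text);
        }                                     // closing the writer flushes the encoder
        return sink.toByteArray();
    }

    // Writes a single int; only its lower 16 bits are treated as the character.
    static byte[] encodeInt(int c) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(c);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encode("\u4e2d", StandardCharsets.UTF_8).length);    // bytes in UTF-8
        System.out.println(encode("\u4e2d", StandardCharsets.UTF_16BE).length); // bytes in UTF-16BE
        System.out.println(encodeInt(0x10041).length);                          // high bits are dropped
    }
}
```

The same character produces a different number of bytes depending on the charset, and write(0x10041) writes the single character 'A' (0x0041) because the upper 16 bits are ignored.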
File character stream FileReader/Writer
The file character streams are so simple that they have no methods of their own apart from constructors; they rely entirely on the file byte streams.
Let’s take FileReader as an example.
FileReader inherits from InputStreamReader and, in JDK 8, has only the following three constructors (JDK 11 added overloads that accept a Charset):
    public FileReader(String fileName) throws FileNotFoundException {
        super(new FileInputStream(fileName));
    }
    public FileReader(File file) throws FileNotFoundException {
        super(new FileInputStream(file));
    }
    public FileReader(FileDescriptor fd) {
        super(new FileInputStream(fd));
    }
In theory, every character stream that ultimately reads or writes bytes should build on the adapter, because only the adapter provides the conversion between characters and bytes, and without that conversion nothing can be written or read.
FileReader does not extend any of its own methods. The preimplemented character manipulation methods in the parent InputStreamReader class are sufficient for FileReader. It just needs to pass in a corresponding byte stream instance.
The same is true for FileWriter, which I won’t repeat here.
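As a rough illustration of how little there is to FileReader and FileWriter, the sketch below writes text to a temporary file and reads it back. FileRoundTripDemo and roundTrip are made-up names, and the example assumes both streams use the same default charset, which holds within a single JVM.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileRoundTripDemo {
    // Writes text to a temporary file with FileWriter, then reads it back with FileReader.
    static String roundTrip(String text) throws IOException {
        Path tmp = Files.createTempFile("char-stream-demo", ".txt");
        try {
            try (FileWriter writer = new FileWriter(tmp.toFile())) {
                writer.write(text);           // write(String) is inherited from Writer
            }
            StringBuilder sb = new StringBuilder();
            try (FileReader reader = new FileReader(tmp.toFile())) {
                char[] buf = new char[64];
                int n;
                while ((n = reader.read(buf)) != -1) {   // read(char[]) is inherited from Reader
                    sb.append(buf, 0, n);
                }
            }
            return sb.toString();
        } finally {
            Files.deleteIfExists(tmp);        // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("plain text through a file"));
    }
}
```

Every method called on the two streams here is inherited; the classes themselves contribute only their constructors.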
Character array stream CharArrayReader/Writer
Character array streams are similar to byte array streams: both are meant for situations where the size of the content is not known in advance and a large amount of it may need to be read.
Because the writers grow their internal array on demand, they can hold the entire content without allocating an oversized array up front and wasting memory.
Take CharArrayReader as an example:
    protected char buf[];
    protected int pos;
    protected int count;

    public CharArrayReader(char buf[]) {
        this.buf = buf;
        this.pos = 0;
        this.count = buf.length;
    }
    public CharArrayReader(char buf[], int offset, int length) {
        // ...
    }
The core task of the constructor is to store a character array in the internal buf field; every subsequent read on the stream instance is served from it.
The other methods of CharArrayReader and CharArrayWriter are not described here; they are similar to the byte array streams in the previous article.
StringReader and StringWriter also belong here. They are essentially the same as the character array streams, because a string is conceptually a sequence of chars.
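The in-memory streams can be tried out without touching a file. In this sketch, CharArrayDemo, collect, and slice are hypothetical names; the tiny initial capacity is chosen on purpose so that the writer has to grow.

```java
import java.io.CharArrayReader;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

class CharArrayDemo {
    // Collects everything from a Reader into a char array of exactly the right size.
    static char[] collect(Reader source) throws IOException {
        CharArrayWriter sink = new CharArrayWriter(4);   // deliberately small; grows on demand
        char[] buf = new char[3];
        int n;
        while ((n = source.read(buf)) != -1) {
            sink.write(buf, 0, n);
        }
        return sink.toCharArray();
    }

    // Reads a window of an existing char array through a CharArrayReader.
    static String slice(char[] chars, int offset, int length) throws IOException {
        StringWriter out = new StringWriter();
        try (Reader in = new CharArrayReader(chars, offset, length)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        char[] all = collect(new StringReader("grows as needed"));
        System.out.println(all.length);
        System.out.println(slice(all, 6, 2));
    }
}
```

The caller never has to guess a size: the writer starts with room for four chars and still ends up holding all fifteen.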
Buffered stream BufferedReader/Writer
Similarly, BufferedReader and BufferedWriter are buffered streams: decorator streams that add buffering to another stream. They work on the same principle as the buffered byte streams, so we’ll only look at them briefly.
    private Reader in;
    private char cb[];
    private static int defaultCharBufferSize = 8192;

    public BufferedReader(Reader in, int sz) {
        // ...
    }
    public BufferedReader(Reader in) {
        this(in, defaultCharBufferSize);
    }
cb is a character array that caches the characters read from the underlying stream. You can set its length through the constructor; otherwise the default of 8192 is used.
    public int read() throws IOException { ... }
    public int read(char cbuf[], int off, int len) throws IOException { ... }
The read methods depend on the read method of the member in. Since in is a Reader, it usually depends in turn on the read method of some InputStream instance.
So almost all character streams are dependent on some byte stream instance.
The BufferedWriter, again, is pretty much the same thing, except one reads and one writes, all around an internal character array.
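The layering described above (byte stream, adapter, buffer) can be sketched as a chain of three objects. BufferedDemo and readLines are illustrative names, and readLine is a BufferedReader method that the text above does not cover.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BufferedDemo {
    // Reads all lines from raw UTF-8 bytes through the chain:
    // ByteArrayInputStream (bytes) -> InputStreamReader (decoding) -> BufferedReader (buffering).
    static List<String> readLines(byte[] data) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8),
                8192)) {                      // explicit buffer size; 8192 is also the default
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);              // line terminators are stripped by readLine
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(data));
    }
}
```

Closing the outermost BufferedReader closes the whole chain, so a single try-with-resources is enough.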
Standard print output stream
There are two main print streams: PrintStream, which is a byte stream, and PrintWriter, which is a character stream.
Each one gathers up the capabilities of the streams in its category and wraps them in a rich set of methods, though the implementation is somewhat more complicated. Let’s start with PrintStream.
There are several main constructors:
- public PrintStream(OutputStream out)
- public PrintStream(OutputStream out, boolean autoFlush)
- public PrintStream(OutputStream out, boolean autoFlush, String encoding)
- public PrintStream(String fileName)
As usual in the JDK, the simpler constructors delegate to the more complete ones. One thing that distinguishes PrintStream from other byte streams is its autoFlush flag, which specifies whether the buffer is flushed automatically.
Here are PrintStream’s write methods:
- public void write(int b)
- public void write(byte buf[], int off, int len)
In addition, PrintStream encapsulates a number of print methods to write different types of content to a file, such as:
- public void print(boolean b)
- public void print(char c)
- public void print(int i)
- public void print(long l)
- public void print(float f)
- and so on
Of course, these methods don’t actually write binary values to the file, but just their corresponding strings, for example:
print(123);
Instead of writing the binary representation of 123 to the file, this writes the string “123”; that is what makes it a print stream.
PrintStream routes all print operations through a buffered character stream, and if automatic flushing is enabled, the buffer is flushed whenever the newline character “\n” is written.
So PrintStream combines the output methods of both byte streams and character streams: the write methods perform byte stream operations and the print methods perform character stream operations. Keep that distinction clear.
PrintWriter, by contrast, is purely a character stream. It operates entirely on characters, whether through its write methods or its print methods.
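The difference between print and write on a PrintStream is easy to check against an in-memory sink. PrintDemo and its two methods are hypothetical names, and the example assumes an ASCII-compatible default charset, which is the case on common platforms.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;

class PrintDemo {
    // print(int) writes the decimal text of the value, one byte per digit.
    static byte[] printInt(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true);    // true enables autoFlush
        ps.print(value);
        ps.flush();
        return sink.toByteArray();
    }

    // write(int) writes the value itself as a single raw byte.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true);
        ps.write(value);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(printInt(123)));  // the digits '1', '2', '3'
        System.out.println(Arrays.toString(writeInt(123)));  // the single byte 123
    }
}
```

print(123) produces the three bytes 49, 50, 51 (the characters '1', '2', '3'), while write(123) produces the single byte 123.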
To sum up, we’ve spent three articles on byte streams and character streams in Java. Byte streams transfer data between disk and memory one byte at a time. The most typical are the file byte streams, which are implemented with native methods. On top of that basic byte transfer capability, buffering can be added to improve efficiency.
The most basic implementations of character streams are InputStreamReader and OutputStreamWriter, both of which can theoretically perform basic character stream operations, but only the most basic operations. All that is required to construct an instance of them is “a byte stream instance” + “an encoding format”.
So, the relationship between character stream and byte stream is the same as the equation above. All you need to do to write a character to a disk file is to encode the character in the specified encoding format, and then use the byte stream to write the encoded character binary to the file. The read operation is reversed.
All the code, images and files in this article are stored in the cloud on my GitHub:
(https://github.com/SingleYam/overview_java)