This is the second day of my participation in the First Challenge 2022.
You have to work really hard to look effortless!
Search WeChat for the official account [Long Coding road], and let's go From Zero To Hero together!
Preface
bytes.Buffer is a byte buffer: data can be written into it first and processed later. However, bytes.Buffer provides no interface for working with an underlying file (ReadFrom reads the whole file into the buffer, which is unsuitable for large files), so to work with a file you have to read it manually and write the data into the buffer yourself.
As we all know, file I/O is time-consuming. If every data operation has to read the file, the number of I/O operations becomes very large. How can efficiency be improved? One option is preloading: when reading, part of the data is loaded into a buffer in advance. If the buffer is larger than the amount of data handled per operation, the number of I/O operations drops. Likewise, when writing to a file, the data can be buffered first and then written to the file in one go.
bufio is a buffer-based package that makes file I/O convenient and uses buffers to reduce the number of I/O operations.
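The effect of preloading can be observed with a small experiment. The sketch below is only an illustration: countingReader, directCalls and bufferedCalls are hypothetical helpers written for this example, and an in-memory strings.Reader stands in for a real file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countingReader is a hypothetical helper that counts how many times
// Read is called on the wrapped reader.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// directCalls reads n bytes one at a time straight from the source and
// returns the number of underlying Read calls.
func directCalls(n int) int {
	src := &countingReader{r: strings.NewReader(strings.Repeat("x", n))}
	one := make([]byte, 1)
	for i := 0; i < n; i++ {
		if _, err := src.Read(one); err != nil {
			break
		}
	}
	return src.calls
}

// bufferedCalls reads the same n bytes one at a time through a
// bufio.Reader and returns the number of underlying Read calls.
func bufferedCalls(n int) int {
	src := &countingReader{r: strings.NewReader(strings.Repeat("x", n))}
	br := bufio.NewReader(src)
	for i := 0; i < n; i++ {
		if _, err := br.ReadByte(); err != nil {
			break
		}
	}
	return src.calls
}

func main() {
	fmt.Println("direct reads:  ", directCalls(1000))
	fmt.Println("buffered reads:", bufferedCalls(1000))
}
```

With a 1000-byte source and the default 4096-byte buffer, the first ReadByte fills the buffer in one underlying Read, and the remaining 999 bytes are served from memory.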
Structure overview
bufio.Reader places a buffer between the underlying file reader and the read methods. The underlying file reader is the io.Reader passed in when the Reader is initialized. Every read is served from the buffer first, which speeds up reading and avoids frequent file I/O; when necessary, the underlying file reader is used to preload more data into the buffer.
As a result, most calls to a read method finish faster. A read method sometimes has to fill the buffer itself, but its average execution time is still significantly lower.
The structure of bufio.Reader is as follows:
The io.Reader rd is used to write data into buf. Whenever bytes are written, w advances by the number of bytes written. When data is read from buf, r advances, and the data before r is no longer needed. w >= r always holds. When w == r, all written data has been read and nothing is left to read.
- buf: the byte slice used as the buffer. Although it is a slice, its length does not change once initialization is complete
- rd: the io.Reader passed in at initialization; it reads the underlying file data and writes it into the buffer buf
- r: the position in buf where the next read starts, i.e. all data before r has already been read. This is called the read count
- w: the position in buf where the next write starts, i.e. all data before w has already been written. This is called the written count
- err: records the error produced when rd reads data. It is set back to nil once it has been returned or ignored
- lastByte: holds the last byte read, used to roll back one byte (UnreadByte); -1 means the value is invalid and no rollback is possible
- lastRuneSize: holds the size of the last rune read, used to roll back one rune (UnreadRune); -1 means the value is invalid and no rollback is possible
type Reader struct {
	buf          []byte
	rd           io.Reader // reader provided by the client
	r, w         int       // buf read and write positions
	err          error
	lastByte     int // last byte read for UnreadByte; -1 means invalid
	lastRuneSize int // size of last rune read for UnreadRune; -1 means invalid
}
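The relationship between r and w can be sketched with a toy model. miniBuf below is a hypothetical, simplified type written for this illustration; it is not the standard library implementation and leaves out rd, err and the rollback fields.

```go
package main

import "fmt"

// miniBuf is a hypothetical stand-in for the buf, r and w fields of
// bufio.Reader. It is not the standard library type.
type miniBuf struct {
	buf  []byte
	r, w int
}

// write copies p into the buffer at position w and advances w.
// In bufio.Reader this role is played by rd filling the buffer.
func (m *miniBuf) write(p []byte) int {
	n := copy(m.buf[m.w:], p)
	m.w += n
	return n
}

// read copies unread bytes buf[r:w] into p and advances r.
func (m *miniBuf) read(p []byte) int {
	n := copy(p, m.buf[m.r:m.w])
	m.r += n
	return n
}

// buffered returns the number of bytes written but not yet read.
func (m *miniBuf) buffered() int { return m.w - m.r }

func main() {
	m := &miniBuf{buf: make([]byte, 8)}
	m.write([]byte("hello"))
	fmt.Println(m.r, m.w, m.buffered()) // 0 5 5

	p := make([]byte, 3)
	m.read(p)
	fmt.Println(string(p), m.r, m.w, m.buffered()) // hel 3 5 2
}
```

Once a second read consumes the remaining two bytes, r catches up with w and nothing more can be read until more data is written.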
NewReaderSize
The NewReaderSize function initializes a Reader with a given io.Reader for the underlying data and a given buffer size. The minimum buffer size is minReadBufferSize; if the size passed in is smaller than minReadBufferSize, it is raised to minReadBufferSize.
// The minimum buffer size
const minReadBufferSize = 16

func NewReaderSize(rd io.Reader, size int) *Reader {
	// If rd is already a *bufio.Reader whose buffer is at least size bytes,
	// it already meets the requirement, so return it directly
	b, ok := rd.(*Reader)
	if ok && len(b.buf) >= size {
		return b
	}
	// If size is smaller than the minimum buffer size, raise it to minReadBufferSize
	if size < minReadBufferSize {
		size = minReadBufferSize
	}
	// Allocate the Reader, then call reset to assign its fields
	r := new(Reader)
	r.reset(make([]byte, size), rd)
	return r
}

// reset resets all fields of bufio.Reader based on the values passed in. r and w are set to 0
func (b *Reader) reset(buf []byte, r io.Reader) {
	*b = Reader{
		buf:          buf,
		rd:           r,
		lastByte:     -1,
		lastRuneSize: -1,
	}
}
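Both rules (the 16-byte minimum and the reuse of an existing Reader that is already large enough) can be checked through the public API. The helper names smallSize, reused and rewrapped are made up for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// smallSize asks for a 4-byte buffer and reports the size actually used.
func smallSize() int {
	return bufio.NewReaderSize(strings.NewReader("data"), 4).Size()
}

// reused reports whether wrapping an existing *bufio.Reader with a
// smaller requested size returns the very same Reader.
func reused() bool {
	inner := bufio.NewReaderSize(strings.NewReader("data"), 64)
	outer := bufio.NewReaderSize(inner, 32)
	return inner == outer
}

// rewrapped reports whether asking for a larger size than the existing
// Reader has produces a new Reader with the requested size.
func rewrapped() bool {
	inner := bufio.NewReaderSize(strings.NewReader("data"), 64)
	outer := bufio.NewReaderSize(inner, 128)
	return inner != outer && outer.Size() == 128
}

func main() {
	fmt.Println(smallSize()) // 16
	fmt.Println(reused())    // true
	fmt.Println(rewrapped()) // true
}
```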
NewReader
The NewReader function initializes a Reader with the default buffer size, which is 4096 bytes (4 KiB).
const (
	defaultBufSize = 4096
)

// NewReader returns a new Reader whose buffer has the default size.
func NewReader(rd io.Reader) *Reader {
	return NewReaderSize(rd, defaultBufSize)
}
Size
The Size method returns the length of the buffer slice.
// Size returns the size of the underlying buffer in bytes.
func (b *Reader) Size() int { return len(b.buf) }
Buffered
The Buffered method returns the number of bytes that have been written into the buffer but not yet read, i.e. w - r.
func (b *Reader) Buffered() int { return b.w - b.r }
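A short sketch of how Size and Buffered behave in practice, assuming a 10-byte in-memory source and the default buffer size. The helper sizes is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sizes reads one byte from a 10-byte source through a default-sized
// bufio.Reader. It returns Buffered() before the read, then Size() and
// Buffered() after it.
func sizes() (before, size, after int, err error) {
	br := bufio.NewReader(strings.NewReader("0123456789"))
	before = br.Buffered() // nothing has been loaded yet
	if _, err = br.ReadByte(); err != nil {
		return before, 0, 0, err
	}
	return before, br.Size(), br.Buffered(), nil
}

func main() {
	before, size, after, err := sizes()
	fmt.Println(before, size, after, err) // 0 4096 9 <nil>
}
```

The first ReadByte loads all 10 bytes into the buffer and consumes one, so 9 bytes remain buffered while Size stays at the fixed 4096.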
Reset
Reset resets the state of all fields and uses the passed-in io.Reader r as the new underlying data reader. Because all state is reset, r and w also go back to 0, which amounts to discarding all previously cached data.
// Reset discards any buffered data, resets all state, and switches
// the buffered reader to read from r.
func (b *Reader) Reset(r io.Reader) {
	// Call the private method reset
	b.reset(b.buf, r)
}
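The discarding behaviour is easy to see from the outside. The following is a minimal sketch using two in-memory sources; resetDemo is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// resetDemo reads one byte from a first source, switches the Reader to a
// second source with Reset, and returns the buffered counts before and
// after the switch plus everything read afterwards.
func resetDemo() (int, int, string, error) {
	br := bufio.NewReader(strings.NewReader("hello"))
	if _, err := br.ReadByte(); err != nil {
		return 0, 0, "", err
	}
	before := br.Buffered() // "ello" is still buffered

	br.Reset(strings.NewReader("world"))
	after := br.Buffered() // data from the first source is gone

	data, err := io.ReadAll(br)
	return before, after, string(data), err
}

func main() {
	before, after, rest, err := resetDemo()
	fmt.Println(before, after, rest, err) // 4 0 world <nil>
}
```

The unread "ello" from the first source is never returned. Reset also keeps the existing buffer slice, so no new buffer is allocated when switching sources.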
reset
The private reset method resets all fields based on the values passed in; r and w are set to 0. Because r and w are reset, all cached data is discarded.
func (b *Reader) reset(buf []byte, r io.Reader) {
	*b = Reader{
		buf:          buf,
		rd:           r,
		lastByte:     -1,
		lastRuneSize: -1,
	}
}
fill
The private fill method uses the io.Reader rd to read the underlying data into the buffer buf.
- First, the buffer is compacted. If the read count r > 0, the data before r has already been read and can be discarded, while the data between b.r and b.w has not been read yet and is still meaningful. So b.buf[b.r:b.w] is moved to the front of the buffer, which shifts the whole block forward by b.r positions, and then r and w are updated.
There are two cases during this shift: the valid data is at least as long as the invalid data, or it is shorter. In the first case, the shift overwrites all invalid data. In the second case, the valid data cannot fully cover the invalid data, but since the valid range is always defined by r and w, i.e. b.buf[b.r:b.w], the leftover invalid bytes do not matter and will be overwritten by later writes.
- Then fill tries to fill the buffer buf by reading from the underlying reader rd. If data is read or an error occurs, it returns immediately. If the underlying data is not ready, so that no data is read and no error occurs, the read is retried, up to 100 times in total.
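The compaction step can be sketched in isolation. compact below is a hypothetical free function that mirrors only the shifting logic described above, operating on a plain byte slice instead of a Reader.

```go
package main

import "fmt"

// compact is a hypothetical helper mirroring the first step of fill:
// it slides the unread bytes buf[r:w] to the front of buf and returns
// the updated read and write positions.
func compact(buf []byte, r, w int) (int, int) {
	if r > 0 {
		copy(buf, buf[r:w])
		w -= r
		r = 0
	}
	return r, w
}

func main() {
	// Second case from the text: only "gh" is still unread, which is
	// shorter than the already-read "abcdef".
	buf := []byte("abcdefgh")
	r, w := compact(buf, 6, 8)
	fmt.Println(r, w, string(buf[r:w]), string(buf)) // 0 2 gh ghcdefgh
}
```

The stale bytes "cdefgh" stay in the slice after the shift, but they lie outside buf[r:w], so they are never returned and are simply overwritten by the next write.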
const maxConsecutiveEmptyReads = 100

// fill reads a new chunk into the buffer.
func (b *Reader) fill() {
	// Some data has already been read: slide the unread data to the front and update r and w
	if b.r > 0 {
		copy(b.buf, b.buf[b.r:b.w])
		b.w -= b.r
		b.r = 0
	}
	if b.w >= len(b.buf) {
		panic("bufio: tried to fill full buffer")
	}
	// If the underlying data is not ready, retry up to maxConsecutiveEmptyReads times
	for i := maxConsecutiveEmptyReads; i > 0; i-- {
		// rd reads data and writes it into the buffer starting at position w
		n, err := b.rd.Read(b.buf[b.w:])
		if n < 0 {
			panic(errNegativeRead)
		}
		// Update the written count
		b.w += n
		// If an error occurred, store it in b.err and return
		if err != nil {
			b.err = err
			return
		}
		// No error occurred and data was read
		if n > 0 {
			return
		}
		// err == nil and n == 0: nothing was read and no error occurred, so retry
	}
	// None of the maxConsecutiveEmptyReads attempts read any data: set io.ErrNoProgress and return
	b.err = io.ErrNoProgress
}
readErr
The private readErr method returns the value of b.err and sets b.err to nil.
func (b *Reader) readErr() error {
	err := b.err
	b.err = nil
	return err
}
Conclusion
In this article, we introduced the basic structure and working principle of bufio.Reader, along with two important methods:
- Reset: resets the whole structure, discarding all data in the buffer and using the new file reader as the io.Reader rd
- fill: first compacts away the already-read data in the buffer, then tries to fill the buffer
More
Personal blog: lifelmy.github.io/
Wechat official account: Long Coding road