ioutil.ReadAll
ReadAll stands out both because we so often need to read data out of an io.Reader, and because it is so often criticized for its performance problems.
Let’s take a look at the usage scenarios. For example, we use http.Client to send a GET request:
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	res, err := http.Get("http://www.google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	robots, err := io.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s", robots)
}
The data returned by http.Get() sits in res.Body, and we read it out via io.ReadAll.
Let's look at the implementation of ioutil.ReadAll:
func ReadAll(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}
Since Go 1.16, ioutil.ReadAll() simply delegates to io.ReadAll(). Following the call:
// ReadAll reads from r until an error or EOF and returns the data it read.
// A successful call returns err == nil, not err == EOF. Because ReadAll is
// defined to read from src until EOF, it does not treat an EOF from Read
// as an error to be reported.
func ReadAll(r Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	for {
		if len(b) == cap(b) {
			// Add more capacity (let append pick how much).
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])
		b = b[:len(b)+n]
		if err != nil {
			if err == EOF {
				err = nil
			}
			return b, err
		}
	}
}
Functionally, ReadAll keeps reading data from r until it hits EOF or an error; the EOF itself is not treated as an error when returned to the caller.
Walking through the implementation:
- First, a buffer with 512 bytes of capacity is created;
- The loop then repeatedly reads into the buffer's spare capacity; whenever the buffer is full, append(b, 0)[:len(b)] appends one byte to force a reallocation, letting append choose the new capacity;
- When r.Read() returns an error, the loop terminates, and EOF is filtered out before returning.
The key here is the buffer granularity: 512 bytes. If the payload is under 512 bytes, no problem. Beyond 512 bytes, the loop triggers repeated reallocations and data copies, and the larger the payload, the worse the reallocation and copying get.
Another point involved here is the slice growth strategy:
- If the existing capacity is below a threshold (1024 in older Go versions), the new capacity is double the old one, preventing frequent growth;
- Above that threshold, the new capacity is roughly 1.25 times the old one, preventing wasted space.
(Go 1.18 tuned the threshold and smoothed the curve, but the overall shape is the same.)
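The growth curve is easy to observe directly. The exact capacities below depend on the Go version and the runtime's size classes, so this sketch only records each capacity append chooses rather than asserting specific numbers.

```go
package main

import "fmt"

// capGrowth appends one byte at a time and records the capacity chosen
// by append at each growth step.
func capGrowth(n int) []int {
	var caps []int
	b := make([]byte, 0)
	for i := 0; i < n; i++ {
		full := len(b) == cap(b)
		b = append(b, 0)
		if full {
			caps = append(caps, cap(b))
		}
	}
	return caps
}

func main() {
	// Each entry is strictly larger than the last; early steps roughly
	// double, later steps grow more slowly.
	fmt.Println(capGrowth(2048))
}
```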
Are there alternatives?
io.Copy
Without further ado, go directly to the code:
func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}
Function: read data from src and write it to dst, returning the number of bytes successfully copied.
ReadAll only gets data as far as a buffer; Copy covers the whole pipeline: reading the data and then writing (consuming) it. By its semantics, ReadAll must finish reading everything before the data can be used. Copy, on the other hand, reads and writes simultaneously, which makes it a good fit for large payloads.
Next, look at the implementation of copyBuffer().
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	if buf == nil {
		size := 32 * 1024
		if l, ok := src.(*LimitedReader); ok && int64(size) > l.N {
			if l.N < 1 {
				size = 1
			} else {
				size = int(l.N)
			}
		}
		buf = make([]byte, size)
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.Write(buf[0:nr])
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errInvalidWrite
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != EOF {
				err = er
			}
			break
		}
	}
	return written, err
}
- If the underlying object of src also implements the WriterTo interface, src.WriteTo(dst) is executed directly;
- Similarly, if the underlying object of dst implements the ReaderFrom interface, dst.ReadFrom(src) is executed directly;
- If the buf passed in is nil, a new 32KB buffer is created; if src is a *LimitedReader (meaning there is a limit on how much data can be read from it) and its remaining readable amount is below 32KB, the buffer is shrunk to that size;
- The loop then repeatedly reads data from src into buf and writes it to dst;
- If an error occurs, the loop terminates, and EOF is filtered out before returning.
io.Copy has the following advantages over io.ReadAll:
- If src implements WriterTo or dst implements ReaderFrom, the intermediate buf is skipped entirely and data flows directly from src to dst;
- Otherwise, a fixed-size buffer serves as the scratch space, so there is no repeated slice growth.
Conclusion
To sum up, ioutil.ReadAll is fine for small payloads; for large volumes of data, ReadAll is a performance bomb, and io.Copy is the better choice.
In addition, Copy carries more complete semantics. Wherever ReadAll() is used, it is worth also considering the downstream data processing, abstracting it as a Writer object, and then letting Copy drive both the reading and the processing.
In particular, if the data being read is JSON to be decoded, you do not even need io.Copy: json.NewDecoder decodes straight off the reader.
type Result struct {
	Msg     string `json:"msg"`
	Rescode string `json:"rescode"`
}

func parseBody(body io.Reader) (*Result, error) {
	var v Result
	if err := json.NewDecoder(body).Decode(&v); err != nil {
		return nil, fmt.Errorf("DecodeJsonFailed:%s", err.Error())
	}
	return &v, nil
}