ioutil.ReadAll
ReadAll stands out both because we so often need to read data out of an io.Reader, and because it is so often criticized for its performance problems.
Let’s take a look at the usage scenarios. For example, we use http.Client to send a GET request:
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	res, err := http.Get("http://www.google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	robots, err := io.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s", robots)
}
The data returned by http.Get() sits in res.Body, and we read it out via io.ReadAll.
Let's look at the implementation of ioutil.ReadAll:
func ReadAll(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}
Since Go 1.16, ioutil.ReadAll() simply delegates to io.ReadAll(). Following the call:
// ReadAll reads from r until an error or EOF and returns the data it read.
// A successful call returns err == nil, not err == EOF. Because ReadAll is
// defined to read from src until EOF, it does not treat an EOF from Read
// as an error to be reported.
func ReadAll(r Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	for {
		if len(b) == cap(b) {
			// Add more capacity (let append pick how much).
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])
		b = b[:len(b)+n]
		if err != nil {
			if err == EOF {
				err = nil
			}
			return b, err
		}
	}
}
Functionally, ReadAll keeps reading data from r until it hits EOF or an error; the EOF itself is not treated as an error when returned to the caller.
Walking through the implementation:
- First, a buffer with 512 bytes of capacity is created;
- The loop then repeatedly reads into the buffer's spare capacity; whenever the buffer is full, append(b, 0)[:len(b)] appends one byte to force a reallocation, letting append choose the new capacity;
- When r.Read() returns an error, the loop terminates, and EOF is filtered out before returning.
The key here is the buffer granularity: 512 bytes. If the payload is under 512 bytes, no problem. Beyond 512 bytes, the loop triggers repeated reallocations and data copies, and the larger the payload, the worse the reallocation and copying get.
Another point involved here is the slice growth strategy:
- If the existing capacity is below a threshold (1024 in older Go versions), the new capacity is double the old one, preventing frequent growth;
- Above that threshold, the new capacity is roughly 1.25 times the old one, preventing wasted space.
(Go 1.18 tuned the threshold and smoothed the curve, but the overall shape is the same.)
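The growth curve is easy to observe directly. The exact capacities below depend on the Go version and the runtime's size classes, so this sketch only records each capacity append chooses rather than asserting specific numbers.

```go
package main

import "fmt"

// capGrowth appends one byte at a time and records the capacity chosen
// by append at each growth step.
func capGrowth(n int) []int {
	var caps []int
	b := make([]byte, 0)
	for i := 0; i < n; i++ {
		full := len(b) == cap(b)
		b = append(b, 0)
		if full {
			caps = append(caps, cap(b))
		}
	}
	return caps
}

func main() {
	// Each entry is strictly larger than the last; early steps roughly
	// double, later steps grow more slowly.
	fmt.Println(capGrowth(2048))
}
```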
Are there alternatives?
io.Copy
Without further ado, go directly to the code:
func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}
Function: read data from src and write it to dst, returning the number of bytes successfully copied.
ReadAll only gets data as far as a buffer; Copy covers the whole pipeline: reading the data and then writing (consuming) it. By its semantics, ReadAll must finish reading everything before the data can be used. Copy, on the other hand, reads and writes simultaneously, which makes it a good fit for large payloads.
Next, look at the implementation of copyBuffer().
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	if buf == nil {
		size := 32 * 1024
		if l, ok := src.(*LimitedReader); ok && int64(size) > l.N {
			if l.N < 1 {
				size = 1
			} else {
				size = int(l.N)
			}
		}
		buf = make([]byte, size)
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.Write(buf[0:nr])
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errInvalidWrite
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != EOF {
				err = er
			}
			break
		}
	}
	return written, err
}
- If the underlying object of src also implements the WriterTo interface, src.WriteTo(dst) is executed directly;
- Similarly, if the underlying object of dst implements the ReaderFrom interface, dst.ReadFrom(src) is executed directly;
- If the buf passed in is nil, a new 32KB buffer is created; if src is a *LimitedReader (meaning there is a limit on how much data can be read from it) and its remaining readable amount is below 32KB, the buffer is shrunk to that size;
- The loop then repeatedly reads data from src into buf and writes it to dst;
- If an error occurs, the loop terminates, and EOF is filtered out before returning.
io.Copy has the following advantages over io.ReadAll:
- If src implements WriterTo or dst implements ReaderFrom, the intermediate buf is skipped entirely and data flows directly from src to dst;
- Otherwise, a fixed-size buffer serves as the scratch space, so there is no repeated slice growth.
Conclusion
To sum up, ioutil.ReadAll is fine for small payloads; for large volumes of data, ReadAll is a performance bomb, and io.Copy is the better choice.
In addition, Copy carries more complete semantics. Wherever ReadAll() is used, it is worth also considering the downstream data processing, abstracting it as a Writer object, and then letting Copy drive both the reading and the processing.
In particular, if the data being read is JSON to be decoded, you do not even need io.Copy: json.NewDecoder decodes straight off the reader.
type Result struct {
	Msg     string `json:"msg"`
	Rescode string `json:"rescode"`
}

func parseBody(body io.Reader) (*Result, error) {
	var v Result
	if err := json.NewDecoder(body).Decode(&v); err != nil {
		return nil, fmt.Errorf("DecodeJsonFailed:%s", err.Error())
	}
	return &v, nil
}