Recently, I needed to write an HTTP endpoint that packages files into a zip archive on the fly.

Step one: Google

When you run into a requirement like this, the first step is to Google it, and I found the following answer on StackOverflow:

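package main

import (
    "archive/zip"
    "bytes"
    "fmt"
    "log"
    "net/http"
    "os"
)

func zipHandler(w http.ResponseWriter, r *http.Request) {
    filename := "randomfile.jpg"
    buf := new(bytes.Buffer)
    writer := zip.NewWriter(buf)
    // The whole file is read into memory first...
    data, err := os.ReadFile(filename)
    if err != nil {
        // Report the error instead of log.Fatal, which would stop the whole server.
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    f, err := writer.Create(filename)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    // ...and then copied again into the in-memory zip buffer.
    _, err = f.Write(data)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    err = writer.Close()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/zip")
    w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s.zip\"", filename))
    //io.Copy(w, buf)
    // Only now, with the complete archive in memory, is anything sent to the client.
    w.Write(buf.Bytes())
}

func main() {
    http.HandleFunc("/zip", zipHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}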

This handler seems fine, because packaging a single JPG doesn't consume much memory. With large files, however, there are two problems:

  1. When the archive includes large files, each file is first read entirely into memory, then written into the zip buffer, and only at the end is the buffer written to the http.ResponseWriter. The memory requirement is therefore large: if the zip file a user downloads is 2 GB, the service needs at least 2 GB of memory, which many services cannot afford. If several users call this endpoint at the same time, the service needs correspondingly more.

  2. If the archive is large, packaging takes a long time and the user experience suffers. After clicking download, the user sees no response from the browser and may keep clicking; each repeated request makes the service package the archive again, multiplying the memory footprint, which may bring the whole service down.
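To make the first problem concrete, here is a small sketch (not part of the StackOverflow answer) that builds an archive the same way, into a bytes.Buffer, and reports how large the buffer grows. The function name bufferedZipSize and the 1 MiB payload are illustrative assumptions; zip.Store is used so that compression cannot hide the effect.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// bufferedZipSize builds a zip archive entirely in memory, as the
// StackOverflow handler does, and returns how many bytes the
// bytes.Buffer ends up holding. (Hypothetical helper for illustration.)
func bufferedZipSize(payload []byte) (int, error) {
	buf := new(bytes.Buffer)
	writer := zip.NewWriter(buf)
	// zip.Store disables compression, so the archive is at least as
	// large as the payload itself.
	f, err := writer.CreateHeader(&zip.FileHeader{Name: "big.bin", Method: zip.Store})
	if err != nil {
		return 0, err
	}
	if _, err := f.Write(payload); err != nil {
		return 0, err
	}
	if err := writer.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	payload := make([]byte, 1<<20) // 1 MiB of zeros, also held in memory
	size, err := bufferedZipSize(payload)
	if err != nil {
		panic(err)
	}
	// The whole archive sits in memory at once, on top of the payload.
	fmt.Printf("payload: %d bytes, buffered archive: %d bytes\n", len(payload), size)
}
```

Both the payload and the finished archive are alive at the same time, so peak memory grows with the size of the download.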

Step two: keep Googling

I assumed there must be a way to write while transferring, so I added "stream" to the search keywords and found a Ruby implementation. I don't know Ruby, but I understand HTTP, and the key point turned out to be the Content-Length header:

  1. The Content-Length header gives the length of the response body, and the browser uses it to decide whether the response is complete. If Content-Length is greater than the actual length of the file, the browser considers the download failed; if it is smaller, the browser stops receiving data prematurely.
  2. The Content-Length header can be omitted, in which case the browser keeps receiving data until the server has finished sending it.

OK, so when the server handles this request, it must not set a Content-Length header.
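In Go this mostly takes care of itself: the http.ResponseWriter documentation says Content-Length is added automatically only when the handler writes under a few KB in total and never calls Flush; otherwise an HTTP/1.1 response without Content-Length is sent with chunked transfer encoding. The sketch below checks that with a throwaway httptest server. The helper responseFraming and the sizes 100 bytes and 1 MiB are arbitrary choices for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// responseFraming starts a local test server whose handler writes n
// bytes without setting Content-Length, then reports how the response
// was framed as seen by the client. (Hypothetical helper.)
func responseFraming(n int) (contentLength int64, transferEncoding []string, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Write(bytes.Repeat([]byte("x"), n))
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	// Drain the body so the whole response is received.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, nil, err
	}
	return resp.ContentLength, resp.TransferEncoding, nil
}

func main() {
	for _, n := range []int{100, 1 << 20} {
		cl, te, err := responseFraming(n)
		if err != nil {
			panic(err)
		}
		fmt.Printf("wrote %d bytes: Content-Length=%d Transfer-Encoding=%v\n", n, cl, te)
	}
}
```

A small body gets an automatic Content-Length; a large one arrives chunked with ContentLength reported as -1, which is exactly what a streamed zip needs.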

Previously, the zip.Writer was created on top of buf: the file contents were written into it, and then buf was written to the http.ResponseWriter. Now the download should not have to wait; data should be transferred as it is written. A pipe, created with pr, pw := io.Pipe(), looked like the perfect answer: the zip writer writes into pw, and whatever comes out of pr is forwarded to the client. The code is as follows:

// times, sendFilePath and bufferLength are helpers that are not shown in this article.
func zipHandlerUsingPipe(w http.ResponseWriter, r *http.Request) {
	pr, pw := io.Pipe()
	writer := zip.NewWriter(pw)
	w.Header().Set("Content-Type", "application/zip")
	w.Header().Set("Content-Disposition", "attachment; filename=\"test.zip\"")
	w.Header().Del("Content-Length")
	var wg sync.WaitGroup
	wg.Add(2)
	// Producer: writes the zip archive into the pipe.
	go func() {
		defer wg.Done()
		defer pw.Close()
		defer writer.Close() // runs first: writes the zip central directory into the pipe
		for time := 0; time < times; time++ {
			filename := fmt.Sprintf("test/%d.txt", time)
			log.Println("start sending file", time)
			f, err := writer.Create(filename)
			if err != nil {
				// Pass the error to the reading side; log.Fatal would kill the whole server.
				pw.CloseWithError(err)
				return
			}
			readFile, err := os.Open(sendFilePath(time))
			if err != nil {
				pw.CloseWithError(err)
				return
			}
			buf := make([]byte, bufferLength)
			for {
				n, err := readFile.Read(buf)
				f.Write(buf[:n])
				if err != nil {
					break
				}
			}
			readFile.Close()
		}
	}()
	// Consumer: copies whatever arrives on the pipe to the client.
	go func() {
		defer wg.Done()
		dataRead := make([]byte, bufferLength)
		for {
			n, err := pr.Read(dataRead)
			w.Write(dataRead[:n])
			if err != nil {
				return
			}
		}
	}()
	wg.Wait()
}

Step three: optimization

Is the pipe really necessary? Why not write to the http.ResponseWriter directly? Here is the source of zip.NewWriter:

// NewWriter returns a new Writer writing a zip file to w.
func NewWriter(w io.Writer) *Writer {
	return &Writer{cw: &countWriter{w: bufio.NewWriter(w)}}
}
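One consequence of the bufio.NewWriter wrapper is that bytes reach w in buffered chunks rather than on every call. The sketch below uses a hypothetical countingWriter (any type with a Write method satisfies io.Writer) to show that a small entry stays in zip.Writer's internal buffer until the writer is closed.

```go
package main

import (
	"archive/zip"
	"fmt"
)

// countingWriter is an illustrative io.Writer that only records how
// many bytes have been written to it.
type countingWriter struct{ n int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

// bytesSeen reports how many bytes reach the underlying writer after
// writing one small entry, and again after Close. (Hypothetical helper.)
func bytesSeen() (beforeClose, afterClose int, err error) {
	cw := &countingWriter{}
	zw := zip.NewWriter(cw)
	f, err := zw.Create("small.txt")
	if err != nil {
		return 0, 0, err
	}
	if _, err := f.Write([]byte("hello")); err != nil {
		return 0, 0, err
	}
	beforeClose = cw.n // header and data are still buffered inside zw
	if err := zw.Close(); err != nil {
		return 0, 0, err
	}
	return beforeClose, cw.n, nil
}

func main() {
	before, after, err := bytesSeen()
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes before Close:", before, "after Close:", after)
}
```

For large files this buffering is harmless, since the buffer is flushed whenever it fills; zip.Writer also has a Flush method if data must be pushed out earlier.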

zip.NewWriter accepts an io.Writer, an interface that requires only one method:

type Writer interface {
	Write(p []byte) (n int, err error)
}

The definition of http.ResponseWriter shows that it implements Write([]byte) (int, error) as well:

type ResponseWriter interface {
	Header() Header
	Write([]byte) (int, error)
	WriteHeader(statusCode int)
}

So we can create the zip writer directly with zip.NewWriter(w). This removes the separate step of copying data into the http.ResponseWriter: whatever is written to the zip.Writer flows, through its internal bufio.Writer, straight into the http.ResponseWriter.

The code now looks like this:

// times, sendFilePath and bufferLength are helpers that are not shown in this article.
func zipHandlerUsingResp(w http.ResponseWriter, r *http.Request) {
	// The zip writer sits directly on top of the response.
	writer := zip.NewWriter(w)
	w.Header().Set("Content-Type", "application/zip")
	w.Header().Set("Content-Disposition", "attachment; filename=\"test.zip\"")
	w.Header().Del("Content-Length")
	defer writer.Close() // writes the central directory at the end of the response
	for time := 0; time < times; time++ {
		filename := fmt.Sprintf("test/%d.txt", time)
		log.Println("start sending file", time)
		f, err := writer.Create(filename)
		if err != nil {
			// Part of the response may already be sent, so log and stop
			// instead of calling log.Fatal, which would kill the server.
			log.Println(err)
			return
		}
		fmt.Println("send file path is ", sendFilePath(time))
		readFile, err := os.Open(sendFilePath(time))
		if err != nil {
			log.Println(err)
			return
		}
		buf := make([]byte, bufferLength)
		for {
			n, err := readFile.Read(buf)
			f.Write(buf[:n])
			if err != nil {
				break
			}
		}
		readFile.Close()
	}
}

Conclusion

By deepening my understanding of HTTP and io.Writer, I gradually removed unnecessary processing logic and arrived at a much simpler solution. It was a fun learning process. All the code in this article is at https://github.com/dahaihu/zip_server; if you found it useful, please give the article a like!