Source: Assigned (author – Xiaodu ye)

Preface

File upload is a well-worn topic. When a file is fairly small, you can simply read it into a byte stream and send it to the server in a single request. When the file is large, however, this naive approach is a bad idea: few users can tolerate an upload that gets interrupted halfway and then has to start over from the beginning. Is there a better upload experience? There is, and the techniques introduced below provide it.

Detailed tutorial

Instant upload

1. What is instant upload

Put simply: before you actually upload, the server first performs an MD5 check. If a file with the same MD5 already exists on the server, it directly hands you a new address for it; what you access afterwards is simply the copy that already lives on the server. If you want to force a real upload instead, you just have to make the MD5 change, which means modifying the file content itself (merely renaming it is not enough). For example, if you add a few characters to a text file, its MD5 changes and it will no longer be instant-uploaded.
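
For reference, here is a minimal sketch of computing a file's MD5 on the JVM with the standard java.security.MessageDigest API. The class and method names are illustrative and are not the FileMD5Util helper used later in this article:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Md5Demo {

  // Streams the file through MessageDigest so that large files never have to fit in memory.
  public static String fileMd5(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
  }
}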

2. The core logic of instant upload implemented in this article

A. Use Redis's set method to store the file's upload status, where the key is the MD5 of the file being uploaded and the value is a flag indicating whether the upload has completed.

B. If the flag is true, the upload has already completed, so when the same file is uploaded again the instant-upload logic is entered. If the flag is false, the upload has not finished yet; in that case set is called again to record where the progress is tracked: the key is the MD5 of the uploaded file with a fixed prefix, and the value is the path of the conf file that records which blocks have been written.
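
As a sketch of that decision, written against the article's own RedisUtil, SpringContextHolder and FileConstant classes (they appear in the template code later; the hget method is an assumed counterpart of the hset/hHasKey calls used there):

// Sketch only: returns true when the instant-upload branch should be taken.
public boolean isInstantUpload(String fileMd5) {
  RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
  Object flag = redisUtil.hget(FileConstant.FILE_UPLOAD_STATUS, fileMd5); // assumed getter
  // "true"  -> the file already exists on the server: skip the transfer and reuse it
  // "false" -> an upload is in progress: the conf path stored under
  //            FileConstant.FILE_MD5_KEY + fileMd5 records which chunks are done
  return "true".equals(flag);
}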

Sharded upload

1. What is sharded upload

Sharded upload means that the file to be uploaded is split into multiple data blocks (called parts, or chunks) of a certain size, which are uploaded separately. After all parts have been uploaded, the server assembles them back into the original file.
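
To make the splitting concrete, here is a minimal JVM-side sketch of reading one fixed-size chunk of a file (in this article the splitting actually happens on the front end; the 5 MB chunk size is an arbitrary choice for illustration):

import java.io.IOException;
import java.io.RandomAccessFile;

public final class ChunkDemo {

  static final long CHUNK_SIZE = 5L * 1024 * 1024; // 5 MB per chunk, illustrative

  // Reads chunk number `chunk` (0-based) of `file` into a byte array.
  public static byte[] readChunk(String file, int chunk) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      long offset = CHUNK_SIZE * chunk;
      int len = (int) Math.min(CHUNK_SIZE, raf.length() - offset);
      byte[] data = new byte[len];
      raf.seek(offset);
      raf.readFully(data);
      return data;
    }
  }
}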

2. Scenarios for sharded upload

1. Uploading large files

2. Poor network environments, where there is a risk of having to retransmit

Resumable upload

1. What is resumable upload

Resumable upload (breakpoint continuation) means that a download or upload task (a file, or an archive) is deliberately divided into several parts, each of which is transferred separately, for example with one thread per part. If a network failure occurs, the transfer can pick up from the parts that have already been completed rather than starting over from the very beginning. This article focuses on the resumable upload scenario.
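
The "one thread per part" idea can be sketched with a plain ExecutorService; the uploadChunk method here is a placeholder for whatever transport the client actually uses:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ParallelUploadDemo {

  // Submits each chunk as an independent task. A chunk that fails can simply be
  // resubmitted later, which is exactly what makes the transfer resumable.
  public static void uploadAll(int totalChunks) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < totalChunks; i++) {
      final int chunk = i;
      pool.submit(() -> uploadChunk(chunk));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }

  private static void uploadChunk(int chunk) {
    // placeholder: send this chunk's bytes to the server (HTTP, etc.)
  }
}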

2. Application scenarios

Resumable upload can be regarded as a derivative of sharded upload, so it can be used in any scenario where sharded upload applies.

3. The core logic of resumable upload

If a sharded upload is interrupted by an abnormal event such as a system crash or a network outage, the client needs to record the upload progress, so that when the upload is retried later it can continue from the point where it was interrupted.

To guard against the case where the client's recorded progress is deleted (which would force the upload to start again from the beginning), the server should also provide an interface that lets the client query which chunks have already been uploaded. The client then knows which chunk data already exists on the server and can continue uploading from the first missing chunk.
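
A minimal sketch of that query, based on the conf-file convention this article adopts below (one byte per chunk; 127 marks an uploaded chunk, 0 a missing one). The class name is illustrative; FileUtils is the Apache commons-io helper the article's own code uses:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.FileUtils;

public final class ChunkQueryDemo {

  // Returns the 0-based numbers of the chunks already written, by scanning the conf file.
  public static List<Integer> uploadedChunks(String confPath) throws IOException {
    byte[] status = FileUtils.readFileToByteArray(new File(confPath));
    List<Integer> done = new ArrayList<>();
    for (int i = 0; i < status.length; i++) {
      if (status[i] == Byte.MAX_VALUE) { // 127 marks a completed chunk
        done.add(i);
      }
    }
    return done;
  }
}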

4. Implementation steps

A. Plan 1: conventional steps

  • Split the file to be uploaded into data blocks of the same size according to a fixed splitting rule;
  • Initialize a sharded-upload task and return a unique identifier for the upload;
  • Send the data blocks according to some strategy (serially or in parallel);
  • After the data has been sent, the server checks whether all blocks have arrived; if so, it merges the blocks back into the original file (a minimal merge sketch follows this list).
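
Plan 2 below avoids this merge step by writing every chunk at its final offset directly; Plan 1, however, ends with a merge. Here is a minimal sketch of merging ordered part files back into the original, with an illustrative <file>.partN naming scheme:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public final class MergeDemo {

  // Appends the part files in chunk order to rebuild the original file.
  public static void merge(Path target, int totalChunks) throws IOException {
    try (OutputStream out = Files.newOutputStream(target)) {
      for (int i = 0; i < totalChunks; i++) {
        Path part = Path.of(target + ".part" + i); // naming is illustrative
        Files.copy(part, out);
      }
    }
  }
}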

B. Plan 2: the steps used in this article

  • The front end (client) splits the file into fixed-size chunks and sends each chunk to the back end (server) together with its chunk number and size;
  • The server creates a conf file to record chunk status. The conf file's length equals the total number of chunks; every time a chunk is uploaded, a 127 is written at that chunk's position in the conf file, so positions still holding the default 0 mark chunks that have not arrived yet;
  • The server computes the start offset from the chunk number and the per-chunk size carried in the request (the chunk size is fixed and identical for every chunk), and writes the chunk data into the file at that offset.

5. Code implementation of sharded upload / resumable upload

A. The front end uses the webuploader plug-in provided by Baidu to do the sharding. Since this article mainly covers the server-side code, see the following for the details of how webuploader shards the file:

Fex.baidu.com/webuploader…

B. The back end writes the file in one of two ways. The first uses RandomAccessFile; if you are not familiar with RandomAccessFile, you can check the following link:

Blog.csdn.net/dimudan2015…

The other way uses MappedByteBuffer; if you are not familiar with MappedByteBuffer, you can check out the following link:

www.jianshu.com/p/f90866dcb…

The core code for the back-end write operations

A. RandomAccessFile implementation

@UploadMode(mode = UploadModeEnum.RANDOM_ACCESS)
@Slf4j
public class RandomAccessUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {
    RandomAccessFile accessTmpFile = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      accessTmpFile = new RandomAccessFile(tmpFile, "rw");
      // Fall back to the configured default chunk size (in MB) when the request carries none
      long chunkSize = Objects.isNull(param.getChunkSize())
          ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      // Locate this chunk's offset and write the chunk data there
      long offset = chunkSize * param.getChunk();
      accessTmpFile.seek(offset);
      accessTmpFile.write(param.getFile().getBytes());
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessTmpFile);
    }
    return false;
  }
}

B. MappedByteBuffer implementation

@UploadMode(mode = UploadModeEnum.MAPPED_BYTEBUFFER)
@Slf4j
public class MappedByteBufferUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {
    RandomAccessFile tempRaf = null;
    FileChannel fileChannel = null;
    MappedByteBuffer mappedByteBuffer = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      tempRaf = new RandomAccessFile(tmpFile, "rw");
      fileChannel = tempRaf.getChannel();
      long chunkSize = Objects.isNull(param.getChunkSize())
          ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      // Map this chunk's region of the file and write the chunk data into it
      long offset = chunkSize * param.getChunk();
      byte[] fileData = param.getFile().getBytes();
      mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, offset, fileData.length);
      mappedByteBuffer.put(fileData);
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.freedMappedByteBuffer(mappedByteBuffer);
      FileUtil.close(fileChannel);
      FileUtil.close(tempRaf);
    }
    return false;
  }
}

C. Core template class for the file operations

@Slf4j
public abstract class SliceUploadTemplate implements SliceUploadStrategy {

  public abstract boolean upload(FileUploadRequestDTO param);

  protected File createTmpFile(FileUploadRequestDTO param) {
    FilePathUtil filePathUtil = SpringContextHolder.getBean(FilePathUtil.class);
    param.setPath(FileUtil.withoutHeadAndTailDiagonal(param.getPath()));
    String fileName = param.getFile().getOriginalFilename();
    String uploadDirPath = filePathUtil.getPath(param);
    String tempFileName = fileName + "_tmp";
    File tmpDir = new File(uploadDirPath);
    File tmpFile = new File(uploadDirPath, tempFileName);
    if (!tmpDir.exists()) {
      tmpDir.mkdirs();
    }
    return tmpFile;
  }

  @Override
  public FileUploadDTO sliceUpload(FileUploadRequestDTO param) {
    boolean isOk = this.upload(param);
    if (isOk) {
      File tmpFile = this.createTmpFile(param);
      FileUploadDTO fileUploadDTO =
          this.saveAndFileUploadDTO(param.getFile().getOriginalFilename(), tmpFile);
      return fileUploadDTO;
    }
    String md5 = FileMD5Util.getFileMD5(param.getFile());
    Map<Integer, String> map = new HashMap<>();
    map.put(param.getChunk(), md5);
    return FileUploadDTO.builder().chunkMd5Info(map).build();
  }

  /**
   * Check and update the file upload progress
   */
  public boolean checkAndSetUploadProgress(FileUploadRequestDTO param, String uploadDirPath) {
    String fileName = param.getFile().getOriginalFilename();
    File confFile = new File(uploadDirPath, fileName + ".conf");
    byte isComplete = 0;
    RandomAccessFile accessConfFile = null;
    try {
      accessConfFile = new RandomAccessFile(confFile, "rw");
      System.out.println("set part " + param.getChunk() + " complete");
      // The conf file's length equals the total number of chunks. For each uploaded chunk,
      // a 127 (Byte.MAX_VALUE) is written at that chunk's position; positions default to 0.
      accessConfFile.setLength(param.getChunks());
      accessConfFile.seek(param.getChunk());
      accessConfFile.write(Byte.MAX_VALUE);

      // Check whether the upload is complete, i.e. all bytes in the conf file are 127
      // (every chunk has been uploaded successfully)
      byte[] completeList = FileUtils.readFileToByteArray(confFile);
      isComplete = Byte.MAX_VALUE;
      for (int i = 0; i < completeList.length && isComplete == Byte.MAX_VALUE; i++) {
        // AND the flags together: the result stays 127 only if every position is 127
        isComplete = (byte) (isComplete & completeList[i]);
        System.out.println("check part " + i + " complete?:" + completeList[i]);
      }
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessConfFile);
    }
    boolean isOk = setUploadProgress2Redis(param, uploadDirPath, fileName, confFile, isComplete);
    return isOk;
  }

  /**
   * Persist the upload progress in Redis
   */
  private boolean setUploadProgress2Redis(FileUploadRequestDTO param, String uploadDirPath,
      String fileName, File confFile, byte isComplete) {
    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
    if (isComplete == Byte.MAX_VALUE) {
      redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "true");
      redisUtil.del(FileConstant.FILE_MD5_KEY + param.getMd5());
      confFile.delete();
      return true;
    } else {
      if (!redisUtil.hHasKey(FileConstant.FILE_UPLOAD_STATUS, param.getMd5())) {
        redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "false");
        redisUtil.set(FileConstant.FILE_MD5_KEY + param.getMd5(),
            uploadDirPath + FileConstant.FILE_SEPARATORCHAR + fileName + ".conf");
      }
      return false;
    }
  }

  /**
   * Save the file and build the FileUploadDTO
   */
  public FileUploadDTO saveAndFileUploadDTO(String fileName, File tmpFile) {
    FileUploadDTO fileUploadDTO = null;
    try {
      fileUploadDTO = renameFile(tmpFile, fileName);
      if (fileUploadDTO.isUploadComplete()) {
        System.out.println("upload complete !!"
            + fileUploadDTO.isUploadComplete() + " name=" + fileName);
      }
    } catch (Exception e) {
      log.error(e.getMessage(), e);
    }
    return fileUploadDTO;
  }

  /**
   * Rename a file
   *
   * @param toBeRenamed   the file to be renamed
   * @param toFileNewName the new name
   */
  private FileUploadDTO renameFile(File toBeRenamed, String toFileNewName) {
    // Check that the file to be renamed exists and is not a directory
    FileUploadDTO fileUploadDTO = new FileUploadDTO();
    if (!toBeRenamed.exists() || toBeRenamed.isDirectory()) {
      log.info("File does not exist: {}", toBeRenamed.getName());
      fileUploadDTO.setUploadComplete(false);
      return fileUploadDTO;
    }
    String ext = FileUtil.getExtension(toFileNewName);
    String p = toBeRenamed.getParent();
    String filePath = p + FileConstant.FILE_SEPARATORCHAR + toFileNewName;
    File newFile = new File(filePath);
    boolean uploadFlag = toBeRenamed.renameTo(newFile);
    fileUploadDTO.setMtime(DateUtil.getCurrentTimeStamp());
    fileUploadDTO.setUploadComplete(uploadFlag);
    fileUploadDTO.setPath(filePath);
    fileUploadDTO.setSize(newFile.length());
    fileUploadDTO.setFileExt(ext);
    fileUploadDTO.setFileId(toFileNewName);
    return fileUploadDTO;
  }
}

Conclusion

During sharded uploading, the front end and the back end must cooperate. For example, the chunk numbering and the chunk size must agree on both sides, otherwise the upload will fail. Also note that file operations like these normally belong on a dedicated file server, built for example with FastDFS or HDFS.

With the sample code of this article, on a machine with a 4-core CPU and 8 GB of RAM, uploading a 24 GB file took a little over 30 minutes; most of that time was spent computing the MD5 value on the front end, while the back-end writes were comparatively fast. If the project team feels that building a file server takes too much effort, and the project only needs upload and download, then Alibaba Cloud's OSS is a recommended alternative; you can check the official site for its introduction:

Help.aliyun.com/product/318…

Alibaba's OSS is essentially an object storage service, not a file server, so OSS may not be a good choice if the requirements involve massive deletion or modification of files.

You can also upload files directly from the front end to the OSS server, shifting the upload load onto OSS itself:

www.cnblogs.com/ossteam/p/4…


Finally, here are 3 original SpringBoot+Vue projects I recommend, each with a complete video walkthrough, documentation and source code:

Build a complete project with SpringBoot + ElasticSearch + Canal

  • Video tutorial: www.bilibili.com/video/BV1Jq…
  • Complete development documentation: www.zhuawaba.com/post/124
  • Online demo: www.zhuawaba.com/dailyhub

【VueAdmin】 A hands-on tutorial on developing a SpringBoot+Jwt+Vue front-end/back-end separated admin system

  • Full 800-minute video tutorial: www.bilibili.com/video/BV1af…
  • Complete development documentation (front end): www.zhuawaba.com/post/18
  • Complete development documentation (back end): www.zhuawaba.com/post/19

【VueBlog】 A complete tutorial on a front-end/back-end separated blog project based on SpringBoot+Vue

  • Full 200-minute video tutorial: www.bilibili.com/video/BV1af…
  • Complete development documentation: www.zhuawaba.com/post/17

If you have any questions, come to my official account 【Java Q&A Society】 and ask me.