Original: Taste of Little Sister (WeChat official account ID: XjjDog). Sharing is welcome; please credit the source.
Some reporting backends provide data export. If there are too many dimensions to query and every one of them is a time-consuming operation, offering export is like opening Pandora's box: the consequences can be bad.
Data export and download are closely tied to product positioning. Some products are very hardcore and do not give you even the most common export functions, yet you put up with them and keep using them anyway, because they are that good.
Many products, however, are on the soft side. Whatever the customer and the boss ask for, they provide, and the product ends up being run entirely as a series of projects. It is pathetic, and it is hateful.
This is the root reason the demand exists.
The goal
Download tasks usually consume a lot of resources, leading to high system load and even out-of-memory errors. Without proper control, they often cause service timeouts or outages, which cannot be tolerated.
Our goal is to keep the resource usage of the download service balanced, and to intercept duplicate download requests, especially requests for large amounts of data.
What follows is mostly about ideas. Why only ideas? Because it does not prescribe an implementation; it serves only as architectural guidance.
We will optimize in the following areas.
One, asynchronous processing
Once a download request is received, the service should return immediately and place the request in a processing queue. When processing is complete, the user is notified. This usually means a change in user behaviour and the introduction of a notification channel such as in-site messages.
For time-consuming download requests, going asynchronous also improves the product experience. The user simply initiates a request instead of sitting in front of the browser waiting for the download.
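The flow above can be sketched roughly as follows. This is a minimal illustration, not a production design: `submit_export`, `run_export`, and the `notifications` list are hypothetical names, and an in-process `queue.Queue` stands in for whatever queue the real system uses.

```python
import queue
import threading
import uuid

tasks = queue.Queue()   # in-process queue; a real system might use Redis or an MQ
notifications = []      # stand-in for an in-site message box (assumption)

def submit_export(user, params):
    """Accept the request, enqueue it, and return a task id immediately."""
    task_id = uuid.uuid4().hex
    tasks.put((task_id, user, params))
    return task_id

def run_export(params):
    """Hypothetical slow export; returns the address of the generated file."""
    return f"/exports/{params['report']}.csv"

def worker():
    """Pull tasks off the queue, run them, and notify the user when done."""
    while True:
        item = tasks.get()
        if item is None:            # sentinel: stop the worker
            tasks.task_done()
            break
        task_id, user, params = item
        url = run_export(params)
        notifications.append((user, task_id, url))   # "your file is ready"
        tasks.task_done()

worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

task_id = submit_export("alice", {"report": "orders"})   # returns at once
tasks.join()   # only for this demo; a real caller would not wait here
```

The request handler only touches `submit_export`, so its latency no longer depends on how long the export itself takes.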
Two, write to files
A data export typically combines many paged queries, but unlike normal page display, the results are not shown one page at a time. **Do not generate the file in memory**, especially when there is a fair amount of concurrency or the result set is large.
The data is not accumulated in memory; it is appended directly to the file as it is read. After the file is generated, the system uploads it to a storage engine (such as a CDN) and returns the storage address.
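A minimal sketch of this idea, assuming a paged data source. `fetch_pages` and `upload` are hypothetical placeholders, and the returned address is made up; the point is only that each page is written out before the next one is fetched.

```python
import csv
import os
import tempfile

def fetch_pages(page_size):
    """Hypothetical paged query: yields one page of rows at a time.
    (The demo data is tiny; a real source would query the database per page.)"""
    total_rows = 10
    for start in range(0, total_rows, page_size):
        stop = min(start + page_size, total_rows)
        yield [(i, f"item-{i}") for i in range(start, stop)]

def export_to_file(path, page_size=4):
    """Append each page to the file so only one page is in memory at once."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        for page in fetch_pages(page_size):
            writer.writerows(page)
            count += len(page)
    return count

def upload(path):
    """Placeholder for the storage-engine upload; returns a made-up address."""
    return "https://files.example.com/" + os.path.basename(path)

out_path = os.path.join(tempfile.mkdtemp(), "orders.csv")
row_count = export_to_file(out_path)
address = upload(out_path)
```

Memory use is bounded by `page_size` rather than by the size of the result set.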
There are a few things to consider here.
1. Can requests with a very large time span be served by merging files? Split the range into parts, generate them separately, and then merge them. Because many downloads would otherwise recompute the same data, sharing these files avoids the repeated computation.
2. Files uploaded to the storage engine can be served from different domain names according to their category, to decouple them. This is mainly for isolation; whether it is needed depends on the situation.
3. Provide a download list page that shows each file and how long it will be kept. When users need the data, they can go straight to the download list to get it.
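Point 1 above (split, share, merge) might look like the sketch below. It assumes the range is split by day and that per-day chunk files are kept on local disk; `day_chunk`, `export_range`, and the sample row content are all hypothetical.

```python
import os
import tempfile
from datetime import date, timedelta

CHUNK_DIR = tempfile.mkdtemp()   # shared per-day chunk files
OUT_DIR = tempfile.mkdtemp()     # merged files handed to users
generated = []                   # records which days were actually computed

def day_chunk(day):
    """Return the per-day file, generating it only if it does not exist yet."""
    path = os.path.join(CHUNK_DIR, f"{day.isoformat()}.csv")
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"{day.isoformat()},42\n")   # made-up data for that day
        generated.append(day)
    return path

def export_range(start, end, out_path):
    """Merge the per-day chunks for [start, end] into one file."""
    with open(out_path, "w", encoding="utf-8") as out:
        day = start
        while day <= end:
            with open(day_chunk(day), encoding="utf-8") as chunk:
                for line in chunk:
                    out.write(line)
            day += timedelta(days=1)

first = os.path.join(OUT_DIR, "jan01-03.csv")
second = os.path.join(OUT_DIR, "jan02-05.csv")
export_range(date(2024, 1, 1), date(2024, 1, 3), first)
export_range(date(2024, 1, 2), date(2024, 1, 5), second)   # reuses Jan 2 and 3
```

The second request overlaps the first by two days, so only two new days are computed for it.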
Three, queuing
**Queuing is mainly about limiting resources.** Queuing can be global or per-machine. The simplest solution is to queue on each machine and let an external Nginx handle load balancing.
Once received, a request is placed in a buffer queue. This queue can be an in-process thread queue, although tasks in it are easily lost, or it can be a distributed queue such as Redis or an MQ. The worker process pulls tasks and executes them according to the system load. With this queue in place, we can do a lot of things.
1. Resource usage can be controlled, so that multiple large requests are not processed in parallel.
2. Prevent re-entry: a request with the same parameters and range is not processed a second time.
3. Monitor the system's download tasks closely, including their duration and errors.
4. Handle all downloads centrally, in one uniform way.
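Points 1 and 2 can be sketched on a single machine as below. This is an illustration under assumptions: `MAX_PARALLEL`, `task_key`, `submit`, and `process` are hypothetical names, duplicates are detected only while an identical request is still pending, and a distributed setup would keep the pending set in something like Redis instead of a local dictionary.

```python
import hashlib
import json
import threading

MAX_PARALLEL = 2                                   # assumed limit on concurrent exports
slots = threading.BoundedSemaphore(MAX_PARALLEL)
in_flight = {}                                     # task key -> status
lock = threading.Lock()

def task_key(params):
    """Same parameters and range give the same key, whatever the dict order."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def submit(params):
    """Return (key, accepted). A request identical to a pending one is rejected."""
    key = task_key(params)
    with lock:
        if key in in_flight:
            return key, False
        in_flight[key] = "queued"
    return key, True

def process(key, export_fn):
    """Run one export while holding a slot, then clear the pending marker."""
    with slots:                                    # at most MAX_PARALLEL run at once
        try:
            return export_fn()
        finally:
            with lock:
                in_flight.pop(key, None)

k1, ok1 = submit({"report": "orders", "from": "2024-01-01", "to": "2024-01-31"})
k2, ok2 = submit({"to": "2024-01-31", "from": "2024-01-01", "report": "orders"})
result = process(k1, lambda: "done")
k3, ok3 = submit({"report": "orders", "from": "2024-01-01", "to": "2024-01-31"})
```

Instead of rejecting a duplicate, a system could also hand back the address of the file that the first request produces; that choice depends on the product.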
Four, pre-calculation
Many downloads are predictable, which means they can be computed in advance. For example, if data is downloaded by day, the files can be generated on a schedule at night. End-of-day, end-of-month, and end-of-year data can all be handled this way.
But consider the resource footprint. If your report data is rarely accessed, generating these files in advance costs more than it gains.
**Pre-calculation usually involves a lot of computation.** Therefore, consider carefully which modules this strategy suits.
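As a rough sketch, a nightly job could decide what to generate like this. `reports_due`, `worth_precomputing`, and the threshold of 10 downloads are assumptions made for illustration; the actual scheduling (cron or similar) is left out.

```python
import calendar
from datetime import date

def reports_due(day):
    """List the periodic files a nightly job should build for the given day."""
    due = ["daily"]
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day:
        due.append("monthly")
        if day.month == 12:
            due.append("yearly")
    return due

def worth_precomputing(downloads_last_month, min_downloads=10):
    """Skip pre-calculation for reports that are rarely downloaded.
    The threshold is an arbitrary example value."""
    return downloads_last_month >= min_downloads

mid_month = reports_due(date(2024, 3, 15))
leap_day = reports_due(date(2024, 2, 29))
year_end = reports_due(date(2024, 12, 31))
```

Here `mid_month` contains only the daily file, `leap_day` adds the monthly file, and `year_end` adds the yearly one as well.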
Five, trigger-based
This approach is more elegant, but the investment is also large. The idea is to share the data at the point where it is generated in the system, via messages or open APIs.
Merchants who need it, given an account and a password or token, can then receive this metadata as a continuous stream.
What you do with it afterwards, or how you play with it, is not my concern.
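In its simplest form, the trigger-based idea is a publish/subscribe hook at the place where data is produced. The sketch below is only an in-process illustration: `VALID_TOKENS`, `subscribe`, and `on_data_generated` are hypothetical, and a real system would deliver through an MQ topic or an open API rather than direct callbacks.

```python
import hmac
from collections import defaultdict

VALID_TOKENS = {"merchant-a": "secret-a"}   # hypothetical credential store
subscribers = defaultdict(list)             # topic -> list of callbacks

def subscribe(merchant, token, topic, callback):
    """Register a merchant's callback after checking its token."""
    expected = VALID_TOKENS.get(merchant, "")
    if not expected or not hmac.compare_digest(expected, token):
        raise PermissionError("invalid account or token")
    subscribers[topic].append(callback)

def on_data_generated(topic, record):
    """Called where the data is produced; pushes the record to every subscriber."""
    for callback in subscribers[topic]:
        callback(record)

received = []
subscribe("merchant-a", "secret-a", "orders", received.append)
on_data_generated("orders", {"id": 1, "amount": 30})
on_data_generated("orders", {"id": 2, "amount": 45})
```

With this in place the system never has to run a large export for that merchant, because the merchant already holds the data.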
Six, product optimization
The design of the product directly determines the complexity and stability of the implementation, so agreement must be reached on the query conditions. You will find that even very widely used systems limit their data export functions.
For example, in the social security system, some printing functions must be booked in advance, because a single request may consume a lot of resources. This is a case of technical limitations shaping product design.
In concrete product design, we should also consider:
1. Query dimensions do not need to be exhaustive. If download conditions have a parent-child relationship and consume the same resources, provide only the parent-level download. The customer can filter in Excel after downloading.
2. The time dimension should be fixed. Cross-month ranges and arbitrary free-form ranges are strictly prohibited. This also affects how many of our programs are implemented.
3. Where a download should not be provided, hold that red line strictly. For example, if users can derive the result with a simple Excel formula, do not provide a "chicken rib" (marginally useful) function for it. Customers are not as stupid as you think they are, even if you treat them as though they were.
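Point 2 can be enforced at the API boundary. The following sketch assumes the fixed unit is one calendar month; `validate_range` is a hypothetical helper, and a product could just as well fix the unit to a day or a week.

```python
from datetime import date

def validate_range(start, end):
    """Accept only ranges that stay inside a single calendar month."""
    if end < start:
        raise ValueError("end is before start")
    if (start.year, start.month) != (end.year, end.month):
        raise ValueError("range must not cross a month boundary")
    return start, end

ok_range = validate_range(date(2024, 5, 1), date(2024, 5, 31))

try:
    validate_range(date(2024, 5, 20), date(2024, 6, 10))
    cross_month_error = None
except ValueError as exc:
    cross_month_error = str(exc)
```

Rejecting such requests early also makes per-period files easier to share between downloads, since every request maps onto the same fixed units.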
End
This approach is designed entirely for large systems, so do not force it onto your own. With two or three clients and a few thousand records, going this far is over-engineering.
Xjjdog is a WeChat official account that keeps programmers from taking detours. It focuses on infrastructure and Linux. With ten years of architecture experience and ten billion in daily traffic, it explores the world of high concurrency with you and offers a different taste. My personal WeChat is xjjdog0; feel free to add me for further discussion.