On May 16, 2018, the aggregator (warehouse version), which packs in a number of "black technologies", was officially released.
During application validation before the release, the warehouse version had already won praise from users on the strength of its performance. In their evaluation, users at Bank of Beijing said: "In data analysis practice, the long system response times caused by highly concurrent access and large-volume calculations had never been solved well. The aggregator (warehouse version) solves this problem completely! Preloading high-frequency hot data into the aggregator to build a middle layer for data calculation can fairly be called the best solution, and in many scenarios it outperforms database products worth millions."
Actual tests confirmed that the aggregator (warehouse version) does perform well. Take the performance figures as an example. The test target was high-frequency hotspot data, at 30 million rows per day. The aggregator and GreenPlum (GP) ran the same conditional query: the aggregator completed it in 2 seconds, GreenPlum in 5 seconds. The GP test environment was a 5-node cluster in which each node was a physical machine with 2 × 6-core CPUs and 96 GB of memory; the aggregator ran on a virtual machine with only 1 × 2-core CPU and 16 GB of memory. Beating GP by this margin on far smaller hardware shows the power of the black technologies.
Which black technologies deliver the performance that won this praise? Let's go through them one by one:
Black technology one: group tables
The group table, also known as the composite table, is the basic form of data storage in the aggregator (warehouse version). A group table supports modifying, updating and recovering part of its data, so hotspot data can be synchronized safely and conveniently from the full data source. Group tables support indexes, but because the data itself is stored in order, common conditional filtering does not have to rely on indexes to achieve high performance. Indexes are updated automatically after data is modified.
The tests mentioned above used columnar storage on disk. Group tables can also use row storage or hold data entirely in memory, which supports running as an in-memory database.
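To illustrate why ordered storage lets range filters run fast without an index, here is a minimal Python sketch. It is not the aggregator's implementation; the function name `range_filter` and the sample data are hypothetical. It assumes only that rows are stored sorted by the filter key, so a binary search can find the matching slice.

```python
from bisect import bisect_left, bisect_right


def range_filter(sorted_keys, rows, low, high):
    """Return the rows whose key lies in [low, high].

    Assumes sorted_keys is in ascending order and rows[i] belongs to
    sorted_keys[i]. No separate index structure is consulted: the
    ordering of the stored data is enough for a binary search.
    """
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    end = bisect_right(sorted_keys, high)    # first position with key > high
    return rows[start:end]


# Hypothetical sample data, stored in date order.
dates = ["2018-05-14", "2018-05-15", "2018-05-15", "2018-05-16", "2018-05-17"]
amounts = [10, 20, 30, 40, 50]

hits = range_filter(dates, amounts, "2018-05-15", "2018-05-16")
```

The lookup cost grows with the logarithm of the row count plus the size of the result, rather than with the full table size.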
Black technology two: parallel-capable column storage
The column storage mechanism used by group tables differs from conventional column storage. Conventional column storage (such as the Parquet format) can only be segmented by block, which limits parallel computing. Group tables use a compressed column storage mechanism that remains parallel-friendly: with the doubling segmentation technique, any number of segments can be computed in parallel, so the computing capacity of multiple CPU cores and the I/O capacity of the hard disk can be used to the full.
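The benefit of segmenting columnar data for parallel work can be sketched as follows. The article does not describe how the segmentation technique works internally, so this example swaps in plain equal-size row ranges instead; `segment_bounds`, `parallel_sum_where` and the sample columns are hypothetical names. The key assumption is that every column is cut at the same row positions, so each segment can be filtered and aggregated independently.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_bounds(row_count, segments):
    """Split [0, row_count) into contiguous row ranges of near-equal size."""
    step, extra = divmod(row_count, segments)
    bounds, start = [], 0
    for i in range(segments):
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_sum_where(columns, value_col, filter_col, predicate, segments=4):
    """Sum value_col over rows whose filter_col passes predicate.

    Each segment reads the same row range from both columns, so the
    segments share no state and can run concurrently.
    """
    def work(bound):
        lo, hi = bound
        values = columns[value_col][lo:hi]
        flags = columns[filter_col][lo:hi]
        return sum(v for v, f in zip(values, flags) if predicate(f))

    row_count = len(columns[value_col])
    # Threads keep the sketch short; CPU-bound Python code would need
    # processes to get real multi-core speedup.
    with ThreadPoolExecutor(max_workers=segments) as pool:
        return sum(pool.map(work, segment_bounds(row_count, segments)))


# Hypothetical columnar data: one list per column, aligned by row position.
columns = {
    "region": ["N", "S", "N", "E", "N", "S", "E", "N", "S", "N"],
    "amount": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
total_north = parallel_sum_where(columns, "amount", "region", lambda r: r == "N")
```

Because the number of segments is a free parameter here, it can be matched to the number of available cores, which is the property the text attributes to group tables.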
Black technology three: cluster group table
In addition to the features above, a group table can distribute its data across multiple machines to form a cluster group table. Cluster group tables use multi-machine parallelism to scale computing power and storage capacity horizontally. A cluster group table is used in essentially the same way as an ordinary group table. In other words, the cluster is transparent: users do not need to care about the details of the multi-node layout and can treat it as a single group table.
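The idea of a transparent cluster can be sketched in Python as two classes that expose the same method. This is an illustration only: `LocalTable`, `ClusterTable` and `count_where` are hypothetical names, and the "nodes" are in-process objects standing in for separate machines.

```python
class LocalTable:
    """A single-node table holding all of its rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def count_where(self, predicate):
        return sum(1 for row in self.rows if predicate(row))


class ClusterTable:
    """Same interface as LocalTable, with rows spread over several nodes."""

    def __init__(self, rows, node_count):
        rows = list(rows)
        # Round-robin distribution; each shard is itself a LocalTable.
        self.nodes = [LocalTable(rows[i::node_count]) for i in range(node_count)]

    def count_where(self, predicate):
        # Each node computes a partial result; the caller sees only the total.
        return sum(node.count_where(predicate) for node in self.nodes)


def is_large(row):
    return row["amount"] > 50


# Hypothetical sample rows with amounts 10, 20, ..., 100.
rows = [{"id": i, "amount": i * 10} for i in range(1, 11)]

local_count = LocalTable(rows).count_where(is_large)
cluster_count = ClusterTable(rows, node_count=3).count_where(is_large)
```

Calling code does not change when the table moves from one node to several, which is what "transparent" means in this context.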
Black technology four: master-child table unification
Some tables stand in a same-dimension (one-to-one) relationship or a master-child relationship. Examples: a customer table and a VIP customer table; basic user information, family information, education history and work history; orders and order details.
Master-child table unification means that same-dimension tables, or a master table and its child tables, are placed in a single group table and the primary key is stored only once. The JOIN between these tables is then no longer needed, which reduces storage space and effectively improves performance.
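A small Python sketch can make the unified layout concrete. It is not the group table's storage format; the function `unify` and the sample order data are hypothetical. It shows the two effects named above: the key is kept once per master row, and a per-order calculation needs no JOIN afterwards.

```python
def unify(masters, children, key):
    """Nest each child row under its master row, storing the key only once."""
    unified = {m[key]: {**m, "details": []} for m in masters}
    for child in children:
        # Drop the repeated key from the child row before nesting it.
        row = {k: v for k, v in child.items() if k != key}
        unified[child[key]]["details"].append(row)
    return list(unified.values())


def total_qty(order):
    """Aggregate over the nested child rows; no JOIN is involved."""
    return sum(d["qty"] for d in order["details"])


# Hypothetical master (orders) and child (order details) tables.
orders = [
    {"order_id": 1, "customer": "A"},
    {"order_id": 2, "customer": "B"},
]
details = [
    {"order_id": 1, "item": "pen", "qty": 2},
    {"order_id": 1, "item": "ink", "qty": 1},
    {"order_id": 2, "item": "pad", "qty": 5},
]

unified_orders = unify(orders, details, "order_id")
qty_by_order = {o["order_id"]: total_qty(o) for o in unified_orders}
```

The matching work is done once, when the data is loaded, instead of on every query.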
Black technology five: serial number key technology
JOINs over foreign key relationships are also common. For example, a sales record table is associated with an item list through an item number. With the serial number key technology, the item number in each sales record is replaced by an integer, and that integer is the serial number (position) of the corresponding item in the item list.
The serial number key technology lets a foreign key JOIN locate the target record directly by its serial number, so HASH values no longer have to be calculated and compared, which cuts calculation time and improves performance. It also makes it easy to perform JOINs on multiple foreign keys in parallel.
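The serial number key idea can be sketched in a few lines of Python. The names `to_serial_keys`, `revenue` and the sample tables are hypothetical, and the sketch uses 0-based positions because that is how Python lists are indexed. The conversion step still uses a lookup table once, during preparation; the JOIN itself then becomes plain positional access.

```python
def to_serial_keys(sales, items):
    """Replace each item number with the item's position in the item list."""
    position = {item["item_no"]: i for i, item in enumerate(items)}
    return [{"item_idx": position[s["item_no"]], "qty": s["qty"]} for s in sales]


def revenue(serial_sales, items):
    """Join by position: items[idx] needs no hashing or key comparison."""
    return sum(items[s["item_idx"]]["price"] * s["qty"] for s in serial_sales)


# Hypothetical dimension table (items) and fact table (sales).
items = [
    {"item_no": "A-100", "price": 2.5},
    {"item_no": "B-200", "price": 4.0},
    {"item_no": "C-300", "price": 1.0},
]
sales = [
    {"item_no": "B-200", "qty": 3},
    {"item_no": "A-100", "qty": 2},
    {"item_no": "B-200", "qty": 1},
]

serial_sales = to_serial_keys(sales, items)   # one-time preparation
total = revenue(serial_sales, items)          # repeated at query time
```

Since the conversion is paid once and the positional lookup is paid on every query, the approach suits data that is loaded once and queried many times.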
Black technology six: JDBC intelligent gateway
The aggregator exposes a JDBC driver and a simple SQL interface, and it has a programmable gateway mechanism. By writing code in SPL, the new-generation programming language built into the aggregator, users can freely implement calculation rules for high-frequency hotspot data.
A calculation rule might work like this: parse the date parameters in the filter conditions of the SQL passed in by the front end. If they hit a date that the aggregator (warehouse version) has already cached, the data is treated as hot data and accessed directly. If there is no hit, the SQL is forwarded to the traditional back-end database for execution. The aggregator can also record access conditions and analyze how hotspot data is distributed over time and space.
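The routing rule described above could be sketched as follows. In the product such a rule would be written in SPL; this Python version is only an illustration, and the function `route`, the column name `trade_date`, the regular expression and the sample SQL are all assumptions. It handles only simple equality filters on a quoted `YYYY-MM-DD` date.

```python
import re
from collections import Counter
from datetime import date

# Matches filters such as: trade_date = '2018-05-16'
DATE_FILTER = re.compile(r"(\w+)\s*=\s*'(\d{4})-(\d{2})-(\d{2})'")

# Records how often each date is requested, for later hotspot analysis.
access_log = Counter()


def route(sql, cached_dates, date_column="trade_date"):
    """Return "cache" if the SQL filters on a cached date, else "database"."""
    for column, year, month, day in DATE_FILTER.findall(sql):
        if column.lower() == date_column:
            wanted = date(int(year), int(month), int(day))
            access_log[wanted] += 1
            return "cache" if wanted in cached_dates else "database"
    # No recognizable date filter: fall back to the back-end database.
    return "database"


# Hypothetical set of dates currently preloaded as hot data.
cached = {date(2018, 5, 16)}

hot = route("SELECT * FROM trades WHERE trade_date = '2018-05-16'", cached)
cold = route("SELECT * FROM trades WHERE trade_date = '2018-01-02'", cached)
unfiltered = route("SELECT count(*) FROM trades", cached)
```

The contents of `access_log` could later be used to decide which dates are worth preloading.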
Armed with these black technologies, the aggregator (warehouse version) can already compete with traditional databases, in-memory databases and other expensive products, yet its price is very reasonable. You can expect this product to bring new value and opportunities to your software projects. A fully functional trial version is available now on the rungan official website. Download it and try it out!