“In recent years, emerging Internet services and traditional industries such as telecommunications, finance, and transportation have seen explosive growth in data assets, most of which are unstructured or semi-structured. Storing and processing data at the PB or even EB scale at low cost and with high efficiency has become a major challenge.” (Excerpt from the big data book HBase Authoritative Guide)

As data volumes grow and enterprises keep expanding, business systems become more diverse, and our ability to process and analyze data from many different sources is being put to a severe test. Distributed systems also place extremely high demands on data consistency, while the trend of moving large volumes of business data to the cloud has repeatedly proven its convenience. In practice, however, only about 20% of data warehouse work goes into data mining; most of the remaining time is spent on data synchronization, task scheduling, data cleaning, and similar tasks needed to produce usable data. Against this background, this article introduces the main new features of CloudQuery 1.4 and our plans for future iterations.

Introducing the “DTS” concept: integrated data tools

As mentioned above, much of the time in data warehouse work goes into data processing. CloudQuery aims to solve the problems users run into when manipulating data. Before version 1.4, we focused on improving the editor experience and tackling its pain points. As the data operation workbench matured, we found that operating on data through SQL (handwritten SQL, visual data query and editing, or terminal operations) is only part of the work of database professionals. More often, they need to work with data in bulk, for example batch data import and data migration, and for these tasks SQL statements or a terminal fall far short of what users expect.

Therefore, CloudQuery 1.4 introduces the concept of “DTS”, short for “database toolbox services”. Import/Export, a sub-module of DTS, makes data transfer easier for roles such as DBAs and O&M engineers.

In version 1.4.0, we first introduced dump-format export for MySQL and Oracle databases. Later, we will support data tools specific to each data source, such as dump-based import and export for PostgreSQL. Database tools are also covered by permission control: users must be authorized by an administrator before they can use them.
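CloudQuery's own implementation of dump export is not described here. As a minimal sketch, assuming the MySQL export wraps the native mysqldump client and that administrator grants are stored as a simple per-user set of tool names (the ToolGrants class and the "mysql_dump_export" tool name are hypothetical), the permission check and command construction could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class ToolGrants:
    """Hypothetical store of which database tools an administrator has granted to each user."""
    granted: dict[str, set[str]] = field(default_factory=dict)

    def allows(self, user: str, tool: str) -> bool:
        return tool in self.granted.get(user, set())


def mysqldump_command(grants: ToolGrants, user: str, host: str, port: int,
                      db_user: str, database: str) -> list[str]:
    """Build a mysqldump argument list, but only for users granted the export tool."""
    if not grants.allows(user, "mysql_dump_export"):
        raise PermissionError(f"{user} is not authorized to use mysql_dump_export")
    return [
        "mysqldump",
        f"--host={host}",
        f"--port={port}",
        f"--user={db_user}",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--routines",            # also dump stored procedures and functions
        database,
    ]
```

The returned list can be passed to subprocess.run(cmd, stdout=out_file, check=True). The password is deliberately kept off the command line, since mysqldump can read it from an option file or prompt for it.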

In addition, “Data migration” will join the “CloudQuery DTS” family as a new tool during the 1.4 iteration. When designing the data migration module, we took into account that today's distributed business systems rarely store their data in a single data source. We therefore support both homologous migration (between data sources of the same type) and heterologous migration (between data sources of different types), and, depending on how similar the source and target database structures are, both homogeneous and heterogeneous migration. The resulting migration options are more diverse, as follows:

  • Data volume
    • Full
    • Incremental
  • Migration timing
    • Real-time
    • Scheduled
  • Source and target
    • Homologous (same type of data source)
    • Heterologous (different types of data source)
      • Relational to relational
      • Relational to non-relational
      • Relational/non-relational to big data/data warehouse
  • Database structure
    • Homogeneous
    • Heterogeneous
  • Selective migration
    • Horizontal partitioning
    • Vertical partitioning
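To make the full/incremental and horizontal/vertical options above concrete, here is a minimal sketch over in-memory rows. It is not CloudQuery's migration engine: the function names are hypothetical, and the incremental step assumes a monotonically increasing "updated_at"-style watermark column, which is only one common approach (real-time migration would more typically read a change log such as the MySQL binlog).

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]  # one record, keyed by column name


def horizontal_partition(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Horizontal partitioning: migrate only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_partition(rows: Iterable[Row], columns: list[str]) -> list[Row]:
    """Vertical partitioning: migrate only the listed columns of each row."""
    return [{col: row[col] for col in columns} for row in rows]


def incremental_batch(rows: Iterable[Row], watermark_column: str,
                      last_watermark: Any) -> tuple[list[Row], Any]:
    """Incremental migration: pick rows changed since the last run.

    Returns the changed rows and the new watermark to store for the next run.
    """
    changed = [row for row in rows if row[watermark_column] > last_watermark]
    new_watermark = max((row[watermark_column] for row in changed), default=last_watermark)
    return changed, new_watermark


# Hypothetical source table used only for illustration.
SOURCE = [
    {"id": 1, "region": "east", "name": "alice", "updated_at": 10},
    {"id": 2, "region": "west", "name": "bob", "updated_at": 20},
    {"id": 3, "region": "east", "name": "carol", "updated_at": 30},
]

full_copy = list(SOURCE)  # full migration: every row, every column
east_only = horizontal_partition(SOURCE, lambda row: row["region"] == "east")
east_names = vertical_partition(east_only, ["id", "name"])
delta, watermark = incremental_batch(SOURCE, "updated_at", last_watermark=15)
```

Here east_names keeps rows 1 and 3 with only their id and name columns, and the incremental run picks up rows 2 and 3 and advances the watermark to 30, so the next run starts from there.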

Because data migration is relatively complex and broad in scope, we will keep improving it throughout the 1.4 iteration. We welcome your suggestions as you use it, and we will adjust the feature after careful analysis.

The “Visualization” module: self-service data operations

While “DTS” works close to the underlying data in the database, “Visualization” is closer to business scenarios. Visualization is the second major feature introduced in CloudQuery 1.4 and consists of two modules: “Visualization Aided Query” and “ER Modeling”.

Visualization aided query

Not every user in an enterprise can write SQL fluently. Users without a SQL background who still need to query data need visual assistance.

CloudQuery 1.4.0 adds the “Visualization Aided Query” function, which lets users operate on data graphically. Adding query, filter, and sort conditions becomes easy to understand, even for users who know nothing about SQL or databases.

We will also let users save a query canvas, or save the SQL statement generated from it, so that they can retrieve the results directly later.
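Generating SQL from a query canvas can be sketched as translating a small model of the user's choices into a parameterized SELECT. This is a minimal sketch, not CloudQuery's generator: the VisualQuery and Condition classes and the to_sql function are hypothetical, conditions are assumed to be combined with AND, and names are restricted to plain identifiers so that user input cannot inject SQL.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Only plain identifiers are accepted, so user-chosen names can't inject SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_OPERATORS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}


def _ident(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


@dataclass
class Condition:
    column: str
    op: str
    value: Any


@dataclass
class VisualQuery:
    """Hypothetical model of what a user builds on the query canvas."""
    table: str
    columns: list[str] = field(default_factory=list)  # empty means all columns
    conditions: list[Condition] = field(default_factory=list)  # combined with AND
    order_by: list[tuple[str, bool]] = field(default_factory=list)  # (column, ascending)
    limit: Optional[int] = None


def to_sql(query: VisualQuery) -> tuple[str, list[Any]]:
    """Turn the canvas model into a parameterized SELECT statement."""
    columns = ", ".join(_ident(c) for c in query.columns) or "*"
    sql = f"SELECT {columns} FROM {_ident(query.table)}"
    params: list[Any] = []
    if query.conditions:
        clauses = []
        for cond in query.conditions:
            op = cond.op.upper()
            if op not in _OPERATORS:
                raise ValueError(f"unsupported operator: {cond.op!r}")
            clauses.append(f"{_ident(cond.column)} {op} ?")
            params.append(cond.value)  # values travel as bind parameters, never inline
        sql += " WHERE " + " AND ".join(clauses)
    if query.order_by:
        keys = ", ".join(f"{_ident(c)} {'ASC' if asc else 'DESC'}" for c, asc in query.order_by)
        sql += " ORDER BY " + keys
    if query.limit is not None:
        sql += f" LIMIT {int(query.limit)}"
    return sql, params
```

For example, a canvas that selects id and amount from orders where status is "paid" and amount exceeds 100, sorted by amount descending with a limit of 10, produces SELECT id, amount FROM orders WHERE status = ? AND amount > ? ORDER BY amount DESC LIMIT 10 with parameters ["paid", 100]. Saving either the model or the generated SQL covers both save options described above. A real generator would also need per-dialect handling, for instance Oracle's FETCH FIRST n ROWS ONLY instead of LIMIT and driver-specific placeholder styles.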

ER modeling

“ER modeling” targets more advanced users. It renders the relationships between the tables in a database as an ER diagram, making primary keys, foreign keys, and constraints more intuitive.

The ER diagram canvas also supports changing table structures, such as adding, designing, and deleting tables and adding constraints. The canvas can be exported as an image, which makes it easier for DBAs to map out the relationships between database objects and share them across the business.
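The relationships an ER diagram shows come from foreign-key metadata in the database catalog. As a self-contained sketch, the example below reads that metadata from an in-memory SQLite database via its pragma_foreign_key_list table-valued function and emits Mermaid erDiagram text. SQLite and Mermaid are stand-ins chosen for illustration, not what CloudQuery uses; on MySQL the equivalent metadata is in information_schema.KEY_COLUMN_USAGE (REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME). The function names and demo schema are hypothetical.

```python
import sqlite3


def foreign_keys(conn: sqlite3.Connection) -> list[tuple[str, str, str, str]]:
    """Return (child_table, child_column, parent_table, parent_column) for every foreign key."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    edges = []
    for child in tables:
        # pragma_foreign_key_list is SQLite's table-valued form of PRAGMA foreign_key_list.
        rows = conn.execute(
            'SELECT "table", "from", "to" FROM pragma_foreign_key_list(?) ORDER BY id, seq',
            (child,))
        for parent, child_col, parent_col in rows:
            edges.append((child, child_col, parent, parent_col))
    return edges


def to_mermaid(edges: list[tuple[str, str, str, str]]) -> str:
    """Render foreign keys as Mermaid erDiagram relationships."""
    lines = ["erDiagram"]
    for child, child_col, parent, _parent_col in edges:
        # "||--o{" means: exactly one parent row, zero or more child rows.
        lines.append(f'    {parent} ||--o{{ {child} : "{child_col}"')
    return "\n".join(lines)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical three-table schema used only for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
        CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                                  order_id INTEGER REFERENCES orders(id),
                                  product TEXT);
    """)
    return conn
```

Calling to_mermaid(foreign_keys(demo_connection())) yields one relationship line per foreign key (orders to order_items, customers to orders), which a renderer can then lay out and export as an image.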

Expanded data source support covering all types of databases

CloudQuery is a unified entry point to databases, so data source support is its most fundamental function. As a community-focused product, CloudQuery also treats user needs as critical. During the 1.3 iteration, we kept collecting suggestions from community users. After evaluating them, we will add the following data sources in the 1.4 iteration:

  • Hive
  • Elasticsearch (ES)
  • DB2
  • PolarDB
  • OceanBase

While expanding the range of supported data sources, CloudQuery does not ignore the characteristics of each one: a future release will optimize auto-completion in the data operation area and the presentation of result sets for each source.

OpenAPI, leveraging the power of the community

CloudQuery keeps growing with its community, but there is a limit to what we can do on our own. We will therefore keep opening up our API as we iterate on features, making it easier for developers to integrate third-party applications, organizational structures, and other resources within the enterprise.

We will start by opening part of the interfaces of the “user” module. Before these interfaces can be called, a system administrator must activate developer status for the specified user in the platform's internal developer center. Once activated, the user receives an appID and a corresponding secret, which serve as the credentials for authenticating and calling the API.
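The article does not specify how the appID and secret are used on the wire. One common pattern for appID/secret credentials is HMAC request signing, sketched below purely as an illustration: the header names (X-App-Id, X-Timestamp, X-Signature), the string-to-sign layout, and the 300-second replay window are all hypothetical assumptions, not CloudQuery's actual OpenAPI contract.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # hypothetical replay window


def _string_to_sign(app_id: str, timestamp: int, method: str, path: str, body: bytes) -> str:
    # Hypothetical layout: app id, timestamp, method, path and a hash of the body.
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([app_id, str(timestamp), method.upper(), path, body_hash])


def sign_request(app_id: str, secret: str, method: str, path: str,
                 body: bytes = b"", timestamp: Optional[int] = None) -> dict[str, str]:
    """Client side: compute headers for a request signed with the app secret."""
    ts = int(time.time()) if timestamp is None else timestamp
    message = _string_to_sign(app_id, ts, method, path, body)
    signature = hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()
    return {"X-App-Id": app_id, "X-Timestamp": str(ts), "X-Signature": signature}


def verify_request(headers: dict[str, str], secrets: dict[str, str], method: str,
                   path: str, body: bytes = b"", now: Optional[int] = None) -> bool:
    """Server side: look up the secret by appID and recompute the signature."""
    app_id = headers.get("X-App-Id", "")
    secret = secrets.get(app_id)
    if secret is None:
        return False  # unknown or not-yet-activated developer
    try:
        ts = int(headers.get("X-Timestamp", ""))
    except ValueError:
        return False
    now = int(time.time()) if now is None else now
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False  # stale request, possibly replayed
    expected = sign_request(app_id, secret, method, path, body, ts)["X-Signature"]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), headers.get("X-Signature", "").encode())
```

Signing rather than sending the secret itself means the secret never travels with the request, and the timestamp check limits how long a captured request could be replayed.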

Conclusion

That covers the overall functionality and iteration plan for the upcoming CloudQuery 1.4 release. We will publish a series of articles on the new features of version 1.4, detailing each feature and the technology behind it to give you a better understanding of CloudQuery's architecture. We will also keep improving our core capabilities to bring community users a more convenient and faster data operation and interaction experience.

Official website: cloudquery.club/