DolphinDB provides two ways to import MySQL data: the ODBC plug-in and the MySQL plug-in. We recommend the MySQL plug-in: it is faster (importing a 6.5 GB data set is about 4 times faster than with ODBC), and it requires no configuration, whereas ODBC requires a data source to be configured.

Before using the MySQL plug-in, refer to the DolphinDB installation guide to install DolphinDB.

1. Download the plug-in

The /server/plugins/mysql directory of the DolphinDB installation already contains the MySQL plug-in, which can be used directly. If you need to compile it yourself, refer to github.com/dolphindb/D… .

2. Load the plug-in

In the GUI, use the loadPlugin function to load the MySQL plugin:

loadPlugin(server_dir+"/plugins/mysql/PluginMySQL.txt")

3. Interface functions

The DolphinDB MySQL plugin provides the following interface functions:

  • connect
  • showTables
  • extractSchema
  • load
  • loadEx

We can call the plug-in’s interface functions in two ways:

(1) moduleName::functionName. For example, call the connect function of the MySQL plug-in:

mysql::connect(host, port, user, password, db)

(2) Import the module with the use statement and then call the interface functions directly. The use statement only needs to be executed once in a session; subsequent calls to the interface functions do not require it. We generally recommend this method.

use mysql
connect(host, port, user, password, db)

3.1 connect

Syntax

connect(host, port, user, password, db)

Parameters

host is the host name of the MySQL server.

port is the port number of the MySQL server. The default value is 3306.

user is the user name on the MySQL server.

password is the password corresponding to user.

db is the name of the database in MySQL.

Details

Creates a MySQL connection and returns a connection handle. We recommend that the authentication type of the MySQL user be mysql_native_password.

Example

Connect to the employees database on the local MySQL server.

conn = connect("127.0.0.1", 3306, "root", "123456", "employees")

3.2 showTables

Syntax

showTables(connection)

Parameters

connection is the connection handle returned by the connect function.

Details

Returns a DolphinDB table containing the names of all the tables in the MySQL database.

Example

View the tables in the employees database.

showTables(conn);

Tables_in_employees
current_dept_emp
departments
dept_emp
dept_emp_latest_date
dept_manager
employees
salaries
test_datatypes
titles

3.3 extractSchema

Syntax

extractSchema(connection, tableName)

Parameters

connection is the connection handle returned by the connect function.

tableName is the name of the table in the MySQL database.

Details

The result is a DolphinDB table. The first column is the names of fields in the MySQL table, the second column is the data types imported to DolphinDB, and the third column is the data types in MySQL.

Example

View the data types for each column in the employees table.

extractSchema(conn,`employees);

name	        type	  MySQL describe type	
emp_no	        LONG	  int(11)	                
birth_date	DATE	  date	                
first_name	STRING	  varchar(14)	        
last_name	STRING	  varchar(16)	        
gender	        SYMBOL	  enum('M','F')	        
hire_date	DATE	  date	               

3.4 load

Syntax

load(connection, table|query, [schema], [startRow], [rowNum])

Parameters

connection is the connection handle returned by the connect function.

table is the name of the table in the MySQL server.

query is a query statement in MySQL.

schema is a table containing two columns: the first column is field names and the second is data types. It is optional. You can use this parameter to change the data types of the columns when they are loaded into DolphinDB.

startRow is a non-negative integer indicating the row from which data is read. It is optional and defaults to 0, meaning data is read from the first record.

rowNum is a positive integer indicating the number of rows to read. It is optional; if not specified, all rows are read. If the second argument is a query, startRow and rowNum have no effect.

Details

Loads MySQL data into a DolphinDB in-memory table.

Examples

1. Load all the data in the employees table into a DolphinDB in-memory table.

t=load(conn,"employees");

emp_no	birth_date	first_name	last_name	gender	hire_date
10,001	1953.09.02	Georgi	        Facello	        M	1986.06.26
10,002	1964.06.02	Bezalel	        Simmel	        F	1985.11.21
10,003	1959.12.03	Parto	        Bamford	        M	1986.08.28
10,004	1954.05.01	Chirstian	Koblick	        M	1986.12.01
10,005	1955.01.21	Kyoichi	        Maliniak	M	1989.09.12
10,006	1953.04.20	Anneke	        Preusig	        F	1989.06.02
10,007	1957.05.23	Tzvetan	        Zielinski	F	1989.02.10
10,008	1958.02.19	Saniya	        Kalloufi	M	1994.09.15
10,009	1952.04.19	Sumant	        Peac	        F	1985.02.18
10,010	1963.06.01	Duangkaew	Piveteau	F	1989.08.24
…

2. Load the first 10 rows of the employees table into a DolphinDB in-memory table.

t=load(conn,"select * from employees limit 10");

emp_no	birth_date	first_name	last_name	gender	hire_date
10,001	1953.09.02	Georgi	        Facello	        M	1986.06.26
10,002	1964.06.02	Bezalel	        Simmel	        F	1985.11.21
10,003	1959.12.03	Parto	        Bamford	        M	1986.08.28
10,004	1954.05.01	Chirstian	Koblick	        M	1986.12.01
10,005	1955.01.21	Kyoichi	        Maliniak	M	1989.09.12
10,006	1953.04.20	Anneke	        Preusig	        F	1989.06.02
10,007	1957.05.23	Tzvetan	        Zielinski	F	1989.02.10
10,008	1958.02.19	Saniya	        Kalloufi	M	1994.09.15
10,009	1952.04.19	Sumant	        Peac	        F	1985.02.18
10,010	1963.06.01	Duangkaew	Piveteau	F	1989.08.24

3. Load the employees table, changing the type of the last_name column to SYMBOL.

schema=select name,type from extractSchema(conn,`employees)
update schema set type="SYMBOL" where name="last_name"
t=load(conn,"employees",schema)

chunkPath->
partitionColumnIndex->-1
colDefs->
name       typeString typeInt
---------- ---------- -------
emp_no     LONG       5
birth_date DATE       6
first_name STRING     18
last_name  SYMBOL     17
gender     SYMBOL     17
hire_date  DATE       6
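4. Read a slice of the employees table using startRow and rowNum (a sketch added for illustration; the row offsets below are arbitrary).

// Skip the first 100 rows and read the next 1,000 rows
// (startRow and rowNum take effect only when a table name, not a query, is passed)
t=load(conn,"employees",,100,1000)
select count(*) from t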

3.5 loadEx

Syntax

loadEx(connection, dbHandle, tableName, partitionColumns, table|query, [schema], [startRow], [rowNum])

Parameters

connection is the connection handle returned by the connect function.

dbHandle is a DolphinDB database handle, usually an object returned by the database function.

tableName is the name of the table in the DolphinDB database.

partitionColumns is a string scalar or vector indicating the partitioning column(s).

table is a string indicating the name of a table in the MySQL server.

query is a query statement in MySQL.

schema is a table containing two columns: the first column is field names and the second is data types. It is optional. You can use this parameter to change the data types of the columns when they are loaded into DolphinDB.

startRow is a non-negative integer indicating the row from which data is read. It is optional and defaults to 0, meaning data is read from the first record.

rowNum is a positive integer indicating the number of rows to read. It is optional; if not specified, all rows are read. If a query is passed instead of a table name, startRow and rowNum have no effect.

Details

Loads data from MySQL into a DolphinDB partitioned table. loadEx does not support loading data into sequentially (SEQ) partitioned DolphinDB databases.

Example

Load the employees table into a DolphinDB VALUE-partitioned table on disk.

db=database("H:/DolphinDB/Data/mysql",VALUE,`F`M)
pt=loadEx(conn,db,"pt","gender","employees")
select count(*) from loadTable(db,"pt");

count
------
300024

To load data into an in-memory partitioned table, simply use an empty string as the database path. To load data into a distributed (DFS) table, use a path starting with "dfs://", for example "dfs://mysql"; distributed tables can only be used in a cluster. For details about cluster deployment, see Single-Server Cluster Deployment and Multi-Server Cluster Deployment. Both cases are sketched below.
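A minimal sketch of both cases, reusing the conn handle and the employees table from the examples above (the dfs://mysql path is illustrative and assumes a running cluster):

// In-memory partitioned table: the database path is an empty string
memDB=database("",VALUE,`F`M)
memPT=loadEx(conn,memDB,"pt","gender","employees")

// Distributed (DFS) table: the path starts with dfs:// and requires a cluster
dfsDB=database("dfs://mysql",VALUE,`F`M)
dfsPT=loadEx(conn,dfsDB,"pt","gender","employees")
select count(*) from loadTable(dfsDB,"pt")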

4. Data type conversion

When you import MySQL data into DolphinDB with the MySQL plug-in, data types are converted automatically. The specific conversion rules are as follows:

Notes:

(1) DolphinDB integer types (SHORT, INT, LONG) are all signed. To prevent overflow, unsigned types in MySQL are converted to higher-order signed types: for example, unsigned TINYINT is converted to SHORT, unsigned SMALLINT is converted to INT, and so on. Currently, the MySQL plug-in does not support conversion of 64-bit unsigned types.

(2) In DolphinDB, the minimum value of each integer type is reserved for NULL: -128 for CHAR, -32,768 for SHORT, -2,147,483,648 for INT, and -9,223,372,036,854,775,808 for LONG.

(3) Unsigned BIGINT in MySQL is converted to the DolphinDB LONG type by default. If overflow occurs, use the schema parameter to specify the type as DOUBLE or FLOAT, as in the sketch below.
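A minimal sketch assuming a hypothetical MySQL table metrics with an unsigned BIGINT column counter whose values may overflow LONG:

// metrics and counter are hypothetical names used only for illustration
schema=select name,type from extractSchema(conn,`metrics)
update schema set type="DOUBLE" where name="counter"  // load this column as DOUBLE instead of LONG
t=load(conn,"metrics",schema)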

(4) The CHAR and VARCHAR types in MySQL are converted to the DolphinDB SYMBOL type if their length is no greater than 10, and to the STRING type otherwise. The SYMBOL type is stored internally as integers, which makes sorting and comparison more efficient and saves storage space, but mapping strings to integers takes time and the mapping table occupies memory. A column should use the SYMBOL type when its values are heavily duplicated: stock symbols, exchange and contract codes in financial data, and device IDs in IoT data are typical cases.

5. Performance test

We tested performance on an ordinary PC (16 GB of RAM, 4 cores, 8 threads, SSD). The data set is daily quote data of the U.S. stock market from 1990 to 2016, about 6.5 GB with 22 fields and 50,591,907 records, occupying 7.2 GB on disk in the MySQL database. Importing the data from MySQL into a DolphinDB partitioned database with the loadEx function took 160.5 seconds, a read speed of 41.4 MB/s, and the data occupied 1.3 GB on disk in DolphinDB. On the same PC, importing with ODBC in a single pass exhausted MySQL's memory, so the data had to be imported in batches of 1 million rows, taking 660 seconds in total. Importing the same data into ClickHouse took 171.9 seconds, a read speed of 37.8 MB/s; moreover, DolphinDB is more convenient than ClickHouse for processing time-series data and managing partitions.

If you want high performance together with friendly support for time-series data processing and distributed databases, DolphinDB is a good choice.