DolphinDB provides two ways to import MySQL data: the ODBC plug-in and the MySQL plug-in. We recommend using the MySQL plug-in to import MySQL data because it is faster than ODBC import, importing 6.5 GIGABytes of data is 4 times faster than ODBC, and using the MySQL plug-in requires no configuration, whereas ODBC requires configuration of the data source.
Before using the MySQL plug-in, refer to the DolphinDB installation guide to install DolphinDB.
1. Download the plug-in
DolphinDB installation directory /server/plugins/mysql already contains mysql plug-ins that can be used directly. If you need to compile your own, you can refer to github.com/dolphindb/D… .
2. Load the plug-in
In the GUI, use the loadPlugin function to load the MySQL plugin:
loadPlugin(server_dir+"/plugins/mysql/PluginMySQL.txt")
Copy the code
Interface functions
The DolphinDB MySQL plugin provides the following interface functions:
- connect
- showTables
- extractSchema
- load
- loadEx
We can call the plug-in’s interface functions in two ways:
(1) moduleName::apiFunction. For example, call the connect method of the MySQL plug-in.
mysql::connect(host, port, user, password, db)
Copy the code
(2) Use moduleName and call the interface function directly. After the use statement is executed once, the use function does not need to be executed again in subsequent calls to the interface function. Therefore, we generally recommend this call method.
use mysql
connect(host, port, user, password, db)
Copy the code
3.1 the connect
grammar
connect(host, port, user, password, db)
parameter
Host is the host name of the MySQL server.
Port is the port number of the MySQL server. The default value is 3306.
User is the user name in the MySQL server.
Password is the password corresponding to user.
Db is the name of the database in MySQL.
details
Create MySQL connection, return MySQL connection handle. We recommend that the Authentication Type for MySQL users be mysql_native_password.
example
Connect to the Employees database on the local MySQL server.
Conn = connect (" 127.0.0.1 ", 3306, the "root", "123456", "employees")Copy the code
3.2 showTables
grammar
showTables(connection)
parameter
Connection is the connection handle returned by the connect function.
details
Returns a DolphinDB table containing the names of all the tables in the MySQL database.
example
View the tables in the Employees database.
showTables(conn);
Tables_in_employees
current_dept_emp
departments
dept_emp
dept_emp_latest_date
dept_manager
employees
salaries
test_datatypes
titles
Copy the code
3.2 extractSchema
grammar
extractSchema(connection, tableName)
parameter
Connection is the connection handle returned by the connect function.
TableName is the name of the table in the MySQL database.
details
The result is a DolphinDB table. The first column is the names of fields in the MySQL table, the second column is the data types imported to DolphinDB, and the third column is the data types in MySQL.
example
View the data types for each column in the Employees table.
extractSchema(conn,`employees);
name type MySQL describe type
emp_no LONG int(11)
birth_date DATE date
first_name STRING varchar(14)
last_name STRING varchar(16)
gender SYMBOL enum('M','F')
hire_date DATE date
Copy the code
3.3 the load
grammar
load(connection, table|query, [schema], [startRow], [rowNum])
parameter
Connection is the connection handle returned by the connect function.
Table is the name of the table in the MySQL server.
Query is a query statement in MySQL.
Schema is a DolphinDB type table that contains two columns, the first for field names and the second for data types. It is optional. You can specify this parameter to change the data type of DolphinDB when it is loaded.
StartRow is a positive integer that indicates the starting number of rows to read data. It is optional and defaults to 0, indicating that data is read from the first record.
RowNum is a positive integer that represents the number of rows read. It is optional and, if not specified, reads all data. If the second argument is query, the startRow and rowNum arguments are invalid.
details
Load MySQL data into DolphinDB memory tables.
example
-
Load all the data in the employees table into the DolphinDB memory table.
t=load(conn,”employees”);
Emp_no birth_date first_name last_name Gender hire_date 10,001 1953.09.02 Georgi Facello M 1986.06.26 10,002 1964.06.02 Bezalel Simmel F 1985.11.21 10 003 1959.12.03 Parto Bamford M 1986.08.28 10 004 1954.05.01 Chirstian Koblick M 1986.12.01 10 005 1955.01.21 Kyoichi Maliniak M 1989.09.12 10 006 1953.04.20 Anneke Preusig F 1989.06.02 10 007 1957.05.23 Tzvetan Zielinski F 1989.02.10 10 008 1958.02.19 Saniya Kalloufi M 1994.09.15 10 009 1952.04.19 Sumant Peac F 1985.02.18 10 010 1963.06.01 Duangkaew Piveteau F 1989.08.24…
2. Load the first 10 rows from the employees table into the DolphinDB memory table.
t=load(conn,"select * from employees limit 10");
emp_no birth_date first_name last_name gender hire_date
10,001 1953.09.02 Georgi Facello M 1986.06.26
10,002 1964.06.02 Bezalel Simmel F 1985.11.21
10,003 1959.12.03 Parto Bamford M 1986.08.28
10,004 1954.05.01 Chirstian Koblick M 1986.12.01
10,005 1955.01.21 Kyoichi Maliniak M 1989.09.12
10,006 1953.04.20 Anneke Preusig F 1989.06.02
10,007 1957.05.23 Tzvetan Zielinski F 1989.02.10
10,008 1958.02.19 Saniya Kalloufi M 1994.09.15
10,009 1952.04.19 Sumant Peac F 1985.02.18
10,010 1963.06.01 Duangkaew Piveteau F 1989.08.24
Copy the code
Select last_name from last_name and change it to SYMBOL.
schema=select name,type from extractSchema(conn,`employees) update schema set type="SYMBOL" where name="last_name" T =load(conn,"employees",schema) chunkPath-> partitionColumnIndex->-1 colDefs-> name typeString typeInt ---------- ---------- ------- emp_no LONG 5 birth_date DATE 6 first_name STRING 18 last_name SYMBOL 18 gender SYMBOL 17 hire_date DATE 6Copy the code
3.4 loadEx
grammar
loadEx(connection, dbHandle, tableName, partitionColumns, table|query, [schema], [startRow], [rowNum])
parameter
Connection is the connection handle returned by the connect function.
A dbHandle is a DolphinDB database handle, usually an object returned by the Database function.
TableName is the name of a table in the DolphinDB database.
PartitionColumns are string scalars or vectors that represent partitioned columns.
Table is a string that represents the name of a table in the MySQL server.
Query is a query statement in MySQL.
Schema is a DolphinDB type table that contains two columns, the first for field names and the second for data types. It is optional. You can specify this parameter to change the data type of DolphinDB when it is loaded.
StartRow is a positive integer that indicates the starting number of rows to read data. It is optional and defaults to 0, indicating that data is read from the first record.
RowNum is a positive integer that represents the number of rows read. It is optional and, if not specified, reads all data. If the second argument is query, the startRow and rowNum arguments are invalid.
details
Load data from MySQL into DolphinDB’s partition table. LoadEx does not support loading data into DolphinDB’s sequential partition table.
example
Load the employees table into the DolphinDB disk VALUE partition table.
db=database("H:/DolphinDB/Data/mysql",VALUE,`F`M) pt=loadEx(conn,db,"pt","gender","employees") select count(*) from loadTable(db,"pt"); The count of 300024Copy the code
If you need to load data into a memory partition table, simply change the database path to an empty string. If you need to load data into distributed tables, change the database path to a path starting with “DFS ://”, for example, “DFS ://mysql”. Distributed tables need to be in a cluster to be used. For details about cluster deployment, see Single-Server Cluster Deployment and Multi-Server Cluster Deployment.
4. Data type conversion
When you import DolphinDB data using the MySQL plugin, type conversions are performed. The specific conversion rules are as follows:
Description:
(1) DolphinDB integers (SHORT, INT, LONG) are all signed. To prevent overflow, unsigned types in MySQL are converted to higher-order signed types. For example, unsigned tinyint is converted to short, unsigned Smallint is converted to short, and so on. Currently, the MySQL plug-in does not support 64-bit unsigned type conversions.
(2) in DolphinDB, integer, the minimum value is NULL, the CHAR type – 128, SHORT – 32768 type, type INT – 2147483648, LONG – 9223372036854775808 are NULL.
(3) Bigint unsigned types in MySQL are converted to DolphinDB LONG types by default. If an overflow occurs, the user is required to use the schema parameter, specifying the type as DOUBLE or FLOAT.
The char and varchar types in MySQL are converted to DolphinDB symbols if the length is less than or equal to 10, and to DolphinDB strings if the length is greater than 10. The SYMBOL type is stored as an integer inside DolphinDB, so sorting and comparing data are more efficient and storage space is saved. But mapping strings to integers takes time, and the mapping table takes up memory. The user can decide whether to use SYMBOL type for a column based on the following rule: Use SYMBOL type if the value of the field is heavily duplicated. For example, stock symbols, exchange and contract codes in financial data and device numbers in Internet of Things data are typical scenarios using SYMBOL type.
5. Performance test
We tested performance on a normal PC (16GB of RAM, 4 cores, 8 threads, using SSD). The data set used is the daily quotation data of the U.S. stock market from 1990 to 2016, with a data volume of 6.5g, including 22 fields, 50,591,907 lines of records, and a disk occupancy of 7.2G in the MySQL database. The loadEx function was used to import data from MySQL to the DolphinDB partition database in 160.5 seconds, with a reading speed of 41.4M/s. The disk usage in the DolphinDB database was 1.3 GB. On the same PC, it takes 660 seconds to import 1 million pieces of data each time because the MySQL memory is insufficient when ODBC is used to import data at one time. The same data was imported into Clickhouse in 171.9 seconds with a reading speed of 37.8M/s. DolphinDB is more convenient than ClickHouse for processing time series data and managing partitions.
DolphinDB is a good choice for DolphinDB if you want to ensure performance while friendly support for various processing of sequential data and distributed databases.