This article is posted on Nuggets (Juejin) by WX shin-Devops.
The configuration process
- Install pyspark
- Configure mysql-connector.jar
- Create a connection
- Read the data
Install PySpark
Create a new project locally and execute pip install pyspark==3.0 to install PySpark.
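To confirm the install worked, a quick sanity check (an addition here, not a step from the original guide) is to print the installed version:

# Sanity check: the printed version should start with 3.0
import pyspark
print(pyspark.__version__)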
mysql-connector configuration
Download
Go to https://dev.mysql.com/downloads/connector/j/ and download the Platform Independent package for the appropriate version:
Connector/J version | JDBC version | MySQL Server version | JRE Required | JDK Required for Compilation | Status
---|---|---|---|---|---
5.1 | 3.0, 4.0, 4.1, 4.2 | 5.6, 5.7, 8.0 | JRE 5 or higher | JDK 5.0 and JDK 8.0 or higher | General availability
8.0 | 4.2 | 5.6, 5.7, 8.0 | JRE 8 or higher | JDK 8.0 or higher | General availability (recommended)
See the MySQL documentation for the full version compatibility table.
For example, decompressing mysql-connector-java-8.0.19.tar.gz yields mysql-connector-java-8.0.19.jar.
Move the jar to the SPARK_HOME path
If Spark was installed some other way, run echo $SPARK_HOME on the local machine to see the installation path. If PySpark was installed directly with pip install pyspark==3.0, however, $SPARK_HOME is empty, so the step "copy mysql-connector.jar into the $SPARK_HOME/jars folder" found in other configuration guides on the web cannot be carried out, and connecting fails with:
java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
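This error means the JVM cannot find the MySQL JDBC driver class on Spark's classpath. As an aside (a sketch, not the fix this article uses), Spark can also be handed the jar directly through the spark.jars property when the context is created; the jar path below is a placeholder:

# Alternative sketch: point Spark at the driver jar explicitly instead of copying it.
# The jar path is a placeholder; adjust it to where the connector was unpacked.
from pyspark import SparkConf, SparkContext

conf = SparkConf().set('spark.jars', '/path/to/mysql-connector-java-8.0.19.jar')
sc = SparkContext(master='local', appName='sql', conf=conf)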
The fix used in this article is to find $SPARK_HOME with the _find_spark_home method in the PySpark source:
>>> from pyspark import find_spark_home
>>> print(find_spark_home._find_spark_home())
/home/ityoung/test-spark/venv/lib/python3.6/site-packages/pyspark
Then set $SPARK_HOME to that path and copy mysql-connector.jar into $SPARK_HOME/jars:
export SPARK_HOME=/home/ityoung/test-spark/venv/lib/python3.6/site-packages/pyspark
mv mysql-connector-java-8.0.19.jar $SPARK_HOME/jars
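The same two steps can also be scripted from Python; this is a minimal sketch, assuming the jar sits in the current working directory and the site-packages path is writable:

# Sketch: locate pyspark's bundled SPARK_HOME and copy the driver jar into its jars dir.
# Assumes mysql-connector-java-8.0.19.jar is in the current working directory.
import os
import shutil
from pyspark.find_spark_home import _find_spark_home

spark_home = _find_spark_home()
shutil.copy('mysql-connector-java-8.0.19.jar', os.path.join(spark_home, 'jars'))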
Spark code example
Reference: zhuanlan.zhihu.com/p/136777424 (please indicate the source when reproducing)
main.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == '__main__':
    # Spark initialization
    sc = SparkContext(master='local', appName='sql')
    spark = SQLContext(sc)
    # MySQL connection configuration (need to change)
    prop = {'user': 'xxx', 'password': 'xxx', 'driver': 'com.mysql.cj.jdbc.Driver'}
    # Database address (need to change)
    url = 'jdbc:mysql://host:port/database'
    # Read the table
    data = spark.read.jdbc(url=url, table='tb_test', properties=prop)
    # Print the data type
    print(type(data))
    # Display the data
    data.show()
    # Stop Spark (SQLContext has no stop(); stop the underlying SparkContext)
    sc.stop()
Modify the configuration in the code and run it to see the data output:
python main.py
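Once the table reads successfully, the DataFrame can also be queried with SQL. The following short sketch (an addition beyond the original example; it reuses spark and data from main.py and assumes the table loaded) registers the DataFrame as a temporary view first:

# Follow-up sketch: register the DataFrame as a temp view and query it with SQL
data.createOrReplaceTempView('tb_test')
spark.sql('SELECT COUNT(*) AS cnt FROM tb_test').show()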