This article was posted on Juejin (Nuggets) by shin-Devops (WX).

The configuration process

  1. Install pyspark
  2. Configure mysql-connector.jar
  3. Create a connection
  4. Read the data

Install PySpark

Create a new project locally and run `pip install pyspark==3.0` to install PySpark.
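To confirm the installation succeeded, the package can be checked from Python; a minimal sketch that prints the installed version, or a notice if pyspark is missing:

```python
# Check whether pyspark is importable and report its version.
import importlib.util

spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is not installed")
else:
    import pyspark
    print(pyspark.__version__)
```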

MySQL Connector configuration

Download

Go to https://dev.mysql.com/downloads/connector/j/ to download the corresponding version of the Platform Independent package:

| Connector/J version | JDBC version | MySQL Server version | JRE Required | JDK Required for Compilation | Status |
| --- | --- | --- | --- | --- | --- |
| 5.1 | 3.0, 4.0, 4.1, 4.2 | 5.6, 5.7, 8.0 | JRE 5 or higher | JDK 5.0 and JDK 8.0 or higher | General availability |
| 8.0 | 4.2 | 5.6, 5.7, 8.0 | JRE 8 or higher | JDK 8.0 or higher | General availability; recommended |


For example, decompress mysql-connector-java-8.0.19.tar.gz to obtain mysql-connector-java-8.0.19.jar.

Move the jar to the SPARK_HOME path

If Spark was installed by some other method, run `echo $SPARK_HOME` on the local machine to see the Spark installation path.

However, when PySpark is installed directly via `pip install pyspark==3.0`, $SPARK_HOME is empty, so the step "copy mysql-connector.jar into the $SPARK_HOME/jars folder" found in other configuration guides cannot be carried out, and reading from MySQL fails with:

java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver


The solution is to find $SPARK_HOME using the `_find_spark_home` method in the PySpark code:

>>> from pyspark import find_spark_home
>>> print(find_spark_home._find_spark_home())
/home/ityoung/test-spark/venv/lib/python3.6/site-packages/pyspark

Then set $SPARK_HOME to that path and copy mysql-connector.jar into $SPARK_HOME/jars:

export SPARK_HOME=/home/ityoung/test-spark/venv/lib/python3.6/site-packages/pyspark
mv mysql-connector-java-8.0.19.jar $SPARK_HOME/jars
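The same lookup can also be done programmatically. Below is a minimal sketch, where `spark_jars_dir` is a hypothetical helper (not part of PySpark) that prefers an explicit $SPARK_HOME and falls back to pyspark's own lookup:

```python
# Locate the jars directory of the active Spark installation.
import os


def spark_jars_dir():
    """Return the path of Spark's jars folder, or None if Spark cannot be found."""
    home = os.environ.get("SPARK_HOME")
    if not home:
        try:
            # Same lookup the article uses interactively.
            from pyspark import find_spark_home
            home = find_spark_home._find_spark_home()
        except ImportError:
            return None
    return os.path.join(home, "jars")
```

The connector jar can then be copied into `spark_jars_dir()` from a script instead of by hand.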

Spark code example

Reference: zhuanlan.zhihu.com/p/136777424 (indicate the source when reproducing)

main.py

from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == '__main__':
    # Spark initialization
    sc = SparkContext(master='local', appName='sql')
    spark = SQLContext(sc)
    # MySQL connection properties (change as needed)
    prop = {'user': 'xxx', 'password': 'xxx', 'driver': 'com.mysql.cj.jdbc.Driver'}
    # Database address (change as needed)
    url = 'jdbc:mysql://host:port/database'

    # Read the table
    data = spark.read.jdbc(url=url, table='tb_test', properties=prop)
    # Print the data type
    print(type(data))
    # Display the data
    data.show()
    # Stop the Spark context
    sc.stop()
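The JDBC URL in the example follows a fixed pattern. A small sketch of assembling it from its parts; `host`, `port`, and `database` here are placeholder values, not real connection details:

```python
# Assemble the JDBC URL expected by spark.read.jdbc from its parts.
host, port, database = "localhost", 3306, "testdb"  # placeholders
url = f"jdbc:mysql://{host}:{port}/{database}"
print(url)  # jdbc:mysql://localhost:3306/testdb
```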

Modify the configuration in the code and run to see the data output:

python main.py
