0 Related source code
1 Spark environment installation
◆ Spark is written in Scala and provides interfaces for multiple languages; it requires a JVM to run
◆ Official prebuilt releases of Spark are available, so manual compilation is unnecessary
◆ Spark is easy to install and configure, and a Hadoop environment is not required
-
Download
-
Unpack the archive
tar zxvf spark-2.4.1-bin-hadoop2.7.tgz
2 Spark configuration
Before configuring, read the official documentation first rather than relying on configuration tutorials found online
◆ Set the amount of memory each node may use; otherwise node utilization may be low
◆ Set Spark's IP address and port number to avoid UnknownHostException
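The article applies these settings through Spark's configuration files. As a rough illustration only, the same kind of settings can also be supplied programmatically through `SparkConf`. The sketch below is an assumption rather than the article's setup: the application name, the `2g` memory size and the `127.0.0.1` address are placeholders, and it needs the Spark JARs on the classpath.

```scala
import org.apache.spark.SparkConf

object ConfSketch {
  // Build a configuration without loading external defaults,
  // so only the values set here appear in the output.
  def build(): SparkConf =
    new SparkConf(false)
      .setAppName("conf-sketch")                     // hypothetical application name
      .setMaster("local[*]")                         // single-machine mode, one thread per core
      .set("spark.executor.memory", "2g")            // placeholder memory size; tune per node
      .set("spark.driver.host", "127.0.0.1")         // address the driver advertises (placeholder)
      .set("spark.driver.bindAddress", "127.0.0.1")  // address the driver listens on (placeholder)

  def main(args: Array[String]): Unit =
    println(build().toDebugString)                   // print the key=value pairs that were set
}
```

A fixed port could be set the same way through `spark.driver.port`. In local mode the executor memory setting has little effect because everything runs inside the driver JVM; it matters once real worker nodes are involved.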
Configuration on the official website
-
Applying the default configuration
-
The configuration file
-
Copy the two templates to enable custom configuration
Single-machine Environment Configuration
- The local IP
Verify with the shell
bin/spark-shell
3 Spark shell
◆ Spark shell is a bash script in the ./bin directory
◆ Spark shell creates the context and the session for us.
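A quick way to see the preconfigured context and session is to type a few expressions at the `scala>` prompt. This is a minimal sketch, assuming it is entered inside `bin/spark-shell`, where `sc` (the SparkContext) and `spark` (the SparkSession) are already defined; it will not compile as a standalone program.

```scala
// Entered at the scala> prompt of bin/spark-shell.
sc.version                               // version string of the running Spark
sc.master                                // master URL the shell connected to
spark.range(5).count()                   // small Dataset job through the session
sc.parallelize(1 to 10).reduce(_ + _)    // small RDD job through the context
```

If both jobs return a value without errors, the installation and the configuration are working.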
-
The context instance
-
The session instance
-
UI
4 Wordcount in practice
4.1 Introduction to Wordcount
◆ Wordcount (word frequency counting) is the most basic task in big data analysis.
First extract all the words in the file, then merge identical words and count them
- Implementation diagram
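The two steps can be sketched on an in-memory collection before involving Spark at all. This is only an illustration of the idea in plain Scala; the object name `WordCountLogic`, the function `countWords` and the sample sentences are made up for the example.

```scala
object WordCountLogic {
  // Step 1: split every line into words.
  // Step 2: group identical words and count each group.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // extract all words
      .filter(_.nonEmpty)         // drop empty tokens produced by leading whitespace
      .groupBy(identity)          // merge identical words
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val sample = Seq("spark makes big data simple", "big data needs spark")
    countWords(sample).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w $n") }
  }
}
```

Spark's RDD API follows the same shape, with the grouping and counting typically expressed as a map to `(word, 1)` pairs followed by `reduceByKey`.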
Set up the project
-
Add the Spark JAR package
-
Select the JAR packages: left-click the first one, then hold Shift and left-click the last one to select them all.
-
Create a new class
-
The test file
`pwd`/`ls |grep L`
-
Write a function
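The original code is not reproduced in the text, so the following is a minimal sketch of what such a function might look like, not the author's exact class. It assumes the Spark 2.x JARs are on the classpath; the object name `WordCount`, the default path `input.txt` and the hard-coded `local[*]` master are assumptions chosen for running inside the IDE.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  // Count word frequencies in a text file using the RDD API.
  def count(spark: SparkSession, path: String): Array[(String, Int)] =
    spark.sparkContext.textFile(path)
      .flatMap(_.split("\\s+"))      // extract all words
      .filter(_.nonEmpty)            // drop empty tokens
      .map(word => (word, 1))        // pair each word with a count of one
      .reduceByKey(_ + _)            // merge identical words by summing the counts
      .collect()                     // bring the (small) result back to the driver

  def main(args: Array[String]): Unit = {
    // Hypothetical default; pass the path of the test file as the first argument.
    val path = args.headOption.getOrElse("input.txt")
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")            // assumption: local run from the IDE
      .getOrCreate()
    try {
      count(spark, path).sortBy(-_._2).foreach { case (w, n) => println(s"$w\t$n") }
    } finally {
      spark.stop()
    }
  }
}
```

When the packaged JAR is run through spark-submit, the master is usually left out of the code and supplied on the command line instead, because a value set in code takes precedence over the command-line option.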
-
Successful run
-
Packaging
-
Remove these extra JAR packages
-
Build
-
Put the JAR package into the spark/bin directory and run it using spark-submit
Spark machine learning practice series
- Spark-based machine learning practice (I) – Introduction to machine learning
- Spark-based machine learning practice (II) – Introduction to MLlib
- Spark-based machine learning practice (III) – Hands-on environment setup