Preconditions

  • Spark is installed.
  • Anaconda is installed.

To write Scala and Spark code in a Jupyter Notebook, we need to install two Jupyter kernels: Jupyter-Spark and Jupyter-Scala. This document follows my own installation order: Jupyter-Spark first, then Jupyter-Scala.

Getting started

Use Apache Toree to install the Scala kernel for the notebook

Step 1: Install Toree

Download Toree:

pip install toree
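A quick optional sanity check that the package was installed:

pip show toree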

Step 2: Install the Jupyter-Spark kernel and connect it to Spark

jupyter toree install --spark_opts='--master=spark://localhost:7077' --user --kernel_name=Spark2.3.2 --spark_home=/home/fonttian/spark-2.3.2-bin-hadoop2.7

  • --master: the address of the Spark master (shown when spark-shell starts; see the example below)
  • --spark_home: the directory where Spark was downloaded and unpacked
  • --kernel_name: the name under which the new kernel is registered
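For example, the banner printed when spark-shell starts up includes the master URL; the exact wording varies across Spark versions, but it looks roughly like this:

$ spark-shell
...
Spark context available as 'sc' (master = spark://localhost:7077, app id = ...).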

Step 3: Check the Jupyter kernels and confirm the new one was added
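List the registered kernels with the command below; the Spark2.3.2 kernel installed above should appear in the output (the exact paths depend on your machine):

jupyter kernelspec list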

Step 4: Create a Scala project and run it



In Jupyter, you can run Scala statements directly, as in a script.

You can also define an object and run it using the main function.
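For example, both styles work in a single cell; HelloWorld is just an illustrative name:

// Run statements directly, as in a script:
val greeting = "Hello, Jupyter"
println(greeting.toUpperCase)

// Or define an object and invoke its main function explicitly:
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello from main")
  }
}
HelloWorld.main(Array())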

Notes

If you start Scala through the Jupyter-Spark kernel, Jupyter starts Spark by default even if you never use it. If you only want to practice Scala, create the project with the Jupyter-Scala kernel instead. The following shows how to install Jupyter-Scala.
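One quick way to see this: in a notebook backed by the Jupyter-Spark (Toree) kernel, a SparkContext is already bound to the name sc, so the session holds Spark resources even when idle. A minimal check, assuming Toree's usual sc binding:

// In a Jupyter-Spark (Toree) notebook cell:
sc.version   // the SparkContext already exists, e.g. "2.3.2"
sc.master    // the master URL the kernel connected to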

Installing the Scala kernel

If you are not yet familiar with Scala, you may also want to install the Scala kernel in Jupyter for practice (day-to-day Scala development usually happens in IDEA).

Download jupyter-scala-cli

Go to https://oss.sonatype.org/content/repositories/snapshots/com/github/alexarchambault/jupyter/ and download the jupyter-scala-cli files.

Here I am using the latest version, 2.11.6.

Add the kernel

First decompress the downloaded archive, then run the installer it contains (the original post illustrated this step with a screenshot).
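A rough sketch of this step; the archive and script names below are assumptions based on the usual jupyter-scala-cli layout, so substitute the names of the files you actually downloaded:

# hypothetical archive name; use the file you downloaded
unzip jupyter-scala-cli_2.11.6-0.2.0-SNAPSHOT.zip
cd jupyter-scala-cli_2.11.6-0.2.0-SNAPSHOT/bin
# the launcher script registers the Scala kernel with Jupyter
./jupyter-scala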

Check the kernel

jupyter kernelspec list

Confirm that the newly added kernel appears in the list.
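The output will look roughly like the following; the kernel names and paths are illustrative only and depend on the installers and options you used:

Available kernels:
  python3     /usr/local/share/jupyter/kernels/python3
  spark2.3.2  /home/fonttian/.local/share/jupyter/kernels/spark2.3.2
  scala211    /home/fonttian/.local/share/jupyter/kernels/scala211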

Creating a Scala project



If you want to define an object and run it, call objectName.main(Array()) as described above.
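For instance, arguments can be passed in through the array; Greeter is an illustrative name:

object Greeter {
  def main(args: Array[String]): Unit = {
    println("Hello, " + args.headOption.getOrElse("world"))
  }
}
Greeter.main(Array("Jupyter"))   // prints: Hello, Jupyter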