Java

Install Java 8, set JAVA_HOME, and add %JAVA_HOME%\bin to the PATH environment variable.
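
If you prefer the command line to the System Properties dialog, both variables can be set with setx; the JDK path below is only an assumed example, and setx changes take effect only in newly opened consoles:

setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_60"
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_60\bin"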

E:\>java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Scala

Download and unzip Scala 2.11, set SCALA_HOME, and add %SCALA_HOME%\bin to PATH.
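
The same setx approach works here, again with an assumed install directory:

setx SCALA_HOME "C:\scala-2.11.7"
setx PATH "%PATH%;C:\scala-2.11.7\bin"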

E:\>scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL

Spark

Download and decompress Spark 2.1, set SPARK_HOME, and add %SPARK_HOME%\bin to PATH. When you then try to run spark-shell from the console, the following error is displayed, indicating that winutils.exe cannot be located.

E:\>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/05 21:34:43 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
        at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
        at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:991)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:92)
        at $line3.$read$$iw$$iw.<init>(<console>:15)
        at $line3.$read$$iw.<init>(<console>:42)
        at $line3.$read.<init>(<console>:44)
        at $line3.$read$.<init>(<console>:48)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.$print$lzycompute(<console>:7)
        at $line3.$eval$.$print(<console>:6)
        at $line3.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
        at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
        at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
        at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:69)
        at org.apache.spark.repl.Main$.main(Main.scala:52)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

As the error message shows, Spark relies on a few Hadoop utilities, which it locates through the HADOOP_HOME environment variable; the path null\bin\winutils.exe appears because we have not set that variable yet. This does not mean we have to install a full Hadoop distribution. We can simply download winutils.exe to any location on disk, for example C:\winutils\bin\winutils.exe, and point HADOOP_HOME at C:\winutils.
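
In concrete terms, assuming winutils.exe has been saved as C:\winutils\bin\winutils.exe (prebuilt binaries are commonly obtained from community GitHub repositories such as steveloughran/winutils), the remaining step is to point HADOOP_HOME at the parent of that bin directory:

setx HADOOP_HOME "C:\winutils"

Spark looks for %HADOOP_HOME%\bin\winutils.exe, so keeping the bin subdirectory is important.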

Now we run spark-shell again and there is a new error:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
  ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
  ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

scala>


The directory /tmp/hive does not have write permission:

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

Therefore, we need to grant write permission on E:\tmp\hive (I ran spark-shell from drive E; if you run it from another drive, use that drive's \tmp\hive instead). Run the following command:

E:\>C:\winutils\bin\winutils.exe chmod 777 E:\tmp\hive
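
To double-check that the change took effect, winutils can also list the directory's permissions; the mode should now read drwxrwxrwx:

E:\>C:\winutils\bin\winutils.exe ls E:\tmp\hive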

Run spark-shell again, and this time Spark starts successfully. You can access the Spark web UI at http://localhost:4040.
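
As a final sanity check (this snippet is only an illustration, not part of the original walkthrough), you can run a small job against the pre-created spark session from the shell:

scala> spark.range(1000).count()
res0: Long = 1000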