Hero image for Install Apache Spark on WSL
アパッチー・スパーク

Install Apache Spark on WSL

Table of Contents

Requirements

  • 7zip or any other similar software;
  • Apache Spark 3.0.1 (or latest version);
  • Windows Subsystem Linux [WSL] (I strongly recommend using version 1, because WSL2 is incompatible with Spark);
  • Latest Java Development Kit (JDK) version;
  • Maven.

Installation

JDK

Install JDK following my other guide’s section under Linux, here.

Maven

Install maven with your OS’ package manger, I have Ubuntu so I use sudo apt install maven.

WSL

Install WSL1 following these steps.

Apache Spark

Download Apache-Spark here and choose your desired version.

image

image

Extract the folder whenever you want, I suggest placing into a WSL folder or a Windows folder ** CONTAINING NO SPACES INTO THE PATH** (see the image below).

image

image

Setup Apache Spark

Setup environment

Open WSL either by Start>wsl.exe or using your desired terminal.

Type:

export SPARK_LOCAL_IP=127.0.0.1

export SPARK_MASTER_HOST=127.0.0.1

These commands are valid only for the current session, meaning as soon as you close the terminal they will be discarded; in order to save these commands for every WSL session you need to append them to ~/.bashrc.

image

Setup config Apache Spark

Create a tmp log folder for Spark with mkdir -p /tmp/spark-events.

Navigate to the Apache Spark’s folder with cd /path/to/spark/folder.

image

Edit conf/spark-defaults.conf file with the following configuration:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
 spark.master                     spark://127.0.0.1:7077
 spark.eventLog.enabled           true
 spark.eventLog.dir               /tmp/spark-events
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

Run Apache Spark

Start Master

Start the master from your Spark’s folder with ./sbin/start-master.sh.

image

Check if there were no errors by opening your browser and going to https://localhost:8080, you should see this:

image

Start Slave

Start the slave from your Spark’s folder with ./sbin/start-slave.sh.

image

Check if there were no errors by opening your browser and going to https://localhost:8080, you should see this:

image

Submit

Build artifact for an Apache Spark prject using mvn package.

Submit an executor using ./bin/spark-submit --class [classhpath] [artifact-path] spark://127.0.0.1:7077, replace classpath and artifact

image

If everything is set correctly you will see under Running Applications your class running:

image