
Spark on Windows

4 Feb 2017, CPOL, 2 min read

This article shows how to set up Apache Spark on Windows in a few easy steps.

Introduction

Apache Spark is designed to run on Linux in production environments. However, to learn Spark programming, we can use a Windows machine. In this article, I'll explain how to set up Spark in a few simple steps, and we'll also run our "Hello World" Spark program.

Background

Apache Spark is a fast and general-purpose cluster computing platform. Spark extends the popular MapReduce model to efficiently support more types of computations, including interactive queries and stream processing. You can find more information at http://spark.apache.org/ and https://en.wikipedia.org/wiki/Apache_Spark

Software required

Apache Spark is built using Scala and runs on the JVM. The latest Spark release, 2.0.2, runs on Java 1.7 or later.

Step-1

So first, we need to set up Java 1.7 if it isn't installed already. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jre-7u76-oth-JPR

You can use either the installer or the binaries. Once the Java setup is complete, open your command prompt and check the Java version using the command "java -version". It will display output like the following:

Image 1
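As a cross-check, the same version string is visible from any Scala REPL through the java.version system property, which mirrors what "java -version" prints. A small sketch (the object and method names below are my own illustration, not part of Spark or the JDK):

```scala
// Illustrative helper (names are mine): reads the JVM's java.version
// system property, the same value that "java -version" reports.
object JavaCheck {
  def javaVersion: String = sys.props("java.version")

  def main(args: Array[String]): Unit =
    println(s"Running on Java $javaVersion")
}
```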

 

Step-2

Spark depends on winutils.exe, which is usually installed along with Hadoop. Since we are not going to deploy Hadoop, we need to download this program separately and set up an environment variable for it.

Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe

Create a folder called hadoop\bin wherever you want, and place winutils.exe inside it. I chose c:\backup\hadoop\bin

Create an environment variable called HADOOP_HOME with the path c:\backup\hadoop
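Spark looks for winutils.exe under the bin subfolder of whatever HADOOP_HOME points to, so the variable must name the parent folder, not bin itself. A sketch of that path resolution, with a check you could paste into any Scala REPL (the object and helper names are my own illustration, not a Spark utility):

```scala
// Illustrative sketch (names are mine, not Spark's): HADOOP_HOME must
// point at the parent folder, and winutils.exe must sit in its bin\
// subfolder for Spark to find it on Windows.
object WinutilsCheck {
  // Builds the path Spark effectively expects: <HADOOP_HOME>\bin\winutils.exe
  def winutilsPath(hadoopHome: String): String =
    hadoopHome + "\\bin\\winutils.exe"

  def main(args: Array[String]): Unit =
    sys.env.get("HADOOP_HOME") match {
      case Some(home) => println("Expecting winutils at: " + winutilsPath(home))
      case None       => println("HADOOP_HOME is not set")
    }
}
```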

Image 2

Step-3

Now download Apache Spark from http://spark.apache.org/downloads.html

Unzip it to your preferred location. The extracted folder looks like this:

Image 3

Update the "Path" environment variable with the Spark bin location; in my case, it's C:\backup\spark-2.0.2-bin-hadoop2.7\bin

Image 4
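To verify the update took effect, the Path value can be split on semicolons and searched for the Spark bin folder. This is my own illustrative sketch (the object name and check are not a Spark utility); on Windows the comparison should be case-insensitive:

```scala
// Illustrative helper (not part of Spark): checks whether a directory
// appears in a Windows PATH-style value (semicolon-separated entries,
// compared case-insensitively).
object PathCheck {
  def onPath(pathValue: String, dir: String): Boolean =
    pathValue.split(";").exists(_.trim.equalsIgnoreCase(dir))

  def main(args: Array[String]): Unit = {
    // Example location from this article; adjust to your own unzip folder.
    val sparkBin = "C:\\backup\\spark-2.0.2-bin-hadoop2.7\\bin"
    println(onPath(sys.env.getOrElse("Path", ""), sparkBin))
  }
}
```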

Test Spark

Spark comes with interactive shells for executing the Spark APIs. The available shells are:

spark-shell --> Works with the Scala APIs

pyspark --> Works with the Python APIs

Open your command prompt, type spark-shell, and press Enter. You should see the Spark shell if all the configurations are set correctly.

 

Image 5

Congrats! You have successfully set up Spark on Windows. Now let's try the "hello world" of the Hadoop world, which is the simple word count program :). If you know how to write it using Java MapReduce, Hive SQL, or a Pig script, then you'll really appreciate Spark, where we can achieve the same with a few simple APIs.

A. Make sure you have a sample text file to count words in. Assume it's at c:\temp\test.txt

B. Let's write the Spark program for our hello world (press Enter after each line):

Scala
scala> val file = sc.textFile("c:\\temp\\test.txt")
scala> val words = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> words.collect

 

Image 6
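To see why those three lines count words, here is the same pipeline sketched with plain Scala collections instead of RDDs (my own illustration: a groupBy-and-sum stage stands in for Spark's reduceByKey, and the sample lines stand in for the contents of c:\temp\test.txt):

```scala
// Plain-Scala sketch of the Spark word-count pipeline, using the
// collections API instead of RDDs so it runs without spark-shell.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // like RDD.flatMap: lines -> words
      .map(word => (word, 1))             // like RDD.map: word -> (word, 1)
      .groupBy(_._1)                      // gathers pairs by key (Spark shuffles here)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // like reduceByKey(_ + _)

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("hello spark", "hello world")))
}
```

The chain mirrors the shell session step for step, which is why `words.collect` in spark-shell returns the same (word, count) pairs this sketch prints.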

 


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect CA Technologies
India
I'm an enthusiastic software developer with experience in open source and MS technologies. I'm passionate about Big Data technologies, particularly HDFS, YARN, Spark, and Kafka. I love learning new things and sharing knowledge. I like to travel, listen to music, watch thriller movies, and play chess.
