Hi,
I am trying to run a few Spark commands using SparkR (from my local R GUI). To set up the Spark cluster on EC2 I used most of the commands from https://edgarsdatalab.com/2016/08/25/setup-a-spark-2-0-cluster-r-on-aws/ with slight modifications to install the latest versions. All I am trying to do is interact with remote Spark (on EC2 Ubuntu) from my local R GUI using the SparkR package.

**Here is my setup (step by step):**

1. I have Windows 8.1 on my PC, with R 3.3.3 and the SparkR package.
2. I created an AWS EC2 instance (free-tier account) and used an existing Ubuntu image from Amazon.
3. Installed PuTTY on my local PC and used a PuTTY terminal to connect to Ubuntu 16 (on EC2); steps 4 to 10 below were run there.
4. Installed Java and then spark-2.1.1-bin-hadoop2.7 on EC2.
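For reference, the step 4 install on a fresh Ubuntu instance would look roughly like this (a sketch; the `default-jre` package and the Apache archive URL are my assumptions, and the guide linked above may use slightly different commands):

```shell
# Install a Java runtime (Spark 2.1.x runs on Java 7/8)
sudo apt-get update
sudo apt-get install -y default-jre

# Download and unpack Spark 2.1.1 prebuilt for Hadoop 2.7
mkdir -p ~/server && cd ~/server
wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
tar -xzf spark-2.1.1-bin-hadoop2.7.tgz
```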
5. Added the following to .bashrc (/home/ubuntu):

```shell
export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
```


6. Loaded the modified file:

```shell
. .bashrc
```
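A quick way to confirm the step 5/6 changes took effect is to check the variables in the same shell. This sketch repeats the step 5 exports so it runs standalone (the path is the one from the post):

```shell
# Reproduce the exports from step 5 (adjust if Spark lives elsewhere)
export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH

# Verify: the variable is set and its bin directory is on PATH
echo "$SPARK_HOME"   # prints the expanded path, e.g. /home/ubuntu/server/spark-2.1.1-bin-hadoop2.7
```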

7. Installed R on EC2-Ubuntu
8. I created another EC2 instance (with Ubuntu) and followed steps 4 to 6 above to set up a Spark worker node.
9. On the first EC2 instance (call it the master instance), I started the Spark master using start-master.sh and got the master's URL from the Spark web UI.
10. On the second EC2 instance (call it the slave instance), I started a Spark slave using start-slave.sh, passing the Spark master's URL.
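Spelled out, steps 9 and 10 correspond to something like the following (xx.yy.zz.aa stands for the master's address, as elsewhere in the post; these scripts ship with Spark 2.1.1 under $SPARK_HOME/sbin):

```shell
# On the master instance: start the standalone master.
# Its URL (spark://<host>:7077) is then shown on the web UI at port 8080.
$SPARK_HOME/sbin/start-master.sh

# On the worker instance: start a worker pointing at that URL.
$SPARK_HOME/sbin/start-slave.sh spark://xx.yy.zz.aa:7077
```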
11. Then launched R (GUI) on my local PC.
12. Ran the following from R to connect and execute commands in Spark (in the following, xx.yy.zz.aa is the Spark master's public IP address):

```r
library(SparkR)

sparkR.session(master = "spark://xx.yy.zz.aa:7077",
               sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7",
               enableHiveSupport = FALSE)

ds <- createDataFrame(mtcars)  ## R becomes unresponsive here
```

13. When I killed the job from the Spark web UI after waiting long enough, I got the following error (see screenshot):
[Screenshot]

Please help. What am I doing wrong, and how can I fix this? All I want is to use remote Spark from my local PC through the R interface.

Thanks,
SG

What I have tried:

- In sparkR.session(), I tried passing both the public and the private address of the first EC2 instance (the master).
- I also tried installing R on both EC2 instances; even uninstalling R from both didn't work.
- Also tried launching the Spark master and slave on the same EC2 Ubuntu instance (the first one).
- Ran R inside the EC2 Ubuntu instance that had both the master and the slave running on it. Nothing worked.
Posted
Updated 9-Jul-17 17:45pm
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


