Spark cluster

How do you set up a single-node Spark cluster on Linux (Ubuntu 14.04 LTS)? After the 2-node Hadoop cluster with a PC and VirtualBox, I felt that reading and writing intermediate files was something I could avoid with Spark. To set it up, I followed these steps:

  1. Download the latest version of Spark from the Spark downloads page. I selected the 1.6.0 release, pre-built for Hadoop 2.6 and later, and did a direct download.
  2. Navigate to the download folder and run: tar -xvf spark-1.6.0.tgz
  3. Navigate to the extracted folder and run: ./bin/spark-shell (a quick sanity check of the shell follows below).
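
To sanity-check the shell, here is a tiny example (nothing fancy, just made-up numbers; sc is the SparkContext that spark-shell creates for you):

scala> val nums = sc.parallelize(1 to 1000000)
scala> nums.filter(_ % 2 == 0).count()
res0: Long = 500000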

I chose to write in Scala, but you can also do so in Python or Java. I had a version issue with sbt (a build.sbt sketch that pins matching versions follows further down). I also had a problem connecting to the master (I used a cluster with a master and one worker), which I solved like this:

cd spark-1.6.0-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
pico spark-env.sh
# and I appended
SPARK_MASTER_IP=<your host IP>   # mine is 192.168.1.2
# the start scripts in sbin/ source conf/spark-env.sh on their own, so there is no need to source it yourself

(I found my host IP with a quick ifconfig.)
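
About the sbt version issue: the pre-built 1.6.0 package runs on Scala 2.10, so the project's build.sbt should pin a matching Scala version and Spark dependency. A minimal sketch (the project name is only chosen so that the jar name matches the spark-submit command below):

// build.sbt, at the project's root
name := "KMeans Project"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"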
Then you will be ready to run your application (oh, wait there, big guy: read the quick-start guide first), like this:

./sbin/start-master.sh
# now open a browser and go to http://localhost:8080/
./sbin/start-slave.sh spark://gsamaras:7077
# package the application with sbt and submit the resulting jar
sbt package
bin/spark-submit --class "KMeans" --master spark://gsamaras:7077 target/scala-2.10/kmeans-project_2.10-1.0.jar
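
The class passed to --class lives in src/main/scala/KMeans.scala. A bare-bones sketch of the boilerplate around it (the input path is hypothetical and the actual clustering logic is up to you):

// src/main/scala/KMeans.scala
import org.apache.spark.{SparkConf, SparkContext}

object KMeans {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("KMeans")
    val sc = new SparkContext(conf)
    // hypothetical input: one point per line, coordinates separated by spaces
    val points = sc.textFile("data/points.txt")
                   .map(_.split(" ").map(_.toDouble))
                   .cache()
    println("Loaded " + points.count() + " points")
    // ... the k-means iterations would go here ...
    sc.stop()
  }
}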

Have questions? Comments? Did you find a bug? Let me know!😀
Page created by G. (George) Samaras (DIT)