Getting Started with Spark on OS X

Installation and first steps

September 12, 2015

Image: Apache Spark (photo by Luis Llerena)

With impressive performance results and intuitive support for streaming data, Apache Spark is one of the hottest discussion topics across the big data community and start-up lofts around the globe. Whether Spark will end up replacing Hadoop or whether the two will continue to coexist is up for debate. But this much is certain: it is definitely worth a good look. Especially from a developer’s point of view, Spark is quite a tease, as it comes with an invaluable practical feature: interactive Python and Scala shells!

While Spark obviously thrives in large-scale cluster deployments, possibly on top of Apache Mesos, a local installation is a cheap and easy way to explore all of the key features of the Spark framework. Here’s how to get started with Spark on OS X; it only takes a few minutes.

Installation

The best way to install the latest version of Apache Spark on OS X and to keep it up to date is via Homebrew.

brew install apache-spark

The above command installs the latest version of Apache Spark on your Mac. At the time of writing, this was version 1.5.0. If you don’t have Java installed on your system, the installation will abort and print instructions on how to install the latest Oracle JDK.

After the installation has completed, you’ll find Spark in /usr/local/Cellar/apache-spark/1.5.0. The version directory changes with each release. Homebrew automatically puts the Spark commands, such as spark-shell and pyspark, on your PATH.
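The Cellar path contains the version number, so it will differ after Homebrew upgrades Spark. The following is a minimal Python sketch that picks the newest installed version directory. It assumes the default /usr/local Homebrew prefix and the Cellar layout shown above. The helper names latest_spark_home and version_key are hypothetical and are not part of Spark or Homebrew.

```python
import os
import re

# Assumption: default Homebrew prefix on OS X, as used in this post.
# Adjust this if your Homebrew lives elsewhere.
CELLAR = "/usr/local/Cellar/apache-spark"


def version_key(name):
    """Hypothetical helper: turn a directory name such as '1.5.0' into (1, 5, 0)
    so that versions sort numerically (1.10.0 sorts after 1.9.0)."""
    return tuple(int(part) for part in re.findall(r"\d+", name))


def latest_spark_home(cellar=CELLAR):
    """Hypothetical helper: return the path of the newest version directory
    under the Cellar, or None if nothing is installed there."""
    if not os.path.isdir(cellar):
        return None
    versions = [d for d in os.listdir(cellar)
                if os.path.isdir(os.path.join(cellar, d))]
    if not versions:
        return None
    return os.path.join(cellar, max(versions, key=version_key))
```

With Spark 1.5.0 installed as described, latest_spark_home() would return /usr/local/Cellar/apache-spark/1.5.0. Appending libexec/conf to that path gives the directory used in the next step.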

Next, you may want to make Spark’s logging a little less verbose. First, copy the log4j template file:

cd /usr/local/Cellar/apache-spark/1.5.0/libexec/conf
cp log4j.properties.template log4j.properties

Open the copied log4j.properties file, find the line starting with log4j.rootCategory, and change the log level from INFO to ERROR:

log4j.rootCategory=ERROR, console
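You may prefer to script this step, for example to re-apply it after an upgrade. Here is a minimal Python sketch of the same copy-and-edit. It assumes the 1.5.0 conf directory used above, and quiet_spark_logging is a hypothetical helper name. The regular expression only rewrites an uncommented log4j.rootCategory line and leaves the appender list (console) intact.

```python
import os
import re
import shutil

# Assumption: the Spark 1.5.0 conf directory from this post;
# change the version to match your installation.
CONF_DIR = "/usr/local/Cellar/apache-spark/1.5.0/libexec/conf"

# Matches e.g. "log4j.rootCategory=INFO, console" and captures the part before the level.
ROOT_CATEGORY = re.compile(r"^(log4j\.rootCategory\s*=\s*)[A-Za-z]+")


def quiet_spark_logging(conf_dir=CONF_DIR, level="ERROR"):
    """Hypothetical helper: copy log4j.properties.template to log4j.properties
    (only if the latter does not exist yet), then set the root log level."""
    template = os.path.join(conf_dir, "log4j.properties.template")
    target = os.path.join(conf_dir, "log4j.properties")
    if not os.path.exists(target):
        shutil.copyfile(template, target)

    with open(target) as f:
        lines = f.readlines()

    # Rewrite the file, replacing only the level word on the rootCategory line.
    with open(target, "w") as f:
        for line in lines:
            f.write(ROOT_CATEGORY.sub(r"\g<1>" + level, line))
    return target
```

Calling quiet_spark_logging() has the same effect as the manual edit. Passing level="WARN" gives a setting between INFO and ERROR. Running it twice is harmless, because the existing log4j.properties is reused and the level is simply set again.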

That’s pretty much it. Now you can fire up Spark’s interactive Scala or Python shells.

Scala Shell

spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala>

Python Shell

pyspark
Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Python version 2.7.10 (default, Jul 14 2015 19:46:27)
SparkContext available as sc, HiveContext available as sqlContext.

Next Steps

If you are new to Spark, I’d suggest that you work your way through the Spark Quick Start Guide.
