With impressive performance results and intuitive support for streaming data, Apache Spark is one of the hottest discussion topics across the big data community and start-up lofts around the globe. Whether Spark will end up replacing Hadoop or whether the two will continue to coexist is up for debate. But this much is for certain: it is definitely worth having a good look at. Especially from a developer’s point of view, Spark is quite a tease as it comes with an invaluable practical feature: interactive Python and Scala shells!
While Spark obviously thrives in large-scale cluster deployments, possibly on top of Apache Mesos, a local installation is a cheap and easy way to explore all key features of the Spark framework. Here’s how to get started with Spark on OS X - it just takes a few minutes.
The best way to install the latest version of Apache Spark on OS X and to keep it up to date is via Homebrew.
brew install apache-spark
The above command installs the latest version of Apache Spark on your Mac. At the time of writing, this was version 1.5. If you don’t have Java installed on your system, the installation will abort and print instructions on how to install the latest Oracle JDK.
After the installation has completed, you’ll find your Spark installation in /usr/local/Cellar/apache-spark/1.5.0. All relevant paths were added to your environment automatically by Homebrew.
Next, you can change Spark’s log level to something a little less verbose. First, copy the log4j template file:

cd /usr/local/Cellar/apache-spark/1.5.0/libexec/conf
cp log4j.properties.template log4j.properties
Open the copied log4j.properties file, find the line starting with log4j.rootCategory, and change the log level from the default INFO to something quieter, such as ERROR.
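For reference, this is what the change to the root logger line looks like. INFO is the default shipped in the template; ERROR is one common quieter choice (WARN also works if you still want to see warnings):

```
# default in log4j.properties.template
log4j.rootCategory=INFO, console

# quieter alternative
log4j.rootCategory=ERROR, console
```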
That’s pretty much it. Now you can fire up Spark’s interactive shells: spark-shell for Scala, or pyspark for Python.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala>
Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Python version 2.7.10 (default, Jul 14 2015 19:46:27)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
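Once you’re at the prompt, a one-liner makes a handy smoke test for the installation. This is an illustrative snippet, not from the official docs: it uses the preconfigured SparkContext (sc) to distribute the numbers 1 through 100 across local partitions and sum them.

```
>>> sc.parallelize(range(1, 101)).sum()
5050
```

If you see the result printed after a short burst of scheduler output, your local Spark setup is working.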
If you are new to Spark, I’d suggest that you work your way through the Spark Quick Start Guide.