Getting Started with Apache Kafka on OS X

Installation and a Simple Producer-Consumer Example with Python

October 10, 2015

Cassandra “Das Schloss”, photo by Blair Connolly

Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. I use Kafka in my research platform to collect process runtime data of large MPI applications in real time. With Kafka’s Producer-Consumer model it becomes easy to implement multiple data consumers that do live, in-flight application monitoring as well as persistent data storage for later analysis. In this post I describe how to set up a single Kafka server on OS X and show a simple producer-consumer example with Python.

Installation

The best way to install the latest version of the Kafka server on OS X and to keep it up to date is via Homebrew.

$> brew install kafka

This also installs a few dependencies, including Zookeeper, which is required to run the server. Once everything is installed, you need to start Zookeeper before you can start Kafka.

$> zkServer start

JMX enabled by default
Using config: /usr/local/etc/zookeeper/zoo.cfg
Starting zookeeper ... STARTED

Once Zookeeper is running you can start the Kafka server itself. For simplicity, we’ll run the server in the foreground:

$> kafka-server-start.sh /usr/local/etc/kafka/server.properties

[...]
[2015-10-09 20:48:22,485] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
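
If you want to confirm from a script that both services are reachable, a plain TCP connection test is enough. The following is a minimal sketch, not part of Kafka or its tooling: `port_open` is a hypothetical helper, and the ports are assumptions based on the defaults (2181 for Zookeeper, 9092 for Kafka). Adjust them if your zoo.cfg or server.properties say otherwise.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Assumed default ports; change them to match your configuration
for name, port in (("Zookeeper", 2181), ("Kafka", 9092)):
    status = "reachable" if port_open("localhost", port) else "not reachable"
    print(name, "on port", port, "is", status)
```

An open port only shows that something is listening there, so the log line above remains the more reliable sign that the broker has started.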

A Simple Producer-Consumer Example

Now let’s write a simple Python producer that periodically writes a string and a timestamp to a topic. Topics in Kafka are simply message feed categories. Consumers only receive the messages for the topics they have subscribed to.
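
To make the topic idea concrete, here is a toy in-memory model of publish-subscribe routing. It is only an illustration of the concept and says nothing about how Kafka is implemented internally; `ToyBroker` and the topic name "other-topic" are made up for this sketch.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker that routes messages by topic."""

    def __init__(self):
        # Maps a topic name to the inboxes (plain lists) subscribed to it
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, inbox):
        self._subscribers[topic].append(inbox)

    def publish(self, topic, message):
        # Only inboxes subscribed to this exact topic receive the message
        for inbox in self._subscribers.get(topic, []):
            inbox.append(message)


broker = ToyBroker()
monitor_inbox, storage_inbox = [], []
broker.subscribe("kafkaesque", monitor_inbox)
broker.subscribe("other-topic", storage_inbox)

broker.publish("kafkaesque", "Metamorphosis!")

print(monitor_inbox)  # ['Metamorphosis!']
print(storage_inbox)  # []
```

The second inbox stays empty because nothing was published to the topic it subscribed to.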

First you need to install the Kafka Python client:

$> pip install kafka-python

The following code is the producer implementation:

from kafka.client import KafkaClient
from kafka.producer import SimpleProducer
from time import sleep
from datetime import datetime

kafka = KafkaClient("localhost:9092")

producer = SimpleProducer(kafka)

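while True:
    # "kafkaesque" is the name of our topic; the payload is encoded to bytes,
    # which kafka-python expects (a plain str only works on Python 2)
    message = "Metamorphosis! " + str(datetime.now().time())
    producer.send_messages("kafkaesque", message.encode("utf-8"))
    sleep(1)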
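
The SimpleProducer interface used above belongs to the kafka-python releases that were current when this post was written. Later releases of the library replaced it with KafkaProducer, so if the imports above fail on your installation, something along the following lines should do the same job. Treat it as a sketch under that assumption: `build_message` and `run_producer` are names chosen here for illustration, and the broker address is the same localhost:9092 as above.

```python
from datetime import datetime
from time import sleep

TOPIC = "kafkaesque"
BROKER = "localhost:9092"


def build_message(now):
    """Return the payload as bytes: the label followed by the time of day."""
    return ("Metamorphosis! " + str(now.time())).encode("utf-8")


def run_producer(topic=TOPIC, broker=BROKER, interval=1.0):
    """Send one timestamped message per interval until interrupted."""
    # Imported here so build_message can be used without kafka-python installed
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=broker)
    try:
        while True:
            producer.send(topic, build_message(datetime.now()))
            sleep(interval)
    finally:
        # Push out anything still buffered before shutting down
        producer.flush()
        producer.close()
```

Call `run_producer()` once the broker is running and stop it with Ctrl-C. Sending is asynchronous in this interface, which is why the sketch flushes before closing.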

Before you run the above script, start a Kafka Console Consumer in a separate shell that listens to the “kafkaesque” topic:

$> kafka-console-consumer.sh --zookeeper localhost --topic kafkaesque

Now you can run the Python script. In the Console Consumer window you should start to see the messages of the kafkaesque topic:

Metamorphosis! 21:03:21.991262
Metamorphosis! 21:03:22.993003
Metamorphosis! 21:03:23.999115
[...]
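
The console consumer can also be replaced by a consumer written in Python. The sketch below assumes a kafka-python release that provides KafkaConsumer with the `bootstrap_servers` option; `decode_message` and `run_consumer` are illustrative names, not part of the library.

```python
def decode_message(raw):
    """Split a raw payload into its label and timestamp parts."""
    text = raw.decode("utf-8")
    # The timestamp is everything after the last space
    label, _, stamp = text.rpartition(" ")
    return label, stamp


def run_consumer(topic="kafkaesque", broker="localhost:9092"):
    """Print every message that arrives on the topic until interrupted."""
    # Imported here so decode_message can be used without kafka-python installed
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, bootstrap_servers=broker)
    for record in consumer:
        # record.value holds the message payload as bytes
        label, stamp = decode_message(record.value)
        print(label, stamp)
```

Call `run_consumer()` in one shell and start the producer script in another. As with the console consumer, start the consumer first if you want to see the messages from the beginning of the run.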
