“Das Schloss”, photo by Blair Connolly
Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. I use Kafka in my research platform to collect process runtime data of large MPI applications in real time. With Kafka’s producer-consumer model it becomes easy to implement multiple data consumers that do live, in-flight application monitoring as well as persistent data storage for later analysis. In this post I describe how to set up a single Kafka server on OS X and show a simple producer-consumer example with Python.
The best way to install the latest version of the Kafka server on OS X and to keep it up to date is via Homebrew.
$> brew install kafka
This installs a few other dependencies, including Zookeeper, which is required to run the server. Once everything is installed, you need to start Zookeeper before you can start Kafka.
$> zkServer start
JMX enabled by default
Using config: /usr/local/etc/zookeeper/zoo.cfg
Starting zookeeper ... STARTED
Once Zookeeper is running you can start the Kafka server itself. For simplicity, we’ll run the server in the foreground:
$> kafka-server-start.sh /usr/local/etc/kafka/server.properties
[...]
[2015-10-09 20:48:22,485] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
A Simple Producer-Consumer Example
Now let’s write a simple Python producer that periodically writes a string and a timestamp to a topic. Topics in Kafka are simply message feed categories. Consumers only receive the messages for the topics they have subscribed to.
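To make the idea of topics and subscriptions concrete, here is a toy in-memory model in plain Python. It is only a sketch of the concept: `ToyBroker` and its methods are hypothetical names invented for this illustration and have nothing to do with how Kafka is implemented or with the kafka-python API.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker (illustrative only, not Kafka's API)."""

    def __init__(self):
        # Maps a topic name to the list of inboxes subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Register a new consumer for `topic` and return its inbox."""
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        """Deliver `message` to every inbox subscribed to `topic`."""
        for inbox in self._subscribers[topic]:
            inbox.append(message)


broker = ToyBroker()
monitor = broker.subscribe("kafkaesque")
other = broker.subscribe("unrelated")

broker.publish("kafkaesque", "Metamorphosis!")

print(monitor)  # ['Metamorphosis!']
print(other)    # []
```

The consumer subscribed to "kafkaesque" sees the message, while the one subscribed to a different topic receives nothing.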
First you need to install the Kafka Python client:
$> pip install kafka-python
The following code is the producer implementation:
from time import sleep
from datetime import datetime

from kafka.client import KafkaClient
from kafka.producer import SimpleProducer

# Connect to the local Kafka server started above
kafka = KafkaClient("localhost:9092")
producer = SimpleProducer(kafka)

while True:
    # "kafkaesque" is the name of our topic
    message = "Metamorphosis! " + str(datetime.now().time())
    # Payloads are sent as bytes; encoding keeps this working on Python 3
    producer.send_messages("kafkaesque", message.encode("utf-8"))
    sleep(1)
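The `SimpleProducer` interface shown above belongs to older kafka-python releases. As a hedged sketch, assuming a newer kafka-python release that ships the `KafkaProducer` class, the same loop might look like the following. The helper names `build_message` and `run_producer` are made up for this example.

```python
from datetime import datetime
from time import sleep


def build_message(now):
    """Return the payload as UTF-8 bytes: the text followed by the time of day."""
    return ("Metamorphosis! " + str(now.time())).encode("utf-8")


def run_producer(topic="kafkaesque", server="localhost:9092"):
    """Send one message per second until interrupted with Ctrl-C."""
    # Assumption: a kafka-python release that provides KafkaProducer.
    # Imported here so build_message can be used without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=server)
    try:
        while True:
            producer.send(topic, build_message(datetime.now()))
            sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Make sure buffered messages are delivered before exiting.
        producer.flush()
        producer.close()
```

Calling `run_producer()` with the Kafka server running starts the loop. `KafkaProducer.send` is asynchronous, which is why the sketch flushes before closing.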
Before you run the above script, start a Kafka Console Consumer that listens to the “kafkaesque” topic in a separate shell:
$> kafka-console-consumer.sh --zookeeper localhost --topic kafkaesque
Now you can run the Python script. In the Console Consumer window you should start to see the messages of the kafkaesque topic:
Metamorphosis! 21:03:21.991262
Metamorphosis! 21:03:22.993003
Metamorphosis! 21:03:23.999115
[...]
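The Console Consumer can also be swapped for a small Python consumer. The following is a minimal sketch, assuming a kafka-python release that provides the `KafkaConsumer` class. `decode_message` and `run_consumer` are hypothetical helper names, and the payload format is the one produced by the script above.

```python
def decode_message(raw):
    """Split a payload like b"Metamorphosis! 21:03:21.991262" into (text, timestamp)."""
    text, _, timestamp = raw.decode("utf-8").rpartition(" ")
    return text, timestamp


def run_consumer(topic="kafkaesque", server="localhost:9092"):
    """Print every message that arrives on `topic` until interrupted."""
    # Assumption: a kafka-python release that provides KafkaConsumer.
    # Imported here so decode_message can be used without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, bootstrap_servers=server)
    # Iterating over the consumer blocks and yields records as they arrive.
    for record in consumer:
        text, timestamp = decode_message(record.value)
        print(timestamp, text)
```

Run `run_consumer()` in one shell and the producer script in another to watch the messages arrive.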
- Kafka Python API - http://kafka-python.readthedocs.org/en/latest/index.html