Replicate data from a Kafka cluster to Event Hubs using Apache Kafka Mirror Maker 2
This tutorial shows how to replicate data from an existing Kafka cluster to Azure Event Hubs using Mirror Maker 2.
Note
This sample is available on GitHub
In this tutorial, you learn how to:
- Create an Event Hubs namespace
- Set up or use an existing Kafka cluster
- Configure Kafka Mirror Maker 2
- Run Kafka Mirror Maker 2
Introduction
Apache Kafka MirrorMaker 2.0 (MM2) is designed to make it easier to mirror or replicate topics from one Kafka cluster to another. Mirror Maker uses the Kafka Connect framework to simplify configuration and scaling. For more detailed information on Kafka MirrorMaker, see the Kafka Mirroring/MirrorMaker guide.
Because Azure Event Hubs is compatible with the Apache Kafka protocol, you can use Mirror Maker 2 to replicate data between an existing Kafka cluster and an Event Hubs namespace.
Mirror Maker 2 dynamically detects changes to topics and keeps source and target topic properties synchronized, including offsets and partitions. You can use it to replicate data bidirectionally between a Kafka cluster and an Event Hubs namespace.
Prerequisites
To complete this tutorial, make sure you have:
- Read through the Event Hubs for Apache Kafka article.
- An Azure subscription. If you don't have one, create a trial subscription before you begin.
- Java Development Kit (JDK) 8 or later
  - On Ubuntu, run sudo apt-get install default-jdk to install the JDK.
  - Be sure to set the JAVA_HOME environment variable to point to the folder where the JDK is installed.
- Download and install a Maven binary archive
  - On Ubuntu, you can run sudo apt-get install maven to install Maven.
- Git
  - On Ubuntu, you can run sudo apt-get install git to install Git.
- Apache Kafka distribution
  - Download the preferred Apache Kafka distribution (version 2.4 or later, which includes Mirror Maker 2).
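If you want to confirm the prerequisites before continuing, the following Python sketch checks that java, mvn, and git are on the PATH and that JAVA_HOME points to an existing folder. It only checks presence, not versions, and the function names are hypothetical helpers, not part of the sample repository.

```python
import os
import shutil
from pathlib import Path

# Executables this tutorial relies on (assumed to be invoked by these names).
REQUIRED_TOOLS = ("java", "mvn", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def java_home_problem(env=None):
    """Describe what is wrong with JAVA_HOME, or return None if it points to a folder."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not Path(java_home).is_dir():
        return f"JAVA_HOME points to a missing folder: {java_home}"
    return None


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"Missing prerequisite: {tool}")
    problem = java_home_problem()
    if problem:
        print(problem)
```

Passing an explicit env dictionary to java_home_problem makes the check easy to try without changing your shell environment.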
Create an Event Hubs namespace
An Event Hubs namespace is required to send events to and receive events from any Event Hubs service. See Creating an event hub for instructions to create a namespace and an event hub. Make sure to copy the Event Hubs connection string for later use.
Clone the example project
Now that you have an Event Hubs connection string, clone the Azure Event Hubs for Kafka repository and navigate to the mirror-maker-2 subfolder:
git clone https://github.com/Azure/azure-event-hubs-for-kafka.git
cd azure-event-hubs-for-kafka/tutorials/mirror-maker-2
Set up or use an existing Kafka cluster
If you don't have an existing Kafka cluster, use the Kafka quickstart guide to set one up with the desired settings. For testing purposes, you can also create a couple of topics in the new Kafka cluster and publish data to them.
If you already have an existing Kafka cluster on-premises or in a managed Kafka cloud service, then you can use it to replicate existing data to Event Hubs.
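For a fresh test cluster, the following Python sketch creates a couple of topics and publishes a few test events by calling the kafka-topics.sh and kafka-console-producer.sh scripts that ship with recent Kafka distributions. It assumes it runs from the root of the Kafka distribution and that the cluster accepts unauthenticated clients on localhost:9092; the topic names orders and payments are hypothetical. By default it only prints the commands (dry_run=True).

```python
import subprocess

# Assumptions (hypothetical values): the script runs from the root of the Kafka
# distribution, and the test cluster accepts unauthenticated clients on localhost:9092.
KAFKA_BIN = "./bin"
BOOTSTRAP = "localhost:9092"


def create_topic_cmd(topic, partitions=3, replication_factor=1, bootstrap=BOOTSTRAP):
    """Build a kafka-topics.sh command that creates `topic` unless it already exists."""
    return [
        f"{KAFKA_BIN}/kafka-topics.sh", "--create", "--if-not-exists",
        "--topic", topic,
        "--bootstrap-server", bootstrap,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


def produce_cmd(topic, bootstrap=BOOTSTRAP):
    """Build a kafka-console-producer.sh command; it sends each stdin line as one event."""
    return [f"{KAFKA_BIN}/kafka-console-producer.sh", "--topic", topic,
            "--bootstrap-server", bootstrap]


def seed_topics(topics, events_per_topic=10, dry_run=True):
    """Create each topic and publish a few test events; dry_run only prints the commands."""
    for topic in topics:
        events = "".join(f"test event {i} for {topic}\n" for i in range(events_per_topic))
        for cmd, stdin_text in ((create_topic_cmd(topic), None), (produce_cmd(topic), events)):
            if dry_run:
                print(" ".join(cmd))
            else:
                # check=True raises CalledProcessError if the CLI exits with an error.
                subprocess.run(cmd, input=stdin_text, text=True, check=True)


if __name__ == "__main__":
    seed_topics(["orders", "payments"])  # hypothetical topic names
```

A replication factor of 1 suits a single-broker quickstart cluster; raise it for a multi-broker cluster.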
Configure Kafka Mirror Maker 2
The Apache Kafka distribution includes the connect-mirror-maker.sh script, which runs a distributed Mirror Maker 2 cluster and manages the Kafka Connect workers internally based on a configuration file. Internally, the MirrorMaker driver creates and manages the connectors that perform the replication: MirrorSourceConnector, MirrorCheckpointConnector, and MirrorHeartbeatConnector.
To configure Mirror Maker 2 to replicate data, update the Mirror Maker 2 configuration file kafka-to-eh-connect-mirror-maker.properties to define the replication topology. In the kafka-to-eh-connect-mirror-maker.properties config file, define the cluster aliases that you plan to use for your Kafka cluster (source) and Event Hubs (destination).
# cluster aliases
clusters = source, destination
Then specify the connection information for your source, which is your Kafka cluster.
source.bootstrap.servers = your-kafka-cluster-hostname:9092
#source.security.protocol=SASL_SSL
#source.sasl.mechanism=PLAIN
#source.sasl.jaas.config=<replace sasl jaas config of your Kafka cluster>;
Specify the connection information for the destination, which is the Event Hubs namespace that you created.
destination.bootstrap.servers = <your-eventhubs-namespace>.servicebus.chinacloudapi.cn:9093
destination.security.protocol=SASL_SSL
destination.sasl.mechanism=PLAIN
destination.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='$ConnectionString' password='<Your Event Hubs namespace connection string>';
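Rather than editing the destination lines by hand, you could generate them from the connection string you copied earlier. The following Python sketch takes a namespace-level connection string (Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...), extracts the namespace host, and renders the four destination settings shown above with port 9093. The helper names are hypothetical, and rejecting strings that contain EntityPath is this sketch's own choice, matching the tutorial's use of the namespace connection string.

```python
from urllib.parse import urlparse


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' into a dict; values (such as keys) may contain '='."""
    fields = {}
    for segment in conn_str.strip().split(";"):
        if segment:
            key, _, value = segment.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def destination_properties(conn_str, alias="destination"):
    """Render the <alias>.* settings that point Mirror Maker 2 at an Event Hubs namespace."""
    fields = parse_connection_string(conn_str)
    if "EntityPath" in fields:
        # Sketch's own choice: require a namespace-level string, as the tutorial does.
        raise ValueError("Use the namespace connection string, not an event hub's")
    host = urlparse(fields.get("Endpoint", "")).hostname  # sb://<namespace>.servicebus.../
    if not host:
        raise ValueError("Connection string has no Endpoint=sb://<host>/ field")
    jaas = ("org.apache.kafka.common.security.plain.PlainLoginModule required "
            f"username='$ConnectionString' password='{conn_str.strip()}';")
    return "\n".join([
        f"{alias}.bootstrap.servers = {host}:9093",  # Event Hubs Kafka endpoint port
        f"{alias}.security.protocol=SASL_SSL",
        f"{alias}.sasl.mechanism=PLAIN",
        f"{alias}.sasl.jaas.config={jaas}",
    ])
```

Because the output contains the shared access key, write it only to files you protect like any other secret.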
Enable the replication flow from the source Kafka cluster to the destination Event Hubs namespace.
source->destination.enabled = true
source->destination.topics = .*
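The topics setting is a regular expression that must match the whole topic name, so .* selects every topic. The following Python sketch illustrates which topics such a pattern picks up and what they are called at the destination. It assumes MM2's default topic filter, whose topics.exclude default (as we understand it) skips names matching .*[\-\.]internal, .*\.replica, and __.*, and the DefaultReplicationPolicy, which prefixes each remote topic with the source cluster alias and a dot. Check the documentation for your Kafka version; the helper names are hypothetical.

```python
import re

# Assumption: MM2's default topics.exclude patterns (internal and replica topics,
# and names starting with "__" such as __consumer_offsets).
DEFAULT_EXCLUDES = (r".*[\-\.]internal", r".*\.replica", r"__.*")


def selected_topics(topics, include=r".*", excludes=DEFAULT_EXCLUDES):
    """Return the topics a `source->destination.topics` pattern would replicate.

    Each pattern must match the entire topic name (fullmatch), as in MM2.
    """
    inc = re.compile(include)
    exc = [re.compile(p) for p in excludes]
    return [t for t in topics if inc.fullmatch(t) and not any(p.fullmatch(t) for p in exc)]


def remote_topic_name(source_alias, topic, separator="."):
    """Name of the replicated topic under the DefaultReplicationPolicy."""
    return f"{source_alias}{separator}{topic}"
```

With the aliases used in this tutorial, a topic named orders in the Kafka cluster would appear in the Event Hubs namespace as source.orders.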
Update the replication factor of the remote topics and internal topics that Mirror Maker creates at the destination.
replication.factor=3
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
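Before starting Mirror Maker 2, you might sanity-check the finished file. The following Python sketch parses a simplified .properties format (key=value lines and # or ! comments, without line continuations or the ":" separator) and reports missing cluster connection settings, a missing enabled flow, or replication factors other than 3. It is a hypothetical helper, not a substitute for Mirror Maker 2's own validation.

```python
# Replication-factor settings from the tutorial's configuration file.
REPLICATION_KEYS = (
    "replication.factor",
    "checkpoints.topic.replication.factor",
    "heartbeats.topic.replication.factor",
    "offset-syncs.topic.replication.factor",
    "offset.storage.replication.factor",
    "status.storage.replication.factor",
    "config.storage.replication.factor",
)


def load_properties(text):
    """Parse key=value lines, skipping blanks and '#'/'!' comments (simplified format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_mm2_config(props, expected_rf="3"):
    """Return human-readable problems found in a Mirror Maker 2 config dict."""
    problems = []
    aliases = [a.strip() for a in props.get("clusters", "").split(",") if a.strip()]
    if len(aliases) < 2:
        problems.append("clusters must list at least two aliases")
    for alias in aliases:
        if not props.get(f"{alias}.bootstrap.servers"):
            problems.append(f"missing {alias}.bootstrap.servers")
    if not any("->" in k and k.endswith(".enabled") and v.lower() == "true"
               for k, v in props.items()):
        problems.append("no replication flow is enabled")
    for key in REPLICATION_KEYS:
        if props.get(key) != expected_rf:
            problems.append(f"{key} should be {expected_rf}")
    return problems
```

For example, check_mm2_config(load_properties(open("kafka-to-eh-connect-mirror-maker.properties").read())) returns an empty list when none of these checks find a problem.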
Then copy the kafka-to-eh-connect-mirror-maker.properties configuration file to the Kafka distribution's config directory and, from the root of the Kafka distribution, run the Mirror Maker 2 script by using the following command:
./bin/connect-mirror-maker.sh ./config/kafka-to-eh-connect-mirror-maker.properties
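If you script your setup, the copy-and-run step can be sketched in Python as follows. The sketch assumes you pass the folder that holds your edited properties file and the Kafka installation folder (for example, a hypothetical /opt/kafka). It builds the same command as above with absolute paths and runs Mirror Maker 2 in the foreground.

```python
import shutil
import subprocess
from pathlib import Path

CONFIG_NAME = "kafka-to-eh-connect-mirror-maker.properties"


def mm2_command(kafka_home):
    """Build the command that starts Mirror Maker 2 with the copied config file."""
    kafka_home = Path(kafka_home)
    return [str(kafka_home / "bin" / "connect-mirror-maker.sh"),
            str(kafka_home / "config" / CONFIG_NAME)]


def start_mirror_maker(tutorial_dir, kafka_home):
    """Copy the config into <kafka_home>/config, then run Mirror Maker 2."""
    shutil.copy(Path(tutorial_dir) / CONFIG_NAME, Path(kafka_home) / "config" / CONFIG_NAME)
    # Blocks while Mirror Maker 2 runs; check=True raises CalledProcessError on a non-zero exit.
    subprocess.run(mm2_command(kafka_home), check=True)
```

Running the script directly in a terminal, as shown above, is simpler for a one-off test; a wrapper like this mainly helps when the same steps are repeated.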
After the script starts successfully, you should see the Kafka topics and events being replicated to your Event Hubs namespace.
To verify that events are reaching your Event Hubs namespace, check the ingress statistics in the Azure portal, or run a consumer against the event hubs.
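As one way to run such a consumer, the following Python sketch uses the confluent-kafka client (installed separately with pip install confluent-kafka) with the same SASL_SSL/PLAIN settings as the destination configuration: the user name is the literal $ConnectionString and the password is the namespace connection string. The group ID mm2-verify and the topic source.orders are hypothetical; the topic name assumes the default replication policy, which prefixes the source alias.

```python
def eventhubs_consumer_config(namespace_host, conn_str, group_id="mm2-verify"):
    """confluent-kafka (librdkafka) settings for reading from an Event Hubs namespace."""
    return {
        "bootstrap.servers": f"{namespace_host}:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": conn_str,
        "group.id": group_id,
        "auto.offset.reset": "earliest",  # read replicated events from the beginning
    }


def print_replicated_events(config, topic, max_messages=10, timeout_s=30.0):
    """Print up to max_messages events from `topic`, stopping early after a quiet poll."""
    from confluent_kafka import Consumer  # imported lazily; only this function needs it

    consumer = Consumer(config)
    consumer.subscribe([topic])
    try:
        for _ in range(max_messages):
            msg = consumer.poll(timeout_s)
            if msg is None:
                break  # nothing arrived within the timeout
            if msg.error():
                raise RuntimeError(msg.error())
            print(msg.topic(), msg.partition(), msg.value())
    finally:
        consumer.close()
```

For example: print_replicated_events(eventhubs_consumer_config("<your-eventhubs-namespace>.servicebus.chinacloudapi.cn", conn_str), "source.orders").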
Samples
See the following samples on GitHub:
If you are hosting Apache Kafka on Kubernetes by using the CNCF Strimzi operator, you can use the Strimzi Mirror Maker 2 sample for Event Hubs.
Next steps
To learn more about Event Hubs for Kafka, see the following articles: