Storm Kafka – Support of Wildcard Topics.

Storm is great for consuming "streams" of data, and it offers good support for consuming topic streams from Kafka.  However, as of this writing, the Storm Kafka module does not support wildcard topics.  For example, let's say we want to consume all topics matching a particular pattern, e.g.:


To consume these topics you would have to create multiple spouts.  And if new topics get added, there is no way to consume them dynamically.  This was a big drawback of the Storm Kafka module for me.  One option was to create my own Kafka spout implementation, but that would mean missing out on all the existing features of the Storm Kafka spout.

To overcome this, I decided to augment the Storm Kafka code with this wildcard feature.  I have opened a merge request for this change.  Until it gets merged, you can consume the change directly from my repo.  Here are the steps:

1. Add the following repository.  Jitpack allows you to consume this GitHub repo as a Maven repository.
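The repository entry goes in your pom.xml; the Jitpack URL below is Jitpack's standard endpoint:

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
```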

2. Add the following dependency to your Maven project.  Here the version tag is the GitHub commit number.
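As a sketch, the dependency entry would look along these lines — the groupId, artifactId, and version below are placeholders, since Jitpack derives the coordinates from the GitHub user, repo name, and commit hash:

```xml
<dependency>
    <!-- placeholders: substitute the actual GitHub user, repo, and commit hash -->
    <groupId>com.github.YOUR_GITHUB_USER</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>COMMIT_HASH</version>
</dependency>
```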


3. In your consuming code, add the following config to enable wildcard topic support.
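A minimal sketch of that config switch — the config key shown is my reading of the patch, so verify it against the merged storm-kafka code:

```java
// Assumed config key from the wildcard patch; verify against the merged code
Config conf = new Config();
conf.put("kafka.topic.wildcard.match", true);
```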


After this, if you pass a wildcard topic to the Kafka spout, e.g. "clickstream.*.log", it will match all topics matching that pattern.  It will discover new topics dynamically, and you only need to create a single Kafka spout.  You can use a parallelism hint so that the partitions of all the matching topics get distributed evenly.
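Putting the steps together, a minimal sketch of the spout setup might look like this — the host name, ZooKeeper path, and component IDs are placeholders, while the SpoutConfig constructor is the standard storm-kafka one:

```java
// Hypothetical host/path/id values, for illustration only
BrokerHosts hosts = new ZkHosts("zkhost1:2181");
// The wildcard pattern goes where a plain topic name would normally go
SpoutConfig spoutConfig = new SpoutConfig(hosts, "clickstream.*.log",
        "/kafka-spout", "wildcard-clickstream");
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

TopologyBuilder builder = new TopologyBuilder();
// Parallelism hint of 10: partitions of all matching topics are spread across 10 executors
builder.setSpout("clickstream-spout", kafkaSpout, 10);
```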


As of Nov 2nd, 2015, this change has been merged into the main branch of apache/storm.



3 thoughts on "Storm Kafka – Support of Wildcard Topics."

  1. Hi Sumit

    Does this create one executor per partition, or does it map multiple partitions to a single executor? The issue in my case is that I have multiple topics and need to consume them all in a single topology. If there are 15 topics with 10 partitions each, I will end up with a parallelism of 150. I know I don't need that much; having just 10 will satisfy my requirements. Will this PR help deal with the issue mentioned?


    • It will evenly divide the partitions among tasks. So in your case, since you have a parallelism of 10, each task will get 15 topics * 10 partitions / 10 tasks = 15 partitions.
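    The arithmetic above can be sketched as a simple round-robin assignment — an illustrative model of the even distribution, not the actual storm-kafka code:

```java
import java.util.*;

public class PartitionSpread {
    // Round-robin partitions across tasks, modeling even distribution
    static Map<Integer, List<String>> assign(List<String> partitions, int numTasks) {
        Map<Integer, List<String>> byTask = new HashMap<>();
        for (int i = 0; i < partitions.size(); i++) {
            byTask.computeIfAbsent(i % numTasks, k -> new ArrayList<>())
                  .add(partitions.get(i));
        }
        return byTask;
    }

    public static void main(String[] args) {
        // 15 topics x 10 partitions each = 150 partitions, parallelism of 10
        List<String> partitions = new ArrayList<>();
        for (int t = 0; t < 15; t++) {
            for (int p = 0; p < 10; p++) {
                partitions.add("topic" + t + "-p" + p);
            }
        }
        Map<Integer, List<String>> byTask = assign(partitions, 10);
        System.out.println(byTask.get(0).size()); // 150 / 10 = 15 partitions per task
    }
}
```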

