Storm Kafka – Support of Wildcard Topics.

Storm is great for consuming “streams” of data, and it offers good support for consuming topic streams from Kafka.  However, as of this writing, the Storm Kafka module does not support wildcard topics.  For example, let's say we want to consume all topics matching a particular pattern, e.g.:

clickstream.abc.log

clickstream.def.log

clickstream.xyz.log

To consume these topics, you have to create multiple spouts.  And if new topics get added, there is no way for you to consume them dynamically.  This was a big drawback of the Storm Kafka module for me.  One option was to create my own Kafka Spout implementation, but that would mean losing all the existing features of the Storm Kafka Spout.
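To make the limitation concrete, here is a minimal sketch of the multi-spout workaround using the storm-kafka classes (the ZooKeeper address, zkRoot, and spout ids are placeholders of my own, not anything prescribed by Storm):

  import backtype.storm.spout.SchemeAsMultiScheme;
  import backtype.storm.topology.TopologyBuilder;
  import storm.kafka.BrokerHosts;
  import storm.kafka.KafkaSpout;
  import storm.kafka.SpoutConfig;
  import storm.kafka.StringScheme;
  import storm.kafka.ZkHosts;

  public class PerTopicSpouts {
    public static TopologyBuilder build() {
      BrokerHosts hosts = new ZkHosts("localhost:2181");
      TopologyBuilder builder = new TopologyBuilder();
      // One SpoutConfig and one KafkaSpout per topic -- a new matching topic
      // means editing this list and redeploying the topology.
      String[] topics = {"clickstream.abc.log", "clickstream.def.log", "clickstream.xyz.log"};
      for (String topic : topics) {
        SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, "/kafka-spouts", topic);
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        builder.setSpout(topic + "-spout", new KafkaSpout(spoutConfig));
      }
      return builder;
    }
  }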

To overcome this, I decided to augment the Storm Kafka code with a wildcard feature.  I have opened a pull request for this change: https://github.com/apache/storm/pull/561.   Until it gets merged, you can consume the change directly from my repo at https://github.com/sumitchawla/storm.  Here are the steps:

1. Add the following repository.  JitPack allows you to consume this GitHub repo as a Maven repository.
 <repository>
   <id>jitpack.io</id>
   <url>https://jitpack.io</url>
 </repository>

2. Add the following dependency to your Maven project. Here the version tag is the GitHub commit hash.

 <dependency>
  <groupId>com.github.sumitchawla</groupId>
  <artifactId>storm</artifactId>
  <version>fe7de58f09</version>
  <exclusions>
    <exclusion>
       <groupId>com.github.sumitchawla.storm</groupId>
       <artifactId>storm-core</artifactId>
    </exclusion>
    <exclusion>
       <groupId>com.github.sumitchawla.storm</groupId>
       <artifactId>storm-hive</artifactId>
    </exclusion>
  </exclusions>
 </dependency>

3. Finally, add the following config to enable wildcard topic support:

       config.put("kafka.topic.wildcard.match", true);

After this, if you pass a wildcard topic to the Kafka Spout, e.g. “clickstream.*.log”, it will match all topics matching that pattern.  It discovers new topics dynamically, and you only need to create a single Kafka Spout.  You can use a parallelism hint so that the partitions of all matching topics get distributed evenly.
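Here is a minimal end-to-end sketch of the single-spout setup (topology, spout, and ZooKeeper names are placeholders, and I am submitting to a LocalCluster just for illustration):

  import backtype.storm.Config;
  import backtype.storm.LocalCluster;
  import backtype.storm.spout.SchemeAsMultiScheme;
  import backtype.storm.topology.TopologyBuilder;
  import storm.kafka.BrokerHosts;
  import storm.kafka.KafkaSpout;
  import storm.kafka.SpoutConfig;
  import storm.kafka.StringScheme;
  import storm.kafka.ZkHosts;

  public class ClickstreamTopology {
    public static void main(String[] args) throws Exception {
      BrokerHosts hosts = new ZkHosts("localhost:2181");

      // The wildcard pattern is passed where a plain topic name normally goes.
      SpoutConfig spoutConfig = new SpoutConfig(hosts, "clickstream.*.log", "/kafka-spouts", "clickstream-spout");
      spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

      TopologyBuilder builder = new TopologyBuilder();
      // Parallelism hint of 10: the partitions of all matching topics are
      // distributed across these spout tasks.
      builder.setSpout("clickstream-spout", new KafkaSpout(spoutConfig), 10);

      Config config = new Config();
      // Step 3 above: enable wildcard topic matching.
      config.put("kafka.topic.wildcard.match", true);

      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("clickstream-topology", config, builder.createTopology());
    }
  }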

Update:

As of Nov 2nd, 2015, this change has been merged into the main branch of apache/storm:

https://github.com/apache/storm/commit/60d9f81ba1b16f7711c487f831f202e49eda258c


3 thoughts on “Storm Kafka – Support of Wildcard Topics”

  1. Hi Sumit,

    Does this create one executor per partition, or does it map multiple partitions to a single executor? The issue in my case is that I have multiple topics and I need to consume them all in a single topology. If there are 15 topics with 10 partitions each, I will end up with a parallelism of 150. I know I don’t need that much; a parallelism of 10 would meet my requirements. Will this PR help deal with the issue mentioned?

    • It will evenly divide the partitions among the tasks. So in your case, with a parallelism of 10, each task will get 15 (topics) × 10 (partitions per topic) / 10 (tasks) = 15 partitions.
