Storm is great for consuming “streams” of data, and it offers good support for consuming topic streams from Kafka. However, as of this writing, the Storm Kafka module does not support wildcard topics. For example, let's say we want to consume all topics matching a particular pattern, e.g.:
To consume such topics you would have to create multiple spouts, and if new topics get added, there is no way to consume them dynamically. This was a big drawback of the Storm Kafka module for me. One option was to create my own Kafka spout implementation, but that would mean missing out on all the existing features of the Storm Kafka spout.
To overcome this, I decided to augment the Storm Kafka code with this wildcard feature. I have opened a pull request for this change: https://github.com/apache/storm/pull/561. Until it gets merged, you can consume the change directly from my repo at https://github.com/sumitchawla/storm. Here are the steps:
1. Add the following repository. JitPack allows you to consume this GitHub repo as a Maven repository.
<repository>
  <id>jitpack.io</id>
  <url>https://jitpack.io</url>
</repository>
2. Add the following dependency to your Maven project. Here the version tag is the GitHub commit hash.
<dependency>
  <groupId>com.github.sumitchawla</groupId>
  <artifactId>storm</artifactId>
  <version>fe7de58f09</version>
  <exclusions>
    <exclusion>
      <groupId>com.github.sumitchawla.storm</groupId>
      <artifactId>storm-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.github.sumitchawla.storm</groupId>
      <artifactId>storm-hive</artifactId>
    </exclusion>
  </exclusions>
</dependency>
3. In your topology code, add the following config to enable wildcard topic support.
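As a rough sketch, the setting is a single boolean flag on the topology config. The key name used here, `kafka.topic.wildcard.match`, is the one documented in the upstream storm-kafka README and should be treated as an assumption to verify against the version you build. A plain `HashMap` stands in for Storm's `Config` class (which is itself a `HashMap<String, Object>`) so that the example runs without Storm on the classpath; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class WildcardConfigSketch {

    // Assumed key name; verify it against the storm-kafka version you build.
    public static final String TOPIC_WILDCARD_KEY = "kafka.topic.wildcard.match";

    // Builds a topology config map with wildcard topic matching switched on.
    // In a real topology you would put the same entry into Storm's Config object.
    public static Map<String, Object> buildConfig() {
        Map<String, Object> conf = new HashMap<>();
        conf.put(TOPIC_WILDCARD_KEY, true);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig()); // prints {kafka.topic.wildcard.match=true}
    }
}
```

The same map is what you would pass when submitting the topology, so the flag applies to the Kafka spout inside it.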
After this, if you pass a wildcard topic to the Kafka spout, e.g. “clickstream.*.log”, it will consume all topics matching this pattern. It discovers new topics dynamically, so you only need to create a single Kafka spout. You can use a parallelism hint so that the partitions of all the matching topics get distributed evenly across the spout tasks.
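To make the matching and distribution concrete, here is a minimal, Storm-free sketch. It assumes the wildcard is interpreted as a Java regular expression matched against the full topic name, and it spreads partitions over tasks with a simple round-robin; this illustrates the idea and is not necessarily the exact assignment code in the spout. The topic names other than the pattern, the partition count, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardTopicSketch {

    // Keep only the topics whose full name matches the wildcard,
    // interpreting the wildcard as a Java regular expression (an assumption).
    public static List<String> matchTopics(String wildcard, List<String> allTopics) {
        Pattern pattern = Pattern.compile(wildcard);
        List<String> matched = new ArrayList<>();
        for (String topic : allTopics) {
            if (pattern.matcher(topic).matches()) {
                matched.add(topic);
            }
        }
        return matched;
    }

    // Flatten every matched topic into "topic-partition" ids.
    public static List<String> listPartitions(List<String> topics, int partitionsPerTopic) {
        List<String> partitions = new ArrayList<>();
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                partitions.add(topic + "-" + p);
            }
        }
        return partitions;
    }

    // Round-robin: task i takes partitions i, i + numTasks, i + 2 * numTasks, ...
    public static List<String> partitionsForTask(List<String> partitions, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (int i = taskIndex; i < partitions.size(); i += numTasks) {
            mine.add(partitions.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        // Hypothetical topic names, for illustration only.
        List<String> allTopics = Arrays.asList(
                "clickstream.my.log", "clickstream.cart.log", "orders.log");
        List<String> matched = matchTopics("clickstream.*.log", allTopics);
        List<String> partitions = listPartitions(matched, 2);
        int numTasks = 2; // plays the role of the parallelism hint
        for (int task = 0; task < numTasks; task++) {
            System.out.println("task " + task + " -> "
                    + partitionsForTask(partitions, task, numTasks));
        }
    }
}
```

With two matching topics of two partitions each and a parallelism hint of 2, each task ends up with two partitions. Note that under the regex interpretation the dots in the pattern match any character, not only a literal dot.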
As of Nov 2nd, 2015, this change has been merged into the main branch of the apache/storm code.