Wrong Kafka configuration and deployment using Puppet


I just want to share with you a case we had last week, involving a wrong Kafka deployment from Puppet that filled up the filesystems and bit our colleagues who were using it as the transport layer for their ELK stack.

Let's start at the beginning: we have some Puppet code to deploy and configure the Kafka machines. To keep it simple, the broker config block from Puppet looks like this:

$broker_config = {
    'broker.id'                     => '-1', # always set broker.id to -1
    # broker specific config
    'zookeeper.connect'             => hiera('::kafka::zookeeper_connect', $zookeeper_connect),
    'inter.broker.protocol.version' => hiera('::kafka::inter_broker_protocol_version', $kafka_version),
    'log.dir'                       => hiera('::kafka::log_dir', '/srv/kafka-logs'),
    'log.dirs'                      => hiera('::kafka::log_dirs', '/srv/kafka-logs'),
    'log.retention.hours'           => hiera('::kafka::log_retention_hours', $days7),
    'log.retention.bytes'           => hiera('::kafka::log_retention_bytes', '-1'),
    # configure availability
    'num.partitions'                => hiera('::kafka::num_partitions', 256),
    'default.replication.factor'    => hiera('::kafka::default_replication_factor', $default_replication_factor),
    # configure administratability (this is a word now)
    'delete.topic.enable'           => hiera('::kafka::delete_topic_enable', 'true'),
}

As you can see, there are two fields that need to be carefully configured: one is log.retention.bytes and the other is num.partitions. I am underlining this for a very simple reason: if we take a look at the Kafka server.properties file, which the engine uses to start the broker, we will see something like

log.retention.bytes=107374182400
num.partitions=256
Now, the first one is the default value (it sets the maximum size per partition), and if you put it through an online converter you will see that it means 100GB. This shouldn't be a problem if you override it per topic, and even less so if you manually create a topic with one or two partitions and actually have the required space. But our case was different: the topic was created by Logstash, which picked up the default configuration of 256 partitions, and with a maximum size of 100GB per partition the broker thought it had 25,600GB to play with, which, if I am not mistaken, translates to roughly 25TB (yes, that is usually more than enough for an ELK instance).
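To make the math explicit, here is a quick back-of-the-envelope check (just a sketch; the 256 and 100GB figures are the ones from our configuration above):

```shell
# Worst-case disk usage the broker believes it may use for one topic:
# partitions * maximum retention bytes per partition
partitions=256
gb_per_partition=100
total_gb=$((partitions * gb_per_partition))
echo "${total_gb} GB"   # 25600 GB, i.e. about 25 TB
```

And that is per topic, before replication is even taken into account.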

Since we were running only three nodes with a 100GB filesystem each, you can imagine that the disks filled up and the servers crashed.

Now there are two ways to avoid this, depending on the number of consumers in the consumer group and on the retention period you need. If you have only one consumer and want to keep data for a longer period of time, you can leave the limit at 100GB (if you have that much storage available) and manually create a topic with one partition and a replication factor equal to the number of nodes you want for high availability. Alternatively, you can leave the partition count at 256 and drastically decrease the retention bytes value. The second option gives you an even distribution of data across multiple consumers, but it comes with a shorter retention period (although it is also true that this depends on the amount and type of data you are transferring).
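The first option sketched as a command (we only print it here as a dry run, since actually creating the topic needs a live cluster; the topic name "logstash" and the ZooKeeper address are hypothetical, and newer Kafka versions use --bootstrap-server instead of --zookeeper):

```shell
# Single-partition topic, replicated across the three nodes for availability
topic_cmd="kafka-topics.sh --create --zookeeper localhost:2181 \
--topic logstash --partitions 1 --replication-factor 3"
echo "$topic_cmd"
```

With one partition, all 100GB of retention applies to a single log directory, so the storage math stays predictable.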

The solution that we adopted is to leave the partition count unchanged and decrease the maximum partition size to 200MB. I'll keep you informed on how it works. 🙂
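For reference, the resulting broker settings would look something like this in server.properties (a sketch of the relevant values, not our exact file; 200MB is 209715200 bytes, giving a worst case of roughly 50GB per topic across 256 partitions):

```properties
num.partitions=256
log.retention.bytes=209715200
```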