How to deploy Prometheus infrastructure for Kafka monitoring using Puppet

Hi,

Over the last couple of days I worked on deploying a Prometheus server and agents for Kafka monitoring. To that end, I will share with you the main things you need to do in order to achieve this.

The first thing to do is to grab the prometheus and grafana Puppet modules, which you can find at the following links:

https://forge.puppet.com/puppet/prometheus
https://forge.puppet.com/puppet/grafana
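
If you don't manage modules through a Puppetfile, you can pull both straight from the Forge (these are the module names as published there):

puppet module install puppet-prometheus
puppet module install puppet-grafana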

After these are imported into Puppet, you need to create the following manifest files:

grafana.pp

class profiles::grafana {
    # Installs Grafana and manages its main configuration file
    class { '::grafana':
      cfg => {
        app_mode => 'production',
        server   => {
          http_port => 8080,
        },
        database => {
          # host/user/password are only used by mysql/postgres backends;
          # they are ignored for sqlite3 but kept here for easy switching
          type     => 'sqlite3',
          host     => '127.0.0.1:3306',
          name     => 'grafana',
          user     => 'root',
          password => 'grafana',
        },
        users    => {
          allow_sign_up => false,
        },
      },
    }
}
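
Note that the class above only installs and configures Grafana itself; it does not point Grafana at Prometheus. As a minimal sketch of that wiring, assuming the module's grafana_datasource type and Grafana's default admin/admin credentials (adjust both to your setup):

grafana_datasource { 'prometheus':
  grafana_url      => 'http://localhost:8080',  # matches http_port above
  grafana_user     => 'admin',                  # assumption: default credentials
  grafana_password => 'admin',
  type             => 'prometheus',
  url              => 'http://localhost:9090',  # where Prometheus listens
  access_mode      => 'proxy',
  is_default       => true,
}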

prometheusserver.pp

class profiles::prometheusserver {
    # default of undef keeps the lookup from failing when no node list is set
    $kafka_nodes = hiera('profiles::prometheusserver::nodes', undef)

    # the server always scrapes itself; kafka is added when Hiera provides nodes
    $prometheus_job = {
      'job_name'        => 'prometheus',
      'scrape_interval' => '30s',
      'scrape_timeout'  => '30s',
      'static_configs'  => [
        { 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } },
      ],
    }
    if $kafka_nodes {
      $scrape_configs = [ $prometheus_job, {
        'job_name'        => 'kafka',
        'scrape_interval' => '10s',
        'scrape_timeout'  => '10s',
        'static_configs'  => [{ 'targets' => $kafka_nodes }],
      } ]
    } else {
      $scrape_configs = [ $prometheus_job ]
    }

    class { '::prometheus':
      global_config  => {
        'scrape_interval'     => '15s',
        'evaluation_interval' => '15s',
        'external_labels'     => { 'monitor' => 'master' },
      },
      rule_files     => [ '/etc/prometheus/alert.rules' ],
      scrape_configs => $scrape_configs,
    }
}

prometheusnode.pp

class profiles::prometheusnode (
    $jmxexporter_dir     = hiera('jmxexporter::dir', '/opt/jmxexporter'),
    $jmxexporter_version = hiera('jmxexporter::version', '0.9'),
) {
    include ::prometheus::node_exporter

    file { $jmxexporter_dir:
      ensure => 'directory',
    }

    # exporter config that maps Kafka JMX beans to Prometheus metrics
    file { "${jmxexporter_dir}/prometheus_config.yaml":
      source => 'puppet:///modules/profiles/prometheus_config',
    }

    # download the java agent jar once; skip if it is already in place
    wget::fetch { "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${jmxexporter_version}/jmx_prometheus_javaagent-${jmxexporter_version}.jar":
      destination => "${jmxexporter_dir}/",
      cache_dir   => '/tmp/',
      timeout     => 0,
      verbose     => false,
      unless      => "test -e ${jmxexporter_dir}/jmx_prometheus_javaagent-${jmxexporter_version}.jar",
      require     => File[$jmxexporter_dir],
    }
}

I used the wget module to fetch the JMX exporter, so I should give you that as well:

 https://forge.puppet.com/leonardothibes/wget

Also required is the JMX exporter configuration file, which translates the JMX data exposed by Kafka into fields that can be imported into Prometheus.
prometheus_config:

lowercaseOutputName: true
rules:
- pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_cluster_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_log_$1
  labels:
    topic: "$2"
    partition: "$3"
- pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_controller_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)><>Value
  name: kafka_network_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
  name: kafka_network_$1_$2_total
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
  type: COUNTER
- pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+)><>Count
  name: kafka_network_$1_$2
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_server_$1_$2_total
  labels:
    topic: "$3"
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
- pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_server_$1_$2

- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_$1_$2_$3_total
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
    partition: "$5"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$4
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$6
  labels:
    "$4": "$5"

OK, so in order to put this together, we will use plain old Hiera :). For the server on which you want to run the Prometheus server, you will need to create a role or just put the classes in that host's fqdn.yaml, which looks like this:

prometheus.yaml

---
classes:
  - 'profiles::prometheusserver'
  - 'profiles::grafana'

alertrules:
    -
        name: 'InstanceDown'
        condition:  'up == 0'
        timeduration: '5m'
        labels:
            -
                name: 'severity'
                content: 'critical'
        annotations:
            -
                name: 'summary'
                content: 'Instance {{ $labels.instance }} down'
            -
                name: 'description'
                content: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
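
One detail the manifests above do not show is how this alertrules structure ends up in /etc/prometheus/alert.rules, the file referenced by rule_files; in our installation that part is custom. A minimal sketch of the idea, assuming an ERB template called alert.rules.erb (the template name and the wiring are my own, and the syntax is the old Prometheus 1.x rule format that matches a plain alert.rules file):

# hypothetical wiring inside profiles::prometheusserver
$alertrules = hiera('alertrules', [])

file { '/etc/prometheus/alert.rules':
  ensure  => file,
  content => template('profiles/alert.rules.erb'),
}

And the template itself:

<%# alert.rules.erb: render each Hiera entry as a Prometheus 1.x rule %>
<% @alertrules.each do |rule| -%>
ALERT <%= rule['name'] %>
  IF <%= rule['condition'] %>
  FOR <%= rule['timeduration'] %>
  LABELS { <%= Array(rule['labels']).map { |l| "#{l['name']} = \"#{l['content']}\"" }.join(', ') %> }
  ANNOTATIONS { <%= Array(rule['annotations']).map { |a| "#{a['name']} = \"#{a['content']}\"" }.join(', ') %> }
<% end -%>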

This serves as the default installation because it's a “role”. On each Prometheus host I also created a host-specific fqdn.yaml file to tell the server which nodes should be scraped for exposed metrics. Here is an example:

---
profiles::prometheusserver::nodes:
    - 'kafka0:7071'
    - 'kafka1:7071'
    - 'kafka2:7071'

The three nodes are just an example; you can list every node on which you include the Prometheus node class. That side should look like this:

---
classes:
  - 'profiles::prometheusnode'

profiles::kafka::jolokia: '-javaagent:/usr/share/java/jolokia-jvm-agent.jar -javaagent:/opt/jmxexporter/jmx_prometheus_javaagent-0.9.jar=7071:/opt/jmxexporter/prometheus_config.yaml'
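
Both of these per-host files are resolved through an fqdn layer in the Hiera hierarchy. For completeness, a minimal hiera.yaml sketch in classic Hiera 3 syntax (the datadir is an assumption; adjust it to your environment):

---
:backends:
  - yaml
:yaml:
  :datadir: '/etc/puppetlabs/code/hieradata'
:hierarchy:
  - "%{::fqdn}"
  - 'common'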

Now I need to explain that jolokia variable, right? It's pretty straightforward. The Kafka installation was already written, it included the Jolokia agent, and our broker definition block looks like this:


class { '::kafka::broker':
  config    => $broker_config,
  # opts comes from Hiera, falling back to the Jolokia agent alone
  opts      => hiera('profiles::kafka::jolokia', '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'),
  heap_opts => "-Xmx${jvm_heap_size}M -Xms${jvm_heap_size}M",
}

So I needed to put the JMX exporter agent beside Jolokia in the Kafka startup options, and once this is deployed you will see the JMX exporter started as an agent. Anyhow, when everything is deployed you will have a Prometheus config that should look like this:

---
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: master
rule_files:
- /etc/prometheus/alert.rules
scrape_configs:
- job_name: prometheus
  scrape_interval: 30s
  scrape_timeout: 30s
  static_configs:
  - targets:
    - localhost:9090
    labels:
      alias: Prometheus
- job_name: kafka
  scrape_interval: 10s
  scrape_timeout: 10s
  static_configs:
  - targets:
    - kafka0:7071
    - kafka1:7071
    - kafka2:7071
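
One thing worth noting: this config only scrapes the JMX exporter on port 7071. Since profiles::prometheusnode also includes prometheus::node_exporter, which listens on port 9100 by default, you could add a third job for the host metrics along these lines (same example hosts as above):

- job_name: node
  scrape_interval: 30s
  scrape_timeout: 30s
  static_configs:
  - targets:
    - kafka0:9100
    - kafka1:9100
    - kafka2:9100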

You can also see the nodes under Status -> Targets in the Prometheus UI, and all the metrics are available per node at http://[kafka-node]:7071/metrics.

I think this should be it. I don't know if I covered everything, and there are a lot of details related to our custom installation, but at least I managed to provide the main points. The article that helped me a lot can be found here:

https://www.robustperception.io/monitoring-kafka-with-prometheus/

Cheers!