Categories: kafka, newtools

How to deploy Prometheus infrastructure for Kafka monitoring using Puppet

Hi,

Over the last couple of days I worked on deploying a Prometheus server and agents for Kafka monitoring. To that end, I will share with you the main steps you need to take in order to achieve this.

The first thing to do is to use the prometheus and grafana Puppet modules, which you will find at the following links:

https://forge.puppet.com/puppet/prometheus
https://forge.puppet.com/puppet/grafana
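
If you deploy your Puppet code with r10k or Code Manager, importing them is just a matter of adding the modules to your Puppetfile. A minimal sketch (you will probably want to pin the exact versions you have tested instead of taking the latest):

mod 'puppet-prometheus'
mod 'puppet-grafana'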

After these are imported into Puppet, you need to create the following Puppet manifests:

grafana.pp

class profiles::grafana {
    class { '::grafana':
      cfg => {
        app_mode => 'production',
        server   => {
          http_port     => 8080,
        },
        database => {
          type     => 'sqlite3',
          host     => '127.0.0.1:3306',
          name     => 'grafana',
          user     => 'root',
          password => 'grafana',
        },
        users    => {
          allow_sign_up => false,
        },
      },
    }
}
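
To make Grafana actually talk to Prometheus, the same grafana module also provides a grafana_datasource resource type. This is only a sketch of how I would wire it up, assuming Grafana listens on the 8080 port configured above, Prometheus runs on the same host on 9090, and the admin credentials are still the defaults; adjust these to your setup:

grafana_datasource { 'prometheus':
  grafana_url      => 'http://localhost:8080',
  grafana_user     => 'admin',
  grafana_password => 'admin',
  type             => 'prometheus',
  url              => 'http://localhost:9090',
  access_mode      => 'proxy',
  is_default       => true,
}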

prometheusserver.pp

class profiles::prometheusserver {
    # default to undef so the else branch is used when no nodes are defined in Hiera
    $kafka_nodes = hiera('profiles::prometheusserver::nodes', undef)

    if $kafka_nodes {
      class { '::prometheus':
        global_config  => { 'scrape_interval' => '15s', 'evaluation_interval' => '15s', 'external_labels' => { 'monitor' => 'master' } },
        rule_files     => [ '/etc/prometheus/alert.rules' ],
        scrape_configs => [
          { 'job_name' => 'prometheus', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [ { 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } } ] },
          { 'job_name' => 'kafka', 'scrape_interval' => '10s', 'scrape_timeout' => '10s', 'static_configs' => [ { 'targets' => $kafka_nodes } ] },
        ],
      }
    } else {
      class { '::prometheus':
        global_config  => { 'scrape_interval' => '15s', 'evaluation_interval' => '15s', 'external_labels' => { 'monitor' => 'master' } },
        rule_files     => [ '/etc/prometheus/alert.rules' ],
        scrape_configs => [
          { 'job_name' => 'prometheus', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [ { 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } } ] },
        ],
      }
    }
}

prometheusnode.pp

class profiles::prometheusnode (
  $jmxexporter_dir     = hiera('jmxexporter::dir', '/opt/jmxexporter'),
  $jmxexporter_version = hiera('jmxexporter::version', '0.9'),
) {
  include ::prometheus::node_exporter

  file { $jmxexporter_dir:
    ensure => 'directory',
  }
  file { "${jmxexporter_dir}/prometheus_config.yaml":
    source => 'puppet:///modules/profiles/prometheus_config',
  }
  # download the JMX exporter java agent from Maven Central
  wget::fetch { "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${jmxexporter_version}/jmx_prometheus_javaagent-${jmxexporter_version}.jar":
    destination => "${jmxexporter_dir}/",
    cache_dir   => '/tmp/',
    timeout     => 0,
    verbose     => false,
    unless      => "test -e ${jmxexporter_dir}/jmx_prometheus_javaagent-${jmxexporter_version}.jar",
    require     => File[$jmxexporter_dir],
  }
}
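
Note that this class also includes the Prometheus node_exporter, which exposes host metrics on port 9100 by default; the server profile above only scrapes the Kafka JMX port, so if you also want those host metrics you can append one more hash to its scrape_configs array. A rough sketch, assuming your Kafka hosts on the default node_exporter port:

{ 'job_name' => 'node', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [ { 'targets' => ['kafka0:9100', 'kafka1:9100', 'kafka2:9100'] } ] },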

As you can see, I used the wget module to fetch the JMX exporter jar, so I should give you that link as well:

 https://forge.puppet.com/leonardothibes/wget

Also required is the JMX exporter configuration file, which translates the JMX data exposed by Kafka into metrics that can be imported into Prometheus.
prometheus_config:

lowercaseOutputName: true
rules:
- pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_cluster_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_log_$1
  labels:
    topic: "$2"
    partition: "$3"
- pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_controller_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)><>Value
  name: kafka_network_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
  name: kafka_network_$1_$2_total
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
  type: COUNTER
- pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+)><>Count
  name: kafka_network_$1_$2
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_server_$1_$2_total
  labels:
    topic: "$3"
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
- pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_server_$1_$2

- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_$1_$2_$3_total
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
    partition: "$5"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$4
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$6
  labels:
    "$4": "$5"

OK, so in order to put this all together, we will use plain old Hiera :). For the server on which you want to run the Prometheus server, you will need to create a role or just put it in that node's fqdn.yaml, which looks like this:

prometheus.yaml

---
classes:
  - 'profiles::prometheusserver'
  - 'profiles::grafana'

alertrules:
    -
        name: 'InstanceDown'
        condition:  'up == 0'
        timeduration: '5m'
        labels:
            -
                name: 'severity'
                content: 'critical'
        annotations:
            -
                name: 'summary'
                content: 'Instance {{ $labels.instance }} down'
            -
                name: 'description'
                content: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
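
The profile code above only points Prometheus at /etc/prometheus/alert.rules; the alertrules Hiera data is meant to end up in that file (in our case through a custom template that I have not shown here). Just to illustrate, assuming the pre-2.0 (Prometheus 1.x) rule syntax, the generated file would look roughly like this:

ALERT InstanceDown
  IF up == 0
  FOR 5m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  }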

This is the default installation. Because it’s a “role”, on each Prometheus host I also created a specific fqdn.yaml file to tell it which nodes should be scraped for exposed metrics. Here is an example:

---
profiles::prometheusserver::nodes:
    - 'kafka0:7071'
    - 'kafka1:7071'
    - 'kafka2:7071'

The three nodes are just an example; you can list all the nodes on which you include the Prometheus node class.
Let me show you how that side should look:

---
classes:
 - 'profiles::prometheusnode'
 
profiles::kafka::jolokia: '-javaagent:/usr/share/java/jolokia-jvm-agent.jar -javaagent:/opt/jmxexporter/jmx_prometheus_javaagent-0.9.jar=7071:/opt/jmxexporter/prometheus_config.yaml'

Now I need to explain that jolokia variable, right? Yeah, it’s pretty straightforward. The Kafka installation was already written and it included the Jolokia agent, so our broker definition block looks like this:


class { '::kafka::broker':
  config    => $broker_config,
  opts      => hiera('profiles::kafka::jolokia', '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'),
  heap_opts => "-Xmx${jvm_heap_size}M -Xms${jvm_heap_size}M",
}

So I needed to put the JMX exporter agent beside Jolokia in the Kafka startup options, and once this is deployed you will see the JMX exporter started as an agent. Anyhow, when everything is deployed you will have a Prometheus config that should look like this:

---
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: master
rule_files:
- /etc/prometheus/alert.rules
scrape_configs:
- job_name: prometheus
  scrape_interval: 30s
  scrape_timeout: 30s
  static_configs:
  - targets:
    - localhost:9090
    labels:
      alias: Prometheus
- job_name: kafka
  scrape_interval: 10s
  scrape_timeout: 10s
  static_configs:
  - targets:
    - kafka0:7071
    - kafka1:7071
    - kafka2:7071

You can also see the nodes under Status -> Targets in the Prometheus menu, and all the metrics are available per node at http://[kafka-node]:7071/metrics.
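
Two quick sanity checks I find useful (hostnames and port as configured above): on a broker, confirm the agent was picked up by the Kafka process, and from any host, confirm the translated metrics are actually exposed:

ps -ef | grep jmx_prometheus_javaagent
curl -s http://kafka0:7071/metrics | grep kafka_server_ | head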

I think this should be it. I don’t know if I covered everything, and there are a lot of details related to our custom installation, but at least I managed to provide some details about it. The article that helped me very much can be found here:

https://www.robustperception.io/monitoring-kafka-with-prometheus/

Cheers!

Categories: cloud, newtools

Small Vagrant config file for Rancher deploy

Hi,

Just wanted to post this as well. Even if the config using a jump server isn’t that nice, we can surely convert it to code (Puppet/Ansible), and you can also use Vagrant. The main issue I faced when I tried to create my setup is that, for some reason I’m not really sure about, Vagrant on Windows runs very slowly. However, I chose to give you a piece of Vagrantfile for a minimal setup on which you can run the Rancher server framework and also the client containers.

Here it is:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.define "master" do |master|
    master.vm.box = "centos/7"
    master.vm.hostname = 'master'
    master.vm.network "public_network", bridge: "enp0s25"
  end
  config.vm.define "slave" do |slave|
    slave.vm.box = "centos/7"
    slave.vm.hostname = 'slave'
    slave.vm.network "public_network", bridge: "enp0s25"
  end
  config.vm.define "swarmmaster" do |swarmmaster|
    swarmmaster.vm.box = "centos/7"
    swarmmaster.vm.hostname = 'swarmmaster'
    swarmmaster.vm.network "public_network", bridge: "enp0s25"
  end
  config.vm.define "swarmslave" do |swarmclient|
    swarmclient.vm.box = "centos/7"
    swarmclient.vm.hostname = 'swarmclient'
    swarmclient.vm.network "public_network", bridge: "enp0s25"
  end
end
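
Rancher hosts need Docker on them, so if you want the boxes to come up ready for the Rancher agent you can also add a shell provisioner inside the Vagrant.configure block so it runs on all four machines. This is only a sketch, and the get.docker.com convenience script is just one way of installing it:

  # install Docker on every box so the Rancher agent can be started right away
  config.vm.provision "shell", inline: <<-SHELL
    curl -fsSL https://get.docker.com | sh
    systemctl enable docker && systemctl start docker
  SHELL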


Do not worry about the naming of the machines, you can change them to whatever you like; the main catch is to bridge the public network on all of them so that they can communicate with each other and also have access to Docker Hub. Besides that, everything else that I posted regarding registering to the Rancher framework is still valid.

Thank you for your time,

Cheers!