Tag: promethheus

  • How to deploy Prometheus infrastructure for Kafka monitoring using Puppet

    Hi,

    Over the last couple of days I worked on deploying a Prometheus server and agents for Kafka monitoring. Here are the main steps you need to follow to achieve the same.

    The first thing to do is to install the prometheus and grafana modules, which you will find at the following links:

    https://forge.puppet.com/puppet/prometheus
    https://forge.puppet.com/puppet/grafana

    Once these are imported into Puppet, you need to create the following manifests:

    grafana.pp

    class profiles::grafana {
        class { '::grafana':
          cfg => {
            app_mode => 'production',
            server   => {
              http_port     => 8080,
            },
            database => {
              type     => 'sqlite3',
              host     => '127.0.0.1:3306',
              name     => 'grafana',
              user     => 'root',
              password => 'grafana',
            },
            users    => {
              allow_sign_up => false,
            },
          },
        }
    }
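
    The profile above only installs Grafana; it does not connect it to Prometheus. As a hedged sketch, the puppet/grafana module also provides a grafana_datasource type that could be added inside the same profile. The URLs and the admin/admin credentials below are assumptions (Grafana defaults plus the ports used in this post), so adjust them to your installation:

    ```puppet
    # Sketch only: register the local Prometheus server as a Grafana datasource.
    # Assumes Grafana listens on 8080 (as configured above), Prometheus on 9090,
    # and that the default admin/admin credentials have not been changed.
    grafana_datasource { 'prometheus':
      grafana_url      => 'http://localhost:8080',
      grafana_user     => 'admin',
      grafana_password => 'admin',
      type             => 'prometheus',
      url              => 'http://localhost:9090',
      access_mode      => 'proxy',
      is_default       => true,
      require          => Class['::grafana'],
    }
    ```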

    prometheusserver.pp

    class profiles::prometheusserver {
      # Default to undef so the else branch is used when no nodes are defined in Hiera
      $kafka_nodes = hiera('profiles::prometheusserver::nodes', undef)

      if $kafka_nodes {
        # Kafka nodes are defined: scrape Prometheus itself and the JMX exporters
        class { '::prometheus':
          global_config  => { 'scrape_interval' => '15s', 'evaluation_interval' => '15s', 'external_labels' => { 'monitor' => 'master' } },
          rule_files     => ['/etc/prometheus/alert.rules'],
          scrape_configs => [
            { 'job_name' => 'prometheus', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [{ 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } }] },
            { 'job_name' => 'kafka', 'scrape_interval' => '10s', 'scrape_timeout' => '10s', 'static_configs' => [{ 'targets' => $kafka_nodes }] },
          ],
        }
      } else {
        # No Kafka nodes: scrape only Prometheus itself
        class { '::prometheus':
          global_config  => { 'scrape_interval' => '15s', 'evaluation_interval' => '15s', 'external_labels' => { 'monitor' => 'master' } },
          rule_files     => ['/etc/prometheus/alert.rules'],
          scrape_configs => [
            { 'job_name' => 'prometheus', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [{ 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } }] },
          ],
        }
      }
    }
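
    Since the two branches differ only in the kafka job, the class can also be declared just once. Here is a minimal sketch of that idea, assuming Puppet 4 or later (it relies on array concatenation with +); the variable names $prometheus_job and $kafka_jobs are made up for illustration:

    ```puppet
    class profiles::prometheusserver {
      # undef default: compilation still works when no Kafka nodes are in Hiera
      $kafka_nodes = hiera('profiles::prometheusserver::nodes', undef)

      # Job that is always present: Prometheus scraping itself
      $prometheus_job = {
        'job_name'        => 'prometheus',
        'scrape_interval' => '30s',
        'scrape_timeout'  => '30s',
        'static_configs'  => [
          { 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } },
        ],
      }

      # Optional job: added only when Kafka nodes are defined
      if $kafka_nodes {
        $kafka_jobs = [{
          'job_name'        => 'kafka',
          'scrape_interval' => '10s',
          'scrape_timeout'  => '10s',
          'static_configs'  => [{ 'targets' => $kafka_nodes }],
        }]
      } else {
        $kafka_jobs = []
      }

      class { '::prometheus':
        global_config  => {
          'scrape_interval'     => '15s',
          'evaluation_interval' => '15s',
          'external_labels'     => { 'monitor' => 'master' },
        },
        rule_files     => ['/etc/prometheus/alert.rules'],
        scrape_configs => [$prometheus_job] + $kafka_jobs,
      }
    }
    ```

    The generated configuration should be the same as with the if/else version; the only gain is that global_config and rule_files are written once.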

    prometheusnode.pp

    class profiles::prometheusnode(
      $jmxexporter_dir     = hiera('jmxexporter::dir', '/opt/jmxexporter'),
      $jmxexporter_version = hiera('jmxexporter::version', '0.9')
    ) {
      include ::prometheus::node_exporter
      #validate_string($jmxexporter_dir)

      # Directory that holds the JMX exporter jar and its configuration
      file { $jmxexporter_dir:
        ensure => 'directory',
      }
      # Exporter rules shipped from the profiles module
      file { "${jmxexporter_dir}/prometheus_config.yaml":
        source => 'puppet:///modules/profiles/prometheus_config',
      }
      # Download the agent jar only if it is not already present
      wget::fetch { "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${jmxexporter_version}/jmx_prometheus_javaagent-${jmxexporter_version}.jar":
        destination => "${jmxexporter_dir}/",
        cache_dir   => '/tmp/',
        timeout     => 0,
        verbose     => false,
        unless      => "test -e ${jmxexporter_dir}/jmx_prometheus_javaagent-${jmxexporter_version}.jar",
      }
    }

    I used the wget module to download the JMX exporter, so here is the link to that as well:

    https://forge.puppet.com/leonardothibes/wget

    Also required is the JMX exporter configuration file, which translates the JMX data exposed by Kafka into metrics that Prometheus can import.
    prometheus_config:

    lowercaseOutputName: true
    rules:
    - pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
      name: kafka_cluster_$1_$2
      labels:
        topic: "$3"
        partition: "$4"
    - pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
      name: kafka_log_$1
      labels:
        topic: "$2"
        partition: "$3"
    - pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
      name: kafka_controller_$1_$2
    - pattern : kafka.network<type=(.+), name=(.+)><>Value
      name: kafka_network_$1_$2
    - pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
      name: kafka_network_$1_$2_total
      labels:
        request: "$3"
    - pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
      name: kafka_network_$1_$2
      labels:
        request: "$3"
      type: COUNTER
    - pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
      name: kafka_network_$1_$2
      labels:
        request: "$3"
    - pattern : kafka.network<type=(.+), name=(\w+)><>Count
      name: kafka_network_$1_$2
    - pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
      name: kafka_server_$1_$2_total
      labels:
        topic: "$3"
    - pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_server_$1_$2_total
      type: COUNTER
    
    - pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
      name: kafka_server_$1_$2
      labels:
        clientId: "$3"
        topic: "$4"
        partition: "$5"
    - pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
      name: kafka_server_$1_$2
      labels:
        topic: "$3"
        partition: "$4"
    - pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
      name: kafka_server_$1_$2
      labels:
        topic: "$3"
      type: COUNTER
    
    - pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
      name: kafka_server_$1_$2
      labels:
        clientId: "$3"
        broker: "$4:$5"
    - pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
      name: kafka_server_$1_$2
      labels:
        clientId: "$3"
    - pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
      name: kafka_server_$1_$2
    
    - pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
    - pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
      name: kafka_$1_$2_$3_total
      labels:
        topic: "$4"
      type: COUNTER
    - pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
      name: kafka_$1_$2_$3_total
      labels:
        topic: "$4"
        partition: "$5"
      type: COUNTER
    - pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
      name: kafka_$1_$2_$3_$4
      type: COUNTER
    - pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
      name: kafka_$1_$2_$3_$6
      labels:
        "$4": "$5"

    OK, to put this all together we will use plain old Hiera :). For the host that will run the Prometheus server, create a role, or simply put the following in its fqdn.yaml:

    prometheus.yaml

    ---
    classes:
      - 'profiles::prometheusserver'
      - 'profiles::grafana'
    
    alertrules:
        -
            name: 'InstanceDown'
            condition:  'up == 0'
            timeduration: '5m'
            labels:
                -
                    name: 'severity'
                    content: 'critical'
            annotations:
                -
                    name: 'summary'
                    content: 'Instance {{ $labels.instance }} down'
                -
                    name: 'description'
                    content: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
    

    This is the default installation, since it is a “role”. On each Prometheus host I also created a specific fqdn.yaml file that lists the nodes to be scraped for exposed metrics. Here is an example:

    ---
    profiles::prometheusserver::nodes:
        - 'kafka0:7071'
        - 'kafka1:7071'
        - 'kafka2:7071'
    

    The three nodes are just an example; you can list every node on which you include the Prometheus node class.
    On those nodes, the Hiera data should look like this:

    ---
    classes:
     - 'profiles::prometheusnode'
     
    profiles::kafka::jolokia: '-javaagent:/usr/share/java/jolokia-jvm-agent.jar -javaagent:/opt/jmxexporter/jmx_prometheus_javaagent-0.9.jar=7071:/opt/jmxexporter/prometheus_config.yaml'

    Now I need to explain that jolokia variable, right? It’s pretty straightforward. The Kafka installation was already written and included the Jolokia agent; our broker definition block looks like this:

    
     class { '::kafka::broker':
        config    => $broker_config,
        opts      => hiera('profiles::kafka::jolokia', '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'),
        heap_opts => "-Xmx${jvm_heap_size}M -Xms${jvm_heap_size}M",
      }
    }
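
    The exporter part of that opts string follows the format -javaagent:<jar>=<port>:<config file>. As a small sketch, the same string can be assembled from variables instead of being hard-coded; the variable names here (including $jmxexporter_port) are hypothetical, and the values are copied from the Hiera example above:

    ```puppet
    # Hypothetical variables; the values mirror the Hiera example
    $jmxexporter_dir     = '/opt/jmxexporter'
    $jmxexporter_version = '0.9'
    $jmxexporter_port    = 7071

    # Existing Jolokia agent, unchanged
    $jolokia_agent = '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'

    # Format: -javaagent:<path to jar>=<listen port>:<path to exporter config>
    $jmx_agent = "-javaagent:${jmxexporter_dir}/jmx_prometheus_javaagent-${jmxexporter_version}.jar=${jmxexporter_port}:${jmxexporter_dir}/prometheus_config.yaml"

    # Both agents are passed to the JVM, separated by a space
    notice("${jolokia_agent} ${jmx_agent}")
    ```

    Saved to a file and run with puppet apply, this only prints the resulting string, which makes it easy to check the port and paths before putting the value in Hiera.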

    So I needed to put the JMX exporter agent beside Jolokia on Kafka startup; once this is deployed you will see the JMX exporter started as an agent. When everything is deployed, you will have a Prometheus config that should look like this:

    ---
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        monitor: master
    rule_files:
    - /etc/prometheus/alert.rules
    scrape_configs:
    - job_name: prometheus
      scrape_interval: 30s
      scrape_timeout: 30s
      static_configs:
      - targets:
        - localhost:9090
        labels:
          alias: Prometheus
    - job_name: kafka
      scrape_interval: 10s
      scrape_timeout: 10s
      static_configs:
      - targets:
        - kafka0:7071
        - kafka1:7071
        - kafka2:7071

    You can also see the nodes under Status -> Targets in the Prometheus menu, and all the metrics are available per node at http://[kafka-node]:7071/metrics.

    I think this should be it. I don’t know if I covered everything, and there are a lot of details specific to our custom installation, but at least I managed to provide some details about it. The article that helped me a lot with this can be found here:

    https://www.robustperception.io/monitoring-kafka-with-prometheus/

    Cheers!