Category: cloud

  • Install zookeeper using puppet without a module

    Hi,

    In this post I will show how I handled a task I was recently given: provide a standalone zookeeper cluster, with basic auth, on the latest version.

    The reason is that we are using a very old module on our Kafka clusters, and a new requirement appeared: install the latest version, 3.5.5.

    The old module could only install the package from the apt repo, which was not an option, since the latest version available on Ubuntu Xenial is at least two years old.

    To complete this task, a different method was required: grab the archive with wget and add the rest of the files needed to make it functional.

    Let us start with the puppet manifest; from that, I will add the rest.

    class zookeeperstd {
        $version          = hiera('zookeeperstd::version', '3.5.5')
        $authenabled      = hiera('zookeeperstd::authenabled', false)
        $server_jvm_flags = hiera('zookeeperstd::jvm_flags', undef)

        group { 'zookeeper':
            ensure => 'present',
        }
        user { 'zookeeper':
            ensure => 'present',
            home   => '/var/lib/zookeeper',
            shell  => '/bin/false',
        }
        wget::fetch { 'zookeeper':
            source      => "https://www-eu.apache.org/dist/zookeeper/stable/apache-zookeeper-${version}-bin.tar.gz",
            destination => "/opt/apache-zookeeper-${version}-bin.tar.gz",
        } ->
        archive { "/opt/apache-zookeeper-${version}-bin.tar.gz":
            ensure       => present,
            creates      => "/opt/apache-zookeeper-${version}-bin",
            extract      => true,
            extract_path => '/opt',
            cleanup      => true,
        } ->
        file { "/opt/apache-zookeeper-${version}-bin":
            ensure  => directory,
            owner   => 'zookeeper',
            group   => 'zookeeper',
            recurse => true,
            require => [ User['zookeeper'], Group['zookeeper'] ],
        } ->
        file { '/opt/zookeeper':
            ensure  => link,
            target  => "/opt/apache-zookeeper-${version}-bin",
            owner   => 'zookeeper',
            group   => 'zookeeper',
            require => [ User['zookeeper'], Group['zookeeper'] ],
        }
        file { '/var/lib/zookeeper':
            ensure  => directory,
            owner   => 'zookeeper',
            group   => 'zookeeper',
            recurse => true,
            require => [ User['zookeeper'], Group['zookeeper'] ],
        }

        # In order to know which servers are in the cluster, a role fact needs
        # to be defined on each machine (see the sketch after this class).
        $hostshash = query_nodes(" v1_role='zookeeperstd'").sort
        # Derive a deterministic myid (1-254) from each hostname.
        $hosts_hash = $hostshash.map |$value| { [$value, seeded_rand(254, $value) + 1] }.hash
        $overide_hosts_hash = hiera_hash('profiles_opqs::kafka_hosts_hash', $hosts_hash)
        $overide_hosts = $overide_hosts_hash.keys.sort
        if $overide_hosts_hash.size() != $overide_hosts_hash.values.unique.size() {
            # Duplicate ids detected: fall back to sequential ids over the sorted host list.
            $overide_hosts_hash2 = $overide_hosts.map |$index, $value| { [$value, $index + 1] }.hash
        } else {
            $overide_hosts_hash2 = $overide_hosts_hash
        }
        $hosts      = $overide_hosts_hash2
        $data_dir   = '/var/lib/zookeeper'
        $tick_time  = 2000
        $init_limit = 10
        $sync_limit = 5

        # Each node writes its own id, picked from the hosts hash by fqdn.
        $myid = $hosts[$::fqdn]
        file { '/var/lib/zookeeper/myid':
            content => "${myid}",
        }

        file { '/opt/zookeeper/conf/zoo.cfg':
            content => template("${module_name}/zoo.cfg.erb"),
        }

        if $authenabled {
            $superpass  = hiera('zookeeperstd::super_pass', 'super-admin')
            $zoopass    = hiera('zookeeperstd::zookeeper_pass', 'zookeeper-admin')
            $clientpass = hiera('zookeeperstd::client_pass', 'client-admin')

            file { '/opt/zookeeper/conf/zoo_jaas.config':
                content => template("${module_name}/zoo_jaas.config.erb"),
            }
        }

        file { '/opt/zookeeper/conf/java.env':
            content => template("${module_name}/java.zookeeper.env.erb"),
            mode    => '0755',
        }
        file { '/opt/zookeeper/conf/log4j.properties':
            content => template("${module_name}/log4j.zookeeper.properties.erb"),
        }

        file { '/etc/systemd/system/zookeeper.service':
            source => 'puppet:///modules/work/zookeeper.service',
            mode   => '0644',
        } ->
        service { 'zookeeper':
            ensure   => running,
            enable   => true,
            provider => systemd,
        }
    }
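
    The v1_role fact used by the query above is not defined by this class; it has to exist on every node that should join the cluster. As an illustrative sketch (assuming you rely on Facter external facts and nothing else in your infrastructure already provides the fact), it could be seeded like this:

    # Illustrative sketch only: expose v1_role as a Facter external fact,
    # so that query_nodes(" v1_role='zookeeperstd'") can find this node.
    file { '/etc/facter/facts.d/v1_role.txt':
        ensure  => file,
        content => "v1_role=zookeeperstd\n",
    }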
    

    I managed to adapt some of the files from the existing module; here are the rest of the pieces.

    #zoo.cfg.erb
    # Note: This file is managed by Puppet.
    
    # http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html
    
    # specify all zookeeper servers
    # The first port is used by followers to connect to the leader
    # The second one is used for leader election
    <%
    if @hosts
    # sort hosts by myid and output a server config
    # for each host and myid.  (sort_by returns an array of key,value tuples)
    @hosts.sort_by { |name, id| id }.each do |host_id|
    -%>
    server.<%= host_id[1] %>=<%= host_id[0] %>:2182:2183
    <% if @authenabled -%>
    authProvider.<%= host_id[1] %>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
    <% end -%>
    <% end -%>
    <% end -%>
    
    # the port at which the clients will connect
    clientPort=2181
    
    # the directory where the snapshot is stored.
    dataDir=<%= @data_dir %>
    
    # Place the dataLogDir to a separate physical disc for better performance
    <%= @data_log_dir ? "dataLogDir=#{@data_log_dir}" : '# dataLogDir=/disk2/zookeeper' %>
    
    
    # The number of milliseconds of each tick.
    tickTime=<%= @tick_time %>
    
    # The number of ticks that the initial
    # synchronization phase can take.
    initLimit=<%= @init_limit %>
    
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=<%= @sync_limit %>
    
    # To avoid seeks ZooKeeper allocates space in the transaction log file in
    # blocks of preAllocSize kilobytes. The default block size is 64M. One reason
    # for changing the size of the blocks is to reduce the block size if snapshots
    # are taken more often. (Also, see snapCount).
    #preAllocSize=65536
    
    # Clients can submit requests faster than ZooKeeper can process them,
    # especially if there are a lot of clients. To prevent ZooKeeper from running
    # out of memory due to queued requests, ZooKeeper will throttle clients so that
    # there is no more than globalOutstandingLimit outstanding requests in the
    # system. The default limit is 1,000. ZooKeeper logs transactions to a
    # transaction log. After snapCount transactions are written to a log file a
    # snapshot is started and a new transaction log file is started. The default
    # snapCount is 10,000.
    #snapCount=1000
    
    # If this option is defined, requests will be will logged to a trace file named
    # traceFile.year.month.day.
    #traceFile=
    
    # Leader accepts client connections. Default value is "yes". The leader machine
    # coordinates updates. For higher update throughput at the slight expense of
    # read throughput the leader can be configured to not accept clients and focus
    # on coordination.
    #leaderServes=yes
    
    <% if @authenabled -%>
    
    requireClientAuthScheme=sasl
    quorum.auth.enableSasl=true
    quorum.auth.learnerRequireSasl=true
    quorum.auth.serverRequireSasl=true
    quorum.auth.learner.loginContext=QuorumLearner
    quorum.auth.server.loginContext=QuorumServer
    quorum.cnxn.threads.size=20
    
    <% end -%> 
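
    To make the server-list section concrete: for a hypothetical three-node ensemble with auth enabled (hostnames and ids invented for illustration), the template renders to something like the following, where each id must match the myid file written on the corresponding host:

    server.13=zk1.example.com:2182:2183
    authProvider.13=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
    server.101=zk2.example.com:2182:2183
    authProvider.101=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
    server.207=zk3.example.com:2182:2183
    authProvider.207=org.apache.zookeeper.server.auth.SASLAuthenticationProvider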
    
    #zoo_jaas.config.erb
    QuorumServer {
           org.apache.zookeeper.server.auth.DigestLoginModule required
           user_zookeeper="<%= @zoopass %>";
    };
     
    QuorumLearner {
           org.apache.zookeeper.server.auth.DigestLoginModule required
           username="zookeeper"
           password="<%= @zoopass %>";
    };
    
    Server {
           org.apache.zookeeper.server.auth.DigestLoginModule required
           user_super="<%= @superpass %>"
           user_client="<%= @clientpass %>";
    };
    
    #java.zookeeper.env.erb
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
    SERVER_JVMFLAGS="<%= @server_jvm_flags %>"
    
    #log4j.zookeeper.properties.erb
    # Note: This file is managed by Puppet.
    
    #
    # ZooKeeper Logging Configuration
    #
    
    # Format is "<default threshold> (, <appender>)+
    
    log4j.rootLogger=${zookeeper.root.logger}, ROLLINGFILE
    
    #
    # Log INFO level and above messages to the console
    #
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.Threshold=INFO
    log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
    log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
    
    #
    # Add ROLLINGFILE to rootLogger to get log file output
    #    Log INFO level and above messages to a log file
    log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
    log4j.appender.ROLLINGFILE.Threshold=INFO
    log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/zookeeper.log
    
    # Max log file size of 10MB
    log4j.appender.ROLLINGFILE.MaxFileSize=10MB
    # Keep only 10 files
    log4j.appender.ROLLINGFILE.MaxBackupIndex=10
    log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
    log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
    

    And last but not least, the systemd service unit.

    [Unit]
    Description=ZooKeeper Service
    Documentation=http://zookeeper.apache.org
    Requires=network.target
    After=network.target
    
    [Service]
    Type=forking
    User=zookeeper
    Group=zookeeper
    ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
    ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
    ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
    WorkingDirectory=/var/lib/zookeeper
    
    [Install]
    WantedBy=default.target
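
    One caveat with the unit file: depending on your Puppet version, changing a unit file does not automatically trigger systemctl daemon-reload. If you run into stale unit definitions, a small refreshonly exec could be chained into the manifest; this is only a sketch, not part of the original class:

    # Sketch: reload systemd only when the unit file actually changes,
    # then bounce the service so it picks up the new definition.
    exec { 'zookeeper-daemon-reload':
        command     => '/bin/systemctl daemon-reload',
        refreshonly => true,
        subscribe   => File['/etc/systemd/system/zookeeper.service'],
        notify      => Service['zookeeper'],
    }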
    

    Also, if you want to enable the simple MD5 digest authentication, you will need to add the following two lines in hiera (the jvm_flags value ends up in SERVER_JVMFLAGS via the java.env template above).

    zookeeperstd::authenabled: true
    zookeeperstd::jvm_flags: "-Djava.security.auth.login.config=/opt/zookeeper/conf/zoo_jaas.config"
    
    

    If there is a simpler approach, feel free to leave me a message on LinkedIn or Twitter.

    Cheers

  • Configure kafka truststore and keystore using puppet

    Hi,

    Since the last article was about the template needed to generate the truststore and keystore, now it's time to give you the rest of the pieces for the deployment with puppet.
    So, we will have the kafka_security_gen class that was referenced in the last post; in its final form it should look like this:

    class profiles::kafka_security_gen {
        $pass         = hiera('profiles::kafka_security_gen::password', 'password')
        $keystorepass = hiera('profiles::kafka_security_gen::keystorepass', 'password')

        $cluster_servers = query_nodes("role='kafka'")
        $servers = $cluster_servers.delete("${::fqdn}")

        file { '/home/kafka/security.sh':
            ensure  => file,
            owner   => 'kafka',
            group   => 'kafka',
            mode    => '0755',
            content => template("${module_name}/security.sh.erb"),
            replace => 'no',
        } ->
        exec { 'generate_keystore':
            path    => '/usr/bin:/usr/sbin:/bin',
            command => '/home/kafka/security.sh',
            user    => 'kafka',
            unless  => 'test -f /home/kafka/kafka.server.keystore.jks',
        }
        $servers.each |String $server| {
            exec { "copy files to ${server}":
                cwd     => '/home/kafka',
                path    => '/usr/bin:/usr/sbin:/bin',
                command => "scp /home/kafka/kafka* kafka@${server}:/home/kafka",
                user    => 'kafka',
            }
        }
        ssh_keygen { 'kafka':
            home => '/home/kafka/',
            type => rsa,
        }
        # Export the public key so the other nodes in the same tag can
        # collect it (see the File collector in the broker profile below).
        @@file { '/home/kafka/.ssh/authorized_keys':
            ensure  => present,
            mode    => '0600',
            owner   => 'kafka',
            group   => 'kafka',
            content => "${::sharedkey}",
            tag     => "${::instance_tag}",
        }
    }

    The only thing I did not manage to fix is the copying of the stores: right now they are copied on every run. I will fix that in the near future with some custom facts; in the meantime, a possible guard is sketched below.
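
    As a rough sketch of such a guard (hypothetical and untested), the exec inside the $servers.each loop could be skipped when the keystore is already present on the remote host. It assumes the kafka user can ssh to the target non-interactively, which is what the ssh_keygen/authorized_keys part above sets up:

    exec { "copy files to ${server}":
        cwd     => '/home/kafka',
        path    => '/usr/bin:/usr/sbin:/bin',
        command => "scp /home/kafka/kafka* kafka@${server}:/home/kafka",
        user    => 'kafka',
        # Skip the copy if the stores were already transferred.
        unless  => "ssh kafka@${server} test -f /home/kafka/kafka.server.keystore.jks",
    }
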
    OK, now let's see the integration part with the kafka code. Unfortunately, I cannot provide you all the code, but here is as much as I can share:

        
    $kafkadirs = ['/home/kafka/', '/home/kafka/.ssh/']
    file { $kafkadirs:
        ensure => directory,
        mode   => '0700',
        owner  => 'kafka',
        group  => 'kafka',
    }

    $security_enabled = hiera('profiles::kafka::security', false)
    if ($security_enabled == false) {
        $broker_config = {
            ....
        }
    } else {
        # If the hostname starts with kafka0 then it is the CA auth node
        # (in our case the kafka servers are of the form kafka0...).
        if ($hostname == 'kafka0') {
            contain 'profiles::kafka_security_gen'
            Class['profiles::kafka_security_gen']
        }
        # This shares the public key on a couple of servers which are isolated by a tag.
        File <<| tag == "${::instance_tag}" |>>
        # Paths for keystore/truststore
        $keystore_location   = '/home/kafka/kafka.server.keystore.jks'
        $truststore_location = '/home/kafka/kafka.server.truststore.jks'
        # Broker config
        $broker_config = {
            ....
            'security.inter.broker.protocol' => 'SSL',
            'ssl.client.auth'                => 'none',
            'ssl.enabled.protocols'          => ['TLSv1.2', 'TLSv1.1', 'TLSv1'],
            'ssl.key.password'               => hiera('profiles::kafka_security_gen::password', 'password'),
            'ssl.keystore.location'          => $keystore_location,
            'ssl.keystore.password'          => hiera('profiles::kafka_security_gen::keystorepass', 'password'),
            'ssl.keystore.type'              => 'JKS',
            'ssl.protocol'                   => 'TLS',
            'ssl.truststore.location'        => $truststore_location,
            'ssl.truststore.password'        => hiera('profiles::kafka_security_gen::keystorepass', 'password'),
            'ssl.truststore.type'            => 'JKS',
            'listeners'                      => "PLAINTEXT://${::fqdn}:9092,SSL://${::fqdn}:9093", # both listeners are enabled
            'advertised.listeners'           => "PLAINTEXT://${::fqdn}:9092,SSL://${::fqdn}:9093",
        }
    }

    class { '::kafka::broker':
        config    => $broker_config,
        opts      => hiera('profiles::kafka::jolokia', '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'),
        heap_opts => "-Xmx${jvm_heap_size}M -Xms${jvm_heap_size}M",
    }
    

    This can be easily activated by setting the profiles::kafka::security option to true in the hierarchy, and it should do all the work. Once it's done, it can be tested using openssl s_client -debug -connect localhost:9093 -tls1

    The link to the article on generating the security script template is http://log-it.tech/2017/07/23/generate-kafka-keytrust-store-tls-activation/ and there is also a quite interesting guideline from Confluent at http://docs.confluent.io/current/kafka/ssl.html#configuring-kafka-brokers

    That’s all folks. 🙂

    Cheers

  • Integrate Kafka with Datadog monitoring using puppet

    Hi,

    Since I owed you an article on how to integrate Kafka monitoring using Datadog, let me tell you a couple of things about this topic. First of all, we are keeping the same config of Kafka with Jolokia that was described in a previous article. From the install of the brokers on our infrastructure, JMX data is published on port 9990 (this will be needed in the datadog config).

    The files you need to create for this task are as follows:

    datadogagent.pp

    class profiles::datadogagent {
      $_check_api_key = hiera('datadog_agent::api_key')
    
      contain 'datadog_agent'
      contain 'profiles::datadogagent_config_kafka'
      contain 'datadog_agent::integrations::zk'
    
      Class['datadog_agent'] -> Class['profiles::datadogagent_config_kafka']
      Class['datadog_agent'] -> Class['datadog_agent::integrations::zk']
    }

    datadogagent_config_kafka.pp

    class profiles::datadogagent_config_kafka (
      $servers = [{'host' => 'localhost', 'port' => '9990'}],
    ) inherits datadog_agent::params {
      include datadog_agent
    
      validate_array($servers)
    
      file { "${datadog_agent::params::conf_dir}/kafka.yaml":
        ensure  => file,
        owner   => $datadog_agent::params::dd_user,
        group   => $datadog_agent::params::dd_group,
        mode    => '0600',
        content => template("${module_name}/kafka.yaml.erb"),
        require => Package[$datadog_agent::params::package_name],
        notify  => Service[$datadog_agent::params::service_name],
      }
    }
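
    If you ever need the check to poll more than one JMX endpoint, the servers parameter already supports it. Here is a hypothetical sketch (if you declare the class with parameters like this, drop the plain contain of it from datadogagent.pp):

    class { 'profiles::datadogagent_config_kafka':
      servers => [
        { 'host' => 'localhost', 'port' => '9990' },
        { 'host' => 'localhost', 'port' => '9991' },
      ],
    }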
    

    And since there isn't yet a default integration for Kafka in the datadog module, which you can find here:

    https://github.com/DataDog/puppet-datadog-agent

    I created the following file in the templates directory:

    kafka.yaml.erb (as you can see from the header, this is actually the template given by datadog for the kafka integration, adapted with a specific host and port)

    ##########
    # WARNING
    ##########
    # This sample works only for Kafka >= 0.8.2.
    # If you are running a version older than that, you can refer to agent 5.2.x released
    # sample files, https://raw.githubusercontent.com/DataDog/dd-agent/5.2.1/conf.d/kafka.yaml.example
    
    instances:
    <% @servers.each do |server| -%>
      - host: <%= server['host'] %>
        port: <%= server['port'] %> # This is the JMX port on which Kafka exposes its metrics (usually 9999)
        tags:
          kafka: broker
    
    <% end -%>

    init_config:
      is_jmx: true
    
      # Metrics collected by this check. You should not have to modify this.
      conf:
        # v0.8.2.x Producers
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.producer.request_rate
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.producer.request_latency_avg
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=BytesPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.producer.bytes_out
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.producer.message_rate
        # v0.8.2.x Consumers
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=.*'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.consumer.max_lag
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=.*'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.consumer.fetch_rate
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.consumer.bytes_in
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.consumer.messages_in
    
        # Offsets committed to ZooKeeper
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.consumer.zookeeper_commits
        # Offsets committed to Kafka
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.consumer.kafka_commits
        # v0.9.0.x Producers
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
            attribute:
              response-rate:
                metric_type: gauge
                alias: kafka.producer.response_rate
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
            attribute:
              request-rate:
                metric_type: gauge
                alias: kafka.producer.request_rate
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
            attribute:
              request-latency-avg:
                metric_type: gauge
                alias: kafka.producer.request_latency_avg
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
            attribute:
              outgoing-byte-rate:
                metric_type: gauge
                alias: kafka.producer.bytes_out
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
            attribute:
              io-wait-time-ns-avg:
                metric_type: gauge
                alias: kafka.producer.io_wait
    
        # v0.9.0.x Consumers
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
            attribute:
              bytes-consumed-rate:
                metric_type: gauge
                alias: kafka.consumer.bytes_in
        - include:
            domain: 'kafka.consumer'
            bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
            attribute:
              records-consumed-rate:
                metric_type: gauge
                alias: kafka.consumer.messages_in
        #
        # Aggregate cluster stats
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.net.bytes_out.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.net.bytes_in.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.messages_in.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.net.bytes_rejected.rate
    
        #
        # Request timings
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.fetch.failed.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.produce.failed.rate
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.produce.rate
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.produce.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.produce.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.fetch_consumer.rate
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.fetch_follower.rate
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.fetch_consumer.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.fetch_consumer.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.fetch_follower.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.fetch_follower.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.update_metadata.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.update_metadata.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Metadata'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.metadata.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.metadata.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Offsets'
            attribute:
              Mean:
                metric_type: gauge
                alias: kafka.request.offsets.time.avg
              99thPercentile:
                metric_type: gauge
                alias: kafka.request.offsets.time.99percentile
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.request.handler.avg.idle.pct.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.request.producer_request_purgatory.size
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=FetchRequestPurgatory,name=PurgatorySize'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.request.fetch_request_purgatory.size
    
        #
        # Replication stats
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.under_replicated_partitions
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=IsrShrinksPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.replication.isr_shrinks.rate
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=IsrExpandsPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.replication.isr_expands.rate
        - include:
            domain: 'kafka.controller'
            bean: 'kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.replication.leader_elections.rate
        - include:
            domain: 'kafka.controller'
            bean: 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.replication.unclean_leader_elections.rate
        - include:
            domain: 'kafka.controller'
            bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.offline_partitions_count
        - include:
            domain: 'kafka.controller'
            bean: 'kafka.controller:type=KafkaController,name=ActiveControllerCount'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.active_controller_count
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=PartitionCount'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.partition_count
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=LeaderCount'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.leader_count
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica'
            attribute:
              Value:
                metric_type: gauge
                alias: kafka.replication.max_lag
    
        #
        # Log flush stats
        #
        - include:
            domain: 'kafka.log'
            bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.log.flush_rate.rate
    

    To integrate all of this on a node, you need to add the class to your fqdn.yaml in the following format:

    ---
    classes:
     - profiles::datadogagent
    
    datadog_agent::api_key: [your key]
    

    After this runs, the datadog-agent is installed, and you can check it by using ps -ef | grep datadog-agent. If you like to take a look (and you should), you will also find two new files added to /etc/dd-agent/conf.d, called kafka.yaml and zk.yaml.

    You are done; please feel free to log in to the datadog portal and check your host.

    Cheers

  • Sysdig container isolation case debugged on kubernetes

    Hi,

    I didn't get to actually test anything related to this, but I managed to find a very interesting article that might be missed if you are not a sysdig fan. You can find it at the following link: https://sysdig.com/blog/container-isolation-gone-wrong/

    To put it into perspective, this tool is used for some very interesting debugging situations. I have played with it for a short period of time, and I think I will put it on my list so that I can show you what it can do.

    Cheers

  • Small Vagrant config file for Rancher deploy

    Hi,

    Just wanted to post this as well: if the config using a jumpserver is not that nice, surely we can convert it to code (Puppet/Ansible), but you can also use Vagrant. The main issue I faced when I tried to create my setup is that, for a reason I am not really sure of, Vagrant on Windows runs very slow. However, I chose to give you a piece of a Vagrantfile for a minimal setup with which you can grab the Rancher server framework and also the client containers.

    Here it is:

    # -*- mode: ruby -*-
    # vi: set ft=ruby :
    Vagrant.configure("2") do |config|
      config.vm.define "master" do |master|
        master.vm.box = "centos/7"
        master.vm.hostname = 'master'
        master.vm.network "public_network", bridge: "enp0s25"
      end
      config.vm.define "slave" do |slave|
        slave.vm.box = "centos/7"
        slave.vm.hostname = 'slave'
        slave.vm.network "public_network", bridge: "enp0s25"
      end
      config.vm.define "swarmmaster" do |swarmmaster|
        swarmmaster.vm.box = "centos/7"
        swarmmaster.vm.hostname = 'swarmmaster'
        swarmmaster.vm.network "public_network", bridge: "enp0s25"
      end
      config.vm.define "swarmslave" do |swarmclient|
        swarmclient.vm.box = "centos/7"
        swarmclient.vm.hostname = 'swarmclient'
        swarmclient.vm.network "public_network", bridge: "enp0s25"
      end
    end
    


    Do not worry about the naming of the machines; you can change them to whatever you like. The main catch is to bridge the public network on all of them, so that they are able to communicate with each other and also have access to the Docker hub. Besides that, everything else I posted regarding registering to the Rancher framework is still valid.

    Thank you for your time,

    Cheers!

  • Monitoring Kafka with DataDog

    Hi,

    A very interesting series of articles that should be checked out regarding one option for Kafka monitoring with Datadog:

     https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/.

    I will have a task regarding this in the near future and will post the outcome when it's done.

    P.S: It was done and you can find the implementation here :

    Integrate Kafka with Datadog monitoring using puppet

    As for the metrics point of view, we will see if this is really an option; I have tried the same with Prometheus and Grafana, and it seems to work better. I'll keep you posted.

    Cheers