Categories
cloud puppet

Install zookeeper using puppet without module

Hi,

In this post, I was given the task to provide a standalone zookeeper cluster with basic auth on the latest version.

The reason that happened is that we are using a very old module on our Kafka clusters and a new requirement appeared to install the latest version of 3.5.5.

The old module had only the possibility to install the package from apt repo, which was not an option since the last version available on Ubuntu Xenial is at least two years old.

To complete this task, a different method was required. I would have to grab it with wget and add the rest of the files to make it functional.

Let us start with the puppet manifest and from that, I will add the rest.

class zookeeperstd {
  $version = hiera("zookeeperstd::version","3.5.5")
  $authenabled = hiera("zookeeperstd::authenabled",false)
  $server_jvm_flags = hiera('zookeeperstd::jvm_flags', undef)
    group { 'zookeeper':
        ensure => 'present',
    } 
    user {'zookeeper':
        ensure => 'present',
        home => '/var/lib/zookeeper',
        shell => '/bin/false',
        }
    wget::fetch { 'zookeeper':
        source      => "https://www-eu.apache.org/dist/zookeeper/stable/apache-zookeeper-${version}-bin.tar.gz",
        destination => "/opt/apache-zookeeper-${version}-bin.tar.gz",
        } ->
    archive { "/opt/apache-zookeeper-${version}-bin.tar.gz":
        creates      => "/opt/apache-zookeeper-${version}-bin",
        ensure        => present,
        extract       => true,
        extract_path  => '/opt',
        cleanup       => true,
    } ->
    file { "/opt/apache-zookeeper-${version}-bin":
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse => true,
    } ->
    file { '/opt/zookeeper/':
        ensure    => link,
        target    => "/opt/apache-zookeeper-${version}-bin",
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
    }
    file { '/var/lib/zookeeper':
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse    => true,
    }
# in order to know which servers are in the cluster a role fact needs to be defined on each machine
    $hostshash = query_nodes(" v1_role='zookeeperstd'").sort
    $hosts_hash = $hostshash.map |$value| { [$value, seeded_rand(254, $value)+1] }.hash
    $overide_hosts_hash = hiera_hash('profiles_opqs::kafka_hosts_hash', $hosts_hash)
    $overide_hosts = $overide_hosts_hash.keys.sort
    if $overide_hosts_hash.size() != $overide_hosts_hash.values.unique.size() {
        #notify {"Duplicate IDs detected! ${overide_hosts_hash}": }
        $overide_hosts_hash2 = $hosts.map |$index, $value| { [$value, $index+1] }.hash
  } else {
        $overide_hosts_hash2 = $overide_hosts_hash
    }
	$hosts = $overide_hosts_hash2
	$data_dir = "/var/lib/zookeeper"
	$tick_time        = 2000
        $init_limit       = 10
        $sync_limit       = 5

	$myid = $hosts[$::fqdn]
    file { '/var/lib/zookeeper/myid':
        content => "${myid}",
    }

	file { '/opt/zookeeper/conf/zoo.cfg':
        content => template("${module_name}/zoo.cfg.erb"),
   }
   if $authenabled {
   
    $superpass        = hiera("zookeeperstd::super_pass", 'super-admin')
    $zoopass          = hiera("zookeeperstd::zookeeper_pass", 'zookeeper-admin')
    $clientpass        = hiera("zookeeperstd::client_pass", 'client-admin')
    
    file { '/opt/zookeeper/conf/zoo_jaas.config':
        content => template("${module_name}/zoo_jaas.config.erb"),
   }
   }
     file { '/opt/zookeeper/conf/java.env':
        content => template("${module_name}/java.zookeeper.env.erb"),
        mode => "0755",
    }
     file { '/opt/zookeeper/conf/log4j.properties':
        content => template("${module_name}/log4j.zookeeper.properties.erb"),
    }
   
    file {'/etc/systemd/system/zookeeper.service':
        source  => 'puppet:///modules/work/zookeeper.service',
        mode => "644",
        } ->
    service { 'zookeeper':
        ensure   => running,
        enable   => true,
        provider => systemd,
        }
}

As far as I managed to adapt some file from the existing module, here are the rest of the additional details.

#zoo.cfg.erb
# Note: This file is managed by Puppet.

# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html

# specify all zookeeper servers
# The fist port is used by followers to connect to the leader
# The second one is used for leader election
<%
if @hosts
# sort hosts by myid and output a server config
# for each host and myid.  (sort_by returns an array of key,value tuples)
@hosts.sort_by { |name, id| id }.each do |host_id|
-%>
server.<%= host_id[1] %>=<%= host_id[0] %>:2182:2183
<% if @authenabled -%>
authProvider.<%= host_id[1] %>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
<% end -%>
<% end -%>
<% end -%>

# the port at which the clients will connect
clientPort=2181

# the directory where the snapshot is stored.
dataDir=<%= @data_dir %>

# Place the dataLogDir to a separate physical disc for better performance
<%= @data_log_dir ? "dataLogDir=#{data_log_dir}" : '# dataLogDir=/disk2/zookeeper' %>


# The number of milliseconds of each tick.
tickTime=<%= @tick_time %>

# The number of ticks that the initial
# synchronization phase can take.
initLimit=<%= @init_limit %>

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=<%= @sync_limit %>

# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).
#preAllocSize=65536

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.
#snapCount=1000

# If this option is defined, requests will be will logged to a trace file named
# traceFile.year.month.day.
#traceFile=

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at thes slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.
#leaderServes=yes

<% if @authenabled -%>

requireClientAuthScheme=sasl
quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.loginContext=QuorumLearner
quorum.auth.server.loginContext=QuorumServer
quorum.cnxn.threads.size=20

<% end -%> 
#zoo_jaas.config
QuorumServer {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_zookeeper="<%= @zoopass %>";
};
 
QuorumLearner {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       username="zookeeper"
       password="<%= @zoopass %>";
};

Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="<%= @superpass %>"
       user_client="<%= @clientpass %>";
};
#java.zookeeper.env.erb
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
SERVER_JVMFLAGS="<%= @server_jvm_flags %>"
#log4j.zookeeper.properties.erb
# Note: This file is managed by Puppet.

#
# ZooKeeper Logging Configuration
#

# Format is "<default threshold> (, <appender>)+

log4j.rootLogger=${zookeeper.root.logger}, ROLLINGFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
#    Log INFO level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=INFO
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/zookeeper.log

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# Keep only 10 files
log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

And the last but not the least.

[Unit]
Description=ZooKeeper Service
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
WorkingDirectory=/var/lib/zookeeper

[Install]
WantedBy=default.target

Also, if you want to enable simple MD5 authentication, in hiera you will need to add the following two lines.

zookeeperstd::authenabled: true
zookeeperstd::jvm_flags: "-Djava.security.auth.login.config=/opt/zookeeper/conf/zoo_jaas.config"

If there is a simpler approach, feel free to leave me a message on Linkedin or Twitter.

Cheers

Categories
cloud kafka

Configure kafka truststore and keystore using puppet

Hi,

Since the last article was about the template needed to generate the truststore and keystore, now it’s time to give you the rest of the fragments for the deployment with puppet.
So, We will have the kafka_security gen class that was called on the last post and it should look like this in the final form:

class profiles::kafka_security_gen {
    $pass = hiera('profiles::kafka_security_gen::password','password')
    $keystorepass = hiera('profiles::kafka_security_gen::keystorepass','password')
    
    $cluster_servers = query_nodes("role='kafka'")
    $servers = $cluster_servers.delete("${::fqdn}")
    
    file {'/home/kafka/security.sh':
        owner => kafka,
        group => kafka,
        ensure => file,
        mode => '0755',
        content => template("${module_name}/security.sh.erb"),
        replace => 'no'
} ->
    exec {'generate_keystore':
    path   => '/usr/bin:/usr/sbin:/bin',
    command => '/home/kafka/security.sh',
    user => 'kafka',
    unless => 'test -f /home/kafka/kafka.server.keystore.jks'
    }
    $servers.each |String $server| {
        exec{"copy files to ${server}":
            cwd => '/home/kafka',
            path   => '/usr/bin:/usr/sbin:/bin',
            command => "scp /home/kafka/kafka* kafka@${server}:/home/kafka",
            user => 'kafka',
        }
        
     }
    ssh_keygen {'kafka':
    home => '/home/kafka/',
    type => rsa,
  }
  @@file {"/home/kafka/.ssh/authorized_keys":
    ensure => present,
    mode => '0600',
    owner => 'kafka',
    group => 'kafka',
    content => "${::sharedkey}",
    tag => "${::instance_tag}",
  }
  
}

The only thing i did not manage to fix is the copying of the stores, it will copy them each time, but i will fix that in the near future with some custom facts.
Ok, now let’s see the integration part with kafka code. Unfortunately, i can not provide you all the code but as far as i can provide you is:

    
$kafkadirs = ['/home/kafka/', '/home/kafka/.ssh/']
file { $kafkadirs: 
    ensure => directory,
    mode => '0700',
    owner => 'kafka',
    group => 'kafka'
    }
    $security_enabled = hiera('profiles::kafka::security',false)
if ($security_enabled == false) {
  $broker_config = {
    ....
  }
  } else {
    # if hostname starts with kafka0 them it's the CA auth (in our case kafka servers are on form kafka0....)
    if ($hostname == 'kafka0') {
    contain 'profiles::kafka_security_gen'
    Class['profiles::kafka_security_gen']
    }
    #this shares the public key on a couple of servers wich are isolated by a tag
    File <<| tag == "${::instance_tag}" |>>
    #path for keystore/truststore
    $keystore_location = '/home/kafka/kafka.server.keystore.jks'
    $truststore_location = '/home/kafka/kafka.server.truststore.jks'
    #broker config
    $broker_config = {
    ....
    'security.inter.broker.protocol' => 'SSL',
    'ssl.client.auth'               => 'none',
    'ssl.enabled.protocols'         => ['TLSv1.2', 'TLSv1.1', 'TLSv1'],
    'ssl.key.password'              => hiera('profiles::kafka_security_gen::password','password'),
    'ssl.keystore.location'         => $keystore_location,
    'ssl.keystore.password'         => hiera('profiles::kafka_security_gen::keystorepass','password'),
    'ssl.keystore.type'             => 'JKS',
    'ssl.protocol'                  => 'TLS',
    'ssl.truststore.location'       => $truststore_location,
    'ssl.truststore.password'       => hiera('profiles::kafka_security_gen::keystorepass','password'),
    'ssl.truststore.type'           => 'JKS',
    'listeners'                     => "PLAINTEXT://${::fqdn}:9092,SSL://${::fqdn}:9093", #both listeners are enabled
    'advertised.listeners'          => "PLAINTEXT://${::fqdn}:9092,SSL://${::fqdn}:9093",
    }
  }
class { '::kafka::broker':
    config    => $broker_config,
    opts      => hiera('profiles::kafka::jolokia', '-javaagent:/usr/share/java/jolokia-jvm-agent.jar'),
    heap_opts => "-Xmx${jvm_heap_size}M -Xms${jvm_heap_size}M",
  }

This can be easily activated by putting in the hierarchy the profiles::kafka::security option with true and it should do all the work. Once it’s done it can be tested using openssl s_client -debug -connect localhost:9093 -tls1

The link to the generate article for the security script template is http://log-it.tech/2017/07/23/generate-kafka-keytrust-store-tls-activation/ and also a quite interesting guideline fron Confluence http://docs.confluent.io/current/kafka/ssl.html#configuring-kafka-brokers

That’s all folks. šŸ™‚

Cheers

Categories
cloud kafka newtools

Integrate Kafka with Datadog monitoring using puppet

Hi,

Since i was in debt with an article on how to integate Kafka monitoring using Datadog, let me tell you a couple of things about this topic. First of all, we are taking the same config of Kafka with Jolokia that was describe in following article. From the install of the brokers on our infrastructure, JMX data is published on port 9990 (this will be needed in the datadog config).

The files you need to create for this task are as follows:

datadogagent.pp

class profiles::datadogagent {
  $_check_api_key = hiera('datadog_agent::api_key')

  contain 'datadog_agent'
  contain 'profiles::datadogagent_config_kafka'
  contain 'datadog_agent::integrations::zk'

  Class['datadog_agent'] -> Class['profiles::datadog_agent_config_kafka']
  Class['datadog_agent'] -> Class['datadog_agent::integrations::zk']
}

datadogagent_config_kafka.pp

class profiles::datadogagent_config_kafka (
$servers = [{'host' => 'localhost', 'port' => '9990'}]
) inherits datadog_agent::params {
  include datadog_agent

  validate_array($servers)

  file { "${datadog_agent::params::conf_dir}/kafka.yaml":
    ensure  => file,
    owner   => $datadog_agent::params::dd_user,
    group   => $datadog_agent::params::dd_group,
    mode    => '0600',
    content => template("${module_name}/kafka.yaml.erb"),
    require => Package[$datadog_agent::params::package_name],
    notify  => Service[$datadog_agent::params::service_name],
  }
}

And since, there isn’t yet an integration by default for the kafka on the datadog module which you can find it here:

https://github.com/DataDog/puppet-datadog-agent

i created in the templates directory the following file:

kafka.yaml.erb (as you can see from the header this is actually the template given by datadog for kafka integration with specific host and port)

##########
# WARNING
##########
# This sample works only for Kafka >= 0.8.2.
# If you are running a version older than that, you can refer to agent 5.2.x released
# sample files, https://raw.githubusercontent.com/DataDog/dd-agent/5.2.1/conf.d/kafka.yaml.example

instances:
<% @servers.each do |server| -%>
  - host: <%= server['host'] %>
    port: <%= server['port'] %> # This is the JMX port on which Kafka exposes its metrics (usually 9999)
    tags:
      kafka: broker

init_config:
  is_jmx: true

  # Metrics collected by this check. You should not have to modify this.
  conf:
    # v0.8.2.x Producers
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.request_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.producer.request_latency_avg
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=BytesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.bytes_out
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.message_rate
    # v0.8.2.x Consumers
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=.*'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.consumer.max_lag
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=.*'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.consumer.fetch_rate
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.bytes_in
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.messages_in

    # Offsets committed to ZooKeeper
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.zookeeper_commits
    # Offsets committed to Kafka
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.kafka_commits
    # v0.9.0.x Producers
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          response-rate:
            metric_type: gauge
            alias: kafka.producer.response_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          request-rate:
            metric_type: gauge
            alias: kafka.producer.request_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          request-latency-avg:
            metric_type: gauge
            alias: kafka.producer.request_latency_avg
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          outgoing-byte-rate:
            metric_type: gauge
            alias: kafka.producer.bytes_out
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          io-wait-time-ns-avg:
            metric_type: gauge
            alias: kafka.producer.io_wait

    # v0.9.0.x Consumers
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
        attribute:
          bytes-consumed-rate:
            metric_type: gauge
            alias: kafka.consumer.bytes_in
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
        attribute:
          records-consumed-rate:
            metric_type: gauge
            alias: kafka.consumer.messages_in
    #
    # Aggregate cluster stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_out.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_in.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.messages_in.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_rejected.rate

    #
    # Request timings
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch.failed.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.produce.failed.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.produce.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.produce.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.produce.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch_consumer.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch_follower.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.fetch_consumer.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.fetch_consumer.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.fetch_follower.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.fetch_follower.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.update_metadata.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.update_metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Metadata'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.metadata.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Offsets'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.offsets.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.offsets.time.99percentile
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.handler.avg.idle.pct.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.request.producer_request_purgatory.size
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=FetchRequestPurgatory,name=PurgatorySize'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.request.fetch_request_purgatory.size

    #
    # Replication stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.under_replicated_partitions
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=IsrShrinksPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.isr_shrinks.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=IsrExpandsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.isr_expands.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.leader_elections.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.unclean_leader_elections.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.offline_partitions_count
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=KafkaController,name=ActiveControllerCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.active_controller_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=PartitionCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.partition_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=LeaderCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.leader_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.max_lag

    #
    # Log flush stats
    #
    - include:
        domain: 'kafka.log'
        bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.log.flush_rate.rate

<% end -%>

To integrate all of this node, you need to add in your fqdn.yaml the class in the format:

---
classes:
 - profiles::datadogagent

datadog_agent::api_key: [your key]

After this runs, datadog-agent is installed and you can check it by usingĀ ps -ef | grep datadog-agentĀ and also if you like to take a look and you should do that, you will find that there are two new files added toĀ /etc/dd-agent/conf.d called kafka.yaml and zk.yaml.

You are done, please feel free to login to the datadog portal and check you host.

Cheers

Categories
cloud docker linux

Sysdig container isolation case debugged on kubernetes

Hi,

I didn’t get to actual test anything related to this but i managed to find a very interesting article that might be lost if you are not a sysdig fan. You can find it at following linkĀ https://sysdig.com/blog/container-isolation-gone-wrong/

To put into perspective, this tool is used for some very interesting debugging situation, i have played with it some a short period of time and i think i will put in on my list so that i can show you what it can do.

Cheers

Categories
cloud newtools

Small Vagrant config file for Rancher deploy

Hi,

Just wanted to post this also, if it’s not that nice the config using a jumpserver, surely we can convert that to code (Puppet/Ansible), you can also use Vagrant. The main issue that i faced when i tried to create my setup is that for a reason (not really sure why, Vagrant on Windows runs very slow). However, i chose to give you one piece of Vagrantfile for a minimal setup on which you can grab the Rancher server framework and also the client containers.

Here is it:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
config.vm.define "master" do |master|
master.vm.box = "centos/7"
master.vm.hostname = 'master'
master.vm.network "public_network", bridge: "enp0s25"
end
config.vm.define "slave" do |slave|
slave.vm.box = "centos/7"
slave.vm.hostname = 'slave'
slave.vm.network "public_network", bridge: "enp0s25"
end
config.vm.define "swarmmaster" do |swarmmaster|
swarmmaster.vm.box = "centos/7"
swarmmaster.vm.hostname = 'swarmmaster'
swarmmaster.vm.network "public_network", bridge: "enp0s25"
end
config.vm.define "swarmslave" do |swarmclient|
swarmclient.vm.box = "centos/7"
swarmclient.vm.hostname = 'swarmclient'
swarmclient.vm.network "public_network", bridge: "enp0s25"
end
end

 

Do not worry about the naming of the machines, you can change them to whatever you like, the main catch is to bridge the public network in all of them in order to be able to communicate with each other and also have access to the docker hub. Beside that everything else that i posted regarding the registry to the Rancher framework is still valid.

Thank you for your time,

Cheers!

Categories
cloud kafka

Monitoring Kafka with DataDog

Hi,

A very interesting series of articles that should be checked regarding one option of Kafka monitoring with Datadog :

Ā https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/.

I will have in the near future a task regarding this, will post the outcome when it’s done.

P.S: It was done and you can find the implementation here :

Integrate Kafka with Datadog monitoring using puppet

As for the metrics point of view we will see if this is really an option, i have tried the same with Prometheus and Grafana and it seems to wok better. Keep you posted

Cheers