Categories
docker newtools

Command to start sysdig container – redundant but useful

Hi,

This is more like a easier way to find the command without searching the net:

docker run -it --rm --name=sysdig --privileged=true \
   --volume=/var/run/docker.sock:/host/var/run/docker.sock \
   --volume=/dev:/host/dev \
   --volume=/proc:/host/proc:ro \
   --volume=/boot:/host/boot:ro \
   --volume=/lib/modules:/host/lib/modules:ro \
   --volume=/usr:/host/usr:ro \
   sysdig/sysdig

The actual command on starting a sysdig container. I will get more in depth with some Kafka cluster aggregated info from this amazing tool and also what it takes to send it to an elastic cluster.
It will be challenging, but this is how it goes in IT in our days.

Cheers

Categories
kafka newtools

Consumer group coordinator in Kafka using some scala script

Morning,

Just a small post regarding returning a consumer group coordinator for a specific consumer group.
We had the issue that consumer groups are re-balancing and we didn’t knew if it’s related to application logic or the Consumer Group Coordinator was changing and the Kafka cluster was reassign a different one each time. So, a small piece of code was needed. I was using the libraries that are sent with Kafka 1.0.0 for this test so be aware of the classpath update if you want to modify this.

In order to do the test, i started a standalone Confluent Kafka image which normally listens on port 29092. For more details please consult their documentation here

I also created a test topic with one partition and same replication factor. Produced some messages in the topic and after that started a console consumer:

sorin@debian-test:~/kafka_2.11-1.0.0/bin$ ./kafka-console-consumer.sh --bootstrap-server localhost:29092 --topic test --from-beginning
test
message
test message

Once this is started you can also see it using consumer-groups command like this:

sorin@debian-test:~/kafka_2.11-1.0.0/bin$ ./kafka-consumer-groups.sh --bootstrap-server localhost:29092 --list
Note: This will not show information about old Zookeeper-based consumers.

console-consumer-49198
console-consumer-66063
console-consumer-77631

Now my console consumer is identified by console-consumer-77631 and in order to see the group coordinator you will have to run something like:

./getconsumercoordinator.sh localhost 29092 console-consumer-77631
warning: there were three deprecation warnings; re-run with -deprecation for details
one warning found
Creating connection to: localhost 29092 
log4j:WARN No appenders could be found for logger (kafka.network.BlockingChannel).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Channel connected
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
GroupCoordinatorResponse(Some(BrokerEndPoint(1001,localhost,29092)),NONE,0)

It’s clear that since we have only one broker, that is also the coordinator.
Regarding the details for the code i used this link and also, in order to search for all of the dependencies, since i don’t have a scala project, just a script the following command was of great use

for i in *.jar; do jar -tvf "$i" | grep -Hsi ClassName && echo "$i"; done

Here is also the code:

#!/bin/sh
exec scala -classpath "/home/sorin/kafka_2.11-1.0.0/libs/kafka-clients-1.0.0.jar:/home/sorin/kafka_2.11-1.0.0/libs/kafka_2.11-1.0.0.jar:/home/sorin/kafka_2.11-1.0.0/libs/slf4j-api-1.7.25.jar:/home/sorin/kafka_2.11-1.0.0/libs/jackson-core-2.9.1.jar:/home/sorin/kafka_2.11-1.0.0/libs/jackson-databind-2.9.1.jar:/home/sorin/kafka_2.11-1.0.0/libs/jackson-annotations-2.9.1.jar:/home/sorin/kafka_2.11-1.0.0/libs/log4j-1.2.17.jar" "$0" "$1" "$2" "$@"
!#

import com.fasterxml.jackson.core._
import com.fasterxml.jackson.databind.ObjectMapper
import kafka.network.BlockingChannel
import kafka.api.GroupCoordinatorRequest
import kafka.api.GroupCoordinatorResponse
import org.slf4j.LoggerFactory

val hostname = args(0)
val port = args(1).toInt
val group = args(2)
println("Creating connection to: " + hostname + " " + port + " ")

var channel = new BlockingChannel(hostname, port, 1048576, 1048576, readTimeoutMs = 50000)
channel.connect()
if (channel.isConnected) {
  println("Channel connected")
channel.send(GroupCoordinatorRequest(group))
val metadataResponse = GroupCoordinatorResponse.readFrom(channel.receive.payload())
println(metadataResponse) }
channel.disconnect()

Regarding the code, the first part is to run scala from shell script, you need to update the lasspath with all libraries and also specify how many parameters to be used. In our case this is three. Also, if you won’t add all of the jackson, log4j and slf4j dependencies, it won’t work.

P.S: It will work also by running exec scala -classpath "/home/sorin/kafka_2.11-1.0.0/libs/*

Cheers!

Categories
kafka newtools

Apache Kafka technical logs to ELK stack

Morning,

Just wanted to share with you the following article:

https://www.elastic.co/blog/monitoring-kafka-with-elastic-stack-1-filebeat

I will try to share my experience once i have all i need to start working on it. As far as i understood from managing Kafka for a while, monitoring is not enough, this is mandatory in order to manage you clusters in a clear and transparent mattter.

Cheers

Categories
docker linux newtools

List differences between two sftp hosts using golang

Hi,

Just as a intermediate post as i wanted to play a little bit with golang, let me show you what i managed to put together in some days. I created a virtual machine on which i installed docker and grabbed a sftp image. You can try first two from Docker Hub, it should work.
So i pulled this image and initiated two containers as shown below:

eaf3b93798b5        asavartzeth/sftp    "/entrypoint.sh /u..."   21 hours ago        Up About a minute         0.0.0.0:2225->22/tcp   server4
ec7d7e1d029f        asavartzeth/sftp    "/entrypoint.sh /u..."   21 hours ago        Up About a minute         0.0.0.0:2224->22/tcp   server3

The command to do this looks like:

docker run --name server3 -v /home/sorin/sftp1:/chroot/sorin:rw -e SFTP_USER=sorin -e SFTP_PASS=pass -p 2224:22 -d asavartzeth/sftp
docker run --name server4 -v /home/sorin/sftp2:/chroot/sorin:rw -e SFTP_USER=sorin -e SFTP_PASS=pass -p 2225:22 -d asavartzeth/sftp

Main info to know about these containers is that they should be accessible by user sorin and the path were the external directories are mapped is on /chroot/sorin.

You can manually test the connection by using a simple command like:

sftp -P 2224 sorin@localhost

If you are using the container ip address i observed that you will use the default 22 port to connect to them. Not really clear why but this is not about that.

Once the servers are up and running you can test the differences between the structure using following code:


package main

import (
	"fmt"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

type ServerFiles struct {
	Name  string
	files []string
}

func main() {

	server1client := ConnectSftp("localhost:2224", "sorin", "pass")
	server1files := ReadPath(server1client)
	server1struct := BuildStruct("172.17.0.2", server1files)
	server2client := ConnectSftp("localhost:2225", "sorin", "pass")
	server2files := ReadPath(server2client)
	server2struct := BuildStruct("172.17.0.3", server2files)
	diffilesstruct := CompareStruct(server1struct, server2struct)
        for _, values := range diffilestruct.files {
        fmt.Printf("%s %s\n", diffilesstruct.Name, values)
 }
	CloseConnection(server1client)
	CloseConnection(server2client)
}
func CheckError(err error) {
	if err != nil {
		panic(err)
	}
}
func ConnectSftp(address string, user string, password string) *sftp.Client {
	config := &ssh.ClientConfig{
		User: user,
		Auth: []ssh.AuthMethod{
			ssh.Password(password),
		},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	}
	conn, err := ssh.Dial("tcp", address, config)
	CheckError(err)

	client, err := sftp.NewClient(conn)
	CheckError(err)

	return client
}
func ReadPath(client *sftp.Client) []string {
	var paths []string
	w := client.Walk("/")
	for w.Step() {
		if w.Err() != nil {
			continue
		}
		
		paths = append(paths, w.Path())
	}
	return paths
}
func BuildStruct(address string, files []string) *ServerFiles {
	server := new(ServerFiles)
	server.Name = address
	server.files = files

	return server
}
func CompareStruct(struct1 *ServerFiles, struct2 *ServerFiles) *ServerFiles {

	diff := difference(struct1.files, struct2.files)
	diffstruct := new(ServerFiles)
	for _, value := range diff {
		for _, valueP := range struct1.files {
			if valueP == value {
				
				diffstruct.Name = struct1.Name
				diffstruct.files = append(diffstruct.files, valueP)
			}
		}
		for _, valueQ := range struct2.files {
			if valueQ == value {
				
				diffstruct.Name = struct2.Name
				diffstruct.files = append(diffstruct.files, valueQ)
			}
		}
	}
	return diffstruct
}
func difference(slice1 []string, slice2 []string) []string {
	var diff []string

	// Loop two times, first to find slice1 strings not in slice2,
	// second loop to find slice2 strings not in slice1
	for i := 0; i < 2; i++ {
		for _, s1 := range slice1 {
			found := false
			for _, s2 := range slice2 {
				if s1 == s2 {
					found = true
					break
				}
			}
			// String not found. We add it to return slice
			if !found {
				diff = append(diff, s1)
			}
		}
		// Swap the slices, only if it was the first loop
		if i == 0 {
			slice1, slice2 = slice2, slice1
		}
	}

	return diff
}
func CloseConnection(client *sftp.Client) {
	client.Close()
}

This actually connects to each server, reads the hole filepath and puts it on a structure. After this is done for both servers, there is a method that compares only the slice part of the struct and returns the differences. On this differences there is another structure constructed with only the differences.
It is true that i took the differences func from stackoverflow, and it's far from good code, but i am working on it, this is the first draft, i will post different versions as it gets better.

The output if there are differences will look like this:

172.17.0.2 /sorin/subdirectory
172.17.0.2 /sorin/subdirectory/subtest.file
172.17.0.2 /sorin/test.file
172.17.0.3 /sorin/test2

If there are no differences that it will just exit.
Working on improving my golang experience. Keep you posted.

Cheers!

Categories
kafka newtools

Memory debug by Heroku guys on Apache Kafka – nice one

Hi,

I know, i should write more about my experience with Apache Kafka, have patience, it’s still building, but until then please check this article:

https://blog.heroku.com/fixing-kafka-memory-leak

Be aware of the things that you want to include in functionalities and code that is written beside Apache Kafka functionalities, it might get you in to trouble.

I am very happy that sysdig is used by more and more teams for debug, it’s truly a great tool for this kind of situations.

Cheers!

Categories
linux newtools

Configure Jupyter Notebook on Raspberry PI 2 for remote access and scala kernel install

Hi,

This is a continuation of the previous article regarding Jupyter Notebook (http://log-it.tech/2017/09/02/installing-jupyter-notebook-raspberry-pi-2/) Let’s start with my modification in order to have an remote connection to it. It first needs a password in the form of password hash. To generate this pass run python cli and execute this code from IPython.lib import passwd;passwd(“your_custom_password”). Once you get the password hash, we can list the fields that i uncommented to activate minimal remote access:

c.NotebookApp.open_browser = False #do not open a browser on notebook start, you will access it by daemon remotely
c.NotebookApp.ip = '*' #permite access on every interface of the server
c.NotebookApp.password = u'[your_pass_has]' #setup password in order to access the notebook, otherwise token from server is required (if you want it this way you can get the token by running sudo systemctl status jupyter.service 

You can also add them at the bottom of the file as well. In order for the changes to take effect you will need also to perform a service restart with sudo systemctl restart jupyter.service.

You have now the basic steps to run Jupyter Notebook with the IPython 2 kernel. Now lets’s ger to the next step of installing the scala kernel(https://www.scala-lang.org).

The steps are pretty straight forward and they are taken from this link https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785884870/9/ch09lvl1sec65/installing-the-scala-kernel , what i tried is to put it end to end. I am not 100% sure if you need also java 8 but i installed it anyway, you will find the steps here https://www.raspinews.com/installing-oracle-java-jdk-8-on-raspberry-pi/ but what you really need to install is sbt.

The catch here is that you don’t need to search for sbt on raspberry, just drop the default one, it will do the job. The steps are listed here http://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html. Once it is installed you can return to the link listed above and just run the steps:

apt-get install git
git clone https://github.com/alexarchambault/jupyter-scala.git
cd jupyter-scala
sbt cli/packArchive

Sbt will grab a lot of dependences, if you work with proxies i am not aware of the settings that you need to do, but you can search it and probably you find a solution. Have patience, it will take a while until it is done, but once it is done you can run ./jupyter-scala in order to install the kernel and also check if it works with jupyter kernelspec list.

Restart the Jupyter Notebook to update it, although i am not convinced if it’s necessary šŸ™‚
In my case i have a dynamic dns service from my internet provider but i think you can do it with a free dns provider on your router as well. An extra forward or NAT of port 8888 will be needed but once this is done you should have a playgroup in your browser that knows python and scala. Cool, isn’t it?

Cheers

Categories
linux newtools

Installing Jupyter Notebook on Raspberry PI 2

Morning,

Just want to share you that i managed to install the Jupyter Notebook(http://jupyter.org) on a Raspberry PI 2 without any real problems. Beside a microSD card and a Raspberry you need to read this and that would be all.
So, you will need a image of Raspbian from https://www.raspberrypi.org/downloads/raspbian/ (i selected the lite version without the GUI, you really don’t need that actually). In installed it on the card with Linux so i executed a command similar with dd if=[path_to_image]/[image_name] of=[sd_device_name taken from fdisk -l without partition id usually /dev/mmcblk0] bs=4MB; sync. The sync command is added just to be sure that all files are syncronized to card before remove it. We have now a working image that we can use on raspberry, it’s fair to try boot it.
Once it’s booted login with user pi and password raspberry. I am a fan of running the resize steps which you can find here https://coderwall.com/p/mhj8jw/raspbian-how-to-resize-the-root-partition-to-fill-sd-card.
Ok, so we are good to go on installing Jupyter Notebook, at first we need to check what Python version we have installed and in my case it was 2.7.13 (it should be shown by running python –version). In this case then we need to use pip for this task, and it’s not present by default on the image.
Run sudo apt-get install python-pip, after this is done please run pip install jupyter. It will take some time, but when it is done you will have a fresh installation in pi homedir(/home/pi/.local).
It is true that we need also a service, and in order to do that, please create following path with following file:
/usr/lib/systemd/system/jupyter.service

[Unit]
Description=Jupyter Notebook

[Service]
Type=simple
PIDFile=/run/jupyter.pid
# Step 1 and Step 2 details are here..
# ------------------------------------
ExecStart=/home/pi/.local/bin/jupyter-notebook --config=/home/pi/.jupyter/jupyter_notebook_config.py
User=pi
Group=pi
WorkingDirectory=/home/pi/notebooks
Restart=always
RestartSec=10
#KillMode=mixed

[Install]
WantedBy=multi-user.target

You are probably wondering from where do you get the config file. This will be easy, just run /home/pi/.local/bin/jupyter notebook –generate-config

After the file is created, in order to activate the service and enable it there are sudo systemctl enable jupyter.service and sudo systemctl start jupyter.service

You have now a fresh and auto managed jupyter service. It will be started only on the localhost by default, but in the next article i will tell you also the modifications to be executed in order to run it remotely and also install scala kernel.

Cheers!

Categories
kafka newtools

Balancing requests to kafka-manager using traefik

Hi,

Just wanted to share with you a quite small and simple config to balance the traffic between three machines that have kafka-manager installed. For this i used traefik since it was new to me and i wanted to gain a little bit of experience with it.

It’s an interesting solution but it took me a while to get the pieces working. I will post here my config and will explain the needed part to get it working.

logLevel = "DEBUG"
defaultEntryPoints = ["http"]
[entryPoints]
  [entryPoints.http]
  address = ":80"
[web]
address = ":8080"

[file]
watch = true

[backends]
  [backends.backend1]
    [backends.backend1.LoadBalancer]
      method = "drr"
    [backends.backend1.servers.server1]
    url = "http://[kafka1.hostname]:9000"
    weight = 1
    [backends.backend1.servers.server2]
    url = "http://[kafka2.hostname]:9000"
    weight = 2
    [backends.backend1.servers.server3]
    url = "http://[kafka3.hostname]:9000"
    weight = 1
[frontends]
  [frontends.frontend1]
  entrypoint = ["http"]
  backend = "backend1"
  passHostHeader = true
  priority = 10

This is very basic as you can see but it took me a while to understand that you need the file block with watch = true in order for the daemon to see and parse the rules that are listed. You can also have a separate rules file and for that it would be best to consult the traefik documentation.

I will have to do now the redirect from HTTP to HTTPS in order to secure the connection to frontend. The idea of traefik is that it works like entrypoint -> frontend -> backend and as far as i saw this will be done on the entrypoint level.

Two extra additions is that you need a default entry point in order for your frontend not to be ignored and also put it on log level DEBUG because otherwise it won’t log much.

Keep you posted on the progress and also you can find traefik hereĀ https://docs.traefik.io

Cheers!

Categories
newtools

Jupyter Notebook – very very interesting tool

Hi,

As i was taking a look on the Docker newsletter beside Moby and other articles related to that i found this interesting tool and also tutorial/presentation:

Beside that you can find the official site here:Ā http://jupyter.org

This caught my attention and i will certainly try this on a machine. I am pretty curios since i believe this is used to power the Wolfram Notebook.

Cheers!

Categories
cloud kafka newtools

Integrate Kafka with Datadog monitoring using puppet

Hi,

Since i was in debt with an article on how to integate Kafka monitoring using Datadog, let me tell you a couple of things about this topic. First of all, we are taking the same config of Kafka with Jolokia that was describe in following article. From the install of the brokers on our infrastructure, JMX data is published on port 9990 (this will be needed in the datadog config).

The files you need to create for this task are as follows:

datadogagent.pp

class profiles::datadogagent {
  $_check_api_key = hiera('datadog_agent::api_key')

  contain 'datadog_agent'
  contain 'profiles::datadogagent_config_kafka'
  contain 'datadog_agent::integrations::zk'

  Class['datadog_agent'] -> Class['profiles::datadog_agent_config_kafka']
  Class['datadog_agent'] -> Class['datadog_agent::integrations::zk']
}

datadogagent_config_kafka.pp

class profiles::datadogagent_config_kafka (
$servers = [{'host' => 'localhost', 'port' => '9990'}]
) inherits datadog_agent::params {
  include datadog_agent

  validate_array($servers)

  file { "${datadog_agent::params::conf_dir}/kafka.yaml":
    ensure  => file,
    owner   => $datadog_agent::params::dd_user,
    group   => $datadog_agent::params::dd_group,
    mode    => '0600',
    content => template("${module_name}/kafka.yaml.erb"),
    require => Package[$datadog_agent::params::package_name],
    notify  => Service[$datadog_agent::params::service_name],
  }
}

And since, there isn’t yet an integration by default for the kafka on the datadog module which you can find it here:

https://github.com/DataDog/puppet-datadog-agent

i created in the templates directory the following file:

kafka.yaml.erb (as you can see from the header this is actually the template given by datadog for kafka integration with specific host and port)

##########
# WARNING
##########
# This sample works only for Kafka >= 0.8.2.
# If you are running a version older than that, you can refer to agent 5.2.x released
# sample files, https://raw.githubusercontent.com/DataDog/dd-agent/5.2.1/conf.d/kafka.yaml.example

instances:
<% @servers.each do |server| -%>
  - host: <%= server['host'] %>
    port: <%= server['port'] %> # This is the JMX port on which Kafka exposes its metrics (usually 9999)
    tags:
      kafka: broker

init_config:
  is_jmx: true

  # Metrics collected by this check. You should not have to modify this.
  conf:
    # v0.8.2.x Producers
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.request_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.producer.request_latency_avg
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=BytesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.bytes_out
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.producer.message_rate
    # v0.8.2.x Consumers
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=.*'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.consumer.max_lag
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=.*'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.consumer.fetch_rate
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.bytes_in
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.messages_in

    # Offsets committed to ZooKeeper
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.zookeeper_commits
    # Offsets committed to Kafka
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=.*'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.consumer.kafka_commits
    # v0.9.0.x Producers
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          response-rate:
            metric_type: gauge
            alias: kafka.producer.response_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          request-rate:
            metric_type: gauge
            alias: kafka.producer.request_rate
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          request-latency-avg:
            metric_type: gauge
            alias: kafka.producer.request_latency_avg
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          outgoing-byte-rate:
            metric_type: gauge
            alias: kafka.producer.bytes_out
    - include:
        domain: 'kafka.producer'
        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
        attribute:
          io-wait-time-ns-avg:
            metric_type: gauge
            alias: kafka.producer.io_wait

    # v0.9.0.x Consumers
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
        attribute:
          bytes-consumed-rate:
            metric_type: gauge
            alias: kafka.consumer.bytes_in
    - include:
        domain: 'kafka.consumer'
        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
        attribute:
          records-consumed-rate:
            metric_type: gauge
            alias: kafka.consumer.messages_in
    #
    # Aggregate cluster stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_out.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_in.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.messages_in.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.net.bytes_rejected.rate

    #
    # Request timings
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch.failed.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.produce.failed.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.produce.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.produce.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.produce.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch_consumer.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.fetch_follower.rate
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.fetch_consumer.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.fetch_consumer.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.fetch_follower.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.fetch_follower.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.update_metadata.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.update_metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Metadata'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.metadata.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Offsets'
        attribute:
          Mean:
            metric_type: gauge
            alias: kafka.request.offsets.time.avg
          99thPercentile:
            metric_type: gauge
            alias: kafka.request.offsets.time.99percentile
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.handler.avg.idle.pct.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.request.producer_request_purgatory.size
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=FetchRequestPurgatory,name=PurgatorySize'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.request.fetch_request_purgatory.size

    #
    # Replication stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.under_replicated_partitions
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=IsrShrinksPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.isr_shrinks.rate
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=IsrExpandsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.isr_expands.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.leader_elections.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.replication.unclean_leader_elections.rate
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.offline_partitions_count
    - include:
        domain: 'kafka.controller'
        bean: 'kafka.controller:type=KafkaController,name=ActiveControllerCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.active_controller_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=PartitionCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.partition_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=LeaderCount'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.leader_count
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica'
        attribute:
          Value:
            metric_type: gauge
            alias: kafka.replication.max_lag

    #
    # Log flush stats
    #
    - include:
        domain: 'kafka.log'
        bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.log.flush_rate.rate

<% end -%>

To integrate all of this node, you need to add in your fqdn.yaml the class in the format:

---
classes:
 - profiles::datadogagent

datadog_agent::api_key: [your key]

After this runs, datadog-agent is installed and you can check it by usingĀ ps -ef | grep datadog-agentĀ and also if you like to take a look and you should do that, you will find that there are two new files added toĀ /etc/dd-agent/conf.d called kafka.yaml and zk.yaml.

You are done, please feel free to login to the datadog portal and check you host.

Cheers