Logs check without ELK :)


We didn’t have the time to implement ELK stack for Kafka logs so if a issue appears it should be done the old fashion way.

To that purpose, here are two commands that should help you surfing the logs in an easy manner.

First of all, there is the grep command that should show you the hole line and number.

A simple example looks like

grep -nw "2019-06-03" server.log

This should show you all the lines with date 03.06 from the log of the Kafka broker. The idea is that you can not use it with the standard construct cat server.log | grep -nw “[string]”. It must be used in this specific format.

Once you found the line number (and it could look just like 95138:java.lang.OutOfMemoryError: Java heap space there is the less command that we can use.

less +95138 server.log

And that should give you the line.

Thanks all folks!


Editing Windows registry entries(hive) from Linux


I want to share this with you and also it’s also very useful for me in case i come to this problem again.

The main reason for this is related to the fact that my Windows installation will not boot anymore. We thought that it was something related to a registry entry, so i started to take a look how can registries be modified from Linux (later saw that it was much easier to do it with a bootable Windows stick, a cmd window and regedit)

From what i managed to research on the net, it seems that the only tool for this scope is chntpw

I will no go into details o installing this tool, you can find that in different posts from other sites. What i consider important is how you find the right “hive” to edit (and by hive they understand directory structure)

So, in order to have access to the registry, you will need to mount the Windows partition on Linux.

It should be relatively easy, find the partition using sudo fdisk -l and after that for example mkdir /mnt/windows; sudo mount /dev/sda2 (for example since sda1 should be the boot partition) /mnt/windows

After it is mounted, the easiest way to see the trees is by listing the content of /mnt/windows/Windows/System32/config

And it should look similar to this:

-rwxrwxrwx 1 root root 5505024 May 9 10:50 DRIVERS
-rwxrwxrwx 1 root root 2776 May 14 15:35 netlogon.ftl
-rwxrwxrwx 1 root root 18612224 May 15 2019 SYSTEM
-rwxrwxrwx 1 root root 96206848 May 15 2019 SOFTWARE
-rwxrwxrwx 1 root root 786432 May 15 2019 DEFAULT
-rwxrwxrwx 2 root root 53215232 May 15 2019 COMPONENTS

And of course a lot more other directories.

To edit one of the trees it as simple as running

me@mintworkstation:/mnt/windows/Windows/System32/config$ chntpw -e SYSTEM
chntpw version 1.00 140201, (c) Petter N Hagen
Hive name (from header): <\Windows\system32\config\SYSTEM>
ROOT KEY at offset: 0x001020 * Subkey indexing type is: 686c
File size 18612224 [11c0000] bytes, containing 3776 pages (+ 1 headerpage)
Used for data: 257785/18420128 blocks/bytes, unused: 13/1632 blocks/bytes.

Simple registry editor. ? for help.

And the simplest way to navigate is by using ls to list and cd to change o a smaller tree (please put the name of the tree without “<” and “>” like, cd Software for example.

Once you arrived at the record you want to edit just call ed [record_name] It will show you the actual value and ask you what is the update one.

Once the changes are done, just press q and it will as you to same the registry hive. After it is saved, you are all done.

That would be all. Cheers.


Jolokia particular case using custom facts in Hiera


This is for me and also for all the other people that are searching for how to use custom defined types in Hiera

In my case i wanted to activate the HTTP endpoint of Jolokia using custom hostname and standard port. And for that it was sufficient to add in my host yaml the following lines

profiles::kafka::jolokia: "-javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778,host=%{::networking.fqdn}"

This contains the standard fact called networking, which is a hash, and i am using the key that is called fqdn.

And it works.


kafka python

Kafka_consumer.yaml (python style) and more


As a followup to the article i posted earlier ( ) , you can use that info to put in into kafka_consumer.yaml for Datadog integration.

It’s not elegant by any means, but it works. As an advise, please don’t over complicate thinks more than they need.

In the last example i figured i wanted to create a list of GroupInfo objects for each line that was returned from consumer group script. Bad idea as you shall see below

So, in addition to what i wrote in the last article, now it’s not just printing the dictionary but order it, by partition.

def constructgroupdict():
 groupagregate = {}
 group_list = getgroups()
 for group in group_list:
    groupagregate[group] = getgroupinfo(group)
 for v in groupagregate.values():
    v.sort(key = lambda re: int(re.partition))
 return groupagregate

def printgroupdict():
 groupdict = constructgroupdict()
 infile = open('/etc/datadog-agent/conf.d/kafka_consumer.d/kafka_consumer.yaml.template','a')
 for k,v in groupdict.items():
    infile.write('      '+k+':\n')
    topics = []
    testdict = {}
    for re in v:
        if re.topic not in topics:
    for x in topics:
        partitions = []
        for re in v:
           if (re.topic == x):
        testdict[x] = partitions
    for gr,partlst in testdict.items():
        infile.write('        '+gr+': ['+', '.join(partlst)+']\n')

And after that, it’s quite hard to get only the unique value for the topic name.

The logic i chose to grab all the data per consumer group is related to the fact that querying the cluster takes a very long time, so if i wanted to grab another set of data filtered by topic, i would have been very time costly.

In the way that is written now, there are a lot of for loop, that should become challenging in care there are too many records to process. Fortunately, this should not be the case for consumer groups in a normal case.

The easiest way to integrate the info in kafka_consumer.yaml, in our case is to create a template called kafka_consumer.yaml.template

  # Customize the ZooKeeper connection timeout here
  # zk_timeout: 5
  # Customize the Kafka connection timeout here
  # kafka_timeout: 5
  # Customize max number of retries per failed query to Kafka
  # kafka_retries: 3
  # Customize the number of seconds that must elapse between running this check.
  # When checking Kafka offsets stored in Zookeeper, a single run of this check
  # must stat zookeeper more than the number of consumers * topic_partitions
  # that you're monitoring. If that number is greater than 100, it's recommended
  # to increase this value to avoid hitting zookeeper too hard.
  # min_collection_interval: 600
  # Please note that to avoid blindly collecting offsets and lag for an
  # unbounded number of partitions (as could be the case after introducing
  # the self discovery of consumer groups, topics and partitions) the check
  # will collect at metrics for at most 200 partitions.

  # In a production environment, it's often useful to specify multiple
  # Kafka / Zookeper nodes for a single check instance. This way you
  # only generate a single check process, but if one host goes down,
  # KafkaClient / KazooClient will try contacting the next host.
  # Details:
  # If you wish to only collect consumer offsets from kafka, because
  # you're using the new style consumers, you can comment out all
  # zk_* configuration elements below.
  # Please note that unlisted consumer groups are not supported at
  # the moment when zookeeper consumer offset collection is disabled.
  - kafka_connect_str:
      - localhost:9092
      - localhost:2181
    # zk_iteration_ival: 1  # how many seconds between ZK consumer offset
                            # collections. If kafka consumer offsets disabled
                            # this has no effect.
    # zk_prefix: /0.8

    # SSL Configuration

    # ssl_cafile: /path/to/pem/file
    # security_protocol: PLAINTEXT
    # ssl_check_hostname: True
    # ssl_certfile: /path/to/pem/file
    # ssl_keyfile: /path/to/key/file
    # ssl_password: password1

    # kafka_consumer_offsets: false

It’s true that i keep only one string for connectivity on Kafka and Zookeeper, and that things are a little bit more complicated once SSL is configured (but this is not our case, yet).

  - kafka_connect_str:
      - localhost:9092
      - localhost:2181

And append the info at the bottom of it after which it is renamed. Who is putting that template back? Easy, that would be puppet.

It works, it has been tested. One last thing that i wanted to warn you about.

There is a limit of metrics that can be uploaded per machine, and that is 350. Please be aware and think very serious if you want to activate it.

Than would be all for today.


kafka python

Get the info you need from consumer group (python style)


For some of you might be of help. If it’s rubbish, i truly apologize, please leave a comment with improvements 🙂

import subprocess
import socket

fqdn = socket.getfqdn()

class GroupInfo:
 def __init__(self, line):
     self.topic = line[0]
     self.partition = line[1]
     self.current_offset = line[2]
     self.logend_offset = line[3]
     self.lag = line[4]
     self.consumer_id = line[5] = line[6]
     self.client_id = line[7]
 def __str__(self):
    return self.topic+" "+self.partition+" "+self.current_offset+" "+self.logend_offset+" "+self.lag+" "+self.consumer_id

def getgroups():

 cmd = ['/opt/kafka/bin/ --bootstrap-server '+fqdn+':9092 --list']
 result = subprocess.check_output(cmd, shell=True).splitlines()
 group_list = []
 for r in result:
       rstr = r.decode('utf-8')
       print('Result can not be converted to utf-8')

 return group_list

def getgroupinfo(groupid):
 cmd = ('/opt/kafka/bin/ --bootstrap-server '+fqdn+':9092 --group '+groupid+' --describe')
 process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
 result = subprocess.check_output(('grep -v TOPIC'),stdin = process.stdout, shell=True).splitlines()
 group_info_list = []
 for r in result:
       rstr = r.decode('utf-8')
       print('Result can not be converted to utf-8')
     if len(rstr.split()) == 0:
        group_info = GroupInfo(rstr.split())
 return group_info_list

def main():
 groupagregate = {}
 group_list = getgroups()
 for group in group_list:
    groupagregate[group] = getgroupinfo(group)
 for k, v in groupagregate.items():
    for re in v:


I will not explain it. It should be self explanatory.


kafka python

Kafka consumer group info retrieval using Python


I’ve been playing with kafka-python module to grab the info i need in order to reconfigure Datadog integration.

Unfortunately, there is a catch also on this method. And i will show you below.

Here is a little bit of not so elegant code.

from kafka import BrokerConnection
from kafka.protocol.admin import *
import socket

fqdn = socket.getfqdn()
bc = BrokerConnection(fqdn,9092,socket.AF_INET)
except Exception as e:
if bc.connected():
   print("Connection to", fqdn, " established")

def getgroup():
 list_groups_request = ListGroupsRequest_v1()

 future0 = bc.send(list_groups_request)
 while not future0.is_done:
     for resp, f in bc.recv():
 group_ids = ()
 for group in future0.value.groups:
     group_ids += (group[0],)

 description = DescribeGroupsRequest_v1(group_ids)
 future1 = bc.send(description)
 while not future1.is_done:
    for resp, f in bc.recv():

 for groupid in future1.value.groups:
     print('For group ',groupid[1],':\n')
     for meta in groupid[5]:
 if future1.is_done:
    print("Group query is done")


As you will see, print(meta[3]) will return a very ugly binary data with topic names in it, that is not converted if you try with meta[3].decode(‘utf-8’)

I hope i can find a way to decrypt it.


kafka puppet

Fact for kafka_consumer hash…or kind of


There is a late requirement that we activate the kafka_consumer functionality of Datadog.

Unfortunately this is a challenge if you don’t have a fixed number of consumer groups and topics (on one client we had a couple of hundreds consumer groups)

Here is how it should look in the example file

  #  consumer_groups:
  #    <CONSUMER_NAME_1>:
  #      <TOPIC_NAME_1>: [0, 1, 4, 12]
  #    <CONSUMER_NAME_2>:
  #      <TOPIC_NAME_2>:
  #    <CONSUMER_NAME_3>

So, if you want to grab the data using the old, you will have to do this in one iteration. I tried it this way, but even if i am done, should not be a option for larger clusters

require 'facter'
Facter.add('kafka_consumers_config') do
  setcode do
      kafka_consumers_cmd = '/opt/kafka/bin/ --bootstrap-server localhost:9092 --list'
      kafka_consumers_result = Facter::Core::Execution.exec(kafka_consumers_cmd)
      group_hash = {}
      kafka_consumers_result.each_line do |group|
        groupid = group.strip 
        kafka_consumer_topic_list_cmd="/opt/kafka/bin/ --bootstrap-server localhost:9092 --group #{groupid} --describe | sort | grep -v TOPIC  | awk {\'print $1\'} | uniq"
        kafka_consumer_topic_list_result = Facter::Core::Execution.exec(kafka_consumer_topic_list_cmd)
        kafka_consumer_topic_list_result.split(/\n/).reject { |c| c.empty? }
        topic_hash = {}
        kafka_consumer_topic_list_result.each_line do |topic|
           topicid = topic.strip
           kafka_consumer_topic_partition_cmd="/opt/kafka/bin/ --bootstrap-server localhost:9092 --group #{groupid} --describe | grep -v TOPIC | grep #{topicid} | awk {\'print $2\'} | sort"
           kafka_consumer_topic_partition_result.gsub("\n", ' ').squeeze(' ')
           topic_hash[topic] = kafka_consumer_topic_partition_result
      group_hash[group] = topic_hash    

This is posted only as a snapshot and maybe as a source of inspiration. For an actual working version, it should be done using Java or Scala in order to leverage the power of libraries (from what i know Python/Go and other programming languages have only libraries on consumer/producer part)

If i will pursue the task of rewriting this using only one interrogation loop or in Java/Scala, you will see it.


Python for opening and reading files

Since I started learning Python and hopefully take also a certification, I am trying to do some hands-on.

Nothing too complicated, just some basic exercises, for the moment. Here is one of them.

They want to though you to open the and file and read from it like:

from sys import argv

script, filename = argv
txt = open(filename)
print(f"Here is your file: {filename}")

But there is a thing not to be ignored, and that is exception handling.

from sys import argv

    script, filename = argv
    txt = open(filename)
    print(f"Here is your file: {filename}")
except ValueError as e:
    if "not enough" in e.__str__():
        print("not enough arguments given")
        print("too many arguments given")
except FileNotFoundError:
    print("file given, not found")

And since you can’t really discern between ValueError messages, or at least I don’t know how to do that more elegant, yet, we have a workaround.

There is also a FileNotFoundError which shouldn’t be left unchecked, and if I missed something, leave a comment.

P.S: I know there is also the possibility for the file not to have reading permissions. That demonstrates that for three lines of code you need to threat even more exceptional cases.



Distributing service conditionally on OS version


Since we are in the process of migrating to 16.04, my service restart script needed to be deployed with separate builds.

In that purpose, i found a fact that would help me, so that my standard file block transformed into this:

 case $facts['os']['distro']['codename']  {
    'xenial': {
        file {"/root/servicerestart":
        source => 'puppet:///modules/profiles/servicerestart-kafka-new',
        mode => '0755',
        replace => true,
    'trusty': { 
    file {"/root/servicerestart":
        source => 'puppet:///modules/profiles/servicerestart-kafka',
        mode => '0755',
        replace => true,

That should be all for now.

linux puppet

Order Linux processes by memory usage

This one is more for me actually. We have some issues with one puppet instance on which the processes fail, and i wanted to see if there is any way to order them by memory usage.

So i searched the net and found this link

The command is like

ps aux --sort -rss | head -10

And it provides you with following output, at least in my case

puppet    6327 70.1 25.5 3585952 1034532 ?     Sl   06:53   7:33 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -XX:OnOutOfMemoryError=kill -9 %p -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778 -Xms1024m -Xmx1024m -cp /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar clojure.main -m puppetlabs.trapperkeeper.main --config /etc/puppetlabs/puppetserver/conf.d -b /etc/puppetlabs/puppetserver/bootstrap.cfg
jenkins   6776  9.6 16.6 4648236 671980 ?      Sl   06:55   0:51 /usr/bin/java -Djava.awt.headless=true -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8780 -Xms1024m -Xmx1024m -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080 --httpListenAddress=
puppetdb  5987 16.8 11.7 3845896 474164 ?      Sl   06:52   2:01 /usr/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx192m -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8779 -cp /opt/puppetlabs/server/apps/puppetdb/puppetdb.jar clojure.main -m puppetlabs.puppetdb.main --config /etc/puppetlabs/puppetdb/conf.d -b /etc/puppetlabs/puppetdb/bootstrap.cfg
postgres  1458  0.0  2.1 249512 88656 ?        Ss   Nov21   3:10 postgres: checkpointer process                                                                                              
postgres  6206  0.0  1.4 253448 57984 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6209  0.0  0.7 252580 29820 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6210  0.0  0.5 254892 22440 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6213  0.0  0.5 254320 21416 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6205  0.0  0.5 253524 20324 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                       

As you can probably see, the components are taking slowly but surely more and more memory and since the machine has only 4GB allocated it will probably crash again.

If this happens, i will manually increase the memory with another 2GB and see where we will go from there.