cloud newtools python

Multiple field query in ELK from Python


There are a lot of pages on how to query ELK stack from Python client library, however, it’s still hard to grab a useful pattern.

What I wanted is to translate some simple query in Kibana like AND beat.hostname:*test AND tags:test into a useful Query DSL JSON.

It’s worth mentioning that the Python library uses this DSL. Once you have this info, things get much simpler.

Well, if you search hard enough, you will find a solution, and it should look like.

another_query_body = {
    "query": {
        "query_string" : {
            "query": "(master) AND (*test) AND (test)",
            "fields": ["", "beat.hostname" , "tags"]

As you probably guessed, each field maps to a query entry.


machine learning

Getting interactive help in IPython


I want to share with you a simple trick that I saw in a training course related to objects and classes functionality in IPython.

If you want to see a short description of the object or class you are using in your notebook please use , for example, if you just imported Elasticsearch from the elasticsearch module, the following

And if you want more details, you can use it like this, it will actually show you the code 🙂

I tried to do that also with DataFrame but it seems that it works only on already created objects

And for the more detailed look, you can try it yourself.

Here is also a link to more experienced people


machine learning python

My introduction to Linear Regression, so far


Here are some first steps that I want to share with you from my experience with regressions.

I am new and I took it to step by step, so don’t expect anything new or either very complex.

First thing, first, we started working on some prediction “algorithms” that should work with data available in the operations domain.

We needed to have them stored in a centralized location, and it happens that they are sent to ELK. So, the first step is to query then from that location.

To do that, there is a python client library with a lot of options that I am still beginning to explore. Cutting to the point, to have a regression you need a correlation parameter between the dependent and independent variable, so we thought at first about links between the number of connection and memory usage of a specific service (for example Redis). And this is available with some simple lines of code in Jupyter:

from elasticsearch import Elasticsearch
import matplotlib.pyplot as plt
from import json_normalize
es=Elasticsearch([{'host':'ELK_IP','port':'ELK_PORT'}])"metricbeat-redis-*", body={"query": {"match": {'': "info" }}}, size=1000)
df_redis = json_normalize(res_redis['hits']['hits'])
df_redis_filtered = df_redis[['','']]
df_redis_filtered[''] = df_redis_filtered[''] / 10**6
df_redis_final = df_redis_filtered[df_redis_filtered[''] < 300]

For a little bit of explaining, the used memory needs to be divided to ten to the sixth power in order to transform from bytes to MBytes, and also I wanted to exclude values of memory over 300MB. All good, unfortunately, if you plot the correlation “matrix” between these params, this happens:

As far as we all should know, a correlation parameter should be as close as possible to 1 or -1, but it’s just not the case.

And if you want to start plotting, it will look something like:

So, back to the drawing board, and we now know that we have no clue which columns are correlated. Let us not filter the columns and just remove those that are non-numeric or completely filled with zeros.

I used this to manipulate the data as simple as possible:

df_redis_numeric = df_redis.select_dtypes(['number'])
df_redis_cleaned = df_redis_numeric.loc[:, '': '' ]
df_redis_final = df_redis_cleaned.loc[:, (df_redis_cleaned != 0).any(axis=0)]

And it will bring you a very large matrix with a lot of rows and columns. From that matrix, you can choose two data types that are more strongly correlated. In my example [‘’,’’]

If we plot the correlation matrix just for those two colums we are much better than at the start.

So we are better than before, and we can now start thinking of plotting a regression, and here is the code for that.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd

x = df_redis_cpu['']
y = df_redis_cpu['']

x = x.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

x_train = x[:-250]
x_test = x[-250:]

y_train = y[:-250]
y_test = y[-250:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets, y_train)

# Plot outputs
plt.plot(x_test, regr.predict(x_test), color='red',linewidth=3)
plt.scatter(x_test, y_test,  color='black')
plt.title('Test Data')

Our DataFrame contains 1000 records from which I used 750 to “train” and another 250 to “test”. The output looked this way

It looks more like a regression, however, what concerns me is the mean square error which is a little bit high.

So we will need to works further on the DataFrame 🙂

In order for the linear model to be applied with scikit, the input and output data are transformed into single dimension vectors. If you want to switch back and for example to create a DataFrame from the output of the regression and the actual samples from ELK, it can be done this way:

data = np.append(np.array(y_test), np.array(y_pred), axis = 1)
dataset = pd.DataFrame({'test': data[:, 0], 'pred': data[:, 1]})
dataset['pred'] = dataset['pred'].map(lambda x: '%.2f' % x)

That is all.


cloud puppet

Install zookeeper using puppet without module


In this post, I was given the task to provide a standalone zookeeper cluster with basic auth on the latest version.

The reason that happened is that we are using a very old module on our Kafka clusters and a new requirement appeared to install the latest version of 3.5.5.

The old module had only the possibility to install the package from apt repo, which was not an option since the last version available on Ubuntu Xenial is at least two years old.

To complete this task, a different method was required. I would have to grab it with wget and add the rest of the files to make it functional.

Let us start with the puppet manifest and from that, I will add the rest.

class zookeeperstd {
  $version = hiera("zookeeperstd::version","3.5.5")
  $authenabled = hiera("zookeeperstd::authenabled",false)
  $server_jvm_flags = hiera('zookeeperstd::jvm_flags', undef)
    group { 'zookeeper':
        ensure => 'present',
    user {'zookeeper':
        ensure => 'present',
        home => '/var/lib/zookeeper',
        shell => '/bin/false',
    wget::fetch { 'zookeeper':
        source      => "${version}-bin.tar.gz",
        destination => "/opt/apache-zookeeper-${version}-bin.tar.gz",
        } ->
    archive { "/opt/apache-zookeeper-${version}-bin.tar.gz":
        creates      => "/opt/apache-zookeeper-${version}-bin",
        ensure        => present,
        extract       => true,
        extract_path  => '/opt',
        cleanup       => true,
    } ->
    file { "/opt/apache-zookeeper-${version}-bin":
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse => true,
    } ->
    file { '/opt/zookeeper/':
        ensure    => link,
        target    => "/opt/apache-zookeeper-${version}-bin",
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
    file { '/var/lib/zookeeper':
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse    => true,
# in order to know which servers are in the cluster a role fact needs to be defined on each machine
    $hostshash = query_nodes(" v1_role='zookeeperstd'").sort
    $hosts_hash = $ |$value| { [$value, seeded_rand(254, $value)+1] }.hash
    $overide_hosts_hash = hiera_hash('profiles_opqs::kafka_hosts_hash', $hosts_hash)
    $overide_hosts = $overide_hosts_hash.keys.sort
    if $overide_hosts_hash.size() != $overide_hosts_hash.values.unique.size() {
        #notify {"Duplicate IDs detected! ${overide_hosts_hash}": }
        $overide_hosts_hash2 = $ |$index, $value| { [$value, $index+1] }.hash
  } else {
        $overide_hosts_hash2 = $overide_hosts_hash
	$hosts = $overide_hosts_hash2
	$data_dir = "/var/lib/zookeeper"
	$tick_time        = 2000
        $init_limit       = 10
        $sync_limit       = 5

	$myid = $hosts[$::fqdn]
    file { '/var/lib/zookeeper/myid':
        content => "${myid}",

	file { '/opt/zookeeper/conf/zoo.cfg':
        content => template("${module_name}/zoo.cfg.erb"),
   if $authenabled {
    $superpass        = hiera("zookeeperstd::super_pass", 'super-admin')
    $zoopass          = hiera("zookeeperstd::zookeeper_pass", 'zookeeper-admin')
    $clientpass        = hiera("zookeeperstd::client_pass", 'client-admin')
    file { '/opt/zookeeper/conf/zoo_jaas.config':
        content => template("${module_name}/zoo_jaas.config.erb"),
     file { '/opt/zookeeper/conf/java.env':
        content => template("${module_name}/java.zookeeper.env.erb"),
        mode => "0755",
     file { '/opt/zookeeper/conf/':
        content => template("${module_name}/"),
    file {'/etc/systemd/system/zookeeper.service':
        source  => 'puppet:///modules/work/zookeeper.service',
        mode => "644",
        } ->
    service { 'zookeeper':
        ensure   => running,
        enable   => true,
        provider => systemd,

As far as I managed to adapt some file from the existing module, here are the rest of the additional details.

# Note: This file is managed by Puppet.


# specify all zookeeper servers
# The fist port is used by followers to connect to the leader
# The second one is used for leader election
if @hosts
# sort hosts by myid and output a server config
# for each host and myid.  (sort_by returns an array of key,value tuples)
@hosts.sort_by { |name, id| id }.each do |host_id|
server.<%= host_id[1] %>=<%= host_id[0] %>:2182:2183
<% if @authenabled -%>
authProvider.<%= host_id[1] %>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
<% end -%>
<% end -%>
<% end -%>

# the port at which the clients will connect

# the directory where the snapshot is stored.
dataDir=<%= @data_dir %>

# Place the dataLogDir to a separate physical disc for better performance
<%= @data_log_dir ? "dataLogDir=#{data_log_dir}" : '# dataLogDir=/disk2/zookeeper' %>

# The number of milliseconds of each tick.
tickTime=<%= @tick_time %>

# The number of ticks that the initial
# synchronization phase can take.
initLimit=<%= @init_limit %>

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=<%= @sync_limit %>

# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.

# If this option is defined, requests will be will logged to a trace file named

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at thes slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.

<% if @authenabled -%>


<% end -%> 
QuorumServer {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_zookeeper="<%= @zoopass %>";
QuorumLearner {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       password="<%= @zoopass %>";

Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="<%= @superpass %>"
       user_client="<%= @clientpass %>";
SERVER_JVMFLAGS="<%= @server_jvm_flags %>"
# Note: This file is managed by Puppet.

# ZooKeeper Logging Configuration

# Format is "<default threshold> (, <appender>)+

log4j.rootLogger=${zookeeper.root.logger}, ROLLINGFILE

# Log INFO level and above messages to the console
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

# Add ROLLINGFILE to rootLogger to get log file output
#    Log INFO level and above messages to a log file

# Max log file size of 10MB
# Keep only 10 files
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

And the last but not the least.

Description=ZooKeeper Service

ExecStart=/opt/zookeeper/bin/ start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/ stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/ restart /opt/zookeeper/conf/zoo.cfg


Also, if you want to enable simple MD5 authentication, in hiera you will need to add the following two lines.

zookeeperstd::authenabled: true
zookeeperstd::jvm_flags: ""

If there is a simpler approach, feel free to leave me a message on Linkedin or Twitter.



Logs check without ELK :)


We didn’t have the time to implement ELK stack for Kafka logs so if a issue appears it should be done the old fashion way.

To that purpose, here are two commands that should help you surfing the logs in an easy manner.

First of all, there is the grep command that should show you the hole line and number.

A simple example looks like

grep -nw "2019-06-03" server.log

This should show you all the lines with date 03.06 from the log of the Kafka broker. The idea is that you can not use it with the standard construct cat server.log | grep -nw “[string]”. It must be used in this specific format.

Once you found the line number (and it could look just like 95138:java.lang.OutOfMemoryError: Java heap space there is the less command that we can use.

less +95138 server.log

And that should give you the line.

Thanks all folks!


Editing Windows registry entries(hive) from Linux


I want to share this with you and also it’s also very useful for me in case i come to this problem again.

The main reason for this is related to the fact that my Windows installation will not boot anymore. We thought that it was something related to a registry entry, so i started to take a look how can registries be modified from Linux (later saw that it was much easier to do it with a bootable Windows stick, a cmd window and regedit)

From what i managed to research on the net, it seems that the only tool for this scope is chntpw

I will no go into details o installing this tool, you can find that in different posts from other sites. What i consider important is how you find the right “hive” to edit (and by hive they understand directory structure)

So, in order to have access to the registry, you will need to mount the Windows partition on Linux.

It should be relatively easy, find the partition using sudo fdisk -l and after that for example mkdir /mnt/windows; sudo mount /dev/sda2 (for example since sda1 should be the boot partition) /mnt/windows

After it is mounted, the easiest way to see the trees is by listing the content of /mnt/windows/Windows/System32/config

And it should look similar to this:

-rwxrwxrwx 1 root root 5505024 May 9 10:50 DRIVERS
-rwxrwxrwx 1 root root 2776 May 14 15:35 netlogon.ftl
-rwxrwxrwx 1 root root 18612224 May 15 2019 SYSTEM
-rwxrwxrwx 1 root root 96206848 May 15 2019 SOFTWARE
-rwxrwxrwx 1 root root 786432 May 15 2019 DEFAULT
-rwxrwxrwx 2 root root 53215232 May 15 2019 COMPONENTS

And of course a lot more other directories.

To edit one of the trees it as simple as running

me@mintworkstation:/mnt/windows/Windows/System32/config$ chntpw -e SYSTEM
chntpw version 1.00 140201, (c) Petter N Hagen
Hive name (from header): <\Windows\system32\config\SYSTEM>
ROOT KEY at offset: 0x001020 * Subkey indexing type is: 686c
File size 18612224 [11c0000] bytes, containing 3776 pages (+ 1 headerpage)
Used for data: 257785/18420128 blocks/bytes, unused: 13/1632 blocks/bytes.

Simple registry editor. ? for help.

And the simplest way to navigate is by using ls to list and cd to change o a smaller tree (please put the name of the tree without “<” and “>” like, cd Software for example.

Once you arrived at the record you want to edit just call ed [record_name] It will show you the actual value and ask you what is the update one.

Once the changes are done, just press q and it will as you to same the registry hive. After it is saved, you are all done.

That would be all. Cheers.


Jolokia particular case using custom facts in Hiera


This is for me and also for all the other people that are searching for how to use custom defined types in Hiera

In my case i wanted to activate the HTTP endpoint of Jolokia using custom hostname and standard port. And for that it was sufficient to add in my host yaml the following lines

profiles::kafka::jolokia: "-javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778,host=%{::networking.fqdn}"

This contains the standard fact called networking, which is a hash, and i am using the key that is called fqdn.

And it works.


kafka python

Kafka_consumer.yaml (python style) and more


As a followup to the article i posted earlier ( ) , you can use that info to put in into kafka_consumer.yaml for Datadog integration.

It’s not elegant by any means, but it works. As an advise, please don’t over complicate thinks more than they need.

In the last example i figured i wanted to create a list of GroupInfo objects for each line that was returned from consumer group script. Bad idea as you shall see below

So, in addition to what i wrote in the last article, now it’s not just printing the dictionary but order it, by partition.

def constructgroupdict():
 groupagregate = {}
 group_list = getgroups()
 for group in group_list:
    groupagregate[group] = getgroupinfo(group)
 for v in groupagregate.values():
    v.sort(key = lambda re: int(re.partition))
 return groupagregate

def printgroupdict():
 groupdict = constructgroupdict()
 infile = open('/etc/datadog-agent/conf.d/kafka_consumer.d/kafka_consumer.yaml.template','a')
 for k,v in groupdict.items():
    infile.write('      '+k+':\n')
    topics = []
    testdict = {}
    for re in v:
        if re.topic not in topics:
    for x in topics:
        partitions = []
        for re in v:
           if (re.topic == x):
        testdict[x] = partitions
    for gr,partlst in testdict.items():
        infile.write('        '+gr+': ['+', '.join(partlst)+']\n')

And after that, it’s quite hard to get only the unique value for the topic name.

The logic i chose to grab all the data per consumer group is related to the fact that querying the cluster takes a very long time, so if i wanted to grab another set of data filtered by topic, i would have been very time costly.

In the way that is written now, there are a lot of for loop, that should become challenging in care there are too many records to process. Fortunately, this should not be the case for consumer groups in a normal case.

The easiest way to integrate the info in kafka_consumer.yaml, in our case is to create a template called kafka_consumer.yaml.template

  # Customize the ZooKeeper connection timeout here
  # zk_timeout: 5
  # Customize the Kafka connection timeout here
  # kafka_timeout: 5
  # Customize max number of retries per failed query to Kafka
  # kafka_retries: 3
  # Customize the number of seconds that must elapse between running this check.
  # When checking Kafka offsets stored in Zookeeper, a single run of this check
  # must stat zookeeper more than the number of consumers * topic_partitions
  # that you're monitoring. If that number is greater than 100, it's recommended
  # to increase this value to avoid hitting zookeeper too hard.
  # min_collection_interval: 600
  # Please note that to avoid blindly collecting offsets and lag for an
  # unbounded number of partitions (as could be the case after introducing
  # the self discovery of consumer groups, topics and partitions) the check
  # will collect at metrics for at most 200 partitions.

  # In a production environment, it's often useful to specify multiple
  # Kafka / Zookeper nodes for a single check instance. This way you
  # only generate a single check process, but if one host goes down,
  # KafkaClient / KazooClient will try contacting the next host.
  # Details:
  # If you wish to only collect consumer offsets from kafka, because
  # you're using the new style consumers, you can comment out all
  # zk_* configuration elements below.
  # Please note that unlisted consumer groups are not supported at
  # the moment when zookeeper consumer offset collection is disabled.
  - kafka_connect_str:
      - localhost:9092
      - localhost:2181
    # zk_iteration_ival: 1  # how many seconds between ZK consumer offset
                            # collections. If kafka consumer offsets disabled
                            # this has no effect.
    # zk_prefix: /0.8

    # SSL Configuration

    # ssl_cafile: /path/to/pem/file
    # security_protocol: PLAINTEXT
    # ssl_check_hostname: True
    # ssl_certfile: /path/to/pem/file
    # ssl_keyfile: /path/to/key/file
    # ssl_password: password1

    # kafka_consumer_offsets: false

It’s true that i keep only one string for connectivity on Kafka and Zookeeper, and that things are a little bit more complicated once SSL is configured (but this is not our case, yet).

  - kafka_connect_str:
      - localhost:9092
      - localhost:2181

And append the info at the bottom of it after which it is renamed. Who is putting that template back? Easy, that would be puppet.

It works, it has been tested. One last thing that i wanted to warn you about.

There is a limit of metrics that can be uploaded per machine, and that is 350. Please be aware and think very serious if you want to activate it.

Than would be all for today.


kafka python

Get the info you need from consumer group (python style)


For some of you might be of help. If it’s rubbish, i truly apologize, please leave a comment with improvements 🙂

import subprocess
import socket

fqdn = socket.getfqdn()

class GroupInfo:
 def __init__(self, line):
     self.topic = line[0]
     self.partition = line[1]
     self.current_offset = line[2]
     self.logend_offset = line[3]
     self.lag = line[4]
     self.consumer_id = line[5] = line[6]
     self.client_id = line[7]
 def __str__(self):
    return self.topic+" "+self.partition+" "+self.current_offset+" "+self.logend_offset+" "+self.lag+" "+self.consumer_id

def getgroups():

 cmd = ['/opt/kafka/bin/ --bootstrap-server '+fqdn+':9092 --list']
 result = subprocess.check_output(cmd, shell=True).splitlines()
 group_list = []
 for r in result:
       rstr = r.decode('utf-8')
       print('Result can not be converted to utf-8')

 return group_list

def getgroupinfo(groupid):
 cmd = ('/opt/kafka/bin/ --bootstrap-server '+fqdn+':9092 --group '+groupid+' --describe')
 process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
 result = subprocess.check_output(('grep -v TOPIC'),stdin = process.stdout, shell=True).splitlines()
 group_info_list = []
 for r in result:
       rstr = r.decode('utf-8')
       print('Result can not be converted to utf-8')
     if len(rstr.split()) == 0:
        group_info = GroupInfo(rstr.split())
 return group_info_list

def main():
 groupagregate = {}
 group_list = getgroups()
 for group in group_list:
    groupagregate[group] = getgroupinfo(group)
 for k, v in groupagregate.items():
    for re in v:


I will not explain it. It should be self explanatory.


kafka python

Kafka consumer group info retrieval using Python


I’ve been playing with kafka-python module to grab the info i need in order to reconfigure Datadog integration.

Unfortunately, there is a catch also on this method. And i will show you below.

Here is a little bit of not so elegant code.

from kafka import BrokerConnection
from kafka.protocol.admin import *
import socket

fqdn = socket.getfqdn()
bc = BrokerConnection(fqdn,9092,socket.AF_INET)
except Exception as e:
if bc.connected():
   print("Connection to", fqdn, " established")

def getgroup():
 list_groups_request = ListGroupsRequest_v1()

 future0 = bc.send(list_groups_request)
 while not future0.is_done:
     for resp, f in bc.recv():
 group_ids = ()
 for group in future0.value.groups:
     group_ids += (group[0],)

 description = DescribeGroupsRequest_v1(group_ids)
 future1 = bc.send(description)
 while not future1.is_done:
    for resp, f in bc.recv():

 for groupid in future1.value.groups:
     print('For group ',groupid[1],':\n')
     for meta in groupid[5]:
 if future1.is_done:
    print("Group query is done")


As you will see, print(meta[3]) will return a very ugly binary data with topic names in it, that is not converted if you try with meta[3].decode(‘utf-8’)

I hope i can find a way to decrypt it.