Linux ‘cat’ in Python – almost complete


Since I am striving to find useful content to post more often, I took on a homework assignment: a ‘cat’ written in Python.

It’s not elegant, and it’s not the best version, but it works.

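# -*- coding: utf-8 -*-
"""
Created on Wed Dec 25 10:28:39 2019

@author: Sorin
"""
import getopt
import os
import sys

# The file name is always the last argument; a relative path is resolved
# against the current working directory.
if len(sys.argv) < 2:
    print("Usage: cat.py [OPTION]... FILE")
    sys.exit(1)
if os.path.isabs(sys.argv[-1]):
    FILENAME = sys.argv[-1]
else:
    FILENAME = os.path.join(os.getcwd(), sys.argv[-1])


def read_content(filename):
    """Return the whole file as a string (empty if it cannot be opened)."""
    content = ""
    try:
        with open(filename, "r") as f:
            content = f.read()
    except IOError as e:
        print("File could not be opened:", e)
    return content


def transform_content():
    """Split the file into a list of lines, without the newline characters."""
    content = read_content(FILENAME)
    return content.splitlines()


def number_nonempty_lines(content_list):
    """-b: number the non-empty lines only, starting from 1."""
    counter = 1
    for i, line in enumerate(content_list):
        if line not in ('', '$'):
            content_list[i] = str(counter) + " " + line
            counter += 1
    return content_list


def squeeze_blanks(content_list):
    """-s: collapse every run of blank lines into a single blank line.

    A line counts as blank if it is empty or "$" (after -E), or if it was
    numbered (for example "3 " or "3 $") and has nothing after the number.
    """
    duplicate_index = []
    for i, line in enumerate(content_list):
        number, sep, rest = line.partition(' ')
        if line in ("", "$") or (number.isdigit() and sep and rest in ("", "$")):
            duplicate_index.append(i)
    # Mark each blank line that directly follows another blank line.
    delete_index = []
    for j in range(len(duplicate_index) - 1):
        if duplicate_index[j] + 1 == duplicate_index[j + 1]:
            delete_index.append(duplicate_index[j + 1])
    # Delete from the end so the remaining indices stay valid.
    for element in reversed(delete_index):
        del content_list[element]
    return content_list


def number_all_lines(content_list):
    """-n: number every line, starting from 1."""
    return [str(i) + " " + line for i, line in enumerate(content_list, start=1)]


def display_endline(content_list):
    """-E: mark the end of each line with "$"."""
    return [line + "$" for line in content_list]


def show_tabs(content_list):
    """-T: display TAB characters as ^I."""
    return [line.replace('\t', '^I') for line in content_list]


content_list = transform_content()
try:
    # Options come before the file name, which is the last argument.
    opts, args = getopt.gnu_getopt(
        sys.argv[1:-1], 'AbeEnstTv',
        ['show-all', 'number-nonblank', 'show-ends', 'number', 'show-blank',
         'squeeze-blank', 'show-tabs', 'show-nonprinting', 'help', 'version'])
except getopt.GetoptError as err:
    print("Something went wrong:", err)
    sys.exit(1)

# Options are applied in the order they were given on the command line.
for opt, arg in opts:
    if opt in ('-A', '--show-all'):
        content_list = display_endline(content_list)
        content_list = show_tabs(content_list)
    elif opt in ('-b', '--number-nonblank'):
        content_list = number_nonempty_lines(content_list)
    elif opt in ('-n', '--number'):
        content_list = number_all_lines(content_list)
    elif opt in ('-E', '--show-ends'):
        content_list = display_endline(content_list)
    elif opt in ('-s', '--squeeze-blank'):
        content_list = squeeze_blanks(content_list)
    elif opt in ('-T', '--show-tabs'):
        content_list = show_tabs(content_list)

print("\n".join(content_list))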

Further improvements will also be posted. I must confess there are still a couple of things to fix, such as preventing the same option from being applied twice and making it work on very large files, but it will do in this form for now.
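On the very-large-file issue, one possible direction is to stop loading the whole file into a list and process it line by line with a generator. The sketch below is only an assumption of how that could look, not the script above: cat_stream and its keyword flags are hypothetical names, the options are booleans applied in a fixed order (-s, then -n, then -E), and numbering uses GNU cat's right-aligned number plus TAB. Because each option is a flag, passing it twice has no extra effect.

```python
from typing import Iterable, Iterator


def cat_stream(lines: Iterable[str], number: bool = False,
               show_ends: bool = False, squeeze: bool = False) -> Iterator[str]:
    """Yield transformed lines one at a time (hypothetical streaming variant).

    Only the current line is held in memory, so the file size does not matter.
    Options are applied in a fixed order: -s, then -n, then -E.
    """
    previous_blank = False
    counter = 1
    for raw in lines:
        line = raw.rstrip("\n")
        blank = line == ""
        # -s: drop a blank line that directly follows another blank line
        if squeeze and blank and previous_blank:
            continue
        previous_blank = blank
        # -n: right-aligned line number followed by a TAB
        if number:
            line = f"{counter:6}\t{line}"
            counter += 1
        # -E: mark the end of the line
        if show_ends:
            line += "$"
        yield line
```

With a real file, the file object itself can be passed in, since iterating over it reads one line at a time: with open("server.log") as f, loop over cat_stream(f, number=True) and print each line.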



Reset Cinnamon desktop interface


I recently had an issue with the Cinnamon interface; more exactly, my menu panel disappeared.

After some quick searches on the net, I found this command:

gsettings reset-recursively org.cinnamon

It seems to do the trick.



Automatic increase of Kafka LVM on GCP

I wrote an article on this topic for my company, and it was published on Medium. Please see the link.



Using GCP recommender API for Compute engine

Let’s keep it short. If you want to use the Python client library for the Recommender API, this is how you connect to your project.

from google.cloud.recommender_v1 import RecommenderClient
from google.oauth2 import service_account


def main():
    credential = service_account.Credentials.from_service_account_file('account.json')
    project = "internal-project"
    location = "europe-west1-b"
    recommender = 'google.compute.instance.MachineTypeRecommender'
    client = RecommenderClient(credentials=credential)
    name = client.recommender_path(project, location, recommender)
    # google-cloud-recommender 2.x takes the parent and page size in a request dict
    elements = client.list_recommendations(request={"parent": name, "page_size": 4})
    for i in elements:
        print(i)


if __name__ == "__main__":
    main()


Note that using credential = RecommenderClient.from_service_account_file('account.json') instead will not return any error; it will just hang.
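For reference, the name passed as the parent is a plain resource path. The helper below is a hypothetical stand-in (not part of the client library) that builds the string in the projects/.../locations/.../recommenders/... format used by the Recommender API, which can be handy for checking what you are actually sending.

```python
def recommender_parent(project: str, location: str, recommender: str) -> str:
    """Hypothetical helper: the resource name that recommender_path is expected to return."""
    return f"projects/{project}/locations/{location}/recommenders/{recommender}"


print(recommender_parent("internal-project", "europe-west1-b",
                         "google.compute.instance.MachineTypeRecommender"))
```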

That’s all folks!


ELK query using Python with time range

Short post. Here is how to make an ELK query from Python that also filters on a timestamp range:


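from elasticsearch import Elasticsearch
from pandas import json_normalize

es = Elasticsearch([{'host': 'ELK_IP', 'port': 'ELK_PORT'}])

# both clauses must match: the query_string filter and the last two days of @timestamp
query_body_mem = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {
                    "query": "metricset.module:system AND tags:test AND[hostname]"}},
                {"range": {
                    "@timestamp": {"gte": "now-2d", "lt": "now"}}}
            ]
        }
    }
}
res_mem = es.search(index="metricbeat-*", body=query_body_mem, size=500)
df_mem = json_normalize(res_mem['hits']['hits'])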
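If you need a fixed window instead of Elasticsearch date math such as "now-2d", the same body can be built from Python datetime objects. A minimal sketch, assuming @timestamp is a date field with the default format (which accepts ISO 8601 strings); build_time_range_query is a hypothetical helper name, and `now` is expected in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_time_range_query(query_string: str, days: int = 2,
                           now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: query_string AND @timestamp in [now - days, now)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": query_string}},
                    {"range": {"@timestamp": {"gte": start.strftime(fmt),
                                              "lt": now.strftime(fmt)}}},
                ]
            }
        }
    }
```

The result can be passed as body= to es.search, exactly like query_body_mem.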

And that’s all!



Multiple field query in ELK from Python


There are a lot of pages on how to query the ELK stack with the Python client library; however, it’s still hard to find a useful pattern.

What I wanted was to translate a simple Kibana query like AND beat.hostname:*test AND tags:test into usable Query DSL JSON.

It’s worth mentioning that the Python library uses this DSL. Once you have this info, things get much simpler.

Well, if you search hard enough, you will find a solution, and it should look like this:

another_query_body = {
    "query": {
        "query_string": {
            "query": "(master) AND (*test) AND (test)",
            "fields": ["", "beat.hostname", "tags"]
        }
    }
}

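Be aware, though, that the fields are not matched by position: query_string looks for every term in all the listed fields, so (master) can also match in beat.hostname or tags. If a term must match one specific field, put the field name in the query itself, for example beat.hostname:*test AND tags:test.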
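To make the field-to-value mapping explicit, one option (a sketch, not the only way) is a bool query with one query_string clause per field, each restricted through the default_field parameter. build_field_query is a hypothetical helper; the example values are the ones from the Kibana query above.

```python
def build_field_query(conditions: dict) -> dict:
    """Hypothetical helper: every (field, pattern) pair must match in that field only."""
    return {
        "query": {
            "bool": {
                "must": [
                    # default_field restricts this clause to a single field
                    {"query_string": {"query": pattern, "default_field": field}}
                    for field, pattern in conditions.items()
                ]
            }
        }
    }


body = build_field_query({"beat.hostname": "*test", "tags": "test"})
```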



Getting interactive help in IPython


I want to share a simple trick I saw in a training course, related to inspecting objects and classes in IPython.

If you want to see a short description of the object or class you are using in your notebook, add ? after its name. For example, if you just imported Elasticsearch from the elasticsearch module, run the following:

Elasticsearch?

And if you want more details, use ?? instead; it will actually show you the code 🙂

Elasticsearch??

I also tried it with DataFrame, but it seems to work only on already created objects.

And for the more detailed look, you can try it yourself.

Here is also a link to more experienced people
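Outside IPython (in a plain Python script, for example), the standard inspect module gives a rough equivalent of both the short description and the source-code view. A minimal sketch; short_help and full_source are hypothetical helper names, and IPython's own output includes more details (file path, type, and so on).

```python
import inspect
import json


def short_help(obj) -> str:
    """Roughly the short description: name, signature and docstring."""
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        signature = ""  # some built-ins do not expose a signature
    name = getattr(obj, "__name__", type(obj).__name__)
    return f"{name}{signature}\n{inspect.getdoc(obj) or ''}"


def full_source(obj) -> str:
    """Roughly the detailed view: the source code, when there is any."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        # built-ins and C extensions have no Python source to show
        return "<source not available>"


print(short_help(json.dumps))
```

In a notebook, IPython's own help is still the quicker option.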



My introduction to Linear Regression, so far


Here are some first steps that I want to share with you from my experience with regressions.

I am new to this and took it step by step, so don’t expect anything new or very complex.

First things first: we started working on some prediction “algorithms” that should work with data available in the operations domain.

We needed the data stored in a centralized location, and it happens to be sent to ELK. So, the first step is to query it from there.

To do that, there is a Python client library with a lot of options that I am only beginning to explore. Cutting to the point, to have a regression you need a correlation between the dependent and the independent variable, so we first thought about the link between the number of connections and the memory usage of a specific service (for example, Redis). This is available with a few simple lines of code in Jupyter:

from elasticsearch import Elasticsearch
import matplotlib.pyplot as plt
from pandas import json_normalize

es = Elasticsearch([{'host': 'ELK_IP', 'port': 'ELK_PORT'}])
res_redis = es.search(index="metricbeat-redis-*", body={"query": {"match": {'': "info"}}}, size=1000)
df_redis = json_normalize(res_redis['hits']['hits'])
# keep the two columns of interest (.copy() avoids pandas' SettingWithCopyWarning)
df_redis_filtered = df_redis[['', '']].copy()
# used memory: bytes -> MB
df_redis_filtered[''] = df_redis_filtered[''] / 10**6
# exclude memory values over 300 MB
df_redis_final = df_redis_filtered[df_redis_filtered[''] < 300]

A bit of explanation: the used memory is divided by ten to the sixth power to convert bytes to megabytes, and I also wanted to exclude memory values over 300 MB. All good, but unfortunately, if you plot the correlation “matrix” between these parameters, this happens:

As we should all know, a correlation coefficient should be as close as possible to 1 or -1, but that is just not the case here.

And if you want to start plotting, it will look something like:

So, back to the drawing board: we now know that we have no clue which columns are correlated. Let us not pick the columns by hand, and instead just remove those that are non-numeric or completely filled with zeros.

I used this to manipulate the data as simply as possible:

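# keep only the numeric columns
df_redis_numeric = df_redis.select_dtypes(['number'])
# keep only the slice of columns of interest
df_redis_cleaned = df_redis_numeric.loc[:, '':'']
# drop the columns that contain only zeros
df_redis_final = df_redis_cleaned.loc[:, (df_redis_cleaned != 0).any(axis=0)]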

This gives you a very large correlation matrix, with a lot of rows and columns. From it, you can choose two columns that are more strongly correlated. In my example, [‘’,’’].
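Picking the pair by eye from a large matrix gets tedious. A minimal sketch of doing it programmatically, assuming the cleaned numeric DataFrame from the previous step; strongest_pair is a hypothetical helper that uses Pearson correlation (the pandas default for DataFrame.corr), and the toy columns below are made up for the example.

```python
from itertools import combinations

import pandas as pd


def strongest_pair(df: pd.DataFrame):
    """Return (col_a, col_b, r) for the two columns with the largest |Pearson r|."""
    corr = df.corr()  # Pearson by default

    def strength(pair):
        r = corr.loc[pair[0], pair[1]]
        return -1.0 if pd.isna(r) else abs(r)  # constant columns give NaN

    col_a, col_b = max(combinations(corr.columns, 2), key=strength)
    return col_a, col_b, corr.loc[col_a, col_b]


# made-up columns: y is exactly 2 * x, z is only loosely related to both
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 3, 4, 1, 2]})
print(strongest_pair(toy))
```

Taking the absolute value matters: a coefficient close to -1 is just as useful for a linear regression as one close to 1.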

If we plot the correlation matrix for just those two columns, the result is much better than at the start.

So we are better off than before, and we can now start thinking about plotting a regression. Here is the code for that:

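import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

x = df_redis_cpu['']
y = df_redis_cpu['']

# scikit-learn expects 2-D arrays: one row per sample, one column per feature
x = x.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

# all but the last 250 records for training, the last 250 for testing
x_train = x[:-250]
x_test = x[-250:]

y_train = y[:-250]
y_test = y[-250:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(x_train, y_train)

# Predict on the test set and measure the error
y_pred = regr.predict(x_test)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))

# Plot outputs
plt.plot(x_test, y_pred, color='red', linewidth=3)
plt.scatter(x_test, y_test, color='black')
plt.title('Test Data')
plt.show()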

Our DataFrame contains 1000 records, of which I used 750 to “train” and the other 250 to “test”. The output looked like this:

It looks more like a regression; however, what concerns me is the mean squared error, which is a little bit high.
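To put a number on "a little bit high", scikit-learn can report the mean squared error and R² on the test set. A minimal sketch on synthetic data (the values are generated here, not the Redis metrics), reusing the same reshape and last-250 split as above; the MSE is also computed by hand to show what the metric actually is.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# synthetic data: 1000 samples, y = 3x + 5 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=1000).reshape(-1, 1)  # 2-D: one column per feature
y = 3 * x + 5 + rng.normal(0, 10, size=(1000, 1))

# same split as above: the last 250 samples are the test set
x_train, x_test = x[:-250], x[-250:]
y_train, y_test = y[:-250], y[-250:]

regr = LinearRegression().fit(x_train, y_train)
y_pred = regr.predict(x_test)

mse = mean_squared_error(y_test, y_pred)
manual_mse = float(np.mean((y_test - y_pred) ** 2))  # average squared residual
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.2f}  R^2={r2:.3f}  slope={regr.coef_[0][0]:.2f}")
```

Since the MSE is expressed in squared units of y, its square root (RMSE) is often easier to compare with the data itself, while R² gives a scale-free view of how much of the variance the line explains.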

So we will need to work further on the DataFrame 🙂

For the linear model to be applied with scikit-learn, the input and output data are reshaped into single-column, two-dimensional arrays. If you want to switch back, for example to create a DataFrame from the regression output and the actual samples from ELK, it can be done this way:

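# put the actual test values and the predictions side by side
data = np.append(np.array(y_test), np.array(y_pred), axis=1)
dataset = pd.DataFrame({'test': data[:, 0], 'pred': data[:, 1]})
# show the predictions with two decimals (note: this turns them into strings)
dataset['pred'] = dataset['pred'].map(lambda x: '%.2f' % x)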

That is all.



Install zookeeper using puppet without module


I was given the task of providing a standalone ZooKeeper cluster with basic auth, running the latest version.

The reason is that we are using a very old module on our Kafka clusters, and a new requirement appeared to install the latest version, 3.5.5.

The old module could only install the package from the apt repository, which was not an option, since the latest version available on Ubuntu Xenial is at least two years old.

To complete this task, a different method was required. I would have to grab it with wget and add the rest of the files to make it functional.

Let us start with the Puppet manifest, and from there I will add the rest.

class zookeeperstd {
  $version          = hiera("zookeeperstd::version", "3.5.5")
  $authenabled      = hiera("zookeeperstd::authenabled", false)
  $server_jvm_flags = hiera('zookeeperstd::jvm_flags', undef)

  group { 'zookeeper':
    ensure => 'present',
  }
  user { 'zookeeper':
    ensure => 'present',
    home   => '/var/lib/zookeeper',
    shell  => '/bin/false',
  }

  # Download the release tarball, unpack it under /opt and give it to the zookeeper user
  wget::fetch { 'zookeeper':
    source      => "${version}-bin.tar.gz",
    destination => "/opt/apache-zookeeper-${version}-bin.tar.gz",
  } ->
  archive { "/opt/apache-zookeeper-${version}-bin.tar.gz":
    ensure       => present,
    creates      => "/opt/apache-zookeeper-${version}-bin",
    extract      => true,
    extract_path => '/opt',
    cleanup      => true,
  } ->
  file { "/opt/apache-zookeeper-${version}-bin":
    ensure  => directory,
    owner   => 'zookeeper',
    group   => 'zookeeper',
    recurse => true,
    require => [ User['zookeeper'], Group['zookeeper'], ],
  } ->
  # Version-independent path used by the config files and the service
  file { '/opt/zookeeper/':
    ensure  => link,
    target  => "/opt/apache-zookeeper-${version}-bin",
    owner   => 'zookeeper',
    group   => 'zookeeper',
    require => [ User['zookeeper'], Group['zookeeper'], ],
  }
  file { '/var/lib/zookeeper':
    ensure  => directory,
    owner   => 'zookeeper',
    group   => 'zookeeper',
    recurse => true,
    require => [ User['zookeeper'], Group['zookeeper'], ],
  }

  # In order to know which servers are in the cluster, a role fact needs to be defined on each machine
  $hostshash = query_nodes(" v1_role='zookeeperstd'").sort
  # Give every host a pseudo-random myid (1..254) seeded by its name
  $hosts_hash = $hostshash.map |$value| { [$value, seeded_rand(254, $value)+1] }.hash
  $overide_hosts_hash = hiera_hash('profiles_opqs::kafka_hosts_hash', $hosts_hash)
  $overide_hosts = $overide_hosts_hash.keys.sort
  if $overide_hosts_hash.size() != $overide_hosts_hash.values.unique.size() {
    # Two hosts got the same ID: fall back to sequential IDs on the sorted host list
    #notify {"Duplicate IDs detected! ${overide_hosts_hash}": }
    $overide_hosts_hash2 = $overide_hosts.map |$index, $value| { [$value, $index+1] }.hash
  } else {
    $overide_hosts_hash2 = $overide_hosts_hash
  }

  # Variables used by the zoo.cfg.erb template
  $hosts      = $overide_hosts_hash2
  $data_dir   = "/var/lib/zookeeper"
  $tick_time  = 2000
  $init_limit = 10
  $sync_limit = 5

  $myid = $hosts[$::fqdn]
  file { '/var/lib/zookeeper/myid':
    content => "${myid}",
  }

  file { '/opt/zookeeper/conf/zoo.cfg':
    content => template("${module_name}/zoo.cfg.erb"),
  }

  if $authenabled {
    $superpass  = hiera("zookeeperstd::super_pass", 'super-admin')
    $zoopass    = hiera("zookeeperstd::zookeeper_pass", 'zookeeper-admin')
    $clientpass = hiera("zookeeperstd::client_pass", 'client-admin')
    file { '/opt/zookeeper/conf/zoo_jaas.config':
      content => template("${module_name}/zoo_jaas.config.erb"),
    }
  }
  file { '/opt/zookeeper/conf/java.env':
    content => template("${module_name}/java.zookeeper.env.erb"),
    mode    => "0755",
  }
  file { '/opt/zookeeper/conf/':
    content => template("${module_name}/"),
  }

  file { '/etc/systemd/system/zookeeper.service':
    source => 'puppet:///modules/work/zookeeper.service',
    mode   => "0644",
  } ->
  service { 'zookeeper':
    ensure   => running,
    enable   => true,
    provider => systemd,
  }
}
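The ID logic in the manifest (a seeded random myid per host, falling back to sequential IDs when two hosts collide) can be easier to follow outside Puppet. A rough Python sketch of the same idea, not the Puppet code itself: stable_id is a hypothetical stand-in for seeded_rand that hashes the host name with SHA-256, so the numbers differ from what Puppet produces, and the Hiera override is left out.

```python
import hashlib


def stable_id(host: str, max_id: int = 254) -> int:
    """Deterministic pseudo-random ID in 1..max_id derived from the host name.

    Hypothetical stand-in for Puppet's seeded_rand(254, host) + 1:
    the same host always gets the same number, but not Puppet's number.
    """
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % max_id + 1


def assign_myids(hosts):
    """Map each host to a ZooKeeper myid, falling back to 1..N on collisions."""
    ordered = sorted(hosts)
    ids = {host: stable_id(host) for host in ordered}
    if len(set(ids.values())) != len(ids):
        # Two hosts drew the same ID: number the sorted host list sequentially instead.
        ids = {host: index + 1 for index, host in enumerate(ordered)}
    return ids


print(assign_myids(["zk1.example", "zk2.example", "zk3.example"]))
```

One thing to keep in mind with either version: adding a host can trigger the fallback and change every host's myid, so pinning the IDs in Hiera once the cluster is built may be the safer option.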

I managed to adapt some files from the existing module; here are the rest of the details.

zoo.cfg.erb:

# Note: This file is managed by Puppet.


# specify all zookeeper servers
# The first port is used by followers to connect to the leader
# The second one is used for leader election
<% if @hosts -%>
# sort hosts by myid and output a server config
# for each host and myid.  (sort_by returns an array of key,value tuples)
<% @hosts.sort_by { |name, id| id }.each do |host_id| -%>
server.<%= host_id[1] %>=<%= host_id[0] %>:2182:2183
<% if @authenabled -%>
authProvider.<%= host_id[1] %>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
<% end -%>
<% end -%>
<% end -%>

# the port at which the clients will connect

# the directory where the snapshot is stored.
dataDir=<%= @data_dir %>

# Place the dataLogDir on a separate physical disk for better performance
<%= @data_log_dir ? "dataLogDir=#{@data_log_dir}" : '# dataLogDir=/disk2/zookeeper' %>

# The number of milliseconds of each tick.
tickTime=<%= @tick_time %>

# The number of ticks that the initial
# synchronization phase can take.
initLimit=<%= @init_limit %>

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=<%= @sync_limit %>

# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.

# ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.

# If this option is defined, requests will be logged to a trace file named

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at the slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.

<% if @authenabled -%>


<% end -%>
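To preview what the server section of zoo.cfg will contain for a given hosts hash, here is a small Python rendering of the same ERB loop. It is a sketch for checking the output, not part of the module; render_server_lines is a hypothetical name, and ports 2182 and 2183 are the ones hardcoded in the template.

```python
def render_server_lines(hosts: dict, auth_enabled: bool = False) -> list:
    """Mimic the ERB loop: one server.N line per host, sorted by myid."""
    lines = []
    for host, myid in sorted(hosts.items(), key=lambda item: item[1]):
        lines.append(f"server.{myid}={host}:2182:2183")
        if auth_enabled:
            lines.append(f"authProvider.{myid}="
                         "org.apache.zookeeper.server.auth.SASLAuthenticationProvider")
    return lines


print("\n".join(render_server_lines({"zk-b.example": 2, "zk-a.example": 1})))
```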
zoo_jaas.config.erb:

QuorumServer {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_zookeeper="<%= @zoopass %>";
};

QuorumLearner {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       username="zookeeper"
       password="<%= @zoopass %>";
};

Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="<%= @superpass %>"
       user_client="<%= @clientpass %>";
};

java.zookeeper.env.erb:

SERVER_JVMFLAGS="<%= @server_jvm_flags %>"
The logging (log4j) template:

# Note: This file is managed by Puppet.

# ZooKeeper Logging Configuration

# Format is "<default threshold> (, <appender>)+

log4j.rootLogger=${zookeeper.root.logger}, ROLLINGFILE

# Log INFO level and above messages to the console
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

# Add ROLLINGFILE to rootLogger to get log file output
#    Log INFO level and above messages to a log file

# Max log file size of 10MB
# Keep only 10 files
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

And last but not least, the systemd unit (zookeeper.service):

[Unit]
Description=ZooKeeper Service

[Service]
ExecStart=/opt/zookeeper/bin/ start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/ stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/ restart /opt/zookeeper/conf/zoo.cfg


Also, if you want to enable simple MD5 authentication, you will need to add the following two lines in Hiera:

zookeeperstd::authenabled: true
zookeeperstd::jvm_flags: ""

If there is a simpler approach, feel free to leave me a message on Linkedin or Twitter.



Logs check without ELK :)


We didn’t have the time to implement the ELK stack for Kafka logs, so if an issue appears, it has to be investigated the old-fashioned way.

For that purpose, here are two commands that should help you browse the logs easily.

First of all, there is the grep command, which can show you the whole matching line and its line number.

A simple example looks like

grep -nw "2019-06-03" server.log

This should show you all the lines dated 2019-06-03 in the Kafka broker log, each prefixed with its line number. You could also pipe the file in (cat server.log | grep -nw “[string]”), but passing the file name directly to grep is simpler and gives the same line numbers.

Once you have found the line number (a match could look just like 95138:java.lang.OutOfMemoryError: Java heap space), you can use the less command to jump straight to it:

less +95138 server.log

And that should give you the line.
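If you ever need the same thing from Python (for example, in a script that scans several broker logs), a rough sketch could look like this. It is an assumption, not a replacement for grep: grep_n approximates grep -nw with a regular expression (Python's notion of a word character is close to, but not exactly, grep's), and show_around returns the lines around a given number, similar to what less +N puts on screen. Both read their input lazily, so an open log file can be passed directly.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def grep_n(lines: Iterable[str], word: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for lines containing `word` as a whole word."""
    # whole word: not preceded or followed by a letter, digit or underscore
    regex = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def show_around(lines: Iterable[str], target: int, before: int = 2,
                after: int = 2) -> List[Tuple[int, str]]:
    """Return lines target-before .. target+after (1-based), reading lazily."""
    start = max(1, target - before)
    window = islice(lines, start - 1, target + after)
    return [(number, line.rstrip("\n")) for number, line in enumerate(window, start=start)]
```

For example, with open("server.log") as f, iterate over grep_n(f, "2019-06-03") and print each number and line; then reopen the file and call show_around(f, 95138) to see the context of a match.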

Thanks all folks!