Categories
cloud machine learning python

Prometheus metrics to Pandas data frame

Hi,

We are trying to implement a decision tree algorithm in order to see if our resource usage can classify our servers in different categories.

First step in that process is querying Prometheus from Python and create some data frames with some basic information in order to get them aggregated.

To that purpose, you can also use the following lines of code:

import requests
import copy 

URL = "http://[node_hostname]:9090/api/v1/query?query=metric_to_be_quried[1d]"
  
r = requests.get(url = URL) 

data = r.json()

data_dict={}
metric_list = []
for i in data['data']['result']:
    data_dict = copy.deepcopy(i['metric'])
    for j in i['values']:
        data_dict['time'] = j[0]
        data_dict['value'] = j[1]
        metric_list.append(data_dict)    

df_metric = pd.DataFrame(metric_list)

Other pieces will follow.

Cheers

Categories
linux

Renice until cgroup implementation for process of Yahoo CMAK

Hi,

We saw that ex Kafka Manager, now called Yahoo CMAK was using more than enough CPU in some cases, in general related to bad SSL client config.

It’s not really clear if the CPU usage was real or there was only wait time for resource like memory or I/O (I don’t have an example to post right now, but there are multiple fixes for this).

The easiest one is to change the nice value for usage. What I observed is that normally it starts with nice value of 0. I guess this is default. General check for this works with

ps ax -o ni,cmd | grep cmak | grep -v grep

In order to change this, you can add a crontab line with following command:

pid=`ps ax -o pid,cmd | grep cmak | grep -v grep |  awk {'print $1'}`; ni=`ps ax -o ni,cmd | grep cmak | grep -v grep |  awk {'print $1'}`; if [ "$ni" = "0" ]; then renice 10 $pid; fi

Or, even easier than that, add Nice value under [Service] in /etc/systemd/system/multi-user.target.wants/kafka-manager.service

It does the trick until further cgroup policies are applied.

Categories
cloud newtools puppet

Datadog and GCP are “friends” up to a point

Hi,

Since in the last period I preferred to publish more on Medium, let me give you the link to the latest article.

There is an interesting case in which the combination of automation, Goggle Cloud Platform and Datadog didn’t go as we expected.

https://medium.com/metrosystemsro/puppet-datadog-google-cloud-platform-recipe-for-a-small-outage-310166e551f1

Hope you enjoy! I will get back with more also with interesting topics on this blog also.

Cheers

Categories
cloud python

Optimizing VM costs in GCP

Hi,

I recently published on Medium the hole story related to https://log-it.tech/2019/11/26/using-gcp-recommender-api-for-compute-engine/

You can find it at https://medium.com/metrosystemsro/new-ground-what-about-optimizing-the-size-of-machines-87855fbab9ef

Enjoy the read!

Categories
cloud puppet

Overriding OS fact with external one

Hi,

Short notice article. We had a issue in which the traefik module code was not running because of a wrong os fact. Although the image is Ubuntu 14.04, facter returns it like:

{
  architecture => "amd64",
  family => "Debian",
  hardware => "x86_64",
  name => "Debian",
  release => {
    full => "jessie/sid",
    major => "jessie/sid"
  },
  selinux => {
    enabled => false
  }
}

I honestly don’t know why this happens since on rest of machines it works good, the way to fix it fast is by defining an external fact in /etc/facter/facts.d

Create a file named os_fact.json, for example, that will contain this content:

{ 
   "os":{ 
      "architecture":"amd64",
      "distro":{ 
         "codename":"trusty",
         "description":"Ubuntu 14.04.6 LTS",
         "id":"Ubuntu",
         "release":{ 
            "full":"14.04",
            "major":"14.04"
         }
      },
      "family":"Debian",
      "hardware":"x86_64",
      "name":"Ubuntu",
      "release":{ 
         "full":"14.04",
         "major":"14.04"
      },
      "selinux":{ 
         "enabled":"false"
      }
   }
}

And it’s fixed.

Cheers

Categories
machine learning

Starting AIOps journey – first step

There is a learning program in our company focused on gaining knowledge for “AI era”

To that purpose we played a little bit with some performance data and came to some conclusions.

I invite you to take a look

Categories
puppet

Duplicate exported resources on puppet by mistake

We had a strange problem in our test environment the other day. There is a need to share an authorized key in order for the ssh connectivity to be available.

The way we shared the file resource was straight forward.

  @@file {"/home/kafka/.ssh/authorized_keys":
    ensure => present,
    mode => '0600',
    owner => 'kafka',
    group => 'kafka',
    content => "${::sharedkey}",
    tag => "${::tagvalue}",
  }

The tag value variable was a fact unique to each Kafka cluster.

However, each time we executed puppet, the following error the following error was present:

08:38:20 Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: A duplicate resource was found while collecting exported resources, with the type and title File[/home/kafka/.ssh/authorized_keys] on node [node_name]

We had a couple of days at our disposal to play with the puppet DB, nothing relevant came from it

This behavior started after provisioning a second cluster named similar also with SSL enabled.

After taking a look on the official Puppet documentation (https://puppet.com/docs/puppet/latest/lang_exported.html – check the caution clause), it was clear that the naming of resource should not be the same.

The problem hadn’t appear on any of our clusters since now, so this was strange to say the least.

For whatever reason, the tag was not taken into consideration.

And we know that because resources shared on both nodes were put everywhere, there was no filtering.

Solution:

Quick fix was done with following modifications.

  @@file {"/home/kafka/.ssh/authorized_keys_${::clusterid}":
    path => "/home/kafka/.ssh/authorized_keys",
    ensure => present,
    mode => '0600',
    owner => 'kafka',
    group => 'kafka',
    content => "${::sharedkey}",
    tag => "${::clusterid}",
  }

So now there is an individual file per cluster, and we also have a tag that is recognized in order to filter the shared file that we need on our server.

Filtering will be done like File <<| tag == "${::clusterid}" |>>

Cheers!

Categories
cloud puppet python

Strange problem in puppet run for Ubuntu

Hi,

Short sharing of a strange case.

We’ve written a small manifest in order to distribute some python scripts. You can find the reference here: https://medium.com/metrosystemsro/new-ground-automatic-increase-of-kafka-lvm-on-gcp-311633b0816c

When you try to run it on Ubuntu 14.04, there is this very strange error:

Error: Failed to apply catalog: [nil, nil, nil, nil, nil, nil]

The cause for this is as follows:

Python 3.4.3 (default, Nov 12 2018, 22:25:49)
[GCC 4.8.4] on linux (and I believe this is the default max version on trusty)

In order to install the dependencies, you need python3-pip, so a short search returns following options:

apt search python3-pip
Sorting... Done
Full Text Search... Done
python3-pip/trusty-updates,now 1.5.4-1ubuntu4 all [installed]
  alternative Python package installer - Python 3 version of the package

python3-pipeline/trusty 0.1.3-3 all
  iterator pipelines for Python 3

If we want to list all the installed modules with pip3 list, guess what, it’s not working:

Traceback (most recent call last):
   File "/usr/bin/pip3", line 5, in 
     from pkg_resources import load_entry_point
   File "/usr/local/lib/python3.4/dist-packages/pkg_resources/init.py", line 93, in 
     raise RuntimeError("Python 3.5 or later is required")
 RuntimeError: Python 3.5 or later is required

So, main conclusion is that it’s not related to puppet, just the incompatibility between version for this old distribution.

Cheers

Categories
python

Small addition for ‘cat’ in Python

Hi,

There was a issue on options that aggregate any other ones, like -A for my previous post

In my view the easiest way to solve it is by storing the options in a tuple.

Here is the snippet

run_options = []
try:
    opts, args = getopt.gnu_getopt(sys.argv[1:-1], 'AbeEnstTv', ['show-all', 'number-nonblank', 'show-ends', 'number', 'show-blank', 'squeeze-blank' 'show-tabs', 'show-nonprinting', 'help', 'version'])
except getopt.GetoptError:
     print("Something went wrong")
     sys.exit(2)
for opt, arg in opts:
    if opt in ('-A','--show-all'):
        run_options.append('E')
        run_options.append('T')
    elif opt in ('-b', '--number-nonblank'):
        run_options.append('b')
    elif opt in ('-n', '--number'):
        run_options.append('n')
    elif opt in ('-E', '--show-ends'):
        run_options.append('E')
    elif opt in ('-s', '--squeeze-blank'):
        run_options.append('s')
    elif opt in ('-T', '--show-tabs'):
        run_options.append('T')
 
final_run_options = tuple(run_options)
for element in final_run_options:
    if element == 'b':
        content_list = number_nonempty_lines(content_list)
    elif element == 'n':
        content_list = number_all_lines(content_list)   
    elif element == 'E':
        content_list = display_endline(content_list)
    elif element == 's':
        content_list = squeeze_blanks(content_list)
    elif element == 'T':
        content_list = show_tabs(content_list)

So basically, you store the actual cases in a list which you convert to a tuple to eliminate duplicates. Once you have the final case, you parse it and change the actual content option by option.

I didn’t have the time to test it but there is no big reason why it should’t work.

Cheers

Categories
python

Linux ‘cat’ in Python – almost complete

Morning,

Since I am striving to find useful content to post more often, I took homework for a ‘cat’ written in Python.

It’s not elegant, and it’s not the best version but it works.

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 25 10:28:39 2019
@author: Sorin
"""
import sys,getopt,os
if os.path.isabs(sys.argv[-1:][0]):
    FILENAME= sys.argv[-1:][0]
else:
    FILENAME = os.getcwd() + "\\" + sys.argv[-1:][0]

def read_content(filename):
    try:
        f = open(filename, "r+")
        content = f.read()
        f.close()
    except IOError as e:
            print("File could not be opened:", e)
            sys.exit(3)
    return content
    
def transform_content():
    content = read_content(FILENAME)
    content_list = content.split('\n')
    return content_list
def number_nonempty_lines(content_list):
    i = 0
    for line in content_list:
        if line != '':
            content_list[i] = str(i) + " " + line
        i = i + 1
    return content_list
def squeeze_blanks(content_list):   
    i = 0
    duplicate_index = []
    for line in content_list:
        if (line == "" or line == "$")  or (str.isdigit(line.split(' ')[0]) and (line.split(' ')[-1] == "" or line.split(' ')[-1] == "$")):
           duplicate_index.append(i+1)
        i = i + 1
    delete_index = []
    for j in range(len(duplicate_index) - 1):
        if  duplicate_index[j] + 1 == duplicate_index[j+1]:
            delete_index.append(duplicate_index[j])
    for element in delete_index:
        content_list.pop(element)
    return content_list    
        
def number_all_lines(content_list):
    i = 0
    for line in content_list:
        content_list[i] = str(i) + " " + line
        i = i + 1
    return content_list

def display_endline(content_list):
   return [line + "$" for line in content_list]

def show_tabs(content_list):
    print(content_list)
    content_list = [ line.replace('\t','^I') for line in content_list]
    return content_list

content_list =transform_content()
try:
    opts, args = getopt.gnu_getopt(sys.argv[1:-1], 'AbeEnstTv', ['show-all', 'number-nonblank', 'show-ends', 'number', 'show-blank', 'squeeze-blank' 'show-tabs', 'show-nonprinting', 'help', 'version'])
except getopt.GetoptError:
     print("Something went wrong")
     sys.exit(2)
for opt, arg in opts:
    if opt in ('-A','--show-all'):
        content_list = display_endline(content_list)
        content_list = show_tabs(content_list)
    elif opt in ('-b', '--number-nonblank'):
       content_list = number_nonempty_lines(content_list)
    elif opt in ('-n', '--number'):
       content_list = number_all_lines(content_list)
    elif opt in ('-E', '--show-ends'):
        content_list = display_endline(content_list)
    elif opt in ('-s', '--squeeze-blank'):
        content_list = squeeze_blanks(content_list)
    elif opt in ('-T', '--show-tabs'):
        content_list = show_tabs(content_list)
print('\n'.join(content_list))

Further improvements will be also posted. I must confess that there are still a couple of things to be fixed, like not running the same options twice, and the issue of putting it to work on very large files, but it will do in this form for now.

Cheers