Categories
python

Unique value on columns – pandas

Hi,

Today, a short example for dataframes whose column names contain spaces.

For example, I have a dataframe that has the following columns:

I have read in some sources that you can use the construction wine_new.column_name.unique() to list the distinct values.

If the column name is a single word, this works, but if it consists of several words you cannot use a construct like wine_new.'Page ID'.unique(), because it raises a syntax error.

Good, so you try to rename it. Why Page ID and not pageid? OK, that should be easy:

wine_new = wine_new.rename(columns={"Page ID": "pageid"}, errors="raise")

And it now looks “better”.

But if you need to keep the column name, you can just as easily use wine_new['Page ID'].unique(). (If you want to count the number of unique values, you can also use wine_new['Page ID'].nunique().)

There are multiple resources on this topic, but most of them do not explain both versions of the approach.
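
To make the two access styles concrete, here is a minimal sketch. The dataframe below is a made-up stand-in for wine_new; only the 'Page ID' column name comes from the post, the values and the country column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for wine_new (invented values).
wine_new = pd.DataFrame({
    "Page ID": [101, 102, 101, 103],
    "country": ["RO", "IT", "RO", "FR"],
})

# Bracket access works for any column name, including names with spaces.
unique_ids = wine_new["Page ID"].unique()     # distinct values, in order of appearance
unique_count = wine_new["Page ID"].nunique()  # number of distinct values

# Attribute access only works when the name is a valid Python identifier.
countries = wine_new.country.unique()

print(unique_ids, unique_count, countries)
```

Bracket access is the safer default, since it does not depend on how the column happens to be named.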

Cheers

Categories
cloud machine learning python

Prometheus metrics to Pandas data frame

Hi,

We are trying to implement a decision tree algorithm in order to see if our resource usage can classify our servers in different categories.

The first step in that process is querying Prometheus from Python and creating some data frames with basic information, so that it can be aggregated later.

For that purpose, you can use the following lines of code:

import copy

import pandas as pd
import requests

URL = "http://[node_hostname]:9090/api/v1/query?query=metric_to_be_queried[1d]"

r = requests.get(url=URL)

data = r.json()

metric_list = []
for i in data['data']['result']:
    for j in i['values']:
        # Copy the labels for every sample; reusing one dict would leave
        # every row of a series with the last time/value pair.
        data_dict = copy.deepcopy(i['metric'])
        data_dict['time'] = j[0]
        data_dict['value'] = j[1]
        metric_list.append(data_dict)

df_metric = pd.DataFrame(metric_list)
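
The flattening step can be tried without a running Prometheus. The sketch below assumes the response has the range-vector ("matrix") shape that a [1d] selector returns; the function name flatten_matrix and the sample payload are made up for illustration.

```python
import copy


def flatten_matrix(payload):
    """Turn a Prometheus 'matrix' result into one dict per sample."""
    rows = []
    for series in payload["data"]["result"]:
        for timestamp, value in series["values"]:
            # A fresh copy of the labels for each sample keeps rows independent.
            row = copy.deepcopy(series["metric"])
            row["time"] = timestamp
            row["value"] = float(value)  # sample values arrive as strings
            rows.append(row)
    return rows


# Hypothetical payload with two series (invented labels and values).
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"instance": "node1"}, "values": [[1574760000, "0.5"], [1574760060, "0.7"]]},
            {"metric": {"instance": "node2"}, "values": [[1574760000, "1.5"]]},
        ],
    },
}

rows = flatten_matrix(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame as in the snippet above.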

Other pieces will follow.

Cheers

Categories
cloud python

Optimizing VM costs in GCP

Hi,

I recently published on Medium the whole story related to https://log-it.tech/2019/11/26/using-gcp-recommender-api-for-compute-engine/

You can find it at https://medium.com/metrosystemsro/new-ground-what-about-optimizing-the-size-of-machines-87855fbab9ef

Enjoy the read!

Categories
cloud puppet python

Strange problem in puppet run for Ubuntu

Hi,

A short note on a strange case.

We’ve written a small manifest in order to distribute some python scripts. You can find the reference here: https://medium.com/metrosystemsro/new-ground-automatic-increase-of-kafka-lvm-on-gcp-311633b0816c

When you try to run it on Ubuntu 14.04, there is this very strange error:

Error: Failed to apply catalog: [nil, nil, nil, nil, nil, nil]

The cause for this is as follows:

Python 3.4.3 (default, Nov 12 2018, 22:25:49)
[GCC 4.8.4] on linux (and I believe this is the default max version on trusty)

In order to install the dependencies, you need python3-pip, so a short search returns the following options:

apt search python3-pip
Sorting... Done
Full Text Search... Done
python3-pip/trusty-updates,now 1.5.4-1ubuntu4 all [installed]
  alternative Python package installer - Python 3 version of the package

python3-pipeline/trusty 0.1.3-3 all
  iterator pipelines for Python 3

If we want to list all the installed modules with pip3 list, guess what, it’s not working:

Traceback (most recent call last):
  File "/usr/bin/pip3", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/local/lib/python3.4/dist-packages/pkg_resources/__init__.py", line 93, in <module>
    raise RuntimeError("Python 3.5 or later is required")
RuntimeError: Python 3.5 or later is required

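So, the main conclusion is that it's not related to puppet. It is a version incompatibility: the pkg_resources copy under /usr/local/lib/python3.4/dist-packages requires Python 3.5 or later, while this old distribution ships Python 3.4.3.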
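
A script distributed to mixed hosts can check the interpreter version up front and report the problem in readable form. This is only a minimal sketch: the helper name meets_minimum is made up, and the 3.5 threshold is simply the one quoted in the RuntimeError.

```python
import sys

MINIMUM = (3, 5)  # threshold taken from the RuntimeError message


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum(sys.version_info):
    print("Python %d.%d or later is required, found %d.%d"
          % (MINIMUM + tuple(sys.version_info[:2])))
```

On trusty's Python 3.4.3 the check fails and prints the message instead of a traceback.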

Cheers

Categories
python

Small addition for ‘cat’ in Python

Hi,

There was an issue in my previous post with options that aggregate other ones, like -A.

In my view the easiest way to solve it is by storing the options in a tuple.

Here is the snippet

run_options = []
try:
    opts, args = getopt.gnu_getopt(sys.argv[1:-1], 'AbeEnstTv', ['show-all', 'number-nonblank', 'show-ends', 'number', 'show-blank', 'squeeze-blank', 'show-tabs', 'show-nonprinting', 'help', 'version'])
except getopt.GetoptError:
     print("Something went wrong")
     sys.exit(2)
for opt, arg in opts:
    if opt in ('-A','--show-all'):
        run_options.append('E')
        run_options.append('T')
    elif opt in ('-b', '--number-nonblank'):
        run_options.append('b')
    elif opt in ('-n', '--number'):
        run_options.append('n')
    elif opt in ('-E', '--show-ends'):
        run_options.append('E')
    elif opt in ('-s', '--squeeze-blank'):
        run_options.append('s')
    elif opt in ('-T', '--show-tabs'):
        run_options.append('T')
 
# tuple() alone would keep repeated options; dict.fromkeys drops them
# while preserving the order in which they were given.
final_run_options = tuple(dict.fromkeys(run_options))
for element in final_run_options:
    if element == 'b':
        content_list = number_nonempty_lines(content_list)
    elif element == 'n':
        content_list = number_all_lines(content_list)
    elif element == 'E':
        content_list = display_endline(content_list)
    elif element == 's':
        content_list = squeeze_blanks(content_list)
    elif element == 'T':
        content_list = show_tabs(content_list)

So basically, you store the actual cases in a list, drop the duplicates with dict.fromkeys (a plain tuple() call would keep them) and convert the result to a tuple. Once you have the final case, you parse it and change the actual content option by option.
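
The duplicate handling can be checked in isolation. In this minimal sketch the helper name unique_in_order and the sample option list are made up for illustration; it relies on dictionaries keeping insertion order, which Python guarantees from version 3.7.

```python
def unique_in_order(options):
    """Drop repeated options while keeping the first-seen order."""
    return tuple(dict.fromkeys(options))


# What the parsing loop would collect for something like "-A -n -E -T".
run_options = ['E', 'T', 'n', 'E', 'T']

print(tuple(run_options))            # tuple() keeps the repeats
print(unique_in_order(run_options))  # repeats removed, order preserved
```

A set would also remove the repeats, but it does not keep the order in which the options were given.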

I didn’t have the time to test it, but there is no big reason why it shouldn’t work.

Cheers

Categories
python

Linux ‘cat’ in Python – almost complete

Morning,

Since I am striving to find useful content to post more often, I gave myself some homework: a ‘cat’ written in Python.

It’s not elegant, and it’s not the best version but it works.

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 25 10:28:39 2019
@author: Sorin
"""
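import sys,getopt,os
# The file to print is always the last command-line argument.
if os.path.isabs(sys.argv[-1]):
    FILENAME = sys.argv[-1]
else:
    # os.path.join picks the right separator; a hard-coded "\\" only works on Windows.
    FILENAME = os.path.join(os.getcwd(), sys.argv[-1])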

def read_content(filename):
    try:
        # Read-only mode is enough; "r+" would fail on files we cannot write to.
        with open(filename, "r") as f:
            content = f.read()
    except IOError as e:
        print("File could not be opened:", e)
        sys.exit(3)
    return content
    
def transform_content():
    content = read_content(FILENAME)
    content_list = content.split('\n')
    return content_list
def number_nonempty_lines(content_list):
    # Like cat -b: number only the non-empty lines, counting from 1.
    number = 0
    for i, line in enumerate(content_list):
        if line != '':
            number += 1
            content_list[i] = str(number) + " " + line
    return content_list
def squeeze_blanks(content_list):
    # A line counts as blank if it is empty, a bare "$" (after -E), or a
    # line number followed by nothing or by "$" (after -n).
    def is_blank(line):
        parts = line.split(' ')
        return line in ("", "$") or (parts[0].isdigit() and parts[-1] in ("", "$"))
    squeezed = []
    for line in content_list:
        # Skip a blank line that directly follows another blank line.
        if squeezed and is_blank(line) and is_blank(squeezed[-1]):
            continue
        squeezed.append(line)
    return squeezed
        
def number_all_lines(content_list):
    # Like cat -n: number every line, counting from 1.
    for i, line in enumerate(content_list):
        content_list[i] = str(i + 1) + " " + line
    return content_list

def display_endline(content_list):
   return [line + "$" for line in content_list]

def show_tabs(content_list):
    # Like cat -T: display TAB characters as ^I.
    content_list = [ line.replace('\t','^I') for line in content_list]
    return content_list

content_list =transform_content()
try:
    opts, args = getopt.gnu_getopt(sys.argv[1:-1], 'AbeEnstTv', ['show-all', 'number-nonblank', 'show-ends', 'number', 'show-blank', 'squeeze-blank', 'show-tabs', 'show-nonprinting', 'help', 'version'])
except getopt.GetoptError:
     print("Something went wrong")
     sys.exit(2)
for opt, arg in opts:
    if opt in ('-A','--show-all'):
        content_list = display_endline(content_list)
        content_list = show_tabs(content_list)
    elif opt in ('-b', '--number-nonblank'):
       content_list = number_nonempty_lines(content_list)
    elif opt in ('-n', '--number'):
       content_list = number_all_lines(content_list)
    elif opt in ('-E', '--show-ends'):
        content_list = display_endline(content_list)
    elif opt in ('-s', '--squeeze-blank'):
        content_list = squeeze_blanks(content_list)
    elif opt in ('-T', '--show-tabs'):
        content_list = show_tabs(content_list)
print('\n'.join(content_list))

Further improvements will also be posted. I must confess that there are still a couple of things to be fixed, like not running the same option twice and making it work on very large files, but it will do in this form for now.
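
For the large-file issue, one possible direction is to process the file line by line with generators instead of reading it whole. The sketch below is not the version from this post; the function names are made up for illustration, and io.StringIO stands in for an opened file.

```python
import io


def read_lines(handle):
    """Yield lines one at a time, without the trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def iter_numbered(lines):
    """Like cat -n: prefix every line with its number, counting from 1."""
    for number, line in enumerate(lines, start=1):
        yield str(number) + " " + line


def iter_endline(lines):
    """Like cat -E: append $ to every line."""
    for line in lines:
        yield line + "$"


# io.StringIO stands in for open(FILENAME); only one line is held at a time.
handle = io.StringIO("first\n\nthird\n")
for line in iter_endline(iter_numbered(read_lines(handle))):
    print(line)
```

Each generator pulls one line from the previous one, so memory use does not grow with the file size.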

Cheers

Categories
cloud kafka puppet python

Automatic increase of Kafka LVM on GCP

I wrote an article for my company that was published on Medium regarding the topic in the subject. Please see the link

https://medium.com/metrosystemsro/new-ground-automatic-increase-of-kafka-lvm-on-gcp-311633b0816c

Thanks

Categories
cloud python

Using GCP recommender API for Compute engine

Let’s keep it short. If you want to use Python libraries for Recommender API, this is how you connect to your project.

from google.cloud.recommender_v1beta1 import RecommenderClient
from google.oauth2 import service_account
def main():
    credential = service_account.Credentials.from_service_account_file('account.json')
    project = "internal-project"
    location = "europe-west1-b"
    recommender = 'google.compute.instance.MachineTypeRecommender'
    client = RecommenderClient(credentials=credential)
    name = client.recommender_path(project, location, recommender)
   
    elements = client.list_recommendations(name,page_size=4)
    for i in elements:
        print(i)

main()

Note that credential = RecommenderClient.from_service_account_file('account.json') will not return any error; it will just hang.

That’s all folks!

Categories
cloud python

ELK query using Python with time range

Short post. Sharing how you make an ELK query from Python that also filters on a timestamp range:

from elasticsearch import Elasticsearch
# pandas >= 1.0; on older versions: from pandas.io.json import json_normalize
from pandas import json_normalize

es=Elasticsearch([{'host':'[elk_host]','port':elk_port}])

query_body_mem = {
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "metricset.module:system metricset.name:memory AND tags:test AND host.name:[hostname]"
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": "now-2d",
                            "lt": "now"
                        }
                    }
                }
            ]
        }
    }
}

res_mem=es.search(index="metricbeat-*", body=query_body_mem, size=500)
df_mem = json_normalize(res_mem['hits']['hits'])
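
If several queries share this shape, the body can be built by a small helper. This is a minimal sketch: the function name build_range_query and its default arguments are made up for illustration, and it only assembles the dictionary, without contacting Elasticsearch.

```python
def build_range_query(query_string, gte="now-2d", lt="now", field="@timestamp"):
    """Combine a query_string clause with a time range in a bool/must query."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": query_string}},
                    {"range": {field: {"gte": gte, "lt": lt}}},
                ]
            }
        }
    }


body = build_range_query("metricset.name:memory AND tags:test", gte="now-6h")
print(body)
```

Both clauses sit under must, so a document has to match the query string and fall inside the time window.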

And that’s all!

Cheers

Categories
cloud newtools python

Multiple field query in ELK from Python

Morning,

There are a lot of pages on how to query the ELK stack from the Python client library; however, it’s still hard to find a useful pattern.

What I wanted was to translate a simple Kibana query like redis.info.replication.role:master AND beat.hostname:*test AND tags:test into useful Query DSL JSON.

It’s worth mentioning that the Python library uses this DSL. Once you have this info, things get much simpler.

Well, if you search hard enough, you will find a solution, and it should look like this:

another_query_body = {
    "query": {
        "query_string" : {
            "query": "(master) AND (*test) AND (test)",
            "fields": ["redis.info.replication.role", "beat.hostname" , "tags"]
        }
    }
}

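One caveat: the entries are not matched to the fields by position. With a fields list, query_string searches each term in every listed field, so this query is broader than the Kibana one. To pin a term to a single field, write it as field:value inside the query string, exactly as in Kibana.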
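
A field-qualified version of the query can be generated from a plain mapping. This is a minimal sketch: the helper name kibana_style_query is made up for illustration, it only builds the body, and it does no escaping of special characters in the values.

```python
def kibana_style_query(conditions):
    """Build a query_string body where each value is tied to its own field."""
    parts = ["{}:{}".format(field, value) for field, value in conditions.items()]
    return {"query": {"query_string": {"query": " AND ".join(parts)}}}


field_query_body = kibana_style_query({
    "redis.info.replication.role": "master",
    "beat.hostname": "*test",
    "tags": "test",
})
print(field_query_body["query"]["query_string"]["query"])
```

The generated string is the same as the Kibana query, so the result can be compared directly with what Kibana shows.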

Cheers