Month: May 2020

  • Prometheus metrics to Pandas data frame

    Hi,

    We are trying to implement a decision tree algorithm to see whether resource usage can classify our servers into different categories.

    The first step in that process is querying Prometheus from Python and creating data frames with some basic information so that they can be aggregated.

    For that purpose, you can use the following lines of code:

    import requests
    import pandas as pd

    # Instant query with a range selector: returns the last day of samples per series
    URL = "http://[node_hostname]:9090/api/v1/query?query=metric_to_be_queried[1d]"

    r = requests.get(url=URL)

    data = r.json()

    metric_list = []
    for i in data['data']['result']:
        for j in i['values']:
            # Copy the labels for every sample so that each row is an independent dict
            data_dict = dict(i['metric'])
            data_dict['time'] = j[0]
            data_dict['value'] = j[1]
            metric_list.append(data_dict)

    df_metric = pd.DataFrame(metric_list)
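
    To try the flattening step without a running Prometheus server, here is a minimal sketch that works on a hard-coded response. The function name flatten_range_result and the sample payload (metric name, instances, timestamps, values) are made up for illustration; only the data/result/metric/values layout is taken from the query above. It also converts the sample values, which arrive as strings, to floats.

```python
from typing import Any, Dict, List


def flatten_range_result(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row (labels + time + value) per sample in the response."""
    rows = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        for timestamp, value in series["values"]:
            # Build a new dict for every sample so rows never share state.
            rows.append({**labels, "time": float(timestamp), "value": float(value)})
    return rows


# Hypothetical payload shaped like the JSON returned for a query with a [1d] selector.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "host-a:9100"},
                "values": [[1588291200, "0.25"], [1588291260, "0.75"]],
            },
            {
                "metric": {"__name__": "node_load1", "instance": "host-b:9100"},
                "values": [[1588291200, "1.5"]],
            },
        ],
    },
}

rows = flatten_range_result(sample)
```

    The resulting list of rows can be passed to pd.DataFrame(rows) in the same way as metric_list.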

    Other pieces will follow.

    Cheers

  • Renice until cgroup implementation for process of Yahoo CMAK

    Hi,

    We saw that the former Kafka Manager, now called Yahoo CMAK, was using excessive CPU in some cases, generally related to a bad SSL client config.

    It’s not really clear whether the CPU usage was real or the process was only waiting on a resource such as memory or I/O (I don’t have an example to post right now). Either way, there are multiple fixes for this.

    The easiest one is to change the nice value of the process. What I observed is that it normally starts with a nice value of 0, which I guess is the default. You can check the current value with:

    ps ax -o ni,cmd | grep cmak | grep -v grep

    To change it, you can add a crontab line with the following command:

    pid=$(ps ax -o pid,cmd | grep cmak | grep -v grep | awk '{print $1}'); ni=$(ps ax -o ni,cmd | grep cmak | grep -v grep | awk '{print $1}'); if [ "$ni" = "0" ]; then renice 10 $pid; fi

    Or, even easier, add a Nice= setting (for example Nice=10) under [Service] in /etc/systemd/system/multi-user.target.wants/kafka-manager.service, then reload systemd and restart the service.

    It does the trick until further cgroup policies are applied.
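
    The crontab check can also be sketched in Python, which makes it easier to handle more than one matching process. This is only a sketch under assumptions: the names parse_ps and renice_default and the sample ps output are invented for illustration, and renice_default needs enough privileges (for example root's crontab) to change another user's process.

```python
import os
import subprocess
from typing import List, Tuple


def parse_ps(output: str, needle: str) -> List[Tuple[int, int]]:
    """Return (pid, nice) pairs from `ps ax -o pid,ni,cmd` output whose command contains needle."""
    matches = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or needle not in parts[2]:
            continue
        try:
            matches.append((int(parts[0]), int(parts[1])))
        except ValueError:
            # Skip the header line or entries where ps prints "-" for the nice value.
            continue
    return matches


def renice_default(needle: str = "cmak", target: int = 10) -> None:
    """Raise the nice value of matching processes that still run at nice 0."""
    out = subprocess.run(
        ["ps", "ax", "-o", "pid,ni,cmd"], capture_output=True, text=True, check=True
    ).stdout
    for pid, nice in parse_ps(out, needle):
        # Exclude this script itself in case its own command line contains the needle.
        if nice == 0 and pid != os.getpid():
            os.setpriority(os.PRIO_PROCESS, pid, target)


# Hypothetical ps output used only to demonstrate the parsing step.
sample = """\
  PID  NI CMD
  812   0 /usr/bin/java -cp /opt/cmak/lib/* play.core.server.ProdServerStart
  990   5 /usr/sbin/sshd -D
 1201  10 /opt/cmak/bin/cmak-helper
"""

found = parse_ps(sample, "cmak")
to_renice = [pid for pid, nice in found if nice == 0]
```

    Calling renice_default() from a scheduled script would then play the same role as the one-liner: only processes still at nice 0 are touched.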