• Recover swap file in vim

    Hi,

    This is a problem I ran into after my virtual machine was not stopped properly and my SSH connection ended prematurely.

    https://superuser.com/questions/204209/how-can-i-recover-the-original-file-from-a-swp-file/205131

    If you have a file.swp and you want to recover it, do as they say: open the original file in Vim and type :recover (starting Vim with vim -r [file] has the same effect).

    Cheers

  • Exclusive SASL on Zookeeper connections

    This is related to the following article. It seems that, up to version 3.6.1, ZooKeeper will still allow anonymous connections and actions even if SASL is configured.

    There is now a new configuration option that restricts such events, and you can find it documented in the official Apache ZooKeeper administration guide (zookeeper.sessionRequireClientSASLAuth).

    The main catch is that it's not supposed to be configured in the zoo.cfg file, but added as a parameter in java.env as part of the SERVER_JVMFLAGS variable.

    The old variable which was

    zookeeperstd::jvm_flags: "-Djava.security.auth.login.config=/opt/zookeeper/conf/zoo_jaas.config"

    will become

    zookeeperstd::jvm_flags: "-Djava.security.auth.login.config=/opt/zookeeper/conf/zoo_jaas.config -Dzookeeper.allowSaslFailedClients=false -Dzookeeper.sessionRequireClientSASLAuth=true"

    Once this is in place, zkCli.sh will still let you connect, but listing the root node of the resource tree fails.

    Example:

    Connecting to localhost:2181
    Welcome to ZooKeeper!
    JLine support is enabled
    
    WATCHER::
    
    WatchedEvent state:SyncConnected type:None path:null
    [zk: localhost:2181(CONNECTED) 0] ls /
    KeeperErrorCode = Session closed because client failed to authenticate for /
    [zk: localhost:2181(CONNECTED) 1] 
    

    The same thing happens if you use zkCli.sh -server [hostname]:2181

    In order to connect you will have to add to java.env a line with:

    CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/opt/zookeeper/conf/client_jaas.config"

    The client JAAS file has the following structure:

    Client {
           org.apache.zookeeper.server.auth.DigestLoginModule required
           username="[client_username]"
           password="[client_password]";
    };

    Cheers

  • Unique value on columns – pandas

    Hi,

    Today's post is a short example for column names that consist of several words separated by spaces.

    For example, I have a dataframe that has the following columns:

    I have read in some sources that you can use the construction wine_new.[column name].unique() to get the distinct values.

    If the column name is a single word, it will work, but if it consists of multiple words, you cannot use a construct like wine_new.'Page ID'.unique() because it will raise a syntax error.

    Good, so you try to rename it. Why Page ID and not pageid? OK, that should be easy:

    wine_new = wine_new.rename(columns={"Page ID": "pageid"}, errors="raise")

    And it now looks “better”.

    But if you need to keep the column name, you can just as easily use wine_new['Page ID'].unique(). (If you want to count the number of unique values, you can also use wine_new['Page ID'].nunique().)

    There are multiple resources on this topic, but most of them do not explain both approaches.
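    To make the difference between the two access styles concrete, here is a minimal sketch. The data is made up for illustration; only the "Page ID" column name comes from the example above, and the "country" column is an assumption added to show the one-word case.

```python
import pandas as pd

# Hypothetical sample data; only the "Page ID" column name comes from the post.
wine_new = pd.DataFrame({
    "Page ID": [101, 102, 101, 103, 102],
    "country": ["IT", "FR", "IT", "ES", "FR"],
})

# Attribute access only works when the column name is a valid Python identifier.
countries = wine_new.country.unique()

# Bracket access works for any column name, including names with spaces.
page_ids = wine_new["Page ID"].unique()
page_id_count = wine_new["Page ID"].nunique()

print(countries)
print(page_ids)
print(page_id_count)
```

    Bracket access is the safer habit in general, since it also avoids clashes between column names and existing DataFrame attributes.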

    Cheers

  • Prometheus metrics to Pandas data frame

    Hi,

    We are trying to implement a decision tree algorithm in order to see if our resource usage can classify our servers in different categories.

    The first step in that process is querying Prometheus from Python and creating some data frames with basic information so that it can be aggregated.

    For that purpose, you can use the following lines of code:

    import requests
    import pandas as pd

    # The [1d] range selector returns one series per label set, each with many samples
    URL = "http://[node_hostname]:9090/api/v1/query?query=metric_to_be_queried[1d]"

    r = requests.get(url=URL)
    data = r.json()

    metric_list = []
    for series in data['data']['result']:
        for timestamp, value in series['values']:
            # Build a fresh dict per sample; appending one shared dict
            # would make every row of a series show its last sample
            row = dict(series['metric'])
            row['time'] = timestamp
            row['value'] = value
            metric_list.append(row)

    df_metric = pd.DataFrame(metric_list)
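
    If you want to try the flattening step without a running Prometheus, the sketch below does the same thing on a hand-written response. The payload, metric name and instance labels are invented for illustration; the helper name flatten_matrix is also an assumption, not part of any library. It additionally converts the sample values to floats, since Prometheus returns them as strings.

```python
def flatten_matrix(payload):
    """Turn a Prometheus matrix result into one flat dict per sample."""
    rows = []
    for series in payload["data"]["result"]:
        for timestamp, value in series["values"]:
            row = dict(series["metric"])   # fresh copy of the labels for each sample
            row["time"] = float(timestamp)
            row["value"] = float(value)    # sample values arrive as strings
            rows.append(row)
    return rows


# Hypothetical response body, shaped like the output of a range-selector query.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "node_load1", "instance": "host1:9100"},
             "values": [[1600000000, "0.5"], [1600000060, "0.7"]]},
            {"metric": {"__name__": "node_load1", "instance": "host2:9100"},
             "values": [[1600000000, "1.2"]]},
        ],
    },
}

rows = flatten_matrix(sample)
for row in rows:
    print(row)
```

    The resulting list can be passed straight to pd.DataFrame(rows), which gives one row per sample with the labels as columns.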

    Other pieces will follow.

    Cheers

  • Renice until cgroup implementation for process of Yahoo CMAK

    Hi,

    We saw that the former Kafka Manager, now called Yahoo CMAK, was using an excessive amount of CPU in some cases, generally related to a bad SSL client config.

    It’s not really clear if the CPU usage was real or there was only wait time for resource like memory or I/O (I don’t have an example to post right now, but there are multiple fixes for this).

    The easiest one is to change the nice value of the process. What I observed is that it normally starts with a nice value of 0, which I guess is the default. A general check for this is:

    ps ax -o ni,cmd | grep cmak | grep -v grep

    In order to change this, you can add a crontab line with following command:

    pid=`ps ax -o pid,cmd | grep cmak | grep -v grep | awk '{print $1}'`; ni=`ps ax -o ni,cmd | grep cmak | grep -v grep | awk '{print $1}'`; if [ "$ni" = "0" ]; then renice 10 $pid; fi

    Or, even easier than that, add a Nice= value (for example Nice=10) under [Service] in /etc/systemd/system/multi-user.target.wants/kafka-manager.service

    It does the trick until further cgroup policies are applied.
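
    For anyone who prefers a script over the one-liner, the same check can be sketched in Python. This is only an illustration under a few assumptions: the sample ps output and command lines are invented, the helper names are mine, and os.setpriority is available on Unix only (changing another user's process needs root).

```python
import os


def pids_to_renice(ps_output, needle="cmak"):
    """Return PIDs whose command line contains `needle` and whose nice value is 0."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)        # pid, nice value, rest of the command line
        if len(parts) < 3:
            continue
        pid, nice, cmd = parts
        if not pid.isdigit():
            continue                       # skips the header line
        if needle in cmd and nice == "0":
            pids.append(int(pid))
    return pids


def renice(pids, value=10):
    """Set the nice value of each PID (Unix only)."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, value)


# Hypothetical output of: ps ax -o pid,ni,cmd
sample = """\
  PID  NI CMD
 1234   0 java -Dapplication.home=/opt/cmak -cp /opt/cmak/lib/* play.core.server.ProdServerStart
 2345  10 java -cp /opt/cmak/lib/* play.core.server.ProdServerStart
 3456   0 /usr/sbin/sshd -D
"""

print(pids_to_renice(sample))
```

    On a real host the input would come from subprocess.run(["ps", "ax", "-o", "pid,ni,cmd"], capture_output=True, text=True).stdout, and the returned list would be passed to renice().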

  • Datadog and GCP are “friends” up to a point

    Hi,

    Since I have lately preferred to publish more on Medium, let me give you the link to the latest article.

    There is an interesting case in which the combination of automation, Google Cloud Platform and Datadog didn't go as we expected.

    https://medium.com/metrosystemsro/puppet-datadog-google-cloud-platform-recipe-for-a-small-outage-310166e551f1

    Hope you enjoy! I will also get back with more interesting topics on this blog.

    Cheers

  • Overriding OS fact with external one

    Hi,

    Short notice article. We had an issue in which the traefik module code was not running because of a wrong os fact. Although the image is Ubuntu 14.04, facter returns it like this:

    {
      architecture => "amd64",
      family => "Debian",
      hardware => "x86_64",
      name => "Debian",
      release => {
        full => "jessie/sid",
        major => "jessie/sid"
      },
      selinux => {
        enabled => false
      }
    }

    I honestly don't know why this happens, since it works fine on the rest of the machines. The fast way to fix it is to define an external fact in /etc/facter/facts.d

    Create a file named os_fact.json, for example, that will contain this content:

    { 
       "os":{ 
          "architecture":"amd64",
          "distro":{ 
             "codename":"trusty",
             "description":"Ubuntu 14.04.6 LTS",
             "id":"Ubuntu",
             "release":{ 
                "full":"14.04",
                "major":"14.04"
             }
          },
          "family":"Debian",
          "hardware":"x86_64",
          "name":"Ubuntu",
          "release":{ 
             "full":"14.04",
             "major":"14.04"
          },
          "selinux":{ 
             "enabled":false
          }
       }
    }
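
    If you would rather generate the file than type it by hand, a small sketch like the one below writes the same structure and reads it back to confirm it is valid JSON. The function name and the temporary directory are assumptions for the demo; on a real node the target directory would be /etc/facter/facts.d.

```python
import json
import os
import tempfile


def write_os_fact(facts_dir, filename="os_fact.json"):
    """Write the overriding os fact as JSON and return the file path."""
    os_fact = {
        "os": {
            "architecture": "amd64",
            "distro": {
                "codename": "trusty",
                "description": "Ubuntu 14.04.6 LTS",
                "id": "Ubuntu",
                "release": {"full": "14.04", "major": "14.04"},
            },
            "family": "Debian",
            "hardware": "x86_64",
            "name": "Ubuntu",
            "release": {"full": "14.04", "major": "14.04"},
            # Boolean, matching the type in the facter output shown earlier.
            "selinux": {"enabled": False},
        }
    }
    path = os.path.join(facts_dir, filename)
    with open(path, "w") as fh:
        json.dump(os_fact, fh, indent=2)
    return path


# Demo in a temporary directory so nothing on the system is touched.
demo_dir = tempfile.mkdtemp()
fact_path = write_os_fact(demo_dir)

with open(fact_path) as fh:
    loaded = json.load(fh)

print(loaded["os"]["name"], loaded["os"]["release"]["full"])
```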
    

    And it’s fixed.

    Cheers

  • Starting AIOps journey – first step

    There is a learning program in our company focused on gaining knowledge for the “AI era”.

    To that purpose we played a little bit with some performance data and came to some conclusions.

    I invite you to take a look

  • Duplicate exported resources on puppet by mistake

    We had a strange problem in our test environment the other day. There is a need to share an authorized key in order for the ssh connectivity to be available.

    The way we shared the file resource was straightforward.

      @@file {"/home/kafka/.ssh/authorized_keys":
        ensure => present,
        mode => '0600',
        owner => 'kafka',
        group => 'kafka',
        content => "${::sharedkey}",
        tag => "${::tagvalue}",
      }

    The tag value variable was a fact unique to each Kafka cluster.

    However, each time we executed puppet, the following error was present:

    08:38:20 Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: A duplicate resource was found while collecting exported resources, with the type and title File[/home/kafka/.ssh/authorized_keys] on node [node_name]

    We had a couple of days at our disposal to play with PuppetDB, but nothing relevant came from it.

    This behavior started after provisioning a second, similarly named cluster, also with SSL enabled.

    After taking a look at the official Puppet documentation (https://puppet.com/docs/puppet/latest/lang_exported.html – check the caution clause), it was clear that two exported resources should not share the same title.

    The problem had not appeared on any of our clusters until now, so this was strange to say the least.

    For whatever reason, the tag was not taken into consideration.

    We know that because the resources exported by both clusters ended up on every node; there was no filtering.

    Solution:

    A quick fix was done with the following modifications:

      @@file {"/home/kafka/.ssh/authorized_keys_${::clusterid}":
        path => "/home/kafka/.ssh/authorized_keys",
        ensure => present,
        mode => '0600',
        owner => 'kafka',
        group => 'kafka',
        content => "${::sharedkey}",
        tag => "${::clusterid}",
      }

    So now there is an individually named file resource per cluster (each still managing the same path), and we also have a tag that is recognized in order to filter the shared file that we need on our server.

    Filtering will be done like File <<| tag == "${::clusterid}" |>>
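
    To illustrate why the unique title matters, here is a toy model in Python. It is not how Puppet is implemented, just a sketch of the rule that collected resources must be unique by type and title; the cluster names and the collect helper are invented for the example.

```python
def collect(exported, tag=None):
    """Toy model: gather exported resources, optionally filtered by tag,
    and refuse two resources with the same (type, title)."""
    catalog = {}
    for res in exported:
        if tag is not None and res["tag"] != tag:
            continue
        key = (res["type"], res["title"])
        if key in catalog:
            raise ValueError(f"duplicate resource {res['type']}[{res['title']}]")
        catalog[key] = res
    return catalog


path = "/home/kafka/.ssh/authorized_keys"

# Before the fix: both clusters export a resource with the same title.
before = [
    {"type": "File", "title": path, "path": path, "tag": "cluster_a"},
    {"type": "File", "title": path, "path": path, "tag": "cluster_b"},
]

# After the fix: the title is unique per cluster, the managed path is unchanged.
after = [
    {"type": "File", "title": f"{path}_{cid}", "path": path, "tag": cid}
    for cid in ("cluster_a", "cluster_b")
]

try:
    collect(before)          # no effective tag filter, as we observed
    collision = False
except ValueError as err:
    collision = True
    print(err)

mine = collect(after, tag="cluster_a")
print(sorted(title for _, title in mine))
```

    In this model the "before" set collides as soon as the tag filter is not applied, while the "after" set never can, because the titles differ even if filtering fails.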

    Cheers!