cloud newtools puppet

Datadog and GCP are “friends” up to a point


I have been publishing more on Medium lately, so let me give you the link to the latest article.

It covers an interesting case in which the combination of automation, Google Cloud Platform and Datadog didn’t go as we expected.

Hope you enjoy! I will get back with more interesting topics on this blog as well.


cloud puppet

Overriding OS fact with external one


Short notice article. We had an issue in which the traefik module code was not running because of a wrong os fact. Although the image is Ubuntu 14.04, facter reports it like this:

  architecture => "amd64",
  family => "Debian",
  hardware => "x86_64",
  name => "Debian",
  release => {
    full => "jessie/sid",
    major => "jessie/sid"
  },
  selinux => {
    enabled => false
  }

I honestly don’t know why this happens, since it works fine on the rest of the machines. The fast way to fix it is to define an external fact in /etc/facter/facts.d

Create a file named os_fact.json, for example, that will contain this content:

         "description":"Ubuntu 14.04.6 LTS",

And it’s fixed.
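
A minimal sketch of how such an external fact file could be generated, written in Ruby. Only the description string comes from the post; the other keys mirror the facter output shown earlier and are assumptions about what the override needs, as are the helper names and the temporary directory used for the demo. As far as I understand, an external fact named os replaces the whole built-in structured fact rather than merging with it, so every key the manifests rely on has to be present.

```ruby
require 'json'
require 'tmpdir'

# Assumed override values. Only the description string is taken from the post;
# name, family and release are illustrative and mirror the facter output keys.
def os_override
  {
    'os' => {
      'name'    => 'Ubuntu',
      'family'  => 'Debian',
      'distro'  => { 'description' => 'Ubuntu 14.04.6 LTS' },
      'release' => { 'full' => '14.04', 'major' => '14.04' }
    }
  }
end

# Write the override as pretty-printed JSON into the given directory
# and return the path of the file that was written.
def write_external_fact(dir, filename = 'os_fact.json')
  path = File.join(dir, filename)
  File.write(path, JSON.pretty_generate(os_override) + "\n")
  path
end

# Demo against a temporary directory instead of /etc/facter/facts.d.
DEMO_DIR = Dir.mktmpdir
DEMO_PATH = write_external_fact(DEMO_DIR)
puts File.read(DEMO_PATH)
```

On a real node the target directory would be /etc/facter/facts.d, and running facter os afterwards should show whether the override is picked up.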



Duplicate exported resources on puppet by mistake

We had a strange problem in our test environment the other day. The nodes need to share an authorized key for SSH connectivity to work.

The way we shared the file resource was straightforward.

  @@file {"/home/kafka/.ssh/authorized_keys":
    ensure => present,
    mode => '0600',
    owner => 'kafka',
    group => 'kafka',
    content => "${::sharedkey}",
    tag => "${::tagvalue}",
  }

The tag value variable was a fact unique to each Kafka cluster.

However, each time we executed puppet, the following error was present:

08:38:20 Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: A duplicate resource was found while collecting exported resources, with the type and title File[/home/kafka/.ssh/authorized_keys] on node [node_name]

We had a couple of days at our disposal to play with PuppetDB, but nothing relevant came from it.

This behavior started after provisioning a second, similarly named cluster that also had SSL enabled.

After taking a look at the official Puppet documentation (check the caution clause), it was clear that exported resources must not share the same title.

The problem had not appeared on any of our clusters until now, so this was strange to say the least.

For whatever reason, the tag was not taken into consideration.

We know that because the resources shared by both nodes were put everywhere; there was no filtering.


The quick fix was done with the following modifications.

  @@file {"/home/kafka/.ssh/authorized_keys_${::clusterid}":
    path => "/home/kafka/.ssh/authorized_keys",
    ensure => present,
    mode => '0600',
    owner => 'kafka',
    group => 'kafka',
    content => "${::sharedkey}",
    tag => "${::clusterid}",
  }

So now there is an individual file per cluster, and we also have a tag that is recognized in order to filter the shared file that we need on our server.

Filtering will be done like File <<| tag == "${::clusterid}" |>>
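
To make the uniqueness rule easier to picture, here is a toy model in Ruby. It is not how PuppetDB implements collection; the struct, the error class and the node and cluster names are all made up for illustration. It only mirrors the rule from the error message: two collected resources of the same type must not share a title.

```ruby
# Toy model of exported resource collection. This is NOT PuppetDB's
# implementation; it only mirrors the uniqueness rule behind the error.
ExportedResource = Struct.new(:type, :title, :tag, :node)

class DuplicateResourceError < StandardError; end

# Collect resources, optionally filtered by tag, and fail when two collected
# resources share the same type and title.
def collect(resources, tag = nil)
  matched = tag.nil? ? resources : resources.select { |r| r.tag == tag }
  matched.group_by { |r| [r.type, r.title] }.each do |(type, title), group|
    next if group.size == 1
    nodes = group.map(&:node).join(', ')
    raise DuplicateResourceError, "#{type}[#{title}] exported by #{nodes}"
  end
  matched
end

KEYS = '/home/kafka/.ssh/authorized_keys'

# Before the fix: both clusters export the same title (hypothetical nodes).
BEFORE = [
  ExportedResource.new('File', KEYS, 'cluster-a', 'kafka-a-01'),
  ExportedResource.new('File', KEYS, 'cluster-b', 'kafka-b-01')
].freeze

# After the fix: the cluster id is part of the title.
AFTER = [
  ExportedResource.new('File', "#{KEYS}_cluster-a", 'cluster-a', 'kafka-a-01'),
  ExportedResource.new('File', "#{KEYS}_cluster-b", 'cluster-b', 'kafka-b-01')
].freeze

begin
  collect(BEFORE) # no tag filter, as in the failing runs
rescue DuplicateResourceError => e
  puts "duplicate: #{e.message}"
end
puts collect(AFTER, 'cluster-a').map(&:title).inspect
```

With unique titles, even an unfiltered collection no longer trips the duplicate check in this model, which matches why the quick fix helped regardless of the tag behaviour.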


cloud puppet python

Strange problem in puppet run for Ubuntu


A short write-up of a strange case.

We’ve written a small manifest in order to distribute some python scripts. You can find the reference here:

When you try to run it on Ubuntu 14.04, there is this very strange error:

Error: Failed to apply catalog: [nil, nil, nil, nil, nil, nil]

The cause for this is as follows:

Python 3.4.3 (default, Nov 12 2018, 22:25:49)
[GCC 4.8.4] on linux (and I believe this is the default max version on trusty)

In order to install the dependencies, you need python3-pip, so a short search returns the following options:

apt search python3-pip
Sorting... Done
Full Text Search... Done
python3-pip/trusty-updates,now 1.5.4-1ubuntu4 all [installed]
  alternative Python package installer - Python 3 version of the package

python3-pipeline/trusty 0.1.3-3 all
  iterator pipelines for Python 3

If we want to list all the installed modules with pip3 list, guess what, it’s not working:

Traceback (most recent call last):
   File "/usr/bin/pip3", line 5, in <module>
     from pkg_resources import load_entry_point
   File "/usr/local/lib/python3.4/dist-packages/pkg_resources/", line 93, in <module>
     raise RuntimeError("Python 3.5 or later is required")
 RuntimeError: Python 3.5 or later is required

So the main conclusion is that the problem is not related to Puppet: the pkg_resources package under /usr/local/lib/python3.4/dist-packages requires Python 3.5 or later, while this old distribution ships Python 3.4.3.


cloud kafka puppet python

Automatic increase of Kafka LVM on GCP

I wrote an article for my company that was published on Medium regarding the topic in the subject. Please see the link


cloud puppet

Install zookeeper using puppet without module


For this post, I was given the task of providing a standalone ZooKeeper cluster with basic auth on the latest version.

The reason is that we are using a very old module on our Kafka clusters, and a new requirement appeared to install the latest version, 3.5.5.

The old module could only install the package from the apt repo, which was not an option since the latest version available on Ubuntu Xenial is at least two years old.

To complete this task, a different method was required: grab the archive with wget and add the rest of the files to make it functional.

Let us start with the puppet manifest and from that, I will add the rest.

class zookeeperstd {
  $version = hiera("zookeeperstd::version","3.5.5")
  $authenabled = hiera("zookeeperstd::authenabled",false)
  $server_jvm_flags = hiera('zookeeperstd::jvm_flags', undef)
    group { 'zookeeper':
        ensure => 'present',
    }
    user {'zookeeper':
        ensure => 'present',
        home => '/var/lib/zookeeper',
        shell => '/bin/false',
    }
    wget::fetch { 'zookeeper':
        source      => "${version}-bin.tar.gz",
        destination => "/opt/apache-zookeeper-${version}-bin.tar.gz",
    } ->
    archive { "/opt/apache-zookeeper-${version}-bin.tar.gz":
        creates      => "/opt/apache-zookeeper-${version}-bin",
        ensure        => present,
        extract       => true,
        extract_path  => '/opt',
        cleanup       => true,
    } ->
    file { "/opt/apache-zookeeper-${version}-bin":
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse => true,
    } ->
    file { '/opt/zookeeper/':
        ensure    => link,
        target    => "/opt/apache-zookeeper-${version}-bin",
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
    }
    file { '/var/lib/zookeeper':
        ensure    => directory,
        owner     => 'zookeeper',
        group      => 'zookeeper',
        require     => [ User['zookeeper'], Group['zookeeper'], ],
        recurse    => true,
    }
# in order to know which servers are in the cluster a role fact needs to be defined on each machine
    $hostshash = query_nodes(" v1_role='zookeeperstd'").sort
    $hosts_hash = $ |$value| { [$value, seeded_rand(254, $value)+1] }.hash
    $overide_hosts_hash = hiera_hash('profiles_opqs::kafka_hosts_hash', $hosts_hash)
    $overide_hosts = $overide_hosts_hash.keys.sort
    if $overide_hosts_hash.size() != $overide_hosts_hash.values.unique.size() {
        #notify {"Duplicate IDs detected! ${overide_hosts_hash}": }
        $overide_hosts_hash2 = $ |$index, $value| { [$value, $index+1] }.hash
    } else {
        $overide_hosts_hash2 = $overide_hosts_hash
    }
    $hosts = $overide_hosts_hash2
    $data_dir = "/var/lib/zookeeper"
    $tick_time        = 2000
    $init_limit       = 10
    $sync_limit       = 5

    $myid = $hosts[$::fqdn]
    file { '/var/lib/zookeeper/myid':
        content => "${myid}",
    }

    file { '/opt/zookeeper/conf/zoo.cfg':
        content => template("${module_name}/zoo.cfg.erb"),
    }
    if $authenabled {
        $superpass        = hiera("zookeeperstd::super_pass", 'super-admin')
        $zoopass          = hiera("zookeeperstd::zookeeper_pass", 'zookeeper-admin')
        $clientpass        = hiera("zookeeperstd::client_pass", 'client-admin')
        file { '/opt/zookeeper/conf/zoo_jaas.config':
            content => template("${module_name}/zoo_jaas.config.erb"),
        }
    }
    file { '/opt/zookeeper/conf/java.env':
        content => template("${module_name}/java.zookeeper.env.erb"),
        mode => "0755",
    }
    file { '/opt/zookeeper/conf/':
        content => template("${module_name}/"),
    }
    file {'/etc/systemd/system/zookeeper.service':
        source  => 'puppet:///modules/work/zookeeper.service',
        mode => "0644",
    } ->
    service { 'zookeeper':
        ensure   => running,
        enable   => true,
        provider => systemd,
    }
}

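The myid assignment in the manifest can be sketched outside Puppet as follows. This is a Ruby approximation under stated assumptions: seeded_rand_sketch is a stand-in written for this example and is not guaranteed to return the same numbers as the stdlib seeded_rand function, and the host names are hypothetical. The point is the logic: every host gets a deterministic id between 1 and 254, and if two hosts collide, the sorted hosts are simply numbered 1, 2, 3 and so on.

```ruby
require 'digest/md5'

# Deterministic pseudo-random number in 0...max derived from a seed string.
# Stand-in for stdlib's seeded_rand; the exact values are illustrative only.
def seeded_rand_sketch(max, seed)
  Random.new(Digest::MD5.hexdigest(seed).hex).rand(max)
end

# Map each host to an id in 1..254. If two hosts land on the same id,
# fall back to numbering the sorted hosts 1, 2, 3, ...
# id_for is injectable so the fallback path can be exercised.
def assign_ids(hosts, id_for: ->(h) { seeded_rand_sketch(254, h) + 1 })
  sorted = hosts.sort
  hashed = sorted.map { |h| [h, id_for.call(h)] }.to_h
  return hashed if hashed.values.uniq.size == hashed.size

  sorted.each_with_index.map { |h, i| [h, i + 1] }.to_h
end

# Hypothetical host names.
puts assign_ids(%w[zk1.example zk2.example zk3.example]).inspect
```

Because the id depends only on the host name, a node keeps its myid across Puppet runs as long as no collision forces the fallback numbering.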
I managed to adapt some files from the existing module; here are the remaining pieces.

# Note: This file is managed by Puppet.


# specify all zookeeper servers
# The first port is used by followers to connect to the leader
# The second one is used for leader election
<% if @hosts
# sort hosts by myid and output a server config
# for each host and myid.  (sort_by returns an array of key,value tuples)
@hosts.sort_by { |name, id| id }.each do |host_id| -%>
server.<%= host_id[1] %>=<%= host_id[0] %>:2182:2183
<% if @authenabled -%>
authProvider.<%= host_id[1] %>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
<% end -%>
<% end -%>
<% end -%>

# the port at which the clients will connect

# the directory where the snapshot is stored.
dataDir=<%= @data_dir %>

# Place the dataLogDir to a separate physical disc for better performance
<%= @data_log_dir ? "dataLogDir=#{@data_log_dir}" : '# dataLogDir=/disk2/zookeeper' %>

# The number of milliseconds of each tick.
tickTime=<%= @tick_time %>

# The number of ticks that the initial
# synchronization phase can take.
initLimit=<%= @init_limit %>

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=<%= @sync_limit %>

# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.

# ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.

# If this option is defined, requests will be logged to a trace file named

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at the slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.

<% if @authenabled -%>


<% end -%> 
QuorumServer {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_zookeeper="<%= @zoopass %>";
};
QuorumLearner {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       password="<%= @zoopass %>";
};

Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="<%= @superpass %>"
       user_client="<%= @clientpass %>";
};

SERVER_JVMFLAGS="<%= @server_jvm_flags %>"
# Note: This file is managed by Puppet.

# ZooKeeper Logging Configuration

# Format is "<default threshold> (, <appender>)+

log4j.rootLogger=${zookeeper.root.logger}, ROLLINGFILE

# Log INFO level and above messages to the console
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

# Add ROLLINGFILE to rootLogger to get log file output
#    Log INFO level and above messages to a log file

# Max log file size of 10MB
# Keep only 10 files
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n

And last but not least, the systemd unit.

Description=ZooKeeper Service

ExecStart=/opt/zookeeper/bin/ start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/ stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/ restart /opt/zookeeper/conf/zoo.cfg

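To check what the server list loop in the zoo.cfg template renders to without a full Puppet run, it can be exercised with plain ERB. This is a minimal sketch: the template below is a reduced copy of the loop only, it uses a local hosts variable instead of Puppet's @hosts, and the host names and ids are hypothetical. The ports 2182 and 2183 are the ones from the template above.

```ruby
require 'erb'

# Reduced copy of the server loop; `hosts` is a local variable here,
# whereas the Puppet template reads @hosts.
SERVERS_TEMPLATE = <<~'TEMPLATE'
  <% hosts.sort_by { |_name, id| id }.each do |name, id| -%>
  server.<%= id %>=<%= name %>:2182:2183
  <% end -%>
TEMPLATE

# Render one server.N line per host, ordered by myid.
def render_servers(hosts)
  ERB.new(SERVERS_TEMPLATE, trim_mode: '-').result_with_hash(hosts: hosts)
end

# Hypothetical hosts with their myid values.
puts render_servers('zk-b.example' => 2, 'zk-a.example' => 1)
```

The trim mode matters: without it, the lines holding only the loop tags would leave blank lines in the rendered zoo.cfg.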

Also, if you want to enable simple MD5 authentication, in hiera you will need to add the following two lines.

zookeeperstd::authenabled: true
zookeeperstd::jvm_flags: ""

If there is a simpler approach, feel free to leave me a message on LinkedIn or Twitter.



Jolokia particular case using custom facts in Hiera


This is a note for me and for anyone else searching for how to use facts in Hiera.

In my case I wanted to activate the HTTP endpoint of Jolokia using a custom hostname and the standard port. For that, it was sufficient to add the following line to my host YAML:

profiles::kafka::jolokia: "-javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778,host=%{::networking.fqdn}"

This uses the standard fact called networking, which is a hash, and I am reading its fqdn key.

And it works.
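
For intuition, the dotted lookup can be mimicked in a few lines of Ruby. This is a toy model and not Hiera's real interpolation code: the interpolate helper and the sample fqdn are made up, and it only handles the simple case of walking a nested hash key by key.

```ruby
# Toy model of %{...} interpolation with dotted keys; not Hiera's real code.
# A leading :: is accepted and ignored, each dot descends one hash level.
def interpolate(value, scope)
  value.gsub(/%\{(?:::)?([^}]+)\}/) do
    keys = Regexp.last_match(1).split('.')
    scope.dig(*keys).to_s
  end
end

# Hypothetical fact values for one node.
FACTS = { 'networking' => { 'fqdn' => 'kafka01.example.internal' } }.freeze

puts interpolate('port=8778,host=%{::networking.fqdn}', FACTS)
```

The same dotted form works for any key of a structured fact, which is why no custom fact was needed here.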


kafka puppet

Fact for kafka_consumer hash…or kind of


There is a late requirement that we activate the kafka_consumer functionality of Datadog.

Unfortunately, this is a challenge if you don’t have a fixed number of consumer groups and topics (on one client we had a couple of hundred consumer groups).

Here is how it should look in the example file

  #  consumer_groups:
  #    <CONSUMER_NAME_1>:
  #      <TOPIC_NAME_1>: [0, 1, 4, 12]
  #    <CONSUMER_NAME_2>:
  #      <TOPIC_NAME_2>:
  #    <CONSUMER_NAME_3>

So, if you want to grab the data the old way, you will have to do it in one iteration. I tried it this way, and although it works, it should not be an option for larger clusters.

require 'facter'

Facter.add('kafka_consumers_config') do
  setcode do
    # List all consumer groups known to the local broker.
    kafka_consumers_cmd = '/opt/kafka/bin/ --bootstrap-server localhost:9092 --list'
    kafka_consumers_result = Facter::Core::Execution.exec(kafka_consumers_cmd)
    group_hash = {}
    kafka_consumers_result.each_line do |group|
      groupid = group.strip
      next if groupid.empty?
      # Unique topic names consumed by this group.
      kafka_consumer_topic_list_cmd = "/opt/kafka/bin/ --bootstrap-server localhost:9092 --group #{groupid} --describe | sort | grep -v TOPIC | awk '{print $1}' | uniq"
      kafka_consumer_topic_list_result = Facter::Core::Execution.exec(kafka_consumer_topic_list_cmd)
      topic_hash = {}
      kafka_consumer_topic_list_result.each_line do |topic|
        topicid = topic.strip
        next if topicid.empty?
        # Partitions of this topic, joined into one space-separated string.
        kafka_consumer_topic_partition_cmd = "/opt/kafka/bin/ --bootstrap-server localhost:9092 --group #{groupid} --describe | grep -v TOPIC | grep #{topicid} | awk '{print $2}' | sort"
        kafka_consumer_topic_partition_result = Facter::Core::Execution.exec(kafka_consumer_topic_partition_cmd)
        topic_hash[topicid] = kafka_consumer_topic_partition_result.gsub("\n", ' ').squeeze(' ').strip
      end
      group_hash[groupid] = topic_hash
    end
    # The value of the last expression becomes the fact value.
    group_hash
  end
end

This is posted only as a snapshot and maybe as a source of inspiration. An actual working version should be written in Java or Scala in order to leverage the power of the libraries (from what I know, Python, Go and other programming languages only have libraries for the consumer/producer part).

If I pursue the task of rewriting this using only one interrogation loop or in Java/Scala, you will see it.

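As a rough idea of what a single interrogation loop per group could look like, the sketch below parses the text of one describe call in one pass and builds the topic to partitions mapping from it. It assumes the same column layout the awk calls in the fact rely on (topic in the first column, partition in the second); the sample text and the parse_describe helper are made up for illustration, and newer Kafka versions may print additional leading columns.

```ruby
# Parse the text of one `--describe` call in a single pass.
# Assumed layout: topic in column 1, partition in column 2.
def parse_describe(output)
  topics = Hash.new { |hash, key| hash[key] = [] }
  output.each_line do |line|
    cols = line.split
    next if cols.size < 2 || cols[0] == 'TOPIC'   # skip blanks and the header
    next unless cols[1].match?(/\A\d+\z/)         # skip non-data lines
    topics[cols[0]] << cols[1].to_i
  end
  topics.transform_values { |partitions| partitions.uniq.sort }
end

# Hypothetical output for one consumer group.
SAMPLE_DESCRIBE = <<~OUT
  TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
  orders   1          10              12              2
  orders   0          5               5               0
  payments 0          7               9               2
OUT

puts parse_describe(SAMPLE_DESCRIBE).inspect
```

Used inside the fact, this would replace the inner per-topic command with one describe call per group, which is where most of the run time goes on clusters with many topics.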

Distributing service conditionally on OS version


Since we are in the process of migrating to 16.04, my service restart script needed to be deployed in a separate build per OS version.

For that purpose, I found a fact that helped, so my standard file block turned into this:

case $facts['os']['distro']['codename'] {
    'xenial': {
        file { '/root/servicerestart':
            source  => 'puppet:///modules/profiles/servicerestart-kafka-new',
            mode    => '0755',
            replace => true,
        }
    }
    'trusty': {
        file { '/root/servicerestart':
            source  => 'puppet:///modules/profiles/servicerestart-kafka',
            mode    => '0755',
            replace => true,
        }
    }
}

That should be all for now.

linux puppet

Order Linux processes by memory usage

This one is more for me, actually. We have some issues with one Puppet instance on which the processes fail, and I wanted to see if there is a way to order them by memory usage.

So I searched the net and found this link

The command is:

ps aux --sort -rss | head -10

It produces the following output, at least in my case:

puppet    6327 70.1 25.5 3585952 1034532 ?     Sl   06:53   7:33 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -XX:OnOutOfMemoryError=kill -9 %p -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778 -Xms1024m -Xmx1024m -cp /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar clojure.main -m puppetlabs.trapperkeeper.main --config /etc/puppetlabs/puppetserver/conf.d -b /etc/puppetlabs/puppetserver/bootstrap.cfg
jenkins   6776  9.6 16.6 4648236 671980 ?      Sl   06:55   0:51 /usr/bin/java -Djava.awt.headless=true -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8780 -Xms1024m -Xmx1024m -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080 --httpListenAddress=
puppetdb  5987 16.8 11.7 3845896 474164 ?      Sl   06:52   2:01 /usr/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx192m -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8779 -cp /opt/puppetlabs/server/apps/puppetdb/puppetdb.jar clojure.main -m puppetlabs.puppetdb.main --config /etc/puppetlabs/puppetdb/conf.d -b /etc/puppetlabs/puppetdb/bootstrap.cfg
postgres  1458  0.0  2.1 249512 88656 ?        Ss   Nov21   3:10 postgres: checkpointer process                                                                                              
postgres  6206  0.0  1.4 253448 57984 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6209  0.0  0.7 252580 29820 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6210  0.0  0.5 254892 22440 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6213  0.0  0.5 254320 21416 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                                                                           
postgres  6205  0.0  0.5 253524 20324 ?        Ss   06:53   0:00 postgres: puppetdb puppetdb idle                       

As you can probably see, the components are slowly but surely taking more and more memory, and since the machine has only 4GB allocated, it will probably crash again.

If this happens, I will manually increase the memory by another 2GB and see where we go from there.
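
For keeping an eye on this over time, the same ordering can be derived from captured ps aux text in a small script. The sketch below assumes the standard ps aux column order (RSS in the sixth column, reported in KiB, and the command starting at the eleventh); the top_by_rss helper is made up, and the sample rows reuse values from the output above with shortened commands.

```ruby
# Return the n largest processes by resident set size from `ps aux` text.
# Assumes the first line is the header and RSS is the sixth column (KiB).
def top_by_rss(ps_output, n = 10)
  rows = ps_output.lines.drop(1).map do |line|
    fields = line.split(' ', 11) # the 11th field keeps the full command
    next if fields.size < 11
    { user: fields[0], pid: fields[1].to_i, rss_kb: fields[5].to_i, command: fields[10].strip }
  end
  rows.compact.max_by(n) { |row| row[:rss_kb] }
end

# Sample rows based on the output above, with shortened commands.
SAMPLE_PS = <<~PS
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  postgres  1458  0.0  2.1 249512 88656 ?        Ss   Nov21   3:10 postgres: checkpointer process
  puppet    6327 70.1 25.5 3585952 1034532 ?     Sl   06:53   7:33 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -Xms1024m
  jenkins   6776  9.6 16.6 4648236 671980 ?      Sl   06:55   0:51 /usr/bin/java -jar /usr/share/jenkins/jenkins.war
PS

top_by_rss(SAMPLE_PS, 2).each do |row|
  puts format('%-8s %6d %8d KiB  %s', row[:user], row[:pid], row[:rss_kb], row[:command])
end
```

Feeding it the real output, for example from a cron job that stores ps aux snapshots, would show which of the three Java processes is the one that keeps growing.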