• Kernel not compatible with zookeeper version

    Morning,

    It’s important to share this situation with you. This morning I came to the office to find that a cluster that had been upgraded/restarted had an issue with its Zookeeper instances.

    The symptoms were clear: the instances wouldn’t start completely. But why?

    After a little bit of investigation, I went to /var/log/syslog (/var/log/zookeeper did not contain any information at all) and saw that the JVM was hitting a bad page table.

    The Java version is:

    java version "1.8.0_111"
    Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
    

    The log showed the following lines:

    Aug 16 07:16:04 kafka0 kernel: [  742.349010] init: zookeeper main process ended, respawning
    Aug 16 07:16:04 kafka0 kernel: [  742.925427] java: Corrupted page table at address 7f6a81e5d100
    Aug 16 07:16:05 kafka0 kernel: [  742.926589] PGD 80000000373f4067 PUD b7852067 PMD b1c08067 PTE 80003ffffe17c225
    Aug 16 07:16:05 kafka0 kernel: [  742.928011] Bad pagetable: 000d [#1643] SMP 
    Aug 16 07:16:05 kafka0 kernel: [  742.928011] Modules linked in: dm_crypt serio_raw isofs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
    

    Why would the JVM throw a memory error? The main suspect is an incompatibility with the kernel version.

    Let’s take a look at the GRUB config file.

    It looks like we are booting with:

    menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-baf292e5-0bb6-4e58-8a71-5b912e0f09b6' {
    	recordfail
    	load_video
    	gfxmode $linux_gfx_mode
    	insmod gzio
    	insmod part_msdos
    	insmod ext2
    	if [ x$feature_platform_search_hint = xy ]; then
    	  search --no-floppy --fs-uuid --set=root  baf292e5-0bb6-4e58-8a71-5b912e0f09b6
    	else
    	  search --no-floppy --fs-uuid --set=root baf292e5-0bb6-4e58-8a71-5b912e0f09b6
    	fi
    	linux	/boot/vmlinuz-3.13.0-155-generic root=UUID=baf292e5-0bb6-4e58-8a71-5b912e0f09b6 ro  console=tty1 console=ttyS0
    	initrd	/boot/initrd.img-3.13.0-155-generic
    

    There was also an older kernel image available, 3.13.0-153.

    The short fix for this is to point the grub.cfg file at the old kernel version and reboot the server, as sketched below.
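
    A minimal sketch of that rollback, assuming a stock Ubuntu GRUB2 setup (the exact submenu title may differ on your machines):

    # /etc/default/grub - pin the previous kernel instead of the newest one
    GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.13.0-153-generic"

    # then regenerate grub.cfg and reboot
    sudo update-grub
    sudo reboot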

    A good fix is still in progress. I will post it as soon as I have it.

    P.S.: I forgot to mention the Zookeeper version:

    Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT

    P.S. 2: It seems that the issue is related to Java processes in general, not only Zookeeper.

    Cheers

  • Puppet gems install workaround after TLS 1.0 switchoff

    Hi,

    It seems that since rubygems.org dropped support for the TLS 1.0 protocol, there is an issue with installing custom gems on the Puppet server: the Ruby bundled with it is too old to negotiate a newer TLS version.

    If you run puppetserver gem environment, you will probably see the following output:

    /opt/puppetlabs/bin/puppetserver gem environment
    RubyGems Environment:
      - RUBYGEMS VERSION: 2.4.8
      - RUBY VERSION: 1.9.3 (2015-06-10 patchlevel 551) [java]
      - INSTALLATION DIRECTORY: /opt/puppetlabs/server/data/puppetserver/jruby-gems
      - RUBY EXECUTABLE: java -jar /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar
      - EXECUTABLE DIRECTORY: /opt/puppetlabs/server/data/puppetserver/jruby-gems/bin
      - SPEC CACHE DIRECTORY: /root/.gem/specs
      - SYSTEM CONFIGURATION DIRECTORY: file:/opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/etc
      - RUBYGEMS PLATFORMS:
        - ruby
        - universal-java-1.7
      - GEM PATHS:
         - /opt/puppetlabs/server/data/puppetserver/jruby-gems
         - /root/.gem/jruby/1.9
         - file:/opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/gems/shared
      - GEM CONFIGURATION:
         - :update_sources => true
         - :verbose => true
         - :backtrace => false
         - :bulk_threshold => 1000
         - "install" => "--no-rdoc --no-ri --env-shebang"
         - "update" => "--no-rdoc --no-ri --env-shebang"
      - REMOTE SOURCES:
         - https://rubygems.org/
      - SHELL PATH:
         - /usr/local/sbin
         - /usr/local/bin
         - /usr/sbin
         - /usr/bin
         - /sbin
         - /bin
         - /usr/games
         - /usr/local/games
         - /opt/puppetlabs/bin
    

    Also, if you try to install a gem, you will receive:

    /opt/puppetlabs/bin/puppetserver gem install toml-rb
    ERROR:  Could not find a valid gem 'toml-rb' (>= 0), here is why:
              Unable to download data from https://rubygems.org/ - Received fatal alert: protocol_version (https://api.rubygems.org/specs.4.8.gz)
    

    A short but unsafe fix for this (it falls back to plain HTTP) is:

    /opt/puppetlabs/bin/puppetserver gem install --source "http://rubygems.org/" toml-rb
    Fetching: toml-rb-1.1.1.gem (100%)
    Successfully installed toml-rb-1.1.1
    WARNING:  Unable to pull data from 'https://rubygems.org/': Received fatal alert: protocol_version (https://api.rubygems.org/specs.4.8.gz)
    1 gem installed
    

    It’s not that elegant, but it does the trick. You can also wrap this in a Puppet exec block, for example:
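
    Here is a minimal sketch of such a block (the resource title and the unless guard are my own additions, the latter to keep the run idempotent):

    exec { 'install toml-rb gem':
      command => '/opt/puppetlabs/bin/puppetserver gem install --source "http://rubygems.org/" toml-rb',
      unless  => '/opt/puppetlabs/bin/puppetserver gem list | grep -q toml-rb',
      path    => ['/usr/bin', '/bin'],
    }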

    Cheers

  • Convert mdx image (Daemon Tools) to iso

    Hi,

    Some time ago I created some images that were saved in mdx format using Daemon Tools. Those were my Windows times, but I have since migrated to Debian.

    I created a Windows VirtualBox machine in order to use them, but unfortunately it does not allow mounting them in this format.

    In order to convert them, you will have to install a small package called iat with the command sudo apt-get install iat. It is found in the default repo.

    Once it’s installed, just run iat old_image.mdx old_image.iso.

    Cheers

  • Kafka cluster nodes and controller using golang

    Hi,

    Using the golang library for zookeeper from here you can very easily get the nodes that are registered in the cluster, as well as the controller node.
    In order to install this module, besides setting up the GOPATH you will also have to install a few packages from the Linux distro repo:
    bzr, gcc, libzookeeper-mt-dev
    Once all of this is installed, just go get launchpad.net/gozk 🙂
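
    Spelled out, assuming a Debian-based distro, the setup is just:

    sudo apt-get install bzr gcc libzookeeper-mt-dev
    go get launchpad.net/gozk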

    And here is the small example:

    
    package main
    
    import (
    	"fmt"
    	"strings"
    
    	"launchpad.net/gozk"
    )
    
    func main() {
    	GetResource()
    }
    
    // GetResource connects to the local zookeeper and prints the broker
    // list and the current cluster controller.
    func GetResource() {
    	// 5e9 ns = 5 second session timeout
    	zk, session, err := zookeeper.Dial("localhost:2181", 5e9)
    	if err != nil {
    		fmt.Println("Couldn't connect")
    		return
    	}
    	defer zk.Close()
    
    	// wait for the session event and make sure we are actually connected
    	event := <-session
    	if event.State != zookeeper.STATE_CONNECTED {
    		fmt.Println("Couldn't connect")
    		return
    	}
    
    	GetBrokers(zk)
    	GetController(zk)
    }
    
    // GetController reads the /controller znode and extracts the broker id
    // from its JSON payload, e.g. {"version":1,"brokerid":1011,...}.
    func GetController(connection *zookeeper.Conn) {
    	rs, _, err := connection.Get("/controller")
    	if err != nil {
    		fmt.Println("Couldn't get the resource")
    		return
    	}
    	controller := strings.Split(rs, ",")
    	id := strings.Split(controller[1], ":")
    	fmt.Printf("\nCluster controller is: %s\n", id[1])
    }
    
    // GetBrokers lists the children of /brokers/ids, one per registered broker.
    func GetBrokers(connection *zookeeper.Conn) {
    	trs, _, err := connection.Children("/brokers/ids")
    	if err != nil {
    		fmt.Println("Couldn't get the resource")
    		return
    	}
    	fmt.Printf("List of brokers: ")
    	for _, value := range trs {
    		fmt.Printf(" %s", value)
    	}
    }
    

    Yeah, I know it’s not the most elegant, but it works:

    go run zootest.go
    List of brokers:  1011 1009 1001
    Cluster controller is: 1011
    

    That would be all.

    Tnx and cheers!

  • Error 127, not related to Puppet or Golang

    Hi,

    Something from my experience playing with Golang and Puppet code this morning.
    I wrote a very, very simple script to restart a service, which you can find here.
    Today I wanted to put it on the machine and run it with Puppet, so I wrote a very small class that looks like this:

    class profiles_test::updatekafka {
    
      package { 'golang':
        ensure => installed,
        name   => 'golang',
      }
    
      file { '/root/servicerestart.go':
        source  => 'puppet:///modules/profiles_test/servicerestart.go',
        mode    => '0644',
        replace => true,
      }
    
      exec { 'go build servicerestart.go':
        cwd     => '/root',
        creates => '/root/servicerestart',
        path    => ['/usr/bin', '/usr/sbin'],
      } ->
    
      exec { '/root/servicerestart':
        cwd    => '/root',
        path   => ['/usr/bin', '/usr/sbin', '/root'],
        onlyif => 'ps -u kafka',
      }
    }
    

    On execution, surprise: it kept throwing this error:

    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/File[/root/check_cluster_state.go]/ensure: defined content as '{md5}0817dbf82b74072e125f8b71ee914150'
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: panic: exit status 127
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: goroutine 1 [running]:
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: runtime.panic(0x4c2100, 0xc2100000b8)
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.check(0x7f45b08ed150, 0xc2100000b8)
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:94 +0x4f
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.RunCommand(0x4ea7f0, 0x2c, 0x7f4500000000, 0x409928, 0x5d9f38)
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:87 +0x112
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.GetBrokerList(0x7f45b08ecf00, 0xc21003fa00, 0xe)
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:66 +0x3c
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.GetStatus(0xc21000a170)
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:33 +0x1e
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.StopBroker()
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:39 +0x21
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.main()
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:15 +0x1e
    08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: exit status 2
    08:19:44 Error: go run servicerestart.go returned 1 instead of one of [0]

    The secret is in panic: exit status 127, and it turns out it’s related neither to golang nor to puppet, but to the shell.
    In my script, the way to get the broker list is:

    output1 := RunCommand("echo dump | nc localhost 2181 | grep brokers")

    and the error comes from not having the required binaries in your PATH. For example, if you run:

    whereis echo
    echo: /bin/echo /usr/share/man/man1/echo.1.gz

    Since /bin (where echo lives) is missing from the path list, the right way to write the exec block is actually:

    exec { '/root/servicerestart':
      cwd     => '/root',
      path    => ['/usr/bin', '/usr/sbin','/root','/bin','/sbin'],
      onlyif => 'ps -u kafka',
    }

    And then it will work.
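
    Alternatively (just a sketch of another option, not what I ended up using), the Go program could extend its own PATH before spawning the shell, so it stops depending on what the Exec resource passes in:

    package main
    
    import (
    	"fmt"
    	"os"
    	"os/exec"
    )
    
    func main() {
    	// make sure the child /bin/sh can find echo, nc and friends even
    	// when this process was started with a restricted PATH
    	os.Setenv("PATH", os.Getenv("PATH")+":/bin:/sbin")
    	out, err := exec.Command("/bin/sh", "-c", "echo dump | nc localhost 2181 | grep brokers").Output()
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(string(out))
    }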

    Cheers!

  • Multiple classes block declaration in hiera will not work

    Morning,

    Do not declare the classes key multiple times in hiera like this:

    ---
    classes:
      - profiles::datadogagent
      - profiles::updatekafka
    
    kafka::security: true
    kafka::security_default: true
    kafka::heap_size: 2048
    classes:
     - profiles::pybackuplogs
     - profiles::group_coordinator
    

    The second classes key overrides the first one (YAML parsers keep only the last occurrence of a duplicate key), so the classes from the first block, such as updatekafka, will not be executed.

    The structure should instead look like this:

    ---
    classes:
      - profiles::datadogagent
      - profiles::updatekafka
      - profiles::pybackuplogs
      - profiles::group_coordinator
    kafka::security: true
    kafka::security_default: true
    kafka::heap_size: 2048
    

    Cheers!

  • Delete corrupted Kafka topic version 2.0

    Hi,

    We had the situation described in this link in the past: use-case-deleting-corrupted-kafka-topic
    This time the situation repeated itself a little bit differently. Taking a look at the list of topics, there were three topics marked for deletion.
    None of them had a leader or ISR, so after a little bit of investigation the conclusion was that they weren’t available on the filesystem anymore.
    My assumption is that the cluster controller began the delete process but failed before sending a new metadata update to the zookeepers.
    A restart of the cluster controller was performed in order to provide a new epoch, and after that the “deletion request” and the metadata were deleted manually from the zookeepers (you can find the commands in the above link; a rough sketch follows below).
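
    For reference, a sketch of that cleanup, assuming a topic named mytopic and the usual install path (the authoritative commands are in the linked post):

    # on one of the zookeeper nodes, open a shell against the ensemble
    /opt/kafka/bin/zookeeper-shell.sh localhost:2181
    
    # inside the shell: drop the pending deletion request...
    rmr /admin/delete_topics/mytopic
    # ...and the topic metadata itself
    rmr /brokers/topics/mytopic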

    Listing the topics again, everything looks good.

    Cheers

  • Golang example for kafka service restart script

    Hi,

    Not much to say: a pretty decent script for a Kafka service restart (I tried to write it for our rolling upgrade procedure) that is still work in progress. If any changes need to be made to it, I will post them.

    Here is the script:

    package main
    
    import (
    	"fmt"
    	"io/ioutil"
    	"os/exec"
    	"strings"
    	"time"
    )
    
    const (
    	libpath = "/opt/kafka/libs"
    )
    
    func main() {
    	StopBroker()
    	StartBroker()
    	fmt.Printf("You are running version %s", GetVersion(libpath))
    }
    
    // GetVersion extracts the version from the name of the first kafka_ jar
    // found in the libs directory.
    func GetVersion(libpath string) string {
    	var KafkaVersion []string
    	files, err := ioutil.ReadDir(libpath)
    	check(err)
    	for _, f := range files {
    		if strings.HasPrefix(f.Name(), "kafka_") {
    			KafkaVersion = strings.Split(f.Name(), "-")
    			break
    		}
    	}
    	return KafkaVersion[1]
    }
    
    // GetStatus reports whether this broker's id is currently registered
    // in zookeeper.
    func GetStatus() bool {
    	brokers := GetBrokerList()
    	id := GetBrokerId()
    	return Contain(id, brokers)
    }
    
    // StopBroker stops the local broker only while it is registered and the
    // cluster still has more than two brokers, then retries until the broker
    // is gone from zookeeper.
    func StopBroker() {
    	status := GetStatus()
    	brokers := GetBrokerList()
    	if status && len(brokers) > 2 {
    		stop := RunCommand("service kafka stop")
    		fmt.Println(string(stop))
    		time.Sleep(60000 * time.Millisecond)
    	}
    	if !status {
    		fmt.Println("Broker has been successfully stopped")
    	} else {
    		StopBroker()
    	}
    }
    
    // StartBroker starts the local broker if it is not registered yet.
    func StartBroker() {
    	status := GetStatus()
    	if !status {
    		start := RunCommand("service kafka start")
    		fmt.Println(string(start))
    		time.Sleep(60000 * time.Millisecond)
    	}
    }
    
    // GetBrokerId reads this broker's id from the local meta.properties file.
    func GetBrokerId() string {
    	output := RunCommand("cat /srv/kafka/meta.properties | grep broker.id | cut -d'=' -f2")
    	return strings.TrimRight(string(output), "\n")
    }
    
    // GetBrokerList asks the local zookeeper for the registered broker ids
    // via the dump four-letter command.
    func GetBrokerList() []string {
    	output1 := RunCommand("echo dump | nc localhost 2181 | grep brokers")
    	var brokers []string
    	lines := strings.Split(string(output1), "\n")
    	for _, line := range lines {
    		trimString := strings.TrimSpace(line)
    		finalString := strings.TrimPrefix(trimString, "/brokers/ids/")
    		brokers = append(brokers, finalString)
    	}
    	return brokers
    }
    
    // Contain checks whether val1 is present in val2.
    func Contain(val1 string, val2 []string) bool {
    	for _, value := range val2 {
    		if value == val1 {
    			return true
    		}
    	}
    	return false
    }
    
    // RunCommand executes the given string through /bin/sh.
    func RunCommand(command string) []byte {
    	cmd := exec.Command("/bin/sh", "-c", command)
    	result, err := cmd.Output()
    	check(err)
    	return result
    }
    
    func check(e error) {
    	if e != nil {
    		panic(e)
    	}
    }
    

    Cheers

  • Fix under replicated partitions with controller restart

    Hi,

    If you have a Kafka cluster in which a single broker reports zero under-replicated partitions while all the others report a non-zero count, then please be aware that that broker is most likely not properly registered to the cluster.
    Taking a look at the state-change log on the instance that is the cluster controller will not show anything out of the ordinary (all brokers should be reported as registered).

    That broker cannot take partitions because the controller epoch does not consider it functional (which is different from reachable).

    The easiest way to fix this is a normal restart of the cluster controller, as sketched below. Please do not use kill or anything like it; that will make things worse.
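
    A rough sketch of the procedure, assuming the usual install paths (reading /controller is the same trick the gozk example in an earlier post uses):

    # find out which broker currently holds the controller role
    echo 'get /controller' | /opt/kafka/bin/zookeeper-shell.sh localhost:2181
    
    # then, on that broker, a plain service restart - no kill
    service kafka restart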

    Now, if a tail -50f /opt/kafka/server.log is executed on it, you will see that the process is trying to stop and that all of its components are going down in a controlled fashion.
    What you will also see is that there are some errors regarding the partitions hosted on the “problematic” broker, and that these errors come in a cycle. Please have patience with this process and wait for it to stop as it should.

    After the restart, taking a look at the state-change log of the new cluster controller, you will see from time to time that the broker you restarted is reported as unreachable. This is not a real problem; ignore those errors.

    After waiting for a while, you will see the under-replicated partition count begin to decrease, hopefully down to zero on all brokers.

    Cheers

  • Kafka service problem on upgrade to version 1.1.0

    Hi,

    If you are using version 1.1.0 or want to upgrade to it, and the method is the puppet module provided by voxpopuli, please be aware of this issue.

    The template used for the init script is located under /etc/init.d/kafka; you can see its latest version below:

    https://github.com/voxpupuli/puppet-kafka/blob/master/templates/init.erb

    There are some lines that obtain the PID of the kafka broker by running
    
    `pgrep -f "$PGREP_PATTERN"`
    
    This isn’t a problem for earlier versions, but unfortunately for the latest one it doesn’t return anything, causing the init script to exit with return code 1 (my suspicion is that the process name changed).

    I fixed this by replacing that string with
    
    `ps -ef | grep "$PGREP_PATTERN" | grep -v grep | awk '{print $2}'`
    
    and it seems to work just fine.

    This doesn’t have any impact on an already configured and running cluster, and it will not restart your Kafka brokers.

    P.S.: PGREP_PATTERN resolves to kafka.Kafka, which is the string used to identify the broker instance.

    Cheers