Categories
newtools puppet

Error 127, not related to Puppet or Golang

Hi,

Here is something from my experience playing with Golang and Puppet code this morning.
I wrote a very simple script to restart a service, which you can find here.
Today I wanted to put it on a machine and run it with Puppet, so I wrote a very small class that looked like this:

class profiles_test::updatekafka {

  package { 'golang':
    ensure => installed,
    name   => 'golang',
  }

  file { '/root/servicerestart.go':
    source  => 'puppet:///modules/profiles_test/servicerestart.go',
    mode    => '0644',
    replace => true,
  }

  exec { 'go build servicerestart.go':
    cwd     => '/root',
    creates => '/root/servicerestart',
    path    => ['/usr/bin', '/usr/sbin'],
  } ->
  exec { '/root/servicerestart':
    cwd    => '/root',
    path   => ['/usr/bin', '/usr/sbin', '/root'],
    onlyif => 'ps -u kafka',
  }
}

On execution, surprise: it kept throwing this output:

08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/File[/root/check_cluster_state.go]/ensure: defined content as '{md5}0817dbf82b74072e125f8b71ee914150'
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: panic: exit status 127
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: goroutine 1 [running]:
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: runtime.panic(0x4c2100, 0xc2100000b8)
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.check(0x7f45b08ed150, 0xc2100000b8)
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:94 +0x4f
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.RunCommand(0x4ea7f0, 0x2c, 0x7f4500000000, 0x409928, 0x5d9f38)
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:87 +0x112
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.GetBrokerList(0x7f45b08ecf00, 0xc21003fa00, 0xe)
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:66 +0x3c
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.GetStatus(0xc21000a170)
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:33 +0x1e
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.StopBroker()
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:39 +0x21
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: main.main()
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: 	/root/servicerestart.go:15 +0x1e
08:19:44 Notice: /Stage[main]/Profiles_test::Updatekafka/Exec[go run servicerestart.go]/returns: exit status 2
08:19:44 Error: go run servicerestart.go returned 1 instead of one of [0]

The secret is in "panic: exit status 127", and it turns out it is related neither to Golang nor to Puppet, but to the shell.
In my script, the way to get the broker list is:

output1 := RunCommand("echo dump | nc localhost 2181 | grep brokers")

and the error comes from not having the required binaries in your PATH. For example, if you run:

whereis echo
echo: /bin/echo /usr/share/man/man1/echo.1.gz

So the right form for the exec block is actually:

exec { '/root/servicerestart':
  cwd    => '/root',
  path   => ['/usr/bin', '/usr/sbin', '/root', '/bin', '/sbin'],
  onlyif => 'ps -u kafka',
}

And then it will work.
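
Alternatively, you can pin the PATH inside the Go helper itself, so the result does not depend on what the caller passes to exec. Here is a minimal sketch of a drop-in replacement for the RunCommand function from the script (it only needs the os/exec import the script already has; the directory list is an assumption, adjust it to your hosts):

// RunCommand runs a shell one-liner with an explicit PATH so that plain
// commands such as echo or nc resolve even when the caller (for example a
// Puppet exec) provides a minimal environment.
func RunCommand(command string) []byte {
	cmd := exec.Command("/bin/sh", "-c", command)
	// A non-nil Env replaces the inherited environment entirely.
	cmd.Env = []string{"PATH=/usr/bin:/usr/sbin:/bin:/sbin:/root"}
	result, err := cmd.Output()
	if err != nil {
		panic(err)
	}
	return result
}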

Cheers!

Categories
puppet

Multiple classes block declarations in Hiera will not work

Morning,

Do not declare classes in Hiera in multiple blocks like this:

---
classes:
  - profiles::datadogagent
  - profiles::updatekafka

kafka::security: true
kafka::security_default: true
kafka::heap_size: 2048
classes:
 - profiles::pybackuplogs
 - profiles::group_coordinator

Class updatekafka will not be executed: YAML does not merge duplicate keys, so the second classes block simply overrides the first one.

The structure should look like:

---
classes:
  - profiles::datadogagent
  - profiles::updatekafka
  - profiles::pybackuplogs
  - profiles::group_coordinator
kafka::security: true
kafka::security_default: true
kafka::heap_size: 2048
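
If you want to catch this kind of mistake before a Puppet run, a quick check for duplicate top-level keys is enough. Below is a minimal sketch in Go; the file name common.yaml is a placeholder, and it assumes top-level keys start at column zero, as in the examples above.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Flags duplicate top-level keys (such as a repeated "classes:") in a Hiera
// YAML file. The YAML parser keeps only the last occurrence of a key, which
// is exactly how the first classes block above gets lost.
func main() {
	f, err := os.Open("common.yaml") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	seen := map[string]bool{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		// Skip blanks, comments, document markers and indented (nested) lines.
		if line == "" || strings.HasPrefix(line, " ") || strings.HasPrefix(line, "-") || strings.HasPrefix(line, "#") {
			continue
		}
		key := line
		if i := strings.Index(line, ": "); i >= 0 {
			key = line[:i]
		} else if strings.HasSuffix(line, ":") {
			key = strings.TrimSuffix(line, ":")
		} else {
			continue // not a top-level key line
		}
		if seen[key] {
			fmt.Println("duplicate top-level key:", key)
		}
		seen[key] = true
	}
}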

Cheers!

Categories
kafka

Delete corrupted Kafka topic version 2.0

Hi,

In the past we ran into the situation described in this link: use-case-deleting-corrupted-kafka-topic.
This time the situation repeated itself a little differently. Taking a look at the list of topics, there were three topics marked for deletion.
None of them had a leader or ISR, so after a bit of investigation the conclusion was that they were no longer present on the filesystem.
My assumption is that the cluster controller began the delete process but failed before sending a new metadata update to the ZooKeeper nodes.
A restart of the cluster controller was performed in order to provide a new epoch, and after that the "deletion request" and the topic metadata were deleted manually from ZooKeeper (you can find the commands in the above link).
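
For reference, the idea behind that manual cleanup looks roughly like the sketch below; the exact commands are in the linked post. It assumes zookeeper-shell.sh is on the PATH, ZooKeeper answers on localhost:2181 and the topic is called mytopic, all three being assumptions to adapt (on newer ZooKeeper versions the command is deleteall instead of rmr).

package main

import (
	"fmt"
	"os/exec"
)

// zkDelete recursively removes a znode via the stock zookeeper-shell.sh tool.
func zkDelete(path string) {
	out, err := exec.Command("zookeeper-shell.sh", "localhost:2181", "rmr", path).CombinedOutput()
	fmt.Println(string(out))
	if err != nil {
		panic(err)
	}
}

func main() {
	topic := "mytopic" // hypothetical topic name
	// Drop the stuck deletion request first, then the topic metadata itself.
	zkDelete("/admin/delete_topics/" + topic)
	zkDelete("/brokers/topics/" + topic)
}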

After listing the topics again, everything looks good.

Cheers

Categories
kafka newtools

Golang example for kafka service restart script

Hi,

Not much to say: a pretty decent script for a Kafka service restart (I wrote it for our rolling upgrade procedure) that is still a work in progress. If any changes need to be made to it, I will post them.

Here is the script:

package main

import (
	"fmt"
	"io/ioutil"
	"os/exec"
	"strings"
	"time"
)

const (
	libpath = "/opt/kafka/libs"
)

func main() {
	StopBroker()
	StartBroker()
	fmt.Printf("You are running version %s", GetVersion(libpath))
}

// GetVersion extracts the Kafka version from the name of the first kafka_*
// jar found in the libs directory.
func GetVersion(libpath string) string {
	var KafkaVersion []string
	files, err := ioutil.ReadDir(libpath)
	check(err)
	for _, f := range files {
		if strings.HasPrefix(f.Name(), "kafka_") {
			KafkaVersion = strings.Split(f.Name(), "-")
			break
		}
	}
	return KafkaVersion[1]
}

// GetStatus reports whether this broker is currently registered in ZooKeeper.
func GetStatus() bool {
	brokers := GetBrokerList()
	id := GetBrokerId()
	status := Contain(id, brokers)
	return status
}

// StopBroker stops the local broker only while it is still registered and the
// cluster has more than two registered brokers, then calls itself again until
// the broker disappears from the ZooKeeper broker list.
func StopBroker() {
	status := GetStatus()
	brokers := GetBrokerList()
	if status && len(brokers) > 2 {
		stop := RunCommand("service kafka stop")
		fmt.Println(string(stop))
		time.Sleep(60000 * time.Millisecond)
	}
	if !status {
		fmt.Println("Broker has been successfully stopped")
	} else {
		StopBroker()
	}
}

// StartBroker starts the local broker if it is not registered in ZooKeeper.
func StartBroker() {
	status := GetStatus()
	if !status {
		start := RunCommand("service kafka start")
		fmt.Println(string(start))
		time.Sleep(60000 * time.Millisecond)
	}
}

// GetBrokerId reads the broker id from the local meta.properties file.
func GetBrokerId() string {
	output := RunCommand("cat /srv/kafka/meta.properties | grep broker.id | cut -d'=' -f2")
	return strings.TrimRight(string(output), "\n")
}

// GetBrokerList asks ZooKeeper for a dump over the four-letter-word port and
// keeps the /brokers/ids/ entries, i.e. the ids of the registered brokers.
func GetBrokerList() []string {
	output1 := RunCommand("echo dump | nc localhost 2181 | grep brokers")
	var brokers []string
	lines := strings.Split(string(output1), "\n")
	for _, line := range lines {
		trimString := strings.TrimSpace(line)
		finalString := strings.TrimPrefix(trimString, "/brokers/ids/")
		brokers = append(brokers, finalString)
	}
	return brokers
}

// Contain reports whether val1 is present in val2.
func Contain(val1 string, val2 []string) bool {
	for _, value := range val2 {
		if value == val1 {
			return true
		}
	}
	return false
}

// RunCommand runs a shell one-liner and returns its standard output.
func RunCommand(command string) []byte {
	cmd := exec.Command("/bin/sh", "-c", command)
	result, err := cmd.Output()
	check(err)
	return result
}

// check panics on any error, which also surfaces the command exit status.
func check(e error) {
	if e != nil {
		panic(e)
	}
}

Cheers

Categories
kafka

Fix under-replicated partitions with a controller restart

Hi,

If you have a Kafka cluster in which only one broker reports zero under-replicated partitions while the rest report a non-zero number, then please be aware that this broker is probably not properly registered to the cluster.
Taking a look at the state-change log on the instance that is the cluster controller will not show anything in particular (all of the brokers should be reported as registered).

That broker cannot take partitions because the controller epoch does not consider it functional (which is different from reachable).

The easiest way to fix this is a normal restart of the cluster controller. Please do not use kill or anything similar; it will only make things worse.

Now, if you run tail -50f /opt/kafka/server.log, you will see that the process is trying to stop and that all of the components are going down in a controlled fashion.
You will also see some errors regarding the partitions hosted on the "problematic" broker, and that these errors repeat in a cycle. Please be patient with this process and wait for it to stop as it should.

After the restart, looking at the state-change log of the new cluster controller, you will see from time to time that the broker you restarted is reported as unreachable. This is not a real problem; ignore those errors.

After waiting a while you will see the number of under-replicated partitions begin to decrease, hopefully to zero on all brokers.
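
If you want to watch that number from the shell while the cluster recovers, here is a minimal sketch in the spirit of the restart script above. It wraps the stock kafka-topics.sh tool; the tool being on the PATH and ZooKeeper answering on localhost:2181 are both assumptions, so adjust them for your cluster.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Counts under-replicated partitions by wrapping the stock kafka-topics.sh
// tool. Each under-replicated partition is printed by the tool as one
// non-empty "Topic: ..." line, so counting those lines gives the total.
func main() {
	out, err := exec.Command("/bin/sh", "-c",
		"kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions").Output()
	if err != nil {
		panic(err)
	}
	count := 0
	for _, line := range strings.Split(string(out), "\n") {
		if strings.TrimSpace(line) != "" {
			count++
		}
	}
	fmt.Println("under-replicated partitions:", count)
}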

Cheers

Categories
kafka

Kafka service problem on upgrade to version 1.1.0

Hi,

If you are using version 1.1.0 or want to upgrade to it, and you manage it with the Puppet module provided by Vox Pupuli, please be aware of this issue.

In the template used for the init script, which is located under /etc/init.d/kafka and which you can also see in the latest version below:

https://github.com/voxpupuli/puppet-kafka/blob/master/templates/init.erb

there are some lines that obtain the PID of the Kafka broker by using the command

`pgrep -f "$PGREP_PATTERN"`

This isn't a problem for earlier versions, but unfortunately for the latest one it doesn't return anything, causing the init script to exit with return code 1 (my suspicion is that the process name changed).

I fixed this by replacing that string with the following:

`ps -ef | grep "$PGREP_PATTERN" | grep -v grep | awk '{print $2}'`

and it seems to work just fine.

This doesn’t have any impact on the already configured and running cluster, and it will not restart your Kafka brokers.

P.S.: PGREP_PATTERN resolves to kafka.Kafka, which is the string used to differentiate the broker instance.

Cheers

Categories
mq

IBM MQ crtmqm instance issue

Morning,

A short note: if you are trying to create a queue manager and you receive the following error:

AMQ8101: WebSphere MQ error (893) has occurred.

and you also have plenty of space in your qmgrs and log directories, look no further.

In some cases it happens because the installation under /var/mqm is not complete. The quick fix was to manually define /var/mqm/mqs.ini, but I advise you to reinstall the product.

Before:

$ ls -ltr /var/mqm/
total 8
drwxr-xr-x    2 root     system          256 Mar 21 13:14 lost+found
drwxrwsrwt    2 mqm      mqm             256 Mar 29 15:22 trace
drwxrwsr-x    3 mqm      mqm             256 Mar 29 15:22 qmgrs
drwxrwsr-x    3 mqm      mqm             256 Mar 29 15:22 sockets
drwxrwsr-x    3 mqm      mqm             256 Mar 29 15:22 shared
drwxrwsrwt    2 mqm      mqm            4096 Apr 03 12:10 errors

After:

$ ls -ltr /var/mqm
total 32
drwxr-xr-x    2 root     system          256 Mar 21 13:14 lost+found
drwxrwsrwx    2 mqm      mqm             256 Mar 29 15:22 trace
drwxrwsrwx    2 mqm      mqm            4096 Apr 03 12:10 errors
drwxrwsr-x    3 mqm      mqm             256 Apr 03 12:14 sockets
drwxrwsr-x    3 mqm      mqm             256 Apr 03 12:14 qmgrs
drwxrwsr-x    2 mqm      mqm             256 Apr 03 12:20 config
drwxrwsr-x    3 mqm      mqm             256 Apr 03 12:20 conv
drwxrwsr-x    2 mqm      mqm             256 Apr 03 12:20 log
-rw-rw-r--    1 mqm      mqm            1941 Apr 03 12:20 service.env
drwxrwsr-x    5 mqm      mqm             256 Apr 03 12:20 mqft
drwxrwsr-x    4 mqm      mqm             256 Apr 03 12:20 shared
-rw-rw-r--    1 mqm      mqm            1156 Apr 03 12:20 mqs.ini
-rw-rw-r--    1 mqm      mqm             637 Apr 03 12:20 mqclient.ini
drwxrwsr-x    3 mqm      mqm             256 Apr 03 12:20 exits
drwxrwsr-x    3 mqm      mqm             256 Apr 03 12:20 exits64

Cheers!