Non-registered ZooKeeper node – why doesn’t it work?

Morning,

If you ever use Puppet or another automation tool to deploy a server that also has ZooKeeper installed, and that server has to join a cluster that is already working, please be aware of this.

Yesterday I rebuilt a node multiple times (there were some errors to fix), and after finally getting it right, the ZooKeeper instance did not behave as expected.
When I took a look in the /var/lib/zookeeper directory, there was the correct myid file, matching the id set in the config file, and a version-2 directory.
Normally version-2 should hold all the data stored by ZooKeeper, but there was only a currentEpoch file with 0 in it.

Multiple restarts, no result. The log didn’t contain anything relevant. Since the server was not live yet, I rebuilt it one more time, but it showed the same behavior. It looked like the node was completely out of sync, and that was the truth 😀

I eventually figured out, by accident, that this ZooKeeper node was not yet registered with the cluster (I tried to change the id of that node and restart the whole cluster).

In order to register it, well, you need to restart the leader. How do you find it? There are multiple methods, I guess; here are two that work. Either run the following command on each node and look for the one that reports Mode: leader

echo stat | nc localhost 2181 | grep Mode
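
To avoid logging in to every node, the same check can be wrapped in a small loop. This is only a sketch: the function names are made up for illustration, the host names server1 to server3 and client port 2181 are taken from the outputs in this post, and on newer ZooKeeper releases the four-letter commands may have to be whitelisted before stat answers.

```shell
#!/bin/sh
# Hypothetical helper: read the text printed by the "stat" four-letter
# command on stdin and print only the role (leader, follower or standalone).
zk_mode_from_stat() {
    awk -F': *' '$1 == "Mode" { print $2; exit }'
}

# Hypothetical helper: ask every host given as an argument for its role.
# Port 2181 is the client port used in this post; adjust it if yours differs.
zk_print_modes() {
    for host in "$@"; do
        mode=$(echo stat | nc -w 2 "$host" 2181 | zk_mode_from_stat)
        echo "$host: ${mode:-no answer}"
    done
}

# Usage (host names are assumptions, replace them with your own):
#   zk_print_modes server1 server2 server3
```

A node that prints "no answer" is either down or not serving requests, which is worth knowing before restarting anything.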

Or check the listening ports of the ZooKeeper process on each node

zookeeper start/running, process 15259
root@server1:/var/lib/zookeeper/version-2# netstat -tulpen | grep 15259
tcp6       0      0 :::42844                :::*                    LISTEN      107        1104606     15259/java      
tcp6       0      0 :::2181                 :::*                    LISTEN      107        1114708     15259/java      
tcp6       0      0 :::2183                 :::*                    LISTEN      107        1104609     15259/java      
tcp6       0      0 :::9998                 :::*                    LISTEN      107        1104607     15259/java

root@server2:/var/lib/zookeeper/version-2# netstat -tulpen | grep 28068
tcp6       0      0 :::48577                :::*                    LISTEN      107        3182780     28068/java      
tcp6       0      0 :::2181                 :::*                    LISTEN      107        3185668     28068/java      
tcp6       0      0 :::2183                 :::*                    LISTEN      107        3184651     28068/java      
tcp6       0      0 :::9998                 :::*                    LISTEN      107        3182781     28068/java   

root@server3:/var/lib/zookeeper/version-2# netstat -tulpen | grep 20719
tcp6       0      0 :::2181                 :::*                    LISTEN      107        5365296     20719/java      
tcp6       0      0 :::2182                 :::*                    LISTEN      107        5382604     20719/java      
tcp6       0      0 :::2183                 :::*                    LISTEN      107        5374105     20719/java      
tcp6       0      0 :::36008                :::*                    LISTEN      107        5371417     20719/java      
tcp6       0      0 :::9998                 :::*                    LISTEN      107        5371418     20719/java

Only the leader listens on 2182 (the follower port in this setup), which the followers connect to in order to grab the updates. In the output above that is server3.
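
The port check can also be scripted instead of reading the netstat output by eye. A minimal sketch, assuming the follower port is 2182 as in this cluster (it is whatever the server.N lines in your config say; the example in the ZooKeeper docs uses 2888). The function name listens_on is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -tln" style output on stdin and succeed
# (exit code 0) only if some socket is in LISTEN state on the given port.
listens_on() {
    awk -v port="$1" '
        # Column 4 is the local address (":::2182", "0.0.0.0:2182"),
        # column 6 is the state; the port is the part after the last colon.
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

# Usage, run on each node:
#   netstat -tln | listens_on 2182 && echo "this node is the leader"
```

Matching on the exact port column avoids false hits from other numbers in the line, such as an inode or PID that happens to contain 2182.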

After a short restart of the leader, the rebuilt node synced and everything worked as expected!

Cheers