Categories
kafka

Don’t delete the Kafka GC logs while they are in use

Hi,

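I made a mistake some time ago, and it has come back to haunt me.
Deleting the GC logs, including the one the broker is still writing to, doesn’t free any space; it just creates a more difficult situation. The file disappears from the directory, but the process keeps it open, so the disk blocks stay allocated.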
Here is my example:

/dev/sda1                        50G   42G  5.2G  90% /
/opt/kafka/logs# ll
total 34M
drwxrwxr-x 2 kafka kafka 4.0K Oct 10 19:34 ./
drwxr-xr-x 7 kafka kafka 4.0K Mar 14  2018 ../
-rw-rw-r-- 1 kafka kafka    0 Mar 14  2018 controller.log
-rw-rw-r-- 1 kafka kafka    0 Mar 14  2018 kafka-authorizer.log
-rw-rw-r-- 1 kafka kafka    0 Mar 14  2018 kafka-request.log
-rw-rw-r-- 1 kafka kafka 2.9M Oct 11 04:44 log-cleaner.log
-rw-rw-r-- 1 kafka kafka 6.1M Oct 11 05:24 server.log
-rw-rw-r-- 1 kafka kafka  25M Oct  4 14:03 state-change.log
lsof +L1 | grep delete
init        1     root   13w   REG    8,1         106     0     95 /var/log/upstart/systemd-logind.log.1 (deleted)
init        1     root   14w   REG    8,1        5794     0   2944 /var/log/upstart/kafka-manager.log.1 (deleted)
java     1630    kafka    3w   REG    8,1 46836567522     0 524939 /opt/kafka-2.11-0.10.1.1/logs/kafkaServer-gc.log (deleted)
java     1863 dd-agent    4r   REG    8,1     5750256     0 525428 /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.20.1-jar-with-dependencies.jar (deleted)
java    10749 dd-agent    4r   REG    8,1     5750216     0 525427 /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.20.0-jar-with-dependencies.jar (deleted)
bash    10928     root    0u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
bash    10928     root    1u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
bash    10928     root    2u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
bash    10928     root  255u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
tail    12378     root    0u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
tail    12378     root    1u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
tail    12378     root    2u   CHR  136,6         0t0     0      9 /dev/pts/6 (deleted)
tail    12378     root    3r   REG    8,1    52428909     0 525512 /opt/kafka-2.11-0.10.1.1/logs/server.log.1 (deleted)
java    14692 dd-agent    4r   REG    8,1     5750256     0 526042 /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.20.1-jar-with-dependencies.jar (deleted)
java    16574 dd-agent    4r   REG    8,1     5750256     0 526041 /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.20.1-jar-with-dependencies.jar (deleted)
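
The entry that matters in that list is kafkaServer-gc.log: its SIZE/OFF column shows 46836567522 bytes still held by PID 1630, even though the file no longer appears in the directory. A small sketch for picking out such entries, largest first, assuming the default lsof column layout (SIZE/OFF in column 7) and paths without spaces. The function name deleted_open_files is made up for this example:

```shell
# Hypothetical helper: reads `lsof +L1` output on stdin and lists
# deleted-but-still-open regular files, largest first.
# Output columns (tab separated): size in bytes, PID, file descriptor, path.
# Assumes the default lsof layout and paths without spaces.
deleted_open_files() {
  awk -v OFS='\t' '$5 == "REG" && $NF == "(deleted)" { print $7, $2, $4, $10 }' |
    sort -rn
}

# Usage (needs lsof, and usually root to see every process):
#   lsof +L1 | deleted_open_files | head
```

Character devices such as /dev/pts/6 are skipped on purpose, since they do not hold disk space.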

Handling GC logging in Kafka versions lower than 1.0.0 is quite tricky. It is best to remove these options from your startup script:

-XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

But since we use a standard Puppet module that is shared by multiple teams, this is still to be fixed. Fortunately, from 1.0.0 on, GC logging is disabled by default.
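
Where removing the options is not possible right away, an alternative is to let the JVM rotate the GC log itself so it cannot grow without bound. A sketch, assuming a Java 8 HotSpot JVM (these rotation flags were replaced by -Xlog in Java 9 and later). The variable names and the 10 x 100M limits are placeholders, not values taken from any Kafka script or Puppet module:

```shell
# Hypothetical variable names; wire the result into whatever your
# startup script passes to the java command line.
GC_LOG_FILE="/opt/kafka/logs/kafkaServer-gc.log"

# Same logging flags as before, plus JVM-side rotation:
# keep at most 10 files of 100M each (placeholder limits).
GC_ROTATION_OPTS="-Xloggc:${GC_LOG_FILE} -verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps \
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"

echo "$GC_ROTATION_OPTS"
```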

To release the space held by the deleted file shown above, the process needs to be restarted, and we will do that.
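
A restart is the clean fix. When it has to wait, a commonly used stopgap on Linux is to empty the deleted file through the descriptor the process still holds, using the PID and FD columns from lsof (1630 and 3 for the GC log above). A minimal sketch; truncate_open_fd is a made-up name. One caveat to keep in mind: if the writer did not open the file in append mode, it keeps writing at its old offset, so the file becomes sparse. The blocks are freed, but the reported size stays large.

```shell
# Hypothetical helper: empty a file that a process still holds open,
# addressed through /proc/<pid>/fd/<fd> (Linux only).
# Needs to run as root or as the user owning the process.
truncate_open_fd() {
  pid="$1"
  fd="$2"
  target="/proc/$pid/fd/$fd"
  # The entry is a symlink even when the file it points to was deleted.
  [ -L "$target" ] || { echo "no such descriptor: $target" >&2; return 1; }
  # Opening the entry for writing with truncation empties the real file.
  : > "$target"
}

# Usage, with the PID and FD reported by lsof:
#   truncate_open_fd 1630 3
```

Unlike rm, this returns the blocks to the filesystem without touching the process, but it is only a workaround until the next restart.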

Cheers

Categories
linux

Memory check by process in Linux

Hi,

I wanted to post this since it might be useful in some situations. On a Linux machine, one way to check memory usage by the top processes is ps aux --sort -rss (that is, ordered by Resident Set Size, largest first). Once executed, it returns output similar to this:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
sorin 3673 0.6 27.3 3626020 563964 pts/1 Sl+ 02:24 1:09 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+Disa
sorin 1708 2.0 9.2 1835288 189692 ? Sl 02:11 3:56 /usr/bin/gnome-shell
sorin 1967 0.6 8.0 1642280 166160 ? Sl 02:12 1:11 firefox-esr
sorin 3413 0.1 3.7 2000252 77016 pts/0 Sl+ 02:21 0:19 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+
root 576 0.5 2.6 263688 54172 tty7 Ssl+ 02:11 1:07 /usr/bin/Xorg :0 -novtswitch -background none -noreset -verbose 3 -auth /var/run/gdm3/auth-for-Debian-gdm-Bu1jB
sorin 1813 0.0 2.2 1175504 47196 ? Sl 02:11 0:00 /usr/lib/evolution/evolution-calendar-factory
root 486 0.1 1.2 377568 26584 ? Ssl 02:11 0:21 /usr/bin/dockerd -H fd://
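
The RSS column is in KiB, which gets hard to read for large processes. A small sketch that trims the listing to PID, RSS in MiB and command name, assuming the standard ps aux column order (RSS in column 6, command in column 11) and input that is already sorted. The name top_rss is made up for this example:

```shell
# Hypothetical helper: reads `ps aux` output on stdin and prints
# PID, RSS in MiB and the command name for the first N processes.
# Relies on RSS being column 6 and reported in KiB, as in `ps aux`.
top_rss() {
  n="${1:-5}"
  awk 'NR > 1 { printf "%s\t%.1f MiB\t%s\n", $2, $6 / 1024, $11 }' | head -n "$n"
}

# Usage: ps aux --sort -rss | top_rss 3
```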

If you want more detail on a process, look at /proc/[pid]/status, which contains a lot of other information. For example, the status file of the top process on my Linux machine starts like this:

sorin@debian:/proc/3673$ cat status
Name: java
State: S (sleeping)
Tgid: 3673
Ngid: 0
Pid: 3673
PPid: 3660
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 24 25 29 30 44 46 108 111 116 1000
VmPeak: 3626024 kB
VmSize: 3626020 kB

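As you can see, VmSize (3626020 kB) matches the VSZ column from ps, not RSS. The resident set size is reported further down in the same file, in the VmRSS field.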
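
To put the virtual and resident figures side by side without scrolling through the whole file, the VmSize and VmRSS fields can be filtered out directly. A minimal sketch; vm_summary is a made-up name, and it takes the path of the status file as its only argument:

```shell
# Hypothetical helper: print the VmSize and VmRSS lines of a
# /proc/<pid>/status file given as the first argument.
vm_summary() {
  status_file="$1"
  awk '/^Vm(Size|RSS):/ { print $1, $2, $3 }' "$status_file"
}

# Usage: vm_summary /proc/3673/status
```

VmSize corresponds to VSZ and VmRSS to RSS in the ps output, both in kB.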

Cheers!