Day: July 10, 2018
How to diagnose a kernel panic caused by a killed process
You should install atop on your server. It is top on steroids and can help diagnose all kinds of server issues, such as:
- CPU load
- IO load
- Memory usage
- Process utilization of resources
- Paging/swapping
- etc…
Atop usage: https://lwn.net/Articles/387202/
How to install atop on Ubuntu/Debian:
- ‘apt-get install atop’
- Then start atop logging:
- ‘/etc/init.d/atop start’
Note: by default, atop writes a snapshot to its log every 10 minutes, so you can only step through the log in 10-minute intervals.
Now let's say you open the console of your VM or blade server and see a message stating that the kernel killed a process because the server ran out of memory.
Example:
- Out of memory: Kill process 11970 (php) score 80 or sacrifice child
- Killed process 11970 (php) total-vm:1957108kB
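After the server has rebooted, you will want to find out exactly how it happened, and the place to look is the kernel log. If you have kdump installed and the kernel actually panicked (for example because vm.panic_on_oom is set), you can read the kernel log from the dump it captured. Otherwise, search the kernel messages for the OOM killer:
- ‘dmesg -T | grep -iE "out of memory|killed process"’
- dmesg only shows messages since the last boot, so after a reboot search the saved kernel log instead, e.g. ‘grep -iE "out of memory|killed process" /var/log/kern.log’ on Ubuntu/Debian, or ‘journalctl -k -b -1’ on systemd systems with a persistent journal
- this will produce output like the log below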
Kernel log:
- [Wed July 10 13:27:30 2018] Out of memory: Kill process 11970 (php) score 80 or sacrifice child
- [Wed July 10 13:27:30 2018] Killed process 11970 (php) total-vm:123412108kB, anon-rss:1213410764kB, file-rss:2420k
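If you need to pull these fields out of a long log, or out of logs from many servers, a small parser can help. The sketch below is a hypothetical helper (it is not part of atop or dmesg); it assumes the `dmesg -T` line format shown above and extracts the timestamp, PID, process name and memory figures from "Killed process" lines. The regular expressions are based only on these example lines, and other kernel versions print extra fields, so treat it as a starting point.

```python
import re
from typing import Optional

# Matches "Killed process" lines in the format assumed above, e.g.
# [Wed July 10 13:27:30 2018] Killed process 11970 (php) total-vm:123412108kB, ...
KILLED_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r"(?P<rest>.*)$"
)
# Picks up "key:value" memory pairs such as total-vm, anon-rss and file-rss (sizes in kB).
FIELD_RE = re.compile(r"([a-z-]+):(\d+)kB?")


def parse_killed_line(line: str) -> Optional[dict]:
    """Return the fields of a 'Killed process' line, or None if the line doesn't match."""
    m = KILLED_RE.match(line.strip())
    if m is None:
        return None
    info = {
        "stamp": m.group("stamp"),
        "pid": int(m.group("pid")),
        "name": m.group("name"),
    }
    # Memory sizes are reported in kB; keep them as integers.
    for key, value in FIELD_RE.findall(m.group("rest")):
        info[key] = int(value)
    return info


if __name__ == "__main__":
    sample = [
        "[Wed July 10 13:27:30 2018] Out of memory: Kill process 11970 (php) score 80 or sacrifice child",
        "[Wed July 10 13:27:30 2018] Killed process 11970 (php) total-vm:123412108kB, anon-rss:1213410764kB, file-rss:2420k",
    ]
    for line in sample:
        parsed = parse_killed_line(line)
        if parsed:
            print(parsed)
```

The "Out of memory: Kill process" line is skipped on purpose: the "Killed process" line carries the same PID plus the memory figures.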
Once you have this log, you have the timestamp of the kill and the process ID. You can then use the atop logs to drill down on that PID and see which daemon and/or script caused the issue.
From the log, the kill happened at 13:27:30 on July 10, 2018. atop keeps one raw log file per day in /var/log/atop, named after the date, so from inside /var/log/atop the file for that day is atop_20180710.
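The atop file name can be derived from the kernel log timestamp. The hypothetical helper below (not part of atop) parses a `dmesg -T`-style timestamp and builds the path, assuming the default /var/log/atop directory and the atop_YYYYMMDD naming used in this post. It matches the month on its first three letters, so it accepts both "Jul" and "July", and it ignores the weekday, since only the date and time matter for picking the file.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Month prefix -> month number; a 3-letter prefix accepts "Jul" as well as "July".
MONTHS = {
    name: i
    for i, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_dmesg_stamp(stamp: str) -> datetime:
    """Turn 'Wed July 10 13:27:30 2018' into a datetime; the weekday is ignored."""
    _weekday, month, day, clock, year = stamp.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month[:3].lower()], int(day), hour, minute, second)


def atop_log_for(when: datetime, log_dir: str = "/var/log/atop") -> PurePosixPath:
    """Daily atop raw log for `when`, assuming the atop_YYYYMMDD naming."""
    return PurePosixPath(log_dir) / f"atop_{when:%Y%m%d}"


if __name__ == "__main__":
    killed_at = parse_dmesg_stamp("Wed July 10 13:27:30 2018")
    print(killed_at)
    print("atop -r", atop_log_for(killed_at))
```

Running it prints the kill time and the matching ‘atop -r /var/log/atop/atop_20180710’ command line.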
Run the following:
- ‘atop -r /var/log/atop/atop_20180710’
This opens the log in atop. Step through the samples with lowercase ‘t’ to move forward in time or capital ‘T’ to move backward. Once you reach the sample that covers the timestamp, press ‘c’ to show the full command line of each process. This lets you see which processes were running at that time and locate the PID from the kernel log. Pressing ‘m’ instead shows memory usage per process, which is useful for an out-of-memory kill.
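Because atop only writes a sample every 10 minutes by default, you will rarely land on 13:27:30 exactly. Each sample summarizes the interval that ends at its timestamp, so the one to inspect is the first sample stamped at or after the kill time. The sketch below illustrates that lookup with `bisect`; the sample times are hypothetical (they depend on when atop was started), and in practice you simply press ‘t’ until the clock in atop's header passes the kill time.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Optional


def covering_sample(samples: List[datetime], event: datetime) -> Optional[datetime]:
    """Return the first sample taken at or after `event`, or None if there is none.

    Assumes `samples` is sorted and that each sample summarizes the interval
    ending at its own timestamp, so that sample is the one covering `event`.
    """
    i = bisect_left(samples, event)
    return samples[i] if i < len(samples) else None


if __name__ == "__main__":
    # Hypothetical sample times: atop started at 00:04:12 and logs every 600 seconds.
    start = datetime(2018, 7, 10, 0, 4, 12)
    samples = [start + timedelta(seconds=600 * n) for n in range(144)]
    killed_at = datetime(2018, 7, 10, 13, 27, 30)
    print(covering_sample(samples, killed_at))
```

With these assumed sample times, the sample covering the kill is the one stamped 13:34:12.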
Example (PID, CPU usage, command):
- 3082 27% php
- 15338 27% php
- 26639 25% php
- 8520 8% php
- 8796 8% php
- 2157 8% /usr/sbin/apache2 -k start
- 11970 1% php (the process ID from the kernel log above)
- 10493 1% php
- 10942 1% php
- 5335 1% php
- 9964 0% php
PID 11970 matches the kernel log, so we know the killed process was a PHP script. atop doesn't always show the exact script, but by combining the kernel log with this view we were able to determine that it was some type of RSS feed script. You can also see that it wasn't using much CPU (1%), even though the kernel log shows a very large resident memory footprint (anon-rss) when it was killed. High memory use with low CPU points to a memory leak in the PHP code rather than heavy processing, so the script needs to be updated and/or disabled.
Written by Nick Tailor