Category: Monitoring
HOW TO CHECK CPU, MEMORY, & DISK THRESHOLDS on an ARRAY of HOSTS.
So I was tinkering around as usual and thought this would come in handy for other engineers.
If you have a large cluster of servers that can suddenly lose all its MEM, CPU, or DISK overnight due to the nature of your business, it's difficult to monitor that from a GUI, especially across an array of hosts.
Cloud Scenario...
Say you find a node that is dying because too many clients are using its resources, and you need to migrate instances off to another node, only you don't know which nodes have the needed resources without going and looking at every node individually.
This tends to be every engineer's pain point. So I decided to come up with a quick, easy solution for emergencies, when you don't have time to sift through alert systems that only show data on a per-host basis and tend to load very slowly.
This bash script will check the CPU, MEM, and DISK MOUNTS (including NFS) and tell you which ones are okay and which ones are over their thresholds:
CPU: calculated as 100 (the maximum) - CPU idle % = CPU usage %
Note: it also appends each reading to a log, /opt/cpu.log, on each host.
MEM: calculated as Used Memory / Total Memory * 100 = percentage of memory used
Note: it also appends each reading to a log, /opt/mem.log, on each host.
Disk: any mount that reaches the warning threshold (80% by default)... COMPLAINS.
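To sanity-check the CPU and MEM formulas above without logging into a box, you can run them against captured command output. This is a minimal sketch, not part of the original script: the function names `cpu_use_from_top_line`, `mem_used_pct_from_free` and `is_above` are hypothetical, and it assumes the procps-ng layouts of top's `%Cpu(s):` line and free's `Mem:` row. It finds the idle value by its `id` label rather than by position, because a fixed `$8` can break when top squeezes a value in as `ni,100.0 id`.

```bash
#!/bin/bash
# Hypothetical helpers that apply the CPU and MEM formulas to captured text.
# Assumes procps-ng output: top's "%Cpu(s): ... 98.3 id, ..." summary line
# and free's "Mem: total used free shared buff/cache available" row.

# CPU usage = 100 - idle, with idle truncated to a whole number
# (the same rounding as `cut -f 1 -d "."` in the script).
cpu_use_from_top_line() {
  awk -F',' '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /id *$/) {            # the comma-separated field labelled "id"
        split($i, parts, " ")         # parts[1] is the number, e.g. "98.3"
        printf "%d\n", 100 - int(parts[1])
        exit
      }
    }
  }' <<< "$1"
}

# Memory used % = used / total * 100, taken from free's "Mem:" row.
mem_used_pct_from_free() {
  awk '/^Mem:/ { printf "%.2f\n", $3 / $2 * 100 }' <<< "$1"
}

# Numeric "greater than" for decimals. Bash's [[ a > b ]] compares strings,
# so [[ 9.50 > 80.00 ]] is true because "9" sorts after "8".
is_above() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

cpu_use_from_top_line '%Cpu(s):  1.2 us,  0.3 sy,  0.0 ni, 98.3 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st'   # prints 2
cpu_use_from_top_line '%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st'   # prints 0
if is_above 9.50 80.00; then echo HIGH; else echo OK; fi   # prints OK
```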
I've itemised the bash script into functions, so you can just comment out the calls at the bottom of the script for any item you don't want, say if you only want to check CPU/MEM.
#!/bin/bash
#Written By Nick Tailor
#timestamp (UTC+8); not used below, kept for your own log lines
now=$(date -u -d "+8 hour" +'%Y-%m-%d %H:%M:%S')
#cpu use threshold (percent)
cpu_warn='75'
#disk use threshold (percent)
disk_warn='80'
#---cpu
item_cpu () {
  #field 8 of top's "Cpu(s)" summary line is the idle % (procps-ng layout); drop the decimals
  cpu_idle=$(top -b -n 1 | grep -m 1 Cpu | awk '{print $8}' | cut -f 1 -d ".")
  cpu_use=$((100 - cpu_idle))
  echo "now current cpu utilization rate of $cpu_use $(hostname) as on $(date)" >> /opt/cpu.log
  if [ "$cpu_use" -gt "$cpu_warn" ]
  then
    echo "cpu warning!!! $cpu_use Currently HIGH $(hostname)"
  else
    echo "cpu ok!!! $cpu_use% use Currently LOW $(hostname)"
  fi
}
#---mem
item_mem () {
  #percent threshold
  LOAD='80.00'
  #free + buff/cache, converted from MiB to GB
  mem_free_read=$(free -m | awk '/^Mem:/ {printf "%.1f", ($4+$6)/1024}')
  #used / total * 100
  mem_pct=$(free | awk '/^Mem:/ {printf "%.2f", $3/$2*100}')
  MEM_LOAD="${mem_pct}%"
  echo "Now the current memory space remaining ${mem_free_read} GB $(hostname) as on $(date)" >> /opt/mem.log
  #compare as numbers: [[ $MEM_LOAD > $LOAD ]] compares strings, so "9.50%" would count as HIGH
  if awk -v m="$mem_pct" -v l="$LOAD" 'BEGIN {exit !(m+0 > l+0)}'
  then
    echo "$MEM_LOAD not good!! MEM USAGE is HIGH – Free-MEM-${mem_free_read}GB $(hostname)"
  else
    echo "$MEM_LOAD ok!! MEM USAGE is beLOW 80% – Free-MEM-${mem_free_read}GB $(hostname)"
  fi
}
#---disk
item_disk () {
  #-P keeps each filesystem on one line, so long NFS names don't wrap
  df -HP | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5 " " $1 }' | while read -r output;
  do
    echo "$output"
    usep=$(echo "$output" | awk '{ print $1 }' | cut -d'%' -f1)
    partition=$(echo "$output" | awk '{ print $2 }')
    if [ "$usep" -ge "$disk_warn" ]; then
      echo "AHH SHIT!, MOVE SOME VOLUMES IDIOT…. \"$partition ($usep%)\" on $(hostname) as on $(date)"
    fi
  done
}
item_cpu
item_mem
#item_disk - comment out a call like this to skip that whole check, instead of commenting out the function line by line
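If commenting calls in and out gets tedious, a small dispatcher can pick the checks from arguments instead. This is a hypothetical variant, not how the original script works: `run_items` assumes the `item_cpu`, `item_mem` and `item_disk` functions are defined earlier in the same file.

```bash
#!/bin/bash
# Hypothetical dispatcher: run only the checks named on the command line.
# Assumes item_cpu / item_mem / item_disk are defined earlier in the same file.
run_items() {
  local item
  # No arguments means "run every check".
  (( $# )) || set -- cpu mem disk
  for item in "$@"; do
    if declare -F "item_$item" >/dev/null; then
      "item_$item"
    else
      echo "unknown check: $item (expected cpu, mem or disk)" >&2
      return 1
    fi
  done
}
```

With `run_items "$@"` in place of the calls at the bottom, `bash cpumem.sh cpu mem` would check only CPU and memory (the file name `cpumem.sh` is just an example). When the script is fed over SSH on stdin, `sudo bash -s -- cpu mem` passes the same arguments.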
Now the cool part.
Now, say you have a centrally managed jump host that can reach the rest of your estate. Ideally you would set up SSH keys on the hosts and make sure you have sudo permissions on those hosts.
We want to loop this script through an array of hosts, have it run on each one, and report all the findings back in one place. This is extremely handy if you're in a resource crunch.
This assumes you have SSH keys and sudo set up for your user.
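For the key setup itself, something like the following could run once from the jump host. It's a sketch assuming OpenSSH's `ssh-keygen` and `ssh-copy-id` and a plain-text hosts file with one hostname per line; `setup_keys` and the default key path are illustrative choices, not anything from the original. Granting sudo on each host is still a separate step.

```bash
#!/bin/bash
# Sketch: one-time SSH key setup from the jump host.
# Assumes OpenSSH's ssh-keygen / ssh-copy-id and a hosts file with one
# hostname per line. setup_keys is a hypothetical helper name.
setup_keys() {
  local hosts_file=$1
  local key=${2:-$HOME/.ssh/id_ed25519}
  local host

  # Generate a key pair only if one doesn't exist yet; ssh-keygen asks
  # for a passphrase, which you should set.
  if [[ ! -f $key ]]; then
    ssh-keygen -t ed25519 -f "$key" || return 1
  fi

  while IFS= read -r host; do
    # Skip blank lines and comments in the hosts file.
    [[ -z $host || $host == \#* ]] && continue
    # Adds the public key to the host's authorized_keys; prompts for your
    # password once per host. </dev/null keeps it off the host list.
    ssh-copy-id -i "$key.pub" "$host" </dev/null || echo "key copy failed: $host" >&2
  done < "$hosts_file"
}
```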
Create the script above on the jump host.
Next, create a file listing your hosts, one per line:
Server1
Server2
Server3
Server4
Run your for loop, with SSH keys and sudo already set up.
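Here is a minimal sketch of such a loop (not necessarily the exact one behind the output below). It assumes key-based SSH, passwordless (NOPASSWD) sudo on every host, and that the script has been saved on the jump host; `collect_status` and the file names in the usage line are hypothetical. Piping the script into `sudo bash -s` means nothing has to be copied to the hosts first.

```bash
#!/bin/bash
# Sketch: run the check script on every host and append all output to one log.
# Assumes key-based SSH and NOPASSWD sudo: there is no terminal for a sudo
# password prompt when the script is piped in on stdin.
collect_status() {
  local hosts_file=$1 script=$2 log=${3:-cpumem.status.DEV}
  local hosts host
  mapfile -t hosts < "$hosts_file"

  for host in "${hosts[@]}"; do
    # Skip blank lines and comments.
    [[ -z $host || $host == \#* ]] && continue
    {
      echo "=== $host ==="
      # BatchMode fails fast instead of prompting if a key is missing;
      # "bash -s" executes the script read from stdin on the remote side.
      ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" 'sudo bash -s' < "$script" \
        || echo "FAILED to check $host"
    } >> "$log" 2>&1
  done
}
```

Usage, with example file names: `collect_status servers.txt cpumem.sh`, then `cat cpumem.status.DEV`.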
Logfile: cpumem.status.DEV will be the log file that has all the info.
Output:
cpu ok!!! 3% use Currently dev1.nicktailor.com
17.07% ok!! MEM USAGE is beLOW 80% – Free-MEM-312.7GB dev1.nicktailor.com
5% /dev/mapper/VolGroup00-root
3% /dev/sda2
5% /dev/sda1
1% /dev/mapper/VolGroup00-var_log
72% 192.168.1.101:/data_1
28% 192.168.1.102:/data_2
80% 192.168.1.103:/data_3
AHH SHIT!, MOVE SOME VOLUMES IDIOT…. "192.168.1.104:/data4 (80%)" on dev1.nicktailor.com as on Fri Apr 30 11:55:16 EDT 2021
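With dozens of hosts the log gets long. Every warning line the script prints includes the host name, so a grep for the warning phrases is enough to get the short list. A minimal sketch: `show_problems` is a hypothetical name, and the patterns are copied from the script's own messages, so they need updating if you change that wording.

```bash
#!/bin/bash
# Sketch: print only the lines where a threshold was breached.
# Patterns match the messages printed by item_cpu, item_mem and item_disk.
show_problems() {
  local log=${1:-cpumem.status.DEV}
  grep -E 'cpu warning!!!|not good!!|MOVE SOME VOLUMES' "$log" \
    || echo "no thresholds breached in $log"
}
```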
Okay, so now I'm going to show you a dirty way to do it, because I'm just dirty. Say you're in a horrible place that doesn't use keys, because they're waiting to be hacked by password. 😛
DIRTY WAY: this assumes you have sudo permissions on the hosts.
Note: I do not recommend doing it this way if you are a newb. Doing it this way will basically log your password in your bash history, and if you don't know how to clean up after yourself, well... you're going to get owned.
I'm only showing you this because some cyber security "folks" believe that not using keys is easier to deal with, in some parallel realities I've visited... You can do exactly the same thing as above without keys, but you leave a massive trail behind you. That's why you should use SSH keys protected by passphrases.
Not Recommended for Newbies:
For loop AND passing your SSH password inside it.
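If you really must go password-only, one way to keep the password out of your bash history is the `sshpass` utility with its `-e` flag, which reads the password from the `SSHPASS` environment variable instead of the command line. This is a sketch under several assumptions: `sshpass` is installed on the jump host, the hosts' keys are already in your `known_hosts`, and sudo on each host doesn't ask for a password (a sudo prompt isn't handled). `collect_status_pw` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: password-based version of the loop, using sshpass -e so the
# password comes from $SSHPASS rather than the command line or history.
# Assumes sshpass is installed, host keys are already known, and NOPASSWD sudo.
collect_status_pw() {
  local hosts_file=$1 script=$2 log=${3:-cpumem.status.DEV}
  local hosts host
  if [[ -z ${SSHPASS:-} ]]; then
    echo "set SSHPASS first: read -rsp 'SSH password: ' SSHPASS; export SSHPASS" >&2
    return 1
  fi
  mapfile -t hosts < "$hosts_file"
  for host in "${hosts[@]}"; do
    [[ -z $host || $host == \#* ]] && continue
    sshpass -e ssh -o ConnectTimeout=5 "$host" 'sudo bash -s' < "$script" >> "$log" 2>&1 \
      || echo "FAILED to check $host" >> "$log"
  done
}
```

Usage, with example file names: `read -rsp 'SSH password: ' SSHPASS; export SSHPASS`, run `collect_status_pw servers.txt cpumem.sh`, then `unset SSHPASS` when you're done.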
Log file: cpumem.status.DEV will be the log file that has all the info.