I was looking for a script like this and for some reason was unable to find one. I’m sure every sysadmin has something like this, but maybe not.

It’s helpful for cases when you have only one server without any kind of load balancing and one day it simply runs out of memory (or disk space), crashes, and does not let you do anything unless you can get to the hardware and pull the power to force a reboot.

It’s obviously happened to me before :)

The script is run by cron as root. Here’s the entry from root’s crontab (edit it with crontab -e):

# Run this every minute, it's not very heavy
* * * * * /root/monitor-resources.sh 1> /dev/null

And the contents of monitor-resources.sh:

#!/bin/bash
# Script to run automatically, periodically (as root via cron)
# to check if the system is running dangerously low on memory
# or disk space.
# Author: Andrew Smith, http://littlesvr.ca
#

# Where to send email notifications:
EMAILADDR=admin.email@address.com
# Send an email when this percentage of memory is used:
MEMTHRESHOLD=80
# Send an email and reboot when this percentage of memory is used:
MEMCRITICALTHRESHOLD=95
# Send an email when the filesystem has less than this many
# GB of free space left:
DISKTHRESHOLD=2
# Which filesystems to check for low disk space:
LVLIST="/ /home /var/log /srv/sql /srv/httpd /srv/vcs"
# These two are the files used to prevent the script from resending
# notifications over and over again:
WARNEDMEM=/tmp/memorywarningemailsent
# Only one email is sent even if multiple filesystems are
# low on disk space.
WARNEDDISK=/tmp/diskwarningemailsent

PERCENTMEMUSED=`free -m | awk 'NR==2{printf "%d\n", $3*100/$2 }'`
if [[ $PERCENTMEMUSED -gt $MEMCRITICALTHRESHOLD ]]
then
  echo -e "Memory currently used: $PERCENTMEMUSED%. This is critical and the system\n"\
"will now reboot since it has no other way to deal with it.\n\n"\
"Processes:\n" \
"`ps axfu`" | mail -s "Have run out of memory, rebooting" $EMAILADDR
  reboot
  exit
fi

if [[ $PERCENTMEMUSED -gt $MEMTHRESHOLD ]]
then
  if [[ ! -f $WARNEDMEM ]]
  then
    echo "Memory currently used: $PERCENTMEMUSED%. No more messages will be sent until $WARNEDMEM is deleted." | mail -s "Running out of memory" $EMAILADDR
    touch $WARNEDMEM
  fi
fi

for LV in $LVLIST
do
  # df -kP reports free space in 1K blocks, one line per filesystem.
  # (df -h mixes units, so 500M free would be misread as 500 GB.)
  GIGSFREE=`df -kP | awk '$NF=="'$LV'"{printf "%d", $4/1048576}'`

  if [[ $GIGSFREE -lt $DISKTHRESHOLD ]]
  then
    if [[ ! -f $WARNEDDISK ]]
    then
      echo "$LV has "`df -kP | awk '$NF=="'$LV'"{printf "%.1fGB", $4/1048576}'`" of disk space left. No other messages about disk space on any volumes will be sent until $WARNEDDISK is deleted." | mail -s "Running out of disk space" $EMAILADDR
      touch $WARNEDDISK
    fi
  fi
done
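
If you want to try the two parsing steps without waiting for a real emergency, here is a rough sketch that runs them against canned output. The helper names (percent_mem_used, gigs_free) and the sample numbers are made up for illustration and are not part of the script above. The disk helper assumes df -kP style output, where sizes are in 1K blocks.

```shell
#!/bin/bash
# Sketch only: the memory and disk parsing steps, wrapped in
# hypothetical helper functions that read command output on stdin.

# Percentage of memory used, from `free -m` output.
# Line 2 is the "Mem:" row; $2 is total and $3 is used.
percent_mem_used() {
  awk 'NR==2{printf "%d\n", $3*100/$2}'
}

# Whole GB free on mount point $1, from `df -kP` output.
# $4 is the available space in 1K blocks; 1048576 of them make 1 GB.
gigs_free() {
  awk -v mp="$1" '$NF==mp{printf "%d\n", $4/1048576}'
}

# Made-up sample output, for illustration only.
SAMPLE_FREE='              total        used        free      shared  buff/cache   available
Mem:           8000        6600         300         100        1100        1200
Swap:          2047         512        1535'

SAMPLE_DF='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 38010000   1572864      97% /
/dev/sda2        103081248 20000000  78643200      21% /home'

echo "$SAMPLE_FREE" | percent_mem_used   # prints 82
echo "$SAMPLE_DF" | gigs_free /          # prints 1
echo "$SAMPLE_DF" | gigs_free /home      # prints 75
```

With the thresholds from the script, this sample would trigger the memory warning (82 is greater than 80) and the disk warning for / (1 is less than 2), but not the reboot. On a live system you would pipe the real commands in instead, e.g. free -m | percent_mem_used.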

Come to think of it: why isn’t there a script like this bundled with every distro? Realistically, when you run out of memory the system is fucked up, even if you have a swap partition. Both with and without a swap partition, I was only able to recover from running out of memory because my server is physically accessible. And even so it was very hard: with a swap partition it took me half an hour to log in and run the two commands to find out what was causing the problem and clean it up.