Even though RAM is cheap these days, there are still conditions under which your Linux server can run out of it completely. Just the other day, I noticed that my main hosting server had gone down: I could still ping it, but DNS, SSH and Apache2 were not responding, so I had to call the datacenter to have them reboot the system. After analyzing the system, I realized that some unknown process had eaten up all the memory and all the swap space!
I use Cacti to monitor my server’s performance, so I could see that it took a nose-dive after getting hit with a few million requests in a couple of days (these were raw mobile device detection requests). Once the system ran out of memory, it started swapping. This lasted for about 2 weeks before the swap space was depleted, after which the server struggled on for another 18 hours. By then it was critically starved of memory, and oom_killer (the Out of Memory Killer) was invoked to start killing processes in a vain attempt to free up memory.
The oom_killer seems to have very little intelligence about which processes to kill first, as sshd and named were early victims. After this episode, I decided to create a script that adjusts the order in which key processes are killed, to make sure I still have access to the server in the event of a memory leak or OOM condition.
oom_adjust.sh to the rescue!
I’ve created oom_adjust.sh to adjust (and periodically readjust via cron) the order in which processes may be killed by oom_killer. The script uses a config file called oom_adjust.conf (by default it looks for it in /etc), in which you list process names and the oom_adj value you want to give each one. Possible values range from -17 (never kill) to 15 (kill first).
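Since a typo in the config could end up writing an out-of-range value, a small sanity check can help. This is only a minimal sketch and not part of the original script; the function name is_valid_oom_adj is made up for illustration. It accepts only whole numbers from -17 to 15.

```shell
#!/bin/sh
# Hypothetical helper (not part of oom_adjust.sh): succeed only if $1 is an
# integer in the oom_adj range of -17 to 15.
is_valid_oom_adj() {
    case "$1" in
        ''|-) return 1 ;;        # empty, or a lone minus sign
        *[!0-9-]*) return 1 ;;   # any character other than digits and '-'
        ?*-*) return 1 ;;        # a '-' anywhere but the first position
    esac
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

# Try a few values, including two that should be rejected
for value in -17 0 15 16 abc; do
    if is_valid_oom_adj "$value"; then
        echo "$value: ok"
    else
        echo "$value: rejected"
    fi
done
```

A check like this could be called on each value before it is written to /proc, so that a bad line in the config is reported instead of silently failing.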
oom_adjust.conf
# Adjust process oom_adj values so they are more or less likely to be killed in an oom event
# procname oom_adj
# Keep sshd ALIVE
sshd -17
# DNS is very important to me too
named -8
# I'd prefer that MySQL stays alive, but it's not required
mysqld -1
# Apache2 is a memory hog, but I'll give it a fighting chance
# I'm giving it 0 since the workers will respawn at 0 anyway
apache2 0
# Sphinx search is cool, but I can live without it if an oom occurs
searchd 3
# Memcache is in the same boat as Sphinx search
memcached 3
# I only use mongodb for testing on this server
mongod 5
# It would be nice if smtpd stayed up, so I still get alerts
smtpd 5
# These services can be killed first
pure-ftpd 10
pure-ftpd-mysql 10
snmpd 10
fail2ban-server 10
ntpd 10
authdaemond 10
saslauthd 10
qmgr 10
pickup 10
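To see how a config like the one above is read, here is a rough dry-run sketch that prints each name and value without touching /proc. The function name preview_oom_conf and the temporary file are my own illustration, not something the original script provides.

```shell
#!/bin/sh
# Hypothetical dry run: print each "procname oom_adj" pair from a config file
# without touching /proc. Comment lines and blank lines are skipped.
preview_oom_conf() {
    # The "|| [ -n "$name" ]" part keeps a final line that lacks a newline
    while read -r name adj rest || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        echo "$name -> $adj"
    done < "$1"
}

# Example run against a throwaway file holding a few entries
conf=$(mktemp)
printf '%s\n' '# Keep sshd ALIVE' 'sshd -17' '' 'named -8' > "$conf"
preview_oom_conf "$conf"
rm -f "$conf"
```

Running it against the real file (preview_oom_conf /etc/oom_adjust.conf) is a quick way to confirm that every line parses the way you expect before the values are applied.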
Here is the main script, which I’ve symlinked into /usr/sbin for convenience.
oom_adjust.sh
#!/bin/sh
# oom_adjust.sh Out of Memory Killer (oom_killer) Priority Adjustment Script
# by Steve Kamerman <stevekamerman@gmail.com>, Jan 2011
# https://www.stevekamerman.com
OOM_ADJ_FILE=/etc/oom_adjust.conf
if [ ! -f "$OOM_ADJ_FILE" ]; then
    echo "oom_adjust.sh: config file $OOM_ADJ_FILE was not found" >&2
    exit 1
fi
echo "oom_adjust.sh is setting oom_killer priorities"
# Read "procname oom_adj" pairs; any amount of whitespace may separate them
while read -r NAME ADJ REST || [ -n "$NAME" ]; do
    # Skip blank lines and comments
    case "$NAME" in
        ''|'#'*) continue ;;
    esac
    echo " Setting $NAME to $ADJ"
    for PID in $(pidof "$NAME"); do
        # The process may have exited since pidof ran, so check before writing
        [ -w "/proc/$PID/oom_adj" ] && echo "$ADJ" > "/proc/$PID/oom_adj"
    done
done < "$OOM_ADJ_FILE"
exit 0
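After running the script, it can be reassuring to confirm that the values actually landed. The sketch below is an assumption-laden helper, not part of oom_adjust.sh: show_oom_adj and the PROC_ROOT override are names I made up, and it simply reads the oom_adj and oom_score files the kernel exposes under /proc for each PID.

```shell
#!/bin/sh
# Hypothetical helper: print the current oom_adj and oom_score for each PID given.
# PROC_ROOT is an illustrative override so the function can be tried on a fake tree.
show_oom_adj() {
    for pid in "$@"; do
        dir="${PROC_ROOT:-/proc}/$pid"
        if [ -r "$dir/oom_adj" ]; then
            echo "$pid oom_adj=$(cat "$dir/oom_adj") oom_score=$(cat "$dir/oom_score" 2>/dev/null)"
        else
            echo "$pid: no readable oom_adj (process gone?)" >&2
        fi
    done
}

# Example: check what sshd ended up with after running oom_adjust.sh
show_oom_adj $(pidof sshd)
```

Kernels from 2.6.36 onward also expose /proc/<pid>/oom_score_adj (range -1000 to 1000) and treat oom_adj as deprecated; this sketch sticks to oom_adj because that is what the script writes.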
If your distro uses /etc/rc.local, you can call this script from there to apply the adjustments on startup. I also run it on my servers via crontab every night to keep the processes in check, in case they have respawned or restarted with a different PID.
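As a rough illustration of that setup, the two fragments might look like this. The 03:00 schedule and the output redirection are my own assumptions, and the path assumes the script is reachable as /usr/sbin/oom_adjust.sh as described earlier.

```text
# /etc/rc.local (place before the final "exit 0", if there is one)
/usr/sbin/oom_adjust.sh

# root's crontab (edit with "crontab -e" as root): run nightly at 03:00
0 3 * * * /usr/sbin/oom_adjust.sh >/dev/null 2>&1
```

The cron entry has to run as root, since writing a negative oom_adj value for another user's process requires root privileges.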