Handling OOM issues gracefully on linux

I figured out that there is almost no simple way to handle situation when your system is running out of memory. There is a subsystem called OOM killer implemented in the kernel, but that thing is truly dangerous. You probably never want to get your system to point when it is being used, because it might leave the system in unusable state.

For this reason I created a tool that allows everyone who uses linux to handle this kind of situation gracefully, it’s called terminator daemon, and it watches the system and eventually kills specific kinds of processes, which can be easily defined by an administrator.

You can find more about this tool on https://github.com/benapetr/terminator

Basic idea is, that when your system is running out of memory, this daemon will pick processes that can be “safely” killed without any impact on system and takes them out according to your settings. You can even prevent it from killing certain processes, for example: If you were running mysql server, you probably wouldn’t care about interactive shells being killed, but you would care if mysql was killed.

In such situation you can tell terminatord to never kill mysql, but other “user processes” can be safely killed.

In addition you can even set terminatord to execute some command on every kill, for example it can send you a mail that something wrong is going on with your system.

There are several examples on GitHub, here I copy pasted some of them:

Kill all processes that use more than 400mb of ram, except for user apache and root

# get uid of apache
grep apache /etc/passwd
# let's say apache is user 20, now we test it
terminatord -dvvv --soft 400 --hard 420 --ignore 20 --dry
# if everything is ok
terminatord -d --soft 400 --hard 420 --ignore 20 --kill

Kill random processes in case that system has less than 100mb of free ram, except for root

# test it
terminatord -dvvv --ssoft 100 --shard 60 --quiet --dry

Combine both examples, in this example unlike the previous one, apache processes will not be killed, when system go OOM

# let's say apache is user 20, now we test it
terminatord -dvvv --soft 400 --hard 420 --ignore 20 --shard 60 --ssoft 100 --dry
# if everything is ok
terminatord -d --soft 400 --hard 420 --ignore 20 --kill --shard 60 --ssoft 100

No Comments

You can leave the first : )

Leave a Reply

Your email address will not be published. Required fields are marked *