PRM (Process Resource Monitor) on Linux Servers

PRM (Process Resource Monitor) monitors the process table on a given system and matches process IDs against resource limits set in the config file or in per-process rules. Processes that match or exceed the set limits are logged and killed. PRM also includes e-mail alerts, a kernel logging routine, and more.

Installation

Download the package and extract it:
# wget http://www.rfxnetworks.com/downloads/prm-current.tar.gz
# tar xvfz prm-current.tar.gz
# cd prm-0.5/

And run the install.sh script:
# ./install.sh

All projects on rfxnetworks.com are free for use and distribution under the GNU GPL. Funding for the continued development of and research into this and other projects depends solely on public contributions and donations.

Configuration

The prm installation is located at ‘/usr/local/prm’, and the configuration file is labeled ‘conf.prm’.

Open the ‘/usr/local/prm/conf.prm’ file with your preferred editor. There is an array of options in this file but we will only be focusing on the main variables.

Let's skip down to the user e-mail alerts section and set the USR_ALERT value to ‘1’ to enable alerts.
# enable user e-mail alerts [0=disabled,1=enabled]
USR_ALERT="1"

And configure our e-mail addresses for alerts:
# e-mail address for alerts
USR_ADDR="root, you@domain.com"

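Choose which load average PRM checks. LC takes 1, 2, or 3, which the config comment labels as the 5, 10, and 15 minute averages (Linux itself normally reports load over 1, 5, and 15 minutes, so these most likely select the first, second, or third reported value). The selected value is compared against the MIN_LOAD option described next.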
# check 5,10,15 minute load average. [1,2,3 respective of 5,10,15]
LC="1"

PRM can require a minimum load average before it runs. If the load is below this value, PRM will not run. Setting the value to zero forces the script to always run, but this should not be needed.
# min load level required to run (decimal values unsupported)
MIN_LOAD="1"
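The LC and MIN_LOAD check can be pictured with the following minimal sketch. It is not PRM's actual code: the function name load_decision is hypothetical, and truncating the load to its integer part is an assumption based on the note that decimal values are unsupported.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from PRM itself.
# load_decision LOADAVG_LINE FIELD MIN_LOAD
# Prints "run" when the chosen load field is >= MIN_LOAD, otherwise "skip".
load_decision() {
    # Pick field 1, 2 or 3 from a /proc/loadavg-style line
    load=$(printf '%s\n' "$1" | awk -v f="$2" '{ print $f }')
    # Assumption: compare only the integer part of the load
    load_int=${load%%.*}
    if [ "$load_int" -ge "$3" ]; then
        echo run
    else
        echo skip
    fi
}

# Canned example line; on a live system pass "$(cat /proc/loadavg)" instead
load_decision "2.15 1.40 0.90 3/512 12345" 1 1
```

With MIN_LOAD set to 0 the comparison always succeeds, which matches the "always run" behaviour described above.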

WAIT is the pause, in seconds, between rechecks of a flagged process. Each recheck that still finds the process over its limits increments the trigger counter, so WAIT multiplied by KILL_TRIG gives the approximate time before a process is killed (with the defaults, 10 × 3 = 30 seconds).
# seconds to wait before rechecking a flagged pid (pid's noted resource
# intensive but not yet killed).
WAIT="10"

KILL_TRIG is the number of times a process must be flagged before it is killed; see the WAIT description above.
# counter limit that a process must reach prior to kill. The counter value
# increases for a process flagged resource intensive on rechecks.
KILL_TRIG="3"
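When tuning these two values it can help to work out the resulting delay before a kill. The snippet below is only an arithmetic sketch; kill_delay is a hypothetical helper and not part of PRM.

```shell
#!/bin/sh
# Values as they appear in conf.prm
WAIT=10
KILL_TRIG=3

# kill_delay WAIT KILL_TRIG
# Prints the approximate number of seconds before a flagged process is killed.
kill_delay() {
    echo $(( $1 * $2 ))
}

echo "Approximate seconds before kill: $(kill_delay "$WAIT" "$KILL_TRIG")"
```

Raising either value gives a busy but legitimate process more time to settle before PRM acts on it.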

The max percentage of CPU a process should be allowed to use before PRM flags it for killing.
# Max CPU usage readout for a process – % of all cpu resources (decimal values unsupported)
MAXCPU="35"

The max percentage of system memory a process should be allowed to use before PRM flags it for killing.
# Max MEM usage readout for a process – % of system total memory (decimal values unsupported)
MAXMEM="15"
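To get a feel for which processes these limits would catch before relying on PRM, a threshold check can be sketched over ps output. This is an illustration only and not how PRM is necessarily implemented; flag_heavy is a hypothetical helper, and the input is assumed to be lines of "pid %cpu %mem command".

```shell
#!/bin/sh
MAXCPU=35
MAXMEM=15

# flag_heavy MAXCPU MAXMEM
# Reads "pid pcpu pmem comm" lines on stdin and prints "pid comm" for every
# process whose CPU or memory percentage matches or exceeds its limit.
flag_heavy() {
    awk -v maxcpu="$1" -v maxmem="$2" '
        $2 + 0 >= maxcpu || $3 + 0 >= maxmem { print $1, $4 }
    '
}

# Canned sample data; on a live system one could instead run:
#   ps -eo pid=,pcpu=,pmem=,comm= | flag_heavy "$MAXCPU" "$MAXMEM"
printf '%s\n' \
    "101 40.2 1.0 php" \
    "102 3.0 2.5 sshd" \
    "103 1.0 15.0 java" | flag_heavy "$MAXCPU" "$MAXMEM"
```

In the sample data the php and java entries are printed, since one exceeds the CPU limit and the other meets the memory limit.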

That is it. Adjust the MAXCPU and MAXMEM limits to suit your system if needed; the defaults should be fine for most servers.

Usage

The executable program resides in ‘/usr/local/prm/prm’ and ‘/usr/local/sbin/prm’. The prm executable can receive one of two arguments:

-s Standard run
-q Quiet run

The log path for prm is ‘/usr/local/prm/prm_log’, and pid-specific logs are stored in ‘/usr/local/prm/killed/’.

A default cronjob for PRM is installed to ‘/etc/cron.d/prm’, and is configured to run once every 5 minutes.

An ignore file is provided for excluding processes based on string rules. It is located at ‘/usr/local/prm/ignore’ and takes one ignore string per line. By default the strings ‘root’, ‘named’, and ‘postgre’ are ignored; PRM was not intended to monitor root processes but rather userland tasks. It could watch root processes if the ‘root’ line were removed from the ignore file, but this is strongly discouraged.
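The effect of line-separated ignore strings can be sketched with grep. This assumes plain substring matching against each process line, which may differ from how PRM actually applies its rules; filter_ignored and the temporary file are hypothetical and used only for illustration.

```shell
#!/bin/sh
# filter_ignored IGNORE_FILE
# Reads process lines on stdin and drops any line containing one of the
# fixed strings listed (one per line) in IGNORE_FILE.
filter_ignored() {
    grep -v -F -f "$1"
}

# Build a throwaway ignore file with the default strings mentioned above
ignore_file=$(mktemp)
printf '%s\n' root named postgre > "$ignore_file"

# Only the line that matches none of the ignore strings is printed
printf '%s\n' \
    "root 1 init" \
    "alice 2301 php-cgi" \
    "named 812 named" | filter_ignored "$ignore_file"

rm -f "$ignore_file"
```

Under this assumption, a new exclusion is a matter of appending one more line to ‘/usr/local/prm/ignore’.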
