Send Oracle performance alerts based on oramon performance data

Links: http://www.dbatools.net/experience/oramon-performance-perf-alert.html

    For a critical online transaction processing (OLTP) database, it's very important to find an effective way to monitor database performance and alert on performance problems. Because there are many concurrent sessions, any small problem can quickly make the database worse. If it is not resolved in time, the database may hang, the application may hang, and the service is lost until we restart the database server or the application servers. Some companies hire NOC DBAs to perform 7x24 monitoring. Others install monitoring tools such as Nagios to check server load, CPU usage, etc. every 5 minutes and send alerts to the DBAs in time.

    In fact, the key point is not whether you have 7x24 NOC DBAs. You need a good tool to gather database performance data, and then you need alert rules built on that data. Nagios is not good enough: a 5-minute check interval is too coarse, and it's hard to build good alert rules on the data it collects. The oramon utility can gather the key performance data for you, so why not build good alert rules on top of it and get performance monitoring done well?

    I rewrote the data gathering tool in Perl to integrate the performance alert rules and to support different RDBMSs as the performance database. Because oramon gathers performance data every 10 seconds, the alert rules are not based on a single data point but on the most recent 5 points, which greatly improves the quality of the alerts.
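    The 5-point idea can be sketched as follows. This is a minimal Python illustration, not the author's Perl code. The class name SlidingWindowRule is hypothetical, and the sketch assumes that a rule fires only when all of the last 5 samples exceed the threshold.

```python
from collections import deque


class SlidingWindowRule:
    """Hypothetical helper: fire only when the last `window` samples
    all exceed `threshold` (assumed semantics of the 5-point rule)."""

    def __init__(self, name, threshold, window=5):
        self.name = name
        self.threshold = threshold
        self.window = window
        # deque with maxlen discards the oldest sample automatically
        self.samples = deque(maxlen=window)

    def add_sample(self, value):
        """Record one 10-second sample; return True if the rule fires."""
        self.samples.append(value)
        return (len(self.samples) == self.window
                and all(v > self.threshold for v in self.samples))


rule = SlidingWindowRule("enqueue_waits", threshold=10)
fired = [rule.add_sample(v) for v in [3, 15, 12, 14, 11, 13, 2]]
print(fired)  # [False, False, False, False, False, True, False]
```

    A single spike never fires the rule. It fires only once five consecutive samples are above the threshold, and it clears as soon as one sample drops back below.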

    After one day of testing, I added the following performance alert rules to my databases. An alert is sent when, over the last 5 data points:

    - the number of active sessions is greater than a given value (different for each database);
    - the number of sessions waiting on enqueues is greater than 10 (I think enqueue waits should be avoided in OLTP systems);
    - the CPU WIO (I/O wait) percentage is greater than 40;
    - the CPU USER percentage is greater than 80;
    - the number of active parallel slave sessions is greater than 4.
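    One way to express these rules is as a small threshold table checked against the recent samples. This is a hypothetical Python sketch. The metric names (active_sessions, enqueue_waits, cpu_wio_pct, cpu_user_pct, px_slaves) and the active-session limit of 30 are assumptions. The other thresholds come from the rules listed.

```python
# Thresholds from the rules above; ACTIVE_SESSIONS_LIMIT is a
# hypothetical per-database value, and the metric names are assumed.
ACTIVE_SESSIONS_LIMIT = 30

RULES = {
    "active_sessions": ACTIVE_SESSIONS_LIMIT,
    "enqueue_waits": 10,
    "cpu_wio_pct": 40,
    "cpu_user_pct": 80,
    "px_slaves": 4,
}

WINDOW = 5  # number of recent 10-second samples to check


def check_rules(history):
    """Return the names of rules whose metric exceeded its threshold
    in every one of the last WINDOW samples.

    `history` is a list of dicts, oldest first, one per sample.
    """
    recent = history[-WINDOW:]
    if len(recent) < WINDOW:
        return []  # not enough data yet
    return [name for name, limit in RULES.items()
            if all(sample[name] > limit for sample in recent)]


history = [
    {"active_sessions": 12, "enqueue_waits": 14, "cpu_wio_pct": 45,
     "cpu_user_pct": 50, "px_slaves": 0}
    for _ in range(5)
]
history[2]["cpu_wio_pct"] = 20  # one dip keeps the WIO rule quiet
print(check_rules(history))  # ['enqueue_waits']
```

    A table like this keeps the per-database part (the active-session limit) in one place while the evaluation logic stays the same for every database.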

    With the data gathered by oramon, we can add more alert rules based on swap in/out, physical reads, physical writes, the average db file sequential read time, and the IOPS (the sum of the database read and write event waits). Don't send performance alerts based only on load average or CPU utilization.
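    Two of these derived metrics can be computed from a pair of samples taken 10 seconds apart. This Python sketch assumes that each sample holds cumulative TOTAL_WAITS and TIME_WAITED_MICRO counters per wait event, in the style of Oracle's V$SYSTEM_EVENT view. The function name interval_metrics and the READ_EVENTS/WRITE_EVENTS lists are illustrative choices, not oramon's actual data layout.

```python
# Illustrative event lists (assumption): which waits count toward IOPS
# is a site-specific choice.
READ_EVENTS = ["db file sequential read", "db file scattered read"]
WRITE_EVENTS = ["db file parallel write", "log file parallel write"]


def interval_metrics(prev, curr, interval_s=10.0):
    """Derive I/O metrics for one sampling interval.

    `prev` and `curr` map a wait event name to a cumulative
    (total_waits, time_waited_micro) tuple taken `interval_s` seconds
    apart (assumed layout). Returns (avg sequential read ms, IOPS).
    """
    def delta(event):
        waits0, time0 = prev.get(event, (0, 0))
        waits1, time1 = curr.get(event, (0, 0))
        return waits1 - waits0, time1 - time0

    seq_waits, seq_time_us = delta("db file sequential read")
    # Average single-block read latency in ms; 0.0 if no reads happened.
    avg_seq_ms = seq_time_us / seq_waits / 1000.0 if seq_waits else 0.0

    # IOPS approximated as read + write event waits per second.
    io_waits = sum(delta(event)[0] for event in READ_EVENTS + WRITE_EVENTS)
    return avg_seq_ms, io_waits / interval_s


prev = {"db file sequential read": (1000, 5_000_000),
        "db file scattered read": (200, 2_000_000),
        "db file parallel write": (300, 1_000_000)}
curr = {"db file sequential read": (1500, 7_500_000),
        "db file scattered read": (250, 2_600_000),
        "db file parallel write": (350, 1_200_000)}
avg_ms, iops = interval_metrics(prev, curr)
print(avg_ms, iops)  # 5.0 60.0
```

    Either value can then feed the same 5-point check as the other rules. For example, you could alert only when the average sequential read time stays above a chosen limit for five consecutive samples.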

    By embedding the alert rules in the data gathering tool, we get a performance alert quickly, right after a problem starts, because the rules are applied every 10 seconds. Checking the last 5 data points eliminates many unnecessary alerts. Yesterday we received alerts from the enqueue wait rule and resolved the problem quickly, without any impact on the service. With effective alert rules like these, DBAs can keep their databases under control and sleep well every night.
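    Here is a small illustration of why the 5-point check cuts down on noise. It compares a single-point rule with a 5-point rule on a made-up series of enqueue-wait counts sampled every 10 seconds. The function alert_ticks is hypothetical, and so is its de-duplication step (only the first tick of each firing episode counts as an alert).

```python
def alert_ticks(samples, threshold, window):
    """Return the sample indices at which the rule starts firing.

    The rule fires when the last `window` samples all exceed
    `threshold`; window=1 is a plain single-point check. Only the
    first tick of each firing episode is reported (assumed dedup).
    """
    starts = []
    firing = False
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        now = len(recent) == window and all(v > threshold for v in recent)
        if now and not firing:
            starts.append(i)
        firing = now
    return starts


# Made-up enqueue-wait counts: two short spikes, then a sustained problem.
enqueue_waits = [2, 25, 3, 4, 30, 2, 3, 12, 14, 18, 20, 22, 25, 5]
print(alert_ticks(enqueue_waits, threshold=10, window=1))  # [1, 4, 7]
print(alert_ticks(enqueue_waits, threshold=10, window=5))  # [11]
```

    The single-point rule sends three alerts, two of them for harmless spikes. The 5-point rule sends one alert, for the sustained episode only. The trade-off is a delay: in this series, the alert arrives 4 samples (40 seconds) after the sustained problem begins.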
