ProTop Newsletter October 2021

Welcome to our monthly ProTop newsletter, where you’ll find what’s new, tips and tricks, and other cool ways to use both the free and commercial versions of ProTop.

 

Today we’re going to take a deep dive into ProTop’s advanced alerting functionality. While the free version of ProTop provides a smorgasbord of data and insight into your OpenEdge environment, the commercial version adds features like alerting that help you sleep better at night.

Overview

Everyone has a default alerting system: their users. When there’s a problem, the users call the help desk, the help desk investigates and escalates, much scrambling ensues, and eventually the issue is resolved, or life goes on until the next time. This reactive model has a cost: lost productivity, lost revenue, lower customer satisfaction, burned-out staff, tarnished company reputation…

Rather than reacting “after the horse has left the barn”, ProTop strives to proactively alert you to looming issues before the business notices that there is a problem brewing.

Deployment architecture

ProTop is made up of two components:

  1. 4GL client: this is the 4GL application that you download and install in your OpenEdge environment. It includes the free ProTop RT console and the monitoring and advanced alerting agent
  2. Web portal: this is the web component of ProTop, where alerts and telemetry data are stored long term. It can be deployed on premises or in the cloud, in private or shared infrastructure

ProTop RT is free, and the web portal has a mix of both free and commercial features.

Data collectors

The data that you see in ProTop RT’s panels is provided by a series of data collector programs: whenever you open one of the 50+ panels, the data it displays comes from a data collector.

When alerting is enabled, the DBA configures which data collectors are relevant to each monitored resource, and enables those data collectors in the agent configuration file, pt3agent.cfg. For example, if you are not using OpenEdge Replication, then there is no need to activate the replAgent data collector.

Alert anatomy 101

Alerting is configured locally, on the monitored server, using the alert.cfg file. ProTop installs with a default alert.cfg, but you should not modify this file as it will get overwritten by the next update. Instead, make a copy and name it using our hierarchical naming convention.

Each line in alert.cfg is made up of 8 elements. They are summarized below and discussed in detail on the Elements of an Alert Definition page.

  1. Metric name: There are over a thousand at last count
  2. Metric type: numeric or character
  3. Operator: >, <, =, <>, <=, >=
  4. Threshold value for the metric being tested
  5. Sensitivity of the trigger: it can be numeric (fire after # or more occurrences) or a ratio X:Y (fire after X occurrences over Y samples)
  6. Notification (nag) level: How often to fire the alert
  7. Message text: Configurable alert message
  8. Action: This is where it gets interesting – see below

Here are a few examples:

afterImaging char <> "enabled" "" "daily" "&1 &2 &3" alarm

If the afterImaging metric does not contain the string “enabled”, generate an alarm. But only bug me about this once per day.
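To make the eight elements concrete, here is a minimal Python sketch that splits a definition line into named fields. It is only an illustration: ProTop reads alert.cfg itself, and the sketch assumes shell-style quoting, which happens to fit the examples in this newsletter. The names AlertDefinition and parse_alert_line are made up for this example and are not part of ProTop.

```python
import shlex
from typing import NamedTuple


class AlertDefinition(NamedTuple):
    """Hypothetical container for the 8 elements of an alert.cfg line."""
    metric: str
    metric_type: str
    operator: str
    threshold: str
    sensitivity: str
    nag: str
    message: str
    actions: list


def parse_alert_line(line: str) -> AlertDefinition:
    # shlex.split honours double quotes, so "&1 &2 &3" stays one field
    # and "" becomes an empty string (assumed quoting rules).
    fields = shlex.split(line)
    if len(fields) != 8:
        raise ValueError(f"expected 8 elements, got {len(fields)}: {line!r}")
    *head, actions = fields
    # The last element may hold several comma-separated actions.
    return AlertDefinition(*head, actions.split(","))


definition = parse_alert_line(
    'afterImaging char <> "enabled" "" "daily" "&1 &2 &3" alarm'
)
print(definition)
```

Here the empty sensitivity field shows up as an empty string, and the single action arrives as a one-item list.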

dbBkUpFull num > 100900 "" "daily" "&1 &2 &3" alert
dbBkUpFull num > 129600 "" "daily" "&1 &2 &3" alarm

dbBkUpFull is the elapsed time in seconds since the last full backup using probkup. If its value is greater than 100,900 seconds (just over 28 hours), generate an alert. If its value is greater than 129,600 seconds (36 hours), generate an alarm.

osRd num > 15000 "2:3" "hourly" "OS reads/sec &1 &2 &3" alert

If the number of operating system reads exceeds 15,000 per second in 2 out of 3 samples, generate an alert. Do not repeat the alert until an hour has elapsed since the last one.
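The "2:3" sensitivity is easiest to picture as a sliding window over the most recent samples. The Python sketch below models that idea for the osRd example. It is our illustration of the X:Y rule as described above, not ProTop's actual code, and the RatioTrigger class and the sample values are invented for the example.

```python
from collections import deque


class RatioTrigger:
    """Fire when at least `hits` of the last `window` samples breach the threshold."""

    def __init__(self, threshold: float, hits: int, window: int):
        self.threshold = threshold
        self.hits = hits
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.recent = deque(maxlen=window)

    def sample(self, value: float) -> bool:
        self.recent.append(value > self.threshold)
        return sum(self.recent) >= self.hits


# osRd num > 15000 "2:3": two breaches within the last three samples.
trigger = RatioTrigger(threshold=15000, hits=2, window=3)
results = [trigger.sample(v) for v in [16000, 9000, 17000, 8000, 7000]]
print(results)  # [False, False, True, False, False]
```

A single spike does not fire the alert. It takes a second breach within the same three-sample window. The sketch leaves out the "hourly" nag level, which would additionally suppress repeats.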

Alert Levels

ProTop supports 4 increasingly critical alert levels: info (blue), alert/warning (yellow), alarm (orange), page (red). As the DBA, you configure which combinations of metrics and thresholds trigger which level of alert.

  • Info (blue): a purely informational message, such as “Backup successfully completed”
  • Warning (yellow): individually, these alerts may not require immediate action, but collectively they may inform you of an interesting trend or pattern
  • Alarm (orange): These are alerts that typically require some action on the part of the DBA or are of special interest to the DBA
  • Page (red): Page alerts are critical alerts. The DBA needs to act immediately before the business perceives the impact of the problem

There are also “script” alerts (see the Script Responses section below), notifying the DBA when an automated script response was executed.

Pro Tip:

All alerts are uploaded and displayed in the ProTop web portal, and you should make it a habit to check the Alert Dashboard a few times per day. Info (blue) and warning (yellow) alerts should only reside in the Alerts Dashboard. No additional action is typically warranted. Alarms (orange) are usually sent to a more visible channel: email, Slack, Teams, ServiceNow (or whichever help desk application you use). Page (red) alerts should always go to an I-WANT-TO-KNOW-NOW service. This could mean an SMS message directly to your phone or a paging service – at White Star Software we use PagerDuty.
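If it helps to see that routing advice at a glance, it can be written down as a small lookup table. The Python sketch below is purely illustrative: the channel names and the channels_for helper are hypothetical, and actual notification routing is configured in ProTop, not in Python.

```python
# Hypothetical routing table mirroring the advice above.
ROUTES = {
    "info": ["dashboard"],            # blue: dashboard only
    "alert": ["dashboard"],           # yellow (warning): dashboard only
    "alarm": ["dashboard", "email"],  # orange: a more visible channel
    "page": ["dashboard", "pager"],   # red: tell me right now
}


def channels_for(level: str) -> list:
    # Unknown levels fall back to the dashboard (an assumption of this sketch).
    return ROUTES.get(level, ["dashboard"])


for level in ("info", "alert", "alarm", "page"):
    print(level, "->", ", ".join(channels_for(level)))
```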

Alert Enhancers

Alert enhancers provide detailed contextual information when certain alerts are triggered, allowing the DBA to act immediately. For example, the zBlockedDetails alert enhancer includes details on who is blocked by whom and why. If the blocked process is critical, the DBA immediately knows the connection ID, device and PID of the process to interrupt.

In this example, a batch process is blocked waiting for an exclusive lock on a record in the po-trans table. The ROWID of the record in question is 50682868, and the offending user is #922, on /dev/pts/662 with PID 1307.

 

There are similar alerts for long transactions and heavy database activity. See the Alert Configuration help page for details on all the available alert enhancers.

Script Responses

ProTop gives a DBA the ability to run any custom code in response to any alert through the automated script response functionality. If activated, the ProTop agent will look for a script with the same name as the triggered metric and execute it if it exists, passing as parameters the metric name, metric value, monitored resource name and the full DB path if it’s a database being monitored.

Actions

Let’s get to the action! You can combine alert levels, alert enhancers and script responses to create a series of actions to execute when an alert is triggered. Let’s take the zBlkDura alert shown above. The alert.cfg line may look something like this:

zBlkDura num > 300 "" "hourly" "Session blocked &1 &2 &3 seconds" zBlockedDetails,alarm

If any process is blocked for more than 5 minutes (300 seconds), run the zBlockedDetails alert enhancer and generate an alarm.

If you additionally wanted to run a custom data-gathering script, for example, the alert definition would include the script tag:

zBlkDura num > 300 "" "hourly" "Session blocked &1 &2 &3 seconds" zBlockedDetails,script,alarm

If this alert were triggered, ProTop would search for a script named $PROTOP/bin/zBlkDura (or zBlkDura.bat on Windows) and execute it. Note that file names on UNIX/Linux are case sensitive, so you must match the case exactly as defined in the Alertable Metrics section.
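As a rough idea of what such a script might contain, here is a minimal Python sketch that simply records the values it receives. Several things are assumptions on our part: that the agent invokes the file as an ordinary executable (so a shebang line and execute permission would be needed on UNIX/Linux), that the parameters arrive in the order listed in the Script Responses section (metric name, metric value, resource name, optional DB path), and the output format itself. A real response script would gather whatever diagnostics you care about.

```python
#!/usr/bin/env python3
import sys
from datetime import datetime, timezone


def format_record(argv: list) -> str:
    # Assumed parameter order: metric, value, resource, optional DB path.
    # Pad with empty strings so a missing DB path does not raise an error.
    metric, value, resource, db_path = (list(argv) + [""] * 4)[:4]
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (
        f"{stamp} metric={metric} value={value} "
        f"resource={resource} db={db_path or '-'}"
    )


if __name__ == "__main__":
    # Print one line per invocation; redirect or log it as you see fit.
    print(format_record(sys.argv[1:]))
```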

And as mentioned earlier, you can have multiple alert definitions for a metric, allowing you to escalate your response:

zBlkDura num > 120 "" "hourly" "Session blocked &1 &2 &3 seconds" zBlockedDetails,alert
zBlkDura num > 300 "" "hourly" "Session blocked &1 &2 &3 seconds" zBlockedDetails,script,alarm

In this case, an alert (yellow – warning) would be generated if the blocked duration exceeded 120 seconds. If the blocked duration exceeded 300 seconds, a script response would be triggered and an alarm (orange) raised.
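The escalation logic can be sketched in a few lines of Python. This is a simplified model, not ProTop's implementation: it ignores sensitivity and nag levels, and it simply lists every definition whose threshold is exceeded. Whether ProTop also reports the lower-level alert once the higher threshold is crossed is not something this sketch settles. DEFINITIONS and matching_actions are hypothetical names.

```python
import operator

# The six operators available in alert.cfg, mapped to Python comparisons.
OPS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    "<>": operator.ne, "<=": operator.le, ">=": operator.ge,
}

# (metric, operator, threshold, actions) for the two zBlkDura lines above.
DEFINITIONS = [
    ("zBlkDura", ">", 120, ["zBlockedDetails", "alert"]),
    ("zBlkDura", ">", 300, ["zBlockedDetails", "script", "alarm"]),
]


def matching_actions(metric: str, value: float) -> list:
    """Return the action list of every definition that the value satisfies."""
    return [
        actions
        for name, op, threshold, actions in DEFINITIONS
        if name == metric and OPS[op](value, threshold)
    ]


for seconds in (60, 200, 400):
    print(seconds, matching_actions("zBlkDura", seconds))
```

A session blocked for 60 seconds matches nothing, 200 seconds matches only the first definition, and 400 seconds satisfies both thresholds.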

Pro Tip: Always put the alert level last, so that the output from the previous actions is included in the triggered alert.
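One way to picture why the order matters is a toy model in which actions are processed left to right and the alert is emitted as soon as the level is reached. The Python sketch below is only that: a toy model consistent with the tip above, not a description of ProTop's internals. The handler functions and their output strings are invented.

```python
LEVELS = {"info", "alert", "alarm", "page"}


def run_actions(actions: list, handlers: dict) -> list:
    """Process actions in order; a level emits an alert with the output so far."""
    collected = []
    emitted = []
    for action in actions:
        if action in LEVELS:
            # Snapshot whatever the earlier actions produced.
            emitted.append((action, list(collected)))
        else:
            collected.append(handlers[action]())
    return emitted


# Hypothetical stand-ins for an alert enhancer and a script response.
handlers = {
    "zBlockedDetails": lambda: "blocker details",
    "script": lambda: "script output",
}

print(run_actions(["zBlockedDetails", "script", "alarm"], handlers))
print(run_actions(["alarm", "zBlockedDetails", "script"], handlers))
```

In this model, the first call attaches both outputs to the alarm, while the second emits the alarm before anything has been collected.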

Sleeping Better at Night

All of this functionality and more exists so that you can sleep easier through the night, knowing that if something critical starts to brew in your OpenEdge environment, ProTop can notify you in time to act before the business even notices.

What’s next?

Would you like to suggest a cool new feature for ProTop? Have any questions or comments? Head over to our community page at https://community.wss.com.

Interested in learning more about the free ProTop RT (Real-Time)? The help pages at https://help.wss.com contain detailed instructions on how to install and configure ProTop RT.

Intrigued by the monitoring and alerting aspect of ProTop? Install the free version first, then reach out to us to activate a free trial and show you around the commercial features.

Want to learn more about being an OpenEdge DBA? Sign up to get all our blogs and updates in your inbox and subscribe to our YouTube page.
