New DBA Series: DBA Checklists – Part One

Database Administrator (DBA). Sounds important, doesn’t it?  But what are you supposed to do every day?

People often ask us: “what tasks should an OpenEdge DBA be performing? When? What should my daily, weekly, quarterly, etc. checklist have on it?”

In this New DBA Series group of posts, we’ll provide a few recommendations. Use the comments below to let us know what else you check regularly.

Morning

Paul likes to call these tasks his “Morning Smile Routine”. One of our customers uses the term and it stuck. Start your work day with your favourite beverage and this list of tasks:

Databases

  • Verify successful backup
  • Verify after-imaging is enabled and properly switching extents; most extents should be empty and one busy
  • If using OE Replication, ensure the targets are synchronized and there is no more than one locked AI extent
  • If you are using after-imaging to maintain a warm spare verify that it is available and up to date
  • Check BI file size
  • Check for long open transactions
  • Review database log file for overnight error messages
  • If using Alternate Buffer Pool, ensure that there are free buffers in ABP and that lru2 is disabled
  • If you use fixed extents rather than “all variable”, check fixed extent free space
  • Verify monitors are running and alerts are flowing
  • Check monitored metrics for trends approaching actionable thresholds
  • Read Progress PANS alerts

OS

  • Verify sufficient file system free space, especially check:
    • DB filesystems (data, bi, and ai filesystems)
    • AI archive filesystem
    • backup filesystem
    • temp files (the Progress -T parameter)
    • application specific filesystems
  • Check for “runaway” processes
  • Check for overnight processes that may still be running (but should not be)
  • Review OS logs
  • Review OS free memory
  • Review summary of previous day’s CPU and disk utilization
  • Check OS configuration for unwelcome changes

During the Day

No one wants to spend their day staring at a monitoring screen and reading hundreds of alert emails, but you should check the following metrics at least 2-3 times throughout the day. Keep note of the values in a spreadsheet so that you and your colleagues can more easily identify what is normal and what is unusual. Or use ProTop to trend the data automatically.

  • Make sure that AI switching & warm spare applies are occurring regularly
  • Pay attention to the number of users/connections, know what is normal so that you recognize unusual increases or decreases
  • Look out for high disk IO rates, especially if they result in a low buffer hit ratio
  • Watch for unusually active tables or indexes, be extra vigilant after new code has been released
  • Keep an eye peeled for unusual log file messages and alerts
  • Notice the lock table high water mark and the normal number of active locks
  • Observe the time between checkpoints
  • Check for long open TRX & BI file growth – Find the oldest transaction (usually a code problem)
  • Investigate blocked users/connections. REC = record locking, coding issue. BK*, TX*, etc. indicate system resource constraints
  • Take note of excessively active connections or “rapid readers.” What are they doing? Is it legitimate? Is there a better way?
  • Use the “client statement cache” or “proGetStack” to identify specific code causing a problem – work with development to get it fixed
  • Identify latch waits or high numbers of latch requests on latches that should be low, e.g. OM, LRU, LRU2, EC, CDC
  • Scan for OS bottlenecks and constraints

Weekly

There is nothing worse than needing to scan a database log file, only to find that it is 42 GB in size. Or spending 20 minutes analyzing a DB Analysis before realizing it was generated in 2003.

  • Rotate/Truncate the .lg files; this can be automated in 11.7+
  • Clean up trash in database directories -protrace, leftover scratch files, core files, etc. Don’t forget the -T directories!
  • Refresh dbanalys – don’t forget the DEV/TEST/QA/Training databases.
  • After refreshing dbanalys – review index utilization, identify idxcompact targets. Check rows per block settings. Check for very long RM Chains; fragmentation; scatter.
  • Schedule appropriate remediation activities

Monthly

Once a month, take an hour to summarize and document what happened. Did you have any down time? Was it scheduled or unscheduled? Is your infrastructure able to handle the load? Were there any major changes (more users, new application code/processes/database features/settings/OpenEdge release…) and how did they affect the systems?

  • Outage summary
    • Planned and unplanned
  • Capacity Planning Reports:
    • Basic CRUD & transaction trends
    • Overall database growth
    • User/connection trends
    • I/O response
    • Project disk space needs
    • Project disk throughput needs
    • Project memory & CPU utilization

Quarterly

If you’re actively monitoring your OpenEdge environment, you will often come across tuning and tweaking opportunities. But at the very least, schedule some time once per quarter to review your OpenEdge environment and adjust as necessary.

  • Review all startup parameters (brokers and clients) and config options
    • Did you recently move to a new OpenEdge release that affords new configuration opportunities?
  • Review your storage area configuration, has the usage of any of the tables changed in ways that might mean that table should be in a different area?
  • If you are allowing SQL-92 connections – run  dbtool to adjust SQL width; run “update statistics” for the optimizer
    • These tasks can be automated in recent OpenEdge releases
  • If you are using SSL, etc. – review certificate validity & expiration. Don’t find out about expired certificates the hard way
  • Review OS configuration, kernel parameters, etc. – test IO throughput: random reads & synchronous writes
  • Review OpenEdge release level, is it time to apply patches or start planning an upgrade?
  • Review monitored metrics and alerts

Annually

Ah yes…the annual checklist…that few actually do. When was the last time you actually executed your DR plan or reviewed your licensing?

  • Disaster recovery test
  • License review & “true-up”
  • Review hardware landscape and potential upgrades
  • Review business growth plans & projections
  • Review the current actual RPO and RTO with business leads and confirm they still meet the business’ needs
  • Make your plans to attend PUG Challenge, Progress Next and other conferences!

What are you doing?

Are you a good DBA? Are you checking some or all of these metrics? Did we forget anything? Let us know in the comments below!

But wait…there’s more!

Stay tuned for part two, where we’ll talk about Special Events like updates, upgrades and migrations. We have our checklists for those too!

No Comments

Post A Comment

Related post...