Database monitoring is a multilevel practice that involves tracking database performance metrics and resource usage. The goal is to identify and prevent issues that degrade database hardware and software performance and threaten the availability of critical application infrastructure.
Effective database monitoring requires keeping a watchful eye on several important levels of the database. Often, the sources of your database performance challenges are much deeper than you think. The four levels described below detail what to monitor and the areas that can have the greatest impact on database performance.
1) Monitoring at the SQL level
Your applications are continually firing SQL queries against database instances, then formatting and presenting the results to users. Bottlenecks at this level usually boil down to inefficient SQL statements that introduce latency, mishandle errors and impair throughput and concurrency.
To track down the offending SQL, focus your database monitoring in these areas:
- Optimizer statistics — The query optimizer is a core component of any relational database management system (RDBMS), and its role is critical to SQL performance: it determines the most efficient way to execute a query and retrieve the required rows. But for the optimizer to do its job well, it needs representative statistics about the various schema objects.
- Execution plans — An execution plan contains the steps needed to retrieve the data. It acts like a GPS, providing instructions on how to get efficiently from point A to point B. Execution plans also include key information such as the indexes associated with database objects and the impact/cost of every step in the SQL statement's execution. Metrics like I/O cost, CPU cost and subtree cost point to the effects of each statement on the environment. (The first sketch after this list shows one way to refresh statistics and surface these costs.)
- Wild cards — It’s easy to abuse wild cards — statements like SELECT * — because in small tables with few records, the performance hit is negligible. But as databases grow, so does the overhead of SELECTing many times more records than are actually needed.
- Filtering — Use WHERE clauses for filtering as early as possible in the query. The sooner you pare down the overall data set to just the data needed to answer the question at hand, the lower the burden you impose on logical objects and physical resources.
- Implicit conversions — Avoid the need to convert data from one type to another. If the data type of a column in your table is varchar and you’re trying to compare values in that column to an integer, an implicit conversion is required. That makes the platform burn precious CPU cycles unnecessarily. Better to align the data types early in the query.
- Row-by-row processing — Any RDBMS works best with set-based processing rather than looping through each row in each table.
- Indexes — You can have too many, and you can have too few. But don't inadvertently defeat your indexes: in SQL Server, for example, placing a function on an indexed column can keep the database from using the intended index. (The second sketch after this list illustrates this and the other query pitfalls above.)
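To make the first two items concrete, here is a minimal sketch in SQL Server T-SQL (the dbo.Orders table, its columns and the query shown are purely illustrative) of how you might refresh optimizer statistics, check their age and surface per-query I/O and CPU figures while reviewing the execution plan.

```sql
-- Hypothetical example; dbo.Orders and its columns are illustrative only.

-- Refresh optimizer statistics for one table so the optimizer has
-- representative row counts and value distributions to plan with.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Check when the statistics on that table were last updated.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Orders');

-- Surface per-statement logical reads and CPU time while testing a query,
-- then inspect the actual execution plan in the client for step costs.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT OrderID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```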
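And here is a hypothetical before/after rewrite (again SQL Server syntax; table and column names are invented for illustration) showing the wild card, implicit conversion and function-on-an-indexed-column pitfalls described above, followed by a version the optimizer can handle more efficiently.

```sql
-- Problematic pattern: SELECT * pulls every column, comparing the varchar
-- column OrderNumber to an integer forces an implicit conversion, and
-- wrapping OrderDate in YEAR() keeps the optimizer from seeking on an
-- index built over OrderDate.
SELECT *
FROM dbo.Orders
WHERE OrderNumber = 12345
  AND YEAR(OrderDate) = 2024;

-- Rewritten: select only the columns needed, compare like data types,
-- and express the date filter as a sargable range so an index on
-- OrderDate can be used.
SELECT OrderID, CustomerID, TotalDue
FROM dbo.Orders
WHERE OrderNumber = '12345'
  AND OrderDate >= '20240101'
  AND OrderDate <  '20250101';
```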
2) Monitoring at the instance/database level
Whether you’re managing relational database systems like Oracle or SQL Server, open source and NoSQL platforms like PostgreSQL, MySQL or MongoDB, or any combination of the above, each platform is a factor in performance.
Database monitoring at the platform level is important because it provides oversight of all the moving parts necessary to keep your databases running smoothly. Whether you are monitoring one or multiple database platforms, keep an eye on the following areas:
- I/O contention — The paramount input/output metric is usually the number of logical reads incurred by a given SQL statement. The art of writing good SQL involves a continual quest to reduce the number of times you make the database perform logical reads. Thus, I/O contention is an important metric in SQL query optimization.
- Locked objects — Locking is at the heart of transaction concurrency; it is what allows transactions to uphold the ACID (Atomicity, Consistency, Isolation, Durability) properties. Examining locked objects, including the indexes themselves, can help you quickly identify and resolve the root cause of locks in the system. SQL statements run in the context of individual sessions. Not all database sessions are active (running in the foreground) all the time, so inactive sessions are unlikely to affect performance. But sessions can also be blocked by database locks.
- Wait stat analysis — Wait events are associated with specific resources such as CPU, memory and the network, and bottlenecks in those resources can affect the individual SQL statements that rely on them. It’s also useful to know how to interpret the different wait types at work in your database, including locking/blocking (LCK), I/O problems (PAGEIOLATCH), latch contention (LATCH) and network slowdown (NETWORK). (The first sketch after this list shows one way to review top waits and blocked sessions.)
- Parameters — Every RDBMS has configuration parameters that can significantly impact the performance of the database, including memory parameters, optimizer parameters, file parameters and more.
- Files — An RDBMS relies on various files, such as data files, transaction logs, undo and more. It is essential that these files are sized and configured properly, along with their associated parameters, for optimal performance. (The second sketch after this list shows one way to review these settings.)
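As a concrete starting point for the wait stat and locking items above, the following SQL Server sketch (the dynamic management views are standard, but the exact queries you run will depend on your environment) lists the top accumulated wait types and any requests currently blocked by another session.

```sql
-- Top wait types accumulated since the instance started, to see whether
-- time is being lost to locking (LCK*), I/O (PAGEIOLATCH*), latches or network.
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Requests that are currently blocked, the session blocking them,
-- what they are waiting on and the SQL text involved.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```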
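For the parameters and files items, a second SQL Server sketch (the column choices are illustrative) reviews the instance-level configuration settings and the size and growth configuration of the current database's data and log files.

```sql
-- Instance-level configuration parameters, such as memory and parallelism settings.
SELECT name, value_in_use, description
FROM sys.configurations
ORDER BY name;

-- Size and growth settings of the data and log files for the current database.
SELECT name AS logical_name,
       type_desc,
       size * 8 / 1024 AS size_mb,   -- size is reported in 8 KB pages
       growth,
       is_percent_growth
FROM sys.database_files;
```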
3) Monitoring at the infrastructure level
Of course, it’s all castles in the air if there isn’t an operating system and hardware underneath the platform and the SQL. At this level, you want to have your eye on elements like these:
- CPU — All your queries require CPU cycles. Which ones are consuming the most cycles? Are any of them unduly burdening the CPU? Are there trends in the way the most active processes are consuming CPU cycles? (The sketch after this list shows one way to surface the heaviest CPU consumers and per-file I/O latency.)
- Memory — Like CPU cycles, your queries need memory. When large chunks of RAM are bogged down, other queries must wait longer for processing. Only by drilling into your overall memory utilization can you see the amount of memory that each process consumes.
- Storage subsystem — Logical reads refer to I/O from memory, but if the platform can’t answer the query from the data in memory, it reads from disk. That’s slower, and the more bytes that are read and written, the more it affects performance.
- Network — Are your queries generating network activity? Should they be? A query spending most of its time on ASYNC_NETWORK_IO does not necessarily mean a network problem; the system could simply be feeding too much data to the client. But high network utilization can mean that your queries are sending read requests and writes to unanticipated resources on the network.
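Much of this monitoring happens with operating system and hardware counters, but the database engine's own views offer a complementary look at how queries consume those resources. Here is a hypothetical SQL Server sketch that lists the queries burning the most CPU and the average read/write latency per database file in the storage subsystem.

```sql
-- Queries that have consumed the most CPU since their plans were cached.
SELECT TOP (10)
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       SUBSTRING(t.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY qs.total_worker_time DESC;

-- Average read and write latency per database file, as seen by the engine.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs;
```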
4) Monitoring at the user/session level
The user level, and sometimes even the session level, can be the most deceptive of all. If your users are complaining, then you know you have a problem somewhere and you need to locate it and deal with it.
But what if they aren’t complaining? Does that mean you can take the rest of the day off?
The fact that your databases are running smoothly today is no guarantee that they aren’t going to give you a big headache tomorrow. Database monitoring allows you to proactively identify problems before they are even noticed at the user level.
Why database monitoring needs to be multilevel
Why is database monitoring across all four levels necessary? When you face complaints at the user level, it’s tempting to jump to the infrastructure level and throw hardware at the problem. Some IT teams are longer on hardware expertise than on diagnostic skills, so they naturally gravitate toward adding processing power, memory, disk space and gigabits of throughput. After all, if your favorite tool is a hammer, every problem looks like a nail.