Evaluate database health using DMVs
In this tutorial we will learn how to evaluate database health using DMVs. However, Microsoft Azure SQL Database and Azure SQL Managed Instance enable a subset of dynamic management views to diagnose performance problems. Further, this might be caused by blocked or long-running queries, resource bottlenecks, poor query plans, and so on.
Identify CPU performance issues
If CPU consumption is above 80% for extended periods of time. Then, consider the following troubleshooting steps:
The CPU issue is occurring now
If issue is occurring right now then, there are two possible scenarios:
- Firstly, Many individual queries that cumulatively consume high CPU
Use the following query to identify top query hashes:
SQL
PRINT ‘– top 10 Active CPU Consuming Queries (aggregated)–‘;
SELECT TOP 10 GETDATE() runtime, *
FROM (SELECT query_stats.query_hash, SUM(query_stats.cpu_time) ‘Total_Request_Cpu_Time_Ms’, SUM(logical_reads) ‘Total_Request_Logical_Reads’, MIN(start_time) ‘Earliest_Request_start_Time’, COUNT() ‘Number_Of_Requests’, SUBSTRING(REPLACE(REPLACE(MIN(query_stats.statement_text), CHAR(10), ‘ ‘), CHAR(13), ‘ ‘), 1, 256) AS “Statement_Text” FROM (SELECT req., SUBSTRING(ST.text, (req.statement_start_offset / 2)+1, ((CASE statement_end_offset WHEN -1 THEN DATALENGTH(ST.text)ELSE req.statement_end_offset END-req.statement_start_offset)/ 2)+1) AS statement_text
FROM sys.dm_exec_requests AS req
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST ) AS query_stats
GROUP BY query_hash) AS t
ORDER BY Total_Request_Cpu_Time_Ms DESC;
- Secondly, Long running queries that consume CPU are still running
Use the following query to identify these queries:
SQL
PRINT ‘–top 10 Active CPU Consuming Queries by sessions–‘;
SELECT TOP 10 req.session_id, req.start_time, cpu_time ‘cpu_time_ms’, OBJECT_NAME(ST.objectid, ST.dbid) ‘ObjectName’, SUBSTRING(REPLACE(REPLACE(SUBSTRING(ST.text, (req.statement_start_offset / 2)+1, ((CASE statement_end_offset WHEN -1 THEN DATALENGTH(ST.text)ELSE req.statement_end_offset END-req.statement_start_offset)/ 2)+1), CHAR(10), ‘ ‘), CHAR(13), ‘ ‘), 1, 512) AS statement_text
FROM sys.dm_exec_requests AS req
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST
ORDER BY cpu_time DESC;
GO
Identify IO performance issues
When identifying IO performance issues, the top wait types associated with IO issues are:
- Firstly, PAGEIOLATCH_*. This is for data file IO issues (including PAGEIOLATCH_SH, PAGEIOLATCH_EX, PAGEIOLATCH_UP). However, if the wait type name has IO in it, it points to an IO issue. And, if there is no IO in the page latch wait name, it points to a different type of problem.
- Secondly, WRITE_LOG. This is for transaction log IO issues.
If the IO issue is occurring right now
Use the sys.dm_exec_requests or sys.dm_os_waiting_tasks to see the wait_type and wait_time.
Identify data and log IO usage
Use the following query to identify data and log IO usage. However, if the data or log IO is above 80%, it means users have used the available IO for the SQL Database service tier.
SQL
SELECT end_time, avg_data_io_percent, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
If the IO limit has been reached, you have two options:
- Firstly, Option 1: Upgrade the compute size or service tier
- Secondly, Option 2: Identify and tune the queries consuming the most IO.
Identify tempdb performance issues
When identifying IO performance issues, the top wait types associated with tempdb issues is PAGELATCH_* (not PAGEIOLATCH_). However, PAGELATCH_ waits do not always mean you have tempdb contention. This wait may also mean that you have user-object data page contention due to concurrent requests targeting the same data page. Further, to confirm tempdb contention, use sys.dm_exec_requests to confirm that the wait_resource value begins with 2:x:y where 2 is tempdb is the database ID, x is the file ID, and y is the page ID.
For tempdb contention, a common method is to reduce or re-write application code that relies on tempdb. However, common tempdb usage areas include:
- Firstly, Temp tables
- Secondly, Table variables
- Thirdly, Table-valued parameters
- Then, Version store usage (specifically associated with long running transactions)
- Lastly, queries that have query plans that use sorts, hash joins, and spools
Top queries that use table variables and temporary tables
Use the following query to identify top queries that use table variables and temporary tables:
SQL
SELECT plan_handle, execution_count, query_plan
INTO #tmpPlan
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_query_plan(plan_handle);
GO
WITH XMLNAMESPACES(‘http://schemas.microsoft.com/sqlserver/2004/07/showplan’ AS sp)
SELECT plan_handle, stmt.stmt_details.value(‘@Database’, ‘varchar(max)’) ‘Database’, stmt.stmt_details.value(‘@Schema’, ‘varchar(max)’) ‘Schema’, stmt.stmt_details.value(‘@Table’, ‘varchar(max)’) ‘table’
INTO #tmp2
FROM(SELECT CAST(query_plan AS XML) sqlplan, plan_handle FROM #tmpPlan) AS p
CROSS APPLY sqlplan.nodes(‘//sp:Object’) AS stmt(stmt_details);
GO
SELECT t.plan_handle, [Database], [Schema], [table], execution_count
FROM(SELECT DISTINCT plan_handle, [Database], [Schema], [table]FROM #tmp2
WHERE [table] LIKE ‘%@%’ OR [table] LIKE ‘%#%’) AS t
JOIN #tmpPlan AS t2 ON t.plan_handle=t2.plan_handle;
Monitor resource use
You can monitor Azure SQL Database resource usage using SQL Database Query Performance Insight. However, for Azure SQL Database and Azure SQL Managed Instance, you can monitor using Query Store. Further, you can also monitor usage using these views:
- Firstly, Azure SQL Database: sys.dm_db_resource_stats
- Secondly, Azure SQL Managed Instance: sys.server_resource_stats
- Lastly, both Azure SQL Database and Azure SQL Managed Instance: sys.resource_stats
sys.dm_db_resource_stats
You can use the sys.dm_db_resource_stats view in every database. The sys.dm_db_resource_stats view shows recent resource use data relative to the service tier. However, the average percentages for CPU, data IO, log writes, and memory are recorded every 15 seconds and are maintained for 1 hour. For example, this query shows the average and maximum resource use for the current database over the past hour:
SQL
SELECT
AVG(avg_cpu_percent) AS ‘Average CPU use in percent’,
MAX(avg_cpu_percent) AS ‘Maximum CPU use in percent’,
AVG(avg_data_io_percent) AS ‘Average data IO in percent’,
MAX(avg_data_io_percent) AS ‘Maximum data IO in percent’,
AVG(avg_log_write_percent) AS ‘Average log write use in percent’,
MAX(avg_log_write_percent) AS ‘Maximum log write use in percent’,
AVG(avg_memory_usage_percent) AS ‘Average memory use in percent’,
MAX(avg_memory_usage_percent) AS ‘Maximum memory use in percent’
FROM sys.dm_db_resource_stats;
sys.server_resource_stats
You can use sys.server_resource_stats to return CPU usage, IO, and storage data for an Azure SQL Managed Instance. The data is collected and aggregated within five-minute intervals. And, there is one row for every 15 seconds reporting. However, the data returned includes CPU usage, storage size, IO utilization, and managed instance SKU. Historical data is retained for approximately 14 days.
SQL
DECLARE @s datetime;
DECLARE @e datetime;
SET @s= DateAdd(d,-7,GetUTCDate());
SET @e= GETUTCDATE();
SELECT resource_name, AVG(avg_cpu_percent) AS Average_Compute_Utilization
FROM sys.server_resource_stats
WHERE start_time BETWEEN @s AND @e
GROUP BY resource_name
HAVING AVG(avg_cpu_percent) >= 80
sys.resource_stats
The sys.resource_stats view in the master database has additional information that can help you monitor the performance of your database at its specific service tier and compute size. However, the data is collected every 5 minutes and is maintained for approximately 14 days. For example, query for data exposed,
SQL
SELECT TOP 10 *
FROM sys.resource_stats
WHERE database_name = ‘resource1’
ORDER BY start_time DESC
Reference: Microsoft Documentation