Which Databases Are Actually Used? CPU and Memory Use by Database Queries

We have a fairly large SQL Server environment where I work, with over 200 SQL instances split roughly 60/40 between non-production and production. With an active development staff, new databases are created and worked on in our non-production environments every week.  To keep an eye on which databases are actually in use, we began capturing a snapshot of database-level CPU and memory usage twice a day (6AM/6PM).  Over time this data gives a decent indication of which databases have activity (user, developer or otherwise) and their overall “rank” against the other databases on a particular instance.  This type of data helps us make decisions on:

  • Clean up – was this database created as a one-off test data set in dev? Can it be deleted?
  • Consolidation – is this database used lightly enough to be a candidate for consolidation onto a shared SQL instance?
  • High-level resource usage per instance – which database is consuming the most resources on which instance?

To keep things simple, we use basic DMV queries to collect this data twice a day.  For CPU usage we look at sys.dm_exec_query_stats, and for memory usage we look at sys.dm_os_buffer_descriptors.  Let’s take a quick look at the collection queries.  Note: a colleague sent me some basic queries that I tweaked for my needs, so please forgive me if I am not attributing the right folks from the SQL community who originally posted these (or similar).

-- Declare a collection time
Declare @TheDate datetime;  -- datetime (rather than date) keeps the 6AM and 6PM snapshots distinct
Select @TheDate = getdate();

WITH DB_CPU_Stats
AS
(
    SELECT DatabaseID, DB_Name(DatabaseID) AS [DatabaseName], SUM(total_worker_time) AS [CPU_Time_Ms]
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
                 FROM sys.dm_exec_plan_attributes(qs.plan_handle)
                 WHERE attribute = N'dbid') AS F_DB
    GROUP BY DatabaseID
)
SELECT @TheDate, @@ServerName, DatabaseName, [CPU_Time_Ms],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4        -- system databases
  AND DatabaseID <> 32767   -- ResourceDB
ORDER BY [CPU_Time_Ms] DESC OPTION (RECOMPILE);

 

The query above can be run on any SQL Server 2008+ instance and returns the total CPU time consumed per database (since the last restart or since the query stats were flushed), as well as each database’s percentage of the total CPU consumed across the whole instance.  The results look like this:

[Screenshot: CPU usage by database results]

Remember that the values above are cumulative since the last restart, so you need some logic to self-join the collected snapshots across intervals and calculate the deltas, along the lines of the sketch below.
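
Here is a minimal sketch of that delta calculation, assuming the twice-daily snapshots are saved to a hypothetical central table named DB_CPUUsage_ByDatabase with columns (capture_time, servername, databasename, CPU_Time_Ms); the table and column names are illustrative, and LAG requires SQL Server 2012+ on the central instance.

WITH Ordered AS
(
    SELECT capture_time, servername, databasename, CPU_Time_Ms,
           LAG(CPU_Time_Ms) OVER (PARTITION BY servername, databasename
                                  ORDER BY capture_time) AS prev_CPU_Time_Ms
    FROM dbo.DB_CPUUsage_ByDatabase   -- hypothetical central snapshot table
)
SELECT capture_time, servername, databasename,
       -- a drop in the cumulative value means the instance restarted (or the stats
       -- were flushed) between snapshots, so treat the current value as the delta
       CASE WHEN CPU_Time_Ms >= prev_CPU_Time_Ms
            THEN CPU_Time_Ms - prev_CPU_Time_Ms
            ELSE CPU_Time_Ms
       END AS CPU_Time_Ms_Delta
FROM Ordered
WHERE prev_CPU_Time_Ms IS NOT NULL
ORDER BY servername, databasename, capture_time;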

The other query determines which databases are consuming the most memory at the time of the collection:

Declare @TheDate datetime;  -- datetime keeps the AM and PM snapshots distinct
Select @TheDate = getdate();

-- get memory used per DB
With DB_MemUsed as
(
    SELECT @@ServerName as ServerName, db_name(database_id) as DatabaseName, database_id,
           COUNT(*) * 8 / 1024 AS MBUsed
    FROM sys.dm_os_buffer_descriptors
    where database_id > 4
      and database_id <> 32767
    GROUP BY database_id
)
SELECT @TheDate, ServerName, DatabaseName, MBUsed,
       CAST(MBUsed * 1.0 / SUM(MBUsed) OVER() * 100.0 AS DECIMAL(5, 2)) AS MBUsedPercent
FROM DB_MemUsed
WHERE Database_ID > 4       -- system databases
  AND Database_ID <> 32767  -- ResourceDB
ORDER BY MBUsed DESC OPTION (RECOMPILE);

 

The result set looks like this:

[Screenshot: memory usage by database results]

Collecting this data to a central database table over time provides a nice, centrally managed data set to run trending queries against.  This information is NOT cumulative, so each collection is just a snapshot of memory consumption at that particular time, which makes it easier to run aggregates against the data over a time period, like this:

Select datepart(year, a.capture_time) as captureyear,
       datepart(week, a.capture_time) as captureweek,
       a.servername,
       a.databasename,
       avg(MBUsed) as AvgMB_MEMUsed,
       Max(MBUsed) as MaxMB_MEMUsed,
       Min(MBUsed) as MinMB_MEMUsed
from [DB_MemUsage_ByDatabase_ByDay] a
where a.servername = '<MyInstanceName>'
group by datepart(year, a.capture_time),
         datepart(week, a.capture_time),
         a.servername,
         a.databasename

The above query assumes you are saving the results of the memory usage query to a table called DB_MemUsage_ByDatabase_ByDay.  Once several days’ worth of collections are available, you can run the above query to get some aggregate memory usage statistics “By Week”.
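
For reference, here is a minimal sketch of what such a table could look like; the column names are inferred from the memory collection query’s output and the data types are assumptions, so adjust them to match your actual implementation.

CREATE TABLE dbo.DB_MemUsage_ByDatabase_ByDay
(
    capture_time   datetime     NOT NULL,  -- @TheDate from the collection query
    servername     sysname      NOT NULL,  -- @@ServerName
    databasename   sysname      NOT NULL,
    MBUsed         int          NOT NULL,
    MBUsedPercent  decimal(5,2) NOT NULL
);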

Hope this is helpful.  Send me an email or comment if you need assistance or have any questions.


Collecting xEvents Query Stats From Multiple SQL Instances to a Central Database Part 3

In this post on collecting xEvent query execution data to a central database, I will cover the logic in the PowerShell script (provided as a link below) that determines which instances to collect from and then connects to each one and collects the data into the central database.

First a refresher with an image from my previous post.

[Figure 1: centralized xEvent collection overview]

As you can see above, the PowerShell script first queries the central database (the DB_ServerMain table) to determine which instances to collect from.  Then it loops through those instances and executes the xEvents T-SQL queries from my previous posts.  Let’s take a quick look at some parts of the PS1 file.

Determine What Instances to Collect From

At the top of the PowerShell file is a query and then the beginning of a loop:

[Screenshot: PowerShell script – querying DB_ServerMain and starting the loop]

The first part of the PS1 file opens a connection to the central database and executes the query against DB_ServerMain to determine which SQL instances to collect xEvent query data from. At the bottom of the snippet above, you can see that it assumes all .XEL files will be in “C:\DBAMonitoring\traces\” on the remote machine.  It then goes into a foreach loop iterating through the list of remote SQL Server instances to collect from.

Within the loop, connection strings for the source (remote server) and destination (central collection server) are defined, some instance-specific variables are set (for incremental loading), and then the SQL statement is built dynamically:

[Screenshot: PowerShell script – building the connection strings and dynamic SQL]

Note that the .xel file name and file offset are embedded in the dynamic SQL prior to its execution.  Then the actual call to run the query on the remote server and save the results to the central server is executed in a TRY/CATCH block:

[Screenshot: PowerShell script – executing the remote query in a TRY/CATCH block]

The code first defines the destination table name in the central DB instance (“XEvents_Queries”).  Then it opens the connection to the remote instance, creates a SqlCommand object (with the query AND the source connection) and executes the query with the .ExecuteReader() method of the command object.  At this point, nothing has been written to the central collection database.  If there is an error in the execution of the query on the remote instance, the CATCH block fires.  In the PowerShell script, my catch block basically assumes there is a file rollover condition by checking for the phrase “The offset” in the caught exception: $ex1.Contains("The offset").  If the named .xel file and offset are no longer valid (due to a file rollover), it resets the SQL query to call the system function with NULLs instead of a named incremental starting point (like this: FROM sys.fn_xe_file_target_read_file('C:\DBAMonitoring\traces\sql2012*.xel', null, NULL, NULL)).

The final actionable piece of the script uses the SqlBulkCopy object to connect to the central collection database and write the result set from the remote instance to it:

[Screenshot: PowerShell script – bulk copying the results to the central database]

Basically, it connects to the central collection server with the connection string declared at the beginning of the script.  It also defines the destination table, batch size and timeout values.  The .WriteToServer method is called with the $Reader variable that contains the result set from the remote instance, and performs the bulk insert in chunks of 5,000 rows.

That’s pretty much it for the PowerShell script.  You can download the full version of it here: FetchQueryMonitorxEvents.ps1.

Collecting xEvents Query Stats From Multiple SQL Instances to a Central Database Part 2

In this post I will continue from the previous post about centralizing xEvents-based query execution stats collection and reporting.  The key topics I will cover are:

1) Determining which SQL instances you need to collect from and where the last collection left off (to handle incremental loads throughout the day from the source servers to the central DB server)

2) Reviewing the XPath-oriented queries that parse the .XEL files into a relational format for storage in the central DB

In my last post, I ended by discussing the .XEL file’s offset functionality and how to use it with the sys.fn_xe_file_target_read_file system function to bring back the set of events “since the last poll”.  This is what I use to perform incremental loads of xEvent query execution data from various remote SQL Servers into my central database.  First let’s take a look at the basic data model I am using:

[Diagram: central collection data model – DB_ServerMain and XEvents_Queries]

DB_ServerMain has the name of the SQL Server instance and a flag (enablexEventQueryCollection) to denote whether it should be included in centralized collection.  Straightforward enough… you can get the list of SQL instances to collect xEvents from with a query like:

Select *
from DB_ServerMain
where enablexEventQueryCollection = 1
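
For illustration, a control table along these lines would satisfy the query above; this is only a sketch based on the columns mentioned in this post, and the real DB_ServerMain almost certainly carries additional columns.

CREATE TABLE dbo.DB_ServerMain
(
    ServerName                  sysname NOT NULL PRIMARY KEY,  -- SQL instance to collect from
    enablexEventQueryCollection bit     NOT NULL DEFAULT (0)   -- include in centralized collection?
);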

Recall from my previous post, that we can use the file_offset value in the .XEL file to basically run an incremental load (return everything after the specified offset).  In the data model presented above, you can see the XEvents_Queries table has the servername, file_name, and file_offset value (along with the query stats specifics).  In order to fully automate the collection process, we need the name of the SQL Server instance, the name of the last XEL file collected from AND the highest file_offset value within that last file that we have a record of.

Select a.ServerName,
       isnull(xEventFile.xEventFile_Name, 'NULL') as xEventFile_Name,
       isnull(convert(varchar(50), xEventFileOffset.xEventFile_Offset), 'NULL') as xEventFile_Offset
from DB_ServerMain a
OUTER APPLY (Select max(file_name) as xEventFile_Name
             from dbo.XEvents_Queries ws
             where ws.servername = a.servername) as xEventFile
OUTER APPLY (Select max(file_offset) as xEventFile_Offset
             from dbo.XEvents_Queries ws
             where ws.servername = a.servername
               and ws.file_name = (Select max(file_name)
                                   from dbo.XEvents_Queries ws2
                                   where ws2.servername = a.servername)) as xEventFileOffset

I’ll admit, it’s not the slickest code in the world, but it works to pull the servername, file name and file_offset for each instance to be monitored.  The OUTER APPLYs help keep it fast enough even after you have millions of rows in XEvents_Queries.  Once you know the instances you want to collect from (and the latest file name/file_offset combination), you can connect to each SQL instance and select the contents of the .XEL files using the additional parameters in the call to the sys.fn_xe_file_target_read_file() system function, like this:

SELECT CAST(event_data AS XML) AS event_data_XML, file_name, file_offset
INTO #Events
FROM sys.fn_xe_file_target_read_file('C:\DBAMonitoring\traces\sql2012*.xel', null,
     'C:\DBAMonitoring\traces\sql2012_0_130295018340930000.xel', 141312) AS F

In the example above, the call will create (and fill) a temporary table called #Events with the raw event XML, file name and file offset for every event in the files that match the wildcard search “c:\DBAMonitoring\traces\sql2012*.xel”.  It will start with the file and file_offset we specified (sql2012_0_130295018340930000.xel and 141312) and disregard any earlier files or earlier entries within that file (based on the file_offset value).

The above query is basically the “incremental load” since the last poll, but it only pulls the raw XML event data (into an XML column) along with the file name and file_offset associated with each event.  In order to make sense of the XML event, it needs to be parsed using XPath functions to pull out the individual elements (e.g. reads, writes, duration, etc.).  The code below shows that logic:

 SELECT @@ServerName as ServerName, 
 file_name,
 file_offset,
 event_data_XML.value ('(/event/action[@name=''query_hash'']/value)[1]', 'nvarchar(100)') AS query_hash, 
 DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), event_data_XML.value ('(/event/@timestamp)[1]', 'datetime' )) AS timestamp, 
 event_data_XML.value ('(/event/@name)[1]', 'vARCHAR(50)' ) AS EventName, 
 event_data_XML.value ('(/event/data [@name=''duration'' ]/value)[1]', 'int')/1000 AS duration_ms, 
 event_data_XML.value ('(/event/data [@name=''object_type'' ]/text)[1]', 'varchar(100)') AS object_type, 
 DB_Name(event_data_XML.value ('(/event/action [@name=''database_id'' ]/value)[1]', 'int')) AS DatabaseName, 
 CASE event_data_XML.value ('(/event/@name)[1]', 'vARCHAR(50)' ) 
 when 'sp_statement_completed' then event_data_XML.value ('(/event/data [@name=''object_name'' ]/value)[1]', 'NVARCHAR(4000)') 
 ELSE OBJECT_NAME(event_data_XML.value ('(/event/data [@name=''object_id'' ]/value)[1]', 'BIGINT'),event_data_XML.value ('(/event/action [@name=''database_id'' ]/value)[1]', 'int')) END as ObjectName, 
event_data_XML.value ('(/event/action [@name=''client_hostname'']/value)[1]', 'varchar(100)') as HostMachine, 
event_data_XML.value ('(/event/action [@name=''client_app_name'']/value)[1]', 'varchar(100)' ) as client_app_name, 
 event_data_XML.value ('(/event/action [@name=''nt_username'']/value)[1]', 'varchar(100)') as nt_username, 
 event_data_XML.value ('(/event/data [@name=''cpu_time'']/value)[1]', 'int')/1000 AS cpu_time_ms, 
 event_data_XML.value ('(/event/data [@name=''physical_reads'']/value)[1]', 'BIGINT') AS physical_reads, 
 event_data_XML.value ('(/event/data [@name=''logical_reads'' ]/value)[1]', 'BIGINT') AS logical_reads, 
 event_data_XML.value ('(/event/data [@name=''writes'' ]/value)[1]', 'BIGINT' ) AS writes, 
 event_data_XML.value ('(/event/data [@name=''row_count'' ]/value)[1]', 'BIGINT' ) AS row_count,
 CASE event_data_XML.value ('(/event/@name)[1]', 'vARCHAR(50)' ) 
 when 'sql_batch_completed' then event_data_XML.value ('(/event/data [@name=''batch_text'' ]/value)[1]', 'NVARCHAR(4000)') 
 ELSE event_data_XML.value ('(/event/data [@name=''statement'' ]/value)[1]', 'NVARCHAR(4000)') 
 END AS statement 
FROM #Events

The code above uses XPath to parse the various elements of the XML event into usable tabular columns that match the order and structure of the XEvents_Queries table in the central database. The results look like this:

[Screenshot: parsed xEvent query results]

So now you have access to the reads/writes/CPU/durations as well as the statement within a larger batch or stored procedure that consumed those resources.  You can run this type of query on the host in question anytime to get a real-time look at query activity, but pulling it all into a central database (with rows marked with @@ServerName to delineate between instances) is a more manageable solution for environments with more than one instance to keep an eye on (read: any environment).
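
As a reference point, here is a sketch of a central XEvents_Queries table whose columns line up with the parsing query above; the data types (and the absence of any indexes) are assumptions, not the author’s actual definition.

CREATE TABLE dbo.XEvents_Queries
(
    ServerName       sysname        NOT NULL,  -- @@ServerName of the monitored instance
    file_name        nvarchar(260)  NOT NULL,  -- .xel file the event came from
    file_offset      bigint         NOT NULL,  -- offset used for incremental loads
    query_hash       nvarchar(100)  NULL,
    [timestamp]      datetime       NULL,
    EventName        varchar(50)    NULL,
    duration_ms      int            NULL,
    object_type      varchar(100)   NULL,
    DatabaseName     sysname        NULL,
    ObjectName       nvarchar(4000) NULL,
    HostMachine      varchar(100)   NULL,
    client_app_name  varchar(100)   NULL,
    nt_username      varchar(100)   NULL,
    cpu_time_ms      int            NULL,
    physical_reads   bigint         NULL,
    logical_reads    bigint         NULL,
    writes           bigint         NULL,
    row_count        bigint         NULL,
    [statement]      nvarchar(4000) NULL
);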

With the information from this post and previous posts, you should be able to set up an active xEvents session to capture “statement completed” events.  You can then use queries similar to the example above to parse your captured xEvents data into easily consumable, aggregate-able (is that a word?) data sets for analysis.

I have used this type of setup on the local server to answer questions like these (a sample query follows the list):

1) What are my top 5 executed tSQL statements on the server between 8:05AM and 10:30AM yesterday?

2) Which database had the most io/cpu usage for the last 6 hours?  Morning vs. Afternoon for the last 3 days? etc.
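
For example, question 1 can be answered with a query along these lines against the central table; this is a sketch, and the 8:05AM–10:30AM window and the TOP 5 by execution count are just illustrative choices.

DECLARE @Yesterday date = DATEADD(day, -1, CAST(GETDATE() AS date));

SELECT TOP (5)
       query_hash,
       DatabaseName,
       COUNT(*)           AS executions,
       SUM(logical_reads) AS total_logical_reads,
       SUM(cpu_time_ms)   AS total_cpu_ms,
       MAX(statement)     AS sample_statement
FROM dbo.XEvents_Queries
WHERE ServerName = '<MyInstanceName>'
  AND [timestamp] >= DATEADD(minute, (8 * 60) + 5,   CAST(@Yesterday AS datetime))
  AND [timestamp] <  DATEADD(minute, (10 * 60) + 30, CAST(@Yesterday AS datetime))
GROUP BY query_hash, DatabaseName
ORDER BY executions DESC;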

In my next post I’ll tie all of this together by going through a PowerShell .ps1 file that has the above queries embedded in it and actually performs the write of query event rows from remote servers to the central database.

Collecting xEvents Query Stats From Multiple SQL Instances to a Central Database Part 1

In my previous post, I discussed setting up xEvents sessions in lieu of server-side traces to collect statement-level activity from a SQL Server instance.  I covered the basics of setting up the session to collect the sp_statement_completed and sql_statement_completed events to a .xel event file.  I also provided some queries that allow you to read the entries from all .xel files in a directory and then parse the content into a regular tabular result set for sorting, filtering, and aggregation.  While this is all very useful information to capture on a specific instance, most (if not all) of us in the DBA world are managing multiple SQL Server instances across production and non-production environments.  In this post I’ll cover the following topics:

1) Standardizing your xEvents sessions on individual sql instances to streamline the centralization process

2) Using PowerShell to automate the collection of xEvent data from multiple instances and store it in a centralized set of tables on one instance (a central management or monitoring SQL Server)

3) Automating an aggregation process to generate pre-aggregated table rows (by day, month, etc) for the query data we are collecting.

Standardize your Query Performance xEvents Sessions

In order to simplify the centralization of xEvent data, I adhered to the following simple configurations:

  1. All query performance xEvent sessions write their .xel files to the same named directory on every SQL instance.  In my case, I use C:\DBAMonitoring\traces for this.
  2. I configure the xEvent session to write to 2MB files and roll over after 40 files have accumulated.  This ensures that I never have more than 80MB of disk space in use on the local machine.  Later in the post, I’ll cover how I collect and centralize the data from the .xel files every 15 minutes, so as long as you don’t produce more than 80MB worth of .xel files every 15 minutes, you probably won’t lose captured event data between polls.
  3. All query performance xEvent .xel files should have the same prefix so they can be read into memory together.  In my case, all .xel file names start with sql2012 (sql2012<somestring>.xel).

With these settings in place on each SQL 2012 instance that I want to collect xEvent query execution data from, I can use a PowerShell script executed on a central database instance that connects remotely to the monitored instances and pulls the recent xEvent data into a set of tables.

Centralized xEvent Processing and Collection

The diagram below illustrates the simple way that this solution works.

[Figure 1: centralized xEvent collection overview]

The central database has a few tables to handle the following:

1) Control Table – has the names of the SQL instances that you would like to collect xEvent data from.  This list is used by the PowerShell script to determine which SQL instances to collect from each time that it runs.

2) xEvent Tables – These tables store the query performance information (xEvents) from the various servers in your environment.  Then you can run queries or write reports on top of this data to review instance specific query performance metrics.

The PowerShell script performs the following actions:

1) Query the Control table to determine which instances to collect from.
2) Loop through the list of instances and for each one
a) connect and read the latest xEvent data from the .xel files on the server
b) Pre-process the data on the remote machine (in a temp table) and then load it directly into the xEvent tables in the central database

In my previous post (Capture and Review SQL 2012 Query Performance History with xEvents) I provided a T-SQL script to run on a local instance that would read the .xel files as XML strings into a temp table.  The second half of the query then parsed the XML data into the familiar trace-like fields, including:

query_hash – the unique query hash value that allows you to group identical calls without comparing/parsing the actual statement
timestamp – time of the statement execution
EventName – either sp_statement_completed or sql_statement_completed
duration_ms – the duration of the statement in milliseconds
object_type – shows prepared or adhoc to indicate the type of call
DatabaseName – the database within which the statement was executed
ObjectName – the name of the stored procedure called (if applicable)
cpu_time_ms – total CPU milliseconds used to complete the query
physical_reads – physical read count required to complete the query
logical_reads – logical read count required to complete the query
writes – write count required to complete the query
row_count – number of rows affected by the query
statement – the actual statement that was executed

With the values above, you can run all kinds of grouping and time-windowed queries to see your most costly statements by CPU/reads/writes/duration.  The only thing missing for centralizing the content is an instance name on each record.  In the queries I run through the PowerShell script, I include @@ServerName as ServerName in the select list along with a few other control-related fields, including the .xel file name the event came from and the offset within the file that the event is part of.  Let’s take a quick step back and review how the .XEL files work in terms of logging xEvents to disk.

xEvents to Disk (.XEL file target), sys.fn_xe_file_target_read_file and Offsets

SQL Server provides a system function to read the data out of .xel files directly from T-SQL.  The function is sys.fn_xe_file_target_read_file (documentation here: http://technet.microsoft.com/en-us/library/cc280743.aspx).  A basic call looks like this:

SELECT top 10000 file_name, file_offset, CAST(event_data AS XML) AS event_data_XML
FROM sys.fn_xe_file_target_read_file('C:\dbamonitoring\traces\sql2012*.xel', null, null, null) AS F;

The above reads all of the files that start with “sql2012” and end with “.xel” in the directory and returns a result set of events including the file name the event came from, the group offset number within the file (used for filtering) and an XML column of the full event data.  The results are shown below:

[Screenshot: fn_xe_file_target_read_file results with file_offset values]

As you can see above, within an .xel file, xEvents are logged in groups (denoted by a shared file_offset value).  When you call sys.fn_xe_file_target_read_file, you can pass a file name and a file_offset as parameters:

SELECT top 10000 file_name, file_offset, CAST(event_data AS XML) AS event_data_XML
FROM sys.fn_xe_file_target_read_file('C:\dbamonitoring\traces\sql2012*.xel', null, 'C:\dbamonitoring\traces\sql2012_0_130268187987170000.xel', 775168) AS F;

In the example above, the select will return rows beginning with the specified file and file_offset and continue with rows from any subsequent files in the directory that match the sql2012*.xel search string.  Using this filtering functionality of the system function yielded better performance (in my limited tests) than comparing on the event timestamp, which requires XPath-ing the timestamp value out of the XML and converting it to a datetime for comparison.
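
For comparison, the timestamp-based alternative would look roughly like the sketch below; every event’s XML has to be shredded and converted before the filter can even be applied, which is why the file/offset parameters tend to be cheaper.  @LastPollUtc here is a placeholder for whatever high-water mark you track.

DECLARE @LastPollUtc datetime = DATEADD(hour, -1, GETUTCDATE());  -- example value only

SELECT X.event_data_XML, F.file_name, F.file_offset
FROM sys.fn_xe_file_target_read_file('C:\dbamonitoring\traces\sql2012*.xel', null, null, null) AS F
CROSS APPLY (SELECT CAST(F.event_data AS XML) AS event_data_XML) AS X
WHERE X.event_data_XML.value('(/event/@timestamp)[1]', 'datetime') > @LastPollUtc;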

part 2 coming soon….

Capture and Review SQL 2012 Query Performance History with xEvents

In this post, I’ll walk through the xEvents setup we have on a number of SQL 2012 instances. We use it to collect query performance metrics as a server-side trace replacement (note: this post assumes you are running SQL 2012+).  If you are familiar with server side tracing and/or profiler from 2008/2012, then a lot of this will make sense to you.  Basically I run an xEvent session that captures:

  • sp_statement_completed – individual statements within stored procedures that require more than 10 logical reads to complete
  • sql_statement_completed – individual statements within a SQL batch (ad hoc) that require more than 10 logical reads to complete

We could capture just stored procedure completed and batch completed events, but going down to the statement level provides a few significant advantages (and challenges).

Advantages 

  • Statement-level captures identify the specific sections within a larger stored procedure or batch, which directs optimization efforts at the genuinely slow elements.  If you capture events at the sp or batch completed level, you still need to do the work of finding the specific parts of the SP that are the slowest or most problematic.
  • In SQL 2012, the sp and sql statement completed events include a query hash value, so you can easily aggregate similar calls that use different parameters.  Back when we used server side traces to capture query execution information, I had to develop a complicated and resource-intensive process that stripped raw SQL calls of their parameter values and formatting so that they could be aggregated for statistical analysis.

Disadvantages

  • Statement level captures will take in much more data than the sp/batch completed events, so you need to make sure you plan accordingly (more on this later…)
  • Since there is a much higher volume of data collected, it is harder to keep a long term, highly granular history.  (more on this below)

So the first step is to set up an xEvent session on the server in question.  I use this code:

CREATE EVENT SESSION [query_performance] ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.query_hash, sqlserver.client_hostname, sqlserver.database_id,
           sqlserver.plan_handle, sqlserver.nt_username,
           sqlserver.client_pid,
           sqlserver.client_app_name)
    WHERE (sqlserver.database_id >= 6 AND sqlserver.query_hash <> 0)
),
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(sqlserver.query_hash, sqlserver.client_hostname, sqlserver.database_id,
           sqlserver.plan_handle, sqlserver.nt_username,
           sqlserver.client_pid,
           sqlserver.client_app_name)
    WHERE (sqlserver.database_id >= 6 AND sqlserver.query_hash <> 0)
)
ADD TARGET package0.event_file(SET filename=N'C:\DBAMonitoring\traces\sql2012.xel',
    max_file_size=(2),
    max_rollover_files=(40))
WITH (MAX_MEMORY=4096 KB,
    EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY=120 SECONDS,
    MAX_EVENT_SIZE=0 KB,
    MEMORY_PARTITION_MODE=NONE,
    TRACK_CAUSALITY=OFF,
    STARTUP_STATE=ON)

The code above creates an event session (names the events to capture and whatever filters we may have) but does not start the session. Some additional comments about the CREATE EVENT SESSION code above:

  • I capture sql_statement_completed and sp_statement_completed to capture statement-level information (within ad hoc batches and stored procedures respectively).  My preference is to capture only the lowest-level query activity, so I don’t have batch completed and sp completed events cluttering the data set of individual SQL statements.
  • You have the option to capture xEvents into the SQL Server ring buffer, but I have found the .xel files easier to deal with, and they allow more fine-grained control of retention.  As you see above, I am capturing xEvents to .xel files in the C:\DBAMonitoring\Traces folder on the SQL instance in question.  I also limit files to 2MB each and no more than 40 files total (so a max disk footprint of 80MB before older files are rolled over – aged out and deleted).
  • Since I am capturing query execution statistics for later aggregation, it is OK if I don’t collect every single event, and I don’t want to collect events in a way that might impact performance, so I set EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS on the event session.

Once you run the code above, the session is created on the SQL instance but not started.  You can either run a start command in T-SQL:

-- start session
ALTER EVENT SESSION query_performance
ON SERVER
STATE = START;

or use SSMS to start the session:

[Screenshot: use SSMS to manage event sessions from within the Management folder of the instance]

Once the xEvents session is running, you can see the .xel files accumulating in the specified directory.

.xel files act just like the old .trc files in that they reach a max size and then generate another file.  Once the rollover file count is reached, older files are deleted as new files are generated.

So just remember that the “history” you can poll from the active .xel files will be limited by the volume of events captured and the sizing controls specified for .xel file retention and rollover. You can query active .xel files much like you could with .trc files in SQL 2008 and 2005.  That said, I have found it easier to load the data into temp (or permanent) tables from the .xel files and then run additional queries for aggregations.

-- copy event data into temp table #Events
-- reads all .xel files that start with "sql2012" and end with ".xel" and puts each event into an XML column that we can query directly.
SELECT CAST(event_data AS XML) AS event_data_XML
INTO #Events
FROM sys.fn_xe_file_target_read_file('C:\DBAMonitoring\traces\sql2012*.xel', null, null, null) AS F;

After the #Events table is populated, we can run the following XPath query to make ANOTHER temp table with the raw content separated out into individual columns.

-- extract event details into a temp table #Queries
SELECT
    event_data_XML.value ('(/event/action[@name=''query_hash'']/value)[1]', 'nvarchar(4000)') AS query_hash,
    event_data_XML.value ('(/event/@timestamp)[1]', 'datetime') AS [timestamp],
    event_data_XML.value ('(/event/@name)[1]', 'varchar(50)') AS EventName,
    event_data_XML.value ('(/event/data [@name=''duration'']/value)[1]', 'int')/1000 AS duration_ms,
    event_data_XML.value ('(/event/data [@name=''object_type'']/text)[1]', 'varchar(100)') AS object_type,
    DB_Name(event_data_XML.value ('(/event/action [@name=''database_id'']/value)[1]', 'int')) AS DatabaseName,
    CASE event_data_XML.value ('(/event/data [@name=''object_type'']/text)[1]', 'varchar(100)')
        when 'PROC' then OBJECT_NAME(event_data_XML.value ('(/event/data [@name=''object_id'']/value)[1]', 'BIGINT'))
    END as ObjectName,
    event_data_XML.value ('(/event/data [@name=''cpu_time'']/value)[1]', 'int')/1000 AS cpu_time_ms,
    event_data_XML.value ('(/event/data [@name=''physical_reads'']/value)[1]', 'BIGINT') AS physical_reads,
    event_data_XML.value ('(/event/data [@name=''logical_reads'']/value)[1]', 'BIGINT') AS logical_reads,
    event_data_XML.value ('(/event/data [@name=''writes'']/value)[1]', 'BIGINT') AS writes,
    event_data_XML.value ('(/event/data [@name=''row_count'']/value)[1]', 'BIGINT') AS row_count,
    event_data_XML.value ('(/event/data [@name=''statement'']/value)[1]', 'NVARCHAR(MAX)') AS statement
INTO #Queries
FROM #Events

With the raw data now in tabular format (#Queries) you can run any number of aggregations based on the query_hash field.  The query_hash is a unique identifier for similar queries that run with different parameters.  It is a huge time saver compared to the old server side trace approach, where you had to write your own “genericize the SQL statement so I can group on it” functionality.

So with the #Queries table in place, you can run the query below to return the specific statements (by query_hash) ordered by total logical reads:

-- filter and include a sample query
WITH QueryHashTotals AS
(
SELECT query_hash,DatabaseName, ObjectName,EventName,
COUNT(*) AS num_queries,
SUM(logical_reads) AS sum_logical_reads
FROM #Queries
GROUP BY query_hash,DatabaseName, ObjectName,EventName
),
RunningTotals AS
(
SELECT query_hash,DatabaseName, ObjectName, EventName,num_queries, sum_logical_reads,
CAST(100. * sum_logical_reads
/ SUM(sum_logical_reads) OVER()
AS NUMERIC(5, 2)) AS pct,
CAST(100. * SUM(sum_logical_reads) OVER(ORDER BY sum_logical_reads DESC
ROWS UNBOUNDED PRECEDING)
/ SUM(sum_logical_reads) OVER()
AS NUMERIC(5, 2)) AS running_pct
FROM QueryHashTotals
)
SELECT RT.*, (SELECT TOP (1) statement
FROM #Queries AS Q
WHERE Q.query_hash = RT.query_hash) AS sample_statement
FROM RunningTotals AS RT
ORDER BY sum_logical_reads DESC;

With the data in tables you can pretty much do whatever you want with it in terms of aggregations and queries.  Only simple modifications to the query are required to return results by writes, duration, CPU, etc.  In addition to the aggregations you can derive (Top N queries), you can also use the raw data for investigations (“What was the query stream leading up to my database issue?”).  It all depends on how much data you can retain and at what level of granularity.
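
As a quick sketch of one such modification, the same grouping can simply be re-ranked by total CPU (duration or writes work the same way):

SELECT query_hash, DatabaseName, ObjectName, EventName,
       COUNT(*)         AS num_queries,
       SUM(cpu_time_ms) AS sum_cpu_ms,
       SUM(duration_ms) AS sum_duration_ms
FROM #Queries
GROUP BY query_hash, DatabaseName, ObjectName, EventName
ORDER BY sum_cpu_ms DESC;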

My next post will discuss how to roll out this xEvents-based query monitoring solution to multiple machines and collect the data to a centralized database for reporting across an entire environment of SQL Server 2012 instances.  In the meantime, if you have any questions or issues with the samples above, email me or comment and I’ll try to reply as soon as I can.

Wait Stats Report – Visualize Raw WaitStats Data

This post should close the loop on my last few Wait Stats monitoring posts.  In order to more easily sift through and make sense of raw wait stat information, I put together an SSRS report that connects to the DBAMonitoring database we’ve been populating and provides a clickable, visual display of wait stats per server.

Installing and configuring an SSRS setup is beyond the scope of this post, so I am going to assume that:

  1. You have access to a basic SQL Server 2008 SSRS instance that is running and that you can get to via http://localhost/reports (or whatever your SSRS base URL is).
  2. You have configured the tables, procedures, PowerShell scripts and SQL Agent jobs from my previous posts on wait stats collection.
  3. You have downloaded these two files:
    • Wait_Stats.rdl (rename to .rdl)
    • rpt_FetchWaitStatsDiff.sql (rename to .sql) – this is the code for a stored procedure in the DBAMonitoring database that is used by the report when rendering wait stats across collection intervals.

In this post, I plan to cover the following:

  1. Deploying the report file (wait_stats.rdl) to your SSRS instance and the stored procedure code to your DBAMonitoring database
  2. Configuring a data source for the report
  3. Reviewing the queries that feed into the various visual components of the report

Assuming your localhost instance has a DBAMonitoring database with all the tables and procedures from this and previous posts, you can simply upload the .rdl report file to your running SSRS instance from the web interface. Go to the SSRS URL (e.g. http://localhost/Reports/Pages/Folder.aspx), click the “Upload File” button in the toolbar, select the .rdl file from your local hard drive and click OK:

[Screenshot: uploading the report file]

Once the report is uploaded, you should be able to run it from the SSRS web portal by simply clicking it.  If everything works properly, the report opens right away with a list of the servers you have been collecting wait stat data from:

[Screenshot: wait stats report – server list]

You can click on one of the SQL instances (in my example, localhost and mraftree-bos-v are the same machine, but I can collect them separately to give the illusion of multi-server centralized management).  Once you click a SQL Server instance in the list, the report refreshes with the wait stat content:

[Screenshot: full wait stats report]

Sections of the report:

1) Top Section – Wait Stats High Level Stacked Bar Chart – shows the interval-specific wait stats on the 30-minute collection interval.  By default, this view shows the last 3 hours of collections.  You can easily see in this chart if a specific wait category has spiked in a specific interval (indicating some issue with a poor query plan, a parallelized workload, etc.).

2) Wait Stats Detail – shows the numeric representation of the wait stats for the last 30-minute interval.  This is a good place to look for the highest wait types and potentially drill in to determine a lower-level reason for the waits.

3) Wait Group History Matrix – a numeric representation of the top-level stacked bar chart.  It shows the specific wait category numbers by interval.

4) Wait Type History – similar to the Wait Group History Matrix, but shows the lower-level wait type stats.

Let me know if you have any issues deploying this report and the supporting code and I’ll be glad to help.

In my next post, I’ll be building on this central monitoring paradigm to capture and report on active session activity (running and inactive SPIDs and their associated SQL calls).  Happy Monitoring!

Wait Stats – Making Sense of Raw Data Collection

A lot of the work that went into this wait stats collection and reporting solution was inspired by the functionality of the MS SQL Server PerfDashboard reports (you can download the latest version here).  The PerfDashboard reports are great: they provide good diagnostic information, they are accessible from within Management Studio, and they look good.  Their major limitation is that they do not offer a view into wait stats history at regular intervals.  At any point in time when you look at the PerfDashboard reports, you are just seeing the cumulative wait stats since the SQL Server instance started.  In most cases a SQL instance stays up for days/weeks/years, and it becomes impossible to discern recent changes in wait stat activity because of the long time frame over which the DMV’s wait stats have been accumulating.  For example, you may be having an issue in the last 6 hours with queries running inefficiently due to parallelism, but you may not be able to see many parallelism waits (the CXPACKET wait type) relative to other wait types that have been accumulating for months.

This is what the “Historical Waits” PerfDashboard report shows:

[Screenshot: Historical Waits (PerfDashboard) report]

It looks nice, but if the wait stat information above is based on a server that has been running for the last 180 days, it’s not much use for debugging a recent issue with SQL Server performance.  You may have had a serious CPU issue earlier in the day, but even a significant 45-minute event wouldn’t skew the “CPU” wait type’s overall value or its position in the list above.  It just wouldn’t have enough impact to “show through” against the much larger time frame over which all the stats have been accumulating.

The collection approach I covered in my last post is the first step toward discerning recent wait statistics without restarting the instance.  The approach is admittedly simple (just collect the accumulated wait stats per server every 30 minutes), but once you have a reasonable set of snapshots (a few hours’ worth), you can run calculations across collections to see the “real” wait stats by wait type on a 30-minute interval basis.  Before I get into the logic required to calculate the diffs across collections, take a look at the SSRS report I put together based on the DB_WaitStats table data.

[Screenshot: wait stats SSRS report]

Notice that the information is displayed in 30-minute increments (since the SQL Agent job collects a snapshot of wait stat data every 30 minutes).  The top section shows a stacked bar chart with each 30-minute collection’s wait category data (so you can easily see if any one wait category – a color in the graph – is growing or shrinking significantly).

The tabular matrix below it shows the same data as the graph, but in numeric format, with each column representing a different collection interval.  The same query (within a stored procedure) is used for both the stacked bar chart and the numeric matrix.  The high-level control of flow in the procedure is depicted in the image below:

[Diagram: rpt_FetchWaitStatsDiff control flow]

  1. Before the loop, get the latest increment_id for the server in question (so we can use it as a starting point to go back in time across collection intervals).
  2. Within the loop, compare the cumulative wait stats between the current increment_id and the previous increment_id (to determine the “real” wait time between collections), and for each iteration insert rows into a temp table with the “real” wait times for that server/interval combination (a T-SQL sketch of this diff calculation follows the list).
  3. Return the results from the temp table.
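
To make step 2 concrete, here is a minimal sketch of the per-interval diff, assuming DB_WaitStats stores the cumulative snapshots as (ServerName, increment_id, wait_type, wait_time_ms) and WaitTypeCategory maps each wait_type to a WaitCategory; the column names are assumptions, and the real procedure loops through increment_ids rather than joining them all at once.

SELECT cur.ServerName,
       cur.increment_id,
       wtc.WaitCategory,
       SUM(cur.wait_time_ms - prev.wait_time_ms) AS real_wait_time_ms
FROM dbo.DB_WaitStats AS cur
JOIN dbo.DB_WaitStats AS prev
     ON  prev.ServerName   = cur.ServerName
     AND prev.wait_type    = cur.wait_type
     AND prev.increment_id = cur.increment_id - 1
JOIN dbo.WaitTypeCategory AS wtc
     ON wtc.wait_type = cur.wait_type                -- assumed column names
WHERE cur.ServerName = '<MyInstanceName>'
GROUP BY cur.ServerName, cur.increment_id, wtc.WaitCategory
ORDER BY cur.increment_id DESC, real_wait_time_ms DESC;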

The procedure could probably be rewritten as a recursive CTE to accomplish the same thing, but looping through increment_ids for a particular server seemed easier to follow.  You can download the code for the procedure here: rpt_FetchWaitStatsDiff.sql (rename it to .sql from the .key file extension).  I also leverage a great table create (and populate) script from Jason Strate (http://www.jasonstrate.com/2010/01/wait-stat-categories/) to generate a table called WaitTypeCategory that I use to group wait_types together into higher-level wait categories.  That script can be downloaded here: WaitTypeCategory.sql

So now we have the collection PowerShell script and the SQL Agent job to collect SQL instance wait stats every 30 minutes into the central DBAMonitoring database.  In addition, we have the rpt_FetchWaitStatsDiff procedure that calculates the “real” wait stats between collections and returns them for the last 6 intervals.
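
For context, the snapshot that the 30-minute job stores is essentially a dump of sys.dm_os_wait_stats tagged with the server and interval; the sketch below illustrates the idea, with the DB_WaitStats column list being an assumption (the actual collection script is in the earlier post).

DECLARE @NextIncrementId int = 1;  -- in practice: MAX(increment_id) + 1 for this server

INSERT INTO dbo.DB_WaitStats (ServerName, increment_id, capture_time, wait_type,
                              waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT @@SERVERNAME,
       @NextIncrementId,
       GETDATE(),
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats;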

In my next post, I’ll go through the SSRS report I showed above in detail to show you how to visualize this data and enable yourself to quickly diagnose high level wait-oriented issues with your SQL instances.