Which Databases Are Actually Used? CPU and Memory Use by Database Queries

We have a fairly large SQL Server environment where I work, with over 200 SQL instances split roughly 60/40 between non-production and production. With an active development staff, many new databases are created and worked on in our non-production environments every week.  In order to keep an eye on which databases are actually in use, we began capturing a snapshot of database-level CPU and memory usage twice a day (6AM/6PM).  Over time this data gives a decent indicator of which databases have activity (user, developer or otherwise) and their overall “rank” against the other databases on a particular instance.  This type of data helps us make decisions on:

  • Clean Up – was this database created as a one-off test data set in dev? Can it be deleted?
  • Consolidation – is this database used lightly enough to be a candidate for consolidation onto a shared SQL instance?
  • High Level Resource Usage per Instance – which database is consuming the most resources on which instance?

In order to keep things simple, we use basic DMV queries to collect this data twice a day.  For CPU usage we look at sys.dm_exec_query_stats, and for memory usage we look at sys.dm_os_buffer_descriptors.  Let’s take a quick look at the collection queries.  (Note: a colleague sent me some basic queries that I tweaked for my needs, so please forgive me if I am not attributing the right folks from the SQL community who originally posted these or something similar.)

-- Declare a date for the collection time
Declare @TheDate date;
Select @TheDate = getdate();

WITH DB_CPU_Stats
AS
(SELECT DatabaseID, DB_Name(DatabaseID) AS [DatabaseName], SUM(total_worker_time) AS [CPU_Time_Ms]
 FROM sys.dm_exec_query_stats AS qs
 CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
              FROM sys.dm_exec_plan_attributes(qs.plan_handle)
              WHERE attribute = N'dbid') AS F_DB
 GROUP BY DatabaseID)
SELECT @TheDate, @@ServerName, DatabaseName, [CPU_Time_Ms],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4 -- system databases
AND DatabaseID <> 32767 -- ResourceDB
ORDER BY 4 desc OPTION (RECOMPILE);

 

The query above can be run on any SQL Server 2008+ instance and returns the total CPU time consumed per database (since the last restart or since the plan cache was flushed) as well as each database's percentage of the total CPU consumed across the whole instance.  The results look like this:

[screenshot: cpubydb (CPU usage by database result set)]

Remember that the above values are cumulative through the last restart, so you need some intelligence to self-join the table across collection intervals and calculate the deltas.
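As a starting point, here is a minimal sketch of that delta calculation, using LAG instead of a literal self-join and assuming the snapshots land in a hypothetical central table called DB_CPUUsage_ByDatabase with capture_time, servername, databasename and CPU_Time_Ms columns (LAG requires SQL Server 2012 or later on the central server):

-- Hypothetical central table: DB_CPUUsage_ByDatabase(capture_time, servername, databasename, CPU_Time_Ms)
-- Compare each snapshot to the previous one for the same server/database and compute the delta.
WITH Snapshots AS (
    SELECT capture_time,
           servername,
           databasename,
           CPU_Time_Ms,
           LAG(CPU_Time_Ms) OVER (PARTITION BY servername, databasename
                                  ORDER BY capture_time) AS Prev_CPU_Time_Ms
    FROM DB_CPUUsage_ByDatabase
)
SELECT capture_time,
       servername,
       databasename,
       -- a drop in the cumulative value usually means the instance restarted or the plan cache was flushed
       CASE WHEN CPU_Time_Ms >= Prev_CPU_Time_Ms
            THEN CPU_Time_Ms - Prev_CPU_Time_Ms
            ELSE CPU_Time_Ms END AS CPU_Time_Ms_Delta
FROM Snapshots
WHERE Prev_CPU_Time_Ms IS NOT NULL
ORDER BY servername, databasename, capture_time;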

The other query determines which databases are consuming the most memory at the time of the collection:

Declare @TheDate date;
Select @TheDate = getdate();
-- get memory used per DB
With DB_MemUsed as (
SELECT @@ServerName as ServerName, db_name(database_id) as DatabaseName, database_id,
       COUNT(*) * 8 / 1024 AS MBUsed
FROM sys.dm_os_buffer_descriptors
where database_id > 4
and database_id <> 32767
GROUP BY database_id
)
SELECT @TheDate, ServerName, DatabaseName, MBUsed,
       CAST(MBUsed * 1.0 / SUM(MBUsed) OVER() * 100.0 AS DECIMAL(5, 2)) AS MBUsedPercent
FROM DB_MemUsed
WHERE Database_ID > 4 -- system databases
AND Database_ID <> 32767 -- ResourceDB
ORDER BY 4 desc OPTION (RECOMPILE);

 

The result set looks like this:

[screenshot: membydb (memory usage by database result set)]

Collecting the data to a central database table over time provides a nice, centrally managed data set to run trending queries against.  This information is NOT cumulative, so each collection is just a snapshot of the memory consumption at that particular time.  That makes it easy to run aggregates against the data over a time period, like this:

Select datepart(year, a.capture_time) as captureyear,
       datepart(week, a.capture_time) as captureweek,
       a.servername,
       a.databasename,
       avg(MBUsed) as AvgMB_MEMUsed,
       Max(MBUsed) as MaxMB_MEMUsed,
       Min(MBUsed) as MinMB_MEMUsed
from [DB_MemUsage_ByDatabase_ByDay] a
where a.servername = '<MyInstanceName>'
group by datepart(year, a.capture_time),
         datepart(week, a.capture_time),
         a.servername,
         a.databasename

The above query assumes you are saving the results of the memory usage query to a table called DB_MemUsage_ByDatabase_ByDay.  Once several days’ worth of collections are available, you can run the above query to get some aggregate memory usage statistics “By Week”.
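If you need a starting point for that table, a minimal sketch might look like this; the column names follow the collection query, but the data types and the clustered index choice are my assumptions:

-- Minimal sketch of the central memory usage table
CREATE TABLE dbo.DB_MemUsage_ByDatabase_ByDay
(
    capture_time  date         NOT NULL,  -- matches @TheDate above; use datetime if you want to tell the 6AM and 6PM snapshots apart
    servername    sysname      NOT NULL,
    databasename  sysname      NOT NULL,
    MBUsed        int          NOT NULL,
    MBUsedPercent decimal(5,2) NOT NULL
);

-- A clustered index so the table is not left as a heap
CREATE CLUSTERED INDEX CIX_DB_MemUsage_ByDatabase_ByDay
    ON dbo.DB_MemUsage_ByDatabase_ByDay (capture_time, servername, databasename);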

Hope this is helpful, send me an email or comment if you need assistance or have any questions.


Wasting space with heaps

Over the last few months, our team of DBAs has been developing various data collection routines for monitoring (and designing the tables to store said monitoring data).  We are planning to remove a 3rd party SQL Server monitoring tool from our environment in the next couple of months, so most of the effort has been on making sure we don’t lose any critical monitoring/analysis functionality once we remove that product.  This is all well and good, but in our haste to “get the data collecting”, we found a few problematic issues.

1) If you are going to collect some data set (wait stats, file IO stats, etc.), ALWAYS ALWAYS ALWAYS implement a delete process as well.  Set a ridiculous retention period (e.g. 10 years) if you don’t want to delete anything, but at least have the process in place to trim aging records off of your monitoring tables (see the sketch after this list).  I know it’s the most obvious point, but it is often overlooked or deferred with the thought that “we’ll delete when it grows too big”.

2) DON’T FORGET a Clustered Index!  In the vast majority of our routines, we execute a DMV query against a series of servers and then write the results to a table on a central database server.  Nothing fancy there, but I found myself “cloning” routines (copy/paste PowerShell code, quickly make a table structure with a SELECT INTO, etc.) and not taking the time to define at least a clustered index.  In at least 2 cases, we had tables that were heaps (no clustered index) and over time their unused space grew to more than 7 times the space actually required to house the rows.  This was caused by us deleting aging records from the heap.  A heap can’t reuse the space freed by those deletes, so it will keep growing unless you copy it to another heap (SELECT INTO), run the ALTER TABLE ... REBUILD command, or add a clustered index, as the sketch below also shows.
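To make both points concrete, here is a rough sketch against a hypothetical collection table; the table name, retention period, and index key are placeholders, not our actual objects:

-- 1) Trim aging rows on a schedule so the monitoring table doesn't grow forever
DECLARE @RetentionDays int = 3650; -- even a ridiculous retention period is better than none

DELETE FROM dbo.WaitStats_Collection
WHERE capture_time < DATEADD(day, -@RetentionDays, GETDATE());

-- 2) If the table is already a heap, reclaim the space released by the deletes...
ALTER TABLE dbo.WaitStats_Collection REBUILD;

-- ...or better, add a clustered index so the table stops being a heap altogether
CREATE CLUSTERED INDEX CIX_WaitStats_Collection
    ON dbo.WaitStats_Collection (capture_time, servername);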

I added a few clustered indexes to the problem tables and the unused space dropped down to <1MB per table.

I’m sure the above is obvious to most readers, but I am hoping that by admitting to this sloppiness here, it will prevent me from overlooking these basic things in the future.

Unlocking SQL Logins Without a Password

So today I broke something.  While prepping to bring our disaster recovery SQL Servers in line with their production counterparts, I was going through the servers and adding linked servers that existed on the main production servers but were missing from their DR counterparts.  It seemed simple enough, especially since most of our linked servers use pass-through authentication of the user’s Windows domain credentials to connect.  HOWEVER, in a handful of cases we use SQL Server logins for the security context of linked servers.

The problem arose when I attempted to add a linked server using a SQL Server login/password.  I had the wrong password (it was wrong in our password management application, but that is another story entirely).  I added the linked server without incident, since SQL Server doesn’t verify the provided credentials when creating a linked server.  Consequently, after I added the linked server with the wrong password and then attempted to open it in Management Studio a few times, the SQL Server login on the target server was locked out.  This set off a chain reaction, as several production applications were using that login for their normal security context.

Needless to say, internal customers started complaining about security errors coming from a core business application “all of a sudden”.

As soon as I saw the emails, I knew it had something to do with me and what I was working on.  I checked the SQL Server error log on the server in question and saw the obvious set of failed login attempts followed by a stream of locked-out login error messages.  A bonehead move to be sure, but when I tried to unlock the account in Management Studio, I was told I had to provide the password in the ALTER LOGIN screen if I wanted to unlock the account.  Sigh… not a real problem, more of an annoyance… or so I thought.  When I went into our password management application, of course, the SQL login information was NOT there.

The fastest way to get the cow out of the ditch was to disable the account policy on the SQL login first, and THEN unlock it using SSMS’s login properties window.

So I had to execute this simple command:

ALTER LOGIN <mybadloginname> WITH CHECK_POLICY = OFF;

Disabling CHECK_POLICY stopped SQL Server from applying the domain-level account policies (including lockout) to the SQL login, and I was able to unlock it after that.
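For reference, the same state can be checked and cleared with T-SQL along these lines (the login name here is just a placeholder):

-- Check whether the SQL login is currently locked out (1 = locked)
SELECT LOGINPROPERTY('mybadloginname', 'IsLocked') AS IsLocked;

-- Stop enforcing the Windows account policies (including lockout) for this login;
-- after this, the login can be unlocked from the SSMS login properties window
-- without having to supply the password.
ALTER LOGIN [mybadloginname] WITH CHECK_POLICY = OFF;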

We decided as a rule to leave CHECK_POLICY = OFF for a handful of SQL Server logins in our environment to avoid the scenario of a sloppy (but well-intentioned) admin causing a critical SQL Server login to get locked out due to a misconfiguration.

Collecting xEvents Query Stats From Multiple SQL Instances to a Central Database Part 4

In this final post about xEvent based query monitoring data collection, I’ll cover the following:

  1. Running daily pre-aggregations of query stats by query hash and purging aging raw xEvent data
  2. Some useful sample queries for making sense of all the data you are now collecting.

Once you have a solution like this in place and large datasets are growing larger on your central database, you need to consider a data retention process.  Obviously you can’t keep a record of every execution of every SQL statement in perpetuity, but you do want to strike a balance between high granularity in the short term and lower granularity over a longer period of time.  The way I do this is to have a couple of extra tables that store daily aggregations of the stats (by database and by query hash).  Basically, I run a job every day to calculate the aggregates by database and by query hash (individual statements).  I added some basic data warehousing fields (day/week/month/year/etc.) and I am aggregating execution count, CPU, reads and duration.  The table diagram below shows this:

[diagram: aggregate tables]
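In case the diagram doesn’t come through, here is a minimal sketch of the database-level aggregate table; the column names match the INSERT below, but the data types and index are my assumptions (the linked preaggregatetablecreates.sql script has the real definitions):

-- Sketch of the database-level aggregate table; see preaggregatetablecreates.sql for the real definition
CREATE TABLE dbo.XEvents_Queries_ByDBAggregate
(
    dbserverid     int     NOT NULL,
    servername     sysname NOT NULL,
    rowyear        int     NOT NULL,
    rowmonth       int     NOT NULL,
    rowDay         int     NOT NULL,
    rowweek        int     NOT NULL,
    rowdayofweek   int     NOT NULL,
    databasename   sysname NOT NULL,
    executioncount int     NOT NULL,
    TotalReads     bigint  NULL,
    AvgReads       bigint  NULL,
    MaxReads       bigint  NULL,
    TotalCPU       bigint  NULL,
    AvgCPU         bigint  NULL,
    MaxCPU         bigint  NULL,
    TotalDuration  bigint  NULL,
    AvgDuration    bigint  NULL,
    MaxDuration    bigint  NULL
);

CREATE CLUSTERED INDEX CIX_XEvents_Queries_ByDBAggregate
    ON dbo.XEvents_Queries_ByDBAggregate (rowyear, rowweek, servername, databasename);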

The queries to generate rows for these tables are straightforward.  For example:

declare @StartDate Date = getdate()-1
declare @EndDate date
Select @EndDate = dateadd(day,1,@StartDate)
-- delete any data you already have for yesterday, then
-- run the group by query against the raw table XEvents_Queries
-- and insert rows to the pre-aggregate table [XEvents_Queries_ByDBAggregate]
INSERT INTO [dbo].[XEvents_Queries_ByDBAggregate]
([dbserverid]
,[servername]
,[rowyear]
,[rowmonth]
,[rowDay]
,[rowweek]
,[rowdayofweek]
,[databasename]
,[executioncount]
,[TotalReads]
,[AvgReads]
,[MaxReads]
,[TotalCPU]
,[AvgCPU]
,[MaxCPU]
,[TotalDuration]
,[AvgDuration]
,[MaxDuration])
Select b.dbserverid, a.servername, datepart(year,a.timestamp) rowyear,datepart(month,a.timestamp) rowmonth,datepart(day,a.timestamp) rowDay,
datepart(week,a.timestamp) rowweek,datepart(weekday,a.timestamp) rowdayofweek,a.databasename, count(*) as executioncount,
sum(a.logical_reads) as TotalReads,
Avg(a.logical_reads) as AvgReads,
Max(a.logical_reads) as MaxReads,
sum(cpu_time_ms) as TotalCPU,
avg(cpu_time_ms) as AvgCPU,
max(cpu_time_ms) as MaxCPU,
sum(duration_ms) as TotalDuration,
avg(duration_ms) as AvgDuration,
max(duration_ms) as MaxDuration
from XEvents_Queries a
inner join DB_ServerMain b
on a.ServerName = b.ServerName
WHERE a.timestamp >= @Startdate
and a.timestamp < @EndDate
group by b.dbserverid, a.servername, datepart(year,a.timestamp) ,datepart(month,a.timestamp) ,datepart(day,a.timestamp) ,
datepart(week,a.timestamp) ,datepart(weekday,a.timestamp) ,a.databasename

The code above sets two variables (notice they are date, not datetime) for yesterday and today, and then generates the aggregate data (by databasename in the example above).  As designed, you can put the logic above into a SQL Agent job and run it at 12:01 AM each day to generate the aggregates for the previous day.  The whole script to populate both the database-level and query-hash-level aggregate tables can be found here: aggregatesonceaday.sql.  This script will create the aggregate tables: preaggregatetablecreates.sql
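Step 1 above also mentions purging the aging raw xEvent data; a minimal version of that purge, run from the same daily job, might look like this (the 30-day retention window is just an example):

-- Purge raw xEvent rows older than the raw-data retention window;
-- the pre-aggregate tables keep the long-term history
DECLARE @RawRetentionDays int = 30;

DELETE FROM XEvents_Queries
WHERE [timestamp] < DATEADD(day, -@RawRetentionDays, CONVERT(date, GETDATE()));
-- for very large tables, consider deleting in batches to keep the transaction log in check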

Once you have these in place, you can easily build trend analysis and troubleshooting queries like this:

 

-- return the queries by database with the highest total reads per week
select
    qs.servername as instancename,
    qs.databasename,
    query_hash,
    isnull(objectname, '') as objectname,
    isnull(objectname, '') + ' - ' + sql as sql,
    dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4) + 1 as WeekStart,
    sum(executioncount) as executioncount,
    sum(TotalReads) as TotalReads,
    avg(AvgReads) as AvgReads,
    max(MaxReads) as MaxReads,
    sum(TotalCPU) as TotalCPU,
    avg(AvgCPU) as AvgCPU,
    max(MaxCPU) as MaxCPU,
    sum(TotalDuration) as TotalDuration,
    avg(AvgDuration) as AvgDuration,
    max(MaxDuration) as MaxDuration,
    RANK() over (partition by dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4
                     - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4) + 1,
                     DatabaseName
                 order by sum(TotalReads) desc) as [Rank]
from XEvents_Queries_ByDBByQueryHashAggregate qs
WHERE dateadd(week, rowweekofyear, '1/1/' + convert(varchar(10), rowyear)) > getdate() - 35 -- for the last 5 weeks
AND qs.servername = '<servernamehere>'
group by
    qs.servername,
    databasename,
    query_hash,
    isnull(objectname, ''),
    isnull(objectname, '') + ' - ' + sql,
    dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear - 1900, 0)) - 4) + 1

 

And this one returns the aggregates by database only (so you can get a basic “how active is this database in general?” type metric):

-- return the query metrics by database only
select
    dbserverid,
    servername,
    databasename,
    dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4) + 1 as StartDate,
    sum(executioncount) as executioncount,
    sum(TotalReads) as TotalReads,
    avg(AvgReads) as AvgReads,
    max(MaxReads) as MaxReads,
    sum(TotalCPU) as TotalCPU,
    avg(AvgCPU) as AvgCPU,
    max(MaxCPU) as MaxCPU,
    sum(TotalDuration) as TotalDuration,
    avg(AvgDuration) as AvgDuration,
    max(MaxDuration) as MaxDuration
from emdw..XEvents_Queries_ByDBAggregate
WHERE dateadd(week, rowweek, '1/1/' + convert(varchar(10), rowyear)) > getdate() - 35
AND servername = '<servernamehere>'
group by
    dbserverid,
    servername,
    databasename,
    dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4) + 1
order by dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear - 1900, 0)) - 4) + 1 desc

This should get you going.  Please post comments with any questions or requests for clarification.