Which AD groups that are logins on my SQL Server is an AD user permissioned through?

This is a useful query that impersonates any Windows user and then lists the AD groups through which that user gains access to your SQL Server.  Even if the Windows user isn't an explicit login on the SQL Server, the script will return the AD groups he or she is a member of that grant access to SQL Server.

DECLARE @LOGINNAME VARCHAR(255)
SELECT @LOGINNAME = 'domainname\loginname'   -- the Windows account to check

-- Impersonate the Windows login, then list the Windows group tokens
-- that match group logins defined on this SQL Server
EXECUTE AS LOGIN = @LOGINNAME
SELECT DISTINCT token.*, grp.*
FROM sys.login_token token
JOIN sys.server_principals grp
    ON token.sid = grp.sid
WHERE token.[type] = 'WINDOWS GROUP'
AND grp.[type] = 'G'
ORDER BY 3
REVERT


My default schema is not working….

I ran into an issue the other day with an application database that was using named schemas (other than dbo) to logically organize the tables within the database.  When we moved the database to a new SQL Server, we mapped the application's SQL login to the new database (with the default schema set to a named schema within the database and NOT dbo).

The problem was that the application's connection seemed to be using the dbo schema by default even though the mapped DB user had the correct default schema.  After a little digging, we learned that if a login has the sysadmin role at the server level, then all queries against user databases will default to the dbo schema regardless of what you have set for the default schema on the mapped database user account.
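A quick way to confirm you are hitting this behavior (the login and user names below are placeholders for your own):

-- If the login is a sysadmin, its queries resolve to dbo regardless of the user's default schema
SELECT IS_SRVROLEMEMBER('sysadmin', 'MyAppLogin') AS is_sysadmin;

-- The default schema set on the mapped database user (run this in the application database)
SELECT name, default_schema_name
FROM sys.database_principals
WHERE name = 'MyAppUser';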

I found the answer here:

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/a1155e64-e53f-4e20-9b1d-4f31e193b14e/default-schema-not-working?forum=sqlsecurity

but there are countless other links describing this behavior.  Hope it saves someone a little time.

Convert an Existing Normal Table to a Partitioned Table – Quick and Dirty Edition

I'm sure there are a thousand posts on the best way to partition a table and manage it, but I wanted to provide a "quick and dirty" solution that we put together for something in my shop in the last week.  Here's a little background….

We have a table called ApplicationLog that is a centralized table for logging application activity across multiple apps within our environment.  The table typically gets ~200K rows per month but in some heavier testing/release months, that number can climb into the tens of millions (depending on how verbose the logging is set to and if there are any botched releases that result in a flood of applog messages).  At the time the DBA team got involved, there was a monthly purging job that just ran a “delete from applicationlog where date<getdate()-180” to trim out entries older than 180 days.  This worked for a couple years without incident, but then more applications started logging their messages to the table and the volume of data just got too high.  The problem presented itself when the monthly purging job started getting deadlocks as it attempted to delete rows that were (presumably) on the same pages as new messages being logged. We wanted to use partitioning on this table to prevent deadlocks and blocking during the purge process.

First lets take a look at the table to be partitioned:

[Screenshot: ApplicationLog table diagram]
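The diagram isn't reproduced here, but based on the description below, the table looked roughly like this.  The identity ID, the Date column, and the clustered index on ID come from the post; the remaining column names and types (and whether the clustered index was a primary key) are assumptions:

CREATE TABLE dbo.ApplicationLog
(
    ID              int IDENTITY(1,1) NOT NULL,  -- identity, and the original clustering key
    [Date]          datetime          NOT NULL,  -- the column we will partition on
    ApplicationName varchar(100)      NULL,
    Severity        varchar(20)       NULL,
    [Message]       varchar(max)      NULL,
    CONSTRAINT PK_ApplicationLog PRIMARY KEY CLUSTERED (ID)
);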

Pretty basic table, no funky datatypes, and the ID is an identity.  It also had a single index, clustered on the identity ID column… even though no one ever queries by the ID…. classic developer stuff.  The basic steps we followed to set up the table for partitioning include:

  1. Define a partition Function
  2. Define a Partition scheme
  3. Define a new table with the same structure (ApplicationLog_Part)
  4. Define a new clustered index on the Date field (and using the partition scheme)
  5. Populate the new partitioned table with the data from the existing table
  6. Rename the table (ApplicationLog becomes ApplicationLog_Backup and ApplicationLog_Part becomes ApplicationLog)

Let's take a look at these one by one.  The partition function is the piece of code that defines the boundaries between partitions.  In our case this is simply a series of dates on the first of each month.  The code below illustrates the partition function and scheme we used:

[Screenshot: partition function and scheme definition]
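The screenshot isn't reproduced here, but a condensed reconstruction looks something like this.  The scheme name ps_dbo_ApplicationLog appears later in the post; the function name, the RANGE RIGHT direction, and the exact boundary dates are assumptions:

-- Monthly boundaries, pre-created out to 12/1/2025 in the real version
CREATE PARTITION FUNCTION pf_dbo_ApplicationLog (datetime)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01' /* ...one per month... */, '2025-12-01');

-- All partitions map to the same filegroup since we are not using tiered storage
CREATE PARTITION SCHEME ps_dbo_ApplicationLog
AS PARTITION pf_dbo_ApplicationLog ALL TO ([PRIMARY]);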

The first statement makes the partition function with the historical and future partition boundaries predetermined.  The above is a condensed version for readability, but the real version has partition boundaries going out to 12/1/2025.  We added all the boundaries up front because we wanted to keep things simple and not have to dynamically regenerate the partition function with different boundary definitions each month.

Also notice that at the time of definition, a partition function has nothing to do with any particular table or data set.  It is merely a series of boundary definitions against a single parameter (of the same datatype as the partition boundary values), sort of like a big CASE statement.

The partition scheme is just a construct that can employ the partition function as rows are inserted into an underlying table.

The next step was to define a new table with a structure identical to ApplicationLog and then create a clustered index (that used the partition scheme/function) on it.  One of our requirements was that we couldn't change the structure of the table at all (so that the existing logging code would continue to work without any changes).  So we made a new version of the table called ApplicationLog_Part:

[Screenshot: ApplicationLog_Part table and partitioned clustered index]
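A rough sketch of what that looked like, reusing the assumed column list from earlier:

-- Same structure as ApplicationLog (column list abbreviated/assumed)
CREATE TABLE dbo.ApplicationLog_Part
(
    ID              int IDENTITY(1,1) NOT NULL,
    [Date]          datetime          NOT NULL,
    ApplicationName varchar(100)      NULL,
    Severity        varchar(20)       NULL,
    [Message]       varchar(max)      NULL
);

-- Clustered index on Date, created ON the partition scheme so rows are routed to monthly partitions
CREATE CLUSTERED INDEX CIX_ApplicationLog_Part_Date
    ON dbo.ApplicationLog_Part ([Date])
    ON ps_dbo_ApplicationLog ([Date]);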

After the table is created, we define a new clustered index on the Date field (not the identity ID) and then use the ON clause in the create index statement to specify how to store the data as it is inserted.  In the case above, we reference the partition scheme ps_dbo_ApplicationLog(date) with the column that it needs to determine which partition to route the data to.  The next step is to populate the temporary table with the last 120 days of data from the  current table:

[Screenshot: populating ApplicationLog_Part and checking the partition distribution]
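Roughly, the statements in the screenshot look like this.  The 120-day cutoff comes from the post; the column list and the distribution-check query are assumptions:

-- Preserve the original identity values while copying the last 120 days of data
SET IDENTITY_INSERT dbo.ApplicationLog_Part ON;

INSERT INTO dbo.ApplicationLog_Part (ID, [Date], ApplicationName, Severity, [Message])
SELECT ID, [Date], ApplicationName, Severity, [Message]
FROM dbo.ApplicationLog
WHERE [Date] >= DATEADD(day, -120, GETDATE());

SET IDENTITY_INSERT dbo.ApplicationLog_Part OFF;

-- Second query: confirm how the rows landed across the partitions
SELECT $PARTITION.pf_dbo_ApplicationLog([Date]) AS partition_number,
       COUNT(*)    AS row_count,
       MIN([Date]) AS min_date,
       MAX([Date]) AS max_date
FROM dbo.ApplicationLog_Part
GROUP BY $PARTITION.pf_dbo_ApplicationLog([Date])
ORDER BY partition_number;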

In our case we had to preserve the identity column so you see the identity_insert set operation at the top.  Then you basically perform an insert from the original table to the temporary (_part) table.  Once the insert finishes, you can evaluate the distribution of data with the second query.  The final step is to rename the tables and reset the identity value.

[Screenshot: renaming the tables and resetting the identity]
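Something along these lines (a sketch; the exact reseed approach in the screenshot may differ):

-- Swap the tables: the original becomes the backup, the partitioned copy takes its name
EXEC sp_rename 'dbo.ApplicationLog', 'ApplicationLog_Backup';
EXEC sp_rename 'dbo.ApplicationLog_Part', 'ApplicationLog';

-- Make sure the identity on the newly renamed table continues from the current maximum ID
DBCC CHECKIDENT ('dbo.ApplicationLog', RESEED);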

As shown above, you can simply rename the tables and then reset the identity configuration on the _part table (now named ApplicationLog).  Now you have a working partitioned table called ApplicationLog that accepts inserts from all the existing code without issues.

The final piece I'll cover is handling the routine deletion of aging data.  Since the table is now partitioned, we don't need to run delete statements that can contend with applications writing to the table.  Instead we can take advantage of the SWITCH operation.  The SWITCH PARTITION option allows you to instantaneously move a whole partition of data out of the original partitioned table and into its own separate table.

[Screenshot: switching an aging partition out of ApplicationLog]
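A sketch of the purge step.  The staging table name and the example boundary date are made up; the staging table must be empty, have the same structure and clustered index, and live on the same filegroup as the partition being switched out:

-- Move the oldest partition's rows out of ApplicationLog as a metadata-only operation,
-- then clean them up from the staging table at our leisure
ALTER TABLE dbo.ApplicationLog
    SWITCH PARTITION $PARTITION.pf_dbo_ApplicationLog('2014-01-15')  -- any date inside the partition to purge
    TO dbo.ApplicationLog_Purge;

TRUNCATE TABLE dbo.ApplicationLog_Purge;  -- or archive the rows first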

The great thing about it is that the operation is almost instantaneous regardless of the number of rows in the partition.  There are countless overviews of partitioning out there, and many of them handle the complexities of dynamically managing the partition function boundaries by recreating the partition function regularly.  My "quick and dirty" approach doesn't handle these complexities.  Instead I simply define the partition function with boundaries predefined out to the end of 2025 (long past when I'd expect this ApplicationLog table to actually be in service).  Another element we did not have to address was placing partitions on different physical files/drives.  That can be useful when you have tiered storage and want to keep your recent partitions on your fastest underlying disks and your aging partitions on your slower, cheaper disks.

PS> sorry for all the images of code, but it looks nicer with the SSMS formatting.  If you need any of the code, I can send it along.

Wasting space with heaps

Over the last few months, our team of DBAs has been developing various data collection routines for monitoring (and designing the tables to store said monitoring data).  We are planning to remove a 3rd party SQL Server monitoring tool from our environment in the next couple of months, so most of the effort has been on making sure we don’t lose any critical monitoring/analysis functionality once we remove that product.  This is all well and good, but in our haste to “get the data collecting”, we found a few problematic issues.

1) If you are going to collect some data set (wait stats, file io stats, etc) ALWAYS ALWAYS ALWAYS implement a deleting process as well.  Set a ridiculous retention period (e.g. 10 years) if you don’t want to delete anything, but at least have the process in place to trim aging records off of your monitoring tables.  I know it’s the most obvious, but it is often overlooked or deferred with the thought that “we’ll delete when it grows too big”.

2) DON’T FORGET a Clustered Index!  In the vast majority of our routines, we execute a DMV query against a series of servers and then write the results to a table on a central database server.  Nothing fancy there, but I found myself “cloning” routines (copy/paste PowerShell code, quickly make a table structure with a SELECT INTO, etc.) and not taking the time to define at least a clustered index.  In at least 2 cases, we had tables that were heaps (no clustered index) and over time their unused space grew to more than 7 times the space actually required to house the rows.  This was caused by us deleting aging records from the heap.  A heap generally doesn't release the space freed by deletes, so it keeps growing (unless you copy it to another heap [SELECT INTO], run the ALTER TABLE … REBUILD command, or add a clustered index).

I added a few clustered indexes to the problem tables and the unused space dropped down to <1MB per table.
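If you want to spot tables like this before they bloat, a query along these lines (a sketch against sys.dm_db_partition_stats) lists the heaps in a database along with how much of their reserved space is actually in use:

-- Heaps only (index_id = 0): compare reserved vs. used space, biggest gap first
SELECT  s.name AS schema_name,
        t.name AS table_name,
        SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
        SUM(ps.used_page_count)     * 8 / 1024 AS used_mb,
        (SUM(ps.reserved_page_count) - SUM(ps.used_page_count)) * 8 / 1024 AS unused_mb
FROM sys.dm_db_partition_stats ps
JOIN sys.tables  t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE ps.index_id = 0
GROUP BY s.name, t.name
ORDER BY unused_mb DESC;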

I'm sure the above is obvious to most readers, but I am hoping that by admitting to this sloppiness here, it will prevent me from overlooking these basic things in the future.

Unlocking SQL Logins Without a Password

So today I broke something.  While prepping to bring our disaster recovery SQL Servers in line with their production counterparts, I was going through the servers and adding linked servers that existed on the main production servers but were missing from their DR counterparts.  It seemed simple enough, especially since most of our linked servers use pass-through authentication of the user's Windows domain credentials to connect.  HOWEVER, in a handful of cases we are using SQL Server logins for the security context of linked servers.

The problem arose when I attempted to add a linked server using a SQL Server login/password.  I had the wrong password (it was wrong in our password management application… but that is another story entirely).  I added the linked server without incident since SQL Server doesn't confirm the provided credentials before creating the linked server.  Consequently, when I added the linked server with the wrong password and then attempted to open the linked server in Management Studio a few times, the SQL Server login on the target server was locked out.  This resulted in a chain reaction, as several production applications were using that login for their normal security context.

Needless to say, internal customers started complaining about security errors coming from a core business application “all of a sudden”.

As soon as I saw the emails, I knew it had something to do with me and what I was working on.  I checked the SQL error log on the server in question and saw the obvious set of failed login attempts followed by a stream of locked-out login error messages.  A bonehead move to be sure, but when I tried to unlock the account in Management Studio, I was told I had to provide the password in the login properties screen if I wanted to unlock the account.  Sigh… not a real problem, more of an annoyance… or so I thought.  When I went into our password management application, of course, the SQL login information was NOT there.

The fastest way to get the cow out of the ditch was to disable the account policy on the SQL login first, and THEN unlock it using SSMS’s login properties window.

So I had to execute this simple command:

ALTER LOGIN <mybadloginname> WITH CHECK_POLICY = OFF;

Disabling CHECK_POLICY stops SQL Server from applying the Windows/domain password policy (including account lockout) to that SQL login, and I was able to unlock the login after that.

We decided as a rule to leave CHECK_POLICY = OFF for a handful of SQL Server logins in our environment to avoid the scenario of a sloppy (but well-intentioned) admin causing a critical SQL Server login to get locked out due to a misconfiguration.
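If you want to keep an eye on which SQL logins have that enforcement turned off, a quick check against sys.sql_logins does it:

-- is_policy_checked = 0 means the Windows password policy (and lockout) is not enforced for that login
SELECT name, is_policy_checked, is_expiration_checked, is_disabled
FROM sys.sql_logins
ORDER BY name;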

Collecting xEvents Query Stats From Multiple SQL Instances to a Central Database Part 4

In this final post about xEvent based query monitoring data collection, I’ll cover the following:

  1. Running daily pre-aggregations of query stats by query hash and purging aging raw xEvent data
  2. Some useful sample queries for making sense of all the data you are now collecting.

Once you have a solution like this in place and large datasets are growing larger on your central database, you need to consider a data retention process.  Obviously you can't keep a record of every execution of every SQL statement in perpetuity, but you do want to maintain a balance between high granularity in the short term and lower granularity over a longer period of time.  The way I do this is to have a couple of extra tables that store daily aggregations of stats (by database and by query hash).  Basically, I run a job every day to calculate the aggregates by database and by query hash (individual statements).  I added some basic data warehousing fields (day/week/month/year/etc.) and I am aggregating execution count, CPU, reads, and durations.  The table diagram below shows this:

[Screenshot: pre-aggregate table diagram]
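The linked preaggregatetablecreates.sql script has the real definitions; as a hedged sketch, the per-database aggregate table has roughly these columns (the data types are assumptions inferred from the insert statement below):

CREATE TABLE dbo.XEvents_Queries_ByDBAggregate
(
    dbserverid      int          NOT NULL,
    servername      varchar(128) NOT NULL,
    rowyear         int          NOT NULL,
    rowmonth        int          NOT NULL,
    rowDay          int          NOT NULL,
    rowweek         int          NOT NULL,
    rowdayofweek    int          NOT NULL,
    databasename    varchar(128) NOT NULL,
    executioncount  int          NULL,
    TotalReads      bigint       NULL,
    AvgReads        bigint       NULL,
    MaxReads        bigint       NULL,
    TotalCPU        bigint       NULL,
    AvgCPU          bigint       NULL,
    MaxCPU          bigint       NULL,
    TotalDuration   bigint       NULL,
    AvgDuration     bigint       NULL,
    MaxDuration     bigint       NULL
);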

The queries to generate rows for these tables are straightforward.  For example:

declare @StartDate date = getdate()-1
declare @EndDate date
select @EndDate = dateadd(day,1,@StartDate)
-- (delete any data you already have for yesterday before re-running this)
-- Run the group by query against the raw table XEvents_Queries
-- and insert rows to the pre-aggregate table [XEvents_Queries_ByDBAggregate]
INSERT INTO [dbo].[XEvents_Queries_ByDBAggregate]
([dbserverid]
,[servername]
,[rowyear]
,[rowmonth]
,[rowDay]
,[rowweek]
,[rowdayofweek]
,[databasename]
,[executioncount]
,[TotalReads]
,[AvgReads]
,[MaxReads]
,[TotalCPU]
,[AvgCPU]
,[MaxCPU]
,[TotalDuration]
,[AvgDuration]
,[MaxDuration])
Select b.dbserverid, a.servername, datepart(year,a.timestamp) rowyear,datepart(month,a.timestamp) rowmonth,datepart(day,a.timestamp) rowDay,
datepart(week,a.timestamp) rowweek,datepart(weekday,a.timestamp) rowdayofweek,a.databasename, count(*) as executioncount,
sum(a.logical_reads) as TotalReads,
Avg(a.logical_reads) as AvgReads,
Max(a.logical_reads) as MaxReads,
sum(cpu_time_ms) as TotalCPU,
avg(cpu_time_ms) as AvgCPU,
max(cpu_time_ms) as MaxCPU,
sum(duration_ms) as TotalDuration,
avg(duration_ms) as AvgDuration,
max(duration_ms) as MaxDuration
from XEvents_Queries a
inner join DB_ServerMain b
on a.ServerName = b.ServerName
WHERE a.timestamp >= @Startdate
and a.timestamp < @EndDate
group by b.dbserverid, a.servername, datepart(year,a.timestamp) ,datepart(month,a.timestamp) ,datepart(day,a.timestamp) ,
datepart(week,a.timestamp) ,datepart(weekday,a.timestamp) ,a.databasename

The code above sets two variables (notice they are date and not datetime) for yesterday and today and then generates the aggregate data (by databasename in the example above).  As designed, you can put the logic above into a SQL Agent job and run it at 12:01 AM each day to generate the aggregates for the previous day.  The whole script to populate both the database-level and query-hash-level aggregate tables can be found here: aggregatesonceaday.sql.  This script will create the aggregate tables: preaggregatetablecreates.sql

Once you have these in place, you can easily build trend analysis and troubleshooting queries like this:

 

-- return the queries by database AND highest total reads per week
select
    qs.servername as instancename,
    qs.databasename,
    query_hash,
    isnull(objectname,'') as objectname,
    isnull(objectname,'') + ' - ' + sql as sql,
    dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1 as WeekStart,
    sum(executioncount) executioncount,
    sum(TotalReads) TotalReads,
    avg(AvgReads) AvgReads,
    max(MaxReads) MaxReads,
    sum(TotalCPU) TotalCPU,
    avg(AvgCPU) AvgCPU,
    max(MaxCPU) MaxCPU,
    sum(TotalDuration) TotalDuration,
    avg(AvgDuration) AvgDuration,
    max(MaxDuration) MaxDuration,
    RANK() over (PARTITION by dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1, databasename
        order by sum(TotalReads) desc) as [Rank]
from XEvents_Queries_ByDBByQueryHashAggregate qs
WHERE dateadd(week,rowweekofyear,'1/1/'+convert(varchar(10),rowyear)) > getdate()-35 -- for the last 5 weeks
AND qs.servername = '<servernamehere>'
group by
    qs.servername,
    databasename,
    query_hash,
    isnull(objectname,''),
    isnull(objectname,'') + ' - ' + sql,
    dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1

And this… to get the aggregates by database only (so you can get a basic "how active is this database in general?" type metric):

-- return the query metrics by database only
select dbserverid,
    servername,
    databasename,
    dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1 as StartDate,
    sum(executioncount) executioncount,
    sum(TotalReads) TotalReads,
    avg(AvgReads) AvgReads,
    max(MaxReads) MaxReads,
    sum(TotalCPU) TotalCPU,
    avg(AvgCPU) AvgCPU,
    max(MaxCPU) MaxCPU,
    sum(TotalDuration) TotalDuration,
    avg(AvgDuration) AvgDuration,
    max(MaxDuration) MaxDuration
from emdw..XEvents_Queries_ByDBAggregate
WHERE dateadd(week,rowweek,'1/1/'+convert(varchar(10),rowyear)) > getdate()-35
AND servername = '<servernamehere>'
group by
    dbserverid,
    servername,
    databasename,
    dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1
order by dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4
        - datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1 desc

This should get you going and please post comments with any questions or requests for clarifications.

Nested Views And Bad Plans – Not Pushing Predicates

I came across an issue in one of our production SQL Servers yesterday that had to do with nested views and sudden, terrible performance of once highly responsive queries.  First a little background….

Views on Top of Views

In one of our application databases, we have a complex view that is used to denormalize financial transaction information to make it easier for users to access all the various dates, amounts, and other identifying flags  in a single “selectable” object.  This view works well for our purposes and has been reliably quick to return results (we maintain the underlying indexes and statistics and have optimized the tSQL in the view to a reasonable level).

In an effort to extend this approach of giving users access to a simplified version of the underlying data through a view,  a second view was developed to provide users with aggregate data.  This second view  performs a select on the first view with several sum and count functions and a long group by list.  So we have a “nested view” scenario with fairly complicated query logic in each view.

View 1 – Denormalized data:

create view vwCustomerHoldings as
Select custid, acctid, fundID, fundName, code1, code2, price1, price2, price3, tradedate, effectivedate
From <~20 tables with various join conditions>

View 2 – Aggregates of the denormalized data:

create view vwCustomerHoldingsAggregate as
Select custid, acctid, fundID, fundName, code1, code2, tradedate, effectivedate,
       sum(price1) as price1, sum(price2) as price2, sum(price3) as price3
From vwCustomerHoldings
GROUP BY custid, acctid, fundID, fundName, code1, code2, tradedate, effectivedate

For months and months we had no issues with various users pulling data with queries like this:

Select *
from vwCustomerHoldingsAggregate
Where effectivedate = '6/21/2013'
and code1 in ('val1','val2','val3','val4','val5','val6','val7','val8','val9')

The typical response time was 3-5 seconds which is well within acceptable bounds for this application.  Yesterday, execution duration jumped up to ~2 minutes for the same call.

We checked the statistics on the underlying tables first and they were fresh within the last 6 hours.  (I have a  stats age script I have been using for a long time from Glenn Berry that I got here.)  There were no index or underlying table structure changes and there hadn’t been a significant change to the underlying data (other than the usual overnight load of yesterday’s transactions).  So we started executing the query in management studio and looking at the IO stats and the query plans.  As you might imagine the query plans associated with calls like this are very large and difficult to dissect online, so I’m going to focus on the results of SET STATISTICS IO ON which were just as revealing of the issue.

First we ran the query against a dev system with a smaller data set, and got a good result (4 second response time).  The Statistics IO information is below:

It has a high read profile (~400K reads on average for the good plan), but that is OK in our environment.  When we run the same query on the production data set, the statistics IO looks a LOT different and has a few key elements that reveal what is happening:

[Screenshot: SET STATISTICS IO output for the bad plan]

The first thing that stands out is the HUGE IO on the Worktable object (>100 million reads).  Obviously, SQL Server has chosen a less than optimal plan to execute the T-SQL on our production instance.  My initial reaction was "How can there be so many reads on that Worktable?  It's almost like it's reading the entire underlying data set with no WHERE clause at all!?"…

It turns out that is exactly what is happening.  This is an example of the query optimizer failing to pick an execution plan that applies the WHERE clause first (or at least earlier in the execution).  The result is that the entire data set from vwCustomerHoldings is being aggregated by vwCustomerHoldingsAggregate before the where effectivedate='6/21/2013' filter is applied to the underlying data set.  So that massive Worktable entry in the statistics IO is evidence that the entire data set is being processed without any where clauses being applied until the end.  This is a failure to perform something called "predicate pushing" in the execution plan.  Ideally the predicates (where clause conditions) are applied as early in the execution as possible to lower the row count that needs to be worked on and joined to by the rest of the query.  SQL Server usually does this, but in some cases, like this one, it stopped doing it.

The other red flag in the output is that the bad plan includes this warning:

Warning: Null value is eliminated by an aggregate or other SET operation.

The good plan doesn't have this in the output.  This warning is issued whenever you perform aggregate functions on fields that contain NULLs.  SQL Server disregards them so you don't have a sum() getting voided by doing a "+ NULL".  The fact that we see this on the bad plan and not the good one is further indication that the entire vwCustomerHoldingsAggregate view is being run (with all of its aggregates and group bys) BEFORE the where clause is applied.  In the good plan, the where clauses (effective date and code filters) are applied first (via predicate pushing), so when the aggregate logic from the vwCustomerHoldingsAggregate view is finally executed, there aren't any NULL values to eliminate and the message is not output.

There are a few approaches that we investigated to help eliminate this problem.

  • Use an Indexed View to materialize the columns and simplify the plan – This won’t work because you can’t have an indexed view that references another view (see here)
  • Convert the top level view (vwCustomerHoldingsAggregate) to a stored procedure – This would allow us a lot more flexibility in using query hints or even a plan guide to force a known good plan on calls to the procedure.
  • Don't use the top-level view; instead, rewrite all the places in the application that use the vwCustomerHoldingsAggregate view, replacing them with inline SQL that performs the aggregates against the underlying view.

The last 2 options are not great for us because the application uses the vwCustomerHoldingsAggregate view in over 50 places, and it would be difficult to make such a sweeping change without a lot of development and testing effort.  While the view is referenced in many places by the application, only a handful of those places are seeing this bad query plan issue.

We tested a few different query hints and found that OPTION (Force Order) always results in the optimal execution plan for single day calls like the one in question (<5 seconds response time).  So in a couple of places in the application, we are having our developers tweak the sql to look like this:

Select *
from vwCustomerHoldingsAggregate
Where effectivedate = '6/21/2013'
and code1 in ('val1','val2','val3','val4','val5','val6','val7','val8','val9')
OPTION (FORCE ORDER)

This query hint works in our case because the underlying tables that our where clauses apply to are among the first few tables in the FROM clause of the lower-level vwCustomerHoldings view.  Forcing the order in which the tables are joined and filtered in the execution plan has resulted in a consistently good plan for our problem query, so we have a stopgap in place to keep things performing within acceptable ranges.  However, this is not a good long-term solution for us.  We will need to address the areas where we have nested views and put in the effort to convert them to parameterized queries or stored procedures so that we can better control query plan generation in the future.
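For what it's worth, the stored-procedure route from the list above would look roughly like this.  The procedure and parameter names are made up; the column list comes from the view definitions earlier in the post:

-- Wrapping the aggregate query in a procedure gives us one place to control hints
-- (or attach a plan guide) instead of 50+ call sites in the application.
CREATE PROCEDURE dbo.GetCustomerHoldingsAggregate
    @EffectiveDate date
AS
BEGIN
    SET NOCOUNT ON;

    SELECT custid, acctid, fundID, fundName, code1, code2, tradedate, effectivedate,
           SUM(price1) AS price1, SUM(price2) AS price2, SUM(price3) AS price3
    FROM vwCustomerHoldings
    WHERE effectivedate = @EffectiveDate
    GROUP BY custid, acctid, fundID, fundName, code1, code2, tradedate, effectivedate
    OPTION (FORCE ORDER);
END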

Here are a few links about this scenario as well:

http://stackoverflow.com/questions/7183198/does-sql-server-propagate-where-conditions-in-complex-views
http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will_17.html
http://stackoverflow.com/questions/6653979/why-does-putting-a-where-clause-outside-view-have-terrible-performance