Welcome to the IDERA SQL Diagnostic Manager demonstration. Today we're going to be looking at the flagship product that IDERA offers for performance and availability monitoring of your SQL Server environment. I'm going to start with a couple of slides that touch on some of the key features and benefits, as well as the architecture, and then we'll go into the product and look at everything firsthand. Without further ado, let's get started.
I'd like to introduce Diagnostic Manager, a solution that's much more than just symptom-based monitoring. What I mean by that is, of course, you have ways to understand that a SQL instance is experiencing performance problems, whether you run some of your own queries, look at different PerfMon metrics, or it could be as basic as getting that phone call telling you, from the end-user perspective, that something's happening. But understanding that you have a problem doesn't give you enough information about what's causing that problem,
or what the root cause is. Is it related to a certain application, user, or specific query, or maybe the parameters being passed to a command or stored procedure? It could be related to certain times of the day, for that matter. That's really what Diagnostic Manager is intended to help you with. It will certainly alert you to problems and their symptoms, but it will also give you the tools necessary to research, drill down, deep dive, and understand what's going on under the hood.
To start off, here are some bullet points covering the different areas of functionality I like to emphasize as we go into looking at the product. Of course, it's a comprehensive monitoring, alerting, and reporting solution. It has now evolved to the point where it also provides recommendations for how to fix problems. We also have functionality called the History Browser. I'm looking at the second bullet point, historical analysis, but that's a bit ambiguous in the sense that it covers trending, baselining, and of course reporting. The History Browser, though, is a time machine that lets you roll back in time and replay everything that led up to a problem, or the quote-unquote fire that started on a SQL Server.
Of course, this is a proactive monitoring tool. You'll be alerted in advance, but let's face it, sometimes you can't jump and respond to those alerts the moment you get them. If you're dealing with an issue after the fact, the History Browser is the answer, letting you go back and look at all the different moving parts that make up SQL Server performance. Diagnostic Manager also includes a multiple-baseline capability, where you can set different time windows separate from the default baseline, so that you get the best accuracy around those smaller time windows. For example, you might want to baseline Reporting Services activity that happens from 9:00 AM until noon, where utilization jumps up higher during that window; you want to get the best accuracy around that normal activity. Going hand in hand with that, we also include dynamic thresholding, where the thresholds move and adjust with the baseline for whatever time windows you've created, so that you only get alerts
if current performance jumps excessively outside what's considered normal for that time of day. Diagnostic Manager also includes a variety of alert action responses to notify you of problems, such as sending SNMP traps to other systems. It can also automate different responses, like executing SQL scripts or jobs, or running third-party EXEs or PowerShell commands, for that matter. One of the alert action responses is the prescriptive analytics, which leads us to the next bullet point: it will analyze an instance, tell you what is potentially wrong with it, and even provide learning links to teach you about whatever the problem is, and many times even fix-it scripts. Going hand in hand with that, we have the predictive analytics, which is basically heuristic analysis that identifies the likelihood of different alerts happening later today, based on past trends or the repetitive behavior of that event in the past. Diagnostic Manager also includes Hyper-V and VMware monitoring support built right in, without any modularity or additional cost, unlike other monitoring tools that either have no virtualization visibility or offer it as a separate module at additional expense. Then there are custom counters and dashboards.
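As a rough illustration of the dynamic-thresholding idea (this is my own sketch, not IDERA's actual algorithm), a per-time-window baseline can drive the alert decision, so a value that is normal for the busy 9-to-noon window doesn't page anyone:

```python
from statistics import mean, stdev

def dynamic_threshold_alert(current, window_samples, k=3.0):
    """Alert only when the current value falls excessively outside
    the baseline (mean + k standard deviations) computed from
    historical samples for this specific time window."""
    baseline = mean(window_samples)
    spread = stdev(window_samples)
    upper = baseline + k * spread
    return current > upper, upper

# Hypothetical Reporting Services window (9 AM to noon): high CPU is normal here.
morning_cpu = [62, 70, 66, 73, 68, 71, 65, 69]
print(dynamic_threshold_alert(75, morning_cpu)[0])  # within the normal range
print(dynamic_threshold_alert(95, morning_cpu)[0])  # excessive spike
```

A static 80 percent threshold would fire constantly during this window; the baseline-relative check only fires on the genuine outlier.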
We're going to look at that here in a moment, but this really brings what I call the fun factor to the product, because it allows you to take a monitoring tool that's been developed over 15-plus years, think outside the box, and add some of your own metrics, whether you want to add custom SQL scripts, WMI counters, PerfMon counters, or additional Hyper-V or VMware counters above and beyond what we've already provided out of the box. When you add custom counters, for all intents and purposes it's as if they were developed into the product, and we treat them the same as all the other metrics: we'll trend them and baseline them, you can alert against them, take actions against the data coming back, report against them, and even add them to one of the custom dashboards if you want. The custom dashboards are flexible for presenting different types of data to different audiences.
For the last two bullet points: we do include robust reporting in this solution. As you can imagine, over 15-plus years, year after year, release after release, we constantly add new functionality, including reports, and you'll find different ways to polish up the data. And the last bullet point: we support SQL Server 2000 through 2017, which I think is notable to mention when other monitoring tools are slow, in many cases, to adapt to the new technologies coming out around SQL Server. With that, let's go to the architecture slide.
And so here's the architecture. This product takes about five to ten minutes to install. One of the reasons it's so easy to install is that nothing gets deployed out to the actual monitored instance. That means there are no agents, no extra stored procedures, and we don't create any extra databases.
The collection service is the worker bee responsible for gathering that data. It runs its little queries against the different dynamic management views and system tables: sysperfinfo, which has some good performance data; sys.dm_exec_sessions and sysprocesses, which we combine for a regular snapshot view of session activity; the physical stats DMV, sys.dm_db_index_physical_stats, for fragmentation statistics; and Extended Events for query-related data. The collection service gathers this data and hands it over to the management service, which is what I call the doorway to the repository. It's how data goes into the database, and it's how data comes out to refresh the consoles.
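To make the agentless model concrete, here's a hedged sketch of how a collector might poll those DMVs over a plain connection, with nothing installed on the target. The DMV names are real SQL Server objects, but the polling structure is my own simplified stand-in, and `run_query` is a placeholder for whatever SQL driver you use:

```python
# Illustrative only: a simplified stand-in for the collection service.
COLLECTION_QUERIES = {
    "sessions":      "SELECT session_id, status, cpu_time FROM sys.dm_exec_sessions;",
    "perf_counters": "SELECT counter_name, cntr_value FROM sys.dm_os_performance_counters;",
    "fragmentation": ("SELECT object_id, avg_fragmentation_in_percent "
                      "FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED');"),
}

def collect_snapshot(run_query):
    """Run each collection query and return the combined snapshot that
    would be handed to the management service (the repository doorway)."""
    return {area: run_query(sql) for area, sql in COLLECTION_QUERIES.items()}

# Stub driver so the sketch runs without a live server.
snapshot = collect_snapshot(lambda sql: f"rows for: {sql[:30]}")
print(sorted(snapshot))
```

The point is the shape of the flow: remote queries in, one snapshot out, nothing resident on the monitored instance.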
The management service is also responsible for performing any of the alerting, automation, and mitigation activities, and for refreshing the data on the consoles, which you see on the left-hand side. The .NET client is the primary way we work with this product; however, a lot of the different types of data available through the .NET client are also now available through the web interface.
We also have an optional mobile interface. It's optional to install; you don't have to install it if you don't want to, but it does provide flexibility when you're on the road, letting you connect in and look at what's going on with session activity, SQL Agent job status, or database status right there from your mobile device.
The product is licensed based on how many instances you decide to monitor, not on the number of users of the product. If you want to give additional access to other team members, you can, and you can even secure them to the level they need to be secured. If they only need read-only access, and only for three or four instances, and that's the limit of their roles and responsibilities, you can give them that type of secure access if you want to.
One thing I think is notable to mention here, around security, is that we also include an audit capability that tracks what people are doing through the product itself. That may not seem like a big deal when we're talking about performance monitoring, but we're hearing more about how auditors are turning their attention to the part that monitoring tools play in accountability and protection of the data in the environment. For example, say someone goes into a monitoring tool, whether it's ours or SolarWinds or whatever, and changes a threshold on a metric in a way that means you don't get the alert you should have when a problem pops up. Because of that, maybe there was downtime, or a server had to be rebooted, or the instance wasn't responding well to requests from a customer application. Whatever the case may be, you're going to want to know who made that change. We track what users are doing through the product itself, so that you can report on that and show you have accountability if auditors ask for it.
I find that most of the time, actually all of the time as far as I know, other monitoring tools don't have onboard auditing like what we have; they just don't think to add it in. So anyway, that's the architecture.
I'm going to jump over now to my environment and bring up my console. This is the console, and we're looking at the Active Alerts screen, which lists a rundown of all the alert activity happening across the entire enterprise. It's a summary view; I can drill into any one of these if I want to, and we'll get into that here in just a moment. A couple of other things are notable to mention. At the top, you have your Getting Started tab, which has a link that will take you out to our trial center, where we have different videos you can watch, little tips and tricks, different resources, documentation, and things to help you get started using the product. There's also the Web Console tab. Honestly, I never go here because I bookmark the URL in my browser, but here's the URL you need to get to it.
Then there's the Newsfeed. This is a collaboration component, and again a differentiator of what we offer versus other tools out there. Usually, when monitoring tools are developed, the developers don't think about the fact that the people working with the tool may be at different locations. Like you said, you have different groups at different locations, and maybe they need a means of collaborating and sharing information through the monitoring tool itself, identifying problems and working together to solve them. We have this optional Newsfeed available. It's, for lack of a better description, like an internal Facebook for your servers and your DBAs. For your most important servers, you can follow those servers.
If any problems pop up, they get posted to the wall, and you can follow the team members that are most important to you. If they make any comments, you get notified, and you can come in here and see what people are doing and basically work together to solve problems. Now, on the left-hand side, you'll probably notice that the console looks a lot like Outlook in the way it looks and feels, the general ergonomics. We very deliberately developed it this way, because we want this to be something that's easy to use and intuitive, something that's familiar based on other products you've used. Not something where you look at the screen and wonder, what in the world am I looking at? How do I get to a certain type of data? What does this feature mean to me? Where's the giant manual I have to pull out and read to understand this product?
Basically, the way it works is you look at the left-hand side and follow the color status. If something's critical, it's red; if it's a warning, it's yellow; and you can begin your diagnostic drill-down process. The first window, My Views, comes right out of the box, populating the containers based on the most critical event that exists on each instance. So, first thing in the morning, if you want to see your most critical instances, select the Critical container, and on the right-hand side there are your critical instances. The same goes for Warning, for instances that are OK, or for instances in maintenance mode. Maintenance mode is a feature of this product: if you're planning to do maintenance, reboot a server, things like that, and you don't want alerts flying everywhere, you can put a server in maintenance mode for a scheduled period of time, and during that window it's not going to alert everyone.
The second window is called Tags, and tags are just another name for groups. We call them tags because a tag implies that you're labeling something, maybe even putting multiple tags or labels on it, that something being a SQL Server instance. In larger environments, you may want to create different tag groups based on department, hosted applications, or location. You mentioned you have these different groups that may need to have their own access. Maybe you want everything fed up to one centralized deployment; then create tag groups for each location and put only their instances in each tag group. I say that because you can also define security around these tag groups, restricting users to see what they need to see and do what they need to do, only for that tag group and nothing else, if that's what you want to do.
I'm going to pick on my payroll application; it's running slowly today. I'm getting a phone call from an end user, and of course Diagnostic Manager is sending me all kinds of alerts. When I select the payroll application, on the right-hand side I see the two instances responsible for delivering data to that application. CUSTOMERSQL is the instance hosting my salary database, and IDERAPRODVM is hosting a personnel records database, but they both come together to share data with that payroll application. It's easy to see right here where I need to focus my attention: IDERAPRODVM. On mouse-over of the little red X, it shows me some of the top issues that are popping up. I have a blocking condition that's been around now for over 3,000 seconds. To be candid with you, I created that blocking in the background for demo purposes, but in any case, that's what we're seeing pop up here as one of our problems.
I'm going to go ahead and click on the IDERAPRODVM instance, and now it drills down to the dashboard. The dashboard is a good starting point for understanding where the bottlenecks exist. It shows me the common areas most DBAs like to look at: CPU utilization, memory, network activity, sessions, blocking, deadlocking, network packets sent and received, and database-level activity. Notice that the charts change colors if there are any alert conditions within the metrics in that individual chart panel. Before we get into those alerts, though, note that this dashboard is fully customizable. You might even notice some of the counters are labeled custom counters, because I added them. You can actually have fun with this product and do some creative things with it. For this database count, I basically just plugged in a query doing a SELECT COUNT(*) FROM sys.databases.
I might set that up so it alerts me if the count increments by one in a polling interval, which would tell me that someone just added a database, or I might go the other way: if the count goes down, it alerts me, because it means someone deleted a database. For SQL configuration changes, I threw in a counter that basically takes snapshots of the configuration, compares each poll to the previous snapshot, and then alerts me if a configuration change took place. Again, you can have fun with this, and you can customize the dashboard by going up to the top and selecting the Customize button. It brings up these panels on the right-hand side. The chart panels are somewhat generically labeled; something like File Activity can be ambiguous. If I click on any of these, notice it brings up more descriptive information about what it's going to display, as well as a quick graphical preview of what the chart panel will look like if I decide to use it.
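The database-count trick I just described is simple change detection between polls. Here's a sketch of that logic in a few lines, with hypothetical polled values standing in for the SELECT COUNT(*) FROM sys.databases results:

```python
def detect_database_changes(counts):
    """Compare each polled database count to the previous poll and
    report adds (count went up) and drops (count went down)."""
    events = []
    for prev, curr in zip(counts, counts[1:]):
        if curr > prev:
            events.append(("database added", curr - prev))
        elif curr < prev:
            events.append(("database deleted", prev - curr))
    return events

# Hypothetical polling history: 12 databases, then one added, then one deleted.
print(detect_database_changes([12, 12, 13, 13, 12]))
# -> [('database added', 1), ('database deleted', 1)]
```

The configuration-change counter works the same way, just diffing a snapshot of settings instead of a single count.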
One of the chart panels is Custom Counters; if I wanted to, I could have a dashboard that is 100 percent focused on my own custom counters. If you have a bigger amount of space on your monitor, up at the top you can change the layout and include more columns or rows to take advantage of that real estate. And at the top, if you need to save a dashboard, just hit the little Save button. If you ever need to bring up a dashboard you've created, at the top there's the Select Dashboard option. Basically, just swipe through the dashboards to find the one you want, select it to bring it to the forefront, and then close the designer to see the live data on the screen. Like I said, these charts are going to change colors.
We see that the blocking condition is still persistent, and it's going to remain persistent until we go over there and look at it here in just a moment. First, there's a bit of administrative stuff to cover real briefly.
If you want to make changes to thresholds, you can right-click on the instance and go to Configure Alerts. If you're in the Active Alerts section, you can right-click on the alert and select Configure Alerts as well. By the way, this is where your predictive analytics lives: it tells you that later today, based on past performance trends and alert trends, these are the percentages of likelihood that these different alerts are going to happen. Again, that can help you be prepared. Anyway, I can change the thresholds by right-clicking on an alert and selecting Configure Alerts here, or down in the Alerts tab.
Here's another alert list, and I can right-click and select Configure Alerts here too. The reason I'm showing you these different ways to do it is so you understand that we've tried to strategically position menu options where we think different people will look for them, again to make this product very simple to use. I personally prefer to right-click on the instance and then select Configure Alerts. It brings up the list of all the metrics we monitor out of the box, all categorized. You can just go down the list: backups and days since last backup, custom counters, the database category of counters including availability groups, database space consumption, log space consumption, mirror status monitoring, and tempdb.
Tempdb is commonly overlooked by monitoring tools, but we find it extremely important to monitor, because tempdb is where application developers will temporarily store data, manipulate it, and process it.
It's quite possible that with all of that activity, multiplied by the number of applications that have their databases hosted on that instance, there could be contention. It's important to monitor the size of tempdb and the version store, and the version generation rate versus the cleanup rate. If a session is consuming 20 gigabytes of tempdb space, you probably want to know about that, and we can alert you to it as well. That's built right into the product, and we have a nice graphical view where you can look in real time at what's happening in tempdb with different sessions, including what commands are executing. There's a lot here, though: CPU, memory, disk activity, cluster monitoring, long-running jobs, job completion status, replication monitoring, blocking, deadlocking, session CPU time, host- and VM-level CPU, memory, power state, VMs moving from one host to another, and VM resource configuration changes that could affect what's happening in the operating system, which in turn can affect what happens in SQL Server.
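As a hedged sketch of that per-session tempdb check (the 20 GB figure comes from the example above, and the session data here is made up), flagging heavy consumers is just a filter over a usage snapshot. In the real product this data would come from tempdb-related DMVs such as sys.dm_db_session_space_usage:

```python
TEMPDB_SESSION_LIMIT_GB = 20  # example threshold from the discussion above

def flag_tempdb_hogs(sessions, limit_gb=TEMPDB_SESSION_LIMIT_GB):
    """Return the session IDs whose tempdb usage (GB) exceeds the limit.
    `sessions` maps session_id -> tempdb space consumed, a stand-in for
    what a tempdb space-usage DMV query would return."""
    return [sid for sid, gb in sessions.items() if gb > limit_gb]

# Hypothetical snapshot: session 87 is consuming 22 GB of tempdb.
usage = {51: 0.4, 64: 3.1, 87: 22.0}
print(flag_tempdb_hogs(usage))
# -> [87]
```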
I'm just clicking around here, but if I click on any of these metrics, on the right-hand side you'll see what the thresholds are set to, and you can just move the slider bar to whatever value you want. Out of the box, we've defined what we consider to be best-practice settings, but what works for one environment could be completely different for the next, and even what works for one instance can be different for the next instance. If you need to make any fine-tuning adjustments, it's very simple to do. You can save your changes as templates, and if you ever need to apply a template when you add a new instance, just click Apply Template and select the template you want to use. That will apply all the thresholds you've defined in the past. Below the thresholds is this little green hash bar.
That's the baseline performance for this time window; you're going to see that moving around. This shows you the normal range of performance, and it can help you make the appropriate adjustments to thresholds as well. Down at the bottom, you have the Configure Baseline button, which is where you go to set your default baseline. Furthermore, you can create those baseline windows I mentioned. You might have one for Reporting Services for certain days of the week and certain times of the day. I have one here, just trying to be creative: Integration Services workflow activity happening from three o'clock to four o'clock. You'll see that your baselines on charts will dynamically adjust with those time windows as well. One more thing notable to mention up here at the top: alert recommendations are available. There's a little yellow flag that you can click on.
If Diagnostic Manager senses there's a lot of alert noise coming from a particular metric, it will throw some flags here to tell you that, hey, it would probably be a good idea to adjust the warning or critical threshold on certain metrics, which will help you quiet the noise coming from some of those metrics if necessary. One more thing before we get into some of the alerts happening here, and this kind of answers what you were looking for around having some administrative capabilities within the tool: under the Tools menu are the Alert Actions and Responses. With the different action providers, we have you covered with just about everything you could think of, from email notifications, to enabling additional query data collection for diagnostic purposes, to the event log action provider that writes events or alerts to the application event log.
There's sending SNMP traps, and posting Newsfeed alerts to that collaboration wall we were looking at a moment ago. Here's the hot-ticket one for you: the PowerShell action provider, to execute your own commands. There's the program action provider, if you wanted to kick off a batch file, VBScript, or .exe; basically, it's a command-line action provider. You can send alerts to System Center Operations Manager; we do play friendly with Microsoft. We are, after all, a Gold Certified Partner, and we do have some integration with SCOM. You can kick off SQL scripts and SQL jobs. All of those are available, and they're configurable here too. If I click on the SMTP provider, this is where we configure the Exchange server or SMTP relay. Okay, so those are the action providers. Once you've configured them, then for the alert responses you can create rules. These rules are simple, just like Outlook automation: you give the rule a friendly name and define the conditions based on instance name, a metric meeting a threshold condition, multiple metrics meeting threshold conditions, severity changing, different days of the week, or different times of the day. The second window is where we define the actions taken in response to those conditions. For many of the action providers, you can click Add and plug them in here: configure the email to have different parameters in the message or subject line, basically embedding some of the alert information right into the email message, or kick off your own custom script.
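The rule idea boils down to matching conditions against an incoming alert and firing the configured action providers on a match. Here's my own sketch of that pattern (the field names and the `email` action are hypothetical, not the product's schema):

```python
def rule_matches(rule, alert):
    """A rule matches when every condition it specifies agrees with the alert."""
    return all(alert.get(field) == wanted
               for field, wanted in rule["conditions"].items())

def dispatch(rule, alert, actions):
    """Fire each configured action provider (email, PowerShell, script, ...)."""
    if rule_matches(rule, alert):
        return [actions[name](alert) for name in rule["actions"]]
    return []

rule = {
    "conditions": {"instance": "IDERAPRODVM", "metric": "blocking", "severity": "critical"},
    "actions": ["email"],
}
alert = {"instance": "IDERAPRODVM", "metric": "blocking", "severity": "critical", "spid": 120}
fired = dispatch(rule, alert, {"email": lambda a: f"email sent for spid {a['spid']}"})
print(fired)
# -> ['email sent for spid 120']
```

An alert for a different instance or severity simply produces no actions, which is the quiet-by-default behavior you want from rules like these.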
Like you said, you want to have some administrative capabilities. This is a rule I created, actually, that kills blocking. But you don't just want to kill all blocking; sometimes blocking will happen when things slow down, and it may pop up and then go away. In this case, I may have a pesky blocking condition that happens all the time with my payroll application, so that's what I've plugged in here as my parameter: the payroll application. It doesn't matter what the database or login is; if it's the payroll application and I see blocking, I want to kill that process. Basically, I know that the alert summary text, which is one of the options here in the dropdown, includes the session ID. I plugged it into my script, parse out the session ID, and run a quick query against it to see the program, database, and login.
If it matches my parameters at the top, then I run a KILL command against it. I use that just to show how you can get creative with the tool and, again, have fun with it. Another response option is the prescriptive analysis. Like I said, this will analyze an instance and tell you what is wrong with it and what you need to do to fix the issue, whatever the issue or circumstances are. Let's pretend in this case, given that this is a blocking wait time alert, that my persistent blocking condition in the background has triggered that prescriptive analysis. I'm going to go up to the Analyze tab, and we're going to look at the report, where you can see all of the different findings. It looks like some of my databases haven't had an integrity check performed in a very long time.
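The kill-blocking response I just described follows a simple pattern: parse the session ID out of the alert summary text, look up the program for that session, and kill it only if the program matches the parameter. Here's a hedged Python sketch of that logic; the alert-text wording, the lookup, and the kill functions are stand-ins, since the real script runs as a SQL or PowerShell action:

```python
import re

def extract_session_id(alert_summary):
    """Pull the session ID (SPID) out of the alert summary text.
    The 'session 120' wording is a hypothetical format for illustration."""
    match = re.search(r"session\s+(\d+)", alert_summary, re.IGNORECASE)
    return int(match.group(1)) if match else None

def kill_if_matches(alert_summary, lookup_program, kill, target_program="PayrollApp"):
    """Kill the lead blocker only when it belongs to the target application."""
    spid = extract_session_id(alert_summary)
    if spid is not None and lookup_program(spid) == target_program:
        kill(spid)
        return spid
    return None

# Stub lookup/kill so the sketch runs standalone.
killed = kill_if_matches(
    "Blocking alert: session 120 has blocked 3 sessions for 3000 seconds",
    lookup_program=lambda spid: "PayrollApp",
    kill=lambda spid: None,
)
print(killed)
# -> 120
```

The guard on the program name is the whole point: blanket-killing every blocker would be reckless, but scoping the kill to one known-pesky application is a reasonable automation.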
There's a very large table that's had a high number of modifications since its statistics were last updated; there are duplicate or missing indexes; here a duplicate index was discovered. Whatever you select, at the bottom it displays when it's a problem, why it might not be a problem, and the different schools of thought on the topic. You can debate, based on your own environment, what's going to work for you.
I pay attention to this little section at the top, because it lights up these little buttons providing different fixes or recommendations. If I click on Optimize Script, for example, I have a script to drop the index, which I can copy out if I want to, or, just for convenience, hit the Run button, and it will reach out and actually perform that fix. This also goes along with your requirement to have some administrative capability through the tool itself: hit that Run button and it gets fixed for you.
Sometimes the recommendations are related to query tuning: a search predicate contains a column expression inside a function, a SET ROWCOUNT command is used instead of a TOP clause, or a cursor could be optimized for improved performance. Again, if you highlight something, at the bottom it tells you what it discovered along with recommendations. Sometimes it'll have these little learning links that you can click on, and it brings up the browser and takes you out to a third-party article. In this case, it's a Microsoft article, "Understanding SQL Server Fast-Forward Server Cursors." And again, at the top, Show Problem lights up, and I can pull that up to see where in the command I need to go to rewrite the query. Usually it highlights one set of text, the piece of text that throws the red flag; in this case it's a cursor, so it's highlighting that whole section.
We'll just pretend at this point that we need to share this with the payroll application developer, who could be internal or a third party. I'm going to go up to the top and select Export. Now I can pull out all of these findings, recommendations, and fix-it scripts and share them with that third-party developer if I need to. Hopefully that will give them some good food for thought. Now, one of these recommendations may be telling me at this point that I need to kill the blocking. I'm going to go back here to the main dashboard, and we're going to drill into the blocking condition before I eliminate it. We select Sessions, which obviously is still critical here, and a quick mouse-over of the red indicator pops up this little window showing me that the blocked session count is critical.
Clicking on that chart takes me down to the blocking and deadlocking view, where I see a dedicated view of the blocking chain as it exists right now. The chart at the bottom reflects that this has been a persistent problem for quite some time, which it has been, because I used a table lock to create it. The obvious question when I see this chain is: how did this mess start? It may have started as one process blocking another process, which then escalated into one process blocking many processes, and that could have escalated into many lead blockers blocking many blocked processes. Again, you'll want to know how it started and how it blew up into what we're seeing right now. Normally, with native tools, there's not an easy way to figure that out, but this is where we can go in and use the History Browser at the top.
I'm going to click the little clock icon here, and it brings up a calendar on the right-hand side showing today's date and the previous snapshots gathered from earlier today. We'll just go back toward the beginning, and there you can see where it escalated, and I'll go all the way back to the very beginning. And there you have it: session 120 blocked session 123. Here's the blocked session detail report. We can pop that open to see the details: the user, database, application, the last command, and the batch start and completion date and time stamps. All of this, again, can be exported out. We'll pretend the application shown is the payroll application and not SSMS. We export that out and give this information as evidence of a problem to that application developer. They may come back at this point and ask us: what about other sessions that may have played a part in causing performance problems?
I see we have the blocking condition, and I see all these good recommendations you gave me with prescriptive analytics, but can you show me evidence of other session activity at the same time? The answer to that is yes.
When we're in History Browser mode, we can flip around between any of the tabs in this console, and it's as if we're looking at those tabs back at that point in time; you can see at the top here, 12:17 PM earlier today. I'm going to go to session 120, right-click, and select View Session Details, which takes me to all of my active, sleeping, and suspended sessions from 12:17 PM. It took me right to the source of the problem, by the way. You don't see it here, but if I move off, it highlights it in red, showing that this is a problematic session.
Down at the bottom, again, you see the user, database, and application. I'm going to go ahead and export this to Excel. I don't see much going on here; everything is sleeping other than the sessions causing the blocking activity. We'll share that evidence with them, but we do need to eliminate that blocking condition. What I'm going to do now is click on the yellow flag that says click here to switch back to real-time mode. When I do that, you'll see the date and time go away, and we'll be back in real-time mode. There you can see the current blocking condition. I'm going to take the lead blocker now and right-click on it. When I right-click on that session, I have a few management and administrative options: I can view lock activity around the session; I can show query history, which, for the sake of time, I'll show you in a moment; or I can trace the session, performing a minimally invasive trace on just that one session and nothing else.
With a trace, we can track every sequence of events happening on that session. It doesn’t make a lot of sense here, because as you can see this process is sleeping, but if it were active, we’d see everything happening. And I can right-click and kill the session, which is what I’m going to do. Hopefully everything goes back to normal — actually, this one went to a runnable state and finished. That looks good. Just to be sure, we’ll go back to the blocking view. Everything has dropped off now to zero blocked processes; everything’s looking good. If I’m stepping into this late in the game and need to go back and research the problem, I can always use that history browser or back-arrow button to take me to a time and place where that was still a problem and start diagnosing again. By the way, regarding what I just did with killing a process and running a trace: as I said, we’re hearing more about how auditors are looking at the part that monitoring tools play in accountability to the organization.
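If you wanted to do the same thing by hand, outside the tool, a rough T-SQL sketch of finding the blocking chain and terminating the lead blocker might look like this (the session number 120 is just the one from this demo; the tool does all of this from the UI):

```sql
-- Sketch: find sessions that are currently blocked, and what they're running.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;   -- only sessions being blocked

-- A lead blocker is a session that blocks others but is not itself blocked.
-- Once identified (say session 120), terminate it:
-- KILL 120;
```

The point of the demo, of course, is that the console gives you the same lead-blocker identification and kill capability without writing any of this.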
Everything I just did has been logged. Here’s where I killed the session, and it shows who killed it. I’m actually logged in as a different user, BSmith — Bill Smith — and it shows he’s the one who logged in, what workstation he’s coming from, what repository he logged into, the session he affected and eliminated, the status of the session when he killed it, and the user on that session. All of that is being tracked here in case auditors are looking for it. That’s one anecdotal story. I’m going to move on now to the Queries tab.
This product is unique in that it has seven ways of doing query analysis. I find other monitoring tools out there usually do a pretty good job of one or two ways. For example, SolarWinds — which used to be called Confio — does a pretty good job with query wait statistics.
Wait statistics analysis is kind of their bread and butter. We also have query wait statistics built into this tool, but we include other ways of slicing and dicing query performance as well, and I would argue that all of them are important.
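For context on what wait statistics analysis means at the raw-DMV level, a minimal sketch of the kind of data these tools build on (the filtered wait types are a common, but by no means exhaustive, list of benign waits):

```sql
-- Sketch: top instance-level waits since last restart,
-- filtering a few common benign/idle wait types.
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'XE_TIMER_EVENT', N'BROKER_TASK_STOP',
                        N'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;
```

What the tool adds on top of this raw view is the correlation back to specific queries, applications, and time windows, which the DMV alone doesn’t give you.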
The first way is called signature mode. Signature mode displays your worst performing queries based on average performance — average CPU, average reads, average writes — and on the number of occurrences for a given date range and time period, as specified up here at the top. You’re getting an overall sense of your worst performing queries just by averages and occurrence counts. I’m noticing this top query here has red across the board, which indicates that the performance in any of those columns falls into the top 5% highest impact; yellow would indicate the top 20% highest impact.
If I take this query and open it up, I’m able to see the query text, but anywhere parameters may have been passed, I’ll see a little number sign — that’s the indicator of where parameters would have been passed. Through this view, I get a sense of which queries are impacting me the most; I can see the query at a high level and then take the next steps necessary to start mitigating that impact. There are also charts here that we can change to compare overall query performance based on duration, reads, writes, or CPU — by application, SQL text, or database, for that matter. The obvious question that comes from this is: it’s nice that you’re showing me average performance and number of occurrences within the date range and time period, but what if I want to see individual query performance?
Can you show me that — who executed it, and what they passed to it as parameters? Jumping to the next screen: statement mode will do just that.
It breaks apart all of those queries that were previously aggregated and shows the individual query performance, including the user, the database, and the application. For the same date range and time period, you have the same queries, but now they’re separated into individual executions. I still go down the list and look for yellows and reds, because that’s going to help draw my attention to where I need to focus. If I open up the query now, the number sign — or pound sign — is gone; the actual parameters that were passed are listed here. What’s nice about this is that as I look at this query, I can see it’s a pretty messy looking query.
I may ask myself: what can I do to fix this query and make it better? It’s already showing that it’s in the top 5% highest impact of all my queries. Is there something I can do to improve it? Instead of having to go through it line by line myself, I’m going to hit this Diagnose button down at the bottom and let diagnostic manager analyze the query. It goes through the overall query line by line, and notice it found a couple of things wrong: a search predicate contains a column expression inside a function. Selecting Show Problem pops up the overall command, and there we have it — I need to rewrite that line in particular; the blue text is what’s really throwing the red flags in this command. A SELECT * has also been discovered — using a wildcard, a bit of nastiness.
Obviously that’s going to return a lot of data, maybe even data we don’t require, and we’re seeing here that it’s thrown some red flags on it. Now, you might take this back to that payroll application developer, and they may come back and tell you: we know this is not a good practice, but we had to do it this way for reasons one, two, and three. If that’s the case, you can just close it and flag it as completed, or block it at the top from ever coming up again. Otherwise, you can select Export and share this information with that payroll application developer if you need to. That’s a nice little feature to have available, and that’s the second way of query analysis.
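To make those two findings concrete, here is a small hypothetical example — the table and column names are invented for illustration, not taken from the demo:

```sql
-- Flagged pattern: a function wrapped around a column makes the predicate
-- non-sargable (the optimizer can't seek an index on PayDate),
-- and SELECT * returns columns we may not need.
SELECT *
FROM dbo.Payroll
WHERE YEAR(PayDate) = 2024;

-- Rewritten: a sargable range predicate plus an explicit column list.
SELECT EmployeeID, PayDate, GrossPay
FROM dbo.Payroll
WHERE PayDate >= '20240101'
  AND PayDate <  '20250101';
```

This is essentially the kind of rewrite the diagnose feature is nudging you toward when it flags a column expression inside a function.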
The third way is query history — this is what I said earlier I’d get to in just a moment. You can either hit the Query History button at the top, or find your query down at the bottom and select Show Query History.
It takes you in to analyze that query for the given date range and time period, across the different performance metrics — duration, CPU, reads, writes. Now, why is this important? It’s important because it helps you identify trends over time: whether this query performs worse at certain times than others, any spikes in overall performance, any degradation with that query. It’s also important because if I accept any of the recommendations we saw in the prescriptive analysis report, I’m going to want to know, after I make the change, what that did to improve performance. This shows me where we came from, and I can put a mental marker on today’s date and monitor into the future to see how the query’s performance improves. That’s the third way of query analysis. The fourth way is called query waits.
Going to query waits — there, you can see, I just told you a backup kicked off, and there we have that backup; it just popped up. You’re able to see your query waits over time, leading up to this moment. Here’s that blocking condition that remained persistent, and when I killed the process, it dropped off. You can look at the query waits over time by wait category, statements, applications, databases, host systems, sessions, or users — however you want to do it. Or, if you’re in the heat of the battle and trying to figure out what’s going on right here, right now, click the little drop-down and go to query waits by duration. Now we see the breakdown of wait times per application. We’re going to pick on the payroll application still — although the payroll application is hypothetical, we’ll pretend we’re getting the calls about it and we want to see what’s going on around that application.
I find the application on the left-hand side — actually, I’m going to pick on that backup job that just kicked off. We’ll click on the SQL Safe backup service, and this drills down to the wait types, but only for that specific application. For that application, these are the wait categories. If something draws my attention in the wait category list, I click on it, and it drills down to the statements associated with that wait category and that application — nothing else. We’re not muddying the water with activity from any other applications. Now, I understand the statement looks truncated, but we did that for cosmetic purposes. If you want to see the entire statement, just right-click and choose View Text, and it brings up the entire command for you, including the parameters, which you can then copy out.
You can share that with the payroll application developer if you need to. Once you’ve drilled down to the statement, you can keep going — to the databases, down to the host systems, down to the sessions, down to the users. So far, that’s five different ways of query analysis. The sixth and seventh will take us over to the web interface to look at what we call workload analysis.
So this is the web interface. I don’t think we have to beat this one to death, because a lot of what we’ve already seen in the .NET client is available through the web interface. The web interface is a read-only view; it doesn’t have fully parallel functionality yet, but it’s getting there — we’re moving in that direction. From here, you can drill down into an instance and just move across the tabs, looking at sessions, queries, query waits, resource utilization statistics, database status, and alert activity.
In the alert activity, you get a kind of timeline view of different categories of alerts, and you can see hot spots trending throughout the day. This can also be helpful for identifying windows of time that you might want to baseline differently from the rest of the day. Through this same web interface — what we call the common web framework — we can plug in the add-on called SQL Workload Analysis, which pulls in the sixth and seventh ways of query analysis: the sixth being execution plan analysis, and the seventh being high-level statistical analysis of all query activity. As you can see here, I’m looking at one hour at this point, because I installed it this morning. I can switch it to four hours if I want, or a day, five days, or four weeks. For whatever instance I’ve selected, on the right-hand side I’ll see the top SQL statements, top logins, top machines, top programs, and top databases for the time slice I’ve selected.
If something draws my attention in any of these categories, I can click on it to drill down and get more in-depth data. Say I want to know: why do I have 2,206 seconds of used CPU time? Clicking on that drills down to show me statistics — basically, I have one day’s worth of data in the tool right now — and I can see, in the areas of SOS_SCHEDULER_YIELD, THREADPOOL, and signal wait time, which are associated with CPU activity, any spikes that have happened: SQL Server versus non-SQL Server CPU, or SQL compilations versus recompilations. If any little spike draws my attention, a quick mouse-over shows the time period that spike represents. I can zoom in by highlighting that section and letting off, and it zooms into that little period of time.
Now I’m looking at a smaller period of time. Any statement here that draws my attention, I click on, and it drills down to show what it discovers as potential problems — any missing indexes, or, if there’s a heavy hitter, what they call a heavy operator detected will be displayed. You can view the execution plan and see all the steps of the operation, including the clustered index scan. That same clustered index, responsible for 87.91% of the actual statement execution time, is also listed on the right-hand side as the top accessed object for that sliver of time I highlighted. Going hand in hand with that, I see the top login that executed the statement and the top machine. If I click on the top accessed object, it drills down — and let me get rid of this little message once and for all.
It drills into the statistics around just that object. I can drill down pretty granularly here. I don’t know that it has all the wait-time trends yet — it takes time to build trends, and because I just installed it today, it doesn’t have that trend information — but it really lets you get down into the weeds, and it doesn’t miss a beat. It gives you the stats on all of your queries, so you get an idea of what has been highest impact in your environment, and perhaps why, based on some of the findings.
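As a side note on the CPU-related waits mentioned above: one quick sanity check you can run by hand is the share of signal wait time in total wait time, which roughly indicates how long runnable tasks are queuing for CPU. A minimal sketch:

```sql
-- Sketch: signal wait time as a percentage of total wait time.
-- A persistently high ratio suggests CPU pressure (tasks queuing to run).
SELECT CAST(100.0 * SUM(signal_wait_time_ms)
            / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2))
       AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```

The workload analysis view is effectively trending this kind of signal-versus-resource wait breakdown for you over time, rather than giving you a single point-in-time number.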
Going back over to the console — like I said, seven different ways of query analysis. To wrap up, I’m going to go across these remaining tabs briefly, then touch on reports. Resource utilization focuses purely on hardware: CPU, memory, disk activity, reads and writes, disk queuing, disk busy alerts, and file activity.
You can see what’s going on with read and write activity down to the file level, disk sizes, and how much space is consumed on the disk. Whatever draws your attention on these charts, you can click on, and they’ll go deeper. Like I just clicked on a memory chart, and now it’s taken me to my underlying most critical memory counters, which include page life expectancy, buffer cache, and procedure cache. If there’s something you want to email to someone — maybe we’re looking at the working set memory for SQL Server, and I want to share this with one of my colleagues, but I don’t like the color scheme — maybe I’ll expand it, include the toolbar, go to the properties, make it more of a three-dimensional chart, flip it around like this. We’ll go with a more dramatic view here, and there you go.
I can go ahead and copy this out, save the image, or export to CSV and pop it into an email to share with my team if I want. And you can still use the history browser to go back if you want to look at the data from earlier today. Moving along, we have the Databases view. This is also a great way to analyze your databases overall, to see where you could recapture some space. I’m always going into my environment, playing around and creating bloated databases and things like that. If I start running low on space, I’ll usually do a multi-select of databases — just hold down the Control key like this and select whatever you want to analyze side by side. I’ll just pick a few of them here. Notice that as I’m clicking them, it’s adding them to the charts at the bottom, which I can then expand.
Now I see my transaction log sizes for each of my databases side by side, or I can hit the drop-down and change it to database sizes in megabytes, side by side, including unused space — all of which can be exported out. Or, if I don’t really care about my database sizes and care more about what’s going on under the hood with transactional activity, I can expand the other chart, and the same databases are selected. Now I’m able to see the overall active session count across all databases, or hit the drop-down and change it to transactions per second. Now I can see the underlying transactional activity that’s happening, and this one seems to be up there. If I mouse over, it brings that database to the forefront — oops, let’s try that again — see, it pulls that to the forefront and pushes everything else to the back. Then you have the tempdb view.
This is what I was talking about earlier. I think it’s important for any monitoring tool that’s worth its weight to be able to display tempdb, and then, down at the bottom, all the session activity going on against tempdb — where you still have the administrative controls we talked about, to go in and kill a session if it’s causing problems on tempdb. If you’re monitoring, watching space get consumed, tempdb is growing, and version store generation is outpacing the cleanup rate, you can go in and clean that up yourself here. Those are just notable to mention. Then the backups and restores: you don’t really have to babysit this view, but it’s here if you need to see which databases have never been backed up, or which have the oldest backups. I usually bubble the last-backup timestamp up to the top — the oldest first.
As you go down the list, whatever databases you select, down at the bottom you’ll see the history of backups and restores. For backups, you’ll see the type — full, differential, transaction log — all of which can be exported if you need to share it with others. Tables and Indexes, as the name implies, shows you the size and data consumption of your tables, the size of your indexes, and the levels of fragmentation. Just select any database from the list — I’m going to pick on our diagnostic manager repository. I usually bubble my row counts up to the top, because the tables with the highest row counts are where I’m going to see the best improvements when it comes to defragmenting the indexes. Here we have one table that has 99% fragmentation, because I’m grooming data and adding data. I want to resolve some of the fragmentation issues on that particular table.
What I’m going to do is right-click — like you said, you want a tool that has some administrative capability. We hit Rebuild Indexes, and there you go: it went from 99% to 0%. It actually performed an online rebuild — it supports online rebuilds if we’re talking about Enterprise Edition; otherwise it performs an offline rebuild, if it’s a non-supported data type or Standard Edition. And I can do it with this one too if I want to — there we go. That one is taking longer to complete, but it will complete.
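Under the covers, that right-click is issuing something equivalent to the following — the table and index names here are hypothetical, just to show the online-versus-offline distinction:

```sql
-- Enterprise Edition supports ONLINE = ON; on Standard Edition
-- (or for indexes containing unsupported data types) omit the option
-- for an offline rebuild, which is what the tool falls back to.
ALTER INDEX IX_TableGrowth_Date
ON dbo.TableGrowth
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```

The value of the UI is that it picks online or offline for you based on the edition and data types involved.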
I mean, even this right here, to be honest with you — this right here is worlds apart from SolarWinds. With SolarWinds, it gives me a GUI display of all my fragmented tables, but then I have to go over to SSMS, write my code, and go clean up that specific table myself. So I’m going back and forth. SolarWinds was good for quickly letting me see where my problem was, but I also needed another tool to actually do the work. To have one pane of glass where you can actually see the problem and fix it — that’s a lot.
Well, great — that’s great to hear. To go hand in hand with that, you can even set up one of those alert action responses, like this one I created that targets fragmentation when it runs its analysis at 3:00 AM. Basically, I have it issue a USE with the dollar-sign database variable for whatever database the alert is coming from, and then run a defrag against all of the fragmented tables that are greater than 30% fragmented — that’s the criteria I put in. There are a number of ways you can approach it with this tool, but yes, definitely: if you need to, right here, right-click, Rebuild Indexes, and you’re done with it. There’s also mirror status monitoring in the tool, replication monitoring, and here’s your SQL Agent job status. You can check on jobs to make sure they’re running — you don’t have to babysit; it will alert you if a job is taking 100% longer than it normally does, or if there’s a failure — but you can also go in here, highlight a job, and check out its history to see the error messages and anything else that happened.
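For reference, the logic behind that nightly defrag response could be sketched in T-SQL like this — the 30% threshold matches the criteria described above, but the scan mode and the rebuild-everything approach are assumptions, not the tool’s actual internals:

```sql
-- Sketch: rebuild every index over 30% fragmented in the current database.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += N'ALTER INDEX ' + QUOTENAME(i.name)
             + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
             + N' REBUILD;' + CHAR(13)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id
                     AND i.index_id  = ps.index_id
JOIN sys.objects AS o ON o.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE ps.avg_fragmentation_in_percent > 30
  AND i.name IS NOT NULL;           -- skip heaps

EXEC sys.sp_executesql @sql;
```

In practice, many shops reorganize at moderate fragmentation and reserve rebuilds for heavier fragmentation; the alert response just automates whichever policy you define.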
This is a deadlock job — it creates deadlocks in my environment. Sometimes it’s the victim and sometimes it’s the owner; that’s why we have both types here. Other than that, if there’s anything you want to add, remember that custom counters are available here. You can plug in your own SQL queries or Windows counters — PerfMon counters or WMI counters, either one — and it will enumerate everything for you. You don’t really have to type anything other than the name of your custom counter, and then define your thresholds. It’s very simple. If you want, you can even plug in your own custom SQL script. The only caveat with a SQL script is that the end result, when it runs its query, must be a numeric value, so that we can assign a threshold to it. I find that even with that caveat, I can do just about anything I need.
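A minimal sketch of what such a custom counter query looks like — the table, column, and status value here are hypothetical, purely to show the single-numeric-result shape the tool requires:

```sql
-- Sketch of a custom counter query: the single numeric result
-- is what the tool thresholds and alerts on.
SELECT COUNT(*)
FROM dbo.Orders
WHERE Status = 'Stuck';
```

Any query reducible to one number — a count, a max age, a queue depth — fits this pattern.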
Even if it’s looking at results, I’ll do a SELECT COUNT(*) against my table where the column data meets my criteria. That gives me a numeric value, and I can create alerts around it if I need to. Other than that, the last part is the reporting. There are all kinds of reports — like I said, this product has been around for 15-plus years. Here’s where you can select to deploy the reports to Reporting Services. We take all the legwork out of it for you: it creates the data source on Reporting Services and links the reports to that data source. Then you can go in and set up subscriptions or do whatever you want. With any of these reports we’re pulling up, you’ll sometimes see, on the left-hand side, the option to deploy that individual report or to select Schedule Email — that still leverages Reporting Services subscriptions behind the scenes, but it makes it easy to create the subscription right from diagnostic manager, as opposed to having to log into Reporting Services. Or just run the report by clicking the Run Report button, and it brings up the report.
A lot of these reports are linked together, and this is one of them. What I mean is: looking at this report, I see that some of the instances are critical, and I might want to know why. If I click on the critical status, it drills into another report and automatically passes in the instance name I drilled into. Now I’m looking at all of the alerts for that instance in another report, called Active Alerts, showing why this instance is so critical, and I can hit the drop-down and export this to Excel or PDF, or hit the Print button. Or, if I want to know how this instance has been performing today so far, I can click on the instance, and it links to another report showing me the server summary: the response time, the SQL Server’s CPU activity, OS memory usage, disk busy activity, and database status at the very bottom.
Those are a couple of the real-time reports, but you also have Analyze, which covers historical activity, and Plan, which forecasts into the future. You can go in and run reports for your top servers over time, top queries, alert history, resource utilization statistics, historically worst performing databases — any of these. Let’s pull up Top Databases, for example. I’m going to look at my last 30 days, and I want to see my top 10 databases based on the last 30 days of collected data. As we run the report, it renders on the screen, presenting different categories of performance around our databases. I can see my top 10 databases — as I typed into the parameters at the very top — by data size, by growth over the last 30 days, by reads per second, and, going down the list a little further, by writes per second, transactions per second, and wait statistics activity.
There’s other performance data that goes along with whatever category you’re looking at, including the reads and writes and the percentage of growth for the databases. As you’ve probably seen with the other reports I’ve executed so far, many of these reports provide drill-down capability, and we can do the same thing here: if something draws my attention, I can select that database and drill into it to look at it more specifically. For example, in top databases by transactions per second, number two on the list is one of my databases. It’s not really that busy, but nevertheless it falls into number two. If I want to isolate the performance around just that database and nothing else — again, for the last 30 days — I click on it, and it takes me to yet another report, called the Database Statistics report, where I can see what the performance has looked like, as far as transactions per second, for just that one database. Down at the bottom, there is again more of the statistical analysis, with the data points collected on each poll iteration.
If I need to share this data with any particular audience, I can go up to the top and export to Excel or PDF, or hit the Print button. Of course, with many of these reports, you may want to set up subscriptions or automate the reports to email you on a day-to-day basis, and on the left-hand side we have the little links available to deploy the reports to Reporting Services, or even schedule an email on a regular basis. That pretty much wraps up the functionality of diagnostic manager.
This tool is available for download on the IDERA website, www.idera.com. Select the Products drop-down, go to the SQL Diagnostic Manager link, and you’ll see a Getting Started button to begin the trial. Once you enter some basic information — your name, company, phone number, and email address — you’ll be presented with a link to download the software.
Once you’ve installed the software, it will automatically embed the trial key, which includes fully functional monitoring for up to 14 days and 15 instances. If you need to extend that, talk to your account manager and we’ll work with you on that as well. That wraps up the diagnostic manager demonstration. If you have any questions, feel free to reach out to us on the website or to your account manager, and we’ll get a technical resource involved to assist you. With that, I appreciate you attending the demonstration. Have a nice day.