Transcript for:
Monitoring Solutions with Site24x7 - Lesson Summary

"Hello Dave, it's John."
"Hey John, how you going?"
"Not good, Dave. One of our largest customers just called saying that their servers are down."
"Their servers are down? I'll log in and check."
"Log in and check? Why hasn't our monitoring system alerted us?"
"I'm not sure, our..."
"We should know that the customer's servers are down before they do."
"Yeah, but..."
"I don't care. If we need to replace our monitoring system, then do it. I don't want this to happen again."

[Music]

All right guys, put your hand up if you've been in this situation before. I think it's safe to say that we need to change this monitoring solution and address John's concerns. Ladies and gentlemen, let me introduce to you today the sponsor of this video: ManageEngine's Site24x7. If you want to follow along with me today, then you're in luck: not only is this platform easy to use, but they also offer a 30-day free trial. Now, I don't want to waste any more time, and I don't want to receive another nasty phone call from John, so let's jump straight in.

The first thing that we're going to want to do is go to Site24x7. Once you're there, click on Start 30-Day Free Trial. In here, enter an email address and a password to create your account. You can then opt in to receiving marketing material from Zoho and its partners, and lastly accept the terms of service and the privacy policy, and then just go ahead and click Sign Up. Once the sign-up is completed, you'll be taken straight into the Site24x7 portal. You'll also receive an email that looks something like this, and that's just simply to verify your email address.

Now, the first bit of infrastructure that we're going to be setting monitoring up on is VMware vSphere, so we'll go ahead and click on VMware on the left-hand side. The way that this works is that Site24x7 have all their servers up in the cloud, so we're simply going to be installing a poller on a Windows VM within our environment. That poller is going to be responsible for collecting the metrics from our vSphere environment and uploading them
into the Site24x7 servers. From there, we'll be able to read all the metrics and information in this portal right here.

Next, I'll click on the 64-bit button to download the Windows OS poller. Underneath that, you're going to see Show Device Key. That device key is responsible for connecting your poller to your Site24x7 account. When we run through the installation of the poller, you are going to enter in that device key.

I'm going to switch over to the Windows VM, and you can probably see on the desktop that I've already downloaded the poller installer. I'm going to double-click on that, and in the first welcome screen we'll just go ahead and click Next. For the license agreement, we'll go ahead and click on Yes, and here we need to go back to the portal, get our device key, and enter it in right here. Once you've entered the key in, go ahead and click Next. We'll go ahead and click on Next for the destination location, and if you use a proxy to connect to the internet, you can go ahead and tick this box; otherwise just go ahead and click Next. This is the final screen, so we'll just go ahead and click Next to complete the installation. The installation is now complete, so we'll just click that Finish button, and up on screen you're presented with a readme file for the poller.

Now we're just going to go into Services and make sure that our Site24x7 service has started. You can see on screen here that my service has not started, so I'm going to go ahead, right-click, and click Start. Once the service is started, we'll go back to our Site24x7 portal, and here the portal is checking that it can communicate successfully with the poller. Once the poller successfully connects, it will tell you up on screen here. Select the poller and then click Next.

Now it's time to add in your vCenter. Click on Add vCenter. Here I'm going to enter in my vCenter name, which is vmvcenter, and then in the next field I'm going to enter in my fully qualified domain name, so that's going to be vmvcenter.vm.local. Under VMware user credentials, let's hit that plus button. These credentials are going to be used by the Site24x7 poller to connect into your vCenter server. Once it connects into the vCenter server, it's going to be able to read the inventory and also gather all the metrics that it needs. I've already created a user account in my vCenter. The user account is site24x7@vsphere.local. That account has read-only permissions from the vCenter down to all the child objects underneath it. Now that we've got our user credentials sorted, we're going to leave the rest as default for now, and let's scroll down to the bottom and hit that Save button.

Now that our vCenter is added in, we just need to give it a couple of minutes to go through and start polling and collecting the data. As you can see up on screen, it is going to poll every 15 minutes, so we might just pause the video, let it collect the data, and come back shortly.

Well, I did say that we'd come back shortly, but actually it's been two days. The system has now grabbed enough information so that I can give you guys a really good demonstration. As it stands, we have monitoring set up, and we'll already be receiving alerts whenever a monitor goes down. Up on screen behind me, we have a list of all the monitors that vCenter has discovered. These include virtual machines, datastores, ESXi hosts, clusters, and even resource pools. The monitors marked in blue here are actually virtual machines that are powered off. If we scroll down, we get to the virtual machines that are powered on, marked in green.

Let's click on one of the virtual machines, and actually, let's click on that Site24x7 poller. In the first screen here we have a monitor overview, and this simply shows the availability, CPU, and memory on that first ribbon. If we want some more data, we can scroll down. For space metrics, we do need to wait 7 days for it to start generating the graph here, so we'll continue on down. Here we have a snapshot of the CPU and memory. The disk usage in kilobits per second is only
used for ESXi hosts, so we don't see anything there, and then lastly we have the network usage. At the very bottom we have some more VM details and also some details of the ESXi host.

If we click on the CPU tab at the top, we'll see a summary up the top with some CPU metrics, and if we scroll down we can see a few historical graphs here with CPU ready, CPU utilization, and CPU wait. Let's go and check out the Memory tab. It's very similar to the CPU metrics: we have a summary at the top, and if we scroll down we have some memory historical graphs that we can view right here.

Let's go ahead and click on Processes. Now, to view the processes within the virtual machine, we do need to go and install an agent. We're going to be doing that later on in this video, so let's go on to the next one, Disk I/O. Disk I/O is really only for ESXi hosts, so we won't see any of those graphs here; however, we will see some space consumption, so let's scroll down and have a look. On the left side we can see the free space view, and on the right side we have a nice space split-up summary.

Let's go ahead and click on Network. Once again, we're given a summary at the top of the screen with some network information, and if we scroll down we have the historical information on data received and transmitted, and then we have packets received and packets transmitted as well. Let's go click that Datastore tab. Here is the datastore where this virtual machine lives, and in this screen we're shown a summary of the datastore performance.

Next up is Zia Forecast. This is a pretty cool forecasting tool, but unfortunately I won't be able to show you this, as it does need to have been collecting data for the last 15 days. If you want to read up a little bit more on this forecasting tool, then check the video description; I'll be posting links there to the documentation.

Let's check out Outages. Outages is going to keep a record of when the monitor goes down and, if it is down, how long it is down for. If we click on
More and then click on Inventory, we're able to see a little bit more information on our virtual machine, plus we have some Site24x7 profiles that we can modify, and we're going to get to that real soon. If we click on More once again and select Log Report, this is basically a CPU and memory log report taken at those 15-minute polling intervals.

Now let's go over and click on Home on the left-hand side. We're going to scroll back down to that Site24x7 VM, and this time, on the right-hand side, we're going to click on those three lines and select Edit. Now remember, not too long ago we did talk about profiles. I'm going to scroll down to Configuration Profiles, and next to Threshold and Availability we have a default threshold profile, which is called VMware VM. I'm going to click on the little pencil here to modify that, and this is the area where we can configure thresholds for certain metrics. Out of the box, we are notified for agent failures, and we're also notified if a NIC gets disconnected. But more importantly, down the bottom we have CPU utilization and memory utilization thresholds that we can set right here. For example, on the CPU utilization I may want to set 85% here, and this is going to be like a warning notification that comes through. If we want to have a critical notification come through, we're going to click on Add Critical Threshold and put in our threshold of, say, 95%, and you can see under the Notify As column we have the notification as Critical. If we want, we can do something very similar for memory utilization.

If we scroll all the way to the bottom, where it says Set Threshold Values, and click on that, we have a whole list of metrics here that we can add in and set our thresholds on. For example, we can select Snapshot Size. That will get added to the list, and we can place a threshold of, say, 1 GB, and if the snapshot size reaches 1 GB, we are going to receive a trouble notification, or a warning notification. If I click
on Add Critical Threshold, the exact same logic applies as for the notification we set on the CPU utilization. I might set 2 GB here, and when the snapshot reaches 2 GB in size, we'll receive a critical notification. Once you're done with that, down the bottom of the page we just click on that Save button.

One other feature that I want to mention here is the IT Automation template. Let's have a quick look at that by clicking on Select Automation and then going down to Add Automation Templates. You can see right here on screen that one of the options for automation is a REST API. You have a bunch of options here that are available to you, but if REST API is not your thing, then we can drop down this menu and also select a server script. The script languages that the system understands are batch script, PowerShell, and VBScript. Going back up to the Type field and dropping down the menu once again, we also have an option to select a server command or a Windows service, or we can just do a server reboot. A very simple example of this IT Automation could be that a virtual machine gets powered off, and then IT Automation kicks in, triggers a script, and powers that virtual machine back on again.

We're going to close out of this window. On that simple example that I gave on IT Automation: not only can we trigger that automation when a virtual machine goes down, but if we drop down this menu right here, we do have a whole bunch of other options. So you may select IT Automation to kick in when you receive a critical alert or a warning alert, for example, and you've got a few other options up on screen here.

One of the last sections that I want to cover in this screen is the notification profile. I'm going to go ahead and click on the little pencil here to modify the default notification profile. I'm going to highlight a few important options on this screen, the first one being that the system sends a root cause analysis whenever the monitor is down, and the root cause analysis alert looks something like
this. Scrolling down to Alert Configuration, we're going to receive an alert any time that a monitor is in a down, up, critical, or trouble state. Moving over to Alerting Period, if we drop down this menu, by default we are going to be receiving this alert 24 hours a day. However, as you can see on screen, we can modify those hours, so you may do something like this: for critical, we do want to set that for all hours, and for trouble tickets we may only want to receive that during work hours. Under Notification Medium, if we drop down this menu, the options selected are email, mobile push notification, and SMS. We only have email set up at this point, but you can see the other options there that are available to you.

Let's scroll down to Notification Delay. Whenever the monitor is in a down, critical, or trouble state, there'll be a notification delay set by this option right here. If we drop down this menu, we can either notify immediately or notify after 2, 3, 4, or 5 consecutive failures. To the right of that we've got the alerting period, and similar to before, we have all hours by default, but you can cut that down and select other options right here. Persistent Alert is going to keep alerting you until you actually come in and acknowledge the alarm. We can fine-tune that by using that Notify After Every field. We then have the option to select a user alert group, and lastly the notification medium; by default here we have email.

If we scroll down to Escalation Settings, let's say that you've received an alert and that alert hasn't cleared after 30 minutes. Here we can say escalate after 30 minutes, and we can notify a different group, so that group might be a manager or might be a group of more senior engineers. To the right of that you have your notification medium, let's say email, and just under that you can also trigger IT Automation. Once you've made all your changes here, click that Save button, and click the Save button again at the top of
the screen here. Now, those profiles are applied on every single VM, so you can imagine the amount of time that is saved by using this method. You definitely don't want to go and touch each virtual machine and change thresholds and all this kind of thing. You want to have a good baseline threshold and then apply that to the majority of your systems.

Now we're going to move on from VMware vSphere, and we're going to connect a physical Cisco switch to our monitoring system. Before we add in the switch, we do need to add the network module onto our Site24x7 poller. To do that, we're going to click on Admin on the left-hand side and then click on On-Premise Poller. On the right-hand side, under the Network Module column, I'm going to click on Enable Network Module. The download is initiated. You can now see that under the Network Module column it has a status of Up, so we can now proceed.

On the left-hand side, let's go and click on Network. We'll then expand Network Monitoring Basic Overview and click on Switch. Make sure that you have your Site24x7 poller selected and just go and click Next. By default, we have an SNMP v1/v2 community string here of public. For demonstration purposes, I'm going to be using this community string in my lab; however, for a production environment, please do not use public, because it's used absolutely everywhere. Even better still, you want to use SNMP v3. We can simply go and click on Add New Credential, and right here you can add in your SNMP v3 information. Whatever option you've selected, make sure you have clicked on the little box next to it, and go ahead and click on Next.

I'm only going to be adding in a single switch here; however, if you want it to go out and probe your network, you can select that Add Network option. I'm going to be selecting Add Device, so I'll go ahead and select that. For display name, I'm going to add in Cisco switch. We're going to drop down this Category option and simply select Switches. For device
template, my switch is a Cisco 3750, so I'm going to drop that down, and in the search field I'm just going to type in 3750 and select Cisco Catalyst 3750 Series. There's nothing else that I need to change here, so I'm going to go ahead and click Next. Here we can modify some filters. We're going to click on the little pencil under the Actions column. You can see that by default we're only going to be discovering interfaces that are in the up state. If you also want to discover interfaces that are in the down state, you can simply drop down the menu and also select Down. I'm going to exit out of this filter, and we'll go ahead and click Next. Here we have a summary with all of our settings. If you're happy with this, go ahead and click on Discover. Now we need to give the poller a little bit of time to go out and discover the switch and also start to gather some of those metrics, so we'll pause the video right here once again and come back shortly.

Okay, ladies and gentlemen, we are back. It's been a couple of minutes since the poller went out, discovered our switch, and gathered a few metrics. On screen you can see my device, called Cisco_switch. Let's go ahead and click on that. In our first tab we're going to see an overview of device performance. You can see things such as availability, response time, packet loss, CPU utilization, and memory utilization. Right down the bottom of the page you also have a little summary of some device information. If we go ahead and click on Stack: if your switch has stacking capabilities, the information for that is going to show up right here.

Let's move on to Interfaces. Here we have a list of interfaces on our switch, along with a snapshot of metrics such as performance, errors, and discards. If we want to set some threshold alerts, we can go ahead and click on that Threshold Configuration button. We're going to be setting a bulk threshold on every single interface here. The first thing we'll do is hit that Child Monitors drop-down menu, and I'm going to
click on Select All. As you can see up on screen, I have configured a threshold of 70% on my receive and transmit utilization, and at 70% the system is going to send me a trouble alert. I've also added in a critical threshold of 90% for my receive and transmit utilization. When the interface reaches 90%, the system is going to fire off a critical alarm. Once you've set up your thresholds, go ahead and click the Save button.

Let's move on to the Traps menu. Here we're able to set our Cisco switch to send SNMP traps to our Site24x7 monitoring. Once the trap fires off, it will be received right here.

Let's move on to Performance Counters. Up on screen we have some default performance counters for our switch. If we want to add in some additional counters, we can go ahead and click on Add Performance Counters, and we have quite a large list of performance counters that we can add in here. I won't be adding any at this time, so I'm going to close this window. Once again, here we can set up some threshold alerts for these performance counters. To do that, we can click on that Threshold Configuration button, and similar to before, we drop down the child monitors. I'm going to select Switch CPU, and if the switch reaches 70%, I want it to fire off a trouble alarm, and in addition to that, if the switch reaches 90% CPU, then I want it to send me a critical alarm. Once you have your thresholds set up, go ahead and click Save.

Let's go to Tabular Performance Counters. A great example and use case of this is if you're monitoring the temperature of your switch. You'll probably find that there are a few SNMP OIDs that are able to monitor certain components of the switch and display the temperature. Now, instead of having each of those components displayed up on screen here, we just have one performance counter saying temperature of switch, and underneath that we'll have each OID. If you want to see what this looks like, then have a look at this example up on screen. This is taken from the
Site24x7 knowledge base, and it looks at multicast packets and interface collisions for each interface.

We spoke about Zia Forecast with our VMware vSphere monitoring. Just a reminder: check those links in the video description below if you want to read up a little bit more about Zia Forecast. Under More, if we go and click on Outages, similar to VMware vSphere, this is going to list any outages in the last 24 hours. If you want to change that time frame, on the right-hand side you can click on the drop-down menu, and you have a list of options there. Let's select Inventory. Here we have a little bit of information about our switch, along with all the profiles that are attached to our switch. Again, you can go and click on the modify button next to any of those profiles and tune them to your liking. And last but not least, we have our Log Report. Here is a log report with an entry for each of those polling intervals, and we can see some basic information such as response time, packet loss, CPU utilization, and memory utilization.

Now we're going to move on to the last bit of our demonstration, which is going to be setting up an agent within a Windows OS and monitoring the virtual machine from within the operating system. Let's go ahead and click on the Home button. What I'm going to do is search for one of my domain controllers. In the search field up here, I'm going to type in VAB a dc1 and then click on that object once it appears. I'm then going to go across to the Processes tab and click on Download Agent. The way that this works is that we download this agent and install it in the operating system, and that agent talks directly to the Site24x7 cloud, so it does not go through that Site24x7 poller. So I'm going to go ahead and click on Download Agent. I'm then going to select Windows, and in step one we'll go ahead and click Download Site24x7 Windows Server Agent, scroll down to the bottom for the license agreement, and select I Accept. Now
you guys know how to install this agent, so I'm going to skip ahead and stop at two screens that I just want to highlight. The first screen is our device key. Similar to when we installed the poller, we do need to go back to Site24x7, get our device key, and paste it in here. In Site24x7, this is where you get your device key; you're going to copy that and paste it in here. The next screen is the server monitor settings. Here we can disable such things as IT Automation, plugin monitors, and management actions. By default, Disable IT Automation is selected. You need to determine which settings are right for your environment. Once you have those selected, go ahead and complete the agent installation.

Now, not only is this agent going to pick up operating system metrics, but it's also going to detect that it is a domain controller, and it's going to be able to obtain domain controller metrics as well. On top of that, it's also going to be able to determine Windows updates, so you're going to be able to get alarms if your server requires any Windows updates. If we go back to Site24x7 and search on that server's host name, you'll see that at the moment we only see the VMware VM. We need to give it some time to gather the metrics and report back up to Site24x7 so that it's going to be available in the portal for you.

Now, in the interest of time, we are going to go and have a look at a different server that I have performed this exact process on. It has already been gathering data and metrics and all this kind of good stuff, and we'll be able to see all that in the portal. The name of that server is VMAD, so I'm going to type that in the search box. Sorry, the server is actually called VMAD2. What you can see up on screen here is not only the VMware VM, but you also see a monitor for AD Server, and then you see one for Windows Update. First up, let's go and take a look at the AD Server one. Once again, this is a very similar layout to what we saw in our Windows VM
and also in our Cisco switch; however, we've just got some metrics here which are specific to Active Directory. Some things to highlight here: up on that top ribbon we can see the replication inbound bytes and replication outbound bytes, and here we're talking about Active Directory replication. In the middle section, on the left-hand side, we can see some interesting statistics for our Active Directory domain. We can see user, computer, and group counts, and lastly the organization counts. On the right-hand side we see some core services for Active Directory, and it's good to see that they're all green. If we scroll down, we have some metrics here along with historical graph data for Kerberos and for LDAP. If we continue scrolling down further, we have even more metrics for Active Directory.

Let's go ahead and click on the Replication tab. Right here we have a whole heap of metrics zoning right in on that Active Directory replication performance. Let's go ahead and click on Domain Services. We have some performance data at this point in time, we also have that historical graph of domain service operations in the middle there, and I'll scroll down to the bottom just so you can see that last one, which is on search operations. Let's go up and click on that LDAP tab. The LDAP tab is pretty cool, because not only can we see that historical graph there, but we also have some metrics with average, minimum, and maximum, and I'll just scroll down to the bottom so you can see the last bit of this page, which is LDAP operations and successful binds.

Next we'll click on Security Account Manager. Now, you can see that my Active Directory is not that busy, but on this page we can see machine creations and user creations. As an Active Directory admin, these types of metrics can be really useful for administering your domain. Let's go across to Databases. Here we have the metrics on our Active Directory database. We have cache size, we have I/O log writes, we've got reads and writes, waiting records and threads, and we've got average I/O
log writes and average I/O database reads, so plenty of information to keep an eye on your Active Directory database.

Let's go ahead and click on Outages. Just like our VMware vSphere and our Cisco switch, we have a list of outages here if they occur, and on the right-hand side we can drop that menu down and select a different time frame if we wish. Let's go ahead and click on Inventory. Here we have some basic information on our server, and we also have the profiles that are attached here. Once again, we can click on the little pencil to modify them, and you can change them to suit your environment. They are for threshold and availability and also for notifications. Let's go ahead and click on Log Report. Like with our other environments, this is a basic log report taken at those polling intervals, and we can see replication outbound bytes, inbound bytes, and address book client sessions.

Now we're going to take a look at the Windows Update monitor for this server. In the search bar at the top, I'm going to type in VMAD2 and select that first one that says Windows Update. This one's really useful, because you're going to be able to keep track of Windows updates throughout your environment, and any time that a Windows update comes out and your system's not patched, you'll be receiving an alert that looks something like this. This first page is pretty self-explanatory, but what I want to show you is on the left-hand side: if we click on Total Pending Updates, that number five there, we have a list of the updates that are missing from this server. Further to that, on the right-hand side we can click the KB link, just like this, and it takes us to the Microsoft support site that tells us all about this update. Another really cool feature here is if we click on Update History, and just like that we can see a list of all the update history. Now, Outages, Inventory, and Log Report I'm going to skip, because we have covered those in the VMware vSphere and Cisco switch sections.
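
The two-level thresholds set earlier in the walkthrough (85% and 95% for CPU utilization, 1 GB and 2 GB for snapshot size, 70% and 90% for interface utilization) and the "notify after N consecutive failures" delay can be pictured with a short Python sketch. This is only an illustration of the logic as described in the video, not Site24x7's implementation: the function names `classify` and `should_notify` are hypothetical, and the sketch assumes that a reading at or above a threshold triggers that level and that any non-ok poll counts as a failure.

```python
from typing import List, Optional


def classify(value: float, trouble: float, critical: Optional[float] = None) -> str:
    """Map a metric reading to a status, checking the critical level first.

    Assumption: a reading equal to a threshold already triggers that level.
    """
    if critical is not None and value >= critical:
        return "critical"
    if value >= trouble:
        return "trouble"
    return "ok"


def should_notify(statuses: List[str], consecutive_failures: int = 1) -> bool:
    """Return True once the most recent polls were all non-ok.

    consecutive_failures=1 models "notify immediately"; 2 to 5 model the
    "notify after N consecutive failures" options from the video.
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    if len(statuses) < consecutive_failures:
        return False
    recent = statuses[-consecutive_failures:]
    return all(status != "ok" for status in recent)


if __name__ == "__main__":
    # Hypothetical CPU readings from four 15-minute polls.
    readings = [60, 88, 91, 97]
    statuses = [classify(r, trouble=85, critical=95) for r in readings]
    print(statuses)
    print(should_notify(statuses, consecutive_failures=3))
```

The same `classify` call covers the snapshot example by passing sizes in GB, for instance `classify(1.5, trouble=1, critical=2)`.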
And now I want to move our focus across to dashboards. On the left-hand side, let's go ahead and click on Dashboards. Within Dashboards, we can go and create our own custom dashboards by selecting Build a Custom Dashboard. On the left-hand side, we'll leave Performance Widgets, Monitor, and All Monitors selected. Where it says Data Store, I'm going to drop that down and select Microsoft Active Directory. For Choose Monitor, I'm going to drop that down and just select VMAD2; however, you can select all of them as well. Then for graphs, I'm going to be selecting Kerberos and NTLM authentications, so I simply drag that onto the middle of the screen, and here's my graph. I can just click on Done Customizing, and there's my dashboard. It's as simple as that: within 30 seconds you can have your own dashboard up and running.

If I go back to Dashboards and scroll all the way down to the bottom, where it says System Dashboards, we have some predefined dashboards already created for us. I just want to show you the VMware Health dashboard, so I'm going to go ahead and click on that. We have a list of the top 10 heavy-hitting vSphere monitors right here, so this covers almost every aspect that you need to know about VMware vSphere. In this list we have clusters, ESXi hosts, virtual machines, resource pools, datastores, and lastly those dreaded snapshots. One thing missing in here is networking, but that can be easily resolved by creating a custom dashboard.

Now, before we wrap up this video, there's one last section that I want to cover, and that is user accounts. On the left-hand side, let's click Settings, and then we're going to click on User and Alert Management. At the moment I have a single user here, which is my super admin account. Let's go ahead and click on that Add User button. In the top half of this we can add in the user details, but what I really want to highlight here is the Allowed to Access setting. By default, it's allowed access to all
monitors; however, we can select Monitor Group, and if I drop down that menu, in the search I'm going to type in customer. What this means is that this system not only caters for single-company use, but you can also open this up for MSPs and service providers. By using monitor groups here, we can ensure that this user can only access the monitors that they need to see. For example, if the user belongs to Customer A, then this user can only see the Customer A monitors. We can also associate this user with a user alert group, and the alert groups that are available to you out of the box are Admin Group, Application Team, and Network Team. This is an excellent example of where we can use escalations: if this member is part of a senior admins group and we have an escalation going to that admin group, then this user will receive that alert. We can also expand Alert Settings at the bottom and tune this specifically for this user. Once you've got all your settings right, go ahead and click Save.

Now, if you are serious about monitoring your infrastructure, I highly recommend that you consider Site24x7, not only because they sponsored this video, but because you saw over the last 30 minutes how easy it is to deploy and get running within your environment. Therefore, today we're going to be rating Site24x7 nine sysadmins out of 10. The other reason for such a high score was that I was able to deploy Site24x7, get it up and running, view all the metrics, and receive alerts without much interaction with the knowledge base or with a user guide. That's extremely important, because from a company's point of view, they do not want to be spending large amounts of dollars on operating expenses.

Let us know in the comments below if you're already running Site24x7 and what your experience is with this platform. Also, feel free to leave a comment if you have any questions at all. Now, if you're a VMware engineer and you haven't checked out vSphere 8 yet, why not have a look at these two videos: installing vCenter and ESXi 8.
That's all we have time for today. Thank you so much for watching, and we'll see you again in the next video.
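
The monitor-group restriction described in the user accounts section can be pictured as a simple filter. The Python sketch below is an assumption-laden illustration, not how Site24x7 stores users or monitors: the `Monitor` and `User` classes, the `visible_monitors` function, and the monitor and user names are all hypothetical. It models the default "all monitors" access as `None` and a restricted user as a set of allowed monitor groups.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Monitor:
    name: str
    group: str  # monitor group the monitor belongs to, e.g. "Customer A"


@dataclass(frozen=True)
class User:
    name: str
    # None models the default "all monitors" access;
    # otherwise only monitors in these groups are visible.
    allowed_groups: Optional[FrozenSet[str]] = None


def visible_monitors(user: User, monitors: List[Monitor]) -> List[Monitor]:
    """Return the monitors this user is allowed to see."""
    if user.allowed_groups is None:
        return list(monitors)
    return [m for m in monitors if m.group in user.allowed_groups]


if __name__ == "__main__":
    # Hypothetical inventory for an MSP with two customers.
    monitors = [
        Monitor("cust-a-vm01", "Customer A"),
        Monitor("cust-a-switch", "Customer A"),
        Monitor("cust-b-vm01", "Customer B"),
    ]
    super_admin = User("super-admin")
    customer_a_user = User("customer-a-operator", frozenset({"Customer A"}))

    print([m.name for m in visible_monitors(super_admin, monitors)])
    print([m.name for m in visible_monitors(customer_a_user, monitors)])
```

With this shape, a user tied to the Customer A group never sees the Customer B monitor, which is the behavior the video attributes to monitor groups for MSPs and service providers.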