Summary: Information and background on vSphere alarms as well as how to set them up.
Date: Around 2010
Refactor: 1 May 2025: Checked links and formatting.
With the introduction of vSphere 4.0, the possibilities for using alarms in VMware have greatly improved. This article describes some of the possibilities that are now available. We'll go through some of the default alarms and customize them for our environment to make sure they do what we want. Note that we only use email notification in our environment. SNMP traps are also supported, but we don't use them.
Since an alarm definition consists of four tabs (General, Triggers, Reporting and Actions), we'll go through them tab by tab.
After going through all the options I'll tell you how to configure vCenter for email notification and I'll give you my minimal customizations in the vCenter Alarm Definitions.
By default alarm definitions are configured at the vCenter level. So in your vCenter select the object representing the vCenter server, select the tab Alarms and select the Definitions view:
During this walkthrough we'll focus on the default alarm “Cannot connect to storage”. In the General tab we'll leave the default alarm name, but we'll modify the description so other system administrators know that I changed it:
Default is:
Default alarm to monitor host connectivity to storage device
We'll change it to:
Customized alarm to monitor host connectivity to storage device - sjoerd, 7 June 2011.
As you can see, it's now obvious that the alarm was changed, and by whom.
The alarm type we use now is for hosts. Note that it's possible to create alarms for:
Now note that for an alarm to work it needs to be triggered. In VMware the triggering can be done in two different ways:
In the tab “Triggers” there are already three events added that can be configured to trigger the alarm:
Each of them has a default status of “unset” and can have extra conditions, so it's possible to activate the trigger only when the event happens on a specific datacenter, datastore, host, etc. The default status is not really helpful: it means the event will never trigger the alarm. We'll set the events like this:
These options are chosen according to the amount of trouble each event causes. Lost storage connectivity means end users will not be able to work anymore, while losing path redundancy can impact performance but still lets end users work. We won't set any conditions, since we want the alarms to work on the entire environment.
This gives this result:
Note: As you can see in the screenshot, the alarm will be triggered if ANY of the specified events occur. Since this is a default alarm that we are only slightly customizing, this option cannot be changed. If you want the alarm to be triggered only if ALL events occur, you'll have to create the alarm manually. Then you'll have the option to customize this.
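The difference between the ANY and ALL settings can be illustrated with a minimal Python sketch. This is not vCenter code; the function name and the event labels are hypothetical and only stand in for the events configured on the Triggers tab.

```python
def alarm_fires(events_seen, watched_events, require_all=False):
    """Decide whether an alarm fires for a set of observed events.

    events_seen    -- events that actually occurred (hypothetical labels)
    watched_events -- events configured as triggers on the alarm
    require_all    -- False models "ANY", True models "ALL"
    """
    matches = [event in events_seen for event in watched_events]
    return all(matches) if require_all else any(matches)


# Illustrative labels only, not the literal vCenter event names.
watched = ["lost_storage_connectivity", "lost_path_redundancy"]
seen = {"lost_path_redundancy"}

any_result = alarm_fires(seen, watched)                    # ANY: one match is enough
all_result = alarm_fires(seen, watched, require_all=True)  # ALL: every event must occur
```

With ANY, a single lost path already raises the alarm; with ALL, the same situation stays silent until every watched event has occurred.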
As you can see below, the reporting options are not customizable when monitoring for events. That is logical: storage connectivity is either there or it is not, so it cannot fluctuate the way CPU usage can, for example:
It is, however, interesting to dive into these options a little deeper, or at least to explain what they are supposed to do:
Using Range and Frequency with Alarms
The Range parameter specifies a tolerance percentage above or below the configured threshold. For example, the built-in alarm for virtual machine CPU usage specifies a warning threshold of 75 percent but specifies a range of 0. This means that the trigger will activate the alarm at exactly 75 percent. However, if the Range parameter were set to 5 percent, then the trigger would not activate the alarm until 80 percent (75 percent threshold + 5 percent tolerance range). This helps prevent alarm states from transitioning because of false changes in a condition by providing a range of tolerance.
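The effect of the Range parameter can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not how vCenter implements it. The function name is made up, and the clearing rule (threshold minus range) is an assumption based on the "above or below" wording.

```python
def next_alarm_state(triggered, usage, threshold=75.0, tolerance=0.0):
    """Return True if the alarm is active after one usage sample (in percent).

    triggered -- whether the alarm was already active before this sample
    tolerance -- the Range setting, in percentage points
    """
    if not triggered:
        # Activate only once usage reaches threshold + range.
        return usage >= threshold + tolerance
    # Assumption: an active alarm clears only below threshold - range.
    return usage >= threshold - tolerance


at_threshold_no_range = next_alarm_state(False, 75.0)                # range 0
inside_range = next_alarm_state(False, 78.0, tolerance=5.0)          # below 80
at_upper_bound = next_alarm_state(False, 80.0, tolerance=5.0)        # 75 + 5
```

With a range of 0 the alarm activates at 75 percent. With a range of 5 a sample of 78 percent does not activate it, while 80 percent does.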
The Frequency parameter controls the period of time during which a triggered alarm is not reported again. Using the built-in VM CPU usage alarm as our example, the Frequency parameter is set, by default, to five minutes. This means that a virtual machine whose CPU usage triggers the activation of the alarm won't get reported again – assuming the condition or state is still true – for five minutes.
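The Frequency behaviour can be sketched the same way. The function below is a hypothetical model, not vCenter code. It assumes samples arrive as (minute, condition_true) pairs, and that a cleared condition resets the timer.

```python
def reported_minutes(samples, frequency_minutes=5):
    """Return the minutes at which a triggered alarm would be reported.

    samples -- list of (minute, condition_true) tuples in time order
    """
    reports = []
    last_report = None
    for minute, condition_true in samples:
        if not condition_true:
            # Assumption: once the condition clears, the next trigger reports at once.
            last_report = None
            continue
        if last_report is None or minute - last_report >= frequency_minutes:
            reports.append(minute)
            last_report = minute
    return reports


# CPU usage stays above the threshold for the whole period.
busy_vm = [(0, True), (1, True), (4, True), (5, True), (7, True), (10, True)]
reports = reported_minutes(busy_vm)
```

For the sample data the alarm is reported at minutes 0, 5 and 10; the samples in between are suppressed because the condition was already reported less than five minutes earlier.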
In the Actions tab it's possible to define the specific action that should be taken when the alarm gets triggered. This can be done on four different alarm state changes:
For every action you can define these options:
Now the question is: how many minutes should pass before a notification is sent again? The assumption is that whoever gets the first notification will work on it as fast as possible, since it is a severe warning/alert. However, some repetition is useful in case somebody accidentally overlooks the email. I decided to set it to 240 minutes.
Also, considering what I've set in the trigger configuration, I only want the Alert to be repeated, not the warnings. All this gives me this result:
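To make the effect of this choice concrete, here is a small Python sketch of the resulting notification schedule. It is a simplified model with hypothetical names. It assumes the first email goes out the moment the state is entered, and that repeats occur at exact multiples of the repeat interval while the alarm stays in that state.

```python
REPEAT_MINUTES = 240  # the repeat interval chosen above


def notification_minutes(status, duration_minutes, repeat_minutes=REPEAT_MINUTES):
    """Return the minutes (since the state change) at which an email is sent.

    status           -- "warning" or "alert"
    duration_minutes -- how long the alarm stays in that state
    """
    if status == "alert":
        # Alerts are repeated for as long as the alarm stays in the alert state.
        return list(range(0, duration_minutes + 1, repeat_minutes))
    # Warnings are sent once and not repeated.
    return [0]


alert_mails = notification_minutes("alert", 600)      # a ten-hour outage
warning_mails = notification_minutes("warning", 600)
```

For a ten-hour alert this gives three emails (at 0, 240 and 480 minutes), while a warning of the same length produces a single email.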
Note that there are other actions available as well:
Every Alarm has these actions available:
VM- and host-alarms have more actions:
Before vCenter is capable of sending email it needs to know some email settings. Go to Administration → vCenter Server Settings → Mail and fill in the correct values:
Note: There is no way in vCenter to test this configuration. The best way to test it is to create a custom alarm on something like VM CPU usage and set it to send an email when usage rises above 20%, for example. That will be triggered pretty quickly, so emails will be sent.
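If the test alarm produces no email, it can help to rule out the mail server itself. The Python sketch below sends a test message straight to an SMTP relay using the standard smtplib module. It does not test vCenter, and it assumes the relay accepts unauthenticated mail on port 25. The host name and the addresses are placeholders, not values from this environment.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a plain-text test email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "vCenter SMTP relay test"
    msg.set_content("Test message to verify the SMTP relay used by vCenter.")
    return msg


def send_test_message(smtp_host, msg, port=25):
    """Send the message through the relay (assumes no authentication)."""
    with smtplib.SMTP(smtp_host, port, timeout=10) as server:
        server.send_message(msg)


# Placeholder addresses; replace them with the values entered in vCenter.
message = build_test_message("vcenter@example.com", "admin@example.com")
# Uncomment to actually send; smtp.example.com is a placeholder host name.
# send_test_message("smtp.example.com", message)
```

If this message arrives but the alarm email does not, the problem is more likely in the vCenter mail settings or the alarm action than in the relay.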
This is an overview of the default alarms as defined in vCenter 4.1 that need to be customized, either as described above or as described below:
Note that the default alarm “Datastore usage on disk” has been disabled and replaced by “Datastore overallocation”.