Summary: How to determine how many VMs you can run on your VMware VMFS volumes.
Date: Around 2009
Refactor: 21 February 2025: Checked links and formatting.
This article explains the following article in more detail:
Yellow Bricks article: Max amount of VMs per VMFS volume
The reason for this is that both articles are quite technical and can be confusing. I have a lot of experience with storage, I would say a bit more than the average system administrator, but not as much as the people who have worked with NetApp and EMC for 20 years. So I thought I'd write an article that goes through those articles step by step, excludes the steps that are not relevant for my environment, and provides an extensive example to use for future reference.
My environment:
Excluded step: I'm not taking SCSI reservations into consideration. See the quote below on what kind of operations cause SCSI reservations. In my environment I do not expect these kinds of operations to happen on a daily basis:
VMFS is a clustered file system and uses SCSI reservations as part of its distributed locking algorithms. Administrative operations, such as creating or deleting a virtual disk, extending a VMFS volume, or creating or deleting snapshots, result in metadata updates to the file system using locks, and thus result in SCSI reservations. Reservations are also generated when you expand a virtual disk for a virtual machine with a snapshot. A reservation causes the LUN to be available exclusively to a single ESX host for a brief period of time. Although it is acceptable practice to perform a limited number of administrative tasks during peak hours, it is preferable to postpone major maintenance or configuration tasks to off-peak hours in order to minimize the impact on virtual machine performance.
Remember that changing settings and/or defaults can do more harm than good if you do not properly analyze your environment first. Default values are there for a reason.
To come to an actual number of Virtual Machines per VMFS volume we have to gather data first. Data gathering consists of two parts:
The first step is gathering the average number of active SCSI commands, because that will take some time. To do so, we will run VMware esxtop in batch mode using a modified configuration file. First we'll have to create the configuration file:
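The configuration file is just a saved esxtop layout. As a minimal sketch of creating one (the filename /root/batch_mode is the one used in the commands below; which views and counters you enable is up to you):

esxtop
# In the interactive session:
# - press 'd' (disk adapter), 'u' (disk device) or 'v' (VM disk) to open the disk views you want captured
# - press 'f' to toggle the fields/counters you are interested in
# - press 'W' (capital W) and enter /root/batch_mode to write the current configuration to that file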
Now start esxtop in batch mode like this:
esxtop -b -c /root/batch_mode -d 2 -n 900 | gzip -9c > /tmp/esxtop_esxprd01_1244.csv.gz
And run it in batch mode in the background for a full day:
nohup esxtop -b -c /root/batch_mode -d 15 -n 5760 | gzip -9c > /tmp/esxtop_fullday_`hostname -s`.csv.gz &
An explanation of the switches used above:
* -b: run esxtop in batch mode (non-interactive output of the configured counters)
* -c: use the configuration file we created earlier
* -d: the delay between samples in seconds
* -n: the number of samples to take; 15 seconds x 5760 samples = 86400 seconds, i.e. a full day
After running the command and downloading/unpacking the file, you can load it in Excel or perfmon to evaluate the data. In my environment we had so many disks that Excel would run out of columns. Even the latest version of LibreOffice did not fix that. So I was left with perfmon, which does not handle large files very well.
Update: Excel 2010 can import that many columns. Using Paste Special/Transpose you can flip columns and rows and filter out what you don't need, so the file gets smaller, which is better for perfmon. I performed these steps to make the data readable:
* Imported the csv as text and saved as xls
"
at the start and end of each lineYou can import a CSV in perfmon as follows:
Because perfmon crashed several times, I was forced to create a file covering a shorter period of time. Eventually I added these counters:
Which look like this in perfmon:
The sum of these counters is the total number of outstanding commands for that specific LUN. In case multiple ESX hosts access that LUN, perform the same steps on all of those hosts and add their numbers as well.
The LUN queue depth determines how many commands the HBA is willing to accept and process per LUN. If a single VM is issuing IO, the queue depth setting is indeed the leading parameter. If multiple VMs are simultaneously issuing IO to the LUN, the Disk.SchedNumReqOutstanding setting becomes the leading parameter.
Note that it's a best practice to keep both settings at the same value.
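Purely as an illustration of keeping the two in line, and only after proper analysis, aligning both on a value of 64 could look roughly like this on a classic ESX 4.x host. The module and option names (qla2xxx, ql2xmaxqdepth) are examples for a QLogic HBA and differ per driver and version, and the module change needs a reboot to take effect:

# example: per-LUN queue depth of a QLogic HBA driver (reboot required)
esxcfg-module -s ql2xmaxqdepth=64 qla2xxx
# matching per-host scheduler limit
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding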
The LUN queue depth can be found like this:
This will give you an overview of the LUNs accessible by the ESX host and the corresponding queue depth value:
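If you just want to eyeball the same value, esxtop's interactive disk device view shows it as well (assuming classic ESX 4.x):

esxtop
# press 'u' for the disk device view and 'f' to enable the queue statistics fields;
# the DQLEN column is the queue depth per device (LUN)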
The Disk.SchedNumReqOutstanding is a per host value and can be found like this:
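On a classic ESX host with service console access, a minimal way to read that value is:

esxcfg-advcfg -g /Disk/SchedNumReqOutstanding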
Now we can use the formula from the sources to determine the maximum number of VMs per VMFS volume:
First sum up all the gathered information:
The formula used on shared storage is: LUN queue depth / average active SCSI commands. In my case that gives 32 / 4 = 8, or 16 / 4 = 4.
So, depending on the queue depth, which differs on a few LUNs, I can have a maximum of 8 or 4 VMs per VMFS volume.
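For future reference, the same arithmetic as a throwaway shell snippet (the values are the measured examples from above):

# queue depth of the LUN and the measured average number of active SCSI commands
QDEPTH=32
AVG_ACTIVE=4
echo $(( QDEPTH / AVG_ACTIVE ))   # -> 8 VMs per VMFS volume (4 for LUNs with a queue depth of 16)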
Now that we have established that, for performance reasons, the maximum number of VMs per VMFS volume should not exceed 8, we should also look at the storage demand. In our environment we have an acceptance cluster consisting of 4 ESX hosts. In this cluster reside 117 VMs. We should calculate whether these 117 VMs (and future growth) can coexist with each other, keeping ESX limits in mind. In ESX 4.1 the maximums we should consider are these:
You can create CSV files with PowerCLI to collect the storage needs per cluster:
$timestamp = Get-Date -format "yyyyMMdd-HH.mm"
# $vCenter = "vCenter"
# Connect-VIServer $vCenter
foreach ($cluster in (Get-Cluster)){
    #$cluster = "Acceptance"
    $csvfile = "D:\adminshf\$timestamp-$cluster-storagerequirements.csv"
    $myCol = @()
    $vms = Get-Cluster $cluster | Get-VM
    #$vms = Get-Cluster "Acceptance" | Get-VM
    foreach($vm in $vms){
        $vmview = Get-VM $vm | Get-View
        $VMInfo = "" | Select-Object VMName,UsedSpaceGB,ProvisionedSpaceGB,MEMSize,MEMReservation,ProposedMEMReservation,RequiredVMKernelStorage
        $VMInfo.VMName = $vmview.Name
        $VMInfo.UsedSpaceGB = [System.Math]::Round($vm.UsedSpaceGB,0)
        $VMInfo.ProvisionedSpaceGB = [System.Math]::Round($vm.ProvisionedSpaceGB,0)
        $VMInfo.MEMSize = $vmview.Config.Hardware.MemoryMB
        $VMInfo.MEMReservation = $vmview.Config.MemoryAllocation.Reservation
        # Proposed reservation: a third of the configured memory
        $VMInfo.ProposedMEMReservation = [System.Math]::Round(($VMInfo.MEMSize / 3),0)
        # Datastore space the VM swap (.vswp) file would need with that reservation in place (MB)
        $VMInfo.RequiredVMKernelStorage = ($VMInfo.MEMSize - $VMInfo.ProposedMEMReservation)
        $myCol += $VMInfo
    }
    $myCol | Export-Csv -NoTypeInformation $csvfile
}
# Disconnect-VIServer -Confirm:$false
You can now open the CSV in Excel and make some calculations on what you need per LUN.
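If you'd rather total things on the command line than in Excel, a throwaway sketch could look like this. It assumes the CSV produced by the script above with the column order shown there (column 3 = ProvisionedSpaceGB in GB, column 7 = RequiredVMKernelStorage in MB), a machine with awk available, and an example filename:

awk -F',' 'NR > 1 { gsub(/"/,""); disk += $3; swap += $7 }
           END { printf "Provisioned: %d GB, VM swap space: %d MB\n", disk, swap }' \
    20250221-14.30-Acceptance-storagerequirements.csv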
The last step would logically be to deploy the storage. I wrote a special script for that, which you can find here.