VMware vSphere

vSphere is a server virtualization platform from VMware. It is not a single operating system or application; rather, since version 4 the name has covered the range of technologies formerly grouped under the VMware Infrastructure umbrella.

Overview

A Software-Defined Data Center (SDDC) is one in which all infrastructure (network, compute, storage) is virtualized.

Virtualization is the process of creating a software-based representation of physical machines, networks, or storage. A VM itself is a running program that represents a physical machine; its properties (storage, network, CPU, memory, etc.) can be configured programmatically.

Other benefits include:

  • VMs are isolated from other VMs
  • VMs are sheltered from physical hardware changes/faults
  • Since VMs are just files, they can be provisioned, snapshotted, or moved
  • ACLs can be created to limit user access to certain VMs or resources

Core resources managed by the vmkernel:

  • CPU
  • Memory
  • Disk
  • Network


ESXi Host

An ESXi host (also called the VMkernel or hypervisor) can run up to 1024 VMs per host, with a maximum of 4096 virtual CPUs per host. Minimum/maximum requirements are:

  • 2-768 logical CPUs per ESXi host
  • 4 GB-16 TB memory
  • 32 1-Gbit ports, 16 10-Gbit adapters, or 4 25/40/50/100-Gbit adapters

Hosted architecture runs VMs on top of a general-purpose OS (such as Windows or Linux). Compare this to bare-metal architecture, where a minimal hypervisor built specifically for running VMs (such as ESXi or Hyper-V) runs directly on the hardware.

hostd is a process that runs on every ESXi host and is responsible for most operations on the host, as well as for communicating with the VMkernel. Clients communicate with hostd to control the host. vpxa is the vCenter agent that uses hostd to control the ESXi host.


Storage

Storage on ESXi is organized into datastores, the locations where virtual machine data is stored. There are several different backing methods that can be used for datastores:

  • Local, vmfs
  • SAN
    • Fibre Channel (FC)
    • Fibre Channel over Ethernet (FCoE)
    • iSCSI
  • NAS via NFS
  • vSAN (since vSphere 5.5) - vSphere SAN
  • VVols (since vSphere 6.0) - Virtual Volumes

Virtual machine data is stored as virtual disks (vmdk).

Files representing a virtual disk:

  • .vmdk - descriptor file: disk directory info, adapter type, geometry information
  • -flat.vmdk - the actual disk data

These virtual disks can also be made to link/map to another file or physical device using Raw Device Mapping (RDM). RDM files cannot be saved on NFS datastores (?)

  • Virtual compatibility mode: up to 62 TB; the mapping file behaves like a regular vmdk, so snapshots work
  • Physical compatibility mode: up to 64 TB; a near-direct mapping/pointer to a physical disk/LUN

Note that disks in physical compatibility mode cannot be snapshotted.
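
To illustrate, a rough sketch of what an RDM backing looks like through the vSphere API, using pyVmomi (VMware's Python SDK). The LUN path is a made-up placeholder, and the backing would still need to be attached to a VirtualDisk inside a VM config spec:

 # Hypothetical RDM backing spec (pyVmomi); deviceName is a placeholder LUN path
 from pyVmomi import vim

 rdm_backing = vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo(
     deviceName='/vmfs/devices/disks/naa.600508b1001c3a1f',  # placeholder
     compatibilityMode='virtualMode',  # 'physicalMode' disables snapshots
     diskMode='persistent')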

Disk allocation can be set as:

  • Thin: allocates blocks as they are written
  • Thick: allocates all blocks up front
    • Lazy-zeroed: zeroes each block on first write
    • Eager-zeroed: zeroes all blocks when the disk is created

Thin disks can be 'inflated', which converts the disk into a thick eager-zeroed disk.
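
As a sketch of how the three allocation types map onto the vSphere API (pyVmomi): the flags below are the actual backing properties, though building a complete disk spec involves more plumbing than shown here:

 # Thin vs. thick provisioning flags on a virtual disk backing (pyVmomi)
 from pyVmomi import vim

 backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
 backing.diskMode = 'persistent'
 backing.thinProvisioned = True   # thin: allocate on write
 # thick lazy-zeroed:  thinProvisioned=False, eagerlyScrub=False
 # thick eager-zeroed: thinProvisioned=False, eagerlyScrub=True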

vSAN

vSAN is proprietary software that pools storage from multiple hosts into a single datastore. Each host contributes to the pool using either only SSDs or an SSD and HDD combination; a minimum of 10% SSD must be present for a host to contribute storage to the pool. The storage is mounted as an Object Store File System (OSFS).

A maximum of 64 hosts can concurrently access the shared storage (the same as the number of hosts per cluster).


NFS

VM data can be accessed over NFS. When using NFS 3, ensure the export option no_root_squash is set, as ESXi accesses files as root. NFS 4 adds optional Kerberos authentication as well as built-in file locking support.

Best practice is to have multiple NICs on multiple switches to the NAS. Use IP hash load balancing to improve throughput.
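
A minimal sketch of mounting an NFS export as a datastore through the API with pyVmomi; it assumes `host` is a vim.HostSystem already looked up from the inventory, and the server/path names are placeholders:

 # Mount an NFS export as a datastore (pyVmomi)
 from pyVmomi import vim

 spec = vim.host.NasVolume.Specification(
     remoteHost='nas.example.com',   # placeholder NAS
     remotePath='/export/vmdata',
     localPath='nfs-vmdata',         # datastore name on the host
     accessMode='readWrite',
     type='NFS')                     # 'NFS' is v3; use 'NFS41' for NFS 4.1
 host.configManager.datastoreSystem.CreateNasDatastore(spec)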

Fibre Channel

Fibre Channel in a nutshell:

- Storage is in the form of a disk array.
- Every device has a World Wide Name (WWN), which is like a MAC address but longer (a 64-bit address).
- LUNs are carved from the storage disk array and can be:
  - Masked: the SP/server hides the LUN from hosts when scanned
  - Zoned: done at the switch level (the fabric) to segregate/restrict access to the LUN

Fibre Channel can also be run over Ethernet, which is called FCoE. Traffic can be routed through all available paths for multipathing.


iSCSI

In a nutshell:

- Target: the source/server containing storage LUNs.
- Initiator: the consumer, i.e. the ESXi server requiring storage.

Devices have:

- iSCSI Qualified Name (IQN): up to 255 characters long, containing a prefix, date code, organizational naming string, and an optional colon followed by a custom name (e.g. iqn.1998-01.com.vmware:esxi01)
- Extended Unique Identifier (EUI): a 16-hex-character name comprising a 24-bit company ID and a unique ID such as a serial number

An initiator can be hardware-accelerated (independent hardware iSCSI), dependent hardware iSCSI, or purely software iSCSI.

In ESXi, do not mix the types of initiators.


Both iSCSI and FC support multipathing. NICs should be connected to separate virtual switches. Multipathing supports both active-active and active-passive storage processor configurations.

It also supports these load-balancing/failover mechanisms:

- Round robin
- Most recently used (MRU)
- Fixed/static paths


vCenter

vCenter Server allows central management of multiple ESXi hosts and their virtual machines. The vCenter Server package could be installed on Windows or Linux (up to 6.7), but it is now distributed as a virtual appliance (VCSA, the vCenter Server Appliance). vCenter can orchestrate features such as Distributed Resource Scheduler (DRS), HA, Fault Tolerance, vMotion, and Storage vMotion.

There are two groups of services:

  1. vCenter Server services
    • vSphere Client, vCenter Server, the vSphere Web Client, Update Manager, Auto Deploy, ESXi Dump Collector
    • Cannot span multiple servers, but vCenter can be set up in an HA arrangement (active, passive, witness) when using the appliance version
  2. Platform Services Controller (PSC)
    • A new service group starting from vSphere 6 that handles infrastructure security, certificate management, and server reservation
    • vCenter Single Sign-On, VMware Directory Service (VMware's implementation of LDAP), license server, lookup service, VMware Certificate Authority, certificate store
    • Can be load balanced across multiple controller servers in different regional locations


Limits:

  • 25,000 powered-on VMs, 65,000 VMs in inventory, 2000 ESXi hosts
  • 64 hosts per cluster, 8000 VMs per cluster

The embedded database bundled with the Windows vCenter installation shouldn't be used with more than 20 ESXi hosts or 200 VMs.

The appliance version includes both the vCenter Server and Platform Services Controller services and runs on VMware Photon OS with PostgreSQL.


The Virtual Provisioning X Agent (VPXA) gets installed on an ESXi host when it is added to the vCenter inventory. The agent is the intermediary between vCenter and hostd: vCenter controls the ESXi host through this agent via the Virtual Provisioning X Daemon (VPXD) running on the vCenter Server. The free version of ESXi does not include the vpxa agent.

vCenter uses the 'vpxuser' account to control the ESXi host; it does not need the root or dcui administrator accounts.


When multiple vCenter Servers share a single vCenter Single Sign-On server, they can be linked using Enhanced Linked Mode. This allows vMotion across vCenters and management of all linked vCenters from one location. It is 'enhanced' because it can link vCenters on different platforms (appliance and Windows, for example).

The PSC should be load balanced across external servers when it is used by multiple vCenters, so that it does not become a single point of failure for all of them.

vCenter exposes an API to automate tasks.
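
As a minimal sketch of that API, the pyVmomi snippet below (VMware's Python SDK) connects to a vCenter Server and lists every VM in the inventory; the hostname and credentials are placeholders:

 # Connect to vCenter and enumerate VMs (pyVmomi)
 import ssl
 from pyVim.connect import SmartConnect, Disconnect
 from pyVmomi import vim

 ctx = ssl._create_unverified_context()  # lab use only; validate certs in production
 si = SmartConnect(host='vcenter.example.com',
                   user='administrator@vsphere.local',
                   pwd='secret', sslContext=ctx)
 content = si.RetrieveContent()
 # A container view recursively enumerates objects of a given type
 view = content.viewManager.CreateContainerView(
     content.rootFolder, [vim.VirtualMachine], True)
 for vm in view.view:
     print(vm.name, vm.runtime.powerState)
 view.DestroyView()
 Disconnect(si)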

Installation

The installer package runs on Windows, Linux, or macOS. Installation requires at least one ESXi server already set up.

Once installed, the appliance will deploy onto the ESXi server and can later be managed using the VMware Appliance Management Interface (VAMI).


Clients

  • vSphere (host) Client - deprecated Windows client
  • vSphere Web Client - based on Adobe Flex/Flash
  • HTML5 client - has about 97% of features; some advanced vSphere configuration is not there yet, but the next version should be feature complete

Hosts can also be managed from the ESXi Shell over SSH, or with vSphere PowerCLI.

Always use vSphere to manage ESXi hosts.


Permissions

A permission defines which user or group holds which role on which object. Users and groups can be local or can come from an identity source.

Permissions on an object take the following precedence:

  1. Explicit definitions on objects
  2. Definitions by groups
  3. Definitions by user


Migration

  • vMotion across ESXi hosts and across datacenters; migrations can be encrypted
  • vRealize Orchestrator for automation workflows
  • vSphere Distributed Resource Scheduler (DRS), across up to 64 ESXi hosts
  • Patch management using VMware vSphere Update Manager


Types of migrations:

- Cold: migrate a powered-off machine. Both storage and host can be changed. CPUs can be from different families.
- Suspended: suspend a machine, then move it. CPUs must be in the same family.
- vSphere vMotion: migrate a powered-on machine to a different host. Storage remains in the same location, so shared storage is required. CPUs must be in the same family.
- vSphere Storage vMotion: migrate a powered-on virtual machine's files to a new datastore while the VM keeps running on the same ESXi host. This is the only migration type that cannot cross vCenter Server instances.
- Shared-nothing vSphere vMotion: migrate a powered-on virtual machine to a new host and new storage at the same time, i.e. vMotion and Storage vMotion together. Shared storage is not required, as the files are copied.

Both ESXi hosts should have at least a 1 Gbit connection; higher-throughput links and multiple NICs can improve performance. A 1 Gbit link supports up to 4 concurrent vMotion migrations, with additional VMs queued; a 10 Gbit link supports 8 concurrent migrations. Up to 128 concurrent vMotion migrations are allowed per VMFS/NFS datastore.

All virtual aspects must match between source and destination:

- Network connectivity: no connection to an internal-only virtual switch, and port groups and policies must be the same
- No CPU affinity set
- No virtual device with a local image mounted
- LUNs used as RDM disks must be visible to both hosts

CPUs can differ between hosts if a common baseline is set. This is called Enhanced vMotion Compatibility (EVC), which restricts the CPU feature flags to a specific baseline.

Storage vMotion mirror mode syncs storage, then 'mirrors' I/Os to both source and destination before switching control over; this allows busy VMs to be migrated without timing out. Storage arrays that support the VMware storage APIs can move files on the array side rather than through the ESXi host.


Distributed Resource Scheduler (DRS)

DRS can automate VM migration by offloading VMs from overloaded hosts. It is a form of load balancing, making sure work is distributed across all hosts in the cluster.

DRS can be set to only give recommendations (for initial placement or load balancing) or to apply the recommendations automatically.

Initial placement can happen automatically when you create a VM on a cluster: DRS places the VM on the best host when it is powered on.

Automation levels:

- Manual: initial placement is done manually; migration recommendations are shown.
- Partially automated: initial placement is chosen automatically; migration recommendations are shown when the cluster becomes imbalanced.
- Fully automated: initial placement and migrations are both done automatically.

The migration threshold determines how eagerly DRS migrates VMs, using five levels of recommendations, from 1 through 5. Level 1 recommendations make the biggest impact on performance, while level 5 recommendations make the least.

By default, the migration threshold will apply recommendations 1, 2, and 3.

Affinity/anti-affinity rules: keep VMs together for performance (e.g. traffic stays on the local virtual switch, and similar VMs benefit from memory compression/page sharing), or keep VMs apart to maintain service levels.


Use DRS and HA to complement each other.

  • DRS is a proactive approach - it prevents issues before they occur
  • HA is a reactive approach - it restarts VMs when a host goes down

DRS + HA can defragment resources if they are fragmented across hosts.



Networking

  • vSphere Distributed Switches (VDS)
    • A virtual switch for an entire data center, spanning up to 2000 hosts
    • The switch configuration is saved in vCenter and instantiated on each ESXi host as required
    • Supports Link Aggregation Control Protocol (LACP) for teaming links
  • vSphere Standard Switches (VSS)
    • Configuration exists only on one ESXi host

A standard switch can move layer-2 traffic between virtual machines internally. Limits are 4096 virtual switch ports per host (previously 256) and 1016 active ports per host.


Connection types:

  1. virtual machine port group, used for virtual machines only
  2. vmkernel - used by the hypervisor; vMotion, iSCSI, NFS, or Fault Tolerance logging (and management on ESXi).
    • port ID starts with 'vmk'. eg: vmk0
  3. uplink ports - connects to real NIC on the physical hardware

VMware recommends isolating management, vMotion, iSCSI, and datastore traffic.

Switches can have:

  1. No uplinks
  2. One uplink
  3. Multiple uplinks (automatically sets up NIC teaming)

Security Policies

  • Promiscuous mode: forwards all switch traffic to this port, regardless of destination.
  • MAC address changes: allows the guest to modify the MAC address of its virtual NIC.
  • Forged transmits: accepts frames whose source MAC address differs from the assigned NIC MAC address.

Traffic shaping lets you set bandwidth restrictions by specifying an average bandwidth and a peak bandwidth (the absolute maximum, in kilobits/sec), with an optional burst size (in kilobytes). Outbound shaping is free; inbound traffic shaping requires a Distributed Switch, which requires a paid license.
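
A sketch of the corresponding API objects in pyVmomi; in practice the resulting policy is attached to a virtual switch or port group spec, which is not shown here:

 # Security and traffic-shaping policy objects (pyVmomi)
 from pyVmomi import vim

 policy = vim.host.NetworkPolicy(
     security=vim.host.NetworkPolicy.SecurityPolicy(
         allowPromiscuous=False,   # reject: don't mirror all traffic to the port
         macChanges=False,         # reject guest MAC address changes
         forgedTransmits=False),   # reject frames with a mismatched source MAC
     shapingPolicy=vim.host.NetworkPolicy.TrafficShapingPolicy(
         enabled=True,
         averageBandwidth=100000 * 1000,   # API takes bits/sec
         peakBandwidth=200000 * 1000,      # bits/sec
         burstSize=100 * 1024 * 1024))     # bytes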

Teaming and failover: configure this by editing the virtual switch. Turn off "Notify switches" if using Microsoft Network Load Balancing (NLB).


Virtual Machines

A VM can only be powered on if:

- the swap (.vswp) file can be created (not applicable if a full memory reservation is set), and
- its reservations can be satisfied.

Hot-pluggable devices include USB, Ethernet, and hard disks. Some guests can also support hot-adding CPU and memory. This requires:

- a proper ESXi license
- guest OS support, including installation of VMware Tools
- the feature being enabled in ESXi

VMware Tools should be installed on all guest operating systems. It allows:

- ESXi to control power operations
- the balloon memory driver to reclaim memory
- time synchronization (though disable other time-sync daemons if this is enabled)

Cloning

An alternative to deploying a VM from a template. A cloned VM can also be customized using a customization specification; more on this below.

Instant clones can be used to clone a running source machine. The source machine pauses temporarily while a new delta disk is generated for each virtual disk. The new machine can also be customized.


Snapshots

  • .vmsd - snapshot descriptor file
  • .vmem - snapshot memory
  • .vmsn - memory snapshot
  • -delta.vmdk - disk delta


Snapshots make use of delta (child) disks to keep track of changes. When a snapshot is deleted, its data is consolidated into the parent disk so that any dependent snapshots' delta disks still function.
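
A short sketch of driving snapshots through pyVmomi, assuming `vm` is a vim.VirtualMachine already looked up from the inventory; the snapshot names are placeholders:

 # Create a snapshot, then remove (consolidate) the current one (pyVmomi)
 task = vm.CreateSnapshot_Task(name='pre-upgrade',
                               description='before patching',
                               memory=True,    # also capture memory state (.vmsn)
                               quiesce=False)  # no filesystem quiescing

 # Removing a snapshot consolidates its delta disk into the parent
 snap = vm.snapshot.currentSnapshot
 snap.RemoveSnapshot_Task(removeChildren=False)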

Resource

Memory can be overcommitted in ESXi. When memory runs low, each VM's .vswp swap file will be used.

Memory reclamation techniques, in order, are:

  1. Dedup using Transparent Page Sharing (TPS)
  2. Deallocate using the balloon driver
  3. Memory compression
  4. SSD swapping
  5. VM .vswp file


Each VM can have up to 128 virtual CPUs. The VMkernel schedules (i.e. time-slices) work across all physical CPUs on the system. Hyperthreading, when enabled, allows the VMkernel to schedule work more efficiently, as it can pair busy threads with idle threads on one core. The VMkernel can migrate a vCPU from one processor to another to keep load balanced across all CPUs; this check is done roughly every 2-40 milliseconds.


Reservations, Limits, and Shares

Reservations define the minimum amount of a resource a VM is guaranteed to have when powered on. If the reservation is not obtainable, the VM will not be allowed to start. By default, VMs are created without reservations.

Limits define the upper bound for a resource that a VM will be allocated.

For example, given 3 VMs (A, B, C) running on a single 2.4 GHz core, where A has a reservation of 500 MHz, B has a reservation of 1000 MHz, and C has a limit of 600 MHz: what CPU allocation does each VM actually get? Of the 2400 MHz total, 1500 MHz is reserved, leaving 900 MHz free to split among the three VMs. A gets 500 + 300 = 800 MHz, B gets 1000 + 300 = 1300 MHz, and C gets 300 MHz (below its 600 MHz limit).
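
The same arithmetic in plain Python, just to make the worked example explicit:

 # Reservation example: 2400 MHz core shared by A (500 MHz reserved),
 # B (1000 MHz reserved), and C (600 MHz limit)
 total = 2400
 free = total - (500 + 1000)   # 900 MHz unreserved
 share = free / 3              # 300 MHz each

 a = 500 + share               # 800 MHz
 b = 1000 + share              # 1300 MHz
 c = min(share, 600)           # 300 MHz, under C's limit
 print(a, b, c)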


A share defines the relative priority a VM has for a resource. The value given is an arbitrary number used to calculate the percentage of a resource the VM has access to. For example, given 3 VMs (A, B, C) with shares 100, 300, and 200, the resource allocated to each VM would be 1/6, 3/6, and 2/6 respectively. Shares have no effect if there is no resource contention.
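
Again in plain Python: the allocation is simply each VM's shares over the total:

 # Share example: allocation is proportional to each VM's share count
 shares = {'A': 100, 'B': 300, 'C': 200}
 total = sum(shares.values())
 for name, s in shares.items():
     print(name, f'{s}/{total} = {s / total:.0%}')   # 17%, 50%, 33%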


Limits, shares, and reservations can be applied to a group of VMs using Resource Pools. Pools can be nested. Because of the flexibility of this system, it is simple to:

- set up a hierarchical organization, changing resource allocation per department to meet SLAs
- delegate control to different teams within resource boundaries
- assign resources appropriately when combined with DRS


Default share values follow a 4:2:1 ratio (High = 4, Normal = 2, Low = 1).

Rule of thumb for CPU sizing: pCPU * cap / vCPU.



vApps

A vApp consolidates multiple VMs into a single service. The ordering of the startup/shutdown sequence can be defined.


Templates

  • OVF - Open Virtualization Format
  • OVA - Open Virtualization Appliance (a single-file archive of an OVF)

A template is an existing VM used as a source (master) copy for new VMs. A new template can be created from a powered-down VM; its .vmx file gets renamed to a .vmtx file.

When creating a new VM from a template, you can specify how the VM is to be set up using Customization Specifications.

A Customization Specification defines system settings that are applied when the VM is first powered on, such as network configuration, license keys, and time zones. Supported guest operating systems include Windows and Linux with VMware Tools installed.


Content Library

A feature since 6.5 that allows VM templates to be shared across different vSphere environments. This is similar to the registry in Docker.

A subscription service also allows solutions to be shared commercially.


vSphere HA

vSphere HA allows the automatic restart of VMs after host, VM, or application failures. By default, when enabled, all protected VMs are restarted on other ESXi hosts in the cluster when their ESXi host fails. The running state of the VM is lost.

VM health checks are done using the VMware Tools heartbeat. Application health checks are done for supported applications that use VMware Tools to report application health to vSphere HA. VM Component Protection (VMCP), introduced with vSphere 6.0, additionally lets HA react to the storage failures described below.

When a host loses all paths to the underlying storage, it goes into an "All Paths Down" (APD) state. After a particular timeout, this becomes a "Permanent Device Loss" (PDL) state.

When vSphere HA is enabled, an agent called the Fault Domain Manager (FDM) is installed on each host, and a heartbeat is sent every second to the master host. FDM communicates directly with hostd to control the power state of VMs, and the slave/master host is determined using an election process.

FDM determines whether a host is down by constantly sending and monitoring heartbeats with other hosts using the management network as well as heartbeat datastores.

Because heartbeat traffic is critical to ESXi hosts using vSphere HA, all hosts should have at least two management network interfaces to protect against network isolation. The interfaces do not need to be externally accessible; it is sufficient that hosts can ping each other over the management network.

A host is isolated when it cannot ping other hosts or the gateway. Hosts from version 5 onward can use 2 or more heartbeat datastores to determine whether a host is truly isolated; older versions such as ESXi 3 will simply shut down all VMs.

If using heartbeat datastores, ensure at least one datastore is physically separate from the management network to prevent split brain.

The master is elected with priority given to the host with access to the greatest number of datastores, falling back to the host with the highest host management ID.

The master node will determine which nodes are available and which VMs are protected and need restarting by reading the host list that is stored in a datastore.




Admission control asks: can we guarantee the resources? vSphere HA maintains spare capacity on the remaining hosts so that restarts will succeed.

When configuring admission control, we specify:

- Host failures to tolerate: the number of hosts that can go down while still allowing protected VMs to be restarted

We can use either slots or a percentage of cluster resources:

- Slots policy: converts CPU/memory into units called slots; each host can hold a certain number of slots (see the sketch after this list)
  - The CPU slot size defaults to 32 MHz when no reservations are set on any machine
  - Use this if all VMs are relatively the same size
  - Slot size can be manually defined if VMs are different sizes
- Percentage of cluster resources: reserves a percentage of the cluster's total CPU/memory capacity for failover
  - Use this for variable-sized VMs.
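
A rough sketch of the slot arithmetic in Python; the capacities and reservations here are made-up numbers, and real ESXi also adds memory overhead to the slot size:

 # Slot-policy arithmetic (illustrative numbers)
 vm_cpu_res = [500, 1000, 0]      # per-VM CPU reservations (MHz)
 vm_mem_res = [1024, 2048, 512]   # per-VM memory reservations (MB)

 slot_cpu = max(max(vm_cpu_res), 32)   # default 32 MHz if nothing is reserved
 slot_mem = max(vm_mem_res)

 host_cpu, host_mem = 24000, 65536     # one host's capacity (MHz, MB)
 slots_per_host = min(host_cpu // slot_cpu, host_mem // slot_mem)
 print(slots_per_host)                 # slots this host contributes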

vSphere HA will still involve downtime as VMs need to restart.


vSphere Fault Tolerance

An extension of vSphere HA, configured per VM on a VM that is part of an HA cluster. The state of the VM is executed on two hosts in a synchronized manner.

Starting from 4.0, only a single vCPU is supported:

  • synchronizes the state, not files
  • disk format must be eager-zeroed thick

Starting from 6.0, FT VMs with multiple vCPUs (up to 8) are allowed, but there can only be 8 FT vCPUs per ESXi host and 4 FT VMs per host:

  • synchronizes both state and files
  • disk format can be anything

Synchronization is done over a dedicated network connection designated for FT.



vSphere Update Manager

  • hostupdates.vmware.com
  • https://marketplace.vmware.com/vsx/

Update Manager can update:

  • vSphere + ESXi hosts
  • virtual appliances
  • VMware Tools
  • 3rd party software on hosts

It can also scan VMs for software compliance.

Update Manager snapshots VMs before applying an update (remediation) and rolls back if it fails. FT VMs and VMs running version 3 or earlier will not be snapshotted.

Update Manager can automatically update templates by converting the template into a VM, updating the software on it, then converting it back into a template.

vCenter Server Appliance delivers vSphere Update Manager as an optional service.

Baselines include one or more patches, extensions, or upgrades. Baselines can include other baselines. Update Manager includes:

  • Host critical dynamic patch baseline
  • Host noncritical dynamic patch baseline
  • VMware Tools upgrade baseline
  • VM hardware upgrade baseline
  • Virtual appliance upgrade baseline

Issues

Resetting ESXi SSH Account Lockout

Failed login attempts via SSH will lock an account out based on the values defined in Security.AccountLockFailures and Security.AccountUnlockTime.

You should see a notice in vSphere similar to: "Remote access for ESXi local user account 'root' has been locked for 900 seconds after N login attempts."

Reset this by adjusting these two security parameter values under vSphere -> the ESXi host -> 'Advanced System Settings'.
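
The same settings can also be changed through the API. A hedged sketch with pyVmomi, assuming `host` is a vim.HostSystem for the affected host reached through a vCenter session (depending on the pyVmomi version, the integer values may need explicit long typing):

 # Adjust the lockout parameters via the host's advanced options (pyVmomi)
 from pyVmomi import vim

 opt_mgr = host.configManager.advancedOption
 opt_mgr.UpdateOptions(changedValue=[
     vim.option.OptionValue(key='Security.AccountLockFailures', value=0),  # 0 disables lockout
     vim.option.OptionValue(key='Security.AccountUnlockTime', value=120),
 ])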