Apache CloudStack is open-source cloud computing software. It is used to deploy an infrastructure as a service (IaaS) platform on virtualization technologies such as KVM, VMware, and Xen. It is similar to OpenStack but significantly simpler to set up and manage (albeit with fewer features).

This page contains my notes on setting up and using CloudStack 4.15. I am by no means a CloudStack expert, so take my notes here with a huge grain of salt and feel free to make corrections.

Installation

This installation is based on CloudStack 4.15 using CentOS 8. The setup described below uses KVM and Open vSwitch. I'm basing the design decisions and approach on the installation guide at http://docs.cloudstack.apache.org/en/latest/quickinstallationguide/qig.html

Overview

I will have 1 management node and a few bare metal nodes. All nodes will have the same processor (Intel something) and memory (24GB).

Each node will have the same network configuration based on Open vSwitch. There will be only 1 Ethernet connection per node with various VLANs trunked to each node. The VLANs are:

Network | VLAN | Network subnet
Management | 11, untagged |
Storage | 3205 |
Guest | 100 - 200 | n/a
Public | 2 |

The network configs for the 4 nodes I'll be using are listed below. There is also an NFS server used for primary storage. The IPs look odd because this was set up on an existing network.

Node Networks
management Management:


baremetal1 Management:


baremetal2 Management:


baremetal3 Management:


netapp1 Storage:

Switch config

For completeness, here's the configuration of the HP ProCurve switch that the nodes are connected to. The switch should have all the guest VLANs defined and tagged.


# Guest VLANs
vlan 100 name guest100
vlan 101 name guest101
vlan 200 name guest200
interface 1-8 tagged vlan 100-200

# Public, management, storage VLANs
vlan 2 name public
vlan 11 name management
vlan 3205 name storage
interface 1-8 untagged vlan 11
interface 1-8 tagged vlan 2,3205
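
Typing out a hundred guest VLAN definitions by hand is tedious. Here is a minimal sketch, assuming the `vlan <id> name guest<id>` naming pattern shown above holds for the whole range. The helper name `gen_guest_vlans` is my own, not a switch command. It only prints the lines, so you can paste them into the switch console.

```shell
# Print "vlan <id> name guest<id>" for every VLAN ID in [first, last].
# gen_guest_vlans is a hypothetical helper, not a ProCurve command.
gen_guest_vlans() {
    first=$1
    last=$2
    id=$first
    while [ "$id" -le "$last" ]; do
        printf 'vlan %d name guest%d\n' "$id" "$id"
        id=$((id + 1))
    done
}

# Guest range used on this page.
gen_guest_vlans 100 200
```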

Node setup

Each node will be set up with the following sub-steps.

CloudStack Repos

Install CloudStack repos.

# cat > /etc/yum.repos.d/cloudstack.repo <<EOF

Install base packages

Install all other dependencies.

# yum -y install epel-release
# yum -y install bridge-utils net-tools

Install Open vSwitch from CentOS Extras:

# yum -y install \
http://mirror.centos.org/centos/8/extras/x86_64/os/Packages/centos-release-nfv-openvswitch-1-3.el8.noarch.rpm \

Disable SELinux

The system should have SELinux disabled. Use setenforce and edit the selinux config:

# setenforce 0
# vi /etc/selinux/config 
## disable selinux
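
If you'd rather not edit the file by hand, the change can be scripted. This is a sketch only: `set_selinux_mode` is a name I made up, and it assumes GNU sed (for `-i`) and the usual `SELINUX=<mode>` line in the config file.

```shell
# Rewrite the SELINUX= line of an SELinux config file in place.
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
# Usage: set_selinux_mode <config-file> <mode>
set_selinux_mode() {
    sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# On a node, as root:
#   set_selinux_mode /etc/selinux/config disabled
```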

Disable firewalld

# systemctl stop firewalld
# systemctl disable firewalld

Configure Open vSwitch

# echo "blacklist bridge" >> /etc/modprobe.d/local-blacklist.conf
# echo "install bridge /bin/false" >> /etc/modprobe.d/local-dontload.conf

# systemctl start openvswitch
# systemctl enable openvswitch

We will be using network-scripts to configure the Open vSwitch bridges later. I removed NetworkManager but retained network-scripts to ensure NetworkManager doesn't interfere with my network setup. The install guide leaves NetworkManager around.

I create a 'shared' bridge that's tied to the network interface called nic0. This was done to make it easier to change the bridge setup during my testing, but it could be simplified. Each of the physical networks I later set up in CloudStack is its own individual bridge, to make it obvious how VMs get connected to the network.

# ovs-vsctl add-br   nic0
# ovs-vsctl add-port nic0 enp4s0f0 tag=11 vlan_mode=native-untagged
# ovs-vsctl set port nic0 trunks=2,11,40-49,3205

# ovs-vsctl add-br management0 nic0 11
# ovs-vsctl add-br cloudbr0 nic0 2
# ovs-vsctl add-br cloudbr1 nic0 100
# ovs-vsctl add-br storage0 nic0 3205

The node's management IP address needs to be removed from the primary network interface and then assigned on the management0 interface. If you're doing this to a node remotely, this might interrupt your connection.

# ip addr del dev enp4s0f0
# ip addr add dev management0
# ip route add default via
# ip addr add dev storage0

# ip link set management0 up
# ip link set storage0 up

Network configuration

Once the Open vSwitch bridges are set up, configure the interfaces as follows:

Network Interface | Role | Configuration
enp4s0f0 | primary NIC in the host | up on boot; no IP
nic0 | OVS switch that connects the other bridges to the NIC | up on boot; no IP
cloudbr0 | public traffic | up on boot; no IP
cloudbr1 | guest traffic | up on boot; no IP
management0 | management traffic | up on boot; assigned with management IP
storage0 | storage traffic | up on boot; assigned with storage network IP
cloud0 | link local traffic | up on boot; assigned

Network configs are applied using network-scripts. The idea here is to have the network interfaces configured automatically when the system boots. For interfaces that require a static IP address, I used the following network-scripts file. Adjust the device name and IP address as required.

# cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0

For devices that don't require a static IP:

cat <<EOF > ifcfg-cloudbr0

Once configured, verify that your node comes up with the proper network settings on a reboot.

Management node setup

On the management node, set up the network configs and the CloudStack management packages.

Setup Storage

If you intend to use the management server as the primary and secondary storage, you will need to set up an NFS server. If you intend to use an external NFS server as the primary storage, you can skip this step.

# mkdir -p /export/primary /export/secondary
# yum -y install nfs-utils
# cat > /etc/exports <<EOF
/export/secondary *(rw,async,no_root_squash,no_subtree_check)
/export/primary *(rw,async,no_root_squash,no_subtree_check)
EOF
# systemctl start nfs-server
# systemctl enable nfs-server

CloudStack management services

Install MySQL. MariaDB isn't supported and the installation fails with it.

# rpm -ivh http://repo.mysql.com/mysql80-community-release-el8.rpm
# yum -y install mysql-server
# yum -y install mysql-connector-python

## edit /etc/my.cnf to have the following lines.
cat >> /etc/my.cnf <<EOF
binlog-format = 'ROW'

# systemctl enable mysqld
# systemctl start mysqld

Set up CloudStack.

# yum -y install cloudstack-management

# cloudstack-setup-databases cloud:password@localhost --deploy-as=root
# cloudstack-setup-management
# systemctl start cloudstack-management
# systemctl enable cloudstack-management

After starting cloudstack-management for the first time, it might take 2 to 10 minutes for the database to be set up completely. During this time, the web interface won't be responsive. In the meantime, you will need to seed the system VM images to the secondary storage. If you are using an external NFS server for your secondary storage, adjust the mount point in the following command accordingly.

## Seed the systemvm into secondary storage
# /usr/share/cloudstack-common/scripts/storage/secondary/cloud-install-sys-tmplt -m /export/secondary -u https://download.cloudstack.org/systemvm/4.15/systemvmtemplate-4.15.1-kvm.qcow2.bz2 -h kvm -F
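
Rather than refreshing the browser while the database initializes, you can poll until the web interface answers. Here is a rough sketch of a generic retry helper. `wait_until` is my own name, and the curl example in the comment assumes curl is installed and that the UI listens on localhost:8080.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Example: poll every 10 seconds for up to 10 minutes.
#   wait_until 60 10 curl -sf -o /dev/null http://localhost:8080/client/
```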

We will continue the setup process via the web interface after setting up a bare metal node.

Bare metal node setup

You should set up at least one bare metal node which will be used to set up your first zone and pod.

On a bare metal node, set up everything outlined in the Node setup section above. The node should have the CloudStack repos, Open vSwitch, SELinux/firewalld, and the networking configured. The agent node must have virtualization enabled on the CPU, and KVM should be installed. You should be able to find /dev/kvm on the system.
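
A quick way to check the virtualization requirement is to look for the vmx (Intel VT-x) or svm (AMD-V) CPU flag. This is a sketch under the assumption that the flags appear in /proc/cpuinfo as on a typical x86 Linux host. The helper name `has_virt_flags` is hypothetical.

```shell
# Succeed if the given cpuinfo file lists the vmx or svm CPU flag.
# Usage: has_virt_flags [cpuinfo-file]   (defaults to /proc/cpuinfo)
has_virt_flags() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# On a node:
#   has_virt_flags && [ -e /dev/kvm ] && echo "KVM looks usable"
```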

CloudStack Agent

To set up the node, install the cloudstack-agent package.

# yum -y install cloudstack-agent

Configure qemu and libvirtd.

## edit /etc/libvirt/qemu.conf 

## edit /etc/libvirt/libvirtd.conf
listen_tls = 0
listen_tcp = 1
tcp_port = "16509"
auth_tcp = "none"
mdns_adv = 0

The CloudStack install guide instructs you to edit the libvirtd arguments to --listen, but this will prevent libvirtd from starting using systemd. Instead, you should skip this step entirely because the CloudStack agent will configure this for you when you add the node to a zone.

## The install guide suggests editing /etc/sysconfig/libvirtd to use the listen flag.
## However, this only works if you're not using systemd or using the libvirtd-tcp socket.
## I skipped this step since the agent will configure this later on.

Start the CloudStack agent. The CloudStack agent should also automatically bring up libvirtd (it's a service dependency).

Allow sudo access

Ensure that /etc/sudoers does not require TTY. In the older documentation, CloudStack requires that the 'cloud' user be able to sudo with the addition of Defaults:cloud !requiretty. However, looking at the installation on the CentOS 8 box, the agent actually runs as root, so perhaps root needs to be able to sudo?

Setting up your first zone

At this point, you should have at least one bare metal host, and your management node should be up and serving the CloudStack web UI at http://cloudstack:8080/client. Log in using the default admin / password credentials.

You will be greeted with a setup wizard. I have had no luck with this and it's better to ignore it. Instead, navigate to Infrastructure -> Zones and manually set up your first zone.

Description Screenshot
There are 3 types of zones that you can create:
  1. Basic zone - All guest VMs are placed on a single shared flat network. There is no isolation or security policies in place to prevent guest VMs from seeing each other.
  2. Advanced zone - Guest VMs can be placed in one or more VLAN-based networks. Guest networks can be either isolated or L2. Isolated networks (depending on the chosen network offering) come with a virtual router (VR), which offers NAT/SNAT and firewall services and uses one or more public IP addresses. L2 networks are similar but have no virtual router, so these services must be provided externally. Tenants can also create a virtual private cloud (VPC). A VPC is like a regular isolated guest network but with additional features. A VPC allows the user to:
    1. Create multiple subnets (called tiers) that can route to each other
    2. Control network traffic between tiers through network ACLs
    3. Associate one or more public IPs with the VPC
    4. NAT all subnets out through a single public IP, as with an isolated guest network
    5. Create a private gateway (and therefore static routes) within the VPC
    6. Create a VPN connection to the VPC
  3. Advanced zone with security groups - Guest VMs are placed on a shared network that is publicly routable. There is no concept of a 'public' network because the guest network should also be public. As a result, you cannot create any other kind of guest network or VPC. The only benefit here is the ability to define security groups per VM (which are implemented via iptables on the bare metal host). Because enabling security groups in a zone prevents that zone from creating isolated guest networks or VPCs, the security group feature only appears useful in an environment where guests only need to connect to the internet.

Be aware of each type's limitations before continuing.

We will be creating an advanced network zone.

CloudStack - New Zone 1.png
We will add the DNS resolvers for the zone and specify the hypervisor type (KVM).

Empty the guest CIDR since we're going to allow users to specify their own.

CloudStack - New Zone 2.png
When using the advanced zone, you need to specify the physical networks for the management, storage, and public networks.

These should correspond to the physical network devices on the hypervisor. Recall that in the previous step where we set up the Open vSwitch bridges, we created the following bridges for each role:

  • management - management0
  • storage - storage0
  • public - cloudbr0
  • guest - cloudbr1
Specify the public network. The addresses defined here populate the 'Public IP' pool.

All isolated guest networks and all VPCs will use one of the addresses defined in this pool for the SNAT/NAT. The addresses specified here should therefore be accessible from the internet.

CloudStack - New Zone 4.png
Create a new pod.

The pod network here should cover your management network subnet. The reserved IP addresses here will be used by system VMs that require access to the management network.

CloudStack - New Zone 5.png
Specify the guest network VLAN range.

Because we're using VLAN as an isolation method, this range specifies what VLANs the guest networks will use over the guest physical network.

CloudStack - New Zone 6.png
Specify the storage network.

The reserved start/end IPs will be used by system VMs that require access to the primary storage.

If you are assigning static IPs on your bare metal hosts, ensure that those addresses don't overlap with the reserved IP range specified here. I had CloudStack assign a VM the same IP as a bare metal host.

CloudStack - New Zone 7.png
Specify a cluster name.
CloudStack - New Zone 8.png
Add your first bare metal host.

You must add one host now and can add additional ones later.

CloudStack - New Zone 9.png
Specify your primary storage.

The server should be accessible from the storage network.

CloudStack - New Zone 10.png
Specify your secondary storage.

You need to have at least one NFS secondary storage that has been seeded with the system VM template.

Secondary storage pools should be accessible from the management network (confirm?)

CloudStack - New Zone 11.png
Launch the zone.

This step might take a few minutes. If all goes well, you can then enable the zone shortly after. If you run into any problems, check the logs on the management node at /var/log/cloudstack/management.

CloudStack - New Zone 12.png
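
To avoid the address clash mentioned in the storage network step, you can check each host's static IP against the reserved start/end range before launching the zone. Here is a minimal sketch for IPv4 only. The function names are my own, and the addresses in the example come from the 192.0.2.0/24 documentation range, not from my setup.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    # Split the address on dots into $1..$4.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if <ip> lies within [<start>, <end>] inclusive.
# Usage: ip_in_range <ip> <start> <end>
ip_in_range() {
    ip=$(ip_to_int "$1")
    [ "$ip" -ge "$(ip_to_int "$2")" ] && [ "$ip" -le "$(ip_to_int "$3")" ]
}

# Hypothetical host address checked against a hypothetical reserved range.
if ip_in_range 192.0.2.50 192.0.2.10 192.0.2.100; then
    echo "host IP falls inside the reserved range"
fi
```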

Once your zone has been enabled, it should automatically start a Console Proxy VM and secondary storage VM. You can find this under Infrastructure -> System VMs. If for some reason the System VMs are not starting, check that your systemvm template is available in your secondary storage and that the cloud0 bridge on each host is up. You should be able to ping the link local IP address (the 169.254.x.x address) from the hypervisor.

Once the two system VMs are running, verify that you're able to create new guest networks or VPCs. These networks should create a virtual router.

Configuration

Service offerings

Deployment planner

There are a few deployment techniques that can be used. These are set within a compute offering and cannot be changed after it's been created (really? can we change it via API?). The options are:

Deployment planner | Description
First fit | Placed on the first host that has sufficient capacity
User dispersing | Evenly distributes VMs by account across clusters
User concentrated | Opposite of the above.
Implicit dedication | Requires or prefers (depending on planner mode) a dedicated host
Bare metal | Requires a bare metal host

More information from CloudStack's documentation on Compute and Disk Service Offerings.

Enable SAML2 authentication

Enable the SAML2 plugin by setting saml2.enabled=true under Global Settings.

Set up SAML authentication by specifying the following settings:

Setting | Description | Example value
saml2.default.idpid | The URL of the identity provider | https://sts.windows.net/c609a0ec-xxx-xxx-xxx-xxxxxxxxxxxx/
saml2.idp.metadata.url | The metadata XML URL | https://login.microsoftonline.com/c609a0ec-xxx-xxx-xxx-xxxxxxxxxxxx/federationmetadata/2007-06/federationmetadata.xml?appid=c5b8df24-xxx-xxx-xxx-xxxxxxxxxxxx
saml2.sp.id | The identifier string for this application | cloudstack-test.my-organization.tld
saml2.redirect.url | The redirect URL using your CloudStack domain | https://cloudstack-test.my-organization.tld/client
saml2.user.attribute | The attribute to use. If you're not sure what's available, look at the management logs after a login attempt. | For Azure AD: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress

Restart the management server. To allow a user access, create the user and enable SSO. The user's username must match the value that's obtained from the saml2.user.attribute field.

Bugs

SAML Request being rejected by Azure AD

If you are using Azure AD, you may have issues authenticating because the SAML request ID that's generated might begin with a number. When this happens, you will get an error similar to: "AADSTS7500529: The value '692rv91k6dgmdas33vr3b2keahr4lqjv' is not a valid SAML ID. The ID must not begin with a number." For more information, see: https://github.com/apache/cloudstack/issues/5548
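
If you've pulled a request ID out of the logs or a SAML tracer and want to confirm that this is the bug you are hitting, the rule is easy to test. This is only an illustration of the "must not begin with a number" check from the error message. `saml_id_ok` is a made-up helper and not part of CloudStack.

```shell
# Succeed if the SAML request ID is non-empty and does not start with a digit.
saml_id_ok() {
    case $1 in
        ''|[0-9]*) return 1 ;;
        *)         return 0 ;;
    esac
}

# The ID from the error message above starts with "6", so it is rejected.
saml_id_ok 692rv91k6dgmdas33vr3b2keahr4lqjv || echo "ID starts with a digit"
```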

Users cannot login via SSO

Users that will be using SAML for authentication will need to have their CloudStack accounts created with SSO enabled. There seems to be a bug with the CloudStack web UI where a user's SAML IdPID isn't settable (it gets set to a '0'). A work-around would be to create and authorize users via CloudMonkey.

The steps on adding a new user are:

  1. Create the user: create user firstname=First lastname=User email=user1@ucalgary.ca username=user1@ucalgary.ca account=RCS state=enabled password=asdf
  2. Find the user's ID: list users domainid=<tab> filter=username,id
  3. Authorize the user: authorize samlsso enable=true entityid=https://sts.windows.net/c609a0ec-xxx-xxx-xxx-xxxxxxxxxxxx/ userid=user-id
  4. Verify that the user is enabled for SSO: list samlauthorization filter=userid,idpid,status

When authorizing a user, the entityid must be the URL of the identity provider. The trailing slash is mandatory.
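
Since a missing trailing slash is easy to overlook when scripting user creation, here is a small sketch that normalizes the value before it is passed to CloudMonkey. `with_trailing_slash` is a hypothetical helper name.

```shell
# Print the value unchanged if it already ends in "/", otherwise append one.
with_trailing_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Example use with the authorize step above (user-id is a placeholder):
#   cmk authorize samlsso enable=true userid=user-id \
#       entityid="$(with_trailing_slash https://sts.windows.net/c609a0ec-xxx-xxx-xxx-xxxxxxxxxxxx)"
```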

Enable SSL

A few things to note about enabling SSL:

  • If you added hosts via IP address, enabling SSL would likely break the management-to-client connection. You might need to re-add the host so that the certificates all match up.
  • On CloudStack 4.16, the button to upload a new certificate in the SSL dialog box does not work. This is fixed in 4.16.1.

Upload SSL certificates

You can upload SSL certificates to CloudStack under Infrastructure -> Summary and then clicking on the 'SSL Certificates' button. Provide the root certificate authority, the certificate, the private key (in PKCS8 format), and the domain that the certificate applies to. Wildcard domains should be specified as *.example.com.

Alternatively, you may use the CloudMonkey tool to upload certificates using the file parameter passing feature like so:

# cmk upload customcertificate domainsuffix=cloudstack.steamr.com id=1 name=root certificate=@Root.crt
# cmk upload customcertificate domainsuffix=cloudstack.steamr.com id=2 name=intermediate1 certificate=@Intermediate.crt
# cmk upload customcertificate domainsuffix=cloudstack.steamr.com id=3 privatekey=@server.key.pkcs8 certificate=@domain.crt

Enabling SSL

After providing at least one certificate to CloudStack that's suitable for your instance, you should be able to enable SSL for your management console and console proxy.

Enable SSL on the management console

The management console can be configured by editing /etc/cloudstack/management/server.properties with the following lines:


Because CloudStack doesn't run as root, it cannot bind to port 443. Use an iptables rule to work around this limitation, or put a properly configured reverse proxy in front of it.
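
As a sketch of the iptables workaround: a NAT REDIRECT rule can forward port 443 to whatever unprivileged port the management server listens on. I'm assuming 8443 here, so use the HTTPS port from your own server.properties. The helper `redirect_rule` is my own and only prints the command so it can be reviewed before being run as root.

```shell
# Print (do not run) an iptables rule that redirects incoming TCP 443
# to the unprivileged port the management server listens on.
# Usage: redirect_rule <listen-port>
redirect_rule() {
    printf 'iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports %s\n' "$1"
}

# 8443 is an assumed port, not taken from this page.
redirect_rule 8443
```

A rule added this way does not survive a reboot, so you would still need to persist it using whatever mechanism your system uses.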

We specify a pkcs12 keystore file which we can generate from our existing PEM certificates and private key with these commands:

## Combine Files
# cat key.key servercert.crt intermediate.crt root.crt > combined.crt

## Create keystore
# openssl pkcs12 -in combined.crt -export -out combined.pkcs12

## Import keystore
# keytool -importkeystore -srckeystore combined.pkcs12 -srcstoretype PKCS12 -destkeystore /etc/cloudstack/management/combined.pkcs12 -deststoretype pkcs12

Console proxy

If you enable SSL on the management console, you'll need to enable SSL for the console proxies. Otherwise, browsers will not load the noVNC resources. You'll need to provide at least one CloudStack certificate and then enable the consoleproxy.sslEnabled setting under global configuration.

If there are no certificates available, the console proxy service VM won't be able to spawn the service (it throws an exception about being unable to initialize SSL).

Self signed certificates

To generate a self-signed certificate, run through the following steps:

## Make your root CA
# openssl genrsa -des3 -out rootCA.key 4096
# openssl req -x509 -new -subj "/C=CA/ST=Alberta/O=University of Calgary/CN=example.com" -nodes -key rootCA.key -sha256 -days 1024 -out rootCA.crt

## Make your certificate
# openssl genrsa -out cloudstack-test.example.com.key 2048
# openssl req -new -key cloudstack-test.example.com.key -out cloudstack-test.example.com.csr -subj "/C=CA/ST=Alberta/O=University of Calgary/CN=cloudstack-test.example.com"
## Check the signing request
# openssl req -in cloudstack-test.example.com.csr -noout -text

## Sign the certificate
# openssl x509 -req -in cloudstack-test.example.com.csr -CA rootCA.crt -CAkey rootCA.key -CAcreateserial -out cloudstack-test.example.com.crt -days 500 -sha256
## Check the certificate
# openssl x509 -in cloudstack-test.example.com.crt -text -noout

## Convert it to pkcs8 format
# openssl pkcs8 -topk8 -in cloudstack-test.example.com.key -out cloudstack-test.example.com.pkcs8.encrypted.key
# openssl pkcs8 -in cloudstack-test.example.com.pkcs8.encrypted.key -out cloudstack-test.example.com.pkcs8.key

Tasks

Re-add existing KVM bare metal host

Once a host has been added to CloudStack, the CloudStack agent will have generated some public/private keys and configured itself to talk to the management node. If you need to remove and re-add a host, you will need to clean up the agent before re-adding it to CloudStack. In my experience, I had to do the following:

  1. Before removing the host from CloudStack, drain it of all VMs. virsh list should be empty. If not and you've removed the host from the management server already, manually kill each VM with virsh destroy.
  2. systemctl stop cloudstack-agent
  3. rm -rf /etc/cloudstack/agent/cloud*
  4. unmount any primary storages with umount /mnt/* and clean up with rmdir /mnt/*
  5. systemctl stop libvirtd
  6. rm -rf /var/lib/libvirt/qemu
  7. You may need to edit /etc/sysconfig/libvirtd to not use the listen flag. This might prevent libvirtd (and subsequently cloudstack-agent) from starting.
  8. Edit /etc/cloudstack/agent/agent.properties and remove the keystore passphrase, any UUIDs and GUIDs, cluster/pod/zone, and the host.
  9. Restart with systemctl start cloudstack-agent (libvirt should come up automatically as it's a dependency). Ensure that it comes up OK.

You may then re-add the host to CloudStack.

Building RPMs

To build the RPM packages from scratch, you'll need to install a bunch of dependencies and then run the build script. For more information, see:

  • https://docs.cloudstack.apache.org/en/
# yum groupinstall "Development Tools"
# yum install java-11-openjdk-devel genisoimage mysql mysql-server createrepo
# yum install epel-release

# curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
# yum install nodejs

# cat <<EOF > /etc/yum.repos.d/mysql.repo
name=MySQL Community connectors
# yum -y install mysql-connector-python

Enable the PowerTools repository.

# yum install jpackage-utils maven

# git clone https://github.com/apache/cloudstack.git
# cd cloudstack
# git checkout 4.15

# cd packaging
# sh package.sh --distribution centos8

Usage server

Install cloudstack-usage. Start it and restart the management server. Set enable.usage.server=true in global settings.

Question: Where is the collected data located? Can we visualize it in the UI?

Adding some Linux templates

You can add the "Generic Cloud" qcow2 disk images as a system template to CloudStack.

Because these cloud images use cloud-init, you will need to provide some custom userdata when deploying these images. Userdata will only work when the VM is deployed on a network that offers the "User Data" service offering. If you can't use userdata or if you want the VMs to come up with a specific root password, you can use virt-customize to set the root password on the qcow2 file.

Distro | Type | URL
Rocky Linux 8.4 | CentOS 8 | https://download.rockylinux.org/pub/rocky/8.4/images/Rocky-8-GenericCloud-8.4-20210620.0.x86_64.qcow2
CentOS 8.4 | CentOS 8 | https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.4.2105-20210603.0.x86_64.qcow2
Fedora 34 | Fedora Linux (64 bit) | https://download.fedoraproject.org/pub/fedora/linux/releases/34/Cloud/x86_64/images/Fedora-Cloud-Base-34-1.2.x86_64.qcow2
Ubuntu Server 21.04 | | http://cloud-images.ubuntu.com/hirsute/current/hirsute-server-cloudimg-amd64.img

You need to convert the img file to qcow2 with qemu-img:

qemu-img create -F qcow2 -b cloudimg-amd64.img -f qcow2 cloudimg-amd64.qcow2 10G

Here's an example of a cloud-init configuration which you would put in the userdata field when deploying a VM:

#cloud-config
hostname: vm01
manage_etc_hosts: true
users:
  - name: vmadm
    groups: users, admin
    home: /home/vmadm
    shell: /bin/bash
    lock_passwd: false
ssh_pwauth: true
disable_root: false
chpasswd:
  list: |
  expire: false
Increasing the management console's timeout

The default timeout is 30 minutes. You may adjust the number of minutes in the session.timeout value stored in /etc/cloudstack/management/server.properties.


Restart the cloudstack-management service to apply.

Upgrade CloudStack

Things to watch out for:

  • Don't upgrade packages when they're in use. Stop the services completely before doing a yum/dnf upgrade.
  • System VM template: I upgraded the CloudStack management server to 4.16.1 using a custom compiled RPM package. However, the management server didn't start, and the logs showed that it was expecting a system VM template at /usr/share/cloudstack-management/templates/systemvm/systemvmtemplate-4.16.1-kvm.qcow2.bz2. This is easily fixed by downloading the template and restarting the management server: wget http://download.cloudstack.org/systemvm/4.16/systemvmtemplate-4.16.1-kvm.qcow2.bz2 -O /usr/share/cloudstack-management/templates/systemvm/systemvmtemplate-4.16.1-kvm.qcow2.bz2

Traefik

Using Traefik for SSL termination

With the console proxy served using SSL, we could put a reverse proxy in front of both the management UI and the console proxy service VMs with a valid certificate. This allows us to 'mask' the self-signed certificate with Traefik's ability to request a proper certificate from Let's Encrypt.

In my test version of CloudStack, I've set up Traefik with the following configs. I updated the console proxy to use a dynamic URL by setting consoleproxy.url.domain to something like *.cloudstack-test.example.com. CloudStack's console proxy service will replace the * with the system VM's IP address (e.g. 10.1.1.1 becomes 10-1-1-1). We'll tell Traefik to reverse proxy these domains for both HTTPS and WSS on ports 443 and 8080 respectively. My dynamic Traefik configs to make this happen look like the following:

      insecureSkipVerify: true

      rule: Host(`cloudstack-test.example.com`)
      service: cloudstack-poc
        - http
        - https-redirect

      rule: Host(`cloudstack-test.example.com`)
      service: cloudstack-poc
        - https
        certresolver: letsencrypt

      rule: Host(`136-159-1-1.cloudstack-test.example.com`)
      service: 136-159-1-100
        - https
        certresolver: letsencrypt

      rule: Host(`136-159-1-1.cloudstack-test.example.com`)
      service: 136-159-1-100-ws
        - httpws
        certresolver: letsencrypt
          - url: ""

          - url: ""
        serversTransport: ignorecert

          - url: ""
        serversTransport: ignorecert

        scheme: https

And the following traefik configs:

    address: ":80"
    address: ":443"
    address: ":8080"

      email: user@example.com
      storage: "/config/acme.json"
        entryPoint: http
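
Each console proxy VM needs its own Host rule, so it helps to be able to derive the hostname from the system VM's IP. Here is a minimal sketch of the dots-to-dashes translation described above. `console_proxy_host` is a helper name I made up.

```shell
# Turn an IPv4 address into the hostname used under the wildcard domain
# by replacing the dots in the address with dashes.
# Usage: console_proxy_host <ip> <domain-suffix>
console_proxy_host() {
    printf '%s.%s\n' "$(printf '%s' "$1" | tr . -)" "$2"
}

console_proxy_host 10.1.1.1 cloudstack-test.example.com
# prints 10-1-1-1.cloudstack-test.example.com
```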

Change guest VM CPU flags

The default CPU flags that guest VMs see are set to qemu64-compatible features. This makes it easier for VMs to migrate between hosts with different CPUs, as there are fewer features to support.

If you need to pass the host's CPU flags to your guest VMs, you need to tweak the CloudStack agent configuration file and restart any VMs on the hypervisor. For more information, see: http://docs.cloudstack.apache.org/en/

# echo "guest.cpu.mode=host-passthrough" >> /etc/cloudstack/agent/agent.properties
# systemctl restart cloudstack-agent.service
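
Appending with echo will add a duplicate line if the command is run twice or if the key already exists. Here is a sketch of an idempotent alternative, assuming GNU sed and simple key=value lines. `set_prop` is my own helper and not part of CloudStack, and it does not handle values containing `|` or `&`.

```shell
# Set key=value in a properties file: replace an existing line for the key,
# or append one if the key is absent.
# Usage: set_prop <file> <key> <value>
set_prop() {
    file=$1
    key=$2
    value=$3
    # Escape dots so "guest.cpu.mode" is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${pattern}=" "$file"; then
        sed -i "s|^${pattern}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# On a hypervisor, as root:
#   set_prop /etc/cloudstack/agent/agent.properties guest.cpu.mode host-passthrough
```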

Using Open vSwitch and DPDK

Getting DPDK working with Open vSwitch is relatively straightforward. You need to install the DPDK packages, configure the kernel to use hugepages and IO passthrough, enable the vfio driver on your network interfaces for DPDK support, reconfigure Open vSwitch to use the DPDK device, and enable DPDK on the CloudStack agent.

There are some existing resources that might help.

Install DPDK tools:

# yum -y install dpdk dpdk-tools

Reconfigure your kernel by editing /etc/default/grub. Add the following. Adjust the isolcpus depending on your CPUs available. I assigned 4 cores out of 80 vCPUs. I am also using 16 1GB huge pages. Adjust this according to how much memory your system has (and probably what performance you're seeing)

# vi /etc/default/grub
## default_hugepagesz=1GB hugepagesz=1G hugepages=16 iommu=pt intel_iommu=on isolcpus=1-19,21-39,41-59,61-79 intel_pstate=disable nosoftlockup

# grub2-mkconfig -o /boot/grub2/grub.cfg

You can also configure huge pages by sysctl (optional if you set it in the kernel cmdline)

# echo 'vm.nr_hugepages=16' > /etc/sysctl.d/hugepages.conf
# sysctl -w vm.nr_hugepages=16

Load the vfio-pci kernel module on boot

# echo vfio-pci > /etc/modules-load.d/vfio-pci.conf

Reboot the machine. When it comes back, verify that you have hugepages and vfio-pci loaded, and that IOMMU is working.

# cat /proc/cmdline | grep iommu=pt
# cat /proc/cmdline | grep intel_iommu=on
# dmesg | grep -e DMAR -e IOMMU
# grep HugePages_ /proc/meminfo
# lsmod | grep vfio_pci
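
A plain grep for iommu=pt would also match a longer argument that merely ends with that string, such as a hypothetical intel_iommu=pt. Here is a sketch of a stricter check that compares whole arguments. `has_kernel_arg` is a hypothetical helper name.

```shell
# Succeed if the kernel command line contains exactly the given argument.
# Usage: has_kernel_arg <arg> [cmdline-file]   (defaults to /proc/cmdline)
has_kernel_arg() {
    # Put each argument on its own line, then match the whole line literally.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# On a node:
#   for arg in iommu=pt intel_iommu=on; do
#       has_kernel_arg "$arg" || echo "missing kernel argument: $arg"
#   done
```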

Set the network interfaces you wish to use DPDK on to the vfio-pci driver. This is done using the dpdk-devbind.py script that's provided by the DPDK tools package.

# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci ens2f0
# dpdk-devbind.py --bind=vfio-pci ens2f1
## Verify
# dpdk-devbind.py --status

Network devices using DPDK-compatible driver
0000:31:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' drv=vfio-pci unused=i40e
0000:31:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' drv=vfio-pci unused=i40e

Enable DPDK on Open vSwitch. pmd-cpu-mask defines which cores are used for datapath packet processing. dpdk-lcore-mask defines the cores on which non-datapath OVS-DPDK threads, such as handler and revalidator threads, run. These two masks should not overlap. For more information on these parameters, see: https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa.

# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x00000001  
# ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x17c0017c                   
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024"

## Verify
# ovs-vsctl get Open_vSwitch . dpdk_initialized
# ovs-vsctl get Open_vSwitch . dpdk_version
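
To double check which CPUs a mask covers, and that the two masks don't overlap, the hex value can be expanded bit by bit. A minimal sketch in bash; mask_to_cpus is a name I made up, and it only handles masks that fit in bash's signed 64-bit arithmetic (so CPUs 0 to 62).

```shell
#!/usr/bin/env bash
# mask_to_cpus is a hypothetical helper: prints the CPU numbers whose
# bits are set in a hexadecimal CPU mask such as 0x17c0017c.
mask_to_cpus() {
    local mask=$(( $1 )) cpu=0 out=()
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out+=("$cpu")
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "${out[*]}"
}

mask_to_cpus 0x00000001   # dpdk-lcore-mask, prints: 0
mask_to_cpus 0x17c0017c   # pmd-cpu-mask, prints: 2 3 4 5 6 8 22 23 24 25 26 28

# The two masks must not share any bit.
if (( (0x00000001 & 0x17c0017c) == 0 )); then
    echo "masks do not overlap"
fi
```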

If Open vSwitch is already configured to use these interfaces by name, you will just need to change the interface type to dpdk and set its PCI address.

# ovs-vsctl set interface ens2f0 type=dpdk
# ovs-vsctl set interface ens2f0 options:dpdk-devargs=0000:31:00.0
# ovs-vsctl set interface ens2f1 type=dpdk
# ovs-vsctl set interface ens2f1 options:dpdk-devargs=0000:31:00.1

The bridge that these interfaces are connected to must also have its datapath_type updated:

# ovs-vsctl set bridge nic0 datapath_type=netdev

Restart Open vSwitch for these changes to apply, then confirm that the interfaces show up with type dpdk:

# systemctl restart openvswitch
# ovs-vsctl show
        Port bond0
            Interface ens2f1
                type: dpdk
                options: {dpdk-devargs="0000:31:00.1"}
            Interface ens2f0
                type: dpdk
                options: {dpdk-devargs="0000:31:00.0"}

Update the CloudStack agent so that this host has the DPDK capability. Edit /etc/cloudstack/agent/agent.properties. Note that the keyword is openvswitch.dpdk.enabled (enabled ending with -ed). The example from ShapeBlue's blog post is wrong.
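
A sketch of setting that key without opening an editor. Assumptions on my part: the value true (the text above only confirms the key name), and set_prop, which is a helper I made up. It drops any existing line for the key and appends the new one, so running it twice doesn't leave duplicates. The demo below works on a scratch file rather than the real agent.properties.

```shell
#!/usr/bin/env bash
# set_prop is a hypothetical helper: sets key=value in a Java-style
# properties file, replacing any existing assignment of that key.
set_prop() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp)
    # Keep every line that does not start with "key=" (literal match).
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    echo "${key}=${value}" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch copy; the real file would be
# /etc/cloudstack/agent/agent.properties.
scratch=$(mktemp)
printf 'openvswitch.dpdk.enabled=false\n' > "$scratch"
set_prop "$scratch" openvswitch.dpdk.enabled true   # "true" is an assumed value
cat "$scratch"
rm -f "$scratch"
```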


Restart the CloudStack agent so that the management server sees this capability. Calling list hosts filter=capabilities,name should then show dpdk among the host's capabilities. Eg:

(localcloud) 🐱 > list hosts filter=capabilities,name
count = 22

If you don't see this, double check your agent configs and restart it again.

For VMs to take advantage of DPDK, you must either set extraconfig on the virtual machine or create a new compute service offering. Extraconfig might get overwritten whenever the VM is updated, so it's not a reliable solution. Extraconfig is URL-encoded and must not contain single quotes, or else the VM deployment will break. Eg:

(localcloud) 🐱 > update virtualmachine extraconfig=dpdk-hugepages:%0A%3CmemoryBacking%3E%0A%20%20%20%3Chugepages%3E%0A%20%20%20%20%3C/hugepages%3E%0A%3C/memoryBacking%3E%0A%0Adpdk-numa:%0A%3Ccpu%20mode=%22host-passthrough%22%3E%0A%20%20%20%3Cnuma%3E%0A%20%20%20%20%20%20%20%3Ccell%20id=%220%22%20cpus=%220%22%20memory=%229437184%22%20unit=%22KiB%22%20memAccess=%22shared%22/%3E%0A%20%20%20%3C/numa%3E%0A%3C/cpu%3E%0A%0Adpdk-interface-queue:%0A%3Cdriver%20name=%22vhost%22%20queues=%22128%22/%3E id=af64cc80-a4e4-4c17-9c7d-c34ed234dc6a
virtualmachine = map[account:RCS affinitygroup:[] cpunumber:2 cpuspeed:1000 cpuused:5.88% created:2022-05-03T13:16:02-0600 details:map[Message.ReservedCapacityFreed.Flag:false dpdk-hugepages:a extraconfig-dpdk-hugepages:<memoryBacking>
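
Since the encoded extraconfig is hard to read, it helps to decode it before submitting. A minimal sketch in bash; urldecode is a helper I made up, and it assumes the input contains no literal backslashes. The sample string is the first part of the extraconfig used above.

```shell
#!/usr/bin/env bash
# urldecode is a hypothetical helper: turns %XX escapes back into
# characters by rewriting them as \xXX for printf's %b format.
urldecode() {
    printf '%b' "${1//%/\\x}"
}

encoded='dpdk-hugepages:%0A%3CmemoryBacking%3E%0A%20%20%20%3Chugepages%3E%0A%20%20%20%20%3C/hugepages%3E%0A%3C/memoryBacking%3E'
decoded=$(urldecode "$encoded")
echo "$decoded"

# Single quotes break the VM deployment, so flag them early.
case $decoded in
    *"'"*) echo "warning: extraconfig contains a single quote" >&2 ;;
esac
```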

Troubleshooting

2022-05-05T22:35:28.312Z|281704|netdev_dpdk|INFO|vHost Device '/var/run/openvswitch/csdpdk-1' connection has been destroyed
2022-05-05T22:35:28.312Z|281705|netdev_dpdk|INFO|vHost Device '/var/run/openvswitch/csdpdk-1' connection has been destroyed
2022-05-05T22:35:28.313Z|281706|netdev_dpdk|INFO|vHost Device '/var/run/openvswitch/csdpdk-1' connection has been destroyed
2022-05-05T22:35:28.313Z|281707|netdev_dpdk|INFO|vHost Device '/var/run/openvswitch/csdpdk-1' connection has been destroyed
2022-05-05T22:35:28.313Z|281708|netdev_dpdk|INFO|vHost Device '/var/run/openvswitch/csdpdk-1' connection has been destroyed

Check the agent logs for issues from qemu. I had defined an invalid property which prevented the VM from starting: as the log below shows, the interface asked for 256 queues when only 128 are supported.

[root@cs10 agent]# grep qemu agent.log
org.libvirt.LibvirtException: internal error: process exited while connecting to monitor: 2022-05-05T22:35:52.060450Z qemu-kvm: -netdev vhost-user,chardev=charnet0,queues=256,id=hostnet0: you are asking more queues than supported: 128
2022-05-05T22:35:52.060633Z qemu-kvm: -netdev vhost-user,chardev=charnet0,queues=256,id=hostnet0: you are asking more queues than supported: 128
2022-05-05T22:35:52.060817Z qemu-kvm: -netdev vhost-user,chardev=charnet0,queues=256,id=hostnet0: you are asking more queues than supported: 128

Tools

CloudMonkey

Get started:

When you first run CloudMonkey, you will need to set the CloudStack instance URL and credentials and then run sync.

$ cmk
> set url
> set username admin
> set password password
> sync

The settings are then saved to ~/.cmk/config.

The sync command fetches all the available API calls that your account can use. Once that is done, you can then use tab completion while in the CloudMonkey CLI.

Cheat sheet

Change output format:

> set display table|json

Create compute offering:

> create serviceoffering name=rcs.c2 displaytext=Medium cpunumber=2 cpuspeed=750 memory=2048 storagetype=shared provisioningtype=thin offerha=false limitcpuuse=false isvolatile=false issystem=false deploymentplanner=UserDispersingPlanner cachemode=none customized=false

Add a new host:

> add host clusterid=XX podid=XX zoneid=XX hypervisor=KVM password=**** username=root url=http://bm01

Automate zone deployments

There is an example script on how to automate a basic zone deployment at: https://github.com/apache/cloudstack-cloudmonkey/wiki/Usage

Terraform

The Terraform CloudStack provider works for the most part. However, for CloudStack 4.16, you'll need to build it from source because the distributed binaries don't work properly (deployments hang indefinitely). I build the provider inside a Docker container:

# git clone https://github.com/apache/cloudstack-terraform-provider.git
# cd cloudstack-terraform-provider
# git clone https://github.com/tetra12/cloudstack-go.git
# cat <<EOF >> go.mod
replace github.com/apache/cloudstack-go/v2 => ./cloudstack-go
exclude github.com/apache/cloudstack-go/v2 v2.11.0
EOF
# docker run --rm -ti -v /home/me/cloudstack-terraform-provider/:/build golang bash
> cd /build
> go build

Copy the resulting binary to your Terraform plugins path. Because I had already run terraform init, the provider was placed in my Terraform directory under .terraform/providers/registry.terraform.io/cloudstack/cloudstack/0.4.0/linux_amd64/terraform-provider-cloudstack_v0.4.0. Edit the metadata file in the same directory as the provider executable and remove the file hash so that Terraform will run the rebuilt provider.

See also: Terraform#CloudStack

Packer

The Packer CloudStack provider also works for the most part, but is limited in that it cannot send keyboard input to the VM. Any OS deployment therefore needs either some manual input or ISO media that installs completely unattended. I also had to compile the provider manually since the default plugin that's fetched by packer doesn't quite work due to API changes.

See also: Packer#CloudStack

Troubleshooting

When you run into issues, check the logs in /var/log/cloudstack/. There's typically a stack trace logged whenever you encounter an error.
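
The logs are noisy, so it helps to pull out only the ERROR lines together with the stack trace that follows each one. A rough sketch, assuming every regular log line starts with a timestamp like 2021-09-28 and that trace lines do not; show_errors is a helper name I made up.

```shell
#!/usr/bin/env bash
# show_errors is a hypothetical helper: prints each ERROR line of a
# CloudStack log plus the stack-trace lines that directly follow it.
show_errors() {
    awk '
        # A log line at ERROR level starts a trace.
        / ERROR / { in_trace = 1; print; next }
        # Exception names and "at ..." frames have no leading timestamp.
        in_trace && !/^[0-9][0-9][0-9][0-9]-/ { print; next }
        # Any other timestamped line ends the trace.
        { in_trace = 0 }
    ' "$1"
}

# Assumed usage against the management server log:
# show_errors /var/log/cloudstack/management/management-server.log
```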

Can't create shared network in an advanced zone using Open vSwitch

Whenever I try creating a shared network in an advanced zone that is using OVS, the step fails with: "Unable to convert network offering with specified id to network profile".

The log shows that the OVS guest network guru refuses to design the network because the zone isn't capable of handling this network offering.

2021-09-28 16:36:26,416 DEBUG [c.c.a.ApiServer] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) CIDRs from which account 'Acct[76a1585d-1bf6-11ec-a3c5-8f3e88f01ab1-admin]' is allowed to perform API calls:,::/0
2021-09-28 16:36:26,439 DEBUG [c.c.u.AccountManagerImpl] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Access granted to Acct[76a1585d-1bf6-11ec-a3c5-8f3e88f01ab1-admin] to [Network Offering [7-Guest-DefaultSharedNetworkOffering] by AffinityGroupAccessChecker
2021-09-28 16:36:26,517 DEBUG [c.c.n.g.BigSwitchBcfGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network, the physical isolation type is not BCF_SEGMENT
2021-09-28 16:36:26,521 DEBUG [o.a.c.n.c.m.ContrailGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,524 DEBUG [c.c.n.g.NiciraNvpGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,527 DEBUG [o.a.c.n.o.OpendaylightGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,530 DEBUG [c.c.n.g.OvsGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,536 DEBUG [c.c.n.g.DirectNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) GRE: VLAN
2021-09-28 16:36:26,536 DEBUG [c.c.n.g.DirectNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) GRE: VXLAN
2021-09-28 16:36:26,536 INFO  [c.c.n.g.DirectNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,539 INFO  [c.c.n.g.DirectNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,543 DEBUG [o.a.c.n.g.SspGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) SSP not configured to be active
2021-09-28 16:36:26,546 DEBUG [c.c.n.g.BrocadeVcsGuestNetworkGuru] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Refusing to design this network
2021-09-28 16:36:26,549 DEBUG [o.a.c.e.o.NetworkOrchestrator] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Releasing lock for Acct[76a0531f-1bf6-11ec-a3c5-8f3e88f01ab1-system]
2021-09-28 16:36:26,624 DEBUG [c.c.u.d.T.Transaction] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) Rolling back the transaction: Time = 172 Name =  qtp1816147548-400; called by -TransactionLegacy.rollback:888-TransactionLegacy.removeUpTo:831-TransactionLegacy.close:655-Transaction.execute:38-Transaction.execute:47-NetworkOrches
2021-09-28 16:36:26,667 ERROR [c.c.a.ApiServer] (qtp1816147548-400:ctx-291672d1 ctx-3f19296a) (logid:83c45c2a) unhandled exception executing api command: [Ljava.lang.String;@69a8823d
com.cloud.utils.exception.CloudRuntimeException: Unable to convert network offering with specified id to network profile
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.setupNetwork(NetworkOrchestrator.java:739)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$10.doInTransaction(NetworkOrchestrator.java:2634)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$10.doInTransaction(NetworkOrchestrator.java:2572)
        at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)
        at com.cloud.utils.db.Transaction.execute(Transaction.java:40)
        at com.cloud.utils.db.Transaction.execute(Transaction.java:47)

Possible answer

The guest network was set up with GRE isolation. This, however, isn't supported with KVM as the hypervisor (see this presentation). After re-creating the zone with the guest physical network set up with just VLAN isolation, I was able to create a regular shared guest network that all tenants within the zone can see and use.

To make the shared network SNAT out, I created another shared network offering that also has SourceNat and StaticNat.

$ cmk list serviceofferings issystem=true name='System Offering For Software Router'
$ cmk create networkoffering \
name=SharedNetworkOfferingWithSourceNatService displaytext="Shared Network Offering with Source NAT Service" traffictype=GUEST guestiptype=shared conservemode=true specifyvlan=true specifyipranges=true \
serviceofferingid=307b14d8-afd1-43ea-948c-ffe882cd5926 \
supportedservices=Dhcp,Dns,Firewall,SourceNat,StaticNat,PortForwarding \
serviceProviderList[0].service=Dhcp serviceProviderList[0].provider=VirtualRouter \
serviceProviderList[1].service=Dns serviceProviderList[1].provider=VirtualRouter \
serviceProviderList[2].service=Firewall serviceProviderList[2].provider=VirtualRouter \
serviceProviderList[3].service=SourceNat serviceProviderList[3].provider=VirtualRouter \
serviceProviderList[4].service=StaticNat serviceProviderList[4].provider=VirtualRouter \
serviceProviderList[5].service=PortForwarding serviceProviderList[5].provider=VirtualRouter \
servicecapabilitylist[0].service=SourceNat servicecapabilitylist[0].capabilitytype=SupportedSourceNatTypes servicecapabilitylist[0].capabilityvalue=peraccount

Using this network offering, I was able to create a shared network in the advanced networking zone that has a NAT service which is visible to all accounts. The only issue with this approach is that there isn't a way to create a port forwarding rule for a specific VM because the account that owns this network is 'system'.

Open-ended questions

Compute offerings with 'unlimited' CPU cycles?

Compute offerings require a MHz value assigned. Why is this? Can we just assign a VM entire cores?

- According to the docs, CPU (in MHz) only has an effect if CPU cap is selected. In all other cases, the value here is something akin to 'cpu shares'.

- If you put in a huge number like 9999, though, the deployment fails.

How to implement showback?

Is there a way to implement showback based on resources consumed by account?

Monitoring resources?

Is there a way to monitor resource usage by account or by node? Any good way to push VMs into a CMDB like ServiceNow?

NetApp integration?

Is it possible to do guest VM snapshots by leveraging NetApp?

Backups?

The only backup plugins that are available are 'dummy', which does nothing, and 'veeam', which only supports VMware + Veeam. If you're using KVM, there doesn't seem to be any easy way to back up and restore VMs.