OpenNebula KVM SR-IOV Driver

Mellanox has recently released a driver which supports SR-IOV on their ConnectX-3 HCA family, making it possible to use Infiniband verbs from inside a VM. To support these Infiniband HCAs, and any other SR-IOV network device, in OpenNebula, I developed an SR-IOV VMM driver.

Prerequisites and Limitations

  1. This driver has been developed to support OpenNebula 4.0 and KVM.
  2. SR-IOV-capable hardware and software are required. Before using this driver, SR-IOV must be functional on the VM host.
  3. Libvirt must run as root. This is required for the hypervisor to attach the SR-IOV virtual function to the VM.
  4. Any OpenNebula functions which rely on the “virsh save” command (stop, suspend and migrate) are not supported for SR-IOV-enabled VMs. These functions will still work with VMs provisioned without SR-IOV devices. (More info)
  5. Hot attaching of SR-IOV network devices is not supported. You can still hot attach non-SR-IOV network devices. (More info)
  6. A virtual bridge is required. The VM will attach interfaces to this bridge, but the OS will not use them; they are only needed to pass IP address information into the VM. The bridge does not require any external connectivity, it just needs to exist (see the sketch after this list).
  7. VF usage tracking is implemented in the /tmp file system. After a fatal host error the VF usage tracking might become out of sync with actual VF usage and will have to be recovered manually. (More info)
  8. A modified context script is required to decode the SR-IOV interface information inside the VM. Examples are given for Infiniband.
  9. IPv6 has not been tested.
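
For item 6, a dummy bridge can be created on each VM host with bridge-utils; this is only a sketch and the bridge name "sriovbr0" is an arbitrary example:

  # create an empty bridge with no uplink; it only has to exist
  brctl addbr sriovbr0
  ip link set sriovbr0 up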

Testing Environment

  1. CentOS 6.4
  2. Mellanox OFED 2.0
  3. libvirt 0.10.2
  4. OpenNebula 4.0

Installation

1. Extract kvm-sriov.tar.gz to /var/lib/one/remotes/vmm/kvm-sriov
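
For example (assuming the tarball's contents sit at its top level; adjust the paths if your copy is laid out differently):

  mkdir -p /var/lib/one/remotes/vmm/kvm-sriov
  tar xzf kvm-sriov.tar.gz -C /var/lib/one/remotes/vmm/kvm-sriov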

2. Edit /etc/one/oned.conf and add:

  VM_MAD = [
      name       = "kvm_sriov",
      executable = "one_vmm_exec",
      arguments  = "-t 15 -r 0 kvm-sriov",
      default    = "vmm_exec/vmm_exec_kvm.conf",
      type       = "kvm" ]

3. Use this guide to extract the bus slot and function addresses for the virtual functions: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/sect-Virtualization_Host_Configuration_and_Guest_Installation_Guide-SR_IOV-How_SR_IOV_Libvirt_Works.html
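
For example, with libvirt the VF addresses can be read from the node device XML; the device name "pci_0000_07_00_1" below is only illustrative:

  # list PCI devices known to libvirt, then dump one VF to see its
  # <bus>, <slot> and <function> elements
  virsh nodedev-list | grep pci
  virsh nodedev-dumpxml pci_0000_07_00_1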

4. In the /var/lib/one/remotes/vmm/kvm-sriov/vf_maps directory create a file named after the VFs' root device, e.g. “ib0”. In the file write the bus, slot and function for each VF you want the driver to use. The addresses must be in hexadecimal. Each line in the file represents a VF; the bus, slot and function addresses are separated by a single space character “ ”. There must be no additional text, spaces or lines in the file.

Example, four virtual functions:

/var/lib/one/remotes/vmm/kvm-sriov/vf_maps/ib0:

  0x07 0x00 0x01
  0x07 0x00 0x02
  0x07 0x00 0x03
  0x07 0x00 0x04

5. Edit /var/lib/one/remotes/vmm/kvm-sriov/kvmrc and set “DUMMY_BRIDGE” and “DUMMY_MAC_PREFIX”. The “DUMMY_MAC_PREFIX” must be different from the MAC prefix that OpenNebula uses. The prefix will be used by the contextualisation scripts to differentiate SR-IOV and libvirt network interfaces.
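
A minimal sketch of these two settings, assuming the dummy bridge created earlier is called "sriovbr0" and "AA" is used as the dummy prefix (to match the context script below); check the comments in the shipped kvmrc for the exact value format it expects:

  # /var/lib/one/remotes/vmm/kvm-sriov/kvmrc (excerpt)
  DUMMY_BRIDGE=sriovbr0
  DUMMY_MAC_PREFIX=AA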

6. On VM hosts using this driver, edit /etc/libvirt/qemu.conf and change the user and group to “root”, then restart libvirtd.
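
For example, on the CentOS 6 hosts used for testing:

  # /etc/libvirt/qemu.conf (excerpt)
  user = "root"
  group = "root"

followed by a restart of the daemon, e.g. "service libvirtd restart".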

7. On the head node restart OpenNebula.

8. Create a virtual network in OpenNebula. The “Bridge” field must contain “sriov_” prepended to the name of the file that contains the VF mappings, e.g. “sriov_ib0”.
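
As a hypothetical illustration, a template for a RANGED network backed by the ib0 mapping file might start like this (the name is a placeholder, and the usual addressing attributes for your environment still need to be added):

  NAME   = "ib0-sriov"
  TYPE   = RANGED
  BRIDGE = "sriov_ib0"

It can then be created with "onevnet create <template file>".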

9. Update the contextualisation scripts in the VMs with the modified version described in the Context Script Modification section below.

10. Restart OpenNebula

11. Create hosts that use the “kvm_sriov” driver.
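
For example (the host name is a placeholder, and the information and network drivers shown are simply the standard KVM and dummy drivers; use whatever matches your setup):

  onehost create node01 --im kvm --vm kvm_sriov --net dummy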

Appendix

Libvirt “save” command

The “save” command saves the state of the VM's memory to the host's disk. This operation fails when a VM contains an SR-IOV device because the device is passed through to the VM rather than defined by the hypervisor. As a result the device's memory state cannot be read, causing the save operation to fail.

Hot attaching SR-IOV Network Interfaces

Libvirt does not provide functionality to hot plug SR-IOV devices, so this function cannot be implemented.

VF Usage Tracking

Libvirt does provide a built-in method for tracking VF usage, but it does not support the Mellanox OFED 2 drivers. As a result I had to create my own usage tracking mechanism. If you are using this driver with an SR-IOV card which supports VF assignment from a pool, you can use the instructions found below to update the scripts: http://wiki.libvirt.org/page/Networking#Assignment_with_.3Cinterface_type.3D.27hostdev.27.3E_.28SRIOV_devices_only.29

The driver will create a vf_interfaces directory in the host's /tmp directory. Inside vf_interfaces, a directory is created for each SR-IOV root device, and inside that tracking directory a file is created for each VF that is in use. For example, if the first, third and fourth VFs are in use you will see:

/tmp/vf_interfaces/ib0:

  0
  2
  3
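
If the tracking gets out of sync after a host failure, it has to be repaired by hand: the marker files are named after the zero-based VF index, so removing a marker frees the corresponding VF. A hypothetical example, assuming VF 2 on ib0 is no longer actually in use:

  rm /tmp/vf_interfaces/ib0/2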

Driver

Context Script Modification

Edit the 00-network script (in our case: /srv/one-context.d/00-network). Update the gen_network_configuration() and gen_iface_conf() functions.

gen_iface_conf() {
    cat <<EOT
DEVICE=$DEV
BOOTPROTO=none
ONBOOT=yes
####################
# Update this      #
####################
TYPE=$TYPE
#-------------------
NETMASK=$MASK
IPADDR=$IP
EOT

    if [ -n "$GATEWAY" ]; then
        echo "GATEWAY=$GATEWAY"
    fi

    echo ""
}

gen_network_configuration()
{
    IFACES=`get_interfaces`

    for i in $IFACES; do
        MAC=`get_mac $i`

        ###################################
        # Update this                     #
        ###################################
        # SR-IOV interfaces carry the dummy MAC prefix ("AA" here); the fifth
        # character of the MAC encodes the Infiniband device number.
        ib_vf=$(echo "$MAC" | cut -c 1-2)
        ib_pos=$(echo "$MAC" | cut -c 5)

        if [ "$ib_vf" == "AA" ]; then
            DEV="ib"$ib_pos
            UPCASE_DEV=`upcase $DEV`
            TYPE="Infiniband"
        else
            DEV=`get_dev $i`
            UPCASE_DEV=`upcase $DEV`
            TYPE="Ethernet"
        fi
        #-----------------------------------

        IP=$(get_ip)
        NETWORK=$(get_network)
        MASK=$(get_mask)
        GATEWAY=$(get_gateway)

        ####################################
        # Update this                      #
        ####################################
        # Write the ifcfg file for the Infiniband device rather than the
        # dummy libvirt interface.
        if [ "$ib_vf" == "AA" ]; then
            gen_iface_conf > /etc/sysconfig/network-scripts/ifcfg-ib$ib_pos
        else
            gen_iface_conf > /etc/sysconfig/network-scripts/ifcfg-${DEV}
        fi
        #-----------------------------------
    done
}

Take note of:

  1. “$ib_vf” == “AA”. This means that the script expects “AA” as the MAC prefix for SR-IOV devices; it must match the “DUMMY_MAC_PREFIX” set in kvmrc.
  2. gen_iface_conf > /etc/sysconfig/network-scripts/ifcfg-ib$ib_pos. The ifcfg file is generated for the Infiniband device.
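
For illustration only, with these modifications an SR-IOV interface on ib0 would end up with an ifcfg file along these lines (the addresses are placeholders taken from the virtual network definition):

  # /etc/sysconfig/network-scripts/ifcfg-ib0
  DEVICE=ib0
  BOOTPROTO=none
  ONBOOT=yes
  TYPE=Infiniband
  NETMASK=255.255.255.0
  IPADDR=10.0.0.10
  GATEWAY=10.0.0.1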