playground:computer_engineering

Further background to the problem

The process of adding support for the Convey FPGA nodes is complicated by the nature of the system. The cluster management software (OpenNebula) assumes that a hypervisor is running on each machine it monitors, and it communicates with this hypervisor to send and receive information about the node.

The role of the hypervisor is to supervise the Virtual Machines (VMs) that run on its node and to act as a translator between OpenNebula and those VMs. In the case of the Convey FPGAs, however, no hypervisor is present, because no VMs will be used: they introduce inefficiencies and reduce the throughput of the powerful FPGAs. Part of the task of adding the Convey FPGAs to the OpenNebula management software is therefore to fool OpenNebula into believing that a hypervisor exists on the machines when in fact it does not.

This is not a trivial task: the hypervisor itself is a compiled binary, so its code is inaccessible, and OpenNebula's code must remain completely standard and unmodified, since it is subject to upstream updates and alterations that would conflict with any local changes.

The approach taken is to run standard operations (adding, removing and monitoring Virtual Machines) on a standard node that has a hypervisor and is monitored by OpenNebula, and to eavesdrop on the communication between the hypervisor and OpenNebula. These messages can then be reverse engineered and reproduced on the Convey machines so that they mimic the hypervisor, as sketched below.
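To illustrate the kind of traffic involved: OpenNebula's information driver runs small probe scripts on each node and reads back plain key=value lines describing the host. The following is a minimal sketch, in Python, of a fake probe a Convey node could run to mimic a hypervisor's monitoring report. The attribute names follow OpenNebula's standard probes, but the exact set and units expected by a particular OpenNebula version should be taken from the captured messages rather than from this example.

    #!/usr/bin/env python
    # Minimal sketch of a fake monitoring probe for a Convey node.
    # The key=value format and attribute names mimic OpenNebula's standard
    # IM probes; verify the expected attributes against captured probe output.
    import os

    def meminfo_kb(field):
        """Read a field such as 'MemTotal' or 'MemFree' from /proc/meminfo (kB)."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        return 0

    def cpu_count():
        """Count logical CPUs listed in /proc/cpuinfo."""
        with open("/proc/cpuinfo") as f:
            return sum(1 for line in f if line.startswith("processor"))

    ncpu = cpu_count()
    total_kb = meminfo_kb("MemTotal")
    free_kb = meminfo_kb("MemFree")

    report = {
        "HYPERVISOR":  "dummy",        # pretend a hypervisor is present
        "HOSTNAME":    os.uname()[1],
        "TOTALCPU":    ncpu * 100,     # OpenNebula counts CPU in percent, 100 per core
        "FREECPU":     ncpu * 100,
        "USEDCPU":     0,
        "TOTALMEMORY": total_kb,
        "FREEMEMORY":  free_kb,
        "USEDMEMORY":  total_kb - free_kb,
    }

    # The IM driver reads these lines from the probe's standard output.
    for key, value in report.items():
        print("%s=%s" % (key, value))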

OpenNebula GUI (screenshot):

21/12/2012

After two weeks, the Convey machines can now be successfully started and stopped, and they report accurate monitoring information to OpenNebula. So far the focus has been on getting the Conveys to communicate with OpenNebula; work must now be done to allow OpenNebula to remotely deploy images onto the machines so that they can boot into different Operating Systems.

30/01/2013

A new method for modifying OpenNebula was discovered that allows for improved integration and compatibility: instead of hijacking the hypervisor interface of the nodes, a custom Information Monitor (IM) driver and Virtual Machine Monitor (VMM) driver were created, which allow the machines to be monitored without interfering with the management infrastructure already in place for Virtual Machines.
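For context, an OpenNebula VMM driver is essentially a set of action scripts (deploy, shutdown, poll and so on) that the front-end invokes for a host. The sketch below stubs these actions out for a bare-metal node so that the physical machine can be handled like a VM; the argument layout, the helper script path and the state strings are assumptions for illustration, not the actual driver written for the Conveys.

    #!/usr/bin/env python
    # Sketch of a bare-metal VMM "driver": one handler per OpenNebula action,
    # dispatched on the action name. Argument order, the install helper path
    # and the poll state codes are illustrative assumptions.
    import sys
    import subprocess

    def deploy(deployment_file, host):
        # Instead of booting a VM, trigger the OS-image installation scripts
        # on the bare-metal node (hypothetical helper script path).
        subprocess.check_call(["ssh", host, "/opt/bmm/install_image.sh", deployment_file])
        print(host)  # report a "deployment ID" back to the front-end

    def shutdown(deploy_id, host):
        subprocess.check_call(["ssh", host, "shutdown", "-h", "now"])

    def poll(deploy_id, host):
        # A reachable node is reported as active, an unreachable one as unknown
        # ('a' and '-' follow OpenNebula's usual state codes).
        alive = subprocess.call(["ping", "-c", "1", "-W", "2", host]) == 0
        print("STATE=a" if alive else "STATE=-")

    if __name__ == "__main__":
        action = sys.argv[1]
        handlers = {"deploy": deploy, "shutdown": shutdown, "poll": poll}
        handlers[action](*sys.argv[2:4])

In this arrangement the IM driver supplies the monitoring data (as in the earlier probe sketch) while the VMM driver handles the lifecycle actions, which is what keeps the standard Virtual Machine infrastructure untouched.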

The AMDDev node was configured as a Bare Metal Machine (BMM) and then benchmarked to ensure that the BMM management driver was working correctly; an HPL throughput of 295.6 GFLOPS was achieved in the process.

Attention was then placed on getting a single machine to swap Operating Systems. This is done by several scripts: initially the Operating System image is mounted and analyzed, and an LVM (Logical Volume Manager) partition is created based on its size. The contents of the image are then copied to the partition, and the GRUB2 boot loader is modified to automatically select the new Operating System as the default boot entry. After the system is rebooted, it is effectively running on a separate, independent Operating System. Currently various issues are being addressed: firstly, maximizing the configuration compatibility of the scripts, and secondly, resolving portability issues with the Operating System images (i.e. swap space and drivers).
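As a rough outline of the sequence those scripts perform, the sketch below creates a logical volume sized from the image, copies the image contents into it, and points GRUB2 at the new system before rebooting. All paths, the volume group name and the menu entry title are placeholders; the exact GRUB commands vary by distribution (grub- versus grub2- prefixes), and the steps of generating a GRUB menu entry for the copied root filesystem and fixing its fstab are omitted here.

    #!/usr/bin/env python
    # Sketch of the OS-swap sequence: mount image, copy to a new LVM volume,
    # make the new OS the default GRUB2 entry, reboot. Paths and names are
    # placeholders, not those used in the actual deployment scripts.
    import os
    import subprocess

    IMAGE   = "/images/new_os.img"   # OS image to deploy (placeholder path)
    VG      = "vg_system"            # existing LVM volume group (assumed)
    LV      = "os_new"
    SRC_MNT = "/mnt/image"
    DST_MNT = "/mnt/new_os"

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # 1. Size a new logical volume from the image size (rounded up to whole MiB).
    size_mib = os.path.getsize(IMAGE) // (1024 * 1024) + 1
    run(["lvcreate", "-L", "%dM" % size_mib, "-n", LV, VG])
    run(["mkfs.ext4", "/dev/%s/%s" % (VG, LV)])

    # 2. Loop-mount the image and the new volume, then copy the contents across.
    for d in (SRC_MNT, DST_MNT):
        if not os.path.isdir(d):
            os.makedirs(d)
    run(["mount", "-o", "loop,ro", IMAGE, SRC_MNT])
    run(["mount", "/dev/%s/%s" % (VG, LV), DST_MNT])
    run(["rsync", "-aAX", SRC_MNT + "/", DST_MNT + "/"])

    # 3. Make the new OS the default boot entry and restart.
    #    grub-set-default requires GRUB_DEFAULT=saved in /etc/default/grub,
    #    and an entry for the copied system must already exist in the menu.
    run(["grub-mkconfig", "-o", "/boot/grub/grub.cfg"])
    run(["grub-set-default", "New Operating System"])  # entry title is a placeholder
    run(["reboot"])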
