The OpenNebula Engine for Data Center Virtualization and Cloud Solutions

Virtualization has opened up avenues for new resource management techniques within the data center. Probably its most important characteristic is the ability to dynamically shape a given hardware infrastructure to support different services with varying workloads, effectively decoupling the management of the service (for example, a web server or a computing cluster) from the management of the infrastructure (e.g. the resources allocated to each service or the interconnection network).

A key component in this scenario is the virtual machine manager. A VM manager is responsible for the efficient management of the virtual infrastructure as a whole, by providing basic functionality for the deployment, control and monitoring of VMs on a distributed pool of resources. Usually, these VM managers also offer high availability capabilities and scheduling policies for VM placement and physical resource selection. Taking advantage of the underlying virtualization technologies and according to a set of predefined policies, the VM manager is able to adapt the physical infrastructure to the services it supports and their current load. This adaptation usually involves the deployment of new VMs or the migration of running VMs to optimize their placement.

The dsa-research group at the Universidad Complutense de Madrid has released the first stable version of the OpenNebula Virtual Infrastructure Engine under the terms of the Apache License, Version 2.0. OpenNebula enables the dynamic allocation of virtual machines on a pool of physical resources, so extending the benefits of existing virtualization platforms from a single physical resource to a pool of resources, decoupling the server not only from the physical infrastructure but also from the physical location. OpenNebula is a component being enhanced within the context of the RESERVOIR European Project.

The new VM manager differs from existing VM managers in its highly modular and open architecture, designed to meet the requirements of cluster administrators. OpenNebula 1.0 supports the Xen and KVM virtualization platforms and provides several features and capabilities for dynamic VM management, such as centralized management, efficient resource management, powerful API and CLI interfaces for monitoring and controlling VMs and physical resources, and a fault-tolerant design. Two of the outstanding new features are its support for advance reservation leases and on-demand access to remote cloud providers.

Support for Advance Reservation Leases

Haizea is an open source lease management architecture that OpenNebula can use as a scheduling backend. Haizea uses leases as a fundamental resource provisioning abstraction, and implements those leases as virtual machines, taking into account the overhead of using virtual machines (e.g., deploying a disk image for a VM) when scheduling leases. Using OpenNebula with Haizea allows resource providers to lease their resources, using potentially complex lease terms, instead of only allowing users to request VMs that must start immediately.

Support for On-Demand Access to Amazon EC2 Resources

Recently, virtualization has also brought about a new utility computing model, called cloud computing, for the on-demand provision of virtualized resources as a service. The Amazon Elastic Compute Cloud is probably the best example of this new paradigm for providing elastic capacity. Thanks to virtualization, clouds can be used efficiently to supplement local capacity with outsourced resources. The joint use of these two technologies, VM managers and clouds, will arguably change the structure and economics of current data centers. OpenNebula provides support to access Amazon EC2 resources to supplement local resources with cloud resources to satisfy peak or fluctuating demands.

Scale-out of Computing Clusters with OpenNebula and Amazon EC2

As a use case to illustrate the new capabilities provided by OpenNebula, the release includes documentation about the application of this new paradigm (i.e. the combination of VM managers and cloud computing) to a computing cluster, a typical data center service. The use of a new virtualization layer between the computing cluster and the physical infrastructure extends the classical benefits of VMs to the computing cluster, so providing cluster consolidation, cluster partitioning and support for heterogeneous workloads. Moreover, the integration of the cloud in this layer allows the cluster to grow on demand with additional computational resources to satisfy peak demands.

Ignacio Martín Llorente

Release of OpenNebula 1.0 for Data Center Virtualization & Cloud Solutions

The dsa-research group is pleased to announce that a stable release (v1.0) of the OpenNebula (ONE) Virtual Infrastructure Engine is available for download under the terms of the Apache License, Version 2.0. ONE enables the dynamic allocation of virtual machines on a pool of physical resources, so extending the benefits of existing virtualization platforms from a single physical resource to a pool of resources, decoupling the server not only from the physical infrastructure but also from the physical location.

Main Features

The OpenNebula Virtual Infrastructure Engine differs from existing VM managers in its highly modular and open architecture, designed to meet the requirements of cluster administrators. The latest version supports the Xen and KVM virtualization platforms and provides the following features and capabilities:

  • Centralized management, a single access point to manage a pool of VMs and physical resources.
  • Efficient resource management, including support to build any capacity provision policy and for advance reservation of capacity through the Haizea lease manager
  • Powerful API and CLI interfaces for monitoring and controlling VMs and physical resources
  • Easy 3rd party software integration to provide a complete solution for the deployment of flexible and efficient virtual infrastructures
  • Fault tolerant design, state is kept in a SQLite database.
  • Open and flexible architecture to add new infrastructure metrics and parameters or even to support new Hypervisors.
  • Support to access Amazon EC2 resources to supplement local resources with cloud resources to satisfy peak or fluctuating demands.
  • Ease of installation and administration on UNIX clusters
  • Open source software released under Apache license v2.0
  • As an engine for the dynamic management of VMs, OpenNebula is being enhanced in the context of the RESERVOIR project (EU grant agreement 215605) to address the requirements of several business use cases.

More details at http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0

Relevant Links

  • Benefits and Features: http://www.opennebula.org/doku.php?id=about
  • Documentation: http://www.opennebula.org/doku.php?id=documentation
  • Release Notes: http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0
  • Download: http://www.opennebula.org/doku.php?id=software
  • Ecosystem: http://www.opennebula.org/doku.php?id=ecosystem

First Technology Preview of the Haizea Lease Manager

I would like to warmly welcome Haizea to the virtualization ecosystem. This new component is an open-source VM-based lease management architecture, which can be used:

  • As a platform for experimenting with scheduling algorithms that depend on VM deployment or on the leasing abstraction.
  • In combination with the OpenNebula virtual infrastructure manager, to manage a Xen or KVM cluster, allowing you to deploy different types of leases that are instantiated as virtual machines (VMs).

Its full integration with OpenNebula will be part of the next Technology Preview (TP1.1), due mid-July. Haizea is being developed by Borja Sotomayor, a PhD student at the University of Chicago, who is now visiting our research group, partially funded by the European Union’s FP7 RESERVOIR project (“Resources and Services Virtualization without Barriers”).

Ignacio Martín Llorente

Newly released Globus Toolkit 4.2 includes GridWay 5.4!

Recently, Charles Bacon announced, on behalf of the Globus Toolkit development team, the release of Globus Toolkit 4.2, containing an upgrade to the web services specifications used by the toolkit as well as new features in all services.

Starting from Globus Toolkit 4.0.5, GridWay 5.2 was included as a contribution for the GT4.0 final distribution, but contributions are not supported by Globus Toolkit and have very limited documentation. The new Globus Toolkit 4.2 includes a new stable release, GridWay 5.4, as a true Globus component, well documented and fully integrated in Globus installation, building and testing procedures.

This new stable release of GridWay is the result of the previous development release, GridWay 5.3, which was released in December 2007 and has been thoroughly tested and documented since then. In a few days, GridWay 5.4 will also be available from GridWay’s web page.

HPC, Grid and Cloud Computing in Cetraro

I am attending the INTERNATIONAL ADVANCED RESEARCH WORKSHOP ON HIGH PERFORMANCE COMPUTING AND GRIDS in Cetraro (Italy). This is the 9th edition of the workshop organized by Prof. Lucio Grandinetti. I have to say the venue of the workshop, the Grand Hotel San Michele, is just perfect. The panel of speakers includes representatives of the most relevant Grid and HPC research initiatives and technologies around the world. The abstracts of the presentations are available online at the workshop site.

Cloud Computing for on-Demand Resource Provisioning

This is the title of the talk that I gave at the workshop. The aim of the presentation was to show the benefits of separating resource provisioning from job execution management in different deployment scenarios. Within an organization, the incorporation of a new virtualization layer under existing Cluster and HPC middleware stacks decouples the execution of the computing services from the physical infrastructure. The dynamic execution of worker nodes on virtual resources supported by virtual machine managers, such as the OpenNEbula Virtual Infrastructure Engine, provides multiple benefits, such as cluster consolidation, cluster partitioning and heterogeneous workload execution. When the computing platform is part of a Grid Infrastructure, this approach additionally provides generic execution support, allowing Grid sites to dynamically adapt to changing VO demands, so overcoming many of the obstacles to Grid adoption.

The previous scenario can be modified so the computing services are executed on a remote virtual infrastructure. This is the resource provision paradigm implemented by some commercial and scientific infrastructure Cloud Computing solutions, such as Globus VWS or Amazon EC2, which provide remote interfaces for control and monitoring of virtual resources. In this way a computing platform could scale out using resources provided on demand by a provider, so supplementing local physical computing services to satisfy peak or unusual demands. Cloud interfaces can also provide support for the federation of virtualization infrastructures, so allowing virtual machine managers to access resources from remote resource providers or Cloud systems in order to meet fluctuating demands. The OpenNEbula Virtual Infrastructure Engine is being enhanced to access on-demand resources from EC2 and Globus-based clouds. This scenario is being studied in the context of the EU-funded RESERVOIR (Resources and Services Virtualization without Barriers) initiative.

Download the slides

Towards a New Model for the Infrastructure Grid

This is the title of my contribution to the panel “From Grids to Cloud Services”, chaired by Charlie Catlett, at the workshop. The aim of the presentation was to open the discussion on the future of compute grid infrastructures: from infrastructures for the sharing of basic resource services to infrastructures for the sharing of hardware resources. A widely distributed virtual infrastructure, inspired by the federation of cloud systems as providers of virtualized resources (hardware) as a service, would not require end users to learn new interfaces and port their applications to the expected runtime environment. The sharing of resources would be performed at the resource level, so local job managers could scale out to partner or commercial clouds, transparently to end users. This new model provides additional benefits, such as support for any service and seamless integration with any service middleware stack, at the cost of the virtualization overhead in the execution of the jobs.

It was very interesting to share this position on cloud computing with other researchers from the Grid and HPC fields. So the question is: are the existing compute Grid Infrastructures going to evolve into Grids of Clouds? In other words, which model is better for end users and site administrators: sharing basic infrastructure services or sharing the physical infrastructure?

Download the slides

Ignacio Martín Llorente

New Technology Preview (TP2) of the OpenNEbula (ONE) Virtual Infrastructure Engine

I am pleased to announce that a new Technology Preview (TP2) of the OpenNEbula (ONE) Virtual Infrastructure Engine is available for download under the terms of the Apache License, Version 2.0. ONE enables the dynamic deployment and re-allocation of virtual machines on a pool of physical resources, so extending the benefits of existing virtualization platforms from a single physical resource to a pool of resources, decoupling the server not only from the physical infrastructure but also from the physical location.

ONE Technology Preview 2 extends the functionality for the management of both physical resources and virtual machines, and provides resources for developers by making the Trac web interface public.

EGEE III Training the Trainers at CERN

Greetings from CERN, where the World Wide Web and the EGEE Project were born.

Here I am, attending a very special training event: EGEE III Training the Trainers. I’m teaching representatives of several national grid initiatives how to use GridWay, so that they will be able to teach it in further tutorials. The trainers (trainees in this case) were able to understand the basics of GridWay, not only from a theoretical point of view but also through several examples involving the command line interface and programming with the DRMAA C API, an Open Grid Forum standard widely used for application porting.

The feedback was great and many questions were asked, showing the increasing interest in GridWay and Grid Technology in general.

Entrance of the CERN Training Centre

José Luis Vázquez-Poletti

EGEE III NA4 All Hands Meeting

During the last two days, Eduardo Huedo and I attended the EGEE III NA4 All Hands Meeting at Orsay, France.

This was a kind of Kick-off Meeting, as the different workplans for the NA4 Activity (Application Porting) within the EGEE III Project were defined for the next two years. In particular, our participation is divided into two main tasks:

  • Application Porting Support: To help newcomers port their applications onto the Grid. The extensive experience we have gained with the GridWay Metascheduler will help us in this commitment.
  • Regional Coordination/Support: To coordinate our Federation’s research groups/infrastructures on their path to application porting by providing them with guidelines and advice, and by acting as an interface between them. After all these years of conferences and meetings, I’m sure we know almost everybody and consequently we’ll be able to provide them with the best support.

So, EGEE III is here and there’s much to do!

Minutes before the beginning of the meeting

J.L. Vázquez-Poletti

BE14: Showing industry what Grid can do

OGF-23 took place in Barcelona last week, hosted by the Barcelona Supercomputing Center. Simultaneously and at the same location, the BEinGRID Industry Days were held. The main purpose of this event was to review the first wave of business experiments. dsa-research is involved in BE14, one of the experiments of the first wave. We had a demonstration presented by people from the BSC and from the University of Surrey, both partners in this experiment. The demonstration showed a portal that hides the complexity of GridAD and offers access to it. GridAD was born out of the union of GridWay and Grid Superscalar using DRMAA.

Even more interesting was the final review of the experiment, where David Linke from Linkat (our business partner) explained the results of the experiment from the point of view of the end user and presented the exploitation plan that has been designed. The results were even better than I was expecting. Put simply, the objective was to open up new process development in the chemical industry by cutting down the simulation time needed by the software created by the University of Surrey. This software simulates a chemical process in order to find the optimum setup. Because the process relies on independent simulations, it is embarrassingly parallel. The numbers in David’s presentation showed a mind-boggling cut from 55 hours to only 7 hours for one complete optimization. And, to top it all, this new solution was able to discover a new process that produces acetic acid more efficiently than the one currently used in industry! This is a really good indicator of the experiment’s success, since no major discoveries had been made in the production of this acid in the last 10 years.

So the experiment is over, and we at dsa-research are quite pleased with the excellent third position it achieved (out of 18).

Building a Virtual Cluster with KVM

You have probably heard of all the wonders that virtualization technology promises the IT world, but VMs are also a valuable tool for academic purposes. In particular, VMs can be very effective for providing an experimental playground in parallel and distributed computing courses. I’ve always found it difficult to get my students accounts on clusters or grid environments, and therefore to include hands-on sessions (which are critical to really understand some of the concepts). Thanks to VMs there is good news: each student gets their own cluster (or grid, for that matter)!

In this post I’ll describe a basic setup to build a virtual cluster to play with some distributed computing technologies (e.g. MPI, SGE or Globus Toolkit). The virtualization technology considered here is KVM for several reasons: (i) it is included in the kernel and there is no need to modify the lab configuration; (ii) it is available for the main Linux distributions; (iii) it is easy to use and configure (compared to Xen or VMware); (iv) it’s open source.

The following picture shows a schema of the virtual cluster. The physical host will act as the cluster front-end and will run the cluster services, such as NFS, the SGE master daemons and an apt proxy to install software on the workernodes. Then we have a couple of workernodes in a private network. Note that there is no need to connect them to the Internet using NAT (this is also a more realistic setup).

Virtual Cluster

Preparing the system…

First you’ll need to install the hypervisor and some additional packages (the following has been tested on an Ubuntu 8.04 box):

$ sudo apt-get install kvm qemu apt-cacher

This will install KVM, the qemu user-land tools, a simple apt proxy (see /etc/apt-cacher/apt-cacher.conf) and some other utilities like brctl. In Ubuntu 8.04 the KVM kernel modules are loaded at boot time (/etc/init.d/kvm). Also, as you’ll be using the TAP interface for networking, you will probably want to load the tun module at boot time (add tun to the following file).

$ sudo vi /etc/modules

Just a side note: if you are trying this on a laptop, unload the kvm modules before hibernating it (add kvm and kvm_intel or kvm_amd to /etc/hibernate/blacklisted-modules).
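
Which of the two vendor modules applies depends on the CPU of the host. The following is a minimal sketch, assuming a Linux host where /proc/cpuinfo lists the vmx flag (Intel) or the svm flag (AMD). The name cpu_virt_module is a hypothetical helper, not a tool shipped with KVM or qemu:

```shell
#!/bin/sh
# Print the KVM module that matches the CPU flags in a cpuinfo-style file.
# cpu_virt_module is a hypothetical helper name used only for illustration.
cpu_virt_module()
{
  cpuinfo="${1:-/proc/cpuinfo}"

  # Only look at the "flags" lines to avoid matching other fields
  if grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'vmx'; then
    echo "kvm_intel"      # Intel hardware virtualization
  elif grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'svm'; then
    echo "kvm_amd"        # AMD hardware virtualization
  else
    echo "none"           # no hardware virtualization flag found
  fi
}

cpu_virt_module
```

If it prints none, hardware virtualization is either missing or disabled in the BIOS, and the kvm modules will not be usable.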

Setting up the Network

As shown in the figure, the frontend and workernodes are in a private LAN (10.0.0.0/24). To set up the network you have to:

  1. Create a bridge for the physical machine (veth0) to connect all your workernodes
  2. Set up the frontend ip on the bridge and adjust your routing table
  3. Add the interfaces attached to each of the virtual workernodes (tap0, tap1) to the bridge

The following script performs the previous steps. You can either specify this script as a network option for kvm or copy it to the default system location (/etc/kvm/kvm-ifup):

#!/bin/sh
#------ Virtual Network Configuration adjust as needed ------
BRNAME="veth0"
BRADDR="10.0.0.1"
BRNET="10.0.0.0"
BRMASK="255.255.255.0"
#----------------------------------------------------------
# Create and configure the bridge if it does not exist yet
setup_bridge()
{
  brctl show | grep "$BRNAME" > /dev/null

  if [ $? -ne 0 ] ; then
    echo "Creating Bridge $BRNAME"

    brctl addbr "$BRNAME"
    brctl stp "$BRNAME" off
    ip addr add "$BRADDR" dev "$BRNAME"
    ip link set dev "$BRNAME" up

    route add -net "$BRNET" netmask "$BRMASK" dev "$BRNAME"
    echo "1" > /proc/sys/net/ipv4/ip_forward
  fi
}

# Attach the tap interface of the VM ($1) to the bridge
setup_vif()
{
  ip link set "$1" down
  ip link set promisc on dev "$1"
  ip addr add 0.0.0.0 dev "$1"

  brctl addif "$BRNAME" "$1"

  ip link set dev "$1" up
}

echo "Configuring interface $1"
setup_bridge
setup_vif "$1"

Similarly, to shut down the interfaces we proceed in the opposite way. The following script removes the workernode’s interface from the bridge and removes the bridge when there are no virtual machines left. Place this script in /etc/qemu-ifdown:

#!/bin/sh
#------ Virtual Network Configuration ------
BRNAME="veth0"
BRADDR="10.0.0.1"
BRNET="10.0.0.0"
BRMASK="255.255.255.0"
#-------------------------------------------
# Remove the bridge once no interface is attached to it
shutdown_bridge()
{
  VIFS=`brctl showmacs "$BRNAME" | tail -n +2 | wc -l`

  if [ "$VIFS" -eq 0 ] ; then
    route del -net "$BRNET" netmask "$BRMASK" dev "$BRNAME"
    ip link set "$BRNAME" down
    brctl delbr "$BRNAME"
  fi
}

# Detach the tap interface of the VM ($1) from the bridge
shutdown_vif()
{
  ip link set "$1" down
  brctl delif "$BRNAME" "$1"
}

echo "Taking interface $1 down"

shutdown_vif "$1"
shutdown_bridge

Finally, if you are using a firewall, make sure that packets traversing the bridge are forwarded, for example with:

iptables -A FORWARD  -i veth0 -o veth0 -j ACCEPT

Note that depending on the services you plan to install you may need to adjust the firewall rules for the workernodes.

Installing the Worker nodes

To install the workernodes for our cluster I’ll use a single 5GB disk image in raw format (you can mount it, and on Linux with ext3 only the written sectors take up space).

qemu-img create -f raw vnode1.dsk 5G

Now you can continue as in a regular install. So grab your favorite Linux distro (I’ll use Debian) and install it on the disk you have just created.

kvm -boot d \
      -cdrom debian-40r1-i386-netinst.iso\
      -hda vnode1.dsk\
      -m 128 \
      -net nic,macaddr=52:54:00:12:34:56\
      -net tap,ifname=tap0

When configuring the workernode do not forget:

  1. The network configuration for the workernode (ip:10.0.0.2, netmask:255.255.255.0, network:10.0.0.0)
  2. Static name resolution for the cluster. The /etc/hosts of all three machines should include:

    10.0.0.1        frontend
    10.0.0.2        vnode1
    10.0.0.3        vnode2

  3. Use the frontend to access package repositories (/etc/apt/sources.list):

    deb http://frontend:3142/ftp.rediris.es/debian/ etch main contrib non-free
    deb http://frontend:3142/security.debian.org/ etch/updates main contrib non-free

To install another worker node, copy the image to vnode2.dsk, boot it (or mount it) and adjust its configuration parameters (hostname and network configuration).
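
When the cluster grows beyond two workernodes, the /etc/hosts entries can be generated instead of typed by hand. This is a minimal sketch, assuming the addressing used in this post (frontend at 10.0.0.1, workernode i at 10.0.0.(i+1)) and the vnodeN naming. The name cluster_hosts is a hypothetical helper, and a /24 network limits it to 253 workernodes:

```shell
#!/bin/sh
# Print the /etc/hosts lines for the frontend and N workernodes.
# cluster_hosts is a hypothetical helper name used only for illustration.
cluster_hosts()
{
  nodes="$1"

  printf '10.0.0.1\tfrontend\n'
  i=1
  while [ "$i" -le "$nodes" ]; do
    # Workernode i gets the address 10.0.0.(i+1)
    printf '10.0.0.%d\tvnode%d\n' "$((i + 1))" "$i"
    i=$((i + 1))
  done
}

# Entries for the two-node cluster described in this post
cluster_hosts 2
```

The output can be appended to /etc/hosts on the frontend and copied into each workernode image.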

Booting the cluster

Once the workernodes are installed, just boot them to use your cluster. Some hints:

  • Do not use the same MAC for the workernodes. Obvious… but it took me a while to find out why my network was not working ;)
  • Use a VNC connection if you need to access the workernode’s X system or to debug the boot process. If you are going to access the VMs through ssh, replace the -vnc option with -nographic:

kvm -boot c -hda vnode1.dsk \
  -m 128\
  -vnc 127.0.0.1:0\
  -net nic,macaddr=52:54:00:12:34:56\
  -net tap,ifname=tap0&

kvm -boot c -hda vnode2.dsk \
  -m 128\
  -vnc 127.0.0.1:1\
  -net nic,macaddr=52:54:00:12:34:57\
  -net tap,ifname=tap1&
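
The two commands differ only in the disk image, the VNC display, the last byte of the MAC address and the tap interface, so they can be derived from the node number. The following is a minimal sketch under that assumption. The name node_cmd is a hypothetical helper, and it only prints the command line so you can check it before running it:

```shell
#!/bin/sh
# Print the kvm command line for workernode N (1, 2, ...).
# node_cmd is a hypothetical helper name used only for illustration.
node_cmd()
{
  n="$1"
  idx=$((n - 1))
  # The last MAC byte starts at 0x56 (86 decimal) and grows with the node
  # number, so every workernode gets a different MAC address
  mac=$(printf '52:54:00:12:34:%02x' "$((86 + idx))")

  echo "kvm -boot c -hda vnode${n}.dsk -m 128 -vnc 127.0.0.1:${idx}" \
       "-net nic,macaddr=${mac} -net tap,ifname=tap${idx}"
}

node_cmd 1
node_cmd 2
```

To actually boot a node, the printed line can be run from the shell with a trailing & as in the commands above.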

This will provide you with a basic environment to test any distributed/cluster technology. Have fun!

Ruben S. Montero