Category Archives: Uncategorized

SaaS enabled admission control for MCMC simulation in cloud computing infrastructures

The Computer Physics Communications journal has just made our latest work on SaaS+PaaS architectures for service-driven computing available online. This is again the result of our collaboration with the Institute of Computing Technology of the Chinese Academy of Sciences, and it can be accessed here.


Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source of these resources in the form of HPC. However, resource over-consumption can be an important drawback, especially if the cloud provisioning process is not appropriately optimized. In the present contribution we propose a two-level solution that, on the one hand, takes advantage of approximate computing to reduce the resource demand and, on the other, uses admission control policies to guarantee an optimal provision to running applications.
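
The paper itself is about provisioning, but for readers unfamiliar with MCMC, the following minimal Python sketch shows a generic random-walk Metropolis sampler, the workhorse behind this family of simulations. It is not taken from the paper; the Gaussian target, step size and chain length are illustrative assumptions.

```python
import numpy as np

def metropolis(log_p, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log-density log_p."""
    rng = np.random.default_rng(seed)
    x, log_px = x0, log_p(x0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_new = x + rng.normal(scale=step)        # symmetric Gaussian proposal
        log_px_new = log_p(x_new)
        # Accept with probability min(1, p(x_new) / p(x))
        if np.log(rng.random()) < log_px_new - log_px:
            x, log_px = x_new, log_px_new
        samples[i] = x
    return samples

# Illustrative target: a standard normal distribution
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)
print(chain.mean(), chain.std())   # should approach 0 and 1
```

Each step of the chain costs one evaluation of the target density, which is why long production runs of this kind consume so many compute hours and make the provisioning problem relevant.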

J.L. Vázquez-Poletti

RNA-seq Analysis in Forest Tree Species: Bioinformatic Problems and Solutions

The first results of an ongoing collaboration with the Forest Genetics and Ecophysiology Research Group of the Technical University of Madrid have just been published online by the Tree Genetics & Genomes journal. The article can be accessed here.


Direct sequencing of RNA (RNA-seq) using next-generation sequencing platforms has allowed a growing number of gene expression studies focused on forest trees in the last 5 years. Bioinformatic analyses derived from RNA-seq of forest trees are particularly challenging, because the massive genome length (~20.1 Gbp for loblolly pine) and the absence of annotated reference genomes require specific bioinformatic pipelines to obtain sound biological results. In the present manuscript, we review common bioinformatic challenges that researchers need to consider when analyzing RNA-seq data from forest tree species, in light of the experience acquired from recent studies. Furthermore, we list bioinformatic pipelines and data processing software available to overcome RNA-seq limitations. Finally, we discuss the impact of novel computational solutions, such as the cloud computing paradigm, which allows RNA-seq analysis even for small research centers with limited resources.

J.L. Vázquez-Poletti

Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services

IEEE Xplore has published the result of one of our latest collaborations with the Institute of Computing Technology of the Chinese Academy of Sciences. This particular work was presented at the 44th International Conference on Parallel Processing (ICPP 2015), which took place in Beijing (China) in September. The paper can be accessed here.


Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g. the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, sharing and contending for resources such as caches and I/O bandwidth with them.
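
To see why the tail rather than the mean matters, here is a small Python illustration (not from the paper; the lognormal latency distribution and fan-out of 100 are assumptions): when a request must wait for all of its components, even the median request experiences a latency close to the components' 99th percentile.

```python
import numpy as np

rng = np.random.default_rng(0)
n_requests, fan_out = 10_000, 100

# Hypothetical per-component latencies (ms): each request fans out to 100
# components and completes only when the slowest one responds.
component = rng.lognormal(mean=3.0, sigma=0.8, size=(n_requests, fan_out))
overall = component.max(axis=1)

print(f"component p50: {np.percentile(component, 50):6.1f} ms")
print(f"component p99: {np.percentile(component, 99):6.1f} ms")
print(f"overall   p50: {np.percentile(overall, 50):6.1f} ms")  # already near the component p99
```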

The highly dynamic nature of batch jobs in terms of their workload types and input sizes causes continuously changing performance interference to individual components, leading to latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which adversely deteriorates service performance when the load gets heavier.

In this paper, we propose PCS, a predictive and component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict the component latency and the overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05% and the average overall service latency by 64.16% compared with state-of-the-art techniques for reducing tail latency.
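
PCS itself performs near-optimal allocation driven by an analytical model; as a rough stand-in, the sketch below uses a plain greedy rule over hypothetical predicted latencies, just to show the shape of a component-to-node placement step. All component names, node names and numbers are invented.

```python
# Hypothetical predicted latencies (ms) per (component, node) pair, standing in
# for the analytical performance model described in the paper.
predicted = {
    ("parser", "node-a"): 12, ("parser", "node-b"): 30,
    ("ranker", "node-a"): 45, ("ranker", "node-b"): 18,
    ("merger", "node-a"): 20, ("merger", "node-b"): 22,
}
components = ["parser", "ranker", "merger"]
nodes = ["node-a", "node-b"]

def place(components, nodes, predicted):
    """Greedy placement: each component goes to the node where its predicted
    latency is lowest, moving straggling components away from interference."""
    return {c: min(nodes, key=lambda n: predicted[(c, n)]) for c in components}

print(place(components, nodes, predicted))
# {'parser': 'node-a', 'ranker': 'node-b', 'merger': 'node-a'}
```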

J.L. Vázquez-Poletti

A framework for building hypercubes using MapReduce

The European Space Agency’s Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way) by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalog will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments.

In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm without dealing with any specific interface to the lower-level distributed system implementation (Hadoop). Furthermore, we show how executing the framework with different data storage model configurations (i.e. row- or column-oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission.
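
As a toy illustration of the underlying idea (not the paper's framework, which runs on Hadoop), the Python sketch below expresses hypercube generation as a map phase that assigns each record to a cell and emits a count, followed by a reduce phase that sums the counts per cell. The two-dimensional data and bin edges are made up.

```python
from collections import defaultdict
import numpy as np

def map_phase(records, bin_edges):
    """Map: emit a (cell, 1) pair for the hypercube cell each record falls in."""
    for record in records:
        cell = tuple(int(np.digitize(v, edges)) for v, edges in zip(record, bin_edges))
        yield cell, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for every cell."""
    cube = defaultdict(int)
    for cell, count in pairs:
        cube[cell] += count
    return cube

# Hypothetical 2-D example: 1,000 points binned into a 10x10 hypercube
rng = np.random.default_rng(1)
points = rng.uniform(0.0, 1.0, size=(1000, 2))
edges = [np.linspace(0.0, 1.0, 11)] * 2
cube = reduce_phase(map_phase(points, edges))
print(sum(cube.values()))   # 1000: every point lands in exactly one cell
```

Because the reduce step is a plain sum, the counting parallelizes trivially: mappers can process disjoint chunks of the catalog and the partial cubes can be merged in any order.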

In addition, we put forward the advantages and shortcomings of deploying the framework on a public cloud provider, benchmark it against other popular available solutions (which are not always the best fit for such ad hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated workshops on astronomical data analysis techniques.

More information in the article:

D. Tapiador, W. O’Mullane, A.G.A. Brown, X. Luri, E. Huedo, P. Osuna, A framework for building hypercubes using MapReduce, Computer Physics Communications, Volume 185, Issue 5, May 2014, Pages 1429-1438, ISSN 0010-4655, http://dx.doi.org/10.1016/j.cpc.2014.02.010.

OpenNebula in Google Summer of Code 2010!

This year OpenNebula has been selected as a Google Summer of Code (GSoC) mentoring organization. GSoC is a program that offers student developers stipends to write code for various open source projects. During the last six years, GSoC has brought together nearly 3,400 students and more than 3,000 mentors from nearly 100 countries worldwide. For more information about the program, take a look at the GSoC FAQ.


We are very excited about this great opportunity to work with talented and self-motivated students. During the summer the students will be part of our community, and will have the opportunity to learn the basics of virtualization, cloud computing and OpenNebula.

If you are a student and would be interested in participating in GSoC with OpenNebula as your mentoring organization, please take a look at our GSoC Ideas page. This page lists projects that OpenNebula has proposed for GSoC, but it is not a closed list. If you have an idea for a cool project that uses or extends OpenNebula, please contact one of the OpenNebula GSoC mentors. Also, if you are teaching distributed/cloud computing or related courses, please share this information with your students.

Once you are ready to submit an application, remember that you must do so before April 9th through the GSoC webapp. So come and join us this summer to improve the OpenNebula Cloud Toolkit!

Ruben S. Montero

OpenNebula 1.4.0 released

The OpenNebula team is happy to announce that we have reached a stable state for the new 1.4 series of the OpenNebula Toolkit. During these months we have been working on new features that we hope will be helpful in managing your infrastructure. Downloads are available as source code, as with previous versions, but we have also created binary packages for RedHat/CentOS, Ubuntu, openSUSE and Fedora.

We want to thank the people who actively used the beta versions and provided us with feedback to polish features and get rid of bugs before this stable release.


Highlights of OpenNebula 1.4 are…

  • EC2 Query API interface for building OpenNebula-based clouds
  • OCCI interface for building OpenNebula-based clouds
  • Support for the VMware Hypervisor family
  • Multiple user support and access-right control for Virtual Machines and Virtual Networks
  • Advanced contextualization support to integrate VM packs and implement multi-component services
  • Easy integration with your data-center services and procedures with a new hook system
  • Support for block devices as VM images
  • Support for LVM storage
  • Many bug fixes, and scalability and performance improvements in several components of the OpenNebula system
  • A whole new set of documentation pages, guides and examples
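
OpenNebula's core is driven through an XML-RPC interface, on which the CLI and the new cloud interfaces build. As a minimal sketch of submitting a VM from Python (the endpoint, session string and template below are hypothetical, and the exact reply format of one.vm.allocate depends on the OpenNebula version and your authentication setup):

```python
import xmlrpc.client

# Hypothetical endpoint and session string; adjust to your oned configuration.
ONE_ENDPOINT = "http://localhost:2633/RPC2"
SESSION = "oneadmin:opennebula"

# A simple VM template, equivalent to what you would pass to `onevm create`.
TEMPLATE = """
NAME   = test-vm
CPU    = 1
MEMORY = 512
DISK   = [ source = "/srv/images/ubuntu.img", target = "hda" ]
NIC    = [ network = "Public" ]
"""

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)
# one.vm.allocate submits the template; the reply contains a success flag
# and the new VM id (or an error message on failure).
reply = server.one.vm.allocate(SESSION, TEMPLATE)
print(reply)
```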

Quick Links

The OpenNebula Team!

Journal of Grid Computing: Special Issue on Clouds and Grids

This special issue of the Journal of Grid Computing is dedicated to recent advances in Cloud computing to simplify and optimize the use and operation of existing distributed computing infrastructures in science and engineering. Authors are invited to submit original, unpublished work describing current research and novel ideas in the area of Cloud Computing and its application to Grid and Cluster computing.
Additional information about this special issue can be found at the journal website.

Topics of interest include, but are not limited to:

  • Experiences, performance and reliability running scientific applications in Clouds
  • Grid, Cluster and data-intensive computing in Clouds
  • Limitations of Cloud services and technologies for capability and capacity computing
  • Impact of virtualization on the performance of memory, CPU and I/O intensive, and latency sensitive applications, and virtualization support for specialized communication transports
  • Scientific Clouds offering services for the scientific and technical communities
  • Architectures for integration of Cloud technologies and services with Cluster and Grid infrastructures
  • On-demand and utility resource provision models
  • Service and infrastructure scalability and elasticity management for the efficient execution of virtualized Cluster and Grid platforms
  • New paradigms for computing on Cloud
  • Federation, interoperability and portability between Clouds
  • Cloud interfaces, programming models and tools

Paper Submission & Important Dates
Submitted papers must be formatted according to the Journal of Grid Computing rules; check the website for more information.

Submission: 28 February 2010
Review: 30 April 2010
Revision: 20 June 2010
Final acceptance decision: 10 July 2010
Publication: Sep 2010

Guest Editors
Ignacio M. Llorente
DSA-Research
Universidad Complutense de Madrid
Madrid, 28040, Spain
llorente@dacya.ucm.es

Ruben S. Montero
DSA-Research
Universidad Complutense de Madrid
Madrid, 28040, Spain
rubensm@dacya.ucm.es

Ruben S. Montero

OpenNebula 1.4 Beta 2, Released!

The OpenNebula team is happy to announce the second beta release of OpenNebula 1.4. This Beta 2 is aimed at testers, community members and cloud enthusiasts in order to identify bugs and regressions, so that 1.4 can fully replace OpenNebula 1.2 deployments.

Nearly three months have passed since the feature freeze for OpenNebula 1.4, and the OpenNebula team has been working hard on polishing the new features and solving bugs. While there could be some issues that need to be fixed before the stable release, OpenNebula 1.4 Beta 2 is shaping up nicely and brings an important number of improvements and innovations in Cloud computing.

Highlights of OpenNebula 1.4 are…

  • EC2 Query API interface for building OpenNebula-based clouds
  • OCCI interface for building OpenNebula-based clouds
  • Support for the VMware Hypervisor family
  • Multiple user support and access-right control for Virtual Machines and Virtual Networks
  • Advanced contextualization support to integrate VM packs and implement multi-component services
  • Easy integration with your data-center services and procedures with a new hook system
  • Support for block devices as VM images
  • Many bug fixes, and scalability and performance improvements in several components of the OpenNebula system
  • A whole new set of documentation pages, guides and examples

Quick Links

The OpenNebula Team!

High Performance and Grid Computing in the Cloud

The HPCcloud discussion group has been created to address the growing interest in High Performance Computing and Grid Computing in the Cloud. The purpose of this group is to present experiences and scenarios from individuals, organizations and projects that illustrate how Cloud computing can enhance the different types of distributed and high performance computing infrastructures in science and engineering. The group covers the following aspects of the innovative potential, benefits and challenges of new Cloud technologies and services in High Performance Computing (HPC) and Grid Computing research and business:

  • Cultural, security, political and legal barriers to implementing Cloud provisioning models in HPC and Grid environments
  • Architectures for integration of Cloud technologies and services with HPC and Grid infrastructures
  • Standardization of interactions between HPC and Grid platforms and Cloud infrastructures
  • Limitations of existing Cloud services and technologies for the capability and capacity computing demands of the HPC and Grid communities in the execution of both tightly-coupled HPC and loosely-coupled HTC applications
  • HPC Clouds offering platforms with HPC devices and configurations, and Scientific Clouds offering specific services for the scientific and technical computing community
  • Impact of virtualization on the performance of memory, CPU and I/O intensive, and latency sensitive applications, and virtualization support for specialized communication transports
  • Service and infrastructure scalability and elasticity management for the efficient execution of virtualized HPC and Grid platforms
  • Challenges of porting HPC applications to the Cloud and new computing paradigms for HPC on Cloud

You are invited to use this group to promote your events related to HPC and/or Grid Computing in the Cloud.

Relevant links

Ignacio M. Llorente

The Infrastructure Quadrant

The Cloud Computing movement is a melting pot of distributed technologies and paradigms that produces new terms at an incredibly fast pace. It is therefore usually difficult for newcomers to figure out how to take advantage of these new technologies, or whether they fit into their current IT infrastructure at all.

So, let us try to classify the current infrastructure provisioning trends (the IaaS brand of Cloud Computing) using two simple parameters: where you obtain the resources for your applications (locally or remotely), and how those resources are obtained (physical or virtualized); a toy sketch encoding the resulting quadrant follows the list:

  • Own site (Local – Physical): This is the classical provisioning scheme that we’ve been using for years, one service, one machine; not much to say here.
  • Grid (Remote – Physical): the resources are obtained from a remote site for a specific service, e.g. batch job processing in scientific Grids, or web applications in a typical hosting scenario. In this case you get fixed configurations with limited control over the remote resources.
  • Private Cloud (Local – Virtual): the resources are obtained from your own infrastructure in the form of Virtual Machines. You obtain the classical benefits of virtualization (e.g. consolidation, isolation, easy replication of configurations…) but for your infrastructure as a whole, not just for one server.
  • The Cloud (Remote – Virtual): the resources are obtained from an external (cloud) provider. Unlike the Grid, and thanks to VMs, you have total control of the resources you are “buying”; you can install whatever you need. Usually the provider in this case is another company, but it could be a partner, in which case it is called a Community Cloud.
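
Here is the promised toy sketch in Python: the quadrant as a lookup table, making explicit that the provisioning model is fully determined by the two parameters. The string labels simply mirror the four cases above.

```python
# The quadrant as a lookup table: (where, how) -> provisioning model.
QUADRANT = {
    ("local",  "physical"): "Own site",
    ("remote", "physical"): "Grid",
    ("local",  "virtual"):  "Private Cloud",
    ("remote", "virtual"):  "The Cloud",
}

def classify(where: str, how: str) -> str:
    """Return the provisioning model for a (where, how) pair."""
    return QUADRANT[(where, how)]

print(classify("local", "virtual"))    # Private Cloud
print(classify("remote", "virtual"))   # The Cloud
```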

Let’s briefly review three resource provisioning examples in use:

Classical IT Outsourcing (Own site + Grid). This is the well-accepted provisioning scheme adopted by many companies. Some of the core services are hosted in the in-house infrastructure and others are moved to an external hosting provider. Research centers usually use this model to store and analyze large amounts of data, such as those generated by the LHC, or to solve grand challenge applications.
Cloud Outsourcing (Own site + Cloud). Similar to the above, but you get VMs instead of pre-configured environments to support your service workload. In this case, the VMs can be configured to register with the local services (e.g. a clustered web server), so the capacity assigned to the service can grow with its demands.
The Hybrid Cloud (Private Cloud + Cloud). Nowadays the use of Virtual Machines is common practice, for example to easily set up development and testing environments. This model can be combined with a Cloud if some of the VMs are obtained from a remote provider, typically to satisfy peak demands.

Using this quadrant, you can probably better plan the resource provisioning strategy for your site, or at least understand what someone is trying to sell you!

Ruben S. Montero