A framework for building hypercubes using MapReduce

The European Space Agency’s Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way) by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalog will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes, either as part of the precomputed statistics for each data release or for scientific analysis involving the final data products or the raw data coming from the satellite instruments.

In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm but without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission.
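As a rough illustration of the idea, and not the framework's actual interface, the sketch below expresses hypercube generation as a map step that assigns each record to a cell and a reduce step that sums counts per cell. The field names, bin definitions and function names are hypothetical, and the code runs in plain Python rather than on Hadoop.

```python
import math
from collections import Counter
from functools import reduce

# Hypothetical hypercube definition: (field name, lower bound, bin width) per dimension.
DIMENSIONS = [("parallax", 0.0, 1.0), ("magnitude", 0.0, 5.0)]


def map_record(record):
    """Map phase: emit (cell coordinates, 1) for a single catalog record."""
    key = tuple(
        math.floor((record[name] - low) / width) for name, low, width in DIMENSIONS
    )
    return key, 1


def reduce_counts(left, right):
    """Reduce phase: merge two partial hypercubes by summing counts per cell."""
    merged = Counter(left)
    merged.update(right)  # Counter.update adds counts instead of replacing them
    return merged


def build_hypercube(records):
    """Run the map and reduce phases sequentially over an iterable of records."""
    partials = (Counter({key: value}) for key, value in map(map_record, records))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    sample = [
        {"parallax": 0.5, "magnitude": 12.0},
        {"parallax": 0.7, "magnitude": 14.9},
        {"parallax": 2.1, "magnitude": 3.0},
    ]
    print(dict(build_hypercube(sample)))
```

Because the reduce step is associative and commutative, partial hypercubes can be merged in any order, which is the property that lets this kind of workload be distributed.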

In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.

More information in the article:

D. Tapiador, W. O’Mullane, A.G.A. Brown, X. Luri, E. Huedo, P. Osuna, A framework for building hypercubes using MapReduce, Computer Physics Communications, Volume 185, Issue 5, May 2014, Pages 1429-1438, ISSN 0010-4655, http://dx.doi.org/10.1016/j.cpc.2014.02.010.

DSA-Research Group at UCM Joins €3.6 Million Consortium Tasked With Enabling Federated Cloud Networking

Challenging research in the flagship European Project on SDN, NFV and Cloud

Madrid, Spain – 5 February 2015 – DSA-Research (UCM) today announced it has joined a consortium of leading research organisations and universities from the U.K., Germany, Spain, Belgium, Israel and Italy focused on developing new innovative techniques to federate cloud network resources and to derive the integrated management cloud layer that enables an efficient and secure deployment of federated cloud applications.

The BEACON project will deliver a homogeneous virtualization layer, on top of heterogeneous underlying physical networks, computing and storage infrastructures, providing enablement for automated federation of applications across different clouds and datacentres. Senior member of DSA-Research and Scientific Coordinator of the project, Eduardo Huedo, said:

“BEACON will provide innovative techniques to federate cloud network resources and an integrated management cloud layer to enable the efficient and secure deployment of multi-cloud applications, which aggregate compute, storage and network resources from distributed infrastructures. This brings up new challenges for the research done at DSA-Research, like virtual networks spanning multiple datacentres, automated high-availability across datacentres, or datacentre location-aware elasticity. The technology developed as a result of this research will be contributed to the OpenNebula cloud management platform.”

DSA-Research (UCM) is joined on the project by Flexiant (U.K.), CETIC (Belgium), OpenNebula Systems (Spain), IBM Israel (Israel), Universita di Messina (Italy) and Lufthansa Systems (Germany).


BEACON is a collaborative research project co-funded under the ICT theme of HORIZON 2020 Research Programme of the European Union.

For more information visit www.beacon-project.eu.

About DSA-Research (UCM)

The DSA (Distributed Systems Architecture) Research Group at Complutense University of Madrid conducts research in advanced distributed computing and virtualization technologies for large-scale infrastructures and resource provisioning platforms.

The group founded the OpenNebula open-source project, a widely used technology to build IaaS cloud infrastructures; is a co-founder of the OGF Working Group on Cloud Computing Interface; and participates in the main European projects in cloud computing, such as RESERVOIR, flagship of European research initiatives in virtualized infrastructures and cloud computing, BonFIRE, 4CaaSt, StratusLab, PANACEA and CloudCatalyst. The results of the research have been published in several leading publications on virtualization and cloud computing, and members of the group participate in the Program Committees of the most important workshops and conferences in the research field. The group founded the Spanish Initiative in Grid Middleware and the Working Group on SOI and Grids of INES – Spanish Technology Platform on Software and Services; and is involved in NESSI.

For more information visit www.dsa-research.org.

Performance evaluation of a signal extraction algorithm for the Cherenkov Telescope Array’s Real Time Analysis pipeline

The IEEE Xplore Digital Library has made available another of our latest conference papers. This one was presented at the IEEE International Conference on Cluster Computing 2014, which took place in Madrid last September.


The work was presented in the form of a poster entitled “Performance evaluation of a signal extraction algorithm for the Cherenkov Telescope Array’s Real Time Analysis pipeline” and the paper can be accessed here.


In this paper, several versions of a signal extraction algorithm, pertaining to the entry stage of the Cherenkov Telescope Array’s Real Time Analysis pipeline, were implemented and optimized using SSE2, POSIX threads and CUDA. The results of this proof of concept give us insight into the suitability of each platform, and the performance each one can deliver, for carrying out this particular task.

This work constitutes a first step in the “cloudification” of this application and represents the first publication of my PhD student Juan José Rodríguez-Vázquez in this context.


J.L. Vázquez-Poletti

A Multi-Capacity Queuing Mechanism in Multi-Dimensional Resource Scheduling

Springer has published a volume of its Lecture Notes in Computer Science series with our paper entitled “A Multi-Capacity Queuing Mechanism in Multi-Dimensional Resource Scheduling”. This contribution was presented at the International Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, held in conjunction with the ACM Symposium on Principles of Distributed Computing, which took place in Paris (France) on July 15th.

The volume can be accessed here and the paper is the result of an ongoing collaboration with the research group led by Prof. Lucio Grandinetti (University of Calabria, Italy).

With the advent of new computing technologies, such as cloud computing and contemporary parallel processing systems, the building blocks of computing systems have become multi-dimensional. Traditional scheduling algorithms based on the optimization of a single resource, such as the processor, fail to provide near-optimal solutions. The efficient use of new computing systems depends on the efficient use of all resource dimensions. Thus, the scheduling algorithms have to fully use all resources. In this paper, we propose a queuing mechanism based on a multi-resource scheduling technique. For that, we model multi-resource scheduling as a multi-capacity bin-packing scheduling algorithm at the queue level to reorder the queue in order to improve the packing and, as a result, improve scheduling metrics. The experimental results demonstrate performance improvements in terms of wait time and slowdown metrics.
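To give a feel for queue reordering based on multi-capacity packing, here is a minimal sketch. It is a simple first-fit pass over the queue and an assumption made for illustration, not the algorithm evaluated in the paper. The Job type, the two resource dimensions (cores and memory) and the function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Job(NamedTuple):
    name: str
    demand: Tuple[int, ...]  # one entry per resource dimension, e.g. (cores, memory in GB)


def fits(demand: Sequence[int], free: Sequence[int]) -> bool:
    """A job fits only if every resource dimension has enough free capacity."""
    return all(d <= f for d, f in zip(demand, free))


def reorder_queue(queue: List[Job], capacity: Tuple[int, ...]) -> List[Job]:
    """Move jobs that pack into the free capacity to the front of the queue.

    Jobs are visited in arrival order (first fit). Jobs that do not fit keep
    their relative order behind the packed ones.
    """
    free = list(capacity)
    packed, deferred = [], []
    for job in queue:
        if fits(job.demand, free):
            free = [f - d for f, d in zip(free, job.demand)]
            packed.append(job)
        else:
            deferred.append(job)
    return packed + deferred


if __name__ == "__main__":
    waiting = [Job("a", (4, 8)), Job("b", (6, 4)), Job("c", (2, 16)), Job("d", (4, 4))]
    print([job.name for job in reorder_queue(waiting, (8, 32))])
```

A single-resource scheduler looking only at cores would treat job "c" like any other small job; checking every dimension at once is what lets the reordering fill both cores and memory.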


J.L. Vázquez-Poletti

A Model to Calculate Amazon EC2 Instance Performance in Frost Prediction Applications

Last week the First HPCLATAM – CLCAR Joint Conference took place in Valparaiso, Chile. There, a joint work with Prof. Carlos García Garino’s research group (Universidad Nacional de Cuyo, Argentina) was presented. This work, entitled “A Model to Calculate Amazon EC2 Instance Performance in Frost Prediction Applications”, has been published by Springer through its Communications in Computer and Information Science series.


Frosts are one of the main causes of economic losses in the Province of Mendoza, Argentina. Although they happen every year, frosts can be predicted using Agricultural Monitoring Systems (AMS). AMS provide the information needed to start and stop frost defense systems and thus reduce economic losses. In recent years, the emergence of infrastructures called Sensor Clouds has improved AMS in several aspects such as scalability, reliability and fault tolerance. Sensor Clouds use Wireless Sensor Networks (WSN) to collect data in the field and Cloud Computing to store and process these data. Currently, Cloud providers like Amazon offer different instances to store and process data in a profitable way. The variety of offered instances creates the need for tools to determine which instance type is the most appropriate, in terms of execution time and economic cost, for running agro-meteorological applications. In this paper we present a model to estimate the execution time and economic cost of Amazon EC2 instances for frost prediction applications.
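The kind of trade-off such a model captures can be sketched as follows. This is not the model from the paper: the instance names, relative speeds and prices are made up, and the sketch assumes that execution time scales inversely with a relative speed factor and that usage is billed per started hour.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str              # hypothetical label, not a real EC2 instance name
    relative_speed: float  # assumed speed-up over the baseline machine
    price_per_hour: float  # assumed price, billed per started hour


def estimate(instance: InstanceType, baseline_hours: float):
    """Return (estimated hours, estimated cost) for one run on this instance."""
    hours = baseline_hours / instance.relative_speed
    cost = math.ceil(hours) * instance.price_per_hour
    return hours, cost


def cheapest(instances, baseline_hours: float, deadline_hours: float):
    """Pick the cheapest instance that meets the deadline, or None if none does."""
    feasible = []
    for instance in instances:
        hours, cost = estimate(instance, baseline_hours)
        if hours <= deadline_hours:
            feasible.append((cost, hours, instance.name))
    return min(feasible) if feasible else None


if __name__ == "__main__":
    catalog = [InstanceType("small", 1.0, 0.25), InstanceType("large", 4.0, 1.0)]
    print(cheapest(catalog, baseline_hours=6.0, deadline_hours=8.0))
    print(cheapest(catalog, baseline_hours=6.0, deadline_hours=2.0))
```

With a loose deadline the slower, cheaper instance wins; a tight deadline forces the faster one even though it costs more, which matters when a frost forecast has to arrive in time to start the defense systems.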


J.L. Vázquez-Poletti


Research stay at the Chinese Academy of Sciences

Over the past month I had the pleasure and the honor of being hosted again by the Chinese Academy of Sciences in Beijing. This was 3 years after the previous invitation.

Chinese Academy of Sciences

During this period I gave talks on cloud computing at the following institutions:

Talk at Tsinghua University

The talk introduced the basics of cloud computing and displayed real use cases of applications pertaining to emergent areas such as Bioinformatics and Space Exploration in which I have been involved in the past years.

Also, there have been some meetings pursuing collaboration opportunities. As a result, some initial joint work was started by our research group, ICMSEC and ICT.

Talk and meeting at ICT-CAS

In summary, this period has been very productive. The new opportunities that have arisen are a good example of how cloud computing is a hot technology.

J.L. Vázquez-Poletti

Spot Price prediction for Cloud Computing using Neural Networks

The International Journal of Computing has made available our paper entitled “Spot Price prediction for Cloud Computing using Neural Networks”. This work is the result of a collaboration with the research groups led by Prof. Lucio Grandinetti (University of Calabria, Italy) and Associate Prof. Volodymyr O. Turchenko (Ternopil National Economic University, Ukraine).


Advances in service-oriented architectures, virtualization, high-speed networks, and cloud computing have resulted in attractive pay-as-you-go services. Job scheduling on such systems results in commodity bidding for computing time. Amazon institutionalizes this bidding for its Elastic Compute Cloud (EC2) environment. Similar bidding methods exist for other cloud-computing vendors as well as multi-cloud and cluster computing brokers such as SpotCloud. Commodity bidding for computing has resulted in complex spot price models that have ad-hoc strategies to provide demand for excess capacity. In this paper we discuss vendors who provide spot pricing and bidding, and present predictive models for short-term and middle-term spot price prediction based on neural networks, giving users high confidence in future prices and aiding bidding on commodity computing.


J.L. Vázquez-Poletti

Chapter in the Handbook of Research on Architectural Trends in Service-Driven Computing

At the end of June the Handbook of Research on Architectural Trends in Service-Driven Computing was released by IGI Global. This publication, divided into 2 volumes, explores, delineates, and discusses recent advances in architectural methodologies and development techniques in service-driven computing. The handbook is an inclusive reference source for organizations, researchers, students, enterprise and integration architects, practitioners, software developers, and software engineering professionals engaged in the research, development, and integration of the next generation of computing.


We contributed Chapter 28 of this publication, entitled “Admission Control in the Cloud: Algorithms for SLA-Based Service Model”.

Cloud Computing is a paradigm that allows the flexible and on-demand provisioning of computing resources. For this reason, many institutions have moved their systems to the Cloud, and in particular, to public infrastructures. Unfortunately, an increase in the demand for Cloud results in resource shortages affecting both providers and consumers. With this factor in mind, Cloud service providers need Admission Control algorithms in order to make a good business decision on the types of requests to be fulfilled. At the same time, Cloud providers have a desire to maximize the net income derived from provisioning the accepted service requests and minimize the impact of un-provisioned resources. This chapter introduces and compares Admission Control algorithms and proposes a service model that allows the definition of Service Level Agreements (SLAs).
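As a simple illustration of what an Admission Control decision can look like, the sketch below greedily accepts the requests that bring the most revenue per resource unit until capacity runs out. It is an assumption made for illustration and not one of the algorithms compared in the chapter. The Request type and its fields are hypothetical, and SLA penalties are left out.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    name: str
    units: int      # resource units requested (assumed to be positive)
    revenue: float  # income obtained if the request is provisioned


def admit(requests: List[Request], capacity: int) -> List[Request]:
    """Greedy admission: accept the most profitable requests per unit first."""
    accepted = []
    free = capacity
    by_density = sorted(requests, key=lambda r: r.revenue / r.units, reverse=True)
    for request in by_density:
        if request.units <= free:
            accepted.append(request)
            free -= request.units
    return accepted


if __name__ == "__main__":
    incoming = [Request("a", 4, 8.0), Request("b", 2, 6.0), Request("c", 5, 5.0)]
    chosen = admit(incoming, capacity=6)
    print([r.name for r in chosen], sum(r.revenue for r in chosen))
```

Greedy selection by revenue density is not guaranteed to be optimal, but it shows the provider-side trade-off between net income and unprovisioned resources in a few lines.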

Title: Handbook of Research on Architectural Trends in Service-Driven Computing
Editors: Raja Ramanathan and Kirtana Raja
Pub. date: June 2014
Pages: 759
Volume: 23 of Advances in Parallel Computing
ISBN13: 9781466661783
J.L. Vázquez-Poletti

Regulated Condition-Event Matrices for Cloud Environments

Scalable Computing: Practice and Experience has just published our recent paper entitled “Regulated Condition-Event Matrices for Cloud Environments”. This work is the result of a collaboration with Prof. Patrick Martin (Queen’s University, Canada) and introduces the core of the PhD thesis of my student Richard M. Wallace. The paper can be accessed here.


Distributed event-based systems (DEBS) are networks of computing devices. These systems have been successfully implemented by commercial vendors. Cloud applications depend on message passing and inter-connectivity methods exchanging data and performing inter-process communication. Both DEBS and Clouds need time-coordinated methods of control not dependent on a single time domain. While DEBS have specific implementation languages for complex events, Cloud systems do not. Clouds and DEBS have not yet presented an explicit separation of temporally based event processing from computations. Using a regulated, isomorphic, temporal architecture (RITA), a specific language and separation of temporal event processing from processing computation is achieved. RITA provides a functional programming style for developers using familiar language constructs for integration with existing processing code without forcing the developer to work in multiple coding paradigms requiring extensive “glue code” allowing coding paradigms to work together. This paper introduces RITA as a guarded condition-event system that has explicit separation of event processing and computation with constructs allowing integration of time-aware events for multiple time domains found in Cloud or existing distributed computing systems.


J.L. Vázquez-Poletti

Invited talk at HPC2014 (Cetraro, Italy)

From July 7th to 11th, Cetraro (Italy) will once again host its famous International Advanced Workshop on High Performance Computing. Its main aim is to present and debate advanced topics, open questions, future developments, and challenging applications related to advanced high-performance distributed computing and data systems, encompassing implementations ranging from traditional clusters to warehouse-scale data centers, and with architectures including hybrid, multicore, distributed, and cloud models.


This year’s motto is “from Clouds and Big Data to Exascale and Beyond”, which is in itself a statement of intent.

For the second time, I’m very honored to attend as an invited speaker. This year I’ll give a talk entitled “Clouds for Meteorology, two cases study”.

Clouds for Meteorology, two cases study

Meteorology is among the most promising areas that benefit from cloud computing, due to its intersection with society’s critical aspects. Executing meteorological applications involves HPC and HTC challenges, but also economic ones. My talk will introduce two cases with different backgrounds and motivations, but sharing a similar cloud methodology: the first one is about weather forecasting in the context of the exploration of planet Mars; and the second one deals with data processing from weather sensor networks, in the context of an agricultural improvement plan in Argentina.

I’ll of course take advantage of this trip to meet again with many colleagues from the previous edition of HPC, in order to continue and expand current collaborations, which have been very productive in the past 2 years.


J.L. Vázquez-Poletti