Double date at GridKa School 2015

Next week I’ll be in Karlsruhe (Germany) for this year’s edition of the GridKa School. This event has taken place at the Karlsruhe Institute of Technology since 2003.

Two years ago I gave a talk; this time my participation will be double.

Talk: From Mars to Earth through Cloud Computing

Our society has benefited from Space exploration in many ways. Many of the inventions we use nowadays have their origin in or have been improved by Space research. Computer Science is not an exception.


This talk will introduce my work applying Cloud Computing in the context of different Mars missions: Mars MetNet (Spain-Russia-Finland), MSL Curiosity (NASA) and ExoMars2016 (ESA). The know-how gained there has allowed us to optimize applications back on Planet Earth, such as weather forecasting and the processing of agricultural wireless sensor networks.

Tutorial: HPCCloud 101, HPC on Cloud Computing for newcomers

Never been into Cloud Computing before? Do you think that extra computing power is crucial for your research? Do you have some neat parallel codes that your institution doesn’t let you run because the cluster is full? Maybe this tutorial is for you!

The tutorial will cover the deployment and use of virtual HPC clusters on a cloud infrastructure.

As the virtual clusters deployed with StarCluster have Sun Grid Engine and OpenMPI installed, you are more than welcome to bring your own codes and give them a try!
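
If you want a quick sanity check to run once your virtual cluster is up, here is a minimal MPI example in Python (assuming mpi4py is available on the cluster, which is an assumption on my part; C or Fortran codes built against OpenMPI work just as well):

    # hello_mpi.py - minimal test for the virtual cluster (assumes mpi4py)
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()           # id of this MPI process
    size = comm.Get_size()           # total number of MPI processes
    node = MPI.Get_processor_name()  # node where the process landed

    print("Hello from rank %d of %d on %s" % (rank, size, node))

It can be submitted through Sun Grid Engine with something along the lines of qsub -pe orte 8 job.sh, where job.sh simply calls mpirun python hello_mpi.py (check the name of the parallel environment configured on your cluster, as it may differ).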

J.L. Vázquez-Poletti

Interoperating grid infrastructures with the GridWay metascheduler

This paper describes the GridWay metascheduler and exposes its latest and future developments, mainly related to interoperability and interoperation. GridWay enables large-scale, reliable and efficient sharing of computing resources over grid middleware. To favor interoperability, it features a modular architecture based on drivers, which access middleware services for resource discovery and monitoring, job execution and management, and file transfer. This paper presents two new execution drivers for Basic Execution Service (BES) and Computing Resource Execution and Management (CREAM) services, and introduces a remote BES interface for GridWay. This interface allows users to access GridWay’s job metascheduling capabilities using the BES implementation of GridSAM. Thus, GridWay now provides end users with more possibilities for interoperability and interoperation.

More information in the article:

Ismael Marín Carrión, Eduardo Huedo and Ignacio M. Llorente: Interoperating grid infrastructures with the GridWay metascheduler, Concurrency and Computation: Practice and Experience, Volume 27, Issue 9, June 2015, Pages 2278-2290, ISSN 1532-0634, http://dx.doi.org/10.1002/cpe.2971.
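
To give a flavour of the driver-based design mentioned above, here is an illustrative Python sketch (hypothetical class and method names, not GridWay’s actual code or interfaces): every middleware service is wrapped by a driver exposing the same operations to the metascheduler core, so supporting BES or CREAM amounts to plugging in one more driver.

    # Illustrative sketch of a driver-based execution layer (invented names,
    # not GridWay's real interfaces): each middleware is accessed through a
    # driver that offers the same operations to the scheduler core.
    from abc import ABC, abstractmethod

    class ExecutionDriver(ABC):
        """Common interface the scheduler core talks to."""

        @abstractmethod
        def submit(self, job_description: dict) -> str:
            """Submit a job and return a middleware-specific job id."""

        @abstractmethod
        def poll(self, job_id: str) -> str:
            """Return the job state, e.g. 'PENDING', 'RUNNING' or 'DONE'."""

        @abstractmethod
        def cancel(self, job_id: str) -> None:
            """Cancel a running job."""

    class BESDriver(ExecutionDriver):
        # A real driver would speak the OGSA-BES web-service protocol here.
        def submit(self, job_description):
            return "bes-0001"

        def poll(self, job_id):
            return "RUNNING"

        def cancel(self, job_id):
            pass

    class CREAMDriver(ExecutionDriver):
        # A real driver would contact a CREAM computing element here.
        def submit(self, job_description):
            return "cream-0001"

        def poll(self, job_id):
            return "PENDING"

        def cancel(self, job_id):
            pass

    # The core only ever depends on the ExecutionDriver interface.
    drivers = {"bes": BESDriver(), "cream": CREAMDriver()}
    job_id = drivers["bes"].submit({"executable": "/bin/hostname"})
    print(drivers["bes"].poll(job_id))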

Producing approximate results with small correctness losses for cloud interactive services

Last week the 12th ACM International Conference on Computing Frontiers (CF’15) took place in Ischia (Italy). There, our paper entitled “SARP: producing approximate results with small correctness losses for cloud interactive services” was presented. This work is a result of the collaboration with the Institute of Computing Technology of the Chinese Academy of Sciences, which started during my latest research stay there.

Despite the importance of providing fluid responsiveness to user requests in interactive services, such request processing is very resource-expensive when dealing with large-scale input data. These costs often exceed the application owners’ budget when services are deployed on a cloud, where resources are charged in monetary terms. Providing approximate processing results is a feasible solution to this problem, trading off request correctness (quantified by output quality) for a reduction in response time. However, existing techniques in this area either use partial input data or skip expensive computations to produce approximate results, thus incurring large losses in output quality on a tight resource budget.

In this paper, we propose SARP, a Synopsis-based Approximate Request Processing framework, to produce approximate results with small correctness losses even when using a small amount of resources. To achieve this, SARP conducts full computations over the statistical aggregation of the entire input data, using two key ideas:

  1. Offline synopsis management that generates and maintains a set of synopses that represent the statistical aggregation of original input data at different approximation levels.
  2. Online synopsis selection that considers both the current resource allocation and the workload status, so as to select the synopsis with the maximal length that can be processed within the required response time.

We demonstrate the effectiveness of our approach by testing the recommendation services of e-commerce sites using a large, real-world dataset.

Using prediction accuracy as the output quality metric, the results demonstrate that:

  1. SARP achieves significant response time reduction with very small quality losses compared to the exact processing results.
  2. Using the same processing time, SARP demonstrates a considerable reduction in quality loss compared to existing approximation techniques.
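
To make the online selection step a bit more concrete, here is my own much-simplified Python sketch of the idea (not SARP’s actual implementation): given a set of synopses of increasing length and an estimate of the throughput achievable with the current resource allocation, pick the longest synopsis that can be processed within the response-time budget.

    # Simplified sketch of deadline-driven synopsis selection (not SARP's code).
    def select_synopsis(synopsis_lengths, records_per_second, deadline_seconds):
        """Return the longest synopsis processable within the deadline.

        synopsis_lengths  : lengths (in records) of the precomputed synopses
        records_per_second: estimated throughput under the current allocation
        deadline_seconds  : required response time
        """
        budget = records_per_second * deadline_seconds
        feasible = [n for n in synopsis_lengths if n <= budget]
        return max(feasible) if feasible else min(synopsis_lengths)

    # Example: synopses at several approximation levels, a 200 ms deadline and
    # an assumed throughput of 50,000 records/s -> the 10,000-record synopsis.
    print(select_synopsis([1000, 10000, 100000, 1000000], 50000, 0.2))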

J.L. Vázquez-Poletti

Cost-Effective Resource Configurations for Multi-Tenant Database Systems in Public Clouds

The International Journal of Cloud Applications and Computing has just published our paper entitled “Cost-Effective Resource Configurations for Multi-Tenant Database Systems in Public Clouds”. This work is the result of a collaboration with Prof. Patrick Martin’s research group (Queen’s University, Canada).

Cloud computing is a promising paradigm for deploying applications due to its large resource offerings on a pay-as-you-go basis. This paper examines the problem of determining the most cost-effective provisioning of a multi-tenant database system offered as a service over public clouds. The authors formulate the problem of resource provisioning and then define a framework to solve it. Their framework uses heuristic-based algorithms to select cost-effective configurations. The algorithms can optionally balance resource costs against penalties incurred from the violation of Service Level Agreements (SLAs), or opt for configurations that violate no SLAs. The specific resource demands that a workload and its SLAs place on the virtual machines are accounted for by the performance and cost models, which are used to predict performance and expected cost, respectively. The authors validate their approach experimentally using workloads based on standard TPC database benchmarks in the Amazon EC2 cloud.
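
The underlying trade-off can be pictured with a toy Python sketch (my own simplification with made-up prices, predictions and penalties, not the paper’s heuristics or models): for each candidate configuration, add the resource cost to the expected SLA penalty predicted by a performance model, and keep the cheapest overall.

    # Toy illustration of cost-vs-SLA-penalty provisioning (made-up numbers,
    # not the paper's algorithms or models).
    def total_cost(config, hours, predicted_violations, penalty_per_violation):
        """Resource cost plus expected SLA penalty for one candidate configuration."""
        return config["price_per_hour"] * hours + predicted_violations * penalty_per_violation

    candidates = [
        {"name": "1 x large",  "price_per_hour": 0.40},
        {"name": "2 x medium", "price_per_hour": 0.48},
        {"name": "4 x small",  "price_per_hour": 0.60},
    ]
    # Hypothetical predictions from a performance model (SLA violations per month).
    predicted = {"1 x large": 120, "2 x medium": 20, "4 x small": 0}

    best = min(candidates,
               key=lambda c: total_cost(c, hours=720,
                                        predicted_violations=predicted[c["name"]],
                                        penalty_per_violation=1.0))
    print(best["name"])  # -> 2 x medium: slightly pricier, far fewer penalties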

 

J.L. Vázquez-Poletti

GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs

Current systems based on pilot jobs are not exploiting all the scheduling advantages that the technique offers, or they lack compatibility or adaptability. To overcome these limitations, this study presents a different general-purpose pilot system, GWpilot. This system provides individual users or institutions with an easier-to-use, easier-to-install, scalable, extendable, flexible and adjustable framework to efficiently run legacy applications. The framework is based on the GridWay meta-scheduler and incorporates its powerful features, such as standard interfaces, fair-share policies, ranking, migration, accounting and compatibility with diverse infrastructures. GWpilot goes beyond establishing simple network overlays to overcome the waiting times in remote queues or to improve the reliability of task production. It properly tackles the characterisation problem in current infrastructures, allowing users to arbitrarily incorporate customised monitoring of resources and their running applications into the system. This functionality allows the new framework to implement innovative scheduling algorithms that meet the computational needs of a wide range of calculations faster and more efficiently. The system can also be easily stacked under other software layers, such as self-schedulers. The advanced techniques included by default in the framework result in significant performance improvements, even when very short tasks are scheduled.

More information in the article:

A.J. Rubio-Montero, E. Huedo, F. Castejón, R. Mayo-García, GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs, Future Generation Computer Systems, Volume 45, April 2015, Pages 25-52, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2014.10.003.
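
For readers unfamiliar with the pilot-job technique itself, the core idea can be sketched in a few lines of Python (a deliberately naive illustration with invented function names, not GWpilot’s implementation): the pilot is an ordinary grid job that, once it starts on a worker node, repeatedly pulls real tasks from the user-level scheduler, runs them, and reports results together with the node characteristics it has monitored.

    # Naive sketch of a pilot agent loop (illustrative only, invented API).
    import subprocess
    import time

    def pull_task(server_url):
        """Ask the user-level scheduler for the next task; None if the queue is empty.
        A real pilot would perform an authenticated request here."""
        return None  # placeholder

    def report(server_url, task, returncode, node_info):
        """Send back the exit status and the monitored node characteristics."""
        pass  # placeholder

    def pilot(server_url, idle_timeout=600):
        node_info = {"hostname": subprocess.check_output(["hostname"]).strip()}
        idle_since = time.time()
        while time.time() - idle_since < idle_timeout:
            task = pull_task(server_url)
            if task is None:
                time.sleep(30)            # back off while the queue is empty
                continue
            proc = subprocess.run(task["command"], shell=True)
            report(server_url, task, proc.returncode, node_info)
            idle_since = time.time()      # reset the idle timer after real work

    # pilot("https://scheduler.example.org/queue")  # hypothetical endpoint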

Distributed scheduling and data sharing in late-binding overlays

Pull-based late-binding overlays are used in some of today’s largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.

More information in the article:

A. Delgado Peris, J.M. Hernandez and E. Huedo, “Distributed scheduling and data sharing in late-binding overlays,” 2014 International Conference on High Performance Computing & Simulation (HPCS), 21-25 July 2014, pp. 129-136, http://dx.doi.org/10.1109/HPCSim.2014.6903678.
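
The delegation idea can be pictured with a minimal consistent-hashing sketch in Python (my own illustration of how matching work can be spread over the execution nodes; the article’s distributed hash table and scheduling protocol are considerably more involved):

    # Toy consistent-hashing assignment of jobs to execution nodes
    # (an illustration of delegated matching, not the article's actual DHT).
    import hashlib
    from bisect import bisect

    def ring_position(key):
        """Map a string to a point on the hash ring."""
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    nodes = ["node-a", "node-b", "node-c", "node-d"]
    ring = sorted((ring_position(n), n) for n in nodes)
    points = [p for p, _ in ring]

    def responsible_node(job_id):
        """The node whose position follows the job's hash handles its matching."""
        i = bisect(points, ring_position(job_id)) % len(ring)
        return ring[i][1]

    for job in ["job-001", "job-002", "job-003"]:
        print(job, "->", responsible_node(job))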

A framework for building hypercubes using MapReduce

The European Space Agency’s Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalog will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments.

In this paper we present and analyze a generic framework that allows hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm without dealing with any specific interface to the lower-level distributed system implementation (Hadoop). Furthermore, we show how executing the framework with different data storage model configurations (i.e., row- or column-oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission.

In addition, we put forward the advantages and shortcomings of deploying the framework on a public cloud provider, benchmark it against other popular available solutions (which are not always the best for such ad hoc applications), and describe some user experiences with the framework, which was employed in a number of dedicated workshops on astronomical data analysis techniques.

More information in the article:

D. Tapiador, W. O’Mullane, A.G.A. Brown, X. Luri, E. Huedo, P. Osuna, A framework for building hypercubes using MapReduce, Computer Physics Communications, Volume 185, Issue 5, May 2014, Pages 1429-1438, ISSN 0010-4655, http://dx.doi.org/10.1016/j.cpc.2014.02.010.
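
A multidimensional histogram maps naturally onto MapReduce: the mapper emits, for each input record, the cell it falls into (one bin index per dimension), and the reducer sums the counts per cell. A minimal in-memory Python sketch of the idea (purely illustrative; the actual framework runs on Hadoop over the Gaia data model):

    # Minimal map/reduce-style hypercube (multidimensional histogram).
    # Purely illustrative; the real framework does this on Hadoop over Gaia data.
    from collections import Counter

    def mapper(record, bin_widths):
        """Emit the hypercube cell (one bin index per dimension) for one record."""
        return tuple(int(value // width) for value, width in zip(record, bin_widths))

    def reducer(cells):
        """Aggregate the per-cell counts."""
        return Counter(cells)

    # Example: a 2-D cube over (magnitude, parallax) with made-up records and bins.
    records = [(12.3, 0.8), (12.9, 0.9), (15.1, 2.4), (12.1, 0.7)]
    cube = reducer(mapper(r, bin_widths=(1.0, 0.5)) for r in records)
    print(cube)  # Counter({(12, 1): 3, (15, 4): 1})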

DSA-Research Group at UCM Joins €3.6 Million Consortium Tasked With Enabling Federated Cloud Networking

Challenging research in the flagship European Project on SDN, NFV and Cloud

Madrid, Spain – 5 February 2015 – DSA-Research (UCM) today announced that it has joined a consortium of leading research organisations and universities from the U.K., Germany, Spain, Belgium, Israel and Italy focused on developing innovative techniques to federate cloud network resources and on deriving the integrated cloud management layer that enables an efficient and secure deployment of federated cloud applications.

The BEACON project will deliver a homogeneous virtualization layer on top of heterogeneous underlying physical networks, computing and storage infrastructures, enabling the automated federation of applications across different clouds and datacentres. Eduardo Huedo, senior member of DSA-Research and Scientific Coordinator of the project, said:

“BEACON will provide innovative techniques to federate cloud network resources and an integrated management cloud layer to enable the efficient and secure deployment of multi-cloud applications, which aggregate compute, storage and network resources from distributed infrastructures. This brings up new challenges for the research done at DSA-Research, like virtual networks spanning multiple datacentres, automated high-availability across datacentres, or datacentre location-aware elasticity. The technology developed as a result of this research will be contributed to the OpenNebula cloud management platform.”

DSA-Research (UCM) is joined on the project by Flexiant (U.K.), CETIC (Belgium), OpenNebula Systems (Spain), IBM Israel (Israel), Universita di Messina (Italy) and Lufthansa Systems (Germany).

About BEACON

BEACON is a collaborative research project co-funded under the ICT theme of the HORIZON 2020 Research Programme of the European Union.

For more information visit www.beacon-project.eu.

About DSA-Research (UCM)

The DSA (Distributed Systems Architecture) Research Group at Complutense University of Madrid conducts research in advanced distributed computing and virtualization technologies for large-scale infrastructures and resource provisioning platforms.

The group founded the OpenNebula open-source project, a widely used technology for building IaaS cloud infrastructures; is a co-founder of the OGF Working Group on the Cloud Computing Interface; and participates in the main European projects in cloud computing, such as RESERVOIR (the flagship of European research initiatives in virtualized infrastructures and cloud computing), BonFIRE, 4CaaSt, StratusLab, PANACEA and CloudCatalyst. The results of this research have been published in several leading publications on virtualization and cloud computing, and members of the group serve on the Program Committees of the most important workshops and conferences in the field. The group also founded the Spanish Initiative in Grid Middleware and the Working Group on SOI and Grids of INES (the Spanish Technology Platform on Software and Services), and is involved in NESSI.

For more information visit www.dsa-research.org.

Performance evaluation of a signal extraction algorithm for the Cherenkov Telescope Array’s Real Time Analysis pipeline

The IEEE Xplore Digital Library has made available another of our latest conference papers, this time from the IEEE International Conference on Cluster Computing 2014, which took place in Madrid last September.


The work was presented in the form of a poster entitled “Performance evaluation of a signal extraction algorithm for the Cherenkov Telescope Array’s Real Time Analysis pipeline” and the paper can be accessed here.


In this paper, several versions of a signal extraction algorithm, belonging to the entry stage of the Cherenkov Telescope Array’s Real Time Analysis pipeline, were implemented and optimized using SSE2, POSIX threads and CUDA. The results of this proof of concept give an insight into the suitability of each platform for this particular task and the performance each one can deliver.

This work constitutes a first step in the “cloudification” of this application and represents the first publication of my PhD student Juan José Rodríguez-Vázquez in this context.
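
One common form of signal extraction for Cherenkov camera waveforms is to find, per pixel, the fixed-size window of consecutive samples with the largest integrated charge. Whether or not that is exactly the algorithm evaluated in the paper, a plain NumPy sketch of such a serial baseline gives an idea of the kind of per-pixel kernel that benefits from SSE2, threads or CUDA (my own illustration with an assumed window size, not the paper’s code):

    # Sliding-window signal extraction over a pixel's sampled waveform
    # (illustrative serial baseline; the 6-sample window is an assumption).
    import numpy as np

    def extract_signal(waveform, window=6):
        """Return (charge, start_index) of the window with the largest sum."""
        sums = np.convolve(waveform, np.ones(window), mode="valid")
        start = int(np.argmax(sums))
        return float(sums[start]), start

    rng = np.random.default_rng(0)
    waveform = rng.poisson(2, size=40).astype(float)  # fake noise samples
    waveform[20:26] += 15.0                           # injected pulse
    print(extract_signal(waveform))                   # peaks around index 20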

 

J.L. Vázquez-Poletti

A Multi-Capacity Queuing Mechanism in Multi-Dimensional Resource Scheduling

Springer has published a volume of its Lecture Notes in Computer Science series containing our paper entitled “A Multi-Capacity Queuing Mechanism in Multi-Dimensional Resource Scheduling”. This contribution was presented at the International Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, held in conjunction with the ACM Symposium on Principles of Distributed Computing, which took place in Paris (France) last July 15th.

The volume can be accessed here and the paper is the result of an ongoing collaboration with the research group led by Prof. Lucio Grandinetti (University of Calabria, Italy).

With the advent of new computing technologies, such as cloud computing and contemporary parallel processing systems, the building blocks of computing systems have become multi-dimensional. Traditional scheduling algorithms based on optimizing a single resource, such as the processor, fail to provide near-optimal solutions. The efficient use of new computing systems depends on the efficient use of all resource dimensions, so scheduling algorithms have to fully use all resources. In this paper, we propose a queuing mechanism based on a multi-resource scheduling technique. For that, we model multi-resource scheduling as a multi-capacity bin-packing problem at the queue level, reordering the queue in order to improve the packing and, as a result, the scheduling metrics. The experimental results demonstrate performance improvements in terms of wait-time and slowdown metrics.
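
The key step is to treat each job request and each node’s free capacity as vectors (CPU cores, memory, and so on) rather than as a single number. A toy Python sketch of reordering a queue with a multi-capacity best-fit rule (my own simplification, not the paper’s mechanism):

    # Toy multi-capacity (vector) best-fit queue reordering: prefer the waiting
    # job whose resource demand best matches the node's free capacity.
    # Illustrative only; the paper's mechanism is more elaborate.

    def fits(demand, free):
        return all(d <= f for d, f in zip(demand, free))

    def alignment(demand, free):
        """Dot product between job demand and free capacity: higher means the
        job uses the dimensions the node actually has to spare."""
        return sum(d * f for d, f in zip(demand, free))

    def reorder_queue(queue, free):
        """Feasible jobs first, those best aligned with the free capacity on top."""
        feasible = [j for j in queue if fits(j["demand"], free)]
        rest = [j for j in queue if not fits(j["demand"], free)]
        feasible.sort(key=lambda j: alignment(j["demand"], free), reverse=True)
        return feasible + rest

    # Demands and capacity given as (cores, GB of RAM); numbers are made up.
    queue = [{"id": 1, "demand": (8, 4)}, {"id": 2, "demand": (2, 16)},
             {"id": 3, "demand": (4, 8)}]
    print([j["id"] for j in reorder_queue(queue, free=(4, 16))])  # -> [2, 3, 1]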

 

J.L. Vázquez-Poletti