Challenges and Opportunities of Amazon Serverless Lambda Services in Bioinformatics

Our contribution to the International Workshop on Parallel and Cloud-based Bioinformatics and Biomedicine (ParBio), held in conjunction with the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB 2019), has just been published by the ACM Digital Library.

This is a result of our ongoing collaboration with the University Magna Graecia of Catanzaro (Italy).

Several factors are currently moving biomedical research towards a (big) data-centred science. This poses new challenges for computer science solutions that deal with bioinformatics applications. Among others, the efficient storage, preprocessing, integration and analysis of omics and clinical data result in a bottleneck in the analysis pipeline. This bottleneck may be addressed using cloud technology. This paper discusses the challenges and opportunities of deploying bioinformatics applications using the Amazon Serverless Lambda services. First experiments show that serverless computing is useful for this particular bioinformatics high-throughput application, because it simplifies resource management.

J.L. Vázquez-Poletti

Public Cloud Provisioning for Venus Express VMC Image Processing

Communications on Applied Mathematics and Computation has just published online our latest work on public cloud provisioning for a Space Exploration problem. This is the result of an ongoing collaboration with the Russian Space Research Institute (IKI). It can be accessed here.

In this paper, we consider the implementation of a cloud computing strategy to study data sets associated with the atmospheric exploration of the planet Venus. More concretely, the Venus Monitoring Camera (VMC) onboard the Venus Express orbiter provided the largest and longest set so far of ultraviolet (UV), visible and near-IR images for the investigation of the atmospheric circulation. To the best of our knowledge, this is the first time that the analysis of data from missions to Venus has been integrated in the context of cloud computing. The path and protocols followed can be extended to more general cases of space data analysis, and to the general framework of big data analysis.

J.L. Vázquez-Poletti

Serverless Computing: From Planet Mars to the Cloud

Computing in Science & Engineering has just published online our latest work on serverless computing for planet Mars applications. It can be accessed here.

Serverless computing is a new way of managing computations in the cloud. We show how it can be put to work for scientific data analysis. For this, we detail our serverless architecture for an application analyzing data from one of the instruments onboard the ESA Mars Express orbiter, and then, we compare it with a traditional server solution.

J.L. Vázquez-Poletti

NGScloud: RNA-seq analysis of non-model species using cloud computing

Bioinformatics has just published online our recent work on cloud computing tools for the Next-Generation Sequencing (NGS) area. This journal paper can be accessed here.

RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon, which give access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so that costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon’s hardware resources and to control an RNA-seq analysis workflow oriented to non-model species. It incorporates the cluster concept, which allows parallel runs of common RNA-seq analysis programs on several virtual machines for faster analysis.

NGScloud is freely available. A manual detailing installation and usage instructions is available with the distribution.

This work is part of the PhD work of Fernando Mora-Márquez, whom I’m currently co-advising.

J.L. Vázquez-Poletti

Modeling and Simulation of the Atmospheric Dust Dynamic: Fractional Calculus and Cloud Computing

The International Journal of Numerical Analysis and Modeling has just made available our latest work, which brings together fractional calculus and cloud computing to address one of the key challenges in Martian research. It can be accessed here.

Dust aerosols have an important effect on the solar radiation in the Martian atmosphere and on both surface and atmospheric heating rates, which are also basic drivers of atmospheric dynamics. Aerosols attenuate the solar radiation traversing the atmosphere, and this attenuation is modeled by the Lambert-Beer-Bouguer law, in which the aerosol optical thickness plays an important role. The aerosol optical thickness can be approximated through the Angstrom law, and this law allows the attenuation of the solar radiation traversing the atmosphere to be modeled by a fractional diffusion equation. The analytical solution is available in the case of one space dimension. When we extend the fractional diffusion equation to two or more space variables, large computations are needed to approximate the solutions numerically. In this case, a suitable strategy is to use cloud computing to carry out the simulations. We present an introduction to cloud computing applied to the fractional diffusion equation in one dimension.
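As a rough illustration of the two laws named above, the following Python sketch evaluates the Angstrom approximation of the aerosol optical thickness, tau = beta * lambda^(-alpha), and the Lambert-Beer-Bouguer attenuation, I = I0 * exp(-tau * m). It is not the fractional diffusion model from the paper. The function names, the air mass factor `m`, and all numeric values are assumptions chosen only for illustration.

```python
import math


def angstrom_optical_thickness(wavelength_um: float, beta: float, alpha: float) -> float:
    """Angstrom law: tau = beta * wavelength**(-alpha).

    wavelength_um is in micrometres, beta is the optical thickness at 1 um,
    and alpha is the Angstrom exponent.
    """
    return beta * wavelength_um ** (-alpha)


def transmitted_irradiance(i0: float, tau: float, air_mass: float = 1.0) -> float:
    """Lambert-Beer-Bouguer law: I = I0 * exp(-tau * air_mass)."""
    return i0 * math.exp(-tau * air_mass)


# Illustrative (assumed) values, not taken from the paper.
tau = angstrom_optical_thickness(0.5, beta=0.3, alpha=1.2)
irradiance = transmitted_irradiance(590.0, tau, air_mass=1.0)
print(f"optical thickness: {tau:.3f}, transmitted irradiance: {irradiance:.1f}")
```

With a positive Angstrom exponent, shorter wavelengths give a larger optical thickness and therefore stronger attenuation.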

J.L. Vázquez-Poletti

CloudMix: Generating Diverse and Reducible Workloads for Cloud Systems

Our latest contribution to the 10th IEEE International Conference on Cloud Computing (CLOUD 2017) is available online and can be accessed here.

The prosperity of cloud computing offers common infrastructures to a wide range of applications. Understanding these applications’ workload behaviors is the premise of designing, managing, and optimizing cloud systems. Considering the heterogeneity and diversity of cloud workloads, for the sake of fairness, cloud benchmarks must be able to accurately replicate their behaviors in cloud systems, including both the usages of cloud resources and the microarchitectural behaviors beyond the virtualization layer. Furthermore, workloads spanning long durations are usually required to achieve representativeness in evaluation. Hence the more challenging issue is to significantly reduce the evaluation duration while still preserving their workload characteristics.

This paper presents our efforts towards generating cloud workloads of diverse behaviors and reducible durations. Our benchmark tool, CloudMix, employs a repository of reducible workload blocks (RWBs) as the high level abstraction of workload behaviors, including usages of the two most important cloud resources (CPU and memory) and their pairing microarchitectural operations. CloudMix further introduces an efficient methodology to combine RWBs to synthesize and replicate diverse cloud workloads in real-world traces. The effectiveness of CloudMix is demonstrated by generating a variety of reducible workloads according to a Google cluster trace and by applying these workloads in job scheduling optimization on Hadoop YARN. The evaluation results show: (i) when the workload durations are reduced by 100 times, the replication errors of workload behaviors are smaller than 2.08%; (ii) when providing fast evaluations (workload durations are reduced by 10 to 100 times) to recommend the optimal setting in YARN job scheduling, the performance degradation in the recommended setting is just 0.69% compared to that of the actual optimal setting.

J.L. Vázquez-Poletti

Performance study of a signal-extraction algorithm using different parallelisation strategies for the Cherenkov Telescope Array’s real-time-analysis software

Concurrency and Computation: Practice and Experience has just published our latest work on parallelisation strategies in the context of the Cherenkov Telescope Array project. This is a result of an ongoing collaboration with CIEMAT (Spain) and INAF (Italy) and it can be accessed here.

In this work, a signal-extraction algorithm pertaining to the Cherenkov Telescope Array’s real-time-analysis pipeline has been parallelised using SSE, POSIX Threads and CUDA. Because of the observatory’s constraints, the online analysis has to be conducted on site, on hardware located at the telescopes, which compels a search for efficient computing solutions to handle the huge amount of measured data. This work is part of a series of studies that benchmark several algorithms of the real-time-analysis pipeline on different architectures to gain an insight into the suitability and performance of each platform.

J.L. Vázquez-Poletti

SaaS enabled admission control for MCMC simulation in cloud computing infrastructures

The Computer Physics Communications journal has just made available online our latest work on SaaS+PaaS architectures for service-driven computing. This is again the result of our collaboration with the Institute of Computing Technology from the Chinese Academy of Sciences and it can be accessed here.

Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source for these resources in the form of HPC. However, resource over-consumption can be an important drawback, especially if the cloud provision process is not appropriately optimized. In the present contribution we propose a two-level solution that, on the one hand, takes advantage of approximate computing to reduce the resource demand and, on the other, uses admission control policies to guarantee an optimal provision to running applications.

J.L. Vázquez-Poletti

RNA-seq Analysis in Forest Tree Species: Bioinformatic Problems and Solutions

The first results of an ongoing collaboration with the Forest Genetics and Ecophysiology Research Group from the Technical University of Madrid have just been published online by the Tree Genetics & Genomes journal. They can be accessed here.

Direct sequencing of RNA (RNA-seq) using next-generation sequencing platforms has allowed a growing number of gene expression studies focused on forest trees in the last 5 years. Bioinformatic analyses derived from RNA-seq of forest trees are particularly challenging, because the massive genome length (~20.1 Gbp for loblolly pine) and the absence of annotated reference genomes require specific bioinformatic pipelines to obtain sound biological results. In the present manuscript, we review common bioinformatic challenges that researchers need to consider when analyzing RNA-seq data from forest tree species, in light of the experience acquired from recent studies. Furthermore, we list bioinformatic pipelines and data processing software available to overcome RNA-seq limitations. Finally, we discuss the impact of novel computation solutions, such as the cloud computing paradigm, which allows RNA-seq analysis even for small research centers with limited resources.

J.L. Vázquez-Poletti

Synopsis-Based Approximate Request Processing for Low Latency and Small Correctness Loss in Cloud Online Services

The International Journal of Parallel Programming has just made available online our latest work on approximate request processing in cloud online services. This is the result of our collaboration with the Institute of Computing Technology from the Chinese Academy of Sciences and it can be accessed here.

Despite the importance of providing quick responsiveness to user requests for online services, such request processing is very resource expensive when dealing with large-scale service datasets. These costs often exceed the service providers’ budget when services are deployed on a cloud, in which resources are charged in monetary terms. Providing approximate results in request processing is a feasible solution to this problem, one that trades off result correctness (e.g. prediction or query accuracy) for response time reduction. However, existing techniques in this area either use parts of datasets or skip expensive computations to produce approximate results, thus resulting in large losses in result correctness on a tight resource budget. In this paper, we propose Synopsis-based Approximate Request Processing (SARP), a framework that produces approximate results with small correctness losses even when using a small amount of resources. To achieve this, SARP conducts computations over synopses, which aggregate the statistical information of the entire service dataset at different approximation levels, based on two key ideas: (1) offline synopsis management, which generates and maintains a set of synopses that represent the aggregation information of the dataset at different approximation levels; and (2) online synopsis selection, which considers both the current resource allocation and the workload status so as to select the synopsis with the maximal length that can be processed within the required response time. We demonstrate the effectiveness of our approach by testing the recommendation services in e-commerce sites using a large, real-world dataset. Using prediction accuracy as the result correctness metric, the results demonstrate: (i) SARP achieves significant response time reduction with very small correctness losses compared to the exact processing results; (ii) using the same processing time, SARP demonstrates a considerable reduction in correctness loss compared to existing approximation techniques.
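The online selection idea described above can be sketched in Python as picking the longest synopsis whose estimated processing time fits the response-time budget. This is only a minimal sketch under stated assumptions, not SARP's actual implementation. The `Synopsis` class, the `select_synopsis` function, and the single `records_per_ms` throughput figure (a stand-in for the current resource allocation and workload status) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Synopsis:
    """Hypothetical synopsis descriptor: one approximation level of the dataset."""
    level: int
    length: int  # number of aggregated records that must be processed


def select_synopsis(
    synopses: Sequence[Synopsis],
    records_per_ms: float,
    deadline_ms: float,
) -> Optional[Synopsis]:
    """Return the longest synopsis that fits the deadline, or None if none fits.

    records_per_ms is an assumed throughput estimate that summarises the
    current resource allocation and workload status.
    """
    feasible = [s for s in synopses if s.length / records_per_ms <= deadline_ms]
    return max(feasible, key=lambda s: s.length, default=None)


# Illustrative (assumed) synopses, from most to least approximate.
synopses = [Synopsis(0, 100), Synopsis(1, 1_000), Synopsis(2, 10_000)]
chosen = select_synopsis(synopses, records_per_ms=50.0, deadline_ms=100.0)
print(chosen)
```

In this sketch a longer synopsis is assumed to be less approximate, so choosing the longest feasible one keeps the correctness loss as small as the deadline allows.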

J.L. Vázquez-Poletti