Tag Archives: Grid

GridWay 5.10 is now available!

Dear GridWay users,

A new stable release of the GridWay Metascheduler is available for download. It comes with improved drivers for CREAM and BES, the possibility to perform an FHS-compliant installation, changes in the scheduler to enqueue jobs, and other minor features and bug fixes. You can find more information in the release notes and the documentation.

Again, these developments have been funded by the IGE project. Packages for the GridWay core and the different drivers will be included in IGE 2.1.

Try it and send us your feedback!

The GridWay team.

Request for Requirements from EU Globus Users

Dear GridWay users,

The Initiative for Globus in Europe (IGE) is an EU project aimed at meeting the needs of the European Globus Community. As such, we are very interested in hearing about any requirements that you may have regarding Globus technologies, so we can consider them in our future development cycles. We’re here to help you, so if you need something from Globus, this is your chance to let us know!

If you’re aware of others using Globus (perhaps you know they have requirements?), please feel free to forward this message accordingly.

These requirements may originate within your projects or within your wider research communities, and might include (but are not limited to) requests for new features, enhancements of existing features or requests for bug fixes.

If you have any requirements for Globus feel free to register them through our Community Requirements page at http://www.ige-project.eu/hub/rt, or alternatively, email support@ige-project.eu. In either case we’ll get back to you.

We look forward to hearing from you!

Best Regards,

The Initiative for Globus in Europe

GridWay 5.8 released!

A new stable release of the GridWay Metascheduler is available for download.

This release comes with a new installation procedure, improvements in job polling, logging with syslog, forwarding of resource requirements to LRMSs, automatic disposal of jobs, a new execution driver for CREAM (Computing Resource Execution And Management) as used in gLite, a technology preview of an execution driver for OGSA-BES (Open Grid Services Architecture-Basic Execution Service) and, as always, some bug fixing. You can find more information in the release notes.

This stable release is based on a previous development one, 5.7, which has been thoroughly tested. It has been funded by the IGE project (Initiative for Globus in Europe), which aims to support and enhance the provision of Globus Grid software to European e-Infrastructures, like EGI (European Grid Infrastructure). Packages for the GridWay core and its GT5 drivers are included in the IGE Tech Preview.

[Image: IGE logo]

Enjoy it! Your feedback is more than welcome.

The GridWay team.

HPC, Grid and Cloud Computing in Cetraro

I am attending the International Advanced Research Workshop on High Performance Computing and Grids in Cetraro (Italy). This is the 9th edition of the workshop, organized by Prof. Lucio Grandinetti. I have to say the venue of the workshop, the Grand Hotel San Michele, is just perfect. The panel of speakers includes representatives of the most relevant Grid and HPC research initiatives and technologies around the world. The abstracts of the presentations are available online at the workshop site.

Cloud Computing for on-Demand Resource Provisioning

This is the title of the talk that I gave at the Workshop. The aim of the presentation was to show the benefits of separating resource provisioning from job execution management in different deployment scenarios. Within an organization, the incorporation of a new virtualization layer under existing Cluster and HPC middleware stacks decouples the execution of the computing services from the physical infrastructure. The dynamic execution of worker nodes on virtual resources, supported by virtual machine managers such as the OpenNebula Virtual Infrastructure Engine, provides multiple benefits, such as cluster consolidation, cluster partitioning and heterogeneous workload execution. When the computing platform is part of a Grid infrastructure, this approach additionally provides generic execution support, allowing Grid sites to dynamically adapt to changing VO demands and thus overcoming many of the obstacles to Grid adoption.

The previous scenario can be modified so that the computing services are executed on a remote virtual infrastructure. This is the resource provisioning paradigm implemented by some commercial and scientific infrastructure Cloud computing solutions, such as Globus VWS or Amazon EC2, which provide remote interfaces for the control and monitoring of virtual resources. In this way a computing platform could scale out using resources provided on demand by a provider, supplementing local physical computing services to satisfy peak or unusual demands. Cloud interfaces can also support the federation of virtualization infrastructures, allowing virtual machine managers to access resources from remote resource providers or Cloud systems in order to meet fluctuating demands. The OpenNebula Virtual Infrastructure Engine is being enhanced to access on-demand resources from EC2 and Globus-based clouds. This scenario is being studied in the context of RESERVOIR (Resources and Services Virtualization without Barriers), an EU-funded initiative.
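
To make the scale-out idea concrete, this is roughly the kind of call a virtual machine manager would issue to obtain an on-demand worker node from Amazon EC2. The sketch uses today's boto3 SDK, and the region, AMI id and instance type are placeholder assumptions:

    import boto3

    # Request a single on-demand instance to act as a virtual worker node.
    # The AMI id and instance type below are placeholders, not real values.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # image with the worker-node stack
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=1,
    )
    print("Launched instance:", response["Instances"][0]["InstanceId"])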

Download the slides

Towards a New Model for the Infrastructure Grid

This is the title of my contribution to the Panel “From Grids to Cloud Services”, chaired by Charlie Catlett, at the Workshop. The aim of the presentation was to open the discussion on the future of compute Grid infrastructures, moving from infrastructures for the sharing of basic resource services to infrastructures for the sharing of hardware resources. A widely distributed virtual infrastructure, inspired by the federation of cloud systems as providers of virtualized resources (hardware) as a service, would not require end users to learn new interfaces or port their applications to the expected runtime environment. The sharing of resources would be performed at the resource level, so local job managers could scale out to partner or commercial clouds transparently to end users. This new model provides additional benefits, such as support for any service and seamless integration with any service middleware stack, at the cost of the virtualization overhead in the execution of the jobs.

It was very interesting to share this position on cloud computing with other researchers from the Grid and HPC fields. So the question is: are the existing compute Grid infrastructures going to evolve into Grids of Clouds? In other words, which model is better for end users and site administrators: sharing basic infrastructure services or sharing the physical infrastructure?

Download the slides

Ignacio Martín Llorente

A Virtual Infrastructure Layer for Cluster and Grid Computing

Cluster administrators, and so Grid site administrators, have to deal with the following requirements when configuring and scaling their infrastructure:

  • Heterogeneous configuration demands. Users often require specific versions of different software components (e.g. operating system, libraries or post-processing utilities). The cost of the installation, configuration and maintenance of user-specific or VO-specific worker nodes limits the flexibility of the infrastructure.
  • Performance partitioning. Most computing infrastructures do not allow administrators to isolate and partition the performance of the physical resources they devote to different computing clusters or Grid infrastructures. This limits the quality of service and reliability of current computing platforms, hindering a wider adoption of the Grid paradigm.

In order to overcome these challenges, we propose a new virtualization layer between the service and the physical infrastructure layers, which seamlessly integrates with existing Grid and cluster middleware stacks. The new virtualization layer extends the benefits of VMMs (Virtual Machine Monitors) from a single physical resource to a cluster of resources, decoupling a server not only from the physical infrastructure but also from its physical location. In the particular case of computing clusters, this new layer supports the dynamic execution of computing services (worker nodes) from different computing clusters on a single physical cluster.

[Figure: layers.png]

OpenNebula is the name of a new open-source technology that transforms a physical infrastructure into a virtual infrastructure by dynamically overlaying VMs on physical resources. Computing services, such as worker nodes managed by existing LRMs (Local Resource Managers) like SGE, Condor or OpenPBS, can be executed on top of the virtual infrastructure, allowing a physical cluster to dynamically execute multiple virtual clusters.

The separation of resource provisioning, managed by OpenNebula, from job execution management, managed by existing LRMs, provides the following benefits:

  • Cluster consolidation, because multiple virtual worker nodes can run on a single physical resource, reducing the number of physical systems and so the space, administration, power and cooling requirements. The allocation of physical resources to virtual nodes can be dynamic, depending on their computing demands, by leveraging the migration functionality provided by existing VMMs
  • Cluster partitioning, because the physical resources of a cluster can be used to execute virtual worker nodes bound to different virtual clusters
  • Support for heterogeneous workloads with multiple (even conflicting) software requirements, allowing the execution of software with strict requirements, such as jobs that will only run with a specific version of a library, or legacy applications

Consequently, this approach provides the flexibility required to allow Grid sites to execute on-demand VO-specific worker nodes and to isolate and partition their physical resources. Additionally, the architecture offers other benefits to the administrator of the cluster, such as high availability, support for planned maintenance and changing capacity availability, performance partitioning, and protection against malicious use of resources.

The idea of a virtual infrastructure that dynamically manages the execution of VMs on physical resources is not new. Several proprietary VM management solutions exist to simplify the use of virtualization, providing the enterprise with the potential benefits this technology may offer. Examples of products for the centralized management of the life cycle of a VM workload on a pool of physical resources are Platform VM Orchestrator, IBM Virtualization Manager, Novell ZENworks, VMware Virtual Center, and HP VMManager.

The OpenNebula Virtual Infrastructure Engine differs from those VM management systems in its highly modular and open architecture, designed to meet the requirements of cluster administrators. The OpenNebula Engine provides a command line interface for monitoring and controlling VMs and physical resources quite similar to that provided by well-known LRMs. This interface allows its integration with third-party tools, such as LRMs, service adapters and VM image managers, to provide a complete solution for the deployment of flexible and efficient computing clusters. Decoupling the service layer from the infrastructure layer allows a straightforward extension of the previous idea to any kind of service. In this way any physical infrastructure can be transformed into a very effective provisioning platform.
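
As a minimal sketch of how such an engine can be driven by third-party tools, the snippet below allocates a virtual worker node through the XML-RPC interface of current OpenNebula releases (the one.vm.allocate call). The endpoint, credentials and VM template are illustrative assumptions, and the early Technology Preview exposed a different interface:

    import xmlrpc.client

    # Assumption: an OpenNebula front-end listening on the default XML-RPC port.
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")
    session = "oneadmin:password"  # placeholder credentials

    # A virtual worker node for an SGE cluster, described as a VM template.
    template = """
    NAME   = "sge-worker-1"
    CPU    = 1
    MEMORY = 1024
    DISK   = [ IMAGE = "sge-worker" ]
    """

    # one.vm.allocate returns [success, vm_id_or_error_message, ...].
    response = server.one.vm.allocate(session, template, False)
    if response[0]:
        print("Virtual worker node allocated with VM id", response[1])
    else:
        print("Allocation failed:", response[1])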

A Technology Preview of OpenNebula is available for download under the terms of the Apache License, version 2.0.

Ignacio Martín Llorente

A Method for Ranking High Throughput Computing Environments

TOP500 lists computers ranked by their performance on the LINPACK benchmark. It is clear that no single number can reflect the performance of a computer. LINPACK is, however, a representative benchmark for evaluating computing platforms as High Performance Computing (HPC) environments, that is, in the dedicated execution of a single tightly coupled parallel application. On the other hand, an HTC application comprises the execution of a set of independent tasks, each of which usually performs the same calculation over a subset of parameter values. Although the HTC model is widely used in Science, Engineering and Business, there is no representative benchmark or model to evaluate the performance of computing platforms as HTC environments. At first sight, it could be argued that there is no need for such a performance model. We agree with this for static and homogeneous systems. However, how can we evaluate a system consisting of heterogeneous and/or dynamic components?

Benchmarking of Grid infrastructures has always been a highly contentious area. The heterogeneity of the components and the high number of layers in the middleware stack make it difficult even to define the aim and scope of a benchmark. A couple of years ago we wrote a paper entitled “Benchmarking of High Throughput Computing Applications on Grids” (R. S. Montero, E. Huedo and I. M. Llorente) for the Parallel Computing journal, presenting a pragmatic approach to evaluate the performance of a Grid infrastructure when running High Throughput Computing (HTC) applications. We demonstrated that the complexity of a whole Grid infrastructure can be represented by only two performance parameters, which can be used to compare infrastructures. The proposed performance model is independent of the middleware stack and valid for any computing infrastructure, so it is also applicable to the evaluation of clusters and HPC servers.

The Performance Model

Our proposal is to follow an approach similar to that used by Hockney and Jesshope to characterize the performance of homogeneous array architectures on vector computations. A first-order description of a Grid can be made by using the following formula for the number of tasks completed as a function of time:

n(t) = R·t - N

Note that, given the heterogeneous nature of a Grid, the execution time of each task can differ greatly. The following analysis is therefore valid for general HTC applications, where each task may require a distinct instruction stream. The coefficients of the line are called:

  • Asymptotic performance (R): the maximum rate of performance, in tasks executed per second. In the case of a homogeneous array of P processors with an execution time per task T, we have R = P/T.
  • Half-performance length (N): the number of tasks required to obtain half of the asymptotic performance. This parameter is also a measure of the amount of parallelism in the system as seen by the application. In the homogeneous case, for an embarrassingly distributed application, we obtain N = P/2.

The above linear relation can be used to define the performance of the system (tasks completed per second) on actual applications with a finite number of tasks:

r(n) = R / (1 + N/n)

[Figure: graph.png]
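
As a quick illustration of how R and N can be obtained in practice, the sketch below fits the linear model to the completion times of a batch of tasks and then evaluates r(n). The timings are made-up illustrative numbers, and NumPy is the only assumption:

    import numpy as np

    # Illustrative data: wall-clock time (s) at which each task finished.
    completion_times = np.array([12.0, 15.1, 19.8, 23.0, 27.4, 30.9, 35.2, 39.6])

    t = np.sort(completion_times)
    n = np.arange(1, len(t) + 1)  # n(t): tasks completed by time t

    # Least-squares fit of n(t) = R*t - N: the slope is R, the intercept is -N.
    R, intercept = np.polyfit(t, n, 1)
    N = -intercept

    def r(n_tasks):
        # Performance (tasks/s) for an application with n_tasks tasks.
        return R / (1 + N / n_tasks)

    print(f"R = {R:.3f} tasks/s, N = {N:.2f} tasks, r(10) = {r(10):.3f} tasks/s")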

Interpretation of the Parameters of the Model

This linear model can be interpreted as an idealized representation of a heterogeneous Grid, equivalent to a homogeneous array of 2N processors with an execution time per task of 2N/R.

[Figure: equivalencia-grid-homogeneo.jpg]

The half-performance length (N), on the other hand, provides a quantitative measure of the heterogeneity of a Grid. This result can be understood as follows: faster processors contribute to a greater degree to the performance obtained by the system. Therefore the apparent number of processors (2N), from the application’s point of view, will in general be lower than the total number of processors in the Grid (P). We can define the degree of heterogeneity (m) as 2N/P. This parameter varies from m = 1 in the homogeneous case to m = 0 when the actual number of processors in the Grid is much greater than the apparent number of processors (highly heterogeneous). For example, a Grid with P = 100 processors and a fitted half-performance length of N = 20 behaves, from the application’s point of view, like a homogeneous array of 40 processors, giving m = 0.4.

N is a useful characterization parameter for Grid infrastructures in the execution of HTC applications. For example, let us consider two different Grids with a similar asymptotic performance. In this case, by analogy with the homogeneous array, a lower N reflects a better performance (in terms of wall time) per Grid resource, since the same performance (in terms of throughput) is delivered by a smaller “number of processors”.

[Figure: comparacion-de-infraestructuras.jpg]

The Benchmark

We propose the OGF DRMAA implementation of the ED benchmark from the NAS Grid Benchmark suite, with an appropriate scaling to stress the computational capabilities of the infrastructure, as the benchmark for applying the performance model. The ED benchmark comprises the execution of several independent tasks, each of which consists of the execution of the SP flow solver with a different initialization parameter for the flow field. This kind of HTC application can be directly expressed through the DRMAA interface as bulk jobs.

DRMAA represents a suitable and portable API to express distributed, communicating jobs, like the NGB. In this sense, the use of standard interfaces allows the comparison between different Grid implementations, since neither NGB nor DRMAA is tied to any specific Grid infrastructure, middleware or tool. DRMAA implementations are available for the following resource management systems: Condor, LSF, Globus GridWay, Grid Engine and PBS.

[Figure: sp-drmaa.png]
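
A minimal sketch of such a bulk submission, using the drmaa Python binding; the solver binary name and the number of tasks are illustrative assumptions:

    import drmaa

    TASKS = 9  # illustrative: one SP run per initialization parameter

    s = drmaa.Session()
    s.initialize()
    jt = s.createJobTemplate()
    jt.remoteCommand = "./sp-solver"  # hypothetical SP flow solver binary
    # Each bulk task receives its parametric index as its argument.
    jt.args = [drmaa.JobTemplate.PARAMETRIC_INDEX]
    job_ids = s.runBulkJobs(jt, 1, TASKS, 1)  # indices 1..TASKS, step 1
    s.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)
    s.deleteJobTemplate(jt)
    s.exit()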

In the paper we present both intrusive and non-intrusive methods to obtain the performance parameters. The lightweight non-intrusive probes provide continual information on the health of the Grid environment, and so a way to measure the dynamic capacity of the Grid, which could eventually be used to generate global metascheduler strategies.
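
As a rough illustration of the non-intrusive approach, a monitor can periodically launch a small batch of lightweight probe tasks and refit the model, tracking R and N over time. The probe timings below are simulated, not real measurements:

    import numpy as np

    def fit_model(completion_times):
        # Least-squares fit of n(t) = R*t - N to ordered completion times.
        t = np.sort(np.asarray(completion_times, dtype=float))
        n = np.arange(1, len(t) + 1)
        R, intercept = np.polyfit(t, n, 1)
        return R, -intercept

    # Simulated completion times (s) of two probe batches; a real probe
    # would collect these from lightweight jobs submitted through DRMAA.
    probe_batches = [
        [11.8, 15.2, 19.9, 24.1, 28.0],
        [13.0, 17.5, 22.9, 28.2, 33.6],  # the Grid slowed down between probes
    ]
    for i, batch in enumerate(probe_batches):
        R, N = fit_model(batch)
        print(f"probe {i}: R = {R:.3f} tasks/s, N = {N:.2f}")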

An Invitation to Action

We have demonstrated in several publications how this first-order model reflects the performance of complex infrastructures running HTC applications. So, why don’t we create a TOP500-like ranking of infrastructures? The ranking could be dynamic, obtaining the parameters with the non-intrusive probes. We have all the ingredients:

  • A model representing the achieved performance by using only two parameters: asymptotic performance (R) and half-performance length (N)
  • A benchmark representative of HTC applications: the embarrassingly distributed (ED) test included in the NAS Grid Benchmark suite
  • A standard to express the benchmark: OGF DRMAA

Ignacio Martín Llorente


Grid Gateways: Building Utility Computing Infrastructures with Globus

While research institutions are interested in Partner Grids that provide access to higher computing performance to satisfy peak demands and to support collaborative projects, enterprises understand grid computing as a way to address the changing service needs of an organization. They are interested in in-house resource sharing, to achieve a better return from their information technology investment, supplemented by outsourced resources to satisfy peak or unusual demands. An Outsourced/Utility Grid would provide pay-per-use computational power when Enterprise Grid resources are overloaded. Such a hierarchical grid organization may be extended recursively to federate a higher number of Partner or Outsourced Grid infrastructures with consumer/provider relationships. This would allow supplying resources on demand, making resource provision more agile and adaptive. It would offer, therefore, access to a potentially unlimited computational capacity, transforming IT costs from fixed to variable.

[Figure: fetch.png]

In the context of the GridWay project we have developed a Grid Gateway that exposes a WSRF interface to a metascheduling instance, enabling the creation of hierarchical grid structures. GridGateWay consists of a set of Globus services hosting a GridWay Metascheduler, thus providing a uniform, standard interface for the secure and reliable submission, monitoring and control of jobs. Most functionality is provided through GRAM (Grid Resource Allocation and Management), while scheduling information is provided through MDS (Monitoring and Discovery Service). Security at the user level is addressed by GSI (Grid Security Infrastructure).

The new technology allows different layers of metaschedulers to be arranged in a hierarchical structure. In this arrangement, each target grid is handled as just another resource; that is, the underlying grid is characterized as a single resource in the source grid by means of grid gateways. This strategy encourages companies to federate their grids in order to obtain a better return on IT investment and to satisfy peak demands for computation. Furthermore, this solution allows for gradual deployment (from fully in-house to fully outsourced), in order to deal with the obstacles to grid technology adoption, such as enterprise scepticism and IT staff resistance.

This approach also provides the components required for interoperability between existing Grid infrastructures. It is clear that we can’t wait for a single global grid to arise or become predominant. Instead, we should work to build a seamless integration of the existing grids, which may eventually constitute the ultimate, capital-letter Grid, Grid of grids, or InterGrid, in the same way that the Internet was born. Grid interoperability can be achieved by means of common, ideally standard, grid interfaces, whose existence is an important (if not essential) characteristic of grid systems. Unfortunately, common interfaces (and even less often standard ones) are not always available for given services. In that case, the use of grid adapters and gateways becomes necessary. In particular, an interoperability solution based on grid gateways provides the infrastructures with significant benefits in terms of autonomy, scalability, deployment and security.

Well, what are you waiting for? The components are open source, the license is Apache v2.0, and we are willing to collaborate with you.

Ignacio Martin Llorente