TOP500 lists computers ranked by their performance on the LINPACK Benchmark. It is clear that no single number can reflect the performance of a computer. LINPACK is, however, a representative benchmark for evaluating computing platforms as High Performance Computing (HPC) environments, that is, in the dedicated execution of a single tightly coupled parallel application. A High Throughput Computing (HTC) application, on the other hand, comprises the execution of a set of independent tasks, each of which usually performs the same calculation over a subset of parameter values. Although the HTC model is widely used in science, engineering and business, there is no representative benchmark and model to evaluate the performance of computing platforms as HTC environments. At first sight, it could be argued that there is no need for such a performance model. We agree on this for static and homogeneous systems. However, how can we evaluate a system consisting of heterogeneous and/or dynamic components?
Benchmarking of Grid infrastructures has always been a highly controversial area. The heterogeneity of the components and the high number of layers in the middleware stack make it difficult even to define the aim and scope of the benchmark. A couple of years ago we wrote a paper entitled “Benchmarking of High Throughput Computing Applications on Grids” (R. S. Montero, E. Huedo and I. M. Llorente) for the Parallel Computing Journal, presenting a pragmatic approach to evaluate the performance of a Grid infrastructure when running High Throughput Computing (HTC) applications. We demonstrated that the complexity of a whole Grid infrastructure can be represented by only two performance parameters, which can be used to compare infrastructures. The proposed performance model is independent of the middleware stack and valid for any computing infrastructure, so it is also applicable to the evaluation of clusters and HPC servers.
The Performance Model
Our proposal is to follow an approach similar to that used by Hockney and Jesshope to characterize the performance of homogeneous array architectures on vector computations. A first-order description of a Grid can be made by using the following formula for the number of tasks completed, n, as a function of time, t:
n(t) = R t - N
Note that given the heterogeneous nature of a Grid, the execution time of each task can differ greatly. So the following analysis is valid for general HTC applications, where each task may require distinct instruction streams. The coefficients of the line are called:
- Asymptotic performance (R): the maximum rate of performance in tasks executed per second. In the case of a homogeneous array of P processors with an execution time per task T, we have R = P/T.
- Half-performance length (N): the number of tasks required to obtain half of the asymptotic performance. This parameter is also a measure of the amount of parallelism in the system as seen by the application. In the homogeneous case, for an embarrassingly distributed application we obtain N = P/2.
The above linear relation can be used to define the performance of the system (tasks completed per second), r, on actual applications with a finite number of tasks, n:
r(n) = R / (1 + N/n)
so that r(N) = R/2 and r(n) approaches R as n grows.
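As a rough illustration of how the two parameters might be extracted from a benchmark run, the sketch below fits a straight line to task completion timestamps by ordinary least squares and then evaluates the finite-size throughput. It assumes the linear form n(t) = R t - N and the derived throughput r(n) = R / (1 + N/n). The function names and the way timestamps are collected are assumptions for illustration, not the procedure used in the paper.

```python
def fit_first_order_model(completion_times):
    """Least-squares fit of n(t) = R*t - N to task completion timestamps.

    completion_times: seconds since the start of the run at which each
    task finished (hypothetical input format).
    Returns (R, N): asymptotic performance in tasks/s and
    half-performance length in tasks.
    """
    times = sorted(completion_times)
    k = len(times)
    counts = range(1, k + 1)  # cumulative number of tasks done at each timestamp
    mean_t = sum(times) / k
    mean_n = sum(counts) / k
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tn = sum((t - mean_t) * (n - mean_n) for t, n in zip(times, counts))
    R = s_tn / s_tt            # slope of the fitted line
    N = R * mean_t - mean_n    # the intercept of the fitted line is -N
    return R, N


def throughput(n, R, N):
    """Throughput in tasks/s for a run of n tasks, assuming r(n) = R / (1 + N/n)."""
    return R / (1.0 + N / n)


# Homogeneous example: P = 8 processors, T = 10 s per task, 5 rounds of tasks.
P, T = 8, 10.0
example_times = [T * round_ for round_ in range(1, 6) for _ in range(P)]
R_fit, N_fit = fit_first_order_model(example_times)
```

For this homogeneous example the fitted slope equals P/T, and the fitted N lands close to P/2; the exact intercept depends on how the discrete task count is sampled.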
Interpretation of the Parameters of the Model
This linear model can be interpreted as an idealized representation of a heterogeneous Grid, equivalent to a homogeneous array of 2N processors with an execution time per task of 2N/R.
The half-performance length (N), on the other hand, provides a quantitative measure of the heterogeneity in a Grid. This result can be understood as follows: faster processors contribute to a higher degree to the performance obtained by the system. Therefore the apparent number of processors (2N), from the application’s point of view, will in general be lower than the total number of processors in the Grid (P). We can define the degree of heterogeneity (m) as 2N/P. This parameter varies from m = 1 in the homogeneous case to m = 0 when the actual number of processors in the Grid is much greater than the apparent number of processors (highly heterogeneous).
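The interpretation above reduces to simple arithmetic on R, N and P, sketched below. The function names and the numeric values are hypothetical and only illustrate the formulas 2N, 2N/R and m = 2N/P.

```python
def equivalent_homogeneous_array(R, N):
    """Return (processors, seconds_per_task) of the homogeneous array
    that the first-order model (R, N) is equivalent to."""
    processors = 2.0 * N
    seconds_per_task = 2.0 * N / R
    return processors, seconds_per_task


def heterogeneity_degree(N, total_processors):
    """Degree of heterogeneity m = 2N / P (1 means homogeneous)."""
    return 2.0 * N / total_processors


# Hypothetical Grid: R = 0.5 tasks/s, N = 10 tasks, P = 64 processors.
apparent_processors, task_time = equivalent_homogeneous_array(0.5, 10)
m = heterogeneity_degree(10, 64)
```

With these made-up numbers the Grid behaves like 20 identical processors taking 40 s per task, and m = 0.3125, so the application sees fewer than a third of the 64 processors.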
N is a useful characterization parameter for Grid infrastructures in the execution of HTC applications. For example, let us consider two different Grids with a similar asymptotic performance. In this case, by analogy with the homogeneous array, a lower N reflects a better performance (in terms of wall time) per Grid resource, since the same performance (in terms of throughput) is delivered by a smaller “number of processors”.
We propose the OGF DRMAA implementation of the ED benchmark in the NAS Grid Benchmark suite, with an appropriate scaling to stress the computational capabilities of the infrastructure, as the benchmark to apply the performance model. The ED benchmark comprises the execution of several independent tasks. Each one consists of the execution of the SP flow solver with a different initialization parameter for the flow field. This kind of HTC application can be directly expressed with the DRMAA interface as bulk jobs.
DRMAA represents a suitable and portable API to express distributed communicating jobs, like the NGB. In this sense, the use of standard interfaces allows the comparison between different Grid implementations, since neither NGB nor DRMAA is tied to any specific Grid infrastructure, middleware or tool. DRMAA implementations are available for the following resource management systems: Condor, LSF, Globus GridWay, Grid Engine and PBS.
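To give a feel for what a bulk-job submission looks like, here is a minimal sketch using the drmaa-python binding. It assumes that drmaa-python and a DRMAA library for the local resource manager are installed, and that `command` points to a hypothetical wrapper script that runs one SP solver task and reads its task index from the environment provided by the resource manager. It is not the benchmark implementation described in the paper.

```python
def submit_ed_benchmark(command, num_tasks):
    """Submit num_tasks independent tasks as one DRMAA bulk job and wait for them.

    command: path to a (hypothetical) wrapper script running one SP solver task.
    Returns the list of job identifiers assigned by the resource manager.
    """
    # Imported lazily: needs drmaa-python plus a DRMAA library at run time.
    import drmaa

    with drmaa.Session() as session:
        template = session.createJobTemplate()
        template.remoteCommand = command
        # One bulk submission covers task indices 1..num_tasks in steps of 1.
        job_ids = session.runBulkJobs(template, 1, num_tasks, 1)
        # Block until every task has finished; True disposes of the job records.
        session.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)
        session.deleteJobTemplate(template)
    return list(job_ids)
```

A call such as `submit_ed_benchmark("/path/to/run_sp_task.sh", 50)` (path and task count are placeholders) would only work on a host where a DRMAA-enabled resource manager is configured.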
In the paper we present both intrusive and non-intrusive methods to obtain the performance parameters. The lightweight non-intrusive probes provide continual information on the health of the Grid environment, and thus a way to measure the dynamic capacity of the Grid, which could eventually be used to generate global meta-scheduler strategies.
An Invitation to Action
We have demonstrated in several publications how the first-order model reflects the performance of complex infrastructures running HTC applications. So, why don’t we create a TOP500-like ranking of infrastructures? The ranking could be dynamic, obtaining the parameters with the non-intrusive probes. We have all the ingredients:
- A model representing the achieved performance by using only two parameters: asymptotic performance (R) and half-performance length (N)
- A benchmark representative of HTC applications: embarrassingly distributed test included in the NAS Grid Benchmark suite
- A standard to express the benchmark: OGF DRMAA
Ignacio Martín Llorente