The Grid Workloads Archive

The primary purpose of the Grid Workloads Archive is to provide (anonymized) workload traces from grid environments to researchers and to practitioners alike.


1. Motivation and Goals

Large-scale multi-site infrastructures (Grids) provide the computational support needed for e-Science. The current state of research on resource management in grids strikingly resembles that of parallel production environments a decade ago: surprisingly little is known about the real behavior of the studied systems, and most research results are based on empirically composed workloads. As in the world of parallel production environments, the evolution of research in grids is tightly connected to the existence and the quality of workload traces from real grid environments.

The goal of the Grid Workloads Archive is to provide a virtual meeting place where practitioners and researchers can exchange grid workload traces. To facilitate these exchanges, a standard Grid Workloads Format (GWF) must be provided. Based on GWF, a highly available, easily accessible database of existing grid workload traces must be created. For practitioners, tools that convert platform-specific workload trace formats into GWF data must be made available. Finally, links with similar efforts from related research communities (e.g., the Parallel Workloads Archive, PWA) must be established.

2. Approach

Our approach to building the Grid Workloads Archive is:

  • We will design a (plain-text) format that can be used equally well from an occasional script and from a relational database system. The GWF standard will provide a unified format for storing, using, and, most importantly, exchanging grid workload traces. For data reuse, community inter-connection, and administrative purposes, GWF will be designed as an extension of the Standard Workload Format used by the PWA. Note: A first version of the GWF is now available! Please send us your comments at gwa@tudelft.nl.
  • We will make tools available online for easy GWF parsing and for conversion from different grid log formats. Generic tools for statistical analysis of GWF data will also be provided.
  • To offer web-based access to the published grid traces, we will design, build, install, and maintain the Grid Workloads Archive's database.
  • We will build tools for the automatic submission of (anonymized) traces from grid environments.
  • We will create and maintain the Grid Workloads Archive's web site.
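
To illustrate the first point above (one plain-text trace that serves both a quick script and a relational database), here is a minimal sketch in Python. It is not the GWF specification: the five-column layout, the "#" comment prefix, the field names, and the sample data are all hypothetical and were chosen only for the example.

```python
import sqlite3

# Hypothetical column layout for illustration only; the real GWF field
# list is defined by the GWF specification, not by this sketch.
FIELDS = ("job_id", "submit_time", "wait_time", "run_time", "n_procs")


def parse_trace(text, comment_prefix="#"):
    """Yield one tuple of integers per non-comment, non-blank line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment_prefix):
            continue
        values = line.split()
        if len(values) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}"
            )
        yield tuple(int(v) for v in values)


def load_into_db(rows):
    """Load parsed rows into an in-memory SQLite table named 'jobs'."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{name} INTEGER" for name in FIELDS)
    conn.execute(f"CREATE TABLE jobs ({columns})")
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.executemany(f"INSERT INTO jobs VALUES ({placeholders})", rows)
    conn.commit()
    return conn


# Made-up sample trace (assumed layout, see FIELDS above).
SAMPLE = """\
# hypothetical trace: job_id submit_time wait_time run_time n_procs
1 0 5 120 4
2 30 0 60 1
3 45 12 300 16
"""

# Script-style use: iterate over the parsed lines directly.
rows = list(parse_trace(SAMPLE))

# Database-style use: the same rows, queried with SQL.
conn = load_into_db(rows)
total_cpu_seconds = conn.execute(
    "SELECT SUM(run_time * n_procs) FROM jobs"
).fetchone()[0]
```

The same parsed rows feed both uses, which is the property the plain-text design aims for; a real converter would take its column list from the published GWF definition instead of the FIELDS tuple assumed here.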

3. Timeline

  • Nov/2006 - Grid Workloads Archive's charter adoption
  • Dec/2006 - Grid Workloads Format (GWF) finished
  • Jan/2007 - Grid Workloads Archive web site online
  • Feb/2007 - Web-based interface to the Grid Workloads Archive (database) - first grid trace online
  • Dec/2007 - contact coordinators of major production grid environments - 10 grid traces online
  • Until 2010 - goals: 20 grid traces, automatic submission from several large-scale production grids

4. Contributing

How can I contribute to the Grid Workloads Archive?
If you have traces from your grid system, either complete or partial, please contact us at gwa@tudelft.nl. We will help you gather the traces, anonymize and process them, publish them with your consent, and give credit where it's due.

5. Team

Team members (alphabetical order):

 

 
