Organizations Developing Grid Computing Toolkits and the Framework

     

Successful adoption of Grid Computing requires an adequate infrastructure, security services, key services, applications, and portals. Let us now explore and identify some of the most prominent organizations responsible for the toolkits, middleware, and frameworks for Grid Computing.

Globus

The Globus [4] project is a multi-institutional research effort to create a basic infrastructure and high-level services for a computational grid. A computational grid is defined as hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities (Foster & Kesselman, 1998). Globus has since evolved into an infrastructure for resource sharing (hardware, software, applications, and so on) among heterogeneous virtual organizations. These grids increase the average and peak computational performance available to important applications, regardless of the spatial distribution of both resources and users.

The details of the Globus infrastructure shown in Figure 2.3 are based on the latest Globus release, GT3. Globus provides a layered software architecture in which a low-level infrastructure hosts the high-level services defined for the grid. These high-level services address resource discovery, allocation, monitoring, management, security, data management, and access. The lower-layer infrastructure (the GT3 Core) provides the framework that hosts the high-level services.

Figure 2.3. Globus GT3 middleware, core, and high-level services present a wide variety of capabilities.


Some of the core high-level services included with the existing Globus Toolkit are discussed below.

Globus Resource Allocation Manager (GRAM)

GRAM provides resource allocation, process creation, monitoring, and management services. GRAM simplifies the use of remote systems by providing a single standard interface for requesting and using remote system resources for the execution of "jobs." The most common use of GRAM is as a remote job submission and control facility. However, GRAM does not provide job scheduling or resource brokering capabilities; job scheduling facilities are normally provided by the local system. GRAM uses a high-level Resource Specification Language (RSL) to specify the commands, and maps these requests onto local schedulers and computers.
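
As a concrete illustration, a classic (pre-Web-service) RSL request might look like the following. This is a minimal sketch; the exact attribute set depends on the GRAM version and the local scheduler, and the paths shown are hypothetical:

    & (executable = /bin/date)
      (count = 1)
      (directory = /tmp)
      (stdout = /tmp/date.out)
      (stderr = /tmp/date.err)

GRAM translates such a request into the native submission format of whatever scheduler manages the target machine.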

Grid Security Infrastructure (GSI)

GSI provides a single sign-on, run-anywhere authentication service, with support for local control over access rights and mapping from global to local user identities. While keeping the existing GSI mechanisms, the current GSI3 standard aligns with the Web service security standards by defining a GSI profile for WS-Security. [5]
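
In practice, single sign-on looks roughly like the following session with the classic Globus command-line tools. This is a sketch: the host name is hypothetical, and the commands shown are from the GT2-era client tooling:

    % grid-proxy-init                                # one pass-phrase prompt creates a short-lived proxy credential
    % globus-job-run grid.example.org /bin/hostname  # later requests authenticate with the proxy; no further prompts

The proxy credential is what lets a user "run anywhere" without re-authenticating to every individual resource.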

Information Services

A GT3 Information Service provides information about grid resources for use in resource discovery, selection, and optimization.

The Monitoring and Discovery Service (MDS) is an extensible grid information service that combines data discovery mechanisms with the Lightweight Directory Access Protocol (LDAP). The MDS provides a uniform framework for providing and accessing system configuration and status information, such as computer server configuration, network status, or the locations of replicated datasets. The current GT3 framework merges the MDS with an XML data framework for better integration with existing Web services and OGSA.
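
Because the LDAP-based MDS speaks a standard protocol, it can be queried with an ordinary LDAP client. A hedged sketch (the host is hypothetical; port and base DN vary by deployment, though port 2135 and the base DN shown were common MDS-2 defaults):

    % ldapsearch -x -h grid.example.org -p 2135 \
        -b "mds-vo-name=local, o=grid" "(objectclass=*)"

Such a query returns the configuration and status entries published by the information providers on that server.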

The latest Globus Toolkit (GT3) is a Java implementation of the OGSI specification. The discussion of the architecture and programming model of the GT3 infrastructure software, and the details of its high-level services, is deferred to the last section of this book.

Legion

Legion, [6] a middleware project initiated by the University of Virginia, is object-based metasystem software for grid applications. The goal of the Legion project is to promote the principled design of distributed system software by providing standard object representations for processors, data systems, file systems, and so on. Legion applications are developed in terms of these standard objects. Groups of users can construct a shared virtual workspace to collaborate on research and exchange information.

Figure 2.4 shows the architecture of a Legion system. Legion sits on top of the user's operating system and acts as a mediator between its own host(s) and other required resources. Legion's scheduling and security policies act on behalf of the user in undertaking time-consuming negotiations with outside systems and system administrators. To allow users to take advantage of a wide range of possible resources, Legion offers a user-controlled naming system called context space, so that users can easily create and use objects in distributed systems.

Figure 2.4. Legion application architecture.


An Interface Definition Language (IDL) is defined to describe the method signatures (name, parameters, and return values) supported by the object interface. These objects provide a scalable persistence mechanism by storing inactive objects (objects in an "inert" state) in secondary storage.

Some of the important characteristics of Legion systems are summarized below.

Everything is an object

In a Legion system, Legion objects represent a variety of hardware and software resources, and they respond to member function invocations from other objects in the system. Legion defines the message format and high-level protocol for object interaction (through the IDL), but not the programming language or the communications protocol.

Classes manage their own instances

Every Legion object is defined and managed by its class object. Class objects are given system-level responsibility; classes create new instances, schedule them for execution, activate and deactivate them, and provide information about their current location to client objects. Classes whose instances are themselves classes are called metaclasses.
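
The division of labor between a class object and its instances can be sketched in Java. This is an illustrative model only, not the actual Legion API; every name below is hypothetical:

    // Hypothetical sketch of a Legion-style class object (metaclass).
    // The class object carries the system-level duties for its instances:
    // creation, activation/deactivation, and location lookup.
    interface LegionClassObject {
        LegionObjectRef createInstance();             // create and schedule a new instance
        void activate(LegionObjectRef ref);           // restore an inert instance from storage
        void deactivate(LegionObjectRef ref);         // save state and release resources
        String currentLocation(LegionObjectRef ref);  // report where the instance is running
    }

    // Opaque handle through which clients invoke members on a (possibly remote) object.
    interface LegionObjectRef {
        Object invoke(String method, Object... args);
    }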

Users can provide their own classes

Legion allows its users to define and build their own "class" objects. This gives Legion programmers a flexible architecture model: their "metaclasses" can determine, and even change, the system-level mechanisms of their objects.

Core objects implement common services

Legion defines the interface and basic functionality of a set of core object types that support basic system services, such as naming, binding, object creation, activation, deactivation, and deletion.

Some of the core objects defined by the Legion system are:

  • Host objects: Abstractions of processing resources, which may represent a single processor or multiple hosts and processors

  • Vault objects: Provide persistent storage for the scalable persistence of objects

  • Binding objects: Map object IDs to physical addresses

  • Implementation objects: Allow Legion objects to run as processes in the system; they contain the machine code that is executed on a request to create or activate the object

Figure 2.5 shows Legion object A with its class object (metaclass) and the corresponding basic system services.

Figure 2.5. Legion core object and relationship.


In 1997, the first Legion toolkit was released, and in the following year, Applied Metacomputing (later relaunched as Avaki Corporation) was established to exploit the toolkit for commercial purposes.

Condor and Condor-G

Condor [7] is a tool for harnessing the capacity of idle workstations for computational tasks. Condor is well suited for parameter studies and high-throughput computing, where jobs generally do not need to communicate with each other.

We can classify Condor as a specialized workload management system for computation-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Upon receiving serial or parallel jobs from the user, the Condor system places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

We can make use of Condor to manage a cluster of dedicated compute nodes. It is suitable for effectively harnessing the CPU power from idle workstations. Condor has mechanisms for matching resource requests (jobs) with resource offers (machines).
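
For example, a job reaches Condor through a submit description file. The sketch below uses standard condor_submit keywords, but the program and file names are hypothetical:

    # sim.submit -- job description for condor_submit
    universe   = vanilla
    executable = my_sim
    arguments  = --trial 42
    output     = sim.out
    error      = sim.err
    log        = sim.log
    queue

Submitting this file with "condor_submit sim.submit" places the job in the queue; Condor then matches the request against the machines currently offering resources and records the job's progress in the log file.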

While the Condor software tools focus on harnessing the power of opportunistic and dedicated resources, Condor-G is a derivative software system that leverages software from both Condor and Globus, with a major focus on job management services for grid applications. It combines the interdomain resource management protocols of Globus (GRAM, Index Services) with the intradomain resource management methods of Condor. Figure 2.6 shows a sample usage of Condor-G in combination with Globus. As shown, Condor-G contains a GASS server, which is used to transfer jobs to and from the execution center. The Condor-G grid manager uses GRAM to obtain job progress information from the Globus gatekeeper.

Figure 2.6. Remote execution of Condor-G on Globus-managed resource using Globus Job manager.

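In Condor-G, the same submit-file mechanism targets a Globus-managed resource. A hedged sketch (the "globus" universe and the globusscheduler attribute follow the classic Condor-G syntax; the host name is hypothetical):

    universe        = globus
    globusscheduler = grid.example.org/jobmanager
    executable      = my_sim
    output          = sim.out
    log             = sim.log
    queue

The grid manager then submits the job through GRAM to the remote gatekeeper rather than to a local Condor pool.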

Condor software is used by both scientific and commercial organizations. Major scientific initiatives that use Condor include the NSF Middleware Initiative (NMI), the Grid Physics Network (GriPhyN), the International Virtual Data Grid Laboratory (iVDGL), TeraGrid, and so on. Prominent commercial uses of Condor software involve solving computational Grid Computing problems, as done by Micron Technologies, CORE Digital Pictures, and the NUG30 optimization problem solver.

Nimrod

Nimrod [8] provides a user interface for describing "parameter sweep" problems, with the resulting independent jobs being submitted to a resource management system.

Nimrod-G is a derivative software system that combines Nimrod and Globus software to harness multidomain resources as if they all belong to one personal domain. It provides a simple declarative parametric language for expressing the parameters for execution. The system introduces novel resource management and job scheduling algorithms based on the economic principles of computing; the corresponding set of resource trading services is called GRACE (Grid Architecture for Computational Economy). GRACE provides mechanisms to negotiate QoS parameters, deadlines, and computational costs, and it offers incentives for relaxing requirements. Depending on users' QoS requirements, these resource brokers dynamically lease grid services at runtime based on their cost, quality, and availability.
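
A parameter sweep in Nimrod is expressed in a declarative "plan" file. The following is a minimal sketch in the spirit of that language; the exact syntax varies by version, and the program and file names are hypothetical:

    parameter angle float range from 0 to 10 step 2;
    parameter speed float range from 100 to 300 step 100;

    task main
        copy model.exe node:.
        node:execute ./model.exe $angle $speed
        copy node:output.dat results/output.$jobname
    endtask

This two-parameter plan expands into 6 x 3 = 18 independent jobs, one for each point in the parameter space.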

The Nimrod-G toolkit and resource broker are developed by leveraging the services provided by grid middleware systems such as Globus, Legion, and GRACE.

As illustrated in Figure 2.7, the Nimrod architecture defines the following components:

  1. Nimrod-G clients, which can provide tools for creating parameter sweep applications, steering and control monitors, and customized end-user applications and GUIs

  2. The Nimrod-G resource broker, which consists of a task-farming engine (TFE); a scheduler that performs resource discovery, trading, and scheduling; a dispatcher and actuator; and agents for managing the jobs on the resource

Figure 2.7. Architecture of Nimrod-G.


It is important to note that the Nimrod-G broker provides its services by leveraging the grid middleware systems including Globus, Legion, Condor, and so on.

As we have previously discussed, the core feature of the Nimrod-G toolkit is its support for user-defined deadlines, for example: "Get this simulation done in 10 minutes with a budget of USD $200." Budget-constrained scheduling optimization is likewise part of the core feature set.

Nimrod-G facilitates the execution of the user requirement by managing supply and demand of resources in the grid using a set of resource trading services.

The most important scheduling algorithms used in Nimrod-G are:

  • Cost optimization: uses the cheapest resource

  • Time optimization: results in parallel execution of the job

  • Cost-time optimization: similar to cost optimization, but if there are multiple resources with the same cost, then the time factor is taken into consideration

  • Conservative time strategy: similar to time optimization, but guarantees that each unprocessed job has a minimum budget per job
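
To make the cost-optimization strategy concrete, here is a minimal Java sketch of the underlying idea. The types and the selection rule are hypothetical simplifications; the real Nimrod-G scheduler also trades for resources and reschedules jobs dynamically:

    import java.util.Comparator;
    import java.util.List;

    // Hypothetical sketch: pick the cheapest resource that can still
    // finish all remaining jobs before the user's deadline.
    public class CostOptimizer {
        record Resource(String name, double costPerJob, double secondsPerJob) {}

        static Resource cheapestWithinDeadline(List<Resource> offers,
                                               int jobsToRun,
                                               double deadlineSeconds) {
            return offers.stream()
                // keep only resources fast enough to meet the deadline
                .filter(r -> jobsToRun * r.secondsPerJob() <= deadlineSeconds)
                // among those, take the lowest cost per job
                .min(Comparator.comparingDouble(Resource::costPerJob))
                .orElseThrow(() -> new IllegalStateException("deadline cannot be met"));
        }
    }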

Parametric Computational Experiments

Parametric computational experiments are becoming increasingly important in science and engineering as a means of exploring the behavior of complex systems. For example, a flight engineer may explore the behavior of a wing by running a computational model of the airfoil multiple times while varying key parameters such as angle of attack, air speed, and so on.

The results of these multiple experiments yield a picture of how the wing behaves in different parts of parametric space.


Many practitioners of Grid Computing believe that economic policy/criteria-driven Grid Computing, as exemplified by Nimrod-G, is of major interest to the utility computing world.

UNICORE (UNiform Interface to COmputer REsource)

The UNICORE [9] project is funded by the German Ministry of Education and Research. Its design goals include a uniform and easy-to-use graphical user interface (GUI), an open architecture based on the concept of an abstract job, a consistent security architecture, minimal interference with local administrative procedures, and exploitation of existing and emerging technologies, including the Web and Java.

UNICOREpro was produced within the UNICORE project to provide a uniform, portal-like interface for job preparation and secure job submission. It enables users to create workflows for job execution and to control execution behavior. It is an open source project developed using Java technology. The UNICOREpro server provides capabilities for authorization, job management, data transfer, and batch interfacing.

A project called GRIP (GRid Interoperability Project) was started in 2002 to achieve interoperability between UNICORE and Globus. The EUROGRID software is based on the UNICORE system developed and used by the leading German HPC centers.

NSF Middleware Initiative (NMI)

NMI [10] was created by the National Science Foundation (NSF) to help scientists and researchers use the Internet to effectively share instruments, laboratories, and data and to collaborate with each other. Middleware is software that connects two or more otherwise separate applications across the Internet or local area networks.

Middleware makes resource sharing seem transparent to the end user, providing capabilities, consistency, security, and privacy.

NMI consists of two teams:

Grid Research Integration Deployment and Support (GRIDS) Center. The GRIDS [11] Center is responsible for defining, developing, deploying, and supporting an integrated and stable middleware infrastructure created from a number of open source grid and other distributed computing technology frameworks. It intends to support 21st-century science and engineering applications by working closely with a number of universities and research organizations.

Some of the open source packages included in this middleware are the Globus Toolkit, Condor-G, GSI-OpenSSH, the Network Weather Service, Grid Packaging Tools, GridConfig, MPICH-G2, MyProxy, and so on.

Enterprise and Desktop Integration Technologies (EDIT) Consortium. EDIT [12] develops tools, practices, and architectures to leverage campus infrastructures to facilitate multi-institutional collaboration.

EDIT provides software to support a wide variety of desktop security, video, and enterprise uses with a directory schema, which facilitates the federated model of directory-enabled interrealm authentication and authorization. In addition, EDIT is responsible for conventions and best-practice guidelines, architecture documents, and policies, and for providing services to manage the middleware. Some of the open source packages included in this middleware are LDAP Operational ORCA Kollector (LOOK), Privilege and Role Management Infrastructure Standards Validation (PERMIS), OpenSAML, and others.

The latest release (Release 3) of the NMI middleware consists of 16 software packages. The two NMI teams create production-quality middleware using open-source and open-standards approaches, and they continue to refine processes for team-based software development, documentation, and technical support. The software packages included in the NMI solution have been tested and debugged by NMI team members, so that various users, campuses, and institutions can easily deploy them. In addition, the suite helps to facilitate directory-enabled (LDAP) sharing and exchange of information to support authentication and authorization among campuses and institutions.

The aforementioned best practices and policy deliverables have been reviewed and deployed by leading campuses and institutions. Some of the major initiatives using this middleware suite include NEESgrid (Network for Earthquake Engineering Simulation), GriPhyN, and the iVDGL.


