Content area
This paper describes a system for using the World Wide Web to distribute computational tasks to multiple hosts on the Web. A programmer with a computation to distribute registers it with a Web server. An idle host uses this server to identify available computations and downloads a Java class to perform the computation - this class is called a distriblet. The paper describes the programs written to carry out the load distribution, the structure of a distriblet class, and the experience in using this system.
David Finkel: Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
Craig E. Wills: Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
Brian Brennan: Southern New England Telecommunications, New Haven, Connecticut, USA
Chris Brennan: ATI Research, Marlborough, Massachusetts, USA
Introduction
For many years, some computer applications have been run as distributed programs. Different portions of the application run on different computers on a local area network. The advantages of distributed computing are well known. One advantage is improved performance: multiple computers are put to the task of running the application. Another advantage is scalability: if more work on an application is required, it may be possible to run additional instances of certain portions of the application on additional computers.
Since the advent of the World Wide Web, researchers have sought ways of using the vast computational resources of the hosts on the Web to implement distributed computations. Even simple Web browsing is an example of distributed computing: some of the work involved in displaying a Web page on a Web browser is performed by the browser's machine, and some by the Web server. In fact, a Web page may actually display material from several different Web servers, so there may be many machines involved in this simple distributed computation.
The Web can also be used for general-purpose distributed computing. In the next section, we describe several prior research projects directed at using the Web for general-purpose distributed computing. One attraction of using the Web for general-purpose distributed computing is the possibility of using the computational power of idle hosts on the Internet to take part in a distributed computation. Our project is a contribution to the solution of this problem.
There are three principal impediments to a general approach for solving this problem. First, the wide variety of machine architectures and operating systems on the Web makes it difficult to share code in such a heterogeneous environment. Second, to ensure that computations do not make unauthorized accesses on a remote machine, some form of authentication is often required. This restriction often limits the scope of distributing computations. Third, sharing of resources such as files or databases can be difficult in such an environment.
The availability of the Java programming language and Java virtual machine (JavaSoft, 1997; Sun Microsystems, 1996) has opened up the possibility of new approaches to distributed computing on the Web. Java programs can be run on essentially any computer; Java programs run inside an execution environment called a Java virtual machine, which deals with the differences in machine hardware and operating systems. Thus Java programs can be run on any computer for which a Java virtual machine is available, enabling the well-known "write once, run anywhere" property of Java. Java also provides facilities for downloading software on the Web and for creating secure execution environments. For these reasons, we adopted Java as the software basis for our distributed computation project.
In our approach, a programmer writes Java classes that are downloaded and run on idle machines on the Web in order to implement a distributed computation. In keeping with the names of the "applet" and "servlet" Java classes, we call our classes "distriblets", since they are designed for distributed computing. In the resulting system, Java is used to provide a parallel programming environment suitable for coarse-grain parallel computations. Java classes are downloaded from a server to a machine on the Internet and the results of the computation are returned to the server. Using this system, programmers can potentially have a large number of machines executing their applications. Additional details of our system can be found in Brennan and Brennan (1996).
After discussing related work, the paper describes the architecture and implementation of the load distribution system and our experience in using it. Based on this experience we describe ideas for additional work and how such a system could be implemented and paid forin the context of the World Wide Web. We conclude with a summary of our findings.
Related work
One motivation for distributing computations is to share computational load across a set of machines by putting otherwise idle machines to work. There have been a large number of research efforts directed to load sharing in a distributed system (Mutka and Livny, 1987; Wills and Finkel, 1995). Most of these efforts require that all of the machines taking part in the distributed computation have the same hardware architecture and operating system; typically, the machines are connected over a local area network. The results of these research efforts show that distributing load to idle machines can lead to dramatically improved overall performance. These results encouraged us to seek the same improvement by sharing computational load across the World Wide Web.
Distributed computing across the Internet is being done with the Distributed.Net project (Distributed.net, 1998). In this project, willing computer users download client programs to solve a portion of a larger problem. Different client programs are provided for different architectures, and the client downloads the appropriate program for its architecture. The current focus of this project is cracking the RC5 and DES encryption algorithms with different size keys.
More specific to our work, the Charlotte project (Baratloo et al., 1996) provides a processing environment based on the World Wide Web. Charlotte implements a shared memory and inter-process communication paradigm currently used in multiple processor machines. Charlotte gives standard Java applets the ability to access variables on the host computer as if they were their own. Thus, Charlotte uses the medium of the World Wide Web to create a parallel programming environment.
Another research project is called WebOS. The goal of the WebOS is to provide operating system-type services to hosts across the Web. Researchers have looked at providing a common set of operating system services to wide-area applications (Vahdat et al., 1998). This work seeks to provide mechanisms such as naming, persistent storage, resource management and authentication to provide a platform for applications that looks much like a local operating system. The services of WebOS are rich, but require clients and servers to be running the WebOS system in order to participate.
One of the possible extensions of our project that we discuss later in the paper involves paying the participants for taking part in the distributed computation. In another research effort, the ReGTime system provides an accounting mechanism for connecting clients in need of computing power with servers who have available computing power (Zahn and Ungerer, 1998). This work has been implemented in a workstation environment with application written in C and Perl.
Design and implementation
Implementation architecture
There are three main components to our load distributing system:
- (1) the Helper application;
- (2) the distribution server; and the
- (3) computation server.
The Helper application is a Java application that runs on a computer on the Internet. When the computer is idle, the Helper application contacts the distribution server to locate a computation to be performed. The Helper application then connects to a computation server with work to distribute, downloads a Java class to execute, executes it, and returns the result to the computation server.
The Helper application, the distribution server, and the computation server have all been implemented as part of this project. An application programmer who wishes to prepare an application for distribution must prepare aJava class to perform the computation. Predetermined methods must be implemented in this class: these are used by the Helper application to download the arguments to use in the computation, to execute the computation, and to return the result to the computation server. The application programmer must also register the computation with the distribution server,so that Helper applications can locate the computation server.
The Helper application has been written as a Java application. Java programs are either applets or applications. Java applets are designed to be downloaded by Web browsers and executed within the browser. Java applets have security restrictions so that they cannot damage the computer on which they are running. For example, Java applets cannot access local files on the machine on which they are running, and cannot make network connections to sites other than the one from which they have been downloaded. Applications, however, do not have these restrictions, and can perform the usual functions of an application program. The restriction on network connections required that the Helper application be written as a Java application instead of an applet; the Helper application needs to be able to create network connections to both the distribution server and the computation server. In order to provide the kind of security available in executing an applet, we implemented a security model in the Helper application.
Details of the implementation
The following describe the main points of the operation of the Helper application:
- (1) Only compute work when idle. In order to use a computer without disturbing its owner, we set the default Java priority of each computation class to 2 on a scale of 1 (low) to 10 (high). Embedded in the helper application we have placed a safeguard to make sure that the class does not set itself above a priority of 4. Because a job is capped at a priority of 4, it will never interfere with the normal operations of the helper computer.
- (2) Select a computation to assist. Once a computer is free to do work, the Helper application contacts the distribution server and downloads a list of running computations. Currently, the computation choice is completely random, but in the future a more selective scheduling algorithm can be added to accommodate priority computations.
- (3) Download work. After the helper finds a computation to help, it contacts the appropriate computation server and asks for work using a TCP connection to the server. The server replies with an identification number and a distributable class name, or a message saying that there is no more work. The Helper application checks to see if it has already downloaded the class from this server. If it has not, it requests the Java byte-code of the class. The distributable class is written by the developer to contain all of the application-specific implementations of the job distribution. The class must inherit from the class distributable, which was developed as part of this project, and must have methods getArgs, run, and sendResults as described below.
- (4) Retrieve startup data. Once the Helper application has the class description, it instantiates the class into an object and calls the object's getArgs method, passing it the streams associated with the TCP connection. The resulting data transmission is handled by the programmer.
- (5) Execute. Execution is done by calling the object's run method. The programmer's run method can operate on the data in any way permitted. Because users would not want their computers executing foreign code without prior knowledge of its functionality, we created a security policy that closely resembles the applet security policy. The only difference is that an applet has certain rights on documents and classes contained in the same or related URL that the applet was loaded from. But, because a downloaded class is not referenced by a URL, the class does not get the same rights as an applet does. A downloaded class's network ability is restricted to connecting only to the host it came from. It also has no file system access, no access to other threads and no access to the Helper application itself, just like an applet. With these security restrictions, users should feel safe and willing to allow their computer to be used for unknown computations.
- (6) Transmit results. After run returns, the Helper application calls the object's sendResults method. sendResults is the application developer's method that transmits any necessary results over the network.
- (7) Get more work. The Helper application has the option to get more work from the same server, or to go to another server.
Communications among the components
The Helper application, distribution server, and computation server components work together using a common protocol. Figure 1 shows the communications between these components as described above.
Results
In order to test our system, we developed two applications that we distributed over the network. The first, Adder, added all the integers in a large range by breaking the range into subranges, and distributing the subranges to different machines to add. This experience demonstrated that it was straightforward to implement the required classes to create a distriblet. This application required only minimal network traffic: two integers to download the arguments, and one integer to return the result.
The second application was a Mandelbrot fractal image generator. The Mandelbrot image generator is a computationally intensive application that can be easily be divided into pieces that can be distributed to multiple machines for computation. Again, we were able to implement the necessary distriblet to distribute the computation. We used the Mandelbrot image generator to run several tests of the performance of our system.
The first test involved determining if the distributed computation executed more quickly than the corresponding computation running on a single machine. While our tests showed a speed-up effect when running the distributed computation, the effect was difficult to quantify. The machines we had available were of differing speeds, and we determined that the performance on individual machines depended strongly on the efficiency of the Java virtual machine implementation available.
We also used the Mandelbrot program to study the effect of network failures on the computation. We simulated network failures by having some of the downloaded applications fail to return a result. The computation server, which is responsible for determining the arguments to be used by each Helper application, is designed to time out if a Helper application does not return a result before a timer expires. When a time-out occurs, the same set of arguments will be sent to another Helper application.
In our tests, we simulated a 30 per cent failure rate. The application was still able to complete successfully, although it required about 44 per cent more time than the same application without failures.
The final set of performance tests involved an assessment of the network overhead involved in our system. For a given computation, we varied the number of "chunks" into which we divided the computation. The more chunks, the more network overhead. In these tests, we used only one helper machine so that there was no difference in the amount of parallelism as we varied the number of chunks. Thus, the differences in completion time of the different trials were due only to the differing amount of overhead: the time required to transmit and receive the parameters and results, and the network transmission time.
Figure 2 shows the relation between time to complete the computation and the number of chunks. As expected, increasing the number of chunks increases the computation time. The slope of this line indicates that each additional chunk added 1.33 seconds to the completion time. Even though precise value of this overhead would vary, depending on the speeds of the machines and network being used, this result indicates that the size of the computation being distributed should be large enough to justify this additional overhead. Thus, our system is suitable for coarse-grain parallel computation.
Future work
In the course of our work, we have identified opportunities for extensions and improvements. We are currently investigating these alternative approaches. One improvement is to make a distributable applet, so that a user can download a computation to perform by visiting a Web site, without the need to be running a Helper application. Ideally this connection can be long-lasting so that multiple pieces of a computation can be executed during the span of single connection. Such a mechanism could lead to a Web site to which users connect when they know their machine will be idle for a period of time. This scenario would lead to an enormous number of machines on the Internet available to support parallel computations, and make use of the vast computational power of idle computers on the Internet.
Alternatively, the Helper application could be built into widely available Web browsers. Such a built-in helper might allow a less stringent security model than currently available with Java applets and allow for computations to make network connections to machines besides the one providing the download. This approach would allow more flexibility in the type of computations that could be distributed. Other improvements focus on security and authentication, so that a computation server could authenticate the identity of a host returning results to it, and ensure that the results being returned were correct.
To provide an incentive to users to participate, a micropayment system could be used. Micropayment systems, such as the Millicent system developed by Digital Equipment Corporation (Digital Equipment Corporation, 1997), have been proposed as a way of collecting small payments for access to Web resources. In the Millicent system, users can connect to a Millicent-enabled site (a "broker") and use a credit card to purchase scrip, which can in turn be used to purchase access to Web resources. The scrip is stored in digital form on the user's computer until it is spent to pay for access to some Web resource. The advantage of using the micropayment scheme instead of individual credit card transactions is that the micropayment system will support very small payments - perhaps as small as a fraction of a US cent. The overhead of a credit card transaction is too expensive to permit transactions for such small amounts of money.
These same micropayment systems could be used to pay users a small amount for allowing their system to be used to download and execute distributed computations. Thus users could use the time when their systems would otherwise be idle to earn credits to be used later for access to Web resources. This could provide sufficient incentive for large numbers of users to make their systems available for distributed computations. We are currently investigating implementing this extension to our system. The study will address both the technical issues of incorporating the micropayments into our system as well as the human issues of whether micropayments will provide sufficient incentive for users to participate.
Summary
We have described a project to design and implement a load distribution system written in Java to run over the World Wide Web. We implemented the framework of the system: the Helper application, the distribution server, and the computation server. A programmer wishing to prepare an application for execution using our system needs to create a Java class, following our specifications, to carry out the computation. A user willing to help by executing part of the computation needs only to start the Helper application, which will automatically detect when the computer is idle and locate a computation. The Helper application downloads a Java class to perform the computation, downloads the arguments to use, executes the computation, and returns the results. A strong security model is built into the Helper application, so that users can be confident that they will not download computations that will harm their systems. We have shown that our system is practical, and have performed an analysis of the overhead it requires.
References
1. Baratloo, A., Karaul, A., Kadem, M.Z. and Wyckoff, P. (1996, "Charlotte: metacomputing on the Web", Proceedings of the International Conference on Parallel and Distributed Systems, Dijon..
2. Brennan, B. and Brennan, C. (1996, "Dynamic application load distribution with Java", Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA, MQP CEW-9603.
3. Distributed.net (1998, "Distributed.net Node Zero", World Wide Web, http://www.distributed.net/
4. JavaSoft (1996, "The JavaSoft homepage", World Wide Web, http://www.javasoft.com
5. Digital Equipment Corporation (1997, "Millicent", World Wide Web, http://www.millicent.digital.om
6. Mutka, M.W. and Livny, M. (1987, "Profiling workstations available capacity for remote execution", Performance '87, Proceedings, 12th IFIP WG 7.3 Symposium on Computer Performance, Brussels.
7. Sun Microsystems (1996, "Java computing", World Wide Web, http://www.sun.com/java
8. Vahdat, A., Anderson, T., Dahlin, M., Belani, E., Culler, D., Eastham, P. and Yoshikawa, C. (1998, "WebOS: operating system services for wide area applications", Proceedings of the Seventh Symposium on High Performance Distributed Computing, Chicago, IL.
9. Wills, C.E. and Finkel, D. (1995, "Scaleable approaches to load sharing in the presence of multicasting", Computer Communications, Vol. 18 No. 9, pp. 620-30.
10. Zahn, A. and Ungerer, T. (1998, "Experiences and enhancements of ReGTime: a system for trading computing power", Proceedings of the Workshop on Distributed Computing on the Web, Rostock.
Caption: Figure 1; Communication procedure; Figure 2; Network overhead
Copyright MCB UP Limited (MCB) 1999
