USING MODULES TO INTEGRATE DISTRIBUTED COMPUTING
IN THE UNDERGRADUATE CS CURRICULUM

David M. Arnow, Paula Whitlock, Carol Tretkoff

Dept. of Computer and Information Science

Brooklyn College of CUNY

Brooklyn, NY 11210

{arnow,whitlock,tretkoff}@sci.brooklyn.cuny.edu

 

ABSTRACT: Our department is embarking on an effort to integrate the study of distributed computing across our CS curriculum, starting with our introductory course for majors and concluding with a capstone course. Although theoretical and systems issues are not ignored, the focus of this effort is on the development of distributed programming as a means for solving problems. This paper describes our overall plan and some of our progress to date in our early courses.

 

INTRODUCTION

Although the increase in diversity and availability of parallel multiprocessors shows no sign of abatement, the one truly ubiquitous parallel computer system continues to be the LAN of workstations, and the one massively parallel system to which "everyone" has access, if not authorization, is the Internet. Recognition of network computing as an important platform for parallel computing has resulted in widespread research efforts in the design and use of a host of message-passing based programming environments, including both languages and support libraries [3-6, 11-13, 19].

In spite of these strides and their importance, until recently the only exposure to concurrent programming that most undergraduate CS majors received was in the context of an operating systems course. Distributed programming as a topic at the undergraduate level was even rarer. Now, however, it is generally recognized that concurrency, parallel computing and distributed programming are all appropriate and, in fact, highly desirable at the undergraduate level [16-18,20]. Furthermore, if we hope to draw our undergraduates into research experiences, a sophisticated level of experience in these areas is essential, not only for research in distributed systems per se, but in scientific computing, artificial intelligence and other computationally challenging areas.

There is a problem, however. There is already a considerable CS "core" of essential courses, yet the total number of credits that a college student will take is fixed by the institution, and increases in major requirements often meet with stiff resistance from college governance bodies. Furthermore, not only is it increasingly accepted that all CS majors should have some experience with parallelism and distributed computing, but other areas, such as GUIs and human factors, databases, optimization, and logic programming, also compete for the all too small set of required advanced undergraduate course credits. Our response as faculty must be to find parsimonious ways of teaching these areas.

Our department is using two approaches to develop institutionally realistic ways of integrating the advances in distributed programming into our undergraduate curriculum. First, we are in the process of designing a set of modules on a variety of aspects of distributed computing. These modules are suitable for seamless integration into many of our existing courses. They range in duration from one week to one month and, in most cases, augment rather than replace existing topics in their "host" courses, e.g., CS 1, CS 2, and discrete math. Second, we are creating a cluster of advanced undergraduate electives that integrate the methods, problems and results of current research in distributed programming into the junior and senior level curriculum. The primary focus of these courses will be distributed applications, with some attention paid as well to systems issues. These courses include artificial intelligence, simulation, workstation programming and decision support software, as well as a specially designed capstone course.

In the next section of this paper we describe what brought us to this endeavor and why we feel we have a strong likelihood of success. In succeeding sections we give an overview of the modules that we have selected for core CS courses and of some of the advanced undergraduate electives. We then describe one module in greater detail and conclude by discussing its impact.

BACKGROUND

This curricular effort is in part motivated by the thread of distributed programming that runs through much of our own research. Much of this work centers around the development of the DP library [3], a low-level but portable, flexible and reliable library of process management and communication routines designed to support the development of parallel applications in a distributed environment. Other groups in this department have used this and other libraries to build tools to solve problems in condensed matter physics and computational chemistry [7] and to attack computationally intractable problems from operations research and artificial intelligence [2]. The development and testing of parallel pseudorandom number generators appropriate for distributed Monte Carlo programs is another line of research that relates to distributed programming [8].

From another vantage point, this effort builds on and is motivated by our own experiences teaching distributed computing in upper electives. The network programming module in our Workstation Programming course covers both client-server network applications and parallel distributed programming. Given the unfamiliarity of the students with either topic, only a limited exposure can be provided. Undergraduate students' activity in research projects is an important part of our program, but again, due to the paucity of information about distributed or parallel computing in our current undergraduate curriculum, the students must spend considerable time familiarizing themselves with the concepts of distributed computing before they can actively participate in many of our research efforts.

Although our experiences have been generally positive in that it is clear that undergraduates are capable of learning and working productively with distributed systems, we share a consensus that a greater readiness on the part of our students is possible and would be very beneficial. We also believe that a greater coordination between the advanced elective courses would allow us to bring students into the research labs sooner and more effectively.

We are also motivated by the experiences of others who have introduced advanced courses in parallel computing [9, 10, 15, for example] as well as modules on parallel computing earlier in the curriculum [14, 18]. Kotz, in particular, has taken an approach that we have also used: the development of special-purpose software environments to facilitate student lab work.

The move to a broader, across-the-curriculum approach to parallel computing has also been proposed [20], and very positive steps taken in that direction have been reported [16, 17]. The latter, like Kotz, has brought parallel computing into the curriculum as early as CS 2 and writes that it is "imperative to integrate parallel computation throughout the undergraduate curriculum including introductory courses".

Most of this work has been oriented towards parallel rather than distributed computing. Our project then may also be seen as contributing to the general move to bring parallel computation into the curriculum. For institutions with limited resources, the distributed computing curriculum has a special appeal because it requires less special purpose equipment.

MODULES: DISTRIBUTED PROGRAMMING ACROSS THE CURRICULUM

By introducing a set of distributed programming modules into our elementary and intermediate level courses, we guarantee that all CS majors have significant experience in distributed computing by the time they are ready for the courses in the advanced cluster. This "X across the curriculum" strategy has been widely used in other curricular areas ("writing across the curriculum", "quantitative reasoning across the curriculum", and so on). To be viable, the distributed programming modules must have the following characteristics:

(1) Seamless with respect to the "host" course. Neither faculty nor students will do anything other than at best pay lip service to a module that seems out of place in a course. By analogy, "writing across the curriculum" cannot be implemented by arbitrary assignment of additional papers; special kinds of writing assignments must be devised that are appropriate for each course. The same is true for distributed programming within a CS curriculum.

(2) Reinforce rather than replace existing material. Instructors in all courses claim there is not enough time to cover all the required topics. Our modules minimize the replacement of existing topics by providing examples and exercises that present a context for the standard topics covered in a course. By focusing the examples on distributed computing, we lose very little in the course but increase the students' awareness of and sophistication with this area.

The modules that are being developed are shown in Figure 1 below.


-----------------------------------------------------------------------------------------------------------------

Figure 1: Modules for distributed computing in the early CS curriculum.

-----------------------------------------------------------------------------------------------------------------

Module 1: Clients and Servers I (CS 1)¹

Course Synopsis: The first course in the majors sequence. An introduction to elementary programming using C. No prior experience with programming is assumed. Students write small, simple programs of up to 150 lines of code.

Module Duration: 1 week.

Module Synopsis: The idea of a network of computers with cooperating programs is introduced, along with the idea of some programs playing the role of client and others the role of server.

Module Exercise: A simple client interface is given to the students to access the services of a remote server.

 

Module 2a: Clients and Servers II (CS 2)

Course Synopsis: The second course in the majors sequence. Students are introduced to the discipline of programming and software engineering. Topics include module design and specification, test drivers and test suites, informal formal methods, pipelines, diverse i/o styles, pointers and dynamic storage allocation.

Module Duration: 1 week.

Module Synopsis: The client-server model, introduced briefly in the first course, is expanded.

Module Exercise: Students write more advanced client programs, building on their CS 1 work. A simple server interface is given to the students, which they use to create a simple remote server.

 

Module 2b: Symmetric parallel distributed programming (CS 2)

Module Duration: 1 week.

Module Synopsis: Introduction to the idea of parallel processing, using a MIMD, message-passing model. Discussion of the network as a MIMD message-passing platform. Simple parallelization strategies.

Module Exercise: A simple library supporting static symmetric distributed process creation and synchronous communication [1] is given to the students to use in parallelizing some elementary program, such as a quadrature (see the sketch following this figure).

 

Module 3: Distributed computing problems (Discrete math)

Course Synopsis: A typical undergraduate discrete math course.

Module Duration: Entire semester.

Module Synopsis: This module consists of a set of exercises based on examples drawn from idealized or real problems associated with networks and distributed programming. Exercises of this type are developed for each part of the course.

 

Module 4: Distributed shell programming (Shell programming)

Course Synopsis: An introduction to shell programming.

Module Duration: 1 week.

Module Synopsis: Using access to the NIS hosts file and the rsh command, techniques for building distributed shell scripts that run in parallel are discussed.

Module Exercises: Students write distributed shell programs -- for example, a program that surveys the network for a particular user or counts the number of users across the network.

-----------------------------------------------------------------------------------------------------------------
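
To make the Module 2b exercise concrete, the sketch below shows the general shape of a symmetric parallel quadrature in C, in which every process runs the same program. The net_* calls are hypothetical placeholders standing in for whatever static process-creation and synchronous send/receive primitives the course library [1] actually provides; only the decomposition strategy is the point.

/* Sketch of the Module 2b exercise: a symmetric parallel quadrature.
 * The net_* calls are hypothetical placeholders, NOT the actual
 * interface of the library described in [1].
 */
#include <stdio.h>

int    net_nprocs(void);              /* number of cooperating processes   */
int    net_myid(void);                /* this process's rank, 0..nprocs-1  */
void   net_send(int dest, double x);  /* synchronous send of one double    */
double net_recv(int src);             /* synchronous receive of one double */

static double f(double x) { return 4.0 / (1.0 + x * x); }  /* integral over [0,1] is pi */

int main(void)
{
    const int    N = 1000000;         /* total number of midpoint-rule strips */
    const double h = 1.0 / N;
    int    nprocs = net_nprocs(), me = net_myid();
    double sum = 0.0;
    int    i, p;

    /* Every process runs identical code; each takes every nprocs-th strip. */
    for (i = me; i < N; i += nprocs)
        sum += f((i + 0.5) * h) * h;

    if (me == 0) {                    /* process 0 collects the partial sums  */
        for (p = 1; p < nprocs; p++)
            sum += net_recv(p);
        printf("estimate = %.10f\n", sum);
    } else {
        net_send(0, sum);             /* all other processes report to 0      */
    }
    return 0;
}

Even at this small scale the exercise exposes the students to the central ideas of the symmetric model: identical code in every process, a data decomposition, and a final exchange of partial results.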

MODULE 2A: CLIENTS AND SERVERS

In this section we describe one of the modules in detail. It takes place in the second programming course. Since the current cohort did not have Module 1 in their introductory course, the module this semester incorporates that work as well.

The Web is used as a natural and attractive context for this module. It provides a service that is exciting (to the students at least) and readily visible. We prepare for this module throughout the semester in two ways. First, the student login environment (OpenWindows on a Sun workstation) always brings up a web client, and much of the course information is provided through a course home page, so the students become quite familiar with this network service early on. Second, through a sequence of gentle and "fun" exercises, we guarantee the prerequisite web sophistication:

(1) WWW hunt (1) and (2) -- two automatically generated scavenger-hunt exercises that force the student to familiarize herself with both jumping through hypertext links and using the browser's find/search features. This is less necessary for the students (who tend to jump right into the web -- who doesn't?) than for assuring the instructor that they all have done so.

(2) WWW contest -- another test as a web client user: students submit the URL of their favorite page outside the university. This too is probably unnecessary, but it guarantees clarity as to what constitutes a URL and reconfirms web familiarity.

(3) Build Your Own HOME page -- this assignment guarantees at least a cursory familiarity with HTML.

(4) Playing Tag (HTML style) -- here the students write C code that processes HTML, picking out tags and treating them differently from the text between the tags.

(5) A simple word-wrap program -- the students write code that formats plain text in a primitive but sensible fashion.

(6) URL Parsing -- the students write a function that parses a URL into its components (a sketch of one possible approach appears after this list).

(7) A Hyper-Simple HTTP Browser -- using the tcpclient module that we provide (see Figure 2), the students put the above pieces together to write a very simple client. The client receives a URL as an argument, parses it, and uses the tcpopen routine to connect to the server. Following the HTTP protocol, their program sends a GET request to the server for the resource. Their program then reads and displays the HTML text, ignoring all tags except <BR> and <P>. Writing the program is greatly simplified by employing the routines developed in the previous three exercises -- illustrating one of the software engineering principles that the course seeks to convey. (A sketch of such a client follows the discussion of Figure 2 below.)
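
The following is a minimal sketch of the kind of function exercise (6) asks for: it splits a URL of the form http://host[:port]/path into the pieces that exercise (7) hands to tcpopen. The name parse_url, the buffer-length parameters and the return convention are our illustrative choices, not part of the assignment handout.

/* Sketch for exercise (6): split "http://host[:port]/path" into its parts.
 * parse_url and its calling convention are illustrative, not required.
 */
#include <string.h>
#include <stdlib.h>

int parse_url(const char *url, char *host, size_t hostlen,
              int *port, char *path, size_t pathlen)
{
    const char *p, *slash, *colon;

    if (strncmp(url, "http://", 7) != 0)
        return -1;                       /* only http URLs are handled     */
    p = url + 7;

    slash = strchr(p, '/');              /* start of the path, if any      */
    if (slash == NULL)
        slash = p + strlen(p);           /* no path: point at the '\0'     */
    colon = strchr(p, ':');              /* explicit port, if any          */
    if (colon != NULL && colon < slash) {
        *port = atoi(colon + 1);         /* e.g. "http://host:8080/x.html" */
    } else {
        *port = 80;                      /* default HTTP port              */
        colon = slash;                   /* host name ends where path does */
    }

    if ((size_t)(colon - p) >= hostlen)
        return -1;                       /* host name too long             */
    memcpy(host, p, colon - p);
    host[colon - p] = '\0';

    if (*slash == '\0')                  /* a URL with no path means "/"   */
        strncpy(path, "/", pathlen);
    else
        strncpy(path, slash, pathlen);
    path[pathlen - 1] = '\0';
    return 0;
}

Most of the work here is ordinary string and pointer manipulation, which is exactly why the exercise fits the course.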

From this point, the students can be given more sophisticated tasks:

(8) Write a client that searches the Web looking for URLs with a particular string in their associated TITLE or text;

(9) Improve the tag processing of the above browser -- handle more format tags, list hypertext links, and permit jumps.

Most of the programming effort in these assignments involves working with strings, pointers, conventional i/o, separately compiled modules and interface files -- the traditional concerns of this course. What is different is that the work is now conducted in a context that makes the students acutely aware of client-server computing. The students leave this module with a clear notion of the client-server relationship, addresses and protocols -- building blocks that will be used, reinforced and elaborated on later in their studies.

-----------------------------------------------------------------------------------------------------------------

Figure 2: The tcpclient module: hiding the socket interface

-----------------------------------------------------------------------------------------------------------------

/* tcpclient.h */

FILE *tcpopen(char *hostname, int hostport);

void tcpover(FILE *fp);

-----------------------------------------------------------------------------------------------------------------

 

The first function, tcpopen, is analogous to fopen, except that instead of opening a file, it creates a bidirectional connection to a server listening on a port (hostport) on the given machine (hostname) on the Internet.

Like fopen, tcpopen returns a FILE * value, which can be used with standard calls like fprintf, fscanf, fputc, fgetc, and so on.

For example:

fp = tcpopen("acc6.its.brooklyn.cuny.edu", 80);

will open a connection to port 80 on the machine named "acc6.its.brooklyn.cuny.edu".

Apart from the fact that it opens a connection across the network rather than a file, the only difference between tcpopen and fopen is that if the program does both input and output, it must call tcpover() each time it switches from input to output or vice versa.
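
Putting the pieces together, the exercise (7) browser might look roughly like the sketch below. It assumes the tcpclient module of Figure 2 and a parse_url function like the one sketched earlier; the bare GET request line, the assumption that tcpopen returns NULL on failure, and the crude tag handling are illustrative simplifications rather than the assignment's exact specification.

/* Sketch of a hyper-simple browser (exercise 7).  Details such as the
 * request format, NULL-on-failure from tcpopen, and the tag handling
 * are illustrative assumptions.
 */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include "tcpclient.h"                /* tcpopen and tcpover (Figure 2)      */

int parse_url(const char *url, char *host, size_t hostlen,
              int *port, char *path, size_t pathlen);  /* earlier sketch     */

int main(int argc, char *argv[])
{
    char host[256], path[1024], tag[16];
    int  port, c, in_tag = 0, taglen = 0;
    FILE *fp;

    if (argc != 2 ||
        parse_url(argv[1], host, sizeof host, &port, path, sizeof path) != 0) {
        fprintf(stderr, "usage: %s http://host[:port]/path\n", argv[0]);
        return 1;
    }

    if ((fp = tcpopen(host, port)) == NULL) {   /* assumed failure convention */
        fprintf(stderr, "cannot connect to %s port %d\n", host, port);
        return 1;
    }

    fprintf(fp, "GET %s\r\n\r\n", path);  /* a minimal GET request            */
    tcpover(fp);                          /* switching from output to input   */

    while ((c = fgetc(fp)) != EOF) {      /* echo the text, skipping tags     */
        if (c == '<') {
            in_tag = 1;
            taglen = 0;
        } else if (in_tag) {
            if (c == '>') {
                in_tag = 0;
                tag[taglen] = '\0';
                /* the only tags honored are <P> and <BR>: start a new line */
                if (strcmp(tag, "P") == 0 || strcmp(tag, "BR") == 0)
                    putchar('\n');
            } else if (taglen < (int)sizeof tag - 1) {
                tag[taglen++] = (char)toupper(c);
            }
        } else {
            putchar(c);
        }
    }
    return 0;
}

The string handling, the separately compiled tcpclient and URL-parsing modules, and the use of standard i/o calls on the connection are precisely the course's traditional concerns; only the context is new.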

CONCLUSION

We are at the outset of an ongoing effort, and so it would be presumptuous to make a final assessment. Still, the module approach seems to be working: it has not displaced anything from the syllabus but has clearly added a new dimension to the course. The students, as one might expect, are quite enthusiastic because the net is a popular topic. For many, this material is the highlight of the course.

The advent of a cohort of students who are already well-versed in network concepts and terminology and who have written small clients, servers and parallel distributed programs opens a new era for those of us who teach the cluster of advanced courses that incorporate substantial levels of distributed computing. For example, in our course on Workstation Programming, we will look at the details of the RPC mechanism and its use, as well as examine traditional distributed computing issues such as thread management, synchronization of clocks and fault detection. Greater attention can now be paid to network protocols and their implementation, and a lab in which students build their own network layers using the Unix STREAMS facility will be designed and integrated into the course. In courses such as Decision Support Software and Simulation, students will progress more rapidly from relatively trivial parallelized independent realizations to more complex models of parallelization and communication. In Programming Languages, students will be well prepared to assess and understand the tradeoffs in languages such as SR and C-Linda. We hope the result will be graduates ready to assume the challenges of computing in the next decade.

REFERENCES

1. Arnow, D.M.: XDP: A simple library for teaching a distributed programming module. Proceedings of the 26th SIGCSE Technical Symposium, Nashville, Tennessee, March 1995.

2. Arnow, D.M., McAloon, and C. Tretkoff: Parallel integer goal programming. Proceedings of the 23rd Annual ACM Computer Science Conference. Nashville, Tennessee, March 1995.

3. Arnow, D.M.: DP -- a library for building reliable, portable distributed programming systems. Proceedings of the USENIX Winter `95 Technical Conference. New Orleans, Jan. 1995.

4. Andrews, G.R.: The distributed programming language SR-- mechanisms, design and implementation. Software-- Practice and Experience 12,8 (Aug. 1982).

5. Bal, H. E., Steiner, J. G., and Tanenbaum, A.S: Programming languages for distributed computing systems. Computing Surveys 21,3 (Sept. 1989).

6. Butler, R., and Lusk, E.: User's guide to the P4 programming system. Tech. Rep. ANL-92/17, Argonne Nat. Lab. (1992).

7. Chen, J.: Distributed Green's function Monte Carlo calculations. Ph.D Thesis, Dept. of CS, CUNY (1994).

8. Chen, J. and Whitlock, P.A.: An Implementation of a System of Distributed Pseudorandom Number Generators. In Lecture Notes in Statistics, ed. H. Niederreiter, Springer-Verlag, 1995 (in press).

9. Duckworth, R. James: Introducing Parallel Processing Concepts Using the MASPAR MP-1 Computer. 25th SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 26,1 March 1994 Phoenix.

10. Fisher, Allan and Gross, Thomas: Teaching the Programming of Parallel Computers. 22nd SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 23,1 March 1991 San Antonio.

11. Geist, G.A. and Sunderam, V.S.: PVM-- Network-based concurrent computing on the PVM system. Concurrency: Practice and Experience 4(4) (Jun., 1992).

12. Gelernter, D.: Generative communication in Linda. ACM Transactions on Programming Languages and Systems 7, 1 (Jan. 1985).

13. Goldberg, Arthur P.: Concert/C Tutorial: An Introduction to a Language for Distributed C Programming, IBM Watson Research Center, Yorktown Heights, (Mar., 1993).

14. Harlan, Robert and Akulis, Joseph: Parallel Threads: Parallel Computation Labs for CS 3 and CS 4. 26th SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 27,1 March 1995 Nashville.

15. Hartman, Janet and Sanders, Dean: Teaching a Course in Parallel Processing with Limited Resources, 22nd SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 23,1 March 1991 San Antonio.

16. John, David: Integration of Parallel Computation in Introductory Computer Science. 23rd SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 24,1 March 1992 Kansas City.

17. John, David: NSF Supported Projects: Parallel Computation as an Integrated Component in the Undergraduate Curriculum. 25th SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 26,1 March 1994 Phoenix.

18. Kotz, David: A Data-Parallel Programming Library for Education (DAPPLE). 26th SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 27,1 March 1995 Nashville.

19. MPI Forum: Message Passing Interface Standard (Draft). Oak Ridge National Laboratory. (Nov. 1993).

20. Toll, William: Decision Points in the Introduction of Parallel Processing into the Undergraduate Curriculum. 26th SIGCSE Technical Symposium on Computer Science Education, SIGCSE Bulletin 27,1 March 1995 Nashville.


¹ As should be apparent from the synopsis above, our "CS 2" course is not the standard ACM CS 2 course that focuses on data structures. The latter is the third course in our programming sequence.

