MINIMALISTIC

NETWORK

GENERATOR



An Implementation

Exercise





















Juan G. Molinari

Rochester Institute of Technology

Rochester, NY 14623









ITEE 271

Telecommunications Fundamentals

Hallas





I n t r o d u c t i o n

It is likely an understatement to say that telecommunications technology has experienced a quantum leap in the last 150 years. As significant a development as Gutenberg's printing press, the telegraph made it possible to instantly transmit written information over distances that previously would have taken weeks. This was such an obvious improvement over any previous method that some of the basic problems common to all forms of telecommunication we take for granted today were virtually ignored. Some of those issues arise from the condition of being able to provide communication between any two points, and only two points, and other issues arise from the very elimination of such a restriction (for instance, Raghavendra, 1984). Even though telegraphic signals were criss-crossing the Atlantic by cable as early as 1858, it was not until 1878 that the first telephone exchange was installed in Boston, Massachusetts. This may not seem like such an important development until one imagines the situation created by the necessity of having a separate telephone connection for every person which one wishes to be able to reach; a company president who wants each of his ten regional managers to be a phone "call" away would need to have mounted on his office wall a row of telephones resembling the courtesy phone bank in O'Hare International. It is important to note that while the luxury of telephone communication was certainly unattainable for any popular means until well into this century, the issue remained one of financial proportions; if the executive in question had unlimited funds (as well as unlimited need) he could in theory be in direct telephone contact with as many people as he wished, and provide interconnections among his underlings as well, a situation of which the logical extreme would be a meshed network(1).

A telephone exchange, however, provides for the dynamic allocation of telephone lines, operating under the exact same principles as a railroad switch, which can redirect trains off one rail and unto another if the first was occupied with another train. As a graphic demonstration of the benefits of a switched versus a fixed-point-to-point system, take these two hypothetical networks, each having 5 nodes. In the A network, a meshed connectivity has been implemented so that every node has a direct connection to every other node, giving us a total of 10 full-duplex lines. This can be very expensive. For the B network nodes (3) and (4) have been recruited to serve as tandems, or hubs. Data generated at node (2) which is destined for node (5), for example, would first be routed to node (3) and then (4), which would then transmit it to its ultimate destination.

One of the physical realities of "sharing" a line is that data throughput would suffer. As a specific example, suppose that node (5) generates an average of 50K per day destined for node (3), and that node (4), in turn, generates 100K per day for node (2). What that means is that the (43) line would have to be able to handle both data streams simultaneously, thus increasing its cost(2).

As stated before, the nature of the network will determine which of the two approaches would be better suited for the transmission of information; network A would be adequate, in the strict sense of the word, for a phone network except for the fact that (a) the number of phones in one's living room would equal the number of people one wished to be able to reach, and (b) the cost of implementing such a network would increase proportionally, making it financially unrealistic. If the network, on the other hand, was destined for pure data transmission, then the A model would be able to get data from any node to any other node much quicker than the B model, which would require additional time to travel from one node to another (for various hardware-related reasons). On the other hand, having a network with only 4 lines instead of 10 (as in our example) would prove considerably more affordable, even if there is an extra cost for high-capacity lines.

The debate, of course, begins when it is time to decide which nodes are to have direct connections between them, and which lines are to have more capacity than others. There are many factors to take into consideration in any network that tries to avoid the financial "black hole" that is meshed-connection (e.g., Guterl, 1983). For example, node (4) in the above example could employ additional hardware which would allow it to buffer the data generated by node (5), allowing it to hold it in temporary storage until the line became free; since the data is literally "taking turns" using the (43) line it would perhaps be unnecessary to provide for additional throughput, thus a one-time payment for the buffering hardware would eliminate the need for a high-speed line which inevitably incurs steep monthly payments ad perpetuum. Sorting out possibilities like these, and many others, is a task that requires the skills of a creative network administrator, not unlike that of a personnel resource manager who must coordinate the various skills of a number of employees in order to get a job done. Some discrete algorithms do exist for making this task easier(3), and this exercise will focus on one of the simpler ones.

Kruskal's algorithm for finding "minimal spanning trees" (Kershenbaum, 1993) is simple and straightforward. The term spanning tree, first of all, refers to a fully connected(4) network in which there is only one "route" from any node to any other node ("non-looping"). For example, graph A to the right represents a fully connected, non-looping spanning tree. On the other hand, graph B shows a network which is not (a) non-looping, because, for example, there is more than one "route" to node (1) from node (5) (e.g., 521 or 5231), nor (b) connected, as node (7) has no way of reaching any of the other nodes in the network. Kruskal's algorithm, as mentioned above, is a method of deliberating graphs that look like A above while striving to minimize the overall distance between nodes(5). The procedure is as follows:

1) Measure the distance between all nodes in the network; this should yield a list of N*(N-1)/2 node-pairs and the distance between them. Call this set TP, for "total pairs".

2) Sort TP so that the first element in the set is the smallest, that is, the first element in the set should be the pair of nodes with the smallest distance between them.

3) Remove the first (smallest) element in TP and add that node connection (edge, in graph theory) to a graph G.

4) Repeat step (3) for all connections (edges) in TP until G is fully connected, rejecting those that form loops in G.

Simple!

M e t h o d

We've covered the what; this next section deals with the how, in sufficient detail (along with the actual code used -- see Appendix) for anyone who is interested enough in the subject matter to easily recreate or elaborate on the work presented.

First, the data. A table of V-H coordinates was taken from Fitzgerald, 1993 (p.388), which provides for a handy list of nodes and their relative locations. V-H coordinates are used primarily for determining, not the geographic location of a node (which is often not useful in itself), but the distance between a pair of nodes (at least in this case). The formula yielding the distance in miles is:

This formula was implemented in a function called distance() which took as its arguments four integers (ordinals), processed them according to the above formula, and returned the result to the calling function. This procedure required (in comprehensible form) 11 lines of C code. This fact is reported merely to give the reader an idea of how concepts that human beings take as a matter of course (such as the above formula) translate into a complicated sequence of instructions whose presentation is prepared for the computer's benefit. This relationship is one of the main points in this study, and one which will be elaborated later.



What The Program Does

The very first thing the program does upon execution is begin reading the list of nodes, whether from the keyboard or from a file(6), using a predetermined format. Each record contains four field; in order (1) town, (2) state, (3) V, (4) H. These records are placed in an unsorted linked-list(7). After the entire "master" set of nodes has been read, a subset of "network nodes"(8) is selected at random(9). The reason for this dual-step process is to allow for the same initial data set to generate different possible initial scenarios. A list of all possible node-pair combinations, (which would form a fully-meshed network) is compiled and placed in an ordered linked-list(10), shortest distance first. The head of this list (the first item) is removed, one at a time, and attached to what will become the physical network, which is represented internally as a linked-list of node-pairs, identical in form to that used to hold the meshed network. Up to this point the handling of the data has been relatively straightforward; the program has merely had to read in information and arrange it in linear form.

The most difficult problems (by far) encountered in this exercise were those of detecting loops in the network, and for ending the run when connectivity has been achieved.

As a way of identifying the difficulty, note that it is easy for human beings to detect either situation mentioned above because we process information in parallel form, taking in an entire graph at once (within reason) and detecting loops and connectivity because the information is interpreted in abstract form; that is, loops can be seen as easily-recognized geometrical figures such as triangles (as in the graph above), quadrilaterals, octagons, etc. The same problem exists for that of connectivity; while it is a simple matter for a human being to look at the above graph and say "Well, it is in two parts, so it cannot be connected", a computer would have to be taken by the hand and taught every elementary step in order to come to the same conclusion. The reason for this is that computers (at present) process information in strictly serial form; a computer's definition of a loop is not "a closed link of nodes", but along the lines of "a path formed when there is a way to leave a node and return to it without passing over the same node". Needless to say, this can translate to many pages of code, if it is not handled intelligently.

Luckily the solution for the first problem proved to be the key for the second as well. In short, what was required was a way to model the data so that the answer could be detected easily by a computer; a plain connected graph in the form of a dynamically-linked tree clearly wouldn't do. The answer becomes clearer when one considers that a partially-connected network has various "groups" or "clumps" of nodes, which are fully connected in themselves. At first, the network has no links and each node is its own clump. When the first link is added, the two nodes at each end of the link become a single, connected clump. As links are added to any of the nodes in that clump, all the nodes involved are consolidated into a single clump.

As can be seen in the "before & after", there were three original connected clumps, comprised each of nodes (1,2), (3,4) and (5). A link (13) was added which connected the two logical clumps, and merged then into one, comprised of nodes (1, 2, 3, 4). Node (5), lacking any connections, remains in its own clump. If we were then to add another link, say (15), the lone clump would be consolidated with the larger clump, forming one single clump encompassing the entire network. At this point, the network would be fully connected and we would have reached the end of our run.

How do we teach the computer to think in terms of "clumps"? Internally, the original state of the example above (before any links are established) would be represented as shown on the right. If N is the number of nodes in the network, we start out with a linked-list of N arrays, each of length N. Each of these arrays represents a clump in our network, and the elements with a "1" in them represents the index of the nodes that are members in that clump(11). When we add a link to the network, the arrays (clumps) where each of the two nodes are members are consolidated into one by (a) performing a logical union on the two clump-arrays in question, thus creating a new "clump" which encompasses all the nodes in each of the two original clumps, and (b) eliminating one of the nodes in the linked-list of arrays/clumps.

Thus, our list of arrays/clumps might(12) look like shown at right before the (13) link was added to our previous example. We have three arrays/clumps, the first made up of nodes (1,2), the second of nodes (3,4), and a third containing a single node (5).

After adding the (13) link, we find that there are now only two arrays/clumps left, the first made up of the nodes (1,2,3,4) (which is the logical union of the two former clumps), and a second clump, containing only node (5). It need hardly be stated that connectivity will have been achieved when there is only one array/clump, containing nothing but "1"s.

As stated before, this scheme for representing the connectivity status of our network has a dual benefit, which is that it allows us to detect loops even before we actually attach a new edge to our network. Visually, the situation can be represented by the graph at the right (which is unrelated to the previous examples). If we are considering a (78) link, all we would need do in order to check for a loop is to make the computationally simple inquiry of whether the two nodes (7,8) belong to the same array/clump. In this case, we can see (as humans) that they do, simply because they are contained within a common gray area in our graph. The computer can also see this fact, because both the 7th and 8th index of the same array/clump would contain a "1".

C o n c l u s i o n

The purpose of this exercise was to serve as an exploration of the complexities that are involved with the design of communication networks. Clearly, only a minimal subset of the issues were even mentioned(13), but this serves the purpose just as well, because we can see that just two of those issues (connectivity and loop detection) require several pages of explanation when the problem needs to be translated to a form that a computer can easily digest. For example, routines for determining data load between nodes in the network were written (and can still be examined in the Appendix) but the data they generate was not used in this exercise because the complexity involved in the introduction of a new criteria for network evaluation (i.e., line load, in addition to line length) would have been well beyond the scope of this study.

And perhaps that is the crux of the complexity of network analysis. A human being can (with a little practice) look at the diagram of a network containing many dozens of nodes of different types, connected with lines of various capacities, and say without much deliberation "This network needs another hub here." A computer, on the other hand, cannot. We can, however, utilize monolithic (and expensive) programs that give each and every atomic element in the system huge amounts of attention while creating thousands upon thousands of virtual scenarios, evaluating them under some sort of predetermined criteria, finally coming up with an answer in the form of "This new network I've constructed would be more efficient than the original one." It is a fundamentally different form of thinking, and this difference makes network analysis a very troublesome matter.



A c k n o w l e d g m e n t

I would like to thank professor Stanislaw P. Radziszowski of the Computer Science & Information Technology department at RIT, who not only made the suggestion of Kruskal's algorithm itself, but also provided the schemes for checking for network connectivity and loop detection. The completion of the program portion of this exercise would not have been possible without his input.



R e f e r e n c e s





Bursky, Dave. "Simulator Eases Communication Network Design." Electronic Design, 37:97-98, Oct 12, 1989.

Fitzgerald, Jerry. Business Data Communications. John Wiley & Sons, Inc., New York, 1993. pp388-392.

Guterl, Fred. "Networks: Wiring a Nation." IEEE Spectrum, 20:76, Nov 1983.

Kershenbaum, Aaron. Telecommunications Network Design Algorithms. McGraw-Hill, Inc., New York, 1993. pp144-153.

Raghavendra, C. S. "Fault Tolerance in Regular Network Architectures." IEEE Micro, 23:44-47, Dec 1984.

Tenenbaum, Aaron M. Data Structures Using Pascal. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1981.

1. A meshed network is one in which every node has a direct (usually full-duplex) connection to every other node, so that there are (N-1) connections attached to each node.

2. This is not always the case; many other factors come into play, and some of these will be mentioned further on.

3. In addition, there exist some software packages which will simply throw raw (and not so raw) computing power at the problem, considering every single conceivable network configuration and spitting out an "optimal" one. These software packages can save a large company as much as $50,000 a year in network costs, and coincidentally, this is how much they usually cost (Bursky, 1989).

4. Connected, that is, there is some link, direct or indirect, between any node and any other node in the network.

5. This goal is important when designing a communications network, as line cost is usually dependant on the two major factors of capacity and length.

6. This is determined at execution time by the user.

7. For a reference to all sorts of data structures and algorithms for manipulating them, see Tenenbaum, 1981.

8. The master set of records used in this study numbered 354, and the formula for determining the size of the subset was (rand() % 353) / 10 + 5.

9. The selection algorithms was really nothing more complicated than choosing a random number between 1 and N, and then choosing the Nth node in the linked-list of nodes. Something more efficient might have been devised (such as selection from a non-linear data structure) if time had permitted. As things stood, it worked, so it was left alone.

10. The algorithm for "meshing" a network can be done simply by means of a pair of nested loops; the outer one runs once for every node on a list, and so does the inner one. Inside the inner loop would be the code that pairs the "current node" in the inner loop with the "current node" in the outer loop.

11. In our implementation, nodes are assigned an index number as they are chosen from the master-set of nodes, the first one being "1", and so on until N. Thus, the first element in each of the clumps/arrays represents the node with index "1", and so on.

12. This depends strictly, of course, on the order the nodes were chosen, the arrangement of the edges, etc.

13. Hence the title of the paper. What, you thought I meant "Minimalistic Network Generator" ?