MINIMALISTIC
NETWORK
GENERATOR
An Implementation
Exercise
Juan G. Molinari
Rochester Institute of Technology
Rochester, NY 14623
ITEE 271
Telecommunications Fundamentals
Hallas
I n t r o d u c t i o n
It is likely an understatement to say that telecommunications technology has experienced a
quantum leap in the last 150 years. As significant a development as Gutenberg's printing press, the
telegraph made it possible to instantly transmit written information over distances that previously
would have taken weeks. This was such an obvious improvement over any previous method that
some of the basic problems common to all forms of telecommunication we take for granted today
were virtually ignored. Some of those issues arise from the condition of being able to provide
communication between any two points, and only two points, and other issues arise from the very
elimination of such a restriction (for instance, Raghavendra, 1984). Even though telegraphic signals
were criss-crossing the Atlantic by cable as early as 1858, it was not until 1878 that the first
telephone exchange was installed in Boston, Massachusetts. This may not seem like such an
important development until one imagines the situation created by the necessity of having a separate
telephone connection for every person which one wishes to be able to reach; a company president
who wants each of his ten regional managers to be a phone "call" away would need to have mounted
on his office wall a row of telephones resembling the courtesy phone bank in O'Hare International.
It is important to note that while the luxury of telephone communication was certainly unattainable
for any popular means until well into this century, the issue remained one of financial proportions;
if the executive in question had unlimited funds (as well as unlimited need) he could in theory be
in direct telephone contact with as many people as he wished, and provide interconnections among
his underlings as well, a situation of which the logical extreme would be a meshed network(1).
One of the physical realities of "sharing" a line is that data throughput would suffer. As a
specific example, suppose that node (5) generates an average of 50K per day destined for node (3),
and that node (4), in turn, generates 100K per day for node (2). What that means is that the (43)
line would have to be able to handle both data streams simultaneously, thus increasing its cost(2).
As stated before, the nature of the network will determine which of the two approaches
would be better suited for the transmission of information; network A would be adequate, in the
strict sense of the word, for a phone network except for the fact that (a) the number of phones in
one's living room would equal the number of people one wished to be able to reach, and (b) the cost
of implementing such a network would increase proportionally, making it financially unrealistic.
If the network, on the other hand, was destined for pure data transmission, then the A model would
be able to get data from any node to any other node much quicker than the B model, which would
require additional time to travel from one node to another (for various hardware-related reasons).
On the other hand, having a network with only 4 lines instead of 10 (as in our example) would prove
considerably more affordable, even if there is an extra cost for high-capacity lines.
The debate, of course, begins when it is time to decide which nodes are to have direct
connections between them, and which lines are to have more capacity than others. There are many
factors to take into consideration in any network that tries to avoid the financial "black hole" that is
meshed-connection (e.g., Guterl, 1983). For example, node (4) in the above example could employ
additional hardware which would allow it to buffer the data generated by node (5), allowing it to
hold it in temporary storage until the line became free; since the data is literally "taking turns" using
the (43) line it would perhaps be unnecessary to provide for additional throughput, thus a one-time
payment for the buffering hardware would eliminate the need for a high-speed line which inevitably
incurs steep monthly payments ad perpetuum. Sorting out possibilities like these, and many others,
is a task that requires the skills of a creative network administrator, not unlike that of a personnel
resource manager who must coordinate the various skills of a number of employees in order to get
a job done. Some discrete algorithms do exist for making this task easier(3), and this exercise will
focus on one of the simpler ones.
1) Measure the distance between all nodes in the network; this
should yield a list of N*(N-1)/2 node-pairs and the
distance between them. Call this set TP, for "total
pairs".
2) Sort TP so that the first element in the set is the smallest,
that is, the first element in the set should be the pair of
nodes with the smallest distance between them.
3) Remove the first (smallest) element in TP and add that node
connection (edge, in graph theory) to a graph G.
4) Repeat step (3) for all connections (edges) in TP until G is
fully connected, rejecting those that form loops in G.
Simple!
M e t h o d
We've covered the what; this next section deals with the how, in sufficient detail (along with
the actual code used -- see Appendix) for anyone who is interested enough in the subject matter to
easily recreate or elaborate on the work presented.
First, the data. A table of V-H coordinates was taken from Fitzgerald, 1993 (p.388), which
provides for a handy list of nodes and their relative locations. V-H coordinates are used primarily for
determining, not the geographic location of a node (which is often not useful in itself), but the
distance between a pair of nodes (at least in this case). The formula yielding the distance in miles
is:
What The Program Does
The very first thing the program does upon execution is begin reading the list of nodes,
whether from the keyboard or from a file(6), using a predetermined format. Each record contains four
field; in order (1) town, (2) state, (3) V, (4) H. These records are placed in an unsorted linked-list(7).
After the entire "master" set of nodes has been read, a subset of "network nodes"(8) is selected at
random(9). The reason for this dual-step process is to allow for the same initial data set to generate
different possible initial scenarios. A list of all possible node-pair combinations, (which would form
a fully-meshed network) is compiled and placed in an ordered linked-list(10), shortest distance first.
The head of this list (the first item) is removed, one at a time, and attached to what will become the
physical network, which is represented internally as a linked-list of node-pairs, identical in form to
that used to hold the meshed network. Up to this point the handling of the data has been relatively
straightforward; the program has merely had to read in information and arrange it in linear form.
The most difficult problems (by far) encountered in this exercise were those of detecting
loops in the network, and for ending the run when connectivity has been achieved.
Luckily the solution for the first problem proved to be the key for the second as well. In
short, what was required was a way to model the data so that the answer could be detected easily by
a computer; a plain connected graph in the form of a dynamically-linked tree clearly wouldn't do.
The answer becomes clearer when one considers that a partially-connected network has various
"groups" or "clumps" of nodes, which are fully connected in themselves. At first, the network has
no links and each node is its own clump. When the first link is added, the two nodes at each end of
the link become a single, connected clump. As links are added to any of the nodes in that clump,
all the nodes involved are consolidated into a single clump.
As stated before, this scheme for representing the connectivity status of our network has a
dual benefit, which is that it allows us to detect loops even before we actually attach a new edge to
our network. Visually, the situation can be represented C o n c l u s i o n
The purpose of this exercise was to serve as an exploration of the complexities that are
involved with the design of communication networks. Clearly, only a minimal subset of the issues
were even mentioned(13), but this serves the purpose just as well, because we can see that just two of
those issues (connectivity and loop detection) require several pages of explanation when the problem
needs to be translated to a form that a computer can easily digest. For example, routines for
determining data load between nodes in the network were written (and can still be examined in the
Appendix) but the data they generate was not used in this exercise because the complexity involved
in the introduction of a new criteria for network evaluation (i.e., line load, in addition to line length)
would have been well beyond the scope of this study.
And perhaps that is the crux of the complexity of network analysis. A human being can (with
a little practice) look at the diagram of a network containing many dozens of nodes of different
types, connected with lines of various capacities, and say without much deliberation "This network
needs another hub here." A computer, on the other hand, cannot. We can, however, utilize
monolithic (and expensive) programs that give each and every atomic element in the system huge
amounts of attention while creating thousands upon thousands of virtual scenarios, evaluating them
under some sort of predetermined criteria, finally coming up with an answer in the form of "This
new network I've constructed would be more efficient than the original one." It is a fundamentally
different form of thinking, and this difference makes network analysis a very troublesome matter.
A c k n o w l e d g m e n t
I would like to thank professor Stanislaw P. Radziszowski of the Computer Science &
Information Technology department at RIT, who not only made the suggestion of Kruskal's
algorithm itself, but also provided the schemes for checking for network connectivity and loop
detection. The completion of the program portion of this exercise would not have been possible
without his input.
R e f e r e n c e s
Bursky, Dave. "Simulator Eases Communication Network Design." Electronic Design, 37:97-98,
Oct 12, 1989.
Fitzgerald, Jerry. Business Data Communications. John Wiley & Sons, Inc., New York, 1993.
pp388-392.
Guterl, Fred. "Networks: Wiring a Nation." IEEE Spectrum, 20:76, Nov 1983.
Kershenbaum, Aaron. Telecommunications Network Design Algorithms. McGraw-Hill, Inc., New
York, 1993. pp144-153.
Raghavendra, C. S. "Fault Tolerance in Regular Network Architectures." IEEE Micro, 23:44-47,
Dec 1984.
Tenenbaum, Aaron M. Data Structures Using Pascal. Prentice-Hall, Inc., Englewood Cliffs, New
Jersey, 1981.
1. A meshed network is one in which every node has a direct (usually full-duplex) connection to every other
node, so that there are (N-1) connections attached to each node.
2. This is not always the case; many other factors come into play, and some of these will be mentioned
further on.
3. In addition, there exist some software packages which will simply throw raw (and not so raw) computing
power at the problem, considering every single conceivable network configuration and spitting out an "optimal"
one. These software packages can save a large company as much as $50,000 a year in network costs, and
coincidentally, this is how much they usually cost (Bursky, 1989).
4. Connected, that is, there is some link, direct or indirect, between any node and any other node in the
network.
5. This goal is important when designing a communications network, as line cost is usually dependant on the
two major factors of capacity and length.
6. This is determined at execution time by the user.
7. For a reference to all sorts of data structures and algorithms for manipulating them, see Tenenbaum, 1981.
8. The master set of records used in this study numbered 354, and the formula for determining the size of the
subset was (rand() % 353) / 10 + 5.
9. The selection algorithms was really nothing more complicated than choosing a random number between 1
and N, and then choosing the Nth node in the linked-list of nodes. Something more efficient might have been
devised (such as selection from a non-linear data structure) if time had permitted. As things stood, it worked, so it
was left alone.
10. The algorithm for "meshing" a network can be done simply by means of a pair of nested loops; the outer
one runs once for every node on a list, and so does the inner one. Inside the inner loop would be the code that pairs
the "current node" in the inner loop with the "current node" in the outer loop.
11. In our implementation, nodes are assigned an index number as they are chosen from the master-set of
nodes, the first one being "1", and so on until N. Thus, the first element in each of the clumps/arrays represents the
node with index "1", and so on.
12. This depends strictly, of course, on the order the nodes were chosen, the arrangement of the edges, etc.
13. Hence the title of the paper. What, you thought I meant "Minimalistic Network Generator" ?

A telephone exchange, however, provides for the dynamic allocation of telephone lines,
operating under the exact same principles as a railroad switch, which can redirect trains off one rail
and unto another if the first was occupied with another train.
As a graphic demonstration of the benefits of a switched versus
a fixed-point-to-point system, take these two hypothetical
networks, each having 5 nodes. In the A network, a meshed
connectivity has been implemented so that every node has a
direct connection to every other node, giving us a total of 10
full-duplex lines. This can be very expensive. For the B
network nodes (3) and (4) have been recruited to serve as
tandems, or hubs. Data generated at node (2) which is destined
for node (5), for example, would first be routed to node (3) and
then (4), which would then transmit it to its ultimate
destination.

Kruskal's algorithm for finding "minimal spanning
trees" (Kershenbaum, 1993) is simple and straightforward.
The term spanning tree, first of all, refers to a fully connected(4)
network in which there is only one "route" from any node to
any other node ("non-looping"). For example, graph A to the
right represents a fully connected, non-looping spanning tree.
On the other hand, graph B shows a network which is not (a)
non-looping, because, for example, there is more than one
"route" to node (1) from node (5) (e.g., 521 or 5231), nor
(b) connected, as node (7) has no way of reaching any of the
other nodes in the network. Kruskal's algorithm, as mentioned above, is a method of deliberating
graphs that look like A above while striving to minimize the overall distance between nodes(5). The
procedure is as follows:
This formula was implemented in a function called distance() which took as its arguments
four integers (ordinals), processed them according to the above formula, and returned the result to
the calling function. This procedure required (in comprehensible form) 11 lines of C code. This fact
is reported merely to give the reader an idea of how concepts that human beings take as a matter of
course (such as the above formula) translate into a complicated sequence of instructions whose
presentation is prepared for the computer's benefit. This relationship is one of the main points in this
study, and one which will be elaborated later.
As a way of identifying the difficulty, note that it is
easy for human beings to detect either situation mentioned
above because we process information in parallel form, taking
in an entire graph at once (within reason) and detecting loops
and connectivity because the information is interpreted in
abstract form; that is, loops can be seen as easily-recognized
geometrical figures such as triangles (as in the graph above), quadrilaterals, octagons, etc. The same
problem exists for that of connectivity; while it is a simple matter for a human being to look at the
above graph and say "Well, it is in two parts, so it cannot be connected", a computer would have to
be taken by the hand and taught every elementary step in order to come to the same conclusion. The
reason for this is that computers (at present) process information in strictly serial form; a computer's
definition of a loop is not "a closed link of nodes", but along the lines of "a path formed when there
is a way to leave a node and return to it without passing over the same node". Needless to say, this
can translate to many pages of code, if it is not handled intelligently.

As can be seen in the "before & after", there were
three original connected clumps, comprised each of nodes
(1,2), (3,4) and (5). A link (13) was added which
connected the two logical clumps, and merged then into
one, comprised of nodes (1, 2, 3, 4). Node (5), lacking any
connections, remains in its own clump. If we were then to
add another link, say (15), the lone clump would be
consolidated with the larger clump, forming one single
clump encompassing the entire network. At this point, the
network would be fully connected and we would have
reached the end of our run.
How do we teach the computer to think in
terms of "clumps"? Internally, the original state of
the example above (before any links are
established) would be represented as shown on the
right. If N is the number of nodes in the network,
we start out with a linked-list of N arrays, each of
length N. Each of these arrays represents a clump in our network, and the elements with a "1" in
them represents the index of the nodes that are members in that clump(11). When we add a link to the
network, the arrays (clumps) where each of the two nodes are members are consolidated into one by
(a) performing a logical union on the two clump-arrays in question, thus creating a new "clump"
which encompasses all the nodes in each of the two original clumps, and (b) eliminating one of the
nodes in the linked-list of arrays/clumps.
Thus, our list of arrays/clumps might(12) look
like shown at right before the (13) link was added
to our previous example. We have three
arrays/clumps, the first made up of nodes (1,2), the
second of nodes (3,4), and a third containing a
single node (5).
After adding the (13) link, we find that
there are now only two arrays/clumps left, the first
made up of the nodes (1,2,3,4) (which is the logical
union of the two former clumps), and a second
clump, containing only node (5). It need hardly be
stated that connectivity will have been achieved
when there is only one array/clump, containing nothing but "1"s.
by the graph at the right (which is unrelated
to the previous examples). If we are considering a (78) link, all we would need do in order to check
for a loop is to make the computationally
simple inquiry of whether the two nodes (7,8)
belong to the same array/clump. In this case,
we can see (as humans) that they do, simply
because they are contained within a common
gray area in our graph. The computer can also
see this fact, because both the 7th and 8th index
of the same array/clump would contain a "1".