Chapter 7

Functions and friends

Technically, a function is a block of code that takes in parameters and returns a numerical value. That's what we're used to in math, at least. Many languages, such as Pascal, make a distinction between functions and what are called subroutines (procedures, in Pascal's own terms). A subroutine is not meant to return a value, but instead to just perform work that might be repeated throughout the program. Languages like C don't make such a distinction, though you do have to specify a return type of void if you don't return anything. Perl is even more abstracted. Everything is simply called a subroutine.

Here is a trivial example of a program that could use subroutines:

1  #!/usr/bin/perl -w
2  use strict;
3  
4  # declare variables
5  my $value1; # The first user provided value
6  my $value2; # The second user provided value
7  
8  $value1 = -1;
9  
10 BLOCK1: {
11     print "Give me a number between 1 and 10: ";
12     chomp( $value1 = <STDIN> );
13     if ( $value1 < 1 || $value1 > 10 )
14     {
15         print "Incorrect, please try again.\n";
16         redo BLOCK1;
17     }
18 }
19 
20 print "Thank you for providing the number $value1.\n";
21 
22 print "Now please provide a second number.\n";
23 
24 $value2 = -1;
25 BLOCK2: {
26     print "Give me a number between 1 and 10: ";
27     chomp( $value2 = <STDIN> );
28     if ( $value2 < 1 || $value2 > 10 )
29     {
30         print "Incorrect, please try again.\n";
31         redo BLOCK2;
32     }
33 }
34 
35 print "Thank you for providing the number $value2.\n";
36 
37 print "The sum of your two numbers is ", ( $value1 + $value2 ), ".\n";
no_sub.pl

The program is relatively straightforward. We read in two numbers from the user, checking that each entry is between 1 and 10 inclusive. The only peculiar part is lines 16 and 31, which use redo. This jumps back to the top of the named block. Obviously we have a lot of redundancy here, so now we'll rewrite the program with subroutines.

1  #!/usr/bin/perl -w
2  use strict;
3  
4  # declare variables
5  my $value1; # The first user provided value
6  my $value2; # The second user provided value
7  
8  ####################
9  # Function: getNum1to10
10 # Purpose:  Get a valid constrained number from the user
11 # In:       None
12 # Out:      The extracted number
13 ####################
14 sub getNum1to10
15 {
16     my $value = -1;  # user provided value
17 
18   BLOCK: {
19         print "Give me a number between 1 and 10: ";
20         chomp( $value = <STDIN> );
21         if ( $value < 1 || $value > 10 )
22         {
23             print "Incorrect, please try again.\n";
24             redo BLOCK;
25         }
26     }
27 
28     print "Thank you for providing the number $value.\n";
29 
30     return $value;
31 } # end sub getNum1to10
32 
33 $value1 = getNum1to10();
34 
35 print "Now please provide a second number.\n";
36 
37 $value2 = getNum1to10();
38 
39 print "The sum of your two numbers is ", ( $value1 + $value2 ), ".\n";
simple_sub.pl

The most important parts of this new program are lines 14, 33 and 37. Line 14 declares a subroutine: a named chunk of reusable code. Lines 33 and 37 invoke the subroutine and save its returned value to the variables $value1 and $value2. Thus the redundant semi-complex code is eliminated. This makes for a smaller program, a more readable program (because we've associated a name with the complex operation), and a reduced likelihood of typing in syntax errors.

Lines 8-13 are called a function-header. It is completely optional, and for very small subroutines it's generally overkill, but you should be in the habit of providing these sorts of descriptions. Lines 8 and 13 make the function-header stand out; when there are dozens of functions in a file, this helps to quickly identify the start of each of them. Line 9 is redundant with line 14, but it helps identify what this huge comment block really is: a function-header. Line 10 is a summary of what the function does. This is akin to having section-comments like line 4. Lines 11 and 12 are subtle but important, especially in perl. In languages like C, all the inputs and outputs of a function are declared in its prototype and are thus somewhat readable (assuming the variable names are meaningful). In perl, there is no such prototype. Thus in time, you might forget what the function's inputs and outputs are. This is especially true in very large programs. To help with this, there are a few rules of thumb to go by.

  1. Describe the inputs/outputs as we've done here in lines 11-12.
  2. Have only one return statement, and put it at the very bottom of the function, as with line 30.
  3. Have any inputs extracted within the first 3 or 4 lines of code. Note that this example has no inputs.
These are good programming practices for any language. They reduce the chances of forgetting an important input/output, especially when only taking quick glances at older code.
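
As a rough sketch of the three rules applied to a function that does take inputs, consider the following. The name clampNum and its behaviour are made up for this illustration and are not part of simple_sub.pl; the @_ array it reads its inputs from is explained later in this chapter.

```perl
#!/usr/bin/perl -w
use strict;

####################
# Function: clampNum  (hypothetical example)
# Purpose:  Force a number into a given range
# In:       number - the value to constrain
#           number - the lowest allowed value
#           number - the highest allowed value
# Out:      number - the constrained value
####################
sub clampNum
{
    my ( $value, $min, $max ) = @_;  # rule 3: inputs extracted up front
    my $result = $value;             # the single value we will return

    $result = $min if $value < $min;
    $result = $max if $value > $max;

    return $result;                  # rule 2: the only return statement
} # end sub clampNum

print clampNum( 15, 1, 10 ), "\n";   # prints 10
print clampNum( 5, 1, 10 ), "\n";    # prints 5
```

A quick glance at the header and the first line of the body is enough to see what goes in and what comes out.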

An important thing to note is that we've declared a variable inside the function on line 16. This variable can only be seen within this function. In fact, it's only visible within this invocation of the function. If the function is called multiple times or even in parallel, then the value of $value is neither shared nor remembered. It would be considered an error to do the following:

sub mySub
{
  my $number = 2;
} # end mySub

print "Number is $number\n";
This is an error because the variable is declared inside the scope of the subroutine, and is thus not accessible outside of it. With use strict in effect, as in our programs, perl will refuse to compile it.
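
To see that nothing is remembered between invocations, here is a minimal sketch. The function name countUp is hypothetical and exists only for this illustration.

```perl
#!/usr/bin/perl -w
use strict;

# countUp is a made-up name used only for this illustration
sub countUp
{
    my $count = 0;   # a brand new $count is created on every call
    $count++;
    return $count;
} # end sub countUp

print countUp(), "\n";   # prints 1
print countUp(), "\n";   # prints 1 again; nothing is remembered
```

If we wanted the count to survive between calls, the variable would have to live outside the subroutine.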

In fact, scoping is even more general than this. Any pair of curly-brackets can be used to encompass a new scope for variable declarations. It's even possible to reuse old variable names. For example:

my $var = 1;
my $val = 2;
if ( 1 == 1 )
{
  my $var = 2;
  print "var = $var, val = $val\n"; # prints "var = 2, val = 2"
}
print "var = $var, val = $val\n"; # prints "var = 1, val = 2"
It may seem bizarre that the last line prints 1, but remember that the scope of $var changes inside the curly-brackets of the if-statement. Once you leave an inner scope, you return to the previously defined scope as if the inner scope never existed.

Granted, defining variables inside if-statements is often considered bad practice (they should go at the top of the program or function), but it's useful for very temporary values, especially in longer while-loops. Most importantly, it's essential for subroutines, as we saw on line 16 in "simple_sub.pl".
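
A small sketch of such a temporary variable inside a while-loop might look like this. The data and variable names are invented for the example.

```perl
#!/usr/bin/perl -w
use strict;

my @queue = ( 3, 4, 5 );  # sample data, made up for this sketch
my $total = 0;            # declared at the top; lives for the whole program

while ( @queue )
{
    my $item   = shift @queue;    # temporary; exists only inside this pass
    my $square = $item * $item;   # likewise
    $total += $square;
}

print "total = $total\n";   # prints "total = 50"
```

Neither $item nor $square can be seen after the closing curly-bracket, so they can't accidentally be confused with anything else in the program.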

Function input/output

We've already seen the above function getNum1to10 return a value, but that's only one form.

1   #!/usr/bin/perl -w
2   use strict;
3   
4   ###################  Declare Functions ###############
5   
6   ##############################
7   # Function: average
8   # Purpose:  Calculates the average of a bunch of numbers
9   # In:       array of ints - numbers to average
10  # Out:      float - the floating point average of the inputs
11  # Pre:      The inputs are not checked to verify that they are integers
12  # Post:     None
13  ##############################
14  sub average
15  {
16      my @digits = @_; # The input integers
17      my $total  = 0;  # The running total (initialized)
18      my $digit;       # An index into the list of digits
19      my $num_digits;  # The size of the array;
20  
21      # "scalar = array" means to extract the size of the array
22      $num_digits = @digits;
23  
24      # Run through the numbers
25      for $digit ( @digits )
26      {
27          $total += $digit;
28      }
29  
30      return $total / $num_digits;
31  } # end average
32  
33  ##############################
34  # Function: deviation
35  # Purpose:  Calculates the deviation of a bunch of numbers
36  # In:       array of ints - numbers to average
37  # Out:      float - the floating point statistical deviation of the inputs
38  # Pre:      The inputs are not checked to verify that they are integers
39  # Post:     None
40  ##############################
41  sub deviation
42  {
43      my @digits = @_; # The input integers
44      my $total  = 0;  # The running total (initialized)
45      my $digit;       # An index into the list of digits
46      my $num_digits;  # The size of the array;
47      my $average;     # The average of the set of numbers
48  
49      # "scalar = array" means to extract the size of the array
50      $num_digits = @digits;
51  
52      # calculate the average (from above function)
53      $average = average( @digits );
54  
55      # Run through the numbers
56      for $digit ( @digits )
57      {
58          $total += ( $digit - $average ) ** 2;
59      }
60  
61      return $total / $num_digits;  # mean of the squared differences
62  } # end deviation
63  
64  ##############################
65  # Function: stddev
66  # Purpose:  Calculates the standard deviation of a bunch of numbers
67  # In:       array of ints - numbers to average
68  # Out:      float - the floating point statistical std-dev of the inputs
69  # Pre:      The inputs are not checked to verify that they are integers
70  # Post:     None
71  ##############################
72  sub stddev
73  {
74      return sqrt( deviation(@_) );
75  } # end stddev
76  
77  ##############################
78  # Function: stddevFromDev
79  # Purpose:  Calculates the standard deviation from a previously calculated deviation
80  # In:       float - The deviation of a set of numbers
81  # Out:      float - the floating point statistical std-dev of the input
82  # Pre:      The input is not checked to verify that it is an integer
83  # Post:     None
84  ##############################
85  sub stddevFromDev
86  {
87      return sqrt( $_[0] );
88  } # end stddevFromDev
89  
90  ##############################
91  # Function: statistics
92  # Purpose:  Calculates the average, deviation and std-dev of a bunch of numbers
93  # In:       array of ints - numbers to average
94  # Out:      hash:
95  #             size => int      # how many numbers were provided
96  #             avg => float     # The average
97  #             dev => float     # The statistical deviation
98  #             stddev => float  # The statistical standard deviation
99  # Pre:      The inputs are not checked to verify that they are integers
100 # Post:     None
101 ##############################
102 sub statistics
103 {
104     my @digits = @_;  # The input
105     my %result;       # The soon-to-be output
106 
107     # Fill in the return data-structure with calculations
108     %result = (
109                size    => scalar( @digits ),
110                avg     => average( @digits ),
111                dev     => deviation( @digits ),
112                #stddev => stddev( @digits ), # more efficiently calculated below
113               );
114 
115     # compute stddev from pre-computed deviation
116     $result{stddev} = stddevFromDev( $result{dev} );
117 
118     return %result;
119 } # end statistics
120 
121 ######################### MAIN #########################
122 
123 # Declare variables
124 my $number = -1;   # User-specified number
125 my @numbers;       # collection of user-entered numbers
126 my %statistics;    # calculated statistics on @numbers
127 
128 
129 # get list of numbers
130 while ( $number ne "" )
131 {
132     print "Enter a number (enter when finished): ";
133     chomp( $number = <STDIN> );
134     push @numbers, $number if $number ne "";
135 }
136 
137 # calculate the statistics
138 %statistics = statistics( @numbers );
139 
140 # display the results
141 print <<EOS;
142 
143 Here are the statistics:
144 
145 Num Els:   $statistics{size}
146 Average:   $statistics{avg}
147 Deviation: $statistics{dev}
148 Std Dev:   $statistics{stddev}
149 EOS
sub_examples.pl

Now this is a doozy. We're starting to encounter some very large files. Also note that some of these functions seem generic and not particularly tailored to this program. It would be nice if we could just write them once, save them off to their own file, then reuse them later. Files like that are called modules, and we'll get to them later.

Lines 4 and 121 are somewhat new. These are major section separators. They are helpful in quickly finding an important section of code such as MAIN, the function-declarations, class-declarations (later), etc. They stand out more than the simple section declarations and typically mean that there will be more than one sub-section. This is kind of like a customized outline format.

You should notice 5 functions; each with their own function-header. For a single-use such as this, the function-headers seem to be larger than the functions, and indeed these functions could be written in a much more compact manner. For example, stddev could be written:

sub stddev { return sqrt( deviation(@_) ); }
I'm attempting to encourage good programming practice and, in general, code-reuse (which we'll get to later). Thus, going through all the extra descriptions allows third parties to quickly understand the intention and the dynamics of a given function. In general, if something is going to be thrown away immediately, use the short-hand. If something is potentially going to be reused in other contexts or simply seen by other developers, use the long-hand.

On lines 11-12, you'll notice Pre and Post. These are pre- and post-conditions, which describe the assumptions made by the developer with respect to this function. They might state that a file is positioned at its beginning, that there is an interactive input stream (i.e. a keyboard prompt), etc. On line 11, for example, we acknowledge that the user might have provided something other than an integer (a distinct possibility in perl with its generic data-types). It's meaningless to add "hello" to 5, but perl will happily assume "hello" equates to zero and, apart from a warning when -w is on, you'd never know the difference.
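
If we wanted to enforce such a pre-condition rather than just document it, one possible approach is a small checking function. The sketch below is only an assumption about how that could look; isInteger is a hypothetical helper that is not part of sub_examples.pl, and it relies on a regular expression.

```perl
#!/usr/bin/perl -w
use strict;

####################
# Function: isInteger  (hypothetical helper)
# Purpose:  Decide whether a piece of text looks like an integer
# In:       string - the text to check
# Out:      1 if the text is an integer, 0 otherwise
####################
sub isInteger
{
    my ( $text ) = @_;   # the value to check
    my $result = 0;      # assume failure until proven otherwise

    # an optional minus sign followed by one or more digits, and nothing else
    $result = 1 if defined $text && $text =~ /\A-?[0-9]+\z/;

    return $result;
} # end sub isInteger

print isInteger( "42" ),    "\n";   # prints 1
print isInteger( "hello" ), "\n";   # prints 0
```

The input loop could call a check like this before pushing a value onto @numbers, at which point the Pre line in the function-headers would no longer be needed.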

On lines 16, 43, 74 and 104, we're extracting input parameters from the funny variable called @_. The at-symbol says that it's an array, and the underscore should bring back memories of the $_ temporary variable. This is a pre-defined variable that represents the arguments of the enclosing function. The parameters in the called function are magically associated with the @_ array. Thus, for example, if we have:

sub test{ print $_[2]; }
test(1,2,3,4);
We'll get 3 on the screen (remember that arrays start at index zero).

Understanding that $_[...] is an index into @_ makes it easier to understand line 87, which is simply extracting the first parameter from the input-list.
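
There is more than one way to pull a parameter out of @_. The sketch below shows three that behave the same for the first argument; the function names are invented for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Three hypothetical functions that each return their first argument

sub firstByIndex { return $_[0]; }                         # index directly into @_

sub firstByShift { my $first = shift; return $first; }     # shift works on @_ by default inside a sub

sub firstByList  { my ( $first ) = @_; return $first; }    # list assignment from @_

print firstByIndex( "a", "b" ), "\n";   # prints a
print firstByShift( "a", "b" ), "\n";   # prints a
print firstByList( "a", "b" ), "\n";    # prints a
```

The list form is the one used on line 16 of sub_examples.pl, just with an array on the left-hand side instead of a single variable.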

Line 22 is an array operation to determine the size of the array again.

Lines 25-28 are a simple foreach-loop, where we use the C-like shortcut variable += value.

Line 53 is interesting. We're calling a function from within a function. From this you should be able to see how we can break a complex problem up into simple related ones.

Line 58 has the funny ** symbol. This is a raised-power math operation. Some languages don't even have it as a math operator (in C, you have to call var = pow(val, 2);). We could just as easily have said $tmp = calculation; $result = $tmp * $tmp;, but then we'd have to save off an intermediate result.
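
One thing to watch with ** is that it binds more tightly than multiplication and division. A short sketch with made-up variable names:

```perl
#!/usr/bin/perl -w
use strict;

my $square  = 3 ** 2;           # 3 squared
my $kilo    = 2 ** 10;          # 2 to the tenth power
my $tight   = 8 / 2 ** 2;       # ** is applied before /, so this is 8 / 4
my $grouped = ( 8 / 2 ) ** 2;   # parentheses force the division first

print "$square $kilo $tight $grouped\n";   # prints "9 1024 2 16"
```

When in doubt, add the parentheses; they cost nothing and make the intention obvious.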

Line 74 shows the built-in sqrt function. There are all sorts of mathematical functions like this readily available for your convenience (though you'll have to check to make sure; other obvious functions like average aren't included).

Lines 108-113 assign a hash as we've seen before, BUT the values are the results of function calls. Further, on line 109, we're trying to directly extract the size of the input-array. Unfortunately, if we place an array inside a list like this, perl will simply expand the array and copy all of its elements into the list; this isn't what we want. So we manually coerce the context into a scalar (that of a simple variable) with the function scalar.
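
Here is a small sketch of what goes wrong without scalar. The variable names and data are made up for the example.

```perl
#!/usr/bin/perl -w
use strict;

my @digits = ( 4, 8, 15 );

my $count = @digits;                          # scalar assignment: the size, 3
my %wrong = ( size => @digits );              # expands to ( 'size', 4, 8, 15 )
my %right = ( size => scalar( @digits ) );    # forces the size, 3

print "count      = $count\n";          # prints "count      = 3"
print "wrong size = $wrong{size}\n";    # prints "wrong size = 4"
print "right size = $right{size}\n";    # prints "right size = 3"
```

Note that %wrong quietly picked up a second, unintended entry as well: the key 8 with the value 15.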

Line 112 is commented out to show what could go there, but we have an optimization that we use later on. On line 116 we use the stored value for deviation and pass it to an optimized function stddevFromDev, which simply skips having to recalculate. This is just a good example to get you used to seeing hashes used in different contexts.

Line 118 is the first time we've seen a user-defined function return anything other than a scalar. In fact we're returning an array, which might be confusing since we seem to be returning a hash on line 118 and we seem to be accepting and assigning a hash on line 138. But what is actually happening is that line 118 is converting the hash into an array, and line 138 is recreating a brand new hash out of the array, just as we filled in the hash on lines 108-113. This is somewhat inefficient, especially for large hashes (with dozens or thousands of entries), but for small programs like this, the percentage of waste is marginal. Later we'll discover ways of passing-by-reference so that this becomes as efficient as returning an integer.
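
The flattening can be made visible by catching the same return value once in an array and once in a hash. This is only a sketch; getPoint is a hypothetical function, not part of sub_examples.pl.

```perl
#!/usr/bin/perl -w
use strict;

# getPoint is a made-up function that returns a small hash
sub getPoint
{
    my %point = ( x => 1, y => 2 );
    return %point;   # leaves the function as a flat list of keys and values
} # end sub getPoint

my @flat = getPoint();   # key, value, key, value (the order is not guaranteed)
my %copy = getPoint();   # a brand new hash rebuilt from that list

print scalar( @flat ), "\n";    # prints 4
print "$copy{x}, $copy{y}\n";   # prints "1, 2"
```

Two entries become four list elements, which is exactly the copying that the paragraph above calls wasteful for large hashes.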

The important point here is that all parameters to or from functions are coerced into array-form. Thus you can do most anything with a function that you can do with an array including:

our $name = 'Michael';
our $age = 26;
sub getName { return $name; }
sub getAge { return $age; }
sub getNameAge { return ( getName(), getAge() ); }

my ( $tmp_name, $tmp_age ) = getNameAge();
This program snippet does two interesting things. First, we see that we're using the array to array assignment again as with:
($a, $b) = (1, 2)
But we're also declaring a list of variables. The my statement normally declares a single variable, but you can declare more than one if you use parentheses. One benefit of doing this is that you have the makings of an array assignment for your initialization. This is very common for temporary variables, as here. You may also note that we used the our keyword to declare global variables for use with getName/getAge.
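
The parentheses around a my declaration matter more than they might appear to. The following sketch, using invented names, shows how the result changes with and without them.

```perl
#!/usr/bin/perl -w
use strict;

my @names = ( "Ann", "Bob", "Cy" );

my ( $first )        = @names;   # list assignment: takes the first element
my $howMany          = @names;   # scalar assignment: takes the size
my ( $one, @others ) = @names;   # first element, then everything that is left

print "$first\n";           # prints "Ann"
print "$howMany\n";         # prints "3"
print "$one / @others\n";   # prints "Ann / Bob Cy"
```

Forgetting the parentheses on the first line is a common slip; the program still runs, but $first would hold 3 instead of a name.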

By now you may be wondering about the use of parentheses in perl. Sometimes we use them, and sometimes we don't. In some languages like C, they're mandatory for function-calls, if-statements, etc., but optional for grouping mathematical operations. In perl they're optional everywhere except for temporary array creation such as array-assignments/initialization. They're also mandatory on the block-form of if/while/for, though you'll recall that if you rearrange one of these you can avoid the curly brackets and the parentheses entirely. Thus when you use parentheses for a function call, it's either to make it more readable, or to separate the function's arguments from whatever follows. For example:

my $total = average( 1, 2 ) * 2;
Here, without the parentheses, perl would read this as average( 1, 2 * 2 ), multiplying the second argument by 2 instead of multiplying the function's result by 2.
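
The same effect can be seen with a simpler function. In this sketch, twice is a hypothetical name; note that calling a user-defined function without parentheses only works when the function has been declared earlier in the file.

```perl
#!/usr/bin/perl -w
use strict;

# twice is a made-up function for this illustration
sub twice
{
    my ( $n ) = @_;
    return $n * 2;
} # end sub twice

my $with    = twice( 3 ) + 1;   # the call ends at the closing parenthesis
my $without = twice 3 + 1;      # read as twice( 3 + 1 )

print "$with $without\n";       # prints "7 8"
```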

Lastly, on lines 140-149 we have a funny format for something that looks like a print-block. Specifically, we have <<EOS on line 141, a bunch of text with no quotes on lines 142-148, and an EOS on line 149. This is called a here-document and is another one of perl's text-centric features (borrowed from the UNIX shell). In perl, whenever you see two less-than signs followed by a word, the subsequent lines are collected until that word is found on a line by itself. The entire group is considered a single text-string that replaces the <<EOS symbol. It's somewhat confusing to the uninitiated, but it's very convenient. It works exactly as if you'd used double quotes, except that you don't have to worry about escaping quotes. You'll notice the semi-colon is on line 141 and not 149. This is because the syntax, as described above, treats the here-document as if it were all on the first line, and thus you have to complete the rest of that line (which in perl requires ending each complete statement with a semi-colon). This would be equivalent to the following valid perl:

print "

Here are the statistics:

Num Els:   $statistics{size}
Average:   $statistics{avg}
Deviation: $statistics{dev}
Std Dev:   $statistics{stddev}
";
But notice how the end-quote and the semi-colon kind of get lost in the mix. Here we didn't have any quotes to escape, so some of the here-document's advantages don't show. It does allow for easier editing later on, though (in case we want to add quotes but forget to escape them). Note that in many other languages such as Java or C, you're not allowed to have a line break inside a quoted string. The entire string must reside on a single line, and you have to use an escape that represents the line break; namely "\n". So in C, this would be:
printf( "\n\nHere are the statistics:\n\nNum Els:   $statistics{size}\nAverage:   $statistics{avg}\nDeviation: $statistics{dev}\nStd Dev:   $statistics{stddev}\n" );
Ignore the fact that C doesn't interpolate variables into strings for a moment. Instead assume that we're simply calling the perl printf function in an attempt to mimic C.

The above is much harder to read and definitely harder to properly format. In fact, normally what would happen is that we'd break this up into smaller strings as with:

printf( "\n\nHere are the statistics:\n\n" );
printf( "Num Els:   $statistics{size}\n" );
printf( "Average:   $statistics{avg}\n" );
printf( "Deviation: $statistics{dev}\n" );
printf( "Std Dev:   $statistics{stddev}\n" );
The perl method is definitely better. Other languages, such as Python, use a slightly more readable triple-quote to denote a multi-line block, but that offers nothing over perl's ordinary multi-line quoted strings. The reason perl can be this flexible is that it has made sacrifices elsewhere. Thus perl is definitely a text-optimized language, which can get in the way at times (not least of which are the dollar-signs that you have to type everywhere). You'll note that I

Lastly, the mnemonic EOS stands for End-Of-String, and is purely conventional. Feel free to use whatever word or mnemonic is appropriate. Sometimes having the symbol be a name relevant to the text helps, just as with function names, but the same symbol can be repeated in different sections of the same code, so there's little motivation to invent new ones. Plus, seeing EOS stands out to experienced perl programmers.
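
As a brief sketch of picking your own terminating word, the example below uses two invented ones, GREETING and RAW. It also shows a variation not used in sub_examples.pl: putting the word in single quotes turns off variable interpolation, just as a single-quoted string would.

```perl
#!/usr/bin/perl -w
use strict;

my $name = "Michael";

# A bare word behaves like a double-quoted string: $name is filled in
my $greeting = <<GREETING;
Hello, "$name"!
GREETING

# A single-quoted word behaves like a single-quoted string: nothing is filled in
my $literal = <<'RAW';
Price: $5.00 (no interpolation here)
RAW

print $greeting, $literal;
```

This prints the greeting with the name filled in and the quotes intact, followed by the price line exactly as typed.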

Michael L Maraist
Last modified: Wed Sep 4 13:41:15 EDT 2002