<< , >> , Title

3. Seven Tricks for Compiling Persistent Languages

In this section, seven tricks are described which may be employed to efficiently solve the problems described in Section 1. They are:

  1. the introduction of native code into a running system,
  2. the ability to call other native code,
  3. linking to persistent data and environment support code,
  4. linking to the static environment,
  5. reducing memory accesses,
  6. the ability to run programs that cope with garbage collection and snapshot, and
  7. reducing memory allocation overhead by allocating frames lazily.

3.1. The Introduction of Native Code Into a Running System

In order to support both integrated programming environments and run-time reflection, the Napier88 system contains a compiler that is callable at run-time. Various compiler interfaces exist and are described elsewhere [5]. All the interfaces take a description of the source text and environment information as parameters and produce an executable procedure injected into an infinite union type.

This functionality requires that the code generation technology be capable of supporting the dynamic generation of native code and its introduction into the persistent system. This may be achieved in four steps:

  1. the compiler generates a C program,
  2. the resulting C program is compiled in the normal manner to produce an object or executable file,
  3. the executable native code is extracted from the object or executable file, and
  4. the compiler creates one persistent object per compiled procedure and copies the instruction sequence for each procedure into the object.

Two techniques may be used to extract native code from the executable file produced by the C compiler: writing a utility based on the C linker or by generating self extracting code.

Object files generated by the C compiler contain linkage information that is used by the C linker. This information could be extracted by other programs capable of reading the object code format. However, the format of object files is operating system and/or architecture-dependent and therefore a separate utility needs to be written for each host environment. A viable alternative is to generate self extracting code, i.e. a program which, when run, will output all or some of its executable code.

The use of self-extracting code makes the compiler independent of the architecture-dependent structure of executable programs. Using this approach, the C compiler is directed to produce an executable program which is immediately executed. This program copies the executable code for each function into a temporary file in a known format, independent of the host architecture. Function pointers are used to find the start of each instruction sequence. It is assumed that all memory between function pointers is code. This may result in extra data being copied, but guarantees that all the instructions of each function are copied.

This technique relies on specific assumptions about the C compiler. In particular, each compiled C function is assumed to contain no non-local references and all the local references are relative. That is, the compiled code is individual pure code sequences.

[Diagram (2.4k)]

Figure 1: Introducing code into the persistent store

3.2. The Ability to Call Other Native Code

Since Napier88 supports first class procedures, a piece of native code must be able to call arbitrary compiled Napier88 procedures. These procedures may either be in the static environment of the caller, extracted from a data structure in the store, or passed as a parameter. When C is used as a target language, procedure call conventions may be based on jumps (gotos) or C function calls.

A major reason for using C as an intermediate form is to obtain access to the considerable optimisation technology already in existence. This optimisation technology is given more scope when Napier88 procedures are encoded as C functions with all invocation performed using C function calls. This presents the C compiler with independent compilation units over which its optimisers may operate. However, due to the presence of first class procedures, many global optimisations such as in-line expansions are not possible. For example, the C compiler is unable to trace execution between functions. Indeed, some functions may not exist at the time of compilation.

Utilising C functions has the advantage that calls to and from the run-time support can pass parameters, get results and save return addresses automatically. On some processors such as the SPARC [17], this is optimised by the use of windowed register sets. Native code-generated procedures can also use this mechanism to call each other.

C function calls have one major disadvantage: the C stack contains compiler and architecture-dependent data such as return addresses and saved register values. These addresses include object pointers that may be modified by the garbage collector and addresses which must be rebound over a snapshot and system restart. In both cases, some mechanism is required that allows both pointers and dynamic state information to be accessed and relinked to the appropriate addresses following a garbage collection or system restart. Ideally this mechanism should be architecture independent. This problem is addressed in Section 3.6.

An alternative is made possible by a GNU C extension called computed gotos. GNU CC lets you obtain the address of a goto label using the prefix operator "&&". Such an address may be stored in a C variable and later used as the target for a jump instruction with a statement of the form "goto *(expression)".

The Napier88 run-time system can call arbitrary generated code by computing an address within the code object and jumping to it. When generated code requires some support code to be executed, it jumps to an address within the run-time system. (How this address is calculated is discussed in Section 3.3.) Before jumping, it saves the address at which its execution should continue in a global variable.

Implementing source level procedure calls using jumps requires the provision of a mechanism to enable parameter passing. This may make procedure calling inefficient since any hardware support for procedure calling cannot be employed. This technique does have two advantages: it is extremely easy to implement and the location of all data is entirely under the control of the code generator.

A final point is that the decision to use jumps or C procedure calls affects the form of the generated code. If jumps are to be used, the code is structured as a collection of blocks, with each block corresponding to a source level procedure. If C function calls are employed, the generated code consists of a collection of C function definitions.

3.3. Linking to Persistent Data and Environment Support Code

By definition, programs written in persistent programming languages access persistent data held in some persistent object store. Programs may be statically bound to persistent data, as is the case in languages such as Galileo [1] or DBPL [10]. However, the exclusive use of static binding precludes system evolution. For this reason, many persistent languages, including Napier88, permit dynamic binding to persistent data. This requirement necessitates that native code be capable of dynamically binding to data in the persistent object store. In addition to user level code, compiled code must be capable of invoking the run-time support system. The support system contains functions such as the persistent object management code and garbage collector that would be extremely space inefficient to include in every piece of compiled code. The linking mechanism employed to link to persistent data naturally influences the mechanism used to invoke the run-time support.

Dynamic linking may be achieved in one of three ways:

  1. by performing a linkage edit on compiled code,
  2. through register indirection, and
  3. through parameter passing.

The requirements of dynamic linking may be met by performing a linkage edit on generated code whenever the addresses it contains may have changed. This approach requires that code objects be modified dynamically and that all code objects include symbolic information which are architecture/operating system dependent.

          /* declare a structure containing */
          /* the address of an object creation function */
struct global_data {
     int (*create_object)(int) ;
} the_globals ;

          /* declare a fixed register */
          /* %g7 is a sparc global register */
register struct global_data *fixed_register asm( "%g7" ) ;

void init_table(void)
{
          /* the store's object creation function */
     extern int SH_create_object( int ) ;

     fixed_register = &the_globals ;
     fixed_register->create_object = &SH_create_object ;
}

Figure 2: Register indirection.

An alternative strategy is to note that many systems address global data by indexing through a fixed global register. BCPL employed this technique by providing a data structure called the global vector. Dynamic linking by register indirection can be easily encoded in GNU C since specific registers can be treated as C variables. The implementation of this technique requires that, upon initialisation, the support code construct a data structure containing the addresses of persistent data and support code. A pointer to this data structure is placed in some register. Whenever generated code wishes to address global data or the run-time support, it simply dereferences the allocated register variable. Figure 2 illustrates the initialisation of a fixed register and a global table with the address of an object creation function. Figure 3 illustrates how a C function could use dynamic linking via the fixed register to create an object of size 7.

struct global_data {
     int ( *create_object )(int) ;
} ;
register struct global_data *fixed_register asm( "%g7" ) ;

void Cfunction(void)
{
     int object_id ;

          /* create an object of size 7 */
     object_id = fixed_register->create_object( 7 ) ;
}

Figure 3: Dynamic linking.

The advantage of this technique is that it allows the persistent store and run-time support to be freely re-implemented as long as they retain their specified interfaces. The disadvantages are that it permanently reserves a register for this purpose and cannot be implemented on some architectures.

An alternative to the fixed register approach is available if generated code is structured as C functions as described in Section 3.2. A pointer to the global data structure may be passed as parameter. The invoked procedure passes this pointer to any procedures it calls and so on. The advantage of this technique is that it is architecture independent. The disadvantage is that a pointer must be passed as a parameter on every procedure call; although this overhead is not onerous on architectures such as SPARC that support register windows.

3.4. Linking to the Static Environment

Napier88 is a block structured Algol-like language with nested scope. Although C is block structured, it does not support nested functions and therefore some mechanism must be provided in the generated code to support scope. The interpreted version of Napier88 executes on a machine known as the Persistent Abstract Machine (PAM) [2]. In this implementation, procedure values or closures are implemented as a pair of pointers: one to an object containing code, the code vector, the other to the activation record of the defining procedure, the static link. Since procedures are first class values, they may escape the scope in which they were declared, therefore an activation record may be retained beyond the execution of the procedure which created it. Consequently, some stack activation records may not be deleted on procedure return and may only be reclaimed by garbage collection. Consequently activation records must be allocated on a heap rather than a conventional stack.

This run-time architecture may be reused in a system that generates native code by implementing each closure in the same manner as PAM and presenting each procedure call with a pointer to a heap object containing the activation of the lexicographically enclosing procedure. This pointer is placed in the activation record of the invoked procedure so that it may be dereferenced by the native code in order to access variables that are in scope.

A disadvantage of this technique is that, if C functions are generated, two stacks need to be maintained: a heap based Napier88 stack and the C stack. However, it has the advantage that object pointers are automatically rebound if the objects they address are moved by garbage collection since the pointers to them all reside in the heap.

3.5. Reducing Memory Accesses

As described in Section 3.4, in order to support block structure, all Napier88 variables may be placed in an activation record contained in a heap object. In the interpreted system, in order to perform a computation such as "a := a + b", the PAM pushes the values of a and b onto the stack, incrementing a stack pointer each time. Then the plus instruction pops both values, adds them and pushes the result. Finally, the result is removed from the stack, the stack pointer decremented and the result written to its destination. Such computation is expensive because it dynamically maintains stack pointers and operates on memory rather than in registers. This expense can be reduced in three complementary ways: the elimination of stack pointers, the transformation of source expressions into C expressions and the use of local C variables.

In an interpretative system, many memory accesses are due to stack top relative addressing. In practice, there are very few cases where the stack top cannot be statically determined by the compiler's internal stack simulation.

In a system that generates C code, many simple sequences of stack-based PAM instructions may be directly replaced by C expressions. In order to implement "a := a + b", a C sequence may be generated such as:

local_frame[ a_index ] =
     local_frame[ a_index ] + local_frame[ b_index ]

Such optimisations must be applied with care if the semantics of the high level programming language are to be preserved. In general, C does not guarantee the order of evaluation whereas Napier88 specifies left to right evaluation order. For this reason, expressions which cannot be guaranteed to be free from side-effects (including function calls) are not permitted in the generated C expressions. Nevertheless, this class of optimisation has a dramatic effect on the speed of generated code, largely due to optimisation possibilities.

In Section 3.4 we describe why a heap based Napier88 stack is required in addition to the C stack if Napier88 procedures are mapped onto C functions. Whilst many Napier88 expressions may be translated into C expressions, they still contain a performance bottleneck: all the data is referenced relative to the current Napier88 activation frame base.

A solution to this problem is to declare C variables, local to each generated function, which corresponded to locations in the PAM activation record. However, it is not always possible to represent source-level variables as C variables. For example, local procedures must be able to access variables in outer scope levels. Thus these variables must be stored in PAM objects if accessed by other procedures. On the other hand, leaf procedures (those that do not contain any other procedures) can keep their local variables in C variables with impunity, provided that they are still able to save their data into PAM frames when it is necessary to checkpoint the store or perform garbage collection.

This problem was solved by dividing generated code for procedures into two groups: the easy cases and the harder cases. The easy procedures, like leaf procedures, use C variables and are prepared to copy their values into PAM stack frames when necessary. The harder cases always keep their data in PAM stack frames. Since simple leaf procedures are a common case (akin to class methods in object-oriented languages) this yields a significant performance improvement.

3.6. Surviving Garbage Collection and System Snapshots

As described in Section 3.2, some mechanism is required that allows both pointers and dynamic state information to be accessed and relinked to the appropriate addresses following a garbage collection or system restart. The mechanism described in this section explicitly encodes source-level procedures as restartable C functions which are parameterised with a restart point and return a scalar status value. The restart point is used to indicate where in the procedure the code should start executing. The first call to a Napier88 procedure is performed by a C function call with a 0 restart point. The status value indicates if the procedure executed to completion or encountered some hindrance such as a request to make a system snapshot to invoke a garbage collection.

The PAM stack of persistent activation records, described in Section 3.4, is utilised by the restartable C functions. When a Napier88 procedure is called, a persistent object is created to represent its activation record. This object provides a repository in which data may be saved over garbage collections and checkpoints.

3.6.1. Garbage Collection and Checkpointing

In the Napier88 system, all garbage collection is performed synchronously with the computation: that is the computation stops when the garbage collector is running. Napier88 procedures recognise the need for garbage collection when heap space is exhausted.

When they start executing, all Napier88 procedures register the address of the heap object containing the activation record in a global data structure. This ensures that the activation record can be located following a garbage collection. Before a garbage collection or checkpoint is executed, the generated code must ensure that their entire dynamic call chain is stored in PAM heap objects over which the garbage collector can operate. This is achieved by each procedure saving its entire state in the corresponding PAM stack frame. The saved state includes a resume address which may be passed to the function when it is restarted to indicate where it should continue computation. After saving its state, each procedure returns a status value to its caller indicating that it too should save its state and return the same status value to its caller.

Eventually, the flow of control returns to the run-time support which services the request by reading the global data structure and applying the appropriate function. Since each executing C function has saved its state in a heap object and returned, there is no data on the C stack. The mechanism is therefore architecture independent. Figure 5 shows a restartable C function for the Napier88 procedure shown in Figure 4.

proc()
begin
     let x := 7	                ! declare an integer x
     let S := struct( a := x )  ! declare a structure S
                                !   intialised using x
     ...
end

Figure 4: Napier88 procedure.

struct global_data{ 
     int (*create_object)(int); 
     int *local_frame; 
};
register struct global_data *fixed_register asm( "%g7" ) ;

int Nproc(int restart_point)
{
     int x ;           /* the variable x */
     int *S ;          /* object id for structure S */

                       /* restart using label arithmetic */
     goto *(&&start + restart_point);

start:
     x = 7 ;
create:
     S = fixed_register->create_object( SIZE_OF_S ) ;
     if ( S == NULL )  /* did the create fail */
     {                 /* YES, garbage collect */

                       /* save x in current stack frame */
        fixed_register->local_frame[ x_offset ] = x ;
                       /* save restart point in stack frame */
        fixed_register->local_frame[ ResumeAddress ] 
           = &&restart-&&start ;
                       /* return garbage collect request*/
        return unwind_and_continue ;

restart:               /* restore x */
        x = fixed_register->local_frame[ x_offset ] ;
        goto create ;  /* repeat attempt to create S */
     }
     S[ 2 ] = x ;      /* initialise S */
     ...
                       /* update local_frame */
                       /* to point at caller */
     fixed_register->local_frame 
        = fixed_register->local_frame[ DLink ] ;
     return OK ;       /* normal completion*/
}

Figure 5: The C function implementing the Napier88 procedure shown in Figure 4.

In the code generation system which we constructed, the status value is an enumeration containing three labels:

OK:
which indicates that the procedure ran to completion,
unwind_and_reapply:
which indicates that something abnormal occurred in the procedure so the caller should save state and unwind and that the procedure should be reapplied when computation re-commences, and
unwind_and_continue:
which indicates that something abnormal occurred in the procedure so the caller should save state and unwind, but that the procedure has completed therefore the caller's next instruction should be executed when computation of the caller re-commences.

3.6.2. Restarting a Napier88 Program

Restarting a saved Napier88 program execution is performed in 3 steps. The first step is to find the PAM stack frame for the currently executing procedure; the address of this frame is held in the global data structure. Secondly, a pointer to the code object for the currently executing procedure is read from the stack frame. Finally, the C function in the code object is called and passed the restart point saved in the stack frame.

When a restarted procedure returns, it returns to the run-time support rather than its original caller. It will also have copied its Napier88 result, if any, into the caller's stack frame. This frame is found by following the dynamic link information stored in the stack frame. Since the caller's state has been stored in its stack frame together with an appropriate resume address, it can also be restarted.

3.7. Lazy Frame Allocation

The final trick employed was to avoid allocating stack frames for procedure unless absolutely necessary. As described in Section 3.5, many procedures are leaf procedures and as such only require their state to be saved in a heap object if a garbage collection or system snapshot is required. A considerable performance increase in performance may be obtained by only creating heap objects for stack frames when required to do so. This trick needs to be applied with care. Garbage collections are usually only invoked when the system has run out of space. The creation of objects at such times can be counter-productive!

In our system, when a Napier88 procedure call is made, space is reserved in the heap for the frame that may be required. When the procedure returns the reserved space is released. This ensures that there is always space available for dumping procedure state even when a garbage collection is required.


<< , >> , Title