The techniques described above were not all conceived or implemented at once, but at different stages in continuous development. The compiler and run-time system to support native code were built from a working compiler that generated PAM code and PAM code interpreter written in C. The run-time system still contains the interpreter code; interpretative and native code may coexist and call one another in our implementation.
The first step was to utilise the PAM abstract machine architecture by generating C code which replaced PAM instructions with explicit calls to the C functions that interpret them. This kept the changes to the compiler and interpreter manageable. Where the compiler previously generated PAM instruction n, it would now generate an instance of a C macro called "Pam_n"; this macro expanded into a call to the C function that implemented instruction n.
The structure of the code generation mechanism and the communication between generated code and the run-time system was established and tested prior to the development of efficient code generation patterns.
The second step was to replace calls to the simpler interpreted instructions with equivalent in-lined C code. This was accomplished by rewriting the C macros for these instructions.
The net effect of this step was to produce a working Napier88 system where most of the interpretative decoding overhead had been removed. However, the generated code still followed the PAM stack model, explicitly manipulating a persistent activation record through stack pointers. Since the C compiler cannot determine the global effects of assignments to the frame, it will ensure they are all performed. This effectively defeats an optimiser since it cannot elide superfluous frame assignments.
Where appropriate this allowed the C compiler to perform constant folding and avoided unnecessary memory references. This resulted in excellent optimisation of this class of Napier88 expressions.
The next stage was to take advantage of the fact that the locations of the stack tops are statically known. Stack-based operations were rewritten so that they read and wrote directly to known offsets into stack frames rather than incrementing and decrementing stack pointers and performing stack loads. Run-time stack pointers are still needed to handle polymorphism , but only during short and well-defined windows of uncertainty. Extra parameters were supplied to the C macros to indicate the relative stack locations.
Although this technique simplified the C code produced, it still encoded calculations as manipulations of stack locations in main memory and so inhibited effective use of an optimiser.
The next major stage was to declare C variables, local to the generated function, which corresponded to locations on the PAM stack. The word at location n, previously accessed as Frame[n], was now treated as the variable F_n. As described earlier, this optimisation is only applied to leaf procedures. Code was also generated to save and restore the variables, where necessary.
Having decided to keep local data in C variables, we realised that many leaf procedures do not actually need to have PAM frame objects allocated at all, and that we could reduce the time overhead of function call by omitting this allocation. However, as described above, it is sometimes necessary to have a frame later in the execution of the procedure - for example, if a garbage collection is imminent. We therefore implemented lazy frame allocation for non-polymorphic leaf procedures. This required that the native code calling mechanism be modified so that the callee allocated the activation record. Arguments to Napier88 procedure calls were passed as C arguments.