
2. Choosing A Compiler Target Language

Perhaps the most obvious method of code generation is to generate native code directly. This has the advantage that the writer of the code generator has complete control over:

  1. the mapping of code onto objects,
  2. linkage to the run-time support, and
  3. the location of pointers in data structures and registers.

However, generating native code directly is also extremely costly, since the resulting compiler is architecture-dependent and its code generator must be reworked for each new architecture. An alternative is to use existing code generation tools. Some advantages of this approach include:

  1. reuse of existing code generation technology,
  2. availability of sophisticated optimisers, and
  3. abstraction over architecture-specific features.

The ability to reuse existing code generation technology is a significant advantage. For example, even low-level tools such as assemblers include optimisers which relieve the compiler of the complexities of generating and backpatching instruction sequences. Higher-level tools, such as compilers, incorporate more sophisticated optimisers which have been the subject of considerable research and development effort. Thus, this approach is a potentially cost-effective method of generating high-quality code.

The range of tools investigated included assembly language, RTL and C [9]. Register Transfer Language (RTL) is an intermediate form used by the GNU C compiler [14]. RTL provides a rich set of abstract operators to describe a computation in terms of data flow between an arbitrary number of virtual registers. The GNU C compiler parses C source to produce a parse tree decorated with RTL. A range of optimisation techniques, including the allocation of virtual registers to physical registers, is then applied to the parse tree.

It was originally thought that a Napier88 compiler could generate an RTL representation of a program and have the GNU C code generator produce architecture-specific native code. However, RTL proved to be a poor choice, since it does not completely define the program semantics without a parse tree and it depends on machine-specific descriptions. The developers of the GNU C compiler suggested generating C code and passing it to the full GNU C compiler [16]. C is an excellent target language since it is:

  1. low level,
  2. easy to generate,
  3. expressible in an architecture-independent manner,
  4. widely available, and
  5. supported by good optimisers.
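
To illustrate why C is easy to generate, the following is a minimal sketch of a code generator that walks an expression tree and emits the text of a C function. The tree type and the names emit_expr and emit_function are hypothetical and chosen only for illustration; they are not taken from the Napier88 compiler.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree for a toy source language. */
enum kind { NUM, ADD, MUL };

struct expr {
    enum kind kind;
    int value;                     /* used when kind == NUM */
    const struct expr *lhs, *rhs;  /* used when kind is ADD or MUL */
};

/* Append s to the NUL-terminated string in buf; -1 if it does not fit. */
static int append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), n = strlen(s);
    if (len + n + 1 > cap)
        return -1;
    memcpy(buf + len, s, n + 1);
    return 0;
}

/* Emit a fully parenthesised C expression for e. */
static int emit_expr(const struct expr *e, char *buf, size_t cap)
{
    if (e->kind == NUM) {
        char num[32];
        snprintf(num, sizeof num, "%d", e->value);
        return append(buf, cap, num);
    }
    if (append(buf, cap, "(") != 0) return -1;
    if (emit_expr(e->lhs, buf, cap) != 0) return -1;
    if (append(buf, cap, e->kind == ADD ? " + " : " * ") != 0) return -1;
    if (emit_expr(e->rhs, buf, cap) != 0) return -1;
    return append(buf, cap, ")");
}

/* Emit a complete C function that returns the value of body. */
int emit_function(const char *name, const struct expr *body,
                  char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    if (append(buf, cap, "int ") != 0) return -1;
    if (append(buf, cap, name) != 0) return -1;
    if (append(buf, cap, "(void) { return ") != 0) return -1;
    if (emit_expr(body, buf, cap) != 0) return -1;
    return append(buf, cap, "; }");
}
```

The emitted text contains nothing architecture-specific; instruction selection and register allocation are left entirely to the C compiler that later processes it.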

The C system chosen was GNU C, since it provides two very useful extensions over ANSI C. Firstly, it allows arithmetic on goto labels. This feature may be used to support saving and restoring state over checkpoints and garbage collections. Secondly, it is possible to explicitly map global variables onto fixed registers. This feature may be used to efficiently link generated code with the run-time support. A further advantage is that GNU C is freely available for most architectures, so the use of GNU-specific C extensions need not limit portability.
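
The label arithmetic extension can be sketched as follows. The example assumes a compiler that supports GNU C labels as values (the unary && operator and the computed goto); the frame layout and the names step and run_to_completion are hypothetical and are not the scheme used by the Napier88 system. A resume point is stored as an offset from the procedure's entry label rather than as an absolute address, so the saved value is still meaningful if the code is later loaded at a different address.

```c
#include <stddef.h>

/* Hypothetical saved state for one generated procedure. */
struct frame {
    ptrdiff_t resume;  /* offset of the resume label from `entry`; 0 = start */
    int acc;           /* a live value that must survive a checkpoint */
};

/* Run until the next checkpoint. Returns 1 if work remains, 0 when done.
 * noinline keeps a single copy of the labels, so offsets saved by one
 * call remain valid for the next. */
__attribute__((noinline))
static int step(struct frame *f)
{
    /* GNU C: &&label has type void *, and goto * jumps to such a value. */
    goto *(&&entry + f->resume);

entry:
    f->acc += 1;
    f->resume = &&after_first - &&entry;   /* label arithmetic */
    return 1;                              /* checkpoint */

after_first:
    f->acc *= 10;
    f->resume = &&after_second - &&entry;
    return 1;                              /* checkpoint */

after_second:
    f->acc += 2;
    return 0;
}

/* Resume from whatever state f holds and run to the end. */
int run_to_completion(struct frame *f)
{
    while (step(f))
        ;
    return f->acc;
}
```

A frame copied at a checkpoint can be resumed later: calling run_to_completion on the copy continues from the saved label rather than from the start of the procedure.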

