LINFO

Compiler Definition

A compiler is a specialized computer program that converts source code written in one programming language into another language, usually machine language (also called machine code), so that it can be understood by processors (i.e., logic chips).

Source code is the version of software (usually an application program or an operating system) as it is originally written (i.e., typed into a computer) by a human in plain text (i.e., human readable alphanumeric characters). Source code can be written in any of numerous programming languages, some of the most popular of which are C, C++, Java, Perl, PHP, Python and Tcl/Tk. The output of a compiler is referred to as object code.

Brief History

The first compiler was developed in 1952 by Grace Hopper, a pioneering computer scientist. She said that she invented it because she was lazy and wished that "the programmer may return to being a mathematician." She is also well known for her important role in the development of the COBOL programming language (which is still in widespread use for business applications), including the development of the first COBOL compiler.

The number of compilers grew swiftly along with the proliferation of programming languages and processors and with advances in compiler technology. The 1990s saw a surge in the introduction of free compilers and compiler development tools, including those developed as part of the GNU project, whose purpose is to create a high-performance and completely free operating system.

The highly regarded GCC (GNU Compiler Collection) is considered by many to be the most important piece of free software (i.e., software that is free not only in a monetary sense but also with regard to all aspects of use). Formerly called the GNU C Compiler, it now contains compilers for the C, C++, Objective-C, Fortran, Java and Ada programming languages. It has been ported to (i.e., modified to run on) more processors and operating systems than any other compiler, and it runs on more than 60 platforms (i.e., combinations of processors and operating systems).

Types of Compilers

Compilers can be classified in various ways, including by their input language, their target code (i.e., output language), the platform they run on and whether they are proprietary (typically commercial) software or free software.

Although most compilers are designed to convert source code into machine code, compilers also exist that translate source code written in one high-level language into source code in another high-level language. Other compilers translate source code into an intermediate language that still needs further processing.

For example, some compilers have been developed to convert programs written in some high level language into an equivalent program written in C. This is useful because it can increase the portability (i.e., the ability to be run on various platforms) of programs written in less common languages. The reason is that once a program has been converted to C, it is easy to recompile it for almost any platform because C compilers are available for almost any platform.
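The idea of compiling to C can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy input language (each line is either `name = expression` or `print expression`), the function name `translate_to_c`, and the choice to pass expressions through to C unchecked. It is a minimal sketch, not a description of any real translator.

```python
def translate_to_c(program):
    """Translate a hypothetical toy language into C source text.

    Assumed toy language: each non-blank line is either
    `name = integer-expression` or `print integer-expression`.
    Expressions are copied into the C output without validation.
    """
    body = []
    declared = set()
    for line in program.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):
            expr = line[6:].strip()
            body.append(f'    printf("%ld\\n", (long)({expr}));')
        else:
            # A line without "=" raises ValueError here (no friendly diagnostics).
            name, expr = (part.strip() for part in line.split("=", 1))
            # Declare each variable the first time it is assigned.
            prefix = "" if name in declared else "long "
            declared.add(name)
            body.append(f"    {prefix}{name} = {expr};")
    lines = ["#include <stdio.h>", "", "int main(void) {", *body, "    return 0;", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(translate_to_c("x = 6\nx = x * 7\nprint x"))
```

The generated text is ordinary C, so it could then be handed to whatever C compiler is available on the target platform, which is the portability benefit described above.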

A compiler that is intended to produce machine code to run on the same platform that the compiler itself runs on is sometimes called a native-code compiler. A cross compiler, which produces machine code that is intended to run on a different platform than it runs on, can be very useful when introducing new hardware platforms.

Compiler Structure and Operation

Compilers are very complex programs, and compiler design is a very complicated task (and an excellent way to deepen one's understanding of computer science).

Compiling has several phases. They include extracting words (tokens) from the source code, which is known as lexical analysis, and analyzing the sequence of such words to check whether they match the syntax of the programming language for which the compiler is intended, which is known as syntax analysis or parsing. While doing this, a compiler must strictly preserve the meaning of the program being compiled.

At the same time, it must also be able to achieve the following goals, which to some extent conflict with one another, according to priorities set by the programmer: (1) maximize the speed of the compiled code, (2) minimize the size of the compiled code, (3) maximize the speed of the compilation process (i.e., compile-time efficiency), (4) maximize support for debugging the compiled program (because most programs do not run optimally the first time they are compiled) and (5) maximize the useful feedback (i.e., reporting errors back to the programmer).
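The first two phases mentioned above, extracting words and checking them against the language's syntax, can be sketched in Python. The tiny arithmetic grammar, the token names (`NUMBER`, `NAME`, `OP`) and the function names `tokenize` and `check_syntax` are all assumptions chosen for this illustration; real compilers use far richer grammars and usually build a syntax tree rather than just accepting or rejecting the input.

```python
import re

# Assumed token grammar for a tiny arithmetic language (illustrative only).
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/()])|(?P<SKIP>\s+)|(?P<BAD>.)"
)


def tokenize(source):
    """Lexical analysis: split source text into (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


def check_syntax(tokens):
    """Syntax analysis by recursive descent for the assumed grammar:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | NAME | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        kind, text = peek()
        if kind in ("NUMBER", "NAME"):
            advance()
        elif text == "(":
            advance()
            expr()
            if peek()[1] != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            raise SyntaxError(f"unexpected token {text!r}")

    def term():
        factor()
        while peek()[1] in ("*", "/"):
            advance()
            factor()

    def expr():
        term()
        while peek()[1] in ("+", "-"):
            advance()
            term()

    expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return True
```

With these definitions, `check_syntax(tokenize("x + 42*(y-1)"))` returns `True`, while an input such as `"x + * 2"` raises `SyntaxError`, which is the kind of feedback a compiler reports back to the programmer.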

One key to simplifying the compiler development process has been the adoption of a three-stage design. The first stage, called the front end, translates the source code into an intermediate representation. The second stage, referred to as the optimizer, optimizes the code according to the various switches on the compiler set by the programmer. The third stage, the back end, produces code in the output language. With this structure, the front end can be replaced to support a different source language (or a different dialect of the same language), and the back end can be replaced to produce a different output language, thereby making compilers more portable.
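The three-stage design can be illustrated with a small Python sketch. The intermediate representation (nested tuples), the function names, the `enabled` switch and the two output formats (instructions for a hypothetical stack machine, and a C expression) are all assumptions made for this example. For brevity the front end borrows Python's own `ast` module to parse integer expressions, so the "source language" here is just a subset of Python expressions.

```python
import ast

# Operators supported by this sketch, mapped to their textual form.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def front_end(source):
    """Front end: parse source text into an intermediate representation."""
    def lower(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return ("const", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return ("binop", OPS[type(node.op)], lower(node.left), lower(node.right))
        raise SyntaxError("unsupported construct in this sketch")
    return lower(ast.parse(source, mode="eval").body)


def optimizer(ir, enabled=True):
    """Optimizer: fold constant subexpressions when the switch is enabled."""
    if not enabled or ir[0] != "binop":
        return ir
    _, op, left, right = ir
    left, right = optimizer(left), optimizer(right)
    if left[0] == "const" and right[0] == "const":
        a, b = left[1], right[1]
        return ("const", {"+": a + b, "-": a - b, "*": a * b}[op])
    return ("binop", op, left, right)


def back_end_stack(ir):
    """Back end A: emit instructions for a hypothetical stack machine."""
    if ir[0] == "const":
        return [f"PUSH {ir[1]}"]
    if ir[0] == "var":
        return [f"LOAD {ir[1]}"]
    _, op, left, right = ir
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    return back_end_stack(left) + back_end_stack(right) + [mnemonic]


def back_end_c(ir):
    """Back end B: emit an equivalent C expression."""
    if ir[0] in ("const", "var"):
        return str(ir[1])
    _, op, left, right = ir
    return f"({back_end_c(left)} {op} {back_end_c(right)})"
```

For the input `"x + 2 * 3"`, `back_end_stack(optimizer(front_end(...)))` yields `['LOAD x', 'PUSH 6', 'ADD']` and `back_end_c` yields `(x + 6)`; with the optimizer disabled, `back_end_c` yields `(x + (2 * 3))`. Because both back ends consume the same intermediate representation, neither the front end nor the optimizer has to change when the output language does.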

After compilation, the object code must be linked with any required libraries of supporting routines by the linker before it is capable of being executed. The compiler usually invokes the linker automatically.

Compiling Versus Interpreting

Not all source code is compiled. With some programming languages (e.g., Perl and Tcl) the source code is frequently executed directly using an interpreter rather than first compiling it and then executing the resulting machine code. In its simplest form, an interpreter is a program that reads source code one statement at a time, analyzes the statement, carries out the operations that it specifies, and then continues with the next statement.
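A statement-at-a-time interpreter can be sketched as follows. The toy language (`set NAME VALUE`, `add NAME VALUE`, `print NAME`) and the function name `interpret` are assumptions invented for this illustration, and the sketch carries out each statement directly rather than producing machine code.

```python
def interpret(program):
    """Execute a toy program one statement at a time.

    Assumed toy language: `set NAME INT`, `add NAME INT`, `print NAME`.
    Returns the list of values that `print` statements produced.
    """
    variables = {}
    output = []
    for line_no, line in enumerate(program.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], parts[1:]
        # Each statement is analyzed only when execution reaches it.
        if op == "set" and len(args) == 2:
            variables[args[0]] = int(args[1])
        elif op == "add" and len(args) == 2:
            variables[args[0]] += int(args[1])
        elif op == "print" and len(args) == 1:
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_no}: cannot interpret {line!r}")
    return output
```

Calling `interpret("set x 5\nadd x 3\nprint x")` returns `[8]`. Note that a malformed statement on the last line of a program is only detected after all earlier statements have already run, whereas a compiler would typically reject the whole program before any of it executes.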

It is generally faster to run compiled code than to run a program under an interpreter. This is largely because the interpreter must analyze each statement in the source code each time the program is executed and then perform the desired conversion, whereas this is not necessary with compiled code because the source code was fully analyzed during compilation. However, it can take less time to interpret source code than the total needed to both compile and run it, and thus interpreting is frequently used when developing and testing source code for new programs.
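The cost of repeated analysis can be seen in miniature with Python's built-in `compile()` and `eval()` functions. This is only an analogy: Python's `compile()` produces bytecode for Python's own virtual machine rather than machine code, and the function names `run_interpreted` and `run_compiled` are assumptions for this sketch. The point is simply that one version analyzes the source text on every use, while the other analyzes it once and reuses the result.

```python
def run_interpreted(source, values):
    """Evaluate the expression text for each value of x.

    Passing a string to eval() means the text is parsed and analyzed
    again on every iteration.
    """
    return [eval(source, {"x": v}) for v in values]


def run_compiled(source, values):
    """Analyze the expression once, then reuse the compiled code object."""
    code = compile(source, "<expr>", "eval")  # parsing happens only here
    return [eval(code, {"x": v}) for v in values]


if __name__ == "__main__":
    import timeit

    values = list(range(1000))
    expr = "x * x + 1"
    # Both approaches compute identical results.
    assert run_interpreted(expr, values) == run_compiled(expr, values)
    print("re-analyzed each time:", timeit.timeit(lambda: run_interpreted(expr, values), number=20))
    print("analyzed once:        ", timeit.timeit(lambda: run_compiled(expr, values), number=20))
```

The actual timings depend on the machine and the Python version, so none are quoted here; the sketch is meant only to make the difference measurable. For a single evaluation the up-front `compile()` step brings no benefit, which mirrors the point above that interpreting can be quicker overall during development and testing.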