Compiler ===> Source --> Object Translation.
Conceptual Framework of a Compiler
A compiler first performs front-end analysis of the input text to fully capture it, then builds a semantic representation of that analysis by decoding its functional meaning. Once this is done, the translation is ready for back-end synthesis. If the source code compiles successfully, the executable object file is generated.
Compiler construction has wide applicability in programming, for example in file-conversion systems:
1. TeX text formatters that convert TeX text to the DVI format.
2. PostScript interpreters that convert PostScript text into image-rendering instructions for a specific printer.
One of the many reasons for studying compiler construction is the generally useful data structures and algorithms it contains. For example: hashing, precomputed tables, stack mechanisms, garbage collection, dynamic programming and graph algorithms.
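As an illustrative sketch (not from the original text), two of the techniques named above can be shown in a few lines of Python: a hash table, the usual implementation of a compiler's symbol table, and a memoised table, the essence of dynamic programming.

```python
# Hashing: a compiler's symbol table is typically a hash table.
# Python dicts are hash tables; the entry layout here is a made-up example.
symbol_table = {}
symbol_table["count"] = {"type": "int", "offset": 0}

# Dynamic programming / precomputed tables: memoise a recursive computation
# so each subproblem is solved only once.
def fib(n, table={0: 0, 1: 1}):
    if n not in table:
        table[n] = fib(n - 1, table) + fib(n - 2, table)
    return table[n]

print(fib(30))  # 832040
```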
The heart of every compiler is the semantic representation of the program being compiled.
3 Major Reasons for studying compiler construction are:
1. It involves proper structuring of problems.
2. It involves judicious use of formalism.
3. It involves the use of tools and facilities for program optimization.
Without the strict separation of analysis and synthesis, each new language would require a completely new set of compilers. With that separation, a new front end for a particular language can be combined with existing back ends for current machines.
Judicious use of formalism greatly reduces the programming effort. For example: regular expressions and context-free grammars used in lexical and syntax analysis.
Translation
The translation process is guided by the structure of the analysed text.
A typical translation process essentially consists of the following parts:
1. The sequence of characters of a source text is translated into a corresponding sequence of symbols of the vocabulary of the language which is termed as lexical analysis.
2. The sequence of symbols is transformed into a representation that directly mirrors the syntactic structure of the source text and lets the structure easily be recognized which is termed as syntax analysis or parsing.
3. Verification that a program observes the compatibility rules of the language is an additional duty of a compiler, which is termed type checking or semantic analysis.
4. On the basis of the representation resulting from syntax analysis a sequence of instructions taken from the instruction set of the target computer is generated which can be termed as code generation.
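Steps 1 and 2 above can be sketched for a toy language of arithmetic expressions. This is a minimal illustration, not taken from the original text: a lexical analyser that turns characters into tokens, and a parser that turns tokens into nested tuples mirroring the syntactic structure.

```python
import re

# One number or one single-character operator per match; whitespace skipped.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    """Lexical analysis: characters -> list of (kind, value) tokens."""
    tokens = []
    for num, op in TOKEN_RE.findall(text):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

def parse(tokens):
    """Syntax analysis: tokens -> nested tuples mirroring the structure.
    Grammar: expr -> term (('+'|'-') term)* ; term -> NUM"""
    pos = 0
    def term():
        nonlocal pos
        kind, value = tokens[pos]
        assert kind == "NUM", "expected a number"
        pos += 1
        return value
    node = term()
    while pos < len(tokens) and tokens[pos][1] in "+-":
        op = tokens[pos][1]
        pos += 1
        node = (op, node, term())
    return node

print(parse(tokenize("1 + 2 - 3")))  # ('-', ('+', 1, 2), 3)
```

The resulting tuple tree is exactly the kind of representation on which the later steps (type checking and code generation) operate.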
A compiler has two parts which are:
1. Front End - This comprises lexical analysis, syntax analysis and type checking (semantic analysis). It generates a tree representing the syntactic structure of the source text.
2. Back End - This handles code generation.
The main advantage of this front-end solution lies in the front end's independence from the target computer and its instruction set: the same front end can serve back ends for all target machines.
The back end consisted of an interpreter program whose linear instruction sequence was called P-code. The drawback of this back-end solution is the inherent loss of efficiency common to interpreters.
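The idea of a linear instruction sequence executed by an interpreter can be sketched as follows. This is a hypothetical miniature in the spirit of P-code, not the historical instruction set: a stack machine whose program is a list of instructions.

```python
# A tiny P-code-style stack machine interpreter (illustrative only).
def run(program):
    stack = []
    for instr, *args in program:
        if instr == "PUSH":
            stack.append(args[0])       # push a constant
        elif instr == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# (2 + 3) * 4 as a linear instruction sequence:
code = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(code))  # 20
```

The loss of efficiency mentioned above comes from this dispatch loop: every instruction is decoded at run time instead of being executed directly by the hardware.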
A compiler which generates code for a computer different from the one executing the compiler is a CROSS COMPILER.
A Language is defined by the following:
1. A set of terminal symbols
2. A set of non terminal symbols
3. A set of syntactic equations also known as productions
4. A start symbol
A language can therefore be defined as the set of sequences of terminal symbols that, starting with the start symbol, can be generated by repeated application of the syntactic equations.
The repeated application of syntactic equations is called substitution.
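The four components above, and derivation by repeated substitution, can be sketched concretely. This is an illustrative example, not from the original text, using the classic grammar for aⁿbⁿ: terminals 'a' and 'b', non-terminal 'S', start symbol 'S', and productions S -> aSb | ab.

```python
# Productions for the grammar S -> aSb | ab  (generates a^n b^n).
productions = {"S": [["a", "S", "b"], ["a", "b"]]}

def derive(choices, start="S"):
    """Apply one production per step (leftmost non-terminal first)
    until only terminal symbols remain."""
    sentence = [start]
    for choice in choices:
        i = next(i for i, sym in enumerate(sentence) if sym in productions)
        sentence[i:i + 1] = productions[sentence[i]][choice]
    return "".join(sentence)

print(derive([0, 0, 1]))  # S => aSb => aaSbb => aaabbb
```

Each step replaces one non-terminal by the right-hand side of a production; that replacement is precisely the substitution referred to above.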
The grammar of a programming language is specified not in terms of input characters but of input tokens. Before the input program text is fed to the parser, it must be divided into tokens; this is the responsibility of the lexical analyser, and the activity is referred to as "tokenizing".
Note that the part of the compiler that performs the analysis of the source language text is called the FRONT END while the part that does the target language synthesis is known as the BACK END.
The compiler can perform the actions specified by the semantic representation directly; the code-generating back end is then replaced by an interpreting back end, known as an interpreter, for the following reasons:
1. Writing an interpreter takes less work than writing a back end.
2. Performing actions straight from the semantic representation allows better error checking and reporting.
3. Interpreters achieve higher security as compared to compilers.
Multi-pass compilers are those compilers that process the source code or abstract syntax tree of a program several times.
Single-pass compilers are those compilers that process the source code or abstract syntax tree of a program just once.
Each pass takes the result of the previous pass as the input and creates an intermediate output. In this way, the intermediate code is improved pass by pass until the final pass emits the final code.
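The pass-by-pass chaining described above can be sketched in Python. This is an invented miniature, not from the original text: two small improvement passes over a linear instruction sequence, applied in order, each consuming the previous pass's output.

```python
def fold_constants(code):
    """Pass 1: replace PUSH a, PUSH b, ADD with PUSH (a+b)."""
    out = []
    for instr in code:
        if (instr[0] == "ADD" and len(out) >= 2
                and out[-1][0] == out[-2][0] == "PUSH"):
            b, a = out.pop()[1], out.pop()[1]
            out.append(("PUSH", a + b))
        else:
            out.append(instr)
    return out

def strip_dead_stores(code):
    """Pass 2: drop a PUSH immediately followed by POP."""
    out = []
    for instr in code:
        if instr == ("POP",) and out and out[-1][0] == "PUSH":
            out.pop()
        else:
            out.append(instr)
    return out

passes = [fold_constants, strip_dead_stores]
code = [("PUSH", 1), ("PUSH", 2), ("ADD",), ("POP",)]
for p in passes:          # each pass takes the previous pass's output
    code = p(code)
print(code)  # []
```

Note how the second pass only finds its opportunity because the first pass ran before it; this is the sense in which the intermediate code is improved pass by pass.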
Multi-pass compilers are sometimes called wide compilers because of the greater scope of their passes.
Note that the last stage of a multi-pass compiler is the only stage that is machine dependent.
Advantages of Multi-Pass Compilers:
1. It is machine independent - Since the multiple passes form a modular structure and code generation is decoupled from the other steps of the compiler, the passes can be reused for different hardware or machines.
2. It is used for more expressive languages.
Note that the wider scope of a multi-pass compiler allows better code generation (smaller, faster code) compared to the output of a one-pass compiler, at the cost of higher compile time and memory consumption.
Stages of A Typical Multi-Pass Compiler
Lexical Analysis: This stage of a multi-pass compiler removes from the source program irrelevant information that syntax analysis would not be able to use or interpret. With a multi-pass compiler, forward declarations are generally not necessary.
Syntax Analysis: This is responsible for applying the syntax rules of the language and building some intermediate representation of the program. For example: an abstract syntax tree or a directed acyclic graph.
Semantic Analysis: This takes the representation made from syntax analysis and applies semantic rules to the representation to make sure that the program meets the semantic rule requirements of the language.
Code Generation: This is the final stage of the multi-pass compiler where codes are generated.
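The final stage can be sketched as a walk over the tree produced by the earlier stages. This is an illustrative example, not from the original text: it assumes the syntax tree is represented as nested tuples and emits instructions for a hypothetical stack machine.

```python
# Code generation: syntax tree -> linear instruction sequence
# for a made-up stack machine.
def generate(node):
    if isinstance(node, int):                 # leaf: a constant
        return [("PUSH", node)]
    op, left, right = node                    # interior node: an operator
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    # Post-order: evaluate both operands, then apply the operator.
    return generate(left) + generate(right) + [(opcode,)]

ast = ("*", ("+", 1, 2), 4)                   # (1 + 2) * 4
print(generate(ast))
# [('PUSH', 1), ('PUSH', 2), ('ADD',), ('PUSH', 4), ('MUL',)]
```

A real back end would additionally select target-machine opcodes and allocate registers; the post-order traversal, however, is the common core.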
A programming language provides essentially three components for describing its computation from initial state to final state which are:
1. Data types, objects and values with operations defined upon them
2. Rules stating the chronological relationship among the specified operations
3. Rules stating the static structure of the program
Note that these components together constitute the level of abstraction on which one can formulate algorithms in the language.