Friday 17 April 2015

Summary of Introduction to Compiler Design and Construction

A compiler is a program that accepts as input a program text in one language and produces as output a program text in another language, while preserving the meaning of the text.

A conceptual framework of a computer

EDA ===>  Electronic Design Automation
VHDL ===>  VHSIC (Very High Speed Integrated Circuit) Hardware Description Language
The OS layer is responsible for coordinating activities between the application layer and the hardware interface. This coordination takes place in the presence of the BIOS or firmware; thus, the OS transfers control to the hardware through the BIOS.

A compiler first carries out front-end analysis of the input text, which is done to fully capture the input, and from this analysis it builds a semantic representation that encodes the functional meaning of the program. Once this is done, the translation is ready for back-end synthesis. If the source code compiles successfully, an executable object file is generated.


 General Structure of a Compiler

Attribute grammars are formalisms that can be used for handling context, such as the long-distance relation that links the use of a name in a program to its declaration.
The use of attribute grammars can be extended to interpretation and code generation.
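As a rough illustration (a minimal sketch, not part of the original notes, with invented class and attribute names), a synthesized attribute such as the value of an arithmetic expression can be computed bottom-up over the tree:

    # Each node computes its "value" attribute from the attributes of its children.
    class Num:
        def __init__(self, n):
            self.n = n
        def value(self):                      # synthesized from the node itself
            return self.n

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def value(self):                      # synthesized from the children
            return self.left.value() + self.right.value()

    # (1 + 2) + 3  ==>  6
    print(Add(Add(Num(1), Num(2)), Num(3)).value())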



It is worth noting that one can generate a program, rather than write it by hand, given a proper formalism in which to describe what the program should do and a program generator that accepts that formalism. For example: lexical analysers can be generated from grammars (syntax descriptions), and code generators can be generated from machine descriptions, as sketched below.
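A hedged sketch of the idea (the token names, regular expressions and the generate_scanner function are invented for this example): a tiny generator takes a declarative token description and emits Python source text for a scanner, which can then be compiled and run.

    import re

    # Hypothetical token description: (token name, regular expression).
    TOKEN_SPEC = [("NUMBER", r"\d+"), ("IDENT", r"[A-Za-z_]\w*"), ("PLUS", r"\+")]

    def generate_scanner(spec):
        # Emit Python source text for a scanner built from the description.
        pattern = "|".join("(?P<%s>%s)" % (name, regex) for name, regex in spec)
        return ("import re\n"
                "def scan(text):\n"
                "    return [(m.lastgroup, m.group()) for m in re.finditer(%r, text)]\n"
                % pattern)

    namespace = {}
    exec(generate_scanner(TOKEN_SPEC), namespace)   # run the generated program text
    print(namespace["scan"]("x1 + 42"))
    # [('IDENT', 'x1'), ('PLUS', '+'), ('NUMBER', '42')]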

Advantages of Generating Programs rather than Writing by Hand


1.   It is highly efficient and less prone to errors.
2.   It is more flexible and can be easily modified.
3.   The input of a program generator is at a much higher level of abstraction than hand-written code can be.
4.   Pre-canned or tailored code can be added to the generated program enhancing its power at a low cost.
5.   A formal description can be used to generate more than one type of program.

Compiler construction has wide applicability in programming, for example in file conversion systems:

1.   TeX text formatters: they convert TeX text to the DVI format.
2.   PostScript interpreters: they convert PostScript text to image-rendering instructions for a specific printer.

One of the reasons for studying compiler construction is the generally useful data structures and algorithms it contains. For example: hashing, precomputed tables, stack mechanisms, garbage collection, dynamic programming and graph algorithms.

The heart of every compiler is the semantic representation of the program being compiled.

Comparison Between a Compiler and an Interpreter


Similarity

In both compilation and interpretation, the processed program is converted to an intermediate form, which is then interpreted by some interpreting mechanism.

Differences

Compilation
1.   The amount of program processing (before execution) is considerable.
2.   The interpreting mechanism is the hardware CPU.
3.   Program execution is relatively fast.
4.   The resulting intermediate form is machine-specific executable code, which is low level.

Interpretation
1.   The amount of program processing (before execution) is minimal to moderate.
2.   The interpreting mechanism is a software program.
3.   Program execution is relatively slow.
4.   The resulting intermediate form is some system-specific data structure, which is high to medium level.
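For instance (a hedged sketch, not from the original notes; the node tags and the interpret function are invented), an interpreter is simply software that walks the intermediate form, here a nested tuple, and performs the operations itself at run time:

    def interpret(node):
        op = node[0]
        if op == "num":
            return node[1]
        if op == "add":
            return interpret(node[1]) + interpret(node[2])
        if op == "mul":
            return interpret(node[1]) * interpret(node[2])
        raise ValueError("unknown operation: %s" % op)

    # (2 + 3) * 4  ==>  20
    print(interpret(("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))))

A compiler, by contrast, would translate the same intermediate form into machine code and leave the execution to the CPU.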

The three major reasons why we study compilers are:

1.   It involves proper structuring of problems.
2.   It involves judicious use of formalism.
3.   It involves use of tools and facilities for program optimization.

Note that without the strict separation of analysis and synthesis, each new language would require a completely new set of compilers. With the strict separation, a new front end for that particular language can be combined with the existing back ends for current machines. Therefore, for L languages and M machines, L front ends and M back ends are needed, requiring L + M modules rather than L × M modules; for example, 5 languages and 4 machines require 5 + 4 = 9 modules instead of 5 × 4 = 20. Remember that nothing is free of charge: without the strict separation of analysis and synthesis, it would still be possible to generate simpler code for routine calls, an opportunity the separation gives up.



Note that judicious use of formalism greatly reduces the effort of programming. For example: regular expressions and context-free grammars used in lexical and syntax analysis.

The syntax tree of a program is a data structure which shows precisely how the various segments of the program text are to be viewed in terms of the grammar.
The syntax tree is obtained through a process called parsing.
Parsing is the process of structuring a text according to a given grammar.
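A minimal sketch of parsing (the toy grammar Expr -> Term { '+' Term }, Term -> NUMBER and the function names are assumptions made for this example): a recursive-descent parser that structures a list of tokens into a nested tuple serving as the syntax tree.

    def parse_expr(tokens, pos=0):
        # Expr -> Term { '+' Term }
        tree, pos = parse_term(tokens, pos)
        while pos < len(tokens) and tokens[pos] == "+":
            right, pos = parse_term(tokens, pos + 1)
            tree = ("+", tree, right)
        return tree, pos

    def parse_term(tokens, pos):
        # Term -> NUMBER
        return ("num", int(tokens[pos])), pos + 1

    print(parse_expr(["1", "+", "2", "+", "3"])[0])
    # ('+', ('+', ('num', 1), ('num', 2)), ('num', 3))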

A Parse Tree for y = b² - 4ac
For larger and more complicated grammars, the parser can be generated by a parser generator.
The form of the parse tree actually required by the compiler is called the abstract syntax tree (AST).
Detailed information about the semantics can be attached to the nodes of this tree through ANNOTATIONS, which are stored in additional data fields in the nodes, hence the term "ANNOTATED ABSTRACT SYNTAX TREE".

Examples of Annotations are:


1.   Type information: for example, an annotation recording that an assignment concerns a boolean array assignment.
2.   Optimization information: for example, an annotation recording that an expression does not contain a function call.

Note that type information is related to the semantics and is used for error checking, while optimization information is related to code generation. The annotations in a node are called the attributes of that node and belong to the grammar symbol the node represents.
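A hedged sketch of such annotations (the class and field names are illustrative only): each AST node carries extra data fields for its attributes, for example a type attribute used for error checking and a flag used by code generation.

    class Assign:
        def __init__(self, target, expr):
            self.target = target
            self.expr = expr
            self.type = None        # type information, filled in by type checking
            self.has_call = None    # optimization information, filled in by analysis

    node = Assign("flags[i]", "a and b")
    node.type = "boolean"           # the assignment concerns a boolean value
    node.has_call = False           # the expression contains no function call
    print(node.type, node.has_call)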

A translation program (compiler) consists of a translation process that is guided by the structure of the analysed text.

The translation process essentially consists of the following parts:

1.   The sequence of characters of the source text is translated into a corresponding sequence of symbols of the vocabulary of the language. This phase is known as LEXICAL ANALYSIS.
2.   The sequence of symbols is transformed into a representation that directly mirrors the syntactic structure of the source text. This phase is known as SYNTAX ANALYSIS.
3.   Verification of whether compatibility rules are observed by a program is an additional duty of a compiler. This phase is known as TYPE CHECKING.
4.   On the basis of the representation resulting from syntax analysis, a sequence of instructions taken from the instruction set of the target computer is generated. This is known as CODE GENERATION.

Note that very frequent access to disk storage results in long compilation times. The front end deals with lexical analysis, syntax analysis and type checking, while the back end deals with code generation.

The main advantage of this front-end/back-end split lies in the independence of the front end from the target computer and its instruction set, because the same front end can serve compilers for several different target computers.

In one early approach, the back end consisted of an interpreter program, and the linear instruction sequence that it interpreted was called P-CODE. The drawback of this kind of back end was the inherent loss of efficiency common to interpreters.

A cross compiler is a compiler that generates code for a computer different from the one executing the compiler.

A language is defined by the following:

1.   A set of non terminal symbols
2.   A set of terminal symbols
3.   A set of syntactic equations also called productions
4.   A start symbol

Therefore, a language can be defined as the set of sequences of terminal symbols which, starting with the start symbol, can be generated by repeated application of the syntactic equations (productions).
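A minimal sketch of such a definition expressed as data (the particular grammar is chosen purely for illustration):

    # The four components: nonterminals, terminals, productions (syntactic
    # equations) and the start symbol.
    grammar = {
        "nonterminals": {"Expr", "Term"},
        "terminals": {"number", "+"},
        "productions": {
            "Expr": [["Term"], ["Expr", "+", "Term"]],
            "Term": [["number"]],
        },
        "start": "Expr",
    }

    # One derivation from the start symbol:
    # Expr => Expr + Term => Term + Term => number + number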

The idea of defining languages and their grammar with mathematical precision goes way back to N. Chomsky.
Algol 60 was the first programming language to be defined formally and precisely. The use of the Chomsky formalism is also responsible for the term PROGRAMMING LANGUAGE, because a programming language seems to exhibit the structure of a spoken language; strictly speaking, however, it is not a language, since it is not spoken, but a formalism.

Note that the interpretation of sentences always rests on the recognition of their syntactic structure, and every sentence must have a single structure in order to be unambiguous.
A language is regular if its syntax can be expressed by a single EBNF expression.
A regular expression is an expression in which only terminal symbols occur and a single equation suffices.
A single regular expression suffices, for example, to describe both identifiers and integers. Sentence recognition is called SYNTAX ANALYSIS.

For the recognition of regular sentences, a finite automaton, also called a STATE MACHINE, is necessary and sufficient. In each step, the state machine reads the next symbol and changes state. The resulting state is solely determined by the previous state and the symbol read. If the resulting state is unique, the state machine is deterministic; otherwise it is non-deterministic. If the state machine is formulated as a program, the state is represented by the current point of program execution.
The procedure error terminates program execution, signalling that the symbol sequence read so far does not belong to the language.
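A hedged sketch of such a state machine as a program (the state names and the is_identifier function are invented for the example): a deterministic recognizer for identifiers of the form letter { letter | digit }.

    def is_identifier(text):
        state = "start"
        for ch in text:                          # read the next symbol ...
            if state == "start":
                state = "letter" if ch.isalpha() else "error"
            elif state == "letter":
                state = "letter" if ch.isalnum() else "error"
            if state == "error":                 # plays the role of procedure error
                return False
        return state == "letter"

    print(is_identifier("x25"), is_identifier("25x"))   # True False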

The grammar of a programming language is specified not in terms of input characters but in terms of input tokens.
Examples of Input tokens are:
1.   String
2.   Identifiers
3.   Numbers
4.   Compound Operators (++, ?=)
5.   Keywords
6.   Wires
7.   Separators (;)

Before feeding the input program text to the parser, it must be divided into tokens. This is the responsibility of the LEXICAL ANALYSER, and the activity is referred to as "to tokenize"; a sketch follows below.
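A minimal sketch of this tokenizing step (the token classes and regular expressions are assumptions made for the example):

    import re

    TOKENS = re.compile(r"""
          (?P<NUMBER>\d+)
        | (?P<IDENT>[A-Za-z_]\w*)
        | (?P<OP>\+\+|>=|[+\-*/=])
        | (?P<SEP>;)
        | (?P<SKIP>\s+)
    """, re.VERBOSE)

    def tokenize(text):
        # Divide the character stream into (token class, token text) pairs.
        return [(m.lastgroup, m.group())
                for m in TOKENS.finditer(text)
                if m.lastgroup != "SKIP"]

    print(tokenize("count = count + 1;"))
    # [('IDENT', 'count'), ('OP', '='), ('IDENT', 'count'),
    #  ('OP', '+'), ('NUMBER', '1'), ('SEP', ';')]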
Note that analysis refers to the front end, while synthesis refers to the back end.

Instead of generating code, the actions specified by the semantic representation can be performed directly; the code-generating back end is then replaced by an interpreting back end, and the whole program is called an interpreter. This is done for the following reasons:

1.   It demands less work.
2.   It allows better error checking and reporting.
3.   It also achieves high security.

Multipass Compilers

Multipass compilers are those compilers that process the source code or abstract syntax tree of a program several times. This is the opposite of single-pass compilers, which process the program just once.

Each pass takes the result of the previous pass as its input and creates an intermediate output. In this way, the intermediate code is improved pass by pass until the final pass emits the final code. Multipass compilers are sometimes called wide compilers, referring to the greater scope of their passes. Note that the final stage is the only stage that is machine dependent.

Advantages of Multipass Compilers are:

1.   They are largely machine independent (only the final pass depends on the target machine).
2.   They can be used for more expressive languages.

The wider scope of these compilers allows better code generation (smaller and faster code) compared with the output of a one-pass compiler, at the cost of higher compilation time and memory consumption.

Steps of a Typical Multipass Compiler
1.   Lexical Analysis
2.   Syntax Analysis
3.   Semantic Representation
4.   Code Generation
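Taking the steps above, a hedged sketch of the final, machine-dependent pass (the instruction names and the simple stack machine are assumptions made for this example): the earlier passes have produced an AST, here a nested tuple, and the last pass emits the final code.

    def generate(node, code):
        # Walk the AST and emit one instruction per operation (post-order).
        op = node[0]
        if op == "num":
            code.append("PUSH %d" % node[1])
        elif op == "+":
            generate(node[1], code)
            generate(node[2], code)
            code.append("ADD")
        return code

    ast = ("+", ("num", 4), ("+", ("num", 5), ("num", 6)))
    print("\n".join(generate(ast, [])))
    # PUSH 4
    # PUSH 5
    # PUSH 6
    # ADD
    # ADD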

A programming language provides essentially three components for describing a computation from an initial state to a final state. These are:

1.   Data types, objects and values with operations defined upon them.
2.   Rules stating the chronological relationship among the specified operations.
3.   Rules stating the (static) structure of a program.

These components together constitute the level of abstraction at which one can formulate algorithms in the language.

Five Features of a Good Compiler are:

1.   It should not be time consuming
2.   It should be bug free
3.   It should generate accurate code corresponding to the source code provided.
4.   It should have good optimization performance.
5.   It should be able to conserve space and memory.

Programming paradigm is a fundamental style of computer programming that serves as a way of building the structure and elements of computer programs.

Service-oriented programming is a programming paradigm that uses 'services' as the unit of computing work, to design and implement integrated business applications and mission-critical software programs. This paradigm focuses on communication between systems using 'services'. It supports the basic programming constructs of sequencing, selection and iteration, and it provides a robust base for a semantic approach to programming integration and application logic.

Parallel programming paradigm is a programming paradigm that comprises the simultaneous execution of the same task to obtain faster results.

A compiler and an IDE are related, because virtually every IDE has a compiler in it. The difference is that an IDE also contains debugging tools, a text editor, clean-and-build facilities and many automated plug-ins, and all of these extra packages and tools are not part of a compiler. A compiler just transforms a high-level language into machine-executable code. One can also use a compiler on its own; for example, GCC can be driven directly by feeding it your source text, which it then transforms into an executable object file. In this vein, I think it is fair to call these IDEs or application packages compilers, because they have compilers in them. Examples include:

1.   Visual Studio
2.   Net Beans
3.   Eclipse
