Symbol Table
SYMBOL TABLE
CHAPTER HIGHLIGHTS
9.1 Operations on the Symbol Table
9.2 Symbol Table Implementation
9.3 Data Structures for the Symbol Table
  9.3.1 List
  9.3.2 Self-Organizing List
  9.3.3 Hash Table
  9.3.4 Binary Search Tree
After the syntax tree has been constructed, the compiler must check whether the input program is type-correct (this is called type checking and is part of semantic analysis). During type checking, the compiler checks whether the use of names (such as variables, functions and type names) is consistent with their definitions in the program. Consequently, it is necessary to remember declarations so that inconsistencies and misuses can be detected during type checking. This is the task of a symbol table. Note that a symbol table is a compile-time data structure; in statically typed languages it is not used at run time.

Formally, a symbol table maps names to declarations (called attributes), for example mapping the variable name x to its type int. More specifically, a symbol table stores:
- For each type name, its type definition.
- For each variable name, its type. If the variable is an array, the table also stores dimension information. It may also store the storage class, the offset in the activation record, etc.
- For each constant name, its type and value.
- For each function and procedure, its formal parameter list and its output type. Each formal parameter must have a name, a type, a passing mode (by reference or by value), etc.
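As a minimal sketch (assuming C++17), the information listed above could be grouped into one attribute record per name, with the table itself mapping names to those records. The names Kind, PassMode, Param, Attributes and SymbolTable, and the choice of std::unordered_map, are illustrative assumptions rather than a prescribed layout.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical attribute layout; the field names are illustrative only.
enum class Kind { TypeName, Variable, Constant, Function };
enum class PassMode { ByValue, ByReference };

struct Param {
    std::string name;
    std::string type;
    PassMode mode;  // by value or by reference
};

struct Attributes {
    Kind kind;
    std::string type;             // variable/constant type, type definition, or function result type
    std::vector<int> dimensions;  // array dimension sizes; empty for scalars
    int offset = 0;               // offset in the activation record (variables)
    std::optional<long> value;    // value of a named constant
    std::vector<Param> params;    // formal parameters of a function or procedure
};

// Maps each name to its declaration (its attributes).
class SymbolTable {
public:
    // Returns false, leaving the table unchanged, if the name is already declared.
    bool insert(const std::string& name, const Attributes& attrs) {
        return table_.emplace(name, attrs).second;
    }

    // Returns nullptr when the name has no declaration.
    const Attributes* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Attributes> table_;
};
```

Because insert reports an existing name, the same check can be used to detect an identifier declared more than once.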
push(12)
6) pop(): remove the head of table[12]
7) pop(): remove the head of table[12]

Recall that when we search for a declaration using lookup, we search the bucket list from the beginning to the end. Because each new declaration is pushed onto the head of its bucket list, if there are multiple declarations with the same name, the declaration in the innermost scope is found first and overrides the declarations in outer scopes.

(4) Handling Reserved Keywords: The symbol table can also handle reserved keywords such as PLUS, MINUS, MUL, etc. This is done by inserting each keyword into the table when it is initialized:
insert(PLUS, PLUS);
insert(MINUS, MINUS);
In each call, the first PLUS or MINUS is the lexeme and the second is the token.
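The push/pop scheme above can be sketched as a chained hash table in which a declaration is pushed onto the head of its bucket and leaving a scope pops those heads again. This is a minimal sketch under assumptions: the class name ScopedTable, the helpers enterScope/exitScope, the bucket count 211 and the use of std::hash are illustrative choices, not the chapter's own code.

```cpp
#include <array>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <vector>

struct Decl {
    std::string name;
    std::string type;
};

class ScopedTable {
public:
    ScopedTable() { scopes_.emplace_back(); }  // the global scope

    void enterScope() { scopes_.emplace_back(); }

    // pop(): remove the head of every bucket this scope pushed onto.
    // Inner declarations are always above outer ones in a bucket, so
    // popping heads removes exactly this scope's declarations.
    void exitScope() {
        if (scopes_.size() == 1) return;  // the global scope is never popped
        const std::vector<std::size_t>& pushed = scopes_.back();
        for (auto it = pushed.rbegin(); it != pushed.rend(); ++it)
            table_[*it].pop_front();
        scopes_.pop_back();
    }

    // push(b): the new declaration becomes the head of bucket b.
    void insert(const std::string& name, const std::string& type) {
        std::size_t b = bucket(name);
        table_[b].push_front(Decl{name, type});
        scopes_.back().push_back(b);  // remember the bucket for exitScope
    }

    // Scans the bucket from head to tail, so the innermost declaration wins.
    const Decl* lookup(const std::string& name) const {
        for (const Decl& d : table_[bucket(name)])
            if (d.name == name) return &d;
        return nullptr;
    }

private:
    static constexpr std::size_t kBuckets = 211;  // assumed table size
    static std::size_t bucket(const std::string& name) {
        return std::hash<std::string>{}(name) % kBuckets;
    }
    std::array<std::forward_list<Decl>, kBuckets> table_;
    std::vector<std::vector<std::size_t>> scopes_;  // buckets pushed per open scope
};
```

Keeping a per-scope record of pushed buckets means closing a scope costs time proportional to the number of declarations in that scope, not the size of the table.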
[Figure 9.1: Initial symbol table; the lexemes (e.g., AND) are stored in the array arr_lexeme, each terminated by EOS]
When the lexical analyzer reads a letter, it starts saving letters and digits in a buffer, lex_buffer. The string collected in lex_buffer is then looked up in the symbol table using the lookup operation. Since the symbol table is initialized with entries for the keywords plus, minus and the AND operator, as well as some identifiers, as shown in figure 9.1, the lookup operation will find these entries if lex_buffer contains one of these keywords. If there is no entry for the string in lex_buffer, i.e., lookup returns 0, then lex_buffer contains the lexeme for a new identifier, and an entry for the identifier is created using insert(). Once the entry has been found or inserted, n is the index of the symbol-table entry for the string in lex_buffer. This index is communicated to the parser by setting tokenval to n, and the token in the token field of the entry is returned.
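The scanning steps above can be sketched as follows. The names lex_buffer, lookup, insert and tokenval come from the text; the token codes, the Entry layout, the keyword spellings and the helpers init_keywords and lex_word are hypothetical. Entry 0 is a dummy so that lookup can return 0 for "not found".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token codes; the actual values are not given in the text.
enum Token { ID = 256, PLUS, MINUS, AND };

struct Entry {
    std::string lexeme;
    int token;
};

std::vector<Entry> symtable(1);  // index 0 is unused: it means "not found"
int tokenval = 0;                // attribute passed to the parser

// Returns the index of the entry for s, or 0 if there is none.
int lookup(const std::string& s) {
    for (std::size_t i = symtable.size() - 1; i > 0; --i)
        if (symtable[i].lexeme == s) return static_cast<int>(i);
    return 0;
}

// Adds a new entry and returns its index.
int insert(const std::string& s, int tok) {
    symtable.push_back({s, tok});
    return static_cast<int>(symtable.size() - 1);
}

// Keywords are entered before scanning, as insert(lexeme, token).
void init_keywords() {
    insert("plus", PLUS);
    insert("minus", MINUS);
    insert("AND", AND);
}

// Scans a word starting at src[pos], which must be a letter.
int lex_word(const std::string& src, std::size_t& pos) {
    std::string lex_buffer;
    while (pos < src.size() && std::isalnum(static_cast<unsigned char>(src[pos])))
        lex_buffer += src[pos++];
    int n = lookup(lex_buffer);
    if (n == 0) n = insert(lex_buffer, ID);  // a new identifier
    tokenval = n;                            // index of the entry
    return symtable[n].token;                // keyword token or ID
}
```

Because keywords are ordinary table entries, the scanner needs no separate keyword test: the token field of the found entry distinguishes keywords from identifiers.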
[Figure: symbol table entries for the variables a, b, c, d]
In the figure above, (a) represents a simple list and (b) represents a self-organizing list, in which Id1 is linked to Id2 and Id3 is linked to Id1.
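As a hedged sketch, both list organizations can share one linear-search class. For the self-organizing case this uses the move-to-front heuristic, which is one common rule; the text does not say which rule its self-organizing list uses. The names ListTable and Symbol are illustrative.

```cpp
#include <list>
#include <string>

struct Symbol {
    std::string name;
    std::string type;
};

class ListTable {
public:
    // moveToFront == false gives a simple list; true gives a
    // self-organizing list using the move-to-front rule.
    explicit ListTable(bool moveToFront) : mtf_(moveToFront) {}

    void insert(const std::string& name, const std::string& type) {
        entries_.push_back({name, type});
    }

    // Linear search; returns nullptr if the name is absent.
    Symbol* lookup(const std::string& name) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->name != name) continue;
            if (mtf_)  // relink the found node at the head; it stays valid
                entries_.splice(entries_.begin(), entries_, it);
            return &*it;
        }
        return nullptr;
    }

    // Name at the head of the list (requires a non-empty table).
    const std::string& front() const { return entries_.front().name; }

private:
    bool mtf_;
    std::list<Symbol> entries_;
};
```

Moving a found entry to the head makes frequently used names cheap to find again, which is the point of self-organization; a simple list keeps declaration order.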
The structure of a hash table is shown in the following figure.
[Figure: structure of a hash table]
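A hash-organized table needs a hash function to select a bucket and a way to chain names that land in the same bucket. The sketch below uses hashpjw (P. J. Weinberger's string hash), one well-known choice, with separate chaining; the table size 211 and the names Node and HashTable are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

constexpr std::size_t kTableSize = 211;  // assumed prime number of buckets

// hashpjw: shift in 4 bits per character and fold the top 4 bits of
// the 32-bit value back into the low-order bits.
std::size_t hashpjw(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char ch : s) {
        h = (h << 4) + ch;
        std::uint32_t g = h & 0xf0000000u;
        if (g != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h % kTableSize;
}

struct Node {
    std::string name;
    std::string type;
};

// Separate chaining: each bucket lists the entries whose names hash alike.
class HashTable {
public:
    HashTable() : buckets_(kTableSize) {}

    void insert(const std::string& name, const std::string& type) {
        buckets_[hashpjw(name)].push_front({name, type});
    }

    // Only the single bucket selected by the hash is searched.
    const Node* lookup(const std::string& name) const {
        for (const Node& n : buckets_[hashpjw(name)])
            if (n.name == name) return &n;
        return nullptr;
    }

private:
    std::vector<std::list<Node>> buckets_;  // lists keep entry addresses stable
};
```

With a reasonable hash function the chains stay short, so lookup cost is close to constant on average, compared with the linear search of a list.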
[Figure: the variables u, a, b, c, x, y, sum stored in a symbol table; panels include (i) List and (iii) Search Tree]
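A search-tree organization keeps entries ordered by name, so lookup follows a single path from the root. The following is a minimal, unbalanced binary search tree sketch (so its worst case is linear); the names TreeNode and TreeTable are illustrative. The usage below inserts the same variables u, a, b, c, x, y, sum.

```cpp
#include <memory>
#include <string>
#include <vector>

struct TreeNode {
    std::string name;
    std::string type;
    std::unique_ptr<TreeNode> left, right;
};

class TreeTable {
public:
    // Returns false if the name is already present.
    bool insert(const std::string& name, const std::string& type) {
        std::unique_ptr<TreeNode>* link = &root_;
        while (*link) {
            if (name == (*link)->name) return false;
            link = name < (*link)->name ? &(*link)->left : &(*link)->right;
        }
        *link = std::make_unique<TreeNode>();
        (*link)->name = name;
        (*link)->type = type;
        return true;
    }

    // Follows one root-to-leaf path; returns nullptr if absent.
    const TreeNode* lookup(const std::string& name) const {
        const TreeNode* n = root_.get();
        while (n && n->name != name)
            n = name < n->name ? n->left.get() : n->right.get();
        return n;
    }

    // An in-order traversal visits the names in sorted order.
    void inorder(std::vector<std::string>& out) const { walk(root_.get(), out); }

private:
    static void walk(const TreeNode* n, std::vector<std::string>& out) {
        if (!n) return;
        walk(n->left.get(), out);
        out.push_back(n->name);
        walk(n->right.get(), out);
    }
    std::unique_ptr<TreeNode> root_;
};
```

A sorted listing of the table falls out of the in-order walk, which is useful for cross-reference output.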
The different kinds of entries that need to be stored in a symbol table can be handled in various ways (described in section 9.4). Another method, different from those of section 9.4, is an object-oriented, class-based implementation: one might define an abstract base class to represent a generic entry and derive classes from it to represent entries for variables or constants. The traditional way, still required if the compiler is hosted in a language that does not support inheritance, is to use a union (in C++ terminology) or variant record (in Pascal terminology). Since the class-based implementation gives so much scope for exercises, we have chosen to illustrate the variant record approach, which is very efficient and quite adequate for such a simple language. We extend the declaration of the TABLE_entries type to be

    struct TABLE_entries {
      TABLE_alfa name;          // identifier
      TABLE_idclasses idclass;  // class
      union {
        struct {
          int value;
        } c;                    // constants
        struct {
          int size, offset;     // number of words, relative address
          bool scalar;          // distinguish arrays
        } v;                    // variables
      };
    };
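A self-contained sketch of how such a variant record might be filled and read is given below. It is not the chapter's code: the alfa length 8, the enumerators TABLE_consts and TABLE_vars, the part types TABLE_constpart and TABLE_varpart, and the helpers makeConst, makeVar and wordsNeeded are all assumptions. The variant parts are hoisted out of the anonymous union as named types because strict ISO C++ does not permit types to be declared inside an anonymous union.

```cpp
#include <cstring>

// Assumed supporting declarations (not given in the text).
const int TABLE_alength = 8;
typedef char TABLE_alfa[TABLE_alength + 1];
enum TABLE_idclasses { TABLE_consts, TABLE_vars };

// The variant parts, declared as named types outside the union.
struct TABLE_constpart { int value; };
struct TABLE_varpart { int size, offset; bool scalar; };

struct TABLE_entries {
    TABLE_alfa name;          // identifier
    TABLE_idclasses idclass;  // tag: selects the valid union member
    union {
        TABLE_constpart c;    // valid when idclass == TABLE_consts
        TABLE_varpart v;      // valid when idclass == TABLE_vars
    };
};

TABLE_entries makeConst(const char* name, int value) {
    TABLE_entries e{};  // zero-fills name, so it stays NUL-terminated
    std::strncpy(e.name, name, TABLE_alength);
    e.idclass = TABLE_consts;
    e.c = TABLE_constpart{value};
    return e;
}

TABLE_entries makeVar(const char* name, int size, int offset, bool scalar) {
    TABLE_entries e{};
    std::strncpy(e.name, name, TABLE_alength);
    e.idclass = TABLE_vars;
    e.v = TABLE_varpart{size, offset, scalar};
    return e;
}

// Reads the union only through the member the idclass tag marks as active.
// This sketch assumes constants need no storage words.
int wordsNeeded(const TABLE_entries& e) {
    return e.idclass == TABLE_vars ? e.v.size : 0;
}
```

The discipline that makes the variant record safe is entirely manual: every reader must test idclass before touching c or v, which is exactly what the class-based alternative would enforce through the type system.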
TRIBULATIONS
9.1 How would you check that no identifier is declared more than once?
9.2 How do real compilers deal with symbol tables?
9.3 How do real compilers keep track of type checking via the symbol table? Why should name equivalence be easier to handle than structural equivalence?
9.4 Why do some languages simply prohibit the use of anonymous types, and why don't more languages forbid them?
9.5 How do you suppose compilers keep track of storage allocation for struct or RECORD types, and for union or variant record types?
9.6 Find out how storage is managed for dynamically allocated variables in a language like C++.
9.7 How does one cope with arrays of variable (dynamic) length in subprograms?
9.8 Identifiers that are undeclared by virtue of mistyped declarations tend to be trying, for they result in many subsequent errors being reported. Perhaps in languages as simple as ours one could assume that all undeclared identifiers should be treated as variables and entered as such in the symbol table at the point of first reference. Is this a good idea? Can it easily be implemented? What happens if arrays are undeclared?
9.9 Array declarations in our simple language differ from those in C++: the bracketed number specifies the highest permitted index value, rather than the array length. This has been done so that one can declare variables like
VAR Scalar, List[10], VeryShortList[0];
How would you modify our language and Topsy to use C++ semantics, where the declaration of VeryShortList would have to be forbidden?
9.10 Yet another approach is to construct the symbol table using a hash table, which probably yields the shortest retrieval times. Develop a hash table implementation for your C compiler.
9.11 We might consider letting the scanner interact with the symbol table. Develop a scanner that stores the strings for identifiers and string literals in a string table.
9.12 Develop a symbol table handler that uses a simple class hierarchy for the possible types of entries, inheriting appropriately from a suitable base class. Once again, construction of such a table should prove to be straightforward regardless of whether you use a linear array, tree or hash table as the underlying storage structure. Retrieval might call for more initiative, since determining the exact class of an object that has been statically declared to be of a base class type requires a run-time mechanism such as a virtual function or dynamic_cast.
9.13 Why can we easily allow the declaration of a pointer type to precede the definition of the type it points to, even in a one-pass system?
9.14 One might accuse the designers of Pascal and C of making a serious error of judgement: they do not introduce a string type as standard, but rely on programmers to manipulate arrays of characters and to use error-prone ways of recognizing the end or the length of a string. Do you agree? Discuss whether what they offer in return is adequate, and if not, why not. Suggest why they might deliberately not have introduced a string type.
9.15 Many authors dislike pointer types because they allow insecure programming. What is meant by this? How could the security be improved? If you do not like pointer types, can you think of any alternative feature that would be more secure?