The symbol table dump provides a method whereby third-party tools can interface with the C and C++ producers. The producer outputs information on the identifiers declared within a source file, their uses, etc. into a file which can then be post-processed by a separate tool. Any error messages and warnings can also be included in this file, allowing more sophisticated error presentation tools to be written.
The file to be used as the symbol table output file, plus details of what information is to be included in the dump file, can be specified using the `-d` command-line option. The format of the dump file is described below; a summary of the syntax is given as an annex.
A symbol table dump file consists of a sequence of characters giving information on identifiers, errors etc. arising from a translation unit. The fundamental lexical tokens are a number, consisting of a sequence of decimal digits, and a string, consisting of a sequence of characters enclosed in angle brackets. A string can have one of two forms:

    string :
        <characters>
        &number<characters>

In the first form, the characters are terminated by the first `>` character encountered. In the second form, the number of characters is given by the preceding number. No white space is allowed either before or after the number. To aid parsers, the C++ producer always uses the second form for strings containing more than 100 characters. There are no escape characters in strings; the characters can contain any characters, including newlines and `#`, except that the first form cannot contain a `>` character.
Space, tab and newline characters are white space. Comments begin with `#` and run to the end of the line. Comments are treated as white space. All other characters are treated as distinct lexical tokens.
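The lexical rules above are small enough to implement directly. The following Python sketch shows one possible tokenizer. `tokenize` and `Str` are hypothetical names and are not part of TenDRA. The code also assumes that a counted string `&number<...>` is closed by a `>` immediately after the counted characters, as the grammar indicates.

```python
import re

_NUMBER = re.compile(r"[0-9]+")
_COUNTED = re.compile(r"&([0-9]+)<")

class Str(str):
    """A dump-file string token, kept distinct from one-character tokens."""

def tokenize(text):
    """Split dump text into numbers (int), strings (Str) and one-character
    tokens (plain str), discarding white space and comments."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\n":                      # white space
            i += 1
        elif ch == "#":                        # comment runs to end of line
            end = text.find("\n", i)
            i = n if end < 0 else end + 1
        elif ch in "0123456789":               # number: decimal digits
            m = _NUMBER.match(text, i)
            tokens.append(int(m.group()))
            i = m.end()
        elif ch == "<":                        # <characters>, ended by first '>'
            end = text.find(">", i + 1)
            if end < 0:
                raise ValueError("unterminated string at offset %d" % i)
            tokens.append(Str(text[i + 1:end]))
            i = end + 1
        elif ch == "&":                        # &number<characters>
            m = _COUNTED.match(text, i)
            if not m:
                raise ValueError("malformed counted string at offset %d" % i)
            start, count = m.end(), int(m.group(1))
            if start + count >= n or text[start + count] != ">":
                raise ValueError("counted string length mismatch at offset %d" % i)
            tokens.append(Str(text[start:start + count]))
            i = start + count + 1
        else:                                  # every other character is a token
            tokens.append(ch)
            i += 1
    return tokens
```

Because strings are stored in a `str` subclass, later stages can tell the string `<N>` apart from the one-character token `N`.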
A symbol table dump file takes the form of a list of commands of various kinds conveying information on the analysed file. This can be represented as follows:

    dump-file :
        command-list_opt

    command-list :
        command command-list_opt

    command :
        version-command
        identifier-command
        scope-command
        override-command
        base-command
        api-command
        template-command
        promotion-command
        error-command
        path-command
        file-command
        include-command
        string-command

The various kinds of command are discussed below. The first command in the dump file should be of the form:

    version-command :
        V number number string

where the two numbers give the version of the dump file format (the version described here is 1.1, so both numbers should be 1) and the string gives the language being represented, for example, `<C++>`.
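Every kind of command starts with a distinct letter, with `I` marking an implicit identifier-command. A reader can therefore dispatch on the first character of each command. The Python sketch below shows that dispatch together with a check of the opening version-command. The names are hypothetical. For brevity, the regular expression ignores comments and the counted `&number<...>` string form.

```python
import re

# Leading letter of every command kind in the command grammar above.
COMMAND_KINDS = {
    "V": "version-command",
    "I": "identifier-command",   # implicit declaration or use
    "D": "identifier-command", "M": "identifier-command",
    "T": "identifier-command", "Q": "identifier-command",
    "U": "identifier-command", "L": "identifier-command",
    "C": "identifier-command", "W": "identifier-command",
    "S": "scope-command", "O": "override-command", "B": "base-command",
    "X": "api-command", "Z": "template-command", "P": "promotion-command",
    "E": "error-command", "F": "path-, file- or include-command",
    "A": "string-command",
}

# 'V', two numbers and a <...> string, allowing only the dump's white space.
_VERSION = re.compile(r"[ \t\n]*V[ \t\n]*([0-9]+)[ \t\n]+([0-9]+)[ \t\n]*<([^>]*)>")

def read_version(text):
    """Return (major, minor, language) from the version-command opening a dump."""
    m = _VERSION.match(text)
    if not m:
        raise ValueError("a dump file must start with a version-command")
    return int(m.group(1)), int(m.group(2)), m.group(3)
```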
A location within a source file can be specified using three numbers and two strings. These give, respectively, the column number, the line number taking `#line` directives into account, the line number not taking `#line` directives into account, the file name taking `#line` directives into account, and the file name not taking `#line` directives into account. Any or all of the trailing elements can be replaced by `*` to indicate that they have not changed relative to the last location given. Note that, for the two line numbers, "unchanged" means that the difference between the line number taking `#line` directives into account and the line number not taking them into account is unchanged. Thus:

    location :
        number number number string string
        number number number string *
        number number number *
        number number *
        number *
        *

Note that there is a concept of the current file location, relative to which other locations are given. The initial value of the current file location is undefined. Unless otherwise stated, all location elements update the current file location.
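Because locations are relative, a reader has to carry the current file location from one location to the next. The Python sketch below shows one way to do that bookkeeping. It assumes the elements of a location have already been tokenized, with `None` standing for `*`. The names `Location`, `STAR` and `resolve_location` are hypothetical. Note that the second line number is recomputed from the stored difference rather than copied.

```python
from typing import NamedTuple, Optional, Sequence, Union

STAR = None  # stands for the '*' token in this sketch

class Location(NamedTuple):
    column: int
    line: int       # line number taking #line directives into account
    raw_line: int   # line number not taking #line directives into account
    file: str       # file name taking #line directives into account
    raw_file: str   # file name not taking #line directives into account

def resolve_location(fields: Sequence[Union[int, str, None]],
                     current: Optional[Location]) -> Location:
    """Expand a location whose trailing elements may be replaced by '*'."""
    if not fields or len(fields) > 5:
        raise ValueError("a location has between 1 and 5 elements")
    if any(f is STAR for f in fields[:-1]):
        raise ValueError("only the last element may be '*'")
    if fields[-1] is not STAR and len(fields) != 5:
        raise ValueError("a shortened location must end with '*'")
    given = list(fields[:-1]) if fields[-1] is STAR else list(fields)
    if len(given) == 5:
        return Location(*given)
    if current is None:
        raise ValueError("no current location to inherit from")
    column = given[0] if len(given) > 0 else current.column
    line = given[1] if len(given) > 1 else current.line
    if len(given) > 2:
        raw_line = given[2]
    else:
        # 'Unchanged' line numbers keep their difference, not their values.
        raw_line = line - (current.line - current.raw_line)
    file = given[3] if len(given) > 3 else current.file
    return Location(column, line, raw_line, file, current.raw_file)
```

A reader would normally store each result as the new current location. Location error arguments are the exception: as described below, they leave the current location unchanged.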
Each identifier is represented in the symbol table dump by a unique number. The same number always represents the same identifier.
The number representing an identifier is introduced in the first declaration or use of that identifier and thereafter the number alone is used to denote the identifier:

    identifier :
        number = identifier-name access_opt scope-identifier
        number

The identifier name is given by:

    identifier-name :
        string
        C type
        D type
        O string
        T type

denoting, respectively, a simple identifier name, a constructor for a type, a destructor for a type, an overloaded operator function name, and a conversion function name. The empty string is used for anonymous identifiers.
The optional identifier access is given by:

    access :
        N
        B
        P

denoting `public`, `protected` and `private` respectively. An absent access is equivalent to `public`. Note that all identifiers, not just class members, can have access specifiers; however, the access of a non-member is always `public`.
The scope (i.e. class, namespace, block etc.) in which an identifier is declared is given by:

    scope-identifier :
        identifier
        *

denoting either a named or an unnamed scope.
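A reader of the dump has to record each identifier number when it is first introduced, so that later bare numbers can be resolved. The sketch below shows this bookkeeping in Python over already-tokenized input: numbers as `int`, strings as the hypothetical `Str` class, and other tokens as one-character strings. It is deliberately incomplete. Only the simple string form of identifier-name is handled, because the `C`, `D` and `T` forms would need a type parser. All names are hypothetical, and the test data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

class Str(str):
    """A dump-file string token, kept distinct from one-character tokens."""

@dataclass
class Identifier:
    name: str
    access: str              # 'public', 'protected' or 'private'
    scope: Optional[int]     # number of the parent scope, None for '*'

ACCESS = {"N": "public", "B": "protected", "P": "private"}

def _is(tok, ch):
    """True if tok is the one-character token ch (not a string token)."""
    return type(tok) is str and tok == ch

def read_identifier(toks: List[Union[int, str]], pos: int,
                    table: Dict[int, Identifier]) -> Tuple[int, int]:
    """Read an identifier at toks[pos]; return (its number, next position)."""
    num = toks[pos]
    if not isinstance(num, int):
        raise ValueError("an identifier starts with a number")
    pos += 1
    if pos < len(toks) and _is(toks[pos], "="):    # first declaration or use
        name = toks[pos + 1]
        if not isinstance(name, Str):
            raise NotImplementedError("only simple names are handled")
        pos += 2
        access = "public"                          # an absent access means public
        if type(toks[pos]) is str and toks[pos] in ACCESS:
            access, pos = ACCESS[toks[pos]], pos + 1
        if _is(toks[pos], "*"):                    # unnamed scope
            scope, pos = None, pos + 1
        else:                                      # named scope: itself an identifier
            scope, pos = read_identifier(toks, pos, table)
        table[num] = Identifier(str(name), access, scope)
    elif num not in table:
        raise KeyError("identifier %d used before it was introduced" % num)
    return num, pos
```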
Each declaration or use of an identifier is represented by a command of the form:

    identifier-command :
        D identifier-info type-info
        M identifier-info type-info
        T identifier-info type-info
        Q identifier-info
        U identifier-info
        L identifier-info
        C identifier-info
        W identifier-info type-info

where:

    identifier-info :
        identifier-key location identifier

gives the kind of identifier being declared or used, the location of the declaration or use, and the number associated with the identifier. Each declaration may, depending on the identifier-key, associate various type-info with the identifier, giving its type etc.
The various kinds of identifier-command are described below. Any of them can be preceded by `I` to indicate an implicit declaration or use. `D` denotes a definition. `M` (make) denotes a declaration. `T` denotes a tentative definition (C only). `Q` denotes the end of a definition, for those identifiers, such as classes and functions, whose definitions may be spread over several lines. `U` denotes an undefine operation (such as `#undef` for macro identifiers). `C` denotes a call to a function identifier; `L` (load) denotes other identifier uses. Finally, `W` denotes implicit type information such as the C producer gleans from its weak prototype analysis.
The various identifier-keys and their associated type-info fields are given by the following table:
Key | Type information | Description |
---|---|---|
K | * | keyword |
MO | sort | object macro |
MF | sort | function macro |
MB | sort | built-in macro |
TC | type | class tag |
TS | type | structure tag |
TU | type | union tag |
TE | type | enumeration tag |
TA | type | typedef name |
NN | * | namespace name |
NA | scope-identifier | namespace alias |
VA | type | automatic variable |
VP | type | function parameter |
VE | type | extern variable |
VS | type | static variable |
FE | type identifier_opt | extern function |
FS | type identifier_opt | static function |
FB | type identifier_opt | built-in operator function |
CF | type identifier_opt | member function |
CS | type identifier_opt | static member function |
CV | type identifier_opt | virtual member function |
CM | type | data member |
CD | type | static data member |
E | type | enumerator |
L | * | label |
XO | sort | object token |
XF | sort | procedure token |
XP | sort | token parameter |
XT | sort | template parameter |
The function identifier keys can optionally be followed by `C`, indicating that the function has C linkage, and `I`, indicating that the function is inline. By default, functions declared in a C++ dump file have C++ linkage and functions declared in a C dump file have C linkage. The optional identifier which forms part of the type-info of these functions is used to form linked lists of overloaded functions.
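The command letters, the key table and the optional function suffixes together determine the letter run at the start of an identifier-command. The following sketch decodes such a run. It assumes the letters have already been separated from the location that follows them, which always starts with a number or `*`. It also assumes that the `C` and `I` suffixes appear only after function keys. `decode_command` is a hypothetical helper.

```python
ACTIONS = {"D": "definition", "M": "declaration", "T": "tentative definition",
           "Q": "end of definition", "U": "undefine", "L": "use",
           "C": "call", "W": "implicit type information"}

KEYS = {"K": "keyword", "MO": "object macro", "MF": "function macro",
        "MB": "built-in macro", "TC": "class tag", "TS": "structure tag",
        "TU": "union tag", "TE": "enumeration tag", "TA": "typedef name",
        "NN": "namespace name", "NA": "namespace alias",
        "VA": "automatic variable", "VP": "function parameter",
        "VE": "extern variable", "VS": "static variable",
        "FE": "extern function", "FS": "static function",
        "FB": "built-in operator function", "CF": "member function",
        "CS": "static member function", "CV": "virtual member function",
        "CM": "data member", "CD": "static data member", "E": "enumerator",
        "L": "label", "XO": "object token", "XF": "procedure token",
        "XP": "token parameter", "XT": "template parameter"}

FUNCTION_KEYS = {"FE", "FS", "FB", "CF", "CS", "CV"}

def decode_command(letters, language="C++"):
    """Decode the letters opening an identifier-command, e.g. 'IDFEC'."""
    implicit = letters.startswith("I")
    rest = letters[1:] if implicit else letters
    action = ACTIONS[rest[0]]
    rest = rest[1:]
    key = rest[:2] if rest[:2] in KEYS else rest[:1]   # two-letter keys first
    if key not in KEYS:
        raise ValueError("unknown identifier-key in %r" % letters)
    flags = rest[len(key):]
    result = {"implicit": implicit, "action": action, "key": KEYS[key]}
    if key in FUNCTION_KEYS:
        if not set(flags) <= {"C", "I"} or len(set(flags)) != len(flags):
            raise ValueError("bad function suffix %r" % flags)
        # C dump files default to C linkage, C++ dump files to C++ linkage.
        result["linkage"] = "C" if ("C" in flags or language == "C") else "C++"
        result["inline"] = "I" in flags
    elif flags:
        raise ValueError("unexpected letters after key %s" % key)
    return result
```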
Each identifier belongs to a scope, called its parent scope, in which it is declared. For example, the parent of a member of a class is the class itself. This information is expressed in an identifier declaration using a scope-identifier. In addition to the obvious scopes such as classes and namespaces, there are other scopes such as blocks in function definitions. It is possible to introduce dummy identifiers to name such scopes. The parent of such a dummy identifier will be the enclosing scope identifier, so these dummy identifiers naturally represent the block structure. The parent of the top-level block in a function definition can be considered to be the function itself.
Information on the start and end of such scopes is given by:

    scope-command :
        SS scope-key location identifier
        SE scope-key location identifier

where:

    scope-key :
        N
        S
        B
        D
        H
        CT
        CF
        CC

gives the kind of scope involved: a namespace, a class, a block, some other declarative scope, a declaration block (see below), a true conditional scope, a false conditional scope or a target-dependent conditional scope.
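A consumer that tracks scopes would typically keep a stack of open scopes. The following sketch assumes that each `SE` closes the most recently opened scope and repeats its scope-key and identifier. The description above does not state this explicitly, and conditional scopes in particular might not nest this neatly in real dumps. Commands are given as hypothetical `(op, key, identifier number)` tuples, and `check_scopes` is a hypothetical name.

```python
SCOPE_KINDS = {"N": "namespace", "S": "class", "B": "block",
               "D": "declarative scope", "H": "declaration block",
               "CT": "true conditional", "CF": "false conditional",
               "CC": "target-dependent conditional"}

def check_scopes(commands):
    """Verify SS/SE nesting; return the maximum nesting depth."""
    stack, deepest = [], 0
    for op, key, ident in commands:
        if key not in SCOPE_KINDS:
            raise ValueError("unknown scope-key %r" % key)
        if op == "SS":                         # scope opens
            stack.append((key, ident))
            deepest = max(deepest, len(stack))
        elif op == "SE":                       # scope closes
            if not stack or stack[-1] != (key, ident):
                raise ValueError("SE %s %r does not close the innermost scope"
                                 % (key, ident))
            stack.pop()
        else:
            raise ValueError("not a scope-command: %r" % op)
    if stack:
        raise ValueError("unclosed scopes: %r" % stack)
    return deepest
```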
A declaration block is a sequence of declarations enclosed in directives of the form:

    #pragma TenDRA declaration block identifier begin
    ....
    #pragma TenDRA declaration block end

This allows the sequence of declarations to be associated with the given identifier in the symbol dump file. This technique is used in the API description files to aid analysis tools in determining which declarations are part of the API.
Other information associated with an identifier may be expressed using other dump commands. For example:

    override-command :
        O identifier identifier

is used to express the fact that the two identifiers are virtual member functions, the first of which overrides the second.
The command:

    base-command :
        B identifier-key identifier base-graph

    base-graph :
        base-class
        base-class ( base-list )

    base-class :
        number = V_opt access_opt type-name
        number :

    base-list :
        base-graph base-list_opt

associates a base class graph with a class identifier. Any class which does not have an associated base-command can be assumed to have no base classes. Each node in the graph is a type-name with an associated list of base classes. A `V` is used to indicate a virtual base class. Each node is numbered; duplicate numbers are used to indicate bases identified via the virtual base class structure. Any base class can then be referred to as:

    base-number :
        number : type-name

indicating the base class with the given number in the given class.
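The base-graph grammar is recursive, so a small recursive-descent parser fits it naturally. In the sketch below, each `number = ...` node is recorded so that a later `number :` resolves to the same object. On one reading of the text, this is how a shared virtual base appears. For simplicity, the sketch assumes every type-name is a bare identifier number introduced earlier in the dump, not a token-application or a first introduction. The names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class BaseNode:
    number: int
    type_id: int                      # identifier number of the class
    virtual: bool = False
    access: str = "public"
    bases: List["BaseNode"] = field(default_factory=list)

ACCESS = {"N": "public", "B": "protected", "P": "private"}

def _is(tok, ch):
    """True if tok is the one-character token ch."""
    return type(tok) is str and tok == ch

def read_base_graph(toks, pos=0, nodes: Optional[Dict[int, BaseNode]] = None):
    """Parse a base-graph from toks[pos:]; return (node, next position, nodes)."""
    nodes = {} if nodes is None else nodes
    num = toks[pos]
    if not isinstance(num, int):
        raise ValueError("a base-class starts with a number")
    if _is(toks[pos + 1], ":"):                 # 'number :' reuses an earlier node
        node, pos, fresh = nodes[num], pos + 2, False
    elif _is(toks[pos + 1], "="):               # 'number = V? access? type-name'
        pos += 2
        virtual = _is(toks[pos], "V")
        if virtual:
            pos += 1
        access = "public"
        if type(toks[pos]) is str and toks[pos] in ACCESS:
            access, pos = ACCESS[toks[pos]], pos + 1
        if not isinstance(toks[pos], int):
            raise NotImplementedError("only bare identifier numbers are handled")
        node = nodes[num] = BaseNode(num, toks[pos], virtual, access)
        pos, fresh = pos + 1, True
    else:
        raise ValueError("expected '=' or ':' after base number %d" % num)
    if pos < len(toks) and _is(toks[pos], "("):  # base-class ( base-list )
        pos += 1
        while not _is(toks[pos], ")"):
            child, pos, _ = read_base_graph(toks, pos, nodes)
            if fresh:
                node.bases.append(child)
        pos += 1
    return node, pos, nodes
```

In the made-up test data, `1 = 10 ( 2 = 11 ( 3 = V 13 ) 4 = 12 ( 3 : ) )` describes a diamond: classes 11 and 12 share the virtual base 13.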
The command:

    api-command :
        X identifier-key identifier string

associates the external token name given by the string with the given tokenised identifier.
The command:

    template-command :
        Z identifier-key identifier token-application specialise-info

is used to introduce an identifier corresponding to an instance of a template, token-application. This instance may correspond to a specialisation of the primary template; this information is represented by:

    specialise-info :
        identifier
        token-application
        *

where `*` indicates a non-specialised instance.
The built-in types are represented in the symbol table dump as follows:
Type | Encoding | Type | Encoding |
---|---|---|---|
char | c | float | f |
signed char | Sc | double | d |
unsigned char | Uc | long double | r |
signed short | s | void | v |
unsigned short | Us | (bottom) | u |
signed int | i | bool | b |
unsigned int | Ui | ptrdiff_t | y |
signed long | l | size_t | z |
unsigned long | Ul | wchar_t | w |
signed long long | x | - | - |
unsigned long long | Ux | - | - |
Named types (classes, enumeration types etc.) can be represented by the corresponding identifier or token application:

    type-name :
        identifier
        token-application

Composite and qualified types are represented in terms of their subtypes as follows:
Type | Encoding |
---|---|
const type | C type |
volatile type | V type |
pointer type | P type |
reference type | R type |
pointer to member type | M type-name : type |
function type | F type parameter-types |
array type | A nat_opt : type |
bitfield type | B nat : type |
template type | t parameter-list_opt : type |
promotion type | p type |
arithmetic type | a type : type |
integer literal type | n lit-base_opt lit-suffix_opt |
weak function prototype (C only) | W type parameter-types |
weak parameter type (C only) | q type |
Other types can be represented by their textual representation using the form `Q string`, or by `*`, indicating an unknown type.
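Every type encoding is prefix-based, so types can be decoded by recursive descent. This Python sketch covers only the built-in types and the `C`, `V`, `P`, `R`, `p`, `A` and `B` forms. It restricts sizes to the `+ number` and `- number` forms of nat and assumes the encoding contains no white space. `describe_type` is a hypothetical name, and the English wording is the sketch's own.

```python
BUILTIN = {
    "c": "char", "Sc": "signed char", "Uc": "unsigned char",
    "s": "signed short", "Us": "unsigned short",
    "i": "signed int", "Ui": "unsigned int",
    "l": "signed long", "Ul": "unsigned long",
    "x": "signed long long", "Ux": "unsigned long long",
    "f": "float", "d": "double", "r": "long double", "v": "void",
    "u": "(bottom)", "b": "bool", "y": "ptrdiff_t", "z": "size_t",
    "w": "wchar_t",
}

# One-letter wrappers that apply to the type that follows them.
WRAPPERS = {"C": "const ", "V": "volatile ", "P": "pointer to ",
            "R": "reference to ", "p": "promotion of "}

def describe_type(code):
    """Describe a compact type encoding such as 'PCc' in English."""
    try:
        text, end = _type(code, 0)
    except IndexError:
        raise ValueError("truncated type encoding %r" % code) from None
    if end != len(code):
        raise ValueError("trailing characters in %r" % code)
    return text

def _type(code, i):
    ch = code[i]
    if ch in "SU":                                 # Sc, Uc, Us, Ui, ...
        return BUILTIN[code[i:i + 2]], i + 2
    if ch in BUILTIN:
        return BUILTIN[ch], i + 1
    if ch in WRAPPERS:
        inner, end = _type(code, i + 1)
        return WRAPPERS[ch] + inner, end
    if ch in "AB":                                 # A nat_opt : type, B nat : type
        j, size = i + 1, None
        if code[j] in "+-":                        # only the '+/- number' nat forms
            k = j + 1
            while k < len(code) and code[k] in "0123456789":
                k += 1
            size = int(code[j + 1:k]) * (-1 if code[j] == "-" else 1)
            j = k
        if code[j] != ":":
            raise ValueError("expected ':' in %r" % code)
        inner, end = _type(code, j + 1)
        if ch == "A":
            prefix = "array of %d " % size if size is not None else "array of "
            return prefix + inner, end
        if size is None:
            raise ValueError("bitfield without a width in %r" % code)
        return "%d-bit bitfield of %s" % (size, inner), end
    raise NotImplementedError("encoding %r is not handled in this sketch" % ch)
```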
The parameter types for a function type are represented as follows:

    parameter-types :
        : exception-spec_opt func-qualifier_opt :
        . exception-spec_opt func-qualifier_opt :
        . exception-spec_opt func-qualifier_opt .
        , type parameter-types

where the `::` form indicates that there are no further parameters, the `.:` form indicates that the parameters are terminated by an ellipsis, and the `..` form indicates that no information is available on the further parameters (this can only happen with non-prototyped functions in C). The function qualifiers are given by:

    func-qualifier :
        C func-qualifier_opt
        V func-qualifier_opt

representing `const` and `volatile` member functions.
The function exception specifier is given by:

    exception-spec :
        ( exception-list_opt )

    exception-list :
        type
        type , exception-list

with an absent exception specifier, as in C++, indicating that any exception may be thrown.
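One reading of the grammar is as follows. A function type is `F`, then the return type, then one `, type` per parameter, then one of the three terminators. Any exception specification and member function qualifiers go between the terminator's two characters. The sketch below builds encodings on that reading from already-encoded component types. `encode_function` and its parameters are hypothetical. Under this reading, `int printf(const char *, ...)` becomes `Fi,PCc.:`.

```python
def encode_function(ret, params, tail="prototype", throws=None, quals=""):
    """Build a function type encoding from already-encoded types.

    tail:   'prototype' (no further parameters), 'ellipsis' or 'unknown'
    throws: None for no exception-spec (any exception), else a list of types
    quals:  member function qualifiers drawn from 'C' (const), 'V' (volatile)
    """
    terminators = {"prototype": (":", ":"), "ellipsis": (".", ":"),
                   "unknown": (".", ".")}
    if tail not in terminators:
        raise ValueError("unknown tail %r" % tail)
    if not set(quals) <= {"C", "V"}:
        raise ValueError("qualifiers must be drawn from 'C' and 'V'")
    opener, closer = terminators[tail]
    spec = "" if throws is None else "(" + ",".join(throws) + ")"
    # F, return type, one ',type' per parameter, then the terminator pair.
    return "F" + ret + "".join("," + p for p in params) + opener + spec + quals + closer
```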
Array and bitfield sizes are represented as follows:

    nat :
        + number
        - number
        identifier
        token-application
        string

where a string is used to hold a textual representation of complex values.
Template types are represented by a list of template parameters, which will have previously been declared using the `XT` identifier key, followed by the underlying type expressed in terms of these parameters. The parameters are represented as follows:
The parameters are represented as follows:

    parameter-list :
        identifier
        identifier , parameter-list

Integer literal types are represented by the value of the literal followed by a representation of the literal base and suffix. These are given by:

    lit-base :
        O
        X

representing octal and hexadecimal literals respectively (decimal is the default), and:

    lit-suffix :
        U
        l
        Ul
        x
        Ux

representing the `U`, `L`, `UL`, `LL` and `ULL` suffixes respectively.
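The base and suffix codes can be derived mechanically from the spelling of a C integer literal. The sketch below is illustrative only, and `literal_codes` is a hypothetical helper. It accepts suffix letters in any order, and it does not reject the mixed-case `lL` that C forbids. It also treats a lone `0` as decimal. That is an assumption, because C classes `0` as an octal constant.

```python
import re

# Hexadecimal, octal (leading 0) or decimal digits, then any suffix letters.
_LITERAL = re.compile(r"(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)([uUlL]*)")

_SUFFIXES = {"": "", "U": "U", "L": "l", "UL": "Ul", "LL": "x", "ULL": "Ux"}

def literal_codes(literal):
    """Return (lit-base code, lit-suffix code) for a C integer literal."""
    m = _LITERAL.fullmatch(literal)
    if not m:
        raise ValueError("not an integer literal: %r" % literal)
    digits, suffix = m.groups()
    if digits[:2].lower() == "0x":
        base = "X"
    elif digits.startswith("0") and len(digits) > 1:
        base = "O"
    else:
        base = ""                     # decimal is the default (lone 0 included)
    s = suffix.upper()
    # Normalise 'LU', 'LLU' etc. so that U always comes first.
    key = ("U" if "U" in s else "") + "L" * s.count("L")
    if len(s) != len(key) or key not in _SUFFIXES:
        raise ValueError("bad suffix %r" % suffix)
    return base, _SUFFIXES[key]
```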
Target-dependent integral promotion types are represented using `p`, so for example the promotion of `unsigned short` is represented as `pUs`. Information on the other cases, where the promotion type is known, can be given in a command of the form:

    promotion-command :
        P type : type

Thus the fact that the promotion of `short` is `int` would be expressed by the command `Ps:i`.
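A reader can collect the promotion-commands into a table and use it to resolve `p` types wherever the promotion is known. The sketch assumes commands in the compact form shown above. It also assumes the two types contain no `:` of their own, which holds for the built-in integral types. The helper names are hypothetical.

```python
def read_promotions(commands):
    """Build a promotion table from compact promotion-commands such as 'Ps:i'."""
    table = {}
    for cmd in commands:
        if not cmd.startswith("P") or cmd.count(":") != 1:
            raise ValueError("not a promotion-command: %r" % cmd)
        source, target = cmd[1:].split(":")
        table[source] = target
    return table

def promote(type_code, table):
    """Resolve a 'p' promotion type if its promotion is known; else leave it."""
    if type_code.startswith("p"):
        return table.get(type_code[1:], type_code)
    return type_code
```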
A sort in the symbol table dump corresponds to the sort of a token declared in the `#pragma token` syntax. Expression tokens are represented as follows:

    expression-sort :
        ZEL type
        ZER type
        ZEC type
        ZN

corresponding to lvalue, rvalue and const `EXP` tokens of the given type, and `NAT` or `INTEGER` tokens, respectively. Statement tokens are represented by:

    statement-sort :
        ZS

Type tokens are represented as follows:

    type-sort :
        ZTO
        ZTI
        ZTF
        ZTA
        ZTP
        ZTS
        ZTU

corresponding to `TYPE`, `VARIETY`, `FLOAT`, `ARITHMETIC`, `SCALAR`, `STRUCT` or `CLASS`, and `UNION` tokens respectively. There are corresponding `TAG` forms:

    tag-type-sort :
        ZTTS
        ZTTU

Member tokens are represented using:

    member-sort :
        ZM type : type-name

where the first type gives the member type and the second gives the parent structure or union type.
Procedure tokens can be represented using:

    proc-sort :
        ZPG parameter-list_opt ; parameter-list_opt : sort
        ZPS parameter-list_opt : sort

The first form corresponds to the more general form of `PROC` token, that expressed using `{ .... | .... }`, which has separate lists of bound and program parameters. These token parameters will have previously been declared using the `XP` identifier key. The second form corresponds to the case where the bound and program parameter lists are equal, that expressed as a `PROC` token using `( .... )`. A more specialised version of this second form is a `FUNC` token, which is represented as:

    func-sort :
        ZF type

As noted above, template parameters are represented by a sort. Template type parameters are represented by `ZTO`, while template expression parameters are represented by `ZEC` (recall that such parameters are always constant expressions). The remaining case, template template parameters, can be represented as:

    template-sort :
        ZTt parameter-list_opt :

Finally, the number of parameters in a macro definition is represented by a sort of the form:

    macro-sort :
        ZUO
        ZUF number

corresponding to an object-like macro and a function-like macro with the given number of parameters, respectively.
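All the sort codes are prefixes beginning with `Z`. A table lookup that tries the longest prefix first is therefore enough to classify a sort and expose its remaining arguments. In this hypothetical helper, the descriptions are informal paraphrases of the list above.

```python
SORTS = {
    "ZEL": "lvalue EXP", "ZER": "rvalue EXP", "ZEC": "const EXP",
    "ZN": "NAT or INTEGER", "ZS": "statement",
    "ZTO": "TYPE", "ZTI": "VARIETY", "ZTF": "FLOAT", "ZTA": "ARITHMETIC",
    "ZTP": "SCALAR", "ZTS": "STRUCT or CLASS", "ZTU": "UNION",
    "ZTTS": "TAG STRUCT or CLASS", "ZTTU": "TAG UNION",
    "ZM": "member", "ZPG": "general PROC", "ZPS": "PROC with equal lists",
    "ZF": "FUNC", "ZTt": "template template parameter",
    "ZUO": "object-like macro", "ZUF": "function-like macro",
}

def split_sort(code):
    """Split a compact sort into (description, remaining arguments)."""
    for length in (4, 3, 2):          # longest prefix first
        prefix = code[:length]
        if prefix in SORTS:
            return SORTS[prefix], code[len(prefix):]
    raise ValueError("unknown sort %r" % code)
```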
Given an identifier representing a `PROC` token or a template, an application of that token or an instance of that template can be represented using:

    token-application :
        T identifier , token-argument-list :

where the token or template arguments are given by:

    token-argument-list :
        token-argument
        token-argument , token-argument-list

Note that the case where there are no arguments is generally just represented by identifier; this case is specified separately in the rest of the grammar.
A token-argument can represent a value of any of the sorts listed above: expressions, integer constants, statements, types, members, functions and templates. These are given respectively by:

    token-argument :
        E expression
        N nat
        S statement
        T type
        M member
        F identifier
        C identifier

where:

    expression :
        nat

    statement :
        expression

    member :
        identifier
        string

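A tool that generates or matches token applications could assemble them as in this sketch. It takes the identifier number and a list of `(kind, encoded value)` pairs that use the argument letters above. Following the note above, an application without arguments is written as the bare identifier. `encode_application` is a hypothetical name, and the argument values are assumed to be encoded already.

```python
ARG_KINDS = {"E": "expression", "N": "nat", "S": "statement", "T": "type",
             "M": "member", "F": "function", "C": "template"}

def encode_application(ident, args):
    """Encode 'T identifier , token-argument-list :' from (kind, value) pairs."""
    if not args:
        return str(ident)             # no arguments: just the identifier
    parts = []
    for kind, value in args:
        if kind not in ARG_KINDS:
            raise ValueError("unknown token-argument kind %r" % kind)
        parts.append(kind + value)
    return "T%d,%s:" % (ident, ",".join(parts))
```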
Each error in the C++ error catalogue is represented by a number. These numbers happen to correspond to the position of the error within the catalogue, but in general this need not be the case. The first use of each error introduces the error number by associating it with a string giving the error name. This has the form `cpp.error`, where `error` gives an error name from the C++ (`cpp`) error catalogue. Thus:

    error-name :
        number = string
        number

Each error message written to the symbol table dump has the form:

    error-command :
        ES location error-info
        EW location error-info
        EI location error-info
        EF location error-info
        EC error-info
        EA error-argument

denoting constraint errors, warnings, internal errors, fatal errors, continuation errors and error arguments respectively. Note that an error message may consist of several components: the initial error plus a number of continuation errors. Each error message may also have a number of error arguments associated with it. This error information is given by:

    error-info :
        error-name number number

where the first number gives the number of error arguments which should be read, and the second is nonzero to indicate that a continuation error should be read.
Each error argument has one of the forms:

    error-argument :
        B base-number
        C scope-identifier
        E expression
        H identifier-name
        I identifier
        L location
        N nat
        S string
        T type
        V number
        V - number

corresponding to the various syntactic categories described above. Note that a location error argument, while expressed relative to the current file location, does not change this location.
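The counts in error-info tell a reader how many error arguments and continuation errors belong to each message. The following sketch groups already-parsed error commands into messages. It assumes that a component's `EA` arguments immediately follow its error-info and come before any `EC` continuation. The description implies this ordering but does not spell it out. The tuple representation, the error names in the test data and `group_errors` are all hypothetical, and locations are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SEVERITY = {"ES": "constraint error", "EW": "warning",
            "EI": "internal error", "EF": "fatal error"}

@dataclass
class Message:
    severity: str
    parts: List[Tuple[str, list]] = field(default_factory=list)

def _next(it):
    try:
        return next(it)
    except StopIteration:
        raise ValueError("error message truncated") from None

def group_errors(commands):
    """commands: (code, error name, argument count, more) for ES/EW/EI/EF/EC,
    or ('EA', value) for an error argument."""
    it = iter(commands)
    messages = []
    for cmd in it:
        if cmd[0] not in SEVERITY:
            raise ValueError("expected ES, EW, EI or EF, got %r" % (cmd[0],))
        _, name, count, more = cmd
        msg = Message(SEVERITY[cmd[0]])
        while True:
            args = []
            for _ in range(count):            # read the announced arguments
                arg = _next(it)
                if arg[0] != "EA":
                    raise ValueError("expected an error argument")
                args.append(arg[1])
            msg.parts.append((name, args))
            if not more:                      # no continuation error follows
                break
            cont = _next(it)
            if cont[0] != "EC":
                raise ValueError("expected a continuation error")
            _, name, count, more = cont
        messages.append(msg)
    return messages
```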
It is possible to include information on header files within the symbol table dump. Firstly, a number is associated with each directory on the `#include` search path:

    path-command :
        FD number = string string_opt

The first string gives the directory pathname; the second, if present, gives the associated directory name as specified in the `-N` command-line option.
Now the start and end of each file are marked using:

    file-command :
        FS location directory
        FE location

where directory gives the number of the directory in the search path where the file was found, or `*` if the file was found by other means. It is worth noting that if, for example, a function definition is the last item in a file, the `FE` command will appear in the symbol table dump before the `QFE` command for the end of the function definition. This is because lexical analysis, where the end of file is detected, takes place before parsing, where the end of function is detected.
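The `FS` and `FE` commands bracket each file, so the include structure can be rebuilt with a stack, using the `FD` commands to name directories. In this sketch the events are hypothetical tuples. Each carries just the file name, which a real reader would take from the location, and the directory number, with `None` standing for `*`. Other file-related commands are ignored. `include_tree` is a hypothetical name, and the test data is made up.

```python
def include_tree(events):
    """Nest files using FD/FS/FE events given as tuples:
    ('FD', number, pathname), ('FS', filename, dir number or None), ('FE',)."""
    directories = {}
    root = []            # top-level files
    stack = [root]       # 'includes' lists of the files currently open
    for event in events:
        kind = event[0]
        if kind == "FD":
            _, number, pathname = event
            directories[number] = pathname
        elif kind == "FS":
            _, filename, dirnum = event
            node = {"file": filename,
                    "directory": None if dirnum is None else directories[dirnum],
                    "includes": []}
            stack[-1].append(node)
            stack.append(node["includes"])
        elif kind == "FE":
            if len(stack) == 1:
                raise ValueError("FE without a matching FS")
            stack.pop()
    if len(stack) != 1:
        raise ValueError("%d file(s) not closed" % (len(stack) - 1))
    return root
```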
A `#include` directive, whether explicit or implicit, can be represented using:

    include-command :
        FIA location string
        FIQ location string
        FIN location string
        FIS location string
        FIE location string
        FIR location

the first three corresponding to header names of the forms `<....>`, `"...."` and `[....]` respectively, the next two corresponding to start-up and end-up files, and the final form being used to resume the original file after the `#include` directive has been processed.
It is possible to dump information on string literals to the symbol table dump file using the commands:

    string-command :
        A location string
        AC location string
        AL location string
        ACL location string

representing string literals, character literals, wide string literals and wide character literals respectively. The given string gives the string text.
Part of the TenDRA Web.
Crown Copyright © 1998.