CV_SPEC
BUILTIN_TYPE
BASE_TYPE
INT_TYPE
FLOAT_TYPE
CLASS_INFO
CLASS_USAGE
CLASS_TYPE
GRAPH
VIRTUAL
ENUM_TYPE
TYPE
DECL_SPEC
HASHID
QUALIFIER
IDENTIFIER
MEMBER
NAMESPACE
NAT
FLOAT
STRING
NTEST
RMODE
EXP
OFFSET
TOKEN
INSTANCE
ERROR
VARIABLE
LOCATION
POSITION
BITSTREAM
BUFFER
OPTIONS
PPTOKEN
This section describes the type system used in the C++ producer. Unless
otherwise stated the types are declared using the
calculus
tool as
part of the algebra, c_class.alg
. The design of this
type algebra was clearly largely based on the concepts underlying
the C++ language; however TDF provided an important influence, not
merely as the intended target language, but also because of its clear
presentation of essential language features.
The primitive types used within the algebra c_class
are
defined as follows:
int = "int" ; unsigned = "unsigned" ; string = "character *" ; ulong_type (ulong) = "unsigned long" ; BITSTREAM_P (bits) = "BITSTREAM *" ; PPTOKEN_P (pptok) = "PPTOKEN *" ;The integral types are self-explanatory. All string literals used in the C++ producer are based on the character type:
typedef unsigned char character ;hence the definition of
string
. The remaining primitive
give links to those portions of the type system which are defined
outside of the algebra. The types BITSTREAM
and PPTOKEN
are described below.
CV_SPEC
The enumeration type CV_SPEC
(short name cv
)
is used to represent a C++ type qualifier. It takes the form of a
bitfield, the elements of which can be or-ed together to represent
combinations of type qualifiers. The cv-qualifiers are represented
by cv_const
and cv_volatile
in the obvious
manner. The value cv_lvalue
is used as a qualifier to
indicate whether a type is an lvalue or an rvalue. Other values are
used in function types to represent the function language linkage.
BUILTIN_TYPE
The enumeration type BUILTIN_TYPE
(ntype
)
is used to represent the built-in C++ types (char
,
float
, void
etc.). It is used chiefly as
an index into tables of type information.
BASE_TYPE
The enumeration type BASE_TYPE
(btype
) is
used to represent a C++ simple type specifier such as signed
,
short
or int
. It takes the form of a bitfield,
the elements of which can be or-ed together to represent combinations
of type specifiers. Its chief use is when reading a type from the
input file; the various simple type specifiers are combined to give
a value of this type, which is then mapped to an actual C++
type.
INT_TYPE
The union type INT_TYPE
(itype
) is used
to represent an integral or bitfield C++ type. The basic integral
types are given by the basic
field. Bitfield types are
represented by the bitfield
field. There are also fields
representing target dependent integral promotion, arithmetic and integer
literal types, plus VARIETY
tokens. Only one INT_TYPE
object is created for each integral type.
FLOAT_TYPE
The union type FLOAT_TYPE
(ftype
) is used
to represent a floating point C++ type. The basic floating point
types are given by the basic
field. There are also fields
representing target dependent argument promotion and arithmetic types,
plus FLOAT
tokens. Only one FLOAT_TYPE
object is created for each floating point type.
CLASS_INFO
The enumeration type CLASS_INFO
(cinfo
)
is used to represent information relating to a class or enumeration
definition. It takes the form of a bitfield, the elements of which
can be or-ed together to represent various combinations of properties.
CLASS_USAGE
The enumeration type CLASS_USAGE
(cusage
)
is used to represent information relating to the way a class is used.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties.
CLASS_TYPE
The union type CLASS_TYPE
(ctype
) is used
to represent a C++ class or union. The main components are an
identifier giving the class name,
class information and class
usage fields, a namespace giving the class
members, a graph representing the base class
structure, and a virtual function table. Only
one
CLASS_TYPE
object is created for each class or union.
Each class maintains a list, pals
, of class and function
identifiers which are declared as friends of that class. It also
maintains a list, chums
, of those class types which declare
it to be a friend (this is what is actually used in the access checks).
Similarly each function identifier maintains a list,
chums
, of those class types which declare it to be a
friend.
Each class maintains a list of its constructors, destructors and conversion functions (included inherited conversion functions). It also maintains a list of its virtual base classes. This information can be obtained by other means but it is more convenient to record it within the class type itself.
GRAPH
The union type GRAPH
(graph
) is used to
represent a directed acyclic graph arising from the base classes of
a class. Each node of the graph has a head
which is
a
class type, and several tails
which
give the base class graphs for that class. Each node has pointers,
top
, to the top of the graph (i.e. the most derived class),
and up
, to the node of which the current node is a direct
base. Each node also has an access
field which gives
information on the base access, whether it is virtual or not, and
so on, in the form of a DECL_SPEC
.
Virtual bases are handled by the equal
field which defines
an equivalence relation on the graph which identifies equivalent virtual
bases.
VIRTUAL
The union type VIRTUAL
(virt
) is used to
represent the virtual functions declared in a class. The table
field is used to represent a virtual function table, and consists
primarily of a list of VIRTUAL
objects giving the virtual
functions for the associated class. These virtual functions are of
four kinds, each represented by a union field. A virtual function
first declared in a class is represented by the simple
field; a virtual function in a class which overrides an inherited
virtual function is represented by the override
field;
an inherited, non-overridden virtual function which is not overridden
in a base class is represented by the
inherit
field; a inherited, non-overridden virtual function
which is overridden in some base class is represented by the
complex
field.
ENUM_TYPE
The union type ENUM_TYPE
(etype
) is used
to represent a C++ enumeration type. This consists primarily of an
identifier giving the enumeration name, a
class information field, a type
giving the underlying representation of the enumeration type, and
a list of identifiers giving the enumerators comprising
the enumeration.
TYPE
The union type TYPE
(type
) is used to represent
a C++ type. Every type has an associated type qualifier,
qual
, which determines whether the type is
const
, volatile
or an lvalue. A type may
also have an associated identifier, name
,
giving the corresponding type name (the null identifier being used
for unnamed types). The other type components are determined by the
union tag. Each of the type constructs above has a corresponding
field in the TYPE
union:
integer
for integral types,
floating
for floating point types,
bitfield
for bitfield types,
compound
for class or union types,
and
enumerate
for enumeration types.
There are also fields top
and bottom
corresponding to void
and bottom (the type used to represent
values which never return).
Other fields of the TYPE
union represent composite types;
for example, the array
field, representing array types,
comprises a base type, sub
, and an integer
constant giving the array bound, size
. These are
generally simple, apart from func
, representing a function
type. This has the obvious components: a return type, ret
,
a list of parameter types, ptypes
, and a flag indicating
ellipsis functions, ellipsis
. It also has an associated
namespace, pars
, in which the function
parameters are declared. The parameter identifiers are extracted
from this as a list, pids
. Member function qualifiers
and language linkage information are represented by a
CV_QUAL
, mqual
. The implicit
extra parameter for member functions is recorded in the list
mtypes
, which adds this extra type to the start of
ptypes
. Finally except
gives any exception
specifiers; the case where the exception specifier is absent being
represented by the special value, univ_type_set
.
DECL_SPEC
The enumeration type DECL_SPEC
(dspec
) is
used to represent information on the declaration and usage of an identifier.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties. The 32
bits in this bitfield (the maximum which can be represented portably)
are a significant restriction. This means that the same member of
DECL_SPEC
is often used to mean different things in different
contexts. This can prove confusing on occasions.
HASHID
The union type HASHID
(hashid
) is used to
represent a C++ identifier name. The simplest form of identifier
name,
name
, consists of just a string of characters, such as
foo
. Extended identifier names, ename
,
are similar, but may contain Unicode characters. There are however
other forms of identifier name in C++: conversion function names (conv
) such as operator int
, overloaded operator names
(op
) such as operator+
, constructor names
(constr
), and destructor names (destr
).
There are also names which are used for anonymous identifiers (anon
).
Note the distinction between an identifier name and an actual
identifier. The latter is a meaning associated
with a name in a particular context. Every identifier name has an
associated underlying meaning, id
. This is used to handle
keywords and macros, but for most identifier names this will be a
dummy identifier. Nested underlying meanings (such as a macro hiding
a keyword) are handled by linking the alias
fields of
the corresponding identifiers. Every identifier name also has a cache
field which is used to record the look-up of this name as
an unqualified identifier. This may be set to the null identifier
to indicate that the look-up needs to be re-evaluated.
Identifier names are stored in one of a small number of hash tables,
linked using their next
field. Each name has only one
entry in these tables, allowing equality of names to be implemented
as EQ_hashid
.
QUALIFIER
The enumeration type QUALIFIER
(qual
) is
used to represent the various ways in which an identifier name can
be qualified. For example, ::A::a
is represented by
qual_full
. The value qual_mark
is used
in the representation of function identifier expressions to indicate
that overload resolution has been performed.
IDENTIFIER
The union type IDENTIFIER
(id
) is used to
represent the various kinds of C++ identifiers. Every identifier
has an associated identifier name, a parent
namespace, a declaration information
field, and a location for its declaration or definition.
Each identifier also has an
alias
field which is normally used to represent the aliasing
which can occur in inheritance or using
declarations.
The various fields of the IDENTIFIER
union correspond
to the various kinds of identifier which can arise in C++ - class
names, functions, variables, class members, macros, keywords etc.
Each field has appropriate components giving its type, its definition
or whatever other information is required. For example, the variable
field has a type and two expressions,
giving the constructor and destructor values for the object.
Most of these identifier components are self-explanatory, however
the treatment of overloaded functions bears discussion. The various
fields representing functions have an over
component
which is used to link overloaded functions together. A set of overloaded
functions is treated as if it were a single IDENTIFIER
- the first in the list - for the purposes of storing in a namespace
member; the other overloaded meanings are accessed by chasing
down the over
components. In other situations, whether
a function identifier represents a single function or a set of overloaded
functions can be worked out from the context. For example, in identifier
expressions the identifier qualifier is used to
mark whether overload resolution has taken place.
MEMBER
The union type MEMBER
(member
) is used to
represent a member of a namespace. Each member
contains two identifiers, id
and alt
. The
id
field gives the meaning associated with a particular
name in this namespace; the alt
field is used to represent
a type name which may be hidden by a non-type name.
There are two kinds of member, small
and large
,
corresponding to whether the namespace holds its members in a simple
linked list or in a hash table.
NAMESPACE
The union type NAMESPACE
(nspace
) is used
to represent the set of identifiers declared in a particular scope.
For example, the members declared in a C++ class or namespace, the
parameters declared in a function declarator and the local variables
declared in a block all form scopes. The various kinds of scope are
distinguished as different fields of the union, but there are basically
two categories. The first, such as function blocks, which have relatively
small numbers of elements, store their members as a simple linked
lists. The second, such as classes, which have larger numbers of
elements, store their members in hash tables. In both cases the elements
are stored using the MEMBER
type.
The key operation on a namespace is to look up a particular
identifier name in its linked list or hash table
of members to find the meaning, if any, associated with that name
in the namespace. This can be a complex operation because of the
need to take base classes and using
directives (as stored
in the use
component) into account.
NAT
The union type NAT
(nat
) is used to represent
an integer constant expression. Values are represented as lists of
16 bit 'digits'. Values which fit into a single digit are represented
by the small
field; larger values by the large
field. Negated values can be represented by the neg
field. Folding of integer constant expressions is performed in the
producer, however the result can only be represented as described
above if its value is target independent. Target dependent values
are represented by the calc
field which contains an
expression describing how to calculate the value.
The token
field is used to represent NAT
tokens.
Objects representing small integer constants are created at the start of the program and stored in a table for ease of access. Larger constants are created as and when they are required.
FLOAT
The union type FLOAT
(flt
) is used to represent
a floating point constant expression. There is only one field, simple
, which corresponds to a floating point literal. No folding
of floating point constant expressions is attempted in the producer
(it is virtually impossible to do so in a target independent manner).
Objects representing useful floating point constants (0.0, 1.0 etc.)
are created for each floating point type and stored as part of the
corresponding FLOAT_TYPE
. Other
values are created as and when they are required.
STRING
The union type STRING
(str
) is used to represent
a string constant expression. There is only one field,
simple
, which corresponds to a character string literal,
however the kind
field can be used to modify the interpretation
put on the characters appearing in the text
field. By default, each character in text
corresponds
to a single character in the literal; however an alternative representation,
in which text
consists of a sequence of multibyte characters
- one control character plus four value characters - is used in more
complex cases.
All strings are stored in a hash table intended to ensure that the
same STRING
object is used for equal string literals.
This not only saves space during the processing of the input file,
but also facilitates the output of shared string literals in the TDF
capsule.
Note that the terminal zero character does not form part of the
STRING
object. Instead information on this is stored
as part of the type of a string literal expression.
The text of the string literal is either truncated or padded with
zeros until its length matches the size of the array bound in the
type of the corresponding literal expression.
NTEST
The enumeration type NTEST
(ntest
) is used
to represent the various C++ relational operators (==
,
!=
, >
etc.). The values correspond to
the encoding of the TDF NTEST
sort, which facilitates
code generation. The values also have the property that the values
for complementary operators (such as <
and
>=
) always add up to the same value,
ntest_negate
, allowing operators to be complemented in
a straightforward manner.
RMODE
The enumeration type RMODE
(rmode
) is used
to represent the various C++ rounding modes (towards zero, towards
smaller etc.). The values correspond to the encoding of the TDF
RMODE
sort, which facilitates code generation.
EXP
The union type EXP
(exp
) is used to represent
a C++ expression or statement. Each expression has an associated
type, type
, but most of the information
about an expression is stored in one of the large number of fields
of the EXP
union. Most of these fields are fairly simple.
For example, there are fields corresponding to integer
literals, floating point literals,
string literals and identifiers.
Composite expressions are formed in the normal way; for example, there
are various binary operators comprising two argument expressions.
The
EXP
fields corresponding to statements are slightly more
complex. They each have a parent
field which points
to the enclosing statement. A couple of cases bear additional discussion.
The sequence
field represents a compound statement or
block. This contains a namespace, in which
any local variables are declared, and a list of expressions, giving
the statements comprising the block. The null namespace is used if
the block does not constitute a scope. The first statement in the
list is always a dummy to enable first
and last
pointers to be maintained to the start and end of the list without
having to worry about null lists.
The solve_stmt
field corresponds to the
TDF labelled
construct (in early versions of TDF
this construct was called solve
, hence the terminology).
The problem is that C and C++ labels and goto
s are totally
unstructured, whereas the TDF label constructs are structured. Any
statement which contains unstructured labels is enclosed in a
solve_stmt
construct, enclosing both the labelled statement
and all jumps to it (in general this cannot be done until the end
of the function). Any labels or variables which are bypassed by such
unstructured jumps also need to be pulled out to the solve_stmt
construct. It is not just explicit labels which can cause such problems;
complex switch
statements have the same effect.
OFFSET
The union type OFFSET
(off
) is used to represent
an offset expression. This is used as an adjunct to the normal
expression representation. The OFFSET
union has fields corresponding to a type offset (used in pointer arithmetic),
the offset of a member of a class and the offset of a base class.
There are also simple operations on offsets, such as multiplication
by an expression.
TOKEN
The union type TOKEN
(tok
) is used to represent
one of a number of different categories within the C++ language.
It corresponds to the sort of a token declared using the
#pragma token
syntax. Thus
there are fields corresponding to expression, statement, integer constant,
type, function, member and procedure tokens. The similarities between
PROC
tokens and templates have been remarked above; for
example, the parameters of the template:
template < class T, int n > class A { T a [n] ; // .... } ;are essentially equivalent to those in the procedure token:
PROC ( TYPE T, EXP const : int : n ) ....(recall that non-type template arguments are always constant expressions). Thus a field,
templ
, of the TOKEN
union
is used to represent lists of template parameters. Note that a further
field, class
, is also required to represent template
template parameters. A template type is represented
by a field, templ
, of the union TYPE
, which
comprises a template sort and a sub-type expressed in terms of the
template parameters.
In addition to representing token and template sorts in this way,
the
TOKEN
union is used to represent token and template arguments.
Each of the parameter sorts listed above has an appropriate
value
component which can store a value of that sort.
Many of the union types in the algebra, including types
and expressions, have a field of the form:
token -> { IDENTIFIER tok ; LIST TOKEN args ; }representing the given token identifier applied to the given list of arguments.
Template instances are represented slightly differently
from token applications. Each instance of a template class or
a template function gives rise to a new class or function
identifier. This identifier has an underlying form
giving the template identifier and the template arguments. This is
expressed as a token
member of the
TYPE
union (although it is not technically
a type, this happens to be the most convenient representation). Each
such form has an associated
INSTANCE
component which gives further
information about the template instance. The form for a template
function instance is stored in the form
component of
the corresponding identifier. The form for a template
class instance is stored in the form
component of the
corresponding class type.
Members of instances of template classes also have a form type, but
in this case the form is an instance
type. This gives
a link back to the corresponding member of the template class.
INSTANCE
The union type INSTANCE
(inst
) is used to
represent a particular instance of a template or token. Each
template sort has an associated list of all the
instances of that template, which is used to ensure that the same
template applied with the same arguments always has the same value.
Information on partial or explicit specialisations and usage information
are stored as part of the corresponding
INSTANCE
. Each template instance identifier has a link
back to its corresponding INSTANCE
via its
form
component.
ERROR
The union type ERROR
(err
) is used to represent
an error arising during the compilation of a C++ program. Errors are
first class objects within the producer and can be passed to and from
procedures. Each error has an associated severity
(serious, warning, none etc.). Simple errors are represented by the
simple
field, which consists of an index, number
,
into the error catalogue, plus a variable length list of error arguments.
Errors can be combined into composite errors using the
compound
field, which represents the join of two errors
-
head
followed by tail
.
The chief operation on an error after it has been built up is to report it. Each error report consists of an error object and a file location indicating where the error occurred.
VARIABLE
The structure type VARIABLE
(var
) is used
to represent a variable state and is used in the variable analysis
checks.
LOCATION
The structure type LOCATION
(loc
) is used
to represent a location in an input file. It comprises a pointer
to an
input file position, posn
, modified
by a line number, taking #line
directives into account,
line
. Note that character positions within the line
are not currently recorded.
POSITION
The structure type POSITION
(posn
) is used
to represent a position in an input file. It consists of two file
names,
file
taking #line
directives into account,
and
input
giving the actual file name, plus a line number
offset, offset
, which gives the difference between the
line number taking #line
directives into account and
the actual line number. Other information stored includes the datestamp
on the input file, datestamp
, and a pointer to a
file location which, for files included using
#include
, gives the location the file was included from.
BITSTREAM
The structure BITSTREAM
is not part of the
calculus
type system. It is used to represent a sequence
of bits such as is used, for example, in the encoding of TDF.
BUFFER
The structure BUFFER
is not part of the calculus
type system. It is used to represent a sequence of characters.
OPTIONS
The structure OPTIONS
is not part of the calculus
type system. It is used to represent the state of the
compiler options at a particular point
in the input file.
PPTOKEN
The structure PPTOKEN
is not part of the calculus
type system. It is used to represent a linked list of preprocessing
tokens. Each token has an associated sid
lexical token
number, tok
, plus additional data dependent on the token
type. Each token also records a pointer to the current
OPTIONS
value.
Part of the TenDRA Web.
Crown
Copyright © 1998.