The parser used in the C++ producer is generated using the sid tool. Because
of the large size of the generated code (1.3MB), the sid output is run
through a simple program, sidsplit, which splits the output into a number
of more manageable modules. It also transforms the code to use the PROTO
macros used in the rest of the program.
sid is designed as a parser generator for grammars which can be
transformed into LL(1) grammars. The distinguishing feature of these
grammars is that the parser can always decide what to do next based
on the current terminal. This is not the case in C++; in some circumstances
a potentially unlimited look-ahead is required to distinguish, for
example, declaration statements from expression statements. In the
technical phrase, C++ is an LL(k) grammar, with no fixed bound on k.
Fortunately there are relatively few such situations, and sid provides
a mechanism, predicates, for bypassing the normal parsing mechanism in
these cases. Thus it is possible, although difficult, to express C++ as
a sid grammar.
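The difference between a one-terminal decision and a predicate can be sketched in C++ as follows. This is only an illustration under simplifying assumptions, not the producer's code or sid's generated output: the token representation, the names decide_ll1 and looks_like_declaration, and the toy disambiguation rule are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token stream: each token is just its spelling.
using Tokens = std::vector<std::string>;

enum class Stmt { Return, Block, NeedsPredicate };

// LL(1)-style decision: choose a rule from the current terminal alone.
Stmt decide_ll1(const Tokens& toks, std::size_t pos) {
    if (toks[pos] == "return") return Stmt::Return;
    if (toks[pos] == "{") return Stmt::Block;
    return Stmt::NeedsPredicate;  // one terminal is not enough here
}

// Predicate-style decision: scan ahead without consuming any tokens.
// Toy rule for input of the form  T ( ... ) rest  where T is assumed to
// be a type name: skip the balanced parentheses, then treat the statement
// as a declaration only if the next token is ';' or '='.
bool looks_like_declaration(const Tokens& toks, std::size_t pos) {
    std::size_t i = pos + 1;  // step over the leading type name
    if (i >= toks.size() || toks[i] != "(") return false;
    int depth = 0;
    for (; i < toks.size(); ++i) {
        if (toks[i] == "(") {
            ++depth;
        } else if (toks[i] == ")" && --depth == 0) {
            ++i;  // move past the closing parenthesis
            break;
        }
    }
    return i < toks.size() && (toks[i] == ";" || toks[i] == "=");
}
```

With this rule the tokens of T ( a ) ; are classified as a declaration, while T ( a ) -> m = 7 ; is not. The scan has to run past the whole parenthesised group, whose length is not bounded in advance, which is why no fixed amount of look-ahead is sufficient.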
The sid grammar file, syntax.sid, is closely
based on the ISO C++ grammar. In particular, the same production
names have been used. The grammar has been extended slightly to allow
common syntactic errors to be detected elegantly. Other parsing errors
are handled by sid's exception mechanism. At present
there is only limited recovery after such errors.
The lexical analysis routines in the C++ producer are hand-crafted,
based on an initial version generated by the simple lexical analyser
generator, lexi. lexi has been used more directly
to generate the lexical analysers for certain of the other automatic
code generating tools, including calculus, used in the
producer.
The sid grammar contains a number of entry points. The
most important is parse_file, which is used to parse
a complete C++ translation unit. The syntax for the
#pragma TenDRA directives is
included within the same grammar with two entry points,
parse_tendra in normal use, and parse_preproc
for use in preprocessing mode. There are also entry points in the
grammar for each of the kinds of token argument.
The parsing routines for token and template arguments are largely
hand-crafted, based on these primitives.
Certain parsing operations are performed before control passes to
the sid grammar. As mentioned above, these include the processing
of token and template applications. The other important case concerns
nested name specifiers. For example, in:

    class A {
        class B {
            static int c ;
        } ;
    } ;

    int A::B::c = 0 ;

the qualified identifier A::B::c is split into two terminals,
a nested name specifier, A::B::, and an identifier, c,
which is looked up in the corresponding namespace. Note that it is
at this stage that name look-up occurs. An identifier can be mapped
to one of a number of terminals, including keywords, type names,
namespace names and other identifiers, according to the result of
this look-up. If the look-up gives a macro then this is expanded
at this stage.
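The two steps described above, splitting a qualified identifier and mapping the final identifier to a terminal by look-up, can be sketched as below. This is a minimal illustration under stated assumptions, not the producer's implementation: the Terminal kinds, the flat SymbolTable keyed by nested name specifier, and the functions split_qualified and classify are hypothetical, and macro expansion is not modelled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical terminal kinds; the real set of terminals is larger.
enum class Terminal { Keyword, TypeName, NamespaceName, Identifier };

// Split "A::B::c" into the nested name specifier "A::B::" and the
// identifier "c". An unqualified name yields an empty specifier.
std::pair<std::string, std::string> split_qualified(const std::string& id) {
    std::size_t cut = id.rfind("::");
    if (cut == std::string::npos) return {"", id};
    return {id.substr(0, cut + 2), id.substr(cut + 2)};
}

// Stand-in for name look-up: each scope, keyed by its nested name
// specifier, maps the names it declares to a terminal kind.
using Scope = std::map<std::string, Terminal>;
using SymbolTable = std::map<std::string, Scope>;

// Choose the terminal for a name from the result of the look-up.
// Names that are not found fall back to plain identifiers.
Terminal classify(const SymbolTable& table, const std::string& scope,
                  const std::string& name) {
    auto s = table.find(scope);
    if (s == table.end()) return Terminal::Identifier;
    auto n = s->second.find(name);
    return n == s->second.end() ? Terminal::Identifier : n->second;
}
```

In this sketch the same spelling can produce different terminals depending on where it is looked up: B found in the scope A:: is reported as a type name, whereas an undeclared B elsewhere is an ordinary identifier.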
Part of the TenDRA Web. Crown Copyright © 1998.