tspec - An API Specification Tool

January 1998

1 - Introduction

2 - Overview of tspec

2.1 - Specification Levels
2.2 - Input Layout
2.3 - Output Layout
2.4 - Copyright Messages
2.5 - Command-line Options

3 - Specifying API Structure

3.1 - +SUBSET
3.2 - +IMPLEMENT and +USE

4 - Specifying Objects

4.1 - Object Names
4.2 - +FUNC
4.3 - +EXP and +CONST
4.4 - +MACRO
4.5 - +STATEMENT
4.6 - +DEFINE
4.7 - +TYPE
4.8 - +TYPEDEF
4.9 - +FIELD
4.10 - +NAT
4.11 - +ENUM
4.12 - +TOKEN

5 - Other tspec Constructs

5.1 - +IF, +ELSE and +ENDIF
5.2 - Quoted Text
5.3 - C Comments
5.4 - File Properties

6 - Miscellaneous Topics

6.1 - Fine Control of Included Files
6.2 - Protection Macros
6.3 - Index Printing
6.4 - TDF Library Building

7 - Changes in tspec 2.0

8 - References

1. Introduction

As explained in reference 1, TDF may be regarded as an abstract target machine which can be used to facilitate the separation of target independent and target dependent code which characterises portable programs. An important aspect of this separation is the Application Programming Interface, or API, of the program. Just as, for a conventional machine, the API needs to be implemented on that machine before the program can be ported to it, so for that program to be ported to the abstract TDF machine, an "abstract implementation" of the API needs to be provided.

But of course, an "abstract implementation" is precisely what is provided by the API specification - it is an abstraction of all the possible API implementations. Therefore the TDF representation of an API must reflect the API specification. As a consequence, compiling a program to the abstract TDF machine is to check it against the API specification rather than, as with compiling to a conventional machine, against at best a particular implementation of that API.

In this document we address the problem of how to translate a standard API specification into its TDF representation, by describing a tool, tspec, which has been developed for this purpose.

The low level form which is used to represent APIs to the C to TDF producer is the #pragma token syntax described in reference 3. However this is not a suitable form in which to describe API specifications. The #pragma token syntax is necessarily complex, and can only be checked through extensive testing using the producer. Instead an alternative form, close to C, has been developed for this purpose. API specifications in this form are transformed by tspec into the corresponding #pragma token statements, while it applies various internal checks to the API description.

Another reason for introducing tspec is that the #pragma token syntax is currently limited in some areas. For example, at present it has very limited support for expressing constancy of expressions. By allowing the tspec syntax to express this information, the API description will contain all the information which may be needed in future upgrades to the #pragma token syntax. Thus describing an API using tspec is hopefully a one-off process, whereas describing it directly in the #pragma token syntax could require periodic reworkings. Improvements in the #pragma token syntax will be reflected in the translations produced by future versions of tspec.

The tspec syntax is not designed to be a formal specification language. Instead it is a pragmatic attempt to capture the common specification idioms of standard API specifications. A glance at these specifications shows that they are predominantly C based, but with an added layer of abstraction - instead of saying that t is a specific C type, they say, there exists a type t, and so on. The tspec syntax is designed to reflect this.

2. Overview of tspec

2.1. Specification Levels

Let us begin by examining the various levels of specification with which tspec is concerned. At the lowest level it is concerned with objects - the types, expressions, constants etc. which comprise the API - and indeed most of this document is concerned with how tspec describes these objects. At the highest level, tspec is concerned with APIs. We could describe an API simply as a set of objects, but this would ignore the internal structure of APIs.

At the most obvious level the objects in an API are spread over a number of different system headers. For example, in ANSI, the objects concerned with file input and output are grouped in stdio.h, whereas those concerned with string manipulation are in string.h. But a further level of refinement is also required. For example, ANSI specifies that the type size_t is defined in both stdio.h and string.h. Therefore tspec needs to be able to represent subsets of headers in order to express this intersection relation.

To conclude, tspec distinguishes four levels of specification - APIs (which are sets of headers), headers (which are sets of objects), subsets of headers, and objects. It identifies each API by a name chosen by the person performing the API description. The (purely arbitrary) convention is for short, lower case names, for example:

ansi refers to ANSI C (X3.159),
posix refers to POSIX 1003.1,
xpg3 refers to X/Open Portability Guide 3.

In this document, headers are identified by the API they belong to and the header name. Thus ansi:stdio.h refers to the stdio.h header of the ANSI API. Finally subsets of headers are identified by the header and the subset name. If, for example, the stdio.h header of ANSI has a subset named file, then this is referred to as ansi:stdio.h:file.

2.2. Input Layout

The tspec representation of an API is arranged as a directory with the same name as the API, containing a number of files, one for each API header. For example, the ANSI API is represented by a directory ansi containing files ansi/stdio.h, ansi/string.h etc. In addition each API directory contains a master file (for ANSI it would be called ansi/MASTER) which lists all the headers comprising that API.

When tspec needs to find an API directory it does so by searching along its input directory path. This is a colon separated list of directories to be searched. This may be specified in a number of ways. A default search list is built into tspec, however this may be overridden by the system variable TSPEC_INPUT. Directories may be added to the start of the path using the -Idir command-line option (see section 2.5 for a complete list of options). The current working directory is always added to the start of the path.
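
The way the search path is assembled can be sketched as follows. This is an illustrative Python model rather than tspec's own code: the function names and the built-in default list are hypothetical, "system variable" is taken to mean an environment variable, and the relative order of several -Idir options is an assumption.

```python
import os

# Hypothetical stand-in for the default search list built into tspec.
DEFAULT_INPUT_PATH = "/usr/local/lib/tspec"

def input_search_path(cwd, i_dirs, environ):
    """Return the list of directories searched for an API directory."""
    # TSPEC_INPUT, when set, overrides the built-in default list.
    base = environ.get("TSPEC_INPUT", DEFAULT_INPUT_PATH)
    path = [d for d in base.split(":") if d]
    # -Idir options are added to the start of the path (keeping their
    # command-line order among themselves is an assumption).
    path = list(i_dirs) + path
    # The current working directory always comes first.
    return [cwd] + path

def find_api(api, search_path, is_dir=os.path.isdir):
    """Return the first directory on the path containing api, or None."""
    for d in search_path:
        candidate = os.path.join(d, api)
        if is_dir(candidate):
            return candidate
    return None
```

The is_dir parameter exists only so that the sketch can be exercised without touching the file system.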

2.3. Output Layout

tspec actually outputs two sets of output files, the include output files, containing the #pragma token directives corresponding to the input API, and the source output files, which provide a rig for TDF library building (see section 6.4). These output files and directories are built up under two standard output directories - the include output directory, incl_dir say, and the source output directory, src_dir say. tspec has default values for these directories built in, but these may be overridden in a number of ways. Firstly, if the system variable TSPEC_OUTPUT is defined to be dir, say, then incl_dir is dir/include and src_dir is dir/src. Secondly, incl_dir and src_dir can be set independently using the system variables TSPEC_INCL_OUTPUT and TSPEC_SRC_OUTPUT respectively. Finally, they may also be set using the -Odir and -Sdir command-line options respectively.
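
The rules for choosing the two output directories can be modelled as below. The sketch is in Python and is not taken from tspec: the default directory values are hypothetical, and the precedence shown (command-line options over the specific variables, which in turn override TSPEC_OUTPUT) is an assumption read into the "firstly, secondly, finally" ordering above.

```python
# Hypothetical built-in defaults; the real values are fixed when tspec is built.
DEFAULT_INCL_DIR = "/usr/local/lib/tspec/include"
DEFAULT_SRC_DIR = "/usr/local/lib/tspec/src"

def output_dirs(environ, opt_O=None, opt_S=None):
    """Return (incl_dir, src_dir) for the given variables and options."""
    incl_dir, src_dir = DEFAULT_INCL_DIR, DEFAULT_SRC_DIR
    # TSPEC_OUTPUT sets both directories at once.
    if "TSPEC_OUTPUT" in environ:
        incl_dir = environ["TSPEC_OUTPUT"] + "/include"
        src_dir = environ["TSPEC_OUTPUT"] + "/src"
    # The two specific variables set each directory independently.
    incl_dir = environ.get("TSPEC_INCL_OUTPUT", incl_dir)
    src_dir = environ.get("TSPEC_SRC_OUTPUT", src_dir)
    # -Odir and -Sdir are assumed to win over any variable.
    if opt_O is not None:
        incl_dir = opt_O
    if opt_S is not None:
        src_dir = opt_S
    return incl_dir, src_dir
```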

As an example of the mapping from input files to output files, the header ansi:stdio.h is mapped to the include output file incl_dir/ansi.api/stdio.h and the source output file src_dir/ansi.api/stdio.c. The header subset ansi:stdio.h:file is mapped to its own pair of output files, incl_dir/shared/ansi.api/file.h and src_dir/ansi.api/file.c.
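
The two mappings just described can be reproduced by a small Python sketch. It covers only the two cases given in the text; the function name is hypothetical, and headers in subdirectories and the INCLNAME and SOURCENAME properties are deliberately not modelled.

```python
def output_files(name, incl_dir="incl_dir", src_dir="src_dir"):
    """Map 'api:header' or 'api:header:subset' to its pair of output files."""
    parts = name.split(":")
    if len(parts) == 2:
        api, header = parts
        # The source file takes the header name with .h replaced by .c.
        stem = header[:-2] if header.endswith(".h") else header
        return (f"{incl_dir}/{api}.api/{header}",
                f"{src_dir}/{api}.api/{stem}.c")
    if len(parts) == 3:
        api, _header, subset = parts
        # Subsets get their own files, named after the subset.
        return (f"{incl_dir}/shared/{api}.api/{subset}.h",
                f"{src_dir}/{api}.api/{subset}.c")
    raise ValueError("expected api:header or api:header:subset")
```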

The default output file names can be overridden by means of the INCLNAME and SOURCENAME file properties described in section 5.4.

By default, tspec only creates an output file if the date stamps on all the input files it depends on indicate that it needs updating. In effect, tspec creates an internal makefile from the dependencies it deduces. This behaviour can be overridden by means of the -f command-line option, which forces all output files to be created.
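
The make-like decision can be sketched as a single predicate. This is a minimal Python model, assuming that "needs updating" means the output file is missing or strictly older than at least one of its inputs; the function name and the use of numeric time stamps are illustrative choices.

```python
def needs_update(output_mtime, input_mtimes, force=False):
    """Decide whether an output file should be (re)created.

    output_mtime is None when the output file does not exist yet;
    force models the -f command-line option.
    """
    if force or output_mtime is None:
        return True
    # Assumption: an input newer than the output makes the output stale.
    return any(t > output_mtime for t in input_mtimes)
```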

In addition, tspec only creates the source output file if it is needed for TDF library building. If the corresponding include output file does not contain any token specifications then the source output file is suppressed (see section 6.4).

2.4. Copyright Messages

tspec will optionally add a copyright message to the start of each include output file. This message is copied from a file which may be specified either using the TSPEC_COPYRIGHT system variable, or by the -Cfile command-line option.

2.5. Command-line Options

There are three main forms for invoking tspec on the command-line, depending on whether it is desired to process an entire API, a single header from that API, or only a subset of that header. These are given respectively as:

	tspec [options] api
	tspec [options] api header
	tspec [options] api header subset

The valid options include:

The option -Cfile specifies the copyright message file (see section 2.4).
The option -Idir adds a directory to the input directory search path (see section 2.2).
The option -Odir specifies the include output directory (see section 2.3).
The option -Sdir specifies the source output directory (see section 2.3).
The -c option causes tspec to only check the input files and not to generate any output files.
The -e option causes tspec only to run its preprocessor phase, writing the result to the standard output.
The -f option forces tspec to create all output files regardless of date stamps.
The -i option causes tspec to print an index of all the objects in the input files (see section 6.3).
The -p option indicates to tspec that its input has already been preprocessed (i.e. it is the output of a previous -e option).
The -r option causes tspec to only produce output for implemented objects, and not used objects (see section 3.2).
The -s option causes tspec to check all the headers in an API separately rather than, as with the -c option, all at once.
The -u option causes tspec to generate unique token names for the specified objects (see section 4.1.1).
The -v option causes tspec to enter verbose mode, in which it reports on the output files it creates. If two -v options are given then tspec enters very verbose mode, in which it gives more information on its activities.
The -V option causes tspec to print its current version number (this document refers to version 2.0).

In addition tspec has a local input mode for translating single headers which are not part of an API into the corresponding #pragma token statements. The form:

	tspec [options] -l file

processes the input file file, writing the include output file to the standard output.

3. Specifying API Structure

The basic form of the tspec description of an API has already been explained in section 2.2 - it is a directory containing a set of files corresponding to the headers in that API. Each file basically consists of a list of the objects declared in that header. Each object specification is part of a tspec construct. These constructs are identified by keywords. These keywords always begin with + to avoid conflict with C identifiers. Comments may be inserted at any point. These are prefixed by # and run to the end of the line.

In addition to the basic object specification constructs, tspec also has constructs for imposing structure on the API description. It is these constructs that we consider first.

3.1. +SUBSET

A list of tspec constructs within a header can be grouped into a named subset by enclosing them within:

	+SUBSET "name" := {
	    ....
	} ;

where name is the subset name. These named subsets can be nested, but are still regarded as subsets of the parent header.

Subsets are intended to give a layer of resolution beyond that of the entire header (see section 2.1). Each subset is mapped onto a separate pair of output files, so unwary use of subsets is discouraged.

3.2. +IMPLEMENT and +USE

tspec has two import constructs which allow one API, or header, or subset of a header to be included in another. The first construct is used to indicate that the given set of objects is also declared in the including header, and takes one of the forms:

	+IMPLEMENT "api" ;
	+IMPLEMENT "api", "header" ;
	+IMPLEMENT "api", "header", "subset" ;

The second construct is used to indicate that the objects are only used in the including header, and takes one of the forms:

	+USE "api" ;
	+USE "api", "header" ;
	+USE "api", "header", "subset" ;

For example, posix:stdio.h is an extension of ansi:stdio.h, so, rather than duplicate all the object specifications from the latter in the former, it is easier and clearer to use the construct:

	+IMPLEMENT "ansi", "stdio.h" ;

and just add the extra objects specified by POSIX. Note that this makes the relationship between the APIs ansi and posix absolutely explicit. tspec is as much concerned with the relationships between APIs as their actual contents.

Objects which are specified as being declared in more than one header of an API should also be treated using +IMPLEMENT. For example, the type size_t is declared in a number of ansi headers, namely stddef.h, stdio.h, string.h and time.h. This can be handled by declaring size_t as part of a named subset of, say, ansi:stddef.h:

	+SUBSET "size_t" := {
	    +TYPE (unsigned) size_t ;
	} ;

and including this in each of the other headers:

	+IMPLEMENT "ansi", "stddef.h", "size_t" ;

Another use of +IMPLEMENT is in the MASTER file used to list the headers in an API (see section 2.2). This basically consists of a list of +IMPLEMENT commands, one per header. For example, with ansi it consists of:

	+IMPLEMENT "ansi", "assert.h" ;
	+IMPLEMENT "ansi", "ctype.h" ;
	....
	+IMPLEMENT "ansi", "time.h" ;

To illustrate +USE, posix:sys/stat.h uses some types from posix:sys/types.h but does not define them. To avoid the user having to include both headers it makes sense for the description to include the latter in the former (provided there are no namespace restrictions imposed by the API). This would be done using the construct:

	+USE "posix", "sys/types.h" ;

On the command-line tspec is given one set of objects, be it an API, a header, or a subset of a header. This causes it to read that set, which may contain +IMPLEMENT or +USE commands. It then reads the sets indicated by these commands, which again may contain +IMPLEMENT or +USE commands, and so on. It is possible for this process to lead to infinite cycles, but in this case tspec raises an error and aborts. In the legal case, the collection of sets read by tspec is the closure of the set given on the command-line under +IMPLEMENT and +USE. Some of these sets will be implemented - that is to say, connected to the top level by a chain of +IMPLEMENT commands - others will merely be used. By default tspec produces output for all these sets, but specifying the -r command-line option restricts it to the implemented sets.
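
The closure computation and the distinction between implemented and used sets can be illustrated by the following Python sketch. It is a model of the behaviour described above, not tspec's algorithm: the representation of the import commands as a dictionary, the names closure and CycleError, and the treatment of the top-level set as implemented are all assumptions.

```python
class CycleError(Exception):
    """Raised when the +IMPLEMENT/+USE commands form a cycle."""

def closure(imports, top):
    """Return (implemented, used) for the sets reachable from top.

    imports maps a set name to a list of (command, target) pairs, where
    command is "IMPLEMENT" or "USE".
    """
    implemented, used = set(), set()
    on_path = []  # sets currently being read, for cycle detection

    def visit(name, via_implement):
        if name in on_path:
            raise CycleError(" -> ".join(on_path + [name]))
        if name in implemented or (name in used and not via_implement):
            return  # already read with at least this status
        if via_implement:
            used.discard(name)
            implemented.add(name)
        else:
            used.add(name)
        on_path.append(name)
        for command, target in imports.get(name, []):
            # A set stays implemented only along a pure +IMPLEMENT chain.
            visit(target, via_implement and command == "IMPLEMENT")
        on_path.pop()

    visit(top, True)
    return implemented, used
```

With the -r option only the first of the two returned collections would give rise to output. A set reached from two different headers, such as a shared size_t subset, is not a cycle and is simply read once.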

For further information on the +IMPLEMENT and +USE commands see section 6.1.

4. Specifying Objects

The main body of any tspec description of an API consists of a list of object specifications. Most of this section is concerned with the various tspec constructs for specifying objects of various kinds, however we start with a few remarks on object names.

4.1. Object Names

4.1.1. Internal and External Names

All objects specified using tspec actually have two names. The first is the internal name by which it is identified within the program, the second is the external name by which the TDF construct (actually a token) representing this object is referred to for the purposes of TDF linking. The internal names are normal C identifiers and obey the normal C namespace rules (indeed one of the roles of tspec is to keep track of these namespaces). The external token name is constructed by tspec from the internal name.

tspec has two strategies for making up these token names. The first, which is default, is to use the internal name as the external name (there is an exception to this simple rule, namely field selectors - see section 4.9). The second, which is preferred for standard APIs, is to construct a "unique name" from the API name, the header and the internal name. For example, under the first strategy, the external name of the type FILE specified in ansi:stdio.h would be FILE, whereas under the second it would be ansi.stdio.FILE. The unique name strategy may be specified by passing the -u command-line option to tspec (see section 2.5) or by setting the UNIQUE property to 1 (see section 5.4).

Both strategies involve flattening the several C namespaces into the single TDF token namespace, which can lead to clashes. For example, in posix:sys/stat.h both a structure, struct stat, and a procedure, stat, are specified. In C the two uses of stat are in different namespaces and so present no difficulty, however they are mapped onto the same name in the TDF token namespace. To work round such difficulties, tspec allows an alternative external form to be specified. When the object is specified the form:

	iname | ename

may be used to specify the internal name iname and the external name ename.

For example, in the stat case above we could distinguish between the two uses as follows:

	+TYPE struct stat | struct_stat ;
	+FUNC int stat ( const char *, struct stat * ) ;

With simple token names the token corresponding to the structure would be called struct_stat, whereas that corresponding to the procedure would still be stat. With unique token names the names would be posix.stat.struct_stat and posix.stat.stat respectively.

Very occasionally it may be necessary to precisely specify an external token name. This can be done using the form:

	iname | "ename"

which makes the object iname have external name ename regardless of the naming strategy used.
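
The two naming strategies and the two override forms can be summarised in a short Python sketch. It reproduces the examples given in this section; the function name and its parameters are hypothetical, the rule for deriving stat from sys/stat.h is inferred from those examples, and field selectors and tags without an explicit external name are not modelled.

```python
import posixpath

def token_name(api, header, iname, ename=None, exact=False, unique=False):
    """Return the external token name for an object.

    ename is the alternative external form from 'iname | ename';
    exact=True models the quoted form 'iname | "ename"'.
    """
    external = ename if ename is not None else iname
    if exact or not unique:
        return external
    # Header stem: last path component without its ".h" suffix (inferred
    # from the ansi.stdio.FILE and posix.stat.stat examples).
    stem = posixpath.basename(header)
    if stem.endswith(".h"):
        stem = stem[:-2]
    return f"{api}.{stem}.{external}"
```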

4.1.2. More on Object Names

Basically the legal identifiers in tspec (for both internal and external names) are the same as those in C - strings of upper and lower case letters, decimal digits or underscores, which do not begin with a decimal digit. However there is a second class of local identifiers - those consisting of a tilde followed by any number of letters, digits or underscores - which are intended to indicate objects which are local to the API description and should not be visible to any application using the API. For example, to express the specification that t is a pointer type, we could say that there is a locally named type to which t is a pointer:

	+TYPE ~t ;
	+TYPEDEF ~t *t ;

Finally it is possible to cheat the tspec namespaces. It may actually be legal to have two objects of the same name in an API - they may lie in different branches of a conditional compilation, or not be allowed to coexist. To allow for this, tspec allows version numbers, consisting of a decimal point plus a number of digits, to be appended to an identifier name when it is first introduced. These version numbers are purely to tell tspec that this version of the object is different from a previous version with a different version number (or indeed without any version number). If more than one version of an object is specified then which version is retrieved by tspec in any look-up operation is undefined.
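
The three kinds of identifier described in this section can be told apart by a single regular expression. The Python sketch below is only an approximation of the description above: the exact lexical rules used by tspec, in particular for the version suffix, are an assumption, and parse_identifier is a hypothetical name.

```python
import re

# base: either a local identifier (tilde first) or an ordinary C identifier;
# version: an optional decimal point followed by digits.
_IDENT = re.compile(
    r"^(?P<base>~[A-Za-z0-9_]*|[A-Za-z_][A-Za-z0-9_]*)"
    r"(?:\.(?P<version>[0-9]+))?$"
)

def parse_identifier(text):
    """Split an identifier into (base name, is_local, version or None)."""
    m = _IDENT.match(text)
    if m is None:
        raise ValueError(f"not a legal identifier: {text!r}")
    base = m.group("base")
    return base, base.startswith("~"), m.group("version")
```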

4.2. +FUNC

The simplest form of object to specify is a procedure. This is done by means of:

	+FUNC prototype ;

where prototype is the full C prototype of the procedure being declared. For example, ansi:string.h contains:

	+FUNC char *strcpy ( char *, const char * ) ;
	+FUNC int strcmp ( const char *, const char * ) ;
	+FUNC size_t strlen ( const char * ) ;

Strictly speaking, +FUNC means that the procedure may be implemented by a macro, but that there is an underlying library function with the same effect. The exception is for procedures which take a variable number of arguments, such as:

	+FUNC int fprintf ( FILE *, const char *, ... ) ;

which cannot be implemented by macros. Occasionally it may be necessary to specify that a procedure is only a library function, and cannot be implemented by a macro. In this case the form:

	+FUNC (extern) prototype ;

should be used. Thus:

	+FUNC (extern) char *strcpy ( char *, const char * ) ;

would mean that strcpy was only a library function and not a macro.

Increasingly, standard APIs are using prototypes to express their procedures. However, it may still be necessary on occasion to specify procedures declared using old style declarations. In most cases these can be easily transcribed into prototype declarations, but things are not always that simple. For example, xpg3:stdlib.h declares malloc by the old style declaration:

	void *malloc ( sz )
	size_t sz ;

which is in general different from the prototype:

	void *malloc ( size_t ) ;

In the first case the argument is passed as the integral promotion of size_t, whereas in the second it is passed as a size_t. In general we only know that size_t is an unsigned integral type, so we cannot assert that it is its own integral promotion. One possible solution would be to use the C to TDF producer's weak prototypes (see reference 3). The form:

	+FUNC (weak) void *malloc ( size_t ) ;

means that malloc is a library function returning void * which is declared using an old style declaration with a single argument of type size_t. (For an alternative approach see section 4.8.)

4.3. +EXP and +CONST

Expressions correspond to constants, identities and variables. They are specified by:

	+EXP type exp1, ..., expn ;

where type is the base type of the expressions expi as in a normal C declaration list. For example, in ansi:stdio.h:

	+EXP FILE *stdin, *stdout, *stderr ;

specifies three expressions of type FILE *.

By default all expressions are rvalues, that is, values which cannot be assigned to. If an lvalue (assignable) expression is required its type should be qualified using the keyword lvalue. This is an extension to the C type syntax which is used in a similar fashion to const. For example, ansi:errno.h says that errno is an assignable lvalue of type int. This is expressed as follows:

	+EXP lvalue int errno ;

On the other hand, posix:errno.h states that errno is an external value of type int. As with procedures the (extern) qualifier may be used to express this as:

	+EXP (extern) int errno ;

Note that this automatically means that errno is an lvalue, so the lvalue qualifier is optional in this case.

If all the expressions are guaranteed to be literal constants then one of the equivalent forms:

	+EXP (const) type exp1, ..., expn ;
	+CONST type exp1, ..., expn ;

should be used. For example, in ansi:errno.h we have:

	+CONST int EDOM, ERANGE ;

4.4. +MACRO

The +MACRO construct is similar in form to the +FUNC construct, except that it means that only a macro exists, and no underlying library function. For example, in xpg3:ctype.h we have:

	+MACRO int _toupper ( int ) ;
	+MACRO int _tolower ( int ) ;

since these are explicitly stated to be macros and not functions. Of course the (extern) qualifier cannot be used with +MACRO.

One thing which macros can do which functions cannot is to return assignable values or to assign to their arguments. Thus it is legitimate for +MACRO constructs to have their return type or argument types qualified by lvalue, whereas this is not allowed for +FUNC constructs. For example, in svid3:curses.h, a macro getyx is specified which takes a pointer to a window and two integer variables and assigns the cursor position of the window to those variables. This may be expressed by:

	+MACRO void getyx ( WINDOW *win, lvalue int y, lvalue int x ) ;

4.5. +STATEMENT

The +STATEMENT construct is very similar to the +MACRO construct except that, instead of being a C expression, it is a C statement (i.e. something ending in a semicolon). As such it does not have a return type and so takes one of the forms:

	+STATEMENT stmt ;
	+STATEMENT stmt ( arg1, ..., argn ) ;

depending on whether or not it takes any arguments. (A +MACRO without any arguments is an +EXP, so the no argument form does not exist for +MACRO.) As with +MACRO, the argument types argi can be qualified using lvalue.

4.6. +DEFINE

It is possible to insert macro definitions directly into tspec using the +DEFINE construct. This has two forms depending on whether the macro has arguments:

	+DEFINE name %% text %% ;
	+DEFINE name ( arg1, ..., argn ) %% text %% ;

These translate directly into:

	#define name text
	#define name( arg1, ..., argn ) text

The macro definition, text, consists of any string of characters delimited by double percents. If text is a simple number or a single identifier then the double percents may be omitted. Thus in ansi:stddef.h we have:

	+DEFINE NULL 0 ;
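
The correspondence between the two notations, including the rule for omitting the double percents, can be sketched in Python as follows. The function name is hypothetical, and "a simple number" is assumed here to mean a string of decimal digits.

```python
import re

# Text for which the double percents may be omitted: a simple number
# (assumed to be decimal digits) or a single identifier.
_SIMPLE = re.compile(r"^(?:[0-9]+|[A-Za-z_][A-Za-z0-9_]*)$")

def define_forms(name, text, args=None):
    """Return (tspec construct, C translation) for a macro definition."""
    params = "" if args is None else "( " + ", ".join(args) + " )"
    body = text if _SIMPLE.match(text) else f"%% {text} %%"
    head = f"+DEFINE {name} {params} " if params else f"+DEFINE {name} "
    return f"{head}{body} ;", f"#define {name}{params} {text}"
```

For NULL this gives back the construct shown above together with its translation, #define NULL 0.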

4.7. +TYPE

New types may be specified using the +TYPE construct. This has the form:

	+TYPE type1, ..., typen ;

where each typei has one of the forms:

name for a general type (about which we know nothing more),
(struct) name for a structure type,
(union) name for a union type,
struct name for a structure tag,
union name for a union tag,
(int) name for an integral type,
(signed) name for a signed integral type,
(unsigned) name for an unsigned integral type,
(float) name for a floating type,
(arith) name for an arithmetic (integral or floating) type,
(scalar) name for a scalar (arithmetic or pointer) type.

To make clear the distinction between structure types and structure tags, if we have in C:

	typedef struct tag { int x, y ; } type ;

then type is a structure type and tag is a structure tag.

For example, in ansi we have:

	+TYPE FILE ;
	+TYPE struct lconv ;
	+TYPE (struct) div_t ;
	+TYPE (signed) ptrdiff_t ;
	+TYPE (unsigned) size_t ;
	+TYPE (arith) time_t ;
	+TYPE (int) wchar_t ;

4.8. +TYPEDEF

It is also possible to define new types in terms of existing types. This is done using the +TYPEDEF construct, which is identical in form to the C typedef construct. This construct can be used to define pointer, procedure and array types, but not compound structure and union types. For these see section 4.9 below.

For example, in xpg3:search.h we have:

	+TYPE struct entry ;
	+TYPEDEF struct entry ENTRY ;

There are a couple of special forms. To understand the first, note that C uses void function returns for two purposes. Firstly to indicate that the function does not return a value, and secondly to indicate that the function does not return at all (exit is an example of this second usage). In TDF terms, in the first case the function returns TOP, in the second it returns BOTTOM. tspec allows types to be introduced which have the second meaning. For example, we could have:

	+TYPEDEF ~special ( "bottom" ) ~bottom ;
	+FUNC ~bottom exit ( int ) ;

meaning that the local type ~bottom is the BOTTOM form of void. The procedure exit, which never returns, can then be declared to return ~bottom rather than void. Other such special types may be added in future.

The second special form:

	+TYPEDEF ~promote ( x ) y ;

means that y is an integral type which is the integral promotion of x. x must have previously been declared as an integral type. This gives an alternative approach to the old style procedure declaration problem described in section 4.2. Recall that:

	void *malloc ( sz )
	size_t sz ;

means that malloc has one argument which is passed as the integral promotion of size_t. This could be expressed as follows:

	+TYPEDEF ~promote ( size_t ) ~size_t ;
	+FUNC void *malloc ( ~size_t ) ;

introducing a local type to stand for the integral promotion of size_t.

4.9. +FIELD

Having specified a structure or union type, or a structure or union tag, we may wish to specify certain fields of this structure or union. This is done using the +FIELD construct. This takes the form:

	+FIELD type {
	    ftype field1, ..., fieldn ;
	    ....
	} ;

where type is the structure or union type and field1, ..., fieldn are field selectors derived from the base type ftype as in a normal C structure definition. type may have one of the forms:

(struct) name for a structure type,
(union) name for a union type,
struct name for a structure tag,
union name for a union tag,
name for a previously declared structure or union type.

Except in the final case (where it is not clear if type is a structure or a union), it is not necessary to have previously introduced type using a +TYPE construct - this declaration is implicit in the +FIELD construct.

For example, in ansi:time.h we have:

	+FIELD struct tm {
	    int tm_sec ;
	    int tm_min ;
	    int tm_hour ;
	    int tm_mday ;
	    int tm_mon ;
	    int tm_year ;
	    int tm_wday ;
	    int tm_yday ;
	    int tm_isdst ;
	} ;

meaning that there exists a structure with tag tm with various fields of type int. Any implementation must have these corresponding fields, but they need not be in the given order, nor do they have to comprise the whole structure.

As was mentioned above (in 4.1.1), field selectors form a special case when tspec is making up external token names. For example, in the case above, the token name for the tm_sec field is either tm.tm_sec or ansi.time.tm.tm_sec, depending on whether or not unique token names are used.

It is possible to have several +FIELD constructs referring to the same structure or union. For example, posix:dirent.h declares a structure with tag dirent and one field, d_name, of this structure. xpg3:dirent.h extends this by adding another field, d_ino.

There is a second form of the +FIELD construct which has more in common with the +TYPEDEF construct. The form:

	+FIELD type := {
	    ftype field1, ..., fieldn ;
	    ....
	} ;

means that the type type is defined to be exactly the given structure or union type, with precisely the given fields in the given order.
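
The difference between the two forms lies in what they promise about an implementation's structure. The Python sketch below is a conceptual model of that promise, not something tspec itself computes; the function name is hypothetical and type compatibility is reduced to string equality for simplicity.

```python
def conforms(spec_fields, impl_fields, exact=False):
    """Check an implementation's structure against a +FIELD specification.

    Both arguments are lists of (type, field name) pairs in declaration
    order; exact=True models the ':=' form of +FIELD.
    """
    if exact:
        # Precisely the given fields, in the given order.
        return list(impl_fields) == list(spec_fields)
    # Otherwise every specified field must exist with the right type;
    # order and additional fields do not matter.
    impl = {name: ftype for ftype, name in impl_fields}
    return all(impl.get(name) == ftype for ftype, name in spec_fields)
```

An implementation of struct tm with its fields in a different order and one extra field would satisfy the first form but not the second.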

4.10. +NAT

In the example given in section 4.9, posix:dirent.h specifies that the d_name field of struct dirent is a fixed sized array of characters, but that the size of this array is implementation dependent. We therefore have to introduce a value to stand for the size of this array using the +NAT construct. This has the form:

	+NAT nat1, ..., natn ;

where nat1, ..., natn are the array sizes to be declared. The example thus becomes:

	+NAT ~dirent_d_name_size ;
	+FIELD struct dirent {
	    char d_name [ ~dirent_d_name_size ] ;
	} ;

Note the use of a local variable to stand for a value, namely the array size, which is invisible to the user (see section 4.1.2).

As another example, in ansi:setjmp.h we know that jmp_buf is an array type. We therefore introduce objects to stand for the type which it is an array of and for the size of the array, and define jmp_buf by a +TYPEDEF command:

	+NAT ~jmp_buf_size ;
	+TYPE ~jmp_buf_elt ;
	+TYPEDEF ~jmp_buf_elt jmp_buf [ ~jmp_buf_size ] ;

Again, local variables have been used for the introduced objects.

4.11. +ENUM

Currently tspec only has limited support for enumeration types. A +ENUM construct is translated directly into a C definition of an enumeration type. The +ENUM construct has the form:

	+ENUM etype := {
	    entry,
	    ....
	} ;

where etype is the enumeration type being defined - either a type name or enum etag for some enumeration tag etag - and each entry has one of the forms:

	name
	name = number

as in a C enumeration type. For example, in xpg3:search.h we have:

	+ENUM ACTION := { FIND, ENTER } ;

4.12. +TOKEN

As was mentioned in section 1, the #pragma token syntax is highly complex, and the token descriptions output by tspec form only a small subset of those possible. It is possible to directly access the full #pragma token syntax from tspec using the construct:

	+TOKEN name %% text %% ;

where the token name is defined by the sequence of characters text, which is delimited by double percents. This is turned into the token description:

	#pragma token text name #

No checks are applied to text. A more sophisticated mechanism for defining complex tokens may be introduced in a later version of tspec.

For example, in ansi:stdarg.h a token va_arg is defined which takes a variable of type va_list and a type t and returns a value of type t. This is given by:

	+TOKEN va_arg %% PROC ( EXP lvalue : va_list : e, TYPE t ) EXP rvalue : t : %% ;

See reference 3 for more details on the token syntax.
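
The shape of this token mirrors the way va_arg is used in ordinary C: its first argument is a va_list variable (an lvalue), its second is a type, and the result is a value of that type. A minimal sketch using only standard C follows; sum_ints is a hypothetical function written for this illustration.

```c
#include <stdarg.h>

/* Sum count int arguments.  Each call of va_arg takes the va_list
   variable ap and the type int, and yields the next argument as an int. */
static int sum_ints ( int count, ... )
{
    va_list ap ;
    int total = 0 ;
    va_start ( ap, count ) ;
    while ( count-- > 0 ) {
        total += va_arg ( ap, int ) ;
    }
    va_end ( ap ) ;
    return total ;
}
```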

5. Other tspec Constructs

Although most tspec constructs are concerned either with specifying new objects or imposing structure upon various sets of objects, there are a few which do not fall into these categories.

5.1. +IF, +ELSE and +ENDIF

It is possible to introduce conditional compilation into the API description by means of the constructs:

	+IF %% text %%
	+IFDEF %% text %%
	+IFNDEF %% text %%
	+ELSE
	+ENDIF

which are translated into:

	#if text
	#ifdef text
	#ifndef text
	#else /* text */
	#endif /* text */

respectively. If text is just a simple number or a single identifier, the double percent delimiters may be omitted.

A couple of special +IFDEF (and also +IFNDEF) forms are available which are useful on occasion. These are:

	+IFDEF ~building_libs
	+IFDEF ~protect ( "api", "header" )

The macros in these constructs expand respectively to __BUILDING_LIBS, which by convention is defined if and only if TDF library building is taking place (see section 6.4), and to the protection macro which tspec makes up to protect the file api:header against multiple inclusion (see section 6.2).
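
Given the translation rules in this section, a region of a description bracketed by +IFDEF ~building_libs, +ELSE and +ENDIF would reach the C compiler as an ordinary preprocessor conditional on __BUILDING_LIBS. The sketch below shows the assumed shape of such output; building_libs_mode is a hypothetical function used only to make the selected branch observable.

```c
/* Assumed shape of the output for:
   +IFDEF ~building_libs ... +ELSE ... +ENDIF */
#ifdef __BUILDING_LIBS
/* Seen only while TDF library building is taking place */
static int building_libs_mode ( void ) { return 1 ; }
#else /* __BUILDING_LIBS */
/* Seen during normal compilation */
static int building_libs_mode ( void ) { return 0 ; }
#endif /* __BUILDING_LIBS */
```

Compiled without __BUILDING_LIBS defined, only the second definition is seen, so the two cases never conflict.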

5.2. Quoted Text

It is sometimes desirable to include text in the specification file which will be copied directly into one of the output files - for example, sections of C. This can be done by enclosing the text for copying into the include output file in double percents:

	%% text %%

and text for copying into the source output file in triple percents:

	%%% text %%%

In fact more percents may be used. An even number always indicates text for the include output file, and an odd number the source output file. Note that any # characters in text are copied as normal, and not treated as comments. This also applies to the other cases where percent delimiters are used.
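
The parity rule can be sketched in C as follows. This is an illustration of the rule as stated above, not tspec's own code; quote_dest and classify_delimiter are hypothetical names, and treating fewer than two percents as "not a quotation" is an assumption.

```c
#include <stddef.h>

typedef enum { NOT_QUOTED, INCLUDE_OUTPUT, SOURCE_OUTPUT } quote_dest ;

/* Classify an opening delimiter by counting its leading percent
   characters: an even count selects the include output file and an odd
   count the source output file.  Fewer than two percents is assumed not
   to open a quotation at all. */
static quote_dest classify_delimiter ( const char *s )
{
    size_t n = 0 ;
    while ( s [n] == '%' ) n++ ;
    if ( n < 2 ) return NOT_QUOTED ;
    return ( n % 2 == 0 ) ? INCLUDE_OUTPUT : SOURCE_OUTPUT ;
}
```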

5.3. C Comments

C-style comments are a special case of quoted text:

	/* text */

which are copied directly into the include output file.

5.4. File Properties

Various properties of individual sets of objects or global properties can be set using file properties. These take the form:

	$property = number ;

for numeric (or boolean) properties, and:

	$property = "string" ;

for string properties.

The valid property names are as follows:

APINAME is a string property which may be used to override the API name of the current set of objects.
FILE is a string property which is used by the tspec preprocessor to indicate the current input file name.
FILENAME is a string property which may be used to override the header name of the current set of objects.
INCLNAME is a string property which may be used to set the name of the include output file in place of the default name given in section 2.3. Setting the property to the empty string suppresses the output of this file.
INTERFACE is a numeric property which may be set to force the creation of the source output file and cleared to suppress it.
LINE is a numeric property which is used by the tspec preprocessor to indicate the current input file line number.
METHOD is a string property which may be used to specify alternative construction methods for TDF library building (see section 6.4).
PREFIX is a string property which may be used as a prefix to unique token names in place of the API and header names (see section 4.1.1).
PROTECT is a string property which may be used to set the macro used by tspec to protect the include output file against multiple inclusions (see section 6.2). Setting the property to the empty string suppresses this macro.
SOURCENAME is a string property which may be used to set the name of the source output file in place of the default name given in section 2.3. Setting the property to the empty string suppresses the output of this file.
SUBSETNAME is a string property which may be used to override the subset name of the current set of objects.
UNIQUE is a numeric property which may be used to switch the unique token name flag on and off (see section 4.1.1). For standard APIs it is recommended that this property is set to 1 in the API MASTER file.
VERBOSE is a numeric property which may be used to set the level of the verbose option (see section 2.5).
VERSION is a string property which may be used to assign a version number or other identification to a tspec description. This information is reproduced in the corresponding include output file.

6. Miscellaneous Topics

In this section we round up a few miscellaneous topics.

6.1. Fine Control of Included Files

The +IMPLEMENT and +USE commands described in section 3.2 are capable of further refinement. Normally each such command is translated into a corresponding inclusion command in both the include and source output files. Occasionally this is not desirable - in particular the inclusion in the source output file can cause problems during TDF library building. For this reason the tspec syntax has been extended to allow for fine control of the output corresponding to +IMPLEMENT and +USE commands. This takes the forms:

	+IMPLEMENT "api" (key) ;
	+IMPLEMENT "api", "header" (key) ;
	+IMPLEMENT "api", "header", "subset" (key) ;

with corresponding forms for +USE. key specifies which output files the inclusion commands should appear in. It can be:

??, indicating neither output file,
!?, indicating the include output file only,
?!, indicating the source output file only,
!!, indicating both output files (this is the same as the normal form).
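
Reading the four keys together, the first character governs the include output file and the second the source output file, with ! meaning that the inclusion command is emitted and ? that it is suppressed. A small C sketch of this reading follows; inclusion_key and decode_key are hypothetical names and do not reflect how tspec itself parses the key.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool in_include ;    /* emit in the include output file */
    bool in_source ;     /* emit in the source output file */
} inclusion_key ;

/* Decode a two-character key.  Returns false, leaving *out untouched,
   if the key is not one of ??, !?, ?! or !!. */
static bool decode_key ( const char *key, inclusion_key *out )
{
    if ( strlen ( key ) != 2 ) return false ;
    if ( key [0] != '!' && key [0] != '?' ) return false ;
    if ( key [1] != '!' && key [1] != '?' ) return false ;
    out->in_include = ( key [0] == '!' ) ;
    out->in_source = ( key [1] == '!' ) ;
    return true ;
}
```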

The second refinement comes from the fact that APIs fall into two categories - the base APIs, such as ansi, posix and xpg3, and the extension APIs, such as x11, the X Windows API. The latter can be used to extend the former, so that we can form ansi plus x11, posix plus x11, and so on. Base APIs may be distinguished in tspec by including the command:

	+BASE_API ;

in their MASTER file. Occasionally, in an extension API, we may wish to include a version of a header from the base API but, because this base API is not fixed, be unable to use a simple +USE command. Instead the special form:

	+USE ( "api" ), "header" ;

is provided for this purpose (this is the only permitted form). It indicates that tspec should use the api version of header for checking purposes, but allow the inclusion of the version from the base API in normal use.

6.2. Protection Macros

Each include output file is surrounded by a construct of the form:

	#ifndef MACRO
	#define MACRO
	....
	#endif /* MACRO */

to protect it against multiple inclusions. Normally tspec will generate the macro name, MACRO, but it can be set using the PROTECT file property (see section 5.4). Setting PROTECT to the empty string suppresses the protection construct altogether. (Also see section 5.1.)
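
The effect of the protection construct can be seen by simulating two inclusions of the same header within a single C file. In the sketch below EXAMPLE_PROTECT, example_type and example_value are hypothetical names; the macro name which tspec would actually generate is not shown here.

```c
/* First inclusion: EXAMPLE_PROTECT is not yet defined, so the body is seen */
#ifndef EXAMPLE_PROTECT
#define EXAMPLE_PROTECT
typedef struct { int value ; } example_type ;
#endif /* EXAMPLE_PROTECT */

/* Second inclusion: the macro is now defined, so the body is skipped and
   example_type is not defined a second time */
#ifndef EXAMPLE_PROTECT
#define EXAMPLE_PROTECT
typedef struct { int value ; } example_type ;
#endif /* EXAMPLE_PROTECT */

static int example_value ( void )
{
    example_type t = { 42 } ;
    return t.value ;
}
```

Without the #ifndef lines the second typedef would be a conflicting redefinition and the file would be rejected by the compiler.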

6.3. Index Printing

If it is invoked with the -i command-line option, tspec does not create its output files, but instead prints an index of all the objects it has read to the standard output. This information includes the external token name associated with the object, whether the object is implemented or used, and where in the API description it is defined. It also includes a brief description of the object. It is intended that these indexes should be usable as quick reference guides to the underlying APIs.

6.4. TDF Library Building

As was explained in reference 1, the #pragma token headers output by tspec are used for two purposes - checking applications against the API during normal compilation and checking implementations against the API during TDF library building. This dual use does necessitate some extra work for tspec. It is not always possible to use exactly the same code in the two cases (usually because the C rules on, for example, structure definitions get in the way during library building). tspec uses a standard macro, __BUILDING_LIBS, to distinguish between the two cases. It is assumed to be defined if and only if library building is taking place. tspec descriptions can access this macro directly using ~building_libs (see section 5.1).

The actual library building process consists of compiling the #pragma token descriptions of the objects comprising the API along with the implementation of that API from the system headers (or wherever). This creates the local token definitions for this API, which may be stored in a token library. To facilitate this process tspec creates the source output files for each implemented header api:header containing something like:

	#pragma implement interface <../api/header>
	#include <header>

together with a makefile to compile all these programs to token definitions and to combine these token definitions into a token library. In fact two makefiles are created in the source output directory (see section 2.3). The first is called M_api and is designed for stand-alone library construction. The second is called Makefile and is designed for use with the library building script MAKE_LIBS provided with tspec.

There are other methods whereby the source output file may be changed into a set of token definitions. For example, in c:sys.h the METHOD file property (see section 5.4) is set to TDP, causing the tdp program to be invoked to produce the definitions for the basic C tokens for the system. As another example consider:

	$METHOD = "TNC" ;
	+MACRO double fl_abs ( double ) ;
	%%%
	    ( make_tokdef fl_abs ( exp x ) exp
		( floating_abs impossible x ) )
	%%%

The include output file will specify a token fl_abs which takes a double and returns a double. The TNC method tells MAKE_LIBS that the source output file, which will just contain the quoted text:

	( make_tokdef fl_abs ( exp x ) exp
	    ( floating_abs impossible x ) )

is an input file for the TDF notation compiler, tnc (see reference 2). Thus we have defined a token which directly accesses the TDF floating_abs construct.

7. Changes in tspec 2.0

This document describes tspec version 2.0. tspec 2.0 contains significant changes from previous releases. For convenience the main changes which are visible to the tspec user are listed here:

The added specification level of named subsets of headers has been introduced (see section 2.1). This has been done by introducing the +SUBSET construct and extending the +IMPLEMENT and +USE constructs, as well as the command-line options. The previous method of dealing with such subsets - namely shared headers - is now obsolete and its use is discouraged.
A number of new command-line options have been added, and some of the existing options have been modified slightly (see section 2.5).
The suffix .api has been added to the output directories (see section 2.3) to avoid possible confusion with other include file directories.
The use of identifiers beginning with ~ as local variables is new (see section 4.1.2).
The +STATEMENT and +DEFINE constructs (see section 4.5 and section 4.6) are new.
The (extern), (weak) and (const) qualifiers for +FUNC and +EXP (see section 4.2 and section 4.3) are new.
The (signed) and (unsigned) qualifiers for +TYPE (see section 4.7) are new.
The ~special type constructor (see section 4.8) is new.
The ~abstract type constructor has been abandoned.
The +BASE_API command described in section 6.1 is new.
The indexing routines (see section 6.3) have been greatly improved.

8. References

1. "TDF and Portability", DRA, 1993.

2. "The TDF Notation Compiler", DRA, 1993.

3. "The C to TDF Producer", DRA, 1993.