The TDF notation compiler, tnc, is a tool for translating TDF capsules to and from text. This paper gives a brief introduction to how to use this utility and to the syntax of the textual form of TDF. The version described here is the one supporting version 3.1 of the TDF specification.
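tnc has four modes, two input modes and two output modes. These are as follows:
read - translate an input text file into the tnc internal representation,
decode - translate an input TDF capsule into the tnc internal representation,
encode - translate the tnc internal representation into an output TDF capsule,
write - translate the tnc internal representation into an output text file.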
Due to the modular nature of the program it is possible to form versions of tnc in which not all the modes are available. Passing the -version flag to tnc causes it to report which modes it has implemented.
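Any application of tnc consists of the composite of an input mode and an output mode. The default action is read-encode, i.e. translate an input text file into an output TDF capsule. Other modes may be specified by passing the following command line options to tnc:
-r or -read selects the read input mode,
-d or -decode selects the decode input mode,
-e or -encode selects the encode output mode,
-w or -write selects the write output mode.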
The only other really useful action is decode-write, i.e. translate an input TDF capsule into an output text file. This may also be specified by the -print or -p option. The actions decode-encode and read-write are not precise identities; they do, however, give equivalent input and output files.
In addition, the decode mode may be modified to accept a TDF library as input rather than a TDF capsule by passing the additional flag -lib (or -l) to tnc.
The overall syntax for tnc is as follows:
    tnc [ options ... ] input_file [ output_file ]
If the output file is not specified, the standard output is used.
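The way the mode options combine into an action can be sketched in Python. This is an illustration only and not part of tnc: the function name resolve_modes is hypothetical, and the assumption that a later option overrides an earlier one is not stated in this paper.

```python
def resolve_modes(options):
    """Return (input_mode, output_mode) for a list of tnc mode options."""
    # The documented default action is read-encode.
    input_mode, output_mode = "read", "encode"
    for opt in options:
        if opt in ("-r", "-read"):
            input_mode = "read"
        elif opt in ("-d", "-decode"):
            input_mode = "decode"
        elif opt in ("-e", "-encode"):
            output_mode = "encode"
        elif opt in ("-w", "-write"):
            output_mode = "write"
        elif opt in ("-p", "-print"):
            # -print is documented as shorthand for decode-write.
            input_mode, output_mode = "decode", "write"
        # Assumption: any other option does not affect the modes.
    return input_mode, output_mode
```

For instance, resolve_modes(["-d", "-w"]) and resolve_modes(["-p"]) describe the same decode-write action.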
The rest of this paper is concerned with the form required of the input text file. The input can be divided into eight classes.
The characters ( and ) are used as delimiters to impose a syntactic structure on the input.
White space comprises sequences of space, tab and newline characters, together with comments (see below). It is not significant to the output (TDF notation is completely free-form), and serves only to separate syntactic units. Every identifier, number etc. must be terminated by a white space or a delimiter.
Comments may be inserted in the input at any point. They begin with a # character and run to the end of the line.
An identifier consists of any sequence of characters drawn from the following set: upper case letters, lower case letters, decimal digits, underscore (_), dot (.), and tilde (~), which does not begin with a decimal digit. tnc generates names beginning with double tilde (~~) for unnamed objects when in decode mode, so the use of such identifiers is not recommended.
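The identifier rule can be written as a regular expression. The following Python sketch is an illustration under that reading of the rule; the function names is_identifier and is_generated_name are hypothetical and do not come from tnc.

```python
import re

# Letters, digits, underscore, dot and tilde, not starting with a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_.~][A-Za-z0-9_.~]*")

def is_identifier(text):
    """True if text satisfies the identifier rule described above."""
    return IDENTIFIER_RE.fullmatch(text) is not None

def is_generated_name(text):
    """True if text looks like a name tnc generates in decode mode."""
    return is_identifier(text) and text.startswith("~~")
```

A check such as is_generated_name could be used to warn about hand-written identifiers that may clash with generated ones.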
Numbers can be given in octal (prefixed by 0), decimal, or hexadecimal (prefixed by 0x or 0X). Both upper and lower case letters can be used for hex digits. A number can be preceded by any number of + or - signs.
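A minimal Python sketch of reading a number in this form follows. It is not taken from tnc: the name parse_number is hypothetical, and the treatment of repeated signs (each - negates the value, each + leaves it unchanged) is an assumption, since the text above only says that such signs are permitted.

```python
import string

def parse_number(text):
    """Convert a number in the textual form described above to an int."""
    sign = 1
    i = 0
    # Assumption: every leading '-' flips the sign, every '+' is a no-op.
    while i < len(text) and text[i] in "+-":
        if text[i] == "-":
            sign = -sign
        i += 1
    body = text[i:]
    if body[:2] in ("0x", "0X"):
        base, digits, allowed = 16, body[2:], string.hexdigits
    elif len(body) > 1 and body[0] == "0":
        base, digits, allowed = 8, body[1:], string.octdigits
    else:
        base, digits, allowed = 10, body, string.digits
    if not digits or any(c not in allowed for c in digits):
        raise ValueError("malformed number: %r" % text)
    return sign * int(digits, base)
```

Under these assumptions, 017, 0xf and 15 all denote the same value.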
A string consists of a sequence of characters enclosed in double quotes ("). The following escape sequences are recognised:
\n represents a newline character,
\t represents a tab character,
\xxx, where xxx consists of three octal digits, represents the character with ASCII code xxx.
Newlines are not allowed in strings unless they are escaped. For all other escaped characters, \x represents x.
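The escape rules can be illustrated with a short Python sketch that decodes the text between the double quotes. The name decode_string_body is hypothetical, and one case is an assumption: a backslash followed by fewer than three octal digits is treated under the "all other escaped characters" rule.

```python
def decode_string_body(body):
    """Decode the characters between the quotes of a string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch == "\n":
                raise ValueError("unescaped newline in string")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        nxt = body[i + 1]
        octal = body[i + 1:i + 4]
        if len(octal) == 3 and all(c in "01234567" for c in octal):
            # \xxx with three octal digits gives the character code.
            out.append(chr(int(octal, 8)))
            i += 4
        elif nxt == "n":
            out.append("\n")
            i += 2
        elif nxt == "t":
            out.append("\t")
            i += 2
        else:
            # Any other escaped character stands for itself.
            out.append(nxt)
            i += 2
    return "".join(out)
```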
A single minus character (-) has a special meaning. It may be used to indicate the absence of an optional argument or optional group of arguments.
A single vertical bar (|) has a special meaning. It may be used to indicate the end of a sequence of repeated arguments.
The basic input syntax is very simple. A construct consists of an identifier followed by a list of arguments, all enclosed in brackets in a Lisp-like fashion. Each argument can be an identifier, a number, a string, a blank, a bar, or another construct. There are further restrictions on this basic syntax, described below.
    construct : ( identifier arglist )
    argument  : construct | identifier | number | string | blank | bar
    arglist   : (empty) | argument arglist
The construct ( identifier ), with an empty argument list, is equivalent to the identifier argument identifier. The two may be used interchangeably.
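The basic syntax is small enough to sketch as a tokenizer and recursive-descent parser in Python. This is an illustration and not the tnc implementation: the names tokenize, parse_argument and parse are hypothetical, the tree representation (a (name, arguments) tuple, or a bare string for anything else) is a choice made for this example, and no sort checking is attempted.

```python
import re

# Groups: 1 white space or comment, 2 delimiter, 3 string, 4 other word.
TOKEN_RE = re.compile(
    r'(\s+|#[^\n]*)|([()])|("(?:\\.|[^"\\\n])*")|([^\s()"#]+)', re.DOTALL)
KINDS = [None, "skip", "delim", "string", "word"]

def tokenize(text):
    """Split input text into (kind, value) pairs, dropping white space."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("bad input at offset %d" % pos)
        pos = m.end()
        kind = KINDS[m.lastindex]
        if kind != "skip":
            tokens.append((kind, m.group()))
    return tokens

def parse_argument(tokens, i):
    """Parse one argument starting at tokens[i]; return (tree, next_i)."""
    kind, value = tokens[i]
    if (kind, value) == ("delim", ")"):
        raise SyntaxError("unexpected )")
    if kind != "delim":
        return value, i + 1          # identifier, number, string, - or |
    if i + 1 >= len(tokens) or tokens[i + 1][0] != "word":
        raise SyntaxError("a construct must start with an identifier")
    name, args, i = tokens[i + 1][1], [], i + 2
    while i < len(tokens) and tokens[i] != ("delim", ")"):
        arg, i = parse_argument(tokens, i)
        args.append(arg)
    if i >= len(tokens):
        raise SyntaxError("missing )")
    # ( identifier ) is equivalent to the bare identifier.
    return ((name, args) if args else name), i + 1

def parse(text):
    """Parse a whole input text into a list of trees."""
    tokens, i, trees = tokenize(text), 0, []
    while i < len(tokens):
        tree, i = parse_argument(tokens, i)
        trees.append(tree)
    return trees
```

Because an empty argument list collapses to the bare identifier, parse("( true )") and parse("true") give the same result in this sketch.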
Except at the outermost level, which forms a special case discussed below, every construct and argument has an associated sort. This is one of the basic TDF sorts: access, al_tag, alignment, bitfield_variety, bool, callees, error_code, error_treatment, exp, floating_variety, label, nat, ntest, procprops, rounding_mode, shape, signed_nat, string, tag, transfer_mode, variety, tdfint or tdfstring.
Ignoring for the moment the shorthands discussed below, the ways of creating constructs of sort exp, say, correspond to the TDF constructs delivering an exp. For example, contents takes a shape and an exp and delivers an exp. Thus:
    ( contents arg1 arg2 )
where arg1 is an argument of sort shape and arg2 is an argument of sort exp, is a sort-correct construct. Only constructs which are sort correct in this sense are allowed.
As another example, because of the rule concerning constructs with no arguments, both
    ( true )
and
    false
are valid constructs of sort bool.
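Sort correctness in this sense can be sketched as a lookup in a table of construct signatures. The Python below is an illustration only: the table holds just the constructs mentioned above, the names SIGNATURES and sort_of are hypothetical, and the env mapping stands in for placeholder arguments such as arg1 whose sorts are simply assumed. A construct is represented as a (name, arguments) tuple, or a bare string when it has no arguments.

```python
# Signatures taken from the examples above: argument sorts and result sort.
SIGNATURES = {
    "contents": (["shape", "exp"], "exp"),
    "true": ([], "bool"),
    "false": ([], "bool"),
}

def sort_of(node, env):
    """Return the sort of node, or raise TypeError if it is not sort correct."""
    name, args = node if isinstance(node, tuple) else (node, [])
    if not args and name in env:
        return env[name]             # placeholder with an assumed sort
    if name not in SIGNATURES:
        raise TypeError("unknown construct: %s" % name)
    arg_sorts, result = SIGNATURES[name]
    if len(args) != len(arg_sorts):
        raise TypeError("%s expects %d argument(s)" % (name, len(arg_sorts)))
    for arg, expected in zip(args, arg_sorts):
        actual = sort_of(arg, env)
        if actual != expected:
            raise TypeError("%s: expected %s, got %s" % (name, expected, actual))
    return result
```

With env = {"arg1": "shape", "arg2": "exp"}, the construct ("contents", ["arg1", "arg2"]) is accepted, while swapping the two arguments raises TypeError.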
TDF constructs which take lists of arguments are easily dealt with. For example:
    ( make_nof arg1 ... argn )
where arg1, ..., argn are all arguments of sort exp, is valid. A vertical bar may be used to indicate the end of a sequence of repeated arguments.
Optional arguments should be entered normally if they are present. Their absence may be indicated by means of a blank (minus sign), or by simply omitting the argument.
The vertical bar and blank should be used whenever the input is potentially ambiguous. Particular care should be taken with apply_proc (which is genuinely ambiguous) and labelled.
The TDF specification should be consulted for a full list of valid TDF constructs and their argument sorts. Alternatively the tnc help facility may be used. The command:
    tnc -help cmd1 ... cmdn
prints sort information on the constructs or sorts cmd1, ..., cmdn. Alternatively:
    tnc -help
prints this information for all constructs. (To obtain help on the sort alignment as opposed to the construct alignment use alignment_sort. This confusion cannot occur elsewhere.)
Numbers can occur in two contexts, as the argument to the TDF constructs make_nat and make_signed_nat. In the former case the number must be non-negative. The following shorthands are understood by tnc:
    number for ( make_nat number )
    number for ( make_signed_nat number )
depending on whether a construct of sort nat or signed_nat is expected.
Strings are nominally of sort tdfstring. They are taken to be simple strings (8 bits per character). Multibyte strings (those with other than 8 bits per character) may be represented by means of the multi_string construct. This takes the form:
    ( multi_string b c1 ... cn )
where b is the number of bits per character and c1, ..., cn are the codes of the characters comprising the string. These multibyte strings cannot be used as external names.
In addition, a simple (8 bit) string can be used as a shorthand for a TDF construct of sort string, as follows:
    string for ( make_string string )
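As an illustration of the multi_string form, the Python sketch below renders a piece of text as such a construct. The function name multi_string is hypothetical, and two things are assumptions rather than statements of this paper: that the character codes are the Unicode code points of the characters, and that they are written in decimal.

```python
def multi_string(text, bits):
    """Render text as a multi_string construct with the given character width."""
    codes = [ord(ch) for ch in text]   # assumption: codes are code points
    for code in codes:
        if code >= 1 << bits:
            raise ValueError("code %d does not fit in %d bits" % (code, bits))
    parts = ["multi_string", str(bits)] + [str(code) for code in codes]
    return "( " + " ".join(parts) + " )"
```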
In TDF simple tokens, tags, alignment tags and labels are represented by numbers which may, or may not, be associated with external names. In tnc however they are represented by identifiers. This brings the problem of scoping which does not occur in TDF. The rules are that all tokens, tags, alignment tags and labels must be declared before they are used. Externally defined objects have global scope, and the scope of a formal argument in a token definition is the definition body. For those constructs which introduce a local tag or label - for example, identify, make_proc, make_general_proc and variable for tags and conditional, labelled and repeat for labels - the scope of the object is as set out in the TDF specification.
The following shorthands are understood by tnc, according to the argument sort expected:
    tag_id for ( make_tag tag_id )
    al_tag_id for ( make_al_tag al_tag_id )
    label_id for ( make_label label_id )
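The shorthands for numbers, strings, tags, alignment tags and labels all follow one pattern: the expected sort selects the construct that is implied. A minimal Python sketch of that expansion follows; the names SHORTHAND and expand_shorthand are hypothetical, and arguments are handled as plain text for simplicity.

```python
# Expected sort -> construct implied by a bare number, string or identifier.
SHORTHAND = {
    "nat": "make_nat",
    "signed_nat": "make_signed_nat",
    "string": "make_string",
    "tag": "make_tag",
    "al_tag": "make_al_tag",
    "label": "make_label",
}

def expand_shorthand(argument, expected_sort):
    """Return the verbose form of argument for the expected sort."""
    construct = SHORTHAND.get(expected_sort)
    if construct is None:
        return argument              # no shorthand exists for this sort
    if argument in ("-", "|") or argument.startswith("("):
        return argument              # blank, bar, or already a construct
    return "( %s %s )" % (construct, argument)
```

The same bare number therefore expands differently depending on whether a nat or a signed_nat is expected.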
The syntax for token applications is as follows:
    ( apply_construct ( token_id arg1 ... argn ) )
where apply_construct is the appropriate TDF token application construct, for example, exp_apply_token for tokens declared to deliver exps. The token arguments arg1, ..., argn must be of the sorts indicated in the token declaration or definition. For tokens without any arguments the alternative form:
    ( apply_construct token_id )
is allowed.
The token application above may be abbreviated to:
    ( token_id arg1 ... argn )
the result sort being known from the token declaration. This in turn may be abbreviated to:
    token_id
when there are no token arguments.
Care needs to be taken with these shorthands, as they can lead to confusion, particularly when, due to optional arguments or lists of arguments, tnc is not sure what sort is coming next.
The five categories of objects represented by identifiers - TDF constructs,
tokens, tags, alignment tags and labels - occupy separate name spaces,
but it is a good idea to try to avoid duplication of names.
By default all these shorthands are used by tnc in write mode. If this causes problems, the -V flag should be passed to tnc.
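At the outer level tnc is expecting a sequence of constructs of the following forms: an included file, a token declaration, a token definition, an alignment tag declaration, an alignment tag definition, a tag declaration or a tag definition.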
Included files may be of three types - text, TDF capsule or TDF library. For TDF capsules and libraries there are two include modes. The first just decodes the given capsule or set of capsules. The second scans through them to extract token declaration information. These declarations appear in the output file only if they are used elsewhere.
The syntax for an included text file is:
    ( include string )
where string is a string giving the pathname of the file to be included. tnc applies read to this sub-file before continuing with the present file.
Similarly, the syntaxes for included TDF capsules and libraries are:
    ( code string )
    ( lib string )
respectively. tnc applies decode to this capsule or set of capsules (provided this mode is available) before continuing with the present file.
The syntaxes for extracting the token declaration information from a TDF capsule or library are:
    ( use_code string )
    ( use_lib string )
Again, these rely on the decode mode being available.
All tokens, tags and alignment tags have an internal name, namely the associated identifier, but this name does not necessarily appear in the corresponding TDF capsule. There must firstly be an associated declaration or definition at the outer level - tags internal to a piece of TDF do not have external names. Even then we may not wish this name to appear at the outer level, because it is local to this file and is not required for linking purposes. Alternatively we may wish a different external name to be associated with it in the TDF capsule.
As an example of how tnc allows for this, consider token declarations (although similar remarks apply to token definitions, alignment tag definitions etc.). The basic form of the token declaration is:
    ( make_tokdec token_id ... )
This creates a token with both internal and external names equal to token_id. Alternatively:
    ( local make_tokdec token_id ... )
creates a token with internal name token_id but no external name. This allows the creation of tokens local to the current file. Again:
    ( make_tokdec ( string_extern string ) token_id ... )
creates a token with internal name token_id and external name given by the string string. For example, to create a token whose external name is not a valid identifier, it would be necessary to use this construct. Finally:
    ( make_tokdec ( unique_extern string1 ... stringn ) token_id ... )
creates a token with internal name token_id and external name given by the unique name consisting of the strings string1, ..., stringn.
The local quantifier should be used consistently on all declarations and definitions of the token, tag or alignment tag. The alternative external name should only be given on the first occasion, however. Thereafter the object is identified by its internal name.
The basic form of a token declaration is:
    ( make_tokdec token_id ( arg1 ... argn ) res )
where the token token_id is declared to take argument sorts arg1, ..., argn and deliver the result sort res. These sorts are given by their sort names, al_tag, alignment, bitfield_variety etc. For a token with no arguments the declaration may be given in the form:
    ( make_tokdec token_id res )
A token may be declared any number of times, provided the declarations are consistent.
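The declaration forms can be illustrated by a small Python helper that writes them out as text. It is a sketch rather than a tnc facility: the function name make_tokdec and its parameters are hypothetical, only the local and string_extern variants are covered, and rejecting the combination of the two is an assumption made for this example.

```python
def make_tokdec(token_id, arg_sorts, res, local=False, extern=None):
    """Return the text of a token declaration in the forms described above."""
    if local and extern is not None:
        # Assumption: a local token has no external name to give.
        raise ValueError("a local token cannot have an external name")
    parts = []
    if local:
        parts.append("local")
    parts.append("make_tokdec")
    if extern is not None:
        # Escape backslashes and quotes so the name is a valid string.
        quoted = extern.replace("\\", "\\\\").replace('"', '\\"')
        parts.append('( string_extern "%s" )' % quoted)
    parts.append(token_id)
    if arg_sorts:
        parts.append("( " + " ".join(arg_sorts) + " )")
    parts.append(res)
    return "( " + " ".join(parts) + " )"
```

For example, make_tokdec("my_tok", ["exp", "shape"], "exp") gives the basic form, where my_tok is an invented token name.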
This basic declaration may be modified in the ways outlined above to specify the external token name.
The basic form of a token definition is:
    ( make_tokdef token_id ( arg1 id1 ... argn idn ) res def )
where the token token_id is defined to take formal arguments id1, ..., idn of sorts arg1, ..., argn respectively and have the value def, which is a construct of sort res. The scope of the tokens id1, ..., idn is def.
For a token with no arguments the definition may be given in the form:
    ( make_tokdef token_id res def )
A token may be defined more than once. All definitions must be consistent with any previous declarations and definitions (the renaming of formal arguments is allowed, however).
This basic definition may be modified in the ways outlined above to specify the external token name.
The basic form of an alignment tag declaration is:
    ( make_al_tagdec al_tag_id )
where the alignment tag al_tag_id is declared to exist.
This basic declaration may be modified in the ways outlined above to specify the external alignment tag name.
The basic form of an alignment tag definition is:
    ( make_al_tagdef al_tag_id def )
where the alignment tag al_tag_id is defined to be def, which is a construct of sort alignment. An alignment tag may be declared or defined more than once, provided the definitions are consistent.
This basic definition may be modified in the ways outlined above to specify the external alignment tag name.
The basic forms of a tag declaration are:
    ( make_id_tagdec tag_id info dec )
    ( make_var_tagdec tag_id info dec )
    ( common_tagdec tag_id info dec )
where the tag tag_id is declared to be an identity, variable or common tag with access information info, which is an optional construct of sort access, and shape dec, which is a construct of sort shape. A tag may be declared more than once, provided all declarations and definitions are consistent (including agreement of whether the tag is an identity, a variable or common).
These basic declarations may be modified in the ways outlined above to specify the external tag name.
The basic forms of a tag definition are:
    ( make_id_tagdef tag_id def )
    ( make_var_tagdef tag_id info def )
    ( common_tagdef tag_id info def )
where the tag tag_id is defined to be an identity, variable or common tag with value def, which is a construct of sort exp. Non-identity tag definitions also have an optional access construct, info. A tag must have been declared before it is defined, but may be defined any number of times. All declarations and definitions must be consistent (except that common tags may be defined inconsistently) and agree on whether the tag is an identity, a variable, or common.
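Two of these rules, declaration before definition and agreement on the kind of tag, can be sketched as a small symbol table in Python. This is an illustration only: the class TagTable and the kind names "id", "var" and "common" are hypothetical, and consistency of shapes and values is deliberately left out.

```python
class TagTable:
    """Records the kind (identity, variable or common) of each declared tag."""

    def __init__(self):
        self.kinds = {}

    def declare(self, tag_id, kind):
        # Repeated declarations are allowed if they agree on the kind.
        previous = self.kinds.setdefault(tag_id, kind)
        if previous != kind:
            raise ValueError("%s already declared as %s" % (tag_id, previous))

    def define(self, tag_id, kind):
        # A tag must be declared before it is defined.
        if tag_id not in self.kinds:
            raise ValueError("%s defined before it is declared" % tag_id)
        if self.kinds[tag_id] != kind:
            raise ValueError("%s was declared as %s" % (tag_id, self.kinds[tag_id]))
```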
These basic definitions may be modified in the ways outlined above to specify the external tag name.
The input in read (and to a lesser extent decode) mode is checked for shape correctness if the -check or -c flag is passed to tnc. This is not guaranteed to pick up all shape errors, but is better than nothing.
When in write mode the results of the shape checking may be viewed by passing the -cv flag to tnc. Each expression is associated with its shape by means of the:
    ( exp_with_shape exp shape ) -> exp
pseudo-construct. Unknown shapes are indicated by three dots (...).
The target independent TDF capsules produced by the C -> TDF compiler, tcc, do not contain declarations or definitions for all the tokens they use. Thus tnc cannot fully decode them as they stand. However the necessary token declaration information may be made available to tnc by using the use_lib construct. The commands:
    ( use_lib library )
    ( code capsule )
will decode the TDF capsule capsule which uses tokens defined in the TDF library library.
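A short Python sketch of generating such a pair of commands follows. The helper names quote and decode_driver are hypothetical, as are any file names passed to them; the sketch only assumes, as described earlier for included files, that the library and capsule are named by strings giving their pathnames.

```python
def quote(path):
    """Write a pathname as a string, escaping backslashes and quotes."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'

def decode_driver(library, capsule):
    """Return text that decodes capsule using token information from library."""
    return "( use_lib %s )\n( code %s )\n" % (quote(library), quote(capsule))
```

The resulting text could be saved to a file and passed to tnc with the -w option, so that it is read as text and written back out as text.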
The main limitations in the current version of tnc are as follows:
In addition, far more of the checks (scopes, shape checking, checking of consistency of declarations and definitions etc.) are implemented for read mode rather than decode mode. To shape check a TDF capsule, it will almost certainly be more effective to translate it into text and check that.
Another limitation is that the scoping rules for local tags do not allow such tags to be accessed outside their scopes using env_offset.
Here is the manual page for tnc.
NAME: tnc - TDF notation compiler
SYNOPSIS: tnc [ options ] input-file [ output-file ]
DESCRIPTION: tnc translates TDF capsules to and from text. It has two input modes, read and decode. In the first, which is default, input-file is a file containing TDF text. In the second input-file is a TDF capsule. There are also two output modes, encode and write. In the first, which is default, a TDF capsule is written to output-file (or the standard output if this argument is absent). In the second, TDF text is written to output-file.
Combinations of these modes give four actions: text to TDF (which is default), TDF to text, text to text and TDF to TDF. The last two actions are not precise identities, but they do give equivalent files.
The form of the TDF text format and more information about tnc can be found in the document The TDF Notation Compiler.
OPTIONS:
-c or -cv or -check
    Specifies that tnc should apply extra checks to input-file. For example, simple shape checking is applied. These checks are more efficient in read mode than in decode mode. If the -cv option is used in write mode, all the information gleaned from the shape checking appears in output-file.
-d or -decode
    Specifies that tnc should be in decode mode. That is, that input-file is a TDF capsule.
-e or -encode
    Specifies that tnc should be in encode mode. That is, that output-file is a TDF capsule.
-help subject ...
    Makes tnc print its help message on the given subject(s). If no subject is given, all the help messages are printed.
-Idir
    Adds the directory dir to the search path used by tnc to find included files in read mode.
-l or -lib
    In decode mode, specifies that input-file is not a TDF capsule, but a TDF library. All the capsules comprising the library are decoded.
-o output-file
    Gives an alternative method of specifying the output file.
-p or -print
    Specifies that tnc should be in decode and write modes. That is, that input-file is a TDF capsule and output-file should consist of TDF text. This option makes tnc into a TDF pretty-printer.
-q
    Specifies that tnc should not check duplicate tag declarations etc. for consistency, but should use the first declaration given.
-r or -read
    Specifies that tnc should be in read mode. That is, that input-file should consist of TDF text.
-V
    In write mode, specifies that the output should be in the "verbose" form, with no shorthand forms.
-version
    Makes tnc print its version number.
-w or -write
    Specifies that tnc should be in write mode. That is, that output-file should consist of TDF text.
SEE ALSO: tdf(1tdf).
Crown Copyright © 1998.