C Checker Reference Manual

January 1998

5.1 - Introduction

5.2 - Unreachable code analysis

5.3 - Case fall through

5.4 - Unusual flow in conditional statements

5.4.1 - Empty if statements
5.4.2 - Use of assignments as control expressions
5.4.3 - Constant control expressions

5.5 - Operator precedence

5.6 - Variable analysis

5.6.1 - Order of evaluation
5.6.2 - Modification between sequence points
5.6.3 - Operand of sizeof operator
5.6.4 - Unused variables
5.6.5 - Values set and not used
5.6.6 - Variable which has not been set is used

5.7 - Overriding the variable analysis

5.7.1 - Discarding variables
5.7.2 - Setting variables
5.7.3 - Exhaustive switch statements
5.7.4 - Non-returning functions

5.8 - Discard Analysis

5.8.1 - Discarded function returns
5.8.2 - Discarded computed values
5.8.3 - Unused static variables and procedures

5.9 - Overriding the discard analysis

5.9.1 - Discarding function returns and computed values
5.9.2 - Preserving unused statics

5 Data Flow and Variable Analysis

5.1 Introduction

The checker has a number of features which can be used to help track down potential programming errors relating to the use of variables within a source file and the flow of control through the program. Examples of this are detecting sections of unused code, flagging expressions that depend upon the order of evaluation where the order is not defined, checking for unused static variables, etc.

5.2 Unreachable code analysis

Consider the following function definition:

	int f ( int n )
	{
		if ( n ) {
			return ( 1 );
		} else {
			return ( 0 );
		}
		return ( 2 );
	}

The final return statement is redundant since it can never be reached. The test for unreachable code is controlled by:

	#pragma TenDRA unreachable code permit

where permit is replaced by disallow to give an error if unreached code is detected, warning to give a warning, or allow to disable the test (this is the default).

There are also equivalent command-line options to tchk of the form -X:unreached=state, where state can be check, warn or dont.

Annotations to the code in the form of user-defined keywords may be used to indicate that a certain statement is genuinely reached or unreached. These keywords are introduced using:

	#pragma TenDRA keyword REACHED for set reachable
	#pragma TenDRA keyword UNREACHED for set unreachable

The statement REACHED then indicates that this portion of the program is actually reachable, whereas UNREACHED indicates that it is unreachable. For example, one way of fixing the program above might be to say that the final return is reachable (this is a blatant lie, but never mind). This would be done as follows:

	int f ( int n ) {
		if ( n ) {
			return ( 1 );
		} else {
			return ( 0 );
		}
		REACHED
		return ( 2 );
	}

An example of the use of UNREACHED might be in the function below which falls out of the bottom without a return statement. We might know that, because it is never called with c equal to zero, the end of the function is never reached. This could be indicated as follows:

	int f ( int c ){
		if ( c ) return ( 1 );
		UNREACHED
	}

As always, if new keywords are introduced into a program then definitions need to be provided for conventional compilers. In this case, this can be done as follows:

	#ifdef __TenDRA__
	#pragma TenDRA keyword REACHED for set reachable
	#pragma TenDRA keyword UNREACHED for set unreachable
	#else
	#define REACHED
	#define UNREACHED
	#endif

5.3 Case fall through

Another flow analysis check concerns fall through in case statements. For example, in:

	void f ( int n )
	{
		switch ( n ) {
			case 1 : puts ( "one" );
			case 2 : puts ( "two" );
		}
	}

the control falls through from the first case to the second. This may be due to an error in the program (a missing break statement), or be deliberate. Even in the latter case, the code is not particularly maintainable as it stands - there is always the risk when adding a new case that it will interrupt this carefully contrived flow. Thus it is customary to comment all case fall throughs to serve as a warning.

In the default mode, the TenDRA C checker ignores all such fall throughs. A check to detect fall through in case statements is controlled by:

	#pragma TenDRA fall into case permit

where permit is allow (no errors), warning (warn about case fall through) or disallow (raise errors for case fall through).

There are also equivalent command-line options to tcc of the form -X:fall_thru=state, where state can be check, warn or dont.

Deliberate case fall throughs can be indicated by means of a keyword, which has been introduced using:

	#pragma TenDRA keyword FALL_THROUGH for fall into case

Then, if the example above were deliberate, this could be indicated by:

	void f ( int n ){
		switch ( n ) {
			case 1 : puts ( "one" );
			FALL_THROUGH
			case 2 : puts ( "two" );
		}
	}

Note that FALL_THROUGH is inserted between the two cases, rather than at the end of the list of statements following the first case.

If a keyword is introduced in this way, then an alternative definition needs to be introduced for conventional compilers. This might be done as follows:

	#ifdef __TenDRA__
	#pragma TenDRA keyword FALL_THROUGH for fall into case
	#else
	#define FALL_THROUGH
	#endif
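
Putting the pieces together, a minimal sketch of a complete source file using the keyword might look as follows. The function name describe and its line counter are introduced here purely for illustration and do not come from the checker; with a conventional compiler FALL_THROUGH expands to nothing.

```c
#include <stdio.h>

#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THROUGH for fall into case
#else
#define FALL_THROUGH
#endif

/* Hypothetical example: prints the names from n up to two and
   returns the number of lines printed. */
int describe ( int n )
{
	int lines = 0;
	switch ( n ) {
		case 1 :
			puts ( "one" );
			lines++;
			FALL_THROUGH	/* deliberate: case 1 continues into case 2 */
		case 2 :
			puts ( "two" );
			lines++;
			break;
		default :
			break;
	}
	return ( lines );
}
```

Calling describe ( 1 ) prints both lines, whereas describe ( 2 ) prints only the second.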

5.4 Unusual flow in conditional statements

The following three checks are designed to detect possible errors in conditional statements.

5.4.2 Use of assignments as control expressions

Using the C assignment operator, `=', when the equality operator `==' was intended is an extremely common problem. The pragma:

	#pragma TenDRA assignment as bool permit

is used to control the treatment of assignments used as the controlling expression of a conditional statement or a loop, e.g.

	if( var = 1 ) { ...

The options for permit are allow, warning and disallow. The default setting allows assignments to be used as control expressions without raising an error.
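
The effect of the mistake can be seen in the following sketch, in which the function names are hypothetical. The first function contains the assignment the check is designed to catch; the second is what was presumably intended.

```c
/* Buggy: the assignment stores 1 in var, so the test is always true. */
int is_one_buggy ( int var )
{
	if ( var = 1 ) {
		return ( 1 );
	}
	return ( 0 );
}

/* Intended: compares var with 1 and leaves it unchanged. */
int is_one ( int var )
{
	if ( var == 1 ) {
		return ( 1 );
	}
	return ( 0 );
}
```

Here is_one_buggy returns 1 for every argument, including 0, while is_one returns 1 only when its argument equals 1.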

5.4.3 Constant control expressions

Statements with constant control expressions are not really conditional at all since the value of the control expression can be evaluated statically. Although this feature is sometimes used in loops, relying on a break, goto or return statement to end the loop, it may be useful to detect all constant control expressions to check that they are deliberate. The check for statically constant control expressions is controlled using:

	#pragma TenDRA const conditional permit

where permit may be replaced by disallow to give an error when constant control expressions are encountered, warning to replace the error by a warning, or allow to switch the check off (this is the default).
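
As an illustration of the deliberate use mentioned above, the following sketch (the function count_chars is hypothetical) uses a constant control expression and relies on a break statement to end the loop. This is the kind of construct the check would be expected to report when it is enabled.

```c
#include <stddef.h>

/* Hypothetical example: counts the characters in s before the
   terminating null character. */
size_t count_chars ( const char *s )
{
	size_t n = 0;
	while ( 1 ) {	/* constant control expression */
		if ( s [ n ] == '\0' ) {
			break;	/* the only way out of the loop */
		}
		n++;
	}
	return ( n );
}
```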

5.5 Operator precedence

Section 6.3 of the ISO C standard provides a set of rules governing the order in which operators within expressions should be applied. These rules are said to specify the operator precedence and are summarised in the table over the page. Operators on the same line have the same precedence and the rows are in order of decreasing precedence. Note that the unary +, -, * and & operators have higher precedence than the binary forms and thus appear higher in the table.

The precedence of operators is not always intuitive and often leads to unexpected results when expressions are evaluated. A particularly common example is to write:

	if ( var & TEST == 1) { ...
	}
	else { ...

assuming that the control expression will be evaluated as:

	( ( var & TEST ) == 1 )

However, the == operator has a higher precedence than the bitwise & operator and the control expression is evaluated as:

	( var & ( TEST == 1 ) )

which in general will give a different result.
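
The difference can be made concrete with a small sketch. The value 3 for TEST and the function names are assumptions chosen only to make the two readings disagree.

```c
#define TEST 3	/* hypothetical mask value, chosen for illustration */

/* What the programmer meant: mask first, then compare. */
int intended ( int var )
{
	return ( ( var & TEST ) == 1 );
}

/* What the language rules give: compare first, then mask. */
int actual ( int var )
{
	return ( var & ( TEST == 1 ) );
}

/* The expression as originally written, without parentheses. */
int as_written ( int var )
{
	return ( var & TEST == 1 );
}
```

With var equal to 1, intended yields 1 because ( 1 & 3 ) is 1, whereas actual and as_written both yield 0 because TEST == 1 is false.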

The TenDRA C checker can be configured to flag expressions containing certain operators whose precedence is commonly confused, namely:

&& versus ||
<< and >> versus + and -
& versus == != < > <= >= + and -
^ versus & == != < > <= >= + and -
| versus ^ & == != < > <= >= + and -

The check is switched off by default and is controlled using:

	#pragma TenDRA operator precedence status

where status is on, warning or off.

5.6 Variable analysis

The variable analysis checks are controlled by:

	#pragma TenDRA variable analysis status

where status is on, warning or off as usual. The checks are switched off in the default mode.

There are also equivalent command line options to tchk of the form -X:variable=state, where state can be check, warn or dont.

The variable analysis is concerned with the evaluation of expressions and the use of local variables, including function arguments. Occasionally it may not be possible to statically perform a full analysis on an expression or variable and in these cases the messages produced indicate that there may be a problem. If a full analysis is possible a definite error or warning is produced. The individual checks are listed in sections 5.6.1 to 5.6.6 and section 5.7 describes the source annotations which can be used to fine-tune the variable analysis.

5.6.1 Order of evaluation

The ISO C standard specifies certain points in the expression syntax at which all prior expressions encountered are guaranteed to have been evaluated. These positions are called sequence points and occur:

after the arguments and function expression of a function call have been evaluated but before the call itself;
after the first operand of a logical && or || operator;
after the first operand of the conditional operator, ?:;
after the first operand of the comma operator;
at the end of any full expression (a full expression may take one of the following forms: an initialiser; the expression in an expression statement; the controlling expression in an if, while, do or switch statement; each of the three optional expressions of a for statement; or the optional expression of a return statement).

Between two sequence points however, the order in which the operands of an operator are evaluated, and the order in which side effects take place is unspecified - any order which conforms to the operator precedence rules above is permitted. For example:

	var = i + arr[ i++ ] ;

may evaluate to different values on different machines, depending on which argument of the + operator is evaluated first. The checker can detect expressions which depend on the order of evaluation of sub-expressions between sequence points and these are flagged as errors or warnings when the variable analysis is enabled.
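
One way of removing the dependency is to split the expression so that each read of i is separated from the increment by a sequence point. The sketch below shows the two possible readings as separate functions; the function names and the pointer parameter are introduced here for illustration only.

```c
/* The left operand uses the value of i before the increment. */
int sum_old_index ( const int *arr, int *i )
{
	int old = *i;
	int var = old + arr [ old ];
	( *i )++;
	return ( var );
}

/* The left operand uses the value of i after the increment. */
int sum_new_index ( const int *arr, int *i )
{
	int element = arr [ *i ];
	( *i )++;
	return ( *i + element );
}
```

Given an array holding 10, 20 and 30 and i equal to 1, the first function returns 21 and the second 22; in both cases i is left equal to 2.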

5.6.2 Modification between sequence points

The ISO C standard states that if an object is modified more than once, or is modified and accessed other than to determine the new value, between two sequence points, then the behaviour is undefined. Thus the result of:

	var = arr[i++] + i++ ;

is undefined, since the value of i is being incremented twice between sequence points. This behaviour is detected by the variable analysis.

5.6.3 Operand of sizeof operator

According to the ISO C standard, section 6.3.3.4, the operand of the sizeof operator is not itself evaluated. If the operand has any side-effects these will not occur. When the variable analysis is enabled, the checker detects the use of expressions with side-effects in the operand of the sizeof operator.
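
A short sketch of the problem, using a hypothetical function name, is given below. The increment appears in the source but never takes place, because the operand of sizeof is not evaluated.

```c
#include <stddef.h>

/* Hypothetical example: returns the value of i after the sizeof
   expression has been processed. */
int sizeof_side_effect ( void )
{
	int i = 0;
	size_t size = sizeof ( i++ );	/* operand not evaluated */
	( void ) size;
	return ( i );	/* i has not been incremented */
}
```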

5.6.4 Unused variables

As part of the variable analysis, a simple test is applied to each local variable at the end of its scope to determine whether it has been used in that scope. For example, in:

	int f ( int n )
	{
		int r;
		return ( 0 );
	}

both the function argument n and the local variable r are unused.

5.6.5 Values set and not used

This is a more complex test since it is applied to every instance of setting the variable. For example, in:

	int f ( int n )
	{
		int r = 1;
		r = 5;
		return ( r );
	}

the first value assigned to r, namely 1, is not used before it is overwritten by 5 (this second value is used, however). This test requires some flow analysis. For example, if the program is modified to:

	int f ( int n )
	{
		int r = 1;
		if ( n == 3 ) {
			r = 5;
		}
		return ( r );
	}

the initial value of r is used when n != 3, so no error is detected. However in:

	int f ( int n )
	{
		int r = 1;
		if ( n == 3 ) {
			r = 5;
		} else {
			r = 6;
		}
		return ( r );
	}

the initial value of r is overwritten regardless of the result of the conditional, and hence is unused.

5.6.6 Variable which has not been set is used

This test also requires some flow analysis, for example in:

	int f ( int n )
	{
		int r;
		if ( n == 3 ) {
			r = 5;
		}
		return ( r );
	}

the use of the variable r as a return value is reported because there are paths leading to this statement in which r is not set (i.e. when n != 3). However, in:

	int f ( int n )
	{
		int r;
		if ( n == 3 ) {
			r = 5;
		} else {
			r = 6;
		}
		return ( r );
	}

r is always set before it is used, so no error is detected.

5.7 Overriding the variable analysis

Although many of the problems discovered by the variable analysis are genuine mistakes, some may be the result of deliberate decisions by the program writer. In this case, more information needs to be provided to the checker to convey the programmer's intentions. Four constructs are provided for this purpose: the discard variable, the set variable, the exhaustive switch and the non-returning function.

5.7.1 Discarding variables

Actively discarding a variable counts as a use of that variable in the variable analysis, and so can be used to suppress messages concerning unused variables and values assigned to variables. There are two distinct methods to indicate that the variable x is to be discarded. The first uses a pragma:

	#pragma TenDRA discard x;

which the checker treats as if it were a C statement, ending in a semicolon. Having a statement which is noticed by one compiler but ignored by another can lead to problems. For example, in:

	if ( n == 3 )
	#pragma TenDRA discard x;
		puts ( "n is three" );

tchk believes that x is discarded if n == 3 and the message is always printed, whereas other compilers will ignore the #pragma statement and think that the message is printed if n == 3. An alternative, in many ways neater, solution is to introduce a new keyword for discarding variables. For example, to introduce the keyword DISCARD for this purpose, the pragma:

	#pragma TenDRA keyword DISCARD for discard variable

should be used. The variable x can then be discarded by means of the statement:

	DISCARD ( x );

A dummy definition for DISCARD to use with normal compilers needs to be given in order to maintain compilability with those compilers. For example, a complete definition of DISCARD might be:

	#ifdef __TenDRA__
	#pragma TenDRA keyword DISCARD for discard variable
	#else
	#define DISCARD(x) (( void ) 0 )
	#endif

Discarding a variable changes its assignment state to unset, so that any subsequent uses of the variable, without an intervening assignment to it, lead to a "variable used before being set" error. This feature can be exploited if the same variable is used for distinct purposes in different parts of its scope, by causing the variable analysis to treat the different uses separately. For example, in:

	void f ( void ) {
		int i = 0;
		while ( i++ < 10 ) {
			puts ( "hello" );
		}
		while ( i++ < 10 ) {
			puts ( "goodbye" );
		}
	}

which is intended to print both messages ten times, the two uses of i as a loop counter are independent - they could have been implemented with different variables. By discarding i after the first loop, the second loop can be analysed separately. In this way, the error of failing to reset i to 0 can be detected.
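
A minimal sketch of the corrected function, assuming the DISCARD keyword is defined as above, might be as follows. The name greet_twice and the counter printed are introduced only so that the result can be inspected.

```c
#include <stdio.h>

#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) (( void ) 0 )
#endif

/* Hypothetical example: prints each message ten times and returns
   the total number of lines printed. */
int greet_twice ( void )
{
	int printed = 0;
	int i = 0;
	while ( i++ < 10 ) {
		puts ( "hello" );
		printed++;
	}
	DISCARD ( i );	/* i now counts as unset for the analysis */
	i = 0;		/* omitting this reset is the error to be detected */
	while ( i++ < 10 ) {
		puts ( "goodbye" );
		printed++;
	}
	return ( printed );
}
```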

5.7.2 Setting variables

In addition to discarding variables, it is also possible to set them. In deliberately setting a variable, the programmer is telling the checker to assume that some value will always have been assigned to the variable by that point, so that any "variable used without being set" errors can be suppressed. This construct is particularly useful in programs with complex flow control, to help out the variable analysis. For example, in:

	void f ( int n )
	{
		int r;
		if ( n != 0 ) r = n;
		if ( n > 2 ) {
			printf ( "%d\n", r );
		}
	}

r is only used if n > 2, in which case we also have n != 0, so that r has already been initialised. However, in its flow analysis, the TenDRA C checker treats all the conditionals it meets as if they were independent and does not look for any such complex dependencies (indeed it is possible to think of examples where such analysis would be impossible). Instead, it needs the programmer to clarify the flow of the program by asserting that r will be set if the second condition is true.

Programmers may assert that the variable, r, is set either by means of a pragma:

	#pragma TenDRA set r;

or by using, for example:

	SET ( r );

where SET is a keyword which has previously been introduced to stand for the variable setting construct using:

	#pragma TenDRA keyword SET for set

(cf. DISCARD above).
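
For conventional compilers a dummy definition of SET is needed, as with DISCARD. The sketch below assumes, by analogy with DISCARD, that an expansion to ( void ) 0 is suitable; this definition is not given in the manual. The function value_if_large is a hypothetical variant of the example above which returns r rather than printing it.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword SET for set
#else
#define SET(x) (( void ) 0 )	/* assumed dummy definition */
#endif

/* Hypothetical example: returns n if n > 2 and 0 otherwise. */
int value_if_large ( int n )
{
	int r;
	if ( n != 0 ) r = n;
	if ( n > 2 ) {
		SET ( r );	/* n > 2 implies n != 0, so r has been assigned */
		return ( r );
	}
	return ( 0 );
}
```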

5.7.3 Exhaustive switch statements

A special case of a flow control construct which may be used to set the value of a variable is a switch statement. Consider the program:

	char *f ( int n ){
		char *r;
		switch ( n ) {
			case 1:r="one";break;
			case 2:r="two";break;
			case 3:r="three";break;
		}
		return ( r );
	}

This leads to an error indicating that r is used but not set, because it is not set if n lies outside the three cases in the switch statement. However, the programmer might know that f is only ever called with these three values, and hence that r is always set before it is used. This information could be expressed by asserting that r is set at the end of the switch construct (see above), but it would be better to express the cause of this setting rather than just its effect. The reason why r is always set is that the switch statement is exhaustive - there are case statements for all the possible values of n.

Programmers may assert that a switch statement is exhaustive by means of a pragma placed immediately after the controlling expression of the switch, before its body. For example, in the above case it would take the form:

	....
	switch ( n )
	#pragma TenDRA exhaustive
		{
			case 1:r="one";break;
			....

Again, there is an option to introduce a keyword, EXHAUSTIVE say, for exhaustive switch statements using:

	#pragma TenDRA keyword EXHAUSTIVE for exhaustive

Using this form, the example program becomes:

	switch ( n ) EXHAUSTIVE {
		case 1:r="one";break;

In order to maintain compatibility with existing compilers, a dummy definition for EXHAUSTIVE must be introduced for them to use. For example, a complete definition of EXHAUSTIVE might be:

	#ifdef __TenDRA__
	#pragma TenDRA keyword EXHAUSTIVE for exhaustive
	#else
	#define EXHAUSTIVE
	#endif
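
With this definition in place, the whole example might be written as in the sketch below. The function name name_of and the const qualifier are choices made here for illustration. Note that with a conventional compiler the assertion has no effect, so the function must still only be called with the values 1, 2 or 3.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#else
#define EXHAUSTIVE
#endif

/* Hypothetical example: the caller guarantees that n is 1, 2 or 3. */
const char *name_of ( int n )
{
	const char *r;
	switch ( n ) EXHAUSTIVE {
		case 1 : r = "one"; break;
		case 2 : r = "two"; break;
		case 3 : r = "three"; break;
	}
	return ( r );
}
```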

5.7.4 Non-returning functions

Consider a modified version of the program above, in which calls to f with an argument other than 1, 2 or 3 cause an error message to be printed:

	extern void error (const char*);
	char *f ( int n ) {
		char *r;
		switch ( n ) {
			case 1:r="one";break;
			case 2:r="two";break;
			case 3:r="three";break;
			default:error("Illegal value");
		}
		return ( r );
	}

This causes an error because, in the default case, r is not set before it is used. However, depending on the semantics of the function, error, the return statement may never be reached in this case. This is because the fact that a function returns void can mean one of two distinct things:

That the function does not return a value. This is the usual meaning of void.
That the function never returns, for example the library function, exit, uses void in this sense.

If error never returns, then the program above is correct; otherwise, an unset value of r may be returned.

Therefore, we need to be able to declare the fact that a function never returns. This is done by introducing a new type to stand for the non-returning meaning of void (some compilers use volatile void for this purpose). This is done by means of the pragma:

	#pragma TenDRA type VOID for bottom

to introduce a type VOID (although any identifier may be used) with this meaning. The declaration of error can then be expressed as:

	extern VOID error (const char *);

In order to maintain compatibility with existing compilers a definition of VOID needs to be supplied. For example:

	#ifdef __TenDRA__
	#pragma TenDRA type VOID for bottom
	#else
	typedef void VOID;
	#endif

The largest class of non-returning functions occurs in the various standard APIs - for example, exit and abort. The TenDRA descriptions of these APIs contain this information. The information that a function does not return is taken into account in all flow analysis contexts. For example, in:

	#include <stdlib.h>
	
	int f ( int n )
	{
		exit ( EXIT_FAILURE );
		return ( n );
	}

n is unused because the return statement is not reached (a fact that can also be determined by the unreachable code analysis in section 5.2).

5.8 Discard Analysis

A couple of examples of what might be termed "discard analysis" have already been described - discarded (unused) local variables and discarded (unused) assignments to local variables (see sections 5.6.4 and 5.6.5). The checker can perform three more types of discard analysis: discarded function returns, discarded computations and unused static variables and procedures. These three tests may be controlled as a group using:

	#pragma TenDRA discard analysis status

where status is on, warning or off.

In addition, each of the component tests may be switched on and off independently using pragmas of the form:

	#pragma TenDRA discard analysis (function return) status
	#pragma TenDRA discard analysis (value) status
	#pragma TenDRA discard analysis (static) status

There are also equivalent command line options to tchk of the form -X:test=state, where test can be discard_all, discard_func_ret, discard_value or unused_static, and state can be check, warn or dont. These checks are all switched off in the default mode.

Detailed descriptions of the individual checks follow in sections 5.8.1 - 5.8.3. Section 5.9 describes the facilities for fine-tuning the discard analysis.

5.8.1 Discarded function returns

Functions which return a value which is not used form the commonest instances of discarded values. For example, in:

	#include <stdio.h>
	int main ()
	{
		puts ( "hello" );
		return ( 0 );
	}

the function, puts, returns an int value, indicating whether an error has occurred, which is ignored.

5.8.2 Discarded computed values

A rarer instance of a discarded object, and one which is almost always an error, is where a value is computed but not used. For example, in:

	int f ( int n ) {
		int r = 4;
		if ( n == 3 ) {
			r == 5;
		}
		return ( r );
	}

the value r == 5 is computed but not used. This is actually because it is a misprint for r = 5.

5.8.3 Unused static variables and procedures

The final example of discarded values, which perhaps more properly belongs with the variable analysis tests mentioned above, is for static objects which are unused in the source module in which they are defined. Of course this means that they are unused in the entire program. Such objects can usually be removed.

5.9 Overriding the discard analysis

As with the variable analysis, certain constructs may be used to provide the checker with extra information about a program, to convey the programmer's intentions more clearly.

5.9.1 Discarding function returns and computed values

Unwanted function returns and, more rarely, discarded computed values, may be actively ignored to indicate to the discard analysis that the value is being discarded deliberately. This can be done using the traditional method of casting the value to void:

	( void ) puts ( "hello" );

or by introducing a keyword, IGNORE say, for discarding a value. This is done using a pragma of the form:

	#pragma TenDRA keyword IGNORE for discard value

The example discarded value then becomes:

	IGNORE puts ( "hello" );

Of course it is necessary to introduce a definition of IGNORE for conventional compilers in order to maintain compilability. A suitable definition might be:

	#ifdef __TenDRA__
	#pragma TenDRA keyword IGNORE for discard value
	#else
	#define IGNORE ( void )
	#endif
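
The following sketch contrasts a deliberately discarded return value with one that is checked. The function name greet_checked is hypothetical; the only library behaviour relied upon is that puts returns EOF on failure.

```c
#include <stdio.h>

#ifdef __TenDRA__
#pragma TenDRA keyword IGNORE for discard value
#else
#define IGNORE ( void )
#endif

/* Hypothetical example: returns 0 on success and -1 if the second
   message could not be written. */
int greet_checked ( void )
{
	IGNORE puts ( "hello" );	/* result deliberately discarded */
	if ( puts ( "goodbye" ) == EOF ) {	/* result used */
		return ( -1 );
	}
	return ( 0 );
}
```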

5.9.2 Preserving unused statics

Occasionally unused static values are introduced deliberately into programs. The fact that the static variables or procedures x, y and z are deliberately unused may be indicated by introducing the pragma:

	#pragma TenDRA suspend static x y z

at the outer level after the definition of all three objects.