Semantic Checking - C

Post by johnvoltai » Tue, 01 Feb 2005 03:14:04


Hello! We need help.

1. We are trying to improve the semantic analysis of a particular
compiler, and we need to identify the errors that compilers usually
can or cannot detect. Please take the time to review the following
list of errors we've gathered and check whether any or all of them
belong to the semantic checking routine during compilation/runtime. If
you know of a specific semantic error that we missed, we would
appreciate it if you could add the details.

2. The process we are planning is a static semantic check of C
programs, so that these kinds of semantic errors do not occur when the
program is executed. This process is something we could call a
diagnostic procedure: it walks through the source code (without the
need to run it) and checks whether any of the statements violate any
semantic rules.

From what we understand, static semantic checking refers to the
analysis of a program's expected meaning or flow before
compilation/execution, while dynamic semantic checking refers to the
analysis during execution. Could anyone confirm this?

Is there anyone here whom we could consult or with whom we could
discuss this subject?

Good day to all and thanks in advance!

Sincerely,
John Voltaire

Common semantic errors in C language:

1. Use of functions without function prototypes
2. Code with no effect (dead code)
3. Division by zero
4. Use of functions and variables which are defined but not used
5. Use of functions and variables with defined arguments that are never
used
6. Use of functions and variables that return either with or without
any assigned value
7. Use of functions and variables that return values that are never
used
8. Subscript out of bounds
9. Booleans that always evaluate true or evaluate false
10. Checking of infinite loops as well as loops that cannot be exited
or entered
11. Statements that cannot be reached during execution
12. No identifiers or variables are used twice in the same block or
scope
13. The number and types of arguments in a function call must be the
same as the number and types of the prototypes
14. A return statement must not have a return value unless it appears
in the function prototype that is declared to return a value
15. Break statements appear outside enclosing constructs where a break
statement may appear
16. Elements of enumerated types are repeated
17. Variable names appear in the same lexical scope
18. Labels are repeated

Post by nmm1 » Sat, 05 Feb 2005 12:39:40


Well, I could, but why should I do your work for you for no pay?
Being soft-hearted, I will do some of it.


As many compilers do, including gcc and most high-quality ones.


It is as good a set of definitions as any.


Almost certainly. But you should start by making it clear what basis
you are assuming - e.g. commercial consultancy, asking for free advice
or what.


Not errors, and the former is always syntactic, not semantic.


Those (and others) are all syntactic errors that the C standard
requires the compiler to diagnose statically.


Regards,
Nick Maclaren.


Post by torben » Sat, 05 Feb 2005 12:41:01

"johnvoltaire" <XXXX@XXXXX.COM> writes:



More or less. Static checking does not have to be before compilation
(as it can be done on the compiled code), but it is certainly before
execution. Dynamic checking is, indeed, at runtime.

Note that static checking can never be precise, as it is not
computable which parts of a program will ever be executed (so even
dead code analysis is approximate) or what values a variable can
have. So you must decide whether you want to err on the side of safety
(if the analysis is unsure, issue a warning or error message) or only
report errors that are certain to occur. A compromise may be to
report errors that will occur if the relevant bit of code is ever
executed.


It is statically checkable given the assumption that the code is
reachable.


There are two types of dead code: unreachable code and code with no
visible net effect. Unreachable code can be approximated such that
you will find some definitely unreachable code, but you cannot find
all unreachable code. Code with no visible effect can also be found
statically. Typically, it reduces to assignments to dead variables.
You must be careful that you don't remove assignments such as x=a[i],
as even if x isn't used, the lookup might trigger an error. And
silently removing erroneous code is bad, as the same program may later
fail if compiled with a compiler that does not remove this code.
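To make the two kinds concrete, here is a small, made-up C function (the name lookup_then_discard is hypothetical) containing both a dead store of the x=a[i] kind and a statement that can never be reached:

```c
/* Hypothetical example, not taken from any real code base. */
int lookup_then_discard(const int *a, int i)
{
    int x;

    x = a[i];   /* dead store: x is never read afterwards. Deleting this
                   statement also deletes the only access to a[i], so a
                   bad index that might otherwise crash here would go
                   unnoticed. */
    return 0;
    x = 1;      /* unreachable: follows an unconditional return */
}
```

A compiler may warn about both lines, but only the first one raises the question of whether removing it changes observable behaviour.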


You will rarely be able to find this statically. You may be able to
issue a warning that the division _may_ be by zero, but you will get
many false positives. But a static analysis might find cases where it
can see that the divisor is non-zero, so a dynamic check can be
removed. But with C, you might not want dynamic checks.
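One way to picture the three possible outcomes (certain error, possible error, proven safe) is the sketch below. It assumes an earlier pass has already bounded the divisor to an interval [lo, hi]; the interval type, the verdict names and classify_divisor are all hypothetical, not part of any particular compiler:

```c
/* Hypothetical abstract value: the analysis only knows that the divisor
   lies somewhere in the closed interval [lo, hi]. */
typedef struct {
    long lo, hi;
} interval;

typedef enum { DIV_SAFE, DIV_MAYBE_ZERO, DIV_ALWAYS_ZERO } div_verdict;

div_verdict classify_divisor(interval d)
{
    if (d.lo == 0 && d.hi == 0)
        return DIV_ALWAYS_ZERO;  /* an error if this code is ever reached */
    if (d.lo <= 0 && d.hi >= 0)
        return DIV_MAYBE_ZERO;   /* warn, or keep the runtime check */
    return DIV_SAFE;             /* zero excluded: runtime check can go */
}
```

Only DIV_SAFE allows a dynamic check to be dropped; whether DIV_MAYBE_ZERO becomes a warning is the safety-versus-noise trade-off described above.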


Also statically checkable, though again, you cannot be precise.


Same as above.


Many compilers warn about this. It reduces to checking if there is a
return statement on every path from the entry point to the exit point.
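A minimal sketch of that path check, assuming a hypothetical and heavily simplified statement tree (only return, if/else, blocks and "anything else"; loops, goto, switch and calls to functions that never return are deliberately left out):

```c
#include <stddef.h>

/* Hypothetical statement tree: just enough structure to ask whether
   every path through a function body ends in a return. */
typedef enum { S_RETURN, S_IF, S_BLOCK, S_OTHER } stmt_kind;

typedef struct stmt {
    stmt_kind kind;
    const struct stmt *then_s, *else_s;  /* S_IF: else_s may be NULL */
    const struct stmt **body;            /* S_BLOCK: statements in order */
    size_t nbody;
} stmt;

/* Returns 1 if control can never fall off the end of s. */
int always_returns(const stmt *s)
{
    size_t i;

    switch (s->kind) {
    case S_RETURN:
        return 1;
    case S_IF:
        /* both branches must return; a missing else falls through */
        return s->else_s != NULL
            && always_returns(s->then_s)
            && always_returns(s->else_s);
    case S_BLOCK:
        /* a block returns if some statement in it always returns */
        for (i = 0; i < s->nbody; i++)
            if (always_returns(s->body[i]))
                return 1;
        return 0;
    default:
        return 0;
    }
}
```

A function declared to return a value whose body fails this test is a candidate for the "missing return" warning.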


It is visible from the type of a function whether it returns a value,
so this is easy. However, it is common in C to ignore returned values
from functions. Many standard functions return values that are
ignored more often than not.


As with division by zero, you can find cases where you can see that
this isn't possible and so avoid a dynamic check. However, there will
be many cases where no out-of-bounds can occur but the static analysis
will fail to recognize this, so writing error messages for unsure
cases will generate a lot of false positives. The problem with many
false positives is that programmers will tend to ignore these
warnings, even when they are not false positives (a case of "the boy
who cried wolf").
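Assuming an earlier pass has bounded the index of an access a[i] to a range [lo, hi] (for instance from a loop header like for (i = 0; i < n; i++) with a known n), the final test is simple. The function below is a hypothetical sketch; a 0 result means "not proven", not "definitely wrong", which is where the false positives come from:

```c
#include <stddef.h>

/* Hypothetical check: every index in [lo, hi] is used to access an array
   of len elements. Returns 1 if all such accesses are provably in
   bounds, so a runtime check could be omitted, and 0 otherwise. */
int index_range_in_bounds(long lo, long hi, size_t len)
{
    if (lo > hi)
        return 1;                /* empty range: no access happens */
    return lo >= 0 && (size_t)hi < len;
}
```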


According to the C specification, they always will, as any integer has
an interpretation as a boolean (0 == false, nonzero == true). You can
analyse whether you get only 0 or 1 as results, but you will find that
a lot of programmers use "if (x)" to mean "if (x!=0)".
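For reference, the C rules in question: any scalar works as a condition, and the relational, equality and logical operators always yield an int that is 0 or 1. The two hypothetical helpers below compute the same thing:

```c
/* Hypothetical helpers: both return 1 for any nonzero x and 0 for 0. */
int truth_via_if(int x)
{
    if (x)          /* means exactly if (x != 0) */
        return 1;
    return 0;
}

int truth_via_ops(int x)
{
    return !!x;     /* ! yields int 0 or 1, so !!x equals (x != 0) */
}
```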


The first part is the halting problem and hence undecidable. Also,
some programs deliberately have infinite loops that are meant to be
broken by interrupts. The second part is reachability similar to the
problems of unreachable code and whether there is a return on all
paths to the exit point of a function.


See above.


I assume you mean "declared" rather than "used". This is easily
checked.
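A sketch of the usual approach, assuming a hypothetical fixed-size table per lexical scope: each declaration is looked up only in the current scope, so shadowing an outer name is accepted and only same-scope duplicates are reported. C keeps separate name spaces (labels, tags, members, ordinary identifiers), so a real checker would keep one such table per name space, and it would also have to allow the redeclarations C permits, such as repeated extern declarations:

```c
#include <string.h>

#define MAX_NAMES 64  /* hypothetical fixed capacity, for brevity */

/* One lexical scope: the names declared directly in it. */
typedef struct {
    const char *names[MAX_NAMES];
    int count;
} scope;

/* Returns 0 on success, -1 if name is already declared in this scope,
   -2 if the table is full. */
int declare(scope *s, const char *name)
{
    int i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return -1;           /* duplicate in the same scope */
    if (s->count == MAX_NAMES)
        return -2;
    s->names[s->count++] = name;
    return 0;
}
```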


A type issue, again easily checked.

Post by Jeremy Wri » Sat, 05 Feb 2005 12:42:05

> 1. We are trying to improve the semantic analysis of a particular

19. Variables that are used but not defined (on all paths to this
point).
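This is the classic definite-assignment analysis. A minimal sketch over a hypothetical mini-IR (assignments, uses, if/else and sequencing; variables numbered 0 to 31 and kept in a bitmask), where a variable counts as assigned after an if only if both branches assign it. Conditions are not evaluated, so it may also complain about paths that can never actually run:

```c
#include <stddef.h>

/* Hypothetical mini-IR node. */
typedef enum { N_ASSIGN, N_USE, N_IF, N_SEQ } node_kind;

typedef struct node {
    node_kind kind;
    int var;                   /* N_ASSIGN, N_USE: variable number 0..31 */
    const struct node *a, *b;  /* N_IF: then/else (else may be NULL);
                                  N_SEQ: first/second */
} node;

/* Walks n given the set of variables definitely assigned on entry,
   counts uses of variables not assigned on every path, and returns the
   set definitely assigned on exit. */
unsigned long check_defs(const node *n, unsigned long in, int *errors)
{
    unsigned long t, e;

    if (n == NULL)
        return in;             /* empty statement: nothing changes */
    switch (n->kind) {
    case N_ASSIGN:
        return in | (1UL << n->var);
    case N_USE:
        if (!(in & (1UL << n->var)))
            ++*errors;         /* maybe used before being assigned */
        return in;
    case N_IF:
        t = check_defs(n->a, in, errors);
        e = check_defs(n->b, in, errors);
        return t & e;          /* assigned only if assigned on both paths */
    case N_SEQ:
        return check_defs(n->b, check_defs(n->a, in, errors), errors);
    }
    return in;
}
```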

Post by jacob navi » Sat, 05 Feb 2005 12:42:27

Here are some hints from the lcc-win32 compiler:



lcc-win32 reports this as a warning.


lcc-win32 reports an assignment that is not used further
in the program. Other dead code could be reported if more
was done in this direction.


This is detected at compile time when the division is made
by two compile-time constants. Code could be generated at
run time to check division.
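A sketch of what such generated code could amount to, written as an ordinary C function (checked_div is a hypothetical name; this is not lcc-win32's actual output). It also rejects INT_MIN / -1, the other int division whose behaviour C leaves undefined:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime check for a / b when the divisor could not be
   proven safe at compile time. */
int checked_div(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        fprintf(stderr, "checked_div: invalid division %d / %d\n", a, b);
        abort();
    }
    return a / b;    /* C99: the quotient is truncated toward zero */
}
```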


lcc-win32 reports this when the function/variable is static.
For non-static variables, this is done by the lcc-win32's linker
lcclnk.

This is reported and easy to detect.


A warning is issued when a function declared as returning a value
doesn't return one.


This is not reported but could be done. Enforcing it would be very
difficult, though, since ignoring the result is common practice:

printf("hello\n");

doesn't use the return value of printf but it is valid code.
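A widespread convention for marking a deliberately ignored result is a cast to void, which lint-style checkers generally accept as "intentionally discarded" (though not every compiler warning honours it). A hypothetical example:

```c
#include <stdio.h>

/* Hypothetical function: printf returns the number of characters
   written, or a negative value on error. */
int greet(void)
{
    (void)printf("hello\n");          /* result explicitly discarded */
    return printf("hello again\n");   /* result used: passed to caller */
}
```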


I have been arguing in this group (and in the standards discussion
group) about the necessity of doing this. The problem is that
you would need a new kind of pointer (bounded pointers) that
would carry size information.
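Short of a language change, the idea can be approximated in library code. A minimal sketch, assuming a hypothetical struct that pairs a base pointer with an element count plus a checked accessor; it is not lcc-win32's design or any proposed extension:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "bounded pointer": a base address plus the number of
   elements it may address. */
typedef struct {
    int *base;
    size_t len;
} bounded_int_ptr;

/* Checked read: aborts instead of reading outside [0, len). */
int bp_get(bounded_int_ptr p, size_t i)
{
    if (i >= p.len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, p.len);
        abort();
    }
    return p.base[i];
}
```

The cost is visible here: every pointer doubles in size and every access pays for a comparison, which is why a compiler would want static analysis to remove the checks it can prove redundant.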


lcc-win32 forces this. Any boolean is converted to 1 or 0.


lcc-win32 checks this.

lcc-win32 checks this.


lcc-win32 will warn you that a higher scope variable is shadowed
if you request this type of warnings.


This is checked.


The same as item 6.


This is a syntax error.


This is an error. It stops compilation.


I suppose that you mean identical variable names. This is an error
(redefinition).

The same error as 15: redefinition, syntax error. Compilation stops.

Feel free to write me if you want further details.

lcc-win32: http://www.yqcomputer.com/ . *** ia.edu/~lcc-win32

Post by Neal Wan » Sat, 05 Feb 2005 12:43:47


Static semantic checking is in the domain of static analysis. Yes, static
analysis checks a program's integrity without actually executing it.
Search for "program static analysis", "static analysis" or "abstract
interpretation" and you will find more detailed explanations.


The following classifies all the errors you listed.

A. Can be determined statically:
1,2,4,5,6,7,11,12,13,14,15,16,17,18
B. Can be partially determined statically; a runtime check is
necessary to guarantee freedom from errors:
3,8,9
C. In general, cannot be detected statically:
10

Actually, GCC can detect most of the errors in class A.

Neal

Post by Tommy Thor » Sat, 05 Feb 2005 12:44:51

There is only so much heuristics can do. Better results can be had when
a bit of user annotation is added. Many projects do this; Linus'
sparse checker is one such example.

It's quite common to write general macros which, when instantiated,
result in unused variables, unreached code, lots of constant
expressions, etc., and without richer annotation you are going to get
a lot of false positives, turning the warnings into a nuisance rather
than a help (just like gcc's moronic complaints about "a && b || c").
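As an illustration (the macro and function names are hypothetical), a debug macro that is compiled out by setting a level to 0 produces a constant-false condition at every use, and the common do { ... } while (0) wrapper is itself a constant loop condition, so a naive "always false" check would fire on perfectly deliberate code:

```c
/* Hypothetical debug-counting macro. With DEBUG_LEVEL set to 0, every
   expansion contains "if (0 > 0)", a condition that is always false. */
#define DEBUG_LEVEL 0
#define DEBUG_COUNT(n) do { if (DEBUG_LEVEL > 0) (n)++; } while (0)

int count_calls(int x)
{
    int debug_hits = 0;

    DEBUG_COUNT(debug_hits);   /* body can never execute at this level */
    return x + debug_hits;     /* always x while DEBUG_LEVEL is 0 */
}
```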


A subset of them are considered static (syntax or type) errors and are
already caught by most compilers:

> 13. The number and types of arguments in a function call must be the
> same as the number and types of the prototypes
> 14. A return statement must not have a return value unless it appears
> in the function prototype that is declared to return a value
> 15. Break statements appear outside enclosing constructs where a break
> statement may appear
> 16. Elements of enumerated types are repeated
> 17. Variable names appear in the same lexical scope
> 18. Labels are repeated


These can almost never be detected statically:

> 8. Subscript out of bounds


These are not always errors (think parameterizable programs). Some are
even very common (5 & 7):

> 2. Code with no effect (dead code)
> 4. Use of functions and variables which are defined but not used
> 5. Use of functions and variables with defined arguments that are
> never used
> 6. Use of functions and variables that return either with or without
> any assigned value
> 7. Use of functions and variables that return values that are never
> used
> 9. Booleans that always evaluate true or evaluate false
> 11. Statements that cannot be reached during execution
> 12. No identifiers or variables are used twice in the same block or
> scope

This is not an error, except maybe in first-year students' programs
(the "cannot be entered" case is an instance of unreachable code, cf. 2):

> 10. Checking of infinite loops as well as loops that cannot be exited
> or entered

Tommy

Post by hanna » Sun, 13 Feb 2005 12:20:32

Hello!





Many architectures trap division by zero in hardware anyway, so unless
you want a special kind of handling when it happens, compilers usually
don't need to include instructions for checking the divisor before the
division (attempt).


Kind regards,

Hannah.