Joanju - Programming Prolint by Example: "noeffect"

Yes, Prolint is open source, and yes, Jurjen is very smart, but no, adding new rules to Prolint is not rocket science!

Now that Prolint has been around for a while, we are starting to see various categories of Prolint rules emerging. Some find potential performance problems, some find problems in the code which will make translation (i18n) more difficult, some find problems which would make it more difficult to use dataservers, and some find problems which might just downright be a bug.

We'll have a look here at a rule which falls into the bug-finding category: "noeffect". Try the following code:

def var x as int init 1.
x eq 12.
display x.

In case you are wondering: yes, this sort of bug actually has been found in production code.

The statement "x eq 12." is actually just an expression. It compares x to 12, evaluates to "false", and that's it. It doesn't do anything. It has no effect.

The parser makes it easy to find statements which are just expressions. Every statement which becomes a branch in the syntax tree generated by the parser must have a "head" node. For a statement like:
DISPLAY "hello".
the DISPLAY node is the head node, and all other tokens in the display statement are child or grandchild nodes to the DISPLAY node.

For a statement which is just an expression, Proparse adds a "synthetic" node into the tree, just for tree organization. The synthetic node is an "Expr_statement" node, and it becomes the head node for the statement.

Finding all Expr_statement nodes gives us a starting point. Let's start looking at the code, starting with some comments about the comments:

/* ------------------------------------------------------------------------
   file    :  prolint/rules/noeffect.p
   by      :  John Green
   purpose :  Find statements which are nothing but expressions.
              Of those, evaluate which /cannot/ have an effect.
              User-defined functions might have an effect,
              and methods might have an effect.
              Of the built-in functions, I am only aware of the following
              having any effect:
              DYNAMIC-FUNCTION
              ETIME
              SETUSERID
              SUPER
              CURRENT-LANGUAGE
    -----------------------------------------------------------------

    Copyright (C) 2001,2002 John Green

    This file is part of Prolint.

    Prolint is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
    version 2.1 of the License, or (at your option) any later version.

    Prolint is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with Prolint; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
   ------------------------------------------------------------------------
*/

This standard text needs to be included in the header, it is a requirement of LPGL. Prolint is open-source and GPL, under the "Lesser GPL", which means that it can be included in commercial products, but you must include the source and you just can't claim it to be your own. Or something like that. You should not think of this as a limitation of your rights, instead the LPGL provides the benefit that every improvement to Prolint will always be publicly available with source. (Proparse is not free though - due to that "pay the rent" thing.)

{prolint/ruleparams.i}

Here we included the basic configuration and setup required for each Prolint rule. We don't need to know much about what goes on in there to write a basic rule, except that it is the source of some of the variables which we will use below.

RUN searchNode {&insuper} 
    (hTopnode,                /* "Program_root" node */
     "InspectNode":U,         /* name of callback procedure */
     "Expr_statement":U).     /* list of statements to search, ?=all */

RETURN.

Here we ran Prolint's search routine, which is found in the Prolint super procedure, lintsuper.p.

The first parameter it requires is the starting point in the syntax tree generated by Proparse. In almost all cases, we just want to start at the very "root" of the tree - the topmost node. The variable hTopnode is of course made available to us by ruleparams.i and it is a pointer to the topmost node in the syntax tree. That's probably even more than we really need to know.

The procedure searchNode uses a callback mechanism. So, searchNode does the search, and with the results of that search, it runs what we tell it to run. In our case, we want it to run InspectNode, which is an internal procedure in this .p, which we'll describe below. That's the second parameter.

How does searchNode know when it should call InspectNode? When it finds an Expr_statement node within the syntax tree. That's the final parameter.

That's it for getting Prolint to find us all of the nodes that we might be interested in. Now let's look at where the interesting work is done.

PROCEDURE InspectNode:
  /* purpose: see if the statement in theNode is 
     really a statement without effect */
  DEFINE INPUT  PARAMETER theNode        AS INTEGER NO-UNDO.
  DEFINE OUTPUT PARAMETER AbortSearch    AS LOGICAL NO-UNDO INITIAL NO.
  DEFINE OUTPUT PARAMETER SearchChildren AS LOGICAL NO-UNDO INITIAL NO.

Every callback procedure for searchNode requires three parameters. The first is an integer handle for the node that it found. The integer handle was set up by searchNode, and that handle is meaningful to the parser. In our case here, theNode will be a handle to an Expr_statement node.

For this rule, we don't use AbortSearch or SearchChildren. We'll talk about those parameters in future articles about Prolint architecture and other Prolint rules.

Now let's get back to one of our goals: we want to find all Expr_statement nodes, but there are certain expression statements which really do have an effect, and we don't want Prolint to raise warnings about those.

  DEFINE VARIABLE qname       AS CHARACTER NO-UNDO INITIAL "rule_noeffect":U.
  DEFINE VARIABLE i           AS INTEGER NO-UNDO.
  DEFINE VARIABLE numResults  AS INTEGER NO-UNDO.
  DEFINE VARIABLE result      AS INTEGER NO-UNDO.

  result = parserGetHandle().

  check-block:
  DO:

    IF parserQueryCreate(theNode, qname, "USER_FUNC":U) > 0 THEN
       LEAVE check-block.
    IF parserQueryCreate(theNode, qname, "DYNAMICFUNCTION":U) > 0 THEN
       LEAVE check-block.
    IF parserQueryCreate(theNode, qname, "ETIME":U) > 0 THEN
       LEAVE check-block. 
    IF parserQueryCreate(theNode, qname, "SETUSERID":U) > 0 THEN
       LEAVE check-block.
    IF parserQueryCreate(theNode, qname, "SUPER":U) > 0 THEN
       LEAVE check-block.

(Here we have run into direct calls to some of Proparse's "parser" functions. If you would like to know more beyond the fact that Proparse function names all begin with "parser", please see www.joanju.com/whitepapers/, and look for the link to "Proparse API Quick Introduction".)

We query for specific types of nodes within the Expr_statement branch. Each time the query is built, it returns the number of results, and if there are any results, then we skip out of check-block without issuing a warning. As described in the comments at the top of the program, user defined functions, as well as some built-in functions, are actually functions that perform some action. Most built-in functions only evaluate something and return a value - they do not actually change the state of the application.

For a better understanding of the next code snippet, let's first have a look at two example statements:

  /* hEditor is a handle to an editor widget. 
     We want to switch the editor to read-only: */
  hEditor:READ-ONLY.

  /* hQuery is a handle to a dynamic query.
     We want it to fetch the first record: */
  hQuery:GET-FIRST().

The first statement has no effect, because READ-ONLY is not a method, it is only a property. We want Prolint to raise a "noeffect" warning for this statement because the programmer probably made a mistake. The second statement actually does have effect (because it fetches a record into the buffer) so we do not want to raise a warning. So how can we tell if something is a property and not a method? To answer this, it is necessary to use the Proparse Tokenlister tool and see how Proparse translates these statement into tokens:

    Expr_statement    
        Widget_ref    
            Field_ref    
                ID    hEditor
            OBJCOLON    :
            READONLY    READ-ONLY
        PERIOD    .

    Expr_statement    
        Widget_ref    
            Field_ref    
                ID    hQuery
            OBJCOLON    :
            ID    GET-FIRST
            Method_param_list    
                LEFTPAREN    (
                RIGHTPAREN    )
        PERIOD    .

The Tokenlister shows us that we can recognize a property because it contains an "OBJCOLON" token while it does not contain a "Method_param_list" token. Having learned this, let's continue with the Prolint rule:

    /* The last thing we check on is methods. 
       a statement like  handle:READ-ONLY has no effect, 
       a statement like  handle:GET-FIRST() does have effect,
       so look for OBJCOLON _not_ followed by node type "Method_param_list" */
    ASSIGN
      result = parserGetHandle()
      numResults = parserQueryCreate(theNode, qname, "OBJCOLON":U).
    DO i = 1 TO numResults:
      parserQueryGetResult(qname, i, result).
      /* from "widattr" in the tree spec:
       * (OBJCOLON . #(Array_subscript...)? #(Method_param_list...)? )+
       * First, move to the method or attribute name node - the . 
       * (i.e. any) token after the OBJCOLON.
       * Then, simply  check next sibling twice.
       * First time might be Array_subscript.
       */
      parserNodeNextSibling(result, result).
      IF parserNodeNextSibling(result, result) = "Method_param_list":U THEN
         LEAVE check-block.
      IF parserNodeNextSibling(result, result) = "Method_param_list":U THEN
         LEAVE check-block.
    END.

In the check made above, we are assuming that attributes do not perform actions, and methods do. For the most part, this assumption works quite well, and it is certainly close enough for the purpose of issuing warnings.

    /* If we got here, then the expression statement probably has no effect.
     * "Expr_statement" is a synthetic node, so it won't have a line
     * number. Instead, use the first non-synthetic node for PublishResult.
     */                              
    parserNodeFirstChild(theNode, result).  
    DO WHILE parserGetNodeLine(result) EQ 0 :
       parserNodeFirstChild(result,result).
    END.

We've played a little trick here to find a node with a valid filename and line number. Some nodes in the parser's tree are inserted for tree structure only, and don't represent text from any source file. That is the case with an Expr_statement node. We just look for the first descendant node which does represent text from an actual sourcefile.

    RUN PublishResult {&insuper} (compilationunit,
                                  parserGetNodeFilename(result),
                                  parserGetNodeLine(result),
                                  "Statement has no effect":T,
                                  rule_id).

  END.  /* check-block */

Here we call Prolint's PublishResult procedure. PublishResult is responsible for sending a message to the active output handler. Prolint has various output handlers, and each of those is responsible for reporting a violation found in the code. Some output handlers report out to a text file, some report to the UI, some are geared towards putting the error messages into your favourite editor, and yet another reports the errors to a database for later analysis.

The arguments expected by the PublishResult procedure are:

The name of the current compile unit (the program being linted)
The name of the source file where the violation was found (may be an include file)
The line number within the source file where the violation was The error message to display
This rule's identifier - in this case, it is "noeffect"

The variables compileationunit and rule_id are defined and initialized via ruleparams.i.

  
  parserReleaseHandle(result).
  parserQueryClear(qname).

END PROCEDURE. /* InspectNode */

That's it! We tell the parser to release a couple of resources that we've grabbed, and we're done.

This rule had a few complications in it, and it actually requires more code than some of the simpler Prolint rules. However, I hope that it was still a good example for demonstrating the basic steps that are done in most Prolint rules:

Include ruleparams.i
Call searchNode with a node type to find and the name of a callback procedure
Define the callback procedure
The callback procedure may or may not perform further checks on the node that was found
For nodes that are found to have actually violated some rule, call the PublishResult procedure

Article by John Green, with contributions from Jurjen Dijkstra, corrections and suggestions from Judy Hoffman Green.

Programming Prolint by Example: noeffect