Using The API

Proparse has a fairly simple interface, one that is most easily learned by example. Have a look at proparse/api/tokenlister.p as a simple example of the API. It is also a good example of using a recursive internal procedure to walk the syntax tree. Keep that file open while you read the rest of this section, which fills in a few details.

Also have a look at examples/query_length.p which was described in Starting With Something Practical. It is a practical example of using Proparse's named queries to find occurrences of something specific in your source code.

Important Usage Notes

The DLL is intended to be loaded once, and left in memory for the duration of your Progress session. If you load it and unload it, you will find that the memory used by the DLL has not been released. Repeatedly loading and unloading the DLL will cause your Progress session to use more and more memory.

However, if you leave the DLL in memory, you will find that no matter how many times you make function calls into it, its memory usage will remain constant.

The sample programs load, but never release the DLL.

Also, you should remember to APPLY "CLOSE" to the handle that you use for proparse.p, so that it has a chance to clean up MEMPTRs etc. Don't just DELETE it.

The 4GL part of the API

A 4GL API is provided through two files: proparse/api/proparse.p and proparse/api/proparse.i. To use this API, run proparse/api/proparse.p persistently, store its handle in a named procedure handle, and then pass the name of that handle to proparse/api/proparse.i. We will refer to that procedure handle as the parser handle. Use APPLY "CLOSE" to close the persistent parser procedure when you are done with it.

Proparse.i provides the forward-declarations necessary to use the function calls defined in proparse.p. We provide a 4GL function definition for each of the DLL functions, which would normally have to be called with a RUN statement. We do this because the function definitions can make for some slightly more elegant code.

The functions are fairly consistent in their names and parameters. All parser function names begin with "parser", to help prevent name clashes with other function libraries. Many have input parameters like:

Configuration Settings

There are various configuration settings which are used for configuring Proparse's behavior. Some of these settings are simply configuration flags, like whether or not the parser has been initialized.

Some of these settings might be important in order for Proparse to be able to properly parse your source code. Specifically, some of these configuration settings determine how Proparse interprets functions within &IF preprocessor conditions, such as OPSYS, PROPATH, and KEYWORD-ALL.

Many of these settings are configured automatically by proparse.p, which does its job by looking directly at your Progress environment settings and then configuring Proparse accordingly. Some of these settings, though, need to be configured manually. The configuration for the preprocessor KEYWORD-ALL function is an example of one that must be configured manually.

The functions parserConfigGet and parserConfigSet are used for getting and setting Proparse's configurations.

See Configuration Settings Reference for details about the various settings.
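
As a rough sketch of how the two functions fit together, the following reads a setting, changes it, and puts the old value back afterwards. The flag name "show-proparse-directives" is described later in this section. The exact signatures are assumptions here: we assume parserConfigGet takes the setting name and returns its value as CHARACTER, and that parserConfigSet takes the setting name and the new value.

```abl
DEFINE VARIABLE parserHandle AS HANDLE    NO-UNDO.
DEFINE VARIABLE oldValue     AS CHARACTER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* Assumption: parserConfigGet(name) returns the current value as CHARACTER. */
oldValue = parserConfigGet("show-proparse-directives").

parserConfigSet("show-proparse-directives", "true").

/* ... parse and examine your programs here ... */

/* Put the previous value back when done. */
parserConfigSet("show-proparse-directives", oldValue).

APPLY "CLOSE" TO parserHandle.
```

Restoring the old value is a cautious choice on our part: because the DLL stays loaded, a changed setting may well outlive the program that changed it.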

Database Alias Names

If any of your programs use database alias names, then you must tell Proparse about those - otherwise there will be parse errors.

The following small program tells Proparse that "aliasname" is an alias for the database "dbname". Because the Proparse DLL is not unloaded after each use, the alias remains in effect (within Proparse) for the duration of your Progress session.

DEFINE VARIABLE parserHandle AS HANDLE NO-UNDO.
RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}
parserSchemaAliasCreate("aliasname", "dbname").
APPLY "CLOSE" TO parserHandle.

Also see: parserSchemaAliasCreate and parserSchemaAliasDelete.

Parsing a program

Use parserParse with a CHARACTER filename INPUT parameter to parse a program. Watch its LOGICAL return value. If it is FALSE, then an error has occurred.

Node Handles

The parser API uses integers as references to pointers to nodes. All this means is that for you to define a node handle, you simply define an INTEGER variable. With your INTEGER defined, you use it to store the return value of the parserGetHandle function, which takes no arguments. You now have an integer number which has been handed to you by the DLL, which is to be used as a reference to a pointer (stored in the DLL) to a node.

Once you have created your node handle, you can use it as the INPUT parameter to other functions which require a node handle. One such function is parserNodeTop, which stores a handle to the topmost node in the node handle that you provide. The topmost node is always a special node of type Program_root.

There are functions for getting attribute values from your node. Those functions take your node handle as the single INPUT parameter, and then return the attributes. Examples of functions for getting node attributes are parserGetNodeType and parserGetNodeLine.

There are functions for getting other nodes, such as parserNodeFirstChild, parserNodeParent, parserNodeNextSibling, and parserNodePrevSibling. Those functions require two INPUT parameters. The first is the "where from" node handle. The second is the "store a pointer to the resulting node into this node handle" node handle. You can use the same node handle for both parameters, with the effect of changing your node handle to point from one node to the next. The function parserNodeStateHead is similar, but it finds the head node of the enclosing statement or block.
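
The following is a minimal sketch of the steps just described: parse a file, get the topmost node, and step through its immediate children. The file name "myprogram.p" is a placeholder. It also assumes that the navigation functions return the node type of the node they found as CHARACTER, or an empty string when there is no such node; check proparse.i if your version behaves differently.

```abl
DEFINE VARIABLE parserHandle AS HANDLE    NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER   NO-UNDO.
DEFINE VARIABLE childNode    AS INTEGER   NO-UNDO.
DEFINE VARIABLE nodeType     AS CHARACTER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF NOT parserParse("myprogram.p") THEN DO:
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.
  APPLY "CLOSE" TO parserHandle.
  RETURN.
END.

/* A node handle is just an INTEGER handed out by the DLL. */
ASSIGN
  topNode   = parserGetHandle()
  childNode = parserGetHandle().

parserNodeTop(topNode).  /* topNode now refers to the Program_root node */

/* Assumption: returns the node type, or "" if there is no such node. */
nodeType = parserNodeFirstChild(topNode, childNode).
DO WHILE nodeType <> "":
  MESSAGE parserGetNodeType(childNode) "at line" parserGetNodeLine(childNode).
  /* Same handle for both parameters: move on to the next sibling. */
  nodeType = parserNodeNextSibling(childNode, childNode).
END.

APPLY "CLOSE" TO parserHandle.
```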

Releasing Node Handles

If you are recursively or iteratively creating node handles, you should release them as you finish with them. Use the parserReleaseHandle function to release your node handle.

By releasing a node handle, all you are doing is telling the DLL that it can now re-use that node pointer. If, for example, you release node number 12, it is possible that the next time you use the function parserGetHandle, you will be given the number 12 again. There is no practical limit to the number of node handles that you can keep.

All nodes and node handles are cleared away each time you parse a new program (i.e.: each time you use the parserParse function).
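
Here is a sketch of a recursive walk that releases each handle as soon as it is finished with it. The procedure name walkNode and the file name "myprogram.p" are made up for illustration. As an assumption, the navigation functions are taken to return the node type as CHARACTER, or an empty string when no node is found.

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF parserParse("myprogram.p") THEN DO:
  topNode = parserGetHandle().
  parserNodeTop(topNode).
  RUN walkNode (topNode, 0).
  parserReleaseHandle(topNode).
END.
ELSE
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.

APPLY "CLOSE" TO parserHandle.

PROCEDURE walkNode:
  DEFINE INPUT PARAMETER theNode AS INTEGER NO-UNDO.
  DEFINE INPUT PARAMETER depth   AS INTEGER NO-UNDO.

  DEFINE VARIABLE child    AS INTEGER   NO-UNDO.
  DEFINE VARIABLE nodeType AS CHARACTER NO-UNDO.

  MESSAGE FILL(". ", depth) + parserGetNodeType(theNode).

  /* One handle per recursion level, re-used for every child. */
  child = parserGetHandle().
  nodeType = parserNodeFirstChild(theNode, child).
  DO WHILE nodeType <> "":
    RUN walkNode (child, depth + 1).
    nodeType = parserNodeNextSibling(child, child).
  END.

  /* Finished with this level: let the DLL re-use the handle. */
  parserReleaseHandle(child).
END PROCEDURE.
```

With this pattern, the number of handles in use at any moment is roughly the depth of the tree, not the number of nodes in it.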

Errors and Return Values

Functions for Proparse Error Handling

Proparse.p, as provided, does not display error messages or do any sort of error handling. That is for the sake of keeping it small and efficient, and also because different uses of the API will demand different kinds of error handling.

For the most part, it should be sufficient to check for error conditions in two places in your parser-based programs: the return value of parserParse, and the status returned by parserErrorGetStatus.

For example:
IF NOT parserParse(filename) THEN DO:
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.
  RETURN.
END.
/* ... and ... */
IF parserErrorGetStatus() < 0 THEN DO:
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.
  RETURN.
END.

The function parserErrorGetStatus returns -1 if a warning exists, and -2 if an error exists. Use parserErrorGetText to retrieve the error or warning text. Note that the error status remains in effect, and the error (or warning) text is available, until parserErrorClear is called, or until parserParse is called again.

Actual DLL Return Values

If you have looked into the code in proparse.p, then you may have noticed that, while the 4GL functions often return LOGICAL, many of the DLL calls are actually returning an integer.

For the DLL functions, where the return value does not have a conflicting meaning:

However, the parser*() functions defined in proparse.p just return a LOGICAL to keep things simple.

Queries

To make it possible to find specific nodes within your syntax tree without having to write recursive functions, Proparse has an API for named queries. Because the queries are named, you can create an arbitrary number of queries. The following functions are used for working with Proparse queries: parserQueryClear, parserQueryCreate, and parserQueryGetResult.

The program examples/query_length.p is a straightforward example which uses queries. You work with Proparse queries a little like the way that you work with database queries. First you create a query, and then you work with the result set. In the case of Proparse, your query finds nodes of the node type that you specify. Normally, your query would start at the topmost node (i.e. the node found with parserNodeTop), but sometimes your queries may start at other nodes. Perhaps your new query will start at a node which was found via a previous query.

The return value of function parserQueryCreate is an integer: the number of nodes in the query result set. This is the key to working with the result set. You store the result of parserQueryCreate in an INTEGER variable; we'll use a variable named numResults for discussion purposes here. Once you have the number of results, you can simply loop for, say, yourCounter = 1 TO numResults. You pass yourCounter to parserQueryGetResult to fetch your results one at a time.

parserQueryGetResult also requires a valid node handle as a parameter, and after parserQueryGetResult has been called, you use that node handle to reference the node which was found as part of the query result set.

It is not normally necessary to use the parserQueryClear function to clear out the result set when you are done with it. Each time you call the parserParse function, any old queries get cleared out. However, if you are creating many queries in a loop, for a single parse, then you might want to clear out queries when you are done with them.
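
Put together, a query might look something like the sketch below. The query name "defines" and the file name "myprogram.p" are made up for illustration. The parameter orders shown are assumptions (start node, query name, node type for parserQueryCreate; query name, result number, node handle for parserQueryGetResult), so check them against proparse.i before relying on them.

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE resultNode   AS INTEGER NO-UNDO.
DEFINE VARIABLE numResults   AS INTEGER NO-UNDO.
DEFINE VARIABLE yourCounter  AS INTEGER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF NOT parserParse("myprogram.p") THEN DO:
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.
  APPLY "CLOSE" TO parserHandle.
  RETURN.
END.

ASSIGN
  topNode    = parserGetHandle()
  resultNode = parserGetHandle().
parserNodeTop(topNode).

/* Assumed parameter order: start node, query name, node type. */
numResults = parserQueryCreate(topNode, "defines", "DEFINE").

DO yourCounter = 1 TO numResults:
  /* Assumed parameter order: query name, result number, node handle. */
  parserQueryGetResult("defines", yourCounter, resultNode).
  MESSAGE "DEFINE found at line" parserGetNodeLine(resultNode).
END.

/* Optional here, since the next parserParse clears old queries anyway.
   Assumption: parserQueryClear takes the query name. */
parserQueryClear("defines").

APPLY "CLOSE" TO parserHandle.
```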

Unfiltered Queries

You can pass an empty string "" as the node type to parserQueryCreate to put all nodes into the result set. This allows you to view part of the tree, or all of the tree, as a flat set of nodes. Instead of writing a recursive program to walk through the tree structure, you can use a simple loop to visit each node. In the 4GL, simple loops can be much faster than recursive functions. Also, when the tree is flattened, operator nodes are placed in between their operands, so this feature is especially useful for printing out code.

Note however that with this approach, you lose the benefit of the structure of the tree. Some parsing applications are much better served with a tree structure than with a flat vector of nodes.
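
A flat walk over every node could be sketched as follows. The query name "allnodes" and the file name "myprogram.p" are placeholders, and the parameter orders for parserQueryCreate and parserQueryGetResult are assumed (start node, query name, node type; and query name, result number, node handle).

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE theNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE numResults   AS INTEGER NO-UNDO.
DEFINE VARIABLE i            AS INTEGER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF parserParse("myprogram.p") THEN DO:
  ASSIGN
    topNode = parserGetHandle()
    theNode = parserGetHandle().
  parserNodeTop(topNode).

  /* An empty node type puts every node below topNode into the result set. */
  numResults = parserQueryCreate(topNode, "allnodes", "").

  /* One simple loop, no recursion. */
  DO i = 1 TO numResults:
    parserQueryGetResult("allnodes", i, theNode).
    MESSAGE parserGetNodeLine(theNode) parserGetNodeType(theNode).
  END.
END.
ELSE
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.

APPLY "CLOSE" TO parserHandle.
```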

Queries and the Scanner

Queries work the same for investigating a scan result set (token/symbol list) as they do for investigating a parse result (syntax tree). There is an additional query option first_where_line= for working with scan results. See the Scanner subsection Using Queries with the Scanner for a description.

Hidden Tokens

Normally a parser discards tokens which are not necessary for evaluating the meaning behind the code being parsed. Obvious examples of unneeded tokens are whitespace and comments.

Proparse works a little differently than most parsers. Proparse is designed so that you can work with those tokens within the syntax tree. Proparse preserves whitespace (WS) tokens, COMMENT tokens, as well as the following: AMPMESSAGE, AMPANALYZESUSPEND, AMPANALYZERESUME, AMPGLOBALDEFINE, AMPSCOPEDDEFINE, AMPUNDEFINE.

Unlike regular nodes, Proparse only allows you to work with one hidden token at a time. There are no handles to work with. Because of this, the functions have fewer parameters and are a little simpler to work with than the functions for regular nodes.

See the example program examples/codeprint1a.p for an example of using the hidden token functions for retrieving whitespace. For each node that it displays, it also displays the whitespace tokens (if any) which come immediately before it.

The function parserHiddenGetBefore finds the hidden token, if any, which comes immediately prior to the node referred to by the input node handle. It returns TRUE if a hidden token is found.

The function parserHiddenGetFirst finds the first of the hidden tokens, if any, which come before the node referred to by the input node handle. For example, if we have a node with three hidden tokens in front of it (say, whitespace, then a comment, then more whitespace):
[Figure: hiddenFirst diagram]
then parserHiddenGetFirst finds the hidden token containing the first whitespace, and makes that the "current hidden token".

The function parserHiddenGetNext would then find the comment. Using parserHiddenGetNext again would find the second whitespace, and a third call to parserHiddenGetNext would return FALSE, because there would no longer be any hidden token available. The function parserHiddenGetPrevious goes in the opposite direction.

If a hidden token is available, then you can use the function parserHiddenGetType to get the current hidden token's type. The function parserHiddenGetText returns the current hidden token's text. In the case of "WS" tokens, this will be any number of contiguous space, tab, newline, and carriage return characters. The function parserHiddenGetFilename returns the name of the source file where the current hidden token's text came from. The function parserHiddenGetLine returns the line number within the source file where the current hidden token's text came from.
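
As a small sketch, the following lists the hidden tokens in front of the first node below the topmost node, in source order. The file name "myprogram.p" is a placeholder. We assume here that parserHiddenGetFirst, like parserHiddenGetBefore, takes the node handle and returns a LOGICAL, that the other hidden token functions take no parameters, and that parserNodeFirstChild returns an empty string when there is no child.

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE theNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE hasHidden    AS LOGICAL NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF parserParse("myprogram.p") THEN DO:
  ASSIGN
    topNode = parserGetHandle()
    theNode = parserGetHandle().
  parserNodeTop(topNode).

  /* Assumption: "" is returned when there is no first child. */
  IF parserNodeFirstChild(topNode, theNode) <> "" THEN DO:
    /* Start at the first hidden token in front of the node ... */
    hasHidden = parserHiddenGetFirst(theNode).
    DO WHILE hasHidden:
      MESSAGE
        parserHiddenGetType()
        "at line" parserHiddenGetLine()
        "length" LENGTH(parserHiddenGetText()).
      /* ... and step forward until there are no more. */
      hasHidden = parserHiddenGetNext().
    END.
  END.
END.
ELSE
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.

APPLY "CLOSE" TO parserHandle.
```

The text length is shown rather than the text itself, because whitespace tokens would otherwise print as blank lines.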

Node Attributes

There are standard attributes which all nodes have, such as node type, node text, node filename, and node line number. In addition to those standard node attributes, nodes may have an arbitrary number of other attributes. Some of those attributes may have been set by Proparse, and some of those attributes may have been set by you or by Proparse-based tools that you are using.

Use the function parserAttrGet to get a node's attributes.

The node attributes are stored within the syntax tree (within Proparse) with unique integer keys and unique integer values. Integers are stored in the nodes in the syntax tree, instead of character strings, to minimize the amount of storage space required by those attributes. However, to make programming from the 4GL easier on the eyes, Proparse does an internal mapping of attribute integer keys and values to unique attribute strings.

However, if you want to mark up the syntax tree (sometimes called "decorating the tree") with node attributes of your own, then you must set and get the node attributes with integer values. See parserAttrSet and parserAttrGetI.

To prevent clashes between different uses of attribute integers, we have established the following guidelines:

If third party tools want to reserve ranges, we will post a list of those ranges on our website. This would only ever be important if you want to use two tools against the same syntax tree, which is not likely to happen very often anyway.

See also: Node Attributes Reference

Store Type Attribute

For a "RECORD_NAME" node, you might want to know if that record or table reference is for a database table, a temp-table, or for a work-table. You can find out by using the unique attribute key string value of "storetype". For "RECORD_NAME" the possible unique attribute value strings are: "st-dbtable", "st-ttable", and "st-wtable". See also storetype in the Node Attributes Reference.
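
For instance, a report of every record reference and its store type might be sketched like this. The query name "records" and the file name "myprogram.p" are placeholders, and the parameter orders for the query functions are assumed (start node, query name, node type; and query name, result number, node handle).

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE recNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE numResults   AS INTEGER NO-UNDO.
DEFINE VARIABLE i            AS INTEGER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* "myprogram.p" is a placeholder file name. */
IF parserParse("myprogram.p") THEN DO:
  ASSIGN
    topNode = parserGetHandle()
    recNode = parserGetHandle().
  parserNodeTop(topNode).

  numResults = parserQueryCreate(topNode, "records", "RECORD_NAME").
  DO i = 1 TO numResults:
    parserQueryGetResult("records", i, recNode).
    /* Expected values: "st-dbtable", "st-ttable", or "st-wtable". */
    MESSAGE
      "Line" parserGetNodeLine(recNode) ":"
      parserAttrGet(recNode, "storetype").
  END.
END.
ELSE
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.

APPLY "CLOSE" TO parserHandle.
```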

state2 Attribute

This attribute holds the value of the second keyword, which is used to disambiguate an otherwise ambiguous statement head node. See state2 for a more complete description.

statehead Attribute

Nodes have this attribute if they are the head node of a statement. See statehead.

Source Code Marking

Some kinds of parser-based tools need to be able to watch for special markings in your source code. For example, there is a class of tools called "lint" tools which are designed to look through your source code to find potential problems, especially potential problems that the compiler doesn't give you any warnings about. (Prolint is a tool developed for doing exactly that against Progress 4GL source code.) For example, a lint-like tool might warn you if you forgot to put NO-UNDO into a variable definition. This is helpful, except in the case where you intentionally left off the NO-UNDO option. In those cases, you will want to be able to put a special marker into your source code to tell your lint tool not to warn you about that particular instance.

Proparse provides at least two ways to do that. One way would be to use specially formatted comments, and then to look for those by using the hidden tokens functions. Proparse provides another method though which is easier to use. It allows you to create real nodes (not just hidden tokens) in the syntax tree.

You can use the function parserConfigSet with parameter values "show-proparse-directives" and "true" to enable this feature. (By default, its value is "false".)

Now to describe the marking that you can put into your code. An undefined preprocessor name can be inserted into your Progress source code without impacting the behavior of the program. We take advantage of this for our marking method. Normally all undefined preprocessor names have no impact on Proparse or its resulting syntax tree. However, if you set the configuration flag "show-proparse-directives" to "true", then Proparse watches for {&_PROPARSE_} directives, and allows those to be inserted into your syntax tree anywhere a statement may be inserted. In other words, you may place {&_PROPARSE_} directives anywhere in your source code where you would be able to place a complete Progress statement. If you set "show-proparse-directives" to "true" and insert a {&_PROPARSE_} directive in the middle of a statement, your source code will not parse. Unless you set "show-proparse-directives" to "true", {&_PROPARSE_} directives are ignored.

{&_PROPARSE_} directives create nodes of node type "PROPARSEDIRECTIVE". To add meaning to the _PROPARSE_ directives, simply add text (you decide what) before the closing curly. For example: {&_PROPARSE_ your meaningful text here} will create a node with type "PROPARSEDIRECTIVE", and the node's "proparsedirective" attribute will be "your meaningful text here". You use the function parserAttrGet to retrieve the value of the "proparsedirective" attribute, for example: parserAttrGet(theNode, "proparsedirective":U).

Note that whitespace between the "_PROPARSE_" and the first non-whitespace character is discarded. However, any whitespace between your text and the closing curly brace is not discarded.
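
A tool that looks for these directives might start out like the sketch below. The query name "directives" and the file name "myprogram.p" are placeholders, and the parameter orders for the query functions are assumed (start node, query name, node type; and query name, result number, node handle).

```abl
DEFINE VARIABLE parserHandle AS HANDLE  NO-UNDO.
DEFINE VARIABLE topNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE theNode      AS INTEGER NO-UNDO.
DEFINE VARIABLE numResults   AS INTEGER NO-UNDO.
DEFINE VARIABLE i            AS INTEGER NO-UNDO.

RUN proparse/api/proparse.p PERSISTENT SET parserHandle.
{proparse/api/proparse.i parserHandle}

/* Must be set before parsing, or the directives are ignored. */
parserConfigSet("show-proparse-directives", "true").

/* "myprogram.p" is a placeholder file name. */
IF parserParse("myprogram.p") THEN DO:
  ASSIGN
    topNode = parserGetHandle()
    theNode = parserGetHandle().
  parserNodeTop(topNode).

  numResults = parserQueryCreate(topNode, "directives", "PROPARSEDIRECTIVE").
  DO i = 1 TO numResults:
    parserQueryGetResult("directives", i, theNode).
    /* The attribute holds whatever text followed _PROPARSE_. */
    MESSAGE
      "Line" parserGetNodeLine(theNode) ":"
      parserAttrGet(theNode, "proparsedirective":U).
  END.
END.
ELSE
  MESSAGE parserErrorGetText() VIEW-AS ALERT-BOX ERROR.

APPLY "CLOSE" TO parserHandle.
```

Since trailing whitespace before the closing curly brace is kept, you may want to apply TRIM to the attribute value before comparing it with anything.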

Customized Token Recognition

The functions parserDictAdd and parserDictDelete as well as the new node attribute from-user-dict allow you to play with alternative names for token types. For example, you could use parserDictAdd("define_const", "DEFINE") in order to make "define_const" a valid synonym for DEFINE.

A tree walker can find nodes related to user dictionary entries, and then make transformations to the tree based on those specially named nodes. In the "define_const" example, you might want to replace variable references with string or numeric literals, raise a syntax error if assignment of the variable is attempted, remove the define statement, comment the code where substitutions were made, etc.

Once all transformations have been made to ensure that the user-defined language extensions are converted to valid 4GL syntax, the tree could be written out to a new .p file, ready for handing over to the compiler.

Fun ideas to play with might include extending the 4GL to allow user-defined datatypes (classes), more object-oriented syntaxes, aspect-oriented programming, etc.

Preprocessor Listing File

In order to review how Proparse has evaluated the preprocessing within a compile unit, a "listing" file can be written out.

You enable this feature by telling Proparse which file name to write the listing out to:
parserConfigSet("listing-file", "/my/listing/file.txt").
and disable it with:
parserConfigSet("listing-file", "").

The file is written to (overwritten) each time a new compile unit is parsed.

The output file is designed for use by programs or scripts which read the file - it is not designed to be looked at without the aid of some sort of viewer. For example, it would be easy to write a script to generate an HTML view of the preprocessing done for a compile unit.

The file format is small and simple, but requires a little explanation. Rather than list entire file names, we list a file index number. At the end of the listing file, there's a cross reference to tell you which file number goes with which file name. We do it this way for efficiency's sake, especially if we consider that we will want the data from these listing files to be stored persistently.

Each line starts with three numbers: The file index number, the line number, and the column number. There are three zeros "0 0 0" if that information is not relevant or for some reason not available.

Here is the format. "9" represents a number, "0" means that "0" will be written...

	9 9 9 globdef name value
	9 9 9 scopdef name value
	9 9 9 macroref name
	0 0 0 macrorefend
	9 9 9 undef name
	9 9 9 include 9
	0 0 0 incarg {name|9} value
	0 0 0 incend
	9 9 9 ampif {true|false}
	9 9 9 ampelseif {true|false|?}
	9 9 9 ampelse {true|?}
	9 9 9 ampendif
	0 0 0 fileindex 9 filename

Whenever it is something that can be calculated, we do not show names or values. Again, this is for persistent storage efficiency.

For "ampif", we show if it evaluated to true or false. For "ampelseif" and "ampelse", the value would be "?" if it's not evaluated, because a "true" &if or &elseif has already been evaluated.

It is important to note that the value for include arguments requires extra processing. In order to keep the listing file such that there is one line per entry, any line breaks in an include argument value are replaced. Backslashes are replaced with double backslash, newlines are replaced with backslash-n, and carriage returns are replaced with backslash-r. To convert the argument back to its original string, reverse those replacements in a single pass from left to right: double backslash becomes a single backslash, and backslash-n and backslash-r become newline and carriage return characters. (Separate search and replace passes can go wrong, because an escaped backslash followed by the letter n would be mistaken for a newline.)
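
One way to convert an include argument value back to its original string is sketched below. It reads the value once from left to right, so that an escaped backslash followed by the letter n is not mistaken for a newline. The function name unescapeIncArg is made up for this illustration and is not part of Proparse. CHR(92) is used for the backslash so that the code does not depend on how your platform treats backslashes in string literals.

```abl
FUNCTION unescapeIncArg RETURNS CHARACTER (INPUT escaped AS CHARACTER):
  DEFINE VARIABLE unescaped AS CHARACTER NO-UNDO.
  DEFINE VARIABLE pos       AS INTEGER   NO-UNDO INITIAL 1.
  /* CASE-SENSITIVE so that only lowercase n and r are treated as escapes. */
  DEFINE VARIABLE ch        AS CHARACTER CASE-SENSITIVE NO-UNDO.

  DO WHILE pos <= LENGTH(escaped):
    ch = SUBSTRING(escaped, pos, 1).
    IF ch = CHR(92) AND pos < LENGTH(escaped) THEN DO:
      /* A backslash starts an escape: look at the character after it. */
      ASSIGN
        pos = pos + 1
        ch  = SUBSTRING(escaped, pos, 1).
      IF ch = "n" THEN ch = CHR(10).        /* backslash-n: newline         */
      ELSE IF ch = "r" THEN ch = CHR(13).   /* backslash-r: carriage return */
      /* Anything else, including a second backslash, is kept as it is. */
    END.
    ASSIGN
      unescaped = unescaped + ch
      pos       = pos + 1.
  END.

  RETURN unescaped.
END FUNCTION.

/* Usage: the escaped value a, backslash, backslash, n comes back as
   the three characters a, backslash, n (no newline). */
MESSAGE unescapeIncArg("a" + CHR(92) + CHR(92) + "n").
```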

Super Class Syntax Tree

The integer node attribute for key 2100 is valid for the CLASS node only. It returns an integer handle to the CLASS node of the super class, or zero if the handle is not available. This handle was already created internally - do not use the getHandle() function.

NOTE: The super tree might not be available if "multi-parse" caching is turned on. If caching is turned on, then the trees for those supers are available the first time they are needed. After that, only their inheritance information is cached internally by Proparse so that it does not need to be re-parsed. In other words, if the super class's syntax tree needs to be examined by your application at some point anyway, then examine it the first time it becomes available. This can save you an extra call to parse(), and save your application from the redundant processing overhead.