Joanju - Tree Parsers

Tree parsers are a powerful, scalable mechanism for building parser-based tools. If you only want to use 4GL development tools, rather than build them, tree parsers will probably be of little interest to you. However, if you are faced with a huge project and need to build sophisticated code-parsing tools, then you are in for a treat. Read on!

I have yet to come up with a good way to explain what a tree parser is, so I'm going to launch straight into an example. These examples use Java (other options are C++ and C#). Please open this link in another window so that you can refer to it: Proparse Tree Specification. That specification is the complete P4GL syntax as Proparse sees it.

The most interesting aspect of that tree specification is that it is used as the input for generating programs that we can compile and run. We use a parser generator named "Antlr" to do that. The comments at the top of the specification show that its file name is "JPTreeParser.g"; Antlr grammar files use ".g" as the extension.

However, as it stands, "JPTreeParser.g" only generates a program that parses a Proparse syntax tree (an Abstract Syntax Tree, or AST). It has no actions: it parses the tree and does nothing else. Our job is to add code to the grammar file that will be run at the appropriate points during the parse of the tree.
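To make "actions" concrete, here is a minimal hand-written Java sketch (not Antlr output) of a tree walk over a PROCEDURE subtree with action code attached. The Node class, the string token types, and the Actions interface are hypothetical stand-ins invented for this illustration; Proparse's real node classes and the code Antlr actually generates are different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node: a token type, its text, and child nodes.
class Node {
    final String type;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String type, String text) {
        this.type = type;
        this.text = text;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

// Hypothetical callbacks standing in for the { ... } action code in a grammar.
interface Actions {
    void enterProcedure(Node procedureNode);
    void exitProcedure();
}

class ProcedureWalker {
    private final Actions actions;

    ProcedureWalker(Actions actions) {
        this.actions = actions;
    }

    // Walks a PROCEDURE subtree, running actions on entry and exit,
    // roughly where the grammar places scopeAdd(#p) and scopeClose().
    void procedureState(Node p) {
        if (!p.type.equals("PROCEDURE")) {
            throw new IllegalArgumentException("expected PROCEDURE, got " + p.type);
        }
        actions.enterProcedure(p);       // action when the subtree is entered
        for (Node child : p.children) {  // a real tree parser would match each child
            walk(child);                 // against the grammar; this sketch just visits
        }
        actions.exitProcedure();         // action before leaving the subtree
    }

    // With no actions attached, the walk visits every node and does nothing else.
    private void walk(Node n) {
        for (Node child : n.children) {
            walk(child);
        }
    }
}
```

Without the two `actions` calls, `procedureState` would simply traverse the tree, which is the situation the unmodified grammar is in.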

procedurestate
	:	#(	p:PROCEDURE ID
			{	tpSupport.scopeAdd(#p); }
			(	#(	EXTERNAL constant
					(	CDECL_KW
					|	PASCAL_KW
					|	STDCALL_KW
					|	#(ORDINAL expression )
					|	PERSISTENT
					)*
				)
			|	PRIVATE
			|	IN_KW SUPER
			)?
			block_colon code_block (EOF | #(END (PROCEDURE)?) state_end)
			{	tpSupport.scopeClose(); }
		)
	;

The above is taken from the grammar for a real, working tree parser. If you were to compare it to "procedurestate" in JPTreeParser.g, you would find that it is different in these ways:

The p: in front of PROCEDURE was added. This gives us a way to refer to the PROCEDURE node; "p" becomes our label for the reference to that node.
The action code { tpSupport.scopeAdd(#p); } was added. This code just makes a method call: it calls the "scopeAdd" method of the "tpSupport" object, passing as its argument the AST node we labeled "p". In action code, a labeled node is referred to by prefixing the label with a hash mark, as in #p.
The action code { tpSupport.scopeClose(); } was also added.

As you can probably imagine, the tpSupport object's scopeAdd and scopeClose methods open and close a scope. We care about scopes because we want to know the scope in which each variable and buffer is defined.
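The article does not show tpSupport's internals, so the following is only a sketch of how such a support object might track scopes, assuming a simple stack. The class names TPSupport and Scope, and the use of a String name instead of the PROCEDURE AST node, are assumptions made to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical scope record: a name and a link to the enclosing scope.
class Scope {
    final String name;
    final Scope parent;

    Scope(String name, Scope parent) {
        this.name = name;
        this.parent = parent;
    }
}

// Hypothetical sketch of a tpSupport-style object that tracks nested scopes.
class TPSupport {
    private final Deque<Scope> stack = new ArrayDeque<>();

    TPSupport() {
        stack.push(new Scope("<program>", null)); // outermost scope for the whole file
    }

    // Called when the tree parser enters a PROCEDURE subtree.
    // (The real grammar passes the PROCEDURE node; a name is used here for brevity.)
    void scopeAdd(String name) {
        stack.push(new Scope(name, stack.peek()));
    }

    // Called when the tree parser leaves the PROCEDURE subtree.
    void scopeClose() {
        if (stack.size() == 1) {
            throw new IllegalStateException("cannot close the outermost scope");
        }
        stack.pop();
    }

    Scope currentScope() {
        return stack.peek();
    }
}
```

Because the grammar guarantees that every scopeAdd is paired with a scopeClose at the end of the same subtree, a plain stack is enough to know the current scope at any point in the walk.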

definevariablestate
	:	#(	DEFINE (#(NEW (GLOBAL)? SHARED ) | SHARED)? VARIABLE id:ID
			(fieldoption)* (triggerphrase)? state_end
			{	tpSupport.defineVariable(#id); }
		)
	;

As with the PROCEDURE snippet, this snippet uses a label, "id", so that we can refer to the ID node and pass it as an argument to the defineVariable method of our tpSupport object.

Maybe what we're trying to do here has become a little clearer: as we parse the tree, we want to keep track of which variables are defined and, more importantly, which scope each one is defined in. The tpSupport object keeps track of those scopes and other useful context information, and stores it all so that it can be referenced later.

These examples come from a tree parser that generates a complete set of symbol tables, so that we know exactly which field, table, or variable any node in the AST refers to.
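Here is a hedged sketch of how defineVariable and a later lookup could work together to resolve references: each scope maps variable names to definitions, and a lookup searches from the innermost scope outward. The class names, the lookup method, and the case-insensitive matching are assumptions of this sketch, not Proparse's actual symbol-table API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical variable definition: its name and the scope that owns it.
class VariableDef {
    final String name;
    final String scopeName;

    VariableDef(String name, String scopeName) {
        this.name = name;
        this.scopeName = scopeName;
    }
}

// Hypothetical symbol table: a stack of scopes, each mapping names to definitions.
class SymbolTable {
    private static final class SymbolScope {
        final String name;
        final Map<String, VariableDef> vars = new HashMap<>();

        SymbolScope(String name) {
            this.name = name;
        }
    }

    private final Deque<SymbolScope> scopes = new ArrayDeque<>();

    SymbolTable() {
        scopes.push(new SymbolScope("<program>"));
    }

    void scopeAdd(String name) {
        scopes.push(new SymbolScope(name));
    }

    void scopeClose() {
        scopes.pop();
    }

    // Records a variable in the innermost (current) scope.
    // Keys are lower-cased, assuming identifiers are case-insensitive.
    VariableDef defineVariable(String name) {
        SymbolScope current = scopes.peek();
        VariableDef def = new VariableDef(name, current.name);
        current.vars.put(name.toLowerCase(Locale.ROOT), def);
        return def;
    }

    // Resolves a reference by searching from the innermost scope outward.
    VariableDef lookup(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        for (SymbolScope s : scopes) { // ArrayDeque iterates from the top of the stack
            VariableDef def = s.vars.get(key);
            if (def != null) {
                return def;
            }
        }
        return null; // not defined in any enclosing scope
    }
}
```

For brevity this sketch discards a scope's definitions on scopeClose; a tool that needs to refer back to them later, as described above, would keep closed scopes around instead.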

So What?

Hopefully you can see that using a grammar file to generate a tree parser gives us good organization and a complete context in which to place our actions.

Hopefully you can also see that this is much easier than building these things by hand. If you had to write the code to find the ID node in a DEFINE ... VARIABLE statement yourself, you would have a lot more work to do. If you have to find many different nodes in the tree, you will quickly find that you have written a lot of source code. By eliminating much of that hand-built code, we get a much more scalable tool-building environment.
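For comparison, here is a sketch of the hand-built alternative: code that walks an AST looking for DEFINE ... VARIABLE statements and pulls out the ID node. The AstNode class and the string token types are hypothetical, and the assumed tree shape (a DEFINE node whose children include VARIABLE immediately followed by ID, as in definevariablestate above) is simplified. Even this small routine has to encode the tree shape by hand, and every other kind of node you need to find means another routine like it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node with a token type, text, and ordered children.
class AstNode {
    final String type;
    final String text;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String type, String text, AstNode... kids) {
        this.type = type;
        this.text = text;
        children.addAll(Arrays.asList(kids));
    }
}

class DefineVariableFinder {
    // Collects the ID node of every DEFINE ... VARIABLE statement in the tree.
    static List<AstNode> findVariableIds(AstNode root) {
        List<AstNode> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(AstNode n, List<AstNode> found) {
        if (n.type.equals("DEFINE")) {
            List<AstNode> kids = n.children;
            // Skip any NEW/GLOBAL/SHARED options: look for VARIABLE followed by ID.
            for (int i = 0; i + 1 < kids.size(); i++) {
                if (kids.get(i).type.equals("VARIABLE") && kids.get(i + 1).type.equals("ID")) {
                    found.add(kids.get(i + 1));
                    break;
                }
            }
        }
        for (AstNode child : n.children) {
            collect(child, found);
        }
    }
}
```

The grammar version replaces all of this with a single label, id:ID, inside a rule that already describes the statement's structure.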