Syntax Trees

What's "token type" and "token text"?

A lexical scanner turns a character stream into a token stream. Take the following input as an example:

DO: DISPLAY "Hello world!". DISPLAY 1 + 2 * 3. END.

Proparse.dll creates tokens with attributes such as token type and token text. For our purposes here, let's represent each token as a pair of values: (token type, token text). A few of the tokens from the input above are then (DO,DO), (LEXCOLON,:), (QSTRING,"Hello world!"), and (PLUS,+).
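To make the (token type, token text) pairs concrete, here is a minimal sketch of a scanner in Python. It is not how proparse.dll is implemented; it only handles the example input. The token type names DO, DISPLAY, END, LEXCOLON, QSTRING, PLUS, MULTIPLY and PERIOD come from this page, while NUMBER, WORD, WS and the scan function are names assumed for illustration.

```python
import re

# Toy token table for the example input only. NUMBER, WORD and WS are
# assumed names; the other type names are the ones used on this page.
TOKEN_SPEC = [
    ("QSTRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z]+"),
    ("LEXCOLON", r":"),
    ("PLUS", r"\+"),
    ("MULTIPLY", r"\*"),
    ("PERIOD", r"\."),
    ("WS", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))
KEYWORDS = {"DO", "DISPLAY", "END"}


def scan(source):
    """Turn a character stream into a list of (token type, token text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue  # whitespace separates tokens but is not kept here
        if kind == "WORD":
            if text.upper() not in KEYWORDS:
                raise ValueError(f"unknown word: {text}")
            kind = text.upper()  # a keyword's token type is the keyword itself
        tokens.append((kind, text))
    return tokens


SOURCE = 'DO: DISPLAY "Hello world!". DISPLAY 1 + 2 * 3. END.'
print(scan(SOURCE)[:4])
# [('DO', 'DO'), ('LEXCOLON', ':'), ('DISPLAY', 'DISPLAY'), ('QSTRING', '"Hello world!"')]
```

Note that the token text is kept exactly as it appears in the source, including the quotes around the string.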

Why does the MULTIPLY come before the 2 and the 3?

A parser builds a tree from a token stream. Proparse.dll puts a token at each node of the tree. One part of our example program would be built into a tree that looks something like this (showing token types rather than token text values):

DISPLAY
      |
      +--QSTRING--PERIOD

(As an aside: a typical parser would discard the PERIOD token, since it is not needed to evaluate the meaning of the tree. However, proparse.dll is for more than just the meaning of the tree: it is also for relating the tree back to your original source code.)

This tree has a DISPLAY token (or node) as the head of a new branch, with a quoted string and the period as its only two children. The entire program would be difficult to draw out this way, so we'll use a more compact form: Lisp notation. In Lisp, this same tree structure would be represented as: (DISPLAY QSTRING PERIOD). Given this notation, we can look at the structure of the tree that proparse.dll builds:

(DO LEXCOLON
  (DISPLAY QSTRING PERIOD)
  (DISPLAY (PLUS 1 (MULTIPLY 2 3)) PERIOD)
  END PERIOD)
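The Lisp notation can be produced mechanically from a tree that holds one token per node. The following Python sketch shows the idea; the Node class, the leaf and to_lisp helpers, and the NUMBER type name are assumptions made for illustration and are not part of proparse.dll. To mirror the listing above, numbers are printed by their token text and every other node by its token type.

```python
class Node:
    """One tree node holding a single token plus its child nodes."""

    def __init__(self, token_type, text, children=()):
        self.token_type = token_type
        self.text = text
        self.children = list(children)


def leaf(token_type, text):
    return Node(token_type, text)


def to_lisp(node):
    """Write a branch as (HEAD child child ...) and a leaf as a bare label."""
    # Numbers are shown by their text, everything else by token type.
    label = node.text if node.token_type == "NUMBER" else node.token_type
    if not node.children:
        return label
    return "(" + " ".join([label] + [to_lisp(c) for c in node.children]) + ")"


tree = Node("DO", "DO", [
    leaf("LEXCOLON", ":"),
    Node("DISPLAY", "DISPLAY", [
        leaf("QSTRING", '"Hello world!"'),
        leaf("PERIOD", "."),
    ]),
    Node("DISPLAY", "DISPLAY", [
        Node("PLUS", "+", [
            leaf("NUMBER", "1"),
            Node("MULTIPLY", "*", [leaf("NUMBER", "2"), leaf("NUMBER", "3")]),
        ]),
        leaf("PERIOD", "."),
    ]),
    leaf("END", "END"),
    leaf("PERIOD", "."),
])

print(to_lisp(tree))
# (DO LEXCOLON (DISPLAY QSTRING PERIOD) (DISPLAY (PLUS 1 (MULTIPLY 2 3)) PERIOD) END PERIOD)
```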

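The beginning of each statement, function, or block starts a new branch in the tree. To make it possible to evaluate expressions like 1 + 2 * 3, each operator is made the head of a new branch, with its operands as children. Because multiplication binds more tightly than addition, MULTIPLY heads a branch nested inside the PLUS branch. In this notation the head of a branch is written before its children, which is why MULTIPLY comes before the 2 and the 3.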
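As a rough illustration of why operator-headed branches make evaluation straightforward, the Python sketch below parses the tokens of 1 + 2 * 3 into nested tuples and then evaluates the result. It is a simplified stand-in, not the grammar proparse.dll uses: it handles only PLUS and MULTIPLY, and parse_expression, evaluate and the NUMBER type name are assumed for this example.

```python
import operator

# Operator token types from this page mapped to their arithmetic meaning.
OPS = {"PLUS": operator.add, "MULTIPLY": operator.mul}


def parse_expression(tokens):
    """Turn (type, text) tokens into nested tuples with the operator as head."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def number():
        if peek() != "NUMBER":
            raise SyntaxError(f"expected NUMBER, found {peek()}")
        return int(advance()[1])

    def product():
        # MULTIPLY binds tighter, so it is parsed at the lower level.
        node = number()
        while peek() == "MULTIPLY":
            advance()
            node = ("MULTIPLY", node, number())
        return node

    def total():
        node = product()
        while peek() == "PLUS":
            advance()
            node = ("PLUS", node, product())
        return node

    tree = total()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


def evaluate(tree):
    """Evaluate the children first, then apply the operator at the head."""
    if isinstance(tree, int):
        return tree
    head, left, right = tree
    return OPS[head](evaluate(left), evaluate(right))


tokens = [("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "2"),
          ("MULTIPLY", "*"), ("NUMBER", "3")]
tree = parse_expression(tokens)
print(tree)            # ('PLUS', 1, ('MULTIPLY', 2, 3))
print(evaluate(tree))  # 7
```

Because the MULTIPLY branch sits below PLUS, it is evaluated first and the result is 7. A tree shaped as (MULTIPLY (PLUS 1 2) 3) would evaluate to 9 instead, so the shape of the tree carries the operator precedence.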