Token

Meaning

In most programming languages, tokens are the fundamental entities manipulated by the syntactic parser, and they constitute the leaves of parse trees. A compiler component known as the lexical analyzer, or lexer, scans the source file and provides a stream of tokens. Typical tokens in these languages are identifiers, string and numerical literals, and syntactic delimiters. For example:

                      foo (100)
                          |
              +-----------+-----------+
              |                       |
          identifier          +-------+------+
                              |       |      |
                              (      100     )

Here the terminal productions would be the tokens identifier, (, 100 and ).
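The conventional two-layer arrangement can be sketched as follows. This is a hypothetical, minimal lexer in Python, not taken from any real compiler; the token names mirror the tree above:

```python
import re

# Token classes of a hypothetical toy language, tried in order.  The lexer
# scans the source and yields the stream of (kind, text) tokens that a
# syntactic parser would consume as the leaves of its parse tree.
TOKEN_PATTERNS = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number",     r"[0-9]+"),
    ("delimiter",  r"[()]"),
    ("skip",       r"\s+"),          # whitespace: lexical only, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

def tokenize(source):
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("foo (100)"))
# → [('identifier', 'foo'), ('delimiter', '('), ('number', '100'), ('delimiter', ')')]
```

Note that the parser never sees the characters inside foo or 100 individually; the lexer has already packaged them into whole tokens.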

In Algol 68, on the other hand, the grammar extends all the way down to the individual letters, digits and symbols of a particular program: there is no "lexical" specification separate from the "syntactic" specification. Therefore the individual digits of integral denotations, comments and their contents, string denotations and their contents, etc., are all included in the grammar and are part of the parse tree. What is known as a token in other programming languages corresponds to the concept of a symbol in Algol 68. The example above would be parsed in Algol 68 into something similar to:

                           foo (100)
                               |
             +-----------------+------------------+
             |                                    |
            tag                  +----------------+----------------+
             |                   |                |                |
    +--------+-------+       open-symbol      denotation      close-symbol
    |        |       |                            |
f-symbol o-symbol o-symbol           +------------+------------+
                                     |            |            |
                                  1-symbol     0-symbol     0-symbol

Here the terminal productions would be the symbols f-symbol, o-symbol, o-symbol, open-symbol, 1-symbol, 0-symbol, 0-symbol, close-symbol.
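The symbol-level decomposition shown in the tree can be sketched as follows. This is an illustrative Python fragment (not part of any real Algol 68 implementation); the symbol names follow the Revised Report's naming convention:

```python
# Hypothetical sketch: decompose the source text of "foo (100)" all the way
# down to Algol 68 style symbols, mirroring the terminal productions of the
# parse tree above.  Delimiters get their own names; every letter or digit
# becomes its own "x-symbol".
def symbols(source):
    special = {"(": "open-symbol", ")": "close-symbol"}
    result = []
    for ch in source:
        if ch.isspace():
            continue                      # layout between symbols
        result.append(special.get(ch, f"{ch}-symbol"))
    return result

print(symbols("foo (100)"))
# → ['f-symbol', 'o-symbol', 'o-symbol', 'open-symbol',
#    '1-symbol', '0-symbol', '0-symbol', 'close-symbol']
```

Contrast this with the conventional lexer above: here the individual characters of the tag and of the denotation remain visible as terminals of the grammar.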

In conventional languages comments are considered a purely lexical artifact, meaning they are not tokenized but simply skipped over and ignored by the lexer. Comments therefore never appear as tokens in the syntax of these programming languages, and can appear virtually anywhere in the program source without affecting the resulting parse tree.

On the other hand, Algol 68 accommodates comments (and the very similar pragmats) by defining a token as a syntactic construct composed of an optional sequence of pragments (a pragment being a comment or a pragmat) followed by a symbol. Note that this doesn't mean every symbol can be preceded by a comment or a pragmat. Comments and pragments can appear anywhere the grammar generates a sequence of symbols via a sequence of tokens, but not where the grammar generates a sequence of symbols directly, such as in string denotations or inside other comments and pragments.
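The token-as-pragment-plus-symbol arrangement can be sketched as follows. This is a hypothetical Python illustration, assuming the input has already been split into pragments and symbols; it only shows the grouping, not real Algol 68 parsing:

```python
# Illustrative sketch: an Algol 68 style token pairs an optional sequence of
# pragments with the symbol that follows them, instead of discarding the
# comment as a conventional lexer would.  A pragment is recognized here, for
# simplicity, as a string of the form "co ... co".
def tokens_with_pragments(items):
    tokens, pending = [], []
    for item in items:
        if item.startswith("co ") and item.endswith(" co"):
            pending.append(item)          # pragment: attach to the next symbol
        else:
            tokens.append((tuple(pending), item))
            pending = []
    return tokens

print(tokens_with_pragments(["co note co", "open-symbol", "1-symbol", "close-symbol"]))
# → [(('co note co',), 'open-symbol'), ((), '1-symbol'), ((), 'close-symbol')]
```

The comment survives as part of the token that carries the following symbol, so it remains reachable from the parse tree rather than being thrown away.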

Syntax

The tokens are realized in the syntax by the following rules in [RR 9.1.1]:

f) NOTION token :
     pragment sequence option, NOTION symbol.
g) *token : NOTION token.
h) *symbol : NOTION symbol.

See Also