Bold taggles in Algol 68 ************************ GNU68-2025-002 (draft) Copyright © 2025 Jose E. Marchesi. You can redistribute and/or modify this document under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The following specification has been released under the auspices of the GNU Algol 68 Working Group, and has been scrutinized to ensure that a. it is strictly upwards-compatible with Algol 68, b. it is consistent with the philosophy and orthogonal framework of the language, and c. it fills a clearly discernible gap in the expressive power of that language. The source of this document can be found at . The informal description of this proposal introduces the proposed new language features, providing a rationale and usage examples. The formal definition of this proposal uses the existing formalism and conventions of the Standard Hardware Representation for Algol 68, and it is expressed as modifications to the Standard Hardware Representation. Finally, the implementation notes of this proposal describes a way in which the features added by this specification can be implemented. No implementer should feel committed to do things as described there; the same language facilities may well be implementable in other ways, more suitable to specific implementations. 1 Informal Description ********************** Bold words and tags =================== The Standard Hardware Representation for Algol 68 defines a “bold word” as the hardware representation of tokens whose representation in the reference language is written using bold letters and digits, such as for example ‘bold begin symbol’, ‘completion symbol’, and ‘bold-letter-f-letter-o-letter-o’. These sort of tokens are used in the strict language to denote syntactic marks, standard modes, user-defined mode indicators and both predefined and user-defined operator indicators. The Standard Hardware Representation for Algol 68 defines a “tag” as the hardware representation of tokens whose representation in the reference language is written using non-bold letters and digits, such as for example ‘letter-f-letter-o-letter-o’. These sort of tokens are used in the strict language to denote identifiers, be them predefined or defined by the user via an identity declaration for example. Stropping regimes ================= The way of writing bold words and tags using worthy characters depends on the particular stropping regime in use. The Standard Hardware Representation describes three standard stropping regimes that result in three different ways to write bold words. In POINT stropping bold words are written writing a point character (‘.’) followed by the worthy letters or digits corresponding to the bold-faced letters or digits in the word. For example, ‘bold begin symbol’ can be represented as ‘.begin’ or ‘.BEGIN’ or ‘.bEgIn’. In this stropping regime it is possible to represent bold-faced digits, like in ‘.algol68’. In the UPPER regime bold words are written like in POINT, but the leading point character can be omitted if the bold word is not immediately preceded by another bold word, and only upper-case letters are to be used in bold words. For example, ‘bold begin symbol’ is represented as ‘BEGIN’. In this stropping regime it is not possible to omit the leading point character when a bold word includes a bold-faced digit, as there is not such a thing as an upper-case digit. The RES regime handles bold words in exactly the same way than POINT, and only differs from it in its handling of tags. Readability of indicators ========================= Bold words are therefore composed by letters and digits, typed as bold letters and digits in the reference language. It is very common, however, for user-defined mode indicators to contain several natural words. Consider for example the following mode declaration: MODE TREENODE = STRUCT (TREENODEPAYLOAD data, REF TREENODE next); The mode indicators ‘TREENODE’ and ‘TREENODEPAYLOAD’ can be a little difficult to read, as the different natural words implicated in the indicators are not easily distinguishable at first sight. This problem can be alleviated by using the written form of bold words with a leading point character (which is theoretically supported in all standard stropping regimes) that supports mixing case. This leads to mode indicators such as ‘.TreeNodePayload’ which is much more readable. However, modern Algol 68 implementations tend to implement a slightly non-conforming UPPER stropping where writing bold words leaded by point characters is no longer supported (1). In these cases it is no longer possible to use mixing case. The above discussion also applies to operator indicators, even though these are seldom composed by more than one natural word, usually a verb. Bold taggles ============ This GNU extension allows to use underscore characters (‘_’) in bold words in all standard stropping regimes. This is done by specifying that a bold word is written as a sequence of “taggles” rather than as a sequence of worthy letters and digits. Each taggle itself is written as a sequence of one or more worthy letter and digits, optionally followed by a trailing underscore character. It is now possible to write the mode definition above using UPPER stropping as: MODE TREE_NODE = STRUCT (TREE_NODE_PAYLOAD data, REF TREE_NODE next); Note that from the definition of taggle it follows that a trailing underscore is allowed, like in ‘PRIVATE_MODE_’, but no prefixing underscores are allowed, and indicators such as ‘FOO__BAR’ and ‘FOO_BAR__’ are not legal. Tags are also defined in terms of taggles in all the standard stropping regimes. However, unlike in tags, no typographic display features can appear between the taggles in a written form of a bold word. This means that the mode indicator ‘TREENODEPAYLOAD’ cannot be written as ‘TREE NODE PAYLOAD’ in UPPER stropping. Comparison to similar extensions ================================ This GNU extension is not to be confused with the support provided by the Algol 68 Genie interpreter for using underscore characters in tags and bold words. There are two important and rather fundamental differences. First, the underscore characters in Algol 68 Genie's tags and bold words are integral part of the represented identifiers, mode indicators and operator indicators, and not just a convenient way to write more readable written forms of these. Therefore, in Algol 68 Genie ‘TREE_NODE’ denotes the mode indicator ‘TREE_NODE’, not ‘TREENODE’ as mandated by the Report. Second, Algol 68 Genie doesn't use the concept of taggle, and therefore a set of arbitrary rules has to be used for inserting underscore characters in both tags and bold words. Leading underscore characters are not allowed. Therefore ‘_code’ and ‘__TREE_NODE’ are not valid. Zero or more trailing underscore characters are allowed, but these are not part of the represented identifiers or indicators. Therefore ‘code__’ and ‘TREE_NODE__’ are both allowed and represent the same than ‘code’ and ‘TREE_NODE’ respectively. Any number of sequences of one or more underscore characters are allowed anywhere else in the tags or bold words. Therefore both ‘foo___bar’ and ‘TREE__PAYLOAD_FACTOR’ are both valid. In this case the underscore characters become integral part of the denoted identifier and mode indicator, respectively. ---------- Footnotes ---------- (1) This is the case of both Algol 68 Genie and the GCC Algol 68 front-end 2 Formal Description ******************** The description of the stropping of bold words in POINT stropping in the Standard Hardware Representation is changed from: 3.5.1 Bold words. - A bold word is written as a point (".") followed, in order, by the worthy letters or digits corresponding to the bold-faced letters or digits in the word. - A bold word must be followed by a disjunctor. to 3.5.1 Bold words. - A bold word is written as a point (".") followed by a sequence of {one or more} taggles whose worthy letters and digits correspond, in order, to the bold-faced letters and digits in the word. When the extension is enabled, the new rule for writing bold words applies to all the three standard stropping regimes described in the Standard Hardware Representation. 3 Implementation Notes ********************** Stropping is to be implemented purely at the lexical level and it must not have any impact on the letters and digits that constitute bold words and tags. Therefore, in principle, of all the components of a compiler or interpreter only the lexical analyzer (or tokenizer) shall concern itself with the stropping regime. A programming editor can highlight any sequence of two or more consecutive underscore characters as an error: these are not syntactically valid in any standard stropping regime, out of string denotations and pragments. Implementations are advised to always gate the usage of this extension through a command-line option. Appropriate ‘PORTCHECK’ diagnostics should be emitted whenever appropriate.