SUPPER, a "modern" stropping regime for Algol 68 ************************************************ GNU68-2025-004 (draft) Copyright © 2025 Jose E. Marchesi. You can redistribute and/or modify this document under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The following specification has been released under the auspices of the GNU Algol 68 Working Group, and has been scrutinized to ensure that a. it is strictly upwards-compatible with Algol 68, b. it is consistent with the philosophy and orthogonal framework of the language, and c. it fills a clearly discernible gap in the expressive power of that language. The source of this document can be found at . The informal description of this proposal introduces the proposed new language features, providing a rationale and usage examples. The formal definition of this proposal uses the existing formalism and conventions of the Standard Hardware Representation for Algol 68, and it is expressed as modifications to the Standard Hardware Representation. Finally, the implementation notes of this proposal describes a way in which the features added by this specification can be implemented. No implementer should feel committed to do things as described there; the same language facilities may well be implementable in other ways, more suitable to specific implementations. 1 Informal Description ********************** Representation languages and stropping ====================================== The Algol 68 algorithmic language establishes that certain source constructs, namely mode indications and operator indications, consist in a sequence of “bold letters” and “bold digits”, known as a “bold word”. In contrast, other constructs like identifiers, field selectors and labels are composed of regular or non-bold letters and digits, known as a “tag”. What is precisely a bold letter or digit, and how it differs from a non-bold letter or digit, is not specified by the Report. This is no negligence, but a conscious effort at abstracting the definition of the so-called “strict language” from its representation. This allows having several different representations of the same language. Some representations of Algol 68 are intended to be published in books, be it paper or electronic devices, and be consumed by persons. These are called “publication languages”. In publication languages bold letters and digits are typically represented by actual bold alphanumeric typographic marks, or sometimes underlined alphanumeric marks. Other representations of Algol 68 are intended to be both produced and consumed by computers. These are called “hardware languages”, and would very likely use some compact binary representation in which the distinction between bold and regular letters and digits becomes irrelevant. Finally, we have representations of Algol 68 that are intended to be primarily written by programmers and to be primarily processed by programs such as compilers, static analyzers or interpreters. These representations are called “programming languages”, and use some textual representation that is easy to read, edited and parsed, consisting in a stream of characters encoded in some character set. Unfortunately, distinguishing a bold alphabet in programming languages is not easy, because computer systems today do not yet provide readily usable and ergonomic bold or underline alphanumeric marks in text files, despite the existence of Unicode and very fancy and sophisticated editing environments. The lack of appropriate input methods surely plays a role on this pitiful state of affairs. Thus, the programming representation languages of Algol 68 should resort to a technique known as “stropping” in order to differentiate bold letters and digits from non-bold letters and digits. A particular set of rules specifying the representation of these characters is known as a “stropping regime”. There are three classical stropping regimes for Algol 68, which were standardized and specified long ago in the Standard Hardware Representation normative document. These are “POINT stropping”, “RES stropping” and “UPPER stropping”. The following sections review these existing stropping regimes in a cursory way, to then introduce a new stropping regime that is the subject of this specification. For more details on the standard stropping regimes the reader is referred to the Standard Hardware Representation. POINT stropping =============== POINT stropping is in a way the most fundamental of the three standard regimes. It was designed to work in installations with limited character sets that provided just one alphabet, usually printed in upper-case, a set of digits, and a very restricted set of other symbols. .PROC RECSEL OUTPUT RECORDS = .VOID: .BEGIN .BITS FLAGS := (INCLUDE DESCRIPTORS | REC F DESCRIPTOR | REC F NONE); .RECRSET RES = REC DB QUERY (DB, RECUTL TYPE, RECUTL QUICK, FLAGS); .RECWRITER WRITER := REC WRITER FILE NEW (STDOUT); SKIP COMMENTS .OF WRITER := .TRUE; .IF RECUTL PRINT SEXPS .THEN MODE .OF WRITER := REC WRITER SEXP .FI; REC WRITE (WRITER, RES) .END Figure 1.1: Example of code in POINT stropping In POINT stropping a bold word is represented by its constituent letters and digits preceded by a point character. For example, the symbol ‘bold begin symbol’ in the strict language, which is represented as *begin* in the reference language, would be represented as ‘.BEGIN’ in POINT stropping. More examples are summarized in the following table. Strict language Reference language POINT stropping --------------------------------------------------------------------------- ‘true symbol’ *true* ‘.TRUE’ ‘false symbol’ *false* ‘.FALSE’ ‘integral symbol’ *int* ‘.INT’ ‘completion symbol’ *exit* ‘.EXIT’ ‘bold-letter-c-...’ *crc32* ‘.CRC32’ In POINT stropping a tag is represented by writing its constituent non-bold letters and digits in order. But they are organized in several “taggles”. Each taggle is a sequence of one or more letters and digits, optionally followed by an underscore character. For example, the tag ‘PRINT’ is composed of a single taggle, but the tag ‘PRINT_TABLE’ is composed of a first taggle ‘PRINT_’ followed by a second taggle ‘TABLE’. To improve readability it is possible to insert zero or more white space characters between the taggles in a tag. Therefore, the tag ‘PRINT_TABLE’ could have been written ‘PRINT TABLE’, or even ‘PRINT_ TABLE’. This is the reason why Algol 68 identifiers, labels and field selectors can and do usually feature white spaces in them. It is important to note that both the trailing underscore characters in taggles and the white spaces in a tag do not contribute anything to the denoted tag: these are just stropping artifacts aimed to improve readability. Therefore ‘FOOBAR’ ‘FOO BAR’, ‘FOO_BAR’ and ‘FOO_BAR_’ are all representations of the same tag, that represents the ‘letter-f-letter-o-letter-o-letter-b-letter-a-letter-r’ language construct. See *note Figure 1.1: fig:ex1. for an example of an Algol 68 procedure encoded in POINT stropping. RES stropping ============= The early installations where Algol 68 ran not only featured a very restricted character set, but also suffered from limited storage and complex to use and time consuming input methods such as card punchers and readers. It was important for the representation of programs to be as compact as possible. The RES stropping regime was very likely introduced due to that reason. As its name implies, it reduces the number of bold words that require being stropped by introducing “reserved words”, which are the the bold words specified in the section 9.4.1 of the Report as a representation of certain symbols, such as *at*, *begin*, *if*, *int* and many others. PROC RECSEL OUTPUT RECORDS = VOID: BEGIN BITS FLAGS := (INCLUDE DESCRIPTORS | REC F DESCRIPTOR | REC F NONE); .RECRSET RES = REC DB QUERY (DB, RECUTL TYPE, RECUTL QUICK, FLAGS); .RECWRITER WRITER := REC WRITER FILE NEW (STDOUT); SKIP COMMENTS OF WRITER := TRUE; IF RECUTL PRINT SEXPS THEN MODE .OF WRITER := REC WRITER SEXP FI; REC WRITE (WRITER, RES) END Figure 1.2: Example of code in RES stropping RES stropping encodes bold words and tags like POINT stropping, but if a bold word is a reserved word then it can then be written without a preceding point, achieving this way a more compact, and easier to read, representation for programs. Introducing reserved words has the obvious disadvantage that some tags cannot be written the obvious way due to the possibility of conflicts. For example, to represent a tag ‘if’ it is not possible to just write ‘IF’, because it conflicts with a reserved word, but this can be overcome easily (if not very elegantly) by writing ‘IF_’ instead. See *note Figure 1.2: fig:ex2. for an example of an Algol 68 procedure encoded in RES stropping. Note how user-defined mode indications an operator indications still require explicit stropping. UPPER stropping =============== At some point computers added support for more than one alphabet by introducing character sets with both upper and lower case letters, along with convenient ways to both input and display these, namely a shift key and proper terminals. PROC recsel output records = VOID: BEGIN BITS flags := (include descriptors | rec f descriptor | rec f none); RECRSET res = rec db query (db, recutl type, recutl quick, flags); RECWRITER writer := rec writer file new (stdout); skip comments of writer := TRUE; IF recutl print sexps THEN mode OF writer := rec writer sexp FI; rec write (writer, res) END Figure 1.3: Example of code in UPPER stropping In UPPER stropping the letters in bold word are represented by upper-case letters, whereas the letters in tags are represented by lower-case letters. The notions of upper- and lower-case are obviously not applicable to digits, but since the language syntax assures that it is not possible to have a bold word that starts with a digit, digits are considered to be bold by convention if they follow a bold letter or another bold digit. See *note Figure 1.3: fig:ex3. for an example of an Algol 68 procedure encoded in UPPER stropping. Note how in this regime it is almost never necessary to introduce bold tags with points. All in all, it looks much more natural to contemporary readers. UPPER stropping is in fact the stropping regime of choice today. It is difficult to think of any reason why anyone would resort to use POINT or RES stropping nowadays. Bold taggles ============ In all three classical stropping regimes it is not possible to write white space characters between the constituent letters and digits of a bold word. It is very common, however, for user-defined mode indications and operator indications to contain several natural words, such as in ‘TREENODE’ or ‘RECWRITER’. This can be a little difficult to read. The GNU extension GNU68-2025-002, "Bold taggles in Algol 68", adds support in all the standard stropping regimes to use underscores in bold words. This is done by redefining bold words to be based on taggles, much like tags. PROC recsel output records = VOID: BEGIN BITS flags := (include descriptors | rec f descriptor | rec f none); REC_RSET res = rec db query (db, recutl type, recutl quick, flags); REC_WRITER writer := rec writer file new (stdout); skip comments of writer := TRUE; IF recutl print sexps THEN mode OF writer := rec writer sexp FI; rec write (writer, res) END Figure 1.4: Example of code in UPPER stropping with bold taggles With this extension, the above mode indications could have been written like ‘TREE_NODE’ an ‘REC_WRITER’, improving readability. See *note Figure 1.4: fig:ex4. for an example of an Algol 68 procedure encoded in UPPER stropping with bold taggles. SUPPER stropping ================ This proposal describes a new stropping regime that combines the advantages of the RES and the UPPER regimes. The resulting representation of programs aims to be both more appealing to contemporary programmers and also more convenient to be used in today's computing systems. On one hand, the RES stropping regime made it possible to avoid explicit stropping of a big subset of all the bold words, namely the ones pertaining to the fixed set of "reserved words". However, both user-defined mode indications and operator indications still had to be stropped explicitly, prefixing them with a dot character. On the other hand, the UPPER stropping regime, cleverly exploiting dual-alphabet installations, implemented the explicit stropping by having bold words encoded using upper-case letters and tags encoded using lower-case letters. Combining and adapting both approaches we can obtain a stropping regime in which explicit stropping is reduced to the minimum necessary, i.e. user-defined mode and operator indications, and in which explicit stropping is done in a way that looks more familiar to today's programmers and less heavy on upper-case letters than in UPPER stropping. proc recsel_output_records = void: begin bits flags := (include_descriptors | rec_f_descriptor | rec_f_none); RecRset res = rec_db_query (db, recutl_type, recutl_uick, flags); RecWriter writer := rec_writer_file_new (stdout); skip_comments of writer := true; if recutl_print_sexps then mode_ of writer := rec_writer_sexp fi; rec_write (writer, res) end Figure 1.5: Example of code in SUPPER stropping In the SUPPER stropping regime bold words are written by writing a sequence of one or more “taggles”. Each taggle is written by writing a letter followed by zero or more other letters and digits and is optionally followed by a trailing underscore character. The first letter in a bold word shall be an upper-case letter. The rest of the letters in the bold word may be either upper- or lower-case. For example, ‘RecRset’, ‘Rec_Rset’ and ‘RECRset’ are all different ways to represent the same mode indication. This allows to recreate popular naming conventions such as ‘CamelCase’. As in the other stropping regimes, the casing of the letters and the underscore characters are not really part of the mode or operator indication. Operator indications are also bold words and are written in exactly the same way than mode indications, but it is usually better to always use upper-case letters in operator indications. On one side, it looks better, especially in the case of dyadic operators where the asymmetry of, for example ‘Equal’ would look odd, consider ‘m1 Equal m2’ as opposed to ‘m1 EQUAL m2’. On the other side, tools like editors can make use of this convention in order to highlight operator indications differently than mode indications. In the SUPPER stropping regime tags are written by writing a sequence of one or more “taggles”. Each taggle is written by writing a letter followed by zero or more other letters and digits and is optionally followed by a trailing underscore character. All letters in a tag shall be lower-case letters. For example, the identifier ‘list’ is represented by a single taggle, and it is composed by the letters ‘l’, ‘i’, ‘s’ and ‘t’, in order. In the jargon of the strict language we would spell the tag as ‘letter-l-letter-i-letter-s-letter-t’. The label ‘found_zero’ is represented by two taggles, ‘found_’ and ‘zero’, and it is composed by the letters ‘f’, ‘o’, ‘u’, ‘n’, ‘d’, ‘z’, ‘e’, ‘r’ and ‘o’, in order. In the jargon of the strict language we would spell the tag as ‘letter-f-letter-o-letter-u-letter-n -letter-d-letter-z-letter-e-letter-r-letter-o’. The identifier ‘crc_32’ is likewise represented by two taggles, ‘crc_’ and ‘32’. Note how the second taggle contains only digits. In the jargon of the strict language we would spell the tag as ‘letter-c-letter-r-letter-c-digit-three-digit-two’. The underscore characters are not really part of the tag, but part of the stropping. For example, both ‘goto found_zero’ and ‘goto foundzero’ jump to the same label. See *note Figure 1.5: fig:ex5. for an example of an Algol 68 procedure encoded in SUPPER stropping. No white spaces in identifiers ============================== SUPPER is the only Algol 68 stropping regime that doesn't allow having typographical display features (spaces, tabs and newline characters) between the taggles conforming a tag. In other words, it is not allowed to have white spaces as part of identifiers. This is a shame, but there are two main reasons why it was decided to proceed like this. First and most importantly, the SUPPER stropping regime is based on reserved words, which are mapped from the representation specified for the symbols of the language that are represented by bold words in the reference language. The reference language of Algol 68 was designed before the Standard Hardware Representation introduced the RES stropping regime. Likely this is the reason why it specifies so many and so short symbols: the authors assumed that tags would always live in a different name space than bold words. With reserved words like ‘to’ and ‘in’, it becomes very difficult to separate identifier taggles with white spaces without bumping into conflicts. Consider for example the not at all unlikely procedure name ‘checked real to int’. This identifier would be not legal, because the three last taggles conflict with the the standard mode ‘real’, the syntactic bold word ‘to’ and the standard mode ‘int’, respectively. Such conflicts could be avoided by making the offending taggles to be adjacent to underscores, but then who would want to write identifiers like ‘checked real_ to_ int_’ or ‘checked real_to_int’? It is much simpler to just mandate for such tags to be written as ‘checked_real_to_int’. This avoids any possibility of conflict since all the taggles are adjacent to an underscore and therefore cannot collide with reserved words. Also, the risk of programs breaking in the future due to new reserved words getting added to the language gets dramatically reduced. The second reason is that the notion of allowing blanks to be freely interjected in user-defined identifiers is alien to most if not all programming languages widely used today. It is, unfortunately, an eccentricity, even if a beautiful one, that makes it difficult to leverage existing tools such as editors, code indentation engines, and basically any program that makes assumptions on the general form of programs. For example, convincing a programming editor or IDE that a single identifier may span for more than one logical line in the source file may prove quite challenging and frustrating. Conformance =========== This proposal conforms to the requirements specified in the appendix B.3 of the Standard Hardware Representation: B.3 Other Stropping Regimes. For compatibility with existing installation practice, implementations may implement stropping regimes in addition to those provided by the standard. However, such additional regimes should be invoked by pragmat-items distinct from those in *3.5. All modifications to the defined regimes -- including extensions -- should be avoided because they would inhibit error detection and decrease portability. 2 Formal Description ******************** This extension adds a new stropping regime to the Standard Hardware Representation, so a new pragmat-item shall be invented. 3.5 ... of which there are four. A new regime is invoked by a pragmat containing one of the pragmat-items POINT, UPPER, RES or SUPPER, and takes effect following the closing pragmat symbol. The new stropping regime is then described in its own section, as follows. 3.5.4 SUPPER Stropping Bold words. - A bold word is written as a sequence of {one or more} taggles whose worthy letters and digits correspond, in order, to the bold faced letters and digits in the word. - A bold word must start with an upper-case letter. Upper- and lower-case letters may be otherwise intermixed in a bold word. - Upper-case letters may be written only in bold words and character-glyphs. Tags. - A tag is written as a sequence of {one or more} taggles. The taggles cannot be separated by typographical display features. - A taggle is written by writing, in order, the corresponding worthy letters and digits optionally followed by an underscore. - A taggle must be adjacent to an underscore if its letters and digits correspond, in order, to those of a reserved word. - If a taggle does not end with an underscore, it must be followed by a disjunctor. {Examples: Program: PR SUPPER PR begin real x; x := x - 1 end Bold: begin, AMode, AmOdE, AMODE, Oper, OPER Plain: begin_, end_of_file, end_of_file_, xl, x_l, Error: Some_Mode, end of file, end____of__file, end_of_file__ } 3 Implementation Notes ********************** Stropping is to be implemented purely at the lexical level and it must not have any impact on the letters and digits that constitute bold words and tags. Therefore, in principle, of all the components of a compiler or interpreter only the lexical analyzer (or tokenizer) shall concern itself with the stropping regime.