Stropping (syntax)
{{Use dmy dates|date=May 2019|cs1-dates=y}} |
In [[computer language]] design, '''stropping''' is a method of explicitly marking letter sequences as having a special property, such as being a [[keyword (computing)|keyword]], or a certain type of variable or storage location, and thus inhabiting a different namespace from ordinary names ("identifiers"), in order to avoid clashes. Stropping is not used in most modern languages – instead, keywords are [[reserved word]]s and cannot be used as identifiers. Stropping allows the same letter sequence to be used both as a keyword and as an [[identifier#In computer languages|identifier]], and simplifies [[parsing#Computer languages|parsing]] in that case – for example allowing a variable named <code>if</code> without clashing with the keyword '''if'''. |
Stropping is primarily associated with [[ALGOL]] and related languages of the 1960s. Although it still finds some [[#Modern use|modern use]], it is easily confused with other, only superficially [[#Similar techniques|similar techniques]].
==Syntaxes== |
A range of different syntaxes for stropping has been used:
* [[ALGOL 60]] commonly used only the convention of single quotes around the word, generally as apostrophes, whence the name "stropping" (e.g. <code>'BEGIN'</code>). |
* Some implementations of [[ALGOL 68]]<ref name="R3"/><ref name="Wijngaarden_1976"/> treat letter sequences prefixed by a single quote, <nowiki>'</nowiki>, as keywords (e.g., <code>'BEGIN</code>).<ref name="Lindsey_1977"/>
In fact, several stropping conventions were often in use within one language. For example, in [[ALGOL 68]] the stropping convention can be selected by a compiler [[directive (programming)|directive]] (in ALGOL terminology, a "[[ALGOL 68#pr .26 co: Pragmats and Comments|pragmat]]"), namely POINT, UPPER, QUOTE, or RES:
* POINT for 6-bit character codes (too few characters for lowercase), as in <code>.FOR</code> – a similar convention is used in FORTRAN 77, where logical and relational operators are stropped as <code>.EQ.</code> etc. (see below)
* UPPER for 7-bit character codes, as in <code>FOR</code> – with lowercase used for ordinary identifiers
Other examples: |
* [[Atlas Autocode]] offered a choice of three conventions: keywords could be <code><u>underlined</u></code> using backspace and overstrike on a [[Friden Flexowriter|Flexowriter]] keyboard, introduced by a <code>%percent %symbol</code>, or typed in <code>UPPER CASE</code> with no delimiting character ("uppercasedelimiters" mode, in which all variables had to be in lower case).
* [[ALGOL 60]] on the [[Elliott 803]] and [[Elliott 503]] computers used underlining. The Flexowriters (which produced punched paper tape) had a non-movement underline key (_), so typing _b_e_g_i_n produced <u>begin</u>, which was very readable. The vertical bar | was also a non-movement key, so typing |= produced a good approximation of ≠.
* The Kidsgrove compiler for [[ALGOL 60]] on the [[English Electric KDF9]] appears to have used at least two other stropping conventions in addition to quotation marks: [http://sw-pres.computerconservationsociety.org/KDF9/kalgol/DavidHo/A0.a60 exclamation marks] and [https://www.gtoal.com/languages/algol60/KDF9/soap.a60 percent characters]. |
* [[ALGOL 68RS]] allows programs to use several stropping variants, even within a single language processor.
* [[Edinburgh IMP]] inherited the Atlas Autocode <code>%percent %symbol</code> prefix convention but not its other stropping options.
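The Elliott underline convention described above can be modelled with a short Python sketch. It assumes, purely for illustration, that the tape is available as a string in which each underlined letter appears as an underscore followed by that letter, and that every unbroken run of underlined letters is one keyword; the function <code>decode_underlined</code> is hypothetical and is not taken from any historical compiler.

```python
import re

# On the Flexowriter "_" was a non-movement key, so "_b" on tape is an
# underlined "b". Assumption: a run of such pairs forms one keyword.
UNDERLINED_RUN = re.compile(r"(?:_[a-z])+")


def decode_underlined(tape: str) -> list[tuple[str, str]]:
    """Split tape text into ("keyword", ...) and ("text", ...) pieces."""
    pieces = []
    pos = 0
    for m in UNDERLINED_RUN.finditer(tape):
        if m.start() > pos:
            pieces.append(("text", tape[pos:m.start()]))
        # Remove the underline markers to recover the keyword's letters.
        pieces.append(("keyword", m.group().replace("_", "")))
        pos = m.end()
    if pos < len(tape):
        pieces.append(("text", tape[pos:]))
    return pieces


print(decode_underlined("_b_e_g_i_n x := 1 _e_n_d"))
# [('keyword', 'begin'), ('text', ' x := 1 '), ('keyword', 'end')]
```

Because the underline carries the keyword marking, the letters themselves stay free for use as ordinary identifiers.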
==Examples of different ALGOL 68 styles== |
Note the leading '''pr''' (abbreviation of '''pragmat''') [[directive (programming)|directive]], which is itself stropped in POINT or quote style, and the {{code|¢}} for comment (from "{{code|2¢}}") – see [[ALGOL 68#pr .26 co: Pragmats and Comments|ALGOL 68: pr & co: Pragmats and Comments]] for details. |
{| class="wikitable" style="font-size:90%" |
|- style="vertical-align:top" |
! style="font-weight:normal" | Algol68 "strict"<br>as typically published |
! style="font-weight:normal" | Quote stropping<br>(like [[lightweight markup language#Text/font-face formatting|wikitext]]) |
! style="font-weight:normal" | For a [[list of binary codes#Seven-bit binary codes|7-bit]] character<br>code compiler |
! style="font-weight:normal" | For a [[six-bit character code|6-bit]] character<br>code compiler |
! style="font-weight:normal" | Algol68 using res stropping<br>(reserved word) |
|- style="vertical-align:top" |
|{{pre|1= |
''¢ underline or'' |
''bold typeface ¢'' |
'''mode''' '''xint''' = '''int'''; |
<kbd>.AND.</kbd>, <kbd>.OR.</kbd> and <kbd>.XOR.</kbd> are also used in combined tests in <code>IF</code> and <code>IFF</code> statements in [[batch file]]s run under [[JP Software]]'s command line processors like [[4DOS]],<ref name="4DOS_8.00_HELP"/> [[4OS2]], and [[Take Command Console|4NT / Take Command]]. |
==Modern use== |
===To indicate identifiers=== |
Most modern computer languages do not use stropping. However, some languages support optional stropping to specify identifiers that would otherwise collide with [[reserved word]]s or which contain non-alphanumeric characters. |
A second major example is found in many implementations of [[SQL|Structured Query Language]], in which reserved words can be used as column, table, or variable names by lexically delimiting them. The SQL standard specifies enclosing such delimited identifiers in double quotes, but in practice the exact mechanism varies by implementation; [[MySQL]], for example, allows reserved words to be used in other contexts by enclosing them in backticks, and [[Microsoft SQL Server]] uses square brackets.
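As a rough illustration, the following Python sketch uses the standard-library <code>sqlite3</code> module, chosen only because it is readily available; SQLite accepts the standard double quotes and, for compatibility, also MySQL-style backticks and SQL Server-style square brackets. The table and column names are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted, the reserved word ORDER cannot name a table.
try:
    conn.execute("CREATE TABLE order (id INTEGER)")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# Delimited, the same reserved words become ordinary identifiers.
conn.execute('CREATE TABLE "order" ("select" INTEGER, "from" TEXT)')     # standard
conn.execute("INSERT INTO `order` (`select`, `from`) VALUES (1, 'warehouse')")  # MySQL style
rows = conn.execute('SELECT [select], [from] FROM "order"').fetchall()   # SQL Server style
conn.close()

print(unquoted_ok)  # False
print(rows)         # [(1, 'warehouse')]
```

Other database systems may accept only one of these delimiter styles, so portable code usually sticks to the standard double quotes.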
In several languages, including [[Nim (programming language)|Nim]], [[R (programming language)|R]],<ref name="R Documentation">{{citation |author=R Core Team |title=Quotes: Quotes |publisher=R Foundation for Statistical Computing |postscript=. |url=/proxy/https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Quotes}}</ref> and [[Scala (programming language)|Scala]],<ref>{{citation |last=Odersky |first=Martin |title=The Scala Language Specification Version 2.9 |date=2011-05-24}}</ref> a reserved word or non-alphanumeric name can be used as an identifier by enclosing it in [[backtick]]s. |
Other, more minor examples exist. [[Web IDL]], for instance, uses a leading underscore <code>_</code> to strop identifiers that would otherwise collide with reserved words: the leading underscore is removed from the identifier's value, which makes this stropping rather than a naming convention.<ref name="W3"/>
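A minimal Python sketch of this value rule follows; the function name <code>idl_identifier_value</code> is hypothetical. As the quoted specification text describes, only a single leading underscore is removed, so a name written with two underscores keeps one.

```python
def idl_identifier_value(token: str) -> str:
    """Return the value of a Web IDL identifier token.

    Only one leading underscore is stripped: "_interface" names an
    interface called "interface", while "__x" has the value "_x".
    """
    return token[1:] if token.startswith("_") else token


print(idl_identifier_value("_interface"))  # interface
print(idl_identifier_value("__x"))         # _x
```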
===Other purposes=== |
In [[Haskell]], surrounding a function name with backticks causes it to be parsed as an [[infix operator]].
=={{Not a typo|Unstropping}} by the compiler== |
{{Further|Compiler front end}} |
In a [[compiler front end]], {{not a typo|unstropping}} originally occurred during an initial [[line reconstruction]] phase, which also eliminated whitespace. This was followed by [[scannerless parsing]] (no tokenization), which was standard in the 1960s, notably for ALGOL. In modern use, {{Not a typo|unstropping}} is generally done as part of [[lexical analysis]]. This is clearest if the lexer is divided into two phases, a scanner and an evaluator: the scanner assigns the stropped sequence to the correct token class, and the evaluator then {{Not a typo|unstrops}} it when computing the token's value. For example, in a language where an initial underscore strops identifiers to avoid collisions with reserved words, the scanner would classify the sequence <code>_if</code> as an identifier (not as the reserved word <code>if</code>), and the evaluator would then give it the value <code>if</code>, yielding <code>(Identifier, if)</code> as the token type and value.
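The two-phase split described above can be sketched in Python as follows. The reserved-word set, the regular expression, and the functions <code>scan</code>, <code>evaluate</code> and <code>lex</code> are all hypothetical, and only words are tokenized; a real lexer would also handle numbers, operators and errors.

```python
import re

# Hypothetical reserved words for this sketch.
RESERVED = {"if", "then", "else"}

# A word, optionally stropped with one leading underscore.
WORD_RE = re.compile(r"_?[A-Za-z][A-Za-z0-9]*")


def scan(lexeme: str) -> str:
    """Scanner: choose the token class from the raw lexeme."""
    if lexeme.startswith("_"):
        return "Identifier"  # a stropped word is never a keyword
    return "Keyword" if lexeme in RESERVED else "Identifier"


def evaluate(kind: str, lexeme: str) -> str:
    """Evaluator: compute the token value, removing the strop mark."""
    if kind == "Identifier" and lexeme.startswith("_"):
        return lexeme[1:]
    return lexeme


def lex(source: str) -> list[tuple[str, str]]:
    """Return (class, value) pairs; characters between words are skipped."""
    tokens = []
    for m in WORD_RE.finditer(source):
        kind = scan(m.group())
        tokens.append((kind, evaluate(kind, m.group())))
    return tokens


print(lex("if _if then x"))
# [('Keyword', 'if'), ('Identifier', 'if'), ('Keyword', 'then'), ('Identifier', 'x')]
```

Keeping the decision (scanner) separate from the value computation (evaluator) is what lets <code>_if</code> and <code>if</code> share a value while landing in different token classes.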
==Similar techniques== |
A number of similar techniques exist, generally prefixing or suffixing an identifier to indicate different treatment, but their semantics vary. Strictly speaking, stropping consists of different representations of the same name (value) in different namespaces, and occurs at the tokenization stage. For example, in ALGOL 60 with matched apostrophe stropping, <code>'if'</code> is tokenized as (Keyword, if), while <code>if</code> is tokenized as (Identifier, if): the same value in different token classes.
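As an illustration, the following Python sketch tokenizes a toy line under matched-apostrophe stropping, so that <code>'if'</code> and <code>if</code> yield the same value in different token classes. The token patterns and the function <code>tokenize</code> are assumptions for this example and cover only a tiny subset of ALGOL 60.

```python
import re

# Toy token patterns for matched-apostrophe stropping. Alternatives are tried
# in order, so ":=" is matched before any single-character symbol.
TOKEN_RE = re.compile(
    r"'(?P<keyword>[a-z]+)'"
    r"|(?P<identifier>[a-z][a-z0-9]*)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<symbol>:=|[=;<>+\-*/()])"
    r"|(?P<space>\s+)"
)


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token_class, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "space":
            # The keyword group captures the letters only, without apostrophes.
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


print(tokenize("'if' if = 1 'then' if := 0;"))
```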
Using uppercase for keywords remains a common convention when writing grammars for lexing and parsing: the reserved word <code>if</code> is tokenized as the token class IF, and an if-then-else clause is then represented by the phrase <code>IF Expression THEN Statement ELSE Statement</code>, where uppercase terms are keywords and capitalized terms are [[nonterminal symbol]]s in a [[production (computer science)|production rule]] (other [[terminal symbol]]s are denoted by lowercase terms, such as <code>identifier</code>, or <code>integer</code> for an [[integer literal]]).
===Naming conventions=== |
{{Main|Naming convention (programming)}} |
Most loosely, one may use [[naming convention (programming)|naming conventions]] to avoid clashes, commonly prefixing or suffixing with an underscore, as in <code>if_</code> or <code>_then</code>. A leading underscore is often used to indicate private members in object-oriented programming. |
These names may be interpreted by the compiler or runtime and have some effect, though this is generally done during semantic analysis, not tokenization. For example, in Python, a single leading underscore is a weak "internal use" indicator and affects which names are imported by <code>from module import *</code>, while a double leading underscore (with at most one trailing underscore) on a class attribute invokes [[name mangling]].<ref name="PEP"/>
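The Python behaviour described above can be observed directly. In this minimal sketch (the class <code>Account</code> and its attributes are arbitrary examples), Python's compiler renames <code>__balance</code> inside the class body to <code>_Account__balance</code>, while the single-underscore name and the name ending in two underscores are left unchanged.

```python
class Account:
    def __init__(self) -> None:
        self._internal = 1    # single leading underscore: convention only
        self.__balance = 100  # double leading underscore: name-mangled
        self.__dunder__ = 2   # two trailing underscores: not mangled


acct = Account()
print(sorted(vars(acct)))
# ['_Account__balance', '__dunder__', '_internal']
```

Unlike stropping, the marker here does not select a different token class; the name is an ordinary identifier throughout, and only its stored spelling changes.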
===Reserved words=== |
{{Main|Reserved word}} |
While modern languages generally use reserved words rather than stropping to distinguish keywords from identifiers (e.g., making <code>if</code> reserved), they also frequently reserve a syntactic class of identifiers as keywords, yielding representations that can be interpreted as a stropping regime but that instead have the semantics of reserved words.
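As one concrete instance, the [[C99]] standard (section 7.1.3) reserves for any use all identifiers that begin with an underscore followed by an uppercase letter or another underscore, and C99 introduced keywords such as <code>_Bool</code> from that class. The Python sketch below checks only this rule; the function name is hypothetical, and identifiers such as <code>_tmp</code>, which C99 still reserves at file scope, are not modelled.

```python
import re

# C99 7.1.3: identifiers that begin with an underscore followed by an
# uppercase letter or a second underscore are reserved for any use.
RESERVED_FOR_ANY_USE = re.compile(r"_[A-Z_]")


def reserved_for_any_use(name: str) -> bool:
    """Return True if a C identifier lies in the C99 'any use' reserved class."""
    return RESERVED_FOR_ANY_USE.match(name) is not None


for name in ["_Bool", "__func__", "_tmp", "count"]:
    print(name, reserved_for_any_use(name))
```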
===Name mangling=== |
{{Main|Name mangling}} |
[[Name mangling]] also addresses name clashes by renaming identifiers, but does so much later in compilation, during semantic analysis rather than tokenization. It creates names that include scope and type information, primarily for use by linkers, both to avoid clashes and to carry necessary semantic information in the name itself. In these cases the original identifiers may be identical while the context differs, as in the functions <code>foo(int x)</code> and <code>foo(char x)</code>, which share the identifier <code>foo</code> but have different signatures. These names might be mangled to <code>foo_i</code> and <code>foo_c</code>, for instance, to include the type information.
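The <code>foo_i</code>/<code>foo_c</code> scheme above can be written as a toy Python function. The one-letter type codes and the function <code>mangle</code> are hypothetical; real schemes differ, for example the Itanium C++ ABI used by many compilers encodes <code>foo(int)</code> as <code>_Z3fooi</code>.

```python
# Hypothetical one-letter codes for this toy scheme; real ABIs differ.
TYPE_CODES = {"int": "i", "char": "c", "double": "d"}


def mangle(name: str, param_types: list[str]) -> str:
    """Encode parameter types into the name, e.g. foo(int) -> foo_i."""
    codes = "".join(TYPE_CODES[t] for t in param_types)
    return f"{name}_{codes}" if codes else name


print(mangle("foo", ["int"]), mangle("foo", ["char"]))  # foo_i foo_c
```

The two overloads end up with distinct linker-level names even though the source identifier is the same, which is the opposite of stropping, where distinct spellings share one value.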
===Sigils=== |
{{Main|Sigil (computer programming)}} |
A syntactically similar but semantically different phenomenon is the use of [[sigil (computer programming)|sigils]], which instead indicate properties of variables. Sigils are common in [[BASIC]], [[Perl]], [[Ruby (programming language)|Ruby]], and various other languages, where they identify characteristics of variables and constants: in BASIC and Perl they designate the type of a variable, while in Ruby they both distinguish variables from constants and indicate scope. This affects the ''semantics'' of the variable, not the ''syntax'' of whether it is an identifier or keyword.
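For comparison with stropping, the following Python sketch classifies Ruby variable names by their sigil or initial capital letter (<code>$</code> global, <code>@</code> instance, <code>@@</code> class variable, capitalized constant). It is a simplification that ignores special globals and method names, and the function name is hypothetical; the sigil changes the variable's scope rather than turning the name into a keyword.

```python
def ruby_variable_kind(name: str) -> str:
    """Classify a Ruby variable name by its sigil or initial capital."""
    if name.startswith("@@"):  # must be checked before "@"
        return "class variable"
    if name.startswith("@"):
        return "instance variable"
    if name.startswith("$"):
        return "global variable"
    if name[:1].isupper():
        return "constant"
    return "local variable"


for name in ["count", "@count", "@@count", "$count", "Count"]:
    print(f"{name:8} {ruby_variable_kind(name)}")
```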
==Parallels in human language== |
Stropping is used in computer programming languages to make the [[compiler]]'s (or more strictly, the [[parser]]'s) job easier, i.e. within the capability of the relatively small and slow computers available in early days of computing in the 20th century. However, similar techniques have been commonly used to aid reading comprehension for people too. Some examples are: |
* Placing important words in '''[[emphasis (typography)|bold]]''',<ref name="Twyman">{{cite journal |last1=Twyman |first1=Michael |title=The Bold Idea: The Use of Bold-looking Types in the Nineteenth Century |journal=Journal of the Printing Historical Society |volume=22 |pages=107–143}}</ref> such as the first mention of '''stropping''' at the head of this page, since defining stropping is the purpose of the page.
* Formatting new words in ''[[italic type]]''<ref>{{citation |title=Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation |last=Truss |first=Lynne |year=2004 |publisher=Gotham Books |location=New York |isbn=978-1-59240-087-4 |page=146}}</ref> when they are first introduced in text. This is common in [[science fiction]] and [[fantasy]] when introducing invented plants, foods, and creatures; in [[travel literature|travelogue]] and historical writing when describing unfamiliar foreign words; and so on. A special font, possibly one associated with the language in question, may also be used, for example a [[blackletter|Gothic]]<ref>{{cite web |title=Styles of Handwriting |website=Rigsarkivet |publisher=The Danish National Archives |url=/proxy/https://www.sa.dk/en/genealogy/handwriting |access-date=March 26, 2017}}</ref> font for [[German language|German]] words.
* Using a different language, typically [[Latin]] or [[Greek language|Greek]] to signify technical terms. This is similar to using reserved words, but it is usually combined with italic text to aid readability. For example: |
** the typical [[binomial nomenclature]]<ref name="howto">{{citation |title=How to Write Scientific Names of Organisms |journal=Competition Science Vision |postscript=. |url=/proxy/http://www.journal.au.edu/au_techno/2001/oct2001/howto.pdf |access-date=20 June 2011}}</ref> or "Latin names" of plants and animals helps the reader to see that "''Erithacus rubecula''" is the special technical name of the [[European robin|Erithacus rubecula]], in a way that "Red-breasted European thrush" does not. |
** many [[Law|legal]] terms where a short Latin phrase refers to a large body of law and precedent, such as ''[[habeas corpus]]'', ''[[sub judice]]'', ''[[in loco parentis]]''.<ref>{{Google books|id=ePlWAAAAcAAJ|title=A Selection of Legal Maxims, classified and illustrated|text=|plainurl=}}</ref> |
** logic and mathematical terms such as ''[[Q.E.D.|QED]]'', ''[[A priori and a posteriori|a priori]]'', ''[[vice versa]]''... |
* In written [[Japanese writing system|Japanese]], in addition to [[Kanji]] characters, the two distinct alphabets (more strictly, [[syllabary|syllabaries]]) [[Hiragana]]<ref>[http://daijirin.dual-d.net/extra/hiragana.html Dual 大辞林]<br>「平」とは平凡な、やさしいという意で、当時普通に使用する文字体系であったことを意味する。 漢字は書簡文や重要な文章などを書く場合に用いる公的な文字であるのに対して、 平仮名は漢字の知識に乏しい人々などが用いる私的な性格のものであった。<br>Translation: 平 [the "hira" part of "hiragana"] means "ordinary" or "simple" since at that time [the time that the name was given] it was a writing system for everyday use. While kanji was the official system used for letter-writing and important texts, hiragana was for personal use by people who had limited knowledge of kanji.</ref><ref>{{cite web |title=Japanese calligraphy |website=Encyclopedia Britannica |language=en |url=https://www.britannica.com/art/Japanese-calligraphy#ref1049371 |access-date=2017-06-22}}</ref> and [[Katakana]],<ref>{{cite web |title=Hiragana, Katakana & Kanji |date=8 September 2010 |publisher=Japanese Word Characters |url=/proxy/https://www.japanesewordswriting.com/ |access-date=15 October 2011}}</ref> both representing the same set of sounds, are used to distinguish phonetically spelled-out Japanese words from imported foreign words, respectively; Katakana is also used for emphasis, much like ''italics'' in English. |
==See also== |
* [[Digraphs and trigraphs]] |
* [[Escape character]] |
==References== |
{{Reflist|refs= |
<ref name="King_1974">{{cite journal |title=(unknown) |journal=Proceedings of an International Conference on ALGOL 68 Implementation |location=Department of Computer Science, University of Manitoba, Winnipeg |date=1974-06-18<!-- /20 --> |editor-last=King |editor-first=Peter R. |publisher=University of Manitoba, Department of Computer Science |page=148 |isbn=9780919628113 |url=/proxy/https://books.google.com/books?id=rGoZAQAAIAAJ |quote=More serious problems are posed by "stropping", the technique used to distinguish boldface text from roman text. Some implementations demand apostrophes around boldface (whence the name stropping); others require backspacing and underlining; [...]}}</ref>
<ref name="R3">http://www.fh-jena.de/~kleine/history/languages/Algol68-RR-HardwareRepresentation.pdf {{Dead link|date=February 2022}}</ref>
<ref name="Wijngaarden_1976">{{cite book |editor-last1=van Wijngaarden |editor-first1=Adriaan |editor-link1=Adriaan van Wijngaarden |editor-last2=Mailloux |editor-first2=Barry James |editor-link2=Barry James Mailloux |editor-last3=Peck |editor-first3=John Edward Lancelot |editor-link3=John Edward Lancelot Peck |editor-last4=Koster |editor-first4=Cornelis Hermanus Antonius |editor-link4=Cornelis Hermanus Antonius Koster |editor-last5=Sintzoff |editor-first5=Michel |editor-link5=:fr:Michel Sintzoff |editor-last6=Lindsey |editor-first6=Charles Hodgson |editor-link6=Charles Hodgson Lindsey |editor-last7=Meertens |editor-first7=Lambert Guillaume Louis Théodore |editor-link7=Lambert Guillaume Louis Théodore Meertens |editor-last8=Fisker |editor-first8=Richard G. |title=Revised Report on the Algorithmic Language ALGOL 68 |chapter=Section 9.3 Representations |publisher=[[Springer-Verlag]] |date=1976 |isbn=978-0-387-07592-1 |oclc=1991170 |pages=94, 123 |chapter-url=/proxy/http://web.eah-jena.de/~kleine/history/languages/algol68-revisedreport.pdf |access-date=2019-05-11 |url-status=live |archive-url=/proxy/https://web.archive.org/web/20190419223929/http://web.eah-jena.de/~kleine/history/languages/algol68-revisedreport.pdf |archive-date=2019-04-19}}</ref>
<ref name="Lindsey_1977">{{cite book |title=Informal Introduction to ALGOL 68 |last1=Lindsey |first1=Charles Hodgson |author-link1=Charles Hodgson Lindsey |last2=van der Meulen |first2=Sietse G. |publisher=North-Holland |date=1977 |isbn=978-0-7204-0726-6 |oclc=230034877 |pages=348–349}}</ref>
<ref name="Fortran77">{{cite web |title=Logical Structures |url=/proxy/http://www.personal.psu.edu/jhm/f90/lectures/10.html}}</ref>
<ref name="4DOS_8.00_HELP">{{cite book |title=4DOS 8.00 online help |title-link=4DOS 8.00 |last1=Brothers |first1=Hardin |last2=Rawson |first2=Tom |author-link2=Tom Rawson |last3=Conn |first3=Rex C. |author-link3=Rex C. Conn |last4=Paul |first4=Matthias R. |last5=Dye |first5=Charles E. |last6=Georgiev |first6=Luchezar I. |date=2002-02-27}}</ref>
<ref name="W3">''[https://webidl.spec.whatwg.org/ Web IDL]'', "[https://webidl.spec.whatwg.org/#idl-names 3.1. Names]". [...] For all of these constructs, the identifier is the value of the identifier token with any single leading U+005F LOW LINE ("_") character (underscore) removed. [...] Note [...] A leading "_" is used to escape an identifier from looking like a reserved word so that, for example, an interface named “interface” can be defined. The leading "_" is dropped to unescape the identifier. [...]</ref>
<ref name="PEP">[https://peps.python.org/pep-0008/ PEP 008]: [https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles Descriptive: Naming Styles]</ref>
<ref name="C99">[[C99]] standard, 7.1.3 Reserved identifiers</ref>
}}
==Further reading== |
* {{cite journal |last1=Hansen |first1=W. J. |last2=Boom |first2=H. J. |title=Report on the Standard Hardware Representation for Revised ALGOL 68 |journal=[[Acta Informatica]] |volume=9 |issue=2 |pages=105–119 |date=1978 |s2cid=34231916 |doi=10.1007/BF00289072}} |
* {{citation |last=Lindsey |first=Charles Hodgson |author-link=Charles Hodgson Lindsey |title=An ISO-Code Representation for ALGOL 68 |journal=[[ALGOL Bulletin]] |id=AB31.3.6 |publisher=ACM |issue=31 |date=March 1970 |pages=37–60 |url=https://dl.acm.org/doi/10.5555/1061500.1061509}} |
[[Category:Parsing]] |
Revision as of 00:44, 10 January 2024
In computer language design, stropping is a method of explicitly marking letter sequences as having a special property, such as being a keyword, or a certain type of variable or storage location, and thus inhabiting a different namespace from ordinary names ("identifiers"), in order to avoid clashes. Stropping is not used in most modern languages – instead, keywords are reserved words and cannot be used as identifiers. Stropping allows the same letter sequence to be used both as a keyword and as an identifier, and simplifies parsing in that case – for example allowing a variable named if without clashing with the keyword if.
Stropping is primarily associated with ALGOL and related languages of the 1960s. Although it still finds some modern use, it is easily confused with other, only superficially similar techniques.
History
The method of stropping and the term "stropping" arose in the development of ALGOL in the 1960s, where it was used to represent typographical distinctions (boldface and underline) found in the publication language that could not be represented directly in the hardware language: a typewriter could produce bold characters, but punched-card encodings had no bold characters. The term "stropping" arose in ALGOL 60, from "apostrophe", as some implementations of ALGOL 60 used apostrophes around text to indicate boldface,[1] such as 'if' to represent the keyword if. Stropping is also important in ALGOL 68, where multiple methods of stropping, known as "stropping regimes", are used; the original matched apostrophes from ALGOL 60 were not widely used, with a leading period or uppercase being more common,[2] as in .IF or IF, and the term "stropping" was applied to all of these.
Syntaxes
A range of different syntaxes for stropping has been used:
- ALGOL 60 commonly used only the convention of single quotes around the word, generally as apostrophes, whence the name "stropping" (e.g. 'BEGIN').
- Some implementations of ALGOL 68[3][2] treat letter sequences prefixed by a single quote, ', as keywords (e.g., 'BEGIN).[4]
In fact, several stropping conventions were often available within one language. For example, in ALGOL 68, the choice of stropping convention can be specified by a compiler directive (in ALGOL terminology, a "pragmat"), namely POINT, UPPER, QUOTE, or RES:
- POINT for 6-bit character codes (not enough characters for lowercase), as in .FOR; a similar convention is used in FORTRAN 77, where LOGICAL keywords are stropped as .EQ. etc. (see below)
- UPPER for 7-bit character codes, as in FOR, with lowercase used for ordinary identifiers
- QUOTE as in ALGOL 60, as in 'for'
- RES for reserved words, as used in modern languages: for is reserved and not available to ordinary identifiers
The various stropping regimes are lexical specifications for stropped characters, though in some cases these have simple interpretations: in the single apostrophe and dot regimes, the first character functions as an escape character, while in the matched apostrophes regime the apostrophes function as delimiters, as in string literals.
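To make the difference between escape-style and delimiter-style regimes concrete, the following Python sketch classifies a single word under each of the four regimes listed above, returning a (category, value) pair. The function name `classify`, the category labels and the tiny keyword set are illustrative assumptions, not taken from any ALGOL 68 compiler; a real implementation would also have to handle operators, spaces inside identifiers and the full list of bold words.

```python
# Hypothetical sketch: classify one word under an ALGOL 68 stropping regime.
# "bold" covers keywords and user-defined mode indicants such as xint.
KEYWORDS = {"mode", "int", "for", "while", "do", "od", "pr"}  # tiny subset


def classify(word: str, regime: str) -> tuple[str, str]:
    """Return (category, value) for a single word under the given regime."""
    if regime == "POINT":
        # A leading period acts as an escape character.
        if word.startswith("."):
            return ("bold", word[1:].lower())
        return ("identifier", word.lower())
    if regime == "UPPER":
        # Uppercase marks bold words; lowercase words are identifiers.
        if word.isupper():
            return ("bold", word.lower())
        return ("identifier", word)
    if regime == "QUOTE":
        # Matched apostrophes act as delimiters, as in a string literal.
        if len(word) >= 2 and word.startswith("'") and word.endswith("'"):
            return ("bold", word[1:-1])
        return ("identifier", word)
    if regime == "RES":
        # Reserved words are recognized by spelling alone.
        if word in KEYWORDS:
            return ("bold", word)
        # User-defined bold words (e.g. .xint) are still point-stropped.
        if word.startswith("."):
            return ("bold", word[1:])
        return ("identifier", word)
    raise ValueError(f"unknown regime: {regime}")


for w, r in [(".FOR", "POINT"), ("FOR", "UPPER"), ("'for'", "QUOTE"), ("for", "RES")]:
    print(r, classify(w, r))
```

In this sketch the four spellings .FOR, FOR, 'for' and for all yield the same value, for, in the bold category; only the lexical marking differs.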
Other examples:
- Atlas Autocode had the choice of three: keywords could be underlined using backspace and overstrike on a Flexowriter keyboard, they could be introduced by a percent symbol, as in %percent %symbol, or they could be typed in UPPER CASE with no delimiting character ("uppercasedelimiters" mode, in which case all variables had to be in lower case).
- ALGOL 60 on the Elliott 803 and Elliott 503 computers used underlining. The Flexowriters (producing punched paper tape) had a non-movement key (underline _), so that typing _b_e_g_i_n produced an underlined begin, which was very readable. The vertical bar | was also a non-movement key, so that typing |= produced a good approximation to ≠.
- The Kidsgrove compiler for ALGOL 60 on the English Electric KDF9 appears to have used at least two other stropping conventions in addition to quotation marks: exclamation marks and percent characters.
- ALGOL 68RS programs are allowed the use of several stropping variants, even within the one language processor.
- Edinburgh IMP inherited the Atlas Autocode percent-symbol prefix convention (%percent %symbol) but not its other stropping options.
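The Elliott convention can be read as a simple lexical rule: an underscore followed by a letter stands for one underlined letter, and a run of underlined letters forms one keyword. The Python sketch below assumes the tape is available as a string in which "_" precedes each underlined letter; the function name and token labels are hypothetical, and spaces and punctuation are simply skipped.

```python
# Hypothetical sketch of reading Elliott-style underlined keywords:
# "_" is a non-movement key, so "_b" stands for an underlined letter b.
def decode_underlined(tape: str) -> list[tuple[str, str]]:
    tokens = []
    i = 0
    while i < len(tape):
        if tape[i] == "_" and i + 1 < len(tape):
            # Collect a run of underscore-letter pairs into one keyword.
            word = []
            while i + 1 < len(tape) and tape[i] == "_":
                word.append(tape[i + 1])
                i += 2
            tokens.append(("keyword", "".join(word)))
        elif tape[i].isalpha():
            # Collect a run of plain letters into one identifier.
            start = i
            while i < len(tape) and tape[i].isalpha():
                i += 1
            tokens.append(("identifier", tape[start:i]))
        else:
            # Skip spaces and any other characters in this simplified sketch.
            i += 1
    return tokens


print(decode_underlined("_b_e_g_i_n x"))
```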
Examples of different ALGOL 68 styles
Note the leading pr (abbreviation of pragmat) directive, which is itself stropped in POINT or quote style, and the ¢ for comment (from "2¢"); see ALGOL 68: pr & co: Pragmats and Comments for details.
Algol68 "strict" as typically published |
Quote stropping (like wikitext) |
For a 7-bit character code compiler |
For a 6-bit character code compiler |
Algol68 using res stropping (reserved word) |
---|---|---|---|---|
¢ underline or bold typeface ¢
mode xint = int;
xint sum sq:=0;
for i while
sum sq≠70×70
do
sum sq+:=i↑2
od
|
'pr' quote 'pr'
'mode' 'xint' = 'int';
'xint' sum sq:=0;
'for' i 'while'
sum sq≠70×70
'do'
sum sq+:=i↑2
'od'
|
.PR UPPER .PR
MODE XINT = INT;
XINT sum sq:=0;
FOR i WHILE
sum sq/=70*70
DO
sum sq+:=i**2
OD
|
.PR POINT .PR
.MODE .XINT = .INT;
.XINT SUM SQ:=0;
.FOR I .WHILE
SUM SQ .NE 70*70
.DO
SUM SQ .PLUSAB I .UP 2
.OD
|
.PR RES .PR
mode .xint = int;
.xint sum sq:=0;
for i while
sum sq≠70×70
do
sum sq+:=i↑2
od
|
Other languages
Fortran 77 writes its logical constants and its relational and logical operators as period-delimited words: .TRUE., .FALSE., .EQ., .NE., .LT., .LE., .GT., .GE., .EQV., .NEQV., .OR., .AND., .NOT.[5]
.AND., .OR. and .XOR. are also used in combined tests in IF and IFF statements in batch files run under JP Software's command line processors like 4DOS,[6] 4OS2, and 4NT / Take Command.
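Because the periods in Fortran 77 enclose the name on both sides, they behave like the matched delimiters of the QUOTE regime rather than a single escape character. A minimal Python sketch using the standard `re` module can pick these period-delimited words out of a source line; the function name is hypothetical, and the sketch ignores character constants and other lexical details, so it only approximates a real Fortran scanner.

```python
import re

# Period-delimited logical constants and operators from the list above.
# Alternation order is safe: "EQ" backtracks to "EQV" when needed.
DOTTED = re.compile(
    r"\.(TRUE|FALSE|EQ|NE|LT|LE|GT|GE|EQV|NEQV|OR|AND|NOT)\.",
    re.IGNORECASE,
)


def dotted_words(line: str) -> list[str]:
    """Return the stropped words found in one line, normalized to uppercase."""
    return [m.group(1).upper() for m in DOTTED.finditer(line)]


print(dotted_words("IF (A .EQ. B .AND. .NOT. DONE) GOTO 10"))  # ['EQ', 'AND', 'NOT']
```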
Modern use
To indicate identifiers
Most modern computer languages do not use stropping. However, some languages support optional stropping to specify identifiers that would otherwise collide with reserved words or which contain non-alphanumeric characters.
For example, because many languages share Microsoft's .NET Common Language Infrastructure (CLI), a name defined in one language may be a keyword in another, so each language needs a way to refer to such names. This is done by a prefix, such as @ in C#, or by enclosing the identifier in brackets, as in Visual Basic .NET.
A second major example is in many implementations of Structured Query Language. In those languages reserved words can be used as column, table, or variable names by lexically delimiting them. The standard specifies enclosing reserved words in double quotes, but in practice the exact mechanism varies by implementation; MySQL, for example, allows reserved words to be used in other contexts by enclosing them in backticks, and Microsoft SQL Server uses square brackets.
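As a small, runnable illustration of delimited identifiers, Python's built-in `sqlite3` module can be used with an in-memory SQLite database. SQLite follows the standard double-quote form (it also accepts backticks and square brackets for compatibility with other systems), so the reserved words order and select can serve as column names. The table name `t` and the inserted data are arbitrary.

```python
import sqlite3

# Double quotes delimit identifiers, so reserved words can name columns.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("order" INTEGER, "select" TEXT)')
conn.execute('INSERT INTO t ("order", "select") VALUES (?, ?)', (1, "first"))
rows = conn.execute('SELECT "order", "select" FROM t').fetchall()
print(rows)  # [(1, 'first')]
conn.close()
```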
In several languages, including Nim, R,[7] and Scala,[8] a reserved word or non-alphanumeric name can be used as an identifier by enclosing it in backticks.
Other examples are more minor. Web IDL, for instance, uses a leading underscore _ to strop identifiers that would otherwise collide with reserved words: the leading underscore is stripped from the identifier's value, which makes this stropping rather than a naming convention.[9]
Other purposes
In Haskell, surrounding a function name by backticks causes it to be parsed as an infix operator.
Unstropping by the compiler
In a compiler front end, unstropping originally occurred during an initial line reconstruction phase, which also eliminated whitespace. This was then followed by scannerless parsing (no tokenization); this was standard in the 1960s, notably for ALGOL. In modern use, unstropping is generally done as part of lexical analysis. This is clear if one divides the lexer into two phases, scanner and evaluator: the scanner categorizes the stropped sequence into the correct category, and then the evaluator unstrops when calculating the value. For example, in a language where an initial underscore is used to strop identifiers to avoid collisions with reserved words, the sequence _if would be categorized as an identifier (not as the reserved word if) by the scanner, and then the evaluator would give this the value if, yielding (Identifier, if) as the token type and value.
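The scanner/evaluator split described above can be sketched in Python for the hypothetical underscore-stropping language used in the example. The reserved-word set, the function names and the whitespace-only splitting of lexemes are simplifying assumptions; a real lexer would work character by character.

```python
# Hypothetical language: a leading "_" strops an identifier that would
# otherwise be read as a reserved word.
RESERVED = {"if", "then", "else", "while"}


def scan(lexeme: str) -> str:
    """Scanner phase: choose the token category from the raw lexeme."""
    if lexeme.startswith("_"):
        return "Identifier"  # stropped, so never a reserved word
    return "Keyword" if lexeme in RESERVED else "Identifier"


def evaluate(category: str, lexeme: str) -> str:
    """Evaluator phase: compute the token value, unstropping if needed."""
    if category == "Identifier" and lexeme.startswith("_"):
        return lexeme[1:]
    return lexeme


def tokenize(source: str) -> list[tuple[str, str]]:
    tokens = []
    for lexeme in source.split():
        category = scan(lexeme)
        tokens.append((category, evaluate(category, lexeme)))
    return tokens


print(tokenize("if _if then x"))
```

Here _if becomes (Identifier, if) while the bare if becomes (Keyword, if): the same value in two different token categories.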
Similar techniques
A number of similar techniques exist, generally prefixing or suffixing an identifier to indicate different treatment, but the semantics are varied. Strictly speaking, stropping consists of different representations of the same name (value) in different namespaces, and occurs at the tokenization stage. For example, in ALGOL 60 with matched apostrophe stropping, 'if' is tokenized as (Keyword, if), while if is tokenized as (Identifier, if): the same value in different token classes.
Uppercase keywords also remain a common convention when writing grammars for lexing and parsing: the reserved word if is tokenized as the token class IF, and an if-then-else clause is then represented by the phrase IF Expression THEN Statement ELSE Statement, where uppercase terms are keywords and capitalized terms are nonterminal symbols in a production rule (other terminal symbols are denoted by lowercase terms, such as identifier or integer, for an integer literal).
Naming conventions
Most loosely, one may use naming conventions to avoid clashes, commonly prefixing or suffixing with an underscore, as in if_ or _then. A leading underscore is often used to indicate private members in object-oriented programming.
These names may be interpreted by the compiler and have some effect, though this is generally done at the semantic analysis phase, not the tokenization phase. For example, in Python, a single leading underscore is a weak private indicator, and affects which identifiers are imported on module import, while a double leading underscore (and no more than one trailing underscore) on a class attribute invokes name mangling.[10]
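The Python behavior can be observed directly. In the short example below (the class and attribute names are arbitrary), the attribute written as __balance inside the class body is stored under the mangled name _Account__balance, while the single-underscore attribute is left unchanged. The rewrite is done by the compiler; the tokenizer sees __balance as an ordinary identifier.

```python
class Account:
    def __init__(self) -> None:
        self._hint = "weak private"  # single underscore: convention only
        self.__balance = 100  # double underscore: name-mangled by the compiler


acct = Account()
print(acct._hint)  # weak private
print(acct._Account__balance)  # 100
print(hasattr(acct, "__balance"))  # False
```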
Reserved words
While modern languages generally use reserved words rather than stropping to distinguish keywords from identifiers (e.g., making if reserved), they also frequently reserve a syntactic class of identifiers as keywords, yielding representations which can be interpreted as a stropping regime, but instead have the semantics of reserved words.
This is most notable in C, where identifiers that begin with an underscore are reserved, though the precise details of which identifiers are reserved at which scope are complicated, and identifiers beginning with a double underscore are reserved for any use;[11] similarly, in C++ any identifier that contains a double underscore is reserved for any use, while an identifier that begins with an underscore is reserved in the global namespace.[nb 1] Thus an implementation can add a new keyword foo in the form of the reserved word __foo. While this is superficially similar to stropping, the semantics are different. As a reserved word, the string __foo represents the identifier __foo in the common identifier namespace. In stropping (by prefixing keywords by __), the string __foo represents the keyword foo in a separate keyword namespace. Thus, using reserved words, the tokens for __foo and foo are (identifier, __foo) and (identifier, foo), different values in the same category, while in stropping the tokens for __foo and foo are (keyword, foo) and (identifier, foo), the same value in different categories. From the programmer's point of view both approaches solve the problem of namespace clashes in the same way, but they differ in terms of formal grammar and implementation.
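The contrast between the two readings of __foo can be written as two tiny tokenizing functions. This Python sketch is purely illustrative: the function names are hypothetical, and the "stropping" reading does not describe how C or C++ compilers actually treat such names.

```python
# Two hypothetical ways a lexer could read the lexeme "__foo".
def tokens_reserved_reading(lexemes: list[str]) -> list[tuple[str, str]]:
    # Reserved-word reading: "__foo" is just another identifier spelling.
    return [("identifier", lx) for lx in lexemes]


def tokens_stropping_reading(lexemes: list[str]) -> list[tuple[str, str]]:
    # Stropping reading: a "__" prefix marks a keyword and is removed.
    return [
        ("keyword", lx[2:]) if lx.startswith("__") else ("identifier", lx)
        for lx in lexemes
    ]


print(tokens_reserved_reading(["__foo", "foo"]))
print(tokens_stropping_reading(["__foo", "foo"]))
```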
Name mangling
Name mangling also addresses name clashes by renaming identifiers, but does this much later in compilation, during semantic analysis, not during tokenization. This consists of creating names that include scope and type information, primarily for use by linkers, both to avoid clashes and to include necessary semantic information in the name itself. In these cases the original identifiers may be identical, but the context is different, as in the functions foo(int x) versus foo(char x), which have the same identifier foo but different signatures. These names might be mangled to foo_i and foo_c, for instance, to include the type information.
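A toy version of this scheme, which appends a one-letter code for each parameter type, shows how identical source names become distinct linker names. The `mangle` function and its type codes are hypothetical; real C++ ABIs use considerably more elaborate encodings that also record namespaces, classes and qualifiers.

```python
# Hypothetical one-letter codes for parameter types.
TYPE_CODES = {"int": "i", "char": "c", "double": "d"}


def mangle(name: str, param_types: list[str]) -> str:
    """Append one type code per parameter to the source-level name."""
    return name + "_" + "".join(TYPE_CODES[t] for t in param_types)


print(mangle("foo", ["int"]), mangle("foo", ["char"]))  # foo_i foo_c
```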
Sigils
Sigils are a syntactically similar but semantically different phenomenon: they indicate properties of variables instead. They are common in BASIC, Perl, Ruby, and various other languages to identify characteristics of variables and constants: in BASIC and Perl they designate the type of variable, and in Ruby they both distinguish variables from constants and indicate scope. Note that this affects the semantics of the variable, not the syntax of whether it is an identifier or keyword.
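The point that a sigil changes a variable's meaning but not its token category can be sketched for Microsoft-style BASIC, where a trailing $, %, ! or # marks a string, integer, single-precision or double-precision variable. In this hypothetical Python helper the whole lexeme, sigil included, is kept as the name, since A$ and A% are distinct variables; treating unmarked names as single precision is an assumption matching common Microsoft BASIC defaults.

```python
# Trailing type sigils in Microsoft-style BASIC.
SIGIL_TYPES = {"$": "string", "%": "integer", "!": "single", "#": "double"}


def basic_variable(lexeme: str) -> tuple[str, str, str]:
    """Return (token category, variable name, value type) for a BASIC name."""
    # The category never changes: a sigil does not turn a name into a keyword.
    value_type = SIGIL_TYPES.get(lexeme[-1], "single")  # assumed default type
    return ("identifier", lexeme, value_type)


print(basic_variable("NAME$"))  # ('identifier', 'NAME$', 'string')
```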
Parallels in human language
Stropping is used in computer programming languages to make the compiler's (or more strictly, the parser's) job easier, i.e. within the capability of the relatively small and slow computers available in the early days of computing in the 20th century. However, similar techniques have also been commonly used to aid reading comprehension for people. Some examples are:
- Placing important words in bold,[12] such as the very first mention of stropping at the head of this page, because defining stropping is the very purpose of the page.
- Formatting new words in italic type[13] when they are first introduced in text. This is commonly used in science fiction and fantasy when introducing invented plants, foods and creatures, in travelogue and historical writing when describing unfamiliar foreign words, and so on. A special font, possibly associated with the language in question, may also be used, for example a Gothic[14] font for German words.
- Using a different language, typically Latin or Greek to signify technical terms. This is similar to using reserved words, but it is usually combined with italic text to aid readability. For example:
- the typical binomial nomenclature[15] or "Latin names" of plants and animals helps the reader to see that "Erithacus rubecula" is the special technical name of the European robin, in a way that "Red-breasted European thrush" does not.
- many legal terms where a short Latin phrase refers to a large body of law and precedent, such as habeas corpus, sub judice, in loco parentis.[16]
- logic and mathematical terms such as QED, a priori, vice versa...
- In written Japanese, in addition to Kanji characters, the two distinct alphabets (more strictly, syllabaries) Hiragana[17][18] and Katakana,[19] both representing the same set of sounds, are used to distinguish phonetically spelled-out Japanese words from imported foreign words, respectively; Katakana is also used for emphasis, much like italics in English.
See also
Notes
- ^ There are other restrictions, such as an identifier that begins with an underscore, followed by an uppercase letter.
References
- ^ King, Peter R., ed. (1974-06-18). Proceedings of an International Conference on ALGOL 68 Implementation. Department of Computer Science, University of Manitoba, Winnipeg: University of Manitoba, Department of Computer Science: 148. ISBN 9780919628113.
More serious problems are posed by "stropping", the technique used to distinguish boldface text from roman text. Some implementations demand apostrophes around boldface (whence the name stropping); others require backspacing and underlining; [...]
- ^ a b van Wijngaarden, Adriaan; Mailloux, Barry James; Peck, John Edward Lancelot; Koster, Cornelis Hermanus Antonius; Sintzoff, Michel; Lindsey, Charles Hodgson; Meertens, Lambert Guillaume Louis Théodore; Fisker, Richard G., eds. (1976). "Section 9.3 Representations" (PDF). Revised Report on the Algorithmic Language ALGOL 68. Springer-Verlag. pp. 94, 123. ISBN 978-0-387-07592-1. OCLC 1991170. Archived (PDF) from the original on 2019-04-19. Retrieved 2019-05-11.
- ^ http://www.fh-jena.de/~kleine/history/languages/Algol68-RR-HardwareRepresentation.pdf [dead link]
- ^ Lindsey, Charles Hodgson; van der Meulen, Sietse G. (1977). Informal Introduction to ALGOL 68. North-Holland. pp. 348–349. ISBN 978-0-7204-0726-6. OCLC 230034877.
- ^ "Logical Structures".
- ^ Brothers, Hardin; Rawson, Tom; Conn, Rex C.; Paul, Matthias R.; Dye, Charles E.; Georgiev, Luchezar I. (2002-02-27). 4DOS 8.00 online help.
- ^ R Core Team, Quotes: Quotes, R Foundation for Statistical Computing.
- ^ Odersky, Martin (2011-05-24), The Scala Language Specification Version 2.9
- ^ Web IDL, "3.1. Names". [...] For all of these constructs, the identifier is the value of the identifier token with any single leading U+005F LOW LINE ("_") character (underscore) removed. [...] Note [...] A leading "_" is used to escape an identifier from looking like a reserved word so that, for example, an interface named “interface” can be defined. The leading "_" is dropped to unescape the identifier. [...]
- ^ PEP 008: Descriptive: Naming Styles
- ^ C99 standard, 7.1.3 Reserved identifiers
- ^ Twyman, Michael. "The Bold Idea: The Use of Bold-looking Types in the Nineteenth Century". Journal of the Printing Historical Society. 22 (107–143).
- ^ Truss, Lynne (2004), Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation, New York: Gotham Books, p. 146, ISBN 978-1-59240-087-4
- ^ "Styles of Handwriting". Rigsarkivet. The Danish National Archives. Retrieved 2017-03-26.
- ^ "How to Write Scientific Names of Organisms" (PDF), Competition Science Vision, retrieved 2011-06-20.
- ^ A Selection of Legal Maxims, classified and illustrated at Google Books
- ^ Dual 大辞林
「平」とは平凡な、やさしいという意で、当時普通に使用する文字体系であったことを意味する。 漢字は書簡文や重要な文章などを書く場合に用いる公的な文字であるのに対して、 平仮名は漢字の知識に乏しい人々などが用いる私的な性格のものであった。
Translation: 平 [the "hira" part of "hiragana"] means "ordinary" or "simple" since at that time [the time that the name was given] it was a writing system for everyday use. While kanji was the official system used for letter-writing and important texts, hiragana was for personal use by people who had limited knowledge of kanji.
- ^ "Japanese calligraphy". Encyclopedia Britannica. Retrieved 2017-06-22.
- ^ "Hiragana, Katakana & Kanji". Japanese Word Characters. 2010-09-08. Retrieved 2011-10-15.
Further reading
- Hansen, W. J.; Boom, H. J. (1978). "Report on the Standard Hardware Representation for Revised ALGOL 68". Acta Informatica. 9 (2): 105–119. doi:10.1007/BF00289072. S2CID 34231916.
- Lindsey, Charles Hodgson (March 1970), "An ISO-Code Representation for ALGOL 68", ALGOL Bulletin (31), ACM: 37–60, AB31.3.6