Grammar

Description

sealed class Tinman.Core.Parsing.Grammar

Derived from

Disposable abstract

Represents a grammar for structural parsing.

The following code shows the grammar for defining grammars:

grammar       := (decl-skip | decl-rule)+ ;

decl-rule     := [!?] <! identifier ':=' rule ';' ;
decl-skip     := '#' ':=' rule ';' ;

rule          := rule-list ;
rule-ast      := '{' identifier ':' rule (('<!' | '!>') rule)? '}' ;
rule-brace    := '(' rule ')' ;
rule-choice   := rule-sequence !> ('|' rule-sequence)+ ;
rule-except   := rule-not !> '\\' rule-not ;
rule-list     := rule-choice !> '..' rule-choice ;
rule-not      := '~' <! rule-primary ;
rule-peek-no  := '>' rule '<' ;
rule-peek-yes := '<' rule '>' ;
rule-primary  := rule-char | rule-token | rule-name | rule-brace | rule-ast
                 | rule-peek-yes | rule-peek-no ;
rule-repeat   := rule-except !> '?' | '*' | '+' | '[' number ('..' number)? ']' ;
rule-sequence := rule-trim !> rule-trim+ ;
rule-trim     := '#' <! rule-repeat !> '#' ;

rule-char     := char-range | char-set | char-set-inv | char-eof ;

char-range    := char-one !> '..' char-one ;

char-one      := '\'' (char-esc | ]\\'[) '\'' ;
char-set      := '[' (char-esc | ]\]\\[)+ ']' ;
char-set-inv  := ']' (char-esc | ]\[\\[)+ '[' ;
rule-token    := '@' <! '\'' (char-esc | ]\\'[)* '\'' ;

char-esc      := '\\' ([t0\]nr\[\\'] | 'x' hexdigit[1..4]) ;
rule-name     := identifier ;

hexdigit      := digit | 'a'..'f' | 'A'..'F' ;
identifier    := letter (letter | digit)* ;
number        := digit+ ;

char-eof      := @'<eof>' ;
digit         := '0'..'9' ;
letter        := 'a'..'z' | 'A'..'Z' | [_-] ;

The above grammar describes itself in its very own grammar language, so it is by definition fully self-explanatory (but still non-comprehensible, so the documentation goes on).

A grammar contains one or more declarations, which can be one of the following:

  • decl-rule
    Defines a new grammar rule: the left side defines the rule name, the right side rule content. The rule optionally be prefixed with either '!' or '?', which assigns an additional flag to the grammar rule: either GrammarRuleFlags.ReportErrorRoot or GrammarRuleFlags.ReportErrorTerminal, respectively.

    ! expression := 'A' ;
    ? identifier := 'B' ;
    some_rule   := 'C' ;
  • decl-skip
    Defines the special grammar rule for rule-trim.

    # := ' ' ;

These are the grammar rules, listed from lowest precedence to highest:

  • rule-list (lowest precedence)
    Matches the rule on the left one or more times, while matching the rule on the right as the separator. When the left-side rule is matched n times, the right-side rule will always be matched n-1 times.

    ('A' | 'B' .. # ',' #)
  • rule-choice
    Matches one rule in the given list of possible rules:

    'A' | 'B' | 'C'
  • rule-sequence
    Matches all given rules one after another:

    'A' 'B' 'C'
  • rule-trim
    Trims leading ('#' is left of rule) and/or trailing ('#' is right of rule) whitespace characters (see FormattingUtil.IsWhitespace) from the input before matching the given rule:

    'A' # ',' # 'B' # ',' # 'C'

    A grammar may optionally define a special skip rule using the special name '#': the skip rule is applied at all potential trim locations. All input characters that are matched by the skip rule are silently dropped.

  • rule-repeat
    Matches the given rule multiple times ('?' zero or one, '*' zero or more, '+' one or more). The minimum and maximum number of occurrences can be specified with 'min..max' where min and max specify the respective occurrence count.

    'A'?
    'A'*
    'A'+
    'A'[2..4]
  • rule-except
    Matches the left rule only if the right rule does not match.

    'a' .. 'z' \ 'e'
  • rule-not
    Matches the current input if the given rule fails:

    ~'A'
  • rule-primary (highest precedence)

These are the primary rules (same precedence):

  • rule-ast
    Matches the given rule(s) and creates an abstract syntax tree node on success (the node will span all input matched by rules inside of the curly braces). The '!>' and '<!' tokens are conditional operators: a syntax tree node is created only iff the rules to the right resp. to the left are also matched.

    { AST : 'A' }
    { AST : 'A' !> 'B' }
    { AST : 'A' <! 'B' }
  • rule-brace
    Wraps another given rule, overriding the default rule precedence.
    Example:

    a (b | c) d
    (a | b) (c | d)
  • rule-char
    Matches a single character in the given set.

    'A'
    'A'..'C'
    [ABC]
    ]ABC[
    <eof>

    Inverse brackets ']' and '[' indicate that all characters except the specified ones will be matched (inverse set). The <eof> token will match the end of input condition (see ParserContext.IsEof).

  • rule-name
    Matches another grammar rule, specified by its name. Identifiers consist of letters, digits, dashes and underscores and must not start with a digit.
    Example:

    name-of-grammar-rule
    rule_123
  • rule-peek-no
    Peeks at the subsequent input, checking if the wrapped rule does not match, while retaining the parser state.
    Example:

    'null' > 'ify' <
  • rule-peek-yes
    Peeks at the subsequent input, checking if the wrapped rule does match while retaining the parser state.
    Example:

    '(int)' < 'expr' >
  • rule-token
    Matches the given string token. Use the '@' prefix to match case-insensitively.
    Example:

    'keyword_case_sensitive'
    @'KeyWord_Case_InSensitive'

Parsing is performed by the following steps.

  1. Write a grammar script (see above).

  2. Create a Grammar object from that script (see ParseGrammar1).

  3. Create a ICodeInput object (e.g. CodeInput.FromString).

  4. Create or obtain some source code that follows the grammar.

  5. Use a grammar rule (see GetRule1) to parse the source code into an abstract syntax tree (AST).

  6. Analyse the ParseResult object and do error handling if necessary.

  7. Process the resulting AST (see ParseResult.Ast), for example by creating a program structure information (PSI) model from it (see below).

  8. Create a program structure information (PSI) model from an AST node by calling the AstNode.CreatePsi method.

Public / Constants

Ast​Root


public constant AstRoot → ("ROOT":string)

Type for default root AST nodes.

When the result of a parsing process yielded more than one root AST node, an artificial root node of this type is created that holds the yielded nodes.

Public / Constructors

Parse​Grammar

2 overloads


[OwnerReturn]
public static method ParseGrammar1 → (1)

source in : string

[not-empty]
The grammar source code.

returns → GrammarBuilder

The GrammarBuilder object for building the grammar. Call GrammarBuilder.Build to get the final Grammar object.

Creates a new Grammar object.


[OwnerReturn]
public static method ParseGrammar2 → (1)

lockedSource in : int32

The locked grammar source code.

returns → GrammarBuilder

The GrammarBuilder object for building the grammar. Call GrammarBuilder.Build to get the final Grammar object.

Creates a new Grammar object.

Public / Methods

Get​Rule

2 overloads


public method GetRule1 → ()

returns → IGrammarRule

The default rule or null if no default rule has been specified via GrammarBuilder.DefaultRule.

Returns the default rule of this grammar.

To get another rule by its name, use GetRule2.


public method GetRule2 → (1)

name in : string

The rule name or null to return the default rule.

returns → IGrammarRule

The rule.

Returns a rule of this grammar by its name.

This method will never return null if name in is null (the default rule always exists).

ValidatingException

If no rule of the given name exists.

Get​Rule​Null


public method GetRuleNull → (1)

name in : string

The rule name or null to return the default rule.

returns → IGrammarRule

The rule or null if no rule of the given name exists.

Returns a rule of this grammar by its name.

This method will never return null if name in is null (the default rule always exists).

Parse​With


public static method ParseWith → (2)

grammar in : string

[not-empty]
The grammar to use.

input in : string

[not-empty]
The input to parse.

returns → AstNode

The parsed AST.

Parses the given input, using the specified grammar.

This method is intended to be used in special cases where simplistic grammars are used to perform simple tasks. In such cases, the required effort for using a singleton Grammar object might not be justified.

ValidatingException

If one or more parsing errors have occurred (see ValidateMessageType.Error).

To​Source


public method ToSource → (1)

flags opt : RuleToSourceFlags = RuleToSourceFlags.None

The format flags to use.

returns → string

The grammar source code.

Returns the grammar source code.

Public / Attributes

Grammar​Code


public static attribute GrammarCode → (get)

value : string

[not-null]
The grammar source code.

Returns the source code for grammars in the grammar language.

Left​Recursive


public attribute LeftRecursive → (get)

value : IVectorConst<IGrammarRule>

[not-null]
The list of left-recursive rules.

Returns all left-recursive IRule objects that are present this grammar.

Left-recursion happens if the IRule.Match1 method calls itself on the same IRule object for the same parse position (see ParserContext.Position), either directly or indirectly.

A grammar will fix left-recursive rule invocations by forcing the recursive match attempt to fail. This requires additional house-keeping, which might reduce parsing performance. This property can be used to check if left-recursive rules have slipped into a grammar inadvertently. It might not be possible to avoid left-recursion at all, especially for complex grammars.

Here are some examples for left-recursive grammars:

  • This will never match any input, because the recursive match attempt is forced to fail:

    a := a ;
  • This will only match empty input, because the forced match failure is allowed:

    a := a? ;
  • This will match one '1' characters (please note that this differs from the behaviour of grammar rules in classic EBNF, which would allow one or more characters to be matched):

    a := a? '1' ;
  • This will match one or more '1' characters, because the second recursive invocation happens at a later parse position:

    a := a? '1' a? ;
  • This will never match any input, because of indirect left-recursion:

    a := b ;
    b := c ;
    c := a ;
  • This will match '1', '21' or '321':

    a := b? '1' ;
    b := c? '2' ;
    c := a? '3' ;

Rules


public attribute Rules → (get)

value : IMapConst<string, IGrammarRule>

[not-null]
Mapping from grammar rule name (see IGrammarRule.RuleName) to grammar rules.

The grammar rules.