Grammar

Description

sealed class Tinman.Core.Parsing.Grammar

Derived from: Disposable ^abstract

Represents a grammar for structural parsing.

The following code shows the grammar for defining grammars:

grammar       := (decl-skip | decl-rule)+ ;

decl-rule     := [!?] <! identifier ':=' rule ';' ;
decl-skip     := '#' ':=' rule ';' ;

rule          := rule-list ;
rule-ast      := '{' identifier ':' rule (('<!' | '!>') rule)? '}' ;
rule-brace    := '(' rule ')' ;
rule-choice   := rule-sequence !> ('|' rule-sequence)+ ;
rule-except   := rule-not !> '\\' rule-not ;
rule-list     := rule-choice !> '..' rule-choice ;
rule-not      := '~' <! rule-primary ;
rule-peek-no  := '>' rule '<' ;
rule-peek-yes := '<' rule '>' ;
rule-primary  := rule-char | rule-token | rule-name | rule-brace | rule-ast
                 | rule-peek-yes | rule-peek-no ;
rule-repeat   := rule-except !> '?' | '*' | '+' | '[' number ('..' number)? ']' ;
rule-sequence := rule-trim !> rule-trim+ ;
rule-trim     := '#' <! rule-repeat !> '#' ;

rule-char     := char-range | char-set | char-set-inv | char-eof ;

char-range    := char-one !> '..' char-one ;

char-one      := '\'' (char-esc | ]\\'[) '\'' ;
char-set      := '[' (char-esc | ]\]\\[)+ ']' ;
char-set-inv  := ']' (char-esc | ]\[\\[)+ '[' ;
rule-token    := '@' <! '\'' (char-esc | ]\\'[)* '\'' ;

char-esc      := '\\' ([t0\]nr\[\\'] | 'x' hexdigit[1..4]) ;
rule-name     := identifier ;

hexdigit      := digit | 'a'..'f' | 'A'..'F' ;
identifier    := letter (letter | digit)* ;
number        := digit+ ;

char-eof      := @'<eof>' ;
digit         := '0'..'9' ;
letter        := 'a'..'z' | 'A'..'Z' | [_-] ;

The above grammar describes itself in its very own grammar language, so it is by definition fully self-explanatory (but still non-comprehensible, so the documentation goes on).

A grammar contains one or more declarations, which can be one of the following:

decl-rule
Defines a new grammar rule: the left side defines the rule name, the right side rule content. The rule optionally be prefixed with either '!' or '?', which assigns an additional flag to the grammar rule: either GrammarRuleFlags.ReportErrorRoot or GrammarRuleFlags.ReportErrorTerminal, respectively.
```
! expression := 'A' ;
? identifier := 'B' ;
some_rule   := 'C' ;
```
decl-skip
Defines the special grammar rule for rule-trim.
```
# := ' ' ;
```

These are the grammar rules, listed from lowest precedence to highest:

rule-list (lowest precedence)
Matches the rule on the left one or more times, while matching the rule on the right as the separator. When the left-side rule is matched n times, the right-side rule will always be matched n-1 times.
```
('A' | 'B' .. # ',' #)
```
rule-choice
Matches one rule in the given list of possible rules:
```
'A' | 'B' | 'C'
```
rule-sequence
Matches all given rules one after another:
```
'A' 'B' 'C'
```
rule-trim
Trims leading ('#' is left of rule) and/or trailing ('#' is right of rule) whitespace characters (see FormattingUtil.IsWhitespace) from the input before matching the given rule:
```
'A' # ',' # 'B' # ',' # 'C'
```
A grammar may optionally define a special skip rule using the special name '#': the skip rule is applied at all potential trim locations. All input characters that are matched by the skip rule are silently dropped.
rule-repeat
Matches the given rule multiple times ('?' zero or one, '*' zero or more, '+' one or more). The minimum and maximum number of occurrences can be specified with 'min..max' where min and max specify the respective occurrence count.
```
'A'?
'A'*
'A'+
'A'[2..4]
```
rule-except
Matches the left rule only if the right rule does not match.
```
'a' .. 'z' \ 'e'
```
rule-not
Matches the current input if the given rule fails:
```
~'A'
```
rule-primary (highest precedence)

These are the primary rules (same precedence):

rule-ast
Matches the given rule(s) and creates an abstract syntax tree node on success (the node will span all input matched by rules inside of the curly braces). The '!>' and '<!' tokens are conditional operators: a syntax tree node is created only if the rules to the right resp. to the left are also matched.
```
{ AST : 'A' }
{ AST : 'A' !> 'B' }
{ AST : 'A' <! 'B' }
```
rule-brace
Wraps another given rule, overriding the default rule precedence.
Example:
```
a (b | c) d
(a | b) (c | d)
```
rule-char
Matches a single character in the given set.
```
'A'
'A'..'C'
[ABC]
]ABC[
<eof>
```
Inverse brackets ']' and '[' indicate that all characters except the specified ones will be matched (inverse set). The <eof> token will match the end of input condition (see ParserContext.IsEof).
rule-name
Matches another grammar rule, specified by its name. Identifiers consist of letters, digits, dashes and underscores and must not start with a digit.
Example:
```
name-of-grammar-rule
rule_123
```
rule-peek-no
Peeks at the subsequent input, checking if the wrapped rule does not match, while retaining the parser state.
Example:
```
'null' > 'ify' <
```
rule-peek-yes
Peeks at the subsequent input, checking if the wrapped rule does match while retaining the parser state.
Example:
```
'(int)' < 'expr' >
```
rule-token
Matches the given string token. Use the '@' prefix to match case-insensitively.
Example:
```
'keyword_case_sensitive'
@'KeyWord_Case_InSensitive'
```

Parsing is performed by the following steps.

Write a grammar script (see above).
Create a Grammar object from that script (see ParseGrammar¹).
Create a ICodeInput object (e.g. CodeInput.FromString).
Create or obtain some source code that follows the grammar.
Use a grammar rule (see GetRule¹) to parse the source code into an abstract syntax tree (AST).
Analyse the ParseResult object and do error handling if necessary.
Process the resulting AST (see ParseResult.Ast), for example by creating a program structure information (PSI) model from it (see below).
Create a program structure information (PSI) model from an AST node by calling the AstNode.CreatePsi method.

Public / Constants

AstRoot

public constant AstRoot → ("ROOT":string)

Type for default root AST nodes.

When the result of a parsing process yielded more than one root AST node, an artificial root node of this type is created that holds the yielded nodes.

Public / Constructors

ParseGrammar

_2 overloads

[OwnerReturn]
public static method ParseGrammar¹ → (1)

source ⁱⁿ : string: [not-empty]
The grammar source code.
returns → GrammarBuilder: The GrammarBuilder object for building the grammar. Call GrammarBuilder.Build to get the final Grammar object.

Creates a new Grammar object.

[OwnerReturn]
public static method ParseGrammar² → (1)

lockedSource ⁱⁿ : int32: The locked grammar source code.
returns → GrammarBuilder: The GrammarBuilder object for building the grammar. Call GrammarBuilder.Build to get the final Grammar object.

Creates a new Grammar object.

LicenceException: If LicenceDomain.Tinman is still locked.

Public / Methods

GetRule

_2 overloads

public method GetRule¹ → ()

returns → IGrammarRule: The default rule or null if no default rule has been specified via GrammarBuilder.DefaultRule.

Returns the default rule of this grammar.

To get another rule by its name, use GetRule².

public method GetRule² → (1)

name ⁱⁿ : string: The rule name or null to return the default rule.
returns → IGrammarRule: The rule.

Returns a rule of this grammar by its name.

This method will never return null if name ⁱⁿ is null (the default rule always exists).

ValidatingException: If no rule of the given name exists.

See also: Grammar.GetRuleNull

GetRuleNull

public method GetRuleNull → (1)

name ⁱⁿ : string: The rule name or null to return the default rule.
returns → IGrammarRule: The rule or null if no rule of the given name exists.

Returns a rule of this grammar by its name.

This method will never return null if name ⁱⁿ is null (the default rule always exists).

ParseWith

public static method ParseWith → (2)

grammar ⁱⁿ : string: [not-empty]
The grammar to use.
input ⁱⁿ : string: [not-null]
The input to parse.
returns → AstNode: The parsed AST.

Parses the given input, using the specified grammar.

This method is intended to be used in special cases where simplistic grammars are used to perform simple tasks. In such cases, the required effort for using a singleton Grammar object might not be justified.

ValidatingException: If one or more parsing errors have occurred (see ValidateMessageType.Error).

ToSource

public method ToSource → (1)

flags ^opt : RuleToSourceFlags = RuleToSourceFlags.None: The format flags to use.
returns → string: The grammar source code.

Returns the grammar source code.

Public / Attributes

GrammarCode

public static attribute GrammarCode → (get)

value : string: [not-null]
The grammar source code.

Returns the source code for grammars in the grammar language.

LeftRecursive

public attribute LeftRecursive → (get)

value : IVectorConst<IGrammarRule>: [not-null]
The list of left-recursive rules.

Returns all left-recursive IRule objects that are present this grammar.

Left-recursion happens if the IRule.Match¹ method calls itself on the same IRule object for the same parse position (see ParserContext.Position), either directly or indirectly.

A grammar will fix left-recursive rule invocations by forcing the recursive match attempt to fail. This requires additional house-keeping, which might reduce parsing performance. This property can be used to check if left-recursive rules have slipped into a grammar inadvertently. It might not be possible to avoid left-recursion at all, especially for complex grammars.

Here are some examples for left-recursive grammars:

This will never match any input, because the recursive match attempt is forced to fail:
```
a := a ;
```
This will only match empty input, because the forced match failure is allowed:
```
a := a? ;
```
This will match one '1' characters (please note that this differs from the behaviour of grammar rules in classic EBNF, which would allow one or more characters to be matched):
```
a := a? '1' ;
```
This will match one or more '1' characters, because the second recursive invocation happens at a later parse position:
```
a := a? '1' a? ;
```
This will never match any input, because of indirect left-recursion:
```
a := b ;
b := c ;
c := a ;
```

This will match '1', '21' or '321':

a := b? '1' ;
b := c? '2' ;
c := a? '3' ;

Rules

public attribute Rules → (get)

value : IMapConst<string, IGrammarRule>: [not-null]
Mapping from grammar rule name (see IGrammarRule.RuleName) to grammar rules.

The grammar rules.

Grammar

Description

Public / Constants

Ast​Root

Public / Constructors

Parse​Grammar

Public / Methods

Get​Rule

Get​Rule​Null

Parse​With

To​Source