Syntactic Errors Detection
Overview
Syntactic errors result in an invalid Concrete Syntax Tree (CST) when Tree-sitter parses Essence code. An invalid CST may contain:
- ERROR nodes: inserted when the parser fails to interpret a portion of the code according to the grammar.
- MISSING nodes: inserted when the parser expects a token that is not present in the source.
These errors are detected by traversing the CST, classified into specific subtypes, and reported via the Diagnostics API (Diagnostics API documentation).
Implementation
pub fn detect_syntactic_errors(source: &str) -> Vec<Diagnostic>
Tree-sitter parses the Essence source code and generates a CST. If parsing fails entirely, a corresponding Diagnostic is returned immediately. Otherwise, the CST is traversed with a WalkDFS::with_retract iterator, which combines Tree-sitter’s TreeCursor with an optional retract.
Tree-sitter nests errors in the CST: when a node is marked as erroneous, its descendants may also be marked as errors, which can lead to duplicate diagnostics. Enabling retract avoids this by skipping the children of ERROR or MISSING nodes, so that only top-level errors are collected and each problem is reported once.
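The effect of retract can be illustrated with a minimal sketch. It does not use Tree-sitter or the project's WalkDFS type: ToyNode, walk_dfs_with_retract and top_level_errors are hypothetical names chosen for illustration, and the real iterator may differ in detail.

```rust
// Toy stand-in for a CST node. The real implementation works on
// Tree-sitter nodes via a TreeCursor; this type is an assumption.
#[derive(Debug)]
struct ToyNode {
    kind: &'static str,
    is_error: bool, // stands in for ERROR / MISSING nodes
    children: Vec<ToyNode>,
}

/// Depth-first, pre-order walk that does not descend into any node
/// for which `retract` returns true.
fn walk_dfs_with_retract<'a>(
    root: &'a ToyNode,
    retract: impl Fn(&ToyNode) -> bool,
) -> Vec<&'a ToyNode> {
    let mut visited = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        visited.push(node);
        if retract(node) {
            continue; // skip the whole subtree below this node
        }
        // Push children in reverse so the leftmost child is visited first.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    visited
}

/// Collects the kinds of top-level error nodes only; errors nested
/// below another error node are never visited.
fn top_level_errors(root: &ToyNode) -> Vec<&'static str> {
    walk_dfs_with_retract(root, |n| n.is_error)
        .into_iter()
        .filter(|n| n.is_error)
        .map(|n| n.kind)
        .collect()
}
```

With a tree in which one ERROR node contains a second, nested ERROR node, top_level_errors yields a single entry, whereas a walk whose retract always returns false would visit both.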
When a node is identified as an ERROR node or contains missing content, the syntax error is classified and a specific Diagnostic is generated and pushed to a Vec<Diagnostic>. Each Diagnostic contains the Range of the error (start and end positions in the source code) and a tailored error message.
In some cases, tree-sitter fails to insert an explicit MISSING node. To handle this, missing tokens are additionally detected by checking whether the source range of a CST node has zero length.
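A minimal sketch of the zero-length check, assuming byte offsets such as those returned by Tree-sitter's Node::start_byte() and Node::end_byte(). Span and looks_missing are hypothetical names, and the actual implementation may compare positions differently.

```rust
/// Hypothetical, simplified view of a CST node's source range as byte offsets.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start_byte: usize,
    end_byte: usize,
}

/// Treats a node as a missing token if the parser flagged it as MISSING
/// (cf. Node::is_missing()) or if it covers no source text at all.
fn looks_missing(flagged_missing: bool, span: Span) -> bool {
    flagged_missing || span.start_byte == span.end_byte
}
```

The second condition is what catches nodes that the parser produced without marking them as MISSING.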
Missing Token Diagnostics
fn classify_missing_token(node: Node) -> Diagnostic
Generates diagnostics for missing tokens.
To improve error reporting, the function pattern-matches on the parent node of the missing token and produces a more context-aware error message. Source ranges are calculated using Tree-sitter’s Node.start_position() and Node.end_position() APIs.
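The parent-based matching could look roughly like the following sketch. It is not the project's code: Point and ToyDiagnostic are simplified stand-ins for the real position and Diagnostic types, the function takes node kinds as plain strings instead of a Tree-sitter Node, and the node kind names used in the match are assumptions.

```rust
/// Simplified stand-in for a row/column source position.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    row: usize,
    column: usize,
}

/// Simplified stand-in for the real Diagnostic type (range and message only).
#[derive(Debug, PartialEq)]
struct ToyDiagnostic {
    start: Point,
    end: Point,
    message: String,
}

/// Chooses a message from the kind of the parent node and the kind of
/// the missing token. The kind names are hypothetical.
fn classify_missing_token_sketch(
    parent_kind: &str,
    missing_kind: &str,
    start: Point,
    end: Point,
) -> ToyDiagnostic {
    let message = match (parent_kind, missing_kind) {
        // Context-aware case: an operand is missing inside a comparison.
        ("comparison", "expression") => {
            "Missing right operand in 'comparison' expression".to_string()
        }
        // Fallback: name the missing token itself.
        (_, kind) => format!("Missing '{}'", kind),
    };
    ToyDiagnostic { start, end, message }
}
```

Because a missing token occupies no source text, start and end are normally the same position.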
Unexpected Token and Malformed Top Level Lines Diagnostics
fn classify_unexpected_token_error(node: Node, source_code: &str) -> Diagnostic
This function is called when an ERROR node is encountered in the CST.
Unexpected tokens occur when tree-sitter fails to recognise input according to the grammar rule it is currently applying. These tokens may either be valid grammar elements that are unexpected in the current context, or completely foreign symbols.
When no rule can be applied to an entire line, which often happens when unexpected tokens appear at the start of the line, the line is considered malformed.
The different cases are distinguished based on the position of the ERROR node in the CST:
- Parent is the root node (program):
  - Starting column 0: the entire line is malformed.
  - Has a previous sibling: unexpected tokens appear at the end of a line, meaning a grammar rule was successfully applied but extra tokens remain.
- Parent is another token node: unexpected tokens occur inside a construct, indicating that the rule was only partially matched.
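The case distinction above can be sketched as a small decision function. UnexpectedKind, ErrorPosition and classify_position are hypothetical names; the sketch assumes that the column check takes priority over the sibling check, and it returns None for a root-level ERROR node that matches neither case, since that combination is not described here.

```rust
/// The three situations distinguished for an ERROR node.
#[derive(Debug, PartialEq)]
enum UnexpectedKind {
    MalformedLine,
    TrailingTokens,
    InsideConstruct,
}

/// Hypothetical summary of where an ERROR node sits in the CST.
struct ErrorPosition<'a> {
    parent_kind: &'a str,
    start_column: usize,
    has_prev_sibling: bool,
}

fn classify_position(pos: &ErrorPosition) -> Option<UnexpectedKind> {
    // Parent is not the root: the error sits inside a construct.
    if pos.parent_kind != "program" {
        return Some(UnexpectedKind::InsideConstruct);
    }
    if pos.start_column == 0 {
        // The error starts at the beginning of a line: the line is malformed.
        Some(UnexpectedKind::MalformedLine)
    } else if pos.has_prev_sibling {
        // A rule was applied before the error: extra tokens at the end of a line.
        Some(UnexpectedKind::TrailingTokens)
    } else {
        // Not covered by the cases described above.
        None
    }
}
```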
How To Test
cargo test -p conjure-cp-essence-parser --test malformed_top_level
cargo test -p conjure-cp-essence-parser --test missing_token
cargo test -p conjure-cp-essence-parser --test unexpected_token
Examples
Example: Missing Right-Hand Operand
Input
find x: int
such that 5 =
Diagnostic
Range: (1:13) - (1:13)
Severity: Error
Message: Missing right operand in 'comparison' expression
Source: syntactic-error-detector
Example: Missing Parenthesis and Expression/Domain
Input
find x: int(1..3
letting x be
Diagnostics
Diagnostic 1:
Range: (0:16) - (0:16)
Severity: Error
Message: Missing ')'
Source: syntactic-error-detector
Diagnostic 2:
Range: (1:12) - (1:12)
Severity: Error
Message: Missing 'expression or domain'
Source: syntactic-error-detector
Example: Unexpected token inside a construct
Input
find x: int(1..3)
such that x -> %9
Diagnostic
Range: (1:15) - (1:16)
Severity: Error
Message: Unexpected '%' inside 'implication'
Source: syntactic-error-detector
Example: Unexpected tokens at the end of a line
Input
find x, a, b: int(1..3)+x
Diagnostic
Range: (0:23) - (0:25)
Severity: Error
Message: Unexpected '+x' at the end of 'find'
Source: syntactic-error-detector
Example: Malformed line
Input
find x: int(1..3)
find x
Diagnostic
Range: (1:0) - (1:6)
Severity: Error
Message: Malformed line 2: 'find x'
Source: syntactic-error-detector