Syntactic Error Detection

Overview

When Tree-sitter parses Essence code that contains syntactic errors, it produces an invalid Concrete Syntax Tree (CST). An invalid CST may contain:

  • ERROR nodes
    Inserted when the parser fails to interpret a portion of the code according to the grammar.

  • MISSING nodes
    Inserted when the parser expects a token that is not present in the source.

These errors are detected by traversing the CST, classified into specific subtypes, and reported through the Diagnostics API (see the Diagnostics API documentation).

Implementation

pub fn detect_syntactic_errors(source: &str) -> Vec<Diagnostic>

Tree-sitter parses the Essence source code and generates a CST. If parsing fails entirely, a corresponding Diagnostic is returned immediately. Otherwise, the CST is traversed with a WalkDFS::with_retract iterator, which combines Tree-sitter's TreeCursor with an optional retract.

Tree-sitter nests errors in the CST: when a node is marked as erroneous, its descendants may also be marked as errors, which can lead to duplicate diagnostics. Enabling retract avoids this by skipping the children of ERROR and MISSING nodes, so only top-level errors are collected and the reported diagnostics stay clear.
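To illustrate what retract changes, here is a minimal sketch of a depth-first walk with and without skipping the descendants of erroneous nodes. It does not use Tree-sitter: ToyNode and collect_errors are hypothetical stand-ins, and an explicit stack replaces the TreeCursor that WalkDFS actually wraps.

```rust
/// Hypothetical stand-in for a Tree-sitter node, keeping only the fields
/// needed to illustrate the traversal (this is not the tree_sitter::Node API).
#[derive(Debug)]
pub struct ToyNode {
    pub kind: &'static str,
    pub is_error: bool,
    pub is_missing: bool,
    pub children: Vec<ToyNode>,
}

impl ToyNode {
    /// Builds a node; a node whose kind is "ERROR" is flagged as an error.
    pub fn new(kind: &'static str, children: Vec<ToyNode>) -> Self {
        ToyNode { kind, is_error: kind == "ERROR", is_missing: false, children }
    }
}

/// Depth-first walk collecting ERROR and MISSING nodes. With `retract`
/// enabled, the walk does not descend into an erroneous node, so nested
/// errors below it are not reported a second time.
pub fn collect_errors<'a>(root: &'a ToyNode, retract: bool) -> Vec<&'a ToyNode> {
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.is_error || node.is_missing {
            found.push(node);
            if retract {
                continue; // skip this node's descendants
            }
        }
        // Push children in reverse so they are visited left to right.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    found
}
```

For a program containing one ERROR node with a nested ERROR child, the walk reports one error with retract and two without it.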

When a node is identified as an ERROR node or contains missing content, the syntax error is classified and a specific Diagnostic is generated and pushed to a Vec<Diagnostic>. Each Diagnostic contains the Range of the error (start and end positions in the source code) and a tailored error message.
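The shape of a Diagnostic can be sketched from the fields shown in the examples at the end of this page (Range, Severity, Message, Source). The types below are hypothetical and only loosely follow the Language Server Protocol layout; the real Diagnostics API may define them differently.

```rust
use std::fmt;

/// Zero-based position, printed as "(line:character)" like the examples.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position { pub line: u32, pub character: u32 }

/// Start and end positions of an error in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range { pub start: Position, pub end: Position }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Error }

#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub range: Range,
    pub severity: Severity,
    pub message: String,
    pub source: String,
}

impl fmt::Display for Diagnostic {
    /// Renders the diagnostic in the same layout as the documented examples.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Range { start, end } = self.range;
        writeln!(f, "Range: ({}:{}) - ({}:{})", start.line, start.character, end.line, end.character)?;
        writeln!(f, "Severity: {:?}", self.severity)?;
        writeln!(f, "Message: {}", self.message)?;
        write!(f, "Source: {}", self.source)
    }
}

/// Hypothetical helper: builds an error diagnostic and pushes it to the list.
pub fn push_error(diagnostics: &mut Vec<Diagnostic>, range: Range, message: impl Into<String>) {
    diagnostics.push(Diagnostic {
        range,
        severity: Severity::Error,
        message: message.into(),
        source: "syntactic-error-detector".to_string(),
    });
}
```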

In some cases, Tree-sitter fails to insert an explicit MISSING node. To handle this, missing tokens are additionally detected by checking whether the source range of a CST node has zero length.
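A minimal sketch of the combined check, assuming byte offsets as the measure of a node's source range. Span and looks_missing are hypothetical; with the real tree_sitter::Node one would compare node.start_byte() with node.end_byte() and consult node.is_missing().

```rust
/// Hypothetical byte span of a CST node.
pub struct Span { pub start_byte: usize, pub end_byte: usize }

/// Treats a node as a missing token if Tree-sitter flagged it explicitly,
/// or if it covers no source text at all (zero-length range).
pub fn looks_missing(span: &Span, explicitly_missing: bool) -> bool {
    explicitly_missing || span.start_byte == span.end_byte
}
```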

Missing Token Diagnostics

fn classify_missing_token(node: Node) -> Diagnostic

Generates diagnostics for missing tokens.

To improve error reporting, the function pattern-matches on the parent node of the missing token and produces a more context-aware error message. Source ranges are calculated using Tree-sitter’s Node.start_position() and Node.end_position() APIs.
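The pattern match on the parent can be sketched as below. The parent kinds and the field argument ("right" for the right-hand side of a binary expression) are hypothetical; they are chosen only so that the sketch reproduces the messages from the examples on this page, and the real grammar's node and field names may differ.

```rust
/// Builds a context-aware message for a missing token.
/// `missing` names the expected token or rule, `parent_kind` is the grammar
/// kind of the parent node, and `field` is the (hypothetical) field the
/// missing node would have filled in its parent.
pub fn missing_token_message(missing: &str, parent_kind: &str, field: Option<&str>) -> String {
    match (parent_kind, field) {
        // A binary expression with no right-hand side, e.g. `such that 5 =`.
        (kind, Some("right")) => format!("Missing right operand in '{}' expression", kind),
        // Fallback: name the expected token or rule directly, e.g. `Missing ')'`.
        _ => format!("Missing '{}'", missing),
    }
}
```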

Unexpected Token and Malformed Top-Level Line Diagnostics

fn classify_unexpected_token_error(node: Node, source_code: &str) -> Diagnostic

This function is called when an ERROR node is encountered in the CST.

Unexpected tokens occur when Tree-sitter fails to recognise input according to the grammar rule it is currently applying. These tokens may be either valid grammar elements that are unexpected in the current context, or completely foreign symbols.

When no rule can be applied to an entire line (often because unexpected tokens appear at the start of the line), the line is considered malformed.

The different cases are distinguished based on the position of the ERROR node in the CST:

  • Parent is the root node (program):

    • Starts at column 0: the entire line is malformed.
    • Has a previous sibling: unexpected tokens appear at the end of a line, meaning a grammar rule was successfully applied but extra tokens remain.
  • Parent is any other node: unexpected tokens occur inside a construct, indicating that the rule was only partially matched.
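The decision above can be sketched as a small classifier. UnexpectedCase, classify_error_position and unexpected_message are hypothetical names, and the construct names passed in would come from the parent or previous-sibling node kinds, whose exact spelling in the grammar is an assumption here.

```rust
/// Hypothetical classification of an ERROR node by its position in the CST.
#[derive(Debug, PartialEq, Eq)]
pub enum UnexpectedCase {
    MalformedLine,
    TrailingTokens { construct: String },
    InsideConstruct { construct: String },
    Unclassified,
}

/// Follows the rules listed above, checking column 0 before the previous sibling.
pub fn classify_error_position(parent_kind: &str, start_column: usize, prev_sibling_kind: Option<&str>) -> UnexpectedCase {
    if parent_kind == "program" {
        if start_column == 0 {
            UnexpectedCase::MalformedLine
        } else if let Some(prev) = prev_sibling_kind {
            UnexpectedCase::TrailingTokens { construct: prev.to_string() }
        } else {
            UnexpectedCase::Unclassified
        }
    } else {
        UnexpectedCase::InsideConstruct { construct: parent_kind.to_string() }
    }
}

/// Message for the token-level cases. Malformed lines quote the whole line
/// rather than the offending tokens, so they are formatted separately (None).
pub fn unexpected_message(case: &UnexpectedCase, offending: &str) -> Option<String> {
    match case {
        UnexpectedCase::TrailingTokens { construct } => Some(format!("Unexpected '{}' at the end of '{}'", offending, construct)),
        UnexpectedCase::InsideConstruct { construct } => Some(format!("Unexpected '{}' inside '{}'", offending, construct)),
        UnexpectedCase::MalformedLine | UnexpectedCase::Unclassified => None,
    }
}
```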

How To Test

cargo test -p conjure-cp-essence-parser --test malformed_top_level
cargo test -p conjure-cp-essence-parser --test missing_token
cargo test -p conjure-cp-essence-parser --test unexpected_token

Examples

Example: Missing Right-Hand Operand

Input

find x: int
such that 5 =

Diagnostic

Range: (1:13) - (1:13)
Severity: Error
Message: Missing right operand in 'comparison' expression
Source: syntactic-error-detector
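The ranges in these examples use zero-based line and column numbers: the second line, such that 5 =, is 13 characters long, so the missing operand is reported at (1:13). A minimal sketch of converting a byte offset into such a pair, assuming columns are counted in bytes (as in Tree-sitter's Point); point_at is a hypothetical helper.

```rust
/// Converts a byte offset into a zero-based (row, column) pair, with the
/// column counted in bytes from the start of the line.
/// Panics if `byte` is not on a UTF-8 character boundary.
pub fn point_at(source: &str, byte: usize) -> (usize, usize) {
    let before = &source[..byte];
    let row = before.matches('\n').count();
    // The current line starts just after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    (row, byte - line_start)
}
```

For the input above, point_at(src, src.len()) gives (1, 13), matching the reported range.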

Example: Missing Parenthesis and Expression/Domain

Input

find x: int(1..3
letting x be

Diagnostics

Diagnostic 1:
Range: (0:16) - (0:16)
Severity: Error
Message: Missing ')'
Source: syntactic-error-detector

Diagnostic 2:
Range: (1:12) - (1:12)
Severity: Error
Message: Missing 'expression or domain'
Source: syntactic-error-detector

Example: Unexpected token inside a construct

Input

find x: int(1..3)
such that x -> %9

Diagnostic

Range: (1:15) - (1:16)
Severity: Error
Message: Unexpected '%' inside 'implication'
Source: syntactic-error-detector

Example: Unexpected tokens at the end of a line

Input

find x, a, b: int(1..3)+x

Diagnostic

Range: (0:23) - (0:25)
Severity: Error
Message: Unexpected '+x' at the end of 'find'
Source: syntactic-error-detector

Example: Malformed line

Input

find x: int(1..3)
find x

Diagnostic

Range: (1:0) - (1:6)
Severity: Error
Message: Malformed line 2: 'find x'
Source: syntactic-error-detector
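Note that the message numbers lines from 1 while the Range uses zero-based rows. A minimal sketch of building this message from the source and the ERROR node's row; malformed_line_message is a hypothetical helper, and trimming trailing whitespace is an assumption.

```rust
/// Quotes the malformed line. `row` is the zero-based row of the ERROR node;
/// the line number in the message is one-based. Returns None if the row
/// does not exist in the source.
pub fn malformed_line_message(source: &str, row: usize) -> Option<String> {
    let line = source.lines().nth(row)?;
    Some(format!("Malformed line {}: '{}'", row + 1, line.trim_end()))
}
```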