ANTLR4 DM string lexer rules
- A string starts and ends with double quotes, e.g.
"hello world" evaluates to hello world
- A backslash acts as an escape character and can escape the closing quote.
- A newline in the string is ignored if the line ends with a backslash, i.e. in
"hello\
world"
the backslash and the line break are dropped from the result.
- If the string opens and closes with the sequences {" and "} respectively, newlines are allowed and are included in the final string. The sequence \\\n (a backslash followed by a newline) is still ignored.
- The string can contain embedded expressions inside square brackets, which are formatted into the result. A backslash can escape the opening bracket, e.g.
"hello [ "world" ] \[" evaluates to
hello world [ at run-time. Any expression can go in the brackets (calls, math, etc.).
- If the opening quote or curly brace is prefixed with @, escape sequences and embedded expressions are disabled for the string, e.g.
@"hello [worl\d" evaluates to the literal text hello [worl\d, and likewise when the string is written with @{" and "}.
I am trying to construct ANTLR4 .g4 lexer rules to tokenize these strings. I figure there's 4 (or more) token types I'd need:
- Normal string, e.g.
"hello world"
- String start before an embedded expression, e.g.
"hello world [
- String end after an embedded expression, e.g.
] hello world"
- String in between two embedded expressions, e.g.
] hello world [
Here are my (incomplete and unsuccessful) attempts:
LSTRING: '"' ('\\[' | ~[[\r\n])* '[';
RSTRING: ']' ('\\"' | ~["\r\n])* '"';
CSTRING: ']' ('\\[' | ~[[\r\n])* '[';
FSTRING: '"' ('\\"' | ~["\r\n])* '"';
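One direction that may fit the rules listed above is ANTLR4 lexer modes instead of four monolithic tokens. With single-token rules like the ones above, a ] in ordinary code (for example a list index) followed later by a " would also match RSTRING, and a nested string inside an embedded expression cannot be tracked. The following is a rough, untested sketch, not a complete DM lexer. The grammar name, all token names, and the placeholder ID and WS rules are hypothetical, and it assumes raw (@) strings end at the first closing delimiter.

```antlr
lexer grammar DMStringSketch;   // hypothetical name

// Raw strings: escapes and embedded expressions are disabled.
// Assumption: they end at the first closing delimiter.
RAW_STRING    : '@"' ~["\r\n]* '"' ;
RAW_ML_STRING : '@{"' .*? '"}' ;

// Opening a string enters a mode that only recognizes string content.
ML_OPEN : '{"' -> pushMode(IN_ML_STRING) ;
DQ_OPEN : '"'  -> pushMode(IN_STRING) ;

// Brackets in ordinary code push and pop as well, so the ']' that closes
// an embedded expression returns to the enclosing string mode.
// Caveat: an unmatched ']' pops an empty mode stack and raises an error.
LBRACK : '[' -> pushMode(DEFAULT_MODE) ;
RBRACK : ']' -> popMode ;

// Placeholders standing in for the rest of the DM token set.
ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : [ \t\r\n]+ -> skip ;

mode IN_STRING;
// Backslash followed by a newline is dropped from the string.
LINE_JOIN : '\\' '\r'? '\n' -> skip ;
// Escape pairs (including \" and \[) and plain characters.
// A bare newline matches nothing here, so it is reported as an error.
TEXT      : ( '\\' ~[\r\n] | ~["\\[\r\n] )+ ;
EXPR_OPEN : '[' -> pushMode(DEFAULT_MODE) ;
DQ_CLOSE  : '"' -> popMode ;

mode IN_ML_STRING;
ML_LINE_JOIN : '\\' '\r'? '\n' -> skip ;
// Newlines are ordinary content in the {" ... "} form.
ML_TEXT      : ( '\\' ~[\r\n] | ~["\\[] )+ ;
ML_EXPR_OPEN : '[' -> pushMode(DEFAULT_MODE) ;
// '"}' is the longer match, so it wins over a lone quote.
ML_CLOSE     : '"}' -> popMode ;
ML_QUOTE     : '"' -> type(ML_TEXT) ;
```

Under this sketch the parser would assemble a string from its pieces, for instance DQ_OPEN followed by any mix of TEXT and EXPR_OPEN expression RBRACK, then DQ_CLOSE. For "hello [ "world" ] \[" the mode stack lets the inner "world" be lexed as its own string before the ] returns to the outer one.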
If this can't be solved in the lexer, I can write the parser rules on my own from simpler tokens such as the bare " character. But I figured I'd give this a shot, since it would be more performant.