This module contains a range-based compile-time _lexer generator.
Examples
- A _lexer for D is available here.
- A _lexer for Lua is available here.
- A _lexer for JSON is available here.
- defaultTokenFunction
- A function that serves as the default token lexing function. For most
languages this will be the identifier lexing function.
- tokenSeparatingFunction
- A function that determines whether an identifier/keyword has come to an
end. This function must return bool and take a single size_t argument
representing the number of bytes to skip over before looking for a
separating character. A sketch of this function and of
defaultTokenFunction appears after this list.
- staticTokens
- A listing of the tokens whose exact value never changes and which cannot
possibly be a token handled by the default token lexing function. The
most common example of this kind of token is an operator such as
"*" or "-" in a programming language.
- dynamicTokens
- A listing of tokens whose value is variable, such as whitespace,
identifiers, number literals, and string literals.
- possibleDefaultTokens
- A listing of tokens that could possibly be one of the tokens handled by
the default token handling function. A common example of this is
a keyword such as "for", which looks like the beginning of
the identifier "fortunate". tokenSeparatingFunction is
called to determine if the character after the 'r' separates
the identifier, indicating that the token is "for", or if
lexing should be turned over to the defaultTokenFunction.
- tokenHandlers
- A mapping of prefixes to custom token handling function names. The
generated _lexer will search for the even-indexed elements of this array,
and then call the function whose name is the element immediately after the
matched even-indexed element. This is used for lexing complex tokens whose
prefix is fixed.
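The exact interface these two hooks must satisfy is defined by the lexer
generator itself; the following is only a minimal sketch of their typical
shape, assuming they are implemented as members of a struct holding the
input bytes and the current position. The struct name, the Token type, and
the field names used here are illustrative assumptions, not part of this
module's API.

import std.ascii : isAlphaNum;

// Hypothetical token type used only by this sketch.
struct Token
{
    string type;
    const(ubyte)[] text;
}

struct CalculatorLexerSketch
{
    ubyte[] bytes;  // input being lexed
    size_t index;   // current position within bytes

    // tokenSeparatingFunction: returns true when the byte found `offset`
    // bytes past the current position ends an identifier/keyword.
    bool isSeparating(size_t offset) pure nothrow @nogc @safe
    {
        if (index + offset >= bytes.length)
            return true;                        // end of input is separating
        immutable c = bytes[index + offset];
        return !(isAlphaNum(c) || c == '_');    // any non-identifier byte separates
    }

    // defaultTokenFunction: lexes an identifier starting at the current
    // position; for most languages the default handler looks much like this.
    Token lexIdentifier() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length
                && (isAlphaNum(bytes[index]) || bytes[index] == '_'))
            index++;
        return Token("identifier", bytes[start .. index]);
    }
}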
Here are some example constants for a simple calculator _lexer:
// There are a near infinite number of valid number literals, so numbers are
// dynamic tokens.
enum string[] dynamicTokens = ["numberLiteral", "whitespace"];
// The operators are always the same, and cannot start a numberLiteral, so
// they are staticTokens
enum string[] staticTokens = ["-", "+", "*", "/"];
// In this simple example there are no keywords or other tokens that could
// look like dynamic tokens, so this is blank.
enum string[] possibleDefaultTokens = [];
// If any whitespace character or digit is encountered, pass lexing over to
// our custom handler functions. These will be demonstrated in an example
// later on.
enum string[] tokenHandlers = [
    "0", "lexNumber",
    "1", "lexNumber",
    "2", "lexNumber",
    "3", "lexNumber",
    "4", "lexNumber",
    "5", "lexNumber",
    "6", "lexNumber",
    "7", "lexNumber",
    "8", "lexNumber",
    "9", "lexNumber",
    " ", "lexWhitespace",
    "\n", "lexWhitespace",
    "\t", "lexWhitespace",
    "\r", "lexWhitespace"
];
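The comments above refer to custom handler functions named lexNumber and
lexWhitespace. As a rough illustration of what such handlers might do, here
is a minimal sketch; the Token type, the surrounding struct, and the exact
signatures the generated lexer expects are assumptions made for this sketch
only.

import std.ascii : isDigit, isWhite;

// Hypothetical token type, mirroring the one in the earlier sketch.
struct Token
{
    string type;
    const(ubyte)[] text;
}

struct CalculatorHandlersSketch
{
    ubyte[] bytes;  // input being lexed
    size_t index;   // current position within bytes

    // Called when the current byte is a digit '0' through '9'.
    Token lexNumber() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length
                && (isDigit(bytes[index]) || bytes[index] == '.'))
            index++;
        return Token("numberLiteral", bytes[start .. index]);
    }

    // Called when the current byte is a space, tab, newline, or carriage
    // return.
    Token lexWhitespace() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length && isWhite(bytes[index]))
            index++;
        return Token("whitespace", bytes[start .. index]);
    }
}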