This module contains a range-based compile-time _lexer generator.
Examples
- A _lexer for D is available here.
- A _lexer for Lua is available here.
- A _lexer for JSON is available here.
- defaultTokenFunction
- A function that serves as the default token lexing function. For most
languages this will be the identifier lexing function.
- tokenSeparatingFunction
- A function that determines whether an identifier/keyword has come to an
end. This function must return bool and take a single size_t argument
representing the number of bytes to skip over before looking for a
separating character. A sketch of this function and of
defaultTokenFunction appears after this list.
- staticTokens
- A listing of the tokens whose exact value never changes and which cannot
possibly be a token handled by the default token lexing function. The
most common example of this kind of token is an operator such as
"*" or "-" in a programming language.
- dynamicTokens
- A listing of tokens whose value is variable, such as whitespace,
identifiers, number literals, and string literals.
- possibleDefaultTokens
- A listing of tokens that could possibly be one of the tokens handled by
the default token handling function. A common example of this is
a keyword such as "for", which looks like the beginning of
the identifier "fortunate". tokenSeparatingFunction is
called to determine if the character after the 'r' separates
the identifier, indicating that the token is "for", or if
lexing should be turned over to the defaultTokenFunction.
- tokenHandlers
- A mapping of prefixes to custom token handling function names. The
generated _lexer will search for the even-indexed elements of this array,
and then call the function whose name is the element immediately after the
matched even-indexed element. This is used for lexing complex tokens whose
prefix is fixed.
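The exact interface these two hooks must satisfy is defined by the lexer
generator itself; the following is only a minimal sketch of their typical
shape, assuming they are implemented as members of a struct holding the
input bytes and the current position. The struct name, the Token type, and
the field names used here are illustrative assumptions, not part of this
module's API.

import std.ascii : isAlphaNum;

// Hypothetical token type used only by this sketch.
struct Token
{
    string type;
    const(ubyte)[] text;
}

struct CalculatorLexerSketch
{
    ubyte[] bytes;  // input being lexed
    size_t index;   // current position within bytes

    // tokenSeparatingFunction: returns true when the byte found `offset`
    // bytes past the current position ends an identifier/keyword.
    bool isSeparating(size_t offset) pure nothrow @nogc @safe
    {
        if (index + offset >= bytes.length)
            return true;                        // end of input is separating
        immutable c = bytes[index + offset];
        return !(isAlphaNum(c) || c == '_');    // any non-identifier byte separates
    }

    // defaultTokenFunction: lexes an identifier starting at the current
    // position; for most languages the default handler looks much like this.
    Token lexIdentifier() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length
                && (isAlphaNum(bytes[index]) || bytes[index] == '_'))
            index++;
        return Token("identifier", bytes[start .. index]);
    }
}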
Here are some example constants for a simple calculator _lexer:
// There are a near infinite number of valid number literals, so numbers are
// dynamic tokens.
enum string[] dynamicTokens = ["numberLiteral", "whitespace"];
// The operators are always the same, and cannot start a numberLiteral, so
// they are staticTokens
enum string[] staticTokens = ["-", "+", "*", "/"];
// In this simple example there are no keywords or other tokens that could
// look like dynamic tokens, so this is blank.
enum string[] possibleDefaultTokens = [];
// If any whitespace character or digit is encountered, pass lexing over to
// our custom handler functions. These will be demonstrated in an example
// later on.
enum string[] tokenHandlers = [
    "0", "lexNumber",
    "1", "lexNumber",
    "2", "lexNumber",
    "3", "lexNumber",
    "4", "lexNumber",
    "5", "lexNumber",
    "6", "lexNumber",
    "7", "lexNumber",
    "8", "lexNumber",
    "9", "lexNumber",
    " ", "lexWhitespace",
    "\n", "lexWhitespace",
    "\t", "lexWhitespace",
    "\r", "lexWhitespace"
];
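The comments above refer to custom handler functions named lexNumber and
lexWhitespace. As a rough illustration of what such handlers might do, here
is a minimal sketch; the Token type, the surrounding struct, and the exact
signatures the generated lexer expects are assumptions made for this sketch
only.

import std.ascii : isDigit, isWhite;

// Hypothetical token type, mirroring the one in the earlier sketch.
struct Token
{
    string type;
    const(ubyte)[] text;
}

struct CalculatorHandlersSketch
{
    ubyte[] bytes;  // input being lexed
    size_t index;   // current position within bytes

    // Called when the current byte is a digit '0' through '9'.
    Token lexNumber() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length
                && (isDigit(bytes[index]) || bytes[index] == '.'))
            index++;
        return Token("numberLiteral", bytes[start .. index]);
    }

    // Called when the current byte is a space, tab, newline, or carriage
    // return.
    Token lexWhitespace() pure nothrow @safe
    {
        immutable start = index;
        while (index < bytes.length && isWhite(bytes[index]))
            index++;
        return Token("whitespace", bytes[start .. index]);
    }
}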