I want to build a small lexer and parser myself. The lexer should produce a vector of tokens that I later feed into the parser. Now I am wondering what belongs in which stage.
Let's look at this input:
xy = 1.23
My token stream could be one of the following - or a mixture of both:
1. letter letter whitespace eqsign whitespace digit dot digit digit
2. identifier eqsign decimal
To process the input further, I need (2), of course. But to what extent should the lexer stage do this job? I could also imagine two consecutive lexer stages, in which Lexer1 produces (1) from a String and Lexer2 produces (2) from a List<Lexer1Token>.
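To make the two-stage idea concrete, here is a minimal Python sketch of what I have in mind. The names `lexer1`, `lexer2`, and `Token` are made up for illustration, and the rules (an identifier is a letter followed by letters or digits, a decimal is digits with an optional fractional part) are my own assumptions, not a fixed grammar.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical token representation


def lexer1(source: str) -> List[Token]:
    """Stage 1: classify every single character, producing stream (1)."""
    tokens: List[Token] = []
    for ch in source:
        if ch.isalpha():
            kind = "letter"
        elif ch.isdigit():
            kind = "digit"
        elif ch.isspace():
            kind = "whitespace"
        elif ch == "=":
            kind = "eqsign"
        elif ch == ".":
            kind = "dot"
        else:
            raise ValueError(f"unexpected character {ch!r}")
        tokens.append((kind, ch))
    return tokens


def run_end(chars: List[Token], start: int, kinds: Set[str]) -> int:
    """Return the index just past the run of tokens whose kind is in `kinds`."""
    end = start
    while end < len(chars) and chars[end][0] in kinds:
        end += 1
    return end


def text_of(chars: List[Token]) -> str:
    """Concatenate the text of a slice of character tokens."""
    return "".join(text for _, text in chars)


def lexer2(chars: List[Token]) -> List[Token]:
    """Stage 2: group character tokens into stream (2), dropping whitespace."""
    out: List[Token] = []
    i = 0
    while i < len(chars):
        kind, text = chars[i]
        if kind == "whitespace":
            i += 1
        elif kind == "letter":
            j = run_end(chars, i, {"letter", "digit"})
            out.append(("identifier", text_of(chars[i:j])))
            i = j
        elif kind == "digit":
            j = run_end(chars, i, {"digit"})
            # Optional fractional part: a dot followed by at least one digit.
            if j + 1 < len(chars) and chars[j][0] == "dot" and chars[j + 1][0] == "digit":
                j = run_end(chars, j + 1, {"digit"})
            out.append(("decimal", text_of(chars[i:j])))
            i = j
        elif kind == "eqsign":
            out.append(("eqsign", text))
            i += 1
        else:
            raise ValueError(f"unexpected {kind} token at position {i}")
    return out


stage1 = lexer1("xy = 1.23")
stage2 = lexer2(stage1)
print(" ".join(kind for kind, _ in stage1))
print(" ".join(kind for kind, _ in stage2))
```

In this sketch Lexer1 is little more than character classification, so Lexer2 ends up doing all the real grouping work. That is part of why I am unsure whether the split is worth it.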
Similarly, for <b>test</b> in HTML, the tokens might be
1. lt string gt string lt slash string gt
2. opentag[type=b] string closingtag[type=b]
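The same two-stage split for the HTML case could look roughly like this in Python. Again, `html_lexer1` and `html_lexer2` are hypothetical names, and the sketch assumes tags without attributes and text that contains no `<`, `>`, or `/`.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical token representation


def html_lexer1(source: str) -> List[Token]:
    """Stage 1: split into punctuation tokens and runs of other characters."""
    kinds = {"<": "lt", ">": "gt", "/": "slash"}
    pieces = re.findall(r"[<>/]|[^<>/]+", source)
    return [(kinds.get(piece, "string"), piece) for piece in pieces]


def html_lexer2(tokens: List[Token]) -> List[Token]:
    """Stage 2: recognise whole tags; the token text holds the tag type."""
    kinds = [kind for kind, _ in tokens]
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if kinds[i:i + 3] == ["lt", "string", "gt"]:
            out.append(("opentag", tokens[i + 1][1]))
            i += 3
        elif kinds[i:i + 4] == ["lt", "slash", "string", "gt"]:
            out.append(("closingtag", tokens[i + 2][1]))
            i += 4
        elif kinds[i] == "string":
            out.append(("string", tokens[i][1]))
            i += 1
        else:
            raise ValueError(f"unexpected {kinds[i]} token at position {i}")
    return out


stage1 = html_lexer1("<b>test</b>")
stage2 = html_lexer2(stage1)
print(" ".join(kind for kind, _ in stage1))
print(stage2)
```

Here the second stage already matches fixed sequences such as lt string gt, which starts to look like parsing rather than lexing, so the boundary between the stages is exactly what I am unsure about.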