
I want to build a little lexer and parser by myself. I want the lexer to produce a vector of tokens that I feed into the parser later. Now I am thinking about what belongs in which stage.

Let's look at this input:

xy = 1.23

My token stream could be one of the following - or a mixture of both:

  1. letter letter whitespace eqsign whitespace digit dot digit digit
  2. identifier eqsign decimal
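For comparison, here is a minimal sketch of a single-stage lexer that goes straight from the string to version (2), using Python's standard `re` module. The token names `identifier`, `eqsign` and `decimal` come from the question; the regular expressions, the extra `integer` rule and the decision to drop whitespace are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # e.g. "identifier", "eqsign", "decimal"
    text: str  # the matched source text


# Assumed token rules. Order matters: "decimal" is tried before "integer",
# so "1.23" is not split into "1", ".", "23".
TOKEN_SPEC = [
    ("decimal", r"\d+\.\d+"),
    ("integer", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("eqsign", r"="),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> List[Token]:
    """Turn a string directly into version (2) tokens, dropping whitespace."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched
        if match.lastgroup != "whitespace":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print([t.kind for t in lex("xy = 1.23")])  # ['identifier', 'eqsign', 'decimal']
```

Whitespace is still recognized here, because it separates tokens, but it never reaches the parser.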

To process the input further, I of course need (2). But how much of that work should the lexer stage do? I could also imagine two consecutive lexer stages, in which Lexer1 produces (1) from a String and Lexer2 produces (2) from a List<Lexer1Token>.
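The two-stage idea can be sketched roughly as follows, assuming Python. `lexer1` classifies single characters using the names from version (1), and `lexer2` groups those character tokens into version (2). The function names, the tuple representation of tokens and the grouping rules (identifiers may contain digits after the first letter, a dot must be followed by a digit) are hypothetical choices for this sketch.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


def char_kind(ch: str) -> str:
    """Lexer1 classification of a single character (kinds from version (1))."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "whitespace"
    if ch == "=":
        return "eqsign"
    if ch == ".":
        return "dot"
    raise SyntaxError(f"unexpected character {ch!r}")


def lexer1(source: str) -> List[Token]:
    """Stage 1: String -> one token per character (version 1)."""
    return [(char_kind(ch), ch) for ch in source]


def lexer2(chars: List[Token]) -> List[Token]:
    """Stage 2: List<Lexer1Token> -> grouped tokens (version 2)."""
    out: List[Token] = []
    i = 0
    while i < len(chars):
        kind, text = chars[i]
        if kind == "whitespace":
            i += 1  # whitespace only separates tokens; drop it
        elif kind == "eqsign":
            out.append(("eqsign", text))
            i += 1
        elif kind == "letter":
            j = i
            while j < len(chars) and chars[j][0] in ("letter", "digit"):
                j += 1
            out.append(("identifier", "".join(t for _, t in chars[i:j])))
            i = j
        elif kind == "digit":
            j = i
            while j < len(chars) and chars[j][0] == "digit":
                j += 1
            # Assumed rule: a dot followed by a digit continues the number.
            if j + 1 < len(chars) and chars[j][0] == "dot" and chars[j + 1][0] == "digit":
                j += 1
                while j < len(chars) and chars[j][0] == "digit":
                    j += 1
            # In this sketch a plain integer such as "42" is also emitted as "decimal".
            out.append(("decimal", "".join(t for _, t in chars[i:j])))
            i = j
        else:
            raise SyntaxError(f"unexpected {kind} {text!r}")
    return out


print(lexer2(lexer1("xy = 1.23")))  # [('identifier', 'xy'), ('eqsign', '='), ('decimal', '1.23')]
```

This works, but `lexer2` ends up re-scanning the input one character at a time anyway, so the intermediate list from `lexer1` adds little beyond a second pass.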

Similarly, for <b>test</b> in HTML, the tokens might be:

  1. lt string gt string lt slash string gt
  2. opentag[type=b] string closingtag[type=b]
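A rough Python sketch of a lexer that emits the HTML tokens of version (2) directly. It assumes a heavily simplified HTML with no attributes, comments, self-closing tags or entities; the `HtmlToken` type and the `lex_html` name are hypothetical. Real HTML tokenization is far more involved.

```python
import re
from typing import List, NamedTuple


class HtmlToken(NamedTuple):
    kind: str   # "opentag", "closingtag" or "string"
    value: str  # tag name for tags, raw text for strings


# Simplified tag syntax: "<name>" or "</name>" only.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def lex_html(source: str) -> List[HtmlToken]:
    tokens: List[HtmlToken] = []
    pos = 0
    for m in TAG.finditer(source):
        # Any text between two tags becomes a single "string" token
        # (a stray "<" that does not form a tag stays inside that text).
        if m.start() > pos:
            tokens.append(HtmlToken("string", source[pos:m.start()]))
        kind = "closingtag" if m.group(1) else "opentag"
        tokens.append(HtmlToken(kind, m.group(2)))
        pos = m.end()
    if pos < len(source):
        tokens.append(HtmlToken("string", source[pos:]))
    return tokens


print(lex_html("<b>test</b>"))
```

The lexer only recognizes tags; whether every `opentag` has a matching `closingtag` is left to the parser.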

Answer


Obviously it depends on your language (e.g. your language might need special handling of .``.), but in most cases you just need version 2: [identifier, equal, decimal] (I would call the equal token "assign").

Let the lexer do as much as possible without getting into the domain of the parser (e.g. deciding whether the tokens appear in a valid order).
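To illustrate that split, here is a small Python sketch of the parser side. It assumes tokens arrive as `(kind, text)` pairs with the names from the question, and `parse_assignment` is a hypothetical name. Both token lists below could come out of a lexer without complaint; only the parser notices that the second one is in the wrong order.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text), as a lexer would emit them


def parse_assignment(tokens: List[Token]) -> Tuple[str, float]:
    """Parser stage: accept exactly `identifier eqsign decimal`.

    Checking the order of tokens is the parser's job; the lexer
    does not reject input just because tokens are in the wrong order.
    """
    expected = ["identifier", "eqsign", "decimal"]
    kinds = [kind for kind, _ in tokens]
    if kinds != expected:
        raise SyntaxError(f"expected {expected}, got {kinds}")
    name, _, value = (text for _, text in tokens)
    return name, float(value)


# Both lists are lexically valid; only the first is grammatically valid.
good = [("identifier", "xy"), ("eqsign", "="), ("decimal", "1.23")]
bad = [("eqsign", "="), ("identifier", "xy"), ("decimal", "1.23")]
print(parse_assignment(good))  # ('xy', 1.23)
```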
