Granularity of tokens for lexer

Question

I want to build a little lexer and parser by myself. I want the lexer to produce a vector of tokens that I feed into the parser later. Now I am thinking about what belongs in which stage.

Let's look at this input:

xy = 1.23

My token stream could be one of the following - or a mixture of both:

1. letter letter whitespace eqsign whitespace digit dot digit digit
2. identifier eqsign decimal

To process the input further, I need (2), of course. But to what extent should the lexer stage do this job? I could also imagine two consecutive lexer stages, in which Lexer1 produces (1) from a String and Lexer2 produces (2) from a List<Lexer1Token>.

Similarly, for <b>test</b> in HTML, the tokens might be

1. lt string gt string lt slash string gt
2. opentag[type=b] string closingtag[type=b]

OscarRyz · Accepted Answer · 2024-11-21 18:30:13Z


Obviously it depends on your language (e.g. your language might need special handling of "."), but in most cases you just need version 2: [identifier, equal, decimal] (I would call the second token assign rather than equal).

Let the lexer do as much as possible without getting into the domain of the parser (e.g. deciding whether the order of the tokens is valid).
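
To illustrate, here is a minimal sketch of a single-stage lexer that goes straight from the string to version 2. It is only one possible way to do it, not a reference implementation. The token names (IDENTIFIER, ASSIGN, DECIMAL, INTEGER), the Token type and the tokenize function are assumptions made up for this example. Note that it only classifies characters into tokens and drops whitespace; it never checks whether the order of the tokens makes sense, which stays the parser's job.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical token category, e.g. "IDENTIFIER"
    text: str  # the exact characters that were matched


# Hypothetical token set for inputs like "xy = 1.23".
# Order matters: DECIMAL is listed before INTEGER so "1.23" is not split at the dot.
TOKEN_SPEC = [
    ("DECIMAL", r"\d+\.\d+"),
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("SKIP", r"\s+"),  # whitespace is consumed but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Token]:
    """Turn the source string into coarse tokens; no ordering checks are done here."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("xy = 1.23"):
        print(token.kind, token.text)
```

Something like "= = xy" would still lex without complaint under this sketch; rejecting it is left to the parser.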

