851 questions
0
votes
1
answer
37
views
How do I use the "built-in" yyerror() function from flex/bison?
I created my own. It works fine. However, I want to use the default yyerror() function provided by flex/bison. I am using GNU C on Windows 11.
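For reference, Bison's generated parser only calls yyerror(); it does not emit a definition for it. A "default" one traditionally comes from linking the yacc library (-ly); otherwise you supply a one-liner yourself. A minimal sketch of such a definition (the message format is arbitrary):
#include <stdio.h>
/* Minimal error reporter: the generated parser calls yyerror("syntax error");
   the definition itself must come from user code (or from -ly). */
void yyerror(const char *s)
{
    fprintf(stderr, "parse error: %s\n", s);
}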
0
votes
0
answers
29
views
How to match keywords as standalone words in Flex without capturing substrings?
I'm writing a flex lexer to recognize specific keywords only when they appear as standalone words (not as part of larger words). For example, if I'm looking for the keyword "in", it should ...
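A minimal sketch of the usual flex answer to the question above: list the keyword before a general identifier rule. Flex always takes the longest match, so "inside" is consumed entirely by the identifier rule, while a standalone "in" matches both rules at equal length and the earlier one (the keyword) wins. The printf output below is a placeholder and only the rules section is shown.
%%
"in"                     { printf("KEYWORD %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*   { printf("IDENT   %s\n", yytext); }
[ \t\n]+                 { /* skip whitespace */ }
.                        { /* anything else */ }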
1
vote
1
answer
56
views
How to pass a file pointer by reference for writing in a flex lexer?
I need help with a flex program. The program must analyze a string entered by the user and determine whether its parts are constants, identifiers, or numerals, or whether there are no errors; these results need to be ...
0
votes
0
answers
24
views
From here I was able to fetch only the text with the face. The styles we have here are not coming through
I am using it with Next.js (TSX).
Github Link
{
_id: ObjectId('66d88efc48f48a786e4a683b'),
title: 'Name of Organization: Dhaka Ice Cream Industries Limited. (Polar Ice Cream)',
slug: 'Position: ...
0
votes
0
answers
36
views
Azure AI Search - singular vs plural search terms
I have a search index in Azure AI Search using the newest version of the API (2024-07-01).
I have added searchable text fields with analyzer and synonymMaps. I am using the "en.microsoft" ...
0
votes
0
answers
13
views
How to produce output in C4B in SableCC grammar file
I translated all the tokens into Russian.
Before that, I set the path in the sablecc cmd to the Java path jdk-20. The programs on which I want to perform lexical analysis are also in Russian.
But, when I want ...
0
votes
1
answer
84
views
identify a comment between /* and */ that may contain */ inside of it only if it's inside ""
I am working on a lexical analyser but I am struggling with how to identify comments as described below:
A comment is a sequence of characters (even space) between /* and */; it may not contain the sub-...
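One common way to handle the question above in flex is with exclusive start conditions: enter a COMMENT state at the comment opener, and switch to a nested string state at a double quote so that a comment terminator inside quotes does not end the comment. A rough sketch only; escape sequences inside the quoted part and unterminated comments are ignored for brevity.
%x COMMENT CSTR
%%
"/*"            { BEGIN(COMMENT); }
<COMMENT>"*/"   { BEGIN(INITIAL); /* comment closed */ }
<COMMENT>"\""   { BEGIN(CSTR); }
<COMMENT>.|\n   { /* comment body */ }
<CSTR>"\""      { BEGIN(COMMENT); }
<CSTR>.|\n      { /* a star-slash in here does not close the comment */ }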
0
votes
1
answer
60
views
Error regarding my Python ANTLR parser code
I am trying to parse the contents of my test case files into output files where the token types and the production rules are displayed.
My test case is this:
[* operator test *] = != > < <= =&...
0
votes
0
answers
66
views
Syntactic or Lexical Error in writing an interpreter
For a school project I have to write an interpreter in C for a programming language similar to Swift. I'm encountering an issue in my code, and I'm not sure if it's a syntactic or lexical error. In the ...
0
votes
1
answer
72
views
Parse an expression with non-whitespace-separated operators
I am currently working on a compiler. Recently, I stumbled upon an issue concerning the parsing of operators in an expression. Obviously I have not found this to be an issue in other languages, which ...
0
votes
0
answers
25
views
C Lexical Analyzer Bug: Pointer Value Reset [duplicate]
Description of the Problem
I am working on implementing a lexical analyzer in C, and I have encountered an issue with a pointer variable (pos) that seems to be behaving unexpectedly. The problem ...
1
vote
1
answer
299
views
Creating a static unordered_set from keys of a static unordered_map
I'm writing a front-end for a compiler and currently working on implementing punctuator scanning functionality. I've got a Punctuator class that I'd like to use to represent punctuators from the input ...
0
votes
1
answer
48
views
ANTLR 4 lexer rule to skip combination of backslash and newline?
WS : [ \t]+ -> skip ; // skip spaces, tabs
This works well for ignoring white space by preventing those characters from reaching the parser. I want to do the same thing with the character pair of '/' and ...
0
votes
2
answers
206
views
Regex to remove Key/Value in a JSON Object
I have a JSON like below:
{"queueNumber": "123","field":"name",UserId":[12,12,34],"cur":[{"objectName":"test","...
2
votes
1
answer
307
views
How does tokenization relate to formalism, lexical grammar, and regular languages?
I am reading Bob Nystrom's Crafting Interpreters, and in chapter 5 it says this
In the last chapter, the formalism we used for defining the lexical grammar—
the rules for how characters get grouped ...
1
vote
0
answers
25
views
Issue with identifying invalid lexemes in C initialization statement
I have been working on a code snippet to identify invalid lexemes in a C initialization statement. The code is intended to check for invalid data types, invalid identifiers, and invalid constant ...
0
votes
1
answer
85
views
How do I search for and print invalid lexemes and assign an appropriate error message to each?
I'm trying to create a kind of lexical analyzer that will recognize either an int or a float input and then split it into lexemes and tokens.
I'm using regex to find matching patterns and detect if ...
0
votes
1
answer
31
views
Typing /0 (not \0) into a lex program causes unexpected behavior
I made a simple lex program for school, and for some reason it has unexpected behavior while reading "/0".
The program is as follows:
%%
[a-z] printf("char %c", yytext[0]-32);
...
1
vote
3
answers
351
views
Is there a relatively simple way to find all exported names from JavaScript text
So let's say we have some JavaScript ES module as text
const ESMText = "export const answer = 42;"
I checked the ECMAScript documents, and the logic for export is quite complicated; is there kinda ...
0
votes
1
answer
29
views
lex file not executing
I was trying to do my assignment, which is: "Write a program using YACC specifications to implement the syntax analysis phase of a compiler to validate the type and syntax of variable declarations in C ...
1
vote
1
answer
85
views
Regex in Java for semicolon and comma as delimiters that are also kept as tokens in the string
I have written a lexical analyzer program in Java. I am splitting the input string on whitespace or a semicolon, but what I really wish to do is treat a semicolon or a comma as a separate token ...
2
votes
1
answer
72
views
How can the syntactic analyzer ignore white space in the input?
In the code down below, although I added \t as a token with a priority higher than the digits, when I test it it still accepts -2- 2 (with a single whitespace after the -) and -2 - 2 (with 2 ...
0
votes
1
answer
135
views
Why does my lexical analyser behave as if there are no line-endings? [closed]
I have coded a Java lexical analyzer down below
Token.java looks like this
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public enum Token {
TK_MINUS ("-"),
...
0
votes
0
answers
88
views
I need the solution for this problem. I cannot understand what to do
Write a C/C++ program using the following instructions (filename: ID.c/ ID.cpp):
a. Read a file named “program.cpp” (given below).
b. After reading the file you have to identify and count the unique ...
0
votes
1
answer
347
views
(Building a lexer in Racket) How to identify & tokenize a line number that comes after a "gosub" statement
I am new to Racket and am building a lexer using the parser-tools/lex module and want to be able to tokenize a number that comes after a 'gosub' statement as a line number token. I am having trouble ...
0
votes
1
answer
121
views
How could I read through a text file in Python and recognize certain tokens/words?
I tried to build a program that would open and read through each line of a .txt file, then recognize and label its components.
I was able to get the program to open the text file, and recognize some ...
-2
votes
1
answer
202
views
Lexical Analyzer in Java: operators like '++' or '>=' shouldn't be tokenized as individual characters, and any unlisted tokens shouldn't print out anything
I am using a Lexical analyzer to tokenize some operators, conditions, and syntaxes. My approach is checking each and every character and when it finds a space between characters, it tokenizes the ...
0
votes
1
answer
148
views
Remove Blank Lines While Printing An Output
I have this Java lexical analyzer algorithm that prints out each assigned token to every symbol. The output should print out on each and every individual line without any space in between. But mine ...
0
votes
0
answers
267
views
YACC program to accept strings that start with 01 and end with 10
%{
int yylex(void);
#include<stdio.h>
#include <stdlib.h>
int yyerror(char* s);
%}
%token ZERO ONE NL
%%
st : ZERO s ZERO NL { printf("Sequence Accepted\n");}
;
s : ONE r
;
...
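For comparison, one conflict-free way to write such a grammar is to transcribe the DFA for "starts with 01 and ends with 10" directly into right-recursive rules. This is only a sketch under the same assumptions as the excerpt above: the lexer returns ZERO, ONE and NL for '0', '1' and newline, a single line is parsed, and yylex, yyerror and main are provided elsewhere.
%{
#include <stdio.h>
int yylex(void);
int yyerror(char* s);
%}
%token ZERO ONE NL
%%
start : ZERO ONE p1            { printf("Sequence Accepted\n"); }
      ;
/* p1: the last symbol consumed was 1 */
p1    : ZERO p10
      | ONE  p1
      ;
/* p10: the last two symbols were 1 0; accept if the line ends here */
p10   : NL
      | ZERO p0
      | ONE  p1
      ;
/* p0: the last symbol was 0, not preceded by 1 */
p0    : ZERO p0
      | ONE  p1
      ;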
0
votes
1
answer
114
views
Dutch (or German) compound words in search functions (in PHP)
I have been having an issue for a while now with a search function that I'm building for a cooking blog.
In Dutch (similar to German), one can add as many compound words together to create a ...
0
votes
1
answer
103
views
Errors in definitions in Flex and Lex
I am writing a lexical analyser for a toy programming language with toy keywords. I wish to print "keyword" for every keyword the analyser bumps into. To make my code cleaner, I defined the ...
0
votes
1
answer
105
views
Accessing States of Flex during lexical analysis
So, I have created a .l file for my lexical analysis. I have defined one pattern as follows
<str,chars,comment,text_block,text_block_chars><<EOF>> {printf("%d", ...
1
vote
1
answer
326
views
Binary and unary minus operator in Lexical Analyzer
So, I am doing a lexical analysis of a TOY programming language using flex. I am currently stuck at the following point.
Minus sign: As we know the minus sign can have two meanings by defining them as ...
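A common resolution for the question above is to have the lexer always return the same '-' token and let the parser decide between binary and unary use through precedence declarations, roughly as in the classic Bison/yacc calculator example. NUMBER and the rule set below are placeholders:
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS                 /* precedence-only pseudo-token */
%%
expr : expr '+' expr
     | expr '-' expr          /* binary minus */
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS  /* unary minus binds tighter than the binary one */
     | '(' expr ')'
     | NUMBER
     ;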
0
votes
0
answers
129
views
Python - Parsing a Lexical Grammar without a Parser
I've got a bit of a chicken-egg problem for you. I'm wondering what the "correct" way — the "academically substantiated" way of solving it is.
I'm building a lexer generator to ...
0
votes
0
answers
32
views
Implementation of lexical analysis program in TEST language
This is the program I have:
#include <stdio.h>
#include <string.h>
int main()
{
char Scanin[300],Scanout[300];
extern FILE *fin,*fout;
printf("Please enter the source ...
0
votes
1
answer
179
views
Flex/Lex: Input string that does not contain duplicate letters [A-Z]
Having read similar questions around the topic such as Flex/Lex: Regular Expression matches double characters and Is there a difference between [ABCDEFGH] and [A-H] on flex? I can't figure out a way ...
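A regular expression that excludes repeated letters exists in principle but is combinatorially huge, so the practical flex route for the question above is to match [A-Z]+ and check for duplicates in the action. A rough sketch; the ACCEPT/REJECT output is a placeholder:
%%
[A-Z]+      {
                int seen[26] = {0}, dup = 0, i;
                for (i = 0; yytext[i]; i++)
                    if (seen[yytext[i] - 'A']++) dup = 1;
                printf("%s: %s\n", dup ? "REJECT" : "ACCEPT", yytext);
            }
.|\n        { /* ignore anything else */ }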
1
vote
1
answer
370
views
pyparsing infix notation with non-grouping parentheses
I'm working on a billing application where users can enter arbitrary mathematical expressions. For example, a line item might be defined as (a + b) * c.
pyparsing handles the order of operations well ...
1
vote
0
answers
61
views
Lexical analyser to recognise numbers
I'm trying to make a small lexical analyser that recognizes numbers as described by the following regular expression:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include ...
0
votes
1
answer
188
views
Lexical analyzer unable to recognize reserved words
I'm trying to develop a lexical analyzer which should tokenize and identify operators, identifiers, constants, reserved words, and data types in a code from an external text file, but the problem is I ...
0
votes
2
answers
61
views
How to run a Python file from another Python file?
I'm trying to run another Python file within this lexical analyzer, but it's not working. Can anyone help me?
I'm using this code:
https://github.com/huzaifamaw/Lexical_Analyzer-Parser_Implemented-in-...
-2
votes
2
answers
151
views
(Python - C++) How to split C++ code while writing a lexical analyzer in Python?
I wrote a lexical analyzer for C++ code in Python, but the problem is that when I use input.split(" ") it won't recognize code like x=2 or function() as three different tokens unless I add an ...
0
votes
1
answer
453
views
Azure Cognitive Search - When would you use different search and index analyzers?
I'm trying to understand what is the purpose of configuring a different analyzer for searching and indexing in Azure Search. See: https://learn.microsoft.com/en-us/rest/api/searchservice/create-index#-...
1
vote
1
answer
80
views
Seeking the right token filters for my requirements and getting desperate
I'm indexing documents which contain normal text, programming code and other non-linguistic fragments. For reasons which aren't particularly relevant I am trying to tokenise the content into ...
0
votes
0
answers
183
views
Lexical Analyzer not printing all tokens and lexemes
It seems that when there is an operator or parenthesis it won't print what comes last. For example, when taking "(47+12);" it will output:
Next token is: LEFT_PAREN Next lexeme is (
Next ...
0
votes
1
answer
45
views
Terminal is inputting random stuff
I'm making a lexical analyzer in C that should work the following way:
Input : 2 + 3
Output:
Token text (Operand): 2
Token text: +
Token lexical Category: ADD
Token text (...
-1
votes
1
answer
97
views
Avoiding overlap with similar regex patterns during tokenization
Background
I've made a couple simple compilers before, but I've never properly addressed this issue:
Say I have a token LT which matches the expression < and a token LTEQ which matches <=.
A ...
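The standard answer to the overlap question above is maximal munch: always try the longer pattern first (lexer generators such as flex do this automatically through longest-match). A small hand-rolled C sketch of the idea; the token names and main() driver are hypothetical:
#include <stdio.h>

enum { TOK_LT, TOK_LTEQ };            /* hypothetical token kinds */

/* Try the two-character operator before the one-character one,
   so "<=" is never split into "<" followed by "=". */
static int next_op(const char **ps)
{
    const char *s = *ps;
    if (s[0] == '<' && s[1] == '=') { *ps += 2; return TOK_LTEQ; }
    if (s[0] == '<')                { *ps += 1; return TOK_LT; }
    return -1;                        /* not an operator handled here */
}

int main(void)
{
    const char *src = "<=<";
    int t;
    while ((t = next_op(&src)) != -1)
        printf("%s\n", t == TOK_LTEQ ? "LTEQ" : "LT");
    return 0;                          /* prints LTEQ then LT */
}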
0
votes
2
answers
44
views
How can I make the code recognize a string of several digits?
I have the code of a lexical analyzer, but I need it to recognize a full string, because it only recognizes one digit. This means: if the needed number is 111 it will process it as 1, 1, 1, doing ...
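The usual fix for the question above is to keep consuming characters while they are digits and emit one token for the whole run. A small C sketch of that loop; the input string and token labels are placeholders:
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    const char *src = "111+42", *p = src;
    while (*p) {
        if (isdigit((unsigned char)*p)) {
            const char *start = p;
            while (isdigit((unsigned char)*p))      /* consume the full digit run */
                p++;
            printf("NUMBER: %.*s\n", (int)(p - start), start);
        } else {
            printf("OTHER : %c\n", *p++);
        }
    }
    return 0;    /* prints NUMBER: 111, OTHER: +, NUMBER: 42 */
}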
-1
votes
2
answers
125
views
How can I scan a file with no delimiter between tokens in Java?
I have text input which looks like this:
!10#6#4!3#4
I have two patterns for the two types of data found in the input above:
Pattern totalPattern = Pattern.compile("![0-9]+");
Pattern ...
0
votes
0
answers
57
views
Optimized detection of which regex in a list matches an input
Suppose I have a list of regexes, like so:
File ".*" not found
Out of memory
You do not have permission to perform this action
Network connection to [^ ]* failed
Unrecognized command: .*
......
1
vote
3
answers
896
views
Lexical Analyzer in C
This is my code here; I need it to output the lexical analysis. I show what the output should be at the bottom. I don't understand why my code is giving me this error.
/* front.c - a lexical ...