0 votes
1 answer
22 views

PHP Match two Hebrew words with nekudots as identical

I need a bit of help in PHP. I have two Hebrew words which are perfectly the same from the point of view of lexical meaning, but they do not match in bit wise comparison. 1. version: הִפַּלְנוּ 2. ...
xerostomus
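The second spelling is cut off in this excerpt, so it is unclear whether the two words differ only in the order of their points or in the points themselves. The Python sketch below (Python is used for every example added to this listing; in PHP, the intl extension's `Normalizer::normalize` performs the same normalization step) covers both cases. Canonical normalization makes the order of combining marks irrelevant, and dropping category-Mn characters ignores the niqqud altogether. The function names are illustrative only.

```python
import unicodedata


def strip_marks(word: str) -> str:
    """Drop all non-spacing marks (category Mn), which include Hebrew niqqud."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_pointed_word(a: str, b: str) -> bool:
    """True if a and b differ at most in the order of their combining marks."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_letters(a: str, b: str) -> bool:
    """True if a and b have the same base letters, ignoring niqqud entirely."""
    return strip_marks(a) == strip_marks(b)
```

For example, pe followed by patah then dagesh, and pe followed by dagesh then patah, compare unequal byte for byte but equal after normalization.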
0 votes
1 answer
53 views

Handling strings with high Unicode codepoints (above U+FFFF)

In Kotlin, how can I iterate over a string that contains Unicode characters above U+FFFF? Example code: val s = "Hëllø! € 😀" for (c in s) { println("$c ${c.code}") } Actual ...
Peter Kleiweg
2 votes
1 answer
85 views

How to convert unicode black pawn emoji to black pawn text character?

I'm making chess in Python 3.12 using purely text for a challenge. The IDE I'm using is Visual Studio 2022. All the other unicode characters, including the white pawn, render as their text character ...
Nugget Gacha Guy
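One plausible explanation, offered as an assumption about the rendering environment: U+265F BLACK CHESS PAWN also has an emoji presentation, so some consoles and fonts draw it as a color emoji while the other chess symbols stay as text. Appending VARIATION SELECTOR-15 (U+FE0E) asks the renderer for the text glyph. Whether the Visual Studio console honors it depends on the console and font. A minimal Python sketch, with hypothetical helper names:

```python
BLACK_PAWN = "\u265f"   # U+265F BLACK CHESS PAWN
TEXT_STYLE = "\ufe0e"   # U+FE0E VARIATION SELECTOR-15: prefer text presentation


def as_text_symbol(symbol: str) -> str:
    """Append VS15 so renderers that support it draw the monochrome text glyph."""
    return symbol + TEXT_STYLE


# Use this string wherever the board printer currently emits the bare black pawn.
black_pawn_text = as_text_symbol(BLACK_PAWN)
```

The selector is an invisible, default-ignorable character, so a renderer that does not support it should simply show the pawn as before.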
0 votes
1 answer
82 views

Will PHP mb_strlen($str, 'utf8') ever return a greater result than JavaScript .length?

I'm working on an Angular 17 reactive form where I send the form data to a PHP API on the server and store it in a database. I would like the user to be able to input emojis to the form so I have set ...
Sarah
  • 1,993
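For well-formed strings, the two counts measure different units. `mb_strlen($str, 'utf8')` counts code points, while JavaScript's `.length` counts UTF-16 code units. Every code point takes one or two code units (two for anything above U+FFFF, such as most emoji), so the PHP count should be at most the JavaScript count. Invalid byte sequences are not covered by this reasoning. A small Python sketch of the two measurements:

```python
def code_points(s: str) -> int:
    """What PHP's mb_strlen($s, 'UTF-8') counts for valid input: code points."""
    return len(s)


def utf16_units(s: str) -> int:
    """What JavaScript's s.length counts: UTF-16 code units (2 bytes each)."""
    return len(s.encode("utf-16-le")) // 2
```

For "Hëllø! € 😀", these give 10 and 11. Only the emoji needs a surrogate pair, so it is the only character that counts as 2 in JavaScript.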
3 votes
2 answers
120 views

How to process data internally so that it becomes equivalent to what it would be when outputted to terminal

I have this string: "birthday_balloons.\u202egpj" If I execute print("birthday_balloons.\u202egpj") it outputs birthday_balloons.jpg Note how the last three characters are ...
Ethan
  • 41
0 votes
1 answer
192 views

Is it safe to convert emails with other characters than a-Z to lower case?

Any modern email service provider treats email addresses as case-insensitive, meaning that in my application I should allow users to log in using both [email protected] and [email protected]. In terms of the ...
Oscar R
  • 558
-1 votes
3 answers
80 views

Convert mixed Unicode Numbers/Values/Strings (The whole Line contains also string) to a String [duplicate]

I have a line "Example: \u09b8\u09be\u09b0\u09cd\u09ac\u09bf\u09df\u09be\u09df \u0993\u09df\u09be\u09b0\u09cd\u0995\u09aa\u09be\u09b0\u09ae\u09bf\u099f \u09ad\u09bf\u09b8\u09be," What I want ...
Noor Hossain
  • 1,819
0 votes
1 answer
133 views

Convert raw string (having escape characters) to unicode/utf8 string [duplicate]

In Python 3, how to convert an ASCII raw-string (that includes escape characters) into a proper unicode string? As an example: a = "ä" # note the umlaut b = bytearray(...
IronPillow2
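The excerpt is cut off, so this assumes the string contains literal backslash sequences such as `\u00e4` or `\u09b8`. Python's built-in `unicode_escape` codec can interpret them. The minimal sketch below assumes the input is ASCII-only. The codec reads its input bytes as Latin-1, so non-ASCII characters already present would be mangled; encoding to ASCII first makes that case fail loudly instead. `decode_escapes` is a hypothetical helper name.

```python
def decode_escapes(raw: str) -> str:
    """Turn literal escapes like '\\u09b8' into the characters they name.

    Assumes `raw` is pure ASCII; .encode("ascii") raises otherwise.
    """
    return raw.encode("ascii").decode("unicode_escape")
```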
1 vote
0 answers
92 views

Do surrogate pairs in JSON really have two interpretations?

The official JSON standard (ECMA-404, 2e, 2017-Dec) states the following about Unicode surrogate pairs: However, whether a processor of JSON texts interprets such a surrogate pair as a single code ...
Lover of Structure
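When a processor does treat a high/low surrogate pair as one code point, UTF-16 fixes the arithmetic it uses. The question is only whether it must make that choice. Python's `json` module is one processor that combines such pairs. A sketch showing the formula and that behavior side by side (`combine_surrogates` is an illustrative name):

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Python's json decoder merges the escaped pair into one character.
decoded = json.loads('"\\ud83d\\ude00"')
```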
0 votes
0 answers
23 views

PHP - How to decode or parse emoji from unicode string? [duplicate]

In my PHP code, I receive user's emoji as this string "1f49c" from frontend app. It should be as simple as echo "\u{1f49c}";, but my data is only this string: "1f49c" How ...
Taufik Nur Rahmanda
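The frontend sends the code point in hexadecimal, so it only has to be parsed as base 16 and turned into a character. In PHP, that would be `mb_chr(hexdec($code), 'UTF-8')`. The Python sketch below does the same. It also accepts a dash-separated form for multi-code-point emoji, which is an assumption and does not appear in the excerpt.

```python
def emoji_from_hex(code: str, sep: str = "-") -> str:
    """'1f49c' -> U+1F49C; hypothetical 'a-b-c' form -> the joined sequence."""
    return "".join(chr(int(part, 16)) for part in code.split(sep))
```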
1 vote
1 answer
309 views

How to add accents to letters in java

I am trying to combine alphabetical characters with accents in Java. For example: combining the letter "e" (\u0065) with a combining grave accent (\u0300). I have attempted numerous ways in ...
Michael_Swartz
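Putting U+0300 right after "e" already forms a valid, decomposed "è". If a single precomposed character is wanted, NFC normalization composes the pair wherever Unicode defines a precomposed form. Java exposes this as `java.text.Normalizer.normalize(s, Normalizer.Form.NFC)`. The same step in Python (the helper name is illustrative):

```python
import unicodedata


def combine(base: str, mark: str) -> str:
    """Attach a combining mark, composing to one code point when Unicode has one."""
    return unicodedata.normalize("NFC", base + mark)
```

Letters without a precomposed form, such as "q" with a grave accent, stay as two code points. They should still render as one accented letter if the font supports it.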
1 vote
0 answers
64 views

What's the difference between JavaScript's `String.slice()` method and PHP's `mb_substr` function? [duplicate]

I'm seeking to understand the distinctions between JavaScript's String.slice() method and PHP's mb_substr function. As a front-end developer, I encountered a challenging issue that affected our ...
Joseph
  • 4,655
0 votes
0 answers
37 views

WinForms TextBox doesn't reliably display non-spacing Unicode character that displays properly in a CheckedListBox

Currently, my code generates a list of strings from selected options that can contain the non-spacing Unicode "Combining Double Breve Below" and then displays those strings in a textbox. As ...
null
  • 1
0 votes
1 answer
143 views

convert byte array to strings split by NUL character

I am sorry, if this is much of a dumb question. But I can't really figure this out, and I bet it has to be much simpler than I think. I have a byte[] array which contains several Unicode Strings, each ...
totalZero
  • 335
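The excerpt does not name the encoding. Assuming the strings are UTF-16LE (what Windows APIs usually mean by "Unicode"), decode the whole array first and split on the NUL character afterwards. In UTF-16, a zero byte also occurs inside ordinary characters ("A" is `41 00`), so splitting the raw bytes would cut characters in half. A Python sketch, with a hypothetical function name:

```python
from typing import List


def split_nul_terminated(data: bytes, encoding: str = "utf-16-le") -> List[str]:
    """Decode the whole buffer, then split on U+0000 (encoding is an assumption)."""
    text = data.decode(encoding)
    # Dropping empty pieces removes the trailing terminator, but also any
    # genuinely empty strings in the list.
    return [s for s in text.split("\x00") if s]
```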
1 vote
1 answer
143 views

Understanding Font header file used in MCU code

There is a font file in the MCU code that is used to display text on an LED matrix display. I want to understand the contents of the code below: char widths, font data, first char, char count, etc. How the ...
N N
  • 23
1 vote
1 answer
164 views

Unicode (Devanagari) characters not showing up correctly in HTML dropdown and text-field tags

I have an HTML page in which I need to show a dropdown containing Devanagari characters. The problem I am facing is that in the DOM the characters show up in the expected manner, but in the browser it doesn'...
Jignesh Gohel
1 vote
1 answer
45 views

Error compiling a boost qi parser to skip all comments and spaces in a php code string

Following my previous question, and applying the suggestions, I have created this boost qi spirit grammar to get only 'non-comments' from a piece of PHP code in the string contents: #include <boost/...
Santilín
0 votes
3 answers
47 views

Yet another Unicode preg_replace() question

I've read many posts that explain how to deal with Unicode characters, but none of the suggestions are working for me. My php page reads a file that contains strings with high-order characters, e.g., &...
Steve A
  • 1,981
0 votes
1 answer
91 views

Is it possible to modify TStringField class (with new property)

Using Delphi 11.3 and an Oracle DB using UniDac, I have the problem that there are many old apps (written in Cobol) which do not support Unicode, so the data for text fields is stored as Ansi text ...
user1304759
1 vote
1 answer
102 views

How do I get accented letter using substring()?

I have the following string: ąbc. I'd like to get its first character, i.e. ą. When I use the following code, "ąbc".substring(0,1), I get the correct result in jshell. However, when ...
menteith
  • 668
5 votes
1 answer
441 views

Foldcase conversion between (German) lower ß (U+00DF) and upper ẞ (U+1E9E)?

According to Wikipedia, in 2017 using an uppercase ẞ (Unicode U+1E9E) was officially adopted--at least as an option--for what may in fact be a subset of fully-capitalized words in German: In June of ...
jubilatious1
  • 2,289
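Python's string methods follow Unicode's default case mappings, which shows where the asymmetry comes from. Uppercasing ß yields "SS", not ẞ. Lowercasing ẞ yields ß. Full case folding maps both to "ss". A caseless comparison built on folding (the helper name is illustrative):

```python
SHARP_S = "\u00df"          # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S


def caseless_equal(a: str, b: str) -> bool:
    """Compare using full Unicode case folding, which maps ß and ẞ to 'ss'."""
    return a.casefold() == b.casefold()
```

So "STRAẞE", "STRASSE" and "straße" all compare equal under this function, even though no single default case mapping turns ß into ẞ.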
2 votes
1 answer
333 views

Imagemagick annotate with Unicode characters

How can I draw Unicode characters in ImageMagick using -annotate? My source file is UTF-8. This is my code: "C:\Program Files\ImageMagick-7.1.1-Q8\magick.exe" convert -size 800x200 xc:black -...
Joe Jobs
  • 241
1 vote
1 answer
59 views

In python, how do you get the escape sequence for any Unicode character? [duplicate]

In python, what is the escape sequence for the left-double quote Unicode character “ (U+201C)? left_double_quote = "“" print(ord(left_double_quote)) # prints `8220` Note that 201C is ...
Toothpick Anemone
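`ord()` returns the code point in decimal (8220), which is 0x201C in hexadecimal. Formatting it in hex gives the escape sequence. The helper below is a sketch that picks `\u` for characters up to U+FFFF and `\U` with eight digits above that, as Python's literal syntax requires. The built-in `unicode_escape` codec produces the same form for non-Latin-1 characters.

```python
def escape_sequence(ch: str) -> str:
    """Return the Python escape sequence for a single character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04x}"
    return f"\\U{cp:08x}"
```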
0 votes
0 answers
284 views

Unicode glyphs are not combining for Indian languages in ImGui

In the ImGui project, words in Indian languages are read from a JSON file, where they display properly. But when they are displayed in the window, the glyphs do not combine. Even when checked ...
Gagan karanth
0 votes
0 answers
16 views

How to Remove the Unicode Signature while converting a file from CSV to JSON? [duplicate]

I tried converting a CSV file from Kaggle to JSON. This produced a new JSON file, but the first field of each object had the \ufeff Unicode signature. Below is the code I used ...
sahil parvani
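The `\ufeff` is a UTF-8 byte order mark at the start of the CSV file, and it ends up glued to the first column name. Python's `utf-8-sig` codec strips it when decoding. The minimal sketch below works on bytes. For a file, opening it with `open(path, newline="", encoding="utf-8-sig")` should have the same effect. The function name is hypothetical.

```python
import csv
import io
import json


def csv_bytes_to_json(raw: bytes) -> str:
    """Convert CSV bytes to a JSON array of objects, dropping a leading BOM."""
    text = raw.decode("utf-8-sig")  # removes U+FEFF if present, harmless if absent
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, ensure_ascii=False)
```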
0 votes
0 answers
145 views

Disadvantages of using `std::wstring` for Unicode in cross-platform code?

Situation: I have a large existing Win32 C++ codebase, and I want to make it portable so that it compiles and runs on both Windows (MSVC) and Linux (gcc). For a new project I would try to go UTF-8 ...
smls
  • 5,791
0 votes
0 answers
497 views

How do I know if my Stata file needs unicode translate?

I suspect I do not quite grasp the difference between extended ASCII and unicode and what they mean to Stata, therefore I am not able to determine if I need to use unicode translate on my Stata files ...
Tuulip
  • 1
1 vote
1 answer
538 views

How to remove u' from ansible list

In my playbook, I am trying to get list of sub-directory names using the find module and then extracting the basename from path. I have been able to get the list but the elements are prepended with u'...
adbdkb
  • 2,151
0 votes
1 answer
40 views

Why does Java's regex Pattern/Matcher miscount group positions in strings with unicode

I'm trying to use regular expressions and the strings include unicode characters such as '' and ''. The Pattern/Matcher are finding the expression I'm looking for but the Matcher returns the wrong ...
Jeffrey Wiegley
1 vote
2 answers
339 views

Why does trying to print Unicode-encoded strings with cout lead to a compilation error in newer C++ standards?

I tried the following printing of Unicode characters with Visual C++ 2022 Version 17.4.4 with C++ standard set to the latest. #include <iostream> using namespace std; int main() { cout <&...
bobeff
  • 3,729
0 votes
1 answer
1k views

UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>

In this project, I am trying to utilize the pycaret package to analyze some time series with the help of scikit-learn package. Specifically, I have imported some modules as follows: from pycaret....
Rebel
  • 515
0 votes
0 answers
630 views

Some Unicode Characters are not rendered in Emacs

My emacs has a problem in rendering the Unicode character written by \McU (All \Mc* and \Mi* characters) in Agda input mode. (𝒰, 𝒱, 𝒲, ... etc) I installed Iosevka and JuliaMono Light fonts, ...
Learning Student
0 votes
0 answers
421 views

Replacing all special characters but the dot in a string also replaces the dot

I am searching to replace some special characters by an underscore in a given string (in my current case, a badly formatted file path) but I cannot make it work. Here is my current Python 3 code: ...
swiss_knight
  • 7,639
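The poster's code is cut off, so this only illustrates one pattern that keeps the dot: a negated character class listing word characters and a literal `.`. Inside brackets, `.` matches only a period, so it needs no escaping. In Python 3, `\w` is Unicode-aware for `str` patterns, so accented letters survive too. `sanitize` is an illustrative name.

```python
import re


def sanitize(path_part: str) -> str:
    """Replace every character that is neither a word character nor '.' with '_'."""
    return re.sub(r"[^\w.]", "_", path_part)
```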
1 vote
1 answer
629 views

Unicode/Collation Issue in Openrowset SQL Server

My CSV has text like this: Côté fenêtres, carré I'm trying to open this CSV file using openrowset in SQL Server like below:- select * from openrowset(BULK 'C:\Import_Orders\Files\PO.csv', ...
nibblebytes07
0 votes
0 answers
62 views

Why are Unicode characters supported in my programs without any extra libraries?

I have to reproduce client/server communication with Unix signals. I have already done this part; however, there is a bonus where I have to make my program support Unicode, and it does support it, but I don't ...
0 votes
0 answers
82 views

mark part of a plain text

I want to somehow mark a part of a plain text to emphasize that without putting extra characters around that. I figured out that I can use the combining characters and used \u0332, but when I tried to ...
Peter SH
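Appending U+0332 COMBINING LOW LINE after each character is a common way to approximate underlining in plain text. How well the marks line up depends on the font. The marks also become part of the string, so copying, searching, and length calculations all see them. A minimal Python sketch with an illustrative function name:

```python
COMBINING_LOW_LINE = "\u0332"


def underline(text: str) -> str:
    """Append U+0332 after every character so renderers draw a line beneath it."""
    return "".join(ch + COMBINING_LOW_LINE for ch in text)
```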
0 votes
3 answers
134 views

Strange character added when decoding with urllib

I'm trying to parse a query string like this: filename=logo.txt\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x01x&filename=.hidden.txt Since it mixes bytes and text, ...
Iaotle
  • 86
-2 votes
1 answer
823 views

How to convert Unicode codepoint to UTF-8 Hex Bytes?

Given a list of all emojis, I need to convert unicode codepoint to UTF-8 hex bytes programmatically. For example: Take this emoji: https://unicode-table.com/en/1F606/ and convert 1F606 to F0 9F 98 86 ...
Connie Shi
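UTF-8 packs a code point's bits into one to four bytes, each with fixed prefix bits. The hand-written Python encoder below follows those rules and can be checked against the built-in codec. In practice, `chr(cp).encode("utf-8")` does the same in one call. The function names are illustrative.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 by hand."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x80:                  # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                 # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:               # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
    return bytes([                 # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | cp >> 18,
        0x80 | (cp >> 12) & 0x3F,
        0x80 | (cp >> 6) & 0x3F,
        0x80 | cp & 0x3F,
    ])


def utf8_hex(code: str) -> str:
    """'1F606' -> 'F0 9F 98 86'."""
    return " ".join(f"{b:02X}" for b in utf8_bytes(int(code, 16)))
```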
0 votes
1 answer
166 views

How can I print these PHP special characters? [duplicate]

I have this piece of code: <?php $letters='ñéáóúí'; $letters=$letters[1]; echo $letters; ?> I want to print "ñ" or any other letter but I only print � instead, any tips?
Kns
  • 25
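PHP strings are byte arrays, so `$letters[1]` returns a single byte of the UTF-8 encoding. Here that is the second byte of "ñ", which is not valid UTF-8 on its own and displays as �. Character-based access needs `mb_substr($letters, 0, 1, 'UTF-8')`. The same effect reproduced in Python:

```python
letters = "\u00f1\u00e9\u00e1\u00f3\u00fa\u00ed"  # "ñéáóúí"
raw = letters.encode("utf-8")  # the bytes a PHP string holds: c3 b1 c3 a9 ...

# Like PHP's $letters[1]: one byte, here 0xB1, the second half of "ñ".
one_byte = raw[1:2]

# A lone continuation byte is invalid UTF-8; errors="replace" turns it into
# U+FFFD, the "�" that gets displayed.
shown = one_byte.decode("utf-8", errors="replace")

# Indexing by character instead, as mb_substr does in PHP.
first_letter = letters[0]
```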
2 votes
1 answer
628 views

Regex \P{IsHan} does not work well in Java 8

I want to remove all non-Chinese characters in a String, and retain Chinese Characters. Here is an example: input -> 勇𣌀hi你好👋()【】「」{}[]() output -> 勇𣌀你好 First of all, I try to extract all ...
william
  • 101
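One approximation, offered as an assumption and not as an equivalent of `\P{IsHan}`, is to keep characters whose Unicode name marks them as CJK ideographs. Python iterates strings by code point, so supplementary characters like 𣌀 stay intact. The name test does miss some Han-script characters, such as 々 and the CJK radicals. `keep_cjk_ideographs` is an illustrative name.

```python
import unicodedata

CJK_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")


def keep_cjk_ideographs(text: str) -> str:
    """Keep only characters whose Unicode name says they are CJK ideographs."""
    return "".join(
        ch for ch in text
        if unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES)
    )
```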
0 votes
2 answers
225 views

How would I print a multi-line (non-standard) unicode string of text in C++? Updated for Clarity! (hopefully)

Rewriting this question with a bit more knowledge on what I'm requesting; (Thank you James Risner and Turtle for your assistance, but I didn't word this correctly and got different responses than what ...
Carter Marshall
0 votes
1 answer
459 views

Using symfony UnicodeString on laravel

I want to use UnicodeString from Symfony in a Laravel controller: protected function validateWaWebhook($known_token, Request $request) { if (($signature = $request->headers->get('X-Hub-...
Blue Moon
0 votes
2 answers
234 views

C++ unicode strings - the basic_strings know nothing about Unicode?

I see here that the C++ standard library now has typedefs of std::basic_string like u8string and u16string, but I don't see any member functions or algorithms that know much of anything about Unicode. ...
Rob N
  • 16.3k
0 votes
1 answer
903 views

How can I replace characters that give me UnicodeDecodeErrors?

I am extracting the content from different file types into a csv file. I am currently trying to extract from the file type 'm'. That's my extraction function: def extract_m(f): # f is the file ...
law trafalgar
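Passing `errors="replace"` to `open()` or `bytes.decode()` substitutes U+FFFD for undecodable bytes instead of raising. The `.m` files might really be in another encoding, for example cp1252 from older Windows editors (an assumption). In that case, decoding with that encoding keeps the characters instead of replacing them. A sketch with hypothetical helper names:

```python
def decode_lossy(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, replacing anything invalid in `encoding` with U+FFFD."""
    return data.decode(encoding, errors="replace")


def read_text_lossy(path: str, encoding: str = "utf-8") -> str:
    """Read a text file without ever raising UnicodeDecodeError."""
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()
```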
1 vote
1 answer
324 views

Sanitise unicode pair for filename in javascript?

My web-extension fails to initiate file download for filenames having a pair of emojis with invalid filename error, this seems to be some unicode surrogate pair issue when multiple emojis are used. ...
WannabeCoder
1 vote
2 answers
501 views

How to convert the file path input spaces and backslash to unicode for using on a PowerShell command using "subprocess" module?

I have to use: x=input() subprocess.Popen(f'PowerShell -Executionpolicy byPass {x}\n') to open an executable, but it does not allow me to use a path from the input of a variable that contains ...
Berdy Alexei Cadaeib Fecei
1 vote
1 answer
257 views

How to use Perl pack to convert UTF-16 surrogate pairs to UTF-8?

I have input strings which contain text in which some characters are in UTF-16 format and escaped with '\u'. I am trying to, in Perl, convert all the strings to UTF-8. For example, the string 'Alice &...
WingedKnight
1 vote
1 answer
405 views

Filestream Bytes not written properly

Why do I get the wrong output using the following code in RAD Studio 10.1? var sPalette : string; mystream: TfileStream; begin mystream := TfileStream.Create('C:\Data\test.bmp', fmCreate); ...
Ankush
  • 17
1 vote
0 answers
411 views

Arabic: Convert normal Arabic text to Presentation Forms-B

Background From this answer: Arabic is a script with cursive joining; the shape of the letters changes depending on whether they occur initially, medially, or finally within a word. Sometimes you may ...
JudahA
  • 91
0 votes
1 answer
483 views

Is there a way to convert Unicode Hex Character Code &#x61; to a simple "a" so that I can get an alphabetic string?

This character format encoding description can be found in All such encoded characters and character description. The exact specific string I want to convert is- -9<ahref="j&#x61vascript:&...
Shubhangi_sk
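`&#x61;` is an HTML numeric character reference, so an HTML entity decoder handles it. Python's `html.unescape` follows the HTML5 rules, which also accept numeric references without the trailing semicolon, as in the `j&#x61vascript` fragment. A minimal sketch (the wrapper name is illustrative):

```python
import html


def decode_char_refs(s: str) -> str:
    """Replace HTML character references (&#x61;, &#97;, &amp;, ...) with characters."""
    return html.unescape(s)
```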
