659 questions
0
votes
1
answer
22
views
PHP Match two Hebrew words with nekudots as identical
I need a bit of help in PHP.
I have two Hebrew words that are identical in lexical meaning, but they do not match in a bitwise comparison.
1. version: הִפַּלְנוּ
2. ...
0
votes
1
answer
53
views
Handling strings with high Unicode codepoints (above U+FFFF)
In Kotlin, how can I iterate over a string that contains Unicode characters above U+FFFF?
Example code:
val s = "Hëllø! € 😀"
for (c in s) {
println("$c ${c.code}")
}
Actual ...
2
votes
1
answer
85
views
How to convert unicode black pawn emoji to black pawn text character?
I'm making chess in Python 3.12 using purely text for a challenge. The IDE I'm using is Visual Studio 2022. All the other unicode characters, including the white pawn, render as their text character ...
0
votes
1
answer
82
views
Will PHP mb_strlen($str, 'utf8') ever return a greater result than JavaScript .length?
I'm working on an Angular 17 reactive form where I send the form data to a PHP API on the server and store it in a database.
I would like the user to be able to input emojis to the form so I have set ...
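As a rough illustration of the two length notions in this question, the Python sketch below counts code points and UTF-16 code units for the same string. It assumes that `mb_strlen` with a UTF-8 encoding counts code points and that JavaScript's `.length` counts UTF-16 code units; the helper names are hypothetical.

```python
def code_point_count(s: str) -> int:
    """Number of Unicode code points (assumed to match a code-point-based length)."""
    return len(s)


def utf16_unit_count(s: str) -> int:
    """Number of UTF-16 code units (assumed to match JavaScript's .length)."""
    # Each UTF-16 code unit is 2 bytes; characters above U+FFFF take two units.
    return len(s.encode("utf-16-le")) // 2


sample = "hi \U0001F600"  # "hi " followed by one emoji above U+FFFF
print(code_point_count(sample), utf16_unit_count(sample))
```

Under those assumptions the code-point count can never exceed the UTF-16 count, because every code point needs at least one code unit.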
3
votes
2
answers
120
views
How to process data internally so that it becomes equivalent to what it would be when outputted to terminal
I have this string: "birthday_balloons.\u202egpj"
If I execute print("birthday_balloons.\u202egpj") it outputs
birthday_balloons.jpg
Note how the last three characters are ...
0
votes
1
answer
192
views
Is it safe to convert emails with other characters than a-Z to lower case?
Any modern email service provider treats email addresses as case-insensitive, meaning that in my application I should allow users to log in using both [email protected] and [email protected].
In terms of the ...
-1
votes
3
answers
80
views
Convert mixed Unicode Numbers/Values/Strings (The whole Line contains also string) to a String [duplicate]
I have a line "Example: \u09b8\u09be\u09b0\u09cd\u09ac\u09bf\u09df\u09be\u09df \u0993\u09df\u09be\u09b0\u09cd\u0995\u09aa\u09be\u09b0\u09ae\u09bf\u099f \u09ad\u09bf\u09b8\u09be,"
What I want ...
0
votes
1
answer
133
views
Convert raw string (having escape characters) to unicode/utf8 string [duplicate]
In Python 3, how to convert an ASCII raw-string (that includes escape characters) into a proper unicode string?
As an example:
a = "ä" # note the umlaut
b = bytearray(...
1
vote
0
answers
92
views
Do surrogate pairs in JSON really have two interpretations?
The official JSON standard (ECMA-404, 2e, 2017-Dec) states the following about Unicode surrogate pairs:
However, whether a processor of JSON texts interprets such a surrogate pair as a single code ...
0
votes
0
answers
23
views
PHP - How to decode or parse emoji from unicode string? [duplicate]
In my PHP code, I receive user's emoji as this string "1f49c" from frontend app.
It should be as simple as echo "\u{1f49c}";, but my data is only this string: "1f49c"
How ...
1
vote
1
answer
309
views
How to add accents to letters in java
I am trying to combine alphabetical characters with accents in java. For example:
Combining the letter "e" (\u0065) with a combining grave accent (\u0300).
I have attempted numerous ways in ...
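The question is about Java, but the underlying idea can be sketched in Python: placing the combining mark after the base letter gives a two-code-point sequence, and NFC normalization composes it into a single precomposed character when one exists. This is only an illustration of the concept, not the asker's code.

```python
import unicodedata

# Base letter followed by COMBINING GRAVE ACCENT (two code points).
decomposed = "\u0065" + "\u0300"

# NFC composes the pair into U+00E8 (LATIN SMALL LETTER E WITH GRAVE).
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
```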
1
vote
0
answers
64
views
What's the difference between JavaScript's `String.slice()` method and PHP's `mb_substr` function? [duplicate]
I'm seeking to understand the distinctions between JavaScript's String.slice() method and PHP's mb_substr function. As a front-end developer, I encountered a challenging issue that affected our ...
0
votes
0
answers
37
views
WinForms TextBox doesn't reliably display non-spacing Unicode character that displays properly in a CheckedListBox
Currently, my code generates a list of strings from selected options that can contain the non-spacing Unicode "Combining Double Breve Below" and then displays those strings in a textbox. As ...
0
votes
1
answer
143
views
convert byte array to strings split by NUL character
I am sorry if this is a dumb question, but I can't really figure this out, and I bet it has to be much simpler than I think.
I have a byte[] array which contains several Unicode Strings, each ...
1
vote
1
answer
143
views
Understanding Font header file used in MCU code
There is a font file in the MCU code that is used to display text on an LED matrix display.
I want to understand the contents of the code below:
char widths, font data, first char, char count, etc.
How the ...
1
vote
1
answer
164
views
Unicode (Devanagari) characters not showing up correctly in HTML dropdown and text-field tags
I have an HTML page in which I need to show a dropdown containing Devanagari characters.
The problem I am facing is that in the DOM the characters show up in expected manner but in the browser it doesn'...
1
vote
1
answer
45
views
Error compiling a boost qi parser to skip all comments and spaces in a php code string
Following my previous question, and applying the suggestions, I have created this boost qi spirit grammar to get only 'non-comments' from a piece of PHP code in the string contents:
#include <boost/...
0
votes
3
answers
47
views
Yet another Unicode preg_replace() question
I've read many posts that explain how to deal with Unicode characters, but none of the suggestions are working for me.
My PHP page reads a file that contains strings with high-order characters, e.g., &...
0
votes
1
answer
91
views
Is it possible to modify TStringField class (with new property)
Using Delphi 11.3 and an Oracle DB using UniDac, I have the problem that there are many old apps (written in Cobol) which do not support Unicode, so the data for text fields is stored as Ansi text ...
1
vote
1
answer
102
views
How do I get accented letter using substring()?
I have the following string: ąbc. I'd like to get its first character, i.e. ą. When I use the following code
"ąbc".substring(0,1)
I get the correct result using jshell. However, when ...
5
votes
1
answer
441
views
Foldcase conversion between (German) lower ß (U+00DF) and upper ẞ (U+1E9E)?
According to Wikipedia, in 2017 using an uppercase ẞ (Unicode U+1E9E) was officially adopted--at least as an option--for what may in fact be a subset of fully-capitalized words in German:
In June of ...
2
votes
1
answer
333
views
Imagemagick annotate with Unicode characters
How can I draw Unicode characters in Imagemagick using -annotate ?
My source file is UTF-8
This is my code:
"C:\Program Files\ImageMagick-7.1.1-Q8\magick.exe" convert -size 800x200 xc:black -...
1
vote
1
answer
59
views
In Python, how do you get the escape sequence for any Unicode character? [duplicate]
In Python, what is the escape sequence for the left-double quote Unicode character “ (U+201C)?
left_double_quote = "“"
print(ord(left_double_quote)) # prints `8220`
Note that 201C is ...
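One way to derive such an escape sequence, sketched here with a hypothetical helper `unicode_escape`, is to format the code point from `ord()` as hexadecimal: four digits with `\u` for code points up to U+FFFF, eight digits with `\U` above that.

```python
def unicode_escape(ch: str) -> str:
    """Return the Python escape sequence for a single character."""
    cp = ord(ch)
    # \uXXXX covers the Basic Multilingual Plane; \UXXXXXXXX covers the rest.
    return f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"


left_double_quote = "\u201c"
print(ord(left_double_quote))             # decimal value of the code point
print(unicode_escape(left_double_quote))  # escape sequence as text
```

The decimal value printed by `ord()` and the hexadecimal digits in the escape are the same number in different bases.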
0
votes
0
answers
284
views
Unicode glyphs are not combining for Indian languages in ImGui
In the ImGui project, words in Indian languages are read from a JSON file, where they are displayed properly, but when they are displayed in the window the glyphs do not combine.
Even when checked ...
0
votes
0
answers
16
views
How to remove the Unicode signature while converting a file from CSV to JSON? [duplicate]
I tried converting a CSV file from Kaggle to JSON.
This produced a new JSON file, but the first field of each object had the \ufeff Unicode signature.
Below is the code I used ...
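The asker's code is not shown here, so the following is only a sketch of one common approach, assuming the CSV is UTF-8 with a leading byte order mark: decoding with Python's `utf-8-sig` codec drops the signature before the CSV header is parsed. The sample bytes and column names are made up for illustration.

```python
import csv
import io
import json

# Hypothetical CSV content starting with the UTF-8 byte order mark (EF BB BF).
raw = b"\xef\xbb\xbfname,score\r\nAda,36\r\n"

# "utf-8-sig" strips a leading BOM if present; plain "utf-8" would keep it
# as "\ufeff" glued to the first column name.
text = raw.decode("utf-8-sig")

rows = list(csv.DictReader(io.StringIO(text, newline="")))
print(json.dumps(rows))
```

When reading from disk, the same codec can be passed as `open(path, encoding="utf-8-sig", newline="")`.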
0
votes
0
answers
145
views
Disadvantages of using `std::wstring` for Unicode in cross-platform code?
Situation
I have a large existing Win32 C++ code-base, and I want to make it portable so that it compiles and runs on both Windows (MSVC) and Linux (gcc).
For a new project I would try to go UTF-8 ...
0
votes
0
answers
497
views
How do I know if my Stata file needs unicode translate?
I suspect I do not quite grasp the difference between extended ASCII and unicode and what they mean to Stata, therefore I am not able to determine if I need to use unicode translate on my Stata files ...
1
vote
1
answer
538
views
How to remove u' from ansible list
In my playbook, I am trying to get list of sub-directory names using the find module and then extracting the basename from path. I have been able to get the list but the elements are prepended with u'...
0
votes
1
answer
40
views
Why does Java's regex Pattern/Matcher miscount group positions in strings with unicode
I'm trying to use regular expressions and the strings include unicode characters such as '' and ''. The Pattern/Matcher are finding the expression I'm looking for but the Matcher returns the wrong ...
1
vote
2
answers
339
views
Why does trying to print Unicode-encoded strings with cout lead to a compilation error in newer C++ standards?
I tried the following printing of Unicode characters with Visual C++ 2022 Version 17.4.4 with C++ standard set to the latest.
#include <iostream>
using namespace std;
int main()
{
cout <&...
0
votes
1
answer
1k
views
UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>
In this project, I am trying to utilize the pycaret package to analyze some time series with the help of scikit-learn package. Specifically, I have imported some modules as follows:
from pycaret....
0
votes
0
answers
630
views
Some Unicode Characters are not rendered in Emacs
My Emacs has a problem rendering the Unicode characters written with \McU (all \Mc* and \Mi* characters) in Agda input mode (𝒰, 𝒱, 𝒲, etc.).
I installed Iosevka and JuliaMono Light fonts,
...
0
votes
0
answers
421
views
Replacing all special characters but the dot in a string also replaces the dot
I am trying to replace some special characters with an underscore in a given string (in my current case, a badly formatted file path) but I cannot make it work.
Here is my current Python 3 code:
...
1
vote
1
answer
629
views
Unicode/Collation Issue in Openrowset SQL Server
My CSV has text like this:
Côté fenêtres,
carré
I'm trying to open this CSV file using openrowset in SQL Server like below:
select * from openrowset(BULK 'C:\Import_Orders\Files\PO.csv',
...
0
votes
0
answers
62
views
Why are Unicode characters supported in my programs without any extra libraries?
I have to reproduce client/server communication with Unix signals. I have already done this part; however, there is a bonus where I have to make my program support Unicode. It does support it, but I don't ...
0
votes
0
answers
82
views
mark part of a plain text
I want to somehow mark a part of a plain text to emphasize it without putting extra characters around it.
I figured out that I can use the combining characters and used \u0332, but when I tried to ...
0
votes
3
answers
134
views
Strange character added when decoding with urllib
I'm trying to parse a query string like this:
filename=logo.txt\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x01x&filename=.hidden.txt
Since it mixes bytes and text, ...
-2
votes
1
answer
823
views
How to convert Unicode codepoint to UTF-8 Hex Bytes?
Given a list of all emojis, I need to convert unicode codepoint to UTF-8 hex bytes programmatically.
For example:
Take this emoji: https://unicode-table.com/en/1F606/ and convert 1F606 to F0 9F 98 86
...
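The conversion in the example above can be sketched in Python as follows: parse the hexadecimal code point, turn it into a character, encode it as UTF-8, and print the bytes as hex. The function name `codepoint_to_utf8_hex` is hypothetical.

```python
def codepoint_to_utf8_hex(codepoint_hex: str) -> str:
    """Convert a hex code point such as '1F606' to its UTF-8 bytes in hex."""
    character = chr(int(codepoint_hex, 16))
    # bytes.hex(" ") separates the bytes with spaces (Python 3.8+).
    return character.encode("utf-8").hex(" ").upper()


print(codepoint_to_utf8_hex("1F606"))
```

For a list of emoji code points, the same function can be applied to each entry in turn.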
0
votes
1
answer
166
views
How can I print these PHP special characters? [duplicate]
I have this piece of code:
<?php
$letters='ñéáóúí';
$letters=$letters[1];
echo $letters;
?>
I want to print "ñ" or any other letter but I only print � instead, any tips?
2
votes
1
answer
628
views
Regex \P{IsHan} does not work well in Java 8
I want to remove all non-Chinese characters in a String, and retain Chinese Characters.
Here is an example:
input -> 勇𣌀hi你好👋()【】「」{}[]()
output -> 勇𣌀你好
First of all, I try to extract all ...
0
votes
2
answers
225
views
How would I print a multi-line (non-standard) unicode string of text in C++? Updated for Clarity! (hopefully)
Rewriting this question with a bit more knowledge on what I'm requesting; (Thank you James Risner and Turtle for your assistance, but I didn't word this correctly and got different responses than what ...
0
votes
1
answer
459
views
Using Symfony UnicodeString in Laravel
I want to use UnicodeString from Symfony in a Laravel controller
protected function validateWaWebhook($known_token, Request $request)
{
if (($signature = $request->headers->get('X-Hub-...
0
votes
2
answers
234
views
C++ unicode strings - the basic_strings know nothing about Unicode?
I see here that the C++ standard library now has typedefs of std::basic_string like u8string and u16string, but I don't see any member functions or algorithms that know much of anything about Unicode.
...
0
votes
1
answer
903
views
How can I replace characters which give me UnicodeDecodeErrors?
I am extracting the content from different file types into a csv file.
I am currently trying to extract from the file type 'm'.
That's my extraction function:
def extract_m(f): # f is the file
...
1
vote
1
answer
324
views
Sanitise unicode pair for filename in javascript?
My web extension fails to initiate a file download for filenames containing a pair of emojis, with an invalid filename error. This seems to be some Unicode surrogate pair issue when multiple emojis are used. ...
1
vote
2
answers
501
views
How to convert the file path input spaces and backslash to unicode for using on a PowerShell command using "subprocess" module?
I have to use:
x=input()
subprocess.Popen(f'PowerShell -Executionpolicy byPass {x}\n')
To open an executable, but it does not allow me to use a path taken from an input variable that contains ...
1
vote
1
answer
257
views
How to use Perl pack to convert UTF-16 surrogate pairs to UTF-8?
I have input strings which contain text in which some characters are in UTF-16 format and escaped with '\u'. I am trying to, in Perl, convert all the strings to UTF-8. For example, the string 'Alice &...
1
vote
1
answer
405
views
Filestream Bytes not written properly
Why do I get the wrong output using the following code in RAD Studio 10.1?
var
sPalette : string;
mystream: TfileStream;
begin
mystream := TfileStream.Create('C:\Data\test.bmp', fmCreate);
...
1
vote
0
answers
411
views
Arabic: Convert normal Arabic text to Presentation Forms-B
Background
From this answer:
Arabic is a script with cursive joining; the shape of the letters changes depending on whether they occur initially, medially, or finally within a word. Sometimes you may ...
0
votes
1
answer
483
views
Is there a way to convert Unicode Hex Character Code a to a simple "a" so that I can get an alphabetic string?
This character format encoding description can be found in All such encoded characters and character description. The exact specific string I want to convert is-
-9<ahref="javascript:&...