
I want to get the Unicode code point of Chinese, Vietnamese Hán-Nôm, and Japanese characters. I've tried this code:

text = "𬖰"
br = text.encode("unicode-escape")
print(br)

and got

b'\\U0002c5b0'

But what should I do to get something like U+2C5B0 or U2C5B0 instead?

  • FYI: "The Unicode of a character" doesn't really make much sense. Unicode is a whole big standard with many different parts to it. You want the Unicode code point.
    – deceze
    Commented Jul 23 at 7:42
  • Yea ... in "Unicode of a character", "Unicode" could also mean the UTF-8 encoding, UTF-16 encoding and so on. And even "of a character" is ambiguous in many contexts. (Character in what character set?) I recommend you do some background reading on Unicode so that you know and understand the correct terminology ... and then >use< it.
    – Stephen C
    Commented Aug 6 at 2:27
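
To make the distinction in the comments above concrete, here is a minimal sketch (not from the original posts) that contrasts the code point of the question's character with its UTF-8 and UTF-16 encodings. The variable names are only illustrative, and the character is written as an escape so the example does not depend on the file's encoding.

```python
text = "\U0002C5B0"  # the character from the question, written as an escape

code_point = ord(text)             # the code point, as a plain integer
utf8 = text.encode("utf-8")        # UTF-8 encoding: four bytes
utf16 = text.encode("utf-16-be")   # UTF-16 (big-endian): a surrogate pair

print(f"U+{code_point:04X}")  # U+2C5B0
print(utf8.hex())             # f0ac96b0
print(utf16.hex())            # d871ddb0
```

The code point is a single number, while each encoding is a different byte sequence that represents that same number.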

1 Answer

You can use the ord function to get the character's numeric code point, then format it with the 04X specifier in an f-string. This displays the code point in uppercase hexadecimal, zero-padded to at least 4 digits (longer values such as 2C5B0 are not truncated):

print(f'U+{ord(text):04X}')

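Note that ord accepts exactly one character and raises a TypeError for longer strings. For text with several characters, the same formatting can be applied per character. The following is a small sketch under that assumption; code_points and its prefix parameter are hypothetical names introduced here, not part of any library.

```python
def code_points(s, prefix="U+"):
    """Return the formatted code point of each character in s."""
    # ord() takes a single character, so iterate over the string.
    return [f"{prefix}{ord(ch):04X}" for ch in s]


print(code_points("\U0002C5B0"))     # ['U+2C5B0'] (the character from the question)
print(code_points("漢字"))            # ['U+6F22', 'U+5B57']
print(code_points("A", prefix="U"))  # ['U0041']
```

Passing prefix="U" gives the U2C5B0 style also mentioned in the question.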

  • print(f'U+{ord(text):04X}') may be a better version, conforming to conventions used to display codepoints.
    – Andj
    Commented Aug 6 at 1:17
