I want to get the UTF-16 code unit at a given index in ABAP.
The same can be done in JavaScript with charCodeAt(). For example, "d".charCodeAt(); will give back 100.
Is there a similar functionality in ABAP?
This can be done with class CL_ABAP_CONV_OUT_CE:
DATA(lo_converter) = cl_abap_conv_out_ce=>create( encoding = '4103' ). "Little endian
TRY.
    CALL METHOD lo_converter->convert
      EXPORTING
        data   = 'a'
        n      = 1
      IMPORTING
        buffer = DATA(lv_buffer). "lv_buffer will be 6100 (0061 with 4102, big endian)
CATCH ...
ENDTRY.
Codepage 4102 is for UTF-16 Big endian.
It is possible to encode not just a single character, but a string as well:
EXPORTING
data = 'abc'
n = 3
"n" is the length of the string you want to encode. It can be less than the actual length of the string.
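For reference, the fragment above can be written out as a complete call. This is only a minimal sketch built from the calls already shown in this answer; the variable names and the catch-all CX_ROOT handler are assumptions for illustration, not a recommendation about which exceptions to handle.

```abap
" Sketch: encode the string 'abc' as UTF-16 little endian (SAP code page 4103).
TRY.
    DATA(lo_conv) = cl_abap_conv_out_ce=>create( encoding = '4103' ).
    lo_conv->convert(
      EXPORTING
        data   = 'abc'
        n      = 3                  " number of characters to encode
      IMPORTING
        buffer = DATA(lv_bytes) ).  " byte string holding the encoded text
    " 'a' = U+0061, 'b' = U+0062, 'c' = U+0063. In little endian the low byte
    " comes first, so lv_bytes should contain 610062006300.
  CATCH cx_root INTO DATA(lx_error).
    " Assumption: a catch-all handler, used here only to keep the sketch short.
    DATA(lv_error_text) = lx_error->get_text( ).
ENDTRY.
```

With encoding '4102' (big endian) the same call should yield 006100620063 instead.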
4103 is little endian and 4102 is big endian.
Commented
Mar 23, 2021 at 18:13
n is optional and the method deduces it automatically from the length of data.
Commented
Mar 26, 2021 at 9:10
When you say you "want to get the UTF-16 code unit", there are two possible meanings: either the Unicode code point, for which d is always U+0064 (the official "name" of the Unicode character, the two bytes 0x0064 being the hexadecimal representation of decimal 100), or the encoding of d to UTF-16 little endian (SAP code page 4103) or big endian (SAP code page 4102), which gives the 2 bytes 0x6400 or the 2 bytes 0x0064 respectively.
For the second case, see József's answer.
For the first case, you may get it using the method UCCP (UniCode Code Point) or UCCPI (UniCode Code Point Integer) of class CL_ABAP_CONV_OUT_CE:
DATA: l_unicode_point_hex TYPE x LENGTH 2,
l_unicode_point_int TYPE i.
l_unicode_point_hex = cl_abap_conv_out_ce=>UCCP( 'd' ).
ASSERT l_unicode_point_hex = '0064'.
l_unicode_point_int = cl_abap_conv_out_ce=>UCCPI( 'd' ).
ASSERT l_unicode_point_int = 100.
EDIT: Note that the two methods always return the same values whatever the SAP system code page is (4102, 4103 or any other): the code point of "d" is always U+0064 (0x0064 being the hex representation of 100), while UTF-16 little endian (SAP code page 4103) and big endian (SAP code page 4102) encode "d" differently, as the 2 bytes 0x6400 and 0x0064 respectively.
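To tie this back to the question, which asks for the value at a given index as charCodeAt( index ) does, UCCPI can be applied to a single character taken out of a string. This is a minimal sketch, assuming a Unicode system and characters from the Basic Multilingual Plane; the variable names are made up for illustration, and the behaviour for surrogate pairs has not been checked here.

```abap
" Sketch: rough equivalent of "abcd".charCodeAt( 3 ) in JavaScript.
DATA(lv_text)  = `abcd`.
DATA(lv_index) = 3.               " zero-based offset, as in charCodeAt

" Copy exactly one character into a C field of length 1, the same
" kind of argument as the literal 'd' used above.
DATA lv_char TYPE c LENGTH 1.
lv_char = substring( val = lv_text off = lv_index len = 1 ).

DATA(lv_code) = cl_abap_conv_out_ce=>uccpi( lv_char ).
ASSERT lv_code = 100.             " 'd' is U+0064, i.e. decimal 100
```

The built-in function substring( ) takes a zero-based offset, so the index can be used the same way as in JavaScript.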