
All Questions

1 vote
1 answer
74 views

How to decode bytes to characters in a POSIX-compliant way?

I'm attempting to write a strictly POSIX-compliant shell, but the standard doesn't make it clear how to go from bytes to characters. It says to use LC_CTYPE, which further links to the concept of a ...
T0mstone
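To make the byte/character distinction concrete: POSIX `wc -c` counts bytes, while `wc -m` counts characters as decoded according to `LC_CTYPE`. A minimal sketch, assuming a POSIX `sh`; `byte_count` and `char_count` are hypothetical helper names, not part of any standard:

```shell
#!/bin/sh
# Hypothetical helpers: compare byte and character counts of a string.
# wc -c counts bytes; wc -m counts characters decoded according to LC_CTYPE.

byte_count() {
    # Byte counts do not depend on the locale, so force the C locale.
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

char_count() {
    # $1: the string; $2: the locale whose LC_CTYPE decides how bytes form characters
    printf '%s' "$1" | LC_ALL="$2" wc -m | tr -d ' '
}
```

With a UTF-8 locale installed (pick one from `locale -a`), `char_count "$(printf '\303\251')" <that locale>` should report 1 character, where `byte_count` reports 2 bytes for the same input.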
0 votes
1 answer
127 views

How to preserve non-ASCII characters?

We have the default POSIX locale on our server, but when non-ASCII text such as רקטות לגוש דן וירושלים (Hebrew) is uploaded to the server, it gets changed to רק××ת ×××ש ×× ××ר×ש×××. How can we preserve it ...
Amrita (reputation 1)
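The `×` characters are a telltale sign: every Hebrew letter (U+05D0 to U+05EA) is encoded in UTF-8 as two bytes beginning with 0xD7, and 0xD7 is `×` (the multiplication sign) in ISO-8859-1 and Windows-1252. A plausible reading, not confirmed by the question, is that the UTF-8 bytes are being decoded somewhere as a single-byte charset. A minimal sketch for inspecting the raw bytes, assuming a POSIX `sh` and `od`; `hex_bytes` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string as lowercase hex, no spaces.
# Running od in the C locale keeps the output purely byte-oriented.
hex_bytes() {
    printf '%s' "$1" | LC_ALL=C od -A n -t x1 | tr -d ' \n'
}

# HEBREW LETTER ALEF (U+05D0), written as its UTF-8 bytes in octal (0xD7 0x90)
alef=$(printf '\327\220')
hex_bytes "$alef"   # prints d790
echo
```

If the stored data shows `d7 ..` pairs like this, the bytes themselves are intact UTF-8 and the damage happens when they are displayed or converted.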
1 vote
0 answers
96 views

Is there any way to query LC_CTYPE locale character class expressions?

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05 9.3.5 RE Bracket Expression section 6 quote: The following character class expressions shall be supported in all ...
jian (reputation 587)
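Without a dedicated query interface, one way to see what a class expression matches in a given locale is to test candidate characters against it. A minimal sketch, assuming a POSIX `sh` and `grep`; `list_class` is a hypothetical helper, and it only probes the printable ASCII range 0x21 to 0x7E, so multibyte members of a class are not discovered:

```shell
#!/bin/sh
# Hypothetical helper: print which printable ASCII characters (0x21-0x7E)
# belong to character class $1 (e.g. alpha, punct) in locale $2 (default C).
list_class() {
    class=$1
    loc=${2:-C}
    out=
    i=33
    while [ "$i" -le 126 ]; do
        # %b understands \0ddd octal escapes, yielding the single character
        c=$(printf '%b' "\\0$(printf '%03o' "$i")")
        if printf '%s\n' "$c" | LC_ALL=$loc grep -q "^[[:$class:]]\$"; then
            out=$out$c
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

For example, `list_class punct` shows the punctuation set of the C locale; passing another installed locale name as the second argument shows how that locale classifies the same ASCII characters.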
2 votes
1 answer
116 views

Debian tcsh not respecting locale for character class expansion

Using Debian 11, tcsh version tcsh 6.21.00 (Astron) 2019-05-08 (x86_64-unknown-linux) options wide,nls,dl,al,kan,sm,rh,nd,color,filec In a directory with two files, a and A, $ echo [a-z] a A This ...
Jon (reputation 51)
0 votes
2 answers
456 views

How to portably detect, from a POSIX shell script using only POSIX utilities, that the POSIX locale is not provided?

So far I have found that Termux is the only POSIX environment without a POSIX locale; as a result, the following command, for example: awk 'BEGIN{for(i=1;i<256;i++)printf"%c",i;}' outputs ...
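One behavioural probe, offered only as a heuristic: in the POSIX locale every byte is a single character, so `awk`'s `printf "%c"` with a value between 128 and 255 should emit exactly one byte. In a multibyte locale some `awk` implementations (gawk, for example) may emit a multibyte encoding instead. `posix_locale_is_byte_based` is a hypothetical name, and passing this check does not prove the locale is fully conforming:

```shell
#!/bin/sh
# Hypothetical probe: succeeds if, under LC_ALL=POSIX, awk prints the
# character with value 200 as exactly one byte, as a byte-based locale should.
posix_locale_is_byte_based() {
    n=$(LC_ALL=POSIX awk 'BEGIN { printf "%c", 200 }' | LC_ALL=POSIX wc -c | tr -d ' ')
    [ "$n" -eq 1 ]
}

if posix_locale_is_byte_based; then
    echo "POSIX locale looks byte-based"
else
    echo "POSIX locale looks missing or multibyte" >&2
fi
```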
3 votes
2 answers
11k views

What does "in the POSIX locale" mean?

In this question there is a comment which says: All of this from not understanding what "in the POSIX locale" means. (-: You should really try matching Greek lowercase letters with (say) sed and [[...
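Concretely, in the POSIX locale `[[:lower:]]` matches only the 26 letters `a` to `z`, so a Greek lowercase letter is not matched even though a UTF-8 locale would classify it as lowercase. A minimal sketch, assuming a POSIX `sh` and `grep`; `count_lower_lines` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: count input lines containing a [[:lower:]] character
# when the regular expression is interpreted in locale $1.
# Note: grep -c exits with status 1 when the count is 0.
count_lower_lines() {
    LC_ALL=$1 grep -c '[[:lower:]]'
}
```

For example, `printf '\316\261\n' | count_lower_lines POSIX` feeds in GREEK SMALL LETTER ALPHA as its UTF-8 bytes and should print `0`, whereas the same input in a UTF-8 locale should be counted.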
-3 votes
2 answers
362 views

Proposed additional POSIX "Character Classes" [closed]

There are a handful of "character classes" defined by POSIX in the LC_CTYPE locale definition, with the following 12 names: alnum alpha blank cntrl digit graph lower print punct space upper xdigit ...
34 votes
2 answers
9k views

Which is the current decimal separator?

Say I have a POSIX shell script that needs to run on different systems/environments that I do not control, and needs to remove the decimal separator from a string that is emitted by a program that ...
gboffi (reputation 1,264)
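One portable approach is to ask the POSIX `locale` utility for the `decimal_point` keyword of `LC_NUMERIC` and delete that character. A minimal sketch, assuming the program's output follows the same locale settings as the script and that the separator is a single byte; `strip_decimal_point` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: remove the current locale's decimal separator from $1.
# `locale decimal_point` prints the LC_NUMERIC decimal_point value.
strip_decimal_point() {
    sep=$(locale decimal_point)
    if [ -z "$sep" ]; then
        # Could not determine the separator; leave the string unchanged.
        printf '%s\n' "$1"
    else
        printf '%s\n' "$1" | tr -d "$sep"
    fi
}
```

Under `LC_ALL=C` the separator is `.`, so `strip_decimal_point 3.14` should print `314`.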
3 votes
0 answers
47 views

Should I set the locale to `C` when matching a range of numbers?

If I wanted to search for lines in a file that contain a or b or c or d I would run LC_COLLATE=C grep -E '[a-d]' file_to_search or LC_ALL=C grep -E '[a-d]' file_to_search If I fail to set the ...
Harold Fischer
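Where only a handful of characters is involved, listing them explicitly sidesteps the question: a bracket expression such as `[abcd]` names exact characters, whereas POSIX leaves the meaning of a range like `[a-d]` unspecified outside the POSIX locale. A minimal sketch, assuming a POSIX `sh` and `grep`; both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: select lines containing a, b, c or d.

# Range form: only well-defined in the POSIX locale, so pin LC_ALL=C.
grep_a_to_d_range() {
    LC_ALL=C grep -E '[a-d]' "$@"
}

# Explicit form: names the four characters, so collation order is irrelevant.
grep_a_to_d_list() {
    grep -E '[abcd]' "$@"
}
```

Pinning `LC_ALL=C` also switches `LC_CTYPE`, so non-ASCII input is handled as bytes. Setting only `LC_COLLATE=C` has no effect if `LC_ALL` is already set in the environment, since `LC_ALL` overrides the other `LC_*` variables.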
1 vote
1 answer
791 views

Wildcards/Globbing: Are character ranges problematic?

In The Linux Command Line William Shotts claims that character ranges can be problematic. See the relevant excerpt below, emphasis is mine. Character Ranges If you are coming from another Unix-like ...
Git Gud (reputation 177)
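Within the shell itself, the class form `[[:lower:]]` avoids the problem because its meaning comes from `LC_CTYPE` rather than from collation order. A minimal sketch, assuming a POSIX `sh`; `matches` is a hypothetical helper that applies a shell pattern to a string with `case`, which uses the same pattern notation as pathname expansion:

```shell
#!/bin/sh
# Hypothetical helper: succeed if string $2 matches shell pattern $1 entirely.
# The pattern is left unquoted on purpose so that it is treated as a pattern.
matches() {
    case $2 in
        $1) return 0 ;;
    esac
    return 1
}

for c in a A z Z; do
    if matches '[[:lower:]]' "$c"; then
        echo "$c is lowercase"
    fi
done
```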
7 votes
2 answers
323 views

How to interpret character ranges in charmap files?

The charmap file /usr/share/i18n/charmaps/UTF-8.gz has this line: <U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A> The man page for charmap(5) only says that it means a ...
yt7b97q- (reputation 145)
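For this particular line, the range is consistent with a reading in which each successive code point maps to the next byte sequence: U+3400 through U+343F are exactly the 64 code points whose UTF-8 encodings run from `e3 90 80` to `e3 90 bf`, with only the last byte changing. A minimal sketch that computes three-byte UTF-8 encodings to check this, assuming a POSIX `sh` that accepts hexadecimal constants in arithmetic expansion; `utf8_3byte` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the UTF-8 bytes (hex) of a code point in the
# three-byte range U+0800..U+FFFF. No validation is done (e.g. surrogates).
utf8_3byte() {
    cp=$(( $1 ))
    printf '%02x %02x %02x\n' \
        $(( 0xE0 | (cp >> 12) )) \
        $(( 0x80 | ((cp >> 6) & 0x3F) )) \
        $(( 0x80 | (cp & 0x3F) ))
}

utf8_3byte 0x3400   # first code point of the range: prints e3 90 80
utf8_3byte 0x343F   # last code point of the range: prints e3 90 bf
```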
3 votes
2 answers
314 views

alphabetical expansion order for *

When I'm using a POSIX-compliant shell (e.g. dash, bash, zsh, ...), can I be sure that * will always expand in alphabetical order (dictated by LC_COLLATE)? Example: $ echo 1 > file_a $ echo 2 > ...
S1cK94 (reputation 35)
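POSIX specifies that the pathnames produced by pattern expansion are sorted according to the collating sequence in effect in the current locale, so the order follows `LC_COLLATE` rather than a fixed alphabet, and some shells may not honour the locale here at all. A minimal sketch for observing the order, assuming a POSIX `sh`; `glob_order` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print, one per line, what `*` expands to in directory $1.
glob_order() {
    (
        cd "$1" || exit 1
        for f in *; do
            # An unmatched pattern stays literal, so skip a nonexistent "*".
            [ -e "$f" ] || continue
            printf '%s\n' "$f"
        done
    )
}
```

In the C locale, byte order applies, so `file_B` sorts before `file_a` because `B` (0x42) precedes `a` (0x61).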
7 votes
2 answers
8k views

What would break if the C locale was UTF-8 instead of ASCII?

The C locale is defined to use the ASCII charset and POSIX does not provide a way to use a charset without changing the locale as well. What would happen if the encoding of C were switched to UTF-8 ...
gioele (reputation 2,239)