All Questions
13 questions
1
vote
1
answer
74
views
How to decode bytes to characters in a POSIX-compliant way?
I'm attempting to write a strictly POSIX-compliant shell, but the standard doesn't make it clear how to go from bytes to characters. It says to use LC_CTYPE, which further links to the concept of a ...
0
votes
1
answer
127
views
How to preserve non-ASCII characters?
We have the default POSIX locale on our server, but when non-ASCII text such as רקטות לגוש דן וירושלים (Hebrew) is uploaded to the server, it gets changed to רק××ת ×××ש ×× ××ר×ש×××. How can we preserve it ...
1
vote
0
answers
96
views
Is there any way to query LC_CTYPE locale character class expressions?
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
Quoting item 6 of section 9.3.5, RE Bracket Expression:
The following character class expressions shall be supported in all
...
2
votes
1
answer
116
views
Debian tcsh not respecting locale for character class expansion
Using Debian 11, tcsh version
tcsh 6.21.00 (Astron) 2019-05-08 (x86_64-unknown-linux) options wide,nls,dl,al,kan,sm,rh,nd,color,filec
In a directory with two files, a and A,
$ echo [a-z]
a A
This ...
0
votes
2
answers
456
views
How to portably detect, using a POSIX shell script and POSIX utilities, that the POSIX locale is not provided?
So far I have found that Termux is the only POSIX environment without POSIX locale; as a result the following command, for example:
awk 'BEGIN{for(i=1;i<256;i++)printf"%c",i;}'
outputs ...
3
votes
2
answers
11k
views
What does "in the POSIX locale" mean?
In this question there is a comment which says:
All of this from not understanding what "in the POSIX locale" means. (-: You should really try matching Greek lowercase letters with (say) sed and [[...
-3
votes
2
answers
362
views
Proposed additional POSIX "Character Classes" [closed]
There are a handful of "character classes" defined in POSIX, in the LC_CTYPE locale definition, with the following 12 names:
alnum alpha blank cntrl digit graph lower print punct space upper xdigit
...
34
votes
2
answers
9k
views
Which is the current decimal separator?
Say I have a POSIX shell script that
needs to run on different systems/environments that I do not control, and
needs to remove the decimal separator from a string that is emitted by a program that ...
3
votes
0
answers
47
views
Should I set the locale to `C` when matching a range of numbers?
If I wanted to search for lines in a file that contain a or b or c or d I would run
LC_COLLATE=C grep -E '[a-d]' file_to_search
or
LC_ALL=C grep -E '[a-d]' file_to_search
If I fail to set the ...
1
vote
1
answer
791
views
Wildcards/Globbing: Are character ranges problematic?
In The Linux Command Line, William Shotts claims that character ranges can be problematic. See the relevant excerpt below (emphasis mine).
Character Ranges
If you are coming from another Unix-like ...
7
votes
2
answers
323
views
How to interpret character ranges in charmap files?
The charmap file /usr/share/i18n/charmaps/UTF-8.gz has this line:
<U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A>
The man page for charmap(5) only says that it means a ...
3
votes
2
answers
314
views
Alphabetical expansion order for *
When I'm using a POSIX-compliant shell (e.g. dash, bash, zsh, ...), can I be sure that * will always expand in alphabetical order (as dictated by LC_COLLATE)?
Example:
$ echo 1 > file_a
$ echo 2 > ...
7
votes
2
answers
8k
views
What would break if the C locale was UTF-8 instead of ASCII?
The C locale is defined to use the ASCII charset and POSIX does not provide a way to use a charset without changing the locale as well.
What would happen if the encoding of C were switched to UTF-8 ...