Yes, this is a problem with all versions of printf that I am aware of. I briefly discuss the matter in this answer and also in this one.
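To see the problem concretely, here is a minimal sketch that uses only core Perl. The two sample strings are borrowed from the menu later in this answer, and the remark about misalignment assumes a terminal that draws each CJK ideograph two columns wide.

```perl
use strict;
use warnings;
use utf8;                    # string literals below are UTF-8 source text
binmode(STDOUT, ":utf8");    # emit UTF-8 rather than "Wide character" warnings

# %-8s pads each string out to 8 *characters*, not 8 display columns.
my @rows = map { sprintf("%-8s|", $_) } ("pears", "寿司");

# Both rows are the same length when measured in characters ...
print "$_\n" for @rows;
# ... but where each ideograph occupies two columns, the bar on
# the second row lands two columns further right than on the first.
```

In other words, the field width is honored in characters, which is exactly what goes wrong for anything that is not one column wide.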
For C, I do not know of a library that will do this for you, but if anyone has it, it would be ICU.
For Perl, you have to use the Unicode::GCString module from CPAN to calculate the number of print columns a Unicode string will take up. This takes into account Unicode Standard Annex #11: East Asian Width.
For example, some code points take up 1 column and others take up 2 columns. There are even some that take up no columns at all, like combining characters and invisible control characters. The class has a columns method that returns how many columns the string takes up.
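To make the 0/1/2-column idea concrete, here is a rough sketch that uses only Perl's built-in Unicode properties. The helper name approx_columns is hypothetical, and this is a deliberately crude approximation, not how Unicode::GCString works internally: it ignores grapheme clusters, ambiguous-width characters, and other special cases that the module handles.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");

# Hypothetical helper: estimate display columns per code point.
#   0 columns: nonspacing/enclosing marks, format and control characters
#   2 columns: East Asian Wide or Fullwidth
#   1 column : everything else
sub approx_columns {
    my ($str) = @_;
    my $cols = 0;
    for my $ch (split //, $str) {
        next if $ch =~ /[\p{Mn}\p{Me}\p{Cf}\p{Cc}]/;
        $cols += ($ch =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/)
               ? 2
               : 1;
    }
    return $cols;
}

# "e\x{301}" is an "e" followed by COMBINING ACUTE ACCENT.
for my $s ("pears", "寿司", "e\x{301}clair") {
    print approx_columns($s), "\t", $s, "\n";
}
```

With these inputs the estimates come out as 5, 4, and 6 respectively. For real work, use the module's columns method instead of a hand-rolled estimate like this one.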
I have an example of using this for aligning Unicode text vertically here. It will sort a bunch of Unicode strings, including some with combining characters and “wide” Asian ideograms (CJK characters), and allow you to align things vertically.
The code for the little umenu demo program, which prints that nicely aligned output, is included below.
You might also be interested in the far more ambitious Unicode::LineBreak module, of which the aforementioned Unicode::GCString class is just a smaller component. This module is much cooler, and takes into account Unicode Standard Annex #14: Unicode Line Breaking Algorithm.
Here’s the code for the little umenu demo, tested on Perl v5.14:
#!/usr/bin/env perl
# umenu - demo sorting and printing of Unicode food
#
# (obligatory and increasingly long preamble)
#
use utf8;
use v5.14; # for locale sorting
use strict;
use warnings;
use warnings qw(FATAL utf8); # fatalize encoding faults
use open qw(:std :utf8); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
# std modules
use Unicode::Normalize; # std perl distro as of v5.8
use List::Util qw(max); # std perl distro as of v5.10
use Unicode::Collate::Locale; # std perl distro as of v5.14
# cpan modules
use Unicode::GCString; # from CPAN
# forward defs
sub pad($$$);
sub colwidth(_);
sub entitle(_);
my %price = (
    "γύρος" => 6.50, # gyros, Greek
    "pears" => 2.00, # like um, pears
    "linguiça" => 7.00, # spicy sausage, Portuguese
    "xoriço" => 3.00, # chorizo sausage, Catalan
    "hamburger" => 6.00, # burgermeister meisterburger
    "éclair" => 1.60, # dessert, French
    "smørbrød" => 5.75, # sandwiches, Norwegian
    "spätzle" => 5.50, # Bayerisch noodles, little sparrows
    "包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
    "jamón serrano" => 4.45, # country ham, Spanish
    "pêches" => 2.25, # peaches, French
    "シュークリーム" => 1.85, # cream-filled pastry like éclair, Japanese
    "막걸리" => 4.00, # makgeolli, Korean rice wine
    "寿司" => 9.99, # sushi, Japanese
    "おもち" => 2.65, # omochi, rice cakes, Japanese
    "crème brûlée" => 2.00, # tasty broiled cream, French
    "fideuà" => 4.20, # more noodles, Valencian (Catalan=fideuada)
    "pâté" => 4.15, # gooseliver paste, French
    "お好み焼き" => 8.00, # okonomiyaki, Japanese
);
my $width = 5 + max map { colwidth } keys %price;
# So the Asian stuff comes out in an order that someone
# who reads those scripts won't freak out over; the
# CJK stuff will be in JIS X 0208 order that way.
my $coll = Unicode::Collate::Locale->new(locale => "ja");
for my $item ($coll->sort(keys %price)) {
    print pad(entitle($item), $width, ".");
    printf " €%.2f\n", $price{$item};
}
sub pad($$$) {
    my($str, $width, $padchar) = @_;
    return $str . ($padchar x ($width - colwidth($str)));
}
sub colwidth(_) {
    my($str) = @_;
    return Unicode::GCString->new($str)->columns;
}
sub entitle(_) {
    my($str) = @_;
    $str =~ s{ (?=\pL)(\S) (\S*) }
             { ucfirst($1) . lc($2) }xge;
    return $str;
}
As you see, the key to making it work in that particular program is this line of code, which just calls other functions defined above, and uses the module I was discussing:
print pad(entitle($item), $width, ".");
That will pad out the item to the given width using dots as the fill character.
Yes, it’s a lot less convenient than printf, but at least it is possible.
printf for Perl and/or C?
East_Asian_Width=Ambiguous characters. I don’t know of any Go library that deals with this the way the Perl library described in my answer does, but if there is such a thing for Go, I’d love to learn about it! Thanks.
printf only cares about the number of bytes. If you want to take into account the number of characters, use wprintf (remember, it takes a wchar_t* format). There's no formatting function in C that takes into account display width.