Treat U+30A0 & U+30FB in Katakana Block as CJK (#16796)
tats-u authored Nov 26, 2024
1 parent d52e905 commit ac46a4f
Showing 5 changed files with 56 additions and 1 deletion.
25 changes: 25 additions & 0 deletions changelog_unreleased/markdown/16796.md
@@ -0,0 +1,25 @@
#### Treat U+30A0 & U+30FB in Katakana Block as CJK (#16796 by @tats-u)

Previously, Prettier did not treat U+30A0 and U+30FB as Japanese (CJK) characters. U+30FB (the katakana middle dot) is commonly used in Japanese to separate the given and family names of non-Japanese people, or to mean “and”. For example, “C言語・C++・Go・Rust” means “C language & C++ & Go & Rust”.

<!-- prettier-ignore -->
```md
<!-- Input (--prose-wrap=never) -->

C言
C++
Go
Rust

<!-- Prettier stable -->
C言語・ C++ ・ Go ・ Rust

<!-- Prettier main -->
C言語・C++・Go・Rust
```

U+30A0 can be used in place of the `-` in non-Japanese names (e.g. “Saint-Saëns” (Charles Camille Saint-Saëns) can be written as “サン゠サーンス” in Japanese), but in many cases it is substituted by the ASCII equals sign (U+003D) or the fullwidth equals sign (U+FF1D) (e.g. “サン=サーンス” or “サン＝サーンス”).
1 change: 1 addition & 0 deletions cspell.json
@@ -241,6 +241,7 @@
"Rubocop",
"ruleset",
"rulesets",
"Saëns",
"sandhose",
"Sapegin",
"sbdchd",
9 changes: 8 additions & 1 deletion src/language-markdown/constants.evaluate.js
@@ -14,7 +14,14 @@ const cjkCharset = new Charset(
       "Modifier_Symbol",
       "Nonspacing_Mark",
     ],
-  }),
+    // .union below makes the next Block condition "OR"
+    // If it is merged into this object definition, it will be "AND" instead
+  }).union(
+    // Firefox treats some symbols (U+30A0, U+30FB) in the Katakana block as CJK
+    unicodeRegex({
+      Block: ["Katakana"],
+    }),
+  ),
 );
 const variationSelectorsCharset = unicodeRegex({
   Block: ["Variation_Selectors", "Variation_Selectors_Supplement"],
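
The "OR" versus "AND" comment in the hunk above can be illustrated with a small standalone sketch. It does not use Prettier's `Charset` or `unicodeRegex` helpers. The script list, the `Other_Letter` category, and all function names below are assumptions made for illustration, and the Katakana block is written out as its code point range (U+30A0 to U+30FF) because JavaScript regular expressions have no block property. U+30A0 and U+30FB are punctuation, so they fail the category condition and are only picked up once the block is added as an alternative.

```javascript
// Illustrative sketch only: hypothetical stand-ins, not Prettier's actual charset code.

// Condition 1 (assumed script list): the character belongs to a CJK script.
const inCjkScript = (ch) =>
  /^[\p{scx=Han}\p{scx=Katakana}\p{scx=Hiragana}\p{scx=Hangul}\p{scx=Bopomofo}]$/u.test(ch);

// Condition 2: letter-like general categories. Modifier_Symbol (Sk) and
// Nonspacing_Mark (Mn) appear in the diff; Other_Letter (Lo) is an assumption.
const inLetterLikeCategory = (ch) => /^[\p{Lo}\p{Sk}\p{Mn}]$/u.test(ch);

// The Katakana block spans U+30A0..U+30FF.
const inKatakanaBlock = (ch) => {
  const codePoint = ch.codePointAt(0);
  return codePoint >= 0x30a0 && codePoint <= 0x30ff;
};

// Before the change: both conditions must hold ("AND").
const isCjkBefore = (ch) => inCjkScript(ch) && inLetterLikeCategory(ch);

// After the change: the block is unioned in as an alternative ("OR").
const isCjkAfter = (ch) => isCjkBefore(ch) || inKatakanaBlock(ch);

for (const ch of ["\u30A0", "\u30FB", "\u30A2", "\u8A9E", "A"]) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex}`, { before: isCjkBefore(ch), after: isCjkAfter(ch) });
}
```

Had the block been folded into the same condition object instead, a character would need to satisfy the category condition and be in the Katakana block at once, which U+30A0 and U+30FB never do.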
@@ -283,6 +283,16 @@ English
[ウ
ィキペディア]: https://ja.wikipedia.org/
C言
C++
Go
Rust
=====================================output=====================================
日本語、にほんご。汉语, 中文. 日本語,にほんご.English
words!? 漢字!汉字?「セリフ」(括弧) 文字(括弧)文字【括弧】日本語English
@@ -297,5 +307,7 @@ words!? 漢字!汉字?「セリフ」(括弧) 文字(括弧)文字【括
[ウ ィキペディア]: https://ja.wikipedia.org/
C言語・C++・Go・Rust
================================================================================
`;
10 changes: 10 additions & 0 deletions tests/format/markdown/splitCjkText/symbolSpaceNewLine.md
@@ -57,3 +57,13 @@ English

[
ィキペディア]: https://ja.wikipedia.org/


C言
C++
Go
Rust
