I'm building a trigram search using PostreSQL extension pg_trgm
. I want the search to be case insensitive and to ignore accents. I've tried to achieve that by using this collation:
CREATE COLLATION "und-for-search" (provider = icu, locale = 'und-u-kn-true-ks-level1', deterministic = false);
When I test this collation on a simple comparison, it seems to work ok:
SELECT 'z' = 'ž' COLLATE "und-for-search" -- return true ✅
But in trigram search 'z', and 'ž' are not treated as equal:
SELECT similarity('z' COLLATE "und-for-search", 'ž' COLLATE "und-for-search") -- returns 0 ❌
If we check the trigrams being generated via SELECT show_trgm('ž' COLLATE "und-for-search")
, which returns:
| show_trgm |
| - |
| {0xe0bfa7,0xfd71f4} |
Can somebody explain why the collation is not working correctly with trigrams? Is there any way to run trigram search that ignores accents without additional functions like unaccent
?
unaccent()
.