You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hey, I'm trying to use fst with arabic, with levenshtein, and it seems to not work at all... The code I'm working with:
use fst::{automaton::Levenshtein,IntoStreamer,Map};fnmain(){letmut data = vec![("مصر",1)];
data.sort();// A convenient way to create sets in memory.let map = Map::from_iter(data).unwrap();// Build our fuzzy query.let lev = Levenshtein::new("مصر",2).unwrap();// Apply our fuzzy query to the set we built.let stream = map.search(lev).into_stream();let keys = stream.into_stream().into_str_keys().unwrap();println!("keys: {keys:#?}");}
I've enabled the utf8-ranges and levenshtein features on the crate.
The text was updated successfully, but these errors were encountered:
#38 is a tough bug to follow, and I'm not sure there is even a solid reproduction there yet? In any case, generalizing from "the levenshtein implementation has some bugs in places" to the entire library seems pretty egregious to me. The vast majority of this library doesn't care about UTF-8 at all. It's just bytes. Levenshtein tries to do the somewhat-correct thing of measuring edit distance in terms of the number of codepoints. There may indeed be bugs there. I personally do not have any time in the short term to look into those bugs. The Levenshtein code was something I whipped together several years ago, and it has not seen much love since then.
But for this one, it looks like it's a nice small reproduction. Thank you for that. I can indeed reproduce it.
Folks are welcome to take a dive into the Levenshtein automata code and see if they can fix it. PRs are welcome. #38 appears to have some useful suggestions.
Hey, I'm trying to use fst with arabic, with levenshtein, and it seems to not work at all... The code I'm working with:
I've enabled the utf8-ranges and levenshtein features on the crate.
The text was updated successfully, but these errors were encountered: