
There is a package called fuzzy_pandas that can match strings using a Levenshtein similarity ratio. The package comes with some great examples.

For example:

import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.DataFrame({'Key':['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key':['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})

results = fpd.fuzzy_merge(df1, df2,
            left_on='Key',
            right_on='Key',
            method='levenshtein',
            threshold=0.6)

results.head()

I don't know whether it is possible to display the similarity ratio of each match (the value that is compared against the threshold) in the results.

The output is:

      Key       Key
0   Apple      Aple
1  Banana  Bannanna
2  Orange      Orag

I would like something like this:

      Key       Key  Ratio
0   Apple      Aple   0.81
1  Banana  Bannanna   0.87
2  Orange      Orag   0.78

A solution that uses another library would also be fine.
  • Can you clarify your question?
    – AMC
    Commented Mar 16, 2020 at 18:49
  • Thank you, I have updated the question. Commented Mar 16, 2020 at 19:31
  • Take a look at my answer here
    – amanb
    Commented Mar 16, 2020 at 19:35
  • Thank you. So it is not possible with fuzzy_pandas, but I managed to do it with difflib. Commented Mar 17, 2020 at 10:14

1 Answer

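To add a column with the similarity score of each matched pair, you can apply a scoring function to every row of the result. Both columns are named Key after the merge, so x['Key'] does not select a single value and the two columns have to be accessed by position. Assuming fuzz comes from the fuzzywuzzy package, token_set_ratio returns an integer score from 0 to 100 rather than a ratio between 0 and 1:

from fuzzywuzzy import fuzz  # assumed source of fuzz.token_set_ratio
results['Similarity'] = results.apply(lambda x: fuzz.token_set_ratio(x.iloc[0], x.iloc[1]), axis=1)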
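If an extra dependency is not wanted, the ratio can also be computed with difflib from the standard library, which is the route mentioned in the comments. The following is only a minimal sketch, not what fuzzy_pandas does internally: best_matches, left and right are hypothetical names chosen for illustration, and SequenceMatcher.ratio() is not a Levenshtein ratio, so both the scores and the set of matched rows may differ from the fuzzy_pandas output above.

```python
from difflib import SequenceMatcher


def best_matches(left, right, threshold=0.6):
    """For each string in `left`, keep its best match in `right`
    together with the similarity ratio, if the ratio reaches `threshold`."""
    rows = []
    for a in left:
        # Score every candidate (case-insensitively) and keep the highest one.
        scored = [(SequenceMatcher(None, a.lower(), b.lower()).ratio(), b)
                  for b in right]
        ratio, b = max(scored)
        if ratio >= threshold:
            rows.append((a, b, round(ratio, 2)))
    return rows


left = ['Apple', 'Banana', 'Orange', 'Strawberry']
right = ['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']

for row in best_matches(left, right):
    print(row)
```

The returned list of tuples can be turned into a table with pd.DataFrame(rows, columns=['Key_left', 'Key_right', 'Ratio']), where the column names are again just placeholders.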
