fuzzy match 2 DataFrames?

Question

There is a package called fuzzy_pandas that can use Levenshtein distance for ratio-based string matching, and it comes with some great examples.

For example:

import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.DataFrame({'Key':['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key':['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})

results = fpd.fuzzy_merge(df1, df2,
            left_on='Key',
            right_on='Key',
            method='levenshtein',
            threshold=0.6)

results.head()

I don't know whether it's possible to display the similarity ratio of each match (the value that is compared against the threshold) in the results.

The output is:

      Key       Key
0   Apple      Aple
1  Banana  Bannanna
2  Orange      Orag

And I want something like:

      Key       Key  Ratio
0   Apple      Aple   0.81
1  Banana  Bannanna   0.87
2  Orange      Orag   0.78

Maybe this is possible with another library?

Thank you, so with fuzzy_pandas it's not possible. But I do it with DiffLib. — elitebook190, Commented Mar 17, 2020 at 10:14
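The difflib approach mentioned in this comment was not posted, so the following is only a minimal sketch of how it might look, using the standard library's difflib.SequenceMatcher. The helper name best_matches and the choice to keep only the single best candidate per left value are assumptions made for illustration. SequenceMatcher.ratio() is not a Levenshtein ratio, so its scores will generally differ from what fuzzy_pandas uses internally.

```python
from difflib import SequenceMatcher


def best_matches(left, right, threshold=0.6):
    """Return (left_value, right_value, ratio) for each left value's best match.

    best_matches is a hypothetical helper, not part of difflib or fuzzy_pandas.
    """
    rows = []
    for a in left:
        # Score `a` against every candidate and keep the highest ratio.
        score, b = max((SequenceMatcher(None, a, b).ratio(), b) for b in right)
        if score >= threshold:
            rows.append((a, b, round(score, 2)))
    return rows


left = ['Apple', 'Banana', 'Orange', 'Strawberry']
right = ['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']

for row in best_matches(left, right):
    print(row)
```

The returned tuples can be passed to pd.DataFrame(rows, columns=[...]) if a DataFrame with a ratio column is needed.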

Luis Segura · Accepted Answer · 2023-01-25 22:33:35Z


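To add a similarity score to the results, compute it row by row. Both columns in the merged frame are named Key, so x['Key'] returns both values at once; select them by position instead. This assumes fuzz comes from a library such as thefuzz (formerly fuzzywuzzy). Note that token_set_ratio returns a score from 0 to 100, not a 0-1 ratio.

from thefuzz import fuzz  # assumption: the answer does not say which library provides fuzz
results['Similarity'] = results.apply(lambda x: fuzz.token_set_ratio(x.iloc[0], x.iloc[1]), axis=1)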
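If the score should be a Levenshtein-based ratio between 0 and 1, as in the question, one option is to build the pairs and the ratio directly in pandas. The sketch below is only an illustration under stated assumptions: levenshtein_ratio is a hypothetical helper that normalizes the edit distance by the length of the longer string, which is one common convention and not necessarily the formula fuzzy_pandas uses, and how='cross' requires pandas 1.2 or later.

```python
import pandas as pd


def levenshtein_ratio(a, b):
    """Hypothetical helper: 1 - edit_distance / length of the longer string."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - prev[-1] / longest


df1 = pd.DataFrame({'Key': ['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key': ['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})

# Cross join: every left/right pair, with suffixes to tell the two Key columns apart.
pairs = df1.merge(df2, how='cross', suffixes=('_left', '_right'))
pairs['Ratio'] = [levenshtein_ratio(a, b)
                  for a, b in zip(pairs['Key_left'], pairs['Key_right'])]

results = (pairs[pairs['Ratio'] >= 0.6]
           .sort_values('Ratio', ascending=False)
           .drop_duplicates('Key_left')  # keep the best match per left key
           .sort_values('Key_left')
           .reset_index(drop=True))
print(results)
```

The cross join compares every pair, so the cost grows with len(df1) * len(df2); for large frames a dedicated matching library is likely a better fit.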
