I have two lists to match against one another. I need to match each word in str1 with each list of words in str2; str2 contains 40k words in total. I want to try using multiprocessing to make it run faster.
For example:
str1 = ['how', 'are', 'you']
str2 = [['this', 'how', 'done'], ['they', 'were', 'here'], ['can', 'you', 'leave'], ['how', 'sad']]
The code I tried:
from multiprocessing import Process, Pool
from fuzzywuzzy import process

def f(str2, str1):
    for u in str1:
        res = []
        for i in str2:
            # score u against the words in this sublist
            Ratios = process.extract(u, i)
            res.append(str(Ratios))
        print(res)
    return res
if __name__ == '__main__':
    str1 = ['how', 'are', 'you']
    str2 = [['this', 'how', 'done'], ['they', 'were', 'here'], ['can', 'you', 'leave'], ['how', 'sad']]
    for i in str2:
        p = Process(target=f, args=(i, str1))
        p.start()
        p.join()
This does not return what I expect - I was expecting the output to look like a data frame:
words                      how  are  you
['this', 'how', 'done']    100    0    0
['they', 'were', 'here']     0   90    0
['can', 'you', 'leave']      0   80  100
['how', 'sad']             100    0    0
Calling p.start() immediately followed by p.join() inside your loop isn't going to make your code any faster: join() blocks until that process finishes, so only one worker is ever running at a time and the work is effectively sequential.
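A Pool hands the sublists out to worker processes for you and runs them in parallel. Here is a minimal sketch of that approach, assuming fuzzywuzzy and pandas are installed; best_scores is a helper name I made up, and it uses fuzz.ratio (one score per word pair) instead of process.extract so the results can be arranged into the score matrix you described:

from multiprocessing import Pool
from functools import partial
from fuzzywuzzy import fuzz
import pandas as pd

str1 = ['how', 'are', 'you']
str2 = [['this', 'how', 'done'], ['they', 'were', 'here'],
        ['can', 'you', 'leave'], ['how', 'sad']]

def best_scores(words, queries):
    # Hypothetical helper: for each query word, keep the best
    # fuzz.ratio against any word in this sublist.
    return [max(fuzz.ratio(q, w) for w in words) for q in queries]

if __name__ == '__main__':
    with Pool() as pool:
        # One task per sublist; the pool spreads them across workers.
        rows = pool.map(partial(best_scores, queries=str1), str2)
    df = pd.DataFrame(rows, index=[str(s) for s in str2], columns=str1)
    df.index.name = 'words'
    print(df)

pool.map splits str2 into chunks and sends each chunk to a worker process; with 40k sublists you can pass a chunksize argument to map so each worker gets enough work to offset the inter-process overhead.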