Python: How to iterate through 2 columns and match one by one

Let's say I have the following 2 columns from 2 different Excel sheets:

Excel 1:

                                    Name
0      Abilass Vethanayagam June 04 2018
1      Abraham H, Tesfazghi June 01 2017
2       Achilles Cortel February 24 2017
3         Achraf El khadri April 24 2019

Excel 2:

                             Name
0        Adam Clausen Feb 09 2019
1         Adam Honore Feb 22 2020
2         Adam Honore Feb 22 2020
3         Adam Honore Feb 22 2020

I want to compare each cell in column 1 against every cell in column 2 and find the one with the highest similarity.

I have done some research and:

  1. I know how to load the 2 columns as data frames (input)
  2. I know how to match 2 inputs and get the percentage similarity:

SequenceMatcher code example:

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


x = "Adam Clausen a Feb 09 2019"
y = "Adam Clausen Feb 08 2019"
print(similar(x,y))

Output: 0.92

What I need is a loop that takes the cells from dataframe 1, one by one, and runs SequenceMatcher against the cells in dataframe 2, one by one.

It should then find the best match and, if possible, reorganize the two columns accordingly.

I really hope somebody can help, because I have been struggling with this for 2 weeks now :(

1 answer

  • answered 2020-06-02 11:02 pritam samanta

    If you know how to load the columns as dataframes, this code should get the job done:

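    from difflib import SequenceMatcher

    col_1 = ['potato', 'tomato', 'apple']
    col_2 = ['tomatoe', 'potatao', 'appel']

    def similar(a, b):
        # Return the ratio first so that max() ranks the tuples by similarity.
        ratio = SequenceMatcher(None, a, b).ratio()
        matches = a, b
        return ratio, matches

    # For each entry in col_1, print its best match in col_2.
    for i in col_1:
        print(max(similar(i, j) for j in col_2))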
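
To also cover the "reorganize" part of the question, the same idea can be wrapped in a helper that returns the best match per row instead of printing it. This is only a minimal sketch: `best_matches` and `reordered` are hypothetical names, and the sample lists stand in for the real data. With pandas, something like `df1["Name"].tolist()` and `df2["Name"].tolist()` could be passed in instead, assuming both columns are called `Name` as in the question.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Return the SequenceMatcher similarity ratio (0.0 to 1.0) of two strings."""
    return SequenceMatcher(None, a, b).ratio()


def best_matches(left, right):
    """For each string in `left`, find the most similar string in `right`.

    Hypothetical helper: returns a list of
    (left_value, best_right_value, ratio) tuples, one per entry in `left`.
    """
    results = []
    for a in left:
        # Rank candidates by ratio only; on a tie, max() keeps the first one.
        best = max(right, key=lambda b: similar(a, b))
        results.append((a, best, similar(a, best)))
    return results


# Sample data standing in for the two Excel columns (an assumption).
col_1 = ["potato", "tomato", "apple"]
col_2 = ["tomatoe", "potatao", "appel"]

matches = best_matches(col_1, col_2)
for a, b, ratio in matches:
    print(f"{a!r} -> {b!r} ({ratio:.2f})")

# Second column reordered so that row i holds the best match for col_1[i].
reordered = [b for _, b, _ in matches]
```

Note that this compares every pair, so it does len(col_1) * len(col_2) comparisons, and nothing stops two rows from picking the same best match. If each value in the second column may only be used once, the matching step would need extra logic.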