sklearn - Which model to use for finding matching pairs

I am working on a project and I want to check whether two pieces of data fit together.

My idea is to use Python and sklearn. I need to predict whether list A (100 entries) and list B (also 100 entries) fit together, and the model should tell me how likely it is that they fit.

I am pretty new to ML, and I am not sure which model(s) would be the best to try or how to structure the data for training.

Would it be better to have 200 inputs that map to a 0 (don't fit) or a 1 (fit), or would it be better to have 100 inputs mapping to 100 outputs?
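To make the first option concrete, here is a minimal sketch of the "200 inputs to 0/1" setup. Everything about the data is an assumption for illustration: the halves are synthetic, a fitting B is simply a noisy copy of A, and the non-fitting examples are made by pairing each A with another row's B. The element-wise difference `|A - B|` is also added as an extra feature, because a linear model cannot compare the two halves from the raw concatenation alone. `predict_proba` then gives the "how likely do they fit" score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, n_features = 2000, 100

# Synthetic stand-in data (assumption): a fitting B is a noisy copy of A.
A = rng.normal(size=(n_pairs, n_features))
B_match = A + 0.1 * rng.normal(size=A.shape)
# Shift rows by one so every A is paired with a B that belongs to another A.
B_mismatch = np.roll(B_match, 1, axis=0)


def pair_features(a, b):
    """Concatenate both halves (200 values) plus their absolute difference."""
    return np.hstack([a, b, np.abs(a - b)])


X = np.vstack([pair_features(A, B_match), pair_features(A, B_mismatch)])
y = np.concatenate([np.ones(n_pairs, dtype=int), np.zeros(n_pairs, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Probability of class 1 ("fit") for each held-out pair.
fit_probability = clf.predict_proba(X_test)[:, 1]
accuracy = clf.score(X_test, y_test)
```

On real data the non-fitting examples have to be constructed in a similar way (pairing pieces that are known not to belong together), and a non-linear model may be needed if "fitting" is more complicated than being close.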

With 100 inputs and 100 outputs I would (at least to my understanding) try to predict the second half of the pair from the first half. I could then check how similar each of the possible candidates is to the prediction and select one based on that. But I am not sure if that is a good approach...
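The second option could be sketched roughly as follows. This assumes, purely for illustration, that the second half is a noisy linear function of the first half (the matrix `W` and all data are synthetic). A multi-output `Ridge` regression predicts the 100 values of the second half, and `NearestNeighbors` picks the candidate closest to that prediction.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n_train, n_test, n_features = 1000, 200, 100

# Hypothetical relationship between the halves: B = A @ W + noise.
W = rng.normal(size=(n_features, n_features)) / np.sqrt(n_features)


def make_pairs(n):
    a = rng.normal(size=(n, n_features))
    b = a @ W + 0.1 * rng.normal(size=(n, n_features))
    return a, b


A_train, B_train = make_pairs(n_train)
A_test, B_test = make_pairs(n_test)

# 100 inputs -> 100 outputs: predict the second half from the first half.
reg = Ridge(alpha=1.0).fit(A_train, B_train)
B_pred = reg.predict(A_test)

# Shuffle the true second halves so the candidate order carries no information.
order = rng.permutation(n_test)
candidates = B_test[order]

# For each prediction, select the most similar candidate (Euclidean distance).
nn = NearestNeighbors(n_neighbors=1).fit(candidates)
_, idx = nn.kneighbors(B_pred)
chosen = order[idx[:, 0]]  # index into B_test of the selected candidate

top1_accuracy = np.mean(chosen == np.arange(n_test))
```

One limitation of this variant is that the distance to the nearest candidate is not a probability, and two first halves can select the same candidate.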

Basically, I want to give the model e.g. 100k pieces and have it find the 50k matching pairs.
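Finding all pairs at once can be treated as an assignment problem: build a cost for every (first half, second half) combination and pick the one-to-one matching with the lowest total cost. The sketch below assumes the pieces are already split into first and second halves, uses synthetic data, and uses Euclidean distance as a stand-in cost; with a trained classifier the cost could instead be something like one minus the predicted fit probability. It relies on `scipy.optimize.linear_sum_assignment`, which is from SciPy rather than sklearn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n_pairs, n_features = 300, 100

# Synthetic pieces (assumption): each second half is a noisy copy of its first half,
# stored at a shuffled position so the row order gives nothing away.
first_halves = rng.normal(size=(n_pairs, n_features))
true_partner = rng.permutation(n_pairs)
second_halves = np.empty_like(first_halves)
second_halves[true_partner] = first_halves + 0.1 * rng.normal(size=first_halves.shape)

# cost[i, j]: how badly first half i fits second half j (lower is better).
cost = cdist(first_halves, second_halves)

# One-to-one matching that minimises the total cost.
rows, cols = linear_sum_assignment(cost)

recovered = np.mean(cols == true_partner)
```

A full cost matrix for 50k by 50k pieces has 2.5 billion entries, so at that scale the candidates would probably need to be narrowed down first (for example with a nearest-neighbour search) before solving the assignment.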

1 answer

  • answered 2022-02-20 04:31 PKS

    Since you are doing ML, I assume you are using pandas. You need to combine the data frames:

    frames = [df1, df2]

    Here's a more detailed example:

    import pandas as pd

    df1 = pd.DataFrame(
        {
            "A": ["A0", "A1", "A2", "A3"],
            "B": ["B0", "B1", "B2", "B3"],
            "C": ["C0", "C1", "C2", "C3"],
            "D": ["D0", "D1", "D2", "D3"],
        },
        index=[0, 1, 2, 3],
    )
    df2 = pd.DataFrame(
        {
            "A": ["A4", "A5", "A6", "A7"],
            "B": ["B4", "B5", "B6", "B7"],
            "C": ["C4", "C5", "C6", "C7"],
            "D": ["D4", "D5", "D6", "D7"],
        },
        index=[4, 5, 6, 7],
    )

    Then collect them in a list like this:

    frames = [df1, df2]

    You can combine the frames into a single data frame (result) using

    result = pd.concat(frames)

    The output will be a single data frame containing the rows of df1 followed by the rows of df2 (index 0 to 7, columns A to D).
