Using the cosine_similarity function in Python

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[3,4],[2,5],[1,2],[1,2],[4,5]])

ap = pd.DataFrame(a, index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'],columns=['search_history','view_count'])
ap

b = np.array([[4,4],[3,5],[2,1],[4,7],[1,2]])
bp = pd.DataFrame(b, index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'],columns=['comment + wishlist','signup'])
bp

Then I call the cosine_similarity function:

from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b),columns=['A','B'], index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'])

This gives:

ValueError: Shape of passed values is (5, 5), indices imply (5, 2)

So I changed it like this:

from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b),columns=['A','B','c','d','e'], index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'])

This runs and produces a result.

But it is not the result I expected. Like DataFrames a and b, I want the result to have five rows and two columns, but I always get five rows and five columns.

What should I do?

The expected result was:

                  A         B
Sonata     0.989949  0.994692
Etudes     0.919145  0.987241
Waltzes    0.948683  0.997054
Nocturnes  0.948683  0.997054
Marches    0.993884  0.990992

1 answer

  • answered 2022-05-02 10:20 Guy

    cosine_similarity() compares every row of the first array with every row of the second array, which gives 5 * 5 comparisons and a 5 x 5 result. You want just the first two columns, so you can slice the resulting DataFrame:

    df = pd.DataFrame(cosine_similarity(a, b), columns=['A', 'B', 'C', 'D', 'E'], index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'])
    print(df[['A', 'B']])  # by column names
    # or
    print(df.iloc[:, 0:2])  # by column positions
    

    Output

                      A         B
    Sonata     0.989949  0.994692
    Etudes     0.919145  0.987241
    Waltzes    0.948683  0.997054
    Nocturnes  0.948683  0.997054
    Marches    0.993884  0.990992
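
As a rough sketch of what the 5 x 5 result contains, the snippet below labels the columns with the row names of b instead of letters, which shows that columns 'A' and 'B' above are the similarities to b's 'Sonata' and 'Etudes' rows. It also slices the array before building the DataFrame, so only two column labels are needed. The variable names (`names`, `sim`, `full`, `first_two`, `manual`) are illustrative and not part of the original question or answer.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

names = ['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches']
a = np.array([[3, 4], [2, 5], [1, 2], [1, 2], [4, 5]])
b = np.array([[4, 4], [3, 5], [2, 1], [4, 7], [1, 2]])

# One similarity per (row of a, row of b) pair, so the shape is (5, 5).
sim = cosine_similarity(a, b)

# Label the columns with b's row names to show what each entry compares.
full = pd.DataFrame(sim, index=names, columns=names)

# Slice the array first, so only two column labels are needed.
first_two = pd.DataFrame(sim[:, :2], index=names, columns=['A', 'B'])

# Check one entry by hand: a's 'Sonata' row against b's 'Sonata' row.
manual = a[0] @ b[0] / (np.linalg.norm(a[0]) * np.linalg.norm(b[0]))

print(sim.shape)  # (5, 5)
print(full.round(6))
print(first_two.round(6))
print(round(float(manual), 6))
```

If the full table is more useful than two columns, `full` can be kept as-is, since its labels make clear which row of a is compared with which row of b.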
    
