Soft cosine distance between two vectors (Python)

I am wondering if there is a good way to calculate the soft cosine distance between two vectors of numbers. So far I have only found solutions for sentences, which unfortunately did not help me.

Say I have two vectors like this:

a = [0, .25, .25, 0, .5]
b = [.5, 0, 0, .25, .25]

Now, I know that the features in the vectors exhibit some degree of similarity among them. This is described via:

s = [[0,   .67, .25, .78, .53],
     [.53, 0,   .33, .25, .25],
     [.45, .33, 0,   .25, .25],
     [.85, .04, .11, 0,   .25],
     [.95, .33, .44, .25, 0  ]]

So a and b are 1x5 vectors, and s is a 5x5 matrix, describing how similar the features in a and b are.

Now, I would like to calculate the soft cosine distance between a and b, but accounting for between-feature similarity. I found this formula, which should calculate what I need: [image: soft cosine formula]

I already tried implementing it using numpy:

import numpy as np

soft_cosine = 1 - (np.dot(a,np.dot(s,b)) / (np.sqrt(np.dot(a,np.dot(s,b))) * np.sqrt(np.dot(a,np.dot(s,b)))))

It is supposed to produce a number between 0 and 1, with a higher number indicating a greater distance between a and b. However, I am running this on a larger dataframe with multiple a/b vector pairs, and for some of them it produces negative values. Clearly, I am doing something wrong.
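To illustrate, here is the attempt run on the example data above (assuming a, b and s are converted to numpy arrays, with commas between the rows of s). As written, the denominator np.sqrt(x) * np.sqrt(x) is just the numerator x = a·S·b again, so the ratio cancels to 1 and the result is always 0 (or nan when a·S·b is negative):

```python
import numpy as np

a = np.array([0, .25, .25, 0, .5])
b = np.array([.5, 0, 0, .25, .25])
s = np.array([[0,   .67, .25, .78, .53],
              [.53, 0,   .33, .25, .25],
              [.45, .33, 0,   .25, .25],
              [.85, .04, .11, 0,   .25],
              [.95, .33, .44, .25, 0  ]])

x = np.dot(a, np.dot(s, b))  # numerator: a . S . b
# the denominator reuses the exact same quantity, so the ratio cancels to 1
soft_cosine = 1 - x / (np.sqrt(x) * np.sqrt(x))
print(soft_cosine)  # ~0.0 whenever a . S . b > 0, nan when it is negative
```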

Any help is greatly appreciated, and I am happy to clarify whatever needs clarification!

Best, Johannes

1 answer

  • answered 2021-04-21 17:52 RandomGuy

From what I can see, it may just be a formula error. Could you try mine?

    soft_cosine = a @ (s@b) / np.sqrt( (a @ (s@a) ) * (b @ (s@b) ) )
    

I use the @ operator (which, for ndarrays, is shorthand for np.matmul), as I find it cleaner to write: it is just matrix multiplication, whether the operands are 1-D or 2-D. It is also a simple way to compute a dot product between two 1-D arrays, with less code than the usual np.dot function.
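For completeness, a runnable sketch of that formula on the example data from the question (the lists must be numpy arrays for @ to work, and the rows of s need commas between them). This computes the soft cosine *similarity*; subtracting it from 1 gives the distance the question asks for:

```python
import numpy as np

a = np.array([0, .25, .25, 0, .5])
b = np.array([.5, 0, 0, .25, .25])
s = np.array([[0,   .67, .25, .78, .53],
              [.53, 0,   .33, .25, .25],
              [.45, .33, 0,   .25, .25],
              [.85, .04, .11, 0,   .25],
              [.95, .33, .44, .25, 0  ]])

# soft cosine similarity: (a . S . b) / sqrt((a . S . a) * (b . S . b))
soft_cosine = a @ (s @ b) / np.sqrt((a @ (s @ a)) * (b @ (s @ b)))
soft_cosine_distance = 1 - soft_cosine
print(soft_cosine, soft_cosine_distance)
```

One caveat: the similarity is only guaranteed to stay within [0, 1] (and the distance to be non-negative) for non-negative vectors when s is symmetric positive semi-definite, typically with ones on the diagonal (each feature fully similar to itself). The sample s in the question has zeros on its diagonal, so on this data the similarity comes out above 1 and the distance negative, which would match the negative values reported in the question.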