# Finding Euclidean distance from multiple mean vectors

This is what I am trying to do. I was able to do steps 1 to 4, but I need help with steps 5 onward.

Basically, for each data point I would like to find the Euclidean distance from all mean vectors, where the mean vectors are computed per value of the label column `y` (called `class` in the code below).
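For reference, with $p$ numeric feature columns (here `Age` and `weight`; the label column is not a feature), the Euclidean distance between a data point $x$ and the mean vector $\mu_k$ of class $k$ is

$$
d_k(x) = \sqrt{\sum_{j=1}^{p} \left(x_j - \mu_{k,j}\right)^2}
$$

The subtract, square and per-row sum described in this question compute the quantity under the square root, i.e. the squared distance $d_k(x)^2$; take the square root afterwards if you need $d_k$ itself.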

1. Take the data.
2. Separate out the non-numerical columns.
3. Compute the mean vector for each value of the `y` column.
4. Save the means.
5. Subtract each mean vector from each row, based on the `y` value.
6. Square each column.
7. Sum the squared columns for each row.
8. Join the result back to the numerical dataset, and then join the non-numerical columns.
``````
import pandas as pd

data = [['Alex',10,5,0],['Bob',12,4,1],['Clarke',13,6,0],['brke',15,1,0]]
# Cast only the numeric columns to float: recent pandas versions raise an
# error when dtype=float is applied to the string 'Name' column
df = pd.DataFrame(data, columns=['Name','Age','weight','class'])
df = df.astype({'Age': float, 'weight': float, 'class': float})
print(df)
df_numeric = df.select_dtypes(include='number')
df_non_numeric = df.select_dtypes(exclude='number')
means = df_numeric.groupby('class').mean()
``````

For each row of `means`, subtract that row from each row of `df_numeric`. Then square each column of the output and, for each row, add up all the columns. Finally, join this data back to `df_numeric` and `df_non_numeric`.
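A minimal sketch of that procedure, assuming the same four-row example and treating `Age` and `weight` as the only feature columns. The output column names `sq_dist_<class>` are hypothetical. It loops over the rows of `means`, so it works for any number of classes, and it relies on pandas index alignment for the final join.

```python
import pandas as pd

data = [['Alex', 10, 5, 0], ['Bob', 12, 4, 1], ['Clarke', 13, 6, 0], ['brke', 15, 1, 0]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'weight', 'class'])

df_numeric = df.select_dtypes(include='number')
df_non_numeric = df.select_dtypes(exclude='number')

# One row per class, one column per feature (Age, weight)
means = df_numeric.groupby('class').mean()
features = df_numeric[means.columns]

dists = pd.DataFrame(index=df.index)
for label, mean_vec in means.iterrows():
    # Subtract this mean vector from every row (aligned on column names),
    # square each column, then add the columns up row by row
    dists[f'sq_dist_{label}'] = ((features - mean_vec) ** 2).sum(axis=1)

# Join everything back on the shared row index
final = pd.concat([df_non_numeric, df_numeric, dists], axis=1)
print(final)
```

Each `sq_dist_<class>` column holds the squared distance; wrap it in `np.sqrt` if you want the distance itself.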

**Update 1**

I added the code below. My questions have changed; the updated questions are at the end.

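``````
import numpy as np

def calculate_distance(row):
    # Assumed body: mirrors calculate_distance2, using the first mean vector (class 0.0).
    # row[means.columns] keeps only the feature columns, dropping 'class'.
    return np.sum(np.square(row[means.columns] - means.iloc[0]))

def calculate_distance2(row):
    # Squared distance from the row to the last mean vector (class 1.0)
    return np.sum(np.square(row[means.columns] - means.iloc[-1]))

df_numeric2 = df_numeric.drop(columns="class")
df_numeric2['distance0'] = df_numeric.apply(calculate_distance, axis=1)
df_numeric2['distance1'] = df_numeric.apply(calculate_distance2, axis=1)

print(df_numeric2)

final = pd.concat([df_non_numeric, df_numeric2], axis=1)
final["class"] = df["class"]
``````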

Could anyone confirm that this is a correct way to achieve the result? I am mainly concerned about the last two statements. Would the second-to-last statement do a correct join? Would the final statement assign the original `class` values? I would like to confirm that pandas won't do the concat and the class assignment in a random order, and that it will keep the order in which the rows appear.

``````
final = pd.concat([df_non_numeric, df_numeric2], axis=1)
final["class"] = df["class"]
``````
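On the two statements above: both `pd.concat(..., axis=1)` and column assignment match rows by index label, not by position, so the result does not depend on row order as long as every frame keeps the index it inherited from `df`. A small check, using a hypothetical three-row frame whose second piece is deliberately stored in reverse order:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'],
                   'Age': [10.0, 12.0, 13.0],
                   'class': [0.0, 1.0, 0.0]})

left = df[['Name']]
# Same rows as df, but stored in reverse order (index 2, 1, 0)
right = df[['Age']].iloc[::-1]

# concat with axis=1 matches rows by index label, not by position
final = pd.concat([left, right], axis=1)

# Column assignment also aligns on the index
final['class'] = df['class']
print(final)
```

Alignment only goes wrong if one of the pieces has been re-indexed (for example with `reset_index(drop=True)` after sorting or filtering), because then the same label no longer refers to the same row.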

I think this is what you want:

``````
import pandas as pd
import numpy as np

data = [['Alex',10,5,0],['Bob',12,4,1],['Clarke',13,6,0],['brke',15,1,0]]
# Cast only the numeric columns (dtype=float fails on the 'Name' strings in recent pandas)
df = pd.DataFrame(data, columns=['Name','Age','weight','class'])
df = df.astype({'Age': float, 'weight': float, 'class': float})
print(df)
df_numeric = df.select_dtypes(include='number')
# Make df_non_numeric a copy and not a view
df_non_numeric = df.select_dtypes(exclude='number').copy()

# Subtract the mean of each row's class (transform broadcasts each group's
# mean back onto that group's rows, so the number of rows is preserved)
df_dist_to_mean = df_numeric[['Age', 'weight']] - df_numeric[['Age', 'weight', 'class']].groupby('class').transform('mean')
# Finally calculate the Euclidean distance (hypotenuse of the two differences)
df_non_numeric['euc_dist'] = np.hypot(df_dist_to_mean['Age'], df_dist_to_mean['weight'])
df_non_numeric['class'] = df_numeric['class']
# If you want a separate dataframe named 'final' with the end result
df_final = df_non_numeric.copy()
print(df_final)
``````

It is probably possible to write this even more densely, but this way you can see what's going on.
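Note that the code above measures each row's distance to the mean of its own class only. If you need the distance from every row to every class mean, as the question asks, one denser option is NumPy broadcasting. A sketch, assuming `Age` and `weight` are the feature columns; the `dist_<class>` column names are made up for illustration:

```python
import numpy as np
import pandas as pd

data = [['Alex', 10, 5, 0], ['Bob', 12, 4, 1], ['Clarke', 13, 6, 0], ['brke', 15, 1, 0]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'weight', 'class'])

features = ['Age', 'weight']
means = df.groupby('class')[features].mean()   # shape (n_classes, n_features)

X = df[features].to_numpy(dtype=float)         # shape (n_rows, n_features)
M = means.to_numpy()                           # shape (n_classes, n_features)

# (n_rows, 1, n_features) - (1, n_classes, n_features) -> (n_rows, n_classes, n_features)
diff = X[:, None, :] - M[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))        # Euclidean distances, shape (n_rows, n_classes)

dist_df = pd.DataFrame(dist, index=df.index,
                       columns=[f'dist_{c}' for c in means.index])
final = df.join(dist_df)                       # joins on the row index
print(final)
```

The broadcasting step builds the whole row-by-class difference array at once, which is fine for small data but uses memory proportional to rows × classes × features.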

I'm sure there is a better way to do this, but I iterated over the classes and followed your steps exactly:

1. Assigned 'class' as the index.
2. Transposed the data so that 'class' was in the columns.
3. Subtracted each class's mean vector from the matching columns of df_numeric.
4. Squared the values.
5. Summed the squared values for each data point.
6. Concatenated the dataframes back together.

``````
import numpy as np
import pandas as pd

data = [['Alex',10,5,0],['Bob',12,4,1],['Clarke',13,6,0],['brke',15,1,0]]
df = pd.DataFrame(data, columns=['Name','Age','weight','class'])
df = df.astype({'Age': float, 'weight': float, 'class': float})
#print(df)

df_numeric = df.select_dtypes(include='number')
df_non_numeric = df.select_dtypes(exclude='number')

# One column of means per class (rows are Age and weight)
means = df_numeric.groupby('class').mean().T

# Changed index: add 'class' as an extra index level, keeping the original
# row labels so the result can be joined back later
df_by_class = df_numeric.set_index('class', append=True)

# Rotated the numeric data sideways so each column is one data point
df_by_class = df_by_class.T

# Iterated through the classes in means and picked the matching columns
store = []
for j in means:
    # A boolean mask on the 'class' level always returns a DataFrame,
    # even when a class has a single row
    sto = df_by_class.loc[:, df_by_class.columns.get_level_values('class') == j]
    # Subtract this class's mean vector (not the class label) from each column
    store.append(sto.sub(means[j], axis=0))

squared = [s ** 2 for s in store]  # Squaring the values

# Summing Age^2 + weight^2 for each data point (down each column)
summed = [s.sum(axis=0) for s in squared]

# Stack the per-class results and restore the original row order
df_new = pd.concat(summed).droplevel('class').sort_index()

# Concatenated the dataframes back together
final = pd.concat([df_non_numeric, df_numeric, df_new.rename('sq_dist')], axis=1)
print(final)
``````
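If only the distance from each row to its own class mean is needed, the transposing and looping can be avoided by attaching each row's class mean with `DataFrame.join(..., on='class')`. A sketch under the same assumptions (feature columns `Age` and `weight`; the `_mean` suffix and the `sq_dist` column name are arbitrary choices):

```python
import pandas as pd

data = [['Alex', 10, 5, 0], ['Bob', 12, 4, 1], ['Clarke', 13, 6, 0], ['brke', 15, 1, 0]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'weight', 'class'])

features = ['Age', 'weight']
means = df.groupby('class')[features].mean()   # indexed by class

# Attach to every row the mean vector of its own class (matched on 'class');
# overlapping column names from `means` get the '_mean' suffix
with_means = df.join(means, on='class', rsuffix='_mean')

mean_cols = [f'{c}_mean' for c in features]
diff = with_means[features].to_numpy() - with_means[mean_cols].to_numpy()

# Squared distance to the own-class mean; row order of df is unchanged
df['sq_dist'] = (diff ** 2).sum(axis=1)
print(df)
```

Because this is a left join on `df`, the original rows keep their order, so nothing has to be concatenated back afterwards.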