Why does pandas.DataFrame.mean() work but pandas.DataFrame.std() fail on the same data?
I'm trying to figure out why pandas.DataFrame.mean() works on a column whose cells are ndarrays, but pandas.DataFrame.std() fails on the same data. The following is a minimal example.
import numpy as np
import pandas as pd
x = np.array([1,2,3])
y = np.array([4,5,6])
df = pd.DataFrame({"numpy": [x,y]})
df["numpy"].mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].std() #does not work as expected
Out[231]: TypeError: setting an array element with a sequence.
However, if I call the same methods on the underlying .values array, both work:
df["numpy"].values.mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].values.std() #works as expected
Out[233]: array([ 1.5, 1.5, 1.5])
Debug information:
df["numpy"].dtype
Out[235]: dtype('O')
df["numpy"][0].dtype
Out[236]: dtype('int32')
df["numpy"].describe()
Out[237]:
count 2
unique 2
top [1, 2, 3]
freq 1
Name: numpy, dtype: object
df["numpy"]
Out[238]:
0 [1, 2, 3]
1 [4, 5, 6]
Name: numpy, dtype: object
1 answer

Assuming you have the following original DataFrame (containing numpy arrays of the same shape in its cells):
In [320]: df
Out[320]:
  file      numpy
0    x  [1, 2, 3]
1    y  [4, 5, 6]
Convert it to the following format:
In [321]: d = pd.DataFrame(df['numpy'].values.tolist(), index=df['file'])
In [322]: d
Out[322]:
      0  1  2
file
x     1  2  3
y     4  5  6
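For anyone who wants to reproduce the conversion from scratch, here is a minimal self-contained sketch. It assumes every cell holds an array of the same length, and it rebuilds the frame with a "file" column whose labels x and y are taken from the output shown above:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the answer; the "file" labels x and y
# are taken from its printed output.
df = pd.DataFrame({
    "file": ["x", "y"],
    "numpy": [np.array([1, 2, 3]), np.array([4, 5, 6])],
})

# Each cell holds an equal-length array, so the object column can be
# expanded into one numeric column per array position.
d = pd.DataFrame(df["numpy"].tolist(), index=df["file"])

print(d.dtypes)  # numeric dtypes instead of object
print(d)
```

After this step the columns have numeric dtypes rather than object, so the usual reductions apply to them directly.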
Now you are free to use all the Pandas/Numpy/Scipy power:
In [323]: d.sum(axis=1)
Out[323]:
file
x     6
y    15
dtype: int64
In [324]: d.sum(axis=0)
Out[324]:
0    5
1    7
2    9
dtype: int64
In [325]: d.mean(axis=0)
Out[325]:
0    2.5
1    3.5
2    4.5
dtype: float64
In [327]: d.std(axis=0)
Out[327]:
0    2.12132
1    2.12132
2    2.12132
dtype: float64
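Note that the 2.12132 from d.std(axis=0) differs from the 1.5 that df["numpy"].values.std() returned in the question. This is because pandas defaults to the sample standard deviation (ddof=1), while numpy defaults to the population standard deviation (ddof=0). A small sketch of how the two can be reconciled, with the wide frame built directly from the values above:

```python
import numpy as np
import pandas as pd

# Same values as the wide frame d above, built directly for brevity.
d = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["x", "y"])

sample_std = d.std(axis=0)              # pandas default: ddof=1
population_std = d.std(axis=0, ddof=0)  # same convention as numpy's default
numpy_std = d.to_numpy().std(axis=0)    # numpy default: ddof=0

print(sample_std.round(5).tolist())  # [2.12132, 2.12132, 2.12132]
print(population_std.tolist())       # [1.5, 1.5, 1.5]
```

Pass ddof explicitly whenever results from the two libraries need to agree.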