# Trouble Using Chain Rule to Implement Backprop in Numpy

I'm having trouble translating the backpropagation steps in my homework from math into Python/numpy operations.

I'm pretty sure I have the first few backprop steps correct; however, on a later step I know I messed up, because I end up trying to multiply two matrices sized (784, 1) and (200, 10).
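A minimal reproduction of just the shape mismatch, with no model code (the shapes are the ones from my error):

```python
import numpy as np

a = np.zeros((784, 1))
b = np.zeros((200, 10))

try:
    # elementwise * tries to broadcast (784, 1) against (200, 10),
    # and 784 vs 200 in the leading axis cannot be broadcast
    a * b
except ValueError as e:
    print(e)
```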

Here is the code so far:

```python
# imports
import keras
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

np.random.seed(0)

(x_train, y_train), (x_test, y_test) = mnist.load_data()

X = x_train[0].reshape(1,-1)/255.; Y = y_train[0]
zeros = np.zeros(10); zeros[Y] = 1
Y = zeros

# show_img(X, Y)  # helper that displays the image; not defined in this snippet

num_hidden_nodes = 200
num_classes = 10

# init weights
W1 = np.random.uniform(-1e-3,1e-3,size=(784,num_hidden_nodes))
b1 = np.zeros((1,num_hidden_nodes))

W2 = np.random.uniform(-1e-3,1e-3,size=(num_hidden_nodes,num_classes))
b2 = np.zeros((1,num_classes))

#forward pass

# multiply input with weights
Z1 = np.dot(X, W1) + b1  # (1, 200)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def d_sigmoid(g):
    return sigmoid(g) * (1. - sigmoid(g))

# activation function of Z1
X2 = sigmoid(Z1)  # (1, 200)

# softmax
def softmax(z):
    # subtracting the max adds numerical stability
    shiftx = z - np.max(z)
    exps = np.exp(shiftx)
    return exps / np.sum(exps)

def d_softmax(Y_hat, Y):
    return Y_hat - Y

# the hypothesis
Z2 = np.dot(X2, W2) + b2  # (1, 10)
Y_hat = softmax(Z2)

cost = -1 * np.sum(Y * np.log(Y_hat))

#backprop math
dJ_dZ2 = d_softmax(Y_hat, Y)
dJ_dW2 = np.transpose(X2) * dJ_dZ2
dJ_db2 = dJ_dW2 * d_sigmoid(Z2)
dJ_dX2 = dJ_db2 * W2
dJ_dZ1 = np.transpose(dJ_dX2) * d_sigmoid(Z1)
dJ_dW1 = np.transpose(X) * dJ_dX2
dJ_db1 =  ??

```
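For reference, here is a quick sanity check I can run on the softmax helper (using the same definition as above), so at least that part of the forward pass is verified: the outputs should all be positive and sum to 1.

```python
import numpy as np

def softmax(z):
    # subtracting the max adds numerical stability
    shiftx = z - np.max(z)
    exps = np.exp(shiftx)
    return exps / np.sum(exps)

p = softmax(np.array([[1.0, 2.0, 3.0]]))
print(p.sum())  # should be (numerically) 1.0
```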

Here are the backprop steps, each with the math it is supposed to implement (shown in the linked images) alongside my numpy attempt.

Output (math to be implemented in numpy shown here):

```python
dJ_dZ2 = d_softmax(Y_hat, Y)
```

Second layer weights (math to be implemented in numpy shown here):

```python
dJ_dW2 = np.transpose(X2) * dJ_dZ2
```

Second layer bias (math to be implemented in numpy shown here):

```python
dJ_db2 = dJ_dW2 * d_sigmoid(Z2)
```

Second layer input (math to be implemented in numpy shown here):

```python
dJ_dX2 = dJ_db2 * W2
```

First layer activation (math to be implemented in numpy shown here):

```python
dJ_dZ1 = np.transpose(dJ_dX2) * d_sigmoid(Z1)
```

First layer weights (math to be implemented in numpy shown here):

```python
dJ_dW1 = np.transpose(X) * dJ_dX2
```

First layer bias (math to be implemented in numpy shown here):

```python
dJ_db1 = ??
```

I really have no idea how to convert this math to numpy; any help would be appreciated.
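In case it helps, here are the shapes of everything in my forward pass, in a self-contained sketch that mirrors the code above (the arrays here are just zero placeholders of the same sizes):

```python
import numpy as np

X = np.zeros((1, 784))              # one flattened MNIST image
W1 = np.zeros((784, 200)); b1 = np.zeros((1, 200))
W2 = np.zeros((200, 10));  b2 = np.zeros((1, 10))

Z1 = X @ W1 + b1                    # (1, 200)
X2 = 1 / (1 + np.exp(-Z1))          # (1, 200)
Z2 = X2 @ W2 + b2                   # (1, 10)

for name, a in [("Z1", Z1), ("X2", X2), ("Z2", Z2)]:
    print(name, a.shape)
```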