Trouble Using Chain Rule to Implement Backprop in Numpy

I'm having trouble figuring out how to translate the backpropagation steps in my HW from math into Python/numpy operations.

I'm pretty sure I have the first few backprop steps correct; however, on the third step I know I messed up, because I end up trying to multiply two matrices with shapes (784, 1) and (200, 10).

Here is the code so far:

#imports 
import keras
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

np.random.seed(0)

#loading data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

X = x_train[0].reshape(1,-1)/255.; Y = y_train[0]
zeros = np.zeros(10); zeros[Y] = 1
Y = zeros

# display the digit (stand-in for the show_img helper, which isn't defined here)
plt.imshow(X.reshape(28, 28), cmap='gray'); plt.show()

num_hidden_nodes = 200 
num_classes = 10

# init weights
W1 = np.random.uniform(-1e-3,1e-3,size=(784,num_hidden_nodes))
b1 = np.zeros((1,num_hidden_nodes))

W2 = np.random.uniform(-1e-3,1e-3,size=(num_hidden_nodes,num_classes))
b2 = np.zeros((1,num_classes))

#forward pass

# multiply input with weights
Z1 = np.add(np.matmul(X,W1), b1)

def sigmoid(z):
    return 1 / (1 + np.exp(- z))

def d_sigmoid(g):
    return sigmoid(g) * (1. - sigmoid(g))
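
# quick sanity check: sigmoid(0) = 0.5, so its derivative there should be 0.5 * (1 - 0.5) = 0.25
print(d_sigmoid(0.0))   # 0.25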

# activation function of Z1
X2 = sigmoid(Z1)

Z2 = np.add(np.matmul(X2,W2), b2)

# softmax
def softmax(z):
    # subtracting the max adds numerical stability
    shiftx = z - np.max(z)
    exps = np.exp(shiftx)
    return exps / np.sum(exps)
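
# note: this softmax handles one example at a time; if I ever pass a whole batch of rows
# I think it would need the axis/keepdims arguments, something like this (my guess, untested):
def softmax_batch(z):
    shiftz = z - np.max(z, axis=1, keepdims=True)
    exps = np.exp(shiftz)
    return exps / np.sum(exps, axis=1, keepdims=True)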

def d_softmax(Y_hat, Y):
    return Y_hat - Y

# the hypothesis (the network's predicted probabilities)
Y_hat = softmax(Z2)

cost = -1 * np.sum(Y * np.log(Y_hat))
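
# sanity check: with such tiny initial weights the predictions are roughly uniform,
# so the cost should be a single scalar close to -log(0.1), i.e. about 2.3
print(cost)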

#backprop math
dJ_dZ2 = d_softmax(Y_hat, Y)
dJ_dW2 = np.transpose(X2) * dJ_dZ2
dJ_db2 = dJ_dW2 * d_sigmoid(Z2)
dJ_dX2 = dJ_db2 * W2
dJ_dZ1 = np.transpose(dJ_dX2) * d_sigmoid(Z1)
dJ_dW1 = np.transpose(X) * dJ_dX2
dJ_db1 =  ??
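
To debug, here are the shapes from the forward pass (which I'm fairly sure are right); the problem is somewhere in the backprop lines above:

print(X.shape, W1.shape, Z1.shape)    # (1, 784) (784, 200) (1, 200)
print(X2.shape, W2.shape, Z2.shape)   # (1, 200) (200, 10) (1, 10)
print(Y_hat.shape, Y.shape)           # (1, 10) (10,)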

Here are the backprop steps, each shown with the math representation it's supposed to implement.

Output. Math supposed to be implemented in numpy: here

dJ_dZ2 = d_softmax(Y_hat, Y)
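
I think this first step is OK as written: Y_hat is (1, 10) and the one-hot Y broadcasts against it, so dJ_dZ2 comes out as (1, 10), the same shape as Z2.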

Second layer weights. Math supposed to be implemented in numpy: here

dJ_dW2 = np.transpose(X2) * dJ_dZ2
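
X2 is (1, 200) and dJ_dZ2 is (1, 10), so I suspect this step needs a matrix product rather than an elementwise *. My guess (the _guess names below are just mine, not from the HW):

dJ_dW2_guess = np.matmul(np.transpose(X2), dJ_dZ2)   # (200, 1) @ (1, 10) -> (200, 10), same shape as W2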

Second layer bias. Math supposed to be implemented in numpy: here

dJ_db2 = dJ_dW2 * d_sigmoid(Z2)
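
Since b2 is only added to Z2, I'd guess its gradient is just dJ_dZ2, and that the d_sigmoid term doesn't belong here at all (the output layer uses softmax, not sigmoid), but I'm not sure:

dJ_db2_guess = dJ_dZ2   # (1, 10), same shape as b2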

Second layer input. Math supposed to be implemented in numpy: here

dJ_dX2 = dJ_db2 * W2
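
W2 maps the (1, 200) hidden activations to the (1, 10) output, so going backwards I'd guess a matrix product with W2 transposed:

dJ_dX2_guess = np.matmul(dJ_dZ2, np.transpose(W2))   # (1, 10) @ (10, 200) -> (1, 200), same shape as X2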

First layer activation. Math supposed to be implemented in numpy: here

dJ_dZ1 = np.transpose(dJ_dX2) * d_sigmoid(Z1)
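
Because the sigmoid is applied elementwise, I think this step really is an elementwise product, just with both factors shaped (1, 200):

dJ_dZ1_guess = dJ_dX2_guess * d_sigmoid(Z1)   # (1, 200) * (1, 200) -> (1, 200)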

First layer weights. Math supposed to be implemented in numpy: here

dJ_dW1 = np.transpose(X) * dJ_dX2
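
By analogy with the W2 step, I'd guess the transposed input times the gradient at Z1:

dJ_dW1_guess = np.matmul(np.transpose(X), dJ_dZ1_guess)   # (784, 1) @ (1, 200) -> (784, 200), same shape as W1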

First layer bias. Math supposed to be implemented in numpy: here

dJ_db1 =  ??
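
And by analogy with the b2 step, maybe just:

dJ_db1_guess = dJ_dZ1_guess   # (1, 200), same shape as b1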

I've added my best guesses at the shapes next to each step, but I really don't know whether they actually match the math from my HW, so any help would be appreciated.