How does error get backpropagated through pooling layers?
I asked a question earlier that might have been too specific, so I'll ask again in more general terms. How does error get propagated backwards through a pooling layer when there are no weights to train? In the TensorFlow video at 6:36 https://www.youtube.com/watch?v=Y_hzMnRXjhI there's a GlobalAveragePooling1D after an Embedding layer. How does the error go backwards?
1 answer

A layer doesn't need to have weights in order to backprop. You can compute the gradients of a global average pool w.r.t. the inputs: it's simply dividing by the number of elements pooled.
It is a bit trickier for max pooling: in that case, you propagate gradients through the pooled indices. That is, during backprop the gradient is "routed" to the input element that contributed the maximum; no gradient is propagated to the other elements.
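A quick way to see both rules (a NumPy sketch of the rules described above, not TensorFlow's actual implementation):

```python
import numpy as np

x = np.array([1.0, 5.0, 2.0, 4.0])   # one pooled window of 4 inputs
g_out = 1.0                           # upstream gradient for the pooled output

# Average pooling: every input receives an equal share, g_out / N.
g_avg = np.full_like(x, g_out / x.size)

# Max pooling: the gradient is routed only to the argmax element (the 5.0);
# all other positions get zero gradient.
g_max = np.zeros_like(x)
g_max[np.argmax(x)] = g_out
```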
See also questions close to this topic

Java array modifies itself so that every element is identical
So I've been attempting to design a neural network for use with a neuroevolutionary algorithm. The main roadblock I am currently hitting is the array holding my population of networks. When I generate them for the first time, they all seem to have randomised weights, as intended, but whenever I reference the array immediately after generating it, it turns out that each element's weights are identical to the others'. This is the snippet of code that confuses me the most:
for (int i = 0; i < size; i++) {
    Dot newDot = new Dot(goal, nodes, connections);
    for (int j = 0; j < newDot.brain.connections.size(); j++) {
        newDot.brain.connections.get(j).randomiseWeight();
    }
    dots.add(newDot);
    System.out.println(i);
    dots.get(i).brain.printNetwork();
}
for (int i = 0; i < dots.size(); i++) {
    System.out.println(i);
    dots.get(i).brain.printNetwork();
}
The Dot class is a wrapper for the network, and the brain class is the network itself. The method printNetwork is what prints the network to the console, which tells me the weights of the connections.
So, when I call printNetwork within the first loop, each network comes back different, as expected, but when I call it again in the second loop, it prints the same values for each dot it loops over. It's not getting stuck on the same item when looping, as the value of i within the second loop ticks up. So the only conclusion I can reach is that the array is deciding that each value is identical, for some reason.
If anyone has any insight as to why this might occur, it would be helpful.
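Since the Dot and brain classes aren't shown, this is only a guess, but the usual cause of that symptom is every array element aliasing one shared mutable object (e.g. a static field, or a connections list passed in and reused by every Dot). A minimal sketch of the pattern, written in Python for brevity:

```python
# Hedged sketch of the usual cause: every element aliases ONE shared object.
shared = {"w": 0.0}                    # stands in for a static/shared field
pop = [shared for _ in range(3)]       # 3 "networks", one underlying object
for i, net in enumerate(pop):
    net["w"] = i                       # "randomise" each one in turn
# Every element now shows the LAST write, because they are the same object:
result = [net["w"] for net in pop]     # [2, 2, 2]
```

If `Dot` (or `brain`) stores its connections in a static field, or all Dots share the `connections` argument without copying it, each "randomisation" overwrites the previous one and the second loop prints identical weights.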

Calculating the total number of weights in a perceptron
I have a perceptron with an input layer of 10 neurons, two hidden layers of 500 neurons each, and an output layer of 3 neurons. The question is: how can I calculate the total number of weights in this model?
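For a fully connected network the count follows directly from the layer sizes: each pair of adjacent layers contributes in × out weights. A quick sketch (including per-neuron biases, which some counts omit):

```python
# Parameter count for a fully connected 10 -> 500 -> 500 -> 3 network.
layers = [10, 500, 500, 3]
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
# 10*500 + 500*500 + 500*3 = 5000 + 250000 + 1500 = 256500
biases = sum(layers[1:])   # one bias per non-input neuron: 500 + 500 + 3 = 1003
total = weights + biases   # 257503 trainable parameters
```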

How to decide on activation function?
Currently there are a lot of activation functions like sigmoid, tanh and ReLU (ReLU being the preferred choice), but I have a question about which considerations go into selecting a particular activation function.
For example: when we want to upsample a network in GANs, we prefer using LeakyReLU.
I am a newbie in this subject, and have not found a concrete solution as to which activation function to use in different situations.
My knowledge up to now:
Sigmoid : When you have a binary class to identify
Tanh : ?
ReLU : ?
LeakyReLU : When you want to upsample
Any help or article will be appreciated.
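For reference, the functions being compared can be written in a few lines (a NumPy sketch; the use-case comments reflect common rules of thumb, not hard rules):

```python
import numpy as np

# The activations discussed above, with typical use cases as comments.
sigmoid    = lambda z: 1.0 / (1.0 + np.exp(-z))     # squashes to (0, 1): binary output layers
tanh       = lambda z: np.tanh(z)                    # zero-centred (-1, 1): recurrent/hidden layers
relu       = lambda z: np.maximum(0.0, z)            # cheap and sparse: default hidden-layer choice
leaky_relu = lambda z: np.where(z > 0, z, 0.01 * z)  # small negative slope avoids "dead" ReLU units
```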

Mean_iou in tensorflow not updating/resulting in correct value
I have implemented a version of UNET in tensorflow, trying to identify buildings from satellite images. The implementation is working and is giving promising results regarding the classification. All the metrics seem to be working correctly except mean_iou. Regardless of the hyperparameters and the images chosen from the dataset, the mean_iou is always the same. The value is identical to 15 decimal places after each epoch.
The precision and recall values are considerably higher than mean_iou and closer to what should be expected, so it seems that something is not working as intended.
As I am relatively new to tensorflow, the error might be something totally different, but I am here to learn. All feedback will be greatly appreciated.
Here is the relevant code and printout from the training of the model.
import numpy as np
import tensorflow as tf
from unet_model import build_unet
from data import load_dataset, tf_dataset
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger, EarlyStopping

model_types = ['segnetmaster', 'unetmaster', 'simpler', 'evensimpler']

if __name__ == "__main__":
    """ Hyperparameters """
    dataset_path = "buildingsegmentation"
    input_shape = (64, 64, 3)
    batch_size = 20
    model = 3
    epochs = 5
    res = 64
    lr = 1e-3
    model_path = f"unet_models/unet_{epochs}_epochs_{res}.h5"
    csv_path = f"csv/data_unet_{epochs}_{res}.csv"

    """ Load the dataset """
    (train_images, train_masks), (val_images, val_masks) = load_dataset(dataset_path)
    train_dataset = tf_dataset(train_images, train_masks, batch=batch_size)
    val_dataset = tf_dataset(val_images, val_masks, batch=batch_size)

    model = build_unet(input_shape)
    model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Adam(lr),
        metrics=[
            tf.keras.metrics.MeanIoU(num_classes=2),
            tf.keras.metrics.IoU(num_classes=2, target_class_ids=[0]),
            tf.keras.metrics.Recall(),
            tf.keras.metrics.Precision()
        ]
    )

    callbacks = [
        ModelCheckpoint(model_path, monitor="val_loss", verbose=1),
        ReduceLROnPlateau(monitor="val_loss", patience=10, factor=0.1, verbose=1),
        CSVLogger(csv_path),
        EarlyStopping(monitor="val_loss", patience=10)
    ]

    train_steps = len(train_images) // batch_size
    if len(train_images) % batch_size != 0:
        train_steps += 1
    test_steps = len(val_images) // batch_size
    if len(val_images) % batch_size != 0:
        test_steps += 1

    model.fit(
        train_dataset,
        validation_data=val_dataset,
        epochs=epochs,
        steps_per_epoch=train_steps,
        validation_steps=test_steps,
        callbacks=callbacks
    )
epoch  loss                 lr     mean_io_u            precision           recall              val_loss             val_mean_io_u        val_precision        val_recall
0      0.41137945652008057  0.001  0.37184661626815796  0.695444643497467   0.5243006944656372  0.87176513671875     0.37157535552978516  0.38247567415237427  0.9118495583534241
1      0.3461640477180481   0.001  0.37182655930519104  0.7579150795936584  0.6075601577758789  0.3907579183578491   0.37157535552978516  0.8406943082809448   0.5024654865264893
2      0.3203786611557007   0.001  0.37182655930519104  0.7694798707962036  0.6599727272987366  0.3412915766239166   0.37157535552978516  0.6986522674560547   0.7543279528617859
...    (epochs 3-49 omitted; mean_io_u remains ~0.371827 and val_mean_io_u exactly 0.37157535552978516 on every row)
50     0.08616402745246887  0.001  0.37182655930519104  0.9334553480148315  0.9252204298973083  0.2387685328722      0.37157535552978516  0.8806527256965637   0.811405599117279
51     0.0846954956650734   0.001  0.37182655930519104  0.9345796704292297  0.9265674352645874  0.22581790387630463  0.37157535552978516  0.8756505846977234   0.8313769698143005
ValueError: Exception encountered when calling layer "max_pooling2d_9" (type MaxPooling2D)
I have encountered a problem while building a VGG16 model for computer vision; the problem seemed to be connected with the last layers and the softmax activation.
Code:
model_2 = Sequential([
    Conv2D(input_shape=X_train.shape[1:], filters=64, kernel_size=(3, 3), padding='same'),
    Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'),
    MaxPool2D(strides=(2, 2)),
    Conv2D(128, 3, padding='same', activation='relu'),
    Conv2D(128, 3, padding='same', activation='relu'),
    MaxPool2D(strides=(2, 2)),
    Conv2D(256, kernel_size=(3, 3), padding='same', activation='relu'),
    Conv2D(256, 3, padding='same', activation='relu'),
    Conv2D(256, 3, padding='same', activation='relu'),
    MaxPool2D(strides=(2, 2)),
    Conv2D(512, 3, padding='same', activation='relu'),
    Conv2D(512, 3, padding='same', activation='relu'),
    Conv2D(512, 3, padding='same', activation='relu'),
    MaxPool2D(strides=(2, 2)),
    Conv2D(512, 3, padding='same', activation='relu'),
    Conv2D(512, 3, padding='same', activation='relu'),
    Conv2D(512, 3, padding='same', activation='relu'),
    MaxPool2D(strides=(2, 2)),
    Flatten(),
    Dense(10, activation='softmax')
])
Error:
> 21     Dense(10, activation='softmax')
> 22 ])
in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
   1937   except errors.InvalidArgumentError as e:
   1938     # Convert to ValueError for backwards compatibility.
-> 1939     raise ValueError(e.message)
   1940
   1941   return c_op
ValueError: Exception encountered when calling layer "max_pooling2d_9" (type MaxPooling2D).
Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling2d_9/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](Placeholder)' with input shapes: [?,1,1,512].
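Context for this kind of error: each stride-2 MaxPool halves the spatial size, so with small inputs the last pool receives a map that is too small to pool. A sketch of the arithmetic, assuming (hypothetically) 16×16 inputs, which would be consistent with the [?,1,1,512] shape in the trace:

```python
# Track the spatial size through stride-2 'valid' max-pools,
# assuming (hypothetically) a 16x16 input image.
size, sizes = 16, [16]
for _ in range(4):
    size //= 2            # each 2x2, stride-2 pool halves the map
    sizes.append(size)
# sizes is [16, 8, 4, 2, 1]: the fifth MaxPool then receives a 1x1 map,
# and 1 - 2 < 0 produces the "Negative dimension size" error.
```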

Kernel Size for 3D Convolution
The kernel size of 3D convolution is defined using depth, height and width in Pytorch or TensorFlow. For example, if we consider a CT/MRI image data with 300 slices, the input tensor can be (1,1,300,128,128), corresponding to (N,C,D,H,W). Then, the kernel size can be (3,3,3) for depth, height and width. When doing 3D convolution, the kernel is passed in 3 directions.
However, I was confused about what changes when we move from CT/MRI to colour video. Suppose the video has 300 frames; then the input tensor will be (1,3,300,128,128) because of the 3 RGB channels. I know that for a single RGB image the kernel size can be 3×3×3 for channels, height and width. But when it comes to video, it seems both Pytorch and Tensorflow still use depth, height and width to set the kernel size. My question is: if we still use a kernel of (3,3,3), is there a potential fourth dimension for the colour channels?
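One way to see what the (3,3,3) kernel really contains: the stored filter has an extra channel axis that spans all input channels and is summed out rather than slid over. A minimal NumPy sketch of what a 3D convolution does for one output channel (hypothetical shapes, mirroring the (N,C,D,H,W) layout above):

```python
import numpy as np

def conv3d_single(x, w):
    """Naive 3D convolution for one output channel.
    x: (C, D, H, W) input; w: (C, kD, kH, kW) filter.
    The kernel only *slides* along D, H, W; the channel axis is summed out,
    so channels are not a fourth sliding dimension."""
    C, D, H, W = x.shape
    _, kd, kh, kw = w.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for d in range(out.shape[0]):
        for h in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[d, h, j] = np.sum(x[:, d:d+kd, h:h+kh, j:j+kw] * w)
    return out

x = np.random.rand(3, 6, 8, 8)   # RGB clip: 3 channels, 6 frames, 8x8 frames
w = np.random.rand(3, 3, 3, 3)   # a "3x3x3" kernel still carries a channel axis
out = conv3d_single(x, w)        # shape (4, 6, 6)
```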

Simple linear regression code leads to unreasonable cost function
I am trying to write a simple linear regression code from scratch as my first machine learning exercise. However, when I try to perform gradient descent over a for loop, my code produces unreasonable results (a crazy-high cost function and coefficients). This is clearly wrong, since the cost function should decrease over the steps. Can someone point out where my code is going wrong? Here is a code example:
import numpy as np
import pdb

x = np.arange(1, 11, 1)  # simple feature values
y0 = 2 + 4*x  # some true values for the coefficients
y_rand = np.random.uniform(low=-3, high=3, size=len(x))  # adding some randomness in the data
y = y0 + y_rand

def linear_regression(x, y, a0, a1, learning_rate, iterate):
    m = len(x)  # length of inputs
    J = []  # cost history
    for i in range(iterate):
        temp0 = a0
        temp1 = a1
        y_theory = temp0 + temp1*x
        J.append(np.sum((y_theory - y)**2) / 2 / m)  # cost function
        da0 = learning_rate * np.sum(y_theory - y) / m
        da1 = learning_rate * np.sum((y_theory - y)*x) / m
        a0 = temp0 - da0  # updating the intercept
        a1 = temp1 - da1  # updating the slope
        pdb.set_trace()
    return (a0, a1, J)
When I call
(a0,a1,J) = linear_regression(x,y,1,1, 0.1,25)
after the for loop finishes, a0, a1, and J all become unreasonably large (numbers like 1e24 for the cost function).

Linear Regression Stochastic Gradient Descent
I am trying to fit a sinusoidal wave (sin(2 pi x)) with some gaussian noise added to it. I am using the stochastic gradient descent algorithm, and the model I am trying to fit is linear in the parameters. I have used a simple basis function of
[1 x^1 x^2 ... x^5]
. The loss function is least-squares loss.

def gradient_descent(phi, Y, W, a):
    N = len(Y)
    for i in range(N):
        dE_dW = (np.matmul(np.array([W]), np.array([phi[i]]).T)[0][0] - Y[i]) * phi[i]
        W = W - a * dE_dW
    return W
For sampling I am doing this,
noise_sample = np.random.normal(loc=0, scale=0.07, size=sample_size)
for i in range(sample_size):
    x = random.uniform(0.0, 0.5)
    y = sin(x)
    X.append(x), Y.append(y)
X, Y = np.array(X), np.array(Y)
permutation = np.random.permutation(sample_size)
X, Y = X[permutation], Y[permutation]
Y = np.add(Y, noise_sample)
order = 5
phi = np.array([np.ones(sample_size)]).T
for i in range(order):
    phi = np.c_[phi, X ** (i + 1)]
W = np.random.uniform(low=0.0, high=1.0, size=(order+1,))
I am getting this as the fitted curve in this case (orange).
When I try for the same degree using the closed form solution,
phi_inv = np.matmul(np.linalg.inv(np.matmul(phi.T, phi)), phi.T)
weights = np.matmul(phi_inv, Y.T)
I am getting the desired curve. Is there something I am doing wrong?

What is the tradeoff between batch size and number of layers to train a neural network?
I am reading a paper about making training efficient. The paper assumes a high number of layers combined with a very small batch size. Is that a good decision?

Compute gradients across two layers using gradients calculated from a previous layer using tf.gradients or tf.GradientTape
I want to use the gradients of one layer to calculate the gradients of the layer that comes before it.
My motivation for doing this: when I tried model parallelism using tf.device, I found that backpropagation was running on the CPU. The entire backprop started running on a chosen tf.device only after I wrapped the gradient computation (the call to GradientTape) in the tf.device context manager. Since the model is split, I want the backprop of each partition to execute on the device where that partition is placed.
Ideally, I would like to find out a method with which this oversimplified pseudocode is possible.
with tf.device(device_3):
    grad_3 = tf.gradients(loss, trainable_vars_of_partition_3)
with tf.device(device_2):
    grad_2 = tf.gradients(grad_3, trainable_vars_of_partition_2)
with tf.device(device_1):
    grad_1 = tf.gradients(grad_2, trainable_vars_of_partition_1)
grads = concat(grad_1, grad_2, grad_3)
If something like this exists then I would be overjoyed if you could point me in the right direction.
Unfortunately, I could not find something as simple as this. The next best approach that I could think of was using the gradients of one layer to find the gradients of a layer that comes before it. Using chain rule and backpropagation, I feel that this should be possible.
I created this toy example, solving which is the first step towards the final goal.
Let's say we have a model with 3 dense layers without activation functions. x and y are defined as follows:
x = tf.concat([tf.random.uniform([1, 10], minval=0, maxval=0.25),
               tf.random.uniform([1, 10], minval=0.25, maxval=0.5),
               tf.random.uniform([1, 10], minval=0.5, maxval=0.75),
               tf.random.uniform([1, 10], minval=0.75, maxval=1.),
               ], axis=0)
y = tf.constant(0., shape=[4, 1])
d1 = tf.keras.layers.Dense(5, name='d1')
d2 = tf.keras.layers.Dense(2, name='d2')
d3 = tf.keras.layers.Dense(1, name='d3')
I am using a tf.function in this toy example but an answer with eager mode enabled, using GradientTape will also be appreciated.
@tf.function
def tf_func(x, y, d1, d2, d3):
    # Using short forms of these functions helped the code look neater and more readable to me.
    g = tf.gradients
    rs = tf.reduce_sum
    rm = tf.reduce_mean

    o1 = d1(x)
    o2 = d2(o1)
    o3 = d3(o2)
    l = tf.reduce_mean(tf.square(o3 - y))
    w3, w2, w1 = d3.trainable_variables, d2.trainable_variables, d1.trainable_variables

    tf.print('actual grads' + '=' * 80)
    dl_dw3 = g(l, w3)
    dl_dw2 = g(l, w2)
    tf.print('dl_dw2: \n', dl_dw2)
    dl_dw1 = g(l, w1)
    tf.print()
    tf.print()

    tf.print('reference grads' + '=' * 80)
    dl_do1 = g(l, o1)
    dl_do2 = g(l, o2)
    tf.print('dl_do2: \n', dl_do2)
    dl_do3 = g(l, o3)
    dl_dw1 = g(l, w1)
    dl_dw2 = g(l, w2)
    dl_dw3 = g(l, w3)
    do3_o2 = g(o3, o2)
    do2_do1 = g(o2, o1)
    do3_w3 = g(o3, w3)
    do2_w2 = g(o2, w2)
    do1_w1 = g(o1, w1)

    tf.print('testing chain_rule method' + '=' * 80)
    # Added a 't' before derivatives to differentiate between ref_grads
    # and grads obtained using the chain rule
    tdl_do3 = g(l, o3)    # same as ref_grads
    tdo3_dw3 = g(o3, w3)  # same as ref_grads
    tdl_dw3 = [rm(tdl_do3) * tdo3_dw3[0], rm(tdl_do3) * tdo3_dw3[1]]  # same as actual grads
    tdo3_do2 = g(o3, o2)  # same as ref_grads
    tdl_do2 = tdo3_do2 * rm(tdl_do3, axis=0)  # same as ref_grads
    tf.print('tdl_do2: \n', tdl_do2)
    tdo2_dw2 = g(o2, w2)
    tf.print('tdo2_dw2: \n', tdo2_dw2)
    tdl_dw2 = [tdo2_dw2[0] * rm(tdl_do2, axis=[1]),
               tdo2_dw2[1] * rm(tdl_do2, axis=[1])]
    tf.print('tdl_dw2: \n', tdl_dw2)
    return None

tf_func(x, y, d1, d2, d3)
The output was:
actual grads================================================================================
dl_dw2:
 [[[3.04819393 1.30051827]
   [5.02123785 2.14232159]
   [0.260933906 0.111328]
   [5.87596226 2.50699162]
   [1.9655633 0.838611722]],
  [4.69162369 2.0016911]]

reference grads================================================================================
dl_do2:
 [[[0.43842113 0.187053293]
   [0.889310718 0.379426271]
   [1.41650343 0.604354143]
   [1.94738865 0.830857456]]]

testing chain_rule method================================================================================
tdl_do2:
 [[[0.43842113 0.187053293]
   [0.889310718 0.379426271]
   [1.41650343 0.604354143]
   [1.94738865 0.830857456]]]
tdo2_dw2:
 [[[2.10966444 2.10966444]
   [3.48670244 3.48670244]
   [0.22972326 0.22972326]
   [3.95618558 3.95618558]
   [1.3790133 1.3790133]],
  [4 4]]
tdl_dw2:
 [[[2.47443795 1.05572414]
   [4.08957386 1.74482536]
   [0.26944378 0.114958748]
   [4.64023352 1.97976542]
   [1.61745286 0.690089643]],
  [[4.69162369 2.0016911]]]
For some reason, gradients wrt weights in tdl_dw2 and dl_dw2 differ slightly. Every value in tdl_dw2 is slightly less than dl_dw2 even though the gradients wrt biases are the same. I cannot figure out why.
The gradient of loss wrt to w3 is as expected.
I used tf.reduce_mean to replicate what tf.gradients was doing internally as far as I understand. Please correct me if I am wrong.
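For activation-free dense layers, the aggregation between layers is a matrix product rather than a reduce_mean: dl/do2 = dl/do3 @ W3.T and dl/dW2 = o1.T @ dl/do2. A hedged NumPy sketch (hypothetical random data, same shapes as the toy model) that checks this bookkeeping against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))
y = np.zeros((4, 1))
W1, b1 = rng.normal(size=(10, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)
W3, b3 = rng.normal(size=(2, 1)), np.zeros(1)

def loss(W2_):
    o1 = x @ W1 + b1
    o2 = o1 @ W2_ + b2
    o3 = o2 @ W3 + b3
    return np.mean((o3 - y) ** 2)

# Chain rule with matrix products (no reduce_mean between layers):
o1 = x @ W1 + b1
o2 = o1 @ W2 + b2
o3 = o2 @ W3 + b3
dl_do3 = 2.0 * (o3 - y) / o3.size   # d mean((o3-y)^2) / d o3
dl_do2 = dl_do3 @ W3.T              # propagate through layer 3
dl_dW2 = o1.T @ dl_do2              # gradient w.r.t. layer-2 weights

# Check one entry against a finite difference of the loss:
eps = 1e-6
W2p = W2.copy()
W2p[0, 0] += eps
fd = (loss(W2p) - loss(W2)) / eps
assert abs(fd - dl_dW2[0, 0]) < 1e-3 * max(1.0, abs(dl_dW2[0, 0]))
```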
From Tensorflow's documentations:
gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs.
tf.gradients constructs symbolic derivatives of sum of ys w.r.t. x in xs.
Any guidance or help will be greatly appreciated, thank you.
Some Similar StackOverflow questions(there are many more):
- Compute gradients across two models
- Is it possible to acquire an intermediate gradient? (Tensorflow)
- Breaking TensorFlow gradient calculation into two (or more) parts
Here is a colab notebook with the code: https://colab.research.google.com/drive/1034hu6Zo766spKu5qfeG4c2Yv2vDGM?usp=sharing

Gradient of a Neural Network
I have a trained neural network that predicts the noise of a given image. Now I want to use it to calculate a subgradient of my NN (w.r.t. the norm of the output).
I want to use this in a larger algorithm, but since I could not get it to work as expected, I created this minimal example.
model = load_model(os.path.join(nn_path, 'NN' + name + '.h5'))  # trained NN
y0_tensor = np.load(os.path.join(data_path, 'fbp_val' + name + '.npy'))[0]  # typical input of the NN
max_iter = 15
alpha = 0.45
s = 0.2
for iters in range(max_iter):
    with tf.GradientTape() as tape:
        tape.watch(y0_tensor)
        prediction = model(y0_tensor)
        norm = tf.norm(prediction)
    grad = tape.gradient(norm, y0_tensor)  # d norm / d y0_tensor
    y0_tensor = y0_tensor - s * alpha * grad
I would expect it to become more and more similar to the output of the NN with an increasing number of iterations, but it just seems to add noise to it.
Note that I am not fixed on using GradientTape. I also tried to use keras.backend.gradients, with no success.
For more background information, here is what I am trying to do:
Note that the subgradient of the regularization term can be evaluated by standard software for network training with the backpropagation algorithm. (Source: https://iopscience.iop.org/article/10.1088/13616420/ab6d57, chapter 4.2)

How to get backpropagation of bias?
My goal is to calculate backpropagation (especially the backpropagation of the bias).
For example, X, W and B are NumPy arrays, such as [[0,0],[0,1]], [[5,5,5],[10,10,10]] and [1,2,3] respectively. And suppose dL/dT is [[1,2,3],[4,5,6]].
- The formula for dL/dW is given in the figure below. Why is it calculated that way?
- How do you calculate dL/dB? The answer should be [5, 7, 9]. Why is it calculated that way?
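Assuming the layer is T = X·W + B (a standard dense layer, inferred from the shapes), both gradients follow from the chain rule: W multiplies X, so its gradient accumulates X over the batch, while B is broadcast to every batch row, so its gradient is the column-sum of dL/dT. A NumPy sketch with the arrays above:

```python
import numpy as np

X = np.array([[0, 0], [0, 1]])             # (batch=2, in=2)
W = np.array([[5, 5, 5], [10, 10, 10]])    # (in=2, out=3)
B = np.array([1, 2, 3])                    # (out=3)
dL_dT = np.array([[1, 2, 3], [4, 5, 6]])   # upstream gradient, same shape as T

# dL/dW: each weight W[i, j] multiplies X[:, i] for output j, so the
# gradient sums X[:, i] * dL_dT[:, j] over the batch -> X.T @ dL_dT.
dL_dW = X.T @ dL_dT        # [[0, 0, 0], [4, 5, 6]]

# dL/dB: B[j] is added to every batch row of T[:, j], so its gradient
# is the sum of dL_dT over the batch axis.
dL_dB = dL_dT.sum(axis=0)  # [5, 7, 9]
```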