Do gradients flow through operations performed on TensorFlow variables across session.run calls? Persistent graphs?
My understanding is that TensorFlow variables don't do this. Is there a way to maintain a partially computed graph persistently across session.run calls?
partial_run stores a partially computed graph, but it can only be used once and is not persistent. Variables, on the other hand, are persistent, but as far as I'm aware they do not store the graph of operations that led up to them.
Just to make my question clearer: if I have a matrix of TensorFlow variables and perform some operations on that matrix (say, using assign or scatter_update), would the operations that led up to the new matrix be stored in the computation graph and allow gradients to flow through?
I'm aware this would make TensorFlow far more dynamic than it probably is.
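In TensorFlow 2.x terms this is easy to check directly: assign-style updates are not differentiable ops, so the gradient chain is cut at the variable. A minimal sketch (variable names and values are illustrative):

```python
import tensorflow as tf  # TF 2.x, eager execution

x = tf.Variable(3.0)
v = tf.Variable(0.0)

with tf.GradientTape() as tape:
    v.assign(x * 2.0)  # the assignment itself is not a differentiable op
    y = v * 5.0        # y reads the variable's current value

# the path from x to y is cut at the assignment, so no gradient flows back to x
g_x = tape.gradient(y, x)
```

Here g_x comes back as None, even though y clearly depends on x numerically, which matches the behaviour asked about: the variable holds the value but not the history that produced it.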
See also questions close to this topic

Tensorflow Custom Estimator and tf.contrib.predictor.from_estimator: "The value of a feed cannot be a tf.Tensor object"
I have a custom estimator E1 that uses a tf.contrib.predictor.from_estimator of another custom estimator E2 within its predict operations. The predictor is defined with the following serving input function:
```python
def serving_input_fn():
    inputs_x = tf.placeholder(tf.int64, shape=[batch_size, input_len], name='inputs_x')
    inputs_y = tf.placeholder(tf.int64, shape=[batch_size, input_len], name='inputs_y')
    features = {'inputs_x': inputs_x, 'inputs_y': inputs_y}
    return tf.estimator.export.ServingInputReceiver(features, features)

predictor = tf.contrib.predictor.from_estimator(custom_estimator_model_2, serving_input_fn)
model = CustomEstimator(model_dir, params, predictor=predictor)
```
In the model function of E1, I compute a tensor that needs to be fed into the predictor like this:

```python
feats_x = tf.gather_nd(.....)
feats_y = tf.gather_nd(.....)
inputs = {'inputs_x': feats_x, 'inputs_y': feats_y}
predictor(inputs)
```
However, this fails because a tensor cannot be used as a feed value:
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.
Since all my tensor operations are within the model function, the session is not exposed for me to evaluate these tensors before feeding them. Is there an alternative way to solve this problem? Should I modify my serving_input_fn to not include placeholders? I'd still like to use the predictor because of the performance gain.
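For context, here is a minimal sketch of the general rule the error enforces: a feed value must already be concrete (a numpy array, scalar, etc.), so in situations where a session *is* available, the fix is to evaluate the tensor first and feed the result. The placeholder and tensor here are illustrative stand-ins, not the estimator code above:

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

x = tf1.placeholder(tf.float32, shape=[2], name='x')
y = x * 2.0
t = tf1.constant([1.0, 2.0]) + 1.0  # a tensor we would like to "feed"

with tf1.Session() as sess:
    # sess.run(y, feed_dict={x: t}) would raise the same TypeError;
    # evaluate the tensor to a numpy array first, then feed that
    t_val = sess.run(t)
    out = sess.run(y, feed_dict={x: t_val})
```

Inside a model_fn no session is exposed, which is exactly the asker's problem; one usual way out is to keep everything as graph ops in a single graph rather than round-tripping through a predictor.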

Frozen Graph has worse results than retrained graph
I followed the basic tensorflow for poets (googlecodelabs), but then when trying to freeze the graph I added the following to the end of the main function:
```python
# Initialize all variables
init = tf.global_variables_initializer()
sess.run(init)
saver = tf.train.Saver()
saver.save(sess, './pods.ckpt')
tf.train.write_graph(sess.graph.as_graph_def(), '.', 'pods.pbtxt', as_text=True)

gf = tf.GraphDef()
gf.ParseFromString(open('./tf_files/retrained_graph.pb', 'rb').read())
[print(n.name) for n in gf.node if n.op in ('Softmax', 'Placeholder')]

# freeze the graph
freeze_graph.freeze_graph('pods.pbtxt', "", False, './pods.ckpt',
                          "final_result", "save/restore_all", "save/Const:0",
                          'frozenpods.pb', True, "")
```
After retraining with that added to the end of the main function the last bit of the output looks like this:
```
INFO:tensorflow:2018-09-24 17:39:07.615455: Step 117: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:Final test accuracy = 100.0% (N=28)
INFO:tensorflow:Froze 2 variables.
Converted 2 variables to const ops.
input
final_result
INFO:tensorflow:Restoring parameters from ./pods.ckpt
INFO:tensorflow:Froze 2 variables.
Converted 2 variables to const ops.
560 ops in the final graph.
```
I'm wondering why only 2 variables are frozen to const ops. Also, why are the accuracy scores much lower using the frozen model than with the original generated retrained_graph.pb file? Thanks for your help!
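As a point of comparison, here is a minimal, self-contained sketch of what freezing does, using tf.compat.v1.graph_util.convert_variables_to_constants. The "Froze N variables" count is just the number of tf.Variable objects reachable from the requested output; in a tensorflow-for-poets style retrain that is typically only the final layer's weights and biases, since the Inception base is already constants. The tiny graph below is illustrative, not the poets model:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

g = tf1.Graph()
with g.as_default():
    x = tf1.placeholder(tf.float32, shape=(None, 4), name='input')
    w = tf1.get_variable('final_weights', shape=(4, 3))
    b = tf1.get_variable('final_biases', shape=(3,))
    y = tf1.nn.softmax(tf1.matmul(x, w) + b, name='final_result')
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        # prints "Froze 2 variables. Converted 2 variables to const ops."
        frozen = tf1.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), ['final_result'])

# after freezing there are no variable ops left, only constants
var_ops = [n for n in frozen.node if n.op.startswith('Variable')]
```

So "only 2 variables" is expected whenever the retrain touched only the final layer; an accuracy drop, by contrast, usually points at a mismatch between the checkpoint and graph definitions being frozen, which is worth checking separately.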

Access all tensorflow graphs
How can I get the list or set of all tensorflow graphs? There's tf.get_default_graph(), so presumably a tf.get_all_graphs() should exist, but I can't find it or anything with similar functionality.

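As far as I know there is no public tf.get_all_graphs() in TF 1.x; tf.Graph instances are ordinary Python objects, so one workaround is simply to keep a registry yourself. A sketch, where the registry and helper are made-up names:

```python
import tensorflow as tf

tf1 = tf.compat.v1

# hypothetical registry, maintained by hand
all_graphs = [tf1.get_default_graph()]

def new_graph():
    # create graphs only through this helper so every graph gets registered
    g = tf1.Graph()
    all_graphs.append(g)
    return g

g1 = new_graph()
g2 = new_graph()
```
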
How to penalise my agent for changing its action?
I'm using A3C in TensorFlow. I want to keep my agent using the same action, and only change it when necessary, i.e. when changing would yield a benefit.
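One common way to get this behaviour is reward shaping (not A3C-specific; the switch_cost value below is arbitrary): subtract a small cost whenever the chosen action differs from the previous one, so the agent only switches when the expected gain outweighs the cost.

```python
def shaped_reward(env_reward, action, prev_action, switch_cost=0.05):
    """Penalise the agent for changing its action between steps."""
    penalty = switch_cost if action != prev_action else 0.0
    return env_reward - penalty

# staying with the same action keeps the full reward
same = shaped_reward(1.0, action=2, prev_action=2)
# switching actions pays the switch cost
switched = shaped_reward(1.0, action=3, prev_action=2)
```

Because A3C learns from the reward signal, the penalty propagates into the policy without any change to the network itself; the cost has to be tuned so it doesn't stop the agent from switching when it genuinely should.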

Which functions of OpenCV are mainly used in image analysis, particularly in deep learning?
I want to know which OpenCV functions are used for analysing images, and how, in ways that benefit deep learning models.

How to implement asymmetric structure of LeNet5 in tensorflow?
There is an asymmetric structure from S2 to C3 in LeNet-5, with a corresponding connection table (the LeNet-5 connection table). How can such an architecture be implemented in TensorFlow?
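One way to sketch this in TensorFlow is to build each C3 map from only its connected subset of S2 maps and concatenate the results. The table below shows only the six 3-input columns of the full 16-column LeNet-5 table, so treat it as illustrative:

```python
import tensorflow as tf  # TF 2.x

# first six columns of the LeNet-5 S2->C3 connection table
# (each row: which of the 6 S2 maps feed one C3 map; the other 10 columns omitted)
table = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 0], [5, 0, 1]]

s2 = tf.random.normal([1, 14, 14, 6])  # [batch, H, W, 6 S2 maps]

c3_maps = []
for rows in table:
    subset = tf.gather(s2, rows, axis=-1)                 # keep only the connected inputs
    c3_maps.append(tf.keras.layers.Conv2D(1, 5)(subset))  # one 5x5 conv per C3 map
c3 = tf.concat(c3_maps, axis=-1)                          # [1, 10, 10, len(table)]
```

Each C3 map gets its own small Conv2D over a gathered channel subset, which reproduces the sparse connectivity; the full table would simply extend the loop to all 16 columns.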

Using automatic differentiation on a function that makes use of a preallocated array in Julia
My long subject title pretty much covers it.
I have managed to isolate my much bigger problem in the following contrived example below. I cannot figure out where the problem exactly is, though I imagine it has something to do with the type of the preallocated array?
```julia
using ForwardDiff

function test()
    A = zeros(1_000_000)
    function objective(A, value)
        for i = 1:1_000_000
            A[i] = value[1]
        end
        return sum(A)
    end
    helper_objective = v -> objective(A, v)
    ForwardDiff.gradient(helper_objective, [1.0])
end
```
The error reads as follows:
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##69#71")){Array{Float64,1},getfield(Main, Symbol("#objective#70")){Array{Float64,1}}},Float64},Float64,1})
In my own problem (not described here) I have a function that I need to optimise using Optim and the automatic differentiation it offers, and this function makes use of a big matrix that I would like to preallocate in order to speed up my code. Many thanks.

Why don't C++ compilers do better constant folding?
I'm investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing jacobians. This involves doing some amount of work in the actual residuals, but the majority of the work (based on profiled execution time) is in calculating the jacobians.
This surprised me, since most of the jacobians are propagated forward from 0s and 1s, so the amount of work should be 2-4x the function, not 10-12x. In order to model what a large amount of the jacobian work is like, I made a super minimal example with just a dot product (instead of sin, cos, sqrt and more that would be in a real situation) that the compiler should be able to optimize to a single return value:
```cpp
#include <Eigen/Core>
#include <Eigen/Geometry>

using Array12d = Eigen::Matrix<double, 12, 1>;

double testReturnFirstDot(const Array12d& b)
{
    Array12d a;
    a.array() = 0.;
    a(0) = 1.;
    return a.dot(b);
}
```
Which should be the same as
```cpp
double testReturnFirst(const Array12d& b)
{
    return b(0);
}
```
I was disappointed to find that, without fast-math enabled, neither GCC 8.2, Clang 6, nor MSVC 19 was able to make any optimizations at all over the naive dot product with a matrix full of 0s. Even with fast-math (https://godbolt.org/z/GvPXFy) the optimizations are very poor in GCC and Clang (they still involve multiplications and additions), and MSVC doesn't do any optimizations at all.
I don't have a background in compilers, but is there a reason for this? I'm fairly sure that in a large proportion of scientific computations, being able to do better constant propagation/folding would make more optimizations apparent, even if the constant-fold itself didn't result in a speedup.
While I'm interested in explanations for why this isn't done on the compiler side, I'm also interested for what I can do on a practical side to make my own code faster when facing these kinds of patterns.

Tensorflow: Differentiable Primitives
I was under the impression that all tensorflow primitives are differentiable. Under this "illusion" I wrote this function in the hope that tensorflow would just automatically differentiate it and I could backprop errors through it.
Rankweight function:
```python
def ranked(a):
    lens = tf.convert_to_tensor(tf.range(1, (tf.size(a) + 1)))
    rankw01 = tf.cast(tf.convert_to_tensor(
        tf.contrib.framework.argsort(tf.contrib.framework.argsort(a)) + 1), tf.float64)
    rankw02 = tf.convert_to_tensor(rankw01 - ((tf.size(a) + 1) / 2))
    rankw03 = tf.divide(rankw02, tf.reduce_sum(
        tf.gather(rankw02, tf.where(tf.greater(rankw02, 0)))))
    rankw04 = tf.cast(rankw03, tf.float32)
    return rankw04
```
Unfortunately the function works as expected in the forward pass, but does not work in the backward pass because the derivative does not exist (judging from the error I keep getting).
The function is explained in the attached image:
I have the following questions:
1: Why can't I take the derivative of the function above?
2: If it is an implementation issue, can you suggest how I can rewrite it so that I can take its derivative and backprop errors through it?
3: Are all tensorflow ops differentiable?
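On question 3: no, not all ops are differentiable. Integer-valued ops like argsort, and casts from the resulting integer ranks, carry no gradient, which is exactly what the ranking function above relies on. A quick TF 2.x check (the variable values are arbitrary):

```python
import tensorflow as tf  # TF 2.x, eager execution

x = tf.Variable([3.0, 1.0, 2.0])
with tf.GradientTape() as tape:
    # double-argsort produces integer ranks; this branch has no gradient
    ranks = tf.cast(tf.argsort(tf.argsort(x)) + 1, tf.float32)
    y = tf.reduce_sum(ranks * x)
g = tape.gradient(y, x)
# ranks behaves like a constant, so only the direct x factor contributes to g
```

For x = [3, 1, 2] the ranks are [3, 1, 2], and the gradient comes back as exactly those ranks: the ranking path contributed nothing. Common workarounds are to treat the rank weights as constants (tf.stop_gradient makes this explicit) or to use a smooth, differentiable relaxation of ranking.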

Tensorflow tf.placeholder with shape = []
I am looking at a Tensorflow code that has learning rate input to the graph using placeholder with shape = [], as below:
self.lr_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
I looked at the official documentation page of Tensorflow (https://www.tensorflow.org/api_docs/python/tf/placeholder) to understand what shape=[] would mean, but could not find an explanation for the shape being set to an empty list. Could someone explain what this means?
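shape=[] means a rank-0 tensor, i.e. a single scalar; the feed must then be a plain number, not a list. A small check (TF 1.x API via tf.compat.v1; the learning-rate name mirrors the snippet above):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

lr = tf1.placeholder(tf.float32, shape=[])  # shape [] = scalar (rank 0)
step = lr * 2.0

with tf1.Session() as sess:
    out = sess.run(step, feed_dict={lr: 0.01})
```

By contrast, shape=[1] would be a rank-1 tensor with one element and would require feeding a 1-element list or array; shape=None would accept any shape at all.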

How can I replace a variable with another one in Tensorflow's computation graph?
Problem: I have two pretrained models with variables W1,b1 and W2,b2 saved as numpy arrays.
I want to set a mixture of these two pretrained models as the variables of my model, and only update the mixture weights alpha1 and alpha2 during training.
In order to do that I create two variables alpha1 and alpha2 and load the numpy arrays and create the mixture nodes: W_new, b_new.
I want to replace W and b in the computation graph with W_new and b_new, and then train only the alpha1 and alpha2 parameters via opt.minimize(loss, var_list=[alpha1, alpha2]). I don't know how to substitute W_new and b_new into the computation graph. I tried assigning tf.trainable_variables()[0] = W_new, but this doesn't work. I'd appreciate it if anyone could give me some clues.
Note 1: I don't want to assign values to W and b (this would disconnect the graph from alpha1 and alpha2); I want the mixture of parameters to be part of the graph.
Note 2: You might say that I could compute y using the new variables, but the code here is just a toy sample to simplify things. In reality, instead of linear regression I have several bilstms with a crf, so I can't manually rewrite the formula; I have to replace these variables in the graph.
```python
import tensorflow as tf
import numpy as np

np.random.seed(7)
tf.set_random_seed(7)

# define a linear regression model with 10 params and 1 bias
with tf.variable_scope('main'):
    X = tf.placeholder(name='input', dtype=float)
    y_gold = tf.placeholder(name='output', dtype=float)
    W = tf.get_variable('W', shape=(10, 1))
    b = tf.get_variable('b', shape=(1,))
    y = tf.matmul(X, W) + b
    #loss = tf.losses.mean_squared_error(y_gold, y)

# numpy matrices saved from two different trained models with the exact same architecture
W1 = np.random.rand(10, 1)
W2 = np.random.rand(10, 1)
b1 = np.random.rand(1)
b2 = np.random.rand(1)

with tf.variable_scope('mixture'):
    alpha1 = tf.get_variable('alpha1', shape=(1,))
    alpha2 = tf.get_variable('alpha2', shape=(1,))
    W_new = alpha1 * W1 + alpha2 * W2
    b_new = alpha1 * b1 + alpha2 * b2

all_trainable_vars = tf.trainable_variables()
print(all_trainable_vars)

# replace the original W and b with the new mixture variables in the
# computation graph (**doesn't do what I want**)
all_trainable_vars[0] = W_new
all_trainable_vars[1] = b_new  # this doesn't work

# note that I could just do the computation for y using the new variables as
#   y = tf.matmul(X, W_new) + b_new
# but the problem is this is just a toy example. In the real world, my model has
# a big architecture with several bilstms whose variables I want to replace with
# these new ones.
# Now what I need is to replace the W and b trainable parameters (items 0 and 1
# in all_trainable_vars) with W_new and b_new in the computation graph.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./' + 'graph', sess.graph)
    #print(sess.run([W, b]))
    # give the model 3 samples and predict on them
    print(sess.run(y, feed_dict={X: np.random.rand(3, 10)}))
```
Why do I want to do this?
Assume you have several pretrained models (in different domains) but you don't have access to any of their data.
Then you have a little training data from another domain; on its own it doesn't give you much performance, but if you could train the model jointly with the data you don't have access to, you could get good performance.
Assuming the data is somehow represented in the trained models, we want to learn a mixture of the pretrained models, by learning the mixing coefficients, using little labelled data that we have as supervision.
We don't want to pretrain any parameters; we only want to learn a mix of the pretrained models. What are the mixture weights? We need to learn them from the little supervision that we have.
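One technique that can achieve this without editing the graph after the fact is a custom_getter: intercept get_variable so that wherever the architecture asks for W, it receives the mixture tensor instead, leaving only the alphas trainable. A sketch under those assumptions (the name matching and sizes are illustrative, and random arrays stand in for the saved pretrained weights):

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# pretrained weights loaded from numpy (random stand-ins here)
W1 = np.random.rand(10, 1).astype(np.float32)
W2 = np.random.rand(10, 1).astype(np.float32)

alpha1 = tf1.get_variable('alpha1', initializer=0.5)
alpha2 = tf1.get_variable('alpha2', initializer=0.5)

def mixture_getter(getter, name, *args, **kwargs):
    # whenever the architecture asks for 'W', hand back the mixture tensor
    # instead of creating a fresh trainable variable
    if name.endswith('/W'):
        return alpha1 * W1 + alpha2 * W2
    return getter(name, *args, **kwargs)

with tf1.variable_scope('main', custom_getter=mixture_getter):
    W = tf1.get_variable('W', shape=(10, 1))

X = tf1.placeholder(tf.float32, shape=(None, 10))
y = tf1.matmul(X, W)

trainables = tf1.trainable_variables()  # only alpha1 and alpha2

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    out = sess.run(y, feed_dict={X: np.ones((3, 10), np.float32)})
```

Because the big architecture's own get_variable calls go through the same getter, this scales to the bilstm-with-crf case without rewriting any formulas, and opt.minimize(loss) then updates only the alphas since nothing else is trainable.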