Normalized Mutual Information in Tensorflow
Is it possible to implement normalized mutual information (NMI) in Tensorflow, and will I be able to differentiate it? Say I have predictions P and labels Y in two different tensors. Is there an easy way to compute normalized mutual information between them?
I want to do something similar to this:
https://course.ccs.neu.edu/cs6140sp15/7_locality_cluster/Assignment6/NMI.pdf
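A hedged sketch of one way this can work: NMI over hard label assignments is not differentiable, but if P holds soft cluster assignments (rows summing to 1, e.g. softmax outputs) and Y holds one-hot labels, the joint distribution P^T Y / N is differentiable in P and NMI can be built from it. The function name soft_nmi and the symmetric 2·MI/(H(P)+H(Y)) normalization are my choices here, not necessarily the exact variant in the linked notes:

```python
import tensorflow as tf

def soft_nmi(p, y, eps=1e-10):
    """Differentiable NMI between soft assignments p (N, K) and
    one-hot labels y (N, C). Rows of p should sum to 1."""
    n = tf.cast(tf.shape(p)[0], p.dtype)
    joint = tf.matmul(p, y, transpose_a=True) / n          # (K, C) joint distribution
    p_marg = tf.reduce_sum(joint, axis=1, keepdims=True)   # cluster marginal (K, 1)
    y_marg = tf.reduce_sum(joint, axis=0, keepdims=True)   # label marginal (1, C)
    mi = tf.reduce_sum(joint * (tf.math.log(joint + eps)
                                - tf.math.log(p_marg * y_marg + eps)))
    h_p = -tf.reduce_sum(p_marg * tf.math.log(p_marg + eps))
    h_y = -tf.reduce_sum(y_marg * tf.math.log(y_marg + eps))
    return 2.0 * mi / (h_p + h_y + eps)
```

Every op here is a standard differentiable TF op, so gradients flow back into whatever produced p; to use it as a loss you would minimize -soft_nmi(p, y).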
See also questions close to this topic

Keras Lambda Layer Before Embedding: Use to Convert Text to Integers
I currently have a keras model which uses an Embedding layer. Something like this:

    input = tf.keras.layers.Input(shape=(20,), dtype='int32')
    x = tf.keras.layers.Embedding(input_dim=1000, output_dim=50,
                                  input_length=20, trainable=True,
                                  embeddings_initializer='glorot_uniform',
                                  mask_zero=False)(input)
This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.
Two issues:
1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/
2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?

In short, I need to modify the underlying Tensorflow DAG so that when I save my model and upload it to ML Engine, it will be able to handle my sending it raw text.
Thanks!
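One possible sketch, assuming TF 2.x: keep the word-to-integer "state" in a tf.lookup.StaticHashTable captured by a Lambda layer, so the vocabulary lives in the graph rather than in Python-side state. The four-word vocab below is a hypothetical placeholder for a real one:

```python
import tensorflow as tf

# Hypothetical tiny vocabulary; id 0 is reserved for out-of-vocabulary words.
vocab = ["the", "movie", "was", "great"]
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=vocab,
        values=tf.constant(list(range(1, len(vocab) + 1)), dtype=tf.int64)),
    default_value=0)

text_in = tf.keras.layers.Input(shape=(20,), dtype=tf.string)
ids = tf.keras.layers.Lambda(lambda t: table.lookup(t))(text_in)  # strings -> ints
x = tf.keras.layers.Embedding(input_dim=len(vocab) + 1, output_dim=50)(ids)
model = tf.keras.Model(text_in, x)
```

The Embedding layer consumes the integer tensor produced by the Lambda just as it would a normal integer input, so it does not literally have to be the first layer.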

Freezing weight of neural network such that its output takes a particular value at a particular point (tensorflow)
Let's say I have a neural network that looks like this:

    def neural_net(x):
        layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
        layer_1 = tf.nn.sigmoid(layer_1)
        layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
        layer_2 = tf.nn.sigmoid(layer_2)
        out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
        return out_layer
Is there a way in tensorflow to fix the weights in such a way that neural_net(a) always returns b (where a and b are real numbers), e.g., f(1) = 0?
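Rather than freezing weights, one common trick is to reparameterize the output: define g(x) = neural_net(x) − neural_net(a) + b, which returns exactly b at x = a for any weight values, so everything stays trainable. A sketch in TF 2.x with a = 1, b = 0 (matching the f(1) = 0 example), using a small Keras net in place of the weights/biases dictionaries:

```python
import tensorflow as tf

# Small stand-in for the two-hidden-layer network in the question.
inp = tf.keras.Input(shape=(1,))
h = tf.keras.layers.Dense(8, activation="sigmoid")(inp)
h = tf.keras.layers.Dense(8, activation="sigmoid")(h)
out = tf.keras.layers.Dense(1)(h)
net = tf.keras.Model(inp, out)

a, b = 1.0, 0.0  # we want g(a) == b, i.e. g(1) == 0

def g(x):
    # g(a) = net(a) - net(a) + b = b holds for any weight values,
    # so the constraint survives every gradient update.
    return net(x) - net(tf.constant([[a]])) + b
```

Training then minimizes a loss on g instead of net; the constraint holds exactly at every step, at the cost of one extra forward pass per call.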
tf.transform: add preprocessing to Keras model?
I have a keras model for text classification using a tensorflow backend. It currently assumes the input is a numpy array of integers.

I'd like to modify this so that I can train and predict on raw text. From what I've gathered, this involves using tf.transform to convert a tensor of strings into a tensor of integers. I've done this using tf.transform, but now I am unsure how to add this preprocessing step to my model as the very first layer/step. To be clear, my input data looks like this:

    [{"review": "movie is great"}, {"review": "awful film"}]
and the output is:

    [{"review_out": array([1, 1, 1, 0, 2])}, {"review_out": array([1, 1, 1, 3, 4])}]
The function that does this is called preprocess, so I just want to include running preprocess as the first step in my DAG. How might I do this?
For reference, this is important because I want to do live prediction on ML Engine.
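If a recent TF 2.x setup is an option, a TextVectorization layer can stand in for the separate tf.transform preprocess step: it learns the string-to-integer mapping via adapt() and then runs inside the model, so the exported SavedModel accepts raw text at serving time. A sketch (vocabulary size and sequence length are illustrative):

```python
import tensorflow as tf

# Learn the string -> integer mapping from (a sample of) the training text.
vectorize = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_sequence_length=5)
vectorize.adapt(["movie is great", "awful film"])

# The vectorizer is the literal first layer, so the model's input is raw text.
inp = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
ids = vectorize(inp)                                   # (batch, 5) integer ids
x = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)(ids)
model = tf.keras.Model(inp, x)
```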

R parallel computing (chunking) to generate a Gower distance matrix
I have microdata with 50k rows and 10 columns. Since the features are of mixed type, I decided to use Gower distance to generate a dissimilarity matrix. But, as you can imagine, I got an "Error: cannot allocate vector of size X GB" error.
I am just wondering how I can handle this data without pruning it. I read some suggestions such as the ff package, but I could not implement them. Can someone help me, please?
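The chunking idea itself can be sketched outside R (here in Python/NumPy, the language used elsewhere on this page): compute the Gower matrix one block of rows at a time and write each block into a disk-backed memmap, so the full 50k × 50k matrix never has to sit in RAM. The num/cat split and chunk size are placeholders for the real data:

```python
import numpy as np

def gower_block(num_a, num_b, cat_a, cat_b, ranges):
    # Numeric part: |x - y| / column range; categorical part: 0/1 mismatch.
    d_num = np.abs(num_a[:, None, :] - num_b[None, :, :]) / ranges
    d_cat = (cat_a[:, None, :] != cat_b[None, :, :]).astype(float)
    n_feat = num_a.shape[1] + cat_a.shape[1]
    return (d_num.sum(axis=2) + d_cat.sum(axis=2)) / n_feat

def gower_matrix(num, cat, out_path="gower.dat", chunk=500):
    n = num.shape[0]
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0                      # guard constant columns
    out = np.memmap(out_path, dtype="float32", mode="w+", shape=(n, n))
    for i in range(0, n, chunk):                   # one block of rows at a time
        out[i:i + chunk] = gower_block(num[i:i + chunk], num,
                                       cat[i:i + chunk], cat, ranges)
    out.flush()
    return out
```

Each block needs roughly chunk × n × n_features floats of temporary memory, so tune chunk down until it fits; the same row-blocking pattern can be reproduced in R with ff or bigmemory.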

Why find the elbow (or use the L-method) with CH and SIL for selecting the number of clusters?
In this paper, the author uses the CH (Caliński–Harabasz) and SIL (Silhouette) indexes to decide the number of clusters. However, instead of selecting the highest values, he applies the L-method to these indexes, choosing their knee (elbow) points.
In this link there are several subquestions, one of which asks why the authors use the maximal 'stability' of CH to define the number of clusters. However, none of the answers explains that decision.
The maximal 'stability' in that question is related to the L-method, since they choose the points where the changes start to become smallest.
What could be the reason for using the L-method (or maximal stability) with the CH and SIL indexes, which one usually wants to maximize? (I would understand it if they were using the within-cluster sum of squares, for instance.)

Clustering of Categorical Data
I have a dataset (740 participants with 96 variables each) and I'm wondering about the best way to perform a clustering analysis on it.
The goal is to see if subgroups (or clusters) will emerge with similar traits being prominent (such as age, presence of another disease) within the groups.
The variables range from 0 (trait not present) to 9.
I have tried hierarchical clustering in R with no luck.
Thanks! CD
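For what it's worth, if the 0–9 scores are treated as ordinal, one option is hierarchical clustering on Manhattan distances between participants; a sketch in Python/SciPy, with random placeholder data standing in for the real 740 × 96 matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Placeholder for the real data: 740 participants x 96 ordinal traits in 0..9.
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(740, 96))

d = pdist(data, metric="cityblock")              # condensed pairwise distances
z = linkage(d, method="average")                 # average-linkage dendrogram
labels = fcluster(z, t=4, criterion="maxclust")  # cut into (at most) 4 clusters
```

The number of clusters (4 here) is arbitrary; in practice one would inspect the dendrogram or an index such as silhouette before fixing it.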