tf.profiler.profile reports total_float_ops as 0 for a frozen TensorFlow graph
System information
OS Platform: Linux Ubuntu 16.04
TensorFlow version (use command below): 1.8
Python version: 3.5
CUDA/cuDNN version: CUDA 9 / cuDNN 7
GPU model and memory: GeForce GTX 1080Ti
Exact command to reproduce:
import cv2
import time
import tensorflow as tf
from tensorflow.python.framework import graph_util

ModelFile = "OX_Predict_frozen.pb"

def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

import math
M = math.pow( 10, 6 )
print(M)

def log_FLOP():
    # ***** (3) Load frozen graph *****
    g2 = load_pb(ModelFile)
    with g2.as_default():
        flops = tf.profiler.profile(g2, options = tf.profiler.ProfileOptionBuilder.float_operation())
        print('FLOP after freezing(M): ', flops.total_float_ops/ M)

def main():
    log_FLOP()

if __name__ == "__main__":
    main()
Describe the problem:
I'm trying to log the number of multiply-add operations (MACs) in my network with "tf.profiler.profile".
Here is the model (.pb).
The model works perfectly for prediction, but "tf.profiler.profile" always returns 0 FLOPs for it.
Any suggestions?
Source code / logs: the sample code that produces the .pb is here: https://github.com/ChiFang/TensorFlow_XO_example
It takes only a few seconds to run.
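As a sanity check that does not depend on the profiler, the cost of a convolution layer can be counted by hand. The sketch below assumes a standard 2-D convolution and the common convention that one multiply-add (MAC) counts as two floating-point operations. The layer sizes are made up for illustration and are not taken from OX_Predict_frozen.pb. Note that tf.profiler reports float operations rather than MACs, and as far as I know its per-op estimates rely on fully defined tensor shapes, so an input placeholder with an unknown dimension is one plausible (unconfirmed) reason for a result of 0.

```python
def conv2d_macs(out_h, out_w, kernel_h, kernel_w, in_channels, out_channels):
    """Multiply-adds for one image through a standard 2-D convolution (bias ignored)."""
    return out_h * out_w * kernel_h * kernel_w * in_channels * out_channels


def macs_to_flops(macs):
    """Assumed convention: one multiply-add counts as two float operations."""
    return 2 * macs


# Hypothetical layer: 28x28 output, 3x3 kernel, 1 input channel, 32 filters.
macs = conv2d_macs(28, 28, 3, 3, 1, 32)
flops = macs_to_flops(macs)

print("MACs (M): ", macs / 1e6)
print("FLOPs (M):", flops / 1e6)
```

Summing this over every convolution and dense layer gives a number to compare against whatever the profiler eventually reports.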
See also questions close to this topic
-
How to map a dataframe df (having two columns) with another dataframe df1(having three columns) and update it
I have a df1 with values

   0 1
0  abc def unknown
1  uvw xyz unknown
2  cricket ball unknown
3  tennis racket unknown

And df2 with values

       0 0 1
0      abc def password
1      cricket ball password1
2      tennis racket password2
---------- ---------- ----------
22600  uvw xyz password3

I want to match df1 against df2 on the values in column 0 and update column 1 in df1.
Output should be

   0 1
0  abc def | password
1  uvw xyz | password3
2  cricket ball | password1
3  tennis racket | password2
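One way to express this kind of lookup in pandas is to turn df2 into a key-to-value Series and map it onto df1. The sketch below assumes that column 0 holds a single string key such as "abc def" and column 1 holds the value to update; the flattened tables above do not make the exact column layout certain, so treat the frame construction as hypothetical.

```python
import pandas as pd

# Hypothetical reconstruction: column 0 is the key, column 1 the value.
df1 = pd.DataFrame({
    0: ["abc def", "uvw xyz", "cricket ball", "tennis racket"],
    1: ["unknown"] * 4,
})
df2 = pd.DataFrame({
    0: ["abc def", "cricket ball", "tennis racket", "uvw xyz"],
    1: ["password", "password1", "password2", "password3"],
})

# Build a lookup Series indexed by the key column of df2.
lookup = df2.set_index(0)[1]

# Map each key in df1; keep the old value where df2 has no match.
df1[1] = df1[0].map(lookup).fillna(df1[1])

print(df1)
```

If the key actually spans two columns, a `merge` on both columns would be the closer fit; the mapping approach works only when the key column in df2 is unique.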
-
Create multi line byte string in Python 3.7
How do I create a multiline byte string in Python? I want to send a multiline string in Python 3.7 using
socket.send()
I tried the following:
Case 1
strng = """foo
bar"""
byte_str = strng.encode()
When I print byte_str, the output is
"foo\nbar"
Case 2
byte_str = b"""foo
bar"""
When I print byte_str, the output is
"foo\nbar"
In both cases the newline is displayed as '\n'.
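The '\n' seen in the output is how Python displays the newline byte (value 10) when it prints the repr of a bytes object; the data itself already contains a real line break. A small check, using only the standard library:

```python
# A triple-quoted bytes literal keeps the line break as a single newline byte.
byte_str = b"""foo
bar"""

# The same bytes written with an explicit escape and encoded from str.
same_bytes = "foo\nbar".encode()

# Splitting on line boundaries shows there really are two lines.
lines = byte_str.splitlines()

print(repr(byte_str))      # the repr shows the newline as an escape sequence
print(byte_str.decode())   # decoding and printing shows two separate lines
```

Either form can be passed to a socket as-is, since what gets sent is the underlying bytes rather than their printed representation.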
-
Relative XPath Wrongly Selects Same Element in Loop
I'm scraping some data.
One of the data points I need is the date, but the table cells containing this data only include months and days. Luckily the year is used to categorize the tables.
Actual Output
For some reason the following code is selecting the same preceding-sibling::(h3 or h2)/span).text element for every iteration.

# random, hypothetical values
Page #1
element="921"
element="921"
element="921"
...
Page #2
element="1283"
element="1283"
element="1283"
...

Expected Output
I would expect the following script to select the unique preceding-sibling::/span).text element relative to each //h2/following::table element as it loops through all of them, but this isn't the case.

# random, hypothetical values
Page #1
element="921"
element="922"
element="923"
...
Page #2
element="1283"
element="1284"
element="1285"
...

How come the following code selects the same element for every iteration on each page?

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By

links_sc2 = [
    'https://liquipedia.net/starcraft2/Premier_Tournaments',
    'https://liquipedia.net/starcraft2/Major_Tournaments',
    'https://liquipedia.net/starcraft2/Minor_Tournaments',
    'https://liquipedia.net/starcraft2/Minor_Tournaments/HotS',
    'https://liquipedia.net/starcraft2/Minor_Tournaments/WoL'
]

ff = webdriver.Firefox(executable_path=r'C:\\WebDriver\\geckodriver.exe')
urls = []

for link in links_sc2:
    tables = ff.find_elements(By.XPATH, '//h2/following::table')
    for table in tables:
        try:
            # premier, major
            year = table.find_element(By.XPATH, './preceding-sibling::h3/span').text
        except:
            # minor
            year = table.find_element(By.XPATH, './preceding-sibling::h2/span').text
        print(year)

ff.quit()
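A likely explanation is that `./preceding-sibling::h3/span` matches every earlier h3 sibling, and taking the first match returns them in document order, so each table gets the first heading on the page. Adding the positional predicate `[1]` on the reverse axis picks the nearest preceding sibling instead. The sketch below demonstrates this with lxml on a made-up HTML fragment (not the Liquipedia markup); Selenium's `find_element` is assumed to behave the same way, since it also returns the first match in document order.

```python
from lxml import html

# Hypothetical page: each table is preceded by its own year heading.
page = html.fromstring("""
<div>
  <h3><span>2019</span></h3><table id="a"></table>
  <h3><span>2018</span></h3><table id="b"></table>
  <h3><span>2017</span></h3><table id="c"></table>
</div>
""")

tables = page.xpath("//table")

# Without a predicate: first match in document order, identical for every table.
first_match = [t.xpath("./preceding-sibling::h3/span")[0].text for t in tables]

# With [1]: the closest preceding h3 for each table.
nearest = [t.xpath("./preceding-sibling::h3[1]/span")[0].text for t in tables]

print(first_match)
print(nearest)
```

In the Selenium script the equivalent change would be `./preceding-sibling::h3[1]/span` (and likewise for h2).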
-
Is concatenated matrix multiplication faster than multiple non-concatenated matmul? If so, why?
The definition of the LSTM cell involves 4 matrix multiplications with the input and 4 matrix multiplications with the output. We can simplify the expression into a single matrix multiply by concatenating the 4 small weight matrices (the resulting matrix is 4 times larger).
My question is: does this improve the efficiency of the matrix multiplication? If so, why? Is it because the weights can be placed in contiguous memory? Anything else?
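The equivalence itself is easy to verify numerically. The sketch below, with made-up sizes, shows that one multiplication by the concatenated weight matrix yields the same four gate pre-activations as four separate multiplications. On the performance side, the usual argument (stated here as a general expectation, not a measurement) is that one large multiply reads the input once and incurs fewer call or kernel-launch overheads than several small ones.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_hidden = 8, 16, 32   # hypothetical sizes

x = rng.standard_normal((batch, n_in))
weights = [rng.standard_normal((n_in, n_hidden)) for _ in range(4)]

# Four separate matrix multiplications, one per gate.
separate = [x @ w for w in weights]

# One multiplication with the weights concatenated along the output axis.
fused = x @ np.concatenate(weights, axis=1)

# Split the fused result back into the four gate blocks and compare.
gates = np.split(fused, 4, axis=1)
matches = all(np.allclose(a, b) for a, b in zip(separate, gates))
print(fused.shape, matches)
```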
-
VGG16 Tensorflow implementation does not learn on cifar-10
This VGGNet was implemented from scratch in TensorFlow, with all of the layers defined in the code. The main problem I am facing is that the training accuracy, not to mention the validation accuracy, does not go up even when I let training run for a decent amount of time. There are a few issues that I suspect are causing this. First, I think the network is too deep and wide for the CIFAR-10 dataset. Second, batches are not drawn exhaustively from the dataset, i.e. examples are sampled again and again from the whole dataset without excluding those already selected in the ongoing epoch.
However, I still could not get this code to work after many hours and days of experiments.
I wish I could extract the problematic section of code to ask about, but since I cannot pinpoint it, let me upload my whole code.
import os
import sys
import tensorflow as tf
import numpy as np
import scipy as sci
import math
import matplotlib.pyplot as plt
import time
import random
import imageio
import pickle
import cv2
import json
from pycocotools.coco import COCO


class SVGG:
    def __init__(self, num_output_classes):
        self.input_layer_size = 0
        self.num_output_classes = num_output_classes

        # Data
        self.X = []
        self.Y = []
        self.working_x = []
        self.working_y = []
        self.testX = []
        self.testY = []

        # hard coded for now. Have to change.
        self.input_data_size = 32  # 32 X 32
        self.input_data_size_flat = 3072  # 32 X 32 X 3 == 3072
        self.num_of_channels = 3  # 3 for colour image
        self.input_data_size = 32  # 32 X 32
        self.input_data_size_flat = self.input_data_size * self.input_data_size  # 32 X 32 X 3 == 3072
        self.num_of_channels = 3  # 3 for colour image

        self.convolution_layers = []
        self.convolution_weights = []
        self.fully_connected_layers = []
        self.fully_connected_weights = []

    def feed_examples(self, input_X, input_Y):
        """
        Feed examples to be learned
        :param input_X: Training dataset X
        :param input_Y: Traning dataset label
        :return:
        """
        # Take first input and calculate its size
        # hard code size
        self.X = input_X
        self.Y = input_Y
        self.input_data_size_flat = len(self.X[0]) * len(self.X[0][0]) * len(self.X[0][0][0])

    def feed_test_data(self, test_X, test_Y):
        self.testX = test_X
        self.testY = test_Y

    def run(self):
        x = tf.placeholder(tf.float32, [None, self.input_data_size_flat], name='x')
        x_data = tf.reshape(x, [-1, self.input_data_size, self.input_data_size, 3])
        y_true = tf.placeholder(tf.float32, [None, self.num_output_classes], name='y_true')
        y_true_cls = tf.argmax(y_true, axis=1)

        """ VGG layers """
        # Create layers
        ######################################## Input Layer ########################################
        input_layer, input_weight = self.create_convolution_layer(x_data, num_input_channels=3, filter_size=3, num_filters=64, use_pooling=True)  # False

        ######################################## Convolutional Layer ########################################
        ############### Conv Layer 1 #################
        conv_1_1, w_1_1 = self.create_convolution_layer(input=input_layer, num_input_channels=64, filter_size=3, num_filters=64, use_pooling=False)
        conv_1_2, w_1_2 = self.create_convolution_layer(input=conv_1_1, num_input_channels=64, filter_size=3, num_filters=128, use_pooling=True)

        ############### Conv Layer 2 #################
        conv_2_1, w_2_1 = self.create_convolution_layer(input=conv_1_2, num_input_channels=128, filter_size=3, num_filters=128, use_pooling=False)
        conv_2_2, w_2_2 = self.create_convolution_layer(input=conv_2_1, num_input_channels=128, filter_size=3, num_filters=256, use_pooling=True)

        ############### Conv Layer 3 #################
        conv_3_1, w_3_1 = self.create_convolution_layer(input=conv_2_2, num_input_channels=256, filter_size=3, num_filters=256, use_pooling=False)
        conv_3_2, w_3_2 = self.create_convolution_layer(input=conv_3_1, num_input_channels=256, filter_size=3, num_filters=256, use_pooling=False)
        conv_3_3, w_3_3 = self.create_convolution_layer(input=conv_3_2, num_input_channels=256, filter_size=3, num_filters=512, use_pooling=True)

        ############### Conv Layer 4 #################
        conv_4_1, w_4_1 = self.create_convolution_layer(input=conv_3_3, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=False)
        conv_4_2, w_4_2 = self.create_convolution_layer(input=conv_4_1, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=False)
        conv_4_3, w_4_3 = self.create_convolution_layer(input=conv_4_2, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=True)

        ############### Conv Layer 5 #################
        conv_5_1, w_5_1 = self.create_convolution_layer(input=conv_4_3, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=False)
        conv_5_2, w_5_2 = self.create_convolution_layer(input=conv_5_1, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=False)
        conv_5_3, w_5_3 = self.create_convolution_layer(input=conv_5_2, num_input_channels=512, filter_size=3, num_filters=512, use_pooling=True)

        layer_flat, num_features = self.flatten_layer(conv_5_3)

        ######################################## Fully Connected Layer ########################################
        fc_1 = self.create_fully_connected_layer(input=layer_flat, num_inputs=num_features, num_outputs=4096)
        fc_2 = self.create_fully_connected_layer(input=fc_1, num_inputs=4096, num_outputs=4096)
        fc_3 = self.create_fully_connected_layer(input=fc_2, num_inputs=4096, num_outputs=self.num_output_classes, use_dropout=False)

        # Normalize prediction
        y_prediction = tf.nn.softmax(fc_3)

        # The class-number is the index of the largest element
        y_prediction_class = tf.argmax(y_prediction, axis=1)

        # Cost-Fuction to be optimized
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=fc_3, labels=y_true)

        # => Now we have a measure of how well the model performs on each image individually. But in order to use the
        # Cross entropy to guide the optimization of the model's variables we need a single value, so we simply take the
        # Average of the cross-entropy for all the image classifications
        cost = tf.reduce_mean(cross_entropy)

        # Optimizer
        optimizer_adam = tf.train.AdamOptimizer(learning_rate=0.002).minimize(cost)

        # Performance measure
        correct_prediction = tf.equal(y_prediction_class, y_true_cls)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        total_iterations = 0
        num_iterations = 100000
        start_time = time.time()

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(num_iterations):
                x_batch, y_true_batch, _ = self.get_batch(X=self.X, Y=self.Y, low=0, high=40000, batch_size=128)
                feed_dict_train = {x: x_batch, y_true: y_true_batch}
                sess.run(optimizer_adam, feed_dict_train)

                if i % 100 == 99:
                    # Calculate the accuracy on the training-set.
                    x_batch, y_true_batch, _ = self.get_batch(X=self.X, Y=self.Y, low=40000, high=50000, batch_size=1000)
                    feed_dict_validate = {x: x_batch, y_true: y_true_batch}
                    acc = sess.run(accuracy, feed_dict=feed_dict_validate)

                    # Message for printing.
                    msg = "Optimization Iteration: {0:>6}, Training Accuracy: {1:>6.1%}"
                    # print(sess.run(y_prediction, feed_dict=feed_dict_train))
                    # print(sess.run(y_prediction_class, feed_dict=feed_dict_train))
                    print(msg.format(i + 1, acc))

                if i % 10000 == 9999:
                    oSaver = tf.train.Saver()
                    oSess = sess
                    path = "./model/_" + "iteration_" + str(i) + ".ckpt"
                    oSaver.save(oSess, path)

                if i == num_iterations - 1:
                    x_batch, y_true_batch, _ = self.get_batch(X=self.testX, Y=self.testY, low=0, high=10000, batch_size=10000)
                    feed_dict_test = {x: x_batch, y_true: y_true_batch}
                    test_accuracy = sess.run(accuracy, feed_dict=feed_dict_test)
                    msg = "Test Accuracy: {0:>6.1%}"
                    print(msg.format(test_accuracy))

    def get_batch(self, X, Y, low=0, high=50000, batch_size=128):
        x_batch = []
        y_batch = np.ndarray(shape=(batch_size, self.num_output_classes))
        index = np.random.randint(low=low, high=high, size=batch_size)
        counter = 0
        for idx in index:
            x_batch.append(X[idx].flatten())
            y_batch[counter] = one_hot_encoded(Y[idx], self.num_output_classes)
            y_batch_cls = Y[idx]
            counter += 1
        return x_batch, y_batch, y_batch_cls

    def generate_new_weights(self, shape):
        w = tf.Variable(tf.truncated_normal(shape, stddev=0.05))
        return w

    def generate_new_biases(self, shape):
        b = tf.Variable(tf.constant(0.05, shape=[shape]))
        return b

    def create_convolution_layer(self, input, num_input_channels, filter_size, num_filters, use_pooling):
        """
        :param input: The previous layer
        :param num_input_channels: Number of channels in previous layer
        :param filter_size: W and H of each filter
        :param num_filters: Number of filters
        :return:
        """
        shape = [filter_size, filter_size, num_input_channels, num_filters]
        weights = self.generate_new_weights(shape)
        biases = self.generate_new_biases(num_filters)

        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')
        layer += biases

        # Max Pooling
        if use_pooling:
            layer = tf.nn.max_pool(layer, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

        # ReLu. Using elu for better performance
        layer = tf.nn.elu(layer)
        return layer, weights

    def create_fully_connected_layer(self, input, num_inputs, num_outputs, use_dropout=True):
        weights = self.generate_new_weights(shape=[num_inputs, num_outputs])
        biases = self.generate_new_biases(shape=num_outputs)
        layer = tf.matmul(input, weights) + biases
        layer = tf.nn.elu(layer)
        if use_dropout:
            keep_prob = tf.placeholder(tf.float32)
            keep_prob = 0.5
            layer = tf.nn.dropout(layer, keep_prob)
        return layer

    def flatten_layer(self, layer):
        """
        Flattens dimension that is output by a convolution layer.
        Flattening is need to feed into a fully-connected-layer.
        :param layer:
        :return:
        """
        # shape [num_images, img_height, img_width, num_channels]
        layer_shape = layer.get_shape()

        # Number of features h x w x channels
        num_features = layer_shape[1: 4].num_elements()

        # Reshape
        layer_flat = tf.reshape(layer, [-1, num_features])

        # Shape is now [num_images, img_height * img_width * num_channels]
        return layer_flat, num_features


def unpickle(file):
    with open(file, 'rb') as file:
        dict = pickle.load(file, encoding='bytes')
    return dict


def convert_to_individual_image(flat):
    img_R = flat[0:1024].reshape((32, 32))
    img_G = flat[1024:2048].reshape((32, 32))
    img_B = flat[2048:3072].reshape((32, 32))
    # B G R
    mean = [125.3, 123.0, 113.9]
    img = np.dstack((img_R - mean[0], img_G - mean[1], img_B - mean[2]))
    img = np.array(img)
    # img = cv2.resize(img, (224, 224), img)
    return img


def read_coco_data(img_path, annotation_path):
    coco = COCO(annotation_path)
    ids = list(coco.imgs.keys())
    ann_keys = list(coco.anns.keys())
    print(coco.imgs[ids[0]])
    print(coco.anns[ann_keys[0]])


def one_hot_encoded(class_numbers, num_classes=None):
    if num_classes is None:
        num_classes = np.max(class_numbers) + 1
    return np.eye(num_classes, dtype=float)[class_numbers]


if __name__ == '__main__':
    data = []
    labels = []
    val_data = []
    val_label = []

    # cifar-10
    counter = 0
    for i in range(1, 6):
        unpacked = unpickle("./cifar10/data_batch_" + str(i))
        tmp_data = unpacked[b'data']
        tmp_label = unpacked[b'labels']
        inner_counter = 0
        for flat in tmp_data:
            converted = convert_to_individual_image(flat)
            data.append(converted)
            labels.append(tmp_label[inner_counter])
            counter += 1
            inner_counter += 1
            cv2.imwrite("./img/" + str(counter) + ".jpg", converted)

    # Test data
    unpacked = unpickle("./cifar10/test_batch")
    test_data = []
    test_data_flat = unpacked[b'data']
    test_label = unpacked[b'labels']
    for flat in test_data_flat:
        test_data.append(convert_to_individual_image(flat))

    svgg = SVGG(10)
    svgg.feed_examples(input_X=data, input_Y=labels)
    svgg.feed_test_data(test_X=test_data, test_Y=test_label)
    svgg.run()
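Regarding the second suspicion (batches drawn with replacement), a common alternative is to shuffle the example indices once per epoch and walk through them in order, so every example is seen exactly once per epoch. The sketch below is a standalone illustration with a hypothetical helper name `epoch_batches`; it is not a drop-in replacement for `get_batch` and is not claimed to be the cause of the stalled accuracy.

```python
import numpy as np


def epoch_batches(num_examples, batch_size, rng):
    """Yield index arrays that together cover each example exactly once."""
    order = rng.permutation(num_examples)          # fresh shuffle for this epoch
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]      # last batch may be smaller


rng = np.random.default_rng(0)
batches = list(epoch_batches(num_examples=10, batch_size=4, rng=rng))
seen = np.concatenate(batches)

print([len(b) for b in batches])
print(sorted(seen.tolist()))
```

The index arrays can then be used to gather the images and one-hot labels for each training step.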
-
How to use tensorflow-gpu (with retrain.py) with Nvidia GPU card on a Windows 10 PC?
I have a setup where retrain.py works fine with TensorFlow. Now I want to try the tensorflow-gpu version on a new PC with an Nvidia GPU card and an Intel Xeon processor. Are any code changes needed to run retrain.py? Do I need to install any special drivers?
-
Face Detection on an Image
The following code detects facial emotions in a live video stream captured by the webcam. I need to do the same for a single image (already captured and stored in a file) and return the probability of each emotion.
My code (live video):
from keras.preprocessing.image import img_to_array
import imutils
import cv2
from keras.models import load_model
import numpy as np

# parameters for loading data and images
detection_model_path = 'haarcascade_files/haarcascade_frontalface_default.xml'
emotion_model_path = 'models/_mini_XCEPTION.102-0.66.hdf5'

# hyper-parameters for bounding boxes shape
# loading models
face_detection = cv2.CascadeClassifier(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)
EMOTIONS = ["angry" ,"disgust","scared", "happy", "sad", "surprised", "neutral"]

# starting video streaming
cv2.namedWindow('your_face')
camera = cv2.VideoCapture(0)
while True:
    frame = camera.read()[1]
    #reading the frame
    frame = imutils.resize(frame,width=300)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detection.detectMultiScale(gray,scaleFactor=1.1,minNeighbors=5,minSize=(30,30),flags=cv2.CASCADE_SCALE_IMAGE)

    canvas = np.zeros((250, 300, 3), dtype="uint8")
    frameClone = frame.copy()
    if len(faces) > 0:
        faces = sorted(faces, reverse=True, key=lambda x: (x[2] - x[0]) * (x[3] - x[1]))[0]
        (fX, fY, fW, fH) = faces
        # Extract the ROI of the face from the grayscale image, resize it to a fixed 28x28 pixels, and then prepare
        # the ROI for classification via the CNN
        roi = gray[fY:fY + fH, fX:fX + fW]
        roi = cv2.resize(roi, (64, 64))
        roi = roi.astype("float") / 255.0
        roi = img_to_array(roi)
        roi = np.expand_dims(roi, axis=0)

        preds = emotion_classifier.predict(roi)[0]
        emotion_probability = np.max(preds)
        label = EMOTIONS[preds.argmax()]

        for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
            # construct the label text
            text = "{}: {:.2f}%".format(emotion, prob * 100)
            w = int(prob * 300)
            cv2.rectangle(canvas, (7, (i * 35) + 5), (w, (i * 35) + 35), (0, 0, 255), -1)
            cv2.putText(canvas, text, (10, (i * 35) + 23), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)
            cv2.putText(frameClone, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
            cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)

    cv2.imshow('your_face', frameClone)
    cv2.imshow("Probabilities", canvas)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()
In short, I need the same detection on a single stored image, returning the probability of each emotion.
Thanks in advance.
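For a stored image, one plausible adaptation is to read the frame with `cv2.imread(path)` instead of `camera.read()`, drop the `while` loop and the display calls, and keep the detection and preprocessing steps as they are. The final step, turning the prediction vector into per-emotion probabilities, can be sketched on its own. The helper names and the example vector below are hypothetical; in the real script `preds` would come from `emotion_classifier.predict(roi)[0]`.

```python
EMOTIONS = ["angry", "disgust", "scared", "happy", "sad", "surprised", "neutral"]


def emotion_probabilities(preds, labels=EMOTIONS):
    """Pair each class probability with its emotion label."""
    if len(preds) != len(labels):
        raise ValueError("expected one probability per emotion label")
    return {label: float(p) for label, p in zip(labels, preds)}


def top_emotion(probs):
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


# Made-up prediction vector standing in for the model output.
example_preds = [0.05, 0.01, 0.04, 0.70, 0.05, 0.10, 0.05]
probs = emotion_probabilities(example_preds)

for label, p in probs.items():
    print("{}: {:.2f}%".format(label, p * 100))
print("top:", top_emotion(probs))
```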
-
How to extract specific features of same person using different image?
The aim of my project is to extract specific facial features on a mobile phone. It is a verification application based on the user's face: given two different images of the same person, the extracted features should be as close as possible.
Right now, I use the pretrained model and weights from the VGGFace team as a feature extractor; you can download the model here. However, the features I extracted with this model were not good enough. I describe what I did and what I want below:
I extract features from images of Emma Watson: image_1 returns feature_1, image_2 returns feature_2, and so on (vector length = 2048). If feature[i] > 0.0, I convert it to 1.
for i in range(0, 2048):
    if feature1[0][i] > 0.0:
        feature1[0][i] = 1
Then I compare the two feature vectors using the Hamming distance. Hamming distance is just a naive way to compare; in the real project I will quantize the features before comparing. However, the distance between two images of Emma is still large even when I use two images with a neutral facial expression (the same emotion; images with different emotions give even worse results).
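The binarize-and-compare step described above can be written in vectorised form. This sketch assumes the features are non-negative (for example the output of a ReLU layer), so that mapping every non-positive value to 0 matches the loop above; the short vectors are made up and stand in for the 2048-dimensional features.

```python
import numpy as np


def binarize(features):
    """Map every strictly positive entry to 1 and everything else to 0."""
    return (np.asarray(features) > 0.0).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Number of positions at which two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))


# Made-up feature vectors for two images.
feature_1 = np.array([0.0, 0.7, 1.2, 0.0, 0.3])
feature_2 = np.array([0.1, 0.5, 0.0, 0.0, 0.9])

code_1, code_2 = binarize(feature_1), binarize(feature_2)
distance = hamming_distance(code_1, code_2)
print(code_1, code_2, distance)
```

Thresholding at zero discards the magnitude of each activation, which may be one reason the distances stay large; that is a guess rather than something the post establishes.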
My question is how I could train the model to extract features of a target user. Imagine Emma is the target user, and her phone only needs to extract her features. When someone tries to unlock Emma's phone, the phone extracts this person's facial features and compares them with Emma's saved features. In addition, I don't want to train a model that classifies two classes, Emma and not Emma. What I need is to compare extracted features.
To sum up, if we compare features from different images of the same person, the distance (difference) should be "close" (small). If we compare features from images of different people, the distance should be "far" (large).
Thank you so much.
-
Matrix shapes difference between lectures
I'm currently learning deep learning from two lectures. What confuses me is that the two lectures use different conventions for the shape of the input matrix X.
In Coursera's lecture, X has shape (number of features, number of samples), so each column is one sample. The other lecture uses the transposed layout, so that every row represents one sample.
Why do they differ, and which one should I follow?
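The two layouts are transposes of each other and give the same numbers as long as the weight multiplication is written to match. A minimal sketch with made-up sizes, where `W` is a hypothetical weight matrix of shape (units, features):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples, n_units = 3, 5, 4   # hypothetical sizes

# Column convention: X has shape (features, samples), one sample per column.
X_cols = rng.standard_normal((n_features, n_samples))
W = rng.standard_normal((n_units, n_features))
Z_cols = W @ X_cols                # shape (units, samples)

# Row convention: X has shape (samples, features), one sample per row.
X_rows = X_cols.T
Z_rows = X_rows @ W.T              # shape (samples, units)

# Same values, just transposed.
same = np.allclose(Z_cols.T, Z_rows)
print(Z_cols.shape, Z_rows.shape, same)
```

Which one to follow is mostly a matter of matching the course or library in use; many libraries expect one sample per row.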