How would I put my own dataset into this code?
I have been working through a TensorFlow tutorial on unsupervised learning, and I'd like to use my own dataset; the code currently uses the Fashion MNIST dataset. I know how to create my own datasets in TensorFlow, but I'm having trouble adapting this code to use them. I am pretty new to TensorFlow. In my project, the dataset is stored under \data\training and \data\test-val\.
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)
# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"
# TensorFlow ≥2.0-preview is required
import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"
# Common imports
import numpy as np
import os
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]
def rounded_accuracy(y_true, y_pred):
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))
tf.random.set_seed(42)
np.random.seed(42)
conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),
    keras.layers.Conv2D(16, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(64, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2)
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="VALID", activation="selu",
                                 input_shape=[3, 3, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])
# learning_rate is the current name of the argument (lr is deprecated)
conv_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
                metrics=[rounded_accuracy])
# validation_data is passed as a tuple (inputs, targets)
history = conv_ae.fit(X_train, X_train, epochs=5,
                      validation_data=(X_valid, X_valid))
conv_encoder.summary()
conv_decoder.summary()
conv_ae.save("\models")
Do note that I got this code from another StackOverflow answer.
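One possible way to swap in your own images is sketched below. It assumes the folders contain ordinary image files (PNG/JPEG/BMP), that Pillow is installed, and that resizing everything to 28x28 grayscale is acceptable so the arrays match the `Reshape([28, 28, 1], input_shape=[28, 28])` layer. The names `list_image_files` and `load_images` are made up for this example and are not part of the tutorial.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust to your data).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def list_image_files(root):
    """Return the sorted paths of all image files found anywhere under root."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def load_images(root, size=(28, 28)):
    """Load every image under root as a float32 array of shape [n, 28, 28] in [0, 1]."""
    # Imported here so that list_image_files can be used without TensorFlow.
    import numpy as np
    from tensorflow import keras

    images = []
    for path in list_image_files(root):
        img = keras.preprocessing.image.load_img(
            path, color_mode="grayscale", target_size=size)
        # img_to_array returns shape (height, width, 1); drop the channel axis
        # because the encoder's first layer expects inputs of shape [28, 28].
        images.append(keras.preprocessing.image.img_to_array(img)[..., 0])
    return np.stack(images).astype(np.float32) / 255


# Hypothetical usage, replacing the fashion_mnist loading lines:
# X_train = load_images("data/training")
# X_valid = load_images("data/test-val")
# history = conv_ae.fit(X_train, X_train, epochs=5,
#                       validation_data=(X_valid, X_valid))
```

Since the autoencoder is trained with the inputs as their own targets, no labels are needed, so subfolder names are ignored here. If your images are not 28x28, the alternative is to change the layer shapes in the encoder and decoder rather than resizing the data.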
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I got the following contents in my current directory:
data/ budget.xls world_building_budget.txt a.txt b.exe hello_world.dat world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool": data/ world_building_budget.txt hello_world.dat world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, each entry in my dictionary looks like key: ''. In the following function, I am turning the empty values '' into empty lists (to hold my tags):
def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:
{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()
filter = args.filter
tag = args.tag
get_tagged = args.get_tagged
current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)
for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
        print(k, v)
    elif isinstance(v, dict):
        empty_str_to_list(v)
    if get_tagged in v:
        print(k, v)
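The final loop above only looks at the top-level keys, so entries nested under 'data' are never checked against the filter. A minimal sketch of a recursive version follows; it assumes the dictionary shape shown in the question (files map to lists of tags, directories map to nested dicts). The function name `tag_matching` and the `prefix` parameter are invented for this example.

```python
def tag_matching(tree, keyword, tag, prefix=""):
    """Append tag to every file whose name contains keyword, at any depth.

    tree maps a name either to a list of tags (a file) or to a nested
    dict (a directory). Returns the relative paths that were tagged.
    """
    tagged = []
    for name, value in tree.items():
        path = prefix + name
        if isinstance(value, dict):
            # Recurse into the subdirectory, extending the path prefix.
            tagged.extend(tag_matching(value, keyword, tag, path + "/"))
        elif keyword in name and tag not in value:
            value.append(tag)
            tagged.append(path)
    return tagged


files = {'data': {'world_building_budget.txt': []}, 'a.txt': [],
         'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
print(tag_matching(files, "world", "World-Building Tool"))
# prints ['data/world_building_budget.txt', 'hello_world.dat', 'world_builder.spec']
```

Note that this sketch tags files only; whether a directory such as data/ should itself be reported is a design choice the question leaves open.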
-
Actually, I am working on a project, and it is showing "no module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module>
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module>
    from pip_internal.utils import _log
I am using pycharm with conda interpreter.
-
Looping the function if the input is not string
First of all, I'm new to Python. For homework, I have to write a function that checks whether an item exists in a dictionary.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30}

def check_item():
    x = input("Enter the fruit's name: ")
    if not x.isalpha():
        print("Error! You need to type the name of the fruit")
    elif x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x],"KG")
    else:
        print("Fruit not found")

check_item()
I want the function to ask again only if the input is not a string of letters. I tried adding return under print("Error! You need to type the name of the fruit"), but it didn't work. Help!
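A common pattern for this is a `while True` loop that only breaks once the input passes the check. The sketch below assumes that behaviour is what is wanted. The `ask` parameter and the return value are additions for illustration (they make the function easy to try without typing); they are not in the original code.

```python
inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}


def check_item(ask=input):
    """Keep asking until the answer contains only letters, then look it up."""
    while True:
        x = ask("Enter the fruit's name: ")
        if x.isalpha():
            break  # valid input, leave the loop
        print("Error! You need to type the name of the fruit")

    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
        return inventory[x]
    print("Fruit not found")
    return None


# Interactive use: check_item()
```

A bare `return` after the error message just ends the function, which is why it did not repeat the prompt; the loop is what brings execution back to `input`.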
-
Only download certain label tf dataset
I'm looking to do some fine-tuning. The dataset (found here: https://knowyourdata-tfds.withgoogle.com/#dataset=sun397&filters=kyd%2Fsun397%2Flabel:%2Fh%2Fhouse&tab=ITEM&select=kyd%2Fsun397%2Flabel&item=%2Fh%2Fhouse%2Fsun_blpzjomvikwtulrq.jpg&expanded_groups=sun397) that I'm trying to fine-tune with is pretty large, and I just want to use/download the images with the label /h/house. Any tips on how I can best accomplish this? Thanks!
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import functools
import pandas

(train_ds, valid_ds), info = tfds.load("sun397", split=["train", "validation"], as_supervised=True, with_info=True, label = "/h/house")
int_to_class_label = info.features['label'].int2str
-
TFF: How can I train any model using a server running tff-runtime and a client running tff-client?
I read all the tensorflow-federated tutorials, including this one https://www.tensorflow.org/federated/gcp_setup, but I couldn't understand how to use this for training a model.
I'm doing a graduation project. To start, I need to build a proof of concept (POC) that uses tensorflow-federated to train a model with one server and one client, so that I can later apply a cross-silo setup to recognizing organs affected by COVID. If anyone can point me in the right direction, I'd be very grateful.
-
Can't use Keras MeanIoU to train semantic segmentation model
I'm working on a binary semantic segmentation problem. I built an UNet model with MobileNetV2 backbone. Here is my model code:
def upsample(filters, size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)
    layer = Sequential()
    layer.add(layers.Conv2DTranspose(filters, size, strides=2, padding='same', kernel_initializer=initializer, use_bias=False))
    layer.add(layers.BatchNormalization())
    if apply_dropout:
        layer.add(layers.Dropout(0.5))
    layer.add(layers.ReLU())
    return layer

def UNet(image_size, num_classes):
    inputs = Input(shape=image_size + (3,))
    base_model = applications.MobileNetV2(input_shape=image_size + (3,), include_top=False)
    layer_names = [
        'block_1_expand_relu',
        'block_3_expand_relu',
        'block_6_expand_relu',
        'block_13_expand_relu',
        'block_16_project',
    ]
    base_model_outputs = [base_model.get_layer(name).output for name in layer_names]
    down_stack = Model(inputs=base_model.input, outputs=base_model_outputs)
    down_stack.trainable = False
    up_stack = [
        upsample(512, 3),
        upsample(256, 3),
        upsample(128, 3),
        upsample(64, 3)
    ]
    skips = down_stack(inputs)
    x = skips[-1]
    skips = reversed(skips[:-1])
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = layers.Concatenate()([x, skip])
    outputs = layers.Conv2DTranspose(filters=num_classes, kernel_size=3, strides=2, padding='same')(x)
    return Model(inputs, outputs)
To load the images and masks for training, I built an image loader that inherits from keras.utils.Sequence.
class ImageLoader(utils.Sequence):
    def __init__(self, batch_size, img_size, img_paths, mask_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.img_paths = img_paths
        self.mask_paths = mask_paths

    def __len__(self):
        return len(self.mask_paths) // self.batch_size

    def __getitem__(self, idx):
        i = idx * self.batch_size
        batch_img_paths = self.img_paths[i:i + self.batch_size]
        batch_mask_paths = self.mask_paths[i:i + self.batch_size]
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype='float32')
        for j, path in enumerate(batch_img_paths):
            img = utils.load_img(path, target_size=self.img_size)
            img = utils.img_to_array(img)
            x[j] = img
        y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype='uint8')
        for j, path in enumerate(batch_mask_paths):
            img = utils.load_img(path, target_size=self.img_size, color_mode='grayscale')
            img = utils.img_to_array(img)
            # [0, 255] -> [0, 1]
            img //= 255
            y[j] = img
        return x, y
In my segmentation problem, all the labels are in the range [0, 1]. However, when I try to compile and then fit the model using the Adam optimizer, sparse categorical cross-entropy loss, and the tf.keras.metrics.MeanIoU metric, I encounter the following problem:
Node: 'confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Assert'
2 root error(s) found.
(0) INVALID_ARGUMENT: assertion failed: [`predictions` contains negative values. ] [Condition x >= 0 did not hold element-wise:] [x (confusion_matrix/Cast:0) = ] [-1 -1 -1...]
[[{{node confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Assert}}]]
[[confusion_matrix/assert_less_1/Assert/AssertGuard/pivot_f/_31/_67]]
(1) INVALID_ARGUMENT: assertion failed: [`predictions` contains negative values. ] [Condition x >= 0 did not hold element-wise:] [x (confusion_matrix/Cast:0) = ] [-1 -1 -1...]
[[{{node confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Assert}}]]
At first, I used accuracy as the training metric and didn't encounter this problem; however, when I changed to MeanIoU, the problem appeared. Does anyone know how to fix it? Thank you very much!
UPDATE: I've searched on StackOverflow and found this question about a similar error, however the fix mentioned in that link (reduce learning rate) doesn't work in my case.
-
Training an ML model on two different datasets before using test data?
I have the task of using a CNN for facial recognition: classifying faces into different classes of people, with each individual person being their own separate class. The training data I am given is very limited: I only have one image for each class. I have 100 classes (so 100 images in total, one image of each person). The approach I am using is transfer learning with the GoogLeNet architecture. However, instead of just training GoogLeNet on the images of the people I have been given, I want to first train it on a separate, larger set of different face images, so that by the time I train it on the data I have been given, my model has already learnt the features it needs to classify faces generally. Does this make sense, and will it work?
Using Matlab, I have so far changed the fully connected layer and the classification layer to train the network on the Yale Face database, which consists of 15 classes. I achieved a 91% validation accuracy using this database. Now I want to retrain this saved model on my provided data (100 classes with one image each). What would I have to do to this saved model to be able to train it on the new dataset without losing the features it learned from the Yale database? Do I just change the last fully connected and classification layers again and retrain? Will this be pointless and mean I just lose all of the progress from before? That is, will it make new weights from scratch, or will it use the previously learned weights to train even better on my new dataset? Or should I train the model on my training data and the Yale database all at once?
I have a separate set of test data provided for me, for which I do not have the labels; it is used to test the final model and give me my score/grade. Please help me understand whether what I'm saying is viable or nonsense. I'm confused, so I would appreciate being pointed in the right direction.
-
What's the best way to select variable in random forest model?
I am training RF models in R. What is the best way to select variables for my models? The datasets are fairly big, with around 120 variables each. I know that cross-validation can be used to select variables for other classification algorithms such as KNN. Does the same approach apply to RF, or is there a similar method for parameter tuning in RF model training?
-
Comparison between object detection algorithm speeds
I am writing my final degree project and I am having trouble comparing different state-of-the-art algorithms. I am comparing ResNet, MobileNet SSD, YOLOv4, VGG16, and VGG19 running on embedded devices such as the Jetson Nano or Raspberry Pi. All of them are used for object detection, but I am unable to find information about which one is faster or usually achieves higher accuracy. I am also trying to find out whether they can be used on low-performance devices. I would be grateful if someone could help me.
Thanks in advance.
-
Every time I train my CNN on matlab, is it remembering the old weights from the previous time I trained it? Or does it reset them?
For example, I have trained a CNN on my data using a learning rate of 0.0003 and 10 epochs, with a minibatch size of 32. After training it, let's say I get an accuracy of 0.7. Now I want to adjust the learning rate and the minibatch size and train it again with the trainNetwork Matlab function to see how the accuracy changes. My question is: does it train the model from scratch, or does it start from the previously calculated weights? I want it to start from scratch to prevent overfitting every time I adjust the hyperparameters. Sorry if this is obvious; I just want to make sure.
-
Association between categorical variables with no hierarchy in Python
I have a dataset with over 100 possible variable occurrences across 20 columns. At first glance this problem seemed to fit into hierarchical clustering. However, in working with the stakeholder to increase my business understanding, I found that there is no "pathing" that occurs during the process. The data looks like this:
col_1    col_2    col_3
code 1   code 80  code 87
code 80  code 53  NaN
Each row represents a customer's application for a product. The application runs through a series of automated checks to determine eligibility. Several issue codes are identified for an individual to manually resolve before passing the application on. Sometimes there are duplicate codes (stakeholder is unsure why this may be) identified at the same time. Some applications have one error, some have up to 20.
The intention is to apply unsupervised learning, likely a clustering technique, to determine whether there are strong associations between the occurrence of any two, three, or more codes. However, most of my experience is in NLP and classification. From what I've researched, dummy variables may be appropriate to create a flag for the presence of each variable, but I have not been successful so far due to the variable width of each row. A colleague suggested pairwise correlation, but since the data is categorical rather than numeric, I do not know whether coercing it to numeric affects the outcome of the correlation.
Any suggestions on an appropriate modeling or data mining technique would be much appreciated.
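One simple starting point, sketched here, is to treat each row as an unordered set of codes and count how often pairs of codes appear together. This is plain co-occurrence counting (the groundwork for association-rule mining), not a clustering method, and it sidesteps both the variable row width and the question of coercing categories to numbers. The sample rows mirror the table above; `pair_counts` is a name invented for the example.

```python
from collections import Counter
from itertools import combinations

# Two sample applications, padded with None where a column is empty.
rows = [
    ["code 1", "code 80", "code 87"],
    ["code 80", "code 53", None],
]


def pair_counts(rows):
    """Count how many rows contain each unordered pair of codes."""
    counts = Counter()
    for row in rows:
        # Keep only real codes (drops None/NaN) and collapse duplicates,
        # so a code repeated within one application is counted once.
        codes = sorted({c for c in row if isinstance(c, str)})
        counts.update(combinations(codes, 2))
    return counts


print(pair_counts(rows).most_common(3))
```

Changing the `2` in `combinations(codes, 2)` to `3` would count triples instead; dividing each count by the number of rows gives the "support" figure used in association-rule mining.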
-
Implementation details of K-means++ without sklearn
I am implementing K-means on the MNIST dataset. However, I am having difficulty implementing the initialization and some further steps.
For the initialization, I first have to pick one random data point as the first centroid. Then, for the remaining centroids, we also pick data points randomly, but from a weighted probability distribution, until all the centroids are chosen.
I am stuck at this step: how can I apply this distribution to make the choice? I mean, how do I implement it? For D_{k-1}(x), can I just use np.linalg.norm to compute it and then square it? In my implementation, I have only initialized the first element so far:
self.centroids = np.zeros((self.num_clusters, input_x.shape[1]))
ran_num = np.random.choice(input_x.shape[0])
self.centroids[0] = input_x[ran_num]
for k in range(1, self.num_clusters):
for the next step, do I need to find the next centroid by obtaining the largest distance between the previous centroid and all sample points?
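In k-means++ the next centroid is not the farthest point; it is sampled at random, with each point's probability proportional to D(x)^2, the squared distance to its nearest already-chosen centroid. Below is a minimal sketch of that step using only the standard library so it runs anywhere. The function names are made up, and points are plain tuples rather than NumPy rows.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, num_clusters, seed=None):
    """Choose initial centroids with k-means++ (D^2-weighted) sampling."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    for _ in range(1, num_clusters):
        # D(x)^2 for every point: distance to its nearest chosen centroid.
        weights = [min(squared_distance(p, c) for c in centroids)
                   for p in points]
        # Sample one point with probability proportional to its weight.
        # (random.choices raises ValueError if every weight is zero,
        # i.e. if all points coincide with a chosen centroid.)
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


print(kmeans_pp_init([(0, 0), (0, 1), (10, 10), (10, 11)], 2, seed=0))
```

With NumPy arrays the same idea applies: `np.linalg.norm` (squared) gives the distances, and `np.random.choice(n, p=weights / weights.sum())` draws the index of the next centroid.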
-
Modeling Physics with unsupervised learning on Python
I'm a beginner in python and I'm trying to model data with python and hoping to output an equation/model using unsupervised learning. What are some libraries recommended and what functions specifically should I use?
For instance, if I were to input data from any or most of the known physics equations to teach the AI, the model created by the computer will closely approximate this chosen equation. Any existing lines of code that do this?