Training an ML model on two different datasets before using test data?
I have the task of using a CNN for facial recognition: classifying faces into different classes of people, where each individual person is their own class. The training data I am given is very limited: I only have one image per class, and there are 100 classes (so 100 images in total, one image of each person). The approach I am using is transfer learning with the GoogLeNet architecture.

However, instead of training GoogLeNet only on the images of the people I have been given, I want to first train it on a separate, larger set of face images, so that by the time I train it on my provided data the model has already learned the features it needs to classify faces in general. Does this make sense, and will it work?

Using MATLAB, I have so far replaced the fully connected layer and the classification layer and trained the network on the Yale Face Database, which consists of 15 classes, reaching 91% validation accuracy. Now I want to retrain this saved model on my provided data (100 classes with one image each). What would I have to do to this saved model to train it on the new dataset without losing the features it has learned from the Yale database? Do I just swap the last fully connected and classification layers again and retrain? Would that be pointless and make me lose all of the previous progress, i.e. would it create new weights from scratch, or would it build on the previously learned weights and adapt them to my new dataset? Or should I train the model on my training data and the Yale database all at once?

I have a separate set of test data, for which I do not have the labels; this is what the final model is tested on to give me my score/grade. Please help me understand whether what I am describing is viable or nonsense; I am confused and would appreciate being pointed in the right direction.
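To make the plan concrete, the two-stage workflow I have in mind is sketched below in Keras-style Python, purely as an illustration of the idea (my actual pipeline is MATLAB/GoogLeNet, so the backbone, datasets and layer names here are stand-ins): pretrain the shared layers on the larger face set with one head, then swap in a fresh 100-class head while keeping every other learned weight.

# Conceptual sketch only; all datasets below are placeholders.
import tensorflow as tf
from tensorflow import keras

# Stage 0: start from an ImageNet-pretrained backbone (stand-in for GoogLeNet).
backbone = keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                          pooling="avg", input_shape=(224, 224, 3))

inputs = keras.Input(shape=(224, 224, 3))
features = backbone(inputs)

# Stage 1: attach a 15-class head and fine-tune on the larger face set (e.g. Yale).
stage1_out = keras.layers.Dense(15, activation="softmax")(features)
stage1_model = keras.Model(inputs, stage1_out)
stage1_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# stage1_model.fit(yale_dataset, epochs=10)          # placeholder dataset

# Stage 2: drop only the 15-class head, keep every other learned weight,
# attach a fresh 100-class head and fine-tune on the one-image-per-person data.
stage2_out = keras.layers.Dense(100, activation="softmax")(features)
stage2_model = keras.Model(inputs, stage2_out)
backbone.trainable = False                            # optionally freeze the shared layers
stage2_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# stage2_model.fit(hundred_person_dataset, epochs=10)  # placeholder dataset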
See also questions close to this topic
-
What's the best way to select variables in a random forest model?
I am training RF models in R. What is the best way of selecting variables for my models? The datasets are pretty big; each has around 120 variables in total. I know that there is a cross-validation approach to selecting variables for other classification algorithms such as KNN. Is there a similar approach for variable selection or parameter tuning when training RF models?
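To illustrate the kind of cross-validated selection I mean, below is a minimal sketch using scikit-learn's RFECV with a random forest, shown in Python only because that is the example I had at hand; my understanding is that caret's rfe() offers the equivalent recursive feature elimination in R. The dataset here is synthetic and all settings are placeholders.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for a dataset with ~120 candidate variables.
X, y = make_classification(n_samples=500, n_features=120, n_informative=15, random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=5,                 # drop 5 variables per elimination round
    cv=5,                   # 5-fold cross-validation to score each candidate subset
    scoring="accuracy",
)
selector.fit(X, y)
print("Number of variables kept:", selector.n_features_)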
-
How would I put my own dataset into this code?
I have been looking at a TensorFlow tutorial for unsupervised learning, and I'd like to use my own dataset; the code currently uses the MNIST dataset. I know how to create my own datasets in TensorFlow, but I have trouble adapting the code used here to my own data. I am pretty new to TensorFlow, and the file paths to my dataset in my project are
\data\training
and \data\test-val\
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# TensorFlow ≥2.0-preview is required
import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"

# Common imports
import numpy as np
import os

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

def rounded_accuracy(y_true, y_pred):
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))

tf.random.set_seed(42)
np.random.seed(42)

conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),
    keras.layers.Conv2D(16, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(64, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2)
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="VALID", activation="selu",
                                 input_shape=[3, 3, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])

conv_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(lr=1.0),
                metrics=[rounded_accuracy])
history = conv_ae.fit(X_train, X_train, epochs=5, validation_data=[X_valid, X_valid])
conv_encoder.summary()
conv_decoder.summary()

conv_ae.save("\models")
Do note that I got this code from another StackOverflow answer.
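For what it's worth, the kind of swap I imagine is sketched below: replace the fashion_mnist.load_data() call with a loader that reads my two folders. I am assuming here that the images sit in one subfolder per class under the paths above, that they can be resized to 28x28 grayscale like MNIST, and that my TensorFlow version provides image_dataset_from_directory; the paths and sizes are placeholders.

import numpy as np
import tensorflow as tf

def folder_to_arrays(path, image_size=(28, 28)):
    # One subfolder per class is assumed; shuffle=False keeps images and labels aligned.
    ds = tf.keras.preprocessing.image_dataset_from_directory(
        path,
        color_mode="grayscale",
        image_size=image_size,
        batch_size=32,
        shuffle=False,
    )
    images, labels = [], []
    for batch_images, batch_labels in ds:
        images.append(batch_images.numpy())
        labels.append(batch_labels.numpy())
    images = np.concatenate(images).squeeze(-1)   # (n, 28, 28), same layout as MNIST
    labels = np.concatenate(labels)
    return images, labels                          # pixel values still 0..255, so the existing /255 step applies

(X_train_full, y_train_full) = folder_to_arrays(r"\data\training")
(X_test, y_test) = folder_to_arrays(r"\data\test-val")
# The rest of the script (normalisation, the validation split, the autoencoder) would stay
# the same, apart from adjusting the -5000 split to the size of my dataset.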
-
How can I modify the Dataset class to make Mask R-CNN work with multiple classes?
I am currently working on instance segmentation and am following these two tutorials:
However, these two tutorials work perfectly with a single class plus background (e.g. person + background). In my case, I have two classes plus background (person and car), and I didn't find any resources about making Mask R-CNN work with multiple object classes.
Notice that:
I am using PyTorch (torchvision): torch==1.10.0+cu111, torchvision==0.11.0+cu111, torchaudio==0.10.0
I am using Pascal VOC annotations
I used the segmentation class masks (not the XML files) plus the images
and this is my dataset class
class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "img"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "imgMask"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "img", self.imgs[idx])
        mask_path = os.path.join(self.root, "imgMask", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
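My current understanding (please correct me if this is wrong) is that the main change for two classes is to stop hard-coding labels = torch.ones(...) and instead derive a label per instance from whatever the mask values mean in my annotation. A sketch of what I mean, with a purely hypothetical value-to-class mapping:

import numpy as np
import torch

# Hypothetical mapping from mask pixel value to class index (0 is background).
# If instances of the same class share one value, an instance-level mask
# (e.g. SegmentationObject) would be needed on top of this.
VALUE_TO_LABEL = {1: 1, 2: 2}   # e.g. 1 = person, 2 = car

def masks_and_labels(mask: np.ndarray):
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]                    # drop background
    masks = mask == obj_ids[:, None, None]             # one binary mask per id
    labels = torch.as_tensor(
        [VALUE_TO_LABEL[int(v)] for v in obj_ids], dtype=torch.int64
    )                                                   # replaces torch.ones((num_objs,), ...)
    return torch.as_tensor(masks, dtype=torch.uint8), labels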
Can anyone help me?
-
OpenCV - Correcting Image for Misalignment
I discovered the other day that my camera lens, or perhaps a tilted sensor, is causing an object centered in front of the camera to appear off center in the captured image. As close as I can get it, the red circle is the point that should be aligned directly with the camera lens, and the orange circle is the center pixel of my image. In a perfect world these would coincide. This is the original picture, not the undistorted version produced from my camera calibration results. This is all in Python and OpenCV, just to put that out there.
I used solvePnP() so that I can map pixel coordinates to real-world coordinates, and the results are better than I could have expected. The output in real-world coordinates is bang on, the distance it determined is bang on, and it shows a shifting z value that is consistent with the rotation of a plane caused by a center shift in this direction.
What I would like to do instead is re-project this image as if the camera were not skewed and were taking the picture head on (black pixels around the outside are okay and expected). I don't want to translate the image; I am only looking to correct for rotation. I think a re-projection to fix rotation followed by a redo of solvePnP() will get me what I want? Really, the goal is to correct those real-world coordinate outputs so that x and y remain accurate but z is constant (as you would expect with an orthogonal plane and a "perfect" camera). I am not sure how to do the re-projection using the output from my first run of solvePnP(), and I am hoping someone can confirm that running solvePnP() again after re-projection would give me consistent z values. Or perhaps there is a better way to do all of this?
Output based on some important coordinates in my image (outputs are in mm):
Center Pixel (orange circle): Shift in mm is confirmed as accurately as I can with a ruler.
u:319 v:239 -> [-12.19730412] [-13.78697338] [-0.14210989]

Known True Center (red circle): A perfect mapping would be 0,0,0; we are pretty close.
u:359 v:195 -> [0.50044927] [0.11737794] [0.03856228]

Top Left Corner: Coordinates are within 1mm and z is pushed back as I would expect with the tilt displayed in the image.
u:0 v:0 -> [-112.58447447] [62.540533] [-0.43397294]

Bottom Right Corner: Coordinates are within 1mm and z is pulled forward as I would expect with the tilt displayed in the image.
u:639 v:479 -> [88.50417125] [-90.43334375] [0.15017949]
I have thought of just using the displacement to calculate my two adjustment angles and using those as corrections, but that seems like a crude way to do it when OpenCV clearly has ways of handling this. Also, I would not be correcting for any roll if I tried that.
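For what it's worth, the rotation-only re-projection I have in mind would look roughly like the sketch below, building a homography from the rotation that solvePnP() returned and leaving translation alone. K and rvec are placeholders for my calibrated camera matrix and the solvePnP() rotation vector, and I may well have the direction of the rotation inverted (R vs R.T):

import cv2
import numpy as np

def derotate(image, K, rvec):
    # rotation vector from solvePnP -> 3x3 rotation matrix
    R, _ = cv2.Rodrigues(rvec)
    # pure-rotation homography that undoes the tilt: H = K * R^-1 * K^-1 (R.T == R^-1)
    H = K @ R.T @ np.linalg.inv(K)
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))

# corrected = derotate(img, camera_matrix, rvec_from_solvepnp)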
Any help or suggestions would be greatly appreciated!
Thanks all!
-
Transfer learning using a pre-trained object detection model: RuntimeError: shape '[2, -1, 91, 168, 96]' is invalid for input of size 64512
I am trying to train a pre-trained model using transfer learning, following this tutorial: Building your own object detector — PyTorch vs TensorFlow and how to even get started?
I have picked the FCOS (Fully Convolutional One-Stage Object Detection) architecture.
First step: Inference
I first ran inference to check which class of the pre-trained model detects my custom data (object), and found that class id 77 (Cell Phone) is detected as the custom object I want to train on.
Second step: Train
Looking at the FCOS architecture below
The final stage has three outputs:
- Classification: Input 256, Output 91 (0: Background + 90 classes) (TO BE TRAINED)
- Center-ness: Input 256, Output 1
- Regression: Input 256, Output 4 (bounding box coordinates)
I used requires_grad to freeze all the layers in the network and decided to train only the classification head, which originally predicts 90 classes, so that it predicts a single class. I used the code below to replace the entire classification block and keep only the weights and bias of the 77th class.
class.# load an object detection model pre-trained on COCO model = torchvision.models.detection.fcos_resnet50_fpn(pretrained=True) selected_head_classification_head_cls_logits_weight = 0 selected_head_classification_head_cls_logits_bias = 0 # List out all the name of the parameters whose gradient can be altered for further training for name, param in model.named_parameters(): # If requires gradient parameters if param.requires_grad: if name == "head.classification_head.cls_logits.bias" or name == "head.classification_head.cls_logits.weight": layer_para_name = "weight" if name.split('.')[-1]=='weight' else "bias" print("\nReplacing",name,"layer containing 90 class score",layer_para_name,", with",selected_label,"th class score",layer_para_name) print("####################################") print("Original layer size(0:Background + 90 classes): ",param.data.size()) # Reshaping bias if name.split('.')[-1] == 'bias': selected_head_classification_head_cls_logits_bias = param.data[selected_label:selected_label+1] param.data = torch.cat([param.data[:1], selected_head_classification_head_cls_logits_bias]) # Reshaping weight if name.split('.')[-1] == 'weight': selected_head_classification_head_cls_logits_weight = torch.tensor(param.data[selected_label][:].reshape([1, 256,3,3])) param.data = torch.cat([param.data[:1], selected_head_classification_head_cls_logits_weight]) print("Alteres layer size(0:Background +",selected_label,"th class): ",param.data.size()) print("####################################") print("Finished enabling requires gradient for",name,"layer......") # Make the layer trainable param.requires_grad = True else: # Make the layer non-trainable param.requires_grad = False
OUTPUT I GOT
Replacing head.classification_head.cls_logits.weight layer containing 90 class score weight , with 77 th class score weight #################################### Original layer size(0:Background + 90 classes): torch.Size([91, 256, 3, 3]) Alteres layer size(0:Background + 77 th class): torch.Size([2, 256, 3, 3]) #################################### Finished enabling requires gradient for head.classification_head.cls_logits.weight layer...... Replacing head.classification_head.cls_logits.bias layer containing 90 class score bias , with 77 th class score bias #################################### Original layer size(0:Background + 90 classes): torch.Size([91]) Alteres layer size(0:Background + 77 th class): torch.Size([2]) #################################### Finished enabling requires gradient for head.classification_head.cls_logits.bias layer...... /usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:26: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). FCOS( (backbone): BackboneWithFPN( (body): IntermediateLayerGetter( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): FrozenBatchNorm2d(64, eps=1e-05) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(64, eps=1e-05) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(64, eps=1e-05) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(256, eps=1e-05) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): FrozenBatchNorm2d(256, eps=1e-05) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(64, eps=1e-05) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(64, eps=1e-05) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(256, eps=1e-05) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(64, eps=1e-05) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(64, eps=1e-05) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(256, eps=1e-05) (relu): ReLU(inplace=True) ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(128, eps=1e-05) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(128, eps=1e-05) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(512, eps=1e-05) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): FrozenBatchNorm2d(512, eps=1e-05) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(128, eps=1e-05) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), 
bias=False) (bn2): FrozenBatchNorm2d(128, eps=1e-05) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(512, eps=1e-05) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(128, eps=1e-05) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(128, eps=1e-05) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(512, eps=1e-05) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(128, eps=1e-05) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(128, eps=1e-05) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(512, eps=1e-05) (relu): ReLU(inplace=True) ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): FrozenBatchNorm2d(1024, eps=1e-05) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): 
FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(512, eps=1e-05) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(512, eps=1e-05) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(2048, eps=1e-05) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): FrozenBatchNorm2d(2048, eps=1e-05) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(512, eps=1e-05) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(512, eps=1e-05) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(2048, eps=1e-05) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(512, eps=1e-05) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(512, eps=1e-05) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(2048, eps=1e-05) (relu): ReLU(inplace=True) ) ) ) (fpn): FeaturePyramidNetwork( (inner_blocks): ModuleList( (0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (2): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) ) (layer_blocks): ModuleList( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (extra_blocks): LastLevelP6P7( (p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) ) ) ) (anchor_generator): AnchorGenerator() (head): FCOSHead( (classification_head): FCOSClassificationHead( (conv): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) (2): ReLU() (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): GroupNorm(32, 256, eps=1e-05, affine=True) (5): ReLU() (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (7): GroupNorm(32, 256, eps=1e-05, affine=True) (8): ReLU() (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (10): GroupNorm(32, 256, eps=1e-05, affine=True) (11): ReLU() ) (cls_logits): Conv2d(256, 91, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (regression_head): FCOSRegressionHead( (conv): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) (2): ReLU() (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (4): GroupNorm(32, 256, eps=1e-05, affine=True) (5): ReLU() (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (7): GroupNorm(32, 256, eps=1e-05, affine=True) (8): ReLU() (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (10): GroupNorm(32, 256, eps=1e-05, affine=True) (11): ReLU() ) (bbox_reg): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (bbox_ctrness): 
Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (transform): GeneralizedRCNNTransform( Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) Resize(min_size=(800,), max_size=1333, mode='bilinear') ) )
When I started training
num_epochs = 10
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
I get this error
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary. cpuset_checked)) --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) <ipython-input-17-05e881bbc3b2> in <module>() 2 for epoch in range(num_epochs): 3 # train for one epoch, printing every 10 iterations ----> 4 train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10) 5 # update the learning rate 6 lr_scheduler.step() 6 frames /content/drive/MyDrive/PytorchObjectDetector/engine.py in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq) 30 print("######################",targets) 31 ---> 32 loss_dict = model(images, targets) 33 34 losses = sum(loss for loss in loss_dict.values()) /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs) 1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks 1109 or _global_forward_hooks or _global_forward_pre_hooks): -> 1110 return forward_call(*input, **kwargs) 1111 # Do not call functions when jit is used 1112 full_backward_hooks, non_full_backward_hooks = [], [] /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/fcos.py in forward(self, images, targets) 594 595 # compute the fcos heads outputs using the features --> 596 head_outputs = self.head(features) 597 598 # create the set of anchors /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs) 1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks 1109 or _global_forward_hooks or _global_forward_pre_hooks): -> 1110 return forward_call(*input, **kwargs) 1111 # Do not call functions when jit is used 1112 full_backward_hooks, non_full_backward_hooks = [], [] /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/fcos.py in forward(self, x) 120 121 def forward(self, x: List[Tensor]) -> Dict[str, Tensor]: --> 122 cls_logits = self.classification_head(x) 123 bbox_regression, bbox_ctrness = self.regression_head(x) 124 return { /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs) 1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks 1109 or _global_forward_hooks or _global_forward_pre_hooks): -> 1110 return forward_call(*input, **kwargs) 1111 # Do not call functions when jit is used 1112 full_backward_hooks, non_full_backward_hooks = [], [] /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/fcos.py in forward(self, x) 184 # Permute classification output from (N, A * K, H, W) to (N, HWA, K). 185 N, _, H, W = cls_logits.shape --> 186 cls_logits = cls_logits.view(N, -1, self.num_classes, H, W) 187 cls_logits = cls_logits.permute(0, 3, 4, 1, 2) 188 cls_logits = cls_logits.reshape(N, -1, self.num_classes) # Size=(N, HWA, 4) RuntimeError: shape '[2, -1, 91, 168, 96]' is invalid for input of size 64512
I don't understand what is happening. How do I correct this?
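My current guess is that, even after the weight surgery, the classification head still reports self.num_classes as 91, which is where the 91 in the failing view() call comes from. What I am considering trying instead is to replace the cls_logits module outright and update num_classes, along these lines (a sketch, not verified):

import torch
from torch import nn
import torchvision

selected_label = 77   # COCO index I found during inference (Cell Phone)

model = torchvision.models.detection.fcos_resnet50_fpn(pretrained=True)
head = model.head.classification_head

old_cls_logits = head.cls_logits                      # Conv2d(256, 91, 3x3)
new_cls_logits = nn.Conv2d(old_cls_logits.in_channels, 2,
                           kernel_size=3, stride=1, padding=1)
with torch.no_grad():
    # keep index 0 (treated as background in my earlier code) plus the selected class
    new_cls_logits.weight.copy_(old_cls_logits.weight[[0, selected_label]])
    new_cls_logits.bias.copy_(old_cls_logits.bias[[0, selected_label]])

head.cls_logits = new_cls_logits
head.num_classes = 2                                  # so the view()/reshape() in forward() matches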
-
Fine-tuning with random data improving transformer performance?
I have a T5-base transformer model, with which:
- I train one model for sentiment analysis (let this be baseline).
- I train another such T5-base model on randomly shuffled WikiText data for the language modelling objective and then train it on sentiment analysis (let this be the fine-tuned model).
I expected the fine-tuned model to do worse than the baseline, but instead, it does better by 1-2% (performance averaged over 10 random seeds).
Can someone explain why this might be happening? Shouldn't the fine-tuned model be affected by the garbage input it had been fine-tuned with?
-
Low accuracy CNN
I am using the VGG19 pre-trained model with ImageNet weights to do transfer learning on 4 classes with Keras. However, I do not know if there really is a difference between these 4 classes, and I'd like to find out. The goal is to discover whether these classes make sense or whether there is no difference between these image classes.
These classes are made up of abstract paintings from the same individual.
I tried different models with different hyperparameters (Adam/SGD, learning rate, dropout, L2 regularization, FC layer size, batch size, unfreeze depth), and also weighted classes, as the data is a little bit unbalanced.
batch_size = 32
unfreeze = 17
dropout = 0.2
fc = 256
lr = 1e-4
l2_reg = 0.1

train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    'C:/Users/train',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    'C:/Users/test',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')

base_model = VGG19(
    weights="imagenet",
    input_shape=(224, 224, 3),
    include_top=False,
)

last_layer = base_model.get_layer('block5_pool')
last_output = last_layer.output

x = Flatten()(last_output)
x = GlobalMaxPooling2D()(last_output)
x = Dense(fc)(x)
x = Activation('relu')(x)
x = BatchNormalization()(x)
x = Dropout(dropout)(x)
x = Dense(fc, activation='relu', kernel_regularizer=regularizers.l2(l2=l2_reg))(x)
x = layers.Dense(4, activation='softmax')(x)

model = Model(base_model.input, x)

for layer in model.layers:
    layer.trainable = False
for layer in model.layers[unfreeze:]:
    layer.trainable = True

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(learning_rate=lr),
              metrics=['accuracy'])

class_weights = class_weight.compute_class_weight('balanced',
                                                  np.unique(train_generator.classes),
                                                  train_generator.classes)
class_weights_dict = dict(enumerate(class_weights))

history = model.fit(train_generator,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=392 // batch_size,
                    steps_per_epoch=907 // batch_size)

plot_model_history(history)
I also did feature extraction at every layer and fed the extracted features to an SVM (one per layer); the accuracy of these SVMs was about 40%, which is higher than this model (30 to 33%). So I may be wrong, but I think this model could achieve a higher accuracy.
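For reference, the per-layer feature extraction and SVM comparison I mention was along these lines (a rough sketch; the layer name, directories and SVM settings here are placeholders):

import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Truncate VGG19 at one intermediate layer (placeholder layer name).
base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
extractor = Model(base.input, base.get_layer("block4_pool").output)

# shuffle=False keeps predictions aligned with gen.classes
gen = ImageDataGenerator(preprocessing_function=preprocess_input).flow_from_directory(
    "C:/Users/train", target_size=(224, 224), batch_size=32,
    class_mode="sparse", shuffle=False)

features = extractor.predict(gen)                  # (n_images, h, w, c)
features = features.reshape(len(features), -1)     # flatten per image
labels = gen.classes

print(cross_val_score(SVC(kernel="rbf"), features, labels, cv=5).mean())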
Is my code correct, or am I doing something wrong? If the code is correct, what else can I try to get a better accuracy?
-
How to remove background and video from facial landmarks
I'm using this code to detect the 468 facial landmarks of a face:
import cv2
import mediapipe as mp
import time

cap = cv2.VideoCapture(0)
pTime = 0

mpDraw = mp.solutions.drawing_utils
mpFaceMesh = mp.solutions.face_mesh
faceMesh = mpFaceMesh.FaceMesh(max_num_faces=2)
drawSpec = mpDraw.DrawingSpec(thickness=1, circle_radius=2)

while True:
    success, img = cap.read()
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = faceMesh.process(imgRGB)
    if results.multi_face_landmarks:
        for faceLms in results.multi_face_landmarks:
            mpDraw.draw_landmarks(img, faceLms, mpFaceMesh.FACEMESH_CONTOURS,
                                  drawSpec, drawSpec)
            for id, lm in enumerate(faceLms.landmark):
                # print(lm)
                ih, iw, ic = img.shape
                x, y = int(lm.x * iw), int(lm.y * ih)
                print(id, x, y)

    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'FPS: {int(fps)}', (20, 70), cv2.FONT_HERSHEY_PLAIN,
                3, (255, 0, 0), 3)
    cv2.imshow("Image", img)
    cv2.waitKey(1)
When I run this script, I can see the facial landmarks drawn on my face, but what I want to achieve is to display the facial landmarks on a black background, without actually showing the face.
How can I achieve that?
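My current idea (untested) is to draw the landmarks onto a black canvas of the same size as the frame instead of onto the frame itself, along these lines:

import numpy as np
import cv2

# inside the while-loop above, after results = faceMesh.process(imgRGB):
canvas = np.zeros_like(img)                 # black image, same shape as the camera frame
if results.multi_face_landmarks:
    for faceLms in results.multi_face_landmarks:
        mpDraw.draw_landmarks(canvas, faceLms, mpFaceMesh.FACEMESH_CONTOURS,
                              drawSpec, drawSpec)
cv2.imshow("Landmarks only", canvas)        # shows only the mesh, not the face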
-
Facial Recognition and Chat Bot concurrently
I am trying to build a facial recognition program that is able to speak some sentences as well. More precisely, what I want it to do is:
- when it recognizes a person (from my database of images), I would like it to say "Hi X"
- otherwise, it says "Sorry, I don't know you, please consider registering"
The issue I encountered is that the program continuously repeats the phrase it has to say. I'd like it to say the phrase just once for the same face while it is being processed. I heard about multithreading and followed some tutorials, but I have no idea how to use it concretely in my case.
Does anyone have an idea how to handle this? Here is the code I wrote for this task:
import cv2
import face_recognition
from gtts import gTTS
import os
import numpy as np

path = 'Data'
images = []
Names = []
Liste = os.listdir(path)
print(Liste)

for elt in Liste:
    imgActu = cv2.imread(f'{path}/{elt}')
    images.append(imgActu)
    Names.append(os.path.splitext(elt)[0])
print(Names)

def text_to_speech(text, mp3form):
    # function allowing to read the content of a given text
    # specify the text and the used langage
    my_text = text
    language = 'fr'
    # specify the file format with gtts and store it in a mp3 format
    my_file = gTTS(text=my_text, lang=language, slow=False)
    my_file.save(mp3form)
    # read the content
    os.system("mpg321" + " " + mp3form)

def find_encodings(images):
    """
    images : ---> list of images
    find the encoding of each element of the list
    """
    Encode_List = []
    for img in images:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        encode_img = face_recognition.face_encodings(img)[0]
        Encode_List.append(encode_img)
    return Encode_List

Known_encodes = find_encodings(images)
print(len(Known_encodes))

def find_faces(img):
    imgsmall = cv2.resize(img, (0, 0), None, 0.25, 0.25)
    imgsmall = cv2.cvtColor(imgsmall, cv2.COLOR_BGR2RGB)
    # get the locations and encodings of all the present faces
    faces_in_frame = face_recognition.face_locations(imgsmall)
    encodings_frame = face_recognition.face_encodings(imgsmall, faces_in_frame)
    for encodeface, facelocation in zip(encodings_frame, faces_in_frame):
        matches = face_recognition.compare_faces(Known_encodes, encodeface)
        faceDis = face_recognition.face_distance(Known_encodes, encodeface)
        # print(faceDis)
        matchId = np.argmin(faceDis)
        if matches[matchId]:
            name = Names[matchId]
            # print(name)
            y1, x2, y2, x1 = facelocation
            y1, x2, y2, x1 = y1*4, x2*4, y2*4, x1*4
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.rectangle(img, (x1, y2-35), (x2, y2), (0, 255, 0), cv2.FILLED)
            cv2.putText(img, name, (x1+6, y2-6), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255),)
            return str(name)
        else:
            y1, x2, y2, x1 = facelocation
            y1, x2, y2, x1 = y1*4, x2*4, y2*4, x1*4
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
            cv2.rectangle(img, (x1, y2-35), (x2, y2), (0, 0, 255), cv2.FILLED)
            cv2.putText(img, "Unknown", (x1+6, y2-6), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255),)
            return "Unknown"

cap = cv2.VideoCapture(0)

while True:
    ret, img = cap.read()
    find_faces(img)
    if find_faces(img) in Names:
        text_to_speech("Hi" + find_faces(img), "known.mp3")
    else:
        text_to_speech("Hi, sorry i don't know you, please register", "unknown.mp3")
    cv2.imshow('Camera', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
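What I am imagining, though I am not sure it is the right pattern (hence the question about threading), is to remember who was greeted recently and stay silent on repeats, along these lines (the cooldown value is arbitrary):

import time

last_announced = {}          # name -> last time this person was greeted
COOLDOWN = 30                # seconds before greeting the same person again (arbitrary)

def maybe_greet(name):
    now = time.time()
    if now - last_announced.get(name, 0) < COOLDOWN:
        return               # already greeted recently, stay silent
    last_announced[name] = now
    if name in Names:
        text_to_speech("Hi " + name, "known.mp3")
    else:
        text_to_speech("Sorry, I don't know you, please consider registering", "unknown.mp3")

# in the capture loop, call find_faces(img) once per frame and pass its result:
#     maybe_greet(find_faces(img))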
-
AWS Rekognition: Add Feature Vector instead of PNG File
According to the AWS Rekognition SDK documentation,
IndexFaces SDK PHP Documentation
IndexFaces receives a face image in PNG or JPEG format, obtains the "feature vector" of that image, and then adds the feature vector to the database for later face matching.
Now suppose I want to add the same person to several new collections. Is it possible to obtain the feature vector once and add it directly to the collections, instead of re-processing the image over and over again?
Thanks in advance