Any similar function in Python to triangulate in MATLAB? (reconstruct 3D points from 2D points and camera parameters)
I'm trying to reconstruct 3D points from two corresponding 2D points from the left and right cameras, with known camera parameters (intrinsic and extrinsic).
In MATLAB there is worldPoints = triangulate(matchedPoints1, matchedPoints2, stereoParams):
https://www.mathworks.com/help/vision/ref/triangulate.html
Is there a similar function in python?
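OpenCV's cv2.triangulatePoints(P1, P2, pts1, pts2) is the closest equivalent; it takes the two 3x4 projection matrices instead of a stereoParams object. Below is a minimal numpy sketch of the same linear (DLT) triangulation that cv2.triangulatePoints performs, with made-up intrinsics and baseline:

```python
import numpy as np

def triangulate_dlt(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one point pair, as in
    cv2.triangulatePoints. pt1/pt2 are (u, v) pixel coordinates."""
    A = np.array([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Hypothetical stereo rig: identical intrinsics, 60 mm baseline.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # left:  K [I | 0]
P2 = K @ np.hstack([np.eye(3), np.array([[-60.0], [0], [0]])])  # right: K [R | t]

# Project a known 3D point into both cameras, then recover it.
X_true = np.array([100.0, 50.0, 1000.0])
def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_rec = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.round(X_rec, 3))  # recovers the original point
```

In practice you would pass real projection matrices built from your calibration and the matched pixel coordinates from both images.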
See also questions close to this topic

How to sum functions in MATLAB under a condition
I'm trying to write a function that sums a bunch of functions and I have no idea what I'm doing wrong. My code:
function [f_max,x_max] = ftot(l,a,E,J,F) % a and F are arrays
f_max = 0;
b = l-a;
n = length(F);
f1 = 0;
syms x
for j=1:n
    y1 = piecewise(x<=a(j),1,x>a(j),0);
    y2 = piecewise(x<=a(j),0,x>a(j),1);
    f = (F(j)*b(j)*x)*y1 + (F(j)*b(j)*x^3)*y2;
    f1 = f1 + f;
end
Basically I want to create a function that is the sum of F(j)*b(j)*x for j=1:n when x<=a(j), and F(j)*b(j)*x^3 when x>a(j). How can I accomplish that?
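For reference, the intended sum can be sketched numerically in Python (numpy stands in for the symbolic piecewise; the l, a, F values are made up, and b = l - a assumes the minus sign was lost in the post):

```python
import numpy as np

# Hypothetical data standing in for the question's l, a, F.
l = 10.0
a = np.array([2.0, 5.0])
F = np.array([1.0, 3.0])
b = l - a  # the question's b = l-a

def ftot(x):
    """Sum over j of: F[j]*b[j]*x where x <= a[j],
    and F[j]*b[j]*x**3 where x > a[j]."""
    terms = np.where(x <= a, F * b * x, F * b * x**3)
    return terms.sum()

print(ftot(1.0))  # both branches linear: 1*8*1 + 3*5*1 = 23.0
print(ftot(3.0))  # first term cubic:    1*8*27 + 3*5*3 = 261.0
```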
Matrix indexing in matlab
a is an n-by-n matrix.
I have this code:
[m,n] = size(a);
x = zeros(m,1);
for j=1:1:n
    if(j==1)
        a(1,:) = [];
    else
    end
    disp(a);
    a(:,j) = [];
    disp(x);
    disp(a);
end
And it gives an error on the line a(:,j) = []; which says "Matrix index is out of range for deletion." Why? I don't understand. Help appreciated.
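For what it's worth, the error occurs because each deletion shrinks the matrix, so the fixed loop index eventually exceeds the number of remaining columns. A hypothetical numpy analogue shows the same failure:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
n = a.shape[1]
for j in range(n):
    try:
        a = np.delete(a, j, axis=1)  # like MATLAB's a(:,j) = []
    except IndexError:
        # after two deletions only 2 columns remain, so index 2 is invalid
        print(f"column {j} is out of range: only {a.shape[1]} columns left")
        break
```

The usual fix is to delete from the end (for j=n:-1:1 in MATLAB) or to collect indices and delete once after the loop.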

Separate initialization and display of plot
What is a proper way to separate the initialization and the display of plots in MATLAB? (I mean plots in a wide sense here; it could be plot, plot3, scatter, etc.) To give a concrete example, I have a pretty complex 3D visualization that uses sphere and mesh to draw a static sphere mesh and then scatter3 to plot a moving trajectory on the sphere. To be able to do this in real time I have implemented some simple optimizations, such as only updating the scatter3 object each frame. But the code is a bit messy, making it hard to add the additional features that I want, so I would like to improve the code separation. I also feel like it might sometimes be useful to return some kind of plot object from a function without displaying it, for example to combine it with other plots in a nice modular way.
An example of what I have in mind would be something like this:
function frames = spherePlot(solution, options)
    % Initialize sphere mesh and scatter objects, configure properties.
    ...
    % Configure axes, maybe figure as well.
    ...
    % Draw sphere.
    ...
    if options.display
        % Display figure.
    end
    for step = 1:solution.length
        % Update scatter object, redraw, save frame.
        % The frames are saved for use with 'movie' or 'VideoWriter'.
    end
end
Each step might also be separated out as a function.
So, what is a neat and proper way to do stuff like this? All documentation seems to assume that one wants to display everything right away.
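For comparison, in Python/matplotlib the same separation falls out naturally because figures are objects that render only when asked; a rough sketch (the function name and structure are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt
import numpy as np

def sphere_plot(trajectory, display=False):
    """Initialize the figure and return it together with the scatter
    artist that later frames would update; display only if asked."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    scat = ax.scatter(*trajectory[0])  # artist to be updated per frame
    if display:
        plt.show()
    return fig, scat

traj = np.random.rand(10, 3)
fig, scat = sphere_plot(traj)  # initialized but never displayed
print(len(fig.axes))
```

MATLAB allows a similar pattern by creating a figure with 'Visible','off', configuring the graphics objects, and returning their handles.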

How to set the image size larger than the default value in OpenCV
What is the default frame size of an OpenCV image streaming frame?
If I set the width and height of the frame larger than the default frame size, what happens?

How to read a grayscale image from a video with OpenCV?
I read all pictures from my pic directory and then convert each of them to grayscale with Canny edge detection before writing them all to a video. But when I use my video software to play it, it shows a green background, and I can't read video frames from it. Could someone show me how to solve this?
Sample code
import glob

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

fourcc = cv.VideoWriter_fourcc(*"I420")
out = cv.VideoWriter("t2.avi", fourcc, 1, (640, 480), 0)
for pic in glob.glob1("./pic/", "A*"):
    img = cv.imread(f"./pic/{pic}", 1)
    edge = cv.Canny(img, 100, 200)
    edge = cv.resize(edge, (640, 480))
    out.write(edge)
out.release()

# Can't read video frame here:
cap = cv.VideoCapture("t2.avi")
ret, frame = cap.read()
if ret:
    plt.imshow(frame)
else:
    print("end")
cap.release()
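One likely culprit is the mismatch between the single-channel Canny output and what the codec/player expects: replicating the edge map into three channels (what cv.cvtColor(edge, cv.COLOR_GRAY2BGR) does) and opening the writer with isColor=True usually fixes the green/garbled playback. A numpy-only sketch of the conversion:

```python
import numpy as np

# Stand-in for a Canny edge map: single-channel uint8, values 0 or 255.
edge = (np.random.rand(480, 640) > 0.9).astype(np.uint8) * 255

# Equivalent of cv.cvtColor(edge, cv.COLOR_GRAY2BGR): copy the one
# channel into all three BGR channels before out.write(bgr).
bgr = np.repeat(edge[:, :, None], 3, axis=2)
print(edge.shape, bgr.shape)  # (480, 640) (480, 640, 3)
```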

Is there a way to guarantee a certain number of lines detected with cv2.HoughLines()?
This question is an extension to my previous question asking about how to detect a pool table's corners. I have found the outline of a pool table, and I have managed to apply the Hough transform on the outline. The result of this Hough transform is below:
Unfortunately, the Hough transform returns multiple lines for a single table edge. I want the Hough transform to return four lines, each corresponding to an edge of the table, given any image of a pool table. I don't want to tweak the parameters for the Hough transform method manually (because the outline of the pool table might differ for each image of the pool table). Is there any way to guarantee four lines to be generated by cv2.HoughLines()?
Thanks in advance.
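cv2.HoughLines() itself cannot be forced to return exactly four lines, but since it returns lines sorted by accumulator votes, a common workaround is to keep the strongest line from each group of near-duplicates in (rho, theta) space until four remain. A numpy sketch with made-up tolerances and detections:

```python
import numpy as np

def strongest_four(lines, rho_tol=20.0, theta_tol=np.deg2rad(10)):
    """lines: (N, 2) array of (rho, theta) sorted by Hough votes, as
    returned (squeezed) by cv2.HoughLines. Keep the strongest line of
    each near-duplicate group until four distinct lines survive."""
    kept = []
    for rho, theta in lines:
        if all(abs(rho - r) > rho_tol or abs(theta - t) > theta_tol
               for r, t in kept):
            kept.append((rho, theta))
        if len(kept) == 4:
            break
    return kept

# Hypothetical detections: duplicates around the 4 table edges.
detections = np.array([
    [100, 0.0], [104, 0.02],       # left edge, detected twice
    [500, 0.0],                    # right edge
    [80, np.pi/2], [83, np.pi/2],  # top edge, detected twice
    [400, np.pi/2],                # bottom edge
])
print(strongest_four(detections))
```

The tolerances still depend on image scale, so this reduces rather than eliminates tuning; fitting the quadrilateral directly (e.g. cv2.approxPolyDP on the table contour) is another common route.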

Training an ML model on two different datasets before using test data?
So I have the task of using a CNN for facial recognition, i.e. for the classification of faces into different classes of people, each individual person being its own separate class. The training data I am given is very limited: I only have one image for each class. I have 100 classes (so I have 100 images in total, one image of each person). The approach I am using is transfer learning of the GoogLeNet architecture. However, instead of just training GoogLeNet on the images of the people I have been given, I want to first train it on a separate, larger set of different face images, so that by the time I train it on the data I have been given, my model has already learnt the features it needs to classify faces generally. Does this make sense / will this work?
Using MATLAB, as of now, I have changed the fully connected layer and the classification layer to train it on the Yale Face Database, which consists of 15 classes. I achieved a 91% validation accuracy using this database. Now I want to retrain this saved model on my provided data (100 classes with one image each). What would I have to do to this saved model to be able to train it on the new dataset without losing the features it has learned from the Yale database? Do I just change the last fully connected and classification layers again and retrain? Will this be pointless and mean I just lose all of the progress from before? I.e., will it make new weights from scratch, or will it use the previously learned weights to train even better on my new dataset? Or should I train the model with my training data and the Yale database all at once?
I have a separate set of test data provided for me which I do not have the labels for, and this is what is used to test the final model and give me my score/grade. Please help me understand if what I'm saying is viable or if it's nonsense; I'm confused, so I would appreciate being pointed in the right direction.

How can I modify the Dataset class to make Mask R-CNN work with multiple objects?
I am currently working on instance segmentation. I follow these two tutorials:
However, these two tutorials work perfectly with one class, like person + background. But in my case I have two classes, like person and car + background. I didn't find any resources about making Mask R-CNN work with multiple objects.
Notice that:
I am using PyTorch (torchvision): torch==1.10.0+cu111, torchvision==0.11.0+cu111, torchaudio==0.10.0
I am using Pascal VOC annotations
I used the segmentation class masks (not the XML files) + the images
and this is my dataset class
class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "img"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "imgMask"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "img", self.imgs[idx])
        mask_path = os.path.join(self.root, "imgMask", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
Can anyone help me?
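The key change for two foreground classes is the labels = torch.ones(...) line: the class id of each instance has to be derived from the mask instead of hard-coded to 1. A numpy sketch under an assumed encoding (instance ids 1-99 are persons, 100 and up are cars; substitute whatever your segmentation PNGs actually use):

```python
import numpy as np

# Hypothetical instance mask: 0 = background, 1..99 = person instances,
# 100.. = car instances (this encoding is an assumption for illustration).
mask = np.zeros((6, 6), dtype=np.uint8)
mask[0:3, 0:3] = 1    # a person instance
mask[3:6, 3:6] = 100  # a car instance

obj_ids = np.unique(mask)[1:]            # drop background, as in the post
labels = np.where(obj_ids < 100, 1, 2)   # 1 = person, 2 = car
masks = mask == obj_ids[:, None, None]   # one binary mask per instance
print(obj_ids, labels, masks.shape)
```

In the dataset class, the resulting per-instance labels array would replace torch.ones((num_objs,), ...) via torch.as_tensor(labels, dtype=torch.int64), and the model heads would be built with num_classes=3 (background + person + car).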

OpenCV - Correcting an Image for Misalignment
I discovered the other day that my camera lens or perhaps tilted sensor is causing an object centered in front of the camera to appear off center in the captured image. As close as I can get it, the red circle is the point that should be aligned directly to the camera lens, the orange circle is the center pixel of my image. In a perfect world these would be aligned. This is the original picture, not the undistorted version using the results of my camera calibration. This is using Python and OpenCV just to put that out there.
I used solvePnP() so that I can map pixel coordinates to real-world coordinates, and the results are better than I could have expected. The output in real-world coordinates is bang on, the distance it determined is bang on, and it shows a shifting z value that is consistent with the rotation of a plane caused by a center shift in this direction.
What I would like to do instead is reproject this image as if the camera was not skewed and taking a picture head on (black pixels around the outside are okay and expected). I don't want to translate the image, but am looking to fix for rotation. I think a reprojection to fix rotation and redo of solvePnP() will get me what I want? Really, the goal is to correct those real world coordinate outputs so that x and y remain accurate, but z will be constant (as you would expect with an orthogonal plane and camera that was "perfect"). I am not sure how to do the reprojection using the output from my first run of solvePnP(), and am hoping someone can confirm that doing another solvePnP() after reprojection would help me get consistent z values. Or perhaps there is a better way to do all of this?
Output based on some important coordinates in my image (outputs are in mm):
Center Pixel (orange circle): Shift in mm is confirmed as accurately as I can with a ruler.
u:319 v:239  [12.19730412] [13.78697338] [0.14210989]
Known True Center (red circle): A perfect mapping would be 0,0,0; we are pretty close.
u:359 v:195  [0.50044927] [0.11737794] [0.03856228]
Top Left Corner: Coordinates are within 1mm and z is pushed back as I would expect with the tilt displayed in the image.
u:0 v:0  [112.58447447] [62.540533] [0.43397294]
Bottom Right Corner: Coordinates are within 1mm and z is pulled forward as I would expect with the tilt displayed in the image.
u:639 v:479  [88.50417125] [90.43334375] [0.15017949]
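Undoing a pure rotation of the camera is a homography rather than a full reprojection: pixels map as x' ~ K R^-1 K^-1 x, which cv2.warpPerspective can apply to the image. A numpy sketch of the algebra with a made-up K and tilt (in practice R would come from the solvePnP pose):

```python
import numpy as np

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])

def rot_x(deg):
    a = np.deg2rad(deg)
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

R = rot_x(3.0)                  # hypothetical tilt recovered from solvePnP
H = K @ R.T @ np.linalg.inv(K)  # undo the rotation: x' ~ K R^-1 K^-1 x
# cv2.warpPerspective(img, H, (w, h)) would apply this to the image.

# Sanity check: a point seen by the tilted camera maps back to where
# an untilted camera would see it.
X = np.array([50.0, -30.0, 1000.0])  # a world point
x_tilted = K @ (R @ X); x_tilted /= x_tilted[2]
x_fixed = H @ x_tilted; x_fixed /= x_fixed[2]
x_ref = K @ X; x_ref /= x_ref[2]
print(np.round(x_fixed[:2], 6), np.round(x_ref[:2], 6))
```

After warping with this H, rerunning solvePnP on the corrected image should yield a near-identity rotation and a roughly constant z across the plane, which is the consistency check the question describes.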
I have thought of just using the displacement to calculate my two adjustment angles and using those as corrections, but that seems like a garbage way to do it when OpenCV clearly has ways of doing this. Also, I would not be correcting for any roll if I tried that.
Any help or suggestions would be greatly appreciated!
Thanks all!

Image point-to-point matching using intrinsics, extrinsics and third-party depth
I want to reopen a similar question to one which somebody posted a while ago with some major difference.
The previous post is https://stackoverflow.com/questions/52536520/image-matching-using-intrinsic-and-extrinsic-camera-parameters
and my question is: can I do the matching if I do have the depth? If it is possible, can someone describe a set of formulas which I have to solve to get the desired matching?
Here there is also some correspondence on slide 16/43: Depth from Stereo Lecture
What units are all the variables here in? Can someone clarify, please? Will this formula help me calculate the desired point-to-point correspondence? I know Z (mm, cm, m, whatever unit it is) and x_l (I guess this is the y coordinate of the pixel, so both x_l and x_r are on the same horizontal line; correct me if I'm wrong). I'm not sure whether T is in mm (or cm, m, i.e. a distance unit) and whether f is in pixels/mm (a distance unit) or something else.
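As a sanity check on the slide's formula Z = f*T/(x_l - x_r): in the usual convention f is in pixels, T (the baseline) is in whatever length unit you want Z back in, and x_l, x_r are the horizontal pixel coordinates of the same point on the same rectified scanline. A worked example with illustrative numbers:

```python
# Depth from disparity, Z = f * T / (x_l - x_r), illustrative numbers:
f = 700.0               # focal length in pixels (from the intrinsic matrix)
T = 60.0                # baseline; its unit (here mm) is the unit Z comes out in
x_l, x_r = 390.0, 348.0 # horizontal pixel coords of the same point, left/right
Z = f * T / (x_l - x_r) # disparity is 42 px
print(Z)  # 1000.0 (mm, because T was in mm)
```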
Thank you in advance.
EDIT:
So, as was said by @fana, the solution is indeed a projection.
To my understanding it is P(v) = K(Rv + t), where R is a 3 x 3 rotation matrix (obtained, for example, from calibration), t is the 3 x 1 translation vector, and K is the 3 x 3 intrinsic matrix.
It can be seen that there is translation only in one dimension (because in this situation the images are parallel, so the translation takes place only on the X-axis), but in other situations, as far as I understand, if the cameras are not on the same parallel line, there is also translation on the Y-axis. What is the translation on the Z-axis which I get through the calibration; is it some rescale factor due to different image resolutions, for example? Did I write the projection formula correctly for the general case?
I also want to ask about the whole idea.
Suppose I have 3 cameras: one with a large FOV which gives me color and depth for each pixel, let's call it the first (a 3D tensor, color stacked with the corresponding depth), and two with which I want to do stereo, let's call them the second and the third.
Instead of calibrating the two cameras, my idea is to use the depth from the first camera to calculate the xyz of pixel u,v of its corresponding color frame (that can be done easily), and then to project it onto the second and third images using the R,t found by calibration between the first camera and the second and the third, and using the K intrinsic matrices, so the projection matrix seems to be fully known. Am I right?
Assume for this case that the FOV of the color camera is big enough to include everything that can be seen from the second and third cameras.
That way, by projecting each x,y,z of the first camera, I can know where the corresponding pixels are on the two other cameras. Is that correct?
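The pipeline described above (back-project (u, v) with its depth through K1, then project into another view with that view's K and the inter-camera R, t) can be sketched as follows; all parameter values here are made up:

```python
import numpy as np

K1 = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # depth camera
K2 = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])  # second camera
R = np.eye(3)                    # extrinsics camera-1 -> camera-2 (assumed)
t = np.array([-50.0, 0.0, 0.0])  # e.g. a 50 mm horizontal offset

def backproject(u, v, Z, K):
    """Pixel + depth -> 3D point in that camera's frame: X = Z * K^-1 [u,v,1]."""
    return Z * np.linalg.inv(K) @ np.array([u, v, 1.0])

def project(X, K, R, t):
    """3D point -> pixel in the other camera: p ~ K (R X + t)."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]

X = backproject(400, 260, 1000.0, K1)  # depth in the same unit as t
uv2 = project(X, K2, R, t)
print(np.round(uv2, 2))  # where the same point lands in camera 2
```

So yes: with depth available, K1, K2, R and t fully determine the point-to-point correspondence, up to occlusion and calibration error.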

OpenCV - Determine a detected object's angle from the "center" of the frame
I am working on a machine vision project and need to determine the angle of an object in x and y relative to the center of the frame (center, in my mind, being where the camera is pointed). I originally did NOT do a camera calibration (I calculated the angle per pixel by taking a picture of a dense grid and doing some simple math). While doing some object tracking I noticed some strange behaviour, which I suspected was due to some distortion. I also noticed that an object that should be dead center of my frame was not; the camera had to be shifted or the angle changed for that to be true.
I performed a calibration in OpenCV and got a principal point of (363.31, 247.61) with a resolution of 640x480. The angle per pixel obtained by cv2.calibrationMatrixValues() was very close to what I had calculated, but up to this point I was assuming the center of the frame was based on 640/2, 480/2. I'm hoping that someone can confirm, but going forward, do I assume that my (0,0) in Cartesian coordinates is now at the principal point? Perhaps I can use my new camera matrix to correct the image so my original assumption is true? Or am I out to lunch and in need of some direction on how to achieve this.
Also, was my assumption of 640/2 correct, or should it technically have been (640-1)/2? Thanks all!
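Regarding the first question: yes, after calibration the optical axis pierces the image at the principal point, so angles are naturally measured from (cx, cy) rather than (w/2, h/2), via theta = atan((u - cx)/fx). A small sketch using the principal point reported in the post (the focal length here is made up):

```python
import numpy as np

# Principal point from the question's calibration; fx, fy are hypothetical.
fx, fy = 600.0, 600.0
cx, cy = 363.31, 247.61

def pixel_angles(u, v):
    """Angles of the ray through pixel (u, v) relative to the optical
    axis, which crosses the image at the principal point (cx, cy)."""
    return (np.degrees(np.arctan2(u - cx, fx)),
            np.degrees(np.arctan2(v - cy, fy)))

print(pixel_angles(cx, cy))        # (0.0, 0.0): the optical axis itself
print(pixel_angles(320.0, 240.0))  # the geometric center is NOT at 0 degrees
```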

Can not squeeze dim[4], expected a dimension of 1, got 3
I am working on a project where a model takes a 3D cuboid as input and reconstructs the 3D cuboid. I am getting the error:
ValueError: Can not squeeze dim[4], expected a dimension of 1, got 3 for '{{node Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[1]](remove_squeezable_dimensions/Squeeze)' with input shapes: [1,227,227,10,3].
I am using the model below. The input is 10 consecutive frames with height and width 227x227 and three channels (R, G, B).
model = Sequential()
model.add(Conv3D(filters=128, kernel_size=(11,11,1), strides=(4,4,1), padding='valid', input_shape=(227,227,10,3), activation='tanh'))
model.add(Conv3D(filters=64, kernel_size=(5,5,1), strides=(2,2,1), padding='valid', activation='tanh'))
model.add(ConvLSTM2D(filters=64, kernel_size=(3,3), strides=1, padding='same', dropout=0.4, return_sequences=True, recurrent_dropout=0.3))
model.add(ConvLSTM2D(filters=32, kernel_size=(3,3), strides=1, padding='same', dropout=0.3, return_sequences=True))
model.add(ConvLSTM2D(filters=64, kernel_size=(3,3), strides=1, padding='same', dropout=0.5, return_sequences=True))
model.add(Conv3DTranspose(filters=128, kernel_size=(5,5,1), strides=(2,2,1), padding='valid', activation='tanh'))
model.add(Conv3DTranspose(filters=3, kernel_size=(11,11,1), strides=(4,4,1), padding='valid', activation='tanh'))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

Model: "sequential"
_________________________________________________________________
Layer (type)                          Output Shape             Param #
=================================================================
conv3d (Conv3D)                       (None, 55, 55, 10, 128)  46592
conv3d_1 (Conv3D)                     (None, 26, 26, 10, 64)   204864
conv_lstm2d (ConvLSTM2D)              (None, 26, 26, 10, 64)   295168
conv_lstm2d_1 (ConvLSTM2D)            (None, 26, 26, 10, 32)   110720
conv_lstm2d_2 (ConvLSTM2D)            (None, 26, 26, 10, 64)   221440
conv3d_transpose (Conv3DTranspose)    (None, 55, 55, 10, 128)  204928
conv3d_transpose_1 (Conv3DTranspose)  (None, 227, 227, 10, 3)  46467
=================================================================
Total params: 1,130,179
Trainable params: 1,130,179
Non-trainable params: 0
_________________________________________________________________
And I am using this code to create the dataset and train the model:
training_data = []
target_data = []
training_data = np.load('/content/training.npy')
print(training_data.shape)  # (227, 227, 3, 14943)
frames = training_data.shape[3]
frames = frames - frames % 10
training_data = training_data[:, :, :, :frames]
training_data = training_data.reshape(-1, 227, 227, 10, 3)
training_data = np.expand_dims(training_data, axis=5)
target_data = training_data.copy()
epochs = 15
batch_size = 1
callback_save = ModelCheckpoint("saved_model.h5", monitor="mean_squared_error", save_best_only=True)
callback_early_stopping = EarlyStopping(monitor='loss', patience=3)
history = model.fit(training_data, target_data, batch_size=batch_size, epochs=epochs, callbacks=[callback_save, callback_early_stopping])
model.save("saved_model.h5")
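A possible shape fix, sketched in numpy with a small frame count: move the frame axis first, group into clips of 10, reorder to (N, 227, 227, 10, 3), and drop the expand_dims call (it adds the extra dimension that the squeeze error complains about). A plain reshape of the (227, 227, 3, frames) array would scramble pixels, which is why the transposes are needed:

```python
import numpy as np

data = np.zeros((227, 227, 3, 20), dtype=np.float32)  # stand-in for training.npy
frames = data.shape[3] - data.shape[3] % 10            # trim to a multiple of 10
data = data[..., :frames]

clips = np.transpose(data, (3, 0, 1, 2))      # -> (frames, 227, 227, 3)
clips = clips.reshape(-1, 10, 227, 227, 3)    # group into 10-frame clips
clips = np.transpose(clips, (0, 2, 3, 1, 4))  # -> (N, 227, 227, 10, 3)
print(clips.shape)  # (2, 227, 227, 10, 3)
# Do NOT expand_dims afterwards: the model expects exactly this rank-5 input.
```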

CV2 triangulatePoints() usage - Python OpenCV
I am trying to reconstruct the scene's 3d points using two stereo images. Here is what I am doing so far:
- Calibrated both cameras using calibrateCamera(), then used stereoCalibrate() to get the essential matrix.
- Used SIFT to get good matched features from both images of an object.
- Now I want to reconstruct the 3D points of that object using triangulatePoints(). I know that the matched features will be used as inputs; however, I am not sure what the other inputs (projection matrices) should be, and where to get them?
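The projection matrices are typically built as P1 = K1·[I|0] and P2 = K2·[R|t], with R and t taken from stereoCalibrate (and the matched points undistorted first). A numpy sketch with made-up calibration values:

```python
import numpy as np

# Hypothetical calibration results: K1, K2 from calibrateCamera;
# R, t from stereoCalibrate (mapping camera-1 coordinates to camera-2).
K1 = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
K2 = K1.copy()
R = np.eye(3)
t = np.array([[-60.0], [0.0], [0.0]])

P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])  # camera 1: K1 [I | 0]
P2 = K2 @ np.hstack([R, t])                         # camera 2: K2 [R | t]
print(P1.shape, P2.shape)  # (3, 4) (3, 4)

# With OpenCV these feed straight into:
#   X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4 x N homogeneous
#   X = (X_h[:3] / X_h[3]).T                             # N x 3 points
```

Note that triangulatePoints returns homogeneous coordinates, so the division by the fourth row is required, and the result is expressed in camera 1's coordinate frame in the baseline's unit.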

TSDF value when integrating around a thin surface
While doing 3D reconstruction, I was puzzled by the following scenario when computing the TSDF value of a voxel:
Suppose you have a thin piece of paper standing up and you take pictures around it. You want to predict the TSDF value of a voxel right behind it. When the paper is between the camera and the voxel, you get negative TSDF values. Yet when the voxel is between you and the paper (you are on the other side of the scene) you get positive TSDF values.
This just doesn't make sense, since it seems that I should not integrate the voxel when I get negative TSDF values. But I cannot know that the object is a thin piece of paper and not a thick box before I rotate. I have looked at several papers and articles discussing this, and they all have similar definitions.
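For context, the usual TSDF integration rule sidesteps this with truncation plus weighted averaging: observations that place the voxel far behind the surface (SDF below minus the truncation distance) are skipped entirely, and the remaining truncated values are averaged, so symmetric front/back views of a thin sheet settle near zero (i.e. "on the surface"). A tiny numeric sketch, all numbers illustrative:

```python
# Weighted TSDF fusion for one voxel 2 cm behind a thin sheet,
# truncation distance 5 cm.
trunc = 5.0
observations = [-2.0, -2.0, +2.0, +2.0]  # front views say -2, back views say +2

tsdf, weight = 0.0, 0.0
for sdf in observations:
    if sdf < -trunc:
        continue                          # far behind the surface: skip
    d = max(-1.0, min(1.0, sdf / trunc))  # truncate/normalize to [-1, 1]
    tsdf = (tsdf * weight + d) / (weight + 1)
    weight += 1
print(tsdf)  # symmetric views average out to ~0
```

So the negative values behind the paper are expected; the truncation band, not a per-view visibility decision, is what keeps them from corrupting the far side of a thick object.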