solvePnP C++ function gives different results
I am trying to understand how solvePnP works. I gave it the 8 corner points of an object (their 2D-3D correspondences) and the camera intrinsics. I get the result as
rvec
1.59 1.6 0.89
Tvec
18 3000 1400
When I reproject using the rvec and tvec output by solvePnP, the points are properly overlaid on the input image. But when I increment the value of one of my image points by one pixel (say from (400, 300) to (401, 300)), my rvec changes sign and my tvec varies drastically. Now it is
rvec
1.6 1.6 0.8
Tvec
9 900 5000
Reprojection also fails. I am curious how such a minor change in the input can cause this. How can it be solved?
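Two effects are usually in play here. First, with few points (and especially near-planar configurations), solvePnP's least-squares fit can jump to a second, mirror-like local minimum under a tiny input perturbation, which would explain the drastic tvec change; seeding the iterative solver with the previous pose (useExtrinsicGuess in the C++ API) typically keeps it on the right solution. Second, an rvec "changing sign" is not necessarily a different rotation at all: the axis-angle representation is not unique. The NumPy-only sketch below, using the rvec reported above, shows that negating the axis and replacing the angle theta by 2*pi - theta yields exactly the same rotation matrix:

```python
import numpy as np

def rodrigues(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta                      # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])      # cross-product matrix of k
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

r1 = np.array([1.59, 1.6, 0.89])
theta = np.linalg.norm(r1)
# The "flipped" vector encodes the same rotation: axis negated,
# angle replaced by 2*pi - theta.
r2 = -r1 / theta * (2 * np.pi - theta)

print(np.allclose(rodrigues(r1), rodrigues(r2)))  # True
```

So before concluding that the pose flipped, compare the Rodrigues matrices (or the reprojection error), not the raw vectors.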
See also questions close to this topic

How to optimize/speedup my minimum square error search to find the closest match?
I am making an OpenCV-Python script to continuously convert and report pixel data from BGR to RAL format. I have a database of RAL codes with corresponding BGR values. Each time I get a pixel, I search the entire database from the beginning, looking for the best match based on MSE.
Please point me in the direction of what I should research in order to be able to find a match faster.
import cv2
import numpy as np
import time

f = open('RAL colour codes.txt')
RAL = []
R = []
G = []
B = []
E = []
for line in f:
    line = line.strip()
    columns = line.split()
    RAL.append(int(columns[0]))
    R.append(int(columns[1]))
    G.append(int(columns[2]))
    B.append(int(columns[3]))
    E.append(0)

cap = cv2.VideoCapture(0)
if (not cap.isOpened()):
    print 'no camera'
    exit()

while (cap.isOpened()):
    ## t_start = time.clock()
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    h, w, c = frame.shape
    b = frame[h/2, w/2, 0]
    g = frame[h/2, w/2, 1]
    r = frame[h/2, w/2, 2]
    for i in range(len(RAL)):
        E[i] = (R[i] - r)**2 + (B[i] - b)**2 + (G[i] - g)**2
    j = E.index(min(E))
    print 'RAL code:', RAL[j]
    cv2.circle(frame, (w/2, h/2), 2, (255, 0, 255), 1)
    cv2.imshow('frame', frame)
    ## t_end = time.clock() - t_start
    ## print t_end
    k = cv2.waitKey(1)
    if (k == 27 or k == ord('q')):
        break

cap.release()
cv2.destroyAllWindows()
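One direction to research is vectorization: instead of a Python for-loop over the whole database per pixel, store the palette in a NumPy array once and compute all squared errors in a single broadcasted operation (for many queries per frame, a KD-tree such as scipy.spatial.cKDTree goes further). A minimal sketch with a small hypothetical palette, not the real RAL table:

```python
import numpy as np

# Hypothetical three-entry palette: rows are (RAL_code, R, G, B).
palette = np.array([
    [3020, 204, 6, 5],     # traffic red
    [6018, 87, 166, 57],   # yellow green
    [5015, 34, 113, 179],  # sky blue
])
codes = palette[:, 0]
rgb = palette[:, 1:].astype(np.int32)

def closest_ral(r, g, b):
    """Return the RAL code whose RGB entry has the smallest squared error."""
    err = ((rgb - np.array([r, g, b])) ** 2).sum(axis=1)  # one vectorized pass
    return codes[np.argmin(err)]

print(closest_ral(200, 10, 10))  # -> 3020
```

The per-pixel cost drops from a Python loop over N rows to a single C-level pass, which is usually the difference between a stuttering and a real-time preview.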

Screen video recorder activated and deactivated during C++ program execution
So far I have only found glc (which seems dead) and OpenCV, both of which are a bit more "programming-oriented" than "GUI-oriented".
I am looking for existing Linux software in which I could select an area of recording beforehand, and which would then wait for my C++ program to give it orders to start and stop recording, several times, because of a loop.
At each new stop order, it would save output_i into a folder defined beforehand, and I would like to be able to control the file name from C++ as well.
Do you know of anything that would be worth investigating?
Regards

How to save a TensorFlow model in protobuf format?
Please help me with my problem. I want to save my neural network in protobuf (.pb) format for the OpenCV DNN module. As input I have 3 files: .meta, .data, .index. As output I need .pb and .pbtxt files.
Code, for example:
train_data = np.load(TEST_PACK)

tf.reset_default_graph()
convnet = input_data(shape=[None, SIZE, SIZE, 3], name='input')
convnet = conv_2d(convnet, 32, 10, activation='relu')
convnet = max_pool_2d(convnet, 30)
convnet = conv_2d(convnet, 64, 10, activation='relu')
convnet = max_pool_2d(convnet, 10)
convnet = conv_2d(convnet, 128, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)
convnet = fully_connected(convnet, 3, activation='softmax')
convnet = regression(convnet, optimizer='adam', learning_rate=LR,
                     loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(convnet, tensorboard_dir='log')
print('model loaded!')

train = train_data[:500]
test = train_data[500:]
X = np.array([i[0] for i in train]).reshape(-1, SIZE, SIZE, 3)
Y = [i[1] for i in train]
test_x = np.array([i[0] for i in test]).reshape(-1, SIZE, SIZE, 3)
test_y = [i[1] for i in test]
model.fit({'input': X}, {'targets': Y}, n_epoch=5,
          validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=500, show_metric=True, run_id=MODEL_NAME)
I am new to neural networks, so I apologize if I have asked nonsense.

How to avoid colour banding in polynomial interpolation background extraction
I have a code that takes some "nodes", or "samples", in an image. With these nodes I build a synthetic background image using polynomial interpolation. My problem is that for some images with low dynamic range, I get colour banding in my result, even though my source does not have any colour banding.
The image source is a 16-bit unsigned short that I convert into 64-bit (double) before the main computation. If I were staying in 16-bit I would understand the issue, but here I don't.
Is there a way to avoid this colour banding?
Here is an example of this banding on a monochrome image:
/* C contains background function */
#define C(i) (gsl_vector_get(c, (i)))

static double poly_4(gsl_vector *c, double x, double y) {
    double value = C(0) * 1.0 + C(1) * x + C(2) * y
        + C(3) * x * x + C(4) * y * x + C(5) * y * y
        + C(6) * x * x * x + C(7) * x * x * y + C(8) * x * y * y + C(9) * y * y * y
        + C(10) * x * x * x * x + C(11) * x * x * x * y + C(12) * x * x * y * y
        + C(13) * x * y * y * y + C(14) * y * y * y * y;
    return value;
}

static double poly_3(gsl_vector *c, double x, double y) {
    double value = C(0) * 1.0 + C(1) * x + C(2) * y
        + C(3) * x * x + C(4) * y * x + C(5) * y * y
        + C(6) * x * x * x + C(7) * x * x * y + C(8) * x * y * y + C(9) * y * y * y;
    return value;
}

static double poly_2(gsl_vector *c, double x, double y) {
    double value = C(0) * 1.0 + C(1) * x + C(2) * y
        + C(3) * x * x + C(4) * y * x + C(5) * y * y;
    return value;
}

static double poly_1(gsl_vector *c, double x, double y) {
    double value = C(0) * 1.0 + C(1) * x + C(2) * y;
    return value;
}

static double *computeBackground(GSList *list, size_t width, size_t height, poly_order order) {
    size_t n, i, j;
    size_t k = 0;
    double chisq, pixel;
    double row, col;
    gsl_matrix *J, *cov;
    gsl_vector *y, *w, *c;

    n = g_slist_length(list);

    int nbParam;
    switch (order) {
    case POLY_1:
        nbParam = NPARAM_POLY1;
        break;
    case POLY_2:
        nbParam = NPARAM_POLY2;
        break;
    case POLY_3:
        nbParam = NPARAM_POLY3;
        break;
    case POLY_4:
    default:
        nbParam = NPARAM_POLY4;
    }

    // J is the Jacobian
    // y contains data (pixel intensity)
    J = gsl_matrix_calloc(n, nbParam);
    y = gsl_vector_calloc(n);
    w = gsl_vector_calloc(n);
    c = gsl_vector_calloc(nbParam);
    cov = gsl_matrix_calloc(nbParam, nbParam);

    while (list) {
        background_sample *sample = (background_sample *) list->data;

        col = sample->position.x;
        row = sample->position.y;
        pixel = sample->median;
        // Here it is a bit sketchy: if there is no value to report in a box
        // (because the threshold is too low, for example), I just skip the
        // initialization of J and y. gsl automatically discards the
        // non-assigned values during the minimization. I tested it with
        // Matlab and it works fine; the results agree.
        if (pixel < 0) continue;

        gsl_matrix_set(J, k, 0, 1.0);
        gsl_matrix_set(J, k, 1, col);
        gsl_matrix_set(J, k, 2, row);

        if (order != POLY_1) {
            gsl_matrix_set(J, k, 3, col * col);
            gsl_matrix_set(J, k, 4, col * row);
            gsl_matrix_set(J, k, 5, row * row);
        }

        if (order == POLY_3 || order == POLY_4) {
            gsl_matrix_set(J, k, 6, col * col * col);
            gsl_matrix_set(J, k, 7, col * col * row);
            gsl_matrix_set(J, k, 8, col * row * row);
            gsl_matrix_set(J, k, 9, row * row * row);
        }

        if (order == POLY_4) {
            gsl_matrix_set(J, k, 10, col * col * col * col);
            gsl_matrix_set(J, k, 11, col * col * col * row);
            gsl_matrix_set(J, k, 12, col * col * row * row);
            gsl_matrix_set(J, k, 13, col * row * row * row);
            gsl_matrix_set(J, k, 14, row * row * row * row);
        }

        gsl_vector_set(y, k, pixel);
        gsl_vector_set(w, k, 1.0);

        /* go to the next item */
        list = list->next;
        k++;
    }

    // Must turn off error handler or it aborts on error
    gsl_set_error_handler_off();

    gsl_multifit_linear_workspace *work = gsl_multifit_linear_alloc(n, nbParam);
    int status = gsl_multifit_wlinear(J, w, y, c, cov, &chisq, work);
    if (status != GSL_SUCCESS) {
        printf("GSL multifit error: %s\n", gsl_strerror(status));
        return NULL;
    }
    gsl_multifit_linear_free(work);
    gsl_matrix_free(J);
    gsl_vector_free(y);
    gsl_vector_free(w);

    // Calculation of the background with the same dimensions as the input matrix.
    double *background = malloc(height * width * sizeof(double));
    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            switch (order) {
            case POLY_1:
                pixel = poly_1(c, (double) j, (double) i);
                break;
            case POLY_2:
                pixel = poly_2(c, (double) j, (double) i);
                break;
            case POLY_3:
                pixel = poly_3(c, (double) j, (double) i);
                break;
            default:
            case POLY_4:
                pixel = poly_4(c, (double) j, (double) i);
            }
            background[j + i * width] = pixel;
        }
    }
    return background;
}
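One common cause, independent of the fit itself, is quantizing the smooth double-precision background back to an integer image for display: a shallow gradient collapses onto a few integer levels and shows up as bands. A standard remedy is to add sub-LSB noise (dithering) before rounding. A small NumPy illustration of the idea, on a hypothetical ramp rather than a real fitted background:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shallow ramp standing in for a smooth fitted background:
# 4 pixels of dynamic range spread across 256 columns.
h, w = 4, 256
bg = np.linspace(1000.0, 1004.0, w)[None, :].repeat(h, axis=0)

# Rounding straight back to 16-bit collapses the ramp onto a few levels,
# producing hard vertical bands (every row is identical).
banded = np.round(bg).astype(np.uint16)

# Adding less than one LSB of noise before rounding breaks the band edges
# up spatially (dithering) while keeping the mean level unchanged.
dithered = np.round(bg + rng.uniform(-0.5, 0.5, bg.shape)).astype(np.uint16)

print((banded[0] == banded[1]).all(), (dithered[0] == dithered[1]).all())
```

If the banding appears only in the displayed or exported result, dithering at the final quantization step is usually enough; the double-precision fit itself need not change.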

Why is the result from cornerHarris being used to calculate euclidean distance?
I had read in a question that cornerHarris returns a matrix of confidence scores. I then came across the following snippet, which does not make sense to me if the method returns a confidence matrix. Here is the snippet.
# GET CORNER MATRIX
dst = cv2.cornerHarris(gray, blockSize, ksize, k)
max = np.max(dst)
# APPLY THRESHOLD
loc = np.where(dst >= 0.01 * max)
# Get top-left, bottom-left, top-right, bottom-right corners from image
corners = get4CornersOfImage()
locations = zip(*loc[::-1])
# Remove points closer to image corners
for corner in corners:
    # WHAT IS HAPPENING HERE?
    locations = [location for location in locations
                 if euclidean_distance(location, corner) > 10]
What is happening in the last statement? If cornerHarris returns confidence scores, how can they be used to calculate a Euclidean distance? I could not understand this. Am I missing some concept?
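What the snippet relies on is that np.where applied to the thresholded response map returns pixel coordinates, not scores; the distances are then computed between those (x, y) positions and the image corners. A small self-contained illustration (euclidean_distance is assumed here to be a plain point-distance helper):

```python
import numpy as np

dst = np.array([[0.9, 0.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.8]])   # stand-in corner-response map

# np.where on the thresholded map yields (row, col) *coordinates*,
# not scores -- that is what the distances are computed on.
loc = np.where(dst >= 0.01 * dst.max())
locations = list(zip(*loc[::-1]))    # reversed to get (x, y) pairs
print(locations)  # -> [(0, 0), (2, 2)]

def euclidean_distance(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

corner = (0, 0)                      # hypothetical image corner
kept = [p for p in locations if euclidean_distance(p, corner) > 1]
print(kept)  # -> [(2, 2)]
```

So the last statement is filtering *detected corner positions*, discarding those that lie too close to the image's own four corners.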
Convolution kernel using a user-defined function: how to deal with negative pixel values?
I've declared a function that calculates the convolution of an image with an arbitrary 3x3 kernel. I also created a script that prompts the user to select an image and enter a convolution kernel of their choice. However, I do not know how to deal with the negative pixel values that arise for various kernels. How would I implement a condition in my script to handle these negative values?
This is my function:
function y = convul(x, m, H, W)
y = zeros(H, W);
for i = 2:(H-1)
    for j = 2:(W-1)
        Z1 = (x(i-1,j-1)) * (m(1,1));
        Z2 = (x(i-1,j))   * (m(1,2));
        Z3 = (x(i-1,j+1)) * (m(1,3));
        Z4 = (x(i,j-1))   * (m(2,1));
        Z5 = (x(i,j))     * (m(2,2));
        Z6 = (x(i,j+1))   * (m(2,3));
        Z7 = (x(i+1,j-1)) * (m(3,1));
        Z8 = (x(i+1,j))   * (m(3,2));
        Z9 = (x(i+1,j+1)) * (m(3,3));
        y(i,j) = Z1 + Z2 + Z3 + Z4 + Z5 + Z6 + Z7 + Z8 + Z9;
    end
end
And this is the script that I've written that prompts the user to enter an image and select a kernel of their choice.
[file, path] = uigetfile('*.bmp');
x = imread(fullfile(path, file));
x_info = imfinfo(fullfile(path, file));
W = x_info.Width;
H = x_info.Height;
L = x_info.NumColormapEntries;
prompt = 'Enter a convolution kernel m: ';
m = input(prompt)/9;
y = convul(x, m, H, W);
imshow(y, [0, (L-1)]);
I've tried using the absolute value of the convolution, as well as attempting to locate the negatives in the output image, but nothing worked.
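Two common ways to handle the negative values are clipping to the displayable range or rescaling the whole output into it; which is appropriate depends on whether the sign of the filter response matters to you. A small NumPy sketch of both options on a hypothetical 4x4 image with a Sobel-like kernel (note that, like the MATLAB code above, this does not flip the kernel, so strictly speaking it is a correlation):

```python
import numpy as np

def conv3x3(img, kernel):
    """Correlation-style 3x3 filter over the interior pixels (like convul)."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.sum(img[i-1:i+2, j-1:j+2] * kernel)
    return out

img = np.array([[0, 0, 0, 0],
                [0, 255, 255, 0],
                [0, 255, 255, 0],
                [0, 0, 0, 0]], dtype=np.float64)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
y = conv3x3(img, sobel_x)          # contains both positive and negative values

# Option 1: clip -- keeps absolute intensities, discards negative responses.
clipped = np.clip(y, 0, 255)
# Option 2: rescale -- maps the full signed range into [0, 255] for display.
rescaled = (y - y.min()) / (y.max() - y.min() + 1e-12) * 255

print(clipped.min(), rescaled.min(), rescaled.max())
```

In MATLAB the equivalents would be `max(y, 0)` (or `min(max(y,0), L-1)`) for clipping and `mat2gray(y)` for rescaling before `imshow`.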

What does cv2.cornerHarris return?
I have been reading about cornerHarris in OpenCV. I read through the documentation, but I am not sure what this function returns. While reading through the examples, I came across this statement:
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
img[dst > 0.01 * dst.max()] = [0, 0, 255]
I just cannot understand the second statement above, maybe because I do not understand what cornerHarris is actually returning. I do sense that there is some kind of threshold being applied, but I cannot explain it.
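cornerHarris returns a floating-point response map with the same height and width as the input, one corner score per pixel. The second statement builds a boolean mask of pixels whose score exceeds 1% of the maximum score and, via boolean indexing, paints exactly those pixels red. A NumPy-only illustration with a hand-made stand-in for the response map:

```python
import numpy as np

# Stand-in for the cornerHarris response map: one float score per pixel.
dst = np.array([[0.0, 0.1, 0.0],
                [0.0, 0.9, 0.0],
                [0.0, 0.0, 0.05]])

img = np.zeros((3, 3, 3), dtype=np.uint8)  # a tiny BGR image

# Boolean mask of pixels whose score exceeds 1% of the maximum score
mask = dst > 0.01 * dst.max()
img[mask] = [0, 0, 255]                    # paint those pixels red (BGR)

print(mask.sum())   # number of pixels marked as corners
print(img[1, 1])    # -> [  0   0 255]
```

So no corner coordinates are returned directly; the threshold on the score map is what turns "corner-ness per pixel" into a set of marked pixels.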

How to get class_to_idx map for Custom Dataset in Pytorch
I am attempting transfer learning with a CNN (vgg19) on the Oxford102 category dataset, consisting of 8189 samples of flowers labeled from 1 through 102. Instead of loading the data with ImageFolder, which requires the tedious process of structuring my data into train, valid and test folders with each class as a subfolder holding my images, I decided to load it using a custom Dataset class, following
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
A subset of the code I wrote up for my project
data_dir_path = 'data/images/'
labels_path = 'data/imagelabels.mat'
class_label_path = 'data/class_label_map'

# standard normalization for Imagenet models
# mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(45),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

class MyDataset(Dataset):
    def __init__(self, image_labels, data_dir, transform=None):
        """
        :param image_labels: path to our labels
        :param data_dir: the directory which houses our images
        :param transform: apply any transform on our sample
        """
        self.image_labels = image_labels
        self.root_dir = data_dir
        self.transform = transform

    def __len__(self):
        label_dict = scipy.io.loadmat(self.image_labels)
        return len(label_dict['labels'][0])

    def __getitem__(self, idx):
        image_path_list = [os.path.join(self.root_dir, filename)
                           for filename in os.listdir(self.root_dir)]
        image = Image.open(image_path_list[idx])
        label_dict = scipy.io.loadmat(self.image_labels)
        label_list = label_dict['labels'][0]
        # label index for pytorch should start from zero,
        # so subtract 1 from each class
        label_list[:] = [i - 1 for i in label_list]
        label = label_list[idx]
        if self.transform:
            image = self.transform(image)
        return image, label

image_datasets = {x: MyDataset(image_labels=labels_path,
                               data_dir=data_dir_path,
                               transform=data_transforms[x])
                  for x in ['train', 'valid', 'test']}
my model class instance inherited from nn.Module is
classifier = Neural(25088, [4096], 102)
I have subtracted 1 from my label list since PyTorch expects labels to start from 0, thus 0 through 101 for 102 labels. Correct me if I am wrong, because I get a "current target >= 0 and current target <= n_classes failed" error if I don't subtract one.
class_label_map is a dict which maps class labels to flowers names
{
    "1": "pink primrose",
    "2": "hard-leaved pocket orchid",
    "3": "canterbury bells",
    "4": "sweet pea",
    "5": "english marigold",
    "6": "tiger lily",
    "7": "moon orchid",
    "8": "bird of paradise",
    "9": "monkshood",
    "10": "globe thistle",
    "11": "snapdragon",
}
My big problem is getting a class_to_idx mapping. How do I do this? My flower names do not match the images when I visualize them; I get totally different flower names for my flowers.
I first created a mapping using a dict whose keys are my original labels (before subtracting 1) and whose values are the labels after. For example:
class_to_idx = {77:76, 73:72, 1:0, 65:64......102:101...65:54}
This is beyond doubt wrong, as I was getting totally wrong labels for my images.
The first label of an image in my data_dir_path = 'data/images' is 77; upon subtracting one I get 76. Would this mean the index for all labels 76 is 0? And if the next class is 72, would the index for all classes 72 be 1? So...
class_to_idx = {76:0, 72:1, 0:2, 65:3....and so on}
ImageFolder has a class_to_idx attribute, but using it on my Dataset throws an error:
image_datasets['train'].class_to_idx
AttributeError: 'MyDataset' object has no attribute 'class_to_idx'
This is obviously the case because my Dataset class does not contain any such attribute.
But seriously, how do I map my classes to my indices? This is super important, as I need to checkpoint my model and load it back again to make predictions. It may sound really silly, but I really don't know what to do here. Please help!
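For reference, torchvision's ImageFolder builds its class_to_idx by sorting the distinct class names and enumerating them; a custom Dataset can mimic that by collecting its labels once and keeping both directions of the mapping (and saving them in the checkpoint). A sketch with a small hypothetical subset of labels:

```python
# Sketch: build class_to_idx the way torchvision's ImageFolder does --
# sort the distinct class labels and enumerate them. Labels here are a
# hypothetical subset of the original 1..102 Oxford102 labels.
labels = [77, 73, 1, 65, 77, 102]

classes = sorted(set(labels))
class_to_idx = {c: i for i, c in enumerate(classes)}      # label -> index
idx_to_class = {i: c for c, i in class_to_idx.items()}    # index -> label

print(class_to_idx)  # -> {1: 0, 65: 1, 73: 2, 77: 3, 102: 4}
```

With this, __getitem__ returns class_to_idx[label] instead of label - 1, and idx_to_class maps a predicted index back to the original label before looking the flower name up in class_label_map. (Note ImageFolder sorts folder *names* lexicographically, so for string labels "10" sorts before "2"; the numeric sort above avoids that pitfall.)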

OpenCV solvePnP: perfect coordinates with Blender are not correctly transforming
I'm trying to use OpenCV's solvePnPRansac method to locate the camera position around a 3D model loaded in Blender. However, manually putting in points gives me an incorrect t/r vector and cv2.projectPoints gets the image coordinates wrong, so I suspect something about the input is messed up.
I manually selected 10 points on the 3D model in Blender's arbitrary units, and used those as the world coordinates. I found the corresponding locations of those points in the 960x540 rendered image, and those are my input 2D coordinates (by pixels).
The camera's focal length and sensor size are set to 40mm and 40x18mm. Since these are in millimetres, I manually approximated their lengths in pixels as about 916 pixels and 412 pixels; however, both sets of values give me coordinates that are completely off.
The output of projectPoints is coordinates that range from 13,000 to 20,000. I think my input values are incorrect, but how do I fix this?
size = (960, 540)

image_points = np.array([
    (315, 279),  # 1
    (440, 273),  # 2
    (407, 386),  # 3
    (369, 372),  # 4
    (440, 317),  # 5
    (457, 373),  # 6
    (439, 385),  # 7
    (369, 337),  # 8
    (407, 358),  # 9
    (313, 291),  # 10
], dtype="double")

model_points = np.array([
    (0.77644, 0.63806, 2.55822),  # stripe of adidas 1/3 down
    (0.75437, 0.49247, 2.75569),  # first curve up on the logo
    (0.82970, 0.17763, 1.58873),  # lower right part of hole in a
    (0.82900, 0.16946, 1.68983),  # 1/3 down left side of i
    (0.84466, 0.52011, 2.37269),  # bottom of gold circle left of F
    (0.67476, 0.59525, 1.68853),  # left side of thick top 's'
    (0.74288, 0.46100, 1.58926),  # inwards (left) of right below e joint
    (0.82560, 0.14537, 2.07217),  # middle of f top bar, left of upwards
    (0.83161, 0.17382, 1.88820),  # middle line of a near bottom of Y
    (0.78115, 0.66043, 2.46363)   # 1/4 up 2nd d circle in adidas
], dtype="double")

f = 916.8
sx = 916.8
sy = 412.56
width, height = (size[1], size[0])
camera_matrix = np.array([
    [width * f / sx, 0, width / 2],
    [0, height * f / sy, height / 2],
    [0, 0, 1]
], dtype="double")
dist_coeffs = np.zeros((4, 1))

s, t, r, i = cv2.solvePnPRansac(model_points, image_points, camera_matrix, dist_coeffs)
img, x = cv2.projectPoints(model_points, r, t, camera_matrix, dist_coeffs)
print(img)
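One thing worth checking is the intrinsic matrix. With square pixels, the focal length in pixels is usually computed as f_mm / sensor_width_mm * image_width_px (Blender's default is horizontal sensor fit), and fy should equal fx; mixing a ~916-pixel fx with a ~412-pixel fy, and swapping width and height, could by itself explain wildly wrong projections. A sketch under those assumptions:

```python
# Assumptions: square pixels, horizontal sensor fit, principal point at
# the image centre -- the common way to derive intrinsics from Blender's
# focal length and sensor size.
f_mm, sensor_w_mm = 40.0, 40.0
img_w, img_h = 960, 540

fx = f_mm / sensor_w_mm * img_w   # focal length in pixels
fy = fx                           # square pixels -> same vertical focal
cx, cy = img_w / 2.0, img_h / 2.0

camera_matrix = [[fx, 0, cx],
                 [0, fy, cy],
                 [0,  0,  1]]
print(fx, fy)  # -> 960.0 960.0
```

Separately, note that cv2.solvePnPRansac returns (retval, rvec, tvec, inliers) in that order, so it is worth double-checking which unpacked variable is passed to projectPoints as the rotation and which as the translation.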

Consider known values when estimating camera pose
I am trying to estimate the pose of my camera relative to 4 known world coordinates. Due to constraints in my system, some details of the camera's pose are known and fixed: namely, its vertical offset, pitch and roll are known constants. I am wondering how I can use this information to improve the result of OpenCV's solvePnP algorithm.
Currently, I am finding that, without this information, the slightest changes in image points can lead to dramatic changes in results. For example, I place the camera at known pose:
X = 2ft
Y = 1ft
Z = 5ft
ROLL = 0 degrees
PITCH = 180 degrees
YAW = 0 degrees
Then, I let the camera track the 4 image points and compute the pose, I get the following:
{48.0, 138.0} {40.0, 136.0} {45.0, 114.0} {54.0, 114.0}
X = 2.4235989629072314
Y = 1.2370888865388812
Z = 4.717115774644273
ROLL = 7.555688896466208
PITCH = 165.9771402205544
YAW = 1.5292313860396367
=============================
{48.0, 138.0} {40.0, 136.0} {45.0, 114.0} {53.0, 114.0}
X = 2.864381855099463
Y = 0.9925235082316144
Z = 4.605675917036408
ROLL = 7.962130849477691
PITCH = 168.14583005865828
YAW = 6.697852245666419
=============================
{48.0, 137.0} {40.0, 136.0} {46.0, 112.0} {53.0, 114.0}
X = 3.3067589122064986
Y = 0.2727418953073936
Z = 4.393018415532629
ROLL = 6.929120013468928
PITCH = 168.6014586711855
YAW = 59.587627235667476
public VisionProcessor() {
    // Define bottom right corner of left vision target as origin
    mObjectPoints = new MatOfPoint3f(
        new Point3(0.0, 0.0, 0.0),       // bottom right
        new Point3(1.9363, 0.5008, 0.0), // bottom left
        new Point3(0.5593, 5.8258, 0.0), // top left
        new Point3(1.377, 5.325, 0.0)    // top right
    );

    mCameraMatrix = Mat.eye(3, 3, CvType.CV_64F);
    mCameraMatrix.put(0, 0, 2.5751292067328632e+02);
    mCameraMatrix.put(0, 2, 1.5971077914723165e+02);
    mCameraMatrix.put(1, 1, 2.5635071715912881e+02);
    mCameraMatrix.put(1, 2, 1.1971433393615548e+02);

    mDistortionCoefficients = new MatOfDouble(
        2.9684613693070039e-01,
        1.4380252254747885e+00,
        2.2098421479494509e-03,
        3.3894563533907176e-03,
        2.5344430354806740e+00
    );
}

public void update(double[] cornX, double[] cornY) {
    MatOfPoint2f imagePoints = new MatOfPoint2f(
        mPointFinder.getBottomRight(),
        mPointFinder.getBottomLeft(),
        mPointFinder.getTopLeft(),
        mPointFinder.getTopRight()
    );

    Mat rotationVector = new MatOfDouble(Math.PI, 0, 0);
    Mat translationVector = new MatOfDouble(24, 0, 60);
    Calib3d.solvePnP(mObjectPoints, imagePoints, mCameraMatrix,
            mDistortionCoefficients, rotationVector, translationVector);

    Mat rotationMatrix = new Mat();
    Calib3d.Rodrigues(rotationVector, rotationMatrix);

    Mat projectionMatrix = new Mat(3, 4, CvType.CV_64F);
    projectionMatrix.put(0, 0,
        rotationMatrix.get(0, 0)[0], rotationMatrix.get(0, 1)[0],
        rotationMatrix.get(0, 2)[0], translationVector.get(0, 0)[0],
        rotationMatrix.get(1, 0)[0], rotationMatrix.get(1, 1)[0],
        rotationMatrix.get(1, 2)[0], translationVector.get(1, 0)[0],
        rotationMatrix.get(2, 0)[0], rotationMatrix.get(2, 1)[0],
        rotationMatrix.get(2, 2)[0], translationVector.get(2, 0)[0]
    );

    Mat cameraMatrix = new Mat();
    Mat rotMatrix = new Mat();
    Mat transVect = new Mat();
    Mat rotMatrixX = new Mat();
    Mat rotMatrixY = new Mat();
    Mat rotMatrixZ = new Mat();
    Mat eulerAngles = new Mat();
    Calib3d.decomposeProjectionMatrix(projectionMatrix, cameraMatrix, rotMatrix,
            transVect, rotMatrixX, rotMatrixY, rotMatrixZ, eulerAngles);

    System.out.println("X = " + translationVector.get(0, 0)[0] / 12.0);
    System.out.println("Y = " + translationVector.get(1, 0)[0] / 12.0);
    System.out.println("Z = " + translationVector.get(2, 0)[0] / 12.0);
    System.out.println("ROLL = " + eulerAngles.get(2, 0)[0]);
    System.out.println("PITCH = " + eulerAngles.get(0, 0)[0]);
    System.out.println("YAW = " + eulerAngles.get(1, 0)[0]);
    System.out.println("=============================");
}
I expect the output of the system to closely approximate the real-world position, but the data shows that the slightest change in image points can dramatically affect the resulting pose.

Recovering pose from 3D triangulated points
I have a stereo camera setup where I use the OpenCV method cv::triangulatePoints to detect the checkerboard corners in 3D space. I was wondering what the method is to take these triangulated points and accurately estimate a 3D pose of the checkerboard.
One method I have encountered was found here: feeding the points into a PnP algorithm.
While this is a simple solution to my problem, I am not sure it is completely correct, as most of my experience with this method is with single-camera use.
Any insight would be appreciated!
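An alternative to PnP that uses the triangulated 3D points directly is to fit a rigid transform between the checkerboard's known corner coordinates (in the board frame) and the triangulated points, e.g. with the Kabsch/Procrustes algorithm via SVD. A NumPy sketch with hypothetical, noise-free points (made non-planar here only so the example is numerically unambiguous; the method works for planar boards too):

```python
import numpy as np

def rigid_transform(A, B):
    """Kabsch: find R, t minimizing ||R @ A + t - B|| for 3xN point sets."""
    cA, cB = A.mean(axis=1, keepdims=True), B.mean(axis=1, keepdims=True)
    H = (A - cA) @ (B - cB).T                # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cB - R @ cA
    return R, t

# Hypothetical board-frame corners and their "triangulated" counterparts
board = np.array([[0, 1, 0, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1]], dtype=float)
Rz = np.array([[0, -1, 0],
               [1, 0, 0],
               [0, 0, 1]], dtype=float)      # 90 degrees about z
measured = Rz @ board + np.array([[2], [3], [4]])

R, t = rigid_transform(board, measured)
print(np.allclose(R, Rz), np.allclose(t, [[2], [3], [4]]))  # True True
```

With real triangulated data the fit is least-squares over all corners, so it averages out per-point triangulation noise; it is a reasonable cross-check against (or replacement for) running PnP on one camera's 2D detections.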