Efficient least squares fit of 2D polynomials to data?
I have some data on a regular 2D grid that I would like to decompose into an orthogonal basis function set. What is the most efficient way to do the fit?
Here's some sample code:
import numpy as np
sze = (128,128)
x, y = np.linspace(-1, 1, sze[1]), np.linspace(-1, 1, sze[0])
xv, yv = np.meshgrid(x, y)
rho = np.sqrt(xv**2 + yv**2)
phi = np.arctan2(yv, xv)
data = (rho * np.cos(phi))
data[rho > 1] = 0
# now have some synthetic data; need to fit it to a basis-function set with single-index members:
# 0: 1
# 1: rho * cos(phi)
# 2: rho * sin(phi)
# ...
# 24: 70 rho^8 - 140 rho^6 + 90 rho^4 - 20 rho^2 + 1
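The members listed look like Zernike polynomials. One efficient approach, sketched below under that assumption with just the first three members, is to evaluate each basis function on the grid, flatten the samples inside the unit disc into a design matrix, and solve the whole fit at once with `np.linalg.lstsq` (note the grid needs `linspace(-1, 1, …)`; with `linspace(1, 1, …)` every point has rho > 1 and the data is identically zero):

```python
import numpy as np

# Rebuild the grid from the question (assuming the axes span -1..1).
sze = (128, 128)
x, y = np.linspace(-1, 1, sze[1]), np.linspace(-1, 1, sze[0])
xv, yv = np.meshgrid(x, y)
rho = np.sqrt(xv**2 + yv**2)
phi = np.arctan2(yv, xv)
data = rho * np.cos(phi)
data[rho > 1] = 0

# First few basis members from the question (a hypothetical subset).
basis = [
    np.ones_like(rho),   # 0: 1
    rho * np.cos(phi),   # 1: rho cos(phi)
    rho * np.sin(phi),   # 2: rho sin(phi)
]

mask = rho <= 1  # Zernike terms are only defined on the unit disc
A = np.column_stack([b[mask] for b in basis])       # design matrix
coeffs, *_ = np.linalg.lstsq(A, data[mask], rcond=None)
```

Because the synthetic data equals basis member 1 exactly, `coeffs` comes out as essentially `[0, 1, 0]`. If the sampled basis were exactly orthonormal on the grid, each coefficient would reduce to an inner product with the data, but `lstsq` handles the small non-orthogonality introduced by discrete sampling.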
See also questions close to this topic

Connecting to API issues
I am new to Python and am trying to use it to connect to the AmeriCommerce API. They have an example on Git, but it doesn't seem to work. I set the URL in STORE_DOMAIN, the key in ACCESS_TOKEN, and the app ID in STORE_ID.
The only one that seems to matter is the web URL: if I misspell it I get connection errors, but for the key and ID I can enter whatever I want and get the same results. Here is the script:
#!/usr/bin/env python3
# The 'requests' module is available via pip: "pip install requests"
# More documentation: http://docs.python-requests.org/en/latest/
import requests
import json
import locale
import sys

STORE_DOMAIN = "http://www.example.com"
ACCESS_TOKEN = "key"
STORE_ID = "app_id"  # This should reflect your store's ID

locale.setlocale(locale.LC_ALL, '')

# Searches for and returns the customer that matches the info passed in;
# if no customer is found, a new one is created and returned.
def get_customer(firstName, lastName, email):
    # set up headers
    headers = {
        'X-AC-Auth-Token': ACCESS_TOKEN,
        'Content-Type': 'application/json'
    }
    # build API call to check if the customer already exists (using email)
    uri = '{}/api/v1/customers?email={}'.format(STORE_DOMAIN, email)
    # include verify=False if using a dev certificate
    # r = requests.get(uri, headers=headers, verify=False)
    r = requests.get(uri, headers=headers)
    # see if a customer was found
    customer = r.json()
    if customer['total_count'] > 0:
        return customer['customers'][0]
    # no customer found, so let's create a new one
    data = {
        'last_name': lastName,
        'first_name': firstName,
        'email': email,
        'store_id': 4
    }
    # build API call to post a new customer
    uri = '{}/api/v1/customers'.format(STORE_DOMAIN)
    # include verify=False if using a dev certificate
    # r = requests.post(uri, headers=headers, data=json.dumps(data), verify=False)
    r = requests.post(uri, headers=headers, data=json.dumps(data))
    # return the newly created customer
    return r.json()

if __name__ == '__main__':
    customer = get_customer('John', 'Doe', 'JohnDoe@email.com')
    first_name = customer['first_name'].encode('utf8')
    last_name = ',{}\n'.format(customer['last_name']).encode('utf8')
    email = customer['email'].encode('utf8')
    sys.stdout.buffer.write(first_name + last_name + email)
I get this result when I run this:
Traceback (most recent call last):
  File "./getStuff.py", line 57, in <module>
    customer = get_customer('John', 'Doe', 'JohnDoe@email.com')
  File "./getStuff.py", line 35, in get_customer
    if (customer['total_count'] > 0):
KeyError: 'total_count'
Please, a little help here would be appreciated. I've been stuck on this for a while now.
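A `KeyError` on `'total_count'` usually means the server returned an error payload (bad token, wrong headers) rather than a customer list. A hedged debugging sketch (the helper name and the error body below are hypothetical, not from the AmeriCommerce docs): validate the response before indexing into it, so the real server message surfaces instead of a bare `KeyError`:

```python
import json

# Hypothetical helper: check the payload shape before indexing into it.
def parse_customer_response(raw_text):
    payload = json.loads(raw_text)
    if 'total_count' not in payload:
        # Surface the actual server message instead of a KeyError.
        raise RuntimeError('unexpected API response: {}'.format(payload))
    return payload

# Fabricated error body of the kind many REST APIs return:
try:
    parse_customer_response('{"message": "invalid auth token"}')
except RuntimeError as exc:
    print(exc)
```

In the real script the same check would go right after `customer = r.json()`, together with a look at `r.status_code`.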

Where to put tab completion configuration in python?
python tab completion Mac OSX 10.7 (Lion)
The above link shows that the following code can be used for autocompletion in python.
import readline
import rlcompleter

if 'libedit' in readline.__doc__:
    readline.parse_and_bind("bind ^I rl_complete")
else:
    readline.parse_and_bind("tab: complete")
But I don't see where to put it so that it is loaded at startup. I tried ~/.pythonrc, but it did not work. Does anybody know the current way to load such a configuration automatically for an interactive Python session?
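For reference, the interactive interpreter does not read ~/.pythonrc automatically; it only runs the file named by the PYTHONSTARTUP environment variable. A minimal sketch (the file name is just a convention):

```python
# Save as e.g. ~/.pythonrc.py, then point the interpreter at it from
# your shell profile:  export PYTHONSTARTUP="$HOME/.pythonrc.py"
import readline
import rlcompleter  # importing it registers the completer

if readline.__doc__ and 'libedit' in readline.__doc__:
    # macOS builds linked against libedit use editline's bind syntax
    readline.parse_and_bind("bind ^I rl_complete")
else:
    readline.parse_and_bind("tab: complete")
```

Note that since Python 3.4 the interactive interpreter enables tab completion by default, so this file is mainly needed on older versions or unusual builds.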

Changing a predictor input to Linear Discriminant Analysis (LDA) and comparing the output
I've been running Linear Discriminant Analysis (LDA) in Python (using the method from Sebastian Raschka).
My data is related to galaxies: I've been trying to classify galaxies as either non-merging or merging based upon a set of predictors (9 total). I've been varying the measurement of one of the predictors and re-running LDA.
Is this a statistically legitimate thing to do? I was also wondering how to compare the runs of LDA and answer the question: are they different if I change the way I measure this one predictor?
My first discriminant component (PCA1 or LDA1 in this case) has a much larger eigenvalue than all others for all runs, so one thing I've been trying is comparing eigenvector 1 between the different runs of LDA. The ordering of the predictors changes as does their relative values. I guess I should mention that I've also normalized all the predictors prior to LDA.
So is it legitimate to directly compare the eigenvector 1 values for each predictor, and if they change in value or order, does this indicate that the different measurement of the predictor is significant? How much of a change would be necessary to call it 'significant'?
Here is an example of eigenvectors of two runs where I change the 2nd to last input predictor:
Eigenvector 1 for Model 1:
[[-0.092 ]
 [-0.0066]
 [ 0.4605]
 [ 0.7649]
 [-0.3991]
 [ 0.1067]
 [ 0.0826]
 [-0.126 ]
 [ 0.0314]]
Eigenvalue 1: 2.29e+00

Eigenvector 1 for Model 2:
[[ 0.0824]
 [-0.0012]
 [-0.3643]
 [-0.7967]
 [ 0.4032]
 [-0.0449]
 [-0.0769]
 [-0.0297]
 [ 0.2331]]
Eigenvalue 1: 2.70e+00
Thank you for your input!
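One way to make that comparison concrete (a suggestion, not something from the question) is the absolute cosine similarity between the two leading eigenvectors. The overall sign of an eigenvector is arbitrary, so the absolute value of the dot product is what matters:

```python
import numpy as np

def axis_similarity(v1, v2):
    """Absolute cosine similarity between two discriminant axes.
    The sign of an eigenvector is arbitrary, hence the abs()."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return abs(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Illustrative 9-component vectors (not the question's exact values):
a = np.array([0.10, 0.00, 0.46, 0.76, 0.40, 0.11, 0.08, 0.13, 0.03])
b = np.array([0.08, 0.00, 0.36, 0.80, 0.40, 0.04, 0.08, 0.03, 0.23])
print(axis_similarity(a, b))  # close to 1 -> the two axes nearly coincide
```

A value near 1 says the discriminant axis barely moved; how far below 1 counts as "significant" is best judged against a null distribution, e.g. by bootstrapping the LDA over resampled galaxies.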

Fitting a lognormal or poisson distribution
I have a vector of 1096 numbers, the daily average concentration of NOx measured in 3 years in a measurement station. You can observe the type of distribution in the image:
I used these commands to do the histogram:
NOxV <- scan("NOx_Vt15-17.txt")
hist.NOxVt <- hist(NOxV, plot = FALSE, breaks = 24)
plot(hist.NOxVt, xlab = "[NOx]", ylab = "Frequenze assolute",
     main = "Istogramma freq. ass. NOx 15-17 Viterbo")
points(hist.NOxVt$mids, hist.NOxVt$counts, col = "red")
My professor suggested that I fit the histogram with a Poisson distribution, paying attention to the transition discrete -> continuous (I don't know what that means), or with a lognormal distribution.
I tried the Poisson fit with some command lines she gave us in class, but R gave me an error after executing the last line of the following:
my_poisson <- function(params, x) {
  exp(-params) * params^x / factorial(x)
}
y <- hist.NOxVt$counts / 1096
x <- hist.NOxVt$mids
z <- nls(y ~ exp(-a) * a^x / factorial(x), start = list(a = 1))
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Since I couldn't solve this problem (even after searching for similar ones on the internet), I decided to fit the distribution with a lognormal instead, but I have no idea how to do it: the professor did not explain it to us, and I don't yet have enough R experience to figure it out on my own.
I would appreciate any suggestion or examples of how to do a lognormal fit and/or Poisson fit.
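For what it's worth, here is the analogous maximum-likelihood lognormal fit sketched in Python/SciPy (synthetic stand-in data, since the NOx file isn't available; fixing the location at zero is the usual choice for strictly positive concentrations):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 1096 daily NOx averages (assumed values).
rng = np.random.default_rng(0)
nox = rng.lognormal(mean=3.0, sigma=0.5, size=1096)

# floc=0 pins the location parameter so only shape and scale are fitted.
shape, loc, scale = stats.lognorm.fit(nox, floc=0)
sigma_hat = shape          # lognormal sigma
mu_hat = np.log(scale)     # lognormal mu
```

In R the same fit is `MASS::fitdistr(NOxV, "lognormal")`. On the Poisson side, note that the pmf needs `exp(-a)`, and that `factorial(x)` blows up for non-integer or large `x` from histogram mids, which is one common source of the `numericDeriv` error.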

Find outlier after data fit in R
My problem is a combination of my basic statistical understanding and handling it in R.
The situation: I have 191 different types that are seen in a general environment (column "general"). Only a subset are capable of getting into the special environment (column "special"). I suspect a linear relationship between the two: the more frequent something is, the more often it will advance into the special environment.
My question is: are there some cases where they advance more often than normal? I think the "normal" is found by fitting a model. In that sense, I believe that my question is answered by finding outliers from my model. Is that correct?
The data&code:
library(ggplot2)
library(sfsmisc)

n <- c(1:191)
general <- c(1, 7, 4, 11, 6, 11, 102, 19, 18, 1, 17, 4, 4, 2, 1, 1, 1, 1, 1, 3,
             2, 20, 5, 3, 10, 4, 8, 15, 35, 3, 12, 4, 9, 1, 18, 3, 7, 53, 5, 1,
             33, 6, 10, 39, 6, 16, 10, 1, 27, 3, 61, 100, 2, 5, 16, 30, 2, 3, 3,
             2, 2, 3, 3, 60, 12, 2, 2, 13, 20, 14, 17, 16, 2, 1, 7, 3, 1, 10, 8,
             4, 1, 5, 9, 8, 2, 1, 25, 2, 61, 22, 45, 4, 31, 4, 3, 47, 4, 1, 5,
             2, 2, 30, 84, 136, 14, 1, 1, 15, 1, 1, 1, 124, 1, 3, 2, 1, 2, 40,
             192, 3, 1, 22, 3, 1, 26, 1, 36, 132, 6, 4, 9, 201, 4, 43, 180, 98,
             3, 1, 3, 11, 82, 83, 5, 10, 3, 2, 10, 23, 2, 25, 50, 4, 35, 2, 2,
             3, 10, 18, 3, 2, 2, 13, 8, 5, 11, 28, 11, 3, 4, 11, 3, 74, 56, 5,
             13, 5, 3, 3, 4, 4, 5, 6, 9, 12, 4, 47, 13, 131, 1, 3, 1)
special <- c(0, 1, 0, 2, 0, 2, 30, 2, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 1,
             4, 0, 1, 1, 0, 7, 16, 1, 1, 0, 0, 0, 4, 0, 1, 18, 1, 0, 8, 3, 2,
             10, 2, 3, 0, 0, 3, 1, 13, 23, 0, 0, 2, 9, 0, 0, 1, 0, 0, 0, 0, 15,
             2, 1, 0, 4, 10, 7, 4, 0, 0, 1, 1, 1, 0, 3, 2, 1, 0, 1, 0, 1, 0, 0,
             12, 0, 14, 4, 17, 0, 8, 0, 0, 8, 0, 0, 0, 0, 0, 14, 37, 52, 2, 1,
             0, 3, 1, 0, 0, 43, 0, 0, 0, 0, 0, 15, 47, 1, 0, 4, 0, 0, 10, 0, 6,
             27, 2, 0, 2, 57, 1, 8, 67, 36, 1, 0, 0, 2, 23, 18, 1, 0, 0, 1, 0,
             6, 0, 4, 8, 0, 7, 0, 2, 0, 6, 0, 0, 0, 0, 4, 2, 0, 1, 4, 0, 0, 0,
             2, 2, 30, 12, 2, 2, 2, 0, 0, 1, 1, 1, 1, 1, 0, 1, 16, 0, 49, 0, 0,
             0)
I combine them into a data frame, plot them, and add the linear model. It seems to fit pretty well (see scatterplot):
d.combined <- data.frame(n = n, general = general, special = special)
ggplot(d.combined, aes(x = general, y = special)) +
  geom_point() +
  geom_smooth(method = "lm", se = F)
The linear model looks good: R-squared is high, the p-value is low:
lm_comb <- lm(special ~ general, d.combined)
summary.lm(lm_comb)
# Residual standard error: 3.023 on 189 degrees of freedom
# Multiple R-squared: 0.9232, Adjusted R-squared: 0.9228
# F-statistic: 2273 on 1 and 189 DF, p-value: < 2.2e-16
Going by Cook's distance, I can see that there are 7 clear outliers, which is perfect and even fits what I know from previous data.
cooksdc <- cooks.distance(lm_comb)
plot(cooksdc, cex = 0.5)
abline(h = 4 * mean(cooksdc, na.rm = T), col = "red")
text(x = 1:length(cooksdc) + 1, y = cooksdc,
     labels = ifelse(cooksdc > 4 * mean(cooksdc, na.rm = T), names(cooksdc), ""),
     col = "red")
I also used the outliers package and (mis)used its scores() function with t, z and chi-squared tests. They find 11 outliers for the last and 12 for the first two, all at 95% probability. They all make sense, too.
But the Tukey-Anscombe plot shows a cone opening towards the right, which means the relationship is not linear. Right?
TA.plot(lm_comb)  # requires the sfsmisc package
The questions:
- Am I on the right track? I.e., does my approach even work in principle for my question? Am I interpreting the TA plot correctly? Why does the linear regression give me such a high R-squared when the TA plot indicates non-linearity?
- What options do I have to fit an apparently non-linear model? How?
- Do I have to do that, or is there another solution?
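As a cross-check on the R workflow, Cook's distance is simple enough to compute from scratch. A sketch in Python/NumPy (toy data, not the question's vectors; the 4/n cutoff mirrors the R code's rule of thumb):

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance for a simple linear regression y ~ 1 + x."""
    X = np.column_stack([np.ones_like(x), x])   # intercept + slope
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)                              # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)                # residual variance
    return (resid**2 / (p * s2)) * h / (1 - h)**2

# Toy data: six points near y = x, plus one influential point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 40.0])
d = cooks_distance(x, y)
outliers = np.where(d > 4 / len(x))[0]          # common 4/n rule of thumb
```

Here only the last point is flagged: it combines high leverage with a residual the rest of the fit cannot absorb, exactly the kind of "advances more often than normal" case the question is after.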

Is there a way to get the error in the fitted parameters from scipy.stats.norm.fit?
I have some data to which I have fitted a normal distribution using the scipy.stats.norm object's fit function, like so:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import matplotlib.mlab as mlab

x = np.random.normal(size=50000)
fig, ax = plt.subplots()
nbins = 75
mu, sigma = norm.fit(x)
n, bins, patches = ax.hist(x, nbins, normed=1, facecolor='grey',
                           alpha=0.5, label='before')
y0 = mlab.normpdf(bins, mu, sigma)  # line of best fit
ax.plot(bins, y0, 'k', linewidth=2, label='fit before')
ax.set_title('$\mu$={}, $\sigma$={}'.format(mu, sigma))
plt.show()
I would now like to extract the uncertainty/error in the fitted mu and sigma values. How can I go about this?
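`norm.fit` itself does not report uncertainties, but for the normal maximum-likelihood estimates the standard errors have closed forms, and the bootstrap gives a distribution-agnostic alternative; a sketch:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(size=50000)
mu, sigma = norm.fit(x)            # maximum-likelihood estimates

# Closed-form standard errors for the normal MLE:
n = len(x)
se_mu = sigma / np.sqrt(n)         # SE of the mean
se_sigma = sigma / np.sqrt(2 * n)  # SE of the standard deviation

# Distribution-agnostic alternative: bootstrap the fit.
boot = np.array([norm.fit(rng.choice(x, size=n, replace=True))
                 for _ in range(200)])
se_mu_boot, se_sigma_boot = boot.std(axis=0)
```

The two approaches should agree here; the bootstrap is the one that carries over unchanged to distributions whose fit errors have no closed form.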

How to find knot vector if control points are known for a NURBS curve?
I have a set of control points
pts = [[849, 1181], [916, 1257], [993, 1305], [1082,1270], [1137,1181], [1118,1055], [993,1034], [873,1061], [849, 1181]]
I have the logic for generating an open knot vector:
/* Subroutine to generate a B-spline open knot vector with multiplicity
   equal to the order at the ends.
   c      = order of the basis function
   n      = the number of defining polygon vertices
   nplus2 = index of x() for the first occurrence of the maximum knot vector value
   nplusc = maximum value of the knot vector, n + c
   x()    = array containing the knot vector */
knot(n,c,x)
int n,c;
int x[];
{
    int nplusc, nplus2, i;

    nplusc = n + c;
    nplus2 = n + 2;

    x[1] = 0;
    for (i = 2; i <= nplusc; i++){
        if ((i > c) && (i < nplus2))
            x[i] = x[i-1] + 1;
        else
            x[i] = x[i-1];
    }
}
And another one for generating periodic knot vector:
/* Subroutine to generate a B-spline uniform (periodic) knot vector.
   c      = order of the basis function
   n      = the number of defining polygon vertices
   nplus2 = index of x() for the first occurrence of the maximum knot vector value
   nplusc = maximum value of the knot vector, n + c
   x[]    = array containing the knot vector */
#include <stdio.h>
knotu(n,c,x)
int n,c;
int x[];
{
    int nplusc, nplus2, i;

    nplusc = n + c;
    nplus2 = n + 2;

    x[1] = 0;
    for (i = 2; i <= nplusc; i++){
        x[i] = i - 1;
    }
}
However, I need to generate a non-uniform knot vector in the range [0, 1]; the above algorithms produce uniform knot vectors. Kindly suggest a way to do this. Python code would be preferable.
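Since Python is preferred: one standard non-uniform scheme (an assumption here, as the question doesn't mandate one) is chord-length parameterisation of the points followed by knot averaging, which yields a clamped knot vector in [0, 1]:

```python
import numpy as np

def chord_length_knots(pts, degree):
    """Clamped non-uniform knot vector on [0, 1] built from
    chord-length parameters plus knot averaging."""
    pts = np.asarray(pts, dtype=float)
    n = len(pts)                                   # number of points
    # chord-length parameters t_0..t_{n-1}, normalised to [0, 1]
    d = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()
    p = degree
    # knot averaging: each interior knot averages p consecutive parameters
    interior = [t[j:j + p].mean() for j in range(1, n - p)]
    # clamped ends: multiplicity p + 1 at 0 and at 1
    return np.concatenate([np.zeros(p + 1), interior, np.ones(p + 1)])

pts = [[849, 1181], [916, 1257], [993, 1305], [1082, 1270], [1137, 1181],
       [1118, 1055], [993, 1034], [873, 1061], [849, 1181]]
kts = chord_length_knots(pts, 3)   # cubic: order 4, degree 3
```

For these 9 points and degree 3 this produces n + degree + 1 = 13 non-decreasing knots, clamped at 0 and 1. Knot averaging is normally stated for interpolation parameters; applying it to control points directly, as here, is a common heuristic.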

RBF ANN on MATLAB
I have a matrix of 71 rows and 7 columns. Using this matrix and a mathematical formula, I create another matrix that has 71 rows and 1 column. And now my problem: when I send the first matrix as the argument P and the second matrix as the argument T to the newrb function for time-series estimation, MATLAB reports a non-matching matrix error. How do I solve this?

A confusion about space transformation in OpenGL
In the book 3D Graphics for Game Programming by JungHyun Han, on pages 38-39, it is stated that the basis-transformation matrix from e_1, e_2, e_3 to u, v, n is [the matrix shown in the book].
However, this contradicts what I know from linear algebra: shouldn't the basis-transformation matrix be the transpose of that matrix?
The author gives his derivation, but I couldn't find where the gap lies between what I know and what the author does.
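The apparent contradiction usually comes down to whether the basis vectors sit in the rows or the columns: with u, v, n as the rows, the matrix maps world coordinates into the camera basis, while its transpose (basis vectors as columns) maps camera coordinates back to world. A small NumPy check with an illustrative orthonormal basis:

```python
import numpy as np

# Illustrative orthonormal camera basis: rotate the standard frame by
# 30 degrees about the z axis (values assumed, not from the book).
th = np.deg2rad(30)
u = np.array([np.cos(th), np.sin(th), 0.0])
v = np.array([-np.sin(th), np.cos(th), 0.0])
n = np.array([0.0, 0.0, 1.0])

R_rows = np.stack([u, v, n])   # basis vectors as ROWS: world -> camera
R_cols = R_rows.T              # basis vectors as COLUMNS: camera -> world

# The rows matrix sends each basis vector onto a standard axis...
cam_u = R_rows @ u             # equals e_1
# ...and, R being orthogonal, the transpose undoes it exactly.
pt = np.array([0.2, -0.7, 1.3])
roundtrip = R_cols @ (R_rows @ pt)
```

So both matrices are legitimate "basis transformations"; they just go in opposite directions, which is likely the missing step in the book's derivation versus the linear-algebra convention.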
The code: Vertex Shader:
#version 330
layout(location = 0) in vec4 position;
layout(location = 1) in vec4 color;
uniform vec3 cameraPosition;
uniform vec3 AT;
uniform vec3 UP;
uniform mat4 worldTrans;

vec3 ep_1 = (cameraPosition - AT) / length(cameraPosition - AT);
vec3 ep_2 = cross(UP, ep_1) / length(cross(UP, ep_1));
vec3 ep_3 = cross(ep_1, ep_2);
vec4 t_ep_1 = vec4(ep_1, 1.0f);
vec4 t_ep_2 = vec4(ep_2, cameraPosition.y);
vec4 t_ep_3 = vec4(ep_3, cameraPosition.z);
mat4 viewTransform = mat4(t_ep_1, t_ep_2, t_ep_3, vec4(0.0f, 0.0f, 0.0f, 1.0f));

smooth out vec4 fragColor;

void main()
{
    gl_Position = transpose(viewTransform) * position;
    fragColor = color;
}
Inputs:
GLuint transMat = glGetUniformLocation(m_Program.m_shaderProgram, "worldTrans");
GLfloat dArray[16] = {0.0};
dArray[0] = 1;
dArray[3] = 0.5;
dArray[5] = 1;
dArray[7] = 0.5;
dArray[10] = 1;
dArray[11] = 0;
dArray[15] = 1;
glUniformMatrix4fv(transMat, 1, GL_TRUE, &dArray[0]);

GLuint cameraPosId = glGetUniformLocation(m_Program.m_shaderProgram, "cameraPosition");
GLuint ATId = glGetUniformLocation(m_Program.m_shaderProgram, "AT");
GLuint UPId = glGetUniformLocation(m_Program.m_shaderProgram, "UP");
const float cameraPosition[4] = {2.0f, 0.0f, 0.0f};
const float AT[4] = {1.0f, 0.0f, 0.0f};
const float UP[4] = {0.0f, 0.0f, 1.0f};
glUniform3fv(cameraPosId, 1, cameraPosition);
glUniform3fv(ATId, 1, AT);
glUniform3fv(UPId, 1, UP);