Matrix calculation (in R): least squares
I have a matrix in R as per the following:
(72      90      60)
(60  83            )
(50      77  83    )
(    40      84  65)
Reproducible as follows:
B = matrix(c(72,60,50,NA,NA,83,NA,40,90,NA,77,NA,NA,NA,83,84,60,NA,NA,65),nrow=4,ncol=5)
It is a sparse matrix in the sense that not all elements are populated. What I want to do is find an adjustment matrix (or rather a vector that is added to each column) so that the average of the row averages is equal to the average of the column averages. The answer is:
(13.04, 10.93, 14.68, 22.70, 5.90)
I computed this iteratively, and it is also rounded to 2 dp so it doesn't quite balance exactly, but, for example, it should be the case that if you add the adjustments to the original matrix:
A = t(t(B)+c(13.04,10.93,14.68,22.70,5.90))
then the average of the row averages should equal the average of the column averages:
mean(colMeans(A, na.rm = TRUE, dims = 1))
mean(rowMeans(A, na.rm = TRUE, dims = 1))
However, I think this is a least squares problem that can be solved algebraically, but I don't know how to do it or how to set it out. The question is: is there a way to find the adjustment algebraically using matrix calculation rather than iteratively? Can anyone help?
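One possible algebraic reading (an illustration only, sketched in NumPy rather than R): "average of the row averages equals average of the column averages" is a single linear equation in the five column adjustments, so there are infinitely many exact solutions, and a least-squares solver returns the minimum-norm one, which will in general differ from the iterative vector above. The setup looks like this:

```python
import numpy as np

# The matrix from the question, with NA as np.nan
B = np.array([
    [72,     np.nan, 90,     np.nan, 60],
    [60,     83,     np.nan, np.nan, np.nan],
    [50,     np.nan, 77,     83,     np.nan],
    [np.nan, 40,     np.nan, 84,     65],
])

mask = ~np.isnan(B)            # which cells are populated
nrow, ncol = B.shape
row_counts = mask.sum(axis=1)  # populated cells per row

# Both grand means are affine in the adjustment vector a:
#   mean(rowMeans(B + a)) = R0 + c @ a, with c_j = (1/nrow) * sum_i mask[i, j] / row_counts[i]
#   mean(colMeans(B + a)) = C0 + d @ a, with d_j = 1/ncol
R0 = np.nanmean(B, axis=1).mean()
C0 = np.nanmean(B, axis=0).mean()
c = (mask / row_counts[:, None]).sum(axis=0) / nrow
d = np.full(ncol, 1.0 / ncol)

# Single linear constraint (c - d) @ a = C0 - R0; lstsq on this
# underdetermined system returns the minimum-norm solution
a, *_ = np.linalg.lstsq((c - d)[None, :], np.array([C0 - R0]), rcond=None)

# Check: after adding a to each column, the two grand means agree
A = B + a
print(a, np.nanmean(A, axis=1).mean(), np.nanmean(A, axis=0).mean())
```

Note that this minimum-norm solution is not the iterative vector in the question: any a on the constraint hyperplane works, so extra conditions (for example also equalising individual row or column means) would be needed to pin down a unique answer. The same algebra can be done in R, e.g. with MASS::ginv applied to the 1x5 constraint row.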
See also questions close to this topic
 Remove nth percentage of cells randomly from a raster in R

Python 3 / R UCI Robot failure data set
Hi, I am trying to get this data into Python 3 / R. How do I read it into .csv format?
https://archive.ics.uci.edu/ml/machine-learning-databases/robotfailure-mld/ https://archive.ics.uci.edu/ml/machine-learning-databases/robotfailure-mld/lp1.data 
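The lp*.data files are not comma-separated; per the dataset description each instance is a blank-line-separated block: one class label line followed by 15 rows of six force/torque integers. A parser sketch (the sample string below is made-up data in that assumed layout, not real values from lp1.data):

```python
import csv

def parse_blocks(text):
    """Turn label-plus-measurement blocks into flat rows."""
    rows = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        label = lines[0].strip()                 # class label line
        for t, line in enumerate(lines[1:]):     # one row per time step
            rows.append([label, t] + [int(v) for v in line.split()])
    return rows

# made-up sample in the assumed lp1.data layout
sample = "normal\n\t-1\t-1\t63\t-3\t-1\t0\n\t0\t0\t62\t-3\t-1\t0\n\nnormal\n\t-1\t-1\t61\t-3\t0\t0\n"
rows = parse_blocks(sample)

with open("lp1.csv", "w", newline="") as fh:
    w = csv.writer(fh)
    w.writerow(["class", "t", "Fx", "Fy", "Fz", "Tx", "Ty", "Tz"])
    w.writerows(rows)
```

The resulting .csv loads directly with pandas in Python or read.csv in R.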
Preprocessing tweets.json file in R
I have extracted the tweets into a JSON file using twitterscraper. Now I have to preprocess these tweets in R. How do I handle the tweets JSON file in R? Can anyone help me with this problem?

Haskell Multiply Matrix NxN
I have a problem: I want to multiply NxN matrices. So far I have got

main = do
    putStrLn "Digite 'matriz' seguido do tamanho (100,1000,2000)"

matriz :: Int -> [[(Int, Int)]]
matriz n = map (\x -> map (\y -> (x, y)) [1..n]) [1..n]
But I'm having problems with the putStrLn. The problem is: the user inputs a value n from (100,1000,2000), and then I have to calculate matrix times matrix, e.g. n = 100 -> a 100x100 matrix times another 100x100 matrix. How would I do that? 
Matrix3x2 Performance
In my graphics application, I can represent matrices using either
SharpDX.Matrix3x2
or System.Numerics.Matrix3x2
. However, upon running both matrices through a performance test, I found that SharpDX's matrices handily defeat System.Numerics.Matrix3x2
by a margin of up to 70% in terms of time. My test was a pretty simple repeated multiplication; here's the code:

var times1 = new List<float>();
for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();
    var mat = SharpDX.Matrix3x2.Identity;
    for (var j = 0; j < 10000; j++)
        mat *= SharpDX.Matrix3x2.Rotation(13);
    sw.Stop();
    times1.Add(sw.ElapsedTicks);
}

var times2 = new List<float>();
for (var i = 0; i < 100; i++)
{
    var sw = Stopwatch.StartNew();
    var mat = System.Numerics.Matrix3x2.Identity;
    for (var j = 0; j < 10000; j++)
        mat *= System.Numerics.Matrix3x2.CreateRotation(13);
    sw.Stop();
    times2.Add(sw.ElapsedTicks);
}

TestContext.WriteLine($"SharpDX: {times1.Average()}\nSystem.Numerics: {times2.Average()}");
I ran these tests on an Intel i5-6200U processor.
Now, my question is, how can SharpDX's matrices possibly be faster? Isn't
System.Numerics.Matrix3x2
supposed to utilise SIMD instructions to execute faster? The implementation of SharpDX.Matrix3x2
is available here, and as you can see, it's written in plain C#. 
Nodes in 2D heat plate - Matrix Solving - Laplace Equation - Matlab
I do not understand why Matlab is giving me the "Index exceeds matrix dimensions" error:
j = 1;
for i = 1:Nx
    n = i + (j-1)*Nx;
    M(n,n) = 1;
    B(n,1) = bottom_boundary(i);
end
I am probably just being really stupid and not seeing the error, but my logic seems to be correct. It lets me loop through the for loop the first time and places the value in B(1,1) when n = 1. When n = 5, it gives me the error that the index exceeds matrix dimensions. B is a 16x1 column vector. [screenshot: Workspace variables]

R debugging: OLS yields same result for different betas
I'm new to R and I'm trying to write a program to predict with OLS. I've generated 10 datasets with the lapply function:
R <- 10
n <- 100
p <- 50
s <- 10
dlist <- lapply(1:R, function(x){
  set.seed(x)
  IM <- diag(p)
  X <- mvrnorm(n, mu = rep(0, p), Sigma = IM)              # matrix X
  epsilon <- matrix(rnorm(n, mean = 0, sd = 1), nrow = n)  # residual
  b <- c(rep(1, s), rep(0, p - s))                         # beta depends on s
  Y <- X %*% b + epsilon                                   # model
  X1 <- data.frame(Y = Y, X)
  # training/testing set
  samplesize = floor(0.75 * nrow(X1))
  index <- sample(seq_len(nrow(X1)), size = samplesize)
  # training set
  X.train <- X1[index, ]
  # testing set
  X.test <- X1[-index, ]
  xtrain <- model.matrix(Y ~ . - 1, data = X.train)
  ytrain <- X.train$Y
  xtest = model.matrix(Y ~ . - 1, data = X.test)
  ytest = X.test$Y
  return(list(xtrain = xtrain, ytrain = ytrain, xtest = xtest, ytest = ytest))
})
So whenever I run OLS on that dlist:
ols <- matrix(NA, nrow(dlist[[i]]$xtest), R)
for (i in 1:R) {
  ytrain <- dlist[[i]]$ytrain
  xtrain <- dlist[[i]]$xtrain
  ytest <- dlist[[i]]$ytest
  xtest <- dlist[[i]]$xtest
  # ols on training set
  olslm <- lm(ytrain ~ xtrain)
  # prediction on testing set
  xtrain <- dlist[[i]]$xtest
  ols[, i] = predict(olslm, newdata = data.frame(xtrain))
}
ols
and then calculate the mse:
matrix <- list()
for (i in 1:R) { matrix[[i]] <- matrix(dlist[[i]]$ytest, ncol = 1) }
ytest <- do.call(cbind, matrix)
olsmse <- mean((ols - ytest)^2)
olsmse
I always get the same result when I change the parameter s to something else (40, for example) to show that beta has a different structure than before. And I also get the same result when I change X to a different matrix.
Does anyone have any idea why that's the case?
Thanks a lot in advance!

How to find the best degree of polynomials?
I'm new to Machine Learning and am currently stuck on this. First I used linear regression to fit the training set but got a very large RMSE. Then I tried using polynomial regression to reduce the bias.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
poly_predict = poly_reg.predict(X_poly)
poly_mse = mean_squared_error(X, poly_predict)
poly_rmse = np.sqrt(poly_mse)
poly_rmse
Then I got a slightly better result than with linear regression, so I continued to set degree = 3/4/5, and the result kept getting better. But it might be somewhat overfitting as the degree increases.
The best degree of polynomial should be the degree that generates the lowest RMSE in cross validation set. But I don't have any idea how to achieve that. Should I use GridSearchCV? or any other method?
Much appreciated if you could help me with this.
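GridSearchCV is a reasonable tool for exactly this: put PolynomialFeatures and LinearRegression in a pipeline and grid-search the degree with cross-validated (negative) MSE. A sketch with synthetic data standing in for the real X and y (which are not available here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# synthetic stand-in data: a cubic signal plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0]**3 - 2 * X[:, 0] + rng.normal(0, 1, 200)

pipe = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())
search = GridSearchCV(pipe,
                      {"polynomialfeatures__degree": range(1, 9)},
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_)  # the degree with the lowest CV MSE
```

Because the pipeline refits for every fold, each degree's score is an honest cross-validation estimate, so the chosen degree balances bias against overfitting automatically.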

Sklearn LinearRegression shapes not aligned
I'm trying to use sklearn for a linear regression prediction.
I have two lists of numbers (ages in seconds since 1970) and some prices.
When I try to predict my line I get the following error:
ValueError: shapes (1,1000) and (14697,14697) not aligned: 1000 (dim 1) != 14697 (dim 0)
My code:
inkoop = np.asarray(list(df['Bedrag_Inkoop_Ingediend_Mono']), dtype=np.number)
tempT = list(df['dtGeboorteDatum'])
dtAge = np.asarray([(time - datetime.date(1970, 1, 1)).total_seconds() for time in tempT], dtype=np.number)

dtAgeTestY = dtAge[1000:]
dtAgeTrainY = dtAge[:1000]
inkoopTestX = inkoop[1000:]
inkoopTrainX = inkoop[:1000]

regr = linear_model.LinearRegression()
regr.fit(inkoopTrainX, dtAgeTrainY.reshape(1, -1))
agePredictY = regr.predict(inkoopTestX.reshape(1, -1))

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(inkoopTest, agePredictY))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(inkoopTest, agePredictY))

plt.scatter(inkoopTestX, dtAgeTestY, color='black')
plt.plot(dtAgeTestY, agePredictY, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

It errors on agePredictY = regr.predict(inkoopTestX.reshape(1, -1)). I also tried without reshaping.
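For what it's worth, the usual cause of this error (a generic sketch, not using the dataframe columns above): scikit-learn expects X with shape (n_samples, n_features), so a 1-D feature vector should become a column via reshape(-1, 1); reshape(1, -1) instead produces one sample with many features, which is what makes the shapes disagree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10, dtype=float)   # 1-D feature vector
y = 2.0 * x + 1.0                # noiseless line y = 2x + 1

X = x.reshape(-1, 1)             # (10, 1): 10 samples, 1 feature
model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[5.0]]))  # predict also wants (n_samples, 1)
```

The target y can stay 1-D; only the feature array needs the column shape.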

Using NumPy functions in Cython for least-squares fitting of array elements
I need to write a script that will do least-squares fitting, pixel by pixel, for a stack of 4 similar 500x500 images. That is, I need to fit the values of a specific pixel location on all four images to a vector of length three, using the same 4x3 matrix for each pixel.
I don't see a way to do this without a nested for-loop iteration through each pixel, so I figured Cython could speed things up. I have never worked with Cython before, but I wrote the following code based on the documentation examples.
The issue is, this runs as slow as or slower (~27 s) than a pure Python implementation (~25 s).
Does anyone see what is slowing this down? Thanks!
import numpy as np
cimport numpy as np
cimport cython

npint = np.int16
npfloat = np.float64
ctypedef np.int16_t npint_t
ctypedef np.float64_t npfloat_t

@cython.boundscheck(False)
@cython.wraparound(False)
def fourbythree(np.ndarray[npfloat_t, ndim=2] U_mat, np.ndarray[npint_t, ndim=3] G):
    assert U_mat.dtype == npfloat and G.dtype == npint
    cdef unsigned int z = G.shape[0]
    cdef unsigned int rows = G.shape[1]
    cdef unsigned int cols = G.shape[2]
    cdef np.ndarray[npfloat_t, ndim=3] a = np.empty((z - 1, rows, cols), dtype=npfloat)
    cdef npfloat_t resid
    cdef unsigned int rank
    cdef Py_ssize_t row, col
    cdef np.ndarray s

    for row in range(rows):
        for col in range(cols):
            a[:, row, col] = np.linalg.lstsq(U_mat, G[:, row, col])[0]
    return a
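An alternative worth noting (a sketch with random stand-in data, since U_mat and G are not given): because every pixel shares the same 4x3 matrix, np.linalg.lstsq can solve all pixels in one call by passing a matrix of right-hand sides, removing the Python-level loop entirely:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))          # shared 4x3 design matrix
G = rng.normal(size=(4, 100, 100))   # stack of 4 images (floats here)

flat = G.reshape(4, -1)              # one column of observations per pixel
coef, *_ = np.linalg.lstsq(U, flat, rcond=None)  # solves all columns at once
a = coef.reshape(3, *G.shape[1:])    # 3 fitted coefficients per pixel
```

Each inner call to np.linalg.lstsq in the loop carries fixed Python and LAPACK setup overhead, which Cython cannot remove; batching the right-hand sides pays that cost once.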

What does OpenCV fitLine do?
According to this answer,
cv::fitLine
with distType set to CV_DIST_L2 is a standard least squares method. However, looking at the code, it seems to use a different formula:

origin = (mean(x), mean(y)); // this is as in standard LS
angle = 0.5 * atan2(2 * cov(x, y), var(x) - var(y));
Can someone explain the last formula to me?
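One way to read that last line (an illustrative check, not OpenCV's code): tan(2*theta) = 2*cov(x,y) / (var(x) - var(y)) is the closed-form orientation of the principal axis of the 2x2 covariance matrix, i.e. the line minimises orthogonal distances (total least squares) rather than vertical offsets:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = 0.7 * x + rng.normal(scale=0.3, size=2000)

# half-angle formula from the question
angle = 0.5 * np.arctan2(2 * np.cov(x, y, ddof=0)[0, 1],
                         np.var(x) - np.var(y))

# same direction from the leading eigenvector of the covariance matrix
C = np.cov(np.vstack([x, y]), ddof=0)
w, v = np.linalg.eigh(C)
principal = v[:, np.argmax(w)]
angle_pca = np.arctan2(principal[1], principal[0])
```

Comparing tangents (the direction is only defined modulo pi) shows the two agree, which is why the formula differs from the ordinary least squares slope cov(x,y)/var(x).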

Convergence when using the scipy.odr module to find best-fit parameters when there are only horizontal error bars
I am trying to fit a piecewise (otherwise linear) function to a set of experimental data. The form of the data is such that there are only horizontal error bars and no vertical error bars. I am familiar with the
scipy.optimize.curve_fit
module, but that works only when the error bars correspond to the dependent variable y. After searching for my specific need, I came across the following post, which explains the possibility of using the scipy.odr module when the error bars are those of the independent variable x. (Correct fitting with scipy curve_fit including errors in x?)

Attached is my version of the code, which tries to find the best-fit parameters using the ODR methodology. It actually draws the best-fit function and seems to be working. However, after changing the initial (educated guess) values and trying to extract the best-fit parameters, I get back the same guessed parameters I inserted initially. This means that the method is not converging, and you can verify this by printing
output.stopreason
and getting ['Numerical error detected'].
So, my question is whether this methodology is consistent with my function being piecewise, and if not, whether there is another, correct methodology to adopt in such cases?
from numpy import *
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from scipy.odr import ODR, Model, Data, RealData

x_array = array([8.2, 8.6, 9., 9.4, 9.8, 10.2, 10.6, 11., 11.4, 11.8])
x_err_array = array([0.2]*10)
y_array = array([-2.05179545, -1.64998354, -1.49136169, -0.94200805, -0.60205999, 0., 0., 0., 0., 0.])
y_err_array = array([0]*10)

# Linear Fitting Model
def func(beta, x):
    return piecewise(x, [x < beta[0]],
                     [lambda x: beta[1]*x - beta[1]*beta[0], lambda x: 0.0])

data = RealData(x_array, y_array, x_err_array, y_err_array)
model = Model(func)

odr = ODR(data, model, [10.1, 1.02])
odr.set_job(fit_type=0)
output = odr.run()

f, (ax1) = plt.subplots(1, sharex=True, sharey=True, figsize=(10, 10))
ax1.errorbar(x_array, y_array, xerr=x_err_array, yerr=y_err_array,
             ecolor='blue', elinewidth=3, capsize=3, linestyle='')
ax1.plot(x_array, func(output.beta, x_array), 'blue', linestyle='dotted', label='Best-Fit')
ax1.legend(loc='lower right', ncol=1, fontsize=12)
ax1.set_xlim([7.95, 12.05])
ax1.set_ylim([-2.1, 0.1])
ax1.yaxis.set_major_locator(MaxNLocator(prune='upper'))
ax1.set_ylabel('$y$', fontsize=12)
ax1.set_xlabel('$x$', fontsize=12)
ax1.set_xscale("linear", nonposx='clip')
ax1.set_yscale("linear", nonposy='clip')
ax1.get_xaxis().tick_bottom()
ax1.get_yaxis().tick_left()
f.subplots_adjust(top=0.98, bottom=0.14, left=0.14, right=0.98)
plt.setp([a.get_xticklabels() for a in f.axes[:1]], visible=True)
plt.show()