Eigen: Is it possible to create a LeastSquareDiagonalPreconditioner-like preconditioner if I can only compute A^T*y and A*x?
I want to solve a least-squares-like system A^T * A * x = A^T * b (I'm implementing the Gauss-Newton method for a special problem). I wrote special routines which allow me to compute the products A * x and A^T * y. With such routines it's easy to use matrix-free solvers, thanks to Eigen.
But my approach doesn't converge as well as Eigen::LeastSquaresConjugateGradient. I made a small test, and it looks like LeastSquareDiagonalPreconditioner speeds up convergence a lot.
My question is: how can I use LeastSquareDiagonalPreconditioner, or implement my own preconditioner, if I can only compute matrix products? (I'm not very good with preconditioning/conjugate-gradient theory, by the way.)
EDIT
For clarity: I want to use the matrix-free solvers from Eigen with my product routines.
EDIT 2
The matrix-vector products are obtained by applying forward- and reverse-mode autodiff to some objective functions.
1 answer

The easiest might be to implement your own preconditioner class, inheriting DiagonalPreconditioner and implementing something like LeastSquareDiagonalPreconditioner::factorize(), but adapted to your type. Basically, you need to compute:

m_invdiag(j) = 1./mat.col(j).squaredNorm();

for every column j, using a strategy similar to what you already implemented for the product operators.
See also questions close to this topic

Forward variadic argument to thread with lambda
I'm having trouble finding out how to use std::thread() with lambdas, namely having a variadic-argument lambda receive the arguments by forwarding. As an example:

template<typename... T>
auto foo(T&&... t)
{
    [](T&&... t){}(std::forward<T>(t)...); // (1)
    return std::thread(                    // (2)
        [](T&&... t){},
        std::forward<T>(t)...
    );
}

auto bar()
{
    int n = 1;
    foo(1); // (A)
    foo(n); // (B)
}
A.1: compiles
A.2: compiles
B.1: compiles
B.2: doesn't compile
I don't understand:
- Why does the std::thread() (2) version using (B) not compile, while (A.2) does?
- Why is there a difference between (B.1) and (B.2)?
- Why the

Conflict errors (atof and atoi) when compiling a C++ header file
I have this header file:
#ifndef PARSER_HPP
#define PARSER_HPP

#ifdef _DEFAULT_INCLUDES
#include <AsDefault.h>
#endif

#include "Builder.hpp"
#include <string>

using namespace std;

struct Attributes {
    string name;
    string value;
};

enum Direction_enum { write, read };

enum ErrorLevel_enum {
    NO_ERROR,
    OPENING_FILE_ERROR,
    BUFFERING_DATA_ERROR,
    CLOSING_FILE_ERROR,
    CREATING_XML_READER_ERROR,
    CLOSING_XML_READER_ERROR
};

enum Step_enum {
    WAITING,
    CLEANING_UP,
    OPENING_FILE,
    BUFFERING_DATA,
    CLOSING_FILE,
    CREATING_XML_READER,
    READING_XML_DATA,
    CLOSING_XML_READER,
    ERROR
};

enum StructType_enum { DEFAULT, OTHER, LOCAL_MACHINES };

enum TagType_enum { identity, acknowledge, value, heartbeat, watchdog };

BuildOpcUaType_enum en;

class Parser {
private:
    // Function blocks:
    // Structs:
    BuildOpcUaLists_type buildOpcUaLists_type[20];
    // Enumeration types:
    BuildOpcUaType_enum enCommunicationType[20];
    Direction_enum enDirection;
    ErrorLevel_enum enErrorLevel;
    Step_enum enStep;
    StructType_enum enStructType;
    TagType_enum enTagType;
    // Variables:
    Attributes attributes[10];
    unsigned int BufferIds[200];
    string connectionStrings[20];
    unsigned long int nFileLength, nOpenFileIdent, nXMLReaderIdent; // Make sure it is correct
    unsigned short int length, noOfOpenedStructs, readListIndex, writeListIndex, readListDestIndex, writeListSrcIndex;
    string comType, sErrorMessage, sStatusText, sXMLElementName, sXMLElementValue;
    string structNameAttValues[10];
    unsigned int TagId_Read[200];
    unsigned int TagId_Write[200];
    unsigned short int xmlData[10000];
    // Boolean variables:
    bool isArrayTag, isBufferIdTag, isDatatypeTag, isNameTag, bStart, isTagIdTag;
    // Constants:
    string sFilePath;
    string sDeviceName;
    // The rest?

public:
    Parser();
    ~Parser();
};

#endif
Whenever it compiles, I get these errors:

declaration of C function 'double atof(const char*)' conflicts with...
previous declaration 'float atof(long unsigned int)' here
declaration of C function 'int atoi(const char*)' conflicts with...
previous declaration 'long int atoi(long unsigned int)' here

IF statement to control a range of numbers using an array in C++
I have created an array that holds 5 numbers, and the user inputs the numbers. If a mark is less than 0 or greater than 100, I want to print out "invalid mark number". How could I go about doing this? I thought about using an IF statement but didn't know how to go about it.
#include <iostream>
using namespace std;

int mark[5];

int main()
{
    cout << "enter mark 0: ";
    cin >> mark[0];
    cout << "enter mark 1: ";
    cin >> mark[1];
    cout << "enter mark 2: ";
    cin >> mark[2];
    cout << "enter mark 3: ";
    cin >> mark[3];
    cout << "enter mark 4: ";
    cin >> mark[4];
}

Eigen solver for sparse matrix product
I want to use Eigen to compute L_1^{-1} L_2, where both L_1 and L_2 are lower triangular matrices and are stored as column-oriented sparse matrices in Eigen. I tried the Eigen triangular solver. However, that requires L_2 to be dense.

eigenvalues and eigenvectors scipy
I was wondering which method [scipy.sparse.linalg.eigs][1] actually implements to calculate the first k eigenvalues and eigenvectors. I found this: http://en.wikipedia.org/wiki/List_of_numerical_analysis_topics#Eigenvalue_algorithms but which one is actually implemented?

Sum of the multiplication of all columns of one matrix (n-by-m) with another matrix (n-by-n)
I hope the title is self-explanatory; I chose it so that others could find this question as well. I know how to carry out the operation with a loop, but it must be quicker using some kind of matrix multiplication, which I am interested to learn.
The code with the loop looks something like this
x <- matrix(rexp(300, rate=.1), nrow=20)
y <- matrix(rexp(400, rate=.1), nrow=20)
res <- as.data.frame(matrix(0, ncol = 15, nrow = 20))
for (i in 1:20){
  res <- res + x*y[,i]
}

Enabling Open MP Support in Visual Studio 2017 slows down codes
I am trying to use OpenMP to speed up my code for neural-network computation. As I am using Visual Studio 2017, I need to enable OpenMP Support in the property sheets. However, after I have done that, some parts of the code slow down by around 5 times even though I did not include any #pragma omp in the code. I have isolated the sections and found that this particular function is causing the problem:
void foo(Eigen::Matrix<float, 3, Eigen::Dynamic> inputPts)
{
    std::vector<Eigen::MatrixXf> activation;
    activation.reserve(layerNo);
    activation.push_back(inputPts);
    int inputNo = inputPts.cols();
    for (int i = 0; i < layerNo - 2; i++)
        activation.push_back(((weights[i]*activation[i]).colwise() + bias[i]).array().tanh());
    activation.push_back(((weights[layerNo - 2]*activation[layerNo - 2]).colwise() + bias[layerNo - 2]));
    val = activation[layerNo - 1]/scalingFactor;

    std::vector<Eigen::MatrixXf> delta;
    delta.reserve(layerNo);
    Eigen::Matrix<float, 1, Eigen::Dynamic> seed;
    seed.setOnes(1, inputNo);
    delta.push_back(seed);
    for (int i = layerNo - 2; i >= 1; i--) {
        Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic>
            d_temp = weights[i].transpose()*delta[layerNo - 2 - i],
            d_temp2 = 1 - activation[i].array().square(),
            deltaLayer = d_temp.cwiseProduct(d_temp2);
        delta.push_back(deltaLayer);
    }
    grad = weights[0].transpose()*delta[layerNo - 2];
}
The two for-loops are the ones that slow down significantly (from ~3 ms to ~20 ms). Strangely, although this function is called many times in the program, only some of the calls are affected.
I have included the header file <omp.h>. I am not sure whether it is due to the Eigen library, which is used everywhere. I tried defining EIGEN_DONT_PARALLELIZE and calling Eigen::initParallel() as suggested on the official site, but it does not help. The weird thing is that I did not include any parallel pragma at all, so there should not be any overhead to handle the OpenMP functions. Why is it still slowing down?
Bottleneck in the Eigen partial_lu_inplace when factorizing a large number of small matrices
I need to factorize ~1e5 small matrices of variable size up to 20x20. Profiling the matrix factorization with HPCToolkit shows that the hotspot in the code is in Eigen::internal::partial_lu_inplace. I checked the Eigen documentation on in-place matrix decompositions, and I understand it can be important for large matrices to use an in-place decomposition, to reuse the memory and have better cache efficiency.
I am currently computing the decomposition like this:
// Factorize the matrix.
matrixFactorization_ = A_.partialPivLu();
Profiling with HpcToolkit shows that the inplace factorization is the hotspot:
Is it possible to disable the in-place decomposition and test whether the code will be faster for the small matrices I am dealing with?
Note: If you look at the CPU time column in the image, you'll notice the runtime is in seconds: I am not after microsecond optimizations here; the calculation takes ~4 seconds in total.
Edit: HPCToolkit statistically profiles the code in fully optimized mode (-O3), but with the information required to map the measurements to the source code (-g3).
Algorithm for eigenvalue/eigenvector calculation in the Eigen library?
I would like to know which algorithm the Eigen C++ library uses to calculate the eigenvectors and eigenvalues of a complex matrix.
Thank you

Least Squares: Why is np.polyfit faster than linregress when iterated over DataFrame rows?
import pandas as pd from scipy.stats import linregress import numpy as np import time df = pd.DataFrame({'signal': [270400, 270320, 270278.2609, 270195.8333, 270207.1429, 270081.8182, 269891.3043, 269991.6667, 270153.3333, 270285, 270357.1429, 270485.7143, 270526.3158, 270422.7273, 270310.7143, 270486.3636, 270593.9394, 270655.5556, 270669.2308, 270800, 270911.1111, 271083.3333, 271180, 271084.6154, 270800, 270966.6667, 271100, 271030.7692, 271053.8462, 271123.8095, 271050, 271165.5172, 271384, 271486.6667, 271554.5455, 271600, 271629.1667, 271526.6667, 271612.5, 271663.6364, 271553.5714, 271384, 271400, 271676.9231, 271666.6667, 271624, 271540.7407, 271503.125, 271600, 271620.8333, 271426.087, 271492.8571, 271482.1429, 271426.3158, 271207.6923, 271146.6667, 271057.8947, 271000, 271005.5556, 271044, 270930.4348, 270707.6923, 270833.3333, 270950, 270912.5, 270850, 270711.1111, 270925.9259, 271270, 271220, 271050, 271069.2308, 271037.5, 270954.5455, 270720, 270631.25, 270450, 270387.5, 270196, 270271.4286, 269920, 269760, 269562.5, 269500, 269867.8571, 269792.8571, 269854.1667, 269666.6667, 269417.3913, 269183.3333, 268966.6667, 268859.0909, 268707.6923, 268634.4828, 268516.6667, 268500, 268753.3333, 269175, 269309.5238, 269440.9091, 269394.4444, 269147.3684, 269133.3333, 269083.3333, 269126.087, 269145, 269164.7059, 269175, 269247.3684, 269540.9091, 270039.2857, 270150, 270283.3333, 270480.9524, 270692.8571, 270350, 270414.2857, 270277.7778, 270359.2593, 270148.1481, 269905, 269800, 269731.8182, 270186.6667, 270271.4286, 269983.3333, 269808, 269900, 269652.381, 269434.6154, 269659.2593, 269770.9677, 269831.0345, 269960, 270000, 270021.7391, 269868.4211, 269421.0526, 269106.25, 268813.3333, 268685.7143, 269421.4286, 269712.5, 269546.1538, 269332.2581, 269103.3333, 269289.2857, 269493.5484, 269650, 269710, 269444.4444, 269727.2727, 269833.3333, 269844.4444, 269985.7143, 270445.4545, 270320, 270092.5926, 269978.9474, 269866.6667, 269666.6667, 269468.75, 269582.3529, 
269300, 269336.3636, 269309.5238, 269233.3333, 269208, 269100]})

for i in [10 ** 3, 10 ** 4]:
    df1 = pd.concat([df] * i)
    start = time.time()
    slope, intercept, r_value, p_value, std_err = linregress(df1.index, df1.signal)
    end = time.time()
    print('running time of SCIPY Library code: ' + str(end - start))
    start = time.time()
    coeffs = np.polyfit(df1.index, df1.signal, 1)  # 1 = linear
    end = time.time()
    print('running time of Numpy Library code: ' + str(end - start), '\n')

new_col = 'signal2'
window = 10
print('sliding window')
for i in [10 ** 3, 10 ** 4]:
    df1 = pd.concat([df] * i)
    start = time.time()
    for index, row in df.iterrows():
        if index >= window:
            df1 = df[index - window:index]
            slope, intercept, r_value, p_value, std_err = linregress(df1.index, df1.signal)
            df.loc[index, new_col + 'stats'] = slope * index + intercept
    end = time.time()
    print('running time of SCIPY Library code: ' + str(end - start))
    start = time.time()
    for index, row in df.iterrows():
        if index >= window:
            df1 = df[index - window:index]
            coeffs = np.polyfit(df1.index, df1.signal, 1)  # 1 = linear
            df.loc[index, new_col + 'np'] = coeffs[0] * index + coeffs[1]
    end = time.time()
    print('running time of Numpy Library code: ' + str(end - start), '\n')
running time of SCIPY Library code: 0.015627384185791016
running time of Numpy Library code: 0.02213263511657715
running time of SCIPY Library code: 0.046897172927856445
running time of Numpy Library code: 0.24495553970336914
sliding window
running time of SCIPY Library code: 0.12841463088989258
running time of Numpy Library code: 0.08266448974609375
running time of SCIPY Library code: 0.1006932258605957
running time of Numpy Library code: 0.08883190155029297
In the question "Least Squares: Is there a way to improve the performance of this code? packages used: sm.OLS and pd" it is shown that linregress is faster than np.polyfit. Here, when iterrows is added, np.polyfit is faster. Why?

Estimating Parameters (Coefficients) of Autoregressive (AR) Equation using Least Means Square (LMS) Algorithm
I want to do two things:
- Estimate the coefficients of an AR model using an LMS filter.
- Using the coefficients found in step 1, predict future samples of a signal with the AR equation.
I don't have a desired signal, so I need to use future samples of the signal instead.
The attached Figure 1 shows the estimation of the two coefficients, which is step 1.
% x = data = signal plus noise;
% mu = step size factor;
% M = number of filter coefficients;
% w1 is a matrix and each column is the history of each filter coefficient versus time n;
N = length(x);
y = zeros(1,N);
w = zeros(1,M);
for n = M:N-1
    x1 = x(n:-1:n-M+1);
    y(n) = w*x1';
    e(n) = x(n+1) - y(n);
    w = w + 2*mu*e(n)*x1;
    w1(n-M+1,:) = w(1,:);
end
The problem with my code is that it gives me very small coefficients (the values are very different from the actual ones). There is some error in my code and I am not sure what it is. The above code should give me the coefficients, which I then pass to step 2 for AR prediction.

Least Squares: Is there a way to improve the performance of this code? packages used: sm.OLS and pd
The following function calculates the slope and intercept from a pandas DataFrame. For big data the calculation takes too long.
import pandas as pd import statsmodels.api as sm df = pd.DataFrame({'signal': [270400, 270320, 270278.2609, 270195.8333, 270207.1429, 270081.8182, 269891.3043, 269991.6667, 270153.3333, 270285, 270357.1429, 270485.7143, 270526.3158, 270422.7273, 270310.7143, 270486.3636, 270593.9394, 270655.5556, 270669.2308, 270800, 270911.1111, 271083.3333, 271180, 271084.6154, 270800, 270966.6667, 271100, 271030.7692, 271053.8462, 271123.8095, 271050, 271165.5172, 271384, 271486.6667, 271554.5455, 271600, 271629.1667, 271526.6667, 271612.5, 271663.6364, 271553.5714, 271384, 271400, 271676.9231, 271666.6667, 271624, 271540.7407, 271503.125, 271600, 271620.8333, 271426.087, 271492.8571, 271482.1429, 271426.3158, 271207.6923, 271146.6667, 271057.8947, 271000, 271005.5556, 271044, 270930.4348, 270707.6923, 270833.3333, 270950, 270912.5, 270850, 270711.1111, 270925.9259, 271270, 271220, 271050, 271069.2308, 271037.5, 270954.5455, 270720, 270631.25, 270450, 270387.5, 270196, 270271.4286, 269920, 269760, 269562.5, 269500, 269867.8571, 269792.8571, 269854.1667, 269666.6667, 269417.3913, 269183.3333, 268966.6667, 268859.0909, 268707.6923, 268634.4828, 268516.6667, 268500, 268753.3333, 269175, 269309.5238, 269440.9091, 269394.4444, 269147.3684, 269133.3333, 269083.3333, 269126.087, 269145, 269164.7059, 269175, 269247.3684, 269540.9091, 270039.2857, 270150, 270283.3333, 270480.9524, 270692.8571, 270350, 270414.2857, 270277.7778, 270359.2593, 270148.1481, 269905, 269800, 269731.8182, 270186.6667, 270271.4286, 269983.3333, 269808, 269900, 269652.381, 269434.6154, 269659.2593, 269770.9677, 269831.0345, 269960, 270000, 270021.7391, 269868.4211, 269421.0526, 269106.25, 268813.3333, 268685.7143, 269421.4286, 269712.5, 269546.1538, 269332.2581, 269103.3333, 269289.2857, 269493.5484, 269650, 269710, 269444.4444, 269727.2727, 269833.3333, 269844.4444, 269985.7143, 270445.4545, 270320, 270092.5926, 269978.9474, 269866.6667, 269666.6667, 269468.75, 269582.3529, 269300, 269336.3636, 269309.5238, 
269233.3333, 269208, 269100]})

def OLS_slope_intercept(df, select_colname='open'):
    model = sm.OLS(df[select_colname], sm.add_constant(df.index)).fit()
    intercept = model.params[0]
    slope = model.params[1]
    return slope, intercept

import time
start = time.time()
slope, intercept = OLS_slope_intercept(df, 'signal')
end = time.time()
print('running time of code:' + str(end - start))

Why matrices get copied when passed through Ref& in Eigen
I tested whether the variables get copied by writing the following piece of code, which comes from the official documentation: https://eigen.tuxfamily.org/dox/classEigen_1_1Ref.html
void cov(const Ref<const MatrixXf>& x, const Ref<const MatrixXf>& y, Ref<MatrixXf> C)
{
    cout << "address of x : " << &x << endl;
    cout << "address of C : " << &C << endl;
}

int main(int argc, const char* argv[])
{
    MatrixXf m1(3,3);
    MatrixXf m2(3,3);
    MatrixXf m3(3,3);
    m1 << 1,2,3,4,5,6,7,8,9;
    m2 << 1,2,3,4,5,6,7,8,9;
    m3 << 1,2,3,4,5,6,7,8,9;
    cout << "address of m1 : " << &m1 << endl;
    cout << "address of m3 : " << &m3 << endl;
    cov(m1, m2, m3);
}
The output is as follows.
address of m1 : 0x7ffeefbff4e8
address of m3 : 0x7ffeefbff498
address of x : 0x7ffeefbff370
address of C : 0x7ffeefbff308
The addresses of x and m1, and of m3 and C, are different (I supposed they should be the same, since I am passing the variables by reference). Could anyone explain why?
Thanks to @Nelfeal's answer, I tried to use a debugger to prove this. The following is the debugging info for the above code. We can see that m1 and x share the same "m_data" address, 0x329f800.
However, could someone please tell me the difference between the following two pieces of code? I supposed Ref is already a reference itself, so why do we still have to add the reference mark "&"?
void cov(const Ref<const MatrixXf> x, const Ref<const MatrixXf> y, Ref<MatrixXf> C)
void cov(const Ref<const MatrixXf>& x, const Ref<const MatrixXf>& y, Ref<MatrixXf> C)

Eigen rowwise addition/subtraction performance
During profiling of my program I found that the following lines are the bottleneck:
// Some big nested loop
{
    const auto inpRow = inpMap.row(counter);
    outMap.row(adjRow) = inpRow;
    outMap.row(colInd) += inpRow;
}
outMap and inpMap are Eigen::Map<Eigen::MatrixRX<Scalar>>, where Eigen::MatrixRX is defined as Eigen::Matrix<Scalar, -1, -1, Eigen::RowMajor>, i.e. a row-major matrix. Is there a way to improve the performance of such operations? (Except parallel for, of course.)