Results of ridge regression
I've run into a problem with ridge regression.
As is well known, ridge regression is used when the features are strongly correlated. That is exactly my case: the determinant of my inter-factor correlation matrix is of the order of 10^(-18), i.e. I have multicollinearity. The data sample consists of 8 quantitative features.
Yet ridge regression gives worse (or at best the same) results than standard linear regression.
What causes this? How can the results be improved?
1 answer

Ridge regression has one obvious disadvantage. Unlike best subset, forward stepwise, and backward stepwise selection, which generally select models involving just a subset of the variables, ridge regression includes all predictors in the final model. The lasso is a relatively recent alternative to ridge regression that overcomes this disadvantage. However, have you already considered selecting the tuning parameter using cross-validation? Reference: Chapter 6, Linear Model Selection and Regularization, ISLR.
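As a sketch of the cross-validation suggestion (with made-up collinear data, not the asker's 8 features, and an arbitrary lambda grid), ridge has a closed form that makes k-fold CV easy to write out in plain numpy:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared prediction error of ridge(lam), estimated by k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

# Synthetic design with two nearly collinear predictors.
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)          # almost identical to x1
X = np.column_stack([x1, x2, rng.normal(size=200)])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=200)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: cv_mse(X, y, lam))
print("lambda chosen by CV:", best)
```

If the CV-chosen penalty is (near) zero, that itself is informative: it says the shrinkage is not buying predictive accuracy on this sample, which matches the behaviour described in the question.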
See also questions close to this topic

expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)
I'm having trouble with this autoencoder I'm building using Keras. The input's shape is dependent on the screen size, and the output is going to be a prediction of the next screen size... However there seems to be an error that I cannot figure out... Please excuse my awful formatting on this website...
Code:
def model_build():
    input_img = InputLayer(shape=(1, env_size()[1], env_size()[0]))
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    model = Model(input_img, decoded)
    return model

if __name__ == '__main__':
    model = model_build()
    model.compile('adam', 'mean_squared_error')
    y = np.array([env()])
    print(y.shape)
    print(y.ndim)
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))
Error:
Traceback (most recent call last):
  File "/home/ai/Desktop/algernontest/rewarders.py", line 46
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)
EDIT:
Code for get_screen imported as env():
def get_screen():
    img = screen.grab()
    img = img.resize(screen_size())
    img = img.convert('L')
    img = np.array(img)
    return img
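The expected shape (4, 268, 1) can be derived by hand, assuming Keras's default channels_last ordering: the (1, 270, 480) array is then read as height 1, width 270 and 480 channels, and one decoder conv lacks padding='same'. Tracing only the spatial dims:

```python
from math import ceil

def same_pool(h, w):
    """MaxPooling2D((2, 2), padding='same'): each dim becomes ceil(d / 2)."""
    return ceil(h / 2), ceil(w / 2)

def up(h, w):
    """UpSampling2D((2, 2)) doubles both spatial dims."""
    return 2 * h, 2 * w

def valid_conv3(h, w):
    """A 3x3 Conv2D with default padding='valid' trims 2 from each dim."""
    return h - 2, w - 2

h, w = 1, 270                      # (1, 270, 480) read as H=1, W=270, C=480
for _ in range(3):                 # encoder: three 'same' poolings
    h, w = same_pool(h, w)         # (1, 135) -> (1, 68) -> (1, 34)
h, w = up(h, w)                    # (2, 68)
h, w = up(h, w)                    # (4, 136)
h, w = valid_conv3(h, w)           # the one conv WITHOUT padding='same': (2, 134)
h, w = up(h, w)                    # (4, 268)
print((h, w, 1))                   # -> (4, 268, 1), exactly the reported shape
```

So the two usual suspects are the data format (channels should be the last axis of the target) and the single `Conv2D(32, (3, 3), activation='relu')` in the decoder that is missing `padding='same'`.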

What algorithm is better for forecasting a sequence of numbers?
Dears, I have a list of numbers for the last 8 years and I need to forecast the next 3 years. The only parameters I have are the year and the value. I'm planning to use ML.NET, as my project is an MVC 5 web app.
Please help!
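With only year/value pairs, a straight-line trend regression is a reasonable first baseline, whatever library ends up doing the fitting. A numpy sketch with invented values (the question's actual data is not shown):

```python
import numpy as np

# Hypothetical yearly observations standing in for the asker's 8 values.
years = np.arange(2011, 2019)
values = np.array([3.1, 3.4, 3.9, 4.2, 4.8, 5.1, 5.5, 6.0])

# Fit a straight-line trend: value = a * year + b.
a, b = np.polyfit(years, values, deg=1)

future = np.arange(2019, 2022)       # the next 3 years
forecast = a * future + b
print(dict(zip(future.tolist(), forecast.round(2).tolist())))
```

With only 8 points there is little data to support anything more complex; a linear (or at most quadratic) trend is often as defensible as a heavier time-series model here.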

Transformer step/sec decrease over time to 0
Problem: Training steps/s while using a Transformer model repeatedly drops from 20 steps/s to <1 step/s.
This is internally reproducible. GPU usage plummets to ~0% during the periods at <1 step/s.
Context: We train with a Transformer model.
Training behaves poorly with Transformer, but works well with Universal Transformer.
In both scenarios, we use 4x P100, subword tokens (https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/text_encoder.py#L448), and the MirroredStrategy distribution strategy.
(We observed the same behavior with 8x V100, as well.)
Ablation tests on 1) P100 versus V100, 2) MirroredStrategy versus t2t's built-in multi-GPU, and 3) Universal Transformer versus Transformer reveal that #3 is the driving variable.
We’re using tensor2tensor’s transformer / universal_transformer implementations. The hparams used are transformer_tiny and universal_transformer_tiny. One noteworthy deviation from common usage is that our hparams.max_length value is large (2500) and our batch size is often small, since our median sequence length is 750 tokens.
Current behaviour: With Transformer, a training run's step/sec alternates between roughly 20 step/sec and 0.5 step/sec.
For the first 3-4 hours it is biased towards 20 step/sec. For the next ~2 hours it begins dropping to 0.5 step/sec more frequently, before finally dropping almost exclusively to 0.5 step/sec.
If we restart our training process from a checkpoint that was made when the model ran slowly, the behaviour repeats itself, starting at 20 step/sec before dropping back down to 0.5.
Below are two graphs. The first is a graph of our model's step/sec degradation over a training run. The median value early on is ~20 step/sec (with occasional drops to below 5 step/sec), but as training continues our performance drops to almost 0 step/sec. Note that the vertical axis is logarithmic.
The second graph shows 3.25 consecutive runs of our model, where each restart picks up a checkpoint the previous run generated. These restarts were not caused by an error, but by our system automatically preempting GPU-intensive jobs after 24 hours. Note the consistent degradation in performance after every restart.
Expected behavior
Our runs with Universal Transformer exhibit a completely flat step/sec curve. The figure below shows the expected behaviour (in red) in terms of step/sec variance. Note that the model's step/sec exhibits almost no variation, except for the sharp drops attributed to evaluation steps.
Other info / logs: Our GPU utilization (as measured by nvidia-smi) is tightly coupled with the graph above. Where our step/sec is high, our GPU utilization is nearly always at 50%, occasionally dropping to 0% for a second or two before shooting back up. When our step/sec consistently drops below one, our GPU utilization is mostly at 0%; every few minutes it briefly shoots up to 50% and then drops back to 0% a second later.
In terms of AUC performance, our Transformer model continues to improve even as step/sec decays.
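For reference, the utilization numbers above can be logged programmatically alongside step timings; a small sketch that parses nvidia-smi's CSV query output (the query flags are standard nvidia-smi options; the 4-GPU sample string below is illustrative):

```python
import subprocess

def gpu_utilization(sample=None):
    """Return per-GPU utilization (%) parsed from nvidia-smi CSV output.

    Pass `sample` to parse a captured string instead of invoking nvidia-smi,
    which is handy for testing on machines without a GPU.
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"], text=True)
    return [int(line.strip()) for line in sample.splitlines() if line.strip()]

# Parsing a captured (illustrative) 4-GPU sample:
print(gpu_utilization(sample="50\n0\n48\n0\n"))   # -> [50, 0, 48, 0]
```

Polling this in a sidecar loop and writing it next to the training log makes the step/sec vs. utilization coupling directly plottable.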
Hardware config info:
System information:
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- TensorFlow installed from (source or binary): binary / pip install
- TensorFlow version: 1.13.1
- Python version: 3.6.7
- CUDA/cuDNN version: CUDA 10, cuDNN 7.5
- GPU model and memory: 8x V100, 16 GB
- (We observed the same behavior with CUDA 9.2 & TF 1.12 compiled for CUDA 9.2.)
- Docker image
Related issues:
https://github.com/tensorflow/tensor2tensor/issues/1484 https://github.com/tensorflow/tensorflow/issues/26635

Regression model point estimation
I'd like to retrieve the values of a second-order polynomial regression line based on a list of values for a parameter.
Here is the model:
fit <- lm(y ~ poly(age, 2) + height + age*height)
I would like to use a list of values for age and retrieve the value on the regression line, as well as the standard deviation and standard errors. 'age' is a continuous variable, but I want to create an array of discrete values and return the predicted values from the regression line.
Example:
age <- c(10, 11, 12, 13, 14)
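In R itself, predict(fit, newdata = data.frame(age = ..., height = ...), se.fit = TRUE) returns both the fitted values and their standard errors at those ages. The underlying computation, sketched in numpy with made-up data (raw polynomial terms rather than R's orthogonal poly(), and no interaction, for brevity; fitted values and their SEs are the same either way):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
age = rng.uniform(8, 16, n)
height = rng.uniform(120, 180, n)
y = 2 + 0.5 * age + 0.1 * age**2 + 0.03 * height + rng.normal(0, 1, n)

def design(age, height):
    """Design matrix for y ~ age + age^2 + height."""
    return np.column_stack([np.ones_like(age), age, age**2, height])

X = design(age, height)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])        # residual variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)       # covariance of coefficients

# Evaluate the fitted curve at discrete ages, holding height at its mean.
new_age = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
Xnew = design(new_age, np.full_like(new_age, height.mean()))
pred = Xnew @ beta
se_fit = np.sqrt(np.einsum("ij,jk,ik->i", Xnew, cov_beta, Xnew))
print(np.round(pred, 2), np.round(se_fit, 3))
```

The se_fit values here are standard errors of the fitted mean; for prediction intervals around new observations, sigma2 would be added inside the square root.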

quadratic regression in a loop
Following is my dataframe
data <- data.frame(y = rep(1:10, times = 4), dataID = rep(1:4, each = 10), x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
For each dataID and x combination, I am interested in calculating the R-squared of a linear regression between y and that x variable.
variable <- c("x1", "x2", "x3")
for (v in seq_along(variable)) {
  varref <- variable[v]
  temp <- data %>% dplyr::select(y, dataID, varref)
  modID <- sort(unique(temp$dataID))
  for (m in seq_along(modID)) {
    modRef <- modID[m]
    tempMod <- temp %>%
      dplyr::filter(dataID == modRef) %>%
      dplyr::select(-dataID)
    Rsq <- summary(lm(y ~ ., data = tempMod))$adj.r.squared
  }
}
However, what I really want to do is regress on both the linear and the quadratic term. So I am wondering if there is any way to refer to the quadratic term like this:
Rsq <- summary(lm(y ~ . + I(.^2), data = tempMod))$adj.r.squared
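The per-group quadratic fit the loop is aiming for can also be written out directly; a numpy sketch with synthetic data standing in for the question's dataframe, and adjusted R^2 computed by hand:

```python
import numpy as np

def adj_r_squared(x, y):
    """Adjusted R^2 of the quadratic fit y ~ x + x^2."""
    n = len(y)
    X = np.column_stack([np.ones(n), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    p = 2                                  # predictors, excluding intercept
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
y = np.tile(np.arange(1, 11), 4).astype(float)   # rep(1:10, times = 4)
data_id = np.repeat(np.arange(1, 5), 10)         # rep(1:4, each = 10)
xs = {name: rng.normal(size=40) for name in ["x1", "x2", "x3"]}

results = {}
for name, x in xs.items():
    for g in np.unique(data_id):
        m = data_id == g
        results[(name, g)] = adj_r_squared(x[m], y[m])
print(len(results), "fits")   # 3 predictors x 4 groups = 12 fits
```

In R the equivalent per-iteration formula would name the variable explicitly, e.g. built with reformulate() or as.formula(paste0("y ~ ", varref, " + I(", varref, "^2)")), since the `.` shorthand cannot be squared with I().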

Regression from feature combinations
I wish to use TensorFlow to do regression from a set of different features an artifact can have. My input data will be a binary vector of features that the artifact either has or does not have. The output will be a number. Thus a valid data point could be ([1,0,1,..,1], 102.5): this artifact has features 0, 2 and k, and these features yielded an output of 102.5.
A simple example of a similar application would be ice cream cost. For example, the ice cream can be in a cone (x_0), be extra chilled or not (x_1), be purchased on a specific holiday (x_2), etc. These features would give a price (they can have combined effects on the price; for example, the specific holiday may cause a shortage of cones). In this rather silly example, an ice cream that has a cone, is extra chilled and is not purchased on the holiday could cost 19.5, and thus has the form ([1,1,0], 19.5).
How would one create such a regression network? The articles I have found use feature vectors where only one of the values can be 1.
Thank you in advance!
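One way to see that a plain linear regression already handles such inputs: binary vectors need no one-hot treatment, and combined effects (like the holiday/cone shortage) can be encoded as product features. A numpy least-squares sketch with invented weights (a TensorFlow version would swap in a single Dense(1) layer over the same augmented inputs):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 500, 3
X = rng.integers(0, 2, size=(n, k)).astype(float)   # binary feature vectors

# Augment with one pairwise interaction (feature 0 AND feature 2),
# so a combined effect becomes learnable by a linear model.
X_aug = np.column_stack([X, X[:, 0] * X[:, 2]])

true_w = np.array([5.0, 2.0, -1.0, 3.0])            # made-up weights
true_b = 10.0
y = X_aug @ true_w + true_b + 0.01 * rng.normal(size=n)

A = np.column_stack([X_aug, np.ones(n)])            # intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))   # recovers [5, 2, -1, 3, 10] up to noise
```

With many features, enumerating all interactions explodes combinatorially; that is where a small hidden layer (or factorization-machine-style model) earns its keep by learning the combinations instead.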