R tscv caret/glmnet error: missing values in resampled performance measures
I have a data frame with 610 observations and 121 variables (fred_md), and I want to predict y with a Lasso regression using a rolling-origin forecast scheme. Some variables are transformed by taking the log, the first difference, the log first difference, or the second difference. My outcome variable y (CPIAUCSL, log first difference) has a mean of ~0.003 and a max value of ~0.018. As a first step I checked for missing values and found none:
sapply(fred_md, function(x) sum(is.na(x)))
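For reference, the four transformations mentioned above can be sketched in base R as follows. This is only an illustration on a made-up toy series `x`; the helper `pad` and the column names are hypothetical and not part of the original preprocessing.

```r
# Toy positive series standing in for one raw column (hypothetical data).
x <- c(100, 102, 105, 104, 108, 111)

log_x       <- log(x)                    # log
first_diff  <- diff(x)                   # first difference: x[t] - x[t-1]
log_diff    <- diff(log(x))              # log first difference (approx. growth rate)
second_diff <- diff(x, differences = 2)  # second difference

# Differencing shortens the series, so pad with leading NA to keep rows aligned.
# `pad` is a hypothetical helper used only for this illustration.
pad <- function(v, n) c(rep(NA_real_, n - length(v)), v)

transformed <- data.frame(
  level       = x,
  log         = log_x,
  first_diff  = pad(first_diff, length(x)),
  log_diff    = pad(log_diff, length(x)),
  second_diff = pad(second_diff, length(x))
)
print(transformed)
```

Note that differencing leaves NA in the leading rows, so those rows have to be dropped (or never created) for the missing-value check to return zero for every column.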
When I apply the first iteration of the rolling origin:
timeSlices <- createTimeSlices(1:nrow(fred_md), initialWindow = 490, horizon = 1, fixedWindow = TRUE)
trainSlices <- timeSlices[]
testSlices <- timeSlices[]
set.seed(42)
t_grid = expand.grid(lambda = seq(0, 1, by = 0.1), alpha = 1)
plsFitTime <- train(CPIAUCSL ~ ., data = fred_md[trainSlices[], ], method = "glmnet", tuneGrid = t_grid, family = "gaussian")
pred <- predict(plsFitTime, fred_md[testSlices[], ])
I get the following output:
490 samples
120 predictors

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 490, 490, 490, 490, 490, 490, ...
Resampling results across tuning parameters:

  lambda  RMSE         Rsquared   MAE
  0.0     0.001527792  0.7638545  0.001183233
  0.1     0.003083116        NaN  0.002318999
  0.2     0.003083116        NaN  0.002318999
  0.3     0.003083116        NaN  0.002318999
  0.4     0.003083116        NaN  0.002318999
  0.5     0.003083116        NaN  0.002318999
  0.6     0.003083116        NaN  0.002318999
  0.7     0.003083116        NaN  0.002318999
  0.8     0.003083116        NaN  0.002318999
  0.9     0.003083116        NaN  0.002318999
  1.0     0.003083116        NaN  0.002318999

Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 0.
When I check which variables are used, the model appears to have no variables included (besides the intercept?).
I also get the following warning:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
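One possible reading of the NaN values, sketched in base R and not confirmed against caret's internals: the RMSE is identical for every lambda from 0.1 upward, which is what constant (intercept-only) predictions would produce. If Rsquared is computed as the squared correlation between predictions and observations, a constant prediction has zero variance and the correlation is undefined. The data below are simulated and the variable names are hypothetical.

```r
set.seed(1)

# Hypothetical outcome on roughly the same scale as y in the question.
obs <- rnorm(50, mean = 0.003, sd = 0.003)

# An intercept-only model predicts the same value for every observation.
pred_const <- rep(mean(obs), length(obs))

# RMSE is still well defined for constant predictions.
rmse <- sqrt(mean((obs - pred_const)^2))

# The correlation is NA because pred_const has zero standard deviation;
# R would normally warn about this, so the warning is suppressed here.
r2 <- suppressWarnings(cor(pred_const, obs))^2

print(c(RMSE = rmse, Rsquared = r2))

# Averaging a metric that is NA in every resample, with NAs removed, gives NaN.
avg_r2 <- mean(rep(NA_real_, 25), na.rm = TRUE)
print(avg_r2)
```

If this reading is right, the message would be a symptom of the lambda grid being far too coarse for an outcome of this magnitude rather than of missing data.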
(i) How can I fix this?
(ii) Is there a way to store and/or plot the coefficient values/variable importance over the whole rolling-origin forecast scheme (116 rolling windows)? I would like to identify the most important variables for predicting y.
Thanks for your help!