Why does RMSE increase with horizon when using the timeslice method in caret's trainControl function?
I'm using the timeslice method in caret's trainControl function to perform cross-validation on a time series model. I've noticed that RMSE increases with the horizon argument.
I realise this might happen for several reasons, e.g., if explanatory variables are being forecast and/or there's autocorrelation in the data such that the model can better predict nearer vs. farther ahead observations. However, I'm seeing the same behaviour even when neither is the case (see trivial reproducible example below).
Can anyone explain why RMSE increases with horizon?
# Make data
X = data.frame(matrix(rnorm(1000 * 3), ncol = 3))
X$y = rowSums(X) + rnorm(nrow(X))
# Iterate over different forecast horizons and record RMSEs
library(caret)
forecast_horizons = c(1, 3, 10, 50, 100)
rmses = numeric(length(forecast_horizons))
for (i in 1:length(forecast_horizons)) {
ctrl = trainControl(method = 'timeslice', initialWindow = 500, horizon = forecast_horizons[i], fixedWindow = TRUE)
rmses[i] = train(y ~ ., data = X, method = 'lm', trControl = ctrl)$results$RMSE
}
print(rmses) #0.7859786 0.9132649 0.9720110 0.9837384 0.9849005
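The same pattern can be reproduced without caret at all. A minimal sketch (an illustration of averaging per-slice RMSEs, not a claim about caret's internals): for i.i.d. N(0, 1) errors, the mean of per-slice RMSEs grows with slice length, because the RMSE of a single point is just its absolute error, and E|Z| is about 0.80, which is less than sqrt(E[Z^2]) = 1.

```r
# Mean of per-slice RMSEs for i.i.d. N(0, 1) "forecast errors",
# as a function of slice length h (standing in for horizon).
set.seed(1)
errors <- rnorm(100000)

mean_slice_rmse <- function(h) {
  slices <- split(errors, ceiling(seq_along(errors) / h))
  mean(sapply(slices, function(e) sqrt(mean(e^2))))
}

sapply(c(1, 3, 10, 50, 100), mean_slice_rmse)
# rises from ~0.80 toward ~1, mirroring the RMSE pattern above
```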
See also questions close to this topic

Why does it take longer to train my Elastic Net model with caret vs glmnet?
I am fitting an Elastic Net model to a very wide matrix. I like the preprocessing functions in caret, but I have found that it takes about 5 times longer to train than if I just use glmnet. Why?

# Sample data.
set.seed(123)
trainX <- replicate(1000, rnorm(30))
colnames(trainX) <- paste0("var", 1:1000)
trainOutcome <- gl(2, 15)

# Train model using glmnet.
alpha_to_test <- seq(0, 1, 0.1)
system.time({
  sapply(alpha_to_test, function(a) {
    fit <- glmnet::cv.glmnet(
      x = trainX,
      y = as.numeric(trainOutcome),
      alpha = a,
      nfolds = nrow(trainX)  # LOOCV
    )
  })
})  # 7.272s

# Train model using caret over the same search space.
fit <- glmnet::cv.glmnet(trainX, y = as.numeric(trainOutcome))
lambda_to_test <- fit$lambda
grid <- expand.grid(alpha = alpha_to_test, lambda = lambda_to_test)
system.time({
  fit <- train(
    x = trainX,
    y = trainOutcome,
    method = "glmnet",
    trControl = trainControl(method = "LOOCV", selectionFunction = "oneSE"),
    tuneGrid = grid
  )
})  # 45.316s

Caret model with custom cost function
I want to train an Elastic Net classifier with a case-weighted error function. The function needs to accept an extra argument, weights, that allows different observations to carry different weights when calculating performance error. Is this possible to do in caret?

# Weighted RMSE function with a weights argument.
weighted_rmse <- function(prediction, target, weights) {
  sqrt(sum(weights * (prediction - target)^2) / sum(weights))
}

# Example data.
trainX <- iris[1:100, 1:4]
trainOutcome <- as.numeric(iris$Species[1:100])

fit <- train(
  x = trainX,
  y = trainOutcome,
  method = "glmnet",
  trControl = trainControl(
    method = "LOOCV",
    selectionFunction = "oneSE",
    summaryFunction = twoClassSummary  # <-- replace with something like weighted_rmse()
  )
)
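For reference, a minimal sketch of what a caret-compatible metric could look like: trainControl's summaryFunction must have the signature function(data, lev = NULL, model = NULL) and return a named numeric vector, where data contains at least obs and pred columns. Whether a weights column is exposed in data is an assumption to verify for your caret version.

```r
# Sketch of a caret-style summary function; the 'weights' column is an
# assumption -- fall back to equal weights when it is absent.
weightedSummary <- function(data, lev = NULL, model = NULL) {
  w <- if ("weights" %in% names(data)) data$weights else rep(1, nrow(data))
  c(WRMSE = sqrt(sum(w * (data$pred - data$obs)^2) / sum(w)))
}

# Toy check with two observations, the second weighted 3x.
weightedSummary(data.frame(obs = c(0, 1), pred = c(0.5, 0.5), weights = c(1, 3)))
# WRMSE = 0.5
```

You would then pass it via trainControl(summaryFunction = weightedSummary, ...) together with metric = "WRMSE" and maximize = FALSE in train().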

R tscv caret/glmnet error: missing values in resampled performance measures
I have a dataframe with 610 obs and 121 variables (fred_md) and I want to predict y with a Lasso regression using a rolling-origin forecast scheme. Some variables are transformed by taking the log, the first difference, the log first difference, or the second difference. My outcome variable y (log first difference) has a mean of ~0.003 and a max value of ~0.018. As a first step I checked for missing values: none.
sapply(fred_md, function(x) sum(is.na(x)))
When I apply the first iteration of the rolling origin:

timeSlices <- createTimeSlices(1:nrow(fred_md), initialWindow = 490,
                               horizon = 1, fixedWindow = TRUE)
trainSlices <- timeSlices[[1]]
testSlices <- timeSlices[[2]]

set.seed(42)
t_grid <- expand.grid(lambda = seq(0, 1, by = 0.1), alpha = 1)
plsFitTime <- train(CPIAUCSL ~ ., data = fred_md[trainSlices[[1]], ],
                    method = "glmnet", tuneGrid = t_grid, family = "gaussian")
pred <- predict(plsFitTime, fred_md[testSlices[[1]], ])
I get the following output:
490 samples
120 predictors

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 490, 490, 490, 490, 490, 490, ...
Resampling results across tuning parameters:

  lambda  RMSE         Rsquared   MAE
  0.0     0.001527792  0.7638545  0.001183233
  0.1     0.003083116  NaN        0.002318999
  0.2     0.003083116  NaN        0.002318999
  0.3     0.003083116  NaN        0.002318999
  0.4     0.003083116  NaN        0.002318999
  0.5     0.003083116  NaN        0.002318999
  0.6     0.003083116  NaN        0.002318999
  0.7     0.003083116  NaN        0.002318999
  0.8     0.003083116  NaN        0.002318999
  0.9     0.003083116  NaN        0.002318999
  1.0     0.003083116  NaN        0.002318999

Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 0.
After checking which variables are used:

plsFitTime$beta
# NULL

the model appears to include no variables (besides the intercept?).

I also get the warning:

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
  There were missing values in resampled performance measures.
(i) How can I fix this?
(ii) Is there a way to store and/or plot the coefficient values / variable importance over the whole forecast-origin scheme (116 rolling windows)? I would like to identify the most important variables for predicting y.
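Regarding (ii), the bookkeeping part can be sketched in base R with a stand-in lm() fit on fake data; the rolling-window indexing is the point. You would swap the lm() call for the caret/glmnet fit and extract coef(fit$finalModel, s = fit$bestTune$lambda) instead. Everything here (fake data, window sizes) is illustrative, not the question's actual setup.

```r
# Collect coefficients across rolling origins (fixed window), base R only.
set.seed(42)
df <- data.frame(y = rnorm(120), x1 = rnorm(120), x2 = rnorm(120))
initial_window <- 100
n_windows <- nrow(df) - initial_window  # horizon = 1

coef_mat <- matrix(NA_real_, nrow = 3, ncol = n_windows,
                   dimnames = list(c("(Intercept)", "x1", "x2"), NULL))
for (k in seq_len(n_windows)) {
  train_idx <- k:(k + initial_window - 1)  # window slides, length stays fixed
  fit <- lm(y ~ x1 + x2, data = df[train_idx, ])
  coef_mat[, k] <- coef(fit)
}
# matplot(t(coef_mat), type = "l")  # coefficient paths across the origins
```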
Thanks for your help!

SageMaker: training every time I need to make a prediction: how should I structure the solution?
I have been asked to migrate a custom model to SageMaker. This model is a forecasting script that trains every time it is run and then predicts after training. (It is a two-layer forecasting prediction with SARIMAX.) The flow is as explained below:

- train an ARIMA model to get exogenous variables (training algorithm 1)
- predict with that trained model
- use the output variables to train the second layer (training algorithm 2)
- predict with this last trained model and output the solution

This is not what I'm used to doing in SageMaker (I train a model once that will be invoked multiple times), so how could I frame this? Train the models separately from two separate Docker images and create two endpoints? The whole train-predict-train-predict workflow would no longer be automatic, right? How would I trigger this workflow? Please help!

How can I convert this native PHP code to Laravel? The code uses the moving-average method; can anyone help solve my problem?
Hello guys, I have a project and I'm having trouble with it. I have native PHP code and I can't translate it to the Laravel framework. My program uses the moving-average method for forecasting. Please help me fix it.
<?php
$tahun = $_POST['tahun'];
$rows = mysqli_query($db, "SELECT tahun, nm_brg, dt_actual
    FROM tb_actual tc
    INNER JOIN tb_barang brg ON tc.kd_brg = brg.kd_brg
    INNER JOIN tb_tahun thn ON thn.id_tahun = tc.id_tahun
    WHERE thn.tahun <= '$tahun' AND brg.kd_brg = '$_GET[id]'");
while ($rw = mysqli_fetch_object($rows)) {
    $TAHUN[$rw->tahun] = $rw->dt_actual;
}

$chr = mysqli_query($db, "SELECT * FROM tb_tahun");
while ($chrs = mysqli_fetch_object($chr)) {
    if (!empty($TAHUN[$chrs->tahun])) {
        $DATA[$chrs->tahun] = $TAHUN[$chrs->tahun];
    } else {
        $DATA[$chrs->tahun] = 0;
    }
}

$rwx = mysqli_query($db, "SELECT * FROM tb_tahun");
$start = 1;
$dg = 0;
$sum = 0;
while ($row = mysqli_fetch_object($rwx)): if ($tahun >= $row->tahun): ?>
<tr>
  <td><?= $start++ ?></td>
  <td><?= $row->tahun ?></td>
  <td>
    <?php if (!empty($DATA[$row->tahun])) { echo round($DATA[$row->tahun]); } else { echo 0; } ?>
  </td>
  <td><?php
    if (!empty($DATA[$row->tahun - 2])) {
        if ($DATA[$row->tahun - 2]) {
            $awal  = $DATA[$row->tahun - 2];
            $akhir = $DATA[$row->tahun - 1];
            $hsl   = ($awal + $akhir) / 2;
            echo "(" . $awal . " + " . $akhir . ") / 2 = " . $hsl;
            $_SESSION['angka'][$row->tahun] = $hsl;
        } else {
            echo "";
        }
    } else {
        echo "";
    }
  ?></td>
  <td><?php
    if (!empty($DATA[$row->tahun - 2])) {
        if (!empty($DATA[$row->tahun]) && $DATA[$row->tahun] > 0) {
            echo $DATA[$row->tahun] - $hsl;
        }
    } else {
        echo "";
    }
  ?></td>
  <td><?php
    if (!empty($DATA[$row->tahun - 2])) {
        $ses = $DATA[$row->tahun] - $hsl;
        if (!empty($DATA[$row->tahun])) {
            if ($ses < 0) {
                echo $ses * -1;
                $sum = $sum + $ses * -1;
            } else {
                echo $ses;
                $sum = $sum + $ses;
            }
            $dg++;
        }
    } else {
        echo "";
    }
  ?></td>
</tr>
<?php endif; endwhile; ?>
This is my native PHP code; I don't know how to convert it to Laravel.

How can I calculate MAPE in R with actual and predicted values?
I'm trying to calculate MAPE in R but I'm having some problems. I have a dataset of food retail sales from 2017 to 2020 and I split it into a training set and a test set. Now, after calculating the forecast values this way:
tsData_train <- ts(t1[, 3], start = 2017, end = 2019, frequency = 12)
tsData_test <- ts(t2[, 3], start = 2019, end = 2020, frequency = 12)

# choose best model with arima
a1 <- auto.arima(tsData_train, max.d = 1, D = 1)
fitA <- arima(tsData_train, method = "ML")
pred <- predict(fitA, n.ahead = 4)
futurVal <- forecast(a1, h = 24)
Please, could you tell me if the code is correct? I would like to calculate MAPE manually; my idea is:

abs((actual - predicted) / predicted)

but I don't know which objects hold the actual and predicted values. Help me, please.
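For what it's worth, a minimal MAPE helper in base R; note that by the usual definition the denominator is the actual value, not the prediction. Here actual would be the held-out series (e.g. tsData_test) and predicted the point forecasts (e.g. futurVal$mean from the forecast object); those mappings are assumptions about the objects above, so check that the lengths line up.

```r
# MAPE in percent; by convention the denominator is the actual value.
mape <- function(actual, predicted) {
  100 * mean(abs((actual - predicted) / actual))
}

# Toy check:
mape(actual = c(100, 200, 400), predicted = c(110, 190, 420))
# (0.10 + 0.05 + 0.05) / 3 * 100 = 6.67 (approx.)
```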