How can I calculate MAPE in R with actual and predicted values?
I'm trying to calculate MAPE in R but I'm having some problems. I have a dataset of food retail data from 2017 to 2020, which I split into a training set and a test set. I then computed the forecast values like this:
library(forecast)
tsData_train <- ts(t1[,3], start = 2017, end = 2019, frequency = 12)
tsData_test <- ts(t2[,3], start = 2019, end = 2020, frequency = 12)
# choose best model with auto.arima
a1 <- auto.arima(tsData_train, max.d = 1, D = 1)
fitA <- arima(tsData_train, method = "ML")
pred <- predict(fitA, n.ahead = 4)
futurVal <- forecast(a1, h = 24)
Could you please tell me whether the code is correct? I would like to calculate MAPE manually. My idea is:
abs((actual - predicted)/predicted)
but I don't know which values are the actual ones and which are the predicted ones. Please help.
1 answer

MAPE is the Mean Absolute Percentage Error, so your formula is missing the mean. Moreover, the denominator should be the actual value, not the predicted one:
MAPE = mean(abs((Y - Yhat)/Y))
where Y holds the actual (observed) values and Yhat the predicted values.
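As a minimal sketch of this formula in base R: the helper name mape and the numbers below are made up for illustration and are not taken from the question's data. With the question's objects, the actual values would be tsData_test and the predicted values would presumably be futurVal$mean (the point forecasts stored in a forecast object), provided both cover the same periods.

```r
# Hypothetical helper (not part of any package): mean absolute percentage
# error, returned as a fraction. `actual` and `predicted` are numeric
# vectors (or ts objects) of equal length, and `actual` must contain no zeros.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(abs((actual - predicted) / actual))
}

# Made-up numbers standing in for a test set and its forecasts
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

mape(actual, predicted)        # fraction
mape(actual, predicted) * 100  # percentage
```

The function returns a fraction, matching the formula as written; multiply by 100 to report a percentage. The forecast package's accuracy() function also reports MAPE when given a forecast object and the test data, which can serve as a cross-check.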
See also questions close to this topic

R error while running ARIMA for Time Series Forecasting
I get the error below after running an ARIMA model. My code follows. Please help me resolve it.
> model <- arima(log(gas_train), order=c(8,1,2), seasonal = list(order=c(8,1,2), period=12))
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, : non-finite finite-difference value [17]
In addition: Warning messages:
1: In log(s2) : NaNs produced
2: In log(s2) : NaNs produced
3: In log(s2) : NaNs produced
4: In log(s2) : NaNs produced
5: In log(s2) : NaNs produced

R, HPD (Highest posterior density) interval based on samples from posterior, WinBUGS
How can I calculate the HPD (highest posterior density) interval from posterior samples? I have four parameters and I generated 1000 samples from their posterior distribution. How do I calculate the HPD interval in R? I used a package function, but I got this error:
HPDinterval(winbugsresult$sims.list,prob=0.05)
Error in UseMethod("HPDinterval") : no applicable method for 'HPDinterval' applied to an object of class "list"
where "winbugsresult" is a list that contains the posterior samples.
I also tried a vector and got the following error:
HPDinterval(winbugsresult$sims.list$alpha ,prob=0.05)
Error in UseMethod("HPDinterval") : no applicable method for 'HPDinterval' applied to an object of class "c('double', 'numeric')"
I then tried just a random vector from a normal distribution and got an error again:
HPDinterval(rnorm(100))
Error in UseMethod("HPDinterval") : no applicable method for 'HPDinterval' applied to an object of class "c('double', 'numeric')"

If statement with employees data
I have a data set that contains data about the employees in companies. You can see the data below:
# Data
output_test <- data.frame(
  Employees = c(1,2,3,10,15,122,143,150,250,300,500,1000)
)
The next step is classification. I need to classify Employees by company size, where the number of employees determines the size. For example, if the number is below 10, it is a "micro" company; if the number is greater than 10 but less than or equal to 50, it is a "small" company. For a "medium" company the number of employees is greater than 50 but less than or equal to 250, and a "large" company has more than 250 employees. To do this I wrote the following code with an if/else statement:
# Code
library(dplyr)
output_test_final <- output_test %>%
  mutate(
    Size = if (Employees >= 10) {
      "Micro"
    } else {
      if (Employees >= 50) {
        "Small"
      } else {
        if (Employees >= 250) {
          "Medium"
        } else {
          "Large"
        }
      }
    }
  )
The results from this code are not correct. Can anybody help me fix this code and get a table like the one below?

Pandas: Combining Actual and Prediction Values
I’m trying to combine the actual target values and predicted target value as a dataframe. However, I’m getting the following error. Not sure why this is happening.
a = pd.DataFrame(y_test, columns=['Actual'])
b = pd.DataFrame(final_model.predict(X_test), columns=['Predictions'])
c = pd.concat([a, b])
c.head()

Accuracy is lower than f1-score for imbalanced data
For a binary classification, I have a dataset with 55% negative labels and 45% positive labels.
The results of the classifier show that the accuracy is lower than the f1-score. Does that mean that the model is learning the negative instances much better than the positive ones?

how to present the prediction together with original data using RandomForestRegressor?
I am trying to predict a label from data in a DataFrame, with RandomForestRegressor.
For that I first remove useless columns, so that the regressor does not try to use them, especially the row's ID, then use the get_dummies() function to change string values into indicators, and then split the data into training and test samples.
# columns selection (let say there was also a column 'ID' so we drop this one)
features = features[['L', 'A', 'B']]
# string to indicators
features = pd.get_dummies(features)
# Saving labels
labels = np.array(features['L'])
# Remove the labels from the features
features = features.drop('L', axis = 1)
# Convert to numpy array
features = np.array(features)
# Divide into training and testing samples
train_features, test_features, train_labels, test_labels = train_test_split(features, labels, test_size = 0.33, random_state = 42)
# Instantiate model and fit
rf = RandomForestRegressor(n_estimators = 100, random_state = 42, max_depth = 8)
rf.fit(train_features, train_labels)
# predict
predictions = rf.predict(test_features)
So at this stage I have sample data looking like
A  B_b1  B_b2
1  0     1
2  1     0
And predictions looking like
L
100
200
How can I, after getting the prediction, put it next to the original data, given the link with ID is lost? I expect something like:
ID  A  B   L
11  1  b2  100
12  2  b1  200
I can think of complex ways (mainly because of the conversion from pd.dataframe to np.array), but what would be the most straight forward and readable (not most efficient) way? Thank you!

How to fit an ARIMA model with non-complete years data in R?
How can I fit an ARIMA model to the following filtered year/month Flow data? That is, fit the model using January to July 10 data for the chosen years (i.e., 2012, 2015, 2017, 2020) and then use the fitted model to forecast July 11-August 31.
library(forecast)
library(tidyverse)
library(dplyr)
set.seed(1500)
FakeData <- data.frame(Date = seq(as.Date("2010-01-01"), to = as.Date("2020-07-10"), by = "day"),
                       Flow = runif(length(Date), 25, 75)) %>%
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
  filter(Year %in% c(2012,2015,2017,2020)) %>%
  filter(between(Month, 1,7)) %>%
  filter(!c(Month == 07 & Day >= 11))
# convert to time-series data
TsData <- ts(FakeData$Flow) # how to provide start and end date that only goes to up to July 10 of the filtered year?
# find the best ARIMA model
AA <- auto.arima(TsData)
# fit and forecast with the model
ModelFit <- forecast(AA, h = 51) # I want to forecast for July 11 to August 31 of 2020

Changing Training Period in Arima Model in R
ARIMA forecasts in R seem to change depending on the amount of data I pass to the Arima() function. For example, I have 30 days' worth of sales data. If I input days 1-20 and forecast day 21, I get a different forecast than if I input days 10-20. This doesn't make sense to me because an Arima(p,d,q) model only looks at the previous p,q days of data. So why would the forecast change when I change the amount of data I pass to the function?

Forecasting using GRNN (Generalized Regression Neural Network) in Matlab
Does anyone know how to solve this? I have a problem using a GRNN (Generalized Regression Neural Network) in Matlab. I use this network to forecast energy consumption. I have energy consumption data from 1990-2015, and the trend is upward: the more recent the year, the higher the value. The network has 3 inputs and 1 output. When I predict for 2016, 2017 and 2018, the values should keep rising, but all three years get the same energy consumption value as 2015. Can anyone solve this problem?

Fitting ARIMA model for non-sequential dates in R?
I want to fit an ARIMA model for the filtered year and use it to forecast Flow for the July 11-August 31 period for the year 2020. Any help would be appreciated.
library(forecast)
library(tidyverse)
library(lubridate)
DF <- data.frame(Date = seq(as.Date("2010-01-01"), to = as.Date("2020-07-10"), by = "day"),
                 Flow = runif(length(Date), 25, 75)) %>%
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
  filter(Year %in% c(2012,2015,2017,2020))

Converting log back to exponential form python
Date        Value
2018-02-17  2202
2018-02-18  2449
2018-02-19  2409
2018-02-20  2364
2018-02-21  2306
2018-02-22  2492
2018-02-23  2300
2018-02-24  2359
2018-02-25  2481
2018-02-26  2446
Hello, I am using the dataset above in Python. I am using the ARIMA model to forecast the data. Everything works perfectly except the forecasted data is in the logarithmic form. I need the data in exponential form. But, the array of the forecast is showing log numbers. How do I change the array of the forecast showing log numbers back to exponential form?

Forecast with ARIMA model using specific months and years in R?
I want to train my model using specific years and months (which are not a continuous sequence), then use the fitted model coefficients along with data from July 10 to August 31 of the year 2017 to forecast Yr20 July 10 to August 31 Flow data.
library(tidyverse)
library(dplyr)
library(lubridate)
library(tseries)
library(forecast)
set.seed(1500)
DF <- data.frame(Date = seq(as.Date("2010-01-01"), to = as.Date("2018-12-31"), by = "day"),
                 Flow = runif(3287,25,75)) %>%
  mutate(Year = year(Date), Month = month(Date), Day = day(Date), JDay = yday(Date)) %>%
  filter(Year %in% c(2011,2012,2015,2017)) %>%
  filter(between(Month, 7,8))
Yr20 <- data.frame(Date = seq(as.Date("2020-01-01"), to = as.Date("2020-07-09"), by = "day"),
                   Flow = runif(191,20,60))
# acf(DF$Flow)
AA <- auto.arima(DF$Flow) # this will gave me order of the model
fitModel <- arima(DF$Flow, order = c(1,1,1)) # fit the model
#Forecast_flow <- forecast(fitModel, h = 10) # just a rough example using forecast function