How can I extract the transition matrix for a nonhomogeneous HMM model built with depmixS4?
I created the following nonhomogeneous Hidden Markov Model using depmix:
library(depmixS4)
# primary time series
datats <- ts(data$primary)
y <- as.numeric(datats)
# preparing covariates
cov1ts <- ts(data$cov1)
x1 <- as.numeric(cov1ts)
cov2ts <- ts(data$cov2)
x2 <- as.numeric(cov2ts)
# build model
hmm_model <- depmix(y ~ 1, data = data.frame(y), nstates = 2, transition = ~ scale(x1) + scale(x2))
hmm_model <- fit(hmm_model)
summary(hmm_model)
I now want to make a prediction about the next state. In the past I did this with a homogeneous HMM as explained in this post: How to predict out-of-sample observations with depmixS4 package in R?
Specifically, in my case I did:
# [...] created a homogeneous model "hom_model" as before, but without the transition argument
# transition probabilities at each state
mat1 <- hom_model@transition[[1]]@parameters$coefficients
mat2 <- hom_model@transition[[2]]@parameters$coefficients
# transition matrix
transmat <- rbind(mat1, mat2)
# prediction as described in the post, not very relevant for this question
# prediction as described in post, not very relevant for this question
But for the nonhomogeneous HMM I cannot obtain the transition matrix in the same way: mat1 and mat2 now contain the coefficients of the intercept and covariates for each state. Specifically, my output for mat1 in the nonhomogeneous case looks like this:
            St1       St2
(Intercept)   0 0.6704946
scale(x1)     0 1.7279190
scale(x2)     0 2.0905961
I am unsure how to obtain the transition matrix in the nonhomogeneous case, and also a bit confused as to why the State 1 coefficients are all 0.
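If I read the depmixS4 parameterisation correctly, the transition sub-models use a baseline-category multinomial logit, which is why the St1 column is all zeros (State 1 is the reference category). A row of the transition matrix at given covariate values is then the softmax of the linear predictors; the full matrix stacks one such row per origin state, each computed from that state's own coefficient matrix. A sketch of the arithmetic (shown in Python for brevity, using the mat1 numbers printed above):

```python
import numpy as np

# coefficients from mat1 above: rows = (Intercept, scale(x1), scale(x2)),
# columns = target states St1, St2; St1 is the baseline, so its column is 0
coefs = np.array([
    [0.0, 0.6704946],   # (Intercept)
    [0.0, 1.7279190],   # scale(x1)
    [0.0, 2.0905961],   # scale(x2)
])

def transition_row(coefs, x1_scaled, x2_scaled):
    """Row of the transition matrix (from state 1) at given scaled covariates."""
    x = np.array([1.0, x1_scaled, x2_scaled])  # design vector with intercept
    eta = x @ coefs                            # linear predictor per target state
    p = np.exp(eta - eta.max())                # numerically stable softmax
    return p / p.sum()

row = transition_row(coefs, 0.0, 0.0)  # at the covariate means, scale() gives 0
print(row)  # two probabilities that sum to 1
```

The same row computed at different covariate values gives the time-varying transition probabilities; this is only a sketch of the logit math, not depmixS4's own extractor functions.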
Thank you
See also questions close to this topic

pivot_wider does not keep all the variables
I would like to keep the variable cat (category) in the output of my function; however, I am not able to keep it. The idea is to apply a similar function to m <- 1 - (1 - se * p2)^df$n based on the category. But in order to perform that step, I need to keep the variable category. Here's the code:
# script3
suppressPackageStartupMessages({
  library(mc2d)
  library(tidyverse)
})

sim_one <- function() {
  df <- data.frame(id = c(1:30),
                   cat = c(rep("a", 12), rep("b", 18)),
                   month = c(1:6, 1, 6, 4, 1, 5, 2, 3, 2, 5, 4, 6, 3:6, 4:6, 1:5, 5),
                   n = rpois(30, 5))
  nr <- nrow(df)
  df$n[df$n == "0"] <- 3
  se <- rbeta(nr, 96, 6)
  epi.a <- rpert(nr, min = 1.5, mode = 2, max = 3)
  p <- 0.2
  p2 <- epi.a * p
  m <- 1 - (1 - se * p2)^df$n
  results <- data.frame(month = df$month, m, df$cat)
  results %>%
    arrange(month) %>%
    group_by(month) %>%
    mutate(n = row_number(), .groups = "drop") %>%
    pivot_wider(
      id_cols = n,
      names_from = month,
      names_glue = "m_{.name}",
      values_from = m
    )
}

set.seed(99)
iters <- 1000
sim_list <- replicate(iters, sim_one(), simplify = FALSE)
sim_list[[1]]
#> # A tibble: 7 x 7
#>       n   m_1   m_2   m_3   m_4   m_5   m_6
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 0.970 0.623 0.905 0.998 0.929 0.980
#> 2     2 0.912 0.892 0.736 0.830 0.890 0.862
#> 3     3 0.795 0.932 0.553 0.958 0.931 0.798
#> 4     4 0.950 0.892 0.732 0.649 0.777 0.743
#> 5     5 NA    NA    NA    0.657 0.980 0.945
#> 6     6 NA    NA    NA    0.976 0.836 NA
#> 7     7 NA    NA    NA    NA    0.740 NA
Created on 2022-05-07 by the reprex package (v2.0.1)

calculate weighted average over several columns with NA
I have a data frame like this one:
ID  duration1  duration2  total_duration  quantity1  quantity2
 1          5          2               7          3          1
 2         NA          4               4          3          4
 3          5         NA               5          2         NA
I would like to do a weighted mean for each subject like this:
df$weighted_mean <- ((df$duration1 * df$quantity1) + (df$duration2 * df$quantity2)) / df$total_duration
But because of the NAs this command does not work, and it is not very elegant.
The result would be this:
ID  duration1  duration2  total_duration  quantity1  quantity2  weighted_mean
 1          5          2               7          3          1           2.43
 2         NA          4               4          3          4              4
 3          5         NA               5          2         NA              2
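One NA-safe way to express this calculation, shown here in Python/numpy for illustration (the same idea works in R with rowSums(..., na.rm = TRUE)), is to treat each missing duration-times-quantity product as zero when summing:

```python
import numpy as np

# the example data frame from the question, as arrays (NA -> NaN)
duration = np.array([[5, 2], [np.nan, 4], [5, np.nan]], dtype=float)
quantity = np.array([[3, 1], [3, 4], [2, np.nan]], dtype=float)
total_duration = np.array([7, 4, 5], dtype=float)

# nansum ignores NaN products, matching the expected output in the question
weighted_mean = np.nansum(duration * quantity, axis=1) / total_duration
print(np.round(weighted_mean, 2))  # 2.43, 4.0, 2.0
```

The key point is that the sum skips missing terms rather than propagating NA, while the denominator stays the stated total duration.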
Thanks in advance for the help

Extracting data from a netCDF file at a specific location using R: the code I've written gives an error at the end
I need some help with extracting data from NetCDF files using R. I downloaded them from CORDEX (the Coordinated Regional Climate Downscaling Experiment). The files have dimensions (longitude, latitude, time) and one variable, maximum temperature (tasmax). At a specific location, I need to extract tasmax values at different times. I wrote the code in R, but at the end of the code an error appears: location subscript out of bounds.
getwd()
setwd("C:/Users/20120/climate change/rcp4.5/tasmax")
dir()
library(ncdf4)
library(ncdf4.helpers)
library(chron)
ncin <- nc_open("tasmax_AFR-44_ICHEC-EC-EARTH_rcp45_r1i1p1_KNMI-RACMO22T_v1_mon_200601-201012.nc")
lat <- ncvar_get(ncin, "lat")
lon <- ncvar_get(ncin, "lon")
tori <- ncvar_get(ncin, "time")
title <- ncatt_get(ncin, 0, "title")
institution <- ncatt_get(ncin, 0, "institution")
datasource <- ncatt_get(ncin, 0, "source")
references <- ncatt_get(ncin, 0, "references")
history <- ncatt_get(ncin, 0, "history")
Conventions <- ncatt_get(ncin, 0, "Conventions")
tunits <- ncatt_get(ncin, "time", "units")
tustr <- strsplit(tunits$value, " ")
ncin$dim$time$units
ncin$dim$time$calendar
tas_time <- nc.get.time.series(ncin, v = "tasmax", time.dim.name = "time")
tas_time[c(1:3, length(tas_time) - 2:0)]
tmp.array <- ncvar_get(ncin, "tasmax")
dunits <- ncatt_get(ncin, "tasmax", "units")
tmp.array <- tmp.array - 273.15
nc_close(ncin)
which.min(abs(lat - 28.9))
which.min(abs(lon - 30.2))
tmp.slice <- tmp.array[126, 32981, ]
tmp.slice
Error in tmp.array[126, 32981, ] : subscript out of bounds
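The "subscript out of bounds" error means one of the hard-coded indices (here 32981) exceeds the length of the corresponding dimension. The robust pattern is to derive each index from the coordinate axis with which.min and check it against dim(tmp.array) before slicing. A sketch of that idea in Python/numpy (the axis values and array sizes below are made up, not taken from the asker's file):

```python
import numpy as np

# hypothetical 1-D coordinate axes and a (lat, lon, time) data cube
lat = np.linspace(-46.0, 42.0, 201)
lon = np.linspace(-25.0, 60.0, 194)
tasmax = np.random.rand(lat.size, lon.size, 60)  # 60 monthly time steps

# nearest grid cell to the target point,
# the equivalent of R's which.min(abs(lat - 28.9))
i_lat = int(np.argmin(np.abs(lat - 28.9)))
i_lon = int(np.argmin(np.abs(lon - 30.2)))

# argmin always returns an in-range index, so this slice cannot go out of bounds
series = tasmax[i_lat, i_lon, :]
print(series.shape)
```

Note that for CORDEX rotated-pole grids, lat and lon may come back as 2-D arrays, in which case the nearest cell has to be found over both dimensions at once rather than axis by axis.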

Do I inverse transform my predictions and test dataset before measuring a model's performance?
I've created a toy example of time-series forecasting with the series [1, 2, 3, ..., 999, 1000]. I split the series into training (2/3) and testing (1/3) sets, and transformed the training set with scikit-learn's MinMaxScaler.
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd

# define dataset
series = pd.DataFrame([i + 1 for i in range(1000)])
train = series[:int(len(series) * 0.67)]
test = series[int(len(series) * 0.67):]

# scale
scaler = MinMaxScaler()
trainNorm = scaler.fit_transform(train)
testNorm = scaler.transform(test)

# TimeseriesGenerator takes a funny shape and I don't know why
trainNorm = np.array(trainNorm).reshape(len(trainNorm))
testNorm = np.array(testNorm).reshape(len(testNorm))
I use TimeseriesGenerator to convert the training set into a lagged training set according to the number of time steps I desire. I also construct a simple neural network.
# number of steps to "look back" for forecasting
n_input = 5
# define generator
generator = TimeseriesGenerator(trainNorm, trainNorm, length=n_input, batch_size=795)
# define model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=n_input))
model.add(Dropout(0.2))
model.add(Dense(64))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
# fit model
model.fit_generator(generator, steps_per_epoch=1, epochs=200, verbose=2)
I then created a list of predictions to compare to the test set. I also use the training set to perform walk-forward validation.
"""This section creates a list of predictions and performs walkforward validation.""" preds = [] history = [x for x in trainNorm] # step over each timestep in the test set for i in range(len(testNorm)): # Forecast for predictions x_input = np.array(history[n_input:]).reshape((1, n_input)) y_pred = model.predict(x_input, verbose=0) # store forecast in list of predictions # and add actual observation to history for the next loop preds.append(y_pred[0]) history.append(testNorm[i]) # Reverse normalization to original values for scoring preds = scaler.inverse_transform(preds) history = np.array(history).reshape(1, 1) history = scaler.inverse_transform(history) test = np.array(test) # estimate prediction error mse = mean_squared_error(test, preds) predError = np.sqrt(mse) print(f"Mean Square Error: {round(mse, 2)}") print(f"Root Mean Square Error: {round(predError, 2)}")
I think my model trains properly and my scoring seems accurate, but I'm not sure. My questions concern the latter part of my code: I'm not sure when to introduce an inverse transformation for scoring my model, or whether I need to do so at all.
Can I score my model without the inverse transformation? If I do need an inverse transformation, do I apply it before the scoring code and after the walk-forward validation loop? Did I code the inverse transformation and reshape my data properly? I'd just like to know whether I'm on the right track with my toy model.
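For what it's worth, with MinMaxScaler the two scorings differ only by a constant factor, because the transform is affine: errors shrink by the data span (data_max_ minus data_min_) and MSE by its square. So scoring in scaled space does not change which model looks better, only the units of the score. A small sketch with synthetic numbers (not the asker's model):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# toy series in original units, plus hypothetical noisy predictions
y_true = np.arange(1.0, 101.0).reshape(-1, 1)
y_pred = y_true + np.random.default_rng(0).normal(0, 2, y_true.shape)

scaler = MinMaxScaler().fit(y_true)
mse_orig = mean_squared_error(y_true, y_pred)
mse_scaled = mean_squared_error(scaler.transform(y_true), scaler.transform(y_pred))

# MinMaxScaler is affine: x' = (x - min) / (max - min), so the error scales by
# the span and the MSE by the span squared
span = (scaler.data_max_ - scaler.data_min_)[0]
print(np.isclose(mse_orig, mse_scaled * span ** 2))  # True
```

Inverse-transforming before scoring, as the asker's code does, simply reports the error in the original units, which is usually what readers of the result expect.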
ADF vs KPSS test for stationarity and ACF vs PACF

How can you graph multiple overlapping 18 month periods with daily data?
I am doing an exploratory data analysis on data collected at the daily level over many years. The relevant time period is about 18-20 months from the same date each year. What I would like to do is visually inspect these 18-month periods one on top of the other. I can do this, as below, by adding data in a separate geom_point() call for each period, but I would like to avoid calling it once per period.
min ex:
library(tidyverse)
minex <- data.frame(dts = seq(mdy('01/01/2010'), mdy('11/10/2013'), by = 'days'))
minex$day <- as.numeric(minex$dts - min(minex$dts))
minex$MMDD <- paste0(month(minex$dts), "-", day(minex$dts))
minex$v1 <- 20 + minex$day^0.4 - cos(2*pi*minex$day/365) + rnorm(nrow(minex), 0, 0.3)
ggplot(filter(minex, dts %in% seq(mdy('11/10/2013') - (365 + 180), mdy('11/10/2013'), by = 'days')),
       aes(day, v1)) +
  geom_point() +
  geom_point(data = filter(minex, dts %in% seq(mdy('11/10/2012') - (365 + 180), mdy('11/10/2012'), by = 'days')),
             aes(day + 365, v1), color = 'red')

Prediction for package delivery to summarize packages
I have the following problem: a customer can buy something, and I want to predict whether the customer will buy more things in the next few days, so that the packages can be combined instead of delivering every package individually. Has someone dealt with a similar issue? My data looks like this:
customerid  order  article  mandatoryDeliveryDate
         1  03.05  Shoes    05.05
         1  04.05  Paper    05.05
         2  10.04  PS5      11.04
         2  12.04  Laptop   16.04
         3  28.04  Clock    30.04
         3  [XXX]
What I want to predict is whether customer 3 will buy something in the next few days. Is it possible to predict whether customer 3 buys something in the next few days, taking the mandatory delivery date into account? And can the model also give the probability of that prediction? Is there also a blog/paper/Jupyter notebook or anything else where someone has implemented a similar prediction for package delivery?
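One common way to frame this is binary classification: for each order, label whether the same customer placed another order within some window, build features such as days until the mandatory delivery date, and fit a probabilistic classifier whose predict_proba supplies the requested probability. A sketch under assumptions of mine (the 3-day label window, both feature definitions, and all numbers below are made up, not derived from the asker's table):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical training data: one row per order
# features: [days until mandatoryDeliveryDate, orders by this customer in last 7 days]
X = np.array([
    [2, 0], [1, 1],   # e.g. customer 1: a second order arrived the next day
    [1, 0], [4, 0],
    [2, 0], [3, 1],
    [1, 1], [5, 0],
])
# label: did another order from the same customer arrive within 3 days?
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)

# probability that a customer with 2 days until the mandatory delivery date
# and no recent orders will order again within the window
p = clf.predict_proba([[2, 0]])[0, 1]
print(round(p, 2))
```

With real data the labels would come from the order history itself, and the choice of window should reflect how long a package can be held back before the mandatory delivery date.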

How does my LSTM model know about the testing data and simply copy previous values/patterns?
I have an encoder-decoder LSTM model that learns to predict 12 months of data in advance while looking back 12 months. If it helps at all, my dataset has around 10 years in total (120 months). I keep 8 years for training/validation and 2 years for testing. My understanding is that my model does not have access to the testing data at training time.
The puzzling thing is that my model's predictions are simply a shift of previous points. But how did my model know the actual previous points at prediction time? I did not give the monthly values in the testing set to the model! If you say it simply copies the previous point given as input, note that I am giving it 12 months with completely different values than the ones it predicts (so it does not copy the 12 months I am giving), yet the forecasted values are shifts of the actual ones (which were never seen).
Below is an example:
My code source is from here:
Below is my code:
# train/test splitting
split_position = int(len(scaled_data) * 0.8)  # 8 years for training
train = scaled_data[0:split_position]
test = scaled_data[split_position:]
#print(train)
print('length of train =', len(train))
#print(test)
print('length of test =', len(test))

# split train and test data into yearly train/test sets (3d) [observation, year, month]
def split_data_yearly(train, test):
    # restructure into windows of yearly data
    train = array(split(train, len(train) / 12))
    test = array(split(test, len(test) / 12))
    return train, test

# evaluate one or more yearly forecasts against expected values
def evaluate_forecasts(actual, predicted):
    scores = list()
    # calculate an RMSE score for each month
    for i in range(actual.shape[1]):
        # calculate mse
        mse = mean_squared_error(actual[:, i], predicted[:, i])
        # calculate rmse
        rmse = math.sqrt(mse)
        # store
        scores.append(rmse)
    # calculate overall RMSE
    s = 0
    for row in range(actual.shape[0]):
        for col in range(actual.shape[1]):
            s += (actual[row, col] - predicted[row, col]) ** 2
    score = math.sqrt(s / (actual.shape[0] * actual.shape[1]))
    ################ plot prediction vs actual ################
    predicted = predicted.reshape(predicted.shape[0], predicted.shape[1])
    jump = 12
    inv_scores = list()
    for i in range(len(predicted)):
        sample_predicted = predicted[i, :]
        sample_actual = actual[i, :]
        # inverse normalization
        sample_predicted_inv = scaler.inverse_transform(sample_predicted.reshape(-1, 1))
        sample_actual_inv = scaler.inverse_transform(sample_actual.reshape(-1, 1))
        #print(sample_actual_inv)
        # inverse differencing
        s = numpy.array(smoothed).reshape(-1, 1)
        sample_actual_inv = sample_actual_inv + s[(split_position + (i * jump)):(split_position + (i * jump)) + len(sample_actual_inv)]
        sample_predicted_inv = sample_predicted_inv + s[(split_position + (i * jump)):(split_position + (i * jump)) + len(sample_actual_inv)]
        months = ['August' + str(19 + i), 'September' + str(19 + i), 'October' + str(19 + i),
                  'November' + str(19 + i), 'December' + str(19 + i), 'January' + str(20 + i),
                  'February' + str(20 + i), 'March' + str(20 + i), 'April' + str(20 + i),
                  'May' + str(20 + i), 'June' + str(20 + i), 'July' + str(20 + i)]
        pyplot.plot(months, sample_actual_inv, 'b', label='Actual')
        pyplot.plot(months, sample_predicted_inv, '--', color="orange", label='Predicted')
        pyplot.legend()
        pyplot.xticks(rotation=25)
        pyplot.title('Encoder Decoder LSTM Prediction', y=1.08)
        pyplot.show()
        ########## determine RMSE after inversion ##########
        mse = mean_squared_error(sample_actual_inv, sample_predicted_inv)
        rmse = math.sqrt(mse)
        inv_scores.append(rmse)
    return score, scores, inv_scores

# summarize scores
def summarize_scores(name, score, scores):
    s_scores = ', '.join(['%.1f' % s for s in scores])
    print('%s: [%.3f] %s' % (name, score, s_scores))

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=12):
    # flatten data
    data = train.reshape((train.shape[0] * train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)

# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # take a portion for validation
    val_size = 12
    test_x, test_y = train_x[-val_size:], train_y[-val_size:]
    train_x, train_y = train_x[0:-val_size], train_y[0:-val_size]
    # define parameters
    verbose, epochs, batch_size = 1, 25, 8
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(64, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(64, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    #sgd = optimizers.SGD(lr=0.004, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mse', optimizer='adam')
    # fit network
    train_history = model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size,
                              validation_data=(test_x, test_y), verbose=verbose)
    loss = train_history.history['loss']
    val_loss = train_history.history['val_loss']
    pyplot.plot(loss)
    pyplot.plot(val_loss)
    pyplot.legend(['loss', 'val_loss'])
    pyplot.show()
    return model

# make a forecast
def forecast(model, history, n_input):
    # flatten data
    data = array(history)
    data = data.reshape((data.shape[0] * data.shape[1], data.shape[2]))
    # retrieve last observations for input data
    input_x = data[-n_input:, :]
    # reshape into [1, n_input, n]
    input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
    # forecast the next year
    yhat = model.predict(input_x, verbose=0)
    # we only want the vector forecast
    yhat = yhat[0]
    return yhat

# evaluate a single model
def evaluate_model(train, test, n_input):
    # fit model
    model = build_model(train, n_input)
    # history is a list of yearly data
    history = [x for x in train]
    # walk-forward validation over each year
    predictions = list()
    for i in range(len(test)):
        # predict the year
        yhat_sequence = forecast(model, history, n_input)
        # store the predictions
        predictions.append(yhat_sequence)
        # get the real observation and add it to history for predicting the next year
        history.append(test[i, :])
    # evaluate predictions for each year
    predictions = array(predictions)
    score, scores, inv_scores = evaluate_forecasts(test[:, :, 0], predictions)
    return score, scores, inv_scores

# split into train and test
train, test = split_data_yearly(train, test)
# evaluate model and get scores
n_input = 12
score, scores, inv_scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
print('RMSE score after inversion:', inv_scores)
# plot scores
months = ['July', 'August', 'September', 'October', 'November', 'December',
          'January', 'February', 'March', 'April', 'May', 'June']
#pyplot.plot(months, scores, marker='o', label='lstm')
#pyplot.show()

CNN Model classification prediction result problem
I have a model that distinguishes between cats and dogs, trained once with MobileNet and again with EfficientNet. The problem arises when I test the model with a prediction: with MobileNet the prediction output is correct, (0.9, 0.1) for cat and (0.1, 0.9) for dog, while with EfficientNet the result is (0.9, 0.1) for cat but only (0.8, 0.2) for dog; the change between classes is only about 0.1. What should I do, please?

Markov-switching dynamic factor model (MS-DFM) for recession probabilities / Python
I am searching for a good Python code example for calculating the recession probabilities of the Markov-switching dynamic factor model (MS-DFM) by Chauvet/Piger, which are also published in the FRED database: https://fred.stlouisfed.org/series/RECPROUSM156N . I want to implement the same methodology for different countries and am searching for code to calculate it.
Does anyone have an idea where to look for this?
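This is not the full Chauvet/Piger MS-DFM, but its core ingredient, the Hamilton filter that turns a common factor into regime probabilities, fits in a few lines of numpy. A sketch with illustrative (not estimated) parameters and a synthetic factor series:

```python
import numpy as np

def hamilton_filter(y, mu, sigma, P):
    """Filtered regime probabilities for a 2-regime Gaussian Markov-switching model.

    y     : observed series (e.g. a common factor of coincident indicators)
    mu    : per-regime means, shape (2,)
    sigma : per-regime standard deviations, shape (2,)
    P     : transition matrix, P[i, j] = Pr(S_t = j | S_{t-1} = i)
    """
    n = len(y)
    filt = np.zeros((n, 2))
    # start from the stationary distribution of P
    evals, evecs = np.linalg.eig(P.T)
    stat = np.real(evecs[:, np.argmax(np.real(evals))])
    prob = stat / stat.sum()
    for t in range(n):
        pred = prob @ P                      # one-step-ahead regime probabilities
        lik = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = pred * lik                   # prior times Gaussian likelihood
        prob = joint / joint.sum()           # Bayes update
        filt[t] = prob
    return filt

# illustrative parameters: regime 0 = expansion, regime 1 = recession
mu = np.array([0.5, -1.0])
sigma = np.array([0.8, 1.0])
P = np.array([[0.95, 0.05],
              [0.20, 0.80]])

rng = np.random.default_rng(1)
y = np.r_[rng.normal(0.5, 0.8, 80), rng.normal(-1.0, 1.0, 20)]  # synthetic factor
probs = hamilton_filter(y, mu, sigma, P)
print(probs[-1])  # recession probability should be high near the end
```

The full MS-DFM additionally extracts the factor from several indicators and estimates all parameters by maximum likelihood or Bayesian methods; statsmodels' Markov-switching models are a reasonable place to look for an estimable version.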

Hidden Markov Model for sequential data
Is it possible to predict the next digit in a sequence using an HMM? For example, I have a dataset as follows: [0, 3, 4, 1], [1, 3, 4, 2], etc.
After training my model with the data described above, I want to be able to predict the next digit for every number I get as input, i.e. if 0 is the input I want to know how likely it is to get 1 or 2 or 3... next. Also, I don't know the transition matrix or the probabilities of each next state from the start; I guess I have to estimate them too.
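Once the transition probabilities are estimated, the next-symbol distribution is just a row lookup (or a matrix-vector product for an uncertain current state). If the digits themselves can serve as states, a plain first-order Markov chain may even suffice; a minimal sketch with made-up training sequences:

```python
import numpy as np

sequences = [[0, 3, 4, 1], [1, 3, 4, 2], [0, 3, 4, 2]]  # toy training data
n_symbols = 5

# estimate the transition matrix from bigram counts
# (add-one smoothing avoids zero rows for unseen symbols)
counts = np.ones((n_symbols, n_symbols))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1
A = counts / counts.sum(axis=1, keepdims=True)

def next_digit_distribution(digit):
    """Probability of each possible next digit given the current one."""
    return A[digit]

dist = next_digit_distribution(0)
print(dist.argmax())  # -> 3, since 0 is always followed by 3 in the toy data
```

A proper HMM adds hidden states behind the digits (estimated with Baum-Welch, e.g. via hmmlearn), which helps when the next digit depends on more context than the current digit alone.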