BEKK model simulation in R
I have been working with BEKK(1,1) models of dimensions 3, 4, and 5 for a time-series analysis. I received feedback that I should include a simulation study. In order to trust the results I obtain, I want to show via simulation that estimation of the BEKK model parameters also works well for the sample sizes considered in the paper, and that the asymptotic distributional theory can be applied at my sample size.
That is, I want to investigate whether the sample size is large enough for the asymptotic results to apply.
Method:
I wish to generate data from the fitted model in the dimension-3 case with sample size 3000, estimate the parameters by fitting a BEKK model to each generated data set, and repeat this step, say, 10000 times. I then obtain 10000 estimates of each parameter, from which the sampling distribution can be constructed and compared with the asymptotic distribution.
Then repeat this procedure for dimension 4 and 5.
I've been using the mgarchBEKK package to create my BEKK models.
The package provides the example below as help for simulation:
## Simulate series:
simulated <- simulateBEKK(2, 1000, c(1, 1))
## Prepare the matrix:
simulated <- do.call(cbind, simulated$eps)
## Estimate with default arguments:
estimated <- BEKK(simulated)
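The Monte Carlo procedure described above could be sketched along the following lines, building on the package's own example. Everything beyond simulateBEKK() and BEKK() themselves is my own scaffolding, and the est.params field and matrix ordering are from my reading of the package documentation, so they should be double-checked:

```
## Sketch of the Monte Carlo study for the dimension-3 case.
## Note: simulateBEKK() has a params argument where the fitted
## parameter vector would go; it is omitted here for brevity.
library(mgarchBEKK)

n_rep <- 100                    # raise to 10000 for the real study
est <- vector("list", n_rep)
for (r in 1:n_rep) {
  sim <- simulateBEKK(3, 3000, c(1, 1))   # 3 series, T = 3000, BEKK(1,1)
  y   <- do.call(cbind, sim$eps)
  fit <- BEKK(y, order = c(1, 1), verbose = FALSE)
  est[[r]] <- fit$est.params              # estimated parameter matrices
}
## Sampling distribution of one element, e.g. the first ARCH coefficient:
a11 <- sapply(est, function(p) p[[2]][1, 1])
hist(a11, main = "Sampling distribution of one BEKK parameter")
```

The resulting histogram (and its mean and standard deviation) can then be compared against the asymptotic normal distribution implied by the fitted standard errors, and the whole loop repeated for dimensions 4 and 5.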
I'm not a master of R by any means, so I'm not quite sure how to code the procedure I describe above.
Any help is greatly appreciated :)
See also questions close to this topic

Plot data for each row within a single command
I'm an R newbie and need help with the following.
I have the following data
# Simulate matrix of integers
set.seed(1)
df <- matrix(sample.int(5, size = 3*5, replace = TRUE), nrow = 3, ncol = 5)
print(df)
df <- tbl_df(df)  # tabulate as dataframe
df <- rbind(df, c(3,5,4,1,4))
print(df)
Within a single command, I need to plot the data for each row, so that the y-axis shows the data in each row (in my case values from 1 to 5) and the x-axis shows the values 1, 2, 3, 4, 5 that refer to each column. Effectively, for each row, I am trying to plot how the row's values change across columns.
I have tried the following, which works but has two problems I need to resolve. First, it only plots one row at a time, which is not efficient, especially if there are many rows. Second, I could not find a way to refer to the x-axis as the number of columns, so I resorted to simply counting the columns (i.e. 5) and used a c(1:5) vector to represent them. I also tried ncol(df) to represent the x-axis, but that returns an error saying the variables have different lengths. Indeed, ncol(df) returns the number 5, which is the number of columns, but it does not do what I wanted, i.e. represent the columns sequentially as 1, 2, 3, 4, 5.
plot(c(1:5),df[1,], type = "b", pch=19, col = "blue", xlab = "number of columns", ylab = "response format")
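One possible way to get all rows in a single command is matplot(), which draws one line per column of a matrix; transposing the data therefore puts each original row on its own line. A sketch, assuming df holds only the numeric columns shown above:

```
## t(df) makes each original row a column; matplot draws one line per
## column, and 1:ncol(df) supplies the x-axis positions 1..5.
matplot(1:ncol(df), t(df), type = "b", pch = 19,
        xlab = "number of columns", ylab = "response format")
```

This also solves the ncol(df) issue: the x values must be the sequence 1:ncol(df), not the single number ncol(df) returns.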
Thank you, your help is much appreciated

Efficiency of R code on a Windows server
I test R code on my local computer and it works well (local machine: 4 GB RAM, Intel i5 CPU, Windows). I use the same code on a Windows server (64 GB RAM, CPU: 2 x Intel Xeon L5640 (24 cores x 2.2 GHz HT), 1 TB NVMe disk), but the R code does not run any more quickly, and with more data it does not run at all. Why is R not more efficient on the server? Unfortunately, I'm not familiar with Linux. I have limited time to use the server and I need your guidance.

How to find next latest date within 30 days of existing dates in R?
I have a dataset_A like below, and for each date in dataset_A I would like to know the earliest visit date on (or within) the next 30 days for the same id from dataset_B.
dataset_A
id    disch_date
1234  2018-03-14
1234  2018-03-17
1234  2018-04-19
2345  2018-04-15
2345  2018-05-15
3456  2018-05-16
dataset_B
id    visit_date
1234  2018-03-17
1234  2018-03-20
1234  2018-02-28
1234  2018-04-30
2345  2018-05-12
2345  2018-05-13
2345  2018-07-17
3456  2018-05-16
3456  2018-05-19
expected_output
id    disch_date  visit_date
1234  2018-03-14  2018-03-17
1234  2018-03-17  2018-03-17
1234  2018-04-19  2018-04-30
2345  2018-04-15  2018-05-12
2345  2018-05-15  NA
3456  2018-05-16  2018-05-16
Tried:
dataset_A$visit_date <- sapply(dataset_A$disch_date, function(x) any(abs(x - dataset_B$visit_date) <= 30))
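A sketch of one way to get the earliest visit on or within 30 days of each discharge, rather than the TRUE/FALSE that any() returns. This assumes both date columns are stored as Date objects and matches ids explicitly; the mapply scaffolding is my own:

```
## For each discharge, keep same-id visits falling in
## [disch_date, disch_date + 30] and take the earliest (NA if none).
dataset_A$visit_date <- as.Date(mapply(function(id, d) {
  v <- dataset_B$visit_date[dataset_B$id == id &
                            dataset_B$visit_date >= d &
                            dataset_B$visit_date - d <= 30]
  if (length(v)) min(v) else NA
}, dataset_A$id, dataset_A$disch_date), origin = "1970-01-01")
```

The as.Date(..., origin = ...) wrapper is needed because mapply() simplifies Dates to their underlying numeric representation.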

Plots of prior and posterior distributions for different models
Consider the code shown below, which graphically displays the prior and posterior of the Beta-Binomial model using different parameters in the prior.
colors = c("red", "blue", "green", "orange", "purple")
n = 10
N = 10
theta = .2
x = rbinom(n, N, theta)
grid = seq(0, 2, .01)
alpha = c(.5, 5, 1, 2, 2)
beta = c(.5, 1, 3, 2, 5)
plot(grid, grid, type = "n", xlim = c(0, 1), ylim = c(0, 4), xlab = "",
     ylab = "Prior Density", main = "Prior Distributions", las = 1)
for (i in 1:length(alpha)) {
  prior = dbeta(grid, alpha[i], beta[i])
  lines(grid, prior, col = colors[i], lwd = 2)
}
legend("topleft",
       legend = c("Beta(0.5,0.5)", "Beta(5,1)", "Beta(1,3)", "Beta(2,2)", "Beta(2,5)"),
       lwd = rep(2, 5), col = colors, bty = "n", ncol = 3)
for (i in 1:length(alpha)) {
  dev.new()
  plot(grid, grid, type = "n", xlim = c(0, 1), ylim = c(0, 10), xlab = "",
       ylab = "Density", xaxs = "i", yaxs = "i",
       main = "Prior and Posterior Distribution")
  alpha.star = alpha[i] + sum(x)
  beta.star = beta[i] + n*N - sum(x)
  prior = dbeta(grid, alpha[i], beta[i])
  post = dbeta(grid, alpha.star, beta.star)
  lines(grid, post, lwd = 2)
  lines(grid, prior, col = colors[i], lwd = 2)
  legend("topright", c("Prior", "Posterior"), col = c(colors[i], "black"), lwd = 2)
}
Some of the plots
How can I write similar code to the above for the Poisson-Gamma model and the inverse chi-squared-Normal model?
What I've tried for the Poisson-Gamma model is first to change the data generation to
x = rpois(n, lambda)
and to change the Beta densities to Gamma, because here the prior has a Gamma distribution. For the inverse chi-squared-Normal model, it would be x = rinvgamma(alpha, beta), since there the prior and posterior both have inverse-Gamma distributions. Where I am having more difficulty is in this part:
alpha.star = alpha[i] + sum(x)
beta.star = beta[i] + n*N - sum(x)
prior = dbeta(grid, alpha[i], beta[i])
post = dbeta(grid, alpha.star, beta.star)
I don't know how to change it so that it fits this new model. I have the same issue for the inverse chi-squared-Normal model.
Could someone please help?
I would really appreciate any help you're willing to provide.
Any suggestion in the code is welcome.
The code can be found here: https://stats.stackexchange.com/questions/70661/how-does-the-beta-prior-affect-the-posterior-under-a-binomial-likelihood
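For the Poisson-Gamma case, the conjugate update is that a Gamma(alpha, beta) prior (shape/rate) combined with n Poisson observations gives a Gamma(alpha + sum(x), beta + n) posterior: the shape gains the total count and the rate gains the number of observations. A sketch of how the inner block above might change, reusing the question's variable names (lambda = 2 is an arbitrary example value of mine):

```
## Poisson likelihood with Gamma(alpha, beta) prior, shape/rate
## parametrisation (dgamma's rate argument):
x          <- rpois(n, lambda = 2)
alpha.star <- alpha[i] + sum(x)
beta.star  <- beta[i] + n
prior      <- dgamma(grid, shape = alpha[i], rate = beta[i])
post       <- dgamma(grid, shape = alpha.star, rate = beta.star)
```

The inverse chi-squared-Normal case follows the same pattern with the inverse-Gamma prior/posterior densities swapped in, with its own update formulas for the shape and scale.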

How to convert a text variable to a numeric variable in R
I have a problem converting a text variable to a numeric variable. I think I successfully constructed the dummy variable with several levels. However, when I try to read the data file, the categories of these variables are still text.
Here is my code, output, and data file:
After that, when I try to generate the correlation values, it still shows an error:
Error in cor(hpNumeric) : 'x' must be numeric
Any suggestions are appreciated!
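In case it helps, a minimal sketch of turning character columns into numeric factor codes before calling cor(). The data frame and columns here are toy stand-ins, since the actual hpNumeric contents are not shown in the question:

```
## Convert every character column to a factor, then to its integer
## codes, so that cor() sees an all-numeric data frame.
hpNumeric <- data.frame(x = c("a", "b", "a"), y = c(1, 2, 3))  # toy stand-in
is_chr <- sapply(hpNumeric, is.character)
hpNumeric[is_chr] <- lapply(hpNumeric[is_chr], function(col) as.numeric(factor(col)))
cor(hpNumeric)
```

The "'x' must be numeric" error means at least one column was still character (or factor) when cor() was called.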

About dummy variables in R
I am new to the R language. For my assignment, I am trying to generate several-level dummies for different variables (3 in total); however, with each approach I ran into a problem:
Method 1: following https://stats.idre.ucla.edu/r/modules/coding-for-categorical-variables-in-regression-models/ The code:
> housing_prices2$Fuel.Type.f <- factor(housing_prices2$Fuel.Type)
> is.factor(housing_prices2$Fuel.Type.f)
[1] TRUE
> housing_prices2$Fuel.Type.f[1:10]
 [1] Electric Gas      Gas      Gas      Gas      Gas      Oil
 [8] Oil      Electric Gas
Levels: Electric Gas None Oil Solar Unknown/Other Wood
This works well. However, I get a problem on the next line:
> summary(lm(write ~ Fuel.Type.f, data = housing_prices2))
Error in model.frame.default(formula = write ~ Fuel.Type.f, data = housing_prices2, :
  object is not a matrix
I just have no idea about this error and it doesn't make sense to me, so I decided to use another method.
Method 2: following Convert categorical variables to numeric in R
For the variable Fuel.Type, it works well:
> Fuel.Type <- as.factor(c("Electric", "Gas", "None", "Oil", "Solar", "Unknown/Other",
+                          "Wood"))
> Fuel.Type
[1] Electric      Gas           None          Oil           Solar
[6] Unknown/Other Wood
Levels: Electric Gas None Oil Solar Unknown/Other Wood
> unclass(Fuel.Type)
[1] 1 2 3 4 5 6 7
attr(,"levels")
[1] "Electric"      "Gas"           "None"          "Oil"
[5] "Solar"         "Unknown/Other" "Wood"
But when I try to generate dummies for the other variables, I get this error:
> housing_prices2$Heat.Type.f[1:10]
NULL
Warning message:
Unknown or uninitialised column: 'Heat.Type.f'.
I am clueless about what's going on with these errors too... any suggestions are appreciated!
BTW, here is my sample data table:
$ Fuel.Type  : chr "Electric" "Gas" "Gas" "Gas"
$ Heat.Type  : chr "Electric" "Hot Water" "Hot Water" "Hot Air"
$ Sewer.Type : chr "Private" "Private" "Public" "Private"
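A sketch of one way to create factor versions of all three character columns in one go (column names are taken from the str() output above; whether lm() then runs also depends on the write column that the earlier error mentions, which isn't shown in the question):

```
## Create a ".f" factor column for each character variable.
cols <- c("Fuel.Type", "Heat.Type", "Sewer.Type")
for (v in cols) {
  housing_prices2[[paste0(v, ".f")]] <- factor(housing_prices2[[v]])
}
str(housing_prices2[paste0(cols, ".f")])
```

The "Unknown or uninitialised column" warning in Method 2 simply means Heat.Type.f was read before it was ever assigned; assigning it as above should make the subsetting work.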

Determining how often and how long a specific event occurs in SQL data
I am curious what would be the best method to find out how often and for how long an event occurred within a set of SQL data managed using Microsoft SQL Server Management Studio 17.
Below is a simplified data table to illustrate the type of thing I'd be interested in solving. Say data is collected by a sensor every 100 ms, and I want to know how often and for how long the power dropped to 0.
I have a couple of ideas how to do this using CTEs and/or window functions; however, my understanding of these functions doesn't seem to translate into SQL Server Management Studio, as my code keeps tripping errors at points that should theoretically be correct.
For example, I thought I could use window functions partitioned by the position number, filtered to the points where power was 0, and then subtract the LAST_VALUE from the FIRST_VALUE. However, the environment doesn't recognize these arguments.
I also thought about a CTE that first filters out the points where power was zero, but I couldn't bring that to a remotely functional point.
CREATE TABLE SensorData (
    [TimeStamp] DATETIME,
    [Position] INT,
    [Power] INT
);
INSERT INTO SensorData ([TimeStamp], [Position], [Power])
VALUES (4, 1, 59), (101, 1, 60), (207, 1, 50), (321, 1, 58), (428, 1, 55),
       (534, 1, 59), (646, 1, 51), (755, 1, 0), (868, 1, 0), (975, 1, 0),
       (1081, 1, 0), (1193, 2, 45), (1307, 2, 52), (1412, 2, 51), (1519, 2, 55),
       (1629, 2, 58), (1735, 2, 0), (1851, 2, 0), (1960, 2, 0), (2066, 2, 54);
SELECT * FROM SensorData;
How the output looks at the end isn't so important. What's important is that I know the number of events where, in this case, the power went to zero, and how long each event lasted (last TimeStamp within the event minus the first TimeStamp).
Any advice would be greatly appreciated!
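One common shape for this kind of problem is gaps-and-islands: tag each zero-power row with a running count of preceding non-zero rows, so that consecutive zeros share a group key, then aggregate per group. A sketch against the SensorData table above, untested in SSMS (and note the sample values look like millisecond counters, whereas DATEDIFF is what you would use on a true DATETIME column):

```sql
WITH tagged AS (
    SELECT [TimeStamp], [Position], [Power],
           -- rows in the same run of zeros get the same grp value
           COUNT(CASE WHEN [Power] <> 0 THEN 1 END)
               OVER (PARTITION BY [Position] ORDER BY [TimeStamp]) AS grp
    FROM SensorData
)
SELECT [Position], grp,
       COUNT(*)         AS samples_at_zero,
       MIN([TimeStamp]) AS event_start,
       MAX([TimeStamp]) AS event_end   -- duration = event_end - event_start
FROM tagged
WHERE [Power] = 0
GROUP BY [Position], grp;
```

On the sample data this should yield two events: Position 1 from 755 to 1081 and Position 2 from 1735 to 1960.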

Python: Capturing high collinearity in statsmodel (regression) for panel data
I was trying to solve a problem that asked me to assess whether the digital advertising campaign is successful in driving volume sales.
Because we only have limited impression data, I have filled in 0 for the impressions that are NaN.
The data is on a weekly basis and it looks like this:
Then I did a scatterplot for impression and volume:
However, when I try to run a regression using impressions as x and volume as y in statsmodels, it gives me a collinearity warning, even though I'm only using impressions as the x variable. The code I'm using is the following:
And it gives me the following result:
Could anyone help me figure this out? Is my approach wrong? I'm really confused, since I only have one x, so there shouldn't be any collinearity. Should I be concerned that, because this is panel data, I should use other approaches? Any suggestion is appreciated, thank you so much in advance!
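One possible explanation, sketched with made-up numbers: the statsmodels note is based on the condition number of the design matrix, and raw impression counts in the millions next to a constant column of ones produce a huge condition number even with a single regressor. Rescaling x (e.g. to millions of impressions) is a common remedy:

```python
import numpy as np

# Hypothetical weekly impression counts on the raw scale vs. rescaled.
impressions = np.array([1.2e6, 3.4e6, 0.0, 2.1e6, 5.6e6])
ones = np.ones_like(impressions)

X_raw = np.column_stack([ones, impressions])           # intercept + raw counts
X_scaled = np.column_stack([ones, impressions / 1e6])  # intercept + millions

print(np.linalg.cond(X_raw))     # enormous: triggers the warning
print(np.linalg.cond(X_scaled))  # modest: same fit, better conditioning
```

The fitted slope is unchanged up to the scaling factor; only the numerical conditioning improves, which is typically enough to silence the warning.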

How to display time-series metadata in a dygraphs legend or annotation
Can you use a categorical series of data in a time-series dygraph as the annotations, or to highlight areas where results fall into a "rejected" category?
I have a time series that includes a data-qualifier column for rejected or estimated values. I'd like the legend to show me the qualifier when I scroll over those values on the dygraph. Or, alternatively, add annotations automatically where data are qualified. Or highlight ranges where there is a qualifier.
ts_df looks like this:
  my_time              Temp Result_Qualifier
1 2017-09-28 17:00:00  8.2  REJ
2 2017-09-28 18:30:00  8.3  NA
3 2017-09-28 20:00:00  8.0  NA
4 2017-09-28 21:30:00  7.6  NA
5 2017-09-29 01:00:00  6.9  REJ
6 2017-09-29 02:30:00  6.6  REJ
dygraph(ts_df, main = "ts_df", xlab = "my_time", ylab = "Temp") %>% dyRangeSelector()
Including the numeric and categorical data as two series still plots the numeric data, but the legend stops working for some data.
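A sketch of one direction: keep only the numeric column in the object passed to dygraph() (which expects an xts-like input) and attach annotations from the qualifier column afterwards. dyAnnotation() is a real dygraphs function, but the xts conversion and loop here are my own untested scaffolding, assuming my_time is POSIXct:

```
library(xts)
library(dygraphs)

## Numeric series only; qualifiers stay in the original data frame.
ts_x <- xts(ts_df$Temp, order.by = ts_df$my_time)
g <- dygraph(ts_x, main = "ts_df", xlab = "my_time", ylab = "Temp") %>%
  dyRangeSelector()

## Add an annotation at every REJ point.
for (t in ts_df$my_time[ts_df$Result_Qualifier == "REJ"]) {
  g <- dyAnnotation(g, x = t, text = "R", tooltip = "REJ")
}
g
```

dyShading() would be the analogous tool for highlighting whole qualified ranges rather than single points.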

Examples of introductions to microsimulation or agent-based modelling using R
Could anyone recommend an introductory book, chapter, or publication that would help someone get started building and simulating biological models using microsimulation and/or agent-based methods in R? I realise there are specialised languages and packages that likely make microsimulation easier or faster, but this is for students who use R for almost everything, so it is more efficient to use a known platform. Thanks.

Access to "hidden" variables in Simulink
Some example models included in the Simulink library contain variables whose values are not visible. For instance, in the model power_PVarray_grid_det, the sample time of the powergui is defined as Ts_Power. I cannot figure out the value of this variable. Is there a way to find these hidden variables, access them, and change their values?
Thanks in advance

Probability simulation
I have been asked to simulate whether a printer works after every hour in one day (i.e. 24 hours). If the printer is working at the end of an hour, then it has a 90% probability of working and a 10% probability of being broken at the end of the next hour.
If broken, then it has 50% probability of working or being broken the next hour.
Assume random uniform distribution and that the first hour the printer is working.
My code in Python is below:
Chance = []
Status = []
for i in range(24):
    Chance.append(random.uniform(0, 1))
Chance[0] = 1
Chance
for i in Chance:
    if i > 0.1:
        Status.append('Working')
    else:
        Status.append('Broken')
Chance, Status
My problem is that I can't make the current event depend on the previous event, i.e. if the previous event is 'Broken', how can I adjust the probability of the current event to 0.5?
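The dependence described above can be sketched by carrying the previous hour's state through the loop and choosing the success probability from it: a two-state Markov chain. The function name simulate_printer is made up here, and the probabilities are taken from the question:

```python
import random

def simulate_printer(hours=24, seed=None):
    """Simulate hourly printer status as a two-state Markov chain.

    The first hour is 'Working' by assumption. A working printer stays
    working with probability 0.9; a broken one recovers with 0.5.
    """
    rng = random.Random(seed)
    status = ['Working']  # first hour: working by assumption
    for _ in range(hours - 1):
        p_work = 0.9 if status[-1] == 'Working' else 0.5
        status.append('Working' if rng.uniform(0, 1) < p_work else 'Broken')
    return status

simulate_printer(24, seed=1)
```

The key change from the original attempt is that the random draws and the status decisions happen in the same loop, so each draw can be compared against a threshold that depends on the previous status.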

Max iterations for Multivariate Adaptive Regression Splines in Earth Package in R
I am running a Multivariate Adaptive Regression Splines model with the earth package in R. When I run it as an additive model, I have no problem with convergence. However, when I allow the model to estimate interactions between hinge functions, it becomes too complex and fails to converge within 25 iterations.
My question: is there an option to increase the maximum number of iterations? The documentation is not clear on how to do this, but I find it hard to believe that there would not be such an option!
Here is an example of the model I am running; degree = 2 allows for interactions between hinge functions.
mars2 <- earth(
  y ~ .,
  data = df_train,
  trace = 4,
  degree = 2,
  glm = list(family = binomial)
)
"earth glm 1: did not converge after 25 iterations"
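One avenue worth checking: earth passes the entries of its glm list on to glm(), and glm() forwards unrecognised arguments to glm.control(), which has a maxit parameter (default 25, matching the message above). If that reading of the documentation is right, the limit could be raised like this (untested sketch):

```
## maxit should be forwarded via glm() to glm.control(); 100 is an
## arbitrary example value.
mars2 <- earth(
  y ~ .,
  data = df_train,
  degree = 2,
  glm = list(family = binomial, maxit = 100)
)
```

Note that non-convergence of the inner GLM at 25 iterations can also signal separation or an over-complex model, so raising maxit may hide rather than cure the problem.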

R rugarch: $ operator invalid for atomic vectors?
So, I am trying to make a huge nested for loop (optimizations left for later) to fit all of the GARCH models available from rugarch. This is my MWE that reproduces the error:
library(rugarch)

## Small parameter space to search over
AR_terms = c(0, 1, 2)
I_terms = c(0, 1)
MA_terms = c(0, 1, 2)
garch_p_terms = c(0, 1, 2)
garch_q_terms = c(0, 1, 2)

## Models to search over
var_models = c("sGARCH", "eGARCH", "gjrGARCH", "apARCH", "iGARCH", "csGARCH")

for (x in var_models) {
  if (x == 'fGARCH') {
    for (y in sub_var_models) {
      for (AR in AR_terms) {
        for (MA in MA_terms) {
          for (I in I_terms) {
            for (p in garch_p_terms) {
              for (q in garch_q_terms) {
                cat(y)
                spec = spec_creator('fGARCH', y, MA, AR, I, p, q)
                garch = ugarchfit(spec = spec, data = apple['A'],
                                  solver = 'hybrid', solver.control = list(trace = 0))
                cat('Fit Success')
              }
            }
          }
        }
      }
    }
    next  ## To skip evaluating fGARCH as its own model with no submodel below.
  }
  for (AR in AR_terms) {
    for (MA in MA_terms) {
      for (I in I_terms) {
        for (p in garch_p_terms) {
          for (q in garch_q_terms) {
            cat(x)
            spec = spec_creator(x, 'null', MA, AR, I, p, q)
            garch = ugarchfit(spec = spec, data = apple['A'],
                              solver = 'hybrid', solver.control = list(trace = 0))
            cat('Fit Success')
          }
        }
      }
    }
  }
}
with my spec_creator function defined here (the fGARCH model allows a submodel family, which is the reason for most of the redundant code):

## Function to create the specs, purely to make the for loop area more readable.
spec_creator = function(model, sub_model, AR_term, I_term, MA_term,
                        garch_p_term, garch_q_term) {
  require(rugarch)
  if (sub_model == 'null') {
    spec = ugarchspec(variance.model = list(model = model,
                                            garchOrder = c(garch_p_term, garch_q_term),
                                            submodel = NULL,
                                            external.regressors = NULL,
                                            variance.targeting = FALSE),
                      mean.model = list(armaOrder = c(AR_term, I_term, MA_term)))
  } else {
    spec = ugarchspec(variance.model = list(model = 'fGARCH',
                                            garchOrder = c(garch_p_term, garch_q_term),
                                            submodel = sub_model,
                                            external.regressors = sub_model,
                                            variance.targeting = FALSE),
                      mean.model = list(armaOrder = c(AR_term, I_term, MA_term)))
  }
}
When I run the above, I get success messages for many sGARCH models, but eventually get this error:
Error: $ operator is invalid for atomic vectors
with the traceback pointing to ugarchfit() and a hessian() function. I am assuming this is some sort of convergence issue, but have no idea what kind. Could someone possibly help me out here?
EDIT: This is my data (though this same error comes with other datasets as well):
A 28.57223993 28.30616607 28.2447644 28.29934366 28.39485735 28.80420177 29.29541506 29.42504079 29.31588228 29.51373208 30.25737443 28.94747231 28.85195861 28.72915529 29.17943414 29.12485489 29.04298601 28.96111712 27.95822332 28.5381279 28.68822085 28.12878349 27.96504572 29.32952709 30.31877609 30.1345711 29.629713 30.01859019 30.71447569 30.55756033 29.09756526 29.72522669 29.96401093 29.96401093 28.98840675 27.59663575 28.07420423 28.89971546 28.70868807 27.75355111 28.28569885 29.21354618 31.89475207 31.29438027 31.36260434 31.41718359
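If the goal is simply to let the grid search continue past occasional numerical failures inside ugarchfit() (the hessian() call in the traceback suggests the failure happens while computing standard errors), one common pattern is to wrap the fit in tryCatch() and record the failure instead of stopping. This is only a sketch of that workaround, not a fix for the underlying non-convergence:

```
## Replace each bare ugarchfit() call in the loops with:
garch <- tryCatch(
  ugarchfit(spec = spec, data = apple['A'], solver = 'hybrid',
            solver.control = list(trace = 0)),
  error = function(e) {
    message('fit failed: ', conditionMessage(e))
    NULL                      # sentinel for a failed specification
  }
)
if (!is.null(garch)) cat('Fit Success')
```

This at least reveals which (model, order) combinations fail, which narrows down whether the problem is specific to certain over-parameterised specifications.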

Cannot seem to converge beyond a loss of 3 on an object detector being trained on YOLO
Data
I have scoured the internet for resources on how to tackle this problem, but many of them point me in the wrong direction. I am looking for a methodical guide, or academic resources, that point me in the right direction for investigating this problem. Given the nature of the training (object detection on drone-based footage), there are quite a few variables about the data and the configuration that I could describe, but I do not know where to start. However, I will try to cover as much as I can below:
The data contains many examples and instances of a couple of classes, while the other classes (there are 5 in total) have few instances. Could this be affecting the model's ability to converge?
The data is very varied. Instances of the different classes cover different view angles and times of day. The data has also been augmented in order to get more variance.
The data from each dataset has been concatenated end to end and has not been randomly shuffled. Could this be affecting the model's ability to converge?
Problem
Given that my data and my configuration are as described, is there anything I can do to stop the loss function from plateauing? I have tried increasing and decreasing the learning rate, but can't seem to get better performance either way. I am therefore looking for three things in an answer to this question:
- Possible academic resources that will point me towards methodical techniques I can use to get past this plateau and reach lower values of the loss.
- Blogs, videos, or articles on techniques that can be used to overcome this.
- Tips from the members of Cross Validated themselves.