Trouble understanding the difference between observed and covariate in afex's aov_car function
I am trying to run a mixed-effects ANOVA on a 2 (between) by 3 (within) design. I've been following much of this tutorial (https://ademos.people.uic.edu/Chapter21.html#21_data_background), but I'm having trouble adding covariates such as age, gender, and personality.
Looking at the aov_car documentation (https://github.com/singmann/afex/blob/master/examples/examples.aov_car.R), it seems that observed is used to represent covariates, so I'm having trouble understanding the difference between the two.
Can anyone clarify the difference between the observed and covariate inputs and, if possible, let me know the proper syntax for running aov_car with covariates?
Note: I'm somewhat new to R/statistics.
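For reference, a minimal sketch of what the covariate syntax might look like; the data frame and column names here (dat, id, group, condition, age, score) are placeholders, not from the question. In afex, a covariate is simply another term in the model formula, while observed is metadata marking which variables were measured rather than manipulated (it only affects the generalized eta-squared effect size, not the fitted model):

```r
library(afex)

# Hypothetical long-format data frame 'dat': one row per id x condition,
# with a between-subjects factor 'group' and a numeric covariate 'age'.
fit <- aov_car(score ~ group + age + Error(id/condition),
               data = dat,
               factorize = FALSE,  # keep the numeric 'age' as a covariate
                                   # instead of converting it to a factor
               observed = "age")   # 'age' is observed, not manipulated;
                                   # this changes only the ges effect size

summary(fit)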
See also questions close to this topic

two functions from different packages not working together
So I am doing work on ruin probability, performing Monte Carlo simulations in R. The package that works perfectly for this is 'ruin': you simply specify inputs for the different ruin models. I am trying to simulate a Sparre-Andersen model where there are two random-variable generators from different distributions. Typical distributions such as exponential or gamma work fine, using rgamma or rexp.
The problem I am having is that I require a mixed exponential distribution as one of the r.v. generators. I found a great package that achieves this, called 'gendist', where I use the rmixt function to generate r.v.s from a mixed exponential distribution.
Running rmixt with all parameters works fine, giving a correct and accurate numerical output, and so does rexp. The SparreAndersen() function also works fine when using any generator that is NOT rmixt.
The problem: SparreAndersen() does not accept rmixt from the 'gendist' package as an r.v. generator.
When I try to run this function, I get an error.
I have no idea why this isn't working. See my entire code below:
library(ruin)
library(gendist)

rmixt(1, phi = 1, spec1 = "exp", arg1 = list(rate = 3),
      spec2 = "exp", arg2 = list(rate = 7))
rexp(1, rate = 1)

modelSA <- SparreAndersen(initial_capital = 1,
                          premium_rate = 5,
                          claim_interarrival_generator = rmixt,
                          claim_interarrival_parameters = list(1, phi = 1,
                              spec1 = "exp", arg1 = list(rate = 3),
                              spec2 = "exp", arg2 = list(rate = 7)),
                          claim_size_generator = rexp,
                          claim_size_parameters = list(rate = 1))
Edit: I have found a working solution that does not involve the 'gendist' package. Let me know if this is the best way to go about it.
My new working code:
# Function which generates a mixture of exponentials
rmixedexp <- function(...) {
  choice <- c(...)
  return(rexp(n = 1, sample(choice, 1, replace = TRUE)))
}

modelSA <- SparreAndersen(initial_capital = 5,
                          premium_rate = 5,
                          claim_interarrival_generator = rmixedexp,
                          claim_interarrival_parameters = list(1, 3, 7),
                          claim_size_generator = rexp,
                          claim_size_parameters = list(rate = 1))
I had to make my own function which takes any number of rate parameters, for example two. Each parameter is a rate for the exponential, and one of them is chosen uniformly at random (with two rates, probability one half each).
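For what it's worth, sampling the rate uniformly as above gives every component equal weight. A general two-component mixed exponential needs an explicit mixing weight; here is a sketch (the function name and arguments are mine, not from 'ruin' or 'gendist'):

```r
# Two-component mixed exponential: with probability phi draw from
# Exp(rate1), otherwise from Exp(rate2).
rmixexp2 <- function(n = 1, phi = 0.5, rate1 = 3, rate2 = 7) {
  pick_first <- runif(n) < phi
  ifelse(pick_first, rexp(n, rate1), rexp(n, rate2))
}

# Drop-in use with SparreAndersen(), mirroring the call above:
# modelSA <- SparreAndersen(initial_capital = 1, premium_rate = 5,
#                           claim_interarrival_generator = rmixexp2,
#                           claim_interarrival_parameters =
#                             list(1, phi = 0.5, rate1 = 3, rate2 = 7),
#                           claim_size_generator = rexp,
#                           claim_size_parameters = list(rate = 1))
```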

How to calculate the total needed to hit a target
This might just boil down to a simple math problem but alas, I'm stuck.
The problem: Agent 1 received 9 customer surveys so far, 7 of which are positive, so Agent 1 has a satisfaction rate of 77.8%. The target is 95%; how many more surveys are needed to hit the target, assuming all new surveys are positive?
In a spreadsheet, I can calculate this simply by adding 1 to each row, until I hit a 95% pass rate, which in the example is 31 (Sheet Example).
I feel so stupid for posting this, but I'm hoping there is a mathematical way of getting the answer without having to go through each of the possible outcomes manually until hitting the target %.
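There is: with p positives out of n surveys and target t, you need the smallest x satisfying (p + x) / (n + x) >= t, which rearranges to x >= (t*n - p) / (1 - t). For p = 7, n = 9, t = 0.95 that gives x >= 1.55 / 0.05 = 31, matching the spreadsheet. A small R sketch (counting up rather than dividing, to sidestep floating-point rounding at the boundary):

```r
# Smallest x with (positive + x) / (total + x) >= target,
# assuming every new survey is positive.
surveys_needed <- function(positive, total, target) {
  x <- 0
  while ((positive + x) / (total + x) < target) {
    x <- x + 1
  }
  x
}

surveys_needed(7, 9, 0.95)  # 31: (7 + 31) / (9 + 31) = 38/40 = 0.95
```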

How to calculate statistical threshold/cutoff point given multiple variables across two groups?
I have a dataset comparing two groups (Baseline and Surgical Modification) across multiple injury severities (Peak Force).
Reference scatter plot for context: BBB score (y-axis) is what is defined and measured as 'neurological recovery'; Peak Force (x-axis) is the injury severity across subjects.
The goal is to come up with a model or type of analysis that defines the threshold injury severity (x-axis) beyond which no improvement in neurological recovery (BBB score, y-axis) is seen with surgical decompression.
I was thinking of an ANOVA model to compare the mean difference across groups and to determine what 'peak force' threshold shows no improvement in BBB score (the dependent variable) for the group that had Durotomy/Piotomy.
Any help would be greatly appreciated!
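One option, sketched here under placeholder names (dat, BBB, PeakForce, Group are mine, not from the question): model BBB score on Peak Force, group, and their interaction, then estimate a breakpoint in the Peak Force effect with the 'segmented' package, which fits a piecewise-linear relationship and reports the changepoint with a standard error:

```r
library(segmented)

# 'dat' is a hypothetical data frame with columns BBB, PeakForce, Group.
# Interaction allows the Peak Force slope to differ between groups.
fit <- lm(BBB ~ PeakForce * Group, data = dat)

# Estimate a breakpoint (threshold) in the Peak Force effect.
seg <- segmented(fit, seg.Z = ~ PeakForce)
seg$psi  # estimated breakpoint and its standard error
```

The estimated breakpoint could then be read as the injury severity beyond which the group difference flattens out, though whether a piecewise-linear model suits these data is a judgment call.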

on the predict.merMod function arguments
In the predict.merMod function of the lme4 package, what is the difference between the following arguments: allow.new.levels=TRUE, re.form=NA, and re.form=~0, if we have only a random intercept?
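To make the three arguments concrete, a sketch using lme4's built-in sleepstudy data (re.form=NA and re.form=~0 are equivalent ways of dropping all random effects, while allow.new.levels=TRUE governs what happens when newdata contains a grouping level absent from the fit):

```r
library(lme4)

# Random-intercept model on the built-in sleepstudy data.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# A subject that was NOT in the training data.
newd <- data.frame(Days = 3, Subject = "999")

predict(fit, newd, re.form = NA)   # population-level: fixed effects only
predict(fit, newd, re.form = ~0)   # same thing, written as a formula

# Without this flag, the unseen Subject "999" raises an error; with it,
# the new level's random intercept is set to 0, so the prediction again
# equals the fixed-effects part.
predict(fit, newd, allow.new.levels = TRUE)
```

So with only a random intercept, all three calls give the same number here; they differ in intent: re.form controls which random effects enter the prediction, allow.new.levels controls whether unseen grouping levels are permitted at all.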
Temporal autoregression in glmmTMB: why does it require time as a factor?
With regard to autocorrelation, how can glmmTMB tell how far apart time steps are if the time sequence must be provided to ar1() as a factor? In glmmTMB, ar1() requires time steps to be evenly spaced and coded as a factor (see this vignette). Given a numerical time series time.steps, is it enough to recode it as as.factor(time.steps) for the model to run correctly?
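A sketch of the point, on made-up data (the variable names are mine): ar1() never sees numeric distances at all. It uses only the ordering of the factor levels and assumes consecutive levels are one step apart, which is exactly why the steps must be evenly spaced; as.factor(time.steps) is therefore enough only when the numeric steps really are equidistant.

```r
library(glmmTMB)

# Hypothetical data: 6 groups, each observed at 10 evenly spaced times.
set.seed(1)
d <- expand.grid(time = 1:10, group = factor(1:6))
d$y <- rnorm(nrow(d))

# ar1() wants the time index as a factor; only the LEVEL ORDER matters,
# and consecutive levels are treated as one unit apart.
d$timef <- factor(d$time)
fit <- glmmTMB(y ~ 1 + ar1(timef + 0 | group), data = d)

# For unevenly spaced times, the Ornstein-Uhlenbeck structure keeps the
# numeric spacing via numFactor():
# fit2 <- glmmTMB(y ~ 1 + ou(numFactor(time) + 0 | group), data = d)
```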
Different results of glmer in R when the order of data is shuffled
I found that glmer in the lme4 package sometimes gives slightly different results, e.g. AIC/BIC and/or fixed/random effects, when the model is a bit complex and the order of the data is shuffled. Therefore, I'm picking the best result in terms of AIC or BIC. Does this make sense?
The second question is where the divergence comes from. In theory, the order of the data should not matter as long as the i.i.d. assumption holds. The differing results suggest that glmer uses (perhaps iterative) algorithms that depend on initial values.
I would be grateful if you could provide any feedback or suggestions.
For your information, here is the sample code I tested, assuming that multiple practitioners make binary diagnoses based on four types of features in medical images.
# data generation
nPracts <- 40   # number of practitioners
nImages <- 100  # number of medical images
var1 <- rep(seq(-20, 20, 10), 20)  # 4 image features
var2 <- c(rep(20,20), rep(10,20), rep(0,20), rep(10,20), rep(20,20))
var3 <- rep(seq(-20, 25, 5), 10)
var4 <- c(rep(10,20), rep(0,20), rep(20,20), rep(20,20), rep(10,20))
fix1 <- 0.05  # 4 fixed factors
fix2 <- 0.5
fix3 <- 0.1
fix4 <- 0.3
rand1 <- rnorm(nPracts, mean=0, sd=0.1)   # 4 random factors (practitioner-dependent)
rand2 <- rnorm(nPracts, mean=0, sd=0.05)
rand3 <- rnorm(nPracts, mean=0, sd=0.01)
rand4 <- rnorm(nPracts, mean=0, sd=0.03)
data <- data.frame()
for (j in 1:nPracts) {
  for (i in 1:nImages) {
    z <- var1[i]*(fix1+rand1[j]) + var2[i]*(fix2+rand2[j]) +
         var3[i]*(fix3+rand3[j]) + var4[i]*(fix4+rand4[j])
    pr <- 1/(1+exp(z))
    y <- rbinom(1, 1, pr)  # binary diagnosis
    data <- rbind(data, c(j, var1[i], var2[i], var3[i], var4[i], y))
  }
}
colnames(data) <- c('practitioner', 'var1', 'var2', 'var3', 'var4', 'diagnosis')
data$diagnosis <- factor(data$diagnosis)
data$practitioner <- factor(data$practitioner)

# refit model 5 times to see the divergence
for (k in 1:5) {
  set.seed(123+k)
  data2 <- data[sample(nrow(data)), ]  # change data order
  fit <- glmer(diagnosis ~ var1+var2+var3+var4 +
                 (0+var1|practitioner) + (0+var2|practitioner) +
                 (0+var3|practitioner) + (0+var4|practitioner),
               data=data2, family=binomial(link="logit"))
  print(fit)
}
These are two sample results from the last block. They are slightly different.
      AIC       BIC    logLik  deviance  df.resid
 511.9535  568.5999 -246.9767  493.9535      3991
Random effects:
 Groups         Name Std.Dev.
 practitioner   var1 0.099567
 practitioner.1 var2 0.080622
 practitioner.2 var3 0.002188
 practitioner.3 var4 0.046851
Number of obs: 4000, groups: practitioner, 40
Fixed Effects:
(Intercept)     var1     var2
    0.39222  0.05796  0.56366
       var3     var4
    0.10046  0.31254

      AIC       BIC    logLik  deviance  df.resid
 512.0151  568.6616 -247.0076  494.0151      3991
Random effects:
 Groups         Name Std.Dev.
 practitioner   var1 0.100646
 practitioner.1 var2 0.080578
 practitioner.2 var3 0.006207
 practitioner.3 var4 0.044535
Number of obs: 4000, groups: practitioner, 40
Fixed Effects:
(Intercept)     var1     var2
    0.26154  0.05492  0.54916
       var3     var4
    0.10028  0.30373
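On the second question: glmer fits by iterative numerical optimization, so reordering rows changes the floating-point summation order and the optimizer's path, which can shift results in the last few digits, particularly when a variance component sits near zero (as the var3 random effect does above). Rather than picking the best AIC across shuffles, a common check, sketched here with lme4's allFit(), is to refit one model with every available optimizer and confirm they agree:

```r
library(lme4)

# 'fit' is a model already fitted with glmer(), as in the loop above.
all_fits <- allFit(fit)   # refit with each optimizer lme4 knows about
ss <- summary(all_fits)
ss$llik    # log-likelihood under each optimizer
ss$fixef   # fixed-effect estimates under each optimizer

# If these agree to several decimal places, the shuffle-to-shuffle
# differences are numerical noise, not a substantive modelling problem,
# and model selection should not be based on them.
```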