Can Linear Mixed Modelling handle NAs (missing data) in the response variable and/or predictor variable?
I am using the lme4 package and the lmer function in R for linear mixed modelling. Is it okay to have NAs in both the response and the covariates (predictors)? I know linear mixed modelling excludes the NAs and uses maximum likelihood estimation, but I am not sure whether NAs can exist in both the response and predictor variables at the same time. If I don't exclude NAs my model runs fine, but I notice uneven response-group sizes for the different time points. Does this matter?
E.g. at baseline (n = 1,980), at the 1-month time point (n = 1,841), etc.
Background: my data consist of patient measurements collected at 4 different time points (this is the response variable). A list of patient characteristics (covariates/predictor variables) is included in the model, e.g. BMI, age, presence of diabetes, blood pressure, radiation dose, etc. Some patient data weren't collected during follow-up, so there are missing values (1,666) in the dataframe.
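Whether the uneven group sizes matter comes down to what lmer does with incomplete rows: by default (na.action = na.omit) any row with an NA in the response or in any predictor is dropped before fitting, which is exactly why the per-timepoint n shrinks. A minimal sketch of that listwise deletion, in Python for illustration with made-up rows:

```python
# Listwise deletion as lmer's default na.action = na.omit applies it
# (Python for illustration; the rows below are made up):
rows = [
    {"outcome": 3.1,  "bmi": 24.0, "time": 0},
    {"outcome": None, "bmi": 25.5, "time": 1},  # NA in the response -> row dropped
    {"outcome": 2.8,  "bmi": None, "time": 1},  # NA in a predictor -> row dropped
    {"outcome": 2.9,  "bmi": 23.1, "time": 1},
]

# A row survives only if every variable used in the model is observed
complete = [r for r in rows if all(v is not None for v in r.values())]
print(len(complete))  # 2 of the 4 rows enter the fit
```

So NAs may sit in both the response and the predictors; the cost is that each extra variable with missing values shrinks the analysed sample further, and the complete-case estimates are typically defensible only if the missingness is unrelated to the unobserved values.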
See also questions close to this topic

R: geom_venn how to change font
Is there a way to change the font style when drawing a Venn diagram using the geom_venn function? I have tried theme(text = element_text(size = 16, family = "Comic Sans MS")) but for some reason it doesn't work.

How to use a for loop to create a duration variable?
I need to create a variable that tells me how many years there was peace before the variable conflict = 1 (meaning a conflict starts).
Someone recommended doing a for loop with a counter that starts at 0, adds 1 if conflict == 0, saves the value when conflict == 1, and resets it to 0 afterwards. Can someone help me put this into R code?
Thank you!
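The recipe described above (counter starts at 0, increments while conflict == 0, is saved when conflict == 1, then reset) can be sketched as follows. This is Python for illustration only, with a made-up 0/1 vector, and translates line for line into an R for loop:

```python
def peace_before_conflict(conflict):
    """Years of peace recorded at each conflict onset (conflict is a 0/1 sequence)."""
    counter = 0
    durations = []
    for c in conflict:
        if c == 0:
            counter += 1               # one more year of peace
        else:
            durations.append(counter)  # save the value when a conflict starts
            counter = 0                # reset afterwards
    return durations

print(peace_before_conflict([0, 0, 1, 0, 0, 0, 1]))  # [2, 3]
```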

Flextable output to .docx cannot be opened in Word
I've been trying to import a .csv file into R, convert it to a flextable, and then export the flextable to a Word .docx document. I use the following code:
library(tidyverse)
library(readxl)
library(scales)
library(janitor)
library(stringr)
library(magrittr)
library(officer)
library(flextable)
library(dplyr)

# load data
Rpops_survey <- read_csv("RPOPS Data Quant.csv")
spec(Rpops_survey)

# convert data
Rpops_survey_dframed <- as.data.frame(Rpops_survey)
Rpops_ft1 <- flextable(Rpops_survey_dframed)
Rpops_doctemp <- read_docx()
Rpops_doctemp <- body_add_flextable(Rpops_doctemp, value = Rpops_ft1)

# export data
fileout <- tempfile(fileext = ".docx")
fileout <- "test.docx"
print(Rpops_doctemp, target = fileout)
The file is created, but trying to open it in Word produces an error (screenshot: "Word docx error").

More efficient resampling (discrete distribution)
Currently I have a program where the resampling step is the bottleneck, and I can't think of a way to make it more efficient. The code currently used is given below:
std::vector<Object> Objects;
vector<double> Numbers(50);
discrete_distribution<int> dist(Numbers.begin(), Numbers.end());
std::vector<Object> NewObjects;
NewObjects.reserve(50);
for (int i = 0; i < 50; i++) {
    int x = dist(rand_gen);
    NewObjects.push_back(Objects[x]);
}
Objects = NewObjects;
The main issue is that the Objects being copied are really big. Hopefully that helps a bit.
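Since the cost is in copying big Objects rather than in drawing from the distribution, one common trick is to resample indices first and only then materialize objects, so each survivor is touched at most once (duplicates can share storage, e.g. via shared_ptr or an index vector in the C++ setting). A sketch of the idea, in Python for illustration with hypothetical weights:

```python
import random
from collections import Counter

def resample_counts(weights, rng):
    """Draw n indices from a discrete distribution and return index -> copy count."""
    n = len(weights)
    idx = rng.choices(range(n), weights=weights, k=n)  # the discrete distribution
    return Counter(idx)

counts = resample_counts([0.1, 0.5, 0.2, 0.2], random.Random(0))
# Only the indices present in `counts` need their object copied at all;
# an object drawn k times needs one copy plus k-1 duplicates (or none,
# if the particles can refer to shared state).
print(sum(counts.values()))  # total number of draws, here 4
```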

How to convert a dataframe from character to numeric
I know this question may be a duplicate, but I tried all the solutions in:
How to convert entire dataframe to numeric while preserving decimals?
https://statisticsglobe.com/convert-data-frame-column-to-numeric-in-r
but they didn't work.
I imported Excel data from my computer manually:
File > Import data > Excel, and I set the type of the data as numeric.
I checked my data using
View(Old_data)
and it is indeed of type numeric.
But then I converted rows into columns and columns into rows in R:
New_data <- as.data.frame(t(Old_data))
When I checked my new data using:
View(New_data)
I found that my columns are of type character and not numeric.
I tried to convert New_data to numeric:
New_data_B <- as.numeric(New_data)
I checked my data using:
dim(New_data_B)
17 1091
Here's an example of my data:
New_data_B
#> Name MT95T843 MT95T756 #> 1 QC_G.F9_01_4768 70027.02132 95774.13597 #> 2 QC_G.F9_01_4765 69578.18634 81479.29575 #> 3 QC_G.F9_01_4762 69578.18634 87021.95427 #> 4 QC_G.F9_01_4759 68231.14338 95558.76738 #> 5 QC_G.F9_01_4756 64874.12936 96780.77245 #> 6 QC_G.F9_01_4753 63866.65780 91854.35304 #> 7 CtrF01R5_G.D1_01_4757 66954.38799 128861.36163 #> 8 CtrF01R4_G.D5_01_4763 97352.55229 101353.25927 #> 9 CtrF01R3_G.C8_01_4754 61311.78576 7603.60896 #> 10 CtrF01R2_G.D3_01_4760 85768.36117 109461.75445 #> 11 CtrF01R1_G.C9_01_4755 85302.81947 104253.84537 #> 12 BtiF01R5_G.D7_01_4766 61252.42545 115683.73755 #> 13 BtiF01R4_G.D6_01_4764 81873.96379 112164.14229 #> 14 BtiF01R3_G.D2_01_4758 84981.21914 0.00000 #> 15 BtiF01R2_G.D4_01_4761 36629.02462 124806.49101 #> 16 BtiF01R1_G.D8_01_4767 0.00000 109927.26425 #> 17 rt 13.90181 13.90586
I also converted my data to a csv file and imported it:
Old_data <- as.data.frame(read.csv("data.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE))
and also using:
# install.packages("readxl")
library("readxl")
Old_data <- read_excel("data.xlsx")
I'm just confused by this problem; I think it happens because I converted rows into columns and columns into rows.
Please tell me if you need any clarification, and also whether I can send the data to someone so they can try. Thank you very much.
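That suspicion is right: transposing mixes the character Name column in with the numbers, so everything is coerced to the common type (character in R, object in pandas). The same behaviour, and the remedy of moving the ID column out of the data before transposing, can be seen in this Python/pandas sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["QC_G.F9_01_4768", "CtrF01R5_G.D1_01_4757"],  # character IDs
    "MT95T843": [70027.02132, 66954.38799],
    "MT95T756": [95774.13597, 128861.36163],
})

bad = df.T                     # IDs still inside -> every cell coerced
good = df.set_index("Name").T  # park the IDs as the index first

print(bad.dtypes.unique())     # all object -- the analogue of R's character matrix
print(good.dtypes.unique())    # float64 -- the numbers stay numeric
```

The analogous R remedy is to keep the ID column as row names (or drop it) before calling t(), or to re-cast each column afterwards with something like lapply(New_data, as.numeric).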

Python: How to create histogram from binning and frequency table
Is there an appropriate way of drawing a histogram from a table of bins and frequencies?
import pandas as pd
freq = [3, 14, 21, 12, 6]
i = pd.interval_range(start=50, freq=10, end=100, closed='left')
df = pd.DataFrame(index=i, data=freq)
df.columns = ['Freq']
df.index.name = 'Class'
df.columns.name = 'Column'
print(df)
# ?? hist(df)
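Since the class width is already encoded in the IntervalIndex, one way (a sketch, assuming matplotlib is available for the actual drawing) is to recover the left edges and widths from the intervals and hand them to a bar plot, which reproduces a histogram exactly:

```python
import pandas as pd

freq = [3, 14, 21, 12, 6]
i = pd.interval_range(start=50, end=100, freq=10, closed="left")
df = pd.DataFrame({"Freq": freq}, index=i)

# Recover the bar geometry directly from the IntervalIndex
lefts = [iv.left for iv in df.index]               # [50, 60, 70, 80, 90]
widths = [iv.right - iv.left for iv in df.index]   # all 10

# Drawing is then a single call (commented out so the sketch stays headless):
# import matplotlib.pyplot as plt
# plt.bar(lefts, df["Freq"], width=widths, align="edge", edgecolor="black")
# plt.show()
print(lefts, widths)
```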

Fit loglinear model using lme4
I have a loglinear model that I know how to run using glm, but I would like to know how to run it using lme4.
Example dataset:
library(MASS)
UCBAdf <- as.data.frame(UCBAdmissions)
Fit the loglinear model using glm:
glmFitT <- glm(Freq ~ Admit * Dept * Gender, family = poisson(link = "log"), data = UCBAdf)
Output from glm:

Call:
glm(formula = Freq ~ Admit * Dept * Gender, family = poisson(link = "log"), data = UCBAdf)

Deviance Residuals:
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Coefficients:
                                 Estimate Std. Error  z value Pr(>|z|)
(Intercept)                       6.23832    0.04419  141.157  < 2e-16 ***
AdmitRejected                    -0.49212    0.07175   -6.859 6.94e-12 ***
DeptB                            -0.37186    0.06918   -5.375 7.65e-08 ***
DeptC                            -1.45083    0.10142  -14.305  < 2e-16 ***
DeptD                            -1.31107    0.09591  -13.669  < 2e-16 ***
DeptE                            -2.26803    0.14430  -15.718  < 2e-16 ***
DeptF                            -3.14728    0.21773  -14.455  < 2e-16 ***
GenderFemale                     -1.74969    0.11484  -15.235  < 2e-16 ***
AdmitRejected:DeptB              -0.04163    0.11319   -0.368  0.71304
AdmitRejected:DeptC               1.02764    0.13550    7.584 3.34e-14 ***
AdmitRejected:DeptD               1.19608    0.12641    9.462  < 2e-16 ***
AdmitRejected:DeptE               1.44908    0.17681    8.196 2.49e-16 ***
AdmitRejected:DeptF               3.26187    0.23120   14.109  < 2e-16 ***
AdmitRejected:GenderFemale       -1.05208    0.26271   -4.005 6.21e-05 ***
DeptB:GenderFemale               -1.28357    0.27358   -4.692 2.71e-06 ***
DeptC:GenderFemale                2.27046    0.16270   13.954  < 2e-16 ***
DeptD:GenderFemale                1.69763    0.16754   10.133  < 2e-16 ***
DeptE:GenderFemale                2.32269    0.20663   11.241  < 2e-16 ***
DeptF:GenderFemale                1.83670    0.31672    5.799 6.66e-09 ***
AdmitRejected:DeptB:GenderFemale  0.83205    0.51039    1.630  0.10306
AdmitRejected:DeptC:GenderFemale  1.17700    0.29956    3.929 8.53e-05 ***
AdmitRejected:DeptD:GenderFemale  0.97009    0.30262    3.206  0.00135 **
AdmitRejected:DeptE:GenderFemale  1.25226    0.33032    3.791  0.00015 ***
AdmitRejected:DeptF:GenderFemale  0.86318    0.40267    2.144  0.03206 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2.6501e+03  on 23  degrees of freedom
Residual deviance: 2.8866e-14  on  0  degrees of freedom
AIC: 207.06

Number of Fisher Scoring iterations: 3
How do I run the equivalent model in lme4? Thank you!
How can I use anova() for significance testing of lm and lmer objects while respecting variable contrast codings?
I am confused about the relationship between the significance test result shown in the output of summary() called on a lm or lmer object, and the result shown in the output of anova() called on that same object. Specifically, I don't understand (a) why, for factors with df=1 (for which it should be possible to compare results), the results don't always agree; and (b) why summary() respects the contrast weights assigned to each factor but anova() does not.
Here is an example for lm:
data(iris)

## Apply treatment coding to Species, and fit model
contrasts(iris$Species) <- contr.treatment(length(levels(iris$Species)))
iris.lm.treatment <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# check Petal.Length p-value in lm() output
coef(summary(iris.lm.treatment))["Petal.Length", "Pr(>|t|)"]
[1] 0.05199902

# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.treatment))["Petal.Length", "Pr(>F)"]
[1] 1.244558e-56

## Apply sum coding to Species, and fit model
contrasts(iris$Species) <- contr.sum(length(levels(iris$Species)))/2
iris.lm.sum <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# check Petal.Length p-value in lm() output
coef(summary(iris.lm.sum))["Petal.Length", "Pr(>|t|)"]
[1] 2.091453e-12

# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.sum))["Petal.Length", "Pr(>F)"]
[1] 1.244558e-56
The significance test of Petal.Length in the fitted lm changes when the contrast coding of Species changes – that makes sense because the model evaluates each factor with orthogonal factors held constant at zero. However, the significance test of Petal.Length in the anova result is the same either way, and does not match the result from either lm.
The behavior with lmer (with significance testing accomplished via the lmerTest package) is confusing in a related way:
library(lmerTest)
data(ham)

## Apply treatment coding to variables, and fit model
contrasts(ham$Information) <- contr.treatment(length(levels(ham$Information)))
contrasts(ham$Product) <- contr.treatment(length(levels(ham$Product)))
ham.lmer.treatment <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data = ham)

# check Information p-value in lmer() output
coef(summary(ham.lmer.treatment))["Information2", "Pr(>|t|)"]
[1] 0.4295516

# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.treatment))["Information", "Pr(>F)"]
[1] 0.04885354

## Apply sum coding to variables, and fit model
contrasts(ham$Information) <- contr.sum(length(levels(ham$Information)))/2
contrasts(ham$Product) <- contr.sum(length(levels(ham$Product)))/2
ham.lmer.sum <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data = ham)

# check Information p-value in lmer() output
coef(summary(ham.lmer.sum))["Information1", "Pr(>|t|)"]
[1] 0.04885354

# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.sum))["Information", "Pr(>F)"]
[1] 0.04885354
Here, it is still the case that variable coding appears to affect the results shown in the output of summary() but not the results shown in the output of anova(). However, both anova() results match the lmer() result obtained when sum coding is used.
It seems to me that in both cases, anova() is ignoring the variable codings used and using some other variable coding – which, in the lmer case, appears to be sum coding – to evaluate significance. I would like to know how to perform a statistical test that uses the assigned variable codings. For lmer, at least, I can accomplish this with contestMD(); e.g.,
# test Information significance while respecting contrast weights
contestMD(ham.lmer.treatment, as.numeric(names(fixef(ham.lmer.treatment)) == "Information2"))[, "Pr(>F)"]
[1] 0.4295516  # matches output from summary(ham.lmer.treatment)
However, I can't figure out how to do the equivalent test for lm (presumably using glht, but I can't figure out the right function call). So, my questions are:
Conceptually, why does anova() not respect the assigned variable codings? (Presumably this is all intended behavior, but the reason why is opaque to me.)
Practically, what variable coding is being used by anova() when called on an lm object?
How can I perform the kind of significance testing I want with an lm object? (I used examples with df=1 above because they can be compared between model output and anova() output, but of course what I really want to do is test for effects that have df>1.)

How to use R to predict the slope of the main predictor under a three-way interaction fitted by lmer
I have a linear mixed-effects model with a three-way interaction fitted by the following code:
m <- lmer(cog ~ PRS*poly(Age, 2, raw=T)*Score + gender + Edu + fam + factor(Time) + (1 | family/DBID), data = test_all, REML = F)
In this model, there is a three-way interaction between PRS, Score, and polynomial terms of age with two degrees (linear + quadratic). For this three-way interaction, how can I obtain the marginal effect (slope) of one variable, conditional on the other variables? For example, what is the slope of PRS when Age = 50 and Score = 1?
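With raw polynomials, the marginal slope of PRS follows directly from differentiating the fixed-effects part with respect to PRS: it is the PRS coefficient plus every PRS-interaction coefficient weighted by the value of the interacting term. A sketch of that arithmetic (Python for illustration; the coefficient values are hypothetical placeholders, to be replaced by the corresponding entries of fixef(m)):

```python
def prs_slope(age, score, b):
    """d(cog)/d(PRS) for cog ~ PRS * (Age + Age^2) * Score with raw polynomials."""
    return (b["PRS"]
            + b["PRS:Age"] * age
            + b["PRS:Age2"] * age ** 2
            + b["PRS:Score"] * score
            + b["PRS:Age:Score"] * age * score
            + b["PRS:Age2:Score"] * age ** 2 * score)

# Hypothetical coefficients purely for illustration:
b = {"PRS": 0.5, "PRS:Age": -0.01, "PRS:Age2": 0.0001,
     "PRS:Score": 0.2, "PRS:Age:Score": -0.005, "PRS:Age2:Score": 0.00005}
print(prs_slope(age=50, score=1, b=b))  # ~0.325 with these made-up numbers
```

In R, packages such as emmeans do this bookkeeping (including standard errors) with something like emtrends(m, var = "PRS", at = list(Age = 50, Score = 1)).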
Second, I tried to use the following code to plot this three-way interaction:
plot <- ggpredict(m, ci.lvl = 0.95, c("PRS [all]", "Age [60, 65, 70, 75, 80]", "Score [0, 0.321, 0.695, 1.492, 1.914, 3.252]"))
plot(m)
The interaction plot eventually shows, but R doesn't give the confidence interval. The error message is:
Error: Confidence intervals could not be computed.
How can I plot this three-way interaction with confidence intervals?

How to separate the mixed models and fit separate linear models
I am trying to fit a linear model and separate the mixed models, then fit separate linear models to model_steeper and model_flatter. First, I create training samples with Input >= 5 and separate the points:

nSample <- length(data$Input)
Train.Sample <- data.frame(trainInput = data$Input, trainOutput = rep(NA, nSample))
Train.Sample.Steeper <- data.frame(trainSteepInput = data$Input, trainSteepOutput = rep(NA, nSample))
Train.Sample.Flatter <- data.frame(trainFlatInput = data$Input, trainFlatOutput = rep(NA, nSample))
head(cbind(data, Train.Sample, Train.Sample.Steeper, Train.Sample.Flatter))
and the result is:
dput(head(cbind(data,Train.Sample,Train.Sample.Steeper,Train.Sample.Flatter))) structure(list(Output = c(0.430030802963404, 0.387872242279496, 0.773463398992163, 3.47962503801818, 1.18311295613965, 0.534018180113726 ), Input = c(0.707348558586091, 0.596670078579336, 1.55126970726997, 2.00976222474128, 1.69353070948273, 0.437843651510775), trainInput = c(0.707348558586091, 0.596670078579336, 1.55126970726997, 2.00976222474128, 1.69353070948273, 0.437843651510775), trainOutput = c(NA, NA, NA, NA, NA, NA), trainSteepInput = c(0.707348558586091, 0.596670078579336, 1.55126970726997, 2.00976222474128, 1.69353070948273, 0.437843651510775 ), trainSteepOutput = c(NA, NA, NA, NA, NA, NA), trainFlatInput = c(0.707348558586091, 0.596670078579336, 1.55126970726997, 2.00976222474128, 1.69353070948273, 0.437843651510775), trainFlatOutput = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, 6L), class = "data.frame")
Then, I tried:
Train.Sample.Steep.lm <- lm(trainSteepOutput ~ trainSteepInput, Train.Sample.Steeper)
But the error is:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
I do not know what to do next. Does anyone know what is going on?

Random effects specification in gamlss in R
I would like to use the gamlss package for model fitting, to benefit from the larger set of distributions available in that package. However, I am struggling to specify my random effects correctly; at least, I think there is a mistake, because the output of an lmer model with a Gaussian distribution differs from that of the corresponding gamlss model with a Gaussian distribution. If I compare an lm model without the random effects to a gamlss model with a Gaussian distribution and without random effects, the output is similar.
Unfortunately, I cannot share my data to reproduce it. Here is my code:
df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200,
                        select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc",
                                   "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD",
                                   "GDP_Capita_growth_perc", "GDP_AgrForFis_percGDP", "Gini_2008_2018",
                                   "Arable_land_perc", "Forest_loss_annual_perc_previous_year",
                                   "Forest_extent_2000_perc"))

fourC <- lmer(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc +
                GDP_Capita_current_USD + GDP_Capita_growth_perc + GDP_AgrForFis_percGDP +
                Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc +
                (1 | ISO3) + (1 | Year), data = df)
summary(fourC)
resid_panel(fourC)

df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200,
                        select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc",
                                   "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD",
                                   "GDP_Capita_growth_perc", "GDP_AgrForFis_percGDP", "Gini_2008_2018",
                                   "Arable_land_perc", "Forest_loss_annual_perc_previous_year",
                                   "Forest_extent_2000_perc"))
df <- na.omit(df)
df$ISO3 <- as.factor(df$ISO3)
df$Year <- as.factor(df$Year)

fourC <- gamlss(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc +
                  GDP_Capita_current_USD + GDP_Capita_growth_perc + GDP_AgrForFis_percGDP +
                  Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc +
                  random(ISO3) + random(Year),
                data = df, family = NO, control = gamlss.control(n.cyc = 200))
summary(fourC)
plot(fourC)
How do the random effects need to be specified in gamlss to be similar to the random effects in lmer?
If I instead specify the random effects using re(random = ~1 | ISO3) + re(random = ~1 | Year),
I get the following error: Error in model.frame.default(formula = Forest_loss_annual_perc_boxcox ~ : variable lengths differ (found for 're(random = ~1 | ISO3)')

How to interpret a natural-log-transformed estimate from a linear mixed model on repeated-measurements data, after back-transforming it?
I've got repeated-measurements data (37 patients, 103 observations) in which I want to see by how much my continuous outcome (cont_outcome) changes per year, using a linear mixed model with follow-up time in years (fu_time) as the time variable. Because my outcome variable is very right-skewed, I've log-transformed it (natural log of the outcome). The data look as follows:
library(dplyr) library(magrittr) library(nlme) mydata < structure(list(ID = c(2, 2, 2, 2, 2, 2, 8, 8, 8, 14, 14, 14, 14, 14, 20, 20, 21, 21, 25, 25, 30, 30, 31, 31, 34, 34, 34, 34, 34, 38, 38, 40, 40, 40, 40, 40, 48, 48, 54, 54, 58, 58, 65, 65, 86, 86, 87, 87, 92, 92, 96, 96, 96, 96, 96, 103, 103, 115, 115, 137, 137, 141, 141, 145, 145, 154, 154, 154, 154, 154, 159, 159, 160, 160, 164, 164, 164, 164, 164, 179, 179, 179, 179, 179, 185, 185, 205, 205, 205, 205, 205, 213, 213, 221, 221, 271, 271, 277, 277, 310, 310, 320, 320), fu_time = c(5.7, 2.7, 0.5, 3.2, 0, 2.2, 0.5, 4, 0, 1.5, 4, 1, 0.5, 0, 0, 0.3, 0, 0.4, 0, 5.6, 0, 0.7, 2.3, 0, 1.1, 1.6, 0, 4.2, 0.6, 4.3, 0, 0.6, 0, 1.1, 0.1, 3.7, 0, 0.8, 0, 5, 0, 0.8, 1.1, 0, 0.2, 0, 0, 1.1, 0, 5.9, 1.4, 2, 4.4, 1, 0, 0, 1.2, 0, 3.6, 0, 0.6, 0, 0.3, 0, 0.4, 1.4, 0.9, 0.4, 0, 4.2, 0, 0.3, 0, 0.5, 0, 1.4, 3.7, 0.5, 0.9, 0.7, 3.6, 0.2, 1.2, 0, 3.9, 0, 0.1, 0, 0.6, 3, 1.1, 0, 3.5, 3.1, 0, 1.4, 0, 0, 1.3, 0, 1.5, 0, 0.8), cont_outcome = c(9166.8, 7803.60703059892, 6007.4, 8359.78482915823, 4577.5, 7048.927921802, 803.6094403348, 3706.1, 2834.9, 12631.640625, 20091.6, 13507.8568496174, 12318.3051438352, 10076.9, 75773.4, 87150.2422157607, 9461.9, 9281.3649794135, 1171, 2401.4, 8296.6, 9532.9382285201, 1608.1117689641, 1094.8, 12153.6379335835, 11451.5214843705, 10758.6, 10883.4, 11549.8622884445, 11174.7, 8770, 9249.8056640615, 10738.1, 11848.461839848, 11208.340344807, 17518.9, 4135, 5577.12568364527, 272.6, 961.5, 40183.4, 47439.991323044, 15238.1209494318, 12838.8, 6084.44836914335, 6253.3, 14733.1, 20123.0440344545, 1588.3, 4453.2, 6740.57697070795, 6238.9838188163, 15523, 6043.15615496565, 4054.6, 24824.4, 27923.6125781384, 87.3, 1281.6, 2198.4, 2678.36612114335, 16180.4, 16266.9311489703, 8780.5, 9725.03523843347, 9704.02880408264, 9223.94743359175, 9769.01741423425, 9102.1, 20000.6, 14642.2, 25463.1303222693, 42213.8, 35333.71875005, 5487.7, 7382.0422781139, 6553.8, 7889.8811544924, 7999.42835741673, 6451.9312487042, 7037.9, 
5973.8220351655, 6794.3513711514, 5319.9, 9518, 6892.8, 16098.0389453171, 11286.7, 15431.737807293, 15377, 15224.4723769555, 14789.2, 20458.7, 8805, 7564, 251.8, 160.1, 908, 1485.5, 6049, 8948.2, 801, 1516.9), cont_outcome_natlog = c(9.12334354033531, 8.96234134560572, 8.70074732161274, 9.03118796760855, 8.42890827654866, 8.86063081641333, 6.68911338042628, 8.21773538975163, 7.94976194217113, 9.44396010595827, 9.9080570962505, 9.51102678399723, 9.41884165813006, 9.21800095464279, 11.2355025865596, 11.3753888302416, 9.1550284875451, 9.13576390326269, 7.06561336359772, 7.78380717959662, 9.02360107130568, 9.16250826272016, 7.38281595538559, 6.99832697716652, 9.40538382235389, 9.34587788088557, 9.28346071372637, 9.29499397159198, 9.35442879280046, 9.32140757348384, 9.07909208536623, 9.13235782099512, 9.28155344366033, 9.37995333559373, 9.32441345379483, 9.77103557713117, 8.32724260745779, 8.62642881220255, 5.60800551926243, 6.86849456502883, 10.6012092540025, 10.7672208507678, 9.63155552435753, 9.46022711493256, 8.71349134710413, 8.74086460338749, 9.5978519421682, 9.90962090672887, 7.37041954084111, 8.40137821785455, 8.81590080420515, 8.73857259855338, 9.65007807402644, 8.70668169666408, 8.30760731803409, 10.119582319437, 10.2372279388406, 4.46935046284556, 7.15586457631409, 7.69548510202803, 7.89296223129914, 9.69155591218576, 9.69688956219896, 9.08028863261591, 9.18245879196938, 9.18029641888851, 9.12955836305925, 9.18697116825287, 9.11626043511259, 9.90351755208614, 9.59166304944946, 10.1449868151533, 10.6505024607846, 10.4725929923869, 8.61026450318852, 8.90680561076318, 8.78780031307755, 8.9733363509245, 8.98712536278601, 8.77213478337683, 8.8590651091986, 8.69514220843703, 8.82384686524476, 8.57920978516654, 9.16094002168106, 8.83823266752201, 9.68645273891508, 9.33138032035193, 9.64418156424508, 9.64062816551569, 9.63065943693362, 9.60165246364014, 9.92616349887503, 9.08307502093031, 8.93115542977835, 5.52863512161025, 5.07579862000267, 6.81124437860129, 
7.30350669490266, 8.70764824810691, 9.09920767372367, 6.68586094706836, 7.32442405759763)), row.names = c(NA, 103L ), class = c("tbl_df", "tbl", "data.frame")) head(mydata) # A tibble: 6 x 4 ID fu_time cont_outcome cont_outcome_natlog <dbl> <dbl> <dbl> <dbl> 1 2 5.7 9167. 9.12 2 2 2.7 7804. 8.96 3 2 0.5 6007. 8.70 4 2 3.2 8360. 9.03 5 2 0 4577. 8.43 6 2 2.2 7049. 8.86
To answer my question I've run the following two mixed models, with random intercepts and slopes over time, using the lme function from the nlme package. The first model uses the original, untransformed outcome variable. The second model uses the log-transformed outcome variable. For both models I obtain the estimate and 95% CI (of the fixed effect of fu_time) with the intervals function (also from nlme). For the second model I exponentiate the estimates to transform them back to their original scale:

# The first model
model_1 <- lme(fixed = cont_outcome ~ fu_time, random = ~1 + fu_time | ID, data = mydata)
intervals(model_1)$fixed["fu_time", ]
# to obtain the estimate and 95% CI for the fixed effect of time, which gives...
   lower      est.     upper
836.3064 1537.2157 2238.1251

# The second model
model_2 <- lme(fixed = cont_outcome_natlog ~ fu_time, random = ~1 + fu_time | ID, data = mydata)
exp(intervals(model_2)$fixed["fu_time", ])
# exponentiated estimate with 95% CI, which gives...
   lower     est.    upper
1.079379 1.145608 1.215900
I don't understand how to interpret the estimate of the second model. The estimate from the first model makes total sense: from descriptive statistics I did earlier, I know that patients increase by roughly 1000-2000 per year in cont_outcome. This is not at all reflected in the back-transformed estimate of the second model... Am I doing something wrong here? Is this not how you're supposed to do this? Thanks in advance.