Can Linear Mixed Modelling handle NAs (missing data) in the response variable and/or predictor variable?
I am fitting linear mixed models with the lmer function from the lme4 package in R. Is it okay to have NAs in both the response and the covariates (predictors)? I know that linear mixed modelling excludes the NAs and uses maximum likelihood estimation, but I am not sure whether NAs can exist in both the response and the predictor variables. If I don't exclude the NAs myself, the model runs fine, but I notice uneven group sizes for the response across the different time points. Does this matter?
E.g. at baseline n = 1,980, at the 1-month time point n = 1,841, etc.
Background: my data are patient data collected at 4 different time points (this is the response variable). A list of patient characteristics (covariates/predictor variables) is included in the model, such as BMI, age, presence of diabetes, blood pressure, radiation dose, etc. Some patient data weren't collected during follow-up, so there are missing data (1,666) in the data frame.
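To make the uneven group sizes concrete, here is a minimal sketch in base R of row-wise (listwise) deletion, which is what I understand lmer to apply by default to any row with an NA in a model variable. The toy data frame and its column names (id, time, score, bmi) are made up for illustration and are not my real data.

```r
# Hypothetical long-format data: 5 patients, 4 visits each
toy <- data.frame(
  id    = rep(1:5, each = 4),
  time  = rep(c("baseline", "m1", "m3", "m6"), times = 5),
  score = c(10, 11, NA, 13,   9, NA, NA, 12,   8, 9, 10, 11,
            7, 8, 9, NA,     12, 13, 14, 15),
  bmi   = c(rep(24, 4), rep(NA, 4), rep(27, 4), rep(30, 4), rep(22, 4))
)

# A row is kept only if every model variable is observed
used <- complete.cases(toy[, c("score", "bmi", "time", "id")])

n_used   <- sum(used)             # rows that would enter the model
per_time <- table(toy$time[used]) # rows retained at each time point

print(n_used)
print(per_time)
```

In this toy example a missing response drops only that visit, while a missing baseline covariate drops every row for that patient, so the per-time-point counts end up unequal.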
How to interpret a natural-log-transformed estimate from a linear mixed model on repeated-measurements data, after back-transforming it?
I've got repeated-measurements data (37 patients, 103 observations) in which I want to see by how much my continuous outcome (cont_outcome) changes per year, using a linear mixed model with follow-up time in years (fu_time) as the time variable. Because my outcome variable is very right-skewed, I've log-transformed it (natural log[outcome]). The data look as follows:
library(dplyr)
library(magrittr)
library(nlme)

mydata <- structure(list(ID = c(2, 2, 2, 2, 2, 2, 8, 8, 8, 14, 14, 14, 14, 14, 20, 20, 21, 21, 25, 25, 30, 30, 31, 31, 34, 34, 34, 34, 34, 38, 38, 40, 40, 40, 40, 40, 48, 48, 54, 54, 58, 58, 65, 65, 86, 86, 87, 87, 92, 92, 96, 96, 96, 96, 96, 103, 103, 115, 115, 137, 137, 141, 141, 145, 145, 154, 154, 154, 154, 154, 159, 159, 160, 160, 164, 164, 164, 164, 164, 179, 179, 179, 179, 179, 185, 185, 205, 205, 205, 205, 205, 213, 213, 221, 221, 271, 271, 277, 277, 310, 310, 320, 320),
fu_time = c(5.7, 2.7, 0.5, 3.2, 0, 2.2, 0.5, 4, 0, 1.5, 4, 1, 0.5, 0, 0, 0.3, 0, 0.4, 0, 5.6, 0, 0.7, 2.3, 0, 1.1, 1.6, 0, 4.2, 0.6, 4.3, 0, 0.6, 0, 1.1, 0.1, 3.7, 0, 0.8, 0, 5, 0, 0.8, 1.1, 0, 0.2, 0, 0, 1.1, 0, 5.9, 1.4, 2, 4.4, 1, 0, 0, 1.2, 0, 3.6, 0, 0.6, 0, 0.3, 0, 0.4, 1.4, 0.9, 0.4, 0, 4.2, 0, 0.3, 0, 0.5, 0, 1.4, 3.7, 0.5, 0.9, 0.7, 3.6, 0.2, 1.2, 0, 3.9, 0, 0.1, 0, 0.6, 3, 1.1, 0, 3.5, 3.1, 0, 1.4, 0, 0, 1.3, 0, 1.5, 0, 0.8),
cont_outcome = c(9166.8, 7803.60703059892, 6007.4, 8359.78482915823, 4577.5, 7048.927921802, 803.6094403348, 3706.1, 2834.9, 12631.640625, 20091.6, 13507.8568496174, 12318.3051438352, 10076.9, 75773.4, 87150.2422157607, 9461.9, 9281.3649794135, 1171, 2401.4, 8296.6, 9532.9382285201, 1608.1117689641, 1094.8, 12153.6379335835, 11451.5214843705, 10758.6, 10883.4, 11549.8622884445, 11174.7, 8770, 9249.8056640615, 10738.1, 11848.461839848, 11208.340344807, 17518.9, 4135, 5577.12568364527, 272.6, 961.5, 40183.4, 47439.991323044, 15238.1209494318, 12838.8, 6084.44836914335, 6253.3, 14733.1, 20123.0440344545, 1588.3, 4453.2, 6740.57697070795, 6238.9838188163, 15523, 6043.15615496565, 4054.6, 24824.4, 27923.6125781384, 87.3, 1281.6, 2198.4, 2678.36612114335, 16180.4, 16266.9311489703, 8780.5, 9725.03523843347, 9704.02880408264, 9223.94743359175, 9769.01741423425, 9102.1, 20000.6, 14642.2, 25463.1303222693, 42213.8, 35333.71875005, 5487.7, 7382.0422781139, 6553.8, 7889.8811544924, 7999.42835741673, 6451.9312487042, 7037.9,
5973.8220351655, 6794.3513711514, 5319.9, 9518, 6892.8, 16098.0389453171, 11286.7, 15431.737807293, 15377, 15224.4723769555, 14789.2, 20458.7, 8805, 7564, 251.8, 160.1, 908, 1485.5, 6049, 8948.2, 801, 1516.9), cont_outcome_natlog = c(9.12334354033531, 8.96234134560572, 8.70074732161274, 9.03118796760855, 8.42890827654866, 8.86063081641333, 6.68911338042628, 8.21773538975163, 7.94976194217113, 9.44396010595827, 9.9080570962505, 9.51102678399723, 9.41884165813006, 9.21800095464279, 11.2355025865596, 11.3753888302416, 9.1550284875451, 9.13576390326269, 7.06561336359772, 7.78380717959662, 9.02360107130568, 9.16250826272016, 7.38281595538559, 6.99832697716652, 9.40538382235389, 9.34587788088557, 9.28346071372637, 9.29499397159198, 9.35442879280046, 9.32140757348384, 9.07909208536623, 9.13235782099512, 9.28155344366033, 9.37995333559373, 9.32441345379483, 9.77103557713117, 8.32724260745779, 8.62642881220255, 5.60800551926243, 6.86849456502883, 10.6012092540025, 10.7672208507678, 9.63155552435753, 9.46022711493256, 8.71349134710413, 8.74086460338749, 9.5978519421682, 9.90962090672887, 7.37041954084111, 8.40137821785455, 8.81590080420515, 8.73857259855338, 9.65007807402644, 8.70668169666408, 8.30760731803409, 10.119582319437, 10.2372279388406, 4.46935046284556, 7.15586457631409, 7.69548510202803, 7.89296223129914, 9.69155591218576, 9.69688956219896, 9.08028863261591, 9.18245879196938, 9.18029641888851, 9.12955836305925, 9.18697116825287, 9.11626043511259, 9.90351755208614, 9.59166304944946, 10.1449868151533, 10.6505024607846, 10.4725929923869, 8.61026450318852, 8.90680561076318, 8.78780031307755, 8.9733363509245, 8.98712536278601, 8.77213478337683, 8.8590651091986, 8.69514220843703, 8.82384686524476, 8.57920978516654, 9.16094002168106, 8.83823266752201, 9.68645273891508, 9.33138032035193, 9.64418156424508, 9.64062816551569, 9.63065943693362, 9.60165246364014, 9.92616349887503, 9.08307502093031, 8.93115542977835, 5.52863512161025, 5.07579862000267, 6.81124437860129, 
7.30350669490266, 8.70764824810691, 9.09920767372367, 6.68586094706836, 7.32442405759763)), row.names = c(NA, -103L), class = c("tbl_df", "tbl", "data.frame"))

head(mydata)
# A tibble: 6 x 4
     ID fu_time cont_outcome cont_outcome_natlog
  <dbl>   <dbl>        <dbl>               <dbl>
1     2     5.7        9167.                9.12
2     2     2.7        7804.                8.96
3     2     0.5        6007.                8.70
4     2     3.2        8360.                9.03
5     2     0          4577.                8.43
6     2     2.2        7049.                8.86
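The cont_outcome_natlog column is simply log() applied to cont_outcome. As a quick sanity check, the sketch below recomputes three of the values in base R and confirms that exp() undoes the transformation; the variable names x, x_log and x_back are only for illustration.

```r
# First, third and fifth cont_outcome values from the dput() output above
x <- c(9166.8, 6007.4, 4577.5)

x_log <- log(x)            # log() is the natural logarithm in R
print(round(x_log, 2))     # compare with rows 1, 3 and 5 of head(mydata)

x_back <- exp(x_log)       # exp() reverses the natural log
print(all.equal(x_back, x))
```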
To answer my question I've run the following two mixed models with random intercepts and random slopes over time, using the lme function from the nlme package. The first model uses the original, untransformed outcome variable. The second model uses the log-transformed outcome variable. For both models I obtain the estimate and 95% CI of the fixed effect of fu_time with the intervals function (also from nlme). For the second model I exponentiate the estimates to transform them back to the original scale:

# The first model
model_1 <- lme(fixed = cont_outcome ~ fu_time, random = ~1 + fu_time | ID, data = mydata)
intervals(model_1)$fixed["fu_time", ] # to obtain the estimate and 95% CI for the fixed effect of time, which gives...
#     lower      est.     upper
#  836.3064 1537.2157 2238.1251

# The second model
model_2 <- lme(fixed = cont_outcome_natlog ~ fu_time, random = ~1 + fu_time | ID, data = mydata)
exp(intervals(model_2)$fixed["fu_time", ]) # exponentiated estimate with 95% CI, which gives...
#    lower     est.    upper
# 1.079379 1.145608 1.215900
I don't understand how to interpret the estimate of the second model. The estimate from the first model makes total sense: from descriptive statistics I ran earlier, I know that patients increase by roughly 1000-2000 per year in cont_outcome. This is not at all reflected in the back-transformed estimate of the second model... Am I doing something wrong here? Is this not how you're supposed to do this? Thanks in advance.
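A note on scale may help frame the question. Under the usual reading of a model fitted to a natural-log outcome, an exponentiated slope is a multiplicative factor per unit of time (strictly, for the geometric mean or median of the outcome) rather than an additive change in the original units. The sketch below works through that reading in base R with the numbers from the model_2 output. The baseline of 10000 is a hypothetical value picked only for illustration, and the variable names are not part of the original analysis.

```r
# Exponentiated slope and 95% CI for fu_time, copied from the model_2 output
ratio <- c(lower = 1.079379, est = 1.145608, upper = 1.215900)

# A ratio r per year corresponds to a (r - 1) * 100 percent change per year
pct_per_year <- (ratio - 1) * 100
print(round(pct_per_year, 1))

# Hypothetical starting level (assumption, not taken from the data)
baseline <- 10000

# Implied absolute change over the first year at that starting level
abs_change_year1 <- baseline * (ratio - 1)
print(round(abs_change_year1))

# The change compounds: after t years the level is baseline * ratio^t
level_after_3y <- baseline * ratio[["est"]]^3
print(round(level_after_3y))
```

On this reading the implied absolute change depends on the starting level, which is why it cannot be compared directly with the additive slope from model_1.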