Using the boxTidwell function in the car package and getting a bizarre error
I am trying to use the boxTidwell function in the car package in R to run a number of tests on continuous data. My data looks something like this:
Gender Age X1 X2 Outcome
M 20.1 1.23 4.43 1
F 19.5 2.33 3.21 0
M 18.0 1.33 7.55 1
M 17.2 3.22 6.44 0
M 12.5 4.15 8.99 1
F 14.2 5.15 10.22 0
F 13.9 6.12 12.34 1
F 9.4 7.12 3.21 1
When I use boxTidwell on the data frame, I get an error:
library(car)
gender <- c("M","F","M","M","M","F","F","F")
age <- c(20.1, 19.5, 18.0, 17.2, 12.5, 14.2, 13.9, 9.4)
X1 <- c(1.23,2.33,1.33,3.22,4.15,5.15,6.12,7.12)
X2 <- c(4.43,3.21,7.55,6.44,8.99,10.22,12.34,3.21)
outcome <- c(1,0,1,0,1,0,1,1)
df <- cbind(gender,age,X1,X2,outcome)
as.data.frame(df)
boxTidwell(outcome~age+X1+X2, ~gender, data=df)
Error in boxTidwell.default(y, X1, X2, max.iter = max.iter, tol = tol, :
  the variables to be transformed must have only positive values
In addition: Warning message:
In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
I am not sure what the problem is. I assume it is because I am using a binary outcome. Any suggestions would be much appreciated.
1 answer

The data is insufficient for the algorithm to come up with a solution. With a larger dataset (100 rows, created under "data" below), the call runs:
boxTidwell(outcome~age+X1+X2, ~gender, data=df)
#    Score Statistic    pvalue MLE of lambda
#age       0.3575862 0.7206530      4.339394
#X1        0.3081380 0.7579773      3.377788
#X2        0.9979096 0.3183232     29.886634
The problem becomes noticeable if we subset the data created below to mimic the size of the OP's data (8 rows):
boxTidwell(outcome~age+X1+X2, ~gender, data=df[1:8,])
Error in lm.fit(cbind(1, x.log.x, x1.p, x2), y, ...) : NA/NaN/Inf in 'x'
NOTE: In the OP's post, the data.frame is created after converting to a matrix (with cbind). This is problematic because a matrix can hold only a single class, so all the columns are converted to factor by as.data.frame (or to character if stringsAsFactors = FALSE, which is the default from R 4.0.0 onward).
data
set.seed(24)
df <- data.frame(gender = sample(c("M", "F"), 100, replace = TRUE),
                 age = rnorm(100, 20, 1),
                 X1 = rnorm(100, 4, 1),
                 X2 = rnorm(100, 10, 1),
                 outcome = sample(0:1, 100, replace = TRUE))
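To make the NOTE about cbind concrete, here is a minimal base-R sketch using the vectors from the question. The object names m, df_bad and df_ok are only illustrative and do not appear in the original post. It shows that going through a matrix loses the numeric types, while calling data.frame() directly keeps them.

```r
# Vectors from the question
gender  <- c("M", "F", "M", "M", "M", "F", "F", "F")
age     <- c(20.1, 19.5, 18.0, 17.2, 12.5, 14.2, 13.9, 9.4)
X1      <- c(1.23, 2.33, 1.33, 3.22, 4.15, 5.15, 6.12, 7.12)
X2      <- c(4.43, 3.21, 7.55, 6.44, 8.99, 10.22, 12.34, 3.21)
outcome <- c(1, 0, 1, 0, 1, 0, 1, 1)

# cbind() builds a matrix; mixing character and numeric vectors
# coerces every column to character
m <- cbind(gender, age, X1, X2, outcome)
is.character(m)

# Converting that matrix to a data frame does not restore the numbers:
# the columns are character (or factor in older R versions)
df_bad <- as.data.frame(m)
sapply(df_bad, is.numeric)

# Building the data frame directly preserves each column's type
df_ok <- data.frame(gender, age, X1, X2, outcome)
sapply(df_ok, is.numeric)
```

Note that fixing the column types only addresses the first error. As shown above with df[1:8,], eight rows may still be too few for boxTidwell to find a solution.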