Direction of the estimate coefficient in an ordinal logistic regression
I'm analysing an ordinal logistic regression and I'm wondering how to tell which direction the estimated coefficient refers to. My variables are coded 0, 1 for women, men and 0, 1, 2, 4 for different postures. So my question is: how do I know whether the estimate describes the change from 0 to 1 or the change from 1 to 0, in terms of gender?
The output appended a 1 to PicSex. Is that a sign that this coefficient has a 1 > 0 direction? See the output below.
Thank you for any help
Cumulative Link Mixed Model fitted with the Laplace approximation

formula: Int ~ PicSex + Posture + (1 | PicID)
data:    x

Random effects:
 Groups Name        Variance Std.Dev.
 PicID  (Intercept) 0.0541   0.2326
Number of groups:  PicID 16

Coefficients:
        Estimate Std. Error z value Pr(>|z|)
PicSex1   0.3743     0.1833   2.042   0.0411 *
Posture   1.1232     0.1866   6.018 1.77e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 answer

Your sex variable has two levels, 0 and 1. So PicSex1 means the effect of PicSex being 1 compared to PicSex being 0 (the reference level). I show an example below using the wine dataset:

library(ordinal)
DATA = wine

> head(DATA$temp)
[1] cold cold cold cold warm warm
Levels: cold warm
Here cold comes first in Levels, so it is set as the reference level in any linear model. First we verify the effect of cold vs. warm:

do.call(cbind, tapply(DATA$rating, DATA$temp, table))
# warm has a higher average rating
Fit the model
# we fit a model; temp is a fixed effect
summary(clmm(rating ~ temp + contact + (1 | judge), data = DATA))

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: rating ~ temp + contact + (1 | judge)
data:    DATA

 link  threshold nobs logLik AIC    niter    max.grad cond.H
 logit flexible  72   -81.57 177.13 332(999) 1.03e-05 2.8e+01

Random effects:
 Groups Name        Variance Std.Dev.
 judge  (Intercept) 1.279    1.131
Number of groups:  judge 9

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     3.0630     0.5954   5.145 2.68e-07 ***
contactyes   1.8349     0.5125   3.580 0.000344 ***
Here we see warm attached to "temp", and as expected it has a positive coefficient, because the rating is higher for warm compared to cold (the reference).
So if you set another level as the reference, you will see another name attached, and the coefficient is reversed (-3.06 compared to +3.06 in the previous example):
# we set warm as the reference now
DATA$temp = relevel(DATA$temp, ref = "warm")
summary(clmm(rating ~ temp + contact + (1 | judge), data = DATA))

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: rating ~ temp + contact + (1 | judge)
data:    DATA

 link  threshold nobs logLik AIC    niter    max.grad cond.H
 logit flexible  72   -81.57 177.13 269(810) 1.14e-04 1.8e+01

Random effects:
 Groups Name        Variance Std.Dev.
 judge  (Intercept) 1.28     1.131
Number of groups:  judge 9

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempcold    -3.0630     0.5954  -5.145 2.68e-07 ***
contactyes   1.8349     0.5125   3.580 0.000344 ***

So always check which level is the reference before you fit the model.
See also related questions on this topic

R ggplot: Force facet_grid margin to be on the next row
library(ggplot2)  # needed for ggplot()
library(ggpointdensity)
library(viridis)

ggplot(data = iris, mapping = aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_pointdensity() +
  scale_color_viridis() +
  theme_bw() +
  facet_grid(~ cut(iris$Sepal.Width, 3), margins = FALSE, scales = 'free')
I need to get the margin (second plot) to be generated as in the next row.
Currently, if I use margins = TRUE, the margin plot is generated on the same row, which makes it hard to see the overall point distribution.
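One possible workaround (a sketch, not from the original post; the column name `bin` is my own): instead of relying on `facet_grid` margins, append a duplicate of all rows labelled "(all)" and use `facet_wrap`, which wraps panels onto additional rows.

```r
library(ggplot2)
library(ggpointdensity)
library(viridis)

# bin Sepal.Width, then append a copy of all rows labelled "(all)"
iris$bin <- as.character(cut(iris$Sepal.Width, 3))
combined <- rbind(iris, transform(iris, bin = "(all)"))

ggplot(combined, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_pointdensity() +
  scale_color_viridis() +
  theme_bw() +
  facet_wrap(~ bin, ncol = 2)  # the "(all)" panel wraps onto the next row
```

Because `facet_wrap` honours `ncol`, three bins plus the "(all)" panel produce a 2x2 layout, putting the margin panel on its own row.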

Spreadsheetlike calculation in R with a growth component
I have a spreadsheet calculation that I want to transfer into R. I have a few columns and rows that I am trying to do some simple arithmetic on, and a couple of if-else conditions.
The short question is:
 How do I represent this data in one (or more) meaningful object(s)?
 How do I repeat this data over n years, with a growth factor for two of the columns?
Preferably with short and elegant code.
More details:
Problem 1 - how to represent columns and rows. Should I use a data frame, table, vector, matrix, or something else? The idea was to have an object representing every column. For example, to arrive at CF I would want something representing column E and something representing column S, do E + S to arrive at CF, and then take sum(CF).
Problem 2 - this represents one year. I would like to repeat the same "data frame" over t years, with a negative yearly growth n for iso and a positive growth p for St. This is where things get complicated and I lose track.
The red cell is a cash flow. Once I have, say 15 years, of cash flows, I could match them against the initial investment and get the internal rate of return.
(For the sake of the example just consider all columns as fixed numbers)
I have tried brute-forcing it with for-loops, but I ran into lots of strange problems and never really got far.
Does someone have any good ideas?
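A minimal sketch of how problem 2 might be approached (the column names E and S, the growth rates, and the starting values are placeholders, since the original spreadsheet isn't shown): keep one data frame per year in a list, applying the growth factors to the relevant columns at each step.

```r
years <- 15
g_E <- -0.02  # assumed negative yearly growth for column E
g_S <-  0.03  # assumed positive yearly growth for column S

base <- data.frame(E = c(100, 200, 300), S = c(10, 20, 30))  # invented year-1 data

# one data frame per year, growing E and S each iteration
sheets <- vector("list", years)
sheets[[1]] <- base
for (t in 2:years) {
  prev <- sheets[[t - 1]]
  sheets[[t]] <- transform(prev, E = E * (1 + g_E), S = S * (1 + g_S))
}

# yearly cash flow: sum of E + S per year, ready to feed into an IRR calculation
cf <- sapply(sheets, function(df) sum(df$E + df$S))
```

The list-of-data-frames layout keeps each "spreadsheet year" inspectable, while `sapply` collapses the list into the cash-flow vector mentioned in the question.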

looping over a list of filter expressions: problem with NSE in map2 call within mutate
I have defined a list of expressions containing arguments I want to pass to a dplyr::filter call.

library(tidyverse) # using tidyr 1.0.0

cond_filter <- list(expr(1 > 0),  # condition to select all rows
                    expr(Species == "setosa"),
                    expr(Species != "virginica"))
I further have a data frame that I put into a list-column and which I then expand by the number of filter expressions in said list.

iris_nest <- iris %>%
  nest(data = everything()) %>%
  expand_grid(., filters = cond_filter)
In a last step I then want to filter each nested data set based on the filter expressions in my list. Usually I would do this inside a pipe using mutate on the nested tibble and then using map2 to loop over the data sets and filter conditions. However, the calls below are not working; it seems I cannot get the non-standard evaluation to work correctly:

# not working: map2 inside mutate with mapper function
iris_nest %>%
  mutate(data = map2(data, filters, ~ filter(.x, !! .y)))
> Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named, :
>   object '.y' not found

# not working: map2 inside mutate with anonymous function
iris_nest %>%
  mutate(data = map2(data, filters, function(x, y) filter(x, !! y)))
> Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named, :
>   object '.y' not found
If I do it outside of the mutate call everything works fine:
# working: map2 not nested in mutate
iris_nest$data <- map2(iris_nest$data, iris_nest$filters, ~ filter(.x, !! .y))
I would prefer a solution where I can use map2 inside mutate. What would I need to change to get this running? Maybe defining a helper function could make it work, but I somehow lack the imagination of how such a helper function would need to look.
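One possible workaround (a sketch I have not verified against this exact tidyr version): avoid the `!!` inside `mutate` altogether by evaluating each stored expression against its nested data set with `rlang::eval_tidy`, so no unquoting has to happen inside the `mutate` call.

```r
library(rlang)

# evaluate each filter expression in the context of its own nested data frame;
# eval_tidy returns the logical vector that filter() then uses directly
iris_nest <- iris_nest %>%
  mutate(data = map2(data, filters,
                     function(x, y) dplyr::filter(x, eval_tidy(y, data = x))))
```

The idea is that `eval_tidy(y, data = x)` resolves column names such as `Species` inside `x`, sidestepping the quasiquotation conflict between `mutate` and `map2`.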
Lavaan error: number of items to replace is not a multiple of replacement length
I am a bit confused with an error message I get from running a cfa with the lavaan package:
First I enter 3 factors (task, relationship, change) and the underlying items:

yukl.model <- ' task =~ df$yukl_task_1 + df$yukl_task_2 + df$yukl_task_3 +
                        df$yukl_task_4 + df$yukl_task_5
                relationship =~ df$yukl_rel_1 + df$yukl_rel_2 + df$yukl_rel_3 +
                                df$yukl_rel_4 + df$yukl_rel_5 + df$yukl_rel_6
                change =~ df$yukl_change_1 + df$yukl_change_2 + df$yukl_change_3'
And then I try to fit them. (df is the name of my data frame. The items are from that data frame)
fit <- cfa(yukl.model, data = df)
But all I get is this error message: "number of items to replace is not a multiple of replacement length"
There are no NAs in my code and I don't understand where I am trying to "replace" items. Thanks for your help :)
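One likely cause worth checking (an observation, not from the original post): lavaan model syntax expects bare column names, which are looked up in the `data` argument; prefixing them with `df$` can confuse lavaan's internal variable handling. A sketch of the same model with bare names:

```r
library(lavaan)

# same three-factor model, referencing columns by bare name;
# cfa() finds them via the data = df argument
yukl.model <- '
  task         =~ yukl_task_1 + yukl_task_2 + yukl_task_3 + yukl_task_4 + yukl_task_5
  relationship =~ yukl_rel_1 + yukl_rel_2 + yukl_rel_3 + yukl_rel_4 + yukl_rel_5 + yukl_rel_6
  change       =~ yukl_change_1 + yukl_change_2 + yukl_change_3
'
fit <- cfa(yukl.model, data = df)  # df must contain the listed columns
```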

Unable to run legacy code to prepare forecast
I am trying to run this legacy code to prepare a forecast. I am new to R; I have made some changes, but I cannot comprehend most of it and am not able to run it. Can someone help me understand and run it?
Summary:
This code is supposed to read the sales history of multiple products from an Excel file, create a time series, and then forecast using multiple methods; several variables are created to store the forecast values in a matrix. It also computes the MAPE to measure the forecast error, and then writes the forecasted values to an Excel file. The Excel file has only one sheet; I want to forecast each product individually by running a loop. I am trying to use a batch-forecasting technique, and I have around 10,000 products to forecast. For this example I have prepared a demo data set just to show the type of data. The reprex of the code is below. Hope to get a response.
#rm(list = ls())
setwd("d:\\DemandPlanning")
# install.packages("reprex")
reprex::reprex()
#> No input provided and clipboard is not available.
#> Rendering reprex...
library(readxl)
library(fpp2)
#> Loading required package: ggplot2
#> Loading required package: forecast
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Registered S3 methods overwritten by 'forecast':
#>   method             from
#>   fitted.fracdiff    fracdiff
#>   residuals.fracdiff fracdiff
#> Loading required package: fma
#> Loading required package: expsmooth
library(forecast)
library(devtools)
#> Loading required package: usethis
library(ggplot2)
library(fma)
library(expsmooth)
library(tseries)
library(timeSeries)
#> Loading required package: timeDate
library(ggseasonplot)
#> Error in library(ggseasonplot): there is no package called 'ggseasonplot'
library(data.table)
library(Rcpp)
library(xlsx)
library(openxlsx)
#> Attaching package: 'openxlsx'
#> The following objects are masked from 'package:xlsx':
#>     createWorkbook, loadWorkbook, read.xlsx, saveWorkbook, write.xlsx
library(naniar)

# sample data for reprex
sales_data <- data.frame(
  dates = c('Week1', 'Week2', 'Week3', 'Week4', 'Week5', 'Week6',
            'Week7', 'Week8', 'Week9', 'Wk10', 'Wk11'),
  Prod1 = c(92, 94, 60, 71, 26, 89, 52, 151, 14, 12, 19),
  Prod2 = c(56, 63, 90, 76, 47, 65, 65, 61, 85, 48, 64),
  Prod3 = c(106, 154, 118, 103, 90, 140, 100, 112, 88, 44, 80),
  Prod4 = c(311, 256, 231, 125, 78, 50, 32, 15, 15, 12, 10),
  Prod5 = c(492, 468, 391, 368, 331, 342, 343, 441, 341, 352, 376),
  Prod6 = c(102, 193, 223, 169, 192, 114, 269, 106, 128, 139, 118),
  Prod7 = c(96, 56, 23, 109, 135, 150, 132, 145, 72, 77, 155)
)

# time series
tsdataraw <- ts(clean_data, start = c(2013, 4), end = c(2019, 3), frequency = 12)
#> Error in is.data.frame(data): object 'clean_data' not found
tsdata_ <- matrix(NA, nrow = 72, ncol = ncol(tsdataraw))
#> Error in ncol(tsdataraw): object 'tsdataraw' not found
i <- 1
for (i in 1:ncol(tsdataraw)) {
  tryCatch({
    tsdata_[, i] <- tsclean(tsdataraw[, i])
  }, error = function(e){cat("ERROR", "SKU", j, "ABU", i, "\n")})
}
#> Error in ncol(tsdataraw): object 'tsdataraw' not found
tsdata <- ts(tsdata_, start = c(2013, 4), end = c(2019, 3), frequency = 12)
#> Error in is.data.frame(data): object 'tsdata_' not found
fcts <- window(tsdata, start = c(2013, 4), end = c(2018, 3))
#> Error in window(tsdata, start = c(2013, 4), end = c(2018, 3)): object 'tsdata' not found
fcts2 <- window(tsdata, start = c(2018, 4))
#> Error in window(tsdata, start = c(2018, 4)): object 'tsdata' not found

fc_ets              <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_arima            <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_autoarima        <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_naive            <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_rwdrift          <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_tbats            <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_hwadditive       <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_hwmultiplicative <- matrix(NA, nrow = 12, ncol = ncol(fcts))
fc_stlf             <- matrix(NA, nrow = 12, ncol = ncol(fcts))
ma_    <- matrix(NA, nrow = 60, ncol = ncol(fcts))
wma_   <- matrix(NA, nrow = 60, ncol = ncol(fcts))
fc_ma  <- matrix(NA, nrow = 1, ncol = ncol(fcts))
fc_wma <- matrix(NA, nrow = 1, ncol = ncol(fcts))
accuracy_ets       <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_arima     <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_autoarima <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_drift     <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_naive     <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_tbats     <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_hwa       <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_hwm       <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_stlf      <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_ma        <- matrix(NA, nrow = ncol(fcts), ncol = 7)
accuracy_wma       <- matrix(NA, nrow = ncol(fcts), ncol = 7)
fitts  <- list(ncol(fcts))
mapefc <- matrix(NA, nrow = ncol(fcts), ncol = 11)
#> (each allocation above fails with the same error:
#>  Error in ncol(fcts): object 'fcts' not found)
skus <- matrix(NA, nrow = 12, ncol = length(sku))
#> Error in matrix(NA, nrow = 12, ncol = length(sku)): object 'sku' not found
skus[, j] <- c(sku[j])
#> Error in eval(expr, envir, enclos): object 'sku' not found

i <- 1
for (i in 1:ncol(fcts)) {
  if (sum(is.na(tsdata_[, i])) == 72) {next}
  tryCatch({
    fitts[[i]]        <- stl(fcts[, i], s.window = 36)
    fc_ets[, i]       <- forecast(fitts[[i]], method = "ets", h = 12)$mean
    fc_arima[, i]     <- forecast(fitts[[i]], method = "arima", h = 12)$mean
    fc_autoarima[, i] <- forecast(auto.arima(fcts[, i]), h = 12)$mean
    fc_rwdrift[, i]   <- forecast(fitts[[i]], method = "rwdrift", h = 12)$mean
    fc_naive[, i]     <- forecast(fitts[[i]], method = "naive", h = 12)$mean
    fc_tbats[, i]     <- forecast(tbats(fcts[, i]), h = 12)$mean
    fc_hwadditive[, i] <- forecast(HoltWinters(fcts[, i], seasonal = "additive"),
                                   h = 12, prediction.interval = TRUE)$mean
    fc_hwmultiplicative[, i] <- forecast(HoltWinters(fcts[, i], seasonal = "multiplicative"),
                                         h = 12, prediction.interval = TRUE)$mean
    fc_stlf[, i] <- stlf(fcts[, i], h = 12)$mean
    ma_[, i]  <- filter(fcts[, i], filter = (1/3) * c(0, 1, 1, 1), sides = 1)
    wma_[, i] <- filter(fcts[, i], filter = (1/10) * c(0, 5, 3, 2), sides = 1)
    fc_ma[, i]  <- forecast(ma_[, i], h = 1)$mean
    fc_wma[, i] <- forecast(wma_[, i], h = 1)$mean
    accuracy_ets[i, ]       <- accuracy(fc_ets[, i], fcts2[, i])
    accuracy_arima[i, ]     <- accuracy(fc_arima[, i], fcts2[, i])
    accuracy_autoarima[i, ] <- accuracy(fc_autoarima[, i], fcts2[, i])
    accuracy_drift[i, ]     <- accuracy(fc_rwdrift[, i], fcts2[, i])
    accuracy_naive[i, ]     <- accuracy(fc_naive[, i], fcts2[, i])
    accuracy_tbats[i, ]     <- accuracy(fc_tbats[, i], fcts2[, i])
    accuracy_hwa[i, ]       <- accuracy(fc_hwadditive[, i], fcts2[, i])
    accuracy_hwm[i, ]       <- accuracy(fc_hwmultiplicative[, i], fcts2[, i])
    accuracy_stlf[i, ]      <- accuracy(fc_stlf[, i], fcts2[, i])
    accuracy_ma[i, ]        <- accuracy(ma_[, i], fcts[, i])
    accuracy_wma[i, ]       <- accuracy(wma_[, i], fcts[, i])
    mapefc[i, ] <- c(accuracy_ets[i, 5], accuracy_arima[i, 5], accuracy_autoarima[i, 5],
                     accuracy_drift[i, 5], accuracy_naive[i, 5], accuracy_tbats[i, 5],
                     accuracy_hwa[i, 5], accuracy_hwm[i, 5], accuracy_stlf[i, 5],
                     accuracy_ma[i, 5], accuracy_wma[i, 5])
    df_1 <- data.frame(SKU = sales_data[, j], ABU = sales_data[i],
                       Date = c("01-04-2018", "01-05-2018", "01-06-2018", "01-07-2018",
                                "01-08-2018", "01-09-2018", "01-10-2018", "01-11-2018",
                                "01-12-2018", "01-01-2019", "01-02-2019", "01-03-2019"),
                       ETS = fc_ets[, i], Arima = fc_arima[, i], Auto.Arima = fc_autoarima[, i],
                       Drift = fc_rwdrift[, i], Naive = fc_naive[, i], TBATS = fc_tbats[, i],
                       HoltWinters_Additive = fc_hwadditive[, i],
                       HoltWinters_Multiplicative = fc_hwmultiplicative[, i],
                       STLF = fc_stlf[, i], MA = fc_ma[, i], WMA = fc_wma[, i])
    df_2 <- data.frame(SKU = sales_data[j], ABU = sales_data[i], Date = "MAPE",
                       ETS = mapefc[i, 1], Arima = mapefc[i, 2], Auto.Arima = mapefc[i, 3],
                       Drift = mapefc[i, 4], Naive = mapefc[i, 5], TBATS = mapefc[i, 6],
                       HoltWinters_Additive = mapefc[i, 7],
                       HoltWinters_Multiplicative = mapefc[i, 8],
                       STLF = mapefc[i, 9], MA = mapefc[i, 10], WMA = mapefc[i, 11])
    df_ <- rbind(df_1, df_2)
  }, error = function(e){cat("ERROR", "SKU", j, "ABU", i, "\n")})
  write.table(list(df_), file = "C:/Users/forecasted_data.csv", append = TRUE,
              quote = FALSE, sep = ",", col.names = TRUE, row.names = FALSE)
}
#> Error in ncol(fcts): object 'fcts' not found
Created on 2019-11-15 by the reprex package (v0.3.0)

How to correctly transform variables in linear models using residual plots?
So I have a linear model as follows;
library(car)  # for residualPlots()

Mod <- lm(PRICPTM ~ DISTANCE + PCTLOAD + ORIGIN + MARKET + DEREG + PRODUCT, data = x)
summary(Mod)
residualPlots(model = Mod)
Data: https://docs.google.com/spreadsheets/d/1lz5ll6w9mUSPCi6aKO1wONbr4Kj9lVezyGcH7gBg/edit?usp=sharing
The residual plots didn't look linear and centred about the mean, so I transformed the dependent variable by trial and error to the following:

Mod <- lm(log(PRICPTM) ~ DISTANCE + PCTLOAD + ORIGIN + MARKET + DEREG + PRODUCT, data = x)
residualPlot(model = Mod)
So now this one looks good. But the partial residual plots are still off.
What I need to do is:
1) transform the independent variables to correct the model;
2) understand whether there is any standard technique for choosing transformations from the residual plots (i.e., if the residual plot slopes upwards, do this; if it increases exponentially, do that; etc.).
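One common, semi-automatic approach (a sketch, not from the original post; it assumes the car package and strictly positive continuous predictors): the Box-Tidwell procedure estimates power transformations for continuous predictors, and component+residual plots help judge which terms still look non-linear.

```r
library(car)

# Box-Tidwell suggests power transformations for positive continuous predictors;
# factor predictors such as ORIGIN/MARKET/DEREG/PRODUCT go into 'other.x'
boxTidwell(log(PRICPTM) ~ DISTANCE + PCTLOAD,
           other.x = ~ ORIGIN + MARKET + DEREG + PRODUCT, data = x)

# component + residual plots for a candidate model (log(DISTANCE) is just an
# illustrative choice, not a result from the data)
Mod2 <- lm(log(PRICPTM) ~ log(DISTANCE) + PCTLOAD + ORIGIN + MARKET + DEREG + PRODUCT,
           data = x)
crPlots(Mod2)
```

If a crPlot's smoother curves away from the fitted line, that predictor is a candidate for the transformation Box-Tidwell suggests.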

How do AUC/Sensitivity/Specificity values differ in a k fold cross validation vs that in a development set?
For reproducibility, I have the following code in R, using the TitanicSurvival data set (10-fold cross validation repeated 3 times).
Also, I am new to the forums and quite new to R, so I apologize in advance if I've broken any forum guidelines. I'll correct any mistakes if noted.
library(caret)
library(pROC)

df  <- TitanicSurvival
df2 <- na.omit(df)

# Multiple regression model on df2:
titanic <- glm(survived ~ sex + age, family = binomial, data = df2)
roc(df2$survived, titanic$fitted.values)  # AUC = 0.7735

pred   <- ifelse(predict(titanic, df2, type = "response") > 0.5, 1, 0)
actual <- titanic$y
conf_mat <- table(pred, actual)
sensitivity(conf_mat)  # 0.8449111
specificity(conf_mat)  # 0.6838407

# Now, using 10-fold cross validation repeated 3 times, on the same df2 data set:
ctrl <- trainControl(method = "repeatedcv", repeats = 3, number = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary,
                     savePredictions = TRUE)
model <- train(survived ~ sex + age, data = df2, trControl = ctrl,
               method = "glm", preProc = c("center", "scale"), metric = "ROC")
# Model ROC (AUC): 0.7738766, Sens: 0.8449321, Spec: 0.6840347
Now, my questions are:
 How are the AUC/Sens/Spec values in the cross-validation model calculated? I'm trying to understand this output, considering that the 10-fold CV creates multiple training and testing sets.
 How do the AUC/Sens/Spec values in the cross-validation model differ from those of the "titanic" model?
 In the medical literature, some studies perform multiple regression with k-fold cross validation without an external validation set. In this particular study, when they write "model accurately distinguished presence of HCC with c-statistics of 0.84 (95% CI 0.81-0.86) and 0.83 (95% CI 0.80-0.85) in derivation and validation cohorts (Figure 1), respectively", what does this mean? Since k-fold CV creates multiple training sets and testing sets, how does one come up with these respective values? source
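Regarding where caret's summary numbers come from (a sketch continuing the code above): each reported ROC/Sens/Spec value is the mean of the per-resample values that caret stores in `model$resample`, one row per held-out fold (30 rows here: 10 folds x 3 repeats).

```r
# per-fold metrics: one row per held-out fold (10 folds x 3 repeats = 30 rows)
head(model$resample)

# the values printed for the model are the means across resamples
colMeans(model$resample[, c("ROC", "Sens", "Spec")])
```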
Sorry for the long questions, but I've been doing a lot of searching and still can't get a clear understanding.
Thanks!

Getting column names after a pipeline for transformed data
My data has both numerical and categorical variables, so I used ColumnTransformer inside a pipeline with a LogisticRegression estimator.
As a result, I can get the transformed-variable coefficients. I can also get the intercept and the predicted y values.
But I also need to know which features correspond to these coefficients, and I couldn't make that happen; I tried every possibility. As you know, fit_transform returns an array (the transformed input data used for training) without column names, which makes it worthless for interpretation and analysis of the situation.
Can you help me improve my code to get columns list with the same order of coefficients as well as transformed data?
logit = LogisticRegression(C=1000, penalty='l1', tol=0.001)

# numeric_pipe = make_pipeline(MinMaxScaler(feature_range=(-1, 1)))
numeric_pipe = Pipeline(steps=[('minmax', MinMaxScaler(feature_range=(-1, 1)))])
# categoric_pipe = make_pipeline(OneHotEncoder(sparse=True, handle_unknown='ignore'))
categoric_pipe = Pipeline(steps=[('onehot', OneHotEncoder(sparse=True, handle_unknown='ignore'))])

preprocessor = ColumnTransformer(transformers=[('num', numeric_pipe, num_cols),
                                               ('cat', categoric_pipe, cat_cols)])
pipe_final = Pipeline(steps=[('prep', preprocessor), ('log', logit)])

pipe_final.fit(data_x, data_y.values.ravel())
print(pipe_final.named_steps['prep'].get_feature_names())

trans = pipe_final.named_steps['prep'].fit(data_x).transform(data_x)
trans_feat = pipe_final.named_steps['prep'].fit(data_x).transform(data_x)

print(data_x.info())
print(confusion_matrix(data_y, pipe_final.predict(data_x)))
print(classification_report(data_y, pipe_final.predict(data_x)))
print(roc_auc_score(data_y, pipe_final.predict(data_x)))

data_y['predicted'] = pipe_final.predict(data_x)
data_final = data_x.join(data_y)

print(pipe_final.named_steps['log'].classes_)
print(pipe_final.named_steps['log'].coef_)
print(pipe_final.named_steps['log'].intercept_)

What is the main difference between linear regression and logistic regression in terms of MSE, R-squared, or MAE?
I have seen many answers on Stack Overflow, and according to one answer which I found valid: linear regression uses the ordinary least squares method to minimise the errors and arrive at the best possible fit, while logistic regression uses the maximum likelihood method to arrive at the solution. Linear regression is usually solved by minimizing the least squares error of the model on the data, so large errors are penalized quadratically. Logistic regression is just the opposite: using the logistic loss function causes large errors to be penalized by an asymptotically constant amount.
So even though the two use different approaches to penalize errors, is it possible for them to have a huge difference in their results, even though the data has been normalised for both models (using sklearn)?
Results from multiple linear regression: MSE = 37.27418021905559, MAE = 30.04448048489848. Logistic regression: MSE = 76.70855954013724, MAE = 61.96228295828829.

How to plot estimates from different models in one plot
I'm performing a linear mixed-effects analysis for two different drugs. I used a contrast statement to determine the impact of time.
I'm trying to plot estimates from the different models in one plot. This is the output of model 1:

confint(model1)
       Estimate       lwr      upr
1 == 0 2.969735  1.846697 5.786166
2 == 0 4.163577  0.646791 5.680363
3 == 0 2.193842 -3.377740 5.765425
the output from model 2
Linear Hypotheses:
       Estimate      lwr     upr
1 == 0  65.0877  60.5934 65.5820
2 == 0  65.8362  62.9191 65.7532
3 == 0   0.9484  -4.6095  6.1064

I'm trying to plot the estimates and CIs of model 1 (3 == 0: 2.193842 [-3.377740, 5.765425]) and model 2 (3 == 0: 0.9484 [-4.6095, 6.1064]) in one plot. How can you do that?
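A common approach (a sketch with the two "3 == 0" rows hard-coded from the outputs above): collect the estimates and confidence limits from both models into one data frame and draw them with ggplot2's geom_pointrange.

```r
library(ggplot2)

# estimates and 95% CIs for contrast "3 == 0" from the two model outputs above
est <- data.frame(
  model    = c("model 1", "model 2"),
  estimate = c(2.193842, 0.9484),
  lwr      = c(-3.377740, -4.6095),
  upr      = c(5.765425, 6.1064)
)

ggplot(est, aes(x = model, y = estimate)) +
  geom_pointrange(aes(ymin = lwr, ymax = upr)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(y = "Estimate (95% CI)")
```

The same data frame can be extended with one row per contrast if all three contrasts from each model should appear together.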

Permutation test error for likelihood ratio test of mixed model in R: permlmer, lmer, lme4, predictmeans
I would like to test the main effect of a categorical variable using a permutation test on a likelihood ratio test. I have a continuous outcome and a dichotomous grouping predictor and a categorical time predictor (Day, 5 levels).
Data is temporarily available in rda format via this Drive link.
library(lme4)
lmer1 <- lmer(outcome ~ Group*Day + (1 | ID), data = data, REML = FALSE, na.action = na.exclude)
lmer2 <- lmer(outcome ~ Group + (1 | ID), data = data, REML = FALSE, na.action = na.exclude)

library(predictmeans)
permlmer(lmer2, lmer1)
However, this code gives me the following error:
Error in density.default(c(lrtest1, lrtest), kernel = "epanechnikov") : need at least 2 points to select a bandwidth automatically
The following code does work, but I believe it does not exactly give me the outcome of a permuted LR test:

library(nlme)
lme1 <- lme(outcome ~ Genotype*Day, random = ~1 | ID, data = data, na.action = na.exclude)

library(pgirmess)
PermTest(lme1)
Can anyone point out why I get the "epanechnikov" error when using the permlmer function? Thank you!

Partial R² for fixed terms in GLMM
I am trying to find a statistically sound and correct way to calculate the effect size of fixed terms in generalized linear mixed models, some with Poisson, others with Gamma, and others with binomial distributions.
I run my analysis using glmer() from the package 'lme4', which fits via maximum likelihood. Now I want to calculate the effect size of each fixed term included in the models, and for this it seems that the only available option for GLMMs is the r2beta() function in the 'r2glmm' package, which calculates standardized measures of effect size. However, in the case of GLMMs this function only supports models fitted with glmmPQL(), which applies penalized quasi-likelihood.
Models fitted with glmer() and with glmmPQL() give exactly the same estimates for each variable, but for the glmmPQL() models the Std. Error is higher. Therefore, I would like to know whether it would be reasonable to fit the models with glmmPQL() solely for the purpose of reporting the effect size of fixed terms.
Here is an example of the results of both models:
### Fitted with glmer()
m <- glmer(Spr ~ crop*area + (1 | Population), family = poisson, data = spr_land)

Random effects:
 Groups     Name        Variance  Std.Dev.
 Population (Intercept) 7.407e-17 8.606e-09
Number of obs: 146, groups: Population, 6

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.97909    0.35360   2.769  0.00562 **
crop         0.09118    0.88818   0.103  0.91824
area         0.39257    0.43540   0.902  0.36725
crop:area    3.47514    1.63213   2.129  0.03324 *

### Fitted with glmmPQL()
m <- glmmPQL(Spr ~ crop*area, random = ~1 | Population, family = poisson, data = spr_land)

Random effects:
 Formula: ~1 | Population
         (Intercept)  Residual
StdDev: 3.640652e-05 0.9014434

Variance function:
 Structure: fixed weights
 Formula: ~invwt

Fixed effects: Spr ~ crop * area
               Value Std.Error  DF   t-value p-value
(Intercept) 0.979089 0.3231787 140 3.0295576  0.0029
crop        0.091171 0.8117710   2 0.1123107  0.9208
area        0.392572 0.3979443   2 0.9865004  0.4279
crop:area   3.475174 1.4917642   2 2.3295733  0.1452
Additionally, in a very useful summary about GLMMs by Ben Bolker, I read that one of the most parsimonious ways to calculate effect size would be simply to compute the correlation between the values of the predictor of interest and the predicted values of the response.
So, any hint on which would be the most appropriate way?
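The correlation-based measure mentioned above can be sketched in a couple of lines (hedged: this follows the general idea from Bolker's GLMM summary rather than a function from a specific package; m is the glmer fit from the example above):

```r
# effect size of 'crop' as the correlation between the predictor of interest
# and the model's predicted values
pred <- predict(m, type = "response")
r_crop <- cor(spr_land$crop, pred)
r_crop^2  # squared, a rough partial variance-explained for that term
```

For a categorical predictor it would first need to be numerically coded (e.g. 0/1) before taking the correlation.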
Many thanks in advance!