mlr: creating plotBMRBoxplots for only one of the learners
Does anyone know whether it is possible to create the plots integrated in the mlr package for only one of the learners?
For example:
BMR_Boxplot <- plotBMRBoxplots(bmr, measure = mse)
BMR_Boxplot
Looking at the arguments, I don't see a way to choose one specific learner. Is there any known workaround?
Many thanks!
1 answer

If you subset your bmr object to the results of only one learner, it is easily possible.
It might be nice to have this as a feature. Example code for subsetting to the first learner:

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 5L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas)
bmr$results[[2]] = NULL
bmr$learners[[2]] = NULL
plotBMRBoxplots(bmr, ber, style = "violin")
See also questions close to this topic

Rcpp::Function in parallel for section
I am trying to parallelize a for loop that computes the fitness value of individuals. For this whole algorithm I am using Rcpp, but the fitness function is passed from R.
So I am trying to do something like this:
#pragma omp parallel for
for (int i = 0; i < population.size(); i++) {
    population[i].computeFitness(FitnessFunction);
}
where FitnessFunction is an Rcpp::Function and computeFitness is just a class method that assigns the computed value to a member variable:
void computeFitness(Rcpp::Function optFunction) {
    this->_fitness = Rcpp::as<double>(optFunction(this->_coords));
}
But this crashes because, as I now know, R is single-threaded and I cannot use any underlying R instances in parallel sections.
So is there any way to convert an Rcpp::Function to a std::function, a functor, or something similar? Is there any other way to pass a function from R to Rcpp that would allow me to parallelize the computation of this fitness value?
This work is for creating a parallel optimization package implementing the Moth Search Algorithm for CRAN.
Basically the same code in C++ with std::function works well. The Rcpp code also works fine when it is not run in parallel.
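One common workaround for questions like this (a sketch under assumptions: `Individual`, `coords` and the function names are hypothetical stand-ins for the question's class) is to keep every call to the `Rcpp::Function` on the master thread, cache the results as plain doubles, and parallelize only the pure C++ work. An `Rcpp::Function` cannot safely be wrapped into a thread-usable `std::function`, because the call would still enter the single-threaded R interpreter.

```cpp
#include <Rcpp.h>
#include <vector>
// [[Rcpp::plugins(openmp)]]
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the question's individual class.
struct Individual {
    std::vector<double> coords;
    double fitness = 0.0;
};

void computeAllFitness(std::vector<Individual>& population, Rcpp::Function f) {
    // Serial section: every callback into R stays on the master thread.
    std::vector<double> values(population.size());
    for (std::size_t i = 0; i < population.size(); ++i)
        values[i] = Rcpp::as<double>(f(Rcpp::wrap(population[i].coords)));

    // Parallel section: touches only plain C++ data, no R objects.
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(population.size()); ++i)
        population[i].fitness = values[i];
}
```

If the fitness evaluation itself must run in parallel, it would have to be supplied as compiled code (e.g. an external pointer to a C++ function) rather than as an R closure; RcppParallel has the same restriction on calling back into R.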

lapply on xml format (R language)
The code works with other APIs but not this one.
Normally it turns the data from "https://api.coinnest.co.kr/api/pub/trades?coin=btc&since=710000" into a data.frame like this:
      date   price amount    tid type
1535599861 5750000 0.0342 710001  buy
1535599854 5750010 0.0312 710002 sell
1535599832 5750030 0.0442 710003  buy
code:
library(dplyr)
library(httr)

query = "https://api.coinnest.co.kr/api/pub/trades?coin=btc"
real = paste(query, "&since=710000", sep = "")
out = content(GET(url = real))
coinnest <- lapply(out, function(x) {
  df <- data.frame(date = x$date, price = x$price, tid = x$tid,
                   amount = x$amount, type = x$type)
}) %>% bind_rows()
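One thing worth checking (a hedged sketch; I have not run it against this endpoint): if the API returns JSON, jsonlite::fromJSON simplifies an array of records straight into a data.frame, which sidesteps the lapply entirely. If the endpoint instead returns XML or wraps the trades in an extra element, content() hands back a different structure and the x$date-style indexing fails, so inspecting str(out) first shows which case you are in.

```r
library(jsonlite)

url <- "https://api.coinnest.co.kr/api/pub/trades?coin=btc&since=710000"
# fromJSON simplifies [{"date":..., "price":...}, ...] to a data.frame
trades <- fromJSON(url)
str(trades)
```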

Reactively filtering/subsetting a data frame in shiny
I want to filter a data frame such that I only choose rows where a categorical variable contains values chosen by the user of the shiny app.
Here is a reproducible example:
## app.R ##
library(shiny)
library(shinydashboard)
library(tidyverse)
library(DT)

ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput("cut", "Type of cut",
                c("All", as.character(unique(diamonds$cut))),
                selected = "All", multiple = TRUE)
  ),
  dashboardBody(
    DT::dataTableOutput("table")
  )
)

server <- function(input, output) {
  selectdata <- reactive({
    diamonds %>%
      filter(ifelse(any(input$cut == "All"),
                    cut %in% unique(diamonds$cut),
                    cut %in% input$cut))
  })
  output$table <- DT::renderDT({
    selectdata()
  }, options = list(scrollX = TRUE))
}

shinyApp(ui, server)
The app runs without error, but when a user removes "All" and chooses e.g. "Premium" and "Good", nothing shows up. However, when a user chooses "Ideal", all of the rows show up. I cannot see what I am doing wrong.
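A likely explanation (an untested sketch): `ifelse()` is vectorized over its first argument, and `any(input$cut == "All")` has length one, so `ifelse()` returns a length-one logical that `filter()` then recycles; effectively only the first row's `cut` value is ever tested, which would explain why "Ideal" (the cut of the first row of `diamonds`) shows everything while other choices show nothing. An ordinary `if`/`else` inside the reactive avoids this:

```r
server <- function(input, output) {
  selectdata <- reactive({
    # plain if/else: return the whole data set or a filtered subset
    if ("All" %in% input$cut) {
      diamonds
    } else {
      diamonds %>% filter(cut %in% input$cut)
    }
  })
  output$table <- DT::renderDT({
    selectdata()
  }, options = list(scrollX = TRUE))
}
```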

cannot visualise using ggplot with NA values
I am trying to create a stacked bar chart showing the % frequency of occurrences by group.
library(dplyr)
library(ggplot2)

brfss_2013 %>%
  group_by(incomeLev, mentalHealth) %>%
  summarise(count_mentalHealth = n()) %>%
  group_by(incomeLev) %>%
  mutate(count_inc = sum(count_mentalHealth)) %>%
  mutate(percent = count_mentalHealth / count_inc * 100) %>%
  ungroup() %>%
  ggplot(aes(x = forcats::fct_explicit_na(incomeLev),
             y = count_mentalHealth,
             group = mentalHealth)) +
  geom_bar(aes(fill = mentalHealth), stat = "identity") +
  geom_text(aes(label = sprintf("%0.1f%%", percent)),
            position = position_stack(vjust = 0.5))
However, this is the traceback I receive:
1. dplyr::group_by(., incomeLev, mentalHealth)
8. plyr::summarise(., count_mentalHealth = n())
9. [ base::eval(...) ] with 1 more call
11. dplyr::n()
12. dplyr:::from_context("..group_size")
13. `%>%`(...)
In addition: Warning message:
Factor `incomeLev` contains implicit NA, consider using `forcats::fct_explicit_na`
Here is a sample of my data
brfss_2013 <- structure(list(
  incomeLev = structure(c(2L, 3L, 3L, 2L, 2L, 3L, NA, 2L, 3L, 1L, 3L, NA),
    .Label = c("$25,000-$35,000", "$50,000-$75,000", "Over $75,000"),
    class = "factor"),
  mentalHealth = structure(c(3L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("Excellent", "Ok", "Very Bad"), class = "factor")),
  row.names = c(NA, 12L), class = "data.frame")
Update:
Output of str(brfss_2013):
'data.frame': 491775 obs. of 9 variables:
 $ mentalHealth: Factor w/ 5 levels "Excellent","Good",..: 5 1 1 1 1 1 3 1 1 1 ...
 $ pa1min_     : int 947 110 316 35 429 120 280 30 240 260 ...
 $ bmiLev      : Factor w/ 6 levels "Underweight",..: 5 1 3 2 5 5 2 3 4 3 ...
 $ X_drnkmo4   : int 2 0 80 16 20 0 1 2 4 0 ...
 $ X_frutsum   : num 413 20 46 49 7 157 150 67 100 58 ...
 $ X_vegesum   : num 53 148 191 136 243 143 216 360 172 114 ...
 $ sex         : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
 $ X_state     : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ incomeLev   : Factor w/ 4 levels "$25,000-$35,000",..: 2 4 4 2 2 4 NA 2 4 1 ...
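Two things stand out in the traceback (a hedged sketch, not tested on the full data): `plyr::summarise` is being picked up instead of `dplyr::summarise`, which happens when plyr is attached after dplyr and breaks `n()`, and `incomeLev` contains implicit NA, as the warning says. Qualifying the calls and making the NA level explicit before grouping should address both:

```r
library(dplyr)
library(ggplot2)

brfss_2013 %>%
  # make the NA level explicit up front, as the warning suggests
  mutate(incomeLev = forcats::fct_explicit_na(incomeLev)) %>%
  group_by(incomeLev, mentalHealth) %>%
  # qualify summarise/n so plyr's versions cannot shadow them
  dplyr::summarise(count_mentalHealth = dplyr::n()) %>%
  group_by(incomeLev) %>%
  mutate(count_inc = sum(count_mentalHealth),
         percent = count_mentalHealth / count_inc * 100) %>%
  ungroup() %>%
  ggplot(aes(x = incomeLev, y = count_mentalHealth, group = mentalHealth)) +
  geom_bar(aes(fill = mentalHealth), stat = "identity") +
  geom_text(aes(label = sprintf("%0.1f%%", percent)),
            position = position_stack(vjust = 0.5))
```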

How can I include several qqplots with ggplot2 using facet_wrap
I am using this code, and I can get a qqplot for each column of my data. However, I do not know how to combine the qqplots with facet_wrap in ggplot2.
The data
> head(X)
  V1   V2   V3   V4   V5
1  1 1889 1651 1561 1778
2  2 2493 2048 2087 2197
3  3 2119 1700 1815 2222
4  4 1645 1627 1110 1533
5  5 1976 1916 1614 1883
6  6 1712 1712 1439 1546
I need to do a qqplot for V2, V3, V4 and V5. I am using this code for each variable:
q1 <- qqnorm(X$V2, pch = 20, main = "QQPlot for V2")
Also I can do it like this:
ggplot(X, aes(sample = V2)) + stat_qq() + stat_qq_line()
I do not know how to use facet_wrap in ggplot2 to combine all the qqplots.
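One common approach (a sketch; pivot_longer needs tidyr >= 1.0, older versions would use gather) is to reshape V2:V5 into long format and facet on the variable name:

```r
library(tidyr)
library(ggplot2)

# reshape to long format: one row per (variable, value) pair
X_long <- pivot_longer(X, cols = V2:V5,
                       names_to = "variable", values_to = "value")

ggplot(X_long, aes(sample = value)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ variable)   # one qqplot panel per column
```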

ggplot geom vline between two dates on x axis
Is there any way to place a geom_vline between two dates on the x axis? For example, like the red line in the picture below.
My data is something like the example below, where the date range can be dynamic in length.
df <- data.frame(Date = seq(as.Date("2019-01-11"), as.Date("2019-01-20"), by = "days"),
                 value = runif(10, 0, .99))
ggplot(data = df, aes(x = Date, y = value)) +
  geom_line() +
  scale_x_date(date_labels = "%d-%b-%y", date_breaks = "1 day") +
  geom_vline(aes(xintercept = df[["Date"]][5]),
             linetype = "dotted", col = "blue", size = 1.5)
I tried with position dodge
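Since a Date axis is numeric underneath (days since 1970-01-01), one way to land a vline between two dates (a sketch, not a definitive answer) is to use the midpoint between them as the intercept:

```r
library(ggplot2)

df <- data.frame(Date = seq(as.Date("2019-01-11"), as.Date("2019-01-20"), by = "days"),
                 value = runif(10, 0, .99))

ggplot(df, aes(x = Date, y = value)) +
  geom_line() +
  scale_x_date(date_labels = "%d-%b-%y", date_breaks = "1 day") +
  # halfway between the 5th and 6th dates: +0.5 on the numeric day scale
  geom_vline(xintercept = as.numeric(df$Date[5]) + 0.5,
             linetype = "dotted", col = "red", size = 1.5)
```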

Correct usage of mlr::mergeBenchmarkResults
I'm trying to resolve an error I'm facing with mlr::mergeBenchmarkResults, which is:

Error in mergeBenchmarkResults(bmrs = list(bmr, bmr_no_mos)):
  The following task - learner combination(s) occur either multiple times or are missing:
  * wnv_no_mos - rpart
  * wnv_no_mos - rf
  * wnv - rf_no_mos
  * wnv - xgb_no_mos
  * wnv - extraTrees_no_mos
Traceback:
1. mergeBenchmarkResults(bmrs = list(bmr, bmr_no_mos))
2. stopf("The following task - learner combination(s) occur either multiple times or are missing: \n* %s\n", msg)
In a nutshell: I have two tasks, two learner sets, and two benchmark objects, and I wish to combine the two benchmark objects using mlr::mergeBenchmarkResults.

tsk = makeClassifTask(data = wnv, target = "y", positive = "Infected")
lrns = list(
  makeLearner(id = "rpart", cl = "classif.rpart", predict.type = "prob"),
  makeLearner(id = "rf", cl = "classif.randomForest", predict.type = "prob"))
bmr = benchmark(learners = lrns, tasks = tsk, resamplings = rdesc,
                measures = meas, show.info = TRUE)

tsk_no_mos = makeClassifTask(data = wnv_no_mos, target = "y", positive = "Infected")
lrns_2 = list(
  makeLearner(id = "rf_no_mos", cl = "classif.randomForest", predict.type = "prob"),
  makeLearner(id = "xgb_no_mos", cl = "classif.xgboost", predict.type = "prob", nthread = 25),
  makeLearner(id = "extraTrees_no_mos", cl = "classif.extraTrees", predict.type = "prob", numThreads = 25))
bmr_no_mos = benchmark(learners = lrns_2, tasks = tsk_no_mos, resamplings = rdesc,
                       measures = meas, show.info = TRUE)

mergeBenchmarkResults(bmrs = list(bmr, bmr_no_mos))
What am I doing wrong?
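From the error text, mergeBenchmarkResults appears to require that the merged objects together cover every task-learner combination exactly once, whereas here each benchmark uses a different task and a different learner set, so the combined grid is full of holes. One workaround (a sketch, untested) is to run each learner list on both tasks so that the union forms a complete grid:

```r
# sketch: both benchmarks now share the same task list, so the merged
# result covers every task-learner pair exactly once
bmr_a <- benchmark(learners = lrns, tasks = list(tsk, tsk_no_mos),
                   resamplings = rdesc, measures = meas)
bmr_b <- benchmark(learners = lrns_2, tasks = list(tsk, tsk_no_mos),
                   resamplings = rdesc, measures = meas)
merged <- mergeBenchmarkResults(bmrs = list(bmr_a, bmr_b))
```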

MLR package R - Change formula for GLM
I am new to the mlr package. I am trying to change the formula for a glm that I am fitting using the mlr package.
I am fitting my logistic regression using the below code.
# logistic regression
logistic.learner <- makeLearner("classif.logreg", predict.type = "response")
# cross validation (cv) accuracy
cv.logistic <- crossval(learner = logistic.learner, task = trainTask, iters = 3,
                        stratify = TRUE, measures = acc, show.info = F)
# cross validation accuracy
cv.logistic$aggr
cv.logistic$measures.test
# train model
fmodel <- train(logistic.learner, trainTask)
getLearnerModel(fmodel)
The following is my output. Clearly not all features are important, and I only want to use a few by tweaking my glm formula. But I don't know how to change that setting using the mlr package.
> summary(fmodel$learner.model)

Call:
stats::glm(formula = f, family = "binomial", data = getTaskData(.task, .subset),
    weights = .weights, model = FALSE)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.3484  0.3611  0.5153  0.7130  2.5401

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              3.024e+00  1.137e+00   2.660 0.007810 **
GenderMale               2.469e-03  3.027e-01   0.008 0.993492
MarriedYes               5.911e-01  2.558e-01   2.311 0.020851 *
Dependents1              4.398e-01  3.005e-01   1.463 0.143402
Dependents2              3.120e-01  3.517e-01   0.887 0.374985
Dependents3              8.299e-03  4.246e-01   0.020 0.984407
EducationNot Graduate    4.421e-01  2.663e-01   1.660 0.096877 .
Self_EmployedYes         3.111e-02  3.250e-01   0.096 0.923736
ApplicantIncome          3.549e-05  4.886e-05   0.726 0.467542
CoapplicantIncome        3.083e-05  6.131e-05   0.503 0.615105
LoanAmount               2.748e-03  2.756e-03   0.997 0.318682
Loan_Amount_Term         2.254e-03  2.281e-03   0.988 0.322916
Credit_History1          4.066e+00  4.373e-01   9.296  < 2e-16 ***
Property_AreaSemiurban   9.163e-01  2.725e-01   3.362 0.000774 ***
Property_AreaUrban       2.191e-01  2.642e-01   0.829 0.406880
Gender.dummy1            2.910e-01  7.389e-01   0.394 0.693675
Dependents.dummy1        2.670e-01  8.188e-01   0.326 0.744307
Self_Employed.dummy1     1.584e-01  4.418e-01   0.358 0.719984
LoanAmount.dummy0        9.821e-01  5.160e-01   1.903 0.056996 .
Loan_Amount_Term.dummy1  9.370e-01  8.666e-01   1.081 0.279623
Credit_History.dummy1    1.271e-01  3.675e-01   0.346 0.729438
Income_by_loan           2.112e-03  5.259e-03   0.402 0.687931
Loan_amount_by_term      1.978e-01  2.570e-01   0.770 0.441523
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 762.89 on 613 degrees of freedom
Residual deviance: 551.83 on 591 degrees of freedom
AIC: 597.83

Number of Fisher Scoring iterations: 5
Any help would be appreciated
Thanks!
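In mlr the glm formula is derived from the task, so the usual way to drop terms (a sketch; the feature names below are guessed from the summary output and may not match the real column names) is to subset the task's features before training, e.g. with subsetTask or dropFeatures:

```r
# keep only the predictors that looked significant; names are assumptions
trainTask_small <- subsetTask(trainTask,
                              features = c("Credit_History", "Property_Area", "Married"))
logistic.learner <- makeLearner("classif.logreg", predict.type = "response")
fmodel <- train(logistic.learner, trainTask_small)
summary(getLearnerModel(fmodel))
```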

In nested resampling, classification accuracy results change wildly
Using the mlr package in R, I am creating random forest models. To evaluate the classification accuracy of a model I am using nested resampling as described here. My problem is that the classification accuracies of the random forest models within the inner loops are usually 15% higher than the outer loop results. I am observing classification accuracies of ~85% within the inner loop, but the accuracy of the outer loop usually ends up around 70%. I cannot provide data here, but I am pasting the code I am using.
How is that possible? What may be the reason?
rf_param_set <- makeParamSet(
  ParamHelpers::makeDiscreteParam('mtry', values = c(3, 7, 14)),
  ParamHelpers::makeDiscreteParam('ntree', values = c(1000, 2000))
)
rf_tune_ctrl <- makeTuneControlGrid()
rf_inner_resample <- makeResampleDesc('Bootstrap', iters = 5)
acc632plus <- setAggregation(acc, b632plus)
rf_learner <- makeTuneWrapper('classif.randomForest',
                              resampling = rf_inner_resample,
                              measures = list(acc),
                              par.set = rf_param_set,
                              control = rf_tune_ctrl,
                              show.info = TRUE)
# rf_outer_resample <- makeResampleDesc('Subsample', iters = 10, split = 2/3)
rf_outer_resample <- makeResampleDesc('Bootstrap', iters = 10, predict = 'both')
rf_result_resample <- resample(rf_learner, clf_task,
                               resampling = rf_outer_resample,
                               extract = getTuneResult,
                               measures = list(acc, acc632plus),
                               show.info = TRUE)
You can see the resulting output below.
Resampling: OOB bootstrapping
Measures: acc.train acc.test acc.test

[Tune] Started tuning learner classif.randomForest for parameter set:
          Type len Def    Constr Req Tunable Trafo
mtry  discrete   -   -    3,7,14   -    TRUE     -
ntree discrete   -   - 1000,2000   -    TRUE     -
With control class: TuneControlGrid
Imputation value: 0
[Tune-x] 1: mtry=3; ntree=1000
[Tune-y] 1: acc.test.mean=0.8415307; time: 0.1 min
[Tune-x] 2: mtry=7; ntree=1000
[Tune-y] 2: acc.test.mean=0.8405726; time: 0.1 min
[Tune-x] 3: mtry=14; ntree=1000
[Tune-y] 3: acc.test.mean=0.8330845; time: 0.1 min
[Tune-x] 4: mtry=3; ntree=2000
[Tune-y] 4: acc.test.mean=0.8415809; time: 0.3 min
[Tune-x] 5: mtry=7; ntree=2000
[Tune-y] 5: acc.test.mean=0.8395083; time: 0.3 min
[Tune-x] 6: mtry=14; ntree=2000
[Tune-y] 6: acc.test.mean=0.8373584; time: 0.3 min
[Tune] Result: mtry=3; ntree=2000 : acc.test.mean=0.8415809
[Resample] iter 1: 0.9961089 0.7434555 0.7434555

[Tune] Started tuning learner classif.randomForest for parameter set:
          Type len Def    Constr Req Tunable Trafo
mtry  discrete   -   -    3,7,14   -    TRUE     -
ntree discrete   -   - 1000,2000   -    TRUE     -
With control class: TuneControlGrid
Imputation value: 0
[Tune-x] 1: mtry=3; ntree=1000
[Tune-y] 1: acc.test.mean=0.8479891; time: 0.1 min
[Tune-x] 2: mtry=7; ntree=1000
[Tune-y] 2: acc.test.mean=0.8578465; time: 0.1 min
[Tune-x] 3: mtry=14; ntree=1000
[Tune-y] 3: acc.test.mean=0.8556608; time: 0.1 min
[Tune-x] 4: mtry=3; ntree=2000
[Tune-y] 4: acc.test.mean=0.8502869; time: 0.3 min
[Tune-x] 5: mtry=7; ntree=2000
[Tune-y] 5: acc.test.mean=0.8601446; time: 0.3 min
[Tune-x] 6: mtry=14; ntree=2000
[Tune-y] 6: acc.test.mean=0.8586638; time: 0.3 min
[Tune] Result: mtry=7; ntree=2000 : acc.test.mean=0.8601446
[Resample] iter 2: 0.9980545 0.7032967 0.7032967

[Tune] Started tuning learner classif.randomForest for parameter set:
          Type len Def    Constr Req Tunable Trafo
mtry  discrete   -   -    3,7,14   -    TRUE     -
ntree discrete   -   - 1000,2000   -    TRUE     -
With control class: TuneControlGrid
Imputation value: 0
[Tune-x] 1: mtry=3; ntree=1000
[Tune-y] 1: acc.test.mean=0.8772566; time: 0.1 min
[Tune-x] 2: mtry=7; ntree=1000
[Tune-y] 2: acc.test.mean=0.8750990; time: 0.1 min
[Tune-x] 3: mtry=14; ntree=1000
[Tune-y] 3: acc.test.mean=0.8730733; time: 0.1 min
[Tune-x] 4: mtry=3; ntree=2000
[Tune-y] 4: acc.test.mean=0.8782829; time: 0.3 min
[Tune-x] 5: mtry=7; ntree=2000
[Tune-y] 5: acc.test.mean=0.8741619; time: 0.3 min
[Tune-x] 6: mtry=14; ntree=2000
[Tune-y] 6: acc.test.mean=0.8687918; time: 0.3 min
[Tune] Result: mtry=3; ntree=2000 : acc.test.mean=0.8782829
[Resample] iter 3: 0.9902724 0.7329843 0.7329843
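For comparison (a sketch, not a diagnosis of this specific data set): the inner acc.test.mean is the score of the configuration that tuning itself selected, so it is an optimistically biased estimate, while bootstrap out-of-bag test sets tend toward pessimism; swapping both loops to (repeated) cross-validation is one way to check how much of the gap is an artifact of the resampling scheme:

```r
# same tuning setup as in the question, but with CV in both loops
rf_inner_resample <- makeResampleDesc('CV', iters = 5, stratify = TRUE)
rf_learner <- makeTuneWrapper('classif.randomForest',
                              resampling = rf_inner_resample,
                              measures = list(acc),
                              par.set = rf_param_set,
                              control = rf_tune_ctrl)
rf_outer_resample <- makeResampleDesc('RepCV', reps = 5, folds = 5, stratify = TRUE)
rf_result_resample <- resample(rf_learner, clf_task,
                               resampling = rf_outer_resample,
                               extract = getTuneResult,
                               measures = list(acc))
```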