How to subset from aov summary in R?
Maybe this is a simple question, but I'm wondering how I can subset the DF and F.values for the terms appearing in an aov summary.
For example, using the base R built-in dataset npk, how can I extract the residual and other DFs and F.values that appear in the summary of the following model:
fit <- summary(aov(yield ~ block + N * P + K, data = npk)) # example is fully reproducible
P.S. I'm looking for base R solutions.
1 answer

The fit output is a list of length 1 (check with str(fit)). We extract it with [[ and then use $ or [[ to extract the components:
fit[[1]]$Df
#[1] 5 1 1 1 1 14
# where 14 is the Residuals df
fit[[1]]$`F value`
#[1] 4.391098 12.105541 0.537330 6.088639 1.361073 NA
See also questions close to this topic

How to set default shell in R
I need to call a bash script from an R session, but it seems that the default shell is sh. Indeed, I obtain this:
> system(sprintf("echo $0"))
sh
How can I set bash as the default shell? Thanks!
PS: this happens both from RStudio and from an R console (called from a bash terminal...)

regex to extract paragraph inbetween two words using R
I have a corpus of docx files that contain different sections: abstract, introduction, patients and methods, results, references.
I want to extract the part that starts at PATIENTS AND METHODS and ends at RESULTS (references1).
PS: the phrase "PATIENTS AND METHODS" can sometimes be "MATERIALS AND METHODS".
How can I do this with regex, or is there another solution?

Grouping data in R by summing some columns and keeping the rest of the columns
I have a data frame structured like this :
exdataframe <- data.frame(
  c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), rep("ma4", 1)),
  c(rep("1", 4), rep("2", 3), rep("3", 2), rep("1", 1)),
  c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), rep("xxx", 1)),
  c("20180527", "20180624", "20180701", "20180708", "20180624",
    "20180701", "20180708", "20180527", "20180624", "20180701"),
  c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1)
)
colnames(exdataframe) <- c("ID", "classid", "classname", "date", "x", "y")
I want to group this data frame by the column ID while summing the columns x and y and keeping all of the other columns. When I do:
exdataframe_gr <- exdataframe %>% group_by(ID) %>% filter(x == sum(x), y == sum(y))
I get a data frame with only one row, which corresponds to a single entry in the original data frame. The output that I want is:
ID  ClassID Classname Date                X   Y
ma1 1       xxx       "could be anything" 116 1000
ma2 2       yyyy      "could be anything" 3   508
ma3 3       zz        "could be anything" 64  90.9
ma4 1       xxx       "could be anything" 9   1
The date column could be anything; I don't care about its value. My original data is much bigger than this: 2000 rows, 45 columns.
I searched the internet and this site but could not find a similar example. Any help is appreciated.

document.getElementById.show() is not a function
I have a form and how it works is that when a user clicks a list item and selects a "size value", it is given a class of selected, and the user can submit the form.
The part that I'm trying to figure out is that when no size is selected, and thus no list item has a class of "selected", the form won't be submitted and instead the error message would show.
I'm having trouble getting this to work and the error message to show, the form still just submits. Currently I'm getting an error telling me that
document.getElementById("errormessage").show() is not a function.
Does anyone know why this is happening? And can anyone help me with my code to get it to work the way I want to?
Below is my html:
<form>
  <ul>
    <li>
      <ul>
        <li class="sizevalue"></li>
        <li class="sizevalue"></li>
        <li class="sizevalue"></li>
        <li class="sizevalue"></li>
      </ul>
    </li>
  </ul>
</form>
<div id="errormessage">Please select a size</div>
<div class="mt10">
  <input type="submit" class="modalAddToBagButton">
</div>
And my jQuery:
.on('click', '.modalAddToBagButton', function(e) {
  e.preventDefault();
  var x = document.getElementsByClassName("esvalue");
  var i = x.length;
  var selected = false;
  while(i) {
    if (x[i].hasAttribute("selected")) {
      selected = true;
    }
  }
  if(selected == false) {
    //Displays error
    document.getElementById("errormessage").show();
  } else {
    $(this).closest("#dialogaddToBag").find('form').submit();
  }
});

Using dictionary inside a function
I found an exercise in a Udemy course which asks me to create a function called return_day. It suggests using a dictionary, but I have been trying for the past two hours without success. So I passed the exercise by writing:
def return_day(x):
    if x == 1:
        return "Sunday"
    elif x==2:
        return "Monday"
    elif x==3:
        return "Tuesday"
    elif x==4:
        return "Wednesday"
    elif x==5:
        return "Thursday"
    elif x==6:
        return "Friday"
    elif x==7:
        return "Saturday"
    return None
...but that is completely different from what was asked. Could someone help me? Why does the code below not work?
def return_day(x):
    if x > 0 and x<=7:
        returnx=dict(1="Sunday",2="Monday",3="Tuesday",4="Wednesday",5="Thursday",6="Friday",7="Saturday")
    return None
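For reference, a minimal sketch of the dictionary-based approach the exercise hints at, assuming the same numbering as the working version (1 = Sunday through 7 = Saturday). The attempt with dict(1="Sunday", ...) is a syntax error because dict() takes keyword arguments there, and a keyword name must be a valid identifier, so a number cannot be used; a dict literal with integer keys avoids the problem. The name DAYS is only illustrative and not part of the exercise.

```python
# Day numbers mapped to names with a dict literal. Integer keys are fine here,
# unlike in dict(1="Sunday"), where 1 would have to act as a keyword name.
DAYS = {
    1: "Sunday",
    2: "Monday",
    3: "Tuesday",
    4: "Wednesday",
    5: "Thursday",
    6: "Friday",
    7: "Saturday",
}


def return_day(x):
    """Return the day name for x in 1..7, or None for any other value."""
    # dict.get returns None by default when the key is missing,
    # so no explicit range check is needed.
    return DAYS.get(x)


print(return_day(1))  # prints Sunday
print(return_day(8))  # prints None
```

Because the lookup falls back to None for missing keys, out-of-range numbers behave the same way as in the if/elif version.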

TypeScript: function expression overloads
I have this TypeScript code which uses overloads on a function declaration. This code works as expected.
function identity(x: string): string;
function identity(x: number): number;
function identity(x: string | number): string | number {
  return x;
}

const a = identity('foo') // string
const b = identity(1) // number
const c = identity({}) // type error (expected)
I am trying to achieve the equivalent of this using function expressions instead of function declarations, however I get a type error:
/*
Type '(x: string | number) => string | number' is not assignable to type '{ (x: string): string; (x: number): number; }'.
  Type 'string | number' is not assignable to type 'string'.
    Type 'number' is not assignable to type 'string'
*/
const identity: {
  (x: string): string;
  (x: number): number;
} = (x: string | number): string | number => x;
I want to know how I can achieve the same effect of overloading the function but with function expressions.

Create loop/function to remove negative varImp results
I would like to create a loop that fits a model, gets the variable importance, identifies the columns with negative importance, removes them from the data, and refits the model, repeating until there are no negative values. Below is example code for creating the model and getting the variable importance:
library(party)
library(caret)
model_cforest <- cforest(drat~.,data=mtcars,controls=cforest_unbiased())
cforest_var <- varImp(model_cforest,conditional=TRUE)
As we can see, cforest_var gives us this table:
      Overall
mpg   0.009778909
cyl   0.033507134
disp  0.056359569
hp    0.000000000
wt    0.044186730
qsec  0.000000000
vs   -0.000309504
am    0.050791540
gear  0.060967894
carb  0.000000000
Based on this table, I would like to remove the column vs (which has a negative value) and run the cforest model again (and if there is a negative value again, remove it and rerun the model until there are no negative values). The final result should be a table with the most important variables.
Here is as far as I got:
removeNeg <- function(data){
  model_cforest <- cforest(drat~., mtcars,controls=cforest_unbiased())
  cforest_var <- varImp(model_cforest,conditional=TRUE)
  varImp_neg <- row.names(cforest_var)[apply(cforest_var, 1, function(u) any(u < 0))]
}
but I have a feeling that this is the wrong direction, and I am stuck. Thanks for the help!

Statsmodel intercept is different to Seaborn lmplot intercept
What could explain the difference in intercepts between a statsmodels OLS regression and a seaborn lmplot?
My statsmodel code:
X = mmm_ma[['Xvalue']]
Y = mmm_ma['Yvalue']
model2 = sm.OLS(Y,sm.add_constant(X), data=mmm_ma)
model_fit = model2.fit()
model_fit.summary()
My seaborn lmplot code:
sns.lmplot(x='Xvalue', y='Yvalue', data=mmm_ma)
My statsmodel intercept is 28.9775 and my seaborn lmplot's intercept is around 45.5.
Questions:
1. Should the intercepts be the same?
2. What might explain why these are different? (Can I change some code to make them equal?)
3. Is there a way to achieve a plot similar to seaborn lmplot but using the exact regression results to ensure they align?
Thanks

"TypeError: can't pickle NotImplementedType objects" in KerasRegression model
I'm creating a simple regression neural network in Keras. However, when I try to run it as follows
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("instancesFipo.txt", delimiter=", ")
#testset = numpy.loadtxt("instancesFipo6.txt", delimiter=", ")
Xtrain = dataset[:,0:8] #All rows, first 8 columns
Ytrain = dataset[:,8] #All rows, 9th column
model = Sequential()
#Create layers of neural net
model.add(Dense(50, input_dim=8, kernel_initializer='normal', activation='relu'))
model.add(Dense(50, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
#Create loss function and algorithm
model.compile(loss='mean_squared_error', optimizer='adam')
estimator = KerasRegressor(build_fn=model, epochs=100, batch_size=10, verbose=0)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, Xtrain, Ytrain, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
I'm getting "TypeError: can't pickle NotImplementedType objects", which is induced by the call to cross_val_score. Not sure what's going on. Any help would be appreciated, and thanks!

Slow subset of matrix by rownames in R
From a large matrix, I only want to keep certain rows which are included in an extra vector. This vector contains some rownames of the matrix.
I used microbenchmark to compare four different solutions, but I would like to know if a faster solution exists in R. Here is a reproducible example:
m <- matrix(rnorm(1e8), nrow = 1e4, ncol = 1e4)
rownames(m) <- paste0("row_", seq(1:nrow(m)))
dim(m); m[1:5, 1:5]
include_list <- paste0("row_", ceiling(runif(1000, 1, 1e4))) # rows to keep
library(microbenchmark)
microbenchmark(
  m1 <- m[rownames(m) %in% include_list, , drop = FALSE],
  m2 <- m[include_list, , drop = FALSE],
  m3 <- m[match(include_list, rownames(m)), , drop = FALSE],
  m4 <- subset(m, rownames(m) %in% include_list)
)
Here are the results of the microbenchmark:
Unit: milliseconds
                                                      expr      min       lq     mean   median       uq      max neval cld
 m3 <- m[match(include_list, rownames(m)), , drop = FALSE] 251.9575 258.0483 279.9949 266.1691 284.8803 422.3918   100   b
                     m2 <- m[include_list, , drop = FALSE] 251.5875 256.4012 275.9379 263.2740 277.1073 459.7414   100   b
    m1 <- m[rownames(m) %in% include_list, , drop = FALSE] 226.1647 229.9530 239.3308 234.0090 239.8762 305.8925   100  a
            m4 <- subset(m, rownames(m) %in% include_list) 227.7144 230.8488 242.3036 234.8678 239.1995 388.8809   100  a

How to find a family in a group of individuals
Say I have a group S of potential family members. I define a family as a set of individuals that contains one or two adults (over 18 years old) and at most 9 children (under 24 years old), with the condition that every child has to be at least 15 years younger than both of the adults.
An example:
id: 1 age: 14
id: 2 age: 25
id: 4 age: 6
id: 5 age: 35
id: 6 age: 50
id: 7 age: 44
potential families would be:
{1, 4, 5, 6}, {1, 4, 5, 7}, {1, 4, 6, 7}, {14, 25, 6, 6, 7}, ...
I want to find every family subset of the set. I don't really know how to proceed because of this age interval, which is specific for each pair of individuals. I don't really know how to form subsets from those.
In a next step I would choose the family with the most members. Thank you in advance
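One way to approach the largest-family step is sketched below; it is an illustration under stated assumptions, not a definitive solution. It assumes "over 18" means age > 18, "under 24" means age < 24, that a family may have zero children, and that ties are broken by whichever candidate is found first. The function name largest_family and the dict-of-ages input are hypothetical. The idea is to enumerate every choice of one or two adults and collect everyone who qualifies as a child of the youngest chosen adult; any subset of those children with the same adults is also a valid family.

```python
from itertools import combinations


def largest_family(people):
    """people maps id -> age. Return (adult_ids, child_ids) for one largest family."""
    # Assumption: "over 18" is read as age > 18.
    adults = [pid for pid, age in people.items() if age > 18]
    best = ((), ())
    for k in (1, 2):  # a family has one or two adults
        for chosen in combinations(adults, k):
            youngest_adult = min(people[a] for a in chosen)
            # A child is under 24 and at least 15 years younger than every
            # chosen adult, i.e. than the youngest of them.
            children = [
                pid for pid, age in people.items()
                if pid not in chosen and age < 24 and youngest_adult - age >= 15
            ]
            children = children[:9]  # at most 9 children per family
            if len(chosen) + len(children) > len(best[0]) + len(best[1]):
                best = (chosen, tuple(children))
    return best


people = {1: 14, 2: 25, 4: 6, 5: 35, 6: 50, 7: 44}
adults, children = largest_family(people)
print(sorted(adults + children))  # prints [1, 4, 5, 6]
```

With at most two adults, the number of adult choices grows only quadratically with the group size, so listing all families is a matter of taking subsets of each eligible children list.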

How do I delete rows within a subset of a dataframe
I am filtering data for analysis and stumbled upon a problem I cannot find a solution for. I did look into the prepdat package, but it does not seem to satisfy my needs. My dataframe (df) consists of reaction times of several participants measured over 4 blocks. To filter out outliers, I need to apply a (mean +/- 2.5 sd) rule for every block of each participant.
I tried creating my own function in order to apply this rule to every subsection of my dataframe (each block of every participant separately). I created the function below so I can use it with a for loop (this loop might not be optimal in R, but that is not the main concern here):
filter <- function(subject, block){
  m <- mean(df[df$subj == subject & df$block == block,3])
  stdv <- sd(df[df$subj == subject & df$block == block,3])
  lowerbound <- m - 2.5 * stdv
  upperbound <- m + 2.5 * stdv
  outliers <- which((df[(df$subj == subject & df$block == block),3] <= lowerbound |
                     df[(df$subj == subject & df$block == block),3] >= upperbound))
  #Here I retrieve the index for all the rows I need to eliminate
  df <<- df[-c(outliers), ]
}
I can't get my head around this indexing. For the first block of the first subject there seems to be no problem, and the function deletes the right rows. For the next blocks (and subjects), 'outliers' also contains the right indexes within the subset (subject and block) I select in the function, but when I try to eliminate the rows with it, the indexes seem to be applied to my whole dataframe and not to the specific subset of the subject and block I used in my function. Is there something I am missing, or not (yet) aware of? Or is my overall way of thinking wrong? (I am still adapting to R.)
   subj block  rt
1     1     2 345
2     1     2 118
3     1     2 302
4     1     2 698
5     1     2 154
6     2     3 347
7     2     3 391
8     2     3 414
9     2     3 427
10    2     3 369
11    6     1 685
12    6     1 369
13    6     1 457
14    6     1 566
15    6     1 542

split file ANOVA in r with result
I am new to R and am trying to run a one-way ANOVA with a split-file option (similar to SPSS). The dataset file is called rdatasetnew. The three variables of interest are:
orderNo: the person identifier
rating: satisfaction rating for each order
design: website design used
The split file is done by orderNo, as shown in the dataset below. I have used this Q&A to develop my split-file syntax (Perform an ANOVA for each individual level of a factor in R). The syntax I have developed looks as follows:
lapply(split(rdatasetnew, rdatasetnew$orderNo), aov, formula = rating ~ design)
The above script offers sum of squares, degrees of freedom and residual standard error for each person. However, I want to get a summary table and mean data for each person. How can I do so?
orderNo rating design
1123    1      Traditional Modern
1123    8      Traditional Modern
1123    1      Modern
1123    9      Modern
1123    8      Modern
1124    1      Modern
1124    10     Traditional Modern
1124    3      Traditional Modern
1124    10     Traditional
1124    8      Modern Extreme
1124    10     Traditional Modern
1168    6      Traditional Modern
1168    2      Traditional Modern
1168    10     Traditional Modern
1168    5      Modern
1168    8      Traditional Modern
1168    7      Traditional Modern
1168    2      Traditional Modern

Games-Howell Post-Hoc Test in userfriendlyscience package: no output, just NA
I am running a one-way ANOVA in R, and the homogeneity of variance assumption is violated. After running a significant Welch's test, I am trying to run a Games-Howell post-hoc test in R. I am using the oneway function in the userfriendlyscience package. When I run my code, the one-way ANOVA runs, but the post-hoc test output does not return anything other than NA. Any suggestions on how to fix my code?
Sample Data
>head(dat)
Region Composite
1      4.14
2      3.54
2      3.75
3      3.06
1      4.49
2      3.99
My output:
    diff ci.lo ci.hi  t df  p
2-1   NA    NA    NA NA NA NA
3-1   NA    NA    NA NA NA NA
3-2   NA    NA    NA NA NA NA
My code:
library(userfriendlyscience)
dat$Region <- as.factor(dat$Region)
one.way <- oneway(y = dat$Composite, x = dat$Region, posthoc = "games-howell")
one.way

Recreate various Non-Parametric ANCOVA analyses in R
I am looking to recreate various analyses in R that can compute several types of Non-Parametric ANCOVA.
Let's use the mtcars data from the datasets package in R for example purposes. Let's say I wanted to predict MPG from Transmission while controlling for Cylinders. I would conduct a normal ANCOVA in R with the following code:
summary(aov(mpg ~ cyl + am, mtcars))
That part is easy. Here is where things get a bit tricky for me (given that I am not statistics savvy). I have been reading several articles that talk about different approaches to Non-Parametric ANCOVA. For example, page 334 of this article published by Lawson (1983) describes three different approaches to Non-Parametric ANCOVA:
1. Parametric ANCOVA on Ranks
2. Quade's Non-Parametric ANCOVA
3. Puri and Sen's Non-Parametric ANCOVA
I think I am on the right track recreating the first two with the following code:
summary(aov(rank(mpg) ~ rank(cyl) + am, mtcars)) ## Ranks
summary(aov(lm(rank(mpg) ~ rank(cyl), mtcars)$residuals ~ am, mtcars)) ## Quade
However, I am at a loss when it comes to recreating Puri and Sen's Non-Parametric ANCOVA. I did come across this article. On page 374, where the paragraph talks about Quade's work, the author mentions a procedure that is slightly different from Quade's version of ANCOVA. I wonder if this is the Puri and Sen approach or something different? I recreated what was described below. It appears to be a modified version of the above Quade code.
summary(lm(rank(mpg) ~ rank(cyl) + am, mtcars)) ## Puri and Sen?
Just to summarize, I am exploring different ways to run a Non-Parametric ANCOVA, and I would like to recreate both Quade's and Puri and Sen's procedures in R. If there are any other methods, I would also be interested in exploring those.
Thank you