In R, how do I find outliers in a multivariate regression?
My dataset is called "data" and my model is m1. It includes 3 variables, called Rating, Reviews, and Mgb. Reviews is an integer, whereas Rating and Mgb are both floats. My data is in a .csv file.
For a homework assignment, I need to find which rows in the spreadsheet are outliers, without editing them in any way. In the next two questions, I need to find high-leverage points and then influential points using Cook's distance. (I've already done that last one.) How do I do the first two?
See also questions close to this topic

(R) Adding rows to a data frame using cor() is not working properly
I am trying to create a neat list of correlation values, which I then want to sort. However, when I try to produce this list, I obtain a list with the correlation variables pasted around my text.
I am using a for loop that is based on two 'if' conditions. When the condition is true, I want to add the correlation value to the data frame. If it is not true, the value has to be 0. I have tried to use rbind to add all rows together after running through one of the two conditions.
However, I get an error warning when I want to cbind correlation values alongside fixed "0" values (e.g. when the condition is false).
My code is currently as follows:
for (i in seq_len(listLength)) {
  if (cases[i, 2] > threshold) {
    monitor <- subset(idFiles, ID == i)
    monitor <- na.omit(monitor)
    corl <- cor(monitor["sulfate"], monitor["nitrate"])
  } else {
    corl <- 0
  }
  frame <- rbind(frame, corl)
}
print(frame)
When I run the loop now, the output is, as mentioned, a list of values, but without a clean index. The text of the correlation function (nitrate and sulfate) surrounds the index value and makes the table difficult to interpret. I also cannot run the sort() function on this table to sort the values, e.g. from high to low, by index number.
Could anyone suggest how to obtain a tidy list with indexes for the correlation values, which can be sorted, e.g. from increasing to decreasing values, while keeping the link to the index number?
Thanks in advance for any help!
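One way to sidestep the rbind name juggling is to pre-build a plain numeric data frame with an index column and assign by position; a sketch, where the stand-in `idFiles`, `cases`, and `threshold` objects are hypothetical and only exist so the code runs:

```r
# Hypothetical stand-in data; swap in your own objects
set.seed(1)
idFiles <- data.frame(ID = rep(1:3, each = 5),
                      sulfate = rnorm(15), nitrate = rnorm(15))
cases <- data.frame(ID = 1:3, count = c(10, 2, 8))
threshold <- 5
listLength <- 3

# Pre-fill the result frame: one row per index, correlation defaults to 0
frame <- data.frame(index = seq_len(listLength), corl = 0)
for (i in seq_len(listLength)) {
  if (cases[i, 2] > threshold) {
    monitor <- na.omit(subset(idFiles, ID == i))
    # cor() on plain numeric vectors returns an unnamed scalar,
    # so no variable names leak into the result frame
    frame$corl[i] <- cor(monitor$sulfate, monitor$nitrate)
  }
}
frame[order(frame$corl, decreasing = TRUE), ]  # sortable, index preserved
```

The key differences are `monitor$sulfate` (a vector) instead of `monitor["sulfate"]` (a one-column data frame, which is what drags the variable names into the output), and ordering by the `corl` column rather than calling `sort()` on the whole table.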

Tukey test after LMM keeping contrasts
I want to test a 2x3 factorial design and contrasted the variables like this
my.helmert = matrix(c(2, -1, -1, 0, 1, -1), ncol = 2)
contrasts(Target3$mask) = my.helmert
contrasts(Target3$length)
So for mask I want to compare the first group with the average of the two other groups and in a second step the second with the third group.
This works fine in my LMM
Target3.2_TT.lmer = lmer(logTotalTime ~ mask*length + (1+length|Subject) + (1|Trialnum), data = Target3)
There is a significant interaction between mask and length, that's why I want to take a look at this effect and calculate a post hoc test (Tukey) like this:
emmeans(Target3.2_TT.lmer, pairwise ~ mask : length)
This also works pretty well, with one problem: now my contrasts are gone. The test calculates the differences for all masks, not just 1 vs. 2+3 and 2 vs. 3. Is there a possibility to keep my contrasts in the post hoc test?
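emmeans lets you pass your own contrast weights instead of `pairwise`; a sketch in which the weight vectors mirror your Helmert matrix, and the toy `lm` fit only stands in for `Target3.2_TT.lmer` so the code runs (with your data you would pass the lmer model directly):

```r
library(emmeans)

# Toy stand-in for your fitted model; replace fit with Target3.2_TT.lmer
set.seed(1)
d <- expand.grid(mask = factor(1:3), length = factor(1:2), rep = 1:10)
d$logTotalTime <- rnorm(nrow(d))
fit <- lm(logTotalTime ~ mask * length, data = d)

emm <- emmeans(fit, ~ mask | length)     # mask effects within each length
cons <- contrast(emm, method = list(
  "1 vs avg(2,3)" = c(2, -1, -1) / 2,    # your Helmert column 1
  "2 vs 3"        = c(0, 1, -1)          # your Helmert column 2
))
cons
```

Note that a Tukey adjustment strictly applies only to the full pairwise family; with custom contrasts emmeans will use (or you can request via `adjust =`) a different multiplicity correction.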

RMarkdown: How can I wrap a table (made with kable) in text when knitting to PDF
I'm trying to wrap a table in text and ran into multiple problems:
When trying to simply position the table to the left, the caption does not get left-aligned as well and stays centered (which defeats the purpose of left-aligning):
```{r tableVerbrauchUSB}
kable(USB_Summary2018, booktabs = TRUE,
      caption = 'Antibiotikaverbrauch in DDDs des Jahres 2018') %>%
  kable_styling(latex_options = "hold_position", position = "left")
```
When trying to use the `"float_left"` option as specified in the kable_styling documentation, the table breaks (note: I had to delete `latex_options = "hold_position"`, it wouldn't knit otherwise). I have no idea what is happening to my table with `float_left`, but it seems to completely break it. If it matters, here is my YAML header:
```yaml
output:
  bookdown::pdf_document2:
    fig_caption: yes
    number_sections: yes
    toc: no
lang: de
geometry: margin = 1in
fontsize: 11pt
header-includes:
  - \usepackage{fancyhdr}
  - \usepackage[doublespacing]{setspace} # Options: singlespacing, onehalfspacing, doublespacing
  - \usepackage{chngcntr}
  - \counterwithout{figure}{section}
  - \counterwithout{table}{section}
  - \usepackage{microtype}
  - \usepackage{amsmath}
  - \usepackage{float}
  - \floatplacement{figure}{H}
  - \floatplacement{table}{H}
  - \usepackage{wrapfig}
  - \setlength{\parindent}{1cm}
```
Can anybody tell me how I can have a table that is wrapped in text?
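One likely culprit is the `\floatplacement{table}{H}` line in the YAML header: it forces every table into a hard `H` float, which is incompatible with the `wrapfig`-based wrapping that `float_left` relies on. A sketch of what I would try, assuming that line is removed (chunk contents taken from the example above):

```{r tableVerbrauchUSB2}
kable(USB_Summary2018, booktabs = TRUE,
      caption = 'Antibiotikaverbrauch in DDDs des Jahres 2018') %>%
  kable_styling(position = "float_left")
```

Whether the caption then aligns as desired is something to check in the knitted PDF; the point of the sketch is only that the global `H` placement and text wrapping cannot coexist.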

Remove outliers in an image after applying threshold
Here's the deal. I want to create a mask that visualizes all the changes between two images (GeoTIFFs which are converted to 2D numpy arrays).
For that I simply subtract the pixel values and normalize the absolute value of the subtraction:
Since the result will be covered in noise, I use a threshold and remove all pixels with a value below a certain limit:
def treshold(array, thresholdLimit):
    print("Treshold...")
    result = (array > thresholdLimit) * array
    return result
This works without a problem. Now comes the issue: when applying the threshold, outliers remain, which is not intended.
What is a good way to remove those outliers? Sometimes the outliers are small chunks of pixels, like 5-6 pixels together; how could those be removed?
Additionally, the images I use are about 10000x10000 pixels.
I would appreciate all advice!
EDIT:
Both images are Landsat satellite images, covering the exact same area. The difference is that one image shows cloud coverage and the other is free of clouds. The bright snaking line in the top right is part of a river that has been covered by a cloud. Since water bodies like the ocean or rivers appear black in these images, the difference between the bright cloud and the dark river makes the river show a high degree of change.
I hope the following images make this clear:
I also tried to smooth the result of the thresholding with a median filter, but the result was still covered in outliers:
import numpy as np
from scipy.ndimage import median_filter

def filter(array, limit):
    print("MedianFilter...")
    filteredImg = np.array(median_filter(array, size=limit)).astype(np.float32)
    return filteredImg
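For "small chunks of pixels", a connected-component pass is usually more effective than a median filter: label the regions in the thresholded mask and drop every region below a size cutoff. A sketch (function name and the `min_size` cutoff are my own; tune the cutoff to your data):

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(mask, min_size):
    """Remove connected components smaller than min_size pixels from a boolean mask."""
    labeled, num = ndimage.label(mask)                 # label 4-connected regions
    if num == 0:
        return mask.copy()
    sizes = ndimage.sum(mask, labeled, range(1, num + 1))  # pixel count per region
    keep = np.concatenate(([False], sizes >= min_size))    # index 0 = background
    return keep[labeled]                               # map labels back to a mask

# Hypothetical usage on a thresholded difference image:
# cleaned = remove_small_regions(diff_image > limit, min_size=10) * diff_image
```

This is O(image size) via `scipy.ndimage.label`, so it should stay practical even at 10000x10000 pixels.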

Delete outliers
I have a large data set with over 2000 observations. The data involves toxin concentrations in animal tissue. My response variable is `myRESULT`, and I have multiple observations per `ANALYTE` of interest. I need to remove the outliers, defined as values more than three SD away from the mean, from within each `ANALYTE` group.
While I realize that I should not normally remove outliers from a dataset, I would still like to know how to do it in R.
Here is a small portion of what my data look like:
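A sketch with dplyr, assuming the data frame is called `mydata` (the stand-in data only exist so the code runs; the grouping and 3-SD cutoff follow the description above):

```r
library(dplyr)

# Hypothetical stand-in data; replace mydata with your real data frame
set.seed(1)
mydata <- data.frame(ANALYTE  = rep(c("Hg", "Pb"), each = 50),
                     myRESULT = rnorm(100))

cleaned <- mydata %>%
  group_by(ANALYTE) %>%                                        # one mean/sd per analyte
  filter(abs(myRESULT - mean(myRESULT)) <= 3 * sd(myRESULT)) %>%
  ungroup()
```

`filter()` drops the offending rows but leaves `mydata` itself untouched, so the original data set is preserved.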

Injecting outliers into a dataset in R
I am working on generating a dataset with n = 20 from the linear regression y = b0 + b1*x + e (I am not sure whether I should include the error term in my code).
- x and y are normally distributed with mean 0 and standard deviation 1.
- the error term e is also said to be normally distributed with mean 0 and sd 1, BUT with 10% identical outliers in the y direction.
My code starts with this
n11 <- 20
m1 <- 0
sd1 <- 1
b0 <- 0
b1 <- 1
x <- rnorm(n11, m1, sd1)
e11 <- rnorm(n11, m1, sd1)  # the error term must be defined before it is used in y
y <- b0 + b1 * x + e11
data11 <- data.frame(y, x, e11, b0, b1)
model1 <- lm(y ~ x, data = data11)
I don't know how and where in the code I should put the said 10% identical outliers in the y direction. I need help. Thank you so much.
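One reading of "10% identical outliers in the y direction" is: pick 10% of the observations and shift their y values by one identical amount. Both that reading and the shift size (+10) are assumptions, so adjust them to your assignment's exact definition; a sketch:

```r
set.seed(123)
n11 <- 20
b0 <- 0; b1 <- 1
x   <- rnorm(n11, 0, 1)
e11 <- rnorm(n11, 0, 1)
y   <- b0 + b1 * x + e11

n_out <- ceiling(0.10 * n11)  # 10% of n = 2 observations
idx   <- sample(n11, n_out)   # rows that become outliers
y[idx] <- y[idx] + 10         # identical shift in the y direction (size assumed)

data11 <- data.frame(y, x)
model1 <- lm(y ~ x, data = data11)
```

The contamination step goes after y is generated and before the model is fitted, so the outliers affect the fit but not x or the clean error term.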