How to make multiple bar graphs for factors in R
I would love to make a figure like the one I have for my numeric features:
hist(df[ , purrr::map_lgl(df, is.numeric)])
If I try to do the same thing with factors:
hist(df[ , purrr::map_lgl(df[,interest_factors], is.factor)])
I get
Any suggestions? I just want to quickly view them
Thanks
See also questions close to this topic

How can we highlight specific letters and save the result to xls
I have data like this:
    df <- structure(list(X = structure(3:1, .Label = c(
        "CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP",
        "KLALQLHPDRNPDDPQAQEKFQDLGAAYEVLSDSEKRKQYD",
        "MVEAIVEFDYQAQHDDELTISVGEIITNIRKEDGGWW"), class = "factor")),
        class = "data.frame", row.names = c(NA, -3L))
I am trying to highlight the letter D in each row and then save the result as xlsx. I tried to adapt the approach from "Excel Cell Coloring using xlsx":
    library(xlsx)

    # To export the data
    sheetname <- "mysheet"
    write.xlsx(df, "mydata.xlsx", sheetName = sheetname)
    file <- "mydata.xlsx"

    # I want to color special letters in one cell
    wb <- loadWorkbook(file)                # load workbook
    fo1 <- Fill(foregroundColor = "blue")   # create fill object # 1
    cs1 <- CellStyle(wb, fill = fo1)        # create cell style # 1
    sheets <- getSheets(wb)                 # get all sheets
    sheet <- sheets[[sheetname]]            # get specific sheet
    rows <- getRows(sheet, rowIndex = 2:(nrow(df) + 1))    # get rows; 1st row is headers
    cells <- getCells(rows, colIndex = 2:(ncol(df) + 1))   # get cells; 1st column is row names
    values <- lapply(cells, getCellValue)   # extract the cell values
Next we find the cells that need to be highlighted:
    # find cells meeting conditional criteria: D
    highlightblue <- NULL
    for (i in names(values)) {
        x <- as.character(values[i])
        if (!is.na(x) && x == "D") {
            highlightblue <- c(highlightblue, i)
        }
    }
Then we apply the formatting and save the workbook:
    lapply(names(cells[highlightblue]), function(ii) setCellStyle(cells[[ii]], cs1))
    saveWorkbook(wb, file)
However, I cannot figure out how to color a letter inside a cell.

Horizontal standard error bars on bar graphs with negative values
I have a bar graph that looks like this:
and I am trying to get standard error bars on it, so two standard error bars for each column (one for the positive Y, and one for the negative N). I am aware of geom_errorbarh, but I cannot get it to work for this type of bar graph. Here is a reproducible example with the code that I used to get a bar chart like the one above:
Dataframe
    Behavior <- as.character(c("Hammock", "Hammock", "Climbing Trees", "Climbing Trees",
                               "Structures", "Structures", "Grade", "Grade"))
    Presence <- c("Y", "N", "Y", "N", "Y", "N", "Y", "N")
    # N rows carry negative means so that they plot to the left of zero
    Mean <- as.numeric(c("18.5", "-6.4", "3.5", "-6.8", "13.2", "-10.1", "4.7", "-2.3"))
    SD <- as.numeric(c("17.6", "11.9", "1.2", "4.4", "3.6", "6.25", "1.23", "0.4"))
    DF <- data.frame(Behavior, Presence, Mean, SD)
Coord Flip Geom Bar
    brks <- seq(-20, 20, 2)
    lbls = paste0(as.character(c(seq(20, 0, -2), seq(2, 20, 2))), "")
    ggplot(DF, aes(x = Behavior, y = Mean, fill = Presence)) +
        geom_bar(data = subset(DF, Presence == "N"), stat = "identity") +
        geom_bar(data = subset(DF, Presence == "Y"), stat = "identity") +
        scale_y_continuous(breaks = brks, labels = lbls) +
        scale_fill_manual(values = c("#0b6bb6", "#6eaf46"), name = "",
                          breaks = c("N", "Y"), labels = c("N", "Y")) +
        coord_flip() +
        theme_bw() +
        xlab("Pen Characteristic  Behavior") +
        ylab("Average Behavior per Session")
Is it possible to get the SE bars on this type of graph?
Thanks!

Triangulation - Python
Triangulation is the process of locating an unknown point given two known points and distances from those known points.
Consider two points (x1,y1) and (x2,y2).
Let (x, y) be an unknown point whose distances to points 1 and 2, d1 and d2 respectively, are known. The Pythagorean theorem allows us to relate x and y (the coordinates of the unknown point) to (x1, y1), (x2, y2), d1, and d2:
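The two relations are presumably the distance equations below, which follow directly from the Pythagorean theorem applied to each known point. This is a reconstruction from the surrounding definitions, not the original typesetting.

```latex
\begin{align}
(x - x_1)^2 + (y - y_1)^2 &= d_1^2 \\
(x - x_2)^2 + (y - y_2)^2 &= d_2^2
\end{align}
```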
Inverting these two formulae to make them explicit in x and y is really a lot of fun, but it takes a while. Here is the final result:
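A reconstruction of that result, inferred from the functions a, b, and solve_xy that follow rather than taken from the original. The intermediate symbols a and b are the quantities those functions compute, and y2 ≠ y1 is assumed so that the denominators are nonzero.

```latex
\begin{align}
a &= \frac{d_1^2 - d_2^2 - \left[(x_1^2 + y_1^2) - (x_2^2 + y_2^2)\right]}{2\,(y_2 - y_1)},
\qquad
b = -\frac{x_2 - x_1}{y_2 - y_1} \\
x_\pm &= \frac{2\left[x_1 - b\,(a - y_1)\right] \pm
  \sqrt{4\left[b\,(a - y_1) - x_1\right]^2
        - 4\,(1 + b^2)\left[x_1^2 - d_1^2 + (y_1 - a)^2\right]}}
  {2\,(1 + b^2)} \\
y_\pm &= a + b\,x_\pm
\end{align}
```

Subtracting the two distance equations gives the line y = a + b x, and substituting that line into the first distance equation gives the quadratic whose roots are x+ and x-.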
Note that the two possible values for x, x+ and x-, arise from the positive and negative sense of the square root, and the value of y derives from its particular x. So because of the power of 2 in the Pythagorean theorem, we find that there are two possible points, (x+, y+) and (x-, y-), that are each d1 from (x1, y1) and d2 from (x2, y2). "Triangulation" refers to the fact that we need a third known point to decide which of (x+, y+) and (x-, y-) is our sought-after point. So long as that third point, (x3, y3), is closer to (x+, y+) than it is to (x-, y-) (or vice versa), we can choose the final unknown point to be the one that is closer to (x3, y3).
These formulae can be expressed as functions.
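    import numpy as np

    def a(d1, d2, x1, y1, x2, y2):
        # Intercept of the line y = a + b*x that passes through both candidate points
        numerator = d1**2 - d2**2 - ((x1**2 + y1**2) - (x2**2 + y2**2))
        denominator = 2*(y2 - y1)
        return numerator/denominator

    def b(x1, y1, x2, y2):
        # Slope of that line (requires y2 != y1)
        return -(x2 - x1)/(y2 - y1)

    def solve_xy(x1, y1, x2, y2, d1, d2):
        bb = b(x1, y1, x2, y2)
        aa = a(d1, d2, x1, y1, x2, y2)
        # Discriminant of the quadratic in x
        rad = 4*(bb*(aa - y1) - x1)**2 - 4*(1 + bb**2)*(x1**2 - d1**2 + (y1 - aa)**2)
        pre = 2*(x1 - bb*(aa - y1))
        den = 2*(1 + bb**2)
        xp = (pre + np.sqrt(rad))/den
        xm = (pre - np.sqrt(rad))/den
        yp = aa + xp*bb
        ym = aa + xm*bb
        return xm, ym, xp, yp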
Consider that point 1 is a distance of 4.1 (arbitrary units) from the unknown point with (x1, y1) coordinates of (2.4,1.8) and that point 2 is a distance of 3.8 from the unknown point with (x2, y2) coordinates of (1.9,1.9).
Use the functions above to determine two potential sets of coordinates (xp, yp) or (xm, ym) for the unknown point.
To check this calculation, use a point-to-point distance calculation (for which Pythagoras gets credit) to determine if the same given distance can be calculated using the original known coordinates and the (xp, yp) or (xm, ym) coordinates.
Define a function called pythagoras that takes as arguments the coordinates of two points (e.g., for a and b, (xa, ya, xb, yb)) and returns the distance between the two points.
Use pythagoras to determine if the calculated distances of point 1 from (xp, yp) and (xm, ym) are equivalent. Do the same for point 2.
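A minimal sketch of this check, under stated assumptions: solve_xy is restated compactly with the standard-library math module (instead of numpy) so the block runs on its own, its sign conventions are inferred from the derivation above, and the coordinates and distances are the ones given in the text.

```python
import math

def solve_xy(x1, y1, x2, y2, d1, d2):
    """Compact restatement of the solver above (sign conventions are assumed)."""
    b = -(x2 - x1) / (y2 - y1)  # slope of the line y = a + b*x; needs y2 != y1
    a = (d1**2 - d2**2 - ((x1**2 + y1**2) - (x2**2 + y2**2))) / (2 * (y2 - y1))
    # Quadratic A*x**2 + B*x + C = 0 from substituting y = a + b*x into circle 1
    A = 1 + b**2
    B = 2 * (b * (a - y1) - x1)
    C = x1**2 - d1**2 + (y1 - a)**2
    root = math.sqrt(B**2 - 4 * A * C)
    xp = (-B + root) / (2 * A)
    xm = (-B - root) / (2 * A)
    return xm, a + b * xm, xp, a + b * xp

def pythagoras(xa, ya, xb, yb):
    """Distance between points a and b."""
    return math.hypot(xb - xa, yb - ya)

x1, y1, d1 = 2.4, 1.8, 4.1
x2, y2, d2 = 1.9, 1.9, 3.8
xm, ym, xp, yp = solve_xy(x1, y1, x2, y2, d1, d2)

# Both candidates should reproduce d1 from point 1 and d2 from point 2
for label, (x, y) in (("plus", (xp, yp)), ("minus", (xm, ym))):
    print(label, "distance to point 1:", pythagoras(x1, y1, x, y))
    print(label, "distance to point 2:", pythagoras(x2, y2, x, y))
```

If the solver is correct, both candidates print distances that match d1 and d2 up to floating-point rounding, which is why a third reference point is still needed to pick between them.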
Now, to properly call this "triangulation" to locate a single point, we need a third point as a reference. We need not specify the distance from the unknown point to this third; we merely choose the resulting point that is closer to this third reference point.
Write a function called triangulate that takes as arguments the potential coordinates of the unknown point (xp, yp) and (xm, ym) followed by the coordinates of the reference point (x3, y3) and returns the coordinates of the point closest to the third point.
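One possible shape for such a function, as a sketch only. The candidate points and the reference point in the usage line are hypothetical values chosen for illustration, and resolving a tie in favor of (xp, yp) is an arbitrary choice made here, not something the text specifies.

```python
import math

def triangulate(xp, yp, xm, ym, x3, y3):
    """Return whichever candidate point is closer to the reference point (x3, y3)."""
    dist_p = math.hypot(xp - x3, yp - y3)
    dist_m = math.hypot(xm - x3, ym - y3)
    # A tie is resolved in favor of (xp, yp); this is an arbitrary convention
    return (xp, yp) if dist_p <= dist_m else (xm, ym)

# Hypothetical candidates and reference point, for illustration only
print(triangulate(1.0, 5.0, -1.0, -1.0, 0.0, 4.0))
```

Only the relative distances matter, so the distance from the unknown point to the reference point never has to be known.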

Applying a product of sequences to a column for each row
I'm working with a test dataset for which I want to predict the probability of false negatives using the following equation:
y = ∏ij (1 - d)
Where y = the probability of a false negative, i = 1, j = the number of samples (obtained from the sample column in the dataset below: 1, 2, or 3), and d = the proportion of PCR replicates that amplified (the PCR detection proportion).
    site    sample  sample.volume..L.  pcr1  pcr2  pcr3  pcr4  pcr5  pcr6  d.prop  f.negatives
    pond 1  1       2                  1     1     1     0     1     1     0.83    0.167
    pond 1  2       2                  1     1     0     0     1     1     0.67    0.333
    pond 1  3       2                  0     0     1     1     1     1     0.67    0.333
I calculated d.prop using the following code:
    testdf$detection.proportion <- length(subset(testdf, select = c(pcr1, pcr2, pcr3, pcr4, pcr5, pcr6)))
    p.detection <- rowSums(data[, -c(1, 2, 3, 10)] == "1")
    testdf$detection.proportion <- p.detection/testdf$detection.proportion
Basically, for each row I computed (number of 1s) / (number of 1s and 0s). This is because d.prop is the d in the equation.
For f.negatives (which is y, which is what I'm looking for) I have only managed to do the (1 - d) part of the equation using this:
    testdf$false.negatives <- (1 - testdf$detection.proportion)
I need to know how to do the ∏ij part of the equation. This basically means for each row there will be a different ∏ij value:
If sample = 1 then ∏ij = 1*1 = 1
If sample = 2 then ∏ij = 1*2 = 2
If sample = 3 then ∏ij = 1*2*3 = 6
I'm aware of:
    ij <- cumprod(1:3)
but I don't know how I would apply this to each row, with a different j value (based on the sample column).

Proportion across rows and specific columns in R
I want to get a proportion of pcr detection and create a new column using the following test dataset. I want a proportion of detection for each row, only in the columns pcr1 to pcr6. I want the other columns to be ignored.
    site    sample  pcr1  pcr2  pcr3  pcr4  pcr5  pcr6
    pond 1  1       1     1     1     0     1     1
    pond 1  2       1     1     0     0     1     1
    pond 1  3       0     0     1     1     1     1
I want the output to create a new column with the proportion detection. The dataset above is only a small sample of the one I am using. I've tried:
    data$detection.proportion <- rowMeans(subset(testdf, select = c(pcr1, pcr2, pcr3, pcr4, pcr5, pcr6)), na.rm = TRUE)
This works for this small dataset, but on my larger one it gave incorrect proportions. What I'm looking for is a way to count all the 1s from pcr1 to pcr6 and divide them by the total number of 1s and 0s (which I know is 6, but I would like R to work this out in case a value is missing).

What to do when the "which" function doesn't find the value?
I get this error when the which function cannot find the value. I want it to simply return a value indicating that nothing was found. How would I do that? Also, I'd like to use a for loop to iterate through each variable in the data frame; how would I look at each column individually? I just need to know how to access the columns or rows of the matrix. I'm good with loops (I've been programming for years), just a little new to R. Thank you!
    Day1 = c("S", "Be", "N", "S", "St")
    Day2 = c("S", "S", "M", "Ta", "Sa")
    Day3 = c("S", "Ba", "E", "Te", "U")
    Day4 = c("V")
    Week = data.frame(Day1, Day2, Day3, Day4)
    print(Week)
    n = which(Week$Day4 == "S")
    if (n[1] == 1) {
        print("true")
    } else {
        print("false")
    }

How to implement a Multivariate Exponentially Weighted Moving Average (MEWMA) using Python?
I am doing a project where we continuously monitor a person's actions and detect when the person makes a fall-like movement. We train our classifier on the variation in the frame caused by the person's movement. This is done by dividing the frame into 5 sectors and calculating the variation in each of them. That gives us 5 variables for EWMA, so we need MEWMA. How can this be implemented?
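One way this could be approached is the standard MEWMA control chart: smooth the vector of sector variations with Z_t = λ(X_t - μ) + (1 - λ)Z_{t-1}, then compute T²_t = Z_tᵀ Σ_Z⁻¹ Z_t with Σ_Z = λ/(2 - λ) · [1 - (1 - λ)^{2t}] · Σ, and raise an alarm when T² exceeds a chosen limit. The sketch below uses only the standard library. The function names, λ = 0.2, and the synthetic 5-variable data are assumptions for illustration; the alarm threshold is left to the user because it depends on the number of variables, λ, and the false-alarm rate that is acceptable.

```python
import random

def mean_vector(rows):
    """Column means of a list of observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of in-control data."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [v - m for v, m in zip(r, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def mewma_t2(observations, mu, sigma, lam=0.2):
    """Return the MEWMA T^2 statistic for each observation vector."""
    z = [0.0] * len(mu)
    stats = []
    for t, x in enumerate(observations, start=1):
        # Exponentially weighted average of the centered observations
        z = [lam * (xi - mi) + (1 - lam) * zi for xi, mi, zi in zip(x, mu, z)]
        # Time-dependent covariance of z
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        sigma_z = [[scale * s for s in row] for row in sigma]
        w = solve(sigma_z, z)  # w = sigma_z^{-1} z
        stats.append(sum(zi * wi for zi, wi in zip(z, w)))
    return stats

# Synthetic stand-in for the 5 sector variations (not real data)
rng = random.Random(0)
in_control = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
mu = mean_vector(in_control)
sigma = covariance(in_control)

# 20 ordinary frames followed by 20 frames with a large shift in every sector
stream = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
stream += [[rng.gauss(3, 1) for _ in range(5)] for _ in range(20)]
stats = mewma_t2(stream, mu, sigma, lam=0.2)
print("largest T^2 before the shift:", max(stats[:20]))
print("largest T^2 after the shift: ", max(stats[20:]))
```

In practice μ and Σ would be estimated from frames known to contain normal movement, and each new frame's T² would be compared against the chosen limit. A smaller λ reacts more slowly but is more sensitive to small sustained shifts.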

How do you set repeated measures in a mixed model in R?
I am trying to create a mixed model in R with repeated measures based on SAS code. I would like to get the same result as in SAS so that I can fully understand what the R code is doing and what each argument means.
Here is the SAS code:
    proc import datafile= "\\vmwarehost\Shared Folders\Desktop\Trials.xlsx"
        out=P_Mating dbms=xlsx replace;
        sheet = 'Data2';
    run;

    Proc mixed data=P_Mat;
        class ID Trl M_ID;
        model CIs = DS M_ID Rd T2;
        Repeated / subject=ID type=un;
    run;
I originally wrote this in R:
    library(nlme)
    mixed <- lme(CIs ~ DS + M_ID + Rd + T2, random = list(~1|ID), data = P_Mat)
    summary(mixed)
And I think the results actually make sense but they are not the same as the results from SAS.
Here is the data: P_Mat
    Rd  Trl  DS       ID   Cs  ACs  CIs  LnCI         T2  H2   Dom  M_ID
    1   1    282.237  P31  1   4    5    1.791759469  75  94   7    61
    1   2    220.024  P27  0   1    1    0.693147181  89  48   0    53
    1   3    179.249  P5   5   5    10   2.397895273  77  84   8    51
    1   4    174.153  P25  3   0    3    1.386294361  79  77   1    60
    1   5    151.749  P58  2   0    2    1.098612289  72  94   14   53
    1   6    64.449   P33  4   1    5    1.791759469  71  96   10   61
    1   7    40.749   P39  3   4    7    2.079441542  74  91   14   44
    1   8    0.499    P26  1   1    2    1.098612289  78  76   19   46
    2   9    174.153  P25  0   3    3    1.386294361  76  87   1    53
    2   10   151.749  P58  7   0    7    2.079441542  79  82   8    61
    2   11   61.110   P24  3   6    9    2.302585093  74  87   4    51
    2   12   0.499    P26  0   0    0    0            79  82   5    53
    2   13   220.024  P27  4   5    9    2.302585093  73  100  27   61
    2   14   282.237  P31  1   1    2    1.098612289  73  96   21   60
    2   15   21.262   P23  1   5    6    1.945910149  75  94   1    51
    2   16   179.249  P5   6   0    6    1.945910149  79  84   15   61
    1   1    26.001   P19  1   1    2    1.098612289  75  94   0    61
    1   2    40.251   P10  2   3    5    1.791759469  89  48   0    53
    1   3    42.501   P36  2   2    4    1.609437912  77  84   0    51
    1   4    61.110   P24  2   0    2    1.098612289  79  77   0    60
    1   5    93.501   P43  1   0    1    0.693147181  72  94   0    53
    1   6    149.75   P56  2   1    3    1.386294361  71  96   0    61
    1   7    180.25   P18  0   0    0    0            74  91   0    44
    1   8    227.70   P13  2   1    3    1.386294361  78  76   0    46
    2   9    149.75   P56  1   1    2    1.098612289  76  87   0    53
    2   10   26.001   P19  3   0    3    1.386294361  79  82   0    61
    2   11   40.749   P39  0   3    3    1.386294361  74  87   0    51
    2   12   93.501   P43  2   3    5    1.791759469  79  82   0    53
    2   13   180.25   P18  0   1    1    0.693147181  73  100  0    61
    2   14   42.50    P36  4   0    4    1.609437912  73  96   0    60
    2   15   227.70   P13  0   1    1    0.693147181  75  94   0    51
    2   16   64.449   P33  1   1    2    1.098612289  79  84   0    61
I have looked at other similar questions: Converting Repeated Measures mixed model formula from SAS to R
Convert mixed model with repeated measures from SAS to R
Fitting repeated measures in R
But I am still feeling lost. Sorry, I am still very new to all of this (as I am sure is obvious), so I am having trouble understanding what each argument means based on the nlme help file and vignettes, and also when to use "*" versus "+" in this code.
My repeated measures are "ID" and "DS" columns (because some of those appear twice in the data/it's the same individual (ID) who has the same rank/hierarchy score (DS)) and I am looking at CIs as a function of DS, M_ID, Rd and T2 to see which of those has the strongest relationship (or which have relationship at all) with CIs.
I am also trying to understand how to plot a mixed model and how that differs from plotting an lm fit (I tried abline and it didn't work):
    plot(CIs ~ DS, data = P_Mat)
    abline(mixed)
I got the following error:
    Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
      (list) object cannot be coerced to type 'double'
    In addition: Warning message:
    In abline(mixed) : only using the first two of 4 regression coefficients
Any ideas/guidance you can offer will be very much appreciated! Thank you!

How do I convert code for a mixed model with repeated measures from SAS to R?
I do not know SAS and I am just starting to understand the mixed model code in R. I have this SAS code and need to do the same thing in R so I can work with it.
Here is the code:
    proc import datafile= "\\vmwarehost\Shared Folders\Desktop\Trials.xlsx"
        out=Mat dbms=xlsx replace;
        sheet = 'Data2';
    run;

    Proc mixed data=Mat;
        class Bird_Metal Trial Male_ID;
        model CourtshipInteractions = Bird_DS Male_ID Round Temp2;
        Repeated / subject=Bird_Metal type=un;
    run;
What I think this means: read in the data (I'm not sure about the rest of that first chunk).
Use a mixed model with data=Mat. Consider "Bird_Metal", "Trial", and "Male_ID" as factors that may influence the result? I'm not sure about that part. Model the impact of Bird_DS, Male_ID, Round, and Temp2 on CourtshipInteractions with Bird_Metal as a repeated measure. I'm not sure what type=un means.
Any ideas at all on how to do this in R would be very much appreciated! I have been struggling with trying to understand how to code mixed models with repeated measure in R and now I'm not sure what this SAS code is doing exactly. Thanks so much!