How to make multiple bar graphs for factors in R
I would love to make a figure like the one I have for my numeric features:
hist(df[ , purrr::map_lgl(df, is.numeric)])
If I try to do the same thing with factors
hist(df[ , purrr::map_lgl(df[,interest_factors], is.factor)])
I get an error.
Any suggestions? I just want to view them quickly.
Thanks
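For what it's worth, a minimal sketch of one way to get quick bar plots for every factor column. The data frame `df` here is made-up stand-in data, not the asker's:

```r
# Quick bar plots for every factor column of a data frame.
# `df` below is hypothetical example data.
df <- data.frame(
  cyl  = factor(c(4, 6, 4, 8, 6, 4)),
  gear = factor(c(3, 4, 4, 3, 5, 4)),
  mpg  = c(21, 22, 23, 19, 30, 27)
)

factor_cols <- names(df)[vapply(df, is.factor, logical(1))]
op <- par(mfrow = c(1, length(factor_cols)))  # one panel per factor
for (nm in factor_cols) {
  barplot(table(df[[nm]]), main = nm)  # counts of each level
}
par(op)
```

`barplot(table(x))` is the factor analogue of `hist()` here: `table()` counts the levels and `barplot()` draws one bar per level.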
See also questions close to this topic

R regression: Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :undefined columns selected
I've been trying to build regression models in different ways, and I was trying out this code using the caret package:
library(caret)
set.seed(222)
ind <- sample(2, nrow(model2), replace = T, prob = c(0.7, 0.3))
train <- model2[ind == 1, ]
test <- model2[ind == 2, ]
custom <- trainControl(method = "repeatedcv",
                       number = 6,
                       repeats = 6,
                       verboseIter = T)
lm <- train(train$SS ~ ., train,
            method = 'lm',
            trControl = custom)
lm$results
But I kept receiving this error note:
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :undefined columns selected
Here is the str() of my data set:

> str(train)
'data.frame': 19 obs. of 15 variables:
 $ SST     : num 0 0 0 0 1 0 0 0 0 1 ...
 $ SSA     : num 0 1 0 0 0 0 0 1 0 0 ...
 $ SSR     : num 0 0 0 1 0 0 0 0 0 0 ...
 $ SSC     : num 0 0 0 0 0 0 0 0 0 1 ...
 $ SSF     : num 1 1 1 0 1 1 1 1 1 1 ...
 $ SSS     : num 0 0 0 1 0 0 0 0 0 0 ...
 $ SST     : num 1 1 1 0 1 1 1 1 1 1 ...
 $ SSH     : num 1 1 1 0 1 1 1 1 1 1 ...
 $ SSC     : num 1 1 1 0 1 1 1 1 1 1 ...
 $ SSW     : num 0 0 0 0 0 0 0 0 0 0 ...
 $ QTY     : num 45 45 49 13 48 109 45 42 45 31 ...
 $ SS      : num 470000 550000 460000 630000 1060000 530000 480000 510000 460000 630000 ...
 $ BASE..SS: num 6.27e+09 6.67e+09 6.14e+09 8.54e+09 1.43e+10 ...
 $ Ex      : num 13341 13341 13341 13341 13341 ...
 $ TPH.    : num 45 45 45 90 95 65 45 45 45 45 ...
Any kind of help will be greatly appreciated. Thank you for taking the time to look at this question!
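For context on where this error message can come from (an observation about R formula handling, not a confirmed diagnosis of the code above): all.vars() on a formula like train$SS ~ . reports the data-frame name itself as a variable, and caret subsets the supplied data by those names, so a name that is not a column can trigger "undefined columns selected". A small base-R sketch:

```r
# Formulas are not evaluated when created, so the `$` extraction survives
# into the formula's parse tree and all.vars() sees "train" as a variable.
f_bad  <- train$SS ~ .   # creating this does not evaluate `train`
f_good <- SS ~ .

all.vars(f_bad)   # includes "train", which is not a column name
all.vars(f_good)  # just "SS" (plus ".")
```

This is why caret's formula interface is normally given bare column names (SS ~ .) together with a data argument.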

How to execute a function without argument using lapply
I have the following functions:
set.seed(1)

make_seq <- function() {
  paste0(sample(LETTERS, size = 30, replace = TRUE), collapse = "")
}

make_seq()
#> [1] "GJOXFXYRQBFERJUMSZJUYFQDGKAJWI"
It takes no argument and spits out a sequence.
What I want to do is to compactly create 100 sequences with the above function using lapply. But why did this fail?

> lapply(1:100, make_seq())
Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'GJOXFXYRQBFERJUMSZJUYFQDGKAJWI' of mode 'function' was not found
What's the right way to do it?
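A sketch of the usual pattern: pass the function object (or wrap the call), not the result of calling it. Writing make_seq() evaluates the function immediately, so lapply receives the resulting string and tries to use it as a function, hence the error above.

```r
set.seed(1)
make_seq <- function() {
  paste0(sample(LETTERS, size = 30, replace = TRUE), collapse = "")
}

# replicate() re-evaluates the expression, so each element is a fresh sequence
seqs <- replicate(100, make_seq())

# equivalently with lapply: wrap the zero-argument call so the index is ignored
seqs2 <- lapply(1:100, function(i) make_seq())

length(seqs)     # 100
nchar(seqs[1])   # 30
```

Note that lapply(1:100, make_seq) without the wrapper would also fail, because lapply would call make_seq(1) and make_seq takes no arguments.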

ggplot change color of one bar from stacked bar chart
Is there a way to manually change the color of one bar (a single x value) in ggplot?
data
for_plot_test = structure(list(
    name = c("A", "B", "C", "A1", "A2", "A3", "A4", "BI",
             "A", "B", "C", "A1", "A2", "A3", "A4", "BI"),
    n = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L,
          2L, 4L, 6L, 8L, 10L, 12L, 14L, 16L),
    value = c(0, 0.05, 0, 0.05, 0.05, 0.1, 0.05, 0,
              1, 0.7, 0.6, 0.5, 0.4, 0.2, 0.2, 0.1),
    variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                           2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                         .Label = c("PROGRESS", "prev_progress"),
                         class = "factor")),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -16L), vars = "name",
  labels = structure(list(
      name = c("Applications", "BI", "Clients", "CRE & Scoring",
               "Portfolio & Production", "SG Russia", "Transactions",
               "УКЛ & Prescoring")),
    row.names = c(NA, -8L), class = "data.frame", vars = "name",
    drop = TRUE,
    indices = list(0:1, 14:15, 6:7, 10:11, 2:3, 12:13, 8:9, 4:5),
    group_sizes = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    biggest_group_size = 2L, .Names = "name"),
  indices = list(c(0L, 8L), c(7L, 15L), c(3L, 11L), c(5L, 13L),
                 c(1L, 9L), c(6L, 14L), c(4L, 12L), c(2L, 10L)),
  drop = TRUE, group_sizes = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  biggest_group_size = 2L,
  .Names = c("name", "n", "value", "variable"))
Current plot
colot_progress <- c("#be877a", "#dcbfad")
s <- ggplot(for_plot_test,
            aes(x = reorder(name, n), y = value, fill = variable,
                label = ifelse(for_plot_test$value == 0, "",
                               scales::percent(for_plot_test$value)))) +
  geom_bar(stat = 'identity', position = "stack") +
  scale_fill_manual(values = colot_progress, aesthetics = "fill") +
  coord_flip() +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text.x = element_blank(),
    panel.grid = element_blank(),
    legend.position = "none"
  ) +
  geom_text(size = 5, position = position_stack(vjust = 0.5))
s
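One common pattern for recoloring a single bar (a generic sketch with made-up data, assuming ggplot2 is available; it is not wired to the dput above): derive a helper fill variable that singles out the bar, then map it through scale_fill_manual().

```r
library(ggplot2)

# Hypothetical example data, not the asker's
d <- data.frame(name = c("A", "B", "C"), value = c(3, 5, 2))

# single out one x value with a helper column
d$highlight <- ifelse(d$name == "B", "picked", "other")

p <- ggplot(d, aes(x = name, y = value, fill = highlight)) +
  geom_col() +
  scale_fill_manual(values = c(picked = "#be877a", other = "#dcbfad")) +
  theme_minimal()
```

For a stacked chart the same idea applies, with the helper column built from both the x variable and the stacking variable (e.g. via interaction()).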

Scipy poisson distribution with an upper limit
I am generating a random number using scipy stats. I used the Poisson distribution. Below is an example:
import scipy.stats as sct

A = 2.5
Pos = sct.poisson.rvs(A, size = 20)
When I print Pos, I got the following numbers:
array([1, 3, 2, 3, 1, 2, 1, 2, 2, 3, 6, 0, 0, 4, 0, 1, 1, 3, 1, 5])
You can see from the array that some numbers, such as 6, are generated.
What I want is to limit the largest number (say, to 5), i.e. any random number generated using sct.poisson.rvs should be less than or equal to 5.
How can I tweak my code to achieve this? By the way, I am using this in a Pandas DataFrame.
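A sketch of one way to impose the cap, under one reading of the intent: redraw any values above the cap, which keeps the relative Poisson probabilities over 0..5 (a truncated Poisson). That is different from clipping, which would pile all the excess mass onto 5.

```python
import numpy as np
import scipy.stats as sct

def truncated_poisson(mu, cap, size, seed=None):
    """Draw Poisson(mu) samples, redrawing any value above `cap`."""
    rng = np.random.default_rng(seed)
    out = sct.poisson.rvs(mu, size=size, random_state=rng)
    while True:
        too_big = out > cap
        if not too_big.any():
            return out
        # redraw only the offending entries
        out[too_big] = sct.poisson.rvs(mu, size=too_big.sum(),
                                       random_state=rng)

pos = truncated_poisson(2.5, cap=5, size=20, seed=0)
# every value is now in 0..5
```

The resulting array can be assigned straight into a DataFrame column, e.g. df["n"] = truncated_poisson(2.5, 5, len(df)).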

Why do some analysts use binomial distribution to describe click through rates/probabilities?
The binomial distribution assumes that p is the same for every individual event, but the probability of clicking is different for every user. Isn't that a contradiction?
How could I apply the Poisson distribution and a z-test to A/B testing with the average number of orders per user as the metric? Is it possible to break a Poisson down into binomials in this kind of experiment?

Summary Statistics in JMP
New to JMP for statistical analysis. When you run a distribution (Analyze > Distribution), it usually gives you a histogram of the data and a list of summary statistics. The summary statistics are no longer appearing; instead, only the histogram and a frequency chart appear. I reset my summary statistics to the default and that didn't work. The data in the columns is set to numeric. Not sure what's going on. Thanks in advance for your responses.

NMDS and adonis
I know people usually run the adonis function on the original community matrix against an environmental matrix, or against a matrix of site parameters for the community. But I am wondering whether it would be OK to run the original community matrix against a matrix that is a consolidated form of the original (for example, collapsing plant guilds), in order to determine what in the original community matrix is causing the differences seen in the NMDS.

Example: I had plant community composition data that I used to plot the NMDS. I then created a second matrix that consolidated groups (Native Forbs, Exotic Shrubs, Native Grasses, etc.) originating from the original community matrix. To me it makes sense, but I am not sure, because it technically uses the same data set. But I am not trying to figure out whether separate site characteristics are causing this, just why the NMDS plots differ. I know it is circular thinking, but is it acceptable?
Here is the plot: Plot output
I can post the data set if it is helpful, but I figured this was more of a theoretical question
YLRveg2018 <- ddply(VegComp, c("Plot"), summarise,
                    Avena.fatua = sum(Avena.fatua),
                    Bromus.diandrus = sum(Bromus.diandrus),
                    Bromus.hordeaceous = sum(Bromus.hordeaceous),
                    Festuca.muyros = sum(Festuca.muyros),
                    Festuca.perennis = sum(Festuca.perennis),
                    Carduus.pycnocephalus = sum(Carduus.pycnocephalus),
                    Cirsium.vulgare = sum(Cirsium.vulgare),
                    Erodioum.cicatarium = sum(Erodioum.cicatarium),
                    Geranium.dissectum = sum(Geranium.dissectum),
                    Helminthotheca.echiodies = sum(Helminthotheca.echiodies),
                    Lactuca.seriola = sum(Lactuca.seriola),
                    Medicago.polymorpha = sum(Medicago.polymorpha),
                    Oxalis.pes.capre = sum(Oxalis.pes.capre),
                    Raphanus.sativus = sum(Raphanus.sativus),
                    Senecio.vulgare = sum(Senecio.vulgare),
                    Sonchus.oleraceous = sum(Sonchus.oleraceous),
                    Vicia.sativa = sum(Vicia.sativa),
                    Artemisia.californica = sum(Artemisia.californica),
                    Baccharis.pilularis = sum(Baccharis.pilularis),
                    Ericameria.ericoides = sum(Ericameria.ericoides),
                    Mimulus.aurantiacus = sum(Mimulus.aurantiacus),
                    Bromus.carinatus = sum(Bromus.carinatus),
                    Elymus.triticoides = sum(Elymus.triticoides),
                    Hordeum.Brachyantherum = sum(Hordeum.Brachyantherum),
                    Stipa.pulchra = sum(Stipa.pulchra),
                    Achillea.millefolium = sum(Achillea.millefolium),
                    Eschscholzia.californica = sum(Eschscholzia.californica),
                    Lupinus.variicolor = sum(Lupinus.variicolor),
                    Echium.sp = sum(Echium),
                    Bareground = sum(Bare),
                    Thatch = sum(Thatch))

Guilds2018 <- ddply(GuildComp, c("Plot"), summarise,
                    Thatch.Depth = mean(Thatch.depth),
                    Thatch = sum(Thatch),
                    Bareground = sum(Bare),
                    OverallPlantCover = sum(VegCover),
                    ExoticGrasses = sum(ExoGrass),
                    ExoticForbs = sum(ExoForb),
                    ExoticShurbs = sum(ExoShrub),
                    NativeGrasses = sum(NativeGrass),
                    NativeForbs = sum(NativeForb),
                    NativeShrubs = sum(NativeShrub))
adonis(Veg2018 ~ NativeGrasses+Bareground+ExoticGrasses+Thatch+Thatch.Depth+ExoticForbs+NativeForbs+NativeShrubs, permutations = 999, distance = "bray", data=PlantGuilds)

Adjusting values in a data frame such that specific conditions are met
I'm working on designing a simulation for my dissertation and could use some help with a current problem. I'm trying to adjust the first four values in column 2 so that each new value is .01, and then have the corresponding values in the first column absorb what was removed from the second column, so that the communalities remain constant. The communality for each row is calculated as

comm1 <- apply(lambda^2, 1, sum)

Additionally, no value in the data frame can be greater than 1 (or less than -1), so the distribution of the values being transferred to column 1 would have to be adjusted to compensate for this. Please let me know if I can make things clearer anywhere, and thank you for the help.
Here's an example of the data frame (lambda) that I'm working with including how it was generated.
numfac <- 4
varfac <- 4
size_specific <- .6
size_g <- .4

## Generate true population IC structure
# Generate empty lambda matrix; dimensions controlled by condition
lambda <- matrix(0, nrow = numfac*varfac, ncol = numfac+1)

# Generate specific factor loadings
i <- 2
for (i in 2:(numfac+1)) {
  loadings <- seq(size_specific-.1, size_specific+.1, by = (.2/(varfac-1))) # from .3 to .5, from .4 to .6, from .5 to .7
  # Write specific factor loadings to lambda matrix
  lambda[((i-2)*varfac+1):((i-1)*varfac), i] <- loadings
}

# Generate general factor loadings and write to lambda matrix
lambda[,1] <- sample(seq(size_g-.1, size_g+.1, by = (.2/(numfac*varfac-1)))) # from .3 to .5, from .4 to .6, from .5 to .7

lambda
            [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 0.3933333 0.5000000 0.0000000 0.0000000 0.0000000
 [2,] 0.3533333 0.5666667 0.0000000 0.0000000 0.0000000
 [3,] 0.3133333 0.6333333 0.0000000 0.0000000 0.0000000
 [4,] 0.4866667 0.7000000 0.0000000 0.0000000 0.0000000
 [5,] 0.3400000 0.0000000 0.5000000 0.0000000 0.0000000
 [6,] 0.4333333 0.0000000 0.5666667 0.0000000 0.0000000
 [7,] 0.3266667 0.0000000 0.6333333 0.0000000 0.0000000
 [8,] 0.3666667 0.0000000 0.7000000 0.0000000 0.0000000
 [9,] 0.4733333 0.0000000 0.0000000 0.5000000 0.0000000
[10,] 0.4600000 0.0000000 0.0000000 0.5666667 0.0000000
[11,] 0.4466667 0.0000000 0.0000000 0.6333333 0.0000000
[12,] 0.4066667 0.0000000 0.0000000 0.7000000 0.0000000
[13,] 0.3000000 0.0000000 0.0000000 0.0000000 0.5000000
[14,] 0.3800000 0.0000000 0.0000000 0.0000000 0.5666667
[15,] 0.5000000 0.0000000 0.0000000 0.0000000 0.6333333
[16,] 0.4200000 0.0000000 0.0000000 0.0000000 0.7000000
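A sketch of the constant-communality adjustment itself (my reading of the intent, on a small made-up two-column lambda rather than the full matrix above): if a row's column-2 loading shrinks to .01, column 1 can absorb exactly the removed variance via a square root, since the communality is a sum of squares.

```r
# Keep each row's communality (sum of squared loadings) constant while
# shrinking the first four column-2 loadings to .01.
lambda <- matrix(c(0.39, 0.50,
                   0.35, 0.57,
                   0.31, 0.63,
                   0.49, 0.70), ncol = 2, byrow = TRUE)
comm_before <- rowSums(lambda^2)

rows <- 1:4
new2 <- 0.01
# column 1 absorbs the variance removed from column 2
lambda[rows, 1] <- sqrt(lambda[rows, 1]^2 + lambda[rows, 2]^2 - new2^2)
lambda[rows, 2] <- new2

comm_after <- rowSums(lambda^2)
range(lambda)  # loadings must stay within [-1, 1]
```

With larger loadings the sqrt() step could push column 1 past 1, which is where redistributing the excess across additional columns would come in, as the question anticipates.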

Perform ttest R two datasets and multiple variables
I have two data sets (two different countries), and for each data set I have 3 variables: 2 independent (year and car model) and 1 dependent (sales).
To make it more visual it would look something like this:
Country 1
Car_Model  Year  Sales
A          1     100
A          2     200
B          1     80
B          2     90
C          1     66
C          2     20
Then for the second data set:
Country 2
Car_Model  Year  Sales
A          1     120
A          2     220
B          1     82
B          2     92
C          1     62
C          2     22
I have been asked to perform a t-test to check whether there is a significant difference between the countries, taking the variables car_model and year into consideration. However, I cannot work out whether a t-test is possible for such a problem, which kind of t-test I should perform, or how I should use the variables.
I think the idea here is to check for a significant difference between model A in country 1 and model A in country 2, then model B in country 1 against model B in country 2, and so on, and at the end to get a p-value based on those previous p-values. Is it possible to do this using a t-test in R?
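One reasonable reading of the setup (a sketch, not a definitive prescription): pair each (Car_Model, Year) cell across the two countries and run a single paired t-test on sales, so the car-model and year structure is accounted for by the pairing itself.

```r
# Hypothetical data mirroring the tables above
c1 <- data.frame(Car_Model = rep(c("A", "B", "C"), each = 2),
                 Year  = rep(1:2, times = 3),
                 Sales = c(100, 200, 80, 90, 66, 20))
c2 <- data.frame(Car_Model = rep(c("A", "B", "C"), each = 2),
                 Year  = rep(1:2, times = 3),
                 Sales = c(120, 220, 82, 92, 62, 22))

# pair the cells on (Car_Model, Year), then compare sales across countries
m <- merge(c1, c2, by = c("Car_Model", "Year"), suffixes = c("_c1", "_c2"))
res <- t.test(m$Sales_c1, m$Sales_c2, paired = TRUE)
res$p.value
```

The per-model alternative the question describes (one t-test per car model, then combining) is also possible, but combining separate p-values into one is less standard than testing the paired differences directly; if it were done per model, the p-values would need a multiplicity correction such as p.adjust().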