Overlay a normal distribution on a histogram of non-normally distributed values in ggplot (R)
I'm trying to overlay a normal bell curve on top of the histogram of these fake data that are intentionally NOT normally distributed. My goal is to show other students how non-normally distributed data look in comparison to a normal distribution.
While I have figured out how to add the bell curve from other questions that have been asked, my y-axis is acting strangely. For a density plot, I would assume that the axis would go from 0 to 1, but for some values, it says the density is 2 (see the screenshot below). I want bars that show the density and a bell curve that shows the normal distribution. Any help would be appreciated!
Here's the fake dataset:
library(dplyr)
tester2 <- tibble(
fake = c(2, 2, 2, 2, 10, 10, 10, 10, 5, 3, 4, 5, 6, 7, 8, 9, 10, 10, 5, 2, 4, 5, 6, 7, 8, 4, 4, 5, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 3, 4, 5, 5, 5, 5, 5, 4, 6, 5),
also_fake = c(1, 2, 2, 2, 3, 3, 3.3, 4, 4, 5, 1, 2, 2, 2, 3, 3.6, 3, 4, 4, 5, 1, 2, 2, 2.1, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3.1, 3, 3, 4.6, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)
)
Here's my code so far:
library(ggplot2)
testing <- ggplot(tester2, aes(x = also_fake)) +
  geom_histogram(aes(y = ..density..)) +
  geom_rug() +
  stat_function(fun = dnorm,
                color = "blue",
                args = list(mean = mean(tester2$also_fake),
                            sd = sd(tester2$also_fake)))
And here's what it produces:
EDIT: This question is different from this question because I do not want a density plot: Superimpose a normal distribution to a density using ggplot in R
It is also different from this question because my values are intentionally non-normally distributed: ggplot2: histogram with normal curve.
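On the y-axis concern in this question: a density scale is not capped at 1. Each bar's height is count / (n * bin width), so it is the bar areas, not the bar heights, that sum to 1, and bars narrower than 1 unit can rise above 1. A minimal base-R sketch of that arithmetic, assuming a shortened stand-in for also_fake and an illustrative bin width of 0.25 (neither is taken from the plot in the question):

```r
# Stand-in data: the first ten values of also_fake from the question.
x <- c(1, 2, 2, 2, 3, 3, 3.3, 4, 4, 5)

# Illustrative bin width (an assumption, not ggplot2's default).
binwidth <- 0.25
breaks <- seq(0.5, 5.5, by = binwidth)

# hist() reports density as count / (n * bin width).
h <- hist(x, breaks = breaks, plot = FALSE)

tallest_bar <- max(h$density)                  # can exceed 1 for narrow bins
total_area <- sum(h$density * diff(h$breaks))  # the areas sum to 1

print(tallest_bar)
print(total_area)
```

Because dnorm is also a density, the bars and the curve share the same scale here, which is what makes the overlay comparable.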
See also questions close to this topic

Warnings when restoring graphical parameters
I am writing my first R package and am currently working on a function that makes a plot using some particular graphical parameters. I want the user-defined graphical parameters to be restored after the plot is made, but I always get the same warning messages:
opar <- par()
par(oma = c(5, 4, 0, 0) + 0.1, mar = c(0, 0, 1, 1) + 0.1)
par(opar)
Warning messages:
1: In par(opar) : graphical parameter "cin" cannot be set
2: In par(opar) : graphical parameter "cra" cannot be set
3: In par(opar) : graphical parameter "csi" cannot be set
4: In par(opar) : graphical parameter "cxy" cannot be set
5: In par(opar) : graphical parameter "din" cannot be set
6: In par(opar) : graphical parameter "page" cannot be set
Is there a better way of doing that? I know the suppressWarnings() function, but 1. I don't want the messages to be hidden and 2. if the function is called twice, a warning message appears:
> There were 12 warnings (use warnings() to see them)
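The warnings arise because par() with no arguments also returns read-only parameters (such as cin and din) that cannot be set again. One common way around this, sketched below, is to keep the value that par() returns when it sets parameters: that list holds only the previous values of the parameters being changed. The null PDF device and the placeholder plot are assumptions added so the sketch runs anywhere without writing a file.

```r
# Throwaway device so the example runs without a screen or an output file.
pdf(NULL)

mar_before <- par("mar")

# When setting parameters, par() invisibly returns their previous values.
opar <- par(oma = c(5, 4, 0, 0) + 0.1, mar = c(0, 0, 1, 1) + 0.1)

plot(1:10)  # placeholder for the package's plotting code

# Restoring only the changed parameters raises no warnings.
par(opar)
mar_after <- par("mar")

invisible(dev.off())
```

Inside a package function, calling on.exit(par(opar)) right after the par() call restores the settings even if the plotting code fails. par(no.readonly = TRUE) is an alternative when every settable parameter should be saved.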

Fail to authenticate BigQuery with R under the bigrquery package
I am trying to use set_service_token in the bigrquery package for non-interactive authentication. Here is my code:
library(bigrquery)
set_service_token("client_secret.json")
But it kept showing the error message below:
Error in read_input(file) :
  file must be connection, raw vector or file path
However, when I simply read the JSON from that path, it works:
lapply(fromJSON("client_secret.json"), names)
$`installed`
[1] "client_id" "project_id" "auth_uri" "token_uri" "auth_provider_x509_cert_url" "client_secret" "redirect_uris"
Can anyone help me with this? Thank you very much!

Plotting multiple graphs in one plot in R using a function
I am trying to draw several graphs in a single plot. At the moment I get 4 separate plots from calling a function. This is my code:
hazard.plot.w2p(beta = beta.spreda, eta = eta.spreda, time = exa1.dat$time, line.colour = "blue")
hazard.plot.w2p(beta = 1.076429, eta = 26.21113, time = exa1.dat$time, line.colour = "blue")
hazard.plot.w2p(beta = 5, eta = 32.97954, time = exa1.dat$time, line.colour = "blue")
hazard.plot.w2p(beta = 2, eta = 32.9795, time = exa1.dat$time, line.colour = "blue")
Here is the function I used to get the output:
hazard.plot.w2p <- function(beta, eta, time, line.colour, nincr = 500) {
  max.time <- max(time, na.rm = F)
  t <- seq(0, max.time, length.out = nincr)
  r <- numeric(length(t))
  for (i in 1:length(t)) {
    r[i] <- failure.rate.w2p(beta, eta, t[i])
  }
  plot(t, r, type = 'l', bty = 'l', col = line.colour, lwd = 2, main = "",
       xlab = "Time", ylab = "Failure rate", las = 1, adj = 0.5,
       cex.axis = 0.85, cex.lab = 1.2)
}
I want to draw all 4 curves in a single plot.
Here is a sample data set:
fail  time
a     4.55
a     4.65
a     5.21
b     3.21
a     1.21
a     5.65
a     7.12
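In base graphics every call to plot() starts a new figure, so a usual pattern is to draw the first curve with plot() and add the others with lines(). The sketch below assumes that failure.rate.w2p() is the two-parameter Weibull hazard, (beta / eta) * (t / eta)^(beta - 1); that function is not shown in the question, so weibull_hazard is a hypothetical stand-in, and the colours are illustrative too. Only the three parameter sets with literal values are used.

```r
# Hypothetical stand-in for failure.rate.w2p(): two-parameter Weibull hazard.
weibull_hazard <- function(beta, eta, t) (beta / eta) * (t / eta)^(beta - 1)

pdf(NULL)  # throwaway device so the sketch runs without a screen

time <- c(4.55, 4.65, 5.21, 3.21, 1.21, 5.65, 7.12)  # sample data from the question
t <- seq(0, max(time), length.out = 500)

# Parameter sets with literal values in the question.
params <- list(c(beta = 1.076429, eta = 26.21113),
               c(beta = 5, eta = 32.97954),
               c(beta = 2, eta = 32.9795))
cols <- c("blue", "red", "darkgreen")  # illustrative colours

# One column of hazard values per parameter set.
rates <- sapply(params, function(p) weibull_hazard(p[["beta"]], p[["eta"]], t))

# First curve sets up the axes; ylim covers every curve so none is clipped.
plot(t, rates[, 1], type = "l", bty = "l", col = cols[1], lwd = 2,
     ylim = range(rates), xlab = "Time", ylab = "Failure rate", las = 1)

# Remaining curves are added to the same axes.
for (j in 2:ncol(rates)) {
  lines(t, rates[, j], col = cols[j], lwd = 2)
}

invisible(dev.off())
```

Computing all curves before plotting makes it easy to choose a shared ylim; adding an `add = FALSE` style argument to the original function that switches between plot() and lines() would be another option.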

Merging multiple lists into a data frame for ggplot2
I am trying to reproduce the below plot:
I could plot each density plot using the following code:
plot_densities2 <- function(density) {
  print(ggplot(data = densities, aes(x = x, y = y, fill = id)) +
    theme_bw() +
    geom_area(alpha = 0.5))
}
filenames <- c("~/sample521.samuniq.sorted.bam", "~/sample524.samuniq.sorted.bam")
for (i in filenames) {
  print(i)
  density <- extracting_pos_neg_reads(i)
  densities <- cbind(rbind(data.frame(density[[1]][1:2]), data.frame(density[[2]][1:2])),
                     id = rep(c("neg", "pos"), each = length(density[[1]]$x)))
  plot_densities2(densities)
}
Unfortunately, I do not know how to add the additional lists to the densities data frame inside the above for loop.
The full code can be found below and the data can be downloaded from here
#apt update && apt install zlib1g-dev
#install if necessary
#source("http://bioconductor.org/biocLite.R")
#biocLite("Rsamtools")
#load library
library(Rsamtools)
#install.packages("ggplot2")
library("ggplot2")

extracting_pos_neg_reads <- function(bam_fn) {
  #read in entire BAM file
  bam <- scanBam(bam_fn)
  #names of the BAM fields
  names(bam[[1]])
  # [1] "qname" "flag" "rname" "strand" "pos" "qwidth" "mapq" "cigar"
  # [9] "mrnm" "mpos" "isize" "seq" "qual"
  #distribution of BAM flags
  table(bam[[1]]$flag)
  #      0       4      16
  #1472261  775200 1652949
  #function for collapsing the list of lists into a single list
  #as per the Rsamtools vignette
  .unlist <- function (x) {
    ## do.call(c, ...) coerces factor to integer, which is undesired
    x1 <- x[[1L]]
    if (is.factor(x1)) {
      structure(unlist(x), class = "factor", levels = levels(x1))
    } else {
      do.call(c, x)
    }
  }
  #store names of BAM fields
  bam_field <- names(bam[[1]])
  #go through each BAM field and unlist
  list <- lapply(bam_field, function(y) .unlist(lapply(bam, "[[", y)))
  #store as data frame
  bam_df <- do.call("DataFrame", list)
  names(bam_df) <- bam_field
  dim(bam_df)
  #[1] 3900410 13
  #
  #use chr22 as an example
  #how many entries on the negative strand of chr22?
  ###table(bam_df$rname == 'chr22' & bam_df$flag == 16)
  #  FALSE  TRUE
  #3875997 24413
  #function for checking negative strand
  check_neg <- function(x) {
    if (intToBits(x)[5] == 1) {
      return(T)
    } else {
      return(F)
    }
  }
  #test neg function with subset of chr22
  test <- subset(bam_df)#, rname == 'chr22')
  dim(test)
  #[1] 56426 13
  table(apply(as.data.frame(test$flag), 1, check_neg))
  #number same as above
  #FALSE  TRUE
  #32013 24413
  #function for checking positive strand
  check_pos <- function(x) {
    if (intToBits(x)[3] == 1) {
      return(F)
    } else if (intToBits(x)[5] != 1) {
      return(T)
    } else {
      return(F)
    }
  }
  #check pos function
  table(apply(as.data.frame(test$flag), 1, check_pos))
  #looks OK
  #FALSE  TRUE
  #24413 32013
  #store the mapped positions on the plus and minus strands
  neg <- bam_df[apply(as.data.frame(bam_df$flag), 1, check_neg), 'pos']
  length(neg)
  #[1] 24413
  pos <- bam_df[apply(as.data.frame(bam_df$flag), 1, check_pos), 'pos']
  length(pos)
  #[1] 32013
  #calculate the densities
  neg_density <- density(neg)
  pos_density <- density(pos)
  #display the negative strand with negative values
  neg_density$y <- neg_density$y * -1
  return (list(neg_density, pos_density))
}

#https://stackoverflow.com/a/53698575/977828
plot_densities2 <- function(density) {
  print(ggplot(data = densities, aes(x = x, y = y, fill = id)) +
    theme_bw() +
    geom_area(alpha = 0.5))
}
filenames <- c("~/josh/sample521.samuniq.sorted.bam", "~/josh/sample524.samuniq.sorted.bam")
for (i in filenames) {
  print(i)
  density <- extracting_pos_neg_reads(i)
  densities <- cbind(rbind(data.frame(density[[1]][1:2]), data.frame(density[[2]][1:2])),
                     id = rep(c("neg", "pos"), each = length(density[[1]]$x)))
  plot_densities2(densities)
}

Trying to remove an axis below the x-axis using ggplot
I am new to ggplot and am trying to plot two lines using it. But my x-axis looks very weird, and now I want to remove it. Here is my code.
ggplot(BJ11, aes(Date, mean, group = 1)) +
  geom_line(aes(color = "stateair daily values")) +
  geom_line(data = bjvalue2, aes(color = "CNEMC values"))
Here are my data:
> head(BJ11)
      Date min max      mean
1 20150101   6 154  54.58333
2 20150102  12 157  63.54167
3 20150103 147 322 209.25000
4 20150104 106 360 201.16667
5 20150105   9 186  90.87500
6 20150106  10 121  43.16667
> head(bjvalue2)
      Date mean
1 20150101   43
2 20150102   52
3 20150103  150
4 20150104  176
5 20150105   92
6 20150106   40
What should I do to remove both the thick black axis above "Date" and the x-axis?

Reorder factor not working on grouped data
I have this dataframe that I call top_mesh_terms:
structure(list(topic = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), term = c("Diabetes Mellitus", "Depression", "Syndrome", "Diabetes Mellitus, Type 2", "Lung Diseases", "Colorectal Neoplasms", "Osteoarthritis", "Sclerosis", "Lymphoma", "Lung Diseases, Obstructive", "Diabetes Mellitus", "Disease", "Hypertension", "Syndrome", "Carcinoma", "Infection", "Coronary Disease", "Lung Neoplasms", "Obesity", "Infarction"), beta = c(0.0196989252285569, 0.018472562347772, 0.0175512616261399, 0.0146680780420432, 0.0133507951269683, 0.01224603797061, 0.0116799262133244, 0.0107893497000735, 0.00926496950657875, 0.00891926541108836, 0.0324598963852768, 0.0198135918084849, 0.0162689075944415, 0.0157166860189554, 0.014855885836076, 0.0127365678034364, 0.0109544570325732, 0.00964124158432716, 0.00956596829604797, 0.00880281359338067)), class = c("tbl_df", "tbl", "data.frame" ), row.names = c(NA, 20L))
As the title suggests, I'd like to reorder the term column by beta and then draw a bar chart. I hoped to see a bar chart with ordered bars, but that is not the case. Here is the code I used and the resulting graph:
top_mesh_terms %>%
  group_by(topic) %>%
  mutate(term = fct_reorder(term, beta)) %>%
  ungroup() %>%
  ggplot(aes(term, beta)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "MeSH Term", y = "Beta")
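A factor column carries a single set of levels for the whole data frame, so a term that appears in both topics (such as "Syndrome") can only hold one position, whatever the grouping. One workaround is to build a key that is unique per topic, order its levels by beta within each topic, and strip the suffix when labelling. The sketch below assumes a six-row excerpt of top_mesh_terms with rounded beta values and a hypothetical "___" separator; the tidytext package offers reorder_within() and scale_x_reordered() built on the same idea.

```r
library(ggplot2)

# Six-row excerpt of top_mesh_terms with rounded beta values (illustrative only).
df <- data.frame(
  topic = c(1L, 1L, 1L, 2L, 2L, 2L),
  term  = c("Diabetes Mellitus", "Depression", "Syndrome",
            "Diabetes Mellitus", "Disease", "Syndrome"),
  beta  = c(0.0197, 0.0185, 0.0176, 0.0325, 0.0198, 0.0157)
)

# Topic-specific key, so that shared terms become distinct factor levels.
df$key <- paste(df$term, df$topic, sep = "___")

# Levels ordered by beta within each topic.
df$key <- factor(df$key, levels = df$key[order(df$topic, df$beta)])

p <- ggplot(df, aes(key, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  # Drop the "___<topic>" suffix from the axis labels.
  scale_x_discrete(labels = function(x) sub("___.*$", "", x)) +
  labs(x = "MeSH Term", y = "Beta")
```

Printing p draws the chart; scales = "free" is what lets each facet show only its own keys.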

Reweighting histogram in matplotlib (drawing patches)
Let me preface this by saying that my understanding of matplotlib is limited: I mostly just use plot(), hist() and show() from pyplot. I have a basic understanding of what patch objects are, but (evidently) do not properly understand how to work with them.
I have a histogram, where the bars need to be 'reweighted' by a factor which is a function of the value of the datapoints (think of it as adjusting for a known bias in the sampling process). I have
n, bins, patches = plt.hist(data, bins = num_bins, normed = 1)
so in principle I should have access to everything: bins, to get the value of the data for every particular bin, and patches, where I intended to use get_height() and set_height() to do the actual rescaling.
The trouble is that, though I can manage to get my patches properly rescaled, I have no idea how to draw/display them.
EDIT: I have tried reading the documentation, but didn't understand. Then I tried copy-pasting example code involving patches from the internet, ran it to see whether it produced the expected result (it did), and then simply substituted the existing code's list of patches with my patches. It drew an empty plot.
EDIT 2: I'm at a complete loss for what could be going wrong. When I directly access patches, everything seems to be in order: there is the expected number of objects, each one's get_height() and other methods seem to return the appropriate values...
I'm interested in both how to solve my problem, and an explanation of why the objects in patches aren't behaving the way I expect them to.
EDIT 3:
An example of what I meant: I copied the following code to display patches from a standard matplotlib example:
import numpy as np
import matplotlib
from matplotlib.patches import Circle, Wedge, Polygon
from matplotlib.collections import PatchCollection
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

resolution = 50  # the number of vertices
N = 3
x = np.random.rand(N)
y = np.random.rand(N)
radii = 0.1*np.random.rand(N)
patches = []
for x1, y1, r in zip(x, y, radii):
    circle = Circle((x1, y1), r)
    patches.append(circle)

x = np.random.rand(N)
y = np.random.rand(N)
radii = 0.1*np.random.rand(N)
theta1 = 360.0*np.random.rand(N)
theta2 = 360.0*np.random.rand(N)
for x1, y1, r, t1, t2 in zip(x, y, radii, theta1, theta2):
    wedge = Wedge((x1, y1), r, t1, t2)
    patches.append(wedge)

# Some limiting conditions on Wedge
patches += [
    Wedge((.3, .7), .1, 0, 360),              # Full circle
    Wedge((.7, .8), .2, 0, 360, width=0.05),  # Full ring
    Wedge((.8, .3), .2, 0, 45),               # Full sector
    Wedge((.8, .3), .2, 45, 90, width=0.10),  # Ring sector
]

for i in range(N):
    polygon = Polygon(np.random.rand(N, 2), True)
    #patches.append(polygon)

colors = 100*np.random.rand(len(patches))
p = PatchCollection(patches, alpha=0.4)
p.set_array(np.array(colors))
ax.add_collection(p)
fig.colorbar(p, ax=ax)

plt.show()
meant to demonstrate the use of patches, and it behaved exactly as expected (drawing several patches, as described in the code). However, when I commented out all the patch-creating parts of the code (removing them one by one and rerunning to confirm that the remaining patches were still being drawn, to rule out the possibility that I might be taking out something important by accident), and instead defined patches through
n, bins, patches = plt.hist(data, bins = num_bins, normed = 1)
only an empty plot was displayed (to clarify, the hist() command still drew its histogram as normal, but no patches were drawn after the first histogram had been shown).

PHP - chart combining bars, grid and columns (histogram and Pareto)
I have to make the following report. The combined bars-grid-columns chart is a grid of integer data in which the rows are items and the columns represent time, for example months. Rows and columns have totals, and the totals are what get plotted.
Combined bars-grid-columns chart
As shown in the image, the column chart goes above the data grid and the bar chart goes to the right of the records. I had built it in Excel by shading the cells in proportion to each value's share of the total, and now I must port it to PHP. It does not have to be dynamic; it is the result of a query, so it could be an image.
How could I do it? With some library? A grid? Thank you for your guidance; all suggestions are welcome.

how to plot histogram of a Series with datetime.time dtype in Python?
I have a Series in which every element is of type datetime.time. The data contains the times at which an event happened within 24 hours, and I want to plot a histogram of this Series. I used the following code with 48 bins to show the number of events in every 30 minutes.
fig = plt.figure()
sns.set()
plt.xlabel("Departure time")
plt.ylabel("Frequency of daily travel")
_ = plt.hist(data1["Time"], bins=48)
plt.show()
But I got this error:
return pydt.time(h, m, s, msus).strftime(fmt)[:3]
ValueError: microsecond must be in 0..999999
I have no idea why, because I checked and the microsecond is 0 in all elements of the Series.

How do I draw lines to join these points in ggplot?
I'm trying to plot a graph, with the following code.
ggplot(test2, aes(x = Month, y = Spend, color = YEAR)) + geom_point()
The output looks like the plot below. However, I want to join the points with a line for each year. I tried geom_line and geom_abline, but they are not working. Is there any way I can achieve this?
Dataset:
structure(list(Month = c("01", "01", "02", "02", "03", "03", "04", "04", "05", "05", "06", "06", "07", "07", "08", "08", "09", "09", "10", "10", "11", "11", "12", "12"), YEAR = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("2016", "2017"), class = "factor"), Spend = c(66142.27, 75735, 61247.19, 65126.4, 65947.08, 73293.63, 63489.61, 72500.34, 64634.54, 69689.61, 60988.69, 67231.09, 64966.94, 72014.3, 66683.24, 70857.17, 65637.03, 68606.12, 69224.13, 71083.37, 65561.6, 70094.81, 66152.87, 67784.81 )), row.names = c(NA, 24L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "Month", drop = TRUE, indices = list( 0:1, 2:3, 4:5, 6:7, 8:9, 10:11, 12:13, 14:15, 16:17, 18:19, 20:21, 22:23), group_sizes = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list( Month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12")), row.names = c(NA, 12L), class = "data.frame", vars = "Month", drop = TRUE))
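With a discrete x such as the character Month column, ggplot2 by default forms groups from the combination of all discrete variables in the plot, so each Month/YEAR pair becomes its own one-point group and geom_line has nothing to connect. Setting the group aesthetic explicitly is the usual remedy. The sketch below assumes a three-month excerpt of the dataset above.

```r
library(ggplot2)

# Three-month excerpt of the question's data (illustrative subset).
test2 <- data.frame(
  Month = rep(c("01", "02", "03"), each = 2),
  YEAR  = factor(rep(c("2016", "2017"), times = 3)),
  Spend = c(66142.27, 75735, 61247.19, 65126.4, 65947.08, 73293.63)
)

# group = YEAR tells geom_line to draw one line per year.
p <- ggplot(test2, aes(x = Month, y = Spend, color = YEAR, group = YEAR)) +
  geom_point() +
  geom_line()

# Number of distinct groups seen by the line layer (layer 2).
n_line_groups <- length(unique(layer_data(p, 2)$group))
print(n_line_groups)
```

geom_abline is not suited here because it draws a single straight line from a slope and an intercept rather than connecting observations.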
 Is this a good graph?

R: Cannot Output Plots To Files From A Loop
I am trying to output a series of files using R.
Usually, we can use the following code to output a plot:
jpeg("XXXXX_XXXX.jpg")
ggplot(data=YEAR_ZIP_DATA, aes(x=SOME_VARIABLE)) + geom_bar()
dev.off()
The above code gets me a file called XXXXX_XXXX.jpg in the current working directory.
Now I want to write a loop to create a series of files: for each year, draw a bar chart for each zip code and save it to the current directory. Here is the code:
# year_list: a list of distinct years
for(year in year_list){
  # zip_list: a list of distinct zip codes
  for(zip in zip_list){
    # some code to get a filename like 10010_2018.jpg
    filename <- (some code)
    # some code to subset the data to get the current zip and year
    year_zip_data <- (some code)
    jpeg(filename)
    ggplot(data=year_zip_data, aes(x=SOME_VARIABLE)) + geom_bar()
    dev.off()
  }
}
However, after the above loop, there is nothing in the current working directory... How should I solve the problem?
Thanks in advance!
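A ggplot object is only drawn when it is printed. At the top level R prints it automatically, but inside a for loop (or a function) nothing is auto-printed, so the device is opened and closed without anything being drawn. Wrapping the plot in print() addresses that. The sketch below uses made-up data, a temporary output directory, and pdf() instead of jpeg() purely so that it runs on machines without JPEG support; all of those are assumptions, and jpeg(filename) slots in the same way.

```r
library(ggplot2)

# Made-up stand-in data; column names are hypothetical.
plot_data <- data.frame(
  zip = rep(c("10010", "10011"), each = 3),
  year = 2018,
  some_variable = c("a", "b", "a", "b", "b", "a")
)

out_dir <- tempdir()  # assumption: write to a temporary directory
written <- character(0)

for (year in unique(plot_data$year)) {
  for (zip in unique(plot_data$zip)) {
    filename <- file.path(out_dir, paste0(zip, "_", year, ".pdf"))
    year_zip_data <- plot_data[plot_data$zip == zip & plot_data$year == year, ]

    p <- ggplot(data = year_zip_data, aes(x = some_variable)) + geom_bar()

    pdf(filename)
    print(p)  # explicit print: nothing is auto-printed inside a loop
    dev.off()

    written <- c(written, filename)
  }
}
```

ggsave(filename, plot = p) is another option that avoids opening and closing the device by hand.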

Excel Normal Distribution for proving null hypothesis
I have been trying to prove the null hypothesis for step 4 of question 2
with this data in Excel: excel table
I have been following the formula in my college's presentation file to get H8; I assume it is a random variable? My presentation file looks like this: slide1 slide2 slide3
My formula for H8 is
=SQRT(D2/SQRT(E2)^2+(D9)/SQRT(E9)^2)
H7 is just the 2017 average minus the 2010 average,
and my normal distribution formula is
=NORMDIST(H7,0,H8,TRUE)
Is the normal distribution formula supposed to return 0? Is the value correct for my question? Thanks.

STATA: simulating normal distribution 1000 times and showing it on histogram
I am enrolled in a STATA course and we were given a homework assignment. So far I haven't figured out the solution (but I genuinely tried, at least). Now I am turning to you. The task goes like this:
"Simulate 10001000 samples with n observations (where n = 1, 100, 10000 and 100000) on the [0,1] interval as normal distribution. Provide histograms to prove the consistency and asymptotic normal distribution"
I understand the underlying statistical theory we are supposed to prove, it is just I haven't got a clue how to script it. Thanks in advance!

How to normalize a non-normal distribution?
I have the above distribution with a mean of 0.02, a standard deviation of 0.09 and a sample size of 13905.
I am just not sure why the distribution is left-skewed given the large sample size. From bin [2.0 to 0.5], there are only 10 samples/outliers in that bin, which explains the shape.
I am just wondering whether it is possible to normalize it into a smoother, more 'normal' distribution. The purpose is to feed it into a model while reducing the standard error of the predictor.