Overlay a normal distribution on a histogram of non-normally distributed values in ggplot (R)
I'm trying to overlay a normal bell curve on top of the histogram of these fake data, which are intentionally NOT normally distributed. My goal is to show other students how non-normally distributed data look in comparison to a normal distribution.
While I have figured out how to add the bell curve from other questions that have been asked, my y-axis is acting strangely. For a density plot, I assumed the axis would run from 0 to 1, but for some values it says the density is 2 (see the screenshot below). I want bars that show the density and a bell curve that shows the normal distribution. Any help would be appreciated!
Here's the fake dataset:
library(dplyr)
tester2 <- tibble(
fake = c(2, 2, 2, 2, 10, 10, 10, 10, 5, 3, 4, 5, 6, 7, 8, 9, 10, 10, 5, 2, 4, 5, 6, 7, 8, 4, 4, 5, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 2, 2, 2, 2, 10, 10, 10, 10, 5, 2, 3, 4, 5, 5, 5, 5, 5, 4, 6, 5),
also_fake = c(1, 2, 2, 2, 3, 3, 3.3, 4, 4, 5, 1, 2, 2, 2, 3, 3.6, 3, 4, 4, 5, 1, 2, 2, 2.1, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3.1, 3, 3, 4.6, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)
)
Here's my code so far:
library(ggplot2)
testing <- ggplot(tester2, aes(x = also_fake)) +
geom_histogram(aes( y = ..density..)) +
geom_rug() +
stat_function(fun = dnorm,
color = "blue",
args=list(mean = mean(tester2$also_fake),
sd = sd(tester2$also_fake)))
And here's what it produces:
EDIT: This question is different from this question because I do not want a density plot: Superimpose a normal distribution to a density using ggplot in R
It is also different from this question because my values are intentionally non-normally distributed: ggplot2: histogram with normal curve.
See also questions close to this topic

DiagrammeR export_graph Invalid asm.js
I'm having a problem exporting graphs in R to PDFs using DiagrammeR's export_graph function in RStudio. The example below reproduces the problem. The PDFs are produced inconsistently, and sometimes not at all.
I get the error message shown below when calling export_graph in the code snippet.
I'm using RStudio Version 1.1.463 and R 3.5.2 on Windows 10.
<unknown>:1919791: Invalid asm.js: Function definition doesn't match use
library(data.tree)
library(yaml)
library(DiagrammeR)
library(DiagrammeRsvg)
fileName <- system.file("extdata", "jennylind.yaml", package="data.tree")
cat(readChar(fileName, file.info(fileName)$size))
lol <- yaml.load_file(fileName)
jl <- as.Node(lol)
pic <- ToDiagrammeRGraph(jl)
render_graph(pic)
export_graph(pic, "C:/Tmp/plot.pdf", file_type = "pdf")

Plotting in ggplot after converting to data.frame with a single column?
I'm trying to convert some simple data into a form I thought ggplot2 would accept.
I grabbed some simple stock data and for now I just want to plot it. Later I want to add, say, a 10-day moving average or a 30-day historical volatility period to go with it, which is why I'm using ggplot.
I thought it would work something like this line of pseudocode
ggplot(maindata)+geom_line(moving average)+geom_line(30dayvol)
library(quantmod)
library(ggplot2)
start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(y='AMD.Close'))
But I can't even get this to work. The problem, of course, appears to be that I don't have an x-axis. How do I tell ggplot to use the index column as the x-axis? Can this not work? Do I have to create a new "date" or "day" column?
This line, for instance, using the regular R plot function works just fine:
plot.ts(closing_prices)
This works without requiring me to supply an explicit x-axis, and it produces a graph. However, I haven't figured out how to layer other lines onto the same graph. Evidently ggplot is better for that, so I tried it.
Any advice?

Using scale_color_gradient2 with a variable of class Date
I'm trying to color by date with ggplot2, but when I try to customize the color using scale_color_gradient2, I get an error saying: Error in as.Date.numeric(value) : 'origin' must be supplied.
I can't seem to figure out how to pass the origin to scale_color_gradient2. I've provided an example below. Any advice?
library(ggplot2)
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
day <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 100)
myData <- data.frame(x1, x2, day)

# this plot works as expected
ggplot(myData, aes(x = x1, y = x2, color = day)) +
  geom_point()

# scale_color_gradient2() asks for an origin, but I can't figure out how to supply one
ggplot(myData, aes(x = x1, y = x2, color = day)) +
  geom_point() +
  scale_color_gradient2()

Geom_line making wrong connections, wrong data format?
Now this isn't right. I have a data set that for each team should provide a time series line, one called xG, one called xGA.
Instead I get this monstrosity:
The data for the plot, TimeSeriesxGxGA, looks like this. Here are the first 10 lines:
Team, Date, variable, value
Aston Villa, 20181218, xG, 37.56
Birmingham City, 20181218, xG, 34.30
Blackburn Rovers, 20181218, xG, 33.55
Bolton Wanderers, 20181218, xG, 19.575
Brentford, 20181218, xG, 35.03
Bristol City, 20181218, xG, 32.43
Derby County, 20181218, xG, 27.73
Hull City, 20181218, xG, 28.91
Ipswich Town, 20181218, xG, 15.61
Leeds United, 20181218, xG, 34.61
It goes on like that for 384 lines, and xG and xGA are variables that both increase over time.
The plot call is like so:
ggplot(TimeSeriesxGxGA) +
  geom_line(aes(Date, value, colour = variable), size = 1) +
  ylim(0, 55) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, face="bold")) +
  labs(title = "xG vs xGA since mid-Dec", x = "", y = "")
Where am I going wrong? x and y must be right (Date and value). Surely the lines should be distinguished by colour?
Just for reference, here's a plot of one team's xG and xGA. The lines should look like this:

Circlize plot - Histograms with same Y axis
I'm using the circlize package to draw histograms of two different BED files (data frames). I could draw the histograms in two different tracks using the "circos.trackHist" function, but I can't compare them because they have different scales on the y-axis. This function has a parameter to force the same scale across the cells of one track (force.ylim=TRUE), but I couldn't find a way to force the same y scale between tracks.
Is this possible?

matplotlib density graph / histogram
I have an array: [0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1] (16 long). How can I create a density histogram with, for instance, bins=4 to see where most of the 1s appear? This histogram would, for instance, be very tall in the middle part and rise a little at the end (most 1s in the beginning and the end). I have this:
plt.hist([0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1], bins=4)
This is what I get. The histogram just shows that there are as many 1s as 0s.
How can I later create a graph (line) to show the average rise and fall of the histogram?
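One possible reading of this question is that each bin should show the share of 1s among the values whose positions fall in that bin, rather than how often each value occurs (which is what plt.hist counts). A minimal standard-library sketch under that assumption follows; the helper name `ones_per_bin` is made up for illustration, and the resulting list could be handed to a bar or line plot.

```python
def ones_per_bin(values, bins):
    """Split `values` into `bins` equal-width position ranges and
    return the fraction of 1s found in each range."""
    size = len(values) / bins
    fractions = []
    for b in range(bins):
        start, stop = round(b * size), round((b + 1) * size)
        chunk = values[start:stop]
        # Guard against empty chunks when bins > len(values).
        fractions.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return fractions


data = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
profile = ones_per_bin(data, bins=4)
print(profile)
```

Binning by position instead of by value is the design choice here: a value histogram of a 0/1 array can only ever have two meaningful bars.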

Mapping Histograms on the image
Features extracted using the HoG method form a single vector. Is there any way to map these histograms onto the image and use the mapped image for further processing to extract the features?
img = imread('cameraman.tif');
[featureVector,hogVisualization] = extractHOGFeatures(img);

Plotting with R
I'm trying to learn R and various techniques for plotting graphs with R. How can I plot a graph similar to the one below using libraries such as ggplot?

How to use a colored shape as yticks in matplotlib or seaborn?
I am working on a task called knowledge tracing, which estimates a student's mastery level over time. I would like to plot a figure similar to the one below using Matplotlib or Seaborn.
It uses different colors, instead of text, to represent each knowledge concept. However, I have googled and found no article that explains how to do this.
I tried the following
import numpy as np
import pandas as pd
import matplotlib.markers
import matplotlib.pyplot as plt
import seaborn as sns

# simulate a record of student mastery level
student_mastery = np.random.rand(5, 30)
df = pd.DataFrame(student_mastery)

# plot the heatmap using seaborn
marker = matplotlib.markers.MarkerStyle(marker='o', fillstyle='full')
sns_plot = sns.heatmap(df, cmap="RdYlGn", vmin=0.0, vmax=1.0)
y_limit = 5
y_labels = [marker for i in range(y_limit)]
plt.yticks(range(y_limit), y_labels)
Yet it simply returns the __repr__ of the marker, e.g., <matplotlib.markers.MarkerStyle at 0x1c5bb07860>, on the yticks. Thanks in advance!
Update: I used the solution provided by ImportanceOfBeingErnest. My Seaborn implementation is below:
question_symbols = {
    'hollow': '⚪',
    'fill': '⚫'
}

# Assume there are at most 10 questions
colors_list = sns.color_palette(n_colors=10)

# Prepare the dataset
question_ids = [26, 30, 41, 70]
question_sequence = [ 30, 30, 30, 70, 70, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 41, 41, 41, 41, 41, 41, 41, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 41, 41 ]
answer_sequence = [ 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1 ]
y_pred = (np.random.rand(4, 50) - 0.5) * 2

question_color_map = {}
for i, qid in enumerate(question_ids):
    question_color_map[qid] = colors_list[i]

# Prepare the sequence of colors and symbols
y_colors = [question_color_map[qid] for qid in question_ids]
x_colors = [question_color_map[qid] for qid in question_sequence]
y_symbols = [question_symbols['fill'] for qid in question_ids]
x_symbols = [
    question_symbols['hollow'] if correct == 0 else question_symbols['fill']
    for correct in qa_data_batch[0]//n_questions
]

# Plot the heatmap
plt.figure(figsize=(25, 1.5))
sns.set_style({
    'xtick.top': False,
    'xtick.bottom': False,
    'ytick.left': False
})
sns_plot = sns.heatmap(
    y_pred, cmap="RdYlGn", vmin=-1.0, vmax=1.0,
    xticklabels=x_symbols, yticklabels=y_symbols, square=True
)

sns_plot.set_yticks(np.arange(len(y_symbols)) + 0.2)
sns_plot.set_yticklabels(y_symbols, size=30)
for tick, color in zip(sns_plot.get_yticklabels(), y_colors):
    tick.set_color(color)

sns_plot.set_xticks(np.arange(len(x_symbols)) + 0.5)
sns_plot.set_xticklabels(x_symbols, size=30)
for tick, color in zip(sns_plot.get_xticklabels(), x_colors):
    tick.set_color(color)
The result looks similar to the following:

How to show a Waterfall Chart in a Data Table using ReactJS
What I'm trying to do is render a waterfall chart in a data table. I am basically recreating the GUI of the Google Chrome Developer Tools "Network" tab (picture provided). I have all the columns and rows rendered, but I can't seem to recreate the "Waterfall" column as depicted in the dev tools GUI. Any idea how I can go about this?
Here is the picture of the Google dev tools Network tab I'm trying to recreate:

How to use exp and sqrt properties correctly
use double precision
use sqrt() and the exponential function exp()
use * to compute the square
do not use pow()
I am getting values, but they are nothing like what I expected. I tried making them all signed, but it didn't change anything, and I've tried printing with 12 decimal places, and nothing seems to work. I have linked the math library and defined it as well.
double normal(double x, double sigma, double mu)
{
    double func = 1.0/(sigma * sqrt(2.0*M_PI));
    double raise = 1.0/2.0*((x-mu)/sigma);
    double func1 = func * exp(raise);
    double comp_func = (func1 * func1);
    return comp_func;
}

int main(void)
{
    // create two constant variables for μ and σ
    const double sigma, mu;
    //create a variable for x - only dynamic variable in equation
    unsigned int x;
    //create a variable for N values of x to use for loop
    int no_x;
    //scaniing value into mu
    printf("Enter mean u: ");
    scanf("%lf", &mu);
    //scanning value into sigma
    printf("Enter standard deviation: ");
    scanf("%lf", &sigma);
    //if sigma = 0 then exit
    if(sigma == 0)
    {
        printf("error you entered: 0");
        exit(0);
    }
    //storing number of x values in no_x
    printf("Number of x values: ");
    scanf("%d", &no_x);
    //the for loop where i am calling function normal N times
    for(int i = 1; i <= no_x; i++)
    {
        //printing i for the counter in prompted x values
        printf("x value %d : ", i);
        // scanning in x
        scanf("%lf", &x);
        x = normal(x,sigma,mu);
        printf("f(x) = : %lf.12", x);
        printf("\n");
    }
    return 0;
}
C:>.\a.exe
Enter mean u: 3.489
Enter std dev s: 1.203
Number of x values: 3
x value 1: 3.4
f(X) = 0.330716549275
x value 2: -3.4
f(X) = 0.000000025104
x value 3: 4
f(X) = 0.303015189801
But this is what I am receiving
C:\Csource>a.exe
Enter mean u: 3.489
Enter standard deviation: 1.203
Number of x values: 3
x value 1 : 3.4
f(x) = : 15086080.000000
x value 2 : -3.4
f(x) = : 15086080.000000
x value 3 : 4
f(x) = : 1610612736.000000

random.normalvariate vs numpy.random.normal
Do 'random.normalvariate' and 'numpy.random.normal' perform the same task of generating values for the Normal distribution?
If they do not, what are the differences and which should be used when?
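For comparison, here is a small sketch that draws from both functions with the same mean and standard deviation (10 and 2 are arbitrary values chosen for illustration). Both sample from a normal distribution. The visible differences are that the standard-library call returns one float per call, while the NumPy call can fill a whole array in one go, and that each uses its own generator and seed.

```python
import random
import statistics

import numpy as np

# The two libraries keep separate generator state, so each needs its own seed.
random.seed(0)
np.random.seed(0)

# Standard library: one Python float per call.
py_samples = [random.normalvariate(10.0, 2.0) for _ in range(10_000)]

# NumPy: a whole array from a single vectorized call.
np_samples = np.random.normal(loc=10.0, scale=2.0, size=10_000)

print(statistics.mean(py_samples), statistics.stdev(py_samples))
print(np_samples.mean(), np_samples.std(ddof=1))
```

Both pairs of summary statistics should land close to 10 and 2, though the individual draws differ because the generators are independent.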

3-sigma rule example and min / max values
We have a sample - data from an experiment, for example the masses of women. We measure every woman twice to be more precise and take the mean value as the result. We know that the population should be normally distributed. Nevertheless, before conducting further analyses we perform normality tests. Let's consider three cases:
A. The sample is normally distributed.
B. The sample is not normally distributed, but there are outliers (after removing them we have normality).
C. The sample is not normally distributed and there are no outliers.
My questions:
In which case is it true that the 3-sigma range = mean ± 3*SD will cover 99.7% of the population?
Is it possible in any of these cases that the min and max values from the sample will be inside the 3-sigma range = mean ± 3*SD?
Is it possible that the min and max values from the sample will be inside the 3-sigma range = mean ± 3*SD if we calculate SD as the sum of the between-observation variation (the normally understood SD) and the within-sample variation (we measure each woman twice)? Should we calculate SD that way in any of these cases?
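As a rough numerical companion to these questions, the sketch below computes the 3-sigma coverage of an exactly normal population, the distribution-free lower bound from Chebyshev's inequality, and the chance that n independent normal draws all land inside the range. It assumes the true mean and SD are known; with a sample mean and sample SD the figures would only be approximate. The helper name `prob_all_inside` is made up for illustration.

```python
from statistics import NormalDist

# Share of an exactly normal population within mean ± 3*SD.
z = NormalDist()  # standard normal: mean 0, SD 1
coverage = z.cdf(3) - z.cdf(-3)

# Chebyshev's inequality: lower bound on that share for ANY
# distribution with finite variance, normal or not.
chebyshev_bound = 1 - 1 / 3**2


def prob_all_inside(n, p=coverage):
    """Chance that n independent normal draws all fall inside the range."""
    return p ** n


print(f"normal coverage: {coverage:.4f}")
print(f"Chebyshev bound: {chebyshev_bound:.4f}")
print(f"all of 100 draws inside: {prob_all_inside(100):.3f}")
```

The last figure suggests that the min and max of a modest sample can quite plausibly both sit inside the 3-sigma range even when the population is exactly normal.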