Why is scipy.stats.gaussian_kde() slower than seaborn.kdeplot() for the same data?
In Python 3.7, I have a numpy array with shape=(2, 34900). This array is a list of coordinates, where index 0 represents the x axis and index 1 represents the y axis.
When I use seaborn.kdeplot() to make a visualization of the distribution of this data, I'm able to get the result in about 5-15 seconds when running on an i5 7th generation.
But when I try to run the following piece of code:
# Find the kernel for the data
k = scipy.stats.kde.gaussian_kde(data, bw_method=.3)

# Define the grid
xi, yi = np.mgrid[0:1:2000*1j, 0:1:2000*1j]

# Apply the function
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
which finds the Gaussian kernel for this data and applies it to a grid I defined, it takes much more time. I wasn't able to run it on the full array, but when running on a slice of size 140 it takes about 40 seconds to complete.
The 140-sized slice does produce an interesting result, which I was able to visualize using plt.pcolormesh().
My question is: what am I missing here? If I understood correctly, I'm using scipy.stats.kde.gaussian_kde() to create an estimate of the function defined by the data. Then I'm applying that function to a 2D space, getting its Z component as a result, and plotting that Z component. How can this process be so different from seaborn.kdeplot() that it makes the code take so much longer?
Scipy's implementation just goes through each point doing this:
for i in range(self.n):
    diff = self.dataset[:, i, newaxis] - points
    tdiff = dot(self.inv_cov, diff)
    energy = sum(diff * tdiff, axis=0) / 2.0
    result = result + exp(-energy)
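To see what that loop computes, here is a minimal sketch (with made-up random data) that reproduces gaussian_kde's result by summing one Gaussian bump per data point and normalising by n·sqrt(det(2πΣ)); `covariance` and `inv_cov` are documented attributes of the fitted object:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.standard_normal((2, 200))    # 200 sample points in 2D
points = rng.standard_normal((2, 50))   # 50 evaluation points

k = gaussian_kde(data)

# The same loop as in scipy: one Gaussian bump per data point.
result = np.zeros(points.shape[1])
for i in range(data.shape[1]):
    diff = data[:, i, np.newaxis] - points      # (2, 50) offsets
    tdiff = k.inv_cov @ diff
    energy = np.sum(diff * tdiff, axis=0) / 2.0
    result = result + np.exp(-energy)

# Normalisation scipy applies afterwards: n * sqrt(det(2*pi*Sigma))
norm = data.shape[1] * np.sqrt(np.linalg.det(2 * np.pi * k.covariance))
result /= norm
```

The cost is one pass over all n data points for every evaluation point, which is why a large evaluation grid makes this so slow.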

Seaborn has in general two ways to calculate the bivariate KDE. If available, it uses statsmodels; if not, it falls back to scipy. The scipy code is similar to what is shown in the question: it uses scipy.stats.gaussian_kde. The statsmodels code uses statsmodels.nonparametric.api.KDEMultivariate.
However, for a fair comparison we would need to use the same grid size for both methods. The standard grid size for seaborn is 100 points.
import numpy as np; np.random.seed(42)
import seaborn.distributions as sd

N = 34900
x = np.random.randn(N)
y = np.random.randn(N)

bw = "scott"
gridsize = 100
cut = 3
clip = [(-np.inf, np.inf), (-np.inf, np.inf)]

f = lambda x, y: sd._statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
g = lambda x, y: sd._scipy_bivariate_kde(x, y, bw, gridsize, cut, clip)
If we time those two functions:

# statsmodels
%timeit f(x,y)  # 1 loop, best of 3: 16.4 s per loop

# scipy
%timeit g(x,y)  # 1 loop, best of 3: 8.67 s per loop
Scipy is hence twice as fast as statsmodels (the seaborn default). The reason the code in the question takes so long is that instead of a grid of size 100, a grid of size 2000 is used; that is 2000×2000 = 4,000,000 evaluation points instead of 100×100 = 10,000, i.e. 400 times as many.
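The difference in the number of evaluation points behind those grid sizes can be checked directly (no KDE evaluation here, just the grids):

```python
import numpy as np

xi_s, yi_s = np.mgrid[0:1:100*1j, 0:1:100*1j]    # seaborn's default gridsize
xi_l, yi_l = np.mgrid[0:1:2000*1j, 0:1:2000*1j]  # the grid from the question

print(xi_s.size)               # 10000
print(xi_l.size)               # 4000000
print(xi_l.size // xi_s.size)  # 400 times as many points to evaluate
```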
Seeing those results, one would actually be tempted to use scipy instead of statsmodels. Unfortunately, seaborn does not allow the user to choose which one to use; one hence needs to manually set the respective flag.
import seaborn as sns
import seaborn.distributions as sd

sd._has_statsmodels = False

# plot kdeplot with scipy.stats.kde.gaussian_kde
sns.kdeplot(x, y)
See also questions close to this topic

Divide each value in list array
I am trying to divide each array value in the list by 80. What I have tried is:
dfs = pd.read_excel('ff1.xlsx', sheet_name=None)
dfs1 = {i: x.groupby(pd.to_datetime(x['date']).dt.strftime('%Y%m%d'))['duration'].sum()
        for i, x in dfs.items()}
d = pd.concat(dfs1).groupby(level=1).apply(list).to_dict()
print(d)
Output:
{'20170506': [197, 250], '20170507': [188, 80], '20170508': [138, 138], '20170509': [216, 222], '20170609': [6]}
But expected output:

1 : Divide by 80
{'20170506': [2, 3], '20170507': [2, 1], '20170508': [2, 2], '20170509': [2, 2], '20170609': [0]}

2 : total of each array and subtract each value (3+2 = 5, 5-3 and 5-2)
{'20170506': [3, 2], '20170507': [1, 2], '20170508': [2, 2], '20170509': [2, 2], '20170609': [0]}
How can I do this using Python?
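Both steps can be sketched with dict comprehensions. This sketch uses floor division, which matches most (but not all) of the expected output above — e.g. 138 // 80 is 1, not 2 — so the exact rounding rule is an assumption:

```python
d = {'20170506': [197, 250], '20170507': [188, 80], '20170508': [138, 138],
     '20170509': [216, 222], '20170609': [6]}

# Step 1: divide every value by 80 (floor division)
step1 = {k: [v // 80 for v in vals] for k, vals in d.items()}

# Step 2: subtract each value from the total of its own list
step2 = {k: [sum(vals) - v for v in vals] for k, vals in step1.items()}

print(step1['20170506'])  # [2, 3]
print(step2['20170506'])  # [3, 2]
```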

Export list of data frames to CSVs in python
I've got a list of data frames I'm trying to perform a function on, then export the results. The function spits out a result, and I then want to turn the results into a data frame and export it to a .csv. Here's what I currently have:
for df, filename in zip(df_list, filename_list):
    function(df)
    results_df = pd.DataFrame(function_results)
    results_df.to_csv(filename)
The error occurs when I try to export the .csv. If I just run the loop with the function and print results to the console like so:
for df in df_list:
    function(df)
It works fine. When I try to loop the .csv export, though, I get:
AttributeError: 'list' object has no attribute 'close'
Any ideas?
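That error suggests to_csv received something other than a single path string. A minimal sketch of the intended loop, with hypothetical stand-ins for the real df_list, filename_list, and function:

```python
import pandas as pd

# hypothetical stand-ins for the real objects in the question
df_list = [pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [3, 4]})]
filename_list = ['out1.csv', 'out2.csv']

def function(df):
    return df.sum()  # placeholder for the real computation

for df, filename in zip(df_list, filename_list):
    function_results = function(df)          # capture the return value
    results_df = pd.DataFrame(function_results)
    results_df.to_csv(filename)              # one path string per call
```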

Tips on saving distribution shape parameters to sqlite3
I have been playing around with distributions in Python using scipy. I have some code which produces a dataframe containing fit parameters for my various data fits. An example is shown below:
fit = pd.DataFrame({'domain': ['T1', 'T1', 'T1'],
                    'type': ['A1', 'A2', 'A3'],
                    'dist': ['triang', 'triang', 'trapz'],
                    'shp': [(0.1, None), (0.1, None), (0.2, 0.8)],
                    'loc': [3, 100, 85],
                    'scale': [60, 50, 95]})
I have a range of these inputs using different distributions, so the shape parameters change and need to be kept in the right order, so that when a row is retrieved from the database it fits the distribution's argument order. So, for example, using the triangular distribution: (c, d, loc, scale).
I have tried to convert the tuple to a string (keeping the brackets) so it can be viewed in the database. For example, for the 3rd row:
a = str(fit['shp'][2])
which produces '(0.2, 0.8)'. This can be saved to the SQLite3 database. However, I am not too sure how to convert it back to a tuple.
I did try,
b = tuple(a)
But this does not seem to produce the desired result, as shown below.
('(', '0', '.', '2', ',', ' ', '0', '.', '8', ')')
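tuple(a) iterates over the characters of the string, which is what produces the output above. A minimal sketch of parsing the stored string back into a real tuple with the standard library's ast.literal_eval (it also handles None inside the tuple):

```python
import ast

a = str((0.2, 0.8))      # '(0.2, 0.8)' -- as stored in the database
b = ast.literal_eval(a)  # safely evaluates the literal back into a tuple
print(b)                 # (0.2, 0.8)
```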
What would you guys recommend about saving the shape tuple to a database?
BJR

extrapolation of the curve
I am trying to extrapolate a fitted curve. There are many questions posted with a similar problem, but I don't understand any of them. I tried the simplest of the solutions; however, it is not working. Can anyone review the code, please?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

old_settings = np.seterr(all='ignore')

x = np.array([0.21, 0.43, 0.50, 0.65, 0.86, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0])
y = [43, 33, 30, 24, 18, 16, 14, 13, 14, 13, 13]
yerr = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
xerr = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]

def func(x, a, b, c, d):
    return a * np.exp(b * x) + c * np.exp(d * x)

def func1(x, m, n):
    return m * np.exp(n * x)

def func2(x, s, t):
    return s * np.exp(t * x)

# Here you give the initial parameters for a, b, c, d which Python then iterates over
# to find the best fit
popt, pcov = curve_fit(func, x, y, p0=(1, 1e-6, 0.5, 1))
print(popt)  # This contains your four best-fit parameters

p1 = popt[0]  # This is your a
p2 = popt[1]  # This is your b
p3 = popt[2]  # This is your c
p4 = popt[3]  # This is your d

residuals = y - func(x, p1, p2, p3, p4)
fres = sum((residuals**2) / func(x, p1, p2, p3, p4))  # The chi-square of your fit
print(fres)

""" Now if you need to plot, perform the code below """
curvey = func(x, p1, p2, p3, p4)         # This is your y-axis fit line
curvey1 = func1(x, p1, p2)               # This is your y-axis fit line
curvey2 = [func2(i, p3, p4) for i in x]  # This is the curve I want to extrapolate to x=0

plt.plot(x, curvey, 'red', label=r"Fit: $A \cdot e^{a \cdot x}+ B \cdot e^{b \cdot x}$" "\n" r"$\chi^2: 0.52$")
plt.plot(x, curvey1, 'blue', label=r"Fit: $A \cdot e^{a \cdot x}$")
plt.plot([0]+x, curvey2, 'green', label=r"Fit: $B \cdot e^{b \cdot x}$")
plt.errorbar(x, y, yerr=yerr, xerr=xerr, fmt='.', label='experimental data')
plt.legend(loc='best')
plt.ylim(0, 45)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
The curve I want to extrapolate is curvey2 (the green fit).
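One way to extrapolate is to evaluate the fitted component on a new x-grid that extends beyond the data, rather than on the data's own x values. A sketch, with hypothetical values standing in for the fitted p3 and p4:

```python
import numpy as np

def func2(x, s, t):
    # single-exponential component, as in the question
    return s * np.exp(t * x)

s_fit, t_fit = 30.0, -1.2            # hypothetical fitted values for p3, p4
x_ext = np.linspace(0.0, 4.0, 200)   # starts at 0, below the first data point (0.21)
y_ext = func2(x_ext, s_fit, t_fit)   # ready to plot with plt.plot(x_ext, y_ext, 'green')
```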

Numpy/Scipy: Singular Matrix Calculating probability of multivariate observation
I'm trying to calculate probabilities for observations in matrices where my rows are observations and my columns are features using python. I always get singular matrix errors, even when using random matrices, so I suspect something is wrong with my code:
from scipy.stats import multivariate_normal
import numpy as np

def get_mean(x, axis=0):
    return x.mean(axis=axis)

def get_sigma(x):
    return np.cov(x, rowvar=False)

def get_probabilities(x, mu, sigma):
    return multivariate_normal.pdf(x, mean=mu, cov=sigma)

x = np.random.rand(10, 10)
t = np.random.rand(1, 10)
mu = get_mean(x)
sigma = get_sigma(x)
p = get_probabilities(t, mu, sigma)
This results in:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in get_probabilities
  File "/usr/local/lib/python3.5/dist-packages/scipy/stats/_multivariate.py", line 512, in pdf
    psd = _PSD(cov, allow_singular=allow_singular)
  File "/usr/local/lib/python3.5/dist-packages/scipy/stats/_multivariate.py", line 159, in __init__
    raise np.linalg.LinAlgError('singular matrix')
numpy.linalg.linalg.LinAlgError: singular matrix
What am I doing wrong?
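A likely cause: with 10 observations in 10 dimensions, the sample covariance has rank at most n − 1 = 9, so it is always singular, random data or not. This can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((10, 10))         # 10 observations, 10 features
sigma = np.cov(x, rowvar=False)  # 10x10 sample covariance

# rank of a sample covariance is at most n_observations - 1
print(np.linalg.matrix_rank(sigma))  # at most 9 < 10, hence singular
print(np.linalg.det(sigma))          # numerically ~0
```

Using more observations than features (or passing allow_singular=True to multivariate_normal.pdf, or regularising the covariance) avoids the error.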

Are these frequent calls to a function that returns a fixed value in dblquad avoidable?
This script calculates the electric field at (x0, y0, z0) from a uniformly charged annulus using SciPy's dblquad.
My question is about the use of gfun and hfun, which define the functional dependence of the inner integral limits on the outer integral variable. This would be a cookie-cutter shape if I'd chosen to integrate in cartesian coordinates, but when I use cylindrical coordinates the functions return constant floats. Is there a way to eliminate these function calls, which simply return constants, in order to avoid the time penalty of the function calls?
The example may not be optimized in other ways, but it's just a simple example to show the use of gfun and hfun.

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import dblquad

twopi = 2. * np.pi

def Excalc(r, th):
    x, y, z = r*np.cos(th), r*np.sin(th), 0.0
    return (x0-x) * ((x0-x)**2 + (y0-y)**2 + (z0-z)**2)**-1.5

def Eycalc(r, th):
    x, y, z = r*np.cos(th), r*np.sin(th), 0.0
    return (y0-y) * ((x0-x)**2 + (y0-y)**2 + (z0-z)**2)**-1.5

def Ezcalc(r, th):
    x, y, z = r*np.cos(th), r*np.sin(th), 0.0
    return (z0-z) * ((x0-x)**2 + (y0-y)**2 + (z0-z)**2)**-1.5

def gfun(x):
    return rmin

def hfun(x):
    return rmax

# annulus of uniform, unit charge density
rmin, rmax = 0.8, 1.2
thmin, thmax = 0, twopi

# point to evaluate the field
x0, y0, z0 = 1.5, 0, 1

eps = 1E-10

Ex, Exerr = dblquad(Excalc, thmin, thmax, gfun, hfun, epsrel=eps)
Ey, Eyerr = dblquad(Eycalc, thmin, thmax, gfun, hfun, epsrel=eps)
Ez, Ezerr = dblquad(Ezcalc, thmin, thmax, gfun, hfun, epsrel=eps)

print(Ex, Ey, Ez)
print(Exerr, Eyerr, Ezerr)

Degree of Irregularity (DOI)
As part of my research, I need to implement a path loss model based on the degree of irregularity (DOI).
The model uses Weibull distribution.
I have the collected data for every 5 degrees (72 readings in total), and from this data I need to find the overall DoI value for the entire rotation.
I need to implement this in Matlab and I was wondering if somebody could help me by explaining how to compute the value. Sample dataset
References http://behnam.dezfouli.com/publication/MLPWCJNCA2014.pdf (page 109)
https://www.usenix.org/legacy/publications/library/proceedings/mobisys04/pdf/p125zhou.pdf (page 129)

Fitting distribution to data: fitdist() in library "fitdistrplus" in R asks for definition of the exponential distribution
I want to fit a distribution in a sequence of values stored in a vector.
To that end I use the package fitdistrplus.
The data are the following:
Y = c(43, 78, 69, 73, 48, 76, 65, 74, 76, 79, 83, 81)
There are 2 steps in fitting a distribution with fitdistrplus: firstly, one identifies the general type of distribution, and secondly, one determines its parameters.
The code for the first step:
descdist(Y, boot = 1000)
The output:
One thing that strikes me is that, according to fitdistrplus, the beta distribution fits the data well, in spite of the fact that the Y values are not in the [0, 1] interval. Any ideas on that?
Anyway, I picked the exponential and then asked fitdistrplus to estimate its parameters. I tried to do it in 3 different ways; however, all returned exceptions. The code was:
fw <- fitdist(Y, distr = "exponential")
summary(fw)
# Error in fitdist(Y, distr = "exponential"): The dexponential function must be defined

fw <- fitdist(Y, distr = "dexp")
summary(fw)
# Error in fitdist(Y, distr = "dexp"): The ddexp function must be defined

fw <- fitdist(Y, distr = "pexp")
summary(fw)
# Error in fitdist(Y, distr = "pexp"): The dpexp function must be defined
How should I tackle this problem? What am I doing wrong?
Is there some other way to fit distributions to data that you believe would be a better choice? (it should detect the distribution type and estimate its parameters).
Your advice will be appreciated.

Generate random variables from specified distribution in R
Rather than using the rnorm, rgamma, rlogis, etc. functions, is there a function that takes a distribution (for instance, one you may have fit using fitdistrplus) as an argument and generates random values from it?

How to adjust x axis labels in a seaborn plot?
I am trying to plot a histogram similar to this: actual plot
However, I am unable to customize the x-axis labels to match the figure above. My seaborn plot looks something like this: my plot
I want the same x-axis labels, ranging from 0 to 25000 with an equal interval of 5000. It would be great if anyone could guide me in the right direction.
Code for my figure:
sns.set_style('darkgrid')
kws = dict(linewidth=.3, edgecolor='k')
g = sns.FacetGrid(college, hue='Private', size=6, aspect=2, palette='coolwarm')
g = g.map(plt.hist, 'Outstate', bins=24, alpha=0.7, **kws).add_legend()
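Since a FacetGrid wraps ordinary matplotlib axes, the ticks can be set on the underlying axes. A self-contained sketch with made-up data (with a FacetGrid g, the equivalent would be something like g.set(xticks=np.arange(0, 25001, 5000)) or plt.xticks(...) after mapping):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(np.random.default_rng(0).integers(0, 25000, 500), bins=24)

# ticks at 0, 5000, 10000, ..., 25000
ax.set_xticks(np.arange(0, 25001, 5000))
```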

Add Second Colorbar to a Seaborn Heatmap / Clustermap
I was trying to help someone add a colorbar for the vertical blue bar in the image below. We tried many variations of plt.colorbar(row_colors) (like above and below sns.clustermap()) and looked around online for 2 hours, but no luck. We just want to add a colorbar for the blues, please help!

import pickle
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

feat_mat, freq, label = pickle.load(open('file.pkl', 'rb'))
feat_mat_df = pd.DataFrame(feat_mat[4])
freq_df = pd.DataFrame(freq)
freq_df_transposed = freq_df.transpose()
my_palette = dict(zip(set(freq_df_transposed[int('4')]),
                      sns.color_palette("PuBu", len(set(freq_df_transposed[int('4')])))))
row_colors = freq_df_transposed[int('4')].map(my_palette)
sns.clustermap(feat_mat_df, metric="euclidean", standard_scale=1,
               method="complete", cmap="coolwarm", row_colors=row_colors)
plt.show()

Changing color scale/gradient vertically in bar like plot using seaborn
I wanted to have a vertical gradient for each bar of the seaborn barplot/countplot.

# to reproduce the above plot
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", color_codes=True)
np.random.seed(sum(map(ord, "categorical")))
titanic = sns.load_dataset("titanic")
sns.countplot(x="deck", data=titanic, palette="Greens_d")
plt.show()

This image has a horizontal gradient, but I want the gradient to be vertical, like the linear down or linear up gradient in Excel: https://support.office.com/enus/article/addagradientcolortoashape11cf6392723c4be8840ab2dab4b2ba3e
See the example from https://matplotlib.org/gallery/lines_bars_and_markers/gradient_bar.html for a vertical gradient. Neglect the background; colour is immaterial.
p.s. Newbie to seaborn

Gaussian function model in R
I have a dataset of barnacle density and coral cover by photo from two coral reef locations. I want to see if there is a pattern in barnacle density with depth or coral cover.
I have tried linear models and a negative binomial with the formula
m2 <- glm.nb(dens.cm ~ depth + coral.cover + location + depth:location, data = data)
However, after looking at a distribution of the density data with depth, I think a Gaussian function may better explain the patterns.
Density of barnacles per m2 by depth (m) and location
I am looking for advice on how to design a Gaussian model for my data in R. Any advice is appreciated!
> dput(dat) structure(list(photo = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 134L), .Label = c("101", "102", "103", "104", "105", "106", "107", "108", "201", "202", "203", "204", "205", "206", "207", "208", "209", "210", "211", "212", "301", "302", "303", "304", "305", "306", "307", "501", "502", "503", "504", "505", "506", "507", "508", "509", "510", "511", "512", "513", "601", "602", "603", "604", "605", "606", "607", "608", "609", "6157", "6173", "6177", "6178", "6181", "6182", "6199", "6201", "6202", "6203", "6210", "6211", "6214", "6222", "6237", "6241", "6245", "6256", "6260", "6261", "6296", "6297", "6299", "6302", "6304", "6308", "6309", "6311", "6312", "6313", "6314", "6315", "6320", "6322", "6323", "6324", "6325", "6326", "6327", "6328", "6329", "6424", "6426", "6428", "6431", "701", "702", "703", "704", "705", "706", "707", "708", "709", "801", "802", "803", "804", "805", "806", "807", "808", "809", "810", "D01", "D02", "D03", "D04", "D05", "D06", "D07", "D08", "D10", "D11", "D12", "D13", "D14", "D15", "D16", "D17", "D18", "D19", "D20", "D21", "D22"), class = "factor"), location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("fgb", "usvi"), class = "factor"), depth = c(19.5072, 19.812, 21.5, 20.7264, 21.336, 19.5072, 19.812, 20.0312, 21.9456, 23.4696, 23.4696, 24.0792, 23.1648, 23.4696, 21.336, 19.5072, 20.1168, 20.7264, 21.0312, 21.0312, 21.9456, 20.4216, 19.5072, 21.0312, 22.2504, 21.9456, 20.4216, 20.4216, 20.4216, 21.336, 20.7264, 20.7264, 20.4216, 20.4216, 19.812, 20.1168, 20.1168, 20.7264, 19.812, 21.9456, 22.86, 22.2504, 21.9456, 22.5552, 22.2504, 21.0312, 21.336, 21.336, 21.6408, 23.4696, 23.7744, 21.9456, 22.2504, 22.2504, 21.6408, 22.2504, 22.2504, 21.5, 23.1648, 22.5552, 22.2504, 22.5552, 22.2504, 21.9456, 21.85, 22.2504, 24.0792, 22.2504, 15, 15, 15, 15, 15, 15, 13, 13, 13, 13, 13, 13, 13, 21, 21, 21, 21, 7, 7, 7, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 38, 38, 38, 38, 32.6992, 29.5656, 31.0896, 31.0896, 32.6136, 33.8328, 35.3568, 35.3568, 31.0896, 37.7952, 29.5656, 31.0896, 31.0896, 32.6136, 33.8328, 35.3568, 35.3568, 36.8808, 37.7952, 37.7952, 38.1), dens.m = c(267.86719, 350.47852, 431.81125, 622.71004, 599.24271, 1420.18674, 193.38521, 161.44909, 910.49021, 110.35386, 479.12616, 408.42407, 315.60503, 74.8805, 104.48846, 137.99029, 469.71577, 356.37609, 950.49046, 272.49611, 528.00183, 269.93556, 480.50256, 118.2897, 185.00516, 438.49583, 276.08897, 227.43988, 86.33476, 185.46051, 84.80511, 451.02732, 400.5159, 163.67933, 90.92022, 137.38598, 202.10666, 159.44588, 197.77431, 453.77111, 101.17702, 134.19122, 122.93134, 429.97449, 430.17319, 1153.40396, 214.65884, 1342.54685, 578.08208, 578.44438, 252.6739, 2174.60653, 354.51124, 340.84014, 390.41988, 244.08631, 806.81267, 651.94004, 57.84774, 303.84401, 
411.5247, 555.01574, 118.71732, 94.01832, 572.41467, 444.28938, 123.78678, 320.6036361, 0, 0, 49.41053235, 0, 125.6693464, 0, 93.84212658, 198.2007337, 327.6507767, 907.6881184, 0, 239.4739237, 0, 0, 443.5415909, 0, 51.88753895, 401.7879564, 0, 428.9613238, 0, 17.05628117, 0, 0, 0, 62.93519689, 0, 14.42007124, 0, 0, 0, 52.11494159, 0, 0, 0, 0, 0, 0, 0, 10.83275387, 141.8632389, 0, 0, 0, 0, 446.919281, 132.8611692, 143.198051, 33.05694578, 167.1561242, 51.78159277, 99.97872, 75.88997, 502.1027409, 354.7612359, 18.01753245, 59.73474983, 101.6708376, 192.2764503, 279.5383788, 138.1696187, 289.6458105, 166.5402349, 65.25117077, 649.1753683, 346.42269), coral.cover = c(28.52606, 11.05908, 31.28802, 28.91658, 3.54822, 12.18002, 16.72137, 1.92059, 23.42574, 64.22509, 37.25867, 48.04682, 58.10703, 36.08555, 45.99744, 67.4129, 41.21151, 53.32379, 14.54049, 40.63984, 57.09064, 42.2561, 39.77932, 23.7793, 35.67588, 28.4876, 35.53832, 21.61865, 35.1461, 14.45028, 45.70443, 52.544, 53.58537, 27.60442, 16.56497, 6.12609, 31.23248, 48.8958, 25.30934, 40.41436, 28.02014, 36.47627, 28.28651, 13.44436, 25.07424, 38.02122, 49.11345, 7.12683, 24.52069, 15.27754, 35.67601, 8.35171, 1.87428, 6.0433, 20.08231, 13.70174, 39.39322, 9.61437, 10.3376, 50.15105, 37.62041, 39.14767, 41.23067, 38.1632, 46.12196, 16.10196, 36.32152, 44.90422, 2.0575, 12.13155, 5.20272, 5.34756, 4.0912, 0.60427, 5.47876, 1.29702, 0.78458, 0.56643, 0.75587, 2.14695, 8.99664, 0.73209, 1.15917, 1.40533, 4.95436, 0.63981, 1.03059, 1.19857, 0.38732, 60.28733, 25.67675, 10.33979, 13.07546, 4.08467, 6.10119, 35.65439, 5.54589, 15.93534, 6.06176, 9.86548, 7.00005, 21.27449, 12.13181, 26.65331, 5.83493, 14.69534, 6.87034, 23.73075, 7.24837, 1.58201, 2.56882, 0.35245, 20.23897, 42.96672, 44.67648, 28.76856, 37.52041, 40.01538, 4.705, 29.9067, 30.06042, 7.45481, 14.35932, 8.60488, 16.68506, 23.30932, 14.51399, 33.59438, 38.95256, 43.35688, 2.65983, 9.84355, 37.1201, 50.76407)), .Names = c("photo", "location", "depth", 
"dens.m", "coral.cover"), class = "data.frame", row.names = c(NA, 134L))

Is there a Python package that can trace a curve with a Gaussian lineshape over several x and y values?
My apologies for my ignorance in advance; I've only been learning Python for about two months. Every example question that I've seen on Stack Overflow seems to discuss a single distribution over a series of data, but not one distribution per data point with band broadening.
I have some (essentially) infinitely thin bars at value x with height y that I need to run a line over so that it looks like the following photo:
The bars are obtained from the table of data on the far right. The curve is what I'm trying to make.
I am doing some TD-DFT work to calculate a theoretical UV/visible spectrum. It will output absorbance strengths (y-values, i.e., heights) for specific wavelengths of light (x-values). Theoretically, these are typically plotted as infinitely thin bars, though we experimentally obtain a curve instead. The theoretical data can be made to look like an experimental spectrum by running a curve over it that hugs y=0 and has a Gaussian lineshape around every absorbance bar.
I'm not sure if there's a feature that will do this for me, or if I need to do something like make a loop summing Gaussian curves for every individual absorbance, and then plot the resulting formula.
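That loop-summing approach is only a few lines with numpy: sum one Gaussian per bar, centred at the bar's x with amplitude equal to its height. The positions, heights, and broadening width below are made up for illustration:

```python
import numpy as np

# hypothetical stick spectrum: wavelengths (x) and absorbance strengths (y)
centers = np.array([250.0, 300.0, 350.0])
heights = np.array([1.0, 0.6, 0.3])
sigma = 10.0  # broadening width -- a free choice

x = np.linspace(200.0, 400.0, 2001)
# broadcast: one Gaussian per bar, then sum over the bars
y = (heights[:, None] *
     np.exp(-(x[None, :] - centers[:, None])**2 / (2.0 * sigma**2))).sum(axis=0)
```

The resulting curve hugs y=0 away from the bars and peaks at roughly each bar's height, which is the shape described above.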
Thanks for reading!

Fit CDF with 2 Gaussian using LeastSq
I am trying to fit an empirical CDF plot to two Gaussian CDFs, as it seems that it has two peaks, but it does not work. I fit the curve with leastsq from scipy.optimize and the erf function from scipy.special. The fitting only gives a constant line at a value of 2. I am not sure in which part of the code I made a mistake. Any pointers will be helpful. Thanks!
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x = np.array([ 90.64115156,  90.85690063,  91.07264971,  91.28839878,  91.50414786,
               91.71989693,  91.93564601,  92.15139508,  92.36714415,  92.58289323,
               92.7986423 ,  93.01439138,  93.23014045,  93.44588953,  93.6616386 ,
               93.87738768,  94.09313675,  94.30888582,  94.5246349 ,  94.74038397,
               94.95613305,  95.17188212,  95.3876312 ,  95.60338027,  95.81912935,
               96.03487842,  96.2506275 ,  96.46637657,  96.68212564,  96.89787472,
               97.11362379,  97.32937287,  97.54512194,  97.76087102,  97.97662009,
               98.19236917,  98.40811824,  98.62386731,  98.83961639,  99.05536546,
               99.27111454,  99.48686361,  99.70261269,  99.91836176, 100.13411084,
              100.34985991, 100.56560899, 100.78135806, 100.99710713, 101.21285621])
y = np.array([3.33333333e-04, 3.33333333e-04, 3.33333333e-04, 1.00000000e-03,
              1.33333333e-03, 3.33333333e-03, 6.66666667e-03, 1.30000000e-02,
              2.36666667e-02, 3.40000000e-02, 5.13333333e-02, 7.36666667e-02,
              1.01666667e-01, 1.38666667e-01, 2.14000000e-01, 3.31000000e-01,
              4.49666667e-01, 5.50000000e-01, 6.09000000e-01, 6.36000000e-01,
              6.47000000e-01, 6.54666667e-01, 6.61000000e-01, 6.67000000e-01,
              6.76333333e-01, 6.84000000e-01, 6.95666667e-01, 7.10000000e-01,
              7.27666667e-01, 7.50666667e-01, 7.75333333e-01, 7.93333333e-01,
              8.11333333e-01, 8.31333333e-01, 8.56333333e-01, 8.81333333e-01,
              9.00666667e-01, 9.22666667e-01, 9.37666667e-01, 9.47333333e-01,
              9.59000000e-01, 9.70333333e-01, 9.77333333e-01, 9.83333333e-01,
              9.90333333e-01, 9.93666667e-01, 9.96333333e-01, 9.99000000e-01,
              9.99666667e-01, 1.00000000e+00])

plt.plot(x, y, 'r.')

# Fitting with 2 Gaussians
from scipy.special import erf
from scipy.optimize import leastsq

def two_gaussian_cdf(params, x):
    (mu1, sigma1, mu2, sigma2) = params
    model = 0.5 * (1 + erf((x - mu1) / (sigma1 * np.sqrt(2)))) + \
            0.5 * (1 + erf((x - mu2) / (sigma2 * np.sqrt(2))))
    return model

def residual_two_gaussian_cdf(params, x, y):
    model = two_gaussian_cdf(params, x)
    return model - y

params = [5., 2., 1., 2.]
out = leastsq(residual_two_gaussian_cdf, params, args=(x, y))
plt.plot(x, two_gaussian_cdf(out[0], x))
which returns this plot: