How to programmatically get parameter names and values in scipy
Is there any way to get the parameters of a distribution? I know almost every distribution has "loc" and "scale", but there are differences between them; for example, alpha has "a", while beta has "a" and "b".
What I want to do is programmatically print (after fitting a distribution) key/value pairs of parameter and value.
But I don't want to write a print routine for every possible distribution.
2 answers

Inspecting the _pdf method appears to work:

import inspect
from scipy import stats

# keys
[p for p in inspect.signature(stats.beta._pdf).parameters if not p == 'x']
# ['a', 'b']

# keys and values
dist = stats.alpha(a=1)
inspect.signature(stats.alpha._pdf).bind('x', *dist.args, **dist.kwds).arguments
# OrderedDict([('x', 'x'), ('a', 1)])  # 'x' probably doesn't count as a parameter

In the end, what I did was:

parameter_names = [p for p in inspect.signature(distribution._pdf).parameters if not p == 'x'] + ["loc", "scale"]
parameters = distribution.fit(pd_series)
distribution_parameters_dictionary = dict(zip(parameter_names, parameters))
Where pd_series is a pandas Series of the data being fitted.
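Putting the two snippets together, a minimal end-to-end sketch (the beta distribution and the synthetic data here are arbitrary choices for illustration, not part of the original answer):

```python
import inspect

import numpy as np
from scipy import stats

# synthetic data standing in for pd_series
data = stats.beta(a=2, b=5).rvs(size=1000, random_state=0)

distribution = stats.beta
parameter_names = [p for p in inspect.signature(distribution._pdf).parameters
                   if not p == 'x'] + ["loc", "scale"]
parameters = distribution.fit(data)

# print key/value pairs without a per-distribution print routine
for name, value in zip(parameter_names, parameters):
    print(f"{name} = {value:.4f}")
```

The same loop works for any continuous distribution whose shape parameters appear in its `_pdf` signature, since `fit` always returns the shape parameters followed by loc and scale.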
See also questions close to this topic

Dynamic scope and static scope (if there are nonlocal variables and global variables)
Consider the following Python program:
As far as I am concerned,
Static scoping:
 In sub1: a(sub1), y(sub1), z(sub1), x(main)
 In sub2: a(sub2), x(sub2), w(sub2), y(main), z(main)
 In sub3: b(sub3), z(sub3), a(sub2), x(sub2), w(sub2), y(main)
Since a is nonlocal in sub3(), should a come from sub2 or sub3, and why? Does it make any difference to mark x as global in sub2 and a as nonlocal in sub3?
Dynamic scoping:
 In sub1: a(sub1), y(sub1), z(sub1)
 In sub2: a(sub2), x(sub2), w(sub2)
 In sub3: b(sub3), z(sub3)
For dynamic scoping, are all variables resolved to the ones declared inside the calling functions? How should I modify my answer? Thank you.
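Since the referenced program is missing above, here is a minimal sketch of my own (not the original program) showing how Python's static scoping resolves nonlocal and global declarations:

```python
x = "global x"

def outer():
    a = "outer a"

    def inner():
        nonlocal a   # binds to the nearest enclosing function's a (outer)
        global x     # binds to the module-level x
        a = "rebound by inner"
        x = "rebound by inner"

    inner()
    return a         # reflects inner's rebinding of outer's a

result = outer()  # → "rebound by inner"
```

Under static (lexical) scoping, `nonlocal a` resolves to the nearest enclosing function that defines `a`, so in the question's program the `a` seen in sub3 would come from sub2, the nearest enclosing scope defining it.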

AWS cloud9 timeout when running flask application
Hi all, I'm trying to set up an AWS Cloud9 environment with Flask to develop a web app. I'm new to AWS and Flask, and stuck on an issue: there seems to be a problem between the IDE environment and previewing the application in my browser (I'm using Chrome, but have also tried IE).
From app.py:
import os
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello World'

if __name__ == '__main__':
    app.debug = True
    app.run(host=os.getenv('IP', '0.0.0.0'), port=int(os.getenv('PORT', 8080)))
When I run this in the terminal (as root):
[root@ip-172-31-11-201 environment]# python ./app.py
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
When I right click on
http://0.0.0.0:8080/
it will open a tab and redirect me to a public IP x.x.x.x:8080, which will eventually time out and give me: err_connection_timeout
When I attempt to run the application using the IDE run option it will take me to:
Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
At which point it will just time out as well. So this has me really confused: when I run this outside of the Cloud9 IDE I don't have this issue. I know from the documentation that you're supposed to bind to 0.0.0.0 on port 8080, so I'm not quite sure why using the Run option would change the IP I specified.
I've also tried manually putting in my project and username:
At which point it redirects me to a page where it tells me it "can't find my username". I then tried to set up a cloud9.io account, which completed; I confirmed my account but can't log in and still get the "cannot find username" page.
After that I tested my app.py file from Cloud9 locally with Sublime, substituted 0.0.0.0 for 127.0.0.1, and it worked locally. Is there anything I'm missing in my config? Has anything changed in the setup since AWS acquired Cloud9? I've been following online tutorials and videos but just can't seem to solve this issue.
From the IDE environment:
# python version
Python 2.7.12

# pip freeze
flask
astroid==1.5.3
aws-cfn-bootstrap==1.4
awscli==1.11.132
Babel==0.9.4
backports.functools-lru-cache==1.4
backports.ssl-match-hostname==3.4.0.2
boto==2.42.0
botocore==1.5.95
chardet==2.0.1
click==6.7
cloud-init==0.7.6
CodeIntel==0.9.3
colorama==0.2.5
configobj==4.7.2
configparser==3.5.0
docutils==0.11
ecdsa==0.11
enum34==1.1.6
Flask==0.12.2
futures==3.0.3
gyp==0.1
ikpdb==1.1.2
Inflector==2.0.11
iniparse==0.3.1
isort==4.2.15
itsdangerous==0.24
jedi==0.11.0
Jinja2==2.7.2
jmespath==0.9.2
jsonpatch==1.2
jsonpointer==1.0
kitchen==1.1.1
lazy-object-proxy==1.3.1
lockfile==0.8
MarkupSafe==0.11
mccabe==0.6.1
paramiko==1.15.1
parso==0.1.0
PIL==1.1.6
ply==3.4
pyasn1==0.1.7
pycrypto==2.6.1
pycurl==7.19.0
pygpgme==0.3
pyliblzma==0.5.3
pylint==1.7.4
pylint-django==0.7.2
pylint-flask==0.5
pylint-plugin-utils==0.2.6
pystache==0.5.3
python-daemon==1.5.2
python-dateutil==2.1
pyxattr==0.5.0
PyYAML==3.10
requests==1.2.3
rsa==3.4.1
simplejson==3.6.5
singledispatch==3.4.0.3
six==1.11.0
subprocess32==3.2.7
urlgrabber==3.10
urllib3==1.8.2
virtualenv==15.1.0
Werkzeug==0.13
wrapt==1.10.11
yum-metadata-parser==1.1.4
zope.cachedescriptors==4.3.0
Thanks for any help!

Plotting multiple images in the same pane in Python
How do I plot multiple images in the same pane? For example, I want to plot 8 graphs varying the simulated peak characteristics:
import numpy
import peakutils
from peakutils.plot import plot as pplot
from matplotlib import pyplot

#centers = (30.5, 72.3)
centers = (100, 72.3)
x = numpy.linspace(0, 120, 121)
y = (peakutils.gaussian(x, 5, centers[0], 3) +
     peakutils.gaussian(x, 7, centers[1], 10) +
     numpy.random.rand(x.size))

#pyplot.figure(figsize=(10,6))
#pyplot.plot(x, y)
#pyplot.title("Data with noise")
#pyplot._show()

indexes = peakutils.indexes(y, thres=0.5, min_dist=12)
print(indexes)
print(x[indexes], y[indexes])

pyplot.figure(figsize=(10,6))
pplot(x, y, indexes)
pyplot.title('First estimate')
pyplot.show()
so I want 8 plots with different centers, e.g. centers1 = (100, 72.3), centers2 = (82, 23), etc.
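A sketch of one way to do this with pyplot.subplots, which lays all eight plots out in one figure. The centers list and the gaussian helper below are illustrative stand-ins (not the original peakutils calls), and the Agg backend line is only there so the script runs headless:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when running interactively
from matplotlib import pyplot

def gaussian(x, amp, center, width):
    # simple stand-in for peakutils.gaussian
    return amp * np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

# eight hypothetical center pairs to sweep over
center_sets = [(100, 72.3), (82, 23), (30.5, 72.3), (50, 90),
               (20, 60), (110, 40), (75, 15), (95, 55)]
x = np.linspace(0, 120, 121)

fig, axes = pyplot.subplots(2, 4, figsize=(16, 8))
for ax, centers in zip(axes.flat, center_sets):
    y = gaussian(x, 5, centers[0], 3) + gaussian(x, 7, centers[1], 10)
    ax.plot(x, y)
    ax.set_title(f"centers = {centers}")
fig.tight_layout()
fig.savefig("peaks.png")
```

Each Axes object in `axes.flat` accepts the same calls as `pyplot` (plot, title, etc.), so the existing peak-finding code can run unchanged inside the loop.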

Error; Max Number of iterations has been exceeded
I am trying to minimize the distance between my model function and the observed data via minimizing the equation:
sum(((v_model - v_obs) / errors)**2)

v_obs and errors are both arrays of data. My model function can be manipulated by changing two fitting parameters: ps and rs. So, my goal is to find which values of these parameters minimize the above function. My model function is defined as:

def integrate_NFW(rx, ps, rs):
    rho = (ps / ((rx / rs) * ((1 + (rx / rs)) ** 2)))
    function_result = rho * 4.0 * np.pi * rx ** 2
    return function_result
Where rx is just another name for r, which is an array of data (I did this to avoid confusion for myself). My actual function (named chisquared), which I showed briefly earlier, is defined as:

def chisqfuncNFW(iter_vars):
    global v_model
    # Normalizes p0 (p0 is too large relative to rc)
    ps = iter_vars[0] * 3.85e+09
    rs = iter_vars[1]
    v_modelNFW = []
    for index in range(0, am):
        integral_result = simpsonsRule(integrate_NFW, 0.0, r[index], 200, ps, rs)
        v_mod_result = (grav_const * integral_result / r[index])
        v_mod_result = np.sqrt(v_mod_result + (v_disk[index])**2 + (v_gas[index])**2)
        # Creates an array of all the v_models at each different radius
        v_modelNFW.append(v_mod_result)
    chisq = 0.0
    for index in range(0, am):
        chisq += (((v_modelNFW[index] - v_obs[index]) / errors[index])**2)
    chisq = np.sqrt(chisq)
    return chisq
Where:
grav_const = 4.302e-06
am = 25
Now, when I try to minimize chisqfuncNFW:

initial_guess = np.array([1.0, 2.0])
resNFW = minimize(chisqfuncNFW, initial_guess, method='Nelder-Mead', options={'xtol': 1e-9, 'maxiter': 100})
print(resNFW)

this is returned:

final_simplex: (array([[ 1., 2.], [ 1., 2.], [ 1., 2.]]), array([ nan, nan, nan]))
          fun: nan
      message: 'Maximum number of iterations has been exceeded.'
         nfev: 399
          nit: 100
       status: 2
      success: False
            x: array([ 1., 2.])
So my problem is that when I try to minimize my function, it keeps returning 'Maximum number of iterations has been exceeded', even when I increase maxiter to 1000 or decrease xtol. What can I do to get rid of this error? Here is my full code:
#IMPORTS
from scipy.optimize import *
import numpy as np

#DATA
v_obs = np.array([33.1, 46.7, 53.9, 58.8, 65.6, 70.8, 82.1, 84.1, 83.4, 81.5, 80.8, 82.8, 88.2, 88.5, 88.4, 88.2, 89.5, 89.7, 93.7, 97.0, 96.2, 94.8, 92.5, 92.8, 93.5])
r = np.array([0.91, 1.36, 1.82, 2.27, 2.72, 3.17, 3.63, 4.08, 4.54, 4.99, 5.45, 5.9, 6.36, 6.81, 7.26, 7.71, 8.17, 8.62, 9.07, 9.53, 9.96, 10.45, 10.93, 11.32, 11.8])
v_gas = np.array([0.0, 0.0, 0.0, 2.86, 7.4, 8.38, 8.42, 9.19, 12.0, 12.98, 13.57, 15.44, 20.16, 24.2, 27.05, 28.23, 27.84, 26.95, 26.46, 26.26, 25.18, 24.1, 23.02, 22.03, 21.74])
v_disk = np.array([31.52, 37.5, 42.04, 46.16, 48.37, 49.78, 50.48, 50.65, 50.39, 49.84, 49.05, 48.12, 47.07, 45.98, 44.86, 43.73, 42.59, 41.5, 40.45, 39.41, 38.5, 37.5, 36.62, 35.97, 35.19])
errors = np.array([3.9, 2.7, 4.7, 2.5, 2.6, 2.7, 4.9, 6.6, 6.2, 2.7, 4.0, 12.4, 11.7, 8.5, 8.9, 10.8, 3.1, 9.5, 7.3, 5.7, 1.5, 2.1, 13.9, 10.6, 12.1])

#CONSTANTS
#Number of measurements
am = 0
#Counts amount of elements in the array r
for i in r:
    am = am + 1
#Gravitational constant
grav_const = 4.302e-06

#INTEGRAL APPROXIMATION
def simpsonsRule(func, a, b, n, p0, r0):
    if n % 2 == 1:
        return "Not applicable"
    else:
        h = (b - a) / float(n)
        s = func(a, p0, r0) + sum((4 if i % 2 == 1 else 2) * func(a + i*h, p0, r0) for i in range(1, n)) + func(b, p0, r0)
        return s * h / 3.0

#DARK MATTER PROFILES
#NFW Profile
def integrate_NFW(rx, ps, rs):
    rho = (ps / ((rx / rs) * ((1 + (rx / rs)) ** 2)))
    function_result = rho * 4.0 * np.pi * rx ** 2
    return function_result

def chisqfuncNFW(iter_vars):
    global v_model
    #Normalizes p0 (p0 is too large relative to rc)
    ps = iter_vars[0] * 3.85e+09
    rs = iter_vars[1]
    v_modelNFW = []
    for index in range(0, am):
        integral_result = simpsonsRule(integrate_NFW, 0.0, r[index], 200, ps, rs)
        v_mod_result = (grav_const * integral_result / r[index])
        v_mod_result = np.sqrt(v_mod_result + (v_disk[index])**2 + (v_gas[index])**2)
        #Creates an array of all the v_models at each different radius
        v_modelNFW.append(v_mod_result)
    chisq = 0.0
    for index in range(0, am):
        chisq += (((v_modelNFW[index] - v_obs[index]) / errors[index])**2)
    chisq = np.sqrt(chisq)
    return chisq

initial_guess = np.array([1.0, 2.0])
resNFW = minimize(chisqfuncNFW, initial_guess, method='Nelder-Mead', options={'xtol': 1e-9, 'maxiter': 100})
psNFW = resNFW.x[0] * 3.85e+09
rsNFW = resNFW.x[1]
print(psNFW, rsNFW)
print(resNFW)
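One hedged observation (an inference from the posted code, not a confirmed diagnosis): simpsonsRule evaluates the integrand at its lower limit a = 0.0, where integrate_NFW divides by zero. With a NumPy scalar for ps that yields inf rather than an exception, and inf multiplied by rx**2 = 0 gives nan, which would propagate into every chisq value and match the nan entries in final_simplex. A minimal reproduction:

```python
import numpy as np

def integrate_NFW(rx, ps, rs):
    rho = ps / ((rx / rs) * ((1 + (rx / rs)) ** 2))
    return rho * 4.0 * np.pi * rx ** 2

# ps is np.float64 in the original code (it comes from a NumPy array),
# so dividing by zero produces inf, and inf * 0.0 produces nan
with np.errstate(divide='ignore', invalid='ignore'):
    val = integrate_NFW(0.0, np.float64(3.85e+09), 2.0)
print(val)  # nan
```

If this is indeed the cause, starting the integration slightly above zero (or handling rx = 0 as a special case returning 0) would keep chisq finite.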

Getting the width and area of peaks from Scipy Signal object
How do I get peak objects with properties such as position, peak area, peak width, etc. from the SciPy signal module, using the find_peaks_cwt method:
def CWT(trace):
    x = []
    y = []
    for i in range(len(trace)):
        x.append(trace[i].Position)
        y.append(trace[i].Intensity)
    x = np.asarray(x)
    y = np.asarray(y)
    return signal.find_peaks_cwt(x, y)
This just returns an array?
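Yes, find_peaks_cwt returns only an array of peak indices. If widths and areas are needed as well, one possible route (assuming SciPy >= 1.1; the synthetic signal below is a stand-in for the trace object) is signal.find_peaks combined with signal.peak_widths, then summing each peak's span for its area:

```python
import numpy as np
from scipy import signal

x = np.linspace(0, 100, 1001)
y = 5.0 * np.exp(-((x - 40) ** 2) / (2 * 2.0 ** 2))  # one synthetic peak

peaks, props = signal.find_peaks(y, height=1.0)
# widths at half maximum, plus interpolated left/right edge positions
widths, width_heights, left_ips, right_ips = signal.peak_widths(y, peaks, rel_height=0.5)

dx = x[1] - x[0]
for p, w, l, r in zip(peaks, widths, left_ips, right_ips):
    area = y[int(l):int(r) + 1].sum() * dx  # area over the half-max span
    print(f"position={x[p]:.1f}, width={w * dx:.2f}, area={area:.2f}")
```

Note that find_peaks_cwt takes the signal vector and an array of candidate widths as its two arguments, so passing positions and intensities as in the CWT function above may not be doing what was intended.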

Predictions in the mxnet package
I'm trying to train a neural network and a convolutional neural network in R with the package mxnet for a two-class classification problem. I'm using the function "mx.mlp" for the neural network and the function "mx.model.FeedForward.create" for the convolutional neural network (CNN).
My problem is the predictions. I get the exact same probabilities for the 2 classes from both functions on the test set, which is very strange. This means the models can only predict one class. I have tried to solve this problem for 3 weeks now without success. Any help would be much appreciated.
here is the links to the:
train set : https://www.dropbox.com/s/m47rx05iahvpild/train_set.csv?dl=0
test set : https://www.dropbox.com/s/ld38q2zq0t3yplo/test_set.csv?dl=0
R code : https://www.dropbox.com/s/nsoc5nb0hlibw7h/rcode.R?dl=0

Code for sampling negative binomial distribution
I would like to write a function that uses a uniform random number generator to sample from the negative binomial distribution. How would I do this?
It's not clear to me if this should be posted here or in stats.stackexchange. I can push it over there if that's a better fit.
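One standard approach is inverse-transform sampling (this is a sketch; the parameterization, r successes with the number of failures returned, is an assumption since the question doesn't fix one): a negative binomial NB(r, p) variate is the sum of r i.i.d. geometric variates, and each geometric can be drawn from a single uniform via the inverse CDF:

```python
import math
import random

def geometric_from_uniform(p, u):
    # inverse CDF of the geometric distribution (number of failures
    # before the first success); u is uniform on [0, 1)
    return int(math.floor(math.log(1.0 - u) / math.log(1.0 - p)))

def negative_binomial(r, p, rng=random.random):
    # NB(r, p) as the sum of r independent geometric draws
    return sum(geometric_from_uniform(p, rng()) for _ in range(r))

random.seed(0)
samples = [negative_binomial(5, 0.3) for _ in range(100000)]
# the sample mean should be near r * (1 - p) / p = 11.67
```

For non-integer r there is also the gamma-Poisson mixture construction, but the geometric-sum form above needs nothing beyond a uniform generator.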

How to fit a Poisson distribution to a histogram using MATLAB?
I have integer-valued data from a physical process that should follow Poisson statistics. I'm trying to fit a Poisson distribution to the histogram of the data. Here's what the histogram looks like:
I fit the histogram of the data to a Poisson distribution using the following code:
figure; histfit(x(x>0),30,'poisson')
And obtain:
Am I doing something wrong? Should the fit be scaled, and if so, how? To me, the histogram in the first image looks roughly Poissonian, so it surprises me that the fit is so poor.
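For comparison, here is a Python sketch of the usual scaling logic (with synthetic data; this is an illustration of the principle, not the MATLAB histfit internals): the MLE for the Poisson rate is just the sample mean, and the fitted PMF must be multiplied by the sample size (and the bin width, if bins are wider than one integer) to overlay a count histogram:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.poisson(lam=4.0, size=1000)  # synthetic integer-valued data

lam = x.mean()                 # MLE of the Poisson rate
k = np.arange(x.max() + 1)
counts = np.bincount(x)        # histogram with unit-width integer bins
expected = len(x) * stats.poisson.pmf(k, lam)  # pmf scaled to counts
```

Note also that `x(x>0)` in the MATLAB call drops all zero counts, which shifts the fitted mean upward; if zeros are valid observations they should be kept.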

Probability that Henry wins?
It's decided that Henry wins the round if the toss results in H, and Tom wins if it's T. The coin is tossed 3 times; Henry has 1 point and Tom has 2 points (either HTT, TTH, or THT; is the order important?). What is the probability that Henry wins if they continue tossing?

Venn Diagram Probability
Let A, B, and C be subsets of a universal set U with A ⊆ B and A∩C = ∅. Suppose also that n(U) = 50, n(A) = 5, n(B) = n(C) = 25, and n(B'∩C') = 10. What is n(A∪(B∩C))?
I'm stuck on this problem because I'm not sure what n(B∩C) is. I'm not sure if it's 0 because it's not mentioned in the problem, or is there a special way to solve this?

Which one is more likely to waste less memory: one big memory manager or several small ones?
First of all, this may be more like a math problem.
I am writing a module that requires memory piece by piece and never releases it until its instance is dead, so I wrote a simple memory manager to reduce malloc calls. The memory manager requires a block of memory during initialization, and the size of the memory block is controllable by the user; the manager then hands out memory pieces to the user when required. If the manager runs out of memory, it doubles its memory block size with realloc. In the end, the relation between the required memory size and the total wasted memory size is:

f(x) = 2^k - x,  2^(k-1) < x <= 2^k
Now I have several memory users. I can either create a memory manager for each of them (the overhead of the manager itself is not worth considering), or create only one memory manager and share it among all users. The number of users and the size of each user's memory usage may vary over a great range. So, which strategy is more likely to waste less memory?
The memory manager hides the actual memory block position and hands the user an offset, to avoid realloc invalidation issues. The interface is quite simple:

void *memo_ref(Memo memo, MemoOffset offset) {
    panic(offset < memo->used, "invalid offset is passed to memo");
    return &memo->memory[offset];
}
So I think the compiler will inline it and there's no difficulty with optimization.
Also, there's no need to worry about data races, since all users of the manager come from the same thread. They just make requests in a staggered way.
In my opinion, one big manager leads to a faster program, since there are fewer realloc calls, which are a big cost. So my focus is on memory usage. Thanks for your help.
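The question lends itself to a quick simulation. A toy model of my own (assuming, per the formula above, that each manager's capacity is the smallest power of two covering its demand, and picking arbitrary user-size ranges) comparing one shared doubling manager against per-user managers:

```python
import random

def waste(requested):
    # a doubling allocator's capacity is the smallest power of two
    # >= the total requested size; the remainder is waste
    cap = 1
    while cap < requested:
        cap *= 2
    return cap - requested

random.seed(0)
trials = 1000
shared_better = 0
for _ in range(trials):
    users = [random.randint(1, 1 << 20) for _ in range(8)]
    shared = waste(sum(users))               # one big shared manager
    separate = sum(waste(u) for u in users)  # one manager per user
    if shared <= separate:
        shared_better += 1
print(f"shared manager wasted no more in {shared_better}/{trials} trials")
```

Intuitively, the shared manager pools the per-user remainders into a single 2^k - x gap, while separate managers each pay their own gap; the simulation makes it easy to test that intuition under whatever size distribution matches the real workload.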
The probability distribution of two words in a file using Java 8
I need the number of lines that contain two words. For this purpose I have written the following code. The input file contains 1000 lines and about 4,000 words, and it takes about 4 hours. Is there a library in Java that can do it faster? Can I implement this code using Apache Lucene or Stanford CoreNLP to achieve less run time?

ArrayList<String> reviews = new ArrayList<String>();
ArrayList<String> terms = new ArrayList<String>();
Map<String, Double> pij = new HashMap<String, Double>();

BufferedReader br = null;
FileReader fr = null;
try {
    fr = new FileReader("src/reviewspreprocessing.txt");
    br = new BufferedReader(fr);
    String line;
    while ((line = br.readLine()) != null) {
        for (String term : line.split(" ")) {
            if (!terms.contains(term))
                terms.add(term);
        }
        reviews.add(line);
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (br != null) br.close();
        if (fr != null) fr.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

long Count = reviews.size();
for (String term_i : terms) {
    for (String term_j : terms) {
        if (!term_i.equals(term_j)) {
            double p = (double) reviews.parallelStream()
                    .filter(s -> s.contains(term_i) && s.contains(term_j)).count();
            String key = String.format("%s_%s", term_i, term_j);
            pij.put(key, p / Count);
        }
    }
}

How to generate random integers that are random "enough"?
I'm trying to solve the 280th problem of Project Euler, and for this I have written the following simulation:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

/* Directions 1 2 3 4 */
int grid[5][5] = {
    {0, 0, 0, 0, 2},
    {0, 0, 0, 0, 2},
    {0, 0, 0, 0, 2},
    {0, 0, 0, 0, 2},
    {0, 0, 0, 0, 2}
};
int InitPos[2] = {2, 2};
int MaxExp = 5000000;
bool Success = false;
int StepCount = 0;
int ExpNumber = 1;
int AntsBag = 0;

void Init();
void CarryFood(int * pos);
void LeftFood(int * pos);
bool checkMovability(int * pos, int direction);
bool moveToDirection(int pos[2], int direction);
bool checkSuccess();
void ShowResult();

int main(int argc, char const *argv[])
{
    timeval curTime;
    gettimeofday(&curTime, NULL);
    int milli = curTime.tv_usec / 1000;

    time_t t;
    srand((unsigned)time(&t));
    //timeTData*.txt corresponds to using "time(&t)" above
    //milliData.txt corresponds to using "milli" variable above
    //timeTUnsigData*.txt corresponds to using "(unsigned)time(&t)" above

    printf("%% Experiment Number : %d \n", MaxExp);
    while (ExpNumber <= MaxExp) {
        Init();
        int pos[2];
        pos[0] = InitPos[0];
        pos[1] = InitPos[1];
        do {
            int direction = (rand() % 4) + 1;
            if (moveToDirection(pos, direction)) {
                StepCount++;
            }
            if (pos[1] == 4 && grid[pos[0]][4] == 2 && AntsBag == 0) {
                CarryFood(pos);
            }
            if (pos[1] == 0 && grid[pos[0]][0] == 0 && AntsBag == 2) {
                LeftFood(pos);
            }
            checkSuccess();
        } while (!Success);
        ShowResult();
        ExpNumber++;
    }
    return 0;
}

void Init()
{
    Success = false;
    StepCount = 0;
    AntsBag = 0;
    int gridInit[5][5] = {
        {0, 0, 0, 0, 2},
        {0, 0, 0, 0, 2},
        {0, 0, 0, 0, 2},
        {0, 0, 0, 0, 2},
        {0, 0, 0, 0, 2}
    };
    for (int i = 0; i < 5; ++i) {
        for (int j = 0; j < 5; ++j) {
            grid[i][j] = gridInit[i][j];
        }
    }
}

void ShowResult()
{
    /*
    for (int i = 0; i < 5; ++i) {
        printf("\n");
        for (int j = 0; j < 5; ++j) {
            printf("%d ", grid[i][j]);
        }
    }
    */
    printf("%d %d\n", StepCount, ExpNumber);
}

void CarryFood(int * pos)
{
    AntsBag = 2;
    grid[pos[0]][4] = 0;
}

void LeftFood(int * pos)
{
    AntsBag = 0;
    grid[pos[0]][0] = 2;
}

bool checkMovability(int * pos, int direction)
{
    switch (direction) {
        case 1: {
            if (pos[1] == 0) {
                return false;
            }
            break;
        }
        case 2: {
            if (pos[0] == 0) {
                return false;
            }
            break;
        }
        case 3: {
            if (pos[0] == 4) {
                return false;
            }
            break;
        }
        case 4: {
            if (pos[1] == 4) {
                return false;
            }
            break;
        }
        default: {
            printf("Wrong direction input is given!!\n");
            return false;
            break;
        }
    }
    return true;
}

bool moveToDirection(int * pos, int direction)
{
    if (!checkMovability(pos, direction)) {
        return false;
    }
    switch (direction) {
        case 1: {
            pos[1] -= 1;
            break;
        }
        case 2: {
            pos[0] -= 1;
            break;
        }
        case 3: {
            pos[0] += 1;
            break;
        }
        case 4: {
            pos[1] += 1;
            break;
        }
        default: {
            printf("I'm stunned!\n");
            return false;
            break;
        }
    }
    return true;
}

bool checkSuccess()
{
    for (int i = 0; i < 5; ++i) {
        if (grid[i][0] != 2) {
            return false;
        }
    }
    //printf("Success!\n");
    Success = true;
    return true;
}
Then I redirected the output to a *.txt file and found the expected value of the number of steps with the following Octave code:
clear
load data.txt
n = data(:,1);
output_precision(15);
mean(n)

%% The actual data
%% milliData1     -> 430.038224000000
%% milliData2     -> 430.031745000000
%% timeTData1     -> 430.029882400000
%% timeTData2     -> 430.019626400000
%% timeUnsigData1 -> 430.028159000000
%% timeUnsigData2 -> 430.009509000000
However, even when I run the exact same code twice I get different results, as you can see from the above numbers. (Note that I have tried this with different srand(..) inputs, for the reason I'm going to explain.)
I thought the reason for this is how I generate a random integer between 1 and 4 for the random directions of the ant, because as far as I understand, the probability distribution of this experiment should be the same as long as I repeat the experiment a large number of times (in this particular case 5,000,000 times).
So my first question is: is this really a problem with how I generate random integers? If so, how can we overcome it? I mean, how can we generate integers random enough that when we repeat the same experiment a large number of times, the difference between runs is smaller than the results I have got?
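A hedged observation rather than a definitive answer: run-to-run differences of about 0.03 in the mean are what a Monte Carlo estimate is expected to show even with a perfect RNG, because the sample mean of N independent trials fluctuates between runs with standard error sd/sqrt(N). A quick sketch (the per-experiment step-count standard deviation below is a hypothetical placeholder, not a value from the simulation):

```python
import math

N = 5_000_000   # number of simulated experiments per run
sd = 300.0      # hypothetical std-dev of the step count of one experiment
standard_error = sd / math.sqrt(N)
print(f"expected run-to-run fluctuation ~ {standard_error:.3f}")  # ~ 0.134
```

So the spread between the seeds above is consistent with ordinary statistical noise; to shrink it, increase N (the error falls as 1/sqrt(N)) rather than searching for a "better" random integer generator.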

How to calibrate the thresholds of a neural network output layer in a multiclass classification task?
Assume we have a multiclass classification task with 3 classes:
{Cheesecake, Ice Cream, Apple Pie}
Assume we have a trained neural network that can classify which of the three desserts a random chef would prefer. Also assume that the output layer consists of 3 neurons with softmax activation, such that each neuron represents the probability of liking the corresponding dessert.
For example, possible outputs of such network might be:
Output(chef_1) = { P(Cheesecake) = 0.3; P(Ice Cream) = 0.1; P(Apple Pie) = 0.6; }
Output(chef_2) = { P(Cheesecake) = 0.2; P(Ice Cream) = 0.1; P(Apple Pie) = 0.7; }
Output(chef_3) = { P(Cheesecake) = 0.1; P(Ice Cream) = 0.1; P(Apple Pie) = 0.8; }
In such case, all instances (chef_1, chef_2 and chef_3) are likely to prefer an Apple Pie, but with a different confidence (e.g. chef_3 is more likely to prefer Apple Pie than chef_1 as the network probability outputs are 0.8 and 0.6 respectively)
Given that we have a new dataset of 1000 chefs, and we want to calculate the distribution of their favorite desserts, we would simply classify each one of the 1000 chefs and determine his favorite dessert based on the neuron with maximum probability.
We also want to improve the prediction accuracy by discarding chefs whose max prediction probability is below 0.6. Let's assume that 200 out of the 1000 were predicted with a max probability below 0.6, and we discarded them.
In that case, we may bias the distribution over the remaining 800 chefs (those predicted with a probability higher than 0.6) if one dessert is easier to predict than another.
For example, if the average prediction probability of the classes are:
AverageP(Cheesecake) = 0.9
AverageP(Ice Cream) = 0.5
AverageP(Apple Pie) = 0.8
And we discard chefs who were predicted with probability which is lower than 0.6, among the 200 chefs that were discarded there are likely to be more chefs who prefer Ice Cream, and this will result in a biased distribution among the other 800.
Following this very long introduction (I am happy that you are still reading), my questions are:
Do we need a different threshold for each class? (e.g. among Cheesecake predictions discard instances whose probability is below X, among Ice Cream predictions discard instances whose probability is below Y, and among Apple Pie predictions discard instances whose probability is below Z).
If yes, how can I calibrate the thresholds without impacting the overall distribution of my 1000-chef dataset (i.e. discard predictions with low probability in order to improve the accuracy, while preserving the distribution over the original dataset)?
I've tried to use the average prediction probability of each class as a threshold, but I cannot assure that it will not impact the distribution (as these thresholds may overfit to the test set and not to the 1000 chefs dataset).
Any suggestions or related papers?
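One possible starting point (a sketch of an idea, not an established calibration method): instead of one absolute cutoff, pick each class's threshold as the same quantile of that class's own max-probability scores. Every predicted class then discards the same fraction, so the class proportions among the kept instances stay close to the original ones:

```python
import numpy as np

def per_class_thresholds(probs, discard_frac=0.2):
    # probs: (n, k) softmax outputs; each row's argmax is its predicted class
    labels = probs.argmax(axis=1)
    thresholds = {}
    for c in np.unique(labels):
        scores = probs[labels == c].max(axis=1)
        # same quantile per class -> same discard fraction per class
        thresholds[c] = np.quantile(scores, discard_frac)
    return thresholds

# toy example with 2 classes and 4 predictions
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
print(per_class_thresholds(probs, discard_frac=0.5))
```

If calibrated probabilities are the real goal, post-hoc calibration of the softmax outputs themselves (e.g. temperature scaling on a held-out set) may be a more principled complement to any thresholding scheme.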