How to get the Geweke diagnostic from a Bayesian regression result?
When I run pm.summary(linear_trace) I get:
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
Beta0 19.644 0.151 19.494 19.795 0.102 0.085 2.0 2.0 2.0 12.0 3.04
Betaa0 0.184 0.303 0.118 0.486 0.205 0.171 2.0 2.0 2.0 12.0 3.04
Betaa1 0.269 0.255 0.523 0.014 0.173 0.144 2.0 2.0 2.0 12.0 3.04
Betaa2 0.124 0.692 0.814 0.566 0.469 0.391 2.0 2.0 2.0 12.0 3.04
Betaa3 0.825 0.169 0.994 0.656 0.115 0.096 2.0 2.0 2.0 12.0 3.04
Betaa4 0.668 0.150 0.519 0.818 0.102 0.085 2.0 2.0 2.0 12.0 3.04
Betaa5 0.667 0.056 0.612 0.723 0.038 0.031 2.0 2.0 2.0 12.0 3.04
Betaa6 0.690 0.138 0.552 0.828 0.094 0.078 2.0 2.0 2.0 12.0 3.04
Betaa7 0.408 0.173 0.581 0.235 0.117 0.098 2.0 2.0 2.0 12.0 3.04
Betaa8 0.430 0.374 0.058 0.803 0.253 0.211 2.0 2.0 2.0 12.0 3.04
Betaa9 0.131 0.017 0.113 0.148 0.012 0.010 2.0 2.0 2.0 12.0 3.04
Betaa10 0.289 0.271 0.018 0.559 0.184 0.153 2.0 2.0 2.0 12.0 3.04
Betaa11 0.848 0.100 0.749 0.947 0.068 0.056 2.0 2.0 2.0 12.0 3.04
Betaa12 0.531 0.054 0.477 0.585 0.037 0.031 2.0 2.0 2.0 12.0 3.04
Betaa13 0.035 0.249 0.283 0.213 0.169 0.141 2.0 2.0 2.0 12.0 3.04
Betaa14 0.304 0.261 0.044 0.565 0.177 0.147 2.0 2.0 2.0 12.0 3.04
Betaa15 0.087 0.056 0.031 0.142 0.038 0.031 2.0 2.0 2.0 12.0 3.04
sigma2 0.000 0.000 0.000 0.000 0.000 0.000 2.0 2.0 2.0 12.0 3.04
When I run pm.geweke(linear_trace) I get this error:

AttributeError: 'MultiTrace' object has no attribute 'mean'
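The error most likely happens because pm.geweke operates on a 1-D array of draws for a single variable, not on the whole MultiTrace, so something like pm.geweke(linear_trace['Beta0']) (one call per variable) should work. As a sketch of what the diagnostic computes, here is a minimal stand-alone version on simulated chains (the chains here are made up, not the trace from the question):

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke z-score: compares the mean of the first 10% of a chain
    to the mean of the last 50%; |z| well above 2 hints that the early
    part of the chain has a different mean than the end (non-convergence)."""
    a = chain[: int(first * len(chain))]
    b = chain[-int(last * len(chain)):]
    return (a.mean() - b.mean()) / np.sqrt(a.var() / len(a) + b.var() / len(b))

rng = np.random.default_rng(0)
good = rng.normal(size=2000)                      # stationary chain
bad = np.concatenate([rng.normal(5, 1, 200),      # chain that starts far away
                      rng.normal(0, 1, 1800)])
z_good, z_bad = geweke_z(good), geweke_z(bad)
```

Note also that the summary above already signals trouble: ess values around 2 and r_hat of 3.04 (far above the usual 1.01 threshold) mean the chains have not converged, so the Geweke scores are likely to be extreme whichever variable you pass in.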
See also questions close to this topic

Convert all rows of a pandas dataframe column to comma-separated values with each value in single quotes
I have a pandas dataframe similar to:
df = pd.DataFrame(['a', 'b', 'c', 'd'], columns=['Col'])
df
  Col
0   a
1   b
2   c
3   d
I am trying to convert all rows of this column to a comma separated string with each value in single quotes, like below:
'a', 'b', 'c', 'd'
I have tried several different combinations, but the following is the closest I got:
s = df['Col'].str.cat(sep="', '")
s
"a', 'b', 'c', 'd"
I think that the end result should be:
"'a', 'b', 'c', 'd'"
Any help is highly appreciated.
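For what it's worth, a sketch of one way to get there: str.cat only inserts the separator between values, so the outer quotes have to be added separately, either by quoting each value before joining or by wrapping the str.cat result.

```python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c', 'd'], columns=['Col'])

# Option 1: quote each value first, then join with ", "
s1 = ", ".join(f"'{v}'" for v in df['Col'])

# Option 2: keep str.cat and add the missing outer quotes around the result
s2 = "'" + df['Col'].str.cat(sep="', '") + "'"
print(s1)  # 'a', 'b', 'c', 'd'
```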

How to use .map on parts of dataframes
I am using .map to replace values that fall into an interval in a dataframe. I am not sure if the problem is the .map function itself or that I am assigning to a copy of the dataframe:
intervals = pd.DataFrame(columns=['Hole', 'From', 'To', 'Value'],
                         data=[['N', 0, 100, 'A'], ['N', 100, 200, 'B'], ['N', 200, 500, 'C'],
                               ['Q', 1, 101, 'A'], ['Q', 101, 201, 'B'], ['Q', 201, 501, 'C']])
print('intervals\n', intervals, '\n')

points = pd.DataFrame(columns=['Hole', 'Point', 'Value'],
                      data=[['N', 45, 'X'], ['N', 125, 'X'], ['N', 145, 'X'], ['N', 345, 'X'],
                            ['Q', 45, 'X'], ['Q', 125, 'X'], ['Q', 145, 'X'], ['Q', 345, 'X']])
print('points\n', points, '\n')

holelist = intervals['Hole'].unique()
print(holelist)

intervallist = []
for hole in holelist:
    holeintervaldf = intervals[intervals['Hole'] == hole]
    print('holeintervaldf\n', holeintervaldf)
    holeintervaldf = holeintervaldf.set_index(
        keys=pd.IntervalIndex.from_arrays(left=holeintervaldf['From'],
                                          right=holeintervaldf['To']))['Value']
    print('holeintervaldf with intervals as index\n', holeintervaldf)
    # this is where something is wrong
    points[points['Hole'] == hole]['Value'] = points[points['Hole'] == hole]['Point'].map(holeintervaldf)

for j in intervallist:
    print(j)
print(points)
Thanks a bunch!
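A guess at the bug, sketched on a cut-down version of the data above: `points[mask]['Value'] = ...` is chained indexing, so the assignment lands on a temporary copy and the original dataframe is untouched. Writing through `.loc` with the same mask modifies `points` in place; the `.map` with an IntervalIndex-indexed Series is fine.

```python
import pandas as pd

intervals = pd.DataFrame({'Hole': ['N', 'N', 'Q'], 'From': [0, 100, 1],
                          'To': [100, 200, 101], 'Value': ['A', 'B', 'A']})
points = pd.DataFrame({'Hole': ['N', 'N', 'Q'], 'Point': [45, 125, 45],
                       'Value': ['X', 'X', 'X']})

for hole in intervals['Hole'].unique():
    sub = intervals[intervals['Hole'] == hole]
    # Series indexed by intervals; .map looks up which interval contains a point
    lookup = sub.set_index(
        pd.IntervalIndex.from_arrays(sub['From'], sub['To']))['Value']
    mask = points['Hole'] == hole
    # .loc assigns into points itself, not into a copy
    points.loc[mask, 'Value'] = points.loc[mask, 'Point'].map(lookup)

print(points['Value'].tolist())  # ['A', 'B', 'A']
```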

How to run Selenium Python tests simultaneously on a single instance of the browser
Initially I was using a new instance of the browser for each test. But the Chrome processes created a lot of orphan processes, which piled up over time and hogged the server's CPU cycles and memory. To avoid this, I defined the driver as a global variable and passed it to the different tests. The problem with this code is that the tests work fine when run sequentially, but I am not able to run multiple tests simultaneously on a single instance of the browser.
driver = ""

def getChromeDriver():
    global driver
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--window-size=1200,1100")
    driver = webdriver.Chrome(chrome_options=options, executable_path='my/path')
tests:
def testGoogle():
    driver.get("http://www.google.com")

def testyahoo():
    driver.get("http://www.yahoo.com")
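One caveat: Selenium WebDriver instances are not thread-safe, so several tests driving one global `driver` concurrently will interleave commands. A common compromise is a small pool of reused drivers that tests check out and return, which keeps the process count bounded (no orphans) while still allowing parallelism. A runnable sketch of the pattern (the `make_driver` factory is a stand-in for `webdriver.Chrome(...)` from the question; plain objects are used here so the pattern runs without Selenium):

```python
import queue
import threading

class DriverPool:
    """Hand out a fixed set of reusable driver objects to concurrent tests."""
    def __init__(self, make_driver, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(make_driver())

    def run(self, test):
        d = self._q.get()          # block until a driver is free
        try:
            return test(d)
        finally:
            self._q.put(d)         # return the driver for reuse, never orphaned

visits = []
pool = DriverPool(lambda: object(), size=2)   # stand-in for webdriver.Chrome
threads = [threading.Thread(target=pool.run, args=(lambda d: visits.append(d),))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 4 tests ran, but at most 2 distinct drivers were ever alive
```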

Regression model for analyzing data in multiple countries and over several years (Simpson's paradox in multiple regression)?
Let's say I have this data for the years 1990-2020, for 20 different countries:
(the variables are just made up)
Dependent variable: Foreign direct investment inflow (FDII)
Independent variables: Poverty rate (PR), Government subsidies (GS), Tax rates (TS)
I had an exercise in class, and the method demonstrated there was just to fit a multiple regression model (i.e. in R, lm(FDII ~ PR + GS + TS)). However, it seems like this method would miss the relationships between the variables within each country. What if the relationship between the variables is negative/positive at the country level, but reverses when countries are pooled?
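That reversal is exactly Simpson's paradox, and the usual remedy is to add country fixed effects (country dummies, e.g. lm(FDII ~ PR + GS + TS + factor(country)) in R) or a mixed/multilevel model, so the slope is estimated within countries. A small simulation with made-up numbers shows how pooling can flip the sign:

```python
import numpy as np

# Three "countries" at different levels; within each, y falls with x
# (true within-country slope -1), but higher-level countries have both
# higher x and higher y, so the pooled slope comes out positive.
rng = np.random.default_rng(1)
slopes, xs, ys = [], [], []
for offset in [0, 5, 10]:                      # country-specific level
    x = offset + rng.uniform(0, 2, 50)
    y = 2 * offset - 1.0 * x + rng.normal(0, 0.1, 50)
    slopes.append(np.polyfit(x, y, 1)[0])      # within-country slope ≈ -1
    xs.append(x)
    ys.append(y)
pooled = np.polyfit(np.concatenate(xs), np.concatenate(ys), 1)[0]  # ≈ +1
```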

Why are polynomial multiple regression coefficients missing in poly?
I ran a polynomial multiple regression in R using poly, and in the output the coefficients of some combinations of the two variables (Unc and Window) seem to be missing, and I cannot get my head around it. The input data can be downloaded from here.
The code I used is the following:

per_lm = lm(formula = variable ~ polym(Unc, Window, degree = 4, raw = TRUE), data = mdata3)
and the output coefficients can be seen in the picture below.
You can see that combinations such as 2,3; 4,1; 4,2; 4,3; or 1,4 etc. are missing. Does anyone have any idea why this can happen? I will have to extract the equation of the surface determined by the coefficients to use it in a separate program written by a colleague.
If I treat the "missing" combinations as zero coefficients I cannot reproduce the fitted surface illustrated here, where Period = variable in the input data file attached above. I will have to use the equation of this surface in separate software that we are writing. Thank you for any ideas.
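One likely explanation: when lm prints NA for a coefficient, that term was dropped because it is an exact linear combination of other columns (an aliased, rank-deficient design), not because its coefficient is zero, which is why substituting zeros does not reproduce the surface. With raw = TRUE this happens easily when a predictor takes only a few distinct values: powers above k-1 of a variable with k distinct values add no new information. A small numeric illustration (data made up):

```python
import numpy as np

# A predictor with only 3 distinct values repeated 10 times each.
x = np.array([1.0, 2.0, 3.0] * 10)

# Raw polynomial design matrix with columns 1, x, x^2, x^3, x^4.
X = np.vander(x, 5, increasing=True)

# Only 3 of the 5 columns are linearly independent, so a regression on
# all of them is rank-deficient and two coefficients get dropped (NA in R).
rank = np.linalg.matrix_rank(X)
```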

How to construct a prediction interval for one value in Python?
In a linear regression task from a given data set, I want to predict the mpg value based on the given horsepower = 98, which I did as follows:
pred_arr = np.array([98]).reshape(1, 1)
k = mdl.predict(pred_arr)
print('mpg associated with a horsepower of 98 is', *k)
And then for the same horsepower value, I want to compose a prediction interval.
xxx = []
sum_errs = 0
for i in range(len(x)):
    if float(x[i]) == 98.0:
        xxx.append(y[i])
for i in xxx:
    sum_errs += (i - k)**2
stdev = sqrt(1/(len(y) - 2) * sum_errs)
interval = 1.96 * stdev
lower, upper = k - interval, k + interval
The interval this code gives is not the same as the interval I got on R on the same data set:
predict(fit, data.frame(horsepower = 98), interval = "prediction")
What am I doing wrong?
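Part of the discrepancy is probably the formula itself: R's predict(fit, ..., interval = "prediction") uses all residuals to estimate the residual standard error, adds the 1/n and leverage terms to the predictive standard error, and multiplies by a t quantile rather than 1.96. A sketch with made-up data (x0 = 98 as in the question):

```python
import numpy as np
from scipy import stats

# Simulated regression data standing in for the mpg/horsepower set.
rng = np.random.default_rng(0)
x = rng.uniform(50, 200, 100)
y = 40 - 0.15 * x + rng.normal(0, 2, 100)

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))       # residual standard error, df = n-2

x0 = 98.0
# Predictive SE = s * sqrt(1 + 1/n + leverage); the "1" is the new
# observation's own noise, the other terms the uncertainty in the fit.
se_pred = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / np.sum((x - x.mean())**2))
t = stats.t.ppf(0.975, n - 2)                 # t quantile, not 1.96
fit = b0 + b1 * x0
lower, upper = fit - t * se_pred, fit + t * se_pred
```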

CUDA convolution for propagating the state of a discrete Bayesian filter
I am designing a discrete Bayesian filter which will predict and update a 2D grid. The grid is represented by a set of probabilities for each x and y coordinate. Hence each cell contains the probability of the state being a certain x and y value (the distribution over the entire grid is normal). I wish to propagate the distribution by performing a convolution operation with a kernel that will shift the distribution and spread it out.
This example shows the case for a 1D discrete Bayesian filter (https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/02-Discrete-Bayes.ipynb). I am essentially trying to perform the prediction update step (as described under "Generalizing with Convolution") with a 2D grid. But my kernel convolution does not produce the correct result.
I create an initial distribution over my grid centered at (0,0). I then convolve the grid with my kernel, which is supposed to increase x by 0.01 and decrease y by 0.01. With each convolution I expect the peak of the distribution to shift by that amount and the distribution to spread out a little bit.
Peak of grid after 1st convolution: x: 0.01, y: -0.01
Peak of grid after 2nd convolution: x: 0.02, y: -0.02
I am not sure if I am understanding this entire thing wrong or if the convolution operation is just broken. Any help would be greatly appreciated. Thank you.
%%cu
#include <iostream>
#include <cstdio>
#include <math.h>
#include <utility>

// image parameters
#define X_SIZE 0.01
#define Y_SIZE 0.01
#define X_MAX 1.0
#define Y_MAX 1.0
#define X_MIN -1.0
#define Y_MIN -1.0
#define MAP_SIZE_X 200
#define MAP_SIZE_Y 200
#define MAP_LENGTH MAP_SIZE_X*MAP_SIZE_Y

__device__ float *device_filter;
__device__ int *device_filter_radius_x;
__device__ int *device_filter_radius_y;
__device__ float *d_image;
__device__ float *d_image_old; // Use a double buffer solution for convolution

__global__ void GPUconvolution(float *image, float *image_old, float *filter,
                               int *filter_radius_x, int *filter_radius_y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < MAP_LENGTH; i += stride) {
        float sum = 0;
        int x = i % MAP_SIZE_X;
        int y = i / MAP_SIZE_X;
        for (int dy = -(*filter_radius_y); dy <= (*filter_radius_y); dy++) {
            if (y + dy < 0 || y + dy >= MAP_SIZE_Y) { // Zero padding
                continue;
            }
            for (int dx = -(*filter_radius_x); dx <= *filter_radius_x; dx++) {
                if (x + dx < 0 || x + dx >= MAP_SIZE_X) { // Zero padding
                    continue;
                }
                int read_index = MAP_SIZE_X * (y + dy) + x + dx;
                int kernel_index = ((*filter_radius_y) + dy) * (*filter_radius_x) + (dx + (*filter_radius_x));
                sum += filter[kernel_index] * image_old[read_index]; // convolve the filter with the old image
            }
        }
        image[i] = sum; // write to the new image
    }
}

__global__ void swapImages(void)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    if (index == 0) {
        float *temp = d_image;
        d_image = d_image_old;
        d_image_old = temp;
    }
}

void generateKernel(float diff_x, float diff_y)
{
    int filter_radius_x = 4;
    int filter_radius_y = 4;
    float sigma_x = 0.01; // noise in x
    float sigma_y = 0.01; // noise in y
    float *filter = new float[(2*filter_radius_x+1)*(2*filter_radius_y+1)];
    // total probability to normalize
    float total_gauss_x = 0;
    float total_gauss_y = 0;
    // clear the previous filter
    memset(reinterpret_cast<void *>(filter), 0, sizeof(float)*filter_radius_x*filter_radius_y);
    // compute sum to normalize filter
    for (int dy = -filter_radius_y; dy <= filter_radius_y; dy++) {
        float fdy = (dy * Y_SIZE) + diff_y;
        for (int dx = -filter_radius_x; dx <= filter_radius_x; dx++) {
            float fdx = (dx * X_SIZE) + diff_x;
            // Motion is gaussian
            float gauss_x = X_SIZE / (sigma_x * sqrt(2 * M_PI)) * expf(-0.5 * fdx * fdx / (sigma_x * sigma_x));
            float gauss_y = Y_SIZE / (sigma_y * sqrt(2 * M_PI)) * expf(-0.5 * fdy * fdy / (sigma_y * sigma_y));
            total_gauss_x += gauss_x;
            total_gauss_y += gauss_y;
        }
    }
    // compute the motion filter
    for (int dy = -filter_radius_y; dy <= filter_radius_y; dy++) {
        float fdy = (dy * Y_SIZE) + diff_y;
        for (int dx = -filter_radius_x; dx <= filter_radius_x; dx++) {
            float fdx = (dx * X_SIZE) + diff_x;
            // Motion is gaussian
            float gauss_x = X_SIZE / (sigma_x * sqrt(2 * M_PI)) * expf(-0.5 * fdx * fdx / (sigma_x * sigma_x));
            float gauss_y = Y_SIZE / (sigma_y * sqrt(2 * M_PI)) * expf(-0.5 * fdy * fdy / (sigma_y * sigma_y));
            float total_gauss_norm = (gauss_x/total_gauss_x) * (gauss_y/total_gauss_y);
            int kernel_index = ((filter_radius_y + dy) * filter_radius_x) + (filter_radius_x + dx);
            filter[kernel_index] = total_gauss_norm;
        }
    }
    // allocate device memory
    cudaMalloc((void**)&device_filter_radius_x, sizeof(int));
    cudaMalloc((void**)&device_filter_radius_y, sizeof(int));
    cudaMalloc((void**)&device_filter, sizeof(float)*filter_radius_x*filter_radius_y);
    // copy the filter to device memory
    cudaMemcpy(device_filter_radius_x, static_cast<const void *>(&filter_radius_x), sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(device_filter_radius_y, static_cast<const void *>(&filter_radius_y), sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(device_filter, static_cast<const void *>(filter), sizeof(float) * filter_radius_x * filter_radius_y, cudaMemcpyHostToDevice);
}

std::pair<int, int> findMax(float *image)
{
    float max = -1.0/0.0;
    int max_x = 0;
    int max_y = 0;
    for (int iy = 0; iy < MAP_SIZE_Y; iy++) {
        for (int ix = 0; ix < MAP_SIZE_X; ix++) {
            int idx = MAP_SIZE_X * iy + ix;
            if (max < image[idx]) {
                max = image[idx];
                max_x = ix;
                max_y = iy;
            }
        }
    }
    return std::make_pair(max_x, max_y);
}

void CPUConvolution(float *image)
{
    const int MAX_THREADS_PER_BLOCK = 512;
    cudaMalloc((void**)&d_image, sizeof(float) * MAP_LENGTH);
    cudaMalloc((void**)&d_image_old, sizeof(float) * MAP_LENGTH);
    cudaMemcpy(d_image, image, sizeof(float) * MAP_LENGTH, cudaMemcpyHostToDevice);
    int blocks_per_grid = std::ceil(MAP_LENGTH / MAX_THREADS_PER_BLOCK) + 1;
    // shift the distribution in x by 0.01
    float h_diff_x = 0.01;
    // shift the distribution in y by -0.01
    float h_diff_y = -0.01;
    // Host image copy
    float *h_image_out = new float[MAP_LENGTH];
    // Propagate 100 times
    for (int i = 0; i < 100; i++) {
        // compute the motion kernel
        generateKernel(h_diff_x, h_diff_y);
        // swap the images
        swapImages<<<1, 1, 0>>>();
        // Propagate the state using motion kernel
        GPUconvolution<<<blocks_per_grid, MAX_THREADS_PER_BLOCK, 0>>>(d_image, d_image_old, device_filter, device_filter_radius_x, device_filter_radius_y);
        cudaDeviceSynchronize();
        // copy the image back to host
        memset(reinterpret_cast<void *>(h_image_out), 0, sizeof(h_image_out));
        cudaMemcpy(h_image_out, d_image, sizeof(float)*MAP_LENGTH, cudaMemcpyDeviceToHost);
        // As the distribution of the image is propagated the max of the image should shift
        // proportionally (increase by 0.01 in x and decrease by 0.01 in y)
        std::pair<int, int> max = findMax(h_image_out);
        printf("findMLE > x: %f, y: %f\n", X_MIN + (max.first * X_SIZE), Y_MIN + (Y_SIZE * max.second));
    }
}

int main()
{
    printf("Running\n");
    int ux = MAP_SIZE_X / 2;
    int uy = MAP_SIZE_X / 2;
    float *image_in = new float[MAP_LENGTH];
    // Create a gaussian distribution at (0,0)
    for (int iy = 0; iy < MAP_SIZE_Y; iy++) {
        for (int ix = 0; ix < MAP_SIZE_X; ix++) {
            int idx = MAP_SIZE_Y * iy + ix;
            image_in[idx] = 1/(2*M_PI) * expf(-0.5 * sqrtf(powf(ix - ux, 2) + powf(iy - uy, 2)));
        }
    }
    std::pair<int, int> max = findMax(image_in); // should be (0,0)
    printf("findMLE > x: %f, y: %f\n", X_MIN + (max.first * X_SIZE), Y_MIN + (Y_SIZE * max.second));
    CPUConvolution(image_in);
    std::cout << "All done" << std::endl;
}
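A small NumPy sanity check (sizes made up) may help separate the kernel math from the CUDA indexing. One subtlety it exposes: a loop of the form `sum += filter[...dy...dx...] * image_old[y+dy, x+dx]` computes cross-correlation, not convolution (no kernel flip), so a kernel whose mass sits at offset (dy, dx) shifts the image peak by (-dy, -dx), the opposite direction:

```python
import numpy as np

def correlate2d(grid, kernel):
    """Zero-padded cross-correlation, mirroring the structure of the CUDA loop."""
    r = kernel.shape[0] // 2
    padded = np.pad(grid, r)               # zero padding at the borders
    out = np.zeros_like(grid)
    for y in range(grid.shape[0]):
        for x in range(grid.shape[1]):
            window = padded[y:y + 2*r + 1, x:x + 2*r + 1]
            out[y, x] = np.sum(window * kernel)
    return out

grid = np.zeros((21, 21))
grid[10, 10] = 1.0                         # peak at the centre
kernel = np.zeros((5, 5))
kernel[2 - 1, 2 + 1] = 1.0                 # kernel mass at offset dy=-1, dx=+1
stepped = correlate2d(grid, kernel)
peak = np.unravel_index(np.argmax(stepped), stepped.shape)
# under correlation the peak moves the opposite way: to row 11, column 9
```

If the CUDA version does not match this reference on a tiny grid, the indexing (e.g. the kernel row stride) is the suspect rather than the filter generation.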

Why does the bvar command produce this error?
I ran the following Bayesian VAR code but got an error:

x <- bvar(Data, lags = 1, n_draw = 1000L, n_burn = 200L, verbose = FALSE)
Error in bvar(Data, lags = 1, n_draw = 1000L, n_burn = 200L, verbose = FALSE) :
  Problem with the data. Make sure it is numeric, without any NAs.
The data are the following (year, quantity, price):
Many Thanks in advance, N.
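The error message is literal: bvar requires an all-numeric dataset with no missing values, so the thing to check is whether any column is stored as text (e.g. comma decimal separators from an import) or contains NAs. In R that check is sapply(Data, is.numeric) and anyNA(Data); the same idea, sketched in Python on made-up data with the question's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with the question's columns: quantity has a
# missing value, price was imported as text with comma decimals.
Data = pd.DataFrame({'year': [2000, 2001, 2002],
                     'quantity': [10.0, np.nan, 12.0],
                     'price': ['1,5', '1,6', '1,7']})

# Coerce everything to numeric; anything non-numeric (or missing) becomes NaN,
# so columns with any NaN are exactly the ones bvar would reject.
numeric = Data.apply(pd.to_numeric, errors='coerce')
bad_cols = numeric.columns[numeric.isna().any()].tolist()
```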

How to find the reason why an email is marked 3.5 BAYES_99 BODY
This has been discussed before, but somehow there is no good conclusion or solution anywhere. I am running a mail server for several domains, and in my tests I noticed that http://isnotspam.com gives the test messages a 3.5 BAYES_99 BODY score. The report shows "SpamAssassin Check : ham (non-spam)"; however, it really bothers me that I get:
 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
Obviously I thought it might be the HTML signature, or the fact that the email was sent as HTML. So I tried a version with only a JPG as the signature, and then finally the version below, which is pure text. Still the same result. I checked whether the IP and domain are blacklisted anywhere. The mail is signed with SPF, DKIM and DMARC, and listed at dnswl.org as trusted. I tried this for several domains and all show this SpamAssassin score. I am frustrated about this; even though the emails would most likely be delivered, since the overall score is 2.7, it is still not nice to be classified as 99% probable spam.
I would appreciate it if anyone has an idea on what else to check. The sending mail server is CentOS 6.9, Postfix, amavisd-new, ClamAV, Dovecot. I am not sure what else this could be. See the link below to the report; it is a text-only email. Thank you to everyone who can assist me in getting this resolved.
http://isnotspam.com/newlatestreport.php?email=ins10xbd4u5%40isnotspam.com
Best wishes, Thom

Prediction with Bayesian logistic regression using pymc3
I am trying to perform Bayesian logistic regression using pymc3, but I am facing an issue when using the model for prediction.
Data:
My dataset is of the housing loan default data, with sample data as follows:
BAD LOAN MORTDUE VALUE  REASON  JOB    YOJ DEROG DELINQ CLAGE      NINQ CLNO DEBTINC
1   1700 30548   40320  HomeImp Other  9   0     0      101.466002 1    8    37.113614
1   1800 28502   43034  HomeImp Other  11  0     0      88.766030  0    8    36.884894
0   2300 102370  120953 HomeImp Office 2   0     0      90.992533  0    13   31.588503
Problem:
I want to make predictions on the test dataset; one way to do this is the shared-variable approach:

X_shared = theano.shared(X_train)
with pm.Model() as logistic_model_pred:
    pm.glm.GLM(x=X_shared, y=y_train, labels=labels,
               family=pm.glm.families.Binomial())
X_shared.set_value(X_test)
ppc = pm.sample_ppc(pred_trace, model=logistic_model_pred, samples=100)
However, using the above code (theano shared variable) is causing the following issue:
Error message:
AsTensorError: ('Variable type field must be a TensorType.', <Generic>, <theano.gof.type.Generic object at 0x00000216ABB16730>)
Possible Solution:
Using the following code does avoid the error, but I do not know how to use the resulting model on test data.

with pm.Model() as logistic_model_pred:
    pm.glm.GLM.from_formula('BAD ~ DELINQ + DEROG + DEBTINC + NINQ + CLNO + VALUE + MORTDUE + YOJ + LOAN + CLAGE + JOB',
                            data=pd.concat([y_train.reset_index(drop=True), X_train], axis=1),
                            family=pm.glm.families.Binomial())
    pred_trace = pm.sample(tune=1500, draws=1000, chains=4, cores=1, init='adapt_diag')
Full code:
%matplotlib inline
from pathlib import Path
import pickle
from collections import OrderedDict
import pandas as pd
import numpy as np
from scipy import stats
import multiprocessing
import arviz as az
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import (roc_curve, roc_auc_score, confusion_matrix, accuracy_score,
                             f1_score, precision_recall_curve, balanced_accuracy_score)
from mlxtend.plotting import plot_confusion_matrix
import theano
import pymc3 as pm
from pymc3.variational.callbacks import CheckParametersConvergence
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
from IPython.display import HTML
import sys
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

# initialise data of lists
data = {'BAD': [1,1,0,1,0,0,0,1,1,0,0,1,0,0,1,0,1],
        'LOAN': [1700,1800,2300,2400,2400,2900,2900,2900,2900,
                 3000,3200,3300,3600,3600,3700,3800,3900],
        'MORTDUE': [30548,28502,102370,34863,98449,103949,104373,7750,61962,104570,
                    74864,130518,100693,52337,17857,51180,29896],
        'VALUE': [40320,43034,120953,47471,117195,112505,120702,67996,70915,121729,
                  87266,164317,114743,63989,21144,63459,45960],
        'REASON': ['HomeImp','HomeImp','HomeImp','HomeImp','HomeImp','HomeImp','HomeImp','HomeImp',
                   'DebtCon','HomeImp','HomeImp','DebtCon','HomeImp','HomeImp','HomeImp','HomeImp','HomeImp'],
        'JOB': ['Other','Other','Office','Mgr','Office','Office','Office','Other','Mgr','Office','ProfExe',
                'Other','Office','Office','Other','Office','Other'],
        'YOJ': [9,11,2,12,4,1,2,16,2,2,7,9,6,20,5,20,11],
        'DEROG': [0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0],
        'DELINQ': [0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0],
        'CLAGE': [101.4660019,88.76602988,90.99253347,70.49108003,93.81177486,96.10232967,101.5402975,
                  122.2046628,282.8016592,85.8843719,250.6312692,192.289149,88.47045214,204.2724988,
                  129.7173231,203.7515336,146.1232422],
        'NINQ': [1,0,0,1,0,0,0,2,3,0,0,0,0,0,1,0,0],
        'CLNO': [8,8,13,21,13,13,13,8,37,14,12,33,14,20,9,20,14],
        'DEBTINC': [37.11361356,36.88489409,31.58850318,38.26360073,29.68182705,30.05113629,29.91585903,
                    36.211348,49.20639579,32.05978327,42.90999735,35.73055919,29.39354338,20.47091551,
                    26.63434752,20.06704205,24.47888119]}

# Create DataFrame
data = pd.DataFrame(data)

# datatype defining
data[['BAD', 'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ',
      'CLAGE', 'NINQ', 'CLNO', 'DEBTINC']] = data[['BAD', 'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ',
                                                   'CLAGE', 'NINQ', 'CLNO', 'DEBTINC']].apply(pd.to_numeric)
data[['REASON', 'JOB']] = data[['REASON', 'JOB']].apply(lambda x: x.astype('category'))
print(data.dtypes)
data.dropna(axis=0, how='any', inplace=True)

# test train split
X = data.drop('BAD', axis=1)
y = data.BAD
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12345)
labels = X_train.columns

# model training (error cause)
X_shared = theano.shared(X_train)
with pm.Model() as logistic_model_pred:
    pm.glm.GLM(x=X_shared, y=y_train, labels=labels,
               family=pm.glm.families.Binomial())

# Prediction on test data
X_shared = theano.shared(X_test)
ppc = pm.sample_ppc(pred_trace, model=logistic_model_pred, samples=100)

# AUC
np.mean(ppc['y'], axis=0).shape
y_score = np.mean(ppc['y'], axis=0)
roc_auc_score(y_score=np.mean(ppc['y'], axis=0), y_true=y_test)
pred_scores = dict(y_true=y_test, y_score=y_score)
cols = ['False Positive Rate', 'True Positive Rate', 'threshold']
roc = pd.DataFrame(dict(zip(cols, roc_curve(**pred_scores))))
precision, recall, ts = precision_recall_curve(y_true=y_test, probas_pred=y_score)
pr_curve = pd.DataFrame({'Precision': precision, 'Recall': recall})
f1 = pd.Series({t: f1_score(y_true=y_test, y_pred=y_score > t) for t in ts})
best_threshold = f1.idxmax()

# Alternative solution
with pm.Model() as logistic_model_pred:
    pm.glm.GLM.from_formula('BAD ~ DELINQ + DEROG + DEBTINC + NINQ + CLNO + VALUE + MORTDUE + YOJ + LOAN + CLAGE + JOB',
                            data=pd.concat([y_train.reset_index(drop=True), X_train], axis=1),
                            family=pm.glm.families.Binomial())
    pred_trace = pm.sample(tune=1500, draws=1000, chains=4, cores=1, init='adapt_diag')
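A likely cause of the AsTensorError: theano.shared cannot wrap a pandas DataFrame (it is not a TensorType), and pm.glm.GLM with the array interface also needs a purely numeric matrix, which X_train is not while it still contains the categorical REASON and JOB columns. A sketch of the preparation step (tiny made-up rows using the question's column names):

```python
import numpy as np
import pandas as pd

X_train = pd.DataFrame({'LOAN': [1700, 1800, 2300],
                        'REASON': ['HomeImp', 'HomeImp', 'HomeImp'],
                        'JOB': ['Other', 'Other', 'Office']})

# One-hot encode the categoricals, then convert to a float ndarray;
# this array (not the DataFrame) is what theano.shared should wrap.
X_num = pd.get_dummies(X_train, columns=['REASON', 'JOB'])
X_arr = X_num.to_numpy(dtype='float64')
```

With that in place, the shared-variable pattern from the question should work: X_shared = theano.shared(X_arr), build the GLM on X_shared and sample inside the same model context, then X_shared.set_value(...) with the identically encoded test matrix before drawing posterior predictive samples (pm.sample_ppc in older pymc3 releases, pm.sample_posterior_predictive in later ones). The test set must be encoded with the same dummy columns in the same order.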

Bayes Factors from posterior likelihoods in Python?
After an A/B test, I am looking to interpret the results with a Bayesian approach.
Given the following code, or something similar, how would I be able to produce the Bayes Factor, i.e. the ratio of the marginal likelihoods?
# A
a_total = 28593
a_success = 3891
a_failure = a_total - a_success

# B
b_total = 28137
b_success = 3775
b_failure = b_total - b_success

# generate arrays
obs_a = np.repeat([1, 0], [a_success, a_failure])
obs_b = np.repeat([1, 0], [b_success, b_failure])

# pymc3 model
with pm.Model() as model:
    p_A = pm.Beta("p_A", 1, 1)
    p_B = pm.Beta("p_B", 1, 1)
    delta = pm.Deterministic("delta", p_A - p_B)
    obs_A = pm.Bernoulli("obs_A", p_A, observed=obs_a)
    obs_B = pm.Bernoulli("obs_B", p_B, observed=obs_b)
    step = pm.NUTS()
    trace = pm.sample(10000, step=step)

b_lift = (trace['p_B'].mean() - trace['p_A'].mean()) / trace['p_A'].mean() * 100
b_prob = np.mean(trace["delta"] > 0)
a_lift = (trace['p_A'].mean() - trace['p_B'].mean()) / trace['p_B'].mean() * 100
a_prob = np.mean(trace["delta"] < 0)

print(f'There is {b_prob} probability B outperforms A by a magnitude of {round(b_lift, 2)}%')
print(f'There is {a_prob} probability A outperforms B by a magnitude of {round(a_lift, 2)}%')

output:
There is 0.666 probability B outperforms A by a magnitude of 1.29%
There is 0.334 probability A outperforms B by a magnitude of 1.28%
If I'm not mistaken, the Bayes Factor would essentially be a measure of the above evidence?
I would greatly appreciate your help!
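Not quite: the printed probabilities are posterior probabilities of a sign, while a Bayes Factor compares the marginal likelihoods of two models, e.g. "A and B have different rates" (H1) versus "A and B share one rate" (H0). For this Beta-Bernoulli setup no MCMC is needed, because with Beta(1, 1) priors the marginal likelihood of binomial data has a closed form in terms of the Beta function. A sketch using the counts from the question:

```python
import numpy as np
from scipy.special import betaln

def log_marginal(successes, failures, a=1, b=1):
    # log ∫ p^s (1-p)^f Beta(p; a, b) dp = betaln(a+s, b+f) - betaln(a, b)
    return betaln(a + successes, b + failures) - betaln(a, b)

a_s, a_f = 3891, 28593 - 3891
b_s, b_f = 3775, 28137 - 3775

# H1: independent rates for A and B; H0: one shared rate for the pooled data
log_bf = (log_marginal(a_s, a_f) + log_marginal(b_s, b_f)
          - log_marginal(a_s + b_s, a_f + b_f))
bf = np.exp(log_bf)   # BF10: evidence for "different rates" over "same rate"
```

With conversion rates this close (about 13.6% vs 13.4%), the Bayes Factor comes out below 1, i.e. the data favor the shared-rate model, which is consistent with the inconclusive 0.666 posterior probability above.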