bnlearn::bn.fit difference and calculation of methods "mle" and "bayes"
I am trying to understand the difference between the two methods "bayes" and "mle" in the bn.fit function of the bnlearn package.
I know about the debate between the frequentist and the Bayesian approaches to understanding probabilities. On a theoretical level I suppose the maximum likelihood estimate ("mle") is a simple frequentist approach that sets the relative frequencies as the probabilities. But what calculations are done to get the "bayes" estimate? I have already checked the bnlearn documentation, the description of the bn.fit function and some application examples, but nowhere is there a real description of what is happening.
I also tried to understand the function in R by first checking out bnlearn::bn.fit, which led me to bnlearn:::bn.fit.backend and then to bnlearn:::smartSapply, but there I got stuck.
Some help would be really appreciated, as I use the package for academic work and therefore should be able to explain what happens.
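To make my current guess concrete: I suspect "bayes" returns the posterior expectation under a uniform Dirichlet prior whose total weight is the imaginary sample size (the iss argument), so each cell of a conditional probability table gets a share of iss added to its observed count. The toy calculation below is mine, in Python; the formula is my assumption to be confirmed, not quoted from the bnlearn sources.

```python
# Toy illustration (my assumption, not bnlearn's actual source): compare the
# frequentist MLE of a discrete distribution with a Dirichlet-posterior-mean
# estimate that spreads an imaginary sample size (iss) uniformly over cells.

def mle(counts):
    """Relative frequencies: n_k / n."""
    n = sum(counts)
    return [c / n for c in counts]

def bayes(counts, iss=1.0):
    """Posterior mean under a uniform Dirichlet prior with total weight iss."""
    alpha = iss / len(counts)          # prior weight per cell
    n = sum(counts)
    return [(c + alpha) / (n + iss) for c in counts]

counts = [8, 2, 0]                     # observed counts for a 3-state node
print(mle(counts))                     # [0.8, 0.2, 0.0] -- zero for the unseen state
print(bayes(counts, iss=1.0))          # all entries strictly positive
```

If this is right, the practical difference is that the Bayesian estimate never assigns probability zero to an unseen configuration, which matters for sparse tables.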
See also questions close to this topic

DiagrammeR export_graph Invalid asm.js
I'm having a problem exporting graphs in R to PDFs using DiagrammeR's export_graph function in RStudio. An example to reproduce the problem is below. The PDFs are produced inconsistently, sometimes not at all.
I'm using RStudio Version 1.1.463 and R 3.5.2 on Windows 10. The error message I get on calling export_graph in the code snippet below is:
"<unknown>":1919791: Invalid asm.js: Function definition doesn't match use
library(data.tree)
library(yaml)
library(DiagrammeR)
library(DiagrammeRsvg)

fileName <- system.file("extdata", "jennylind.yaml", package = "data.tree")
cat(readChar(fileName, file.info(fileName)$size))
lol <- yaml.load_file(fileName)
jl <- as.Node(lol)
pic <- ToDiagrammeRGraph(jl)
render_graph(pic)
export_graph(pic, "C:/Tmp/plot.pdf", file_type = "pdf")

Plotting in ggplot after converting to data.frame with a single column?
I'm trying to convert some simple data into a form I thought ggplot2 would accept.
I snag some simple stock data and now I just want to plot it; later I want to add, say, a 10-day moving average or a 30-day historical volatility series to go with it, which is why I'm using ggplot.
I thought it would work something like this line of pseudocode:
ggplot(maindata) + geom_line(moving average) + geom_line(30dayvol)
library(quantmod)
library(ggplot2)

start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(y = 'AMD.Close'))
But I can't even get this to work. The problem, of course, appears to be that I don't have an x-axis. How do I tell ggplot to use the index column as an x-axis? Can this not work? Do I have to create a new "date" or "day" column?
This line for instance using the Regular R plot function works just fine
plot.ts(closing_prices)
This works without requiring me to supply an explicit x-axis and produces a graph; however, I haven't figured out how to layer other lines onto the same graph. Evidently ggplot is better at that, so I tried it.
Any advice?

Using scale_color_gradient2 with a variable of class Date
I'm trying to color by date with ggplot2, but when I try to customize the color using scale_color_gradient2, I get an error saying
Error in as.Date.numeric(value) : 'origin' must be supplied
I can't seem to figure out how to pass the origin to scale_color_gradient2. I've provided an example below. Any advice?
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
day <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 100)
myData <- data.frame(x1, x2, day)

# this plot works as expected
ggplot(myData, aes(x = x1, y = x2, color = day)) +
  geom_point()

# scale_color_gradient2() asks for an origin, but I can't figure out how to supply one
ggplot(myData, aes(x = x1, y = x2, color = day)) +
  geom_point() +
  scale_color_gradient2()

CRAN package with Bayesian logistic regression via PolyaGamma scheme
I maintain a package that uses BayesLogit for Bayesian logistic regression via the Polya-Gamma latent-variable technique, returning samples from a Markov chain Monte Carlo (MCMC) run. BayesLogit is no longer on CRAN, and while I can install a previous version with
install_version("BayesLogit", version = "0.6")
this hack will prevent the submission of my own package to CRAN. The source code was last updated a year ago, so I don't think it will return to CRAN.
I found another package that does the same thing with a similar syntax, but it is not on CRAN either and installs with
devtools::install_github("kasparmartens/PolyaGamma")
Does any CRAN package implement Bayesian logistic regression with a Polya-Gamma scheme and return the MCMC samples?

Causal Impact package: Calculate posterior tail-area probability from model estimates
I am currently using the CausalImpact package for some research, and in this context I need to know, and be able to explain, how the posterior tail-area probability is calculated, so that I can reproduce the value for validation purposes. Does anyone know how that value can be reproduced given the data and the estimation series provided by the model? Thanks in advance!
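For orientation, a one-sided posterior tail-area probability of this kind is usually computed as the share of posterior (counterfactual) samples at least as extreme as the observed value. A minimal sketch in Python with made-up numbers; whether CausalImpact applies exactly this estimator, including the +1 smoothing terms, is an assumption that would need checking against its source:

```python
def posterior_tail_area(samples, observed):
    """One-sided posterior tail-area probability: the share of posterior
    (counterfactual) samples at least as extreme as the observed value.
    The +1 terms are the common Monte Carlo smoothing; whether CausalImpact
    uses them is an assumption to verify against its source code."""
    n = len(samples)
    exceed = sum(1 for s in samples if s >= observed)
    return (1 + exceed) / (1 + n)

# Made-up posterior samples of a counterfactual quantity.
samples = [10.2, 11.5, 9.8, 10.9, 10.4, 11.1, 9.5, 10.7]
print(posterior_tail_area(samples, 12.0))   # no sample reaches 12 -> 1/9
```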

Bayes factor for an intercept-only mixed logistic regression model vs. a null model
I would like to compute a Bayes factor comparing an intercept-only mixed logistic regression model against a null model.
This is for a study where each participant undergoes multiple trials with a success or fail outcome, and we would like to assess whether it is more likely that the data came from a world where M0 is true (chance for success is p = 0.5 in each trial), vs. a world where M1 is true (people have a different success rate than p = 0.5, and there may be some personal differences in success rate, i.e. random intercept).
I tried to do this using the brms package in R. According to these guides, the following code should compute the Bayes factor: https://rpubs.com/lindeloev/bayes_factors and https://mvuorre.github.io/post/2017/03/21/bayesfactorswithbrms/
But when I use the hypothesis() function approach, the Evid.Ratio is NA, and when I try to fit a null model with intercept = 0 to use the bayes_factor() function, the null model does not run and returns an error message:
SAMPLING FOR MODEL '7fd1e8370afde87f07b7e1f715ec714c' NOW (CHAIN 1).
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Must use algorithm=\"Fixed_param\" for model that has no parameters."
[1] "error occurred during calling the sampler; sampling not done"
Here is my code:
library(brms)

dat_1 = read.csv("https://raw.githubusercontent.com/kekecsz/Bayesfactordifferentdistributions/master/data_all_%200.51_14000.csv")

fit2 <- brm(data_all_H1_pre2 ~ 0 + intercept + (0 + intercept | participant_ID),
            data = dat_1,
            family = bernoulli,
            control = list(adapt_delta = 0.90))

hypothesis(fit2, hypothesis = 'intercept = 0')

fit2_null <- brm(data_all_H1_pre2 ~ 0,
                 data = dat_1,
                 family = bernoulli,
                 control = list(adapt_delta = 0.90))

bayes_factor(fit2, fit2_null) # can't run this, because fit2_null above returns an error
In the above example I use simulated data where the true success rate is p = 0.51 instead of 0.5 across all participants, so the expected result would be a Bayes factor favoring M1.
I don't necessarily want to use the brms package; it is just the only one I know of that can handle mixed models. I would be fine with a solution using JAGS or rstan as well.
How can I compute the Bayes factor in this case?
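Stripping the problem down to its fixed-effect core may help clarify the quantity being estimated: with the random intercept ignored, the Bayes factor for M0 (p = 0.5) versus M1 (p unknown, uniform prior) has a closed form for binomial data. A sketch in Python; this deliberately drops the mixed-model structure, so it only illustrates the target quantity, not a substitute for the brms computation:

```python
from math import comb

def bf01_binomial(k, n):
    """Bayes factor for M0: p = 0.5 versus M1: p ~ Uniform(0, 1),
    given k successes in n Bernoulli trials.
    The marginal likelihood under M1 with a Beta(1, 1) prior is
    C(n, k) * B(k + 1, n - k + 1) = 1 / (n + 1)."""
    m0 = comb(n, k) * 0.5 ** n   # likelihood under p = 0.5
    m1 = 1.0 / (n + 1)           # marginal likelihood under the uniform prior
    return m0 / m1

# Data close to chance mildly favor M0; data far from chance favor M1 strongly.
print(bf01_binomial(52, 100))   # > 1: supports M0
print(bf01_binomial(80, 100))   # << 1: supports M1
```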

Causality map changing when the order of the variables is changed
I am using the bnlearn and pcalg R packages to obtain a causality map from datasets. There is an order-independent algorithm which claims to be independent of the order in which the variables are given as input. Yet when I change the order of the variables, the directions of the arrowheads change. Below is the code I am using:

data("gmG")
set.seed(101)
cols = sample(ncol(gmG8$x))
suffStat <- list(C = cor(gmG8$x[, cols]), n = nrow(gmG8$x))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
             labels = colnames(gmG8$x)[cols], alpha = 0.01)
plot(pc.gmG)
The above code gives me the following output:
Now I run the same code with different order of data.
data("gmG")
set.seed(102)
cols = sample(ncol(gmG8$x))
suffStat <- list(C = cor(gmG8$x[, cols]), n = nrow(gmG8$x))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
             labels = colnames(gmG8$x)[cols], alpha = 0.01)
plot(pc.gmG)
As one can see, the arrows have changed direction for v6 and v7. Am I missing something here? Note: I am aware that the skeleton is unchanged (the graph without arrowheads).

How to correct multiple Gaussian variables, given a stability analysis of their structure?
We have multiple Gaussian variables, which could be the locations of 2D points. Suppose the 2D points are measured independently.
If we connect adjacent points, we get a structure (a graph). Suppose we have a model that computes whether the structure is stable or not.
How can we use the information about the stable structure to correct the given measurements? We anticipate that some positions of the 2D points may cause an unstable structure; hence we can prune those positions and get a better estimate.

Build a Bayesian network to make a set of random variables have maximum joint probability
I am working on generating random Bayesian networks (from a random DAG and CPDs) as input for a heuristic search algorithm that finds the maximum joint probability for a set of nodes in the BN.
I want to create the BN such that a given set of random variables has its maximum joint probability at a set of given values.
For example, the BN includes the RVs A: {a1, a2, a3}, B: {b1, b2, b3}, C: {c1, c2, c3}, D: {d1, d2, d3}. How do I build the BN such that p(A=a1, B=b2, D=d1) is the maximum probability among all other combinations of values that A, B, D can take?
Note: I can adapt the DAG and CPDs to give me this network; there is no constraint here.
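Since the DAG and the CPDs are unconstrained, one construction that always works is to peak every conditional distribution on the target value, given the target values of the parents. Below is a brute-force check of that idea on a made-up toy chain in Python (variable names, DAG shape and probabilities are all invented for illustration):

```python
from itertools import product

# Toy DAG: A -> B, A -> D (C omitted for brevity). All names and numbers are
# made up; the point is the construction, not a specific network.
vals = ["1", "2", "3"]
target = {"A": "1", "B": "2", "D": "1"}

# Root node: put most of the mass on the target value of A.
pA = {"1": 0.8, "2": 0.1, "3": 0.1}

def cpt(parent_target, child_target):
    """P(child | parent): peaked on child_target when the parent takes its
    target value, uniform otherwise."""
    table = {}
    for pa in vals:
        if pa == parent_target:
            table[pa] = {v: (0.8 if v == child_target else 0.1) for v in vals}
        else:
            table[pa] = {v: 1.0 / 3.0 for v in vals}
    return table

pB = cpt(target["A"], target["B"])   # P(B | A)
pD = cpt(target["A"], target["D"])   # P(D | A)

def joint(a, b, d):
    return pA[a] * pB[a][b] * pD[a][d]

# Brute-force: the target assignment maximizes the joint probability.
best = max(product(vals, repeat=3), key=lambda t: joint(*t))
print(best)           # ('1', '2', '1') -- the target assignment
print(joint(*best))   # 0.8 ** 3
```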

Correct usage of stats4::mle
I want to use the stats4::mle function to estimate the best parameters (2) of a distribution.
I would like to be sure my usage is correct, and get guidance on avoiding the error:
Error in optim(start, f, method = method, hessian = TRUE, ...) :
  initial value in 'vmmin' is not finite
In addition: Warning message:
In log(mu) : NaNs produced
The function I would like to estimate is exp(beta0*a + beta1*b), and I would like to estimate the betas.
Sample code:
a <- mydata$a   # first variable
b <- mydata$b   # second variable
y <- mydata$y   # observed result

nll <- function(beta0, beta1) {
    mu = y - exp(beta0 * a + beta1 * b)
    -sum(log(mu))
}

est <- stats4::mle(minuslogl = nll, start = list(beta0 = 0.0001, beta1 = 0.0001))
est
So:
 - Is this the correct way of doing things?
 - For the error, I understand it is due to values of mu reaching 0 (or below), but I don't know what to do about it
Thanks for your help.
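On the second bullet: a common workaround, independent of stats4, is to have the objective return +Inf whenever mu is not strictly positive, so the optimizer rejects that point instead of producing NaNs. A minimal sketch of the guard in Python (the values are made up; only the guard pattern matters):

```python
import math

def safe_nll(mu_values):
    """Negative log-likelihood guard: if any mu is non-positive, log(mu) is
    undefined, so return +inf and let the optimizer reject this point
    instead of propagating NaNs."""
    if any(mu <= 0 for mu in mu_values):
        return math.inf
    return -sum(math.log(mu) for mu in mu_values)

print(safe_nll([0.5, 1.2, 2.0]))    # finite value
print(safe_nll([0.5, -0.1, 2.0]))   # inf -- rejected rather than NaN
```

Starting values should also lie in the region where mu is strictly positive, otherwise the very first evaluation is non-finite (which is what the 'vmmin' message complains about).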

Why doesn't "beta.fit" come out right?
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

observed = [0.294, 0.2955, 0.235, 0.2536, 0.2423, 0.2844, 0.2099, 0.2355,
            0.2946, 0.3388, 0.2202, 0.2523, 0.2209, 0.2707, 0.1885, 0.2414,
            0.2846, 0.328, 0.2265, 0.2563, 0.2345, 0.2845, 0.1787, 0.2392,
            0.2777, 0.3076, 0.2108, 0.2477, 0.234, 0.2696, 0.1839, 0.2344,
            0.2872, 0.3224, 0.2152, 0.2593, 0.2295, 0.2702, 0.1876, 0.2331,
            0.2809, 0.3316, 0.2099, 0.2814, 0.2174, 0.2516, 0.2029, 0.2282,
            0.2697, 0.3424, 0.2259, 0.2626, 0.2187, 0.2502, 0.2161, 0.2194,
            0.2628, 0.3296, 0.2323, 0.2557, 0.2215, 0.2383, 0.2166, 0.2315,
            0.2757, 0.3163, 0.2311, 0.2479, 0.2199, 0.2418, 0.1938, 0.2394,
            0.2718, 0.3297, 0.2346, 0.2523, 0.2262, 0.2481, 0.2118, 0.241,
            0.271, 0.3525, 0.2323, 0.2513, 0.2313, 0.2476, 0.232, 0.2295,
            0.2645, 0.3386, 0.2334, 0.2631, 0.226, 0.2603, 0.2334, 0.2375,
            0.2744, 0.3491, 0.2052, 0.2473, 0.228, 0.2448, 0.2189, 0.2149]

a, b, loc, scale = stats.beta.fit(observed, floc=0, fscale=1)

ax = plt.subplot(111)
ax.hist(observed, alpha=0.75, color='green', bins=104, density=True)
ax.plot(np.linspace(0, 1, 100), stats.beta.pdf(np.linspace(0, 1, 100), a, b))
plt.show()
The α and β are out of whack (α = 6.056697373013153, β = 409078.57804704335), and the fitted curve is also unreasonable: the histogram and the beta density differ in height on the y-axis.
The mean of the data is about 0.25, but the expected value of the fitted beta distribution is 6.05/(6.05 + 409078.57) ≈ 1.48e-05. This seems counterintuitive.
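As a sanity check on what plausible parameters should look like, a method-of-moments inversion of the Beta mean/variance formulas gives estimates directly from the sample moments. A sketch in Python (using the first 20 of the 104 observations from the question; this only shows the expected scale of a and b, it does not explain why beta.fit diverged):

```python
# First 20 of the 104 observations from the question (enough to see the scale).
observed = [0.294, 0.2955, 0.235, 0.2536, 0.2423, 0.2844, 0.2099, 0.2355,
            0.2946, 0.3388, 0.2202, 0.2523, 0.2209, 0.2707, 0.1885, 0.2414,
            0.2846, 0.328, 0.2265, 0.2563]

m = sum(observed) / len(observed)
v = sum((x - m) ** 2 for x in observed) / (len(observed) - 1)

# Invert the Beta(a, b) moment formulas (method of moments):
#   mean = a / (a + b),  var = a*b / ((a + b)^2 * (a + b + 1))
common = m * (1 - m) / v - 1
a_mm = m * common
b_mm = (1 - m) * common

print(a_mm, b_mm)            # both modest in size, nothing near 4e5
print(a_mm / (a_mm + b_mm))  # reproduces the sample mean by construction
```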

Distributing an MLE problem on a cluster using Rmpi
I'd like each slave to read a large individual-level dataset, then iteratively find the subject-level likelihood given updates of the parameter vector. The slaves read the datasets, and then an error is thrown by the first slave that tries to compute a subject-level likelihood: it can't find the data frame. I'm guessing that the R processes in which the broadcast read commands succeeded have since exited. So my general question: how can you make data persist between a one-time command (the read) and a command that gets iterated?
I submit the job via the shell command
sbatch --partition=multinode batchrmpiex1mle.sh
file batchrmpiex1mle.sh
#!/bin/bash
#
# this file is batchrmpiex1mle.sh
#
#SBATCH --ntasks=1051
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name="Rmpi_ex1mle"
#SBATCH --mail-type=BEGIN,END
#
cd $SLURM_SUBMIT_DIR
echo "Working from $(pwd)"
export OMPI_MCA_mpi_warn_on_fork=0

# Run this from mpirun so that R starts in an MPI environment
module load R
module load openmpi/3.0.2/gcc7.3.0pmi2
mpirun -n 1 R CMD BATCH --no-restore --no-save --quiet \
    mainrmpiex1mle.R mainrmpiex1mle.Rout
file mainrmpiex1mle.R
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}

"%,%" <- function(x, y) paste(x, y, sep = "")

wd <- Sys.getenv("SLURM_SUBMIT_DIR")
source(wd %,% "/" %,% "funsrmpiex1mle.R")

N <- 1051
dsname <- "dat.rda"
load(wd %,% "/" %,% (N - 1) %,% "/" %,% dsname)
p <- ncol(DAT.i)
B <- rep(0, p)

# Spawn ns workers
mpi.spawn.Rslaves(nslaves = N - 1)

# broadcast common code to all slaves
mpi.bcast.Robj2slave(read.i)
mpi.bcast.Robj2slave(logL.i)

# The command we want to run on all the nodes/processors we have
mpi.bcast.cmd(read.i, N - 1, dsname, wd)

fit.glm <- nlminb(start = B, obj = FlogL, n = N - 1)
fit.glm.s <- summary(fit.glm)
save(list = c("fit.glm", "fit.glm.s"), file = "fitglmpar.rda")

# Close down the MPI processes and quit
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()
file funsrmpiex1mle.R
"%,%" <- function(x, y) paste(x, y, sep = "")

"FlogL" <- function(B, n) {
    ns <- n
    logL <- -sum(unlist(mpi.apply(1:ns, logL.i, B = B, comm = 1)))
    logL
}

# worker code
"read.i" <- function(ns, dsname, wd) {
    "%,%" <- function(x, y) paste(x, y, sep = "")
    imx <- ceiling(logb(ns, 10))
    # which slave am I?
    i <- mpi.comm.rank()
    i.digs <- ceiling(logb(1 + i, 10))
    z <- paste(rep("0", imx - i.digs), collapse = "")
    i.s <- z %,% i
    load(wd %,% "/" %,% i.s %,% "/" %,% dsname)
}

"logL.i" <- function(B, i) {
    attach(DAT.i)
    b0 <- B[1]
    b <- B[-1]
    P.i <- logit.inv(b0 + X %*% b)
    D * log(P.i) + (1 - D) * log(1 - P.i)
}

Bnlearn for hidden variable BN
I am trying to learn a hybrid Bayesian network with hidden variables. I have a fixed structure, and my data frame contains hidden variables whose value is NaN in every row. I tried learning the parameters using the bn.fit method, but it learns CPDs containing NaN. I see there is a structural.em method that learns the structure of a BN with hidden variables. Is there a way to learn the parameters alone for a given structure?

Is there an equivalent to R's Bnlearn in Python?
I am building a Bayesian network using the bnlearn package in R. Is there a package in Python that can recreate the same (or a very similar) model? I see there are a lot of Bayesian network packages in Python, but many of them seem deficient in some way. Particular features I am looking for:
 Support for continuous & categorical variables
 Support for not only inference but also structure learning.
 - The ability to constrain arc operations - similar to bnlearn's white- and blacklisting feature, where I can require relationships between nodes or prevent the learning process from creating certain relationships.
The motivation behind this is that this model will eventually need to be put into production, and I believe R's single-threaded nature would be an issue there. PMML would also be an option, but it doesn't look like any of the R-to-PMML packages support Bayesian networks.