bnlearn::bn.fit difference and calculation of methods "mle" and "bayes"
I am trying to understand the differences between the two methods `bayes` and `mle` in the `bn.fit` function of the bnlearn package.
I know about the debate between the frequentist and the Bayesian approaches to interpreting probabilities. On a theoretical level I suppose the maximum likelihood estimate (`mle`) is a simple frequentist approach that sets the relative frequencies as the probabilities. But what calculations are done to get the `bayes` estimate? I have already checked the bnlearn documentation, the description of the `bn.fit` function, and some application examples, but nowhere is there a real description of what is happening.
I also tried to understand the function in R itself, first checking `bnlearn::bn.fit`, which led me to `bnlearn:::bn.fit.backend` and then to `bnlearn:::smartSapply`, where I got stuck.
Some help would be really appreciated, as I use the package for academic work and therefore need to be able to explain what happens.
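To pin down what I am asking, here is a toy computation in base R of the `mle` estimate next to what I guess the `bayes` estimate might be: a posterior mean under a uniform Dirichlet prior whose total weight is an imaginary sample size `iss`. All the names and the `bayes` formula here are my assumption, not something confirmed from bnlearn's code:

```r
# Hypothetical counts for one node with 3 states under a single
# parent configuration (made-up data)
counts <- c(8, 1, 1)
n <- sum(counts)

# mle: plain relative frequencies
mle <- counts / n

# guessed bayes: posterior mean with a uniform Dirichlet prior; iss is the
# imaginary sample size, spread evenly over the k cells of the distribution
iss <- 10
k <- length(counts)
bayes <- (counts + iss / k) / (n + iss)

mle    # 0.8 0.1 0.1
bayes  # shrunk towards the uniform distribution
```

If this is roughly what the `bayes` method does, I would still like to know how `iss` is chosen and how the prior weight is divided across parent configurations.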
See also questions close to this topic

How do I filter and then calculate the mean using the pipe operator?
I want to filter a dataframe and then calculate the mean of a column. But I want to use the pipe operator, and I'm not sure how I can do that.
The dataframe name is `vocab`, the year I am filtering on is 2016, and the name of the column I want to calculate the mean of is `education`.

```r
vocab %>% filter(year == 2016) %>% mean(vocab$education)
```

Am I mixing up dplyr and base R syntax here? Must I only use dplyr syntax with pipe operators?
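In case a concrete contrast helps, here is the chain rewritten so the mean is computed inside the pipe; `summarise()`/`pull()` are my suggestion, and the `vocab` data below is made up so the snippet runs on its own:

```r
library(dplyr)

# hypothetical stand-in for the real vocab data frame
vocab <- data.frame(year = c(2015, 2016, 2016),
                    education = c(12, 14, 16))

# filter first, then compute the mean of the filtered column
vocab %>%
  filter(year == 2016) %>%
  summarise(mean_education = mean(education))

# equivalent: extract the column and hand it to base R's mean()
vocab %>%
  filter(year == 2016) %>%
  pull(education) %>%
  mean()   # 15
```

The pitfall in the original call is that `mean(vocab$education)` ignores the piped, filtered data and reaches back to the full `vocab` object.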

Large Matrix to run in cv.glmnet() for multinomial classification
I am working on a large matrix with number of samples N = 40 and number of features P = 7130. I am trying to fit `cv.glmnet()` for the ridge penalty, but I am getting an error while doing this.
The dimensions of the dataset are (40, 7130).
The code for the `cv.glmnet()` call is as follows:

```r
ridge2_cv <- cv.glmnet(x, y,
                       ## type.measure: loss to use for cross-validation
                       type.measure = "deviance",
                       ## K = 10 is the default
                       nfolds = 10,
                       ## multinomial regression
                       family = "multinomial",
                       ## 'alpha = 1' is the lasso penalty, 'alpha = 0' the ridge penalty
                       alpha = 0)
```
Here `x` is a large matrix with 285160 elements and `y` is the multiclass response variable of length 40.
I keep getting this error when I run the above call:

```
Error in cbind2(1, newx) %*% (nbeta[[i]]) :
  invalid class 'NA' to dup_mMatrix_as_dgeMatrix
In addition: Warning messages:
1: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  one multinomial or binomial class has fewer than 8 observations; dangerous ground
2: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  one multinomial or binomial class has fewer than 8 observations; dangerous ground
```
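I cannot run the real data, but in my experience this "invalid class 'NA'" error from the Matrix package is usually triggered by missing values in `x`. A base-R sketch of the check and a blunt fix, on simulated data standing in for the real matrix (the reduced P = 100 and the injected NA are artificial):

```r
set.seed(1)
N <- 40
P <- 100                       # shrunk from 7130 to keep the sketch quick
x <- matrix(rnorm(N * P), nrow = N)
y <- factor(sample(c("a", "b", "c"), N, replace = TRUE))
x[3, 7] <- NA                  # one missing value is enough to break the fit

# cv.glmnet() cannot handle NAs; check before fitting
sum(is.na(x))                  # 1

# drop incomplete rows (x must remain a matrix)
keep <- complete.cases(x) & !is.na(y)
x_clean <- x[keep, , drop = FALSE]
y_clean <- y[keep]
```

The warnings about classes with fewer than 8 observations are separate: with only 40 samples and 10 folds, some folds will contain very few members of a class, so a smaller `nfolds` may also be worth trying.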

weighted voronoi diagram, power diagram, or Laguerre–Voronoi diagram in R
I'm having a similar problem to the one discussed in this post:
Project Scipy Voronoi diagram from 3d to 2d
However, I'm looking to compute the power diagram in R. I know the deldir() function in the deldir package uses the "weights" from the 'z' attribute more as labels than as actual weights, but I'd like a function that uses the weights to determine where the separating lines are drawn, like this:
This is what I have so far (which produces the normal, non-weighted Voronoi diagram):

```r
library(dismo)
x <- c(1, 4, 3)
y <- c(1, 2, 5)
z <- c(1, 3, 2)
points <- data.frame(x = x, y = y)
vor <- dismo::voronoi(points)
plot(vor)
points(x, y)
```

Encog Bayes Classifier
I am trying to build a Bayes Classifier using Encog.
My MLDataSet is a double[][] (rows = 150; columns = 140000).
My problem is that I don't know how to iterate through all my events using BayesianNetwork. I have seen some examples using this, but only for small networks like this one:

```java
BayesianNetwork network = new BayesianNetwork();
network.createEvent("x1");
network.createEvent("x2");
network.createEvent("x3");
network.createEvent("x4");
```
Could you please help me?
Also, how can I reduce the dimension of my matrix in Encog?

STAN in R: Standard deviation or variance in normal distribution
I'm using Stan in R to estimate some parameters. For the priors I want to use a normal distribution, specified in the model block as:

```
a ~ normal(mean, sigma)
```

Does Stan interpret the sigma as the standard deviation or as the variance?
Thank you in advance!

Creating Dummy Variables from data frame
I need to do a Bayesian linear regression (using the rethinking package), which requires my categorical variables to be numeric (I think), so I'm trying to create dummy variables so that different categories are represented by (0, 1, 2, 3...). I've simulated some data below to show you what my data looks like. Keep in mind that my actual data contains many more categorical variables than are represented here, so using a bunch of ifelse calls wouldn't be ideal:
```r
# simulated data
X <- data.frame(
  Longest.axis = rnorm(50, 10, 5),
  Time..hrs. = round(runif(50, 0, 4)),
  Strain = sapply(1:50, function(x) {
    ifelse(x < 12, "Msmeg_2788", "Msmeg_WT")
  }),
  Item.Name = sapply(1:50, function(x) {
    ifelse(x < 6, "Q109.jpg",
           ifelse(x < 20, "Q340.jpg",
                  ifelse(x < 40, "Q11.jpg", "Q230.jpg")))
  })
)
```
And below is my attempt to create dummy variables:

```r
# create dummy variables
Straindummy <- model.matrix(X$Strain)
Item.Namedummy <- model.matrix(X$Item.Name)
```

It returns the following error:

```
Error: $ operator is invalid for atomic vectors
```
And here is my attempt at the multivariate Bayesian linear regression, in case you need it to suggest what I should do:

```r
library(rethinking)
model <- map(
  alist(
    X$Longest.Axis ~ dnorm(mu, sigma),
    mu <- a + b1 * Straindummy + b2 * Item.Namedummy + b3 * X$Time..hrs.,
    a ~ dnorm(10, 10),
    b1 ~ dnorm(0, 1),
    b2 ~ dnorm(0, 1),
    b3 ~ dnorm(0, 1),
    sigma ~ dunif(0, 10)
  ),
  data = X
)
```
Thank you if you can offer any help!
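For what it's worth, the immediate error comes from `model.matrix()` expecting a formula rather than a column. Here is a sketch of the dummy-variable step that runs on the simulated data; dropping the intercept column with `[, -1]` is my assumption about the desired output:

```r
set.seed(1)

# simulated data from the question
X <- data.frame(
  Longest.axis = rnorm(50, 10, 5),
  Time..hrs.   = round(runif(50, 0, 4)),
  Strain       = sapply(1:50, function(x) ifelse(x < 12, "Msmeg_2788", "Msmeg_WT")),
  Item.Name    = sapply(1:50, function(x) ifelse(x < 6,  "Q109.jpg",
                                          ifelse(x < 20, "Q340.jpg",
                                          ifelse(x < 40, "Q11.jpg", "Q230.jpg"))))
)

# model.matrix() takes a formula, not a vector; each factor level
# (minus the reference level) becomes one 0/1 indicator column
Straindummy    <- model.matrix(~ Strain, data = X)[, -1]
Item.Namedummy <- model.matrix(~ Item.Name, data = X)[, -1, drop = FALSE]
```

With two levels, `Straindummy` collapses to a single 0/1 vector; `Item.Namedummy` keeps three indicator columns for the four image names.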

Problems with abn packages (language r) in Ubuntu 18.04
If I use the `search.hillclimber()` command, I get the following error:

```
Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double'
```

And I don't understand why. Can anybody help me, please?
The code is the example 4 in this link: http://www.rbayesiannetworks.org/cookbook

Pandas: columnwise comparison with list
I am using a dataframe to keep track of all the possible combinations of two nodes. I am trying to return the row that corresponds to the current node values.
My dataframe lists the six possible combinations of parent values for node 1. The parents are 7 and 2. The values of nodes 7 and 2 are 2 and 1, respectively.
Currently, I am using the isin() method, but this returns True when a value in the dataframe is equal to any value in the list, regardless of position.
How can I perform this comparison so that the first parent is only compared with the first column of the dataframe and so on?
Thanks!
```python
parent_values = [df.iloc[z, parents[i]] for i in range(0, len(parents))]
tf_frame = parent_df.isin(parent_values)
testout = tf_frame.index[parent_df[7] == True].tolist()
```

Imposing constraints on parameters in MARSS in R
I'm using the `MARSS` package in R and I was wondering if there is a way to bound/restrict parameters when running maximum likelihood on a state-space model. For example, let's say I have a model as follows:

```
x_t = phi * x_{t-1} + v_t
y_t = x_t + w_t
```

where `phi` is a parameter to be estimated and `v_t` and `w_t` are error terms whose variances must also be estimated.

Now, let's say I want to restrict this model such that `phi` can only vary between the values 0.1 and 0.5 and the variance of `v_t` can only be between 0 and 50% of the maximum variance of the data in the day. Is there any way to do this in `MARSS`? I know that when estimating parameters we can perform operations on them (e.g. if we have to find the parameter `a` in a matrix, we can use the expression `2*a` and `MARSS` accounts for this, as given in the topic here: Imposing linear restrictions with MARSS). However, it appears that `MARSS` ignores functions: you cannot use parameters wrapped in expressions such as `exp(a)` in the matrix. Is there a way to do this? Thanks in advance.
Compute MLE through a data frame for different segments of observations
I am doing maximum likelihood estimation using the `mle` function in R. Here is my R code:

```r
library(stats4)
set.seed(650)
y <- c(rnorm(15, 1, 1), rnorm(15, 3, 1))
dat <- data.frame(y)

LLl <- function(mu) {
  datl <- dat[1:6, ]
  R <- dnorm(datl, mu, 1)
  -sum(log(R))  # mle() minimizes its argument, so return the negative log-likelihood
}

mle(LLl, start = list(mu = 1))
```
I estimated the parameter `mu` using the first 6 observations. I want to estimate `mu` for `datl = dat[1:i, ]` for every `i = 2, 3, ..., 30`. How can I do this? Can I use an `apply` function in R? Thank you in advance.
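Following up on the apply idea: wrapping the fit in a function of `i` and handing it to `sapply()` works. Note that `mle()` minimizes its first argument, so the function must return the negative log-likelihood; the names `fit_mu` and `mu_hat` below are mine:

```r
library(stats4)

set.seed(650)
y <- c(rnorm(15, 1, 1), rnorm(15, 3, 1))
dat <- data.frame(y)

# one ML estimate of mu per segment dat[1:i, ], for i = 2, ..., 30
fit_mu <- function(i) {
  datl <- dat[1:i, ]
  nLL <- function(mu) -sum(log(dnorm(datl, mu, 1)))  # negative log-likelihood
  coef(mle(nLL, start = list(mu = 1)))[["mu"]]
}

mu_hat <- sapply(2:30, fit_mu)
head(mu_hat)
```

As a sanity check: with the standard deviation fixed at 1, the MLE of `mu` is just the mean of the segment, so `mu_hat` should track the running means of `y`.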

2d correlation of two similar parts
I have to solve the following problem:
I have two images of the same part, but in the second image the part is rotated, scaled and translated (without shear). The part has characteristic points that can be used to compare one image with the other.
The problem is: The second part can contain one or more errors (not too many).
I need to compute the rotation, scaling and translation matrix (this is not the problem itself) such that the characteristic points of the first image have the minimum accumulated distance to the points of the second image.
I'm working with Python, but my first approximation is brute force: I take the minimum distance from each `im1` point to each `im2` point. Is it possible to do this with RANSAC or FLANN? What is the best or correct way to approach it? I think it is an MLE problem, and the function to minimize is the accumulated distance over every pair of points. Later, I will assign to each `image1` characteristic point the nearest `image2` point, because (in case it's not clear) until now we don't know what the correspondence is.
How to get the joint probability using BNlearn R package
I am using the bnlearn package to learn a BN from a dataset, and I couldn't find a function that returns the joint probability of a set of variables given the BN.
I can only find conditional queries, not joint probability queries.
Any ideas, please?

Bayesian Network in Python
Currently, I use bnlearn library in R to implement a Bayesian Network model. The model requires a node structure that domain experts design. I feed the model with the node structure and a training dataset.
Now, I need to replicate the same thing in Python. I searched through different libraries and did not find something similar. The most similar one is perhaps Naive Bayes in scikit-learn. However, Naive Bayes does not require any node structure!
I would appreciate it if anyone can help me replicate my R model in Python.

Bayesian Networks: The MultiVar accuracy situation
I'm facing a problem with Bayesian networks and I hope to find an answer. I'm going to split my question into parts to make it easier to follow!
OBJECTIVE: After training my model, given the variables X (read: matrix A) I want to predict the variables Y (read: matrix B). As an example:
Given evidence on variables A, B, C I want to predict the variables D, E, F.
As you can see, I am after a multivariate prediction!
THE PROBLEM, OR CATASTROPHE: Currently I'm using the bnlearn package in R, and the problem is: the worst models (an empty graph and a random graph) sometimes have better accuracy than my best model (learned from the data), or come very close to it! To make matters worse, the test set is contained in my training set, so technically my best model should score close to 100%, and this doesn't happen!
I want to know if there is any mathematical mistake in my implementation (I explain it step by step in the code) and how I can explain this situation...
PS1: I know it is necessary to split the data, but in my real project I don't have enough rows, and shrinking the dataset would probably change the final model.
PS2: Multivariate accuracy is not provided by the package, so it was computed manually.
Here is my code with comments:

```r
library(bnlearn)

# Dataframe
al <- alarm

# Nodes that I'm going to use as evidence for my models
nodeEvid <- names(al)[c(30, 31, 32, 33, 34, 35, 36, 37)]
# Nodes that I'm going to use as events for my models
nodeEvnt <- names(al)[c(30, 31, 32, 33, 34, 35, 36, 37)]

## Best model: using all my data to learn the arcs
bn_k2 <- tabu(x = al, score = 'k2')

## Worst models
# Empty model
bn_eg <- empty.graph(names(al))
# Random model
bn_rd <- random.graph(names(al))

# Fitting the models
modelsBN <- list(bn.fit(x = bn_k2, data = al),
                 bn.fit(x = bn_eg, data = al),
                 bn.fit(x = bn_rd, data = al))

# Seed
set.seed(7)

# Randomly selected rows for dTest
trainRows <- sample(1:nrow(al), as.integer(nrow(al) * 0.30), replace = FALSE)

# Dataframe for test
dTest <- al[trainRows, ]

# ACCURACY - CPDIST FOR MULTIVAR
# Dataframe to keep all the results at the end
accuracyCPD <- setNames(data.frame(matrix(ncol = length(nodeEvnt) + 1,
                                          nrow = length(modelsBN))),
                        c(nodeEvnt, "TOTAL MEAN ACCURACY BY MODEL"))

for (m in 1:length(modelsBN)) {   # for every Bayesian model m that I created
  # predCPD keeps the results of each sample run, explained further below
  predCPD <- setNames(data.frame(matrix(ncol = length(nodeEvnt),
                                        nrow = nrow(dTest))),
                      nodeEvnt)
  for (i in seq(nrow(dTest))) {   # for each sample i in dTest
    # cpdist() returns a dataframe of predictions drawn from the conditional
    # probability distribution of the model, with n rows and one column per
    # node in nodeEvnt; I save its results in a dataframe called 'teste'
    teste <- cpdist(modelsBN[[m]],
                    nodes = nodeEvnt,
                    evidence = as.list(dTest[i, names(dTest) %in% nodeEvid]),
                    n = 1000,
                    method = "lw")
    # predCPD records the % of draws that returned the TRUE value from the
    # teste dataframe (tries from my model), for each variable in nodeEvnt
    for (j in 1:length(nodeEvnt)) {
      # compute the mean of correct predictions for factor j
      predCPD[i, nodeEvnt[j]] <- sum(teste[nodeEvnt[j]] == as.character(dTest[i, nodeEvnt[j]]),
                                     na.rm = TRUE) / nrow(teste)
    }
  }
  # A 'mean of means' (predCPD is technically already a mean) once dTest is
  # done, so accCPD holds the results for model m
  accCPD <- colMeans(predCPD, na.rm = TRUE)
  # Multiply by 100 to put the results in the 0-100 % format
  for (j in 1:length(nodeEvnt)) {
    accuracyCPD[m, nodeEvnt[j]] <- accCPD[nodeEvnt[j]] * 100
  }
  # Mean over the target variables, stored in "TOTAL MEAN ACCURACY BY MODEL"
  accuracyCPD[m, length(nodeEvnt) + 1] <- mean(accCPD) * 100
}
```
I really want to solve this situation, and I suspect the answer lies in the CPTs, but that is just a guess...