Bayesian Network structure learning
Using the bnlearn package, can I learn the structure of a BN just by passing my dataset as a parameter, for example:
bn1 <- bnlearn::hc(dataset)
Or must I pass some edges as prior knowledge, e.g.:
wl <- data.frame(from = c("A", "B"), to = c("B", "C"))
bn1 <- bnlearn::hc(dataset, whitelist = wl)
What I mean is: do the bnlearn algorithms have the capacity to learn the structure from data alone, or do they always need some help in the form of prior knowledge?
See also questions close to this topic

Bayesian Network in Python
Currently, I use bnlearn library in R to implement a Bayesian Network model. The model requires a node structure that domain experts design. I feed the model with the node structure and a training dataset.
Now I need to replicate the same thing in Python. I searched through different libraries and did not find anything similar. The closest one is perhaps Naive Bayes in scikit-learn. However, Naive Bayes does not require any node structure!
I would appreciate it if anyone could help me replicate my R model in Python.
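For what it's worth, pgmpy is one Python library that accepts an expert-designed graph plus a training dataset, much like bnlearn's bn.fit. The core idea is also small enough to sketch by hand. The sketch below is a minimal illustration, not any library's actual API (all names here are hypothetical): a fixed structure mapping each node to its parents, with conditional probability tables fitted by relative frequencies.

```python
from collections import defaultdict

def fit_cpts(structure, data):
    """Fit CPTs by relative frequency, given an expert-designed structure.

    structure: dict mapping node -> tuple of parent nodes
    data: list of dicts, one per training row
    Returns a dict mapping node -> {(parent values..., value): probability}
    """
    cpts = {}
    for node, parents in structure.items():
        # count occurrences of each value per parent configuration
        counts = defaultdict(lambda: defaultdict(int))
        for row in data:
            key = tuple(row[p] for p in parents)
            counts[key][row[node]] += 1
        cpt = {}
        for key, dist in counts.items():
            total = sum(dist.values())
            for value, c in dist.items():
                cpt[key + (value,)] = c / total
        cpts[node] = cpt
    return cpts

# Hypothetical expert structure: A and B are parents of C
structure = {"A": (), "B": (), "C": ("A", "B")}
data = [
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 0},
]
cpts = fit_cpts(structure, data)
print(cpts["C"][(1, 0, 1)])  # P(C=1 | A=1, B=0) -> 2/3
```

This is only the parameter-fitting half; inference over the fitted CPTs is a separate step that a dedicated library handles for you.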

Bayesian Networks: The MultiVar accuracy situation
I'm facing a problem with Bayesian networks and I hope to find an answer. I'm going to split my question up to make it easier to follow!
OBJECTIVE: After training my model, given the variables X (read: matrix A) I want to predict the variables Y (read: matrix B). For example:
Given evidence on variables A, B, C, I want to predict variables D, E, F.
From this you can see that I'm after a multivariate prediction!
THE PROBLEM, OR CATASTROPHE: Currently I'm using the bnlearn package in R, and the problem is: the worst models (an empty graph and a random graph) sometimes have better accuracy than my best model (learned from the data), or come very close to it! To make matters worse, the test set is contained in my training set, so technically my best model should score ~100%, and this doesn't happen!
I want to know whether there is a mathematical mistake in my implementation (I explained it step by step in the code) and how I can explain this situation...
PS1: I know the data should be split, but in my real project I don't have enough rows, and changing the dataset (size) would probably change the final model.
PS2: Multivariate accuracy is not available in the package, so it was implemented manually.
Here is my code with comments:
library(bnlearn)

# Dataframe
al <- alarm

# Nodes that I'm going to use as evidence for my models
nodeEvid <- names(al)[c(30, 31, 32, 33, 34, 35, 36, 37)]

# Nodes that I'm going to use as events (prediction targets) for my models
nodeEvnt <- names(al)[c(30, 31, 32, 33, 34, 35, 36, 37)]

## Best model - using all my data to create the arcs
bn_k2 <- tabu(x = al, score = "k2")

## Worst models
# Empty model
bn_eg <- empty.graph(names(al))
# Random model
bn_rd <- random.graph(names(al))

# Fitting the models
modelsBN <- list(bn.fit(x = bn_k2, data = al),
                 bn.fit(x = bn_eg, data = al),
                 bn.fit(x = bn_rd, data = al))

# Seed
set.seed(7)

# Randomly selecting rows to create dTest
trainRows <- sample(1:nrow(al), as.integer(nrow(al) * 0.30), replace = FALSE)

# Dataframe for testing
dTest <- al[trainRows, ]

# ACCURACY - CPDIST FOR MULTIVAR
# Dataframe to keep all the results at the end
accuracyCPD <- setNames(data.frame(matrix(ncol = length(nodeEvnt) + 1,
                                          nrow = length(modelsBN))),
                        c(nodeEvnt, "TOTAL MEAN ACCURACY BY MODEL"))

# Process to calculate the accuracy
for (m in 1:length(modelsBN)) {  # for every Bayesian model m that I created
  # predCPD is a dataframe generated to keep the results of each sample run,
  # I will explain more ahead...
  predCPD <- setNames(data.frame(matrix(ncol = length(nodeEvnt),
                                        nrow = nrow(dTest))),
                      nodeEvnt)
  for (i in seq(nrow(dTest))) {  # for every sample i in my dTest
    # cpdist returns a dataframe of predictions drawn from the conditional
    # probability distribution of the model, with n rows and one column per
    # node in nodeEvnt. I save its results in a dataframe called 'teste'
    teste <- cpdist(modelsBN[[m]], nodes = nodeEvnt,
                    evidence = as.list(dTest[i, names(dTest) %in% nodeEvid]),
                    n = 1000, method = "lw")
    # Here I use predCPD to record the fraction of rows in teste (tries from
    # my model) that match the true value, for each variable in nodeEvnt
    for (j in 1:length(nodeEvnt)) {  # mean hit rate for each target variable
      predCPD[i, nodeEvnt[j]] <- sum(teste[nodeEvnt[j]] ==
                                       as.character(dTest[i, nodeEvnt[j]]),
                                     na.rm = TRUE) / nrow(teste)
    }
  }
  # Here I take a 'mean of means' (because predCPD is technically already a
  # mean) once dTest is done, so accCPD holds the results for model m
  accCPD <- colMeans(predCPD, na.rm = TRUE)
  # Multiply by 100 to put it in the 0-100 % format
  for (j in 1:length(nodeEvnt)) {
    accuracyCPD[m, nodeEvnt[j]] <- accCPD[nodeEvnt[j]] * 100
  }
  # Mean over my target variables, saved in "TOTAL MEAN ACCURACY BY MODEL"
  accuracyCPD[m, length(nodeEvnt) + 1] <- mean(accCPD) * 100
}
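Stripped of the bnlearn plumbing, the manual multivariate accuracy in the loop above is a "mean of means": a per-row hit rate for each target variable, averaged over test rows, then averaged over variables. A minimal sketch of just that metric (hypothetical names, Python used only for illustration):

```python
def multivar_accuracy(samples_per_case, truths, variables):
    """Mean-of-means accuracy over several target variables.

    samples_per_case: one list of sampled predictions per test row,
        each prediction a dict mapping variable name -> sampled value
    truths: one dict of true values per test row
    variables: names of the target variables
    Returns (per-variable accuracy in percent, overall mean in percent).
    """
    per_var = {}
    for v in variables:
        rates = []
        for samples, truth in zip(samples_per_case, truths):
            # fraction of the model's samples that hit the true value
            hits = sum(1 for s in samples if s[v] == truth[v])
            rates.append(hits / len(samples))
        # mean over test rows of the per-row hit rates ("mean of means")
        per_var[v] = 100 * sum(rates) / len(rates)
    total = sum(per_var.values()) / len(per_var)
    return per_var, total

# Toy check: one test row, four samples, one target variable "D"
per_var, total = multivar_accuracy(
    [[{"D": 1}, {"D": 1}, {"D": 0}, {"D": 0}]],
    [{"D": 1}],
    ["D"],
)
print(per_var, total)  # {'D': 50.0} 50.0
```

Note what the toy check shows: even a model that guesses uniformly between two levels scores ~50% under this metric, which is one reason an empty or random graph can look deceptively competitive.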
I really want to solve this situation, and I think the answer may lie in the CPTs, but that's just a guess...

bnlearn::bn.fit difference and calculation of methods "mle" and "bayes"
I am trying to understand the differences between the two methods, bayes and mle, in the bn.fit function of the bnlearn package. I know about the debate between the frequentist and the Bayesian approach to understanding probabilities. On a theoretical level I suppose the maximum likelihood estimate (mle) is a simple frequentist approach that sets the relative frequencies as the probabilities. But what calculations are done to get the bayes estimate? I have already checked the bnlearn documentation, the description of the bn.fit function, and some application examples, but nowhere is there a real description of what is happening. I also tried to understand the function in R by first checking out bnlearn::bn.fit, which led to bnlearn:::bn.fit.backend and then to bnlearn:::smartSapply, but there I got stuck. Some help would be really appreciated, as I use the package for academic work and therefore should be able to explain what happens.
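As a starting point: my understanding of the bnlearn documentation is that "bayes" returns the expected value of the posterior distribution arising from a flat (uniform) prior over each conditional distribution, with the weight of that prior controlled by the imaginary sample size argument iss. Under that reading, for a single discrete distribution the two estimators look like the sketch below. This is a hedged illustration of the general Dirichlet posterior mean, not bnlearn's actual source code, and exactly how iss is allocated across CPT cells per parent configuration is an assumption here.

```python
def mle_estimate(counts):
    """Frequentist MLE: relative frequencies p_k = n_k / n."""
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

def bayes_estimate(counts, levels, iss=1.0):
    """Posterior mean under a uniform Dirichlet prior.

    The prior acts like iss "imaginary" observations spread evenly over
    the K levels, so p_k = (n_k + iss/K) / (n + iss). As iss -> 0 this
    converges to the MLE; a larger iss pulls the estimate toward uniform.
    """
    n = sum(counts.values())
    k = len(levels)
    return {lvl: (counts.get(lvl, 0) + iss / k) / (n + iss) for lvl in levels}

counts = {"yes": 9, "no": 1}
print(mle_estimate(counts))                       # {'yes': 0.9, 'no': 0.1}
print(bayes_estimate(counts, ["yes", "no"], 10))  # {'yes': 0.7, 'no': 0.3}
```

A practical consequence of the Bayesian estimate is that levels (or parent configurations) never observed in the data still receive non-zero probability, whereas the pure MLE assigns them zero, or an undefined value when a whole parent configuration is unobserved.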