How to force classification tree to be complete?

For my application, I require only complete classification trees. But sklearn's DecisionTreeClassifier only offers max_depth and max_leaf_nodes to control the tree's size, which means there is no way to force a complete tree with 2^(max_depth) leaves. How do I make sure sklearn only produces complete trees?

The function is here: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html .

1 answer

  • answered 2022-04-28 14:51 Ben Reiniger

    You cannot. Doing so might require some splits to create nodes with no training samples in them, which then cannot have target values meaningfully assigned nor further splits applied.
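    A minimal illustration of the empty-node problem, using toy data chosen for this sketch (not from the thread): with only two training samples and max_depth=3, a complete tree would need 2**3 = 8 leaves. sklearn's leaves each hold at least one training sample (min_samples_leaf defaults to 1), so at most two leaves can exist, and the fitted tree stops after a single split.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data (illustrative assumption): two samples, one per class.
X = [[0], [1]]
y = [0, 1]

max_depth = 3
clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)

# A complete depth-3 tree needs 2**3 = 8 leaves, but every leaf must contain
# at least one training sample, so two samples can fill at most two leaves.
print(clf.get_depth(), clf.get_n_leaves())  # 1 2
```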

    If you set max_depth small enough, your data might be such that a complete binary tree is built, but there's no guarantee. Your very first split might produce a pure node on one side, which shouldn't be split further. If you want the tree to split even pure nodes (but which have enough samples to split at all), you may be able to define a custom splitting criterion, but that's not easy.
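    If you go the "set max_depth small and hope" route, you can at least check the result after fitting. A sketch, assuming "complete" means 2^(max_depth) leaves as in the question: every internal node of an sklearn tree has exactly two children, so a tree of depth at most max_depth has at most 2**max_depth leaves, with equality only when all leaves sit at depth max_depth. The helper name is_complete and the toy data are hypothetical; the data is chosen so that the first split produces a pure node on one side, as described above.

```python
from sklearn.tree import DecisionTreeClassifier


def is_complete(clf, max_depth):
    """Hypothetical helper: True if the fitted tree has 2**max_depth leaves.

    Each internal node has exactly two children, so a tree of depth
    <= max_depth reaches 2**max_depth leaves only if it is complete.
    """
    return clf.get_n_leaves() == 2 ** max_depth


# Toy data (illustrative assumption): the best first split (x <= 1.5) sends
# both class-0 samples left, creating a pure node that is never split again.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 0]

max_depth = 2
clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)

print(clf.get_depth(), clf.get_n_leaves())  # 2 3
print(is_complete(clf, max_depth))  # False
```

    The check only detects incompleteness; it does not change how sklearn grows the tree.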
