Filling in Missing Values in Python

I have a data frame that looks like this.

Subject Level Age Dosage 
1       Beta  27  2
2       Alpha 19  3
3       Alpha 13  5

And a data frame that looks like this.

Subject Level Age
4       Beta  18
5       Beta  26
6       Alpha 17
7       Beta  27

My desired result is the second data frame with predicted dosage numbers looking like this.

Subject Level Age Pred_Dosage
4       Beta  18  4
5       Beta  26  3
6       Alpha 17  1
7       Beta  27  3

Basically, I want to use the first data frame to predict the Dosage values for the second data frame. I was thinking a random forest regressor would be the right approach, but are there any other options?

1 answer

  • answered 2021-07-27 17:46 rnso

    Since the Dosage to be predicted is a quantitative variable, you need a regression algorithm. A number of these are available, e.g. see here. You should also mention how many rows are available in the training data frame, and confirm that there are only 2 predictor variables (Level and Age). These factors may affect the choice of algorithm.
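
    As a rough sketch of what fitting a regressor could look like with the two data frames from the question: this assumes scikit-learn (the answer does not name a library) and that Level and Age are the only predictors. The names `train`, `new`, `features`, and `model` are made up for illustration, and three training rows are far too few for a meaningful model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Training data: the first data frame from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# Rows to predict: the second data frame from the question.
new = pd.DataFrame({
    "Subject": [4, 5, 6, 7],
    "Level": ["Beta", "Beta", "Alpha", "Beta"],
    "Age": [18, 26, 17, 27],
})

features = ["Level", "Age"]

# One-hot encode the categorical Level column; pass Age through unchanged.
preprocess = ColumnTransformer(
    [("level", OneHotEncoder(handle_unknown="ignore"), ["Level"])],
    remainder="passthrough",
)

# Any scikit-learn regressor could be swapped in here (e.g. LinearRegression).
model = make_pipeline(
    preprocess,
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(train[features], train["Dosage"])

# Attach the predictions to a copy of the second data frame.
result = new.copy()
result["Pred_Dosage"] = model.predict(new[features])
print(result)
```

    With so little data the predicted numbers will not resemble the example values in the question; the point is only the shape of the workflow (encode Level, fit on the first frame, predict on the second).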

    You may also first do a univariate analysis to find out whether there is a significant relation between Dosage and each of Level and Age. Whether Dosage is predicted by one, both, or neither of the predictors may affect your model.
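
    One possible way to start such a per-predictor check with pandas is sketched below, using the three training rows from the question. The variable names are illustrative, and this only gives descriptive numbers (a correlation and group means); a formal significance test would need many more rows.

```python
import pandas as pd

# Training data: the first data frame from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# Numeric predictor: Pearson correlation between Age and Dosage.
age_corr = train["Age"].corr(train["Dosage"])

# Categorical predictor: compare mean Dosage across the Level groups.
level_means = train.groupby("Level")["Dosage"].mean()

print(f"Age vs Dosage correlation: {age_corr:.3f}")
print(level_means)
```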

    It should also be clear that each row belongs to a distinct subject and there is no repeat testing of subjects.
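
    A quick way to check this assumption, assuming the Subject column identifies subjects as in the question (the names `subjects_distinct` and `repeats` are illustrative):

```python
import pandas as pd

# Training data: the first data frame from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# True when every Subject ID appears exactly once.
subjects_distinct = train["Subject"].is_unique

# Rows whose Subject ID occurs more than once (empty if all are distinct).
repeats = train[train["Subject"].duplicated(keep=False)]

print(subjects_distinct)
print(repeats)
```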

    With only one categorical and one numeric predictor, you can also make a scatterplot with Age on the x-axis and Dosage on the y-axis. The points for Alpha and Beta can be colored differently, and a regression line plotted separately for each level. This will also help in creating a good model.
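
    The plot described above could be sketched as follows, assuming matplotlib and NumPy are available. The color choices and variable names are arbitrary, and the straight-line fit via `np.polyfit` is just one simple way to draw a regression line. With the question's three rows, only Alpha has enough points for a line.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Training data: the first data frame from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# One fixed color per level so points and fitted line match.
colors = {"Alpha": "tab:blue", "Beta": "tab:orange"}

fig, ax = plt.subplots()
slopes = {}
for level, group in train.groupby("Level"):
    ax.scatter(group["Age"], group["Dosage"], color=colors[level], label=level)
    # A straight-line fit needs at least two points with different ages.
    if group["Age"].nunique() >= 2:
        slope, intercept = np.polyfit(group["Age"], group["Dosage"], deg=1)
        slopes[level] = slope
        xs = np.linspace(group["Age"].min(), group["Age"].max(), 50)
        ax.plot(xs, slope * xs + intercept, color=colors[level])

ax.set_xlabel("Age")
ax.set_ylabel("Dosage")
ax.legend()
```

    Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file.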
