Filling in Missing Values in Python
I have a data frame that looks like this.
Subject  Level  Age  Dosage
1        Beta   27   2
2        Alpha  19   3
3        Alpha  13   5
And a data frame that looks like this.
Subject  Level  Age
4        Beta   18
5        Beta   26
6        Alpha  17
7        Beta   27
My desired result is the second data frame with predicted dosage numbers looking like this.
Subject  Level  Age  Pred_Dosage
4        Beta   18   4
5        Beta   26   3
6        Alpha  17   1
7        Beta   27   3
Basically, I want to use the first data frame to predict the Dosage values for the second data frame. I was thinking a random forest regressor would be the right approach, but are there other options?
Since Dosage, the variable to be predicted, is quantitative, you need a regression algorithm. A number of these are available, e.g. see here. You should also mention how many rows are available in the training data frame, and confirm that there are only 2 predictor variables (Level and Age). These factors may affect the choice of algorithm.
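As a rough illustration of the regression approach, here is a minimal sketch that fits the random forest regressor mentioned in the question. It assumes pandas and scikit-learn are installed. The one-hot encoding of Level and the settings n_estimators=100 and random_state=0 are choices made for this example, not something the question specifies. With only three training rows the predictions are not meaningful; the sketch only shows the mechanics.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Training data copied from the question (rows with a known Dosage).
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# Rows whose Dosage is to be predicted.
test = pd.DataFrame({
    "Subject": [4, 5, 6, 7],
    "Level": ["Beta", "Beta", "Alpha", "Beta"],
    "Age": [18, 26, 17, 27],
})

features = ["Level", "Age"]

# One-hot encode the categorical Level column (an assumption of this sketch).
# Reindexing gives the test matrix exactly the columns seen in training.
X_train = pd.get_dummies(train[features], columns=["Level"], dtype=int)
X_test = pd.get_dummies(test[features], columns=["Level"], dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, train["Dosage"])

predicted = test.copy()
predicted["Pred_Dosage"] = model.predict(X_test)
print(predicted)
```

Any other scikit-learn regressor (for example LinearRegression) can be swapped in for RandomForestRegressor without changing the rest of the code, which makes it easy to compare alternatives once more training rows are available.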
You may also first do a univariate analysis to check whether there is a significant relationship between Dosage and each of Level and Age. Whether Dosage is predicted by one, both, or neither of the predictors may affect your model.
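A first look at each predictor on its own could be sketched as follows, assuming pandas is installed. It compares mean Dosage per Level and computes the Pearson correlation between Age and Dosage. These are descriptive summaries only; a formal significance test would need far more than the three rows shown in the question.

```python
import pandas as pd

# Training data copied from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

# Dosage vs. Level: compare the group means and group sizes.
dosage_by_level = train.groupby("Level")["Dosage"].agg(["mean", "count"])
print(dosage_by_level)

# Dosage vs. Age: Pearson correlation between the two numeric columns.
age_dosage_corr = train["Age"].corr(train["Dosage"])
print(f"Correlation between Age and Dosage: {age_dosage_corr:.3f}")
```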
It should also be clear that each row belongs to a distinct subject and that there is no repeated testing of subjects.
With only one categorical and one numeric predictor, one can also make a scatterplot with Age on the x-axis and Dosage on the y-axis. The points for Alpha and Beta can be colored differently and a regression line plotted separately for each. This will also help in creating a good model.
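The plot described above might look like the sketch below, assuming pandas, NumPy, and matplotlib are installed. The straight-line fit via np.polyfit and the output file name dosage_vs_age.png are choices made for this example. A line is drawn only for levels with at least two distinct ages, so with the three rows from the question only Alpha gets one.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Training data copied from the question.
train = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Level": ["Beta", "Alpha", "Alpha"],
    "Age": [27, 19, 13],
    "Dosage": [2, 3, 5],
})

fig, ax = plt.subplots()
for i, (level, group) in enumerate(train.groupby("Level")):
    color = f"C{i}"  # one color per level, shared by points and line
    ax.scatter(group["Age"], group["Dosage"], color=color, label=level)
    # A straight-line fit needs at least two distinct ages.
    if group["Age"].nunique() >= 2:
        slope, intercept = np.polyfit(group["Age"], group["Dosage"], deg=1)
        ages = np.linspace(group["Age"].min(), group["Age"].max(), 50)
        ax.plot(ages, slope * ages + intercept, color=color)

ax.set_xlabel("Age")
ax.set_ylabel("Dosage")
ax.legend(title="Level")
fig.savefig("dosage_vs_age.png")  # hypothetical output file name
```

If the fitted lines for Alpha and Beta have clearly different slopes, that suggests including an interaction between Level and Age in the model, or using a method such as a tree ensemble that can capture it automatically.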