I'm not sure if my question title is very clear, so let me clarify.

I'm running the following `probit` regression on data that is sorted by date (ascending):

```
probit (dependent variable: "over") (independent variables) (Dummy variables)
predict over_
```

All the independent variables are moving averages calculated as follows:

```
generate hconvma = (hconv[_n-1] + hconv[_n-2] + hconv[_n-3]) / 3
```
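For reference, the same moving average can also be written with Stata's lag operators. This is only a sketch: the time index `t` and the name `hconvma_alt` are made up for illustration, and it matches the `[_n-1]` version only because `t` is built from the row order of the date-sorted data.

```
* Hypothetical consecutive time index built from the current (date-sorted) row order
generate t = _n
tsset t

* Average of the three previous values of hconv; missing for the first three rows
generate hconvma_alt = (L1.hconv + L2.hconv + L3.hconv) / 3
```

Either way, the moving average for a given row uses only earlier rows, so the independent variables themselves contain no future data.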

Now, when I go to predict the probability of `over`, I notice that the prediction changes when I insert the actual result for the given observation.

To make this clearer: I usually predict the probability of `over` for 10-20 new observations without knowing the true result of `over`. After running the regression, I get a predicted probability of `over` being 1 (rather than 0) for each of the new observations.

However, once I input the *actual* results for `over` for the new observations, the predictions for those new observations change, sometimes by as much as 30%.
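In code, the sequence looks roughly like this. It is a sketch only: `x1ma`, `x2ma`, `d1` and the flag `new_obs` are placeholder names rather than my actual variables, and the step where the real results are entered is shown as a comment.

```
* Step 1: over is missing for the new rows, so probit leaves them out of the estimation
probit over x1ma x2ma d1
predict over_before          // predicted Pr(over = 1) for every row, including the new ones

* ... the actual results for over are entered for the new rows here ...

* Step 2: the new rows now have a value for over, so they enter the estimation sample
probit over x1ma x2ma d1
predict over_after

* Compare the two sets of predictions for the new rows
generate diff = over_after - over_before
summarize diff if new_obs == 1
```

The difference summarised in the last line is the change I am describing.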

I have realised this is because the model is not entirely based on past results of all the independent variables since the dummy variables are formed by future data as well as past data.

From here stems the question: **Is it possible to have the beta coefficients of all the dummy variables be based solely on past data?** In other words, can this model be made 100% predictive? That way, all the values of `over_` would be true predictions, in the sense that they would reflect the predicted probability of `over` occurring given no future data and no knowledge of the actual result of `over` for any given observation. I need this to test the true prediction success rate of my model.

Can this be done in Stata, and if so, how?

If any further information on any variables is required please let me know.