Confusion with Vowpal Wabbit Contextual Bandit training data formatting
I am new to Vowpal Wabbit and am working on a multi-armed bandit model to recommend different CTAs for sign-up pop-ups. I already completed the walkthrough on the main site but am a bit confused about what the training data is supposed to look like for the --cb_explore_adf version. So far, for the regular versions (with a set number of actions) the data looks like:
action:cost:probability | features
which makes sense, but then when you get to the adf version, it becomes:
| a:1 b:0.5
0:0.1:0.75 | a:0.5 b:1 c:2

shared | s_1 s_2
0:1.0:0.5 | a:1 b:1 c:1
| a:0.5 b:2 c:1
I've read the documentation numerous times and I still don't understand how this works.
An example showing how data similar to mine would be adapted to the above format would be great.
Example of my use case:
2 actions: 1 and 0
3 features: language, country, favorite sport
Some of the docs that I've looked at:
Playing around with it, I created a train.txt with this input:
shared |user language=en nation=CAN
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11

shared |user language=it nation=ITA
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11

shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11

shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11
But when I run this:
vw = pyvw.vw("-d full_data.txt --cb_explore_adf -q ua --quiet --epsilon 0.2")
vw.predict("|user language=en nation=USA")
I get a [1.0] which doesn't make sense. I am sure that I am doing something wrong.
ADF stands for action dependent features. Each event/example therefore consists of multiple lines, with the first line being an optional set of shared features (marked with shared).
Apart from the shared line, each line corresponds to one action.
So, when you provide VW with the input:
|user language=en nation=USA
You are asking for a prediction for only 1 action (since there is no shared line), which is why you are getting back a PMF (probability mass function, or the probability to choose each distinct item) which is simply [1.0]. This states that the single action should be chosen with a probability of 1.0. However, reading the features, it looks as though you are actually passing what should be the shared features.
For each prediction you need to provide all of the features for each action, as essentially the action itself is defined as the set of its features (ADF).
Your predict data should look something like (notice the label is omitted):
shared |user language=it nation=ITA
|action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11
VW will then emit something that looks like [0.9, 0.1]. You should then sample from this PMF (to allow for exploration) to determine which is the chosen action.
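As a minimal sketch of that sampling step, the following uses only the Python standard library. The function name sample_action is hypothetical and not part of VW; it assumes the PMF is a plain list of floats like the one returned by predict.

```python
import random

def sample_action(pmf, rng=random):
    """Draw one action index from a PMF; return (index, probability of that index)."""
    # Scale the draw by the sum in case the PMF is not exactly normalised.
    draw = rng.random() * sum(pmf)
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if draw < cumulative:
            return index, prob
    # Fallback for floating-point round-off: return the last action.
    return len(pmf) - 1, pmf[-1]

# The index is zero-based and follows the order of the action lines.
chosen_index, chosen_prob = sample_action([0.9, 0.1])
```

The returned probability is worth logging alongside the chosen action, since it is the value that goes into the probability field of the label when the event is later used for training.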
The format of the training data is a bit confusing since the same label format was reused from non-ADF. The action portion of the label is actually unused, since the label must be on the same line as the action it is for.
shared |user language=en nation=CAN
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11
The above example says that action two had a cost of 0 and that, when it was picked, the probability of choosing it was 0.5 (its value in the PMF).
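The label placement described above can be sketched as a small formatting helper. This is an illustration only: to_adf_example is a hypothetical name, the user and action namespaces are taken from the question's data, and the sport and cta feature names in the usage lines are made up to match the stated use case.

```python
def to_adf_example(shared_features, action_features, chosen=None, cost=None, prob=None):
    """Build one multi-line ADF example as a newline-joined string.

    chosen is the zero-based index of the action that was taken; leave it as
    None to build an unlabeled example for prediction.
    """
    lines = ["shared |user " + shared_features]
    for index, features in enumerate(action_features):
        line = "|action " + features
        if chosen is not None and index == chosen:
            # The leading action id is unused in ADF, so 0 is a placeholder;
            # what matters is that the label sits on the chosen action's line.
            line = "0:{}:{} ".format(cost, prob) + line
        lines.append(line)
    return "\n".join(lines)

# Training example: the second action was shown, cost 0, chosen with probability 0.5.
train = to_adf_example(
    "language=en country=CAN sport=hockey",   # hypothetical feature values
    ["cta=0", "cta=1"],                       # hypothetical action features
    chosen=1, cost=0, prob=0.5,
)

# Prediction example: same structure, no label.
query = to_adf_example("language=en country=CAN sport=hockey", ["cta=0", "cta=1"])
```

Each string holds one event. When writing a training file, events need a blank line between them so that VW can tell where one multi-line example ends and the next begins.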