This directory contains benchmark experiments and demonstrations of off-policy evaluation (OPE) using the full-size Open Bandit Dataset. Detailed descriptions, results, and discussion can be found in the relevant paper.
- cf_policy_search: counterfactual policy search using OPE
- ope: estimation performance comparisons on a variety of OPE estimators
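To give a rough idea of what comparing OPE estimators involves, here is a minimal sketch using only NumPy. It is not the code in this directory and does not use the Open Bandit Dataset: the logged data is synthetic, the setting is context-free, and every name in it (`ipw_estimate`, `snipw_estimate`, `true_mean_reward`, the two policies) is a hypothetical chosen for illustration. It compares inverse probability weighting (IPW) and its self-normalized variant (SNIPW) against a ground-truth policy value that is known only because the data is simulated.

```python
import numpy as np


def ipw_estimate(reward, action, pscore, eval_probs):
    """Inverse probability weighting estimate of the evaluation policy's value."""
    # Importance weight for each round: pi_e(a_i) / pi_b(a_i).
    weights = eval_probs[action] / pscore
    return float(np.mean(weights * reward))


def snipw_estimate(reward, action, pscore, eval_probs):
    """Self-normalized IPW: divides by the sum of weights instead of n."""
    weights = eval_probs[action] / pscore
    return float(np.sum(weights * reward) / np.sum(weights))


# Synthetic, context-free logged bandit feedback (all values are assumptions).
rng = np.random.default_rng(0)
n_rounds, n_actions = 100_000, 3
true_mean_reward = np.array([0.1, 0.3, 0.6])        # hypothetical expected rewards
behavior_probs = np.full(n_actions, 1 / n_actions)  # uniform logging policy
eval_probs = np.array([0.1, 0.2, 0.7])              # policy to be evaluated

action = rng.choice(n_actions, size=n_rounds, p=behavior_probs)
reward = rng.binomial(1, true_mean_reward[action])  # binary rewards
pscore = behavior_probs[action]                     # logged propensities

# Known here only because the rewards are simulated.
ground_truth = float(eval_probs @ true_mean_reward)

estimates = {
    "ipw": ipw_estimate(reward, action, pscore, eval_probs),
    "snipw": snipw_estimate(reward, action, pscore, eval_probs),
}
for name, value in estimates.items():
    print(f"{name}: estimate={value:.4f}, abs_error={abs(value - ground_truth):.4f}")
```

With real logged data the ground truth is not available in this form, so a benchmark needs some other reference value for the evaluation policy; the relevant paper describes how the experiments here handle that.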