Optimal policy evaluation using kernel-based temporal difference methods