On the Model-misspecification of Reinforcement Learning