The more I learn about RL, the more I realize no one has ever trained on-policy. You can never update the same model twice.
@redtachyon @hallerite (And even memoryless approaches are de facto using the environment as memory and thus not actually memoryless)