They had been doing that, and called it "shadow mode" [1]. I suspect it's no longer being done; perhaps they reached the limit of what they can learn from that sort of training.
When it's in 'real mode', any disengagement or intervention (i.e. using the accelerator pedal without disengaging) is logged by the car and sent to Tesla for data analysis, and this has been the case for a while. Of course, we don't know how much that data analysis actually feeds into FSD decision making, or which interventions they actually investigate.
[1] https://www.theverge.com/2016/10/19/13341194/tesla-autopilot...