Are you training new models from scratch or just fine-tuning LLMs? I'm from the CV side and we tend to train stuff from scratch because we're still highly focused on finding new architectures and figuring out how to scale them. The NLP people I know tend to use LLMs and existing checkpoints, so their experiments tend to be a lot cheaper.
Not that anyone should think any aspect (training or inference) is cheap.