HLE consists of very hard problems, where Mythos's larger pretraining probably matters quite a bit. I'm not saying Mythos shows no genuine improvement over, e.g., the latest Opus; just that if you're going to compare models, you should at least make sure the overall test-time workload is in the same ballpark, given how high it appears to be for Mythos.