Yes, exactly.
These are both fair points. I don't suppose there's much that can be done about the second one, except to try to discourage it where possible; the first, though, can perhaps be reduced by formalizing condensed "standard" encounter sets. E.g., a set of four or five widely differentiated encounters against which to compare the classes, either by actually running them through or by simply theorycrafting. (Essentially, the encounter series Psyren mentioned, standardized.)
This is true, as far as it goes, but the fallacy lies in trying to use the comparison for more than it is actually good for. Fighters are a fairly good stand-in for all kinds of mundane classes and monsters, but what about a comparison of Shadowcaster vs. Truenamer? Very few monsters use either of those subsystems, and not many NPCs do either, so a direct arena comparison is next to useless for determining their respective value in a party.