How SWE-bench changes expectations for coding assistants by grounding evaluation in real repositories and fixes.