What changed?
I’ve been using coding agents to write 100% of my code since Claude Code came out.
The discourse lately about a big shift in agent ability initially surprised me. I agree the agents are great, but I haven't seen a huge difference between Opus 4.5 and 4.6, or between Codex-5.2 and 5.3. The new models are incredible. The models just before them were quite incredible too.
I will admit, 4.6 and 5.3 seem to be just a bit smarter than their predecessors.
So is the change in the discourse that these models crossed a threshold for people?
Is the change that people had written coding agents off and came back to drastically better models?
Or is it something else?
I’d argue it’s a little bit of everything. The models are likely the majority of it, say 70% of the change, but the harnesses also play a meaningful role.
A few major improvements in the harnesses have made a huge quality of life difference in my day to day.
Specifically, auto compaction has made a massive difference in being able to rely on an agent to complete its work. It can’t be overstated: reliable auto compaction is a huge unlock for coding workflows.
Claude Code’s auto compaction initially was borked and I never used it. I would dutifully create a new session once the context window filled up.
Now, I have been working on greenfield projects for the past year. What about someone working on an old codebase?
Auto compaction is, in my view, the single biggest improvement outside of the model. I played around with making a fork of Zed in December, and something in the agent’s ability to work independently gave me confidence that I could actually do this. Auto compaction was a huge part of it.
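To make the idea concrete, here’s a minimal sketch of what an auto compaction loop might look like. This is purely illustrative, not how Claude Code or Codex actually implement it; the `summarize` helper is a hypothetical stand-in for a real LLM summarization call, and the token estimate is deliberately crude.

```python
def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Hypothetical stand-in for an LLM call that condenses the transcript.
    return "Summary of %d earlier messages." % len(messages)

def auto_compact(messages, limit=1000, keep_recent=4):
    """Once near the context limit, fold older messages into one summary
    message and keep only the most recent turns verbatim."""
    if estimate_tokens(messages) < limit:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent
```

The point of the sketch: the agent never has to stop and ask the human to restart a session, because the transcript quietly shrinks itself while recent context stays intact.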
The ability of a model to take an underspecified prompt and do something reasonable with it has been a huge change. Early on with Claude Code, I would give it a task that was relatively well scoped and expect it to still need assistance. Now I can give it larger tasks, expect it to make reasonable decisions and test the output. That switch to larger tasks makes it more impressive to folks just tuning in.
Finally, the models have been RLHF’d to ask questions and be more conversational. This tuning makes them feel fun to use! It’s like working with a real collaborator. Opus 4.6 shines particularly in its conversational style. Codex-5.3 shines in its speed.
I started to see the shift in conversational ability with Codex-5.1, and it’s only gotten better since then.
The tuning here to default to asking the user questions lowers the hurdle to getting good results from the first prompt. In a sense, the models train the human users to disambiguate. That makes a world of difference in helping a human user understand how to get the best results from the model.
A few other things I like:
- Claude Code’s use of color helps me grok ideas faster
- Codex’s app makes it seamless to start a bunch of ephemeral threads