On building trust
I spent the last 30 days vibe coding Swarm. The experience keeps bringing me back to one overriding question about AI coding: how do I build trust in the agent’s outputs?
Building trust in those outputs is critical to scaling how much work I can hand off to the agent.
If I can check out a PR and be sure it’s correct by running a few tests locally, that’s great.
If I can see the tests on the PR passed, that’s even better.
If I can see the agent ran a comprehensive integration test in its own sandbox, verifying the end-to-end flow still works after the changes, that’s the best.
I am a longtime Phabricator user, and one of my favorite aspects of the tool is the “Test Plan.” At Gem, we used the Test Plan section to write down how we tested the changes in the PRs we put up for review.
So far, I have not seen an AI coding tool that has a similar concept.
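As a rough sketch of what a first step could look like, here is a small check (purely hypothetical, not from Phabricator or any existing tool) that a CI step could run over an agent-authored PR description to insist on a non-empty Test Plan section before a human even opens the diff:

```python
import re
import sys


def has_test_plan(pr_body: str) -> bool:
    """Return True if the PR description contains a non-empty 'Test Plan' section."""
    # Look for a "Test Plan" heading, e.g. "## Test Plan" or "Test Plan:"
    match = re.search(r"^#*\s*Test Plan:?\s*$", pr_body, re.IGNORECASE | re.MULTILINE)
    if not match:
        return False
    # Require at least one non-blank line of content after the heading
    rest = pr_body[match.end():]
    return any(line.strip() for line in rest.splitlines())


if __name__ == "__main__":
    # Hypothetical usage: the CI step pipes the PR description in on stdin
    body = sys.stdin.read()
    if not has_test_plan(body):
        print("PR is missing a Test Plan describing how the change was verified.")
        sys.exit(1)
```

Of course, this only checks that a test plan exists; the harder and more interesting problem is getting the agent to produce one that is actually convincing.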
I want to know the generated code is trustworthy. Trustworthy means more than passing the linter, the unit tests, and the compiler. It means the change won’t break production and actually handles edge cases gracefully.
With convincing test plans, we’ll be able to treat agents as actual engineers rather than the very smart interns they are today.
Eventually, we will move beyond humans as the gatekeepers for trust, and agents will be able to establish trust with one another.