Building with AI Agents: What I've Learned So Far
Practical lessons from building AI agent-powered products at Axislabs — what works, what doesn't, and where this is all heading.
We’re in the early innings of AI agents actually being useful. Not the demo-ware “watch me book a flight” kind — I mean agents that do real work, reliably, in production.
At Axislabs, I’ve been building products that lean heavily on AI agents. Here’s what I’ve learned.
Start with the loop, not the model
The most common mistake I see is people starting with “let’s use GPT-4” or “let’s use Claude” and then figuring out the product. That’s backwards.
The model is a component. The loop is the product.
Start with the workflow loop:
- Input: What does the agent receive? A user request, a webhook, a scheduled trigger?
- Decision: What does it need to figure out? Which tool to use, what data to fetch, how to structure the response?
- Action: What does it do? Call an API, write to a database, generate content, send a notification?
- Verification: How do you know it worked? Validate the output, check against expected results, log everything.
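The four steps can be sketched as a single function. This is a minimal illustration, not a framework; `decide`, `act`, and `verify` are hypothetical stand-ins for whatever your product plugs into each step:

```typescript
// Sketch of the input → decision → action → verification loop.
// All names here are hypothetical placeholders.
type AgentInput = { kind: "user_request" | "webhook" | "schedule"; payload: string };
type Decision = { tool: string; args: Record<string, unknown> };
type Result = { ok: boolean; output: string };

async function runAgentLoop(
  input: AgentInput,
  decide: (input: AgentInput) => Promise<Decision>,
  act: (decision: Decision) => Promise<Result>,
  verify: (result: Result) => boolean,
): Promise<Result> {
  const decision = await decide(input); // Decision: which tool, what args
  const result = await act(decision);   // Action: actually do the work
  if (!verify(result)) {                // Verification: validate before returning
    return { ok: false, output: "verification failed; escalate to a human" };
  }
  return result;
}
```

Notice the model only appears inside `decide` (and maybe `act`): it is one component in the loop, which is the point.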
Agents need guardrails, not freedom
The fantasy of a fully autonomous agent that “figures it out” is appealing but wrong for production. Every successful agent system I’ve built has tight guardrails:
Without guardrails
- ✗ Unpredictable outputs
- ✗ Token costs spiral
- ✗ Silent failures
- ✗ Users lose trust
With guardrails
- ✓ Structured, validated output
- ✓ Budget controls and circuit breakers
- ✓ Graceful fallback to humans
- ✓ Consistent, reliable behaviour
The goal isn’t to limit the AI. It’s to make it predictable enough to trust.
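Two of those guardrails, budget controls and circuit breakers, are simple enough to sketch. This is an illustrative class, not code from a real system; the thresholds and names are made up:

```typescript
// Sketch of two guardrails: a token budget and a circuit breaker.
class Guardrails {
  private tokensUsed = 0;
  private consecutiveFailures = 0;

  constructor(
    private readonly tokenBudget: number,
    private readonly failureLimit: number,
  ) {}

  // Budget control: refuse further model calls once the budget is spent,
  // so token costs can't spiral.
  recordTokens(n: number): void {
    this.tokensUsed += n;
    if (this.tokensUsed > this.tokenBudget) {
      throw new Error("token budget exceeded; fall back to a human");
    }
  }

  // Circuit breaker: trip after repeated failures instead of failing silently.
  recordOutcome(ok: boolean): void {
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
    if (this.consecutiveFailures >= this.failureLimit) {
      throw new Error("circuit breaker tripped; escalate to a human");
    }
  }
}
```

The thrown errors are where "graceful fallback to humans" hooks in: the caller catches them and routes the task to a person instead of retrying forever.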
Tool use is where the magic happens
The real power of agents isn’t in generating text — it’s in using tools. A well-designed tool interface lets an agent interact with your systems:
const tools = [
  {
    name: "search_database",
    description: "Search the product database by query",
    parameters: {
      query: { type: "string" },
      limit: { type: "number", default: 10 },
    },
  },
  {
    name: "send_notification",
    description: "Send a notification to a user",
    parameters: {
      userId: { type: "string" },
      message: { type: "string" },
    },
  },
];
Keep tools simple, well-documented, and composable. The agent will figure out how to chain them together.
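On your side of the interface, each tool name maps to a real function, and you dispatch whatever call the agent requests. A rough sketch, using the two tools above; the handler bodies are hypothetical:

```typescript
// Hypothetical dispatch layer: route a tool call requested by the model
// to the function that actually implements it.
type ToolCall = { name: string; args: Record<string, unknown> };

const handlers: Record<string, (args: Record<string, unknown>) => Promise<unknown>> = {
  search_database: async (args) => [`result for "${String(args.query)}"`],
  send_notification: async (args) => ({ delivered: true, userId: args.userId }),
};

async function dispatch(call: ToolCall): Promise<unknown> {
  const handler = handlers[call.name];
  if (!handler) {
    throw new Error(`unknown tool: ${call.name}`); // never execute unlisted tools
  }
  return handler(call.args);
}
```

Chaining falls out of this for free: the agent calls `search_database`, reads the result, then calls `send_notification` with what it found. Your dispatch code stays the same.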
Evals are your test suite
You wouldn’t ship code without tests. Don’t ship agents without evals.
- Accuracy target: >95%
- P95 latency: <3s
- Cost per task: <$0.05
I run evaluation suites that test:
- Accuracy — does the agent produce the right output for known inputs?
- Robustness — does it handle edge cases and malformed input gracefully?
- Cost — how many tokens does it use on average?
- Latency — is it fast enough for the user experience?
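The accuracy check is the easiest to start with: run the agent over known input/output pairs and score it. A minimal sketch; `EvalCase` and `runAgent` are hypothetical, and real suites usually use graders rather than exact string matching:

```typescript
// Sketch of an accuracy eval: run the agent over known cases and score it.
type EvalCase = { input: string; expected: string };

async function runEvals(
  cases: EvalCase[],
  runAgent: (input: string) => Promise<string>,
): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const output = await runAgent(c.input);
    if (output === c.expected) passed++; // exact match; swap in a grader as needed
  }
  return passed / cases.length; // accuracy, to compare against your target
}
```

Run this in CI exactly like a test suite, and fail the build when accuracy drops below your target. Latency and cost can be measured in the same harness by timing each call and recording token usage.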
Where this is heading
We’re about 18 months away from agents being a boring, standard part of every SaaS product.
The companies building that muscle memory now — learning how to design agent loops, write good evals, and ship reliably — will have a massive advantage.
The best time to start building with agents was six months ago. The second best time is now.
If you’re building with AI agents and want to compare notes, find me on X.