We built our AI agent's prompt fast, but wrong

"I tried doing this using ChatGPT, it was okay until it didn't work."

That's what our customers tell us about trying generic AI for price quote creation. They tried, it sort of worked, then it fell apart.

We built Charlie to solve this. Charlie is our AI agent that creates price quotes for trade show booths. What used to take half a day now takes 15 minutes.

The difference? Charlie knows the industry. My co-founder Sugir spent 7 years in trade show booth sales. He knows how to read floor plans, identify elements on 3D mockups, and structure quotes line by line.

Generic Claude or ChatGPT doesn't know any of this. Charlie does. That specialized knowledge is what makes it work.

But here's what nobody tells you about building specialized AI agents: the way we built Charlie's instructions was fast to start but became a maintenance nightmare.

Let me show you what happened.

How we built Charlie's first prompt

When we started, I was still building our (now abandoned) project management tool. Sugir wanted to test if AI could actually do this quote creation thing.

So he just... started talking to Claude.

His process looked like this:

Start with the same inputs he'd use himself (PDF proposals with floor plans)
Ask Claude to create a price quote
Claude produces nonsense or something that needs fixing
Tell Claude what's wrong
Ask: "What should I add/remove/update in your instructions to fix this?"
Apply that suggestion
Start a fresh conversation
See if the result improved
Repeat

He did this hundreds of times over several weeks.

Eventually, he got something that worked. Not perfect, but good enough to make us think: okay, this could actually work.

This iterative approach got us to a working prototype in weeks, not months.

Why this approach worked for our MVP

Looking back, this "talk to the AI, fix the prompt, repeat" method had real advantages:

We moved fast. No formal process, no documentation overhead. Just Sugir, Claude, and a growing set of instructions.

The expert was in control. Sugir knew exactly what was wrong with each output. He could immediately spot when Charlie was making the same mistakes he'd seen junior salespeople make.

We got real results. The prototype worked well enough to show customers. They saw value immediately.

This is actually a valid approach when:

You have an expert who's comfortable working with AI
You're trying to prove a concept
Speed matters more than maintainability

We checked all those boxes. The iterative method got Charlie off the ground.

But here's where it gets expensive.

The maintenance wall

Six months in, we started hitting problems.

Customer A wants a modification. They calculate furniture pricing differently than our standard approach. No problem, we'll just adjust the prompt for them.

Except... the prompt is 50 pages of instructions. We need to find the furniture calculation logic, modify it, and make sure we don't break anything else.

We create a separate version: charlie-prompt-customer-a.txt

Customer B wants different output formatting. Another custom version: charlie-prompt-customer-b.txt

Now we have three versions of the same massive prompt.

Then we need to change something in the core logic. Something about how Charlie identifies booth dimensions on floor plans. We need to update this in all three versions.

We spend time just finding where the dimension logic lives in each prompt. Then we have to carefully apply the same update three times, making sure the customer-specific parts don't interfere.

Even AI coding tools struggle with this. We've tried using Cursor and Claude Code to edit these massive prompt files. They can't consistently find the right section or the right mention of a concept.

So for the same change, they'll modify each version in slightly different ways. One version gets the update in section A, another in section B. The files drift further apart with each change.

This is when we realized: we didn't build a prompt. We built a monolithic patchwork that's becoming very hard and time-consuming to maintain.

What makes it a "patchwork"

The iterative method produces instructions that look like this:

You are Charlie, an expert at creating price quotes...

When analyzing floor plans, look for...

[50 lines about floor plan analysis]

For furniture items, calculate the price by...

[30 lines about furniture]

Oh, and when you see curved walls, make sure to...

[20 lines added three weeks later]

Actually, for the furniture calculation, also consider...

[15 lines patching the earlier furniture section]

Every time we hit a problem, we added more instructions. Sometimes at the end. Sometimes scattered throughout. Sometimes contradicting earlier sections.

The result:

Difficult to read. Tough to find the specific instruction you need in 50 pages.
Difficult to maintain. The furniture logic is scattered throughout, mixed with instructions about how Charlie should behave.
Difficult to extend. Want to add new functionality? It's hard to find where it currently lives AND decide where new pieces should fit.
Very hard to customize. Customer variations mean duplicating the entire file.

We're drowning in our own instructions.

The real cost

Here's what this architectural debt actually costs us:

Risk: Each customer version increases the chance of bugs. We fix something in version A but forget to fix it in version B.

Velocity: New features are slower to ship because we have to think through three different prompt versions.

Team friction: Sugir knows where everything is because he built it. But onboarding anyone else to work on the prompts is brutal.

The iterative approach that got us to market is now slowing us down.

What we're doing about it

We're rebuilding Charlie's instructions as composable agents.

Instead of duplicating massive prompt files for each customer, we're separating:

Common logic (shared across all customers): How Charlie should behave, how the workflow to create a price quote works, the general structure and reasoning approach
Customer-specific rules (isolated per agency): Their markup coefficients, calculation preferences, pricing logic

Each agency gets their own Charlie, composed from shared knowledge + their business rules.

Here's why this matters beyond just maintenance:

Customers own their business logic.

As a default, we set the standard markup coefficient for electrical work at 1.7x. But one of our agencies uses 1.55x based on their supplier relationships.

Right now, that rule is buried somewhere in their 50-page prompt file. When they ask "How did Charlie calculate this?" we have to hunt through the patchwork.

With composable agents, their markup rules live in one clear place. We haven't opened visibility of these configuration files to customers yet, but this composability makes it much easier later on - if we want to - to expose the "library" of rules and knowledge they own.

This isn't just about making our lives easier. It's about making Charlie more trustworthy.

Generic Claude would guess at "reasonable markups." Charlie knows your agency's actual numbers.

We're implementing this now. Early results are promising.

I'll share more once we've proven this works at scale.

The lesson

If you're building a specialized AI agent, the iterative "talk to it and fix the prompt" approach will get you to a working prototype fast.

But know what you're signing up for: you're building technical debt from day one.

That might be fine. Getting to market and validating the idea is worth it. Just don't be surprised when you hit the scaling wall like we did.

The question isn't whether to iterate or not. It's: when do you stop iterating and start architecting?

For us, that moment was six months in, when customer variations became unmanageable.

Your mileage may vary. But if you're staring at a 50-page prompt file right now, wondering why simple changes take forever...

You're probably at that moment too.

Building specialized AI agents in production? I'm documenting our journey - the wins and the faceplants. Hit me up at will@willgyt.com if you want to compare notes.