"RAG-powered." The phrase appears on every agency website that touches AI. Most of the time it means: we have an OpenAI API key and we prompt it with the customer's name.
That's not RAG. That's interpolation. Real Retrieval-Augmented Generation has three required components, none of which most "RAG-powered" implementations include.
The three required components
- A real retrieval layer: a vector database (Pinecone, Weaviate, Supabase pgvector) populated with embeddings of relevant context. Not just the customer's name and company, but actual content: their historical conversations, brand voice samples, objection library, top-converter scripts, ICP playbook.
- Relevance scoring on retrieval: semantic similarity, recency weighting, and intent matching. Only the 3-5 highest-scoring chunks get passed to the model; everything else is dropped.
- Structured augmentation in the prompt: the retrieved context isn't just stuffed into the system prompt. It's structured (XML or markdown) with explicit roles ("here's the brand voice", "here's a similar past conversation", "here's the prospect's data") so the model can treat each kind of context differently.
Without all three, you have a fancy mail merge.
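As a rough illustration of the second component, here is a minimal Python sketch of relevance scoring that blends cosine similarity, a recency decay, and an intent match before keeping the top few chunks. The `Chunk` fields, the 90-day half-life, the 0.8/0.2 blend, and the intent bonus are all assumptions made for this example, not values from a production system.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    age_days: float
    intent: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(chunk, query_embedding, query_intent, half_life_days=90.0, intent_bonus=0.1):
    """Blend similarity, recency, and intent match into one number (weights are illustrative)."""
    similarity = cosine(chunk.embedding, query_embedding)
    recency = 0.5 ** (chunk.age_days / half_life_days)  # halves every half_life_days
    bonus = intent_bonus if chunk.intent == query_intent else 0.0
    return similarity * (0.8 + 0.2 * recency) + bonus


def top_k(chunks, query_embedding, query_intent, k=5):
    """Return the k highest-scoring chunks; everything else is dropped."""
    ranked = sorted(
        chunks,
        key=lambda c: score(c, query_embedding, query_intent),
        reverse=True,
    )
    return ranked[:k]
```

In practice the vector database usually computes the similarity part; the point of the sketch is that recency and intent adjust the ranking before the cut-off is applied.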
A real example: the Email Nurturer for Autobroker U
Here's the actual prompt structure we run for one client. Some specifics are redacted but the architecture is real.
When the Email Nurturer is about to write a follow-up email to a prospect named Sarah (SaaS founder, $18K/mo ad spend, recently raised pricing concerns), the system runs in this order:
- Retrieve top 3 most-similar past conversations from the embedding index. Filtered to: same prospect-type (SaaS founder), same stage (post-qualifier, pre-call), same objection family (pricing). Returns three real prior conversations where similar objections were addressed.
- Retrieve top 5 brand-voice samples — emails from the client's historical winners that match the tone and length we're aiming for.
- Retrieve relevant objection responses from the objection library. For "pricing concerns" specifically, we have 8 documented response patterns; the most-similar 2 are pulled.
- Retrieve the prospect's full conversation history from the CRM — every prior touch.
- Construct the augmented prompt with all retrieved context, structured by role.
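The retrieval steps above can be sketched roughly as follows. This is an in-memory stand-in for a vector database query with metadata filters; the `Record` type, the metadata keys, and the precomputed `similarity` field are hypothetical and exist only to show the filter-then-rank order.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    similarity: float  # assumed to be computed by the vector store for this query
    metadata: dict = field(default_factory=dict)


def retrieve(records, k, **filters):
    """Keep records matching every metadata filter, then return the k most similar."""
    matching = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    return sorted(matching, key=lambda r: r.similarity, reverse=True)[:k]


def gather_context(conversations, voice_samples, objection_patterns, crm_history):
    """Run the retrievals in order and group the results by role."""
    return {
        "similar_past_conversations": retrieve(
            conversations, 3,
            prospect_type="saas_founder",
            stage="post_qualifier",
            objection_family="pricing",
        ),
        "brand_voice_samples": retrieve(voice_samples, 5),
        "objection_response_patterns": retrieve(
            objection_patterns, 2, objection_family="pricing"
        ),
        # CRM history is fetched whole, not ranked: every prior touch is kept.
        "prospect_history": list(crm_history),
    }
```

Filtering before ranking matters: a highly similar conversation with the wrong prospect type or stage never competes for one of the three slots.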
What the structured prompt actually looks like
Approximate shape (simplified):
<role>
You are the Email Nurturer for [Client]. You write in [Client]'s brand voice, you address objections with empathy and specificity, you never use generic sales language.
</role>
<brand_voice_samples>
[3 actual emails from the client's historical winners]
</brand_voice_samples>
<similar_past_conversations>
[2-3 anonymized past conversations where the same objection was successfully addressed]
</similar_past_conversations>
<objection_response_patterns>
[Top 2 documented response patterns for "pricing concern" from the playbook]
</objection_response_patterns>
<prospect_context>
Name: Sarah
Company: [redacted SaaS]
Role: Founder
Ad spend: $18K/mo
Current ROAS: 1.4×
Stage: Post-qualifier, no call booked yet
Last interaction: 2 days ago, expressed pricing concern in qualifier conversation
Prior emails received: 3 (all opened)
Prior email engagement: high; replied to 2 of 3
</prospect_context>
<objective>
Send a follow-up email that:
1. Addresses the pricing concern directly without being defensive
2. Reframes pricing as investment-vs-cost
3. Provides one concrete next-step CTA (book call OR reply with a clarifying question)
4. Length: 80-120 words
5. Tone matches the brand voice samples above
</objective>
<output_format>
JSON with: subject, preheader, body, suggested_send_time_iso
</output_format>
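One way to assemble a prompt of this shape in code, and to check the JSON that comes back, is sketched below. The function names and argument layout are assumptions for illustration; the only fixed parts are the section tags and the four output keys shown above.

```python
import json
from xml.sax.saxutils import escape

# Output keys, in the order listed in <output_format>.
REQUIRED_KEYS = ("subject", "preheader", "body", "suggested_send_time_iso")


def section(tag, items):
    """Wrap items in a tag, escaping markup so retrieved text cannot close the tag early."""
    body = "\n\n".join(escape(str(item)) for item in items)
    return f"<{tag}>\n{body}\n</{tag}>"


def build_prompt(role, voice_samples, conversations, patterns, prospect, objective):
    """Assemble the role-structured prompt in the same section order as the example."""
    prospect_lines = "\n".join(f"{key}: {value}" for key, value in prospect.items())
    return "\n".join([
        section("role", [role]),
        section("brand_voice_samples", voice_samples),
        section("similar_past_conversations", conversations),
        section("objection_response_patterns", patterns),
        section("prospect_context", [prospect_lines]),
        section("objective", [objective]),
        section("output_format", ["JSON with: " + ", ".join(REQUIRED_KEYS)]),
    ])


def parse_email(raw):
    """Parse the model's JSON reply and fail loudly if a required key is absent."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"model output is missing: {missing}")
    return data
```

Escaping the retrieved text is a design choice in this sketch: a past email that happens to contain a closing tag would otherwise break the section boundaries the model relies on.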
Why this works
Three things happen when you structure the prompt this way:
- The model has actual examples of voice — not just an instruction to "match the brand voice." It can extract the patterns and apply them.
- The model has actual examples of addressing this specific objection — past conversations where the same problem was solved. It can adapt rather than invent.
- The model has structured prospect data, not a name dropped into a template. It can reason about why Sarah specifically might be raising pricing concerns and what would land.
The output is an email that reads like the client's best emails. Not "Hi {first_name}, I noticed you have a {objection_type}, let me address that." A real email that addresses Sarah's actual situation with the brand's actual voice.
The infrastructure to support this
For one client, the embedding index typically contains:
- 500-2,000 historical client emails (the winners)
- 50-200 sales-call transcripts
- 50-100 documented objection-response patterns
- 1,000-5,000 prior prospect conversations (anonymized)
- The full ICP playbook
- Brand voice style guide
That's 1,500-7,000 embedded chunks per client. Index size: typically 4-12 MB. Query latency: 80-200ms per retrieval. Cost: $0.001-$0.003 per agent decision.
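A quick back-of-the-envelope check on those numbers, as a sketch. The embedding dimensions used here (384 and 1,536) and the float32 storage are assumptions for illustration; actual index size depends on the embedding model, the storage precision, and any metadata or index overhead, none of which are counted here.

```python
# Item counts from the list above, as (low, high) ranges.
ITEM_RANGES = {
    "historical emails": (500, 2000),
    "call transcripts": (50, 200),
    "objection patterns": (50, 100),
    "prospect conversations": (1000, 5000),
}


def total_items(ranges):
    """Sum the low ends and the high ends of every range."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


def raw_vector_megabytes(num_chunks, dims, bytes_per_value=4):
    """Size of the raw vectors only (float32 by default), in decimal megabytes."""
    return num_chunks * dims * bytes_per_value / 1_000_000


low, high = total_items(ITEM_RANGES)
print(low, high)  # 1600 7300
for dims in (384, 1536):  # illustrative embedding dimensions (assumed)
    print(dims, raw_vector_megabytes(1500, dims), raw_vector_megabytes(7000, dims))
```

The listed ranges sum to 1,600-7,300 items before the playbook and style guide, in line with the rounded chunk count. Under these assumptions, raw vectors come to roughly 2.3-10.8 MB at 384 dimensions and roughly 9.2-43 MB at 1,536 dimensions, so the 4-12 MB figure is plausible for smaller embeddings or compressed storage.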
How most agencies skip this
The shortcut most "RAG-powered" agency implementations take is to skip retrieval entirely. They pass the prospect's name and a few CRM fields to a hardcoded prompt that says "write a follow-up email" and hope the model produces something usable.
The output is generic. The voice is wrong. The objection responses are vague. The customer can tell — and so can the prospect.
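For contrast, the shortcut amounts to something like the sketch below. The template wording and CRM field names are hypothetical; the point is that nothing is retrieved, ranked, or structured.

```python
PROMPT_TEMPLATE = (
    "Write a follow-up email to {first_name} at {company}. "
    "They mentioned a concern about {objection_type}."
)


def build_mail_merge_prompt(crm_row):
    """Fill a fixed template from CRM fields; no retrieval step is involved."""
    return PROMPT_TEMPLATE.format(**crm_row)
```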
The reason agencies skip retrieval is that it requires actual data engineering: indexing the historical content, maintaining embeddings, scoring relevance, structuring prompts. That's at least two weeks of setup per client. Most agencies don't have the technical capacity, so they pretend they do.
The customers who can tell the difference are the ones who care. The customers who can't tell the difference are the ones who churn faster, because the agent layer they thought they bought never existed.
What to ask any vendor claiming RAG
Three questions that filter signal from noise:
- "Show me one real agent conversation from a current client deployment. Walk me through what was retrieved, why, and how it was structured in the prompt."
- "What's in your embedding index for [example client]? How many chunks? What types?"
- "What's the latency from prospect input to agent decision? How much is retrieval vs. inference?"
If the vendor can answer all three with specifics, they're running real RAG. If they pivot to "we use the latest GPT" or "we have proprietary fine-tuning," they aren't.
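For the third question, a vendor running real retrieval should be able to produce a breakdown along these lines. This is a minimal timing sketch; `retrieve` and `generate` are hypothetical callables standing in for the vector query and the model call.

```python
import time


def timed(fn, *args):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000


def decide(prospect_input, retrieve, generate):
    """Run retrieval then inference, recording how long each stage takes."""
    context, retrieval_ms = timed(retrieve, prospect_input)
    decision, inference_ms = timed(generate, prospect_input, context)
    total = retrieval_ms + inference_ms
    return {
        "decision": decision,
        "retrieval_ms": retrieval_ms,
        "inference_ms": inference_ms,
        # Fraction of the measured time spent on retrieval.
        "retrieval_share": retrieval_ms / total if total else 0.0,
    }
```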