
Built a production-grade customer support agent that handles the long tail of refund, shipping, and order-status tickets with a supervisor-enforced policy layer and human-in-the-loop on any write action above threshold.
The Challenge
This mid-market e-commerce retailer faced severe operational bottlenecks from a high volume of repetitive support tickets: order tracking, return requests, and shipping-delay inquiries. These transactional tickets made up over 70% of the inbound support queue, tying up human agents and inflating support costs. Earlier attempts with keyword-matching FAQ chatbots failed because those systems could only return static text and had no access to live order or customer data. To automate this work, the business needed a secure agentic system that could read and modify customer records in Shopify and HubSpot. That access introduced serious risks of its own, including leakage of personally identifiable information (PII) and fraudulent or out-of-policy refunds executed without the standard retail policy controls.
Pain points we set out to solve
- 70%+ of inbound tickets were repetitive and deflectable
- Prior chatbot could only return static answers and could not act on internal systems
- No guardrails meant every AI refund was a compliance risk
- Agents burned hours copy-pasting order data between 4 tools
Objectives
- 01 Deflect at least 35% of inbound support volume within 90 days
- 02 Zero unauthorized writes - every refund must respect policy limits
- 03 Sub-5-second agent response time on 95% of tickets
- 04 Full audit trail for every action the agent takes
Approach
How we delivered: phased, with clear checkpoints and evidence at each step.
- Week 1-2
Discovery and data audit
Mapped ticket taxonomy from 6 months of Zendesk history, identified the 8 ticket types that covered 82% of volume, and defined policy guardrails with the Ops team.
- Week 3-5
Agent architecture
Designed a LangGraph supervisor graph with specialized sub-agents - an intent classifier, a read-only order-lookup agent, and a write-action agent - each with scoped tool permissions.
- Week 6-8
Tool integration and guardrails
Wired tools into Shopify, HubSpot, and the internal RMA service. Built a policy enforcement node that validates every proposed write against refund limits, return windows, and customer lifetime value.
- Week 9-10
HITL, evals and launch
Added human-in-the-loop approval for any refund above $150. Built an eval harness with 120 golden tickets, shipped behind a feature flag to 10% of traffic, then ramped.
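One common way to implement a percentage ramp like the 10% launch is deterministic bucketing on a stable identifier. The sketch below is an illustration under that assumption, not the project's actual feature-flag mechanism; `in_rollout`, the customer ID key, and the salt value are all hypothetical.

```python
import hashlib


def in_rollout(customer_id: str, percent: int, salt: str = "support-agent-v1") -> bool:
    """Return True if this customer falls inside the rollout percentage.

    Hashing a stable ID keeps each customer in the same bucket across
    tickets, so raising the percentage only ever adds customers.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent
```

Because the bucket depends only on the salt and the ID, a customer who is served by the agent at 10% is still served by it at 25%, which keeps the experience consistent during the ramp.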
The Solution
The solution is a multi-agent orchestration graph built on LangGraph, using a supervisor-worker pattern to coordinate specialized nodes. When a support ticket arrives, a lightweight intent-classifier agent parses the query and routes it to dedicated worker nodes, including a read-only Shopify agent and a CRM data-retrieval agent. To contain compliance and financial risk, a strict supervisor policy node intercepts every write operation proposed by the action agents. The policy node validates refund amounts, return windows, and customer lifetime value (CLV) against pre-configured rules implemented in code rather than in prompts. Refunds exceeding $150 are automatically paused and routed to a human-in-the-loop (HITL) approval queue. The entire workflow logs execution traces to LangSmith, giving compliance teams a complete, searchable audit trail.
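The routing step of a supervisor-worker flow can be sketched in plain Python. This is deliberately not LangGraph code: it uses only the standard library so it runs without dependencies, and every name in it (`TicketState`, `classify_intent`, `ROUTES`, the worker functions) is a hypothetical stand-in. The keyword rules replace what would be an LLM classifier in the real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TicketState:
    text: str
    intent: str = "unknown"
    trace: List[str] = field(default_factory=list)  # nodes visited, in order


def classify_intent(state: TicketState) -> TicketState:
    # Stand-in for the LLM intent classifier: simple keyword rules.
    lowered = state.text.lower()
    if "refund" in lowered:
        state.intent = "refund"
    elif "where is" in lowered or "tracking" in lowered:
        state.intent = "order_status"
    state.trace.append("classify_intent")
    return state


def order_lookup(state: TicketState) -> TicketState:
    state.trace.append("order_lookup")  # read-only worker
    return state


def propose_refund(state: TicketState) -> TicketState:
    state.trace.append("propose_refund")  # proposes a write, does not execute it
    return state


def escalate_to_human(state: TicketState) -> TicketState:
    state.trace.append("escalate_to_human")  # fallback for unrecognized intents
    return state


ROUTES: Dict[str, Callable[[TicketState], TicketState]] = {
    "order_status": order_lookup,
    "refund": propose_refund,
}


def supervisor(state: TicketState) -> TicketState:
    """Classify the ticket, then hand it to the matching worker node."""
    state = classify_intent(state)
    worker = ROUTES.get(state.intent, escalate_to_human)
    return worker(state)
```

Keeping the route table as data, with a human fallback for anything unrecognized, means an unknown intent can never reach a write-capable worker by accident.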
Supervisor-enforced policy layer
No write action executes without clearing configurable refund limits, return-window rules, and CLV-based escalation thresholds.
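A policy check of this kind might look like the sketch below. Only the $150 threshold comes from the case study; the 30-day return window, the CLV threshold, the choice to escalate high-CLV customers, and the choice to deny rather than escalate out-of-window requests are assumptions made for illustration, as are all class and function names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class RefundPolicy:
    hitl_threshold: float = 150.0             # from the case study
    return_window_days: int = 30              # assumed value
    clv_escalation_threshold: float = 5000.0  # assumed value


@dataclass(frozen=True)
class RefundRequest:
    amount: float
    order_total: float
    delivered_on: date
    customer_clv: float


def evaluate_refund(req: RefundRequest, policy: RefundPolicy, today: date) -> Decision:
    """Deterministic checks, run in code before any write is attempted."""
    # A refund can never be non-positive or exceed what was paid.
    if req.amount <= 0 or req.amount > req.order_total:
        return Decision.DENY
    # Outside the return window: denied here (an assumed rule).
    if today - req.delivered_on > timedelta(days=policy.return_window_days):
        return Decision.DENY
    # Refunds above the threshold go to a human reviewer.
    if req.amount > policy.hitl_threshold:
        return Decision.ESCALATE
    # High-value customers get human attention (an assumed rule).
    if req.customer_clv >= policy.clv_escalation_threshold:
        return Decision.ESCALATE
    return Decision.APPROVE
```

Since the policy is a frozen dataclass, limits can be changed through configuration without touching prompts, and each rule can be unit-tested in isolation.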
Scoped tool permissions per agent
The read agent can only query. The write agent can only act after supervisor approval. No single agent holds both capabilities.
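As a rough illustration of scoped permissions, the sketch below routes every tool call through one gate that checks the calling agent's allow-list and requires an approval flag for write tools. The tool functions, agent names, and `call_tool` helper are hypothetical and stand in for the real Shopify and HubSpot integrations.

```python
from typing import Any, Callable, Dict, FrozenSet


class ToolPermissionError(PermissionError):
    """Raised when an agent calls a tool outside its scope."""


# Hypothetical tools standing in for real integrations.
def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def issue_refund(order_id: str, amount: float) -> dict:
    return {"order_id": order_id, "refunded": amount}


TOOLS: Dict[str, Callable[..., dict]] = {
    "get_order": get_order,
    "issue_refund": issue_refund,
}
WRITE_TOOLS: FrozenSet[str] = frozenset({"issue_refund"})

# Each agent gets an explicit allow-list; no agent holds both read and write tools.
AGENT_SCOPES: Dict[str, FrozenSet[str]] = {
    "read_agent": frozenset({"get_order"}),
    "write_agent": frozenset({"issue_refund"}),
}


def call_tool(agent: str, tool_name: str, *, approved: bool = False, **kwargs: Any) -> dict:
    """Single choke point through which every tool call must pass."""
    if tool_name not in AGENT_SCOPES.get(agent, frozenset()):
        raise ToolPermissionError(f"{agent} may not call {tool_name}")
    if tool_name in WRITE_TOOLS and not approved:
        raise ToolPermissionError(f"{tool_name} requires supervisor approval")
    return TOOLS[tool_name](**kwargs)
```

In this arrangement the `approved` flag would be set only by the supervisor after the policy check passes, never by the write agent itself.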
Human-in-the-loop on high-risk actions
Refunds over $150, account merges, and address changes route to a human queue with full context pre-loaded.
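The routing rule described here could be expressed roughly as follows. The action kinds and the $150 threshold follow the text above; the in-memory `deque`, the `ProposedAction` shape, and the returned status strings are assumptions, and a production system would use a durable queue instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict

ALWAYS_REVIEW = frozenset({"account_merge", "address_change"})
REFUND_REVIEW_THRESHOLD = 150.0


@dataclass
class ProposedAction:
    kind: str
    amount: float = 0.0
    # Context the reviewer needs, attached up front so no lookups are required.
    context: Dict[str, Any] = field(default_factory=dict)


def needs_human(action: ProposedAction) -> bool:
    if action.kind in ALWAYS_REVIEW:
        return True
    return action.kind == "refund" and action.amount > REFUND_REVIEW_THRESHOLD


def dispatch(action: ProposedAction, queue: Deque[ProposedAction]) -> str:
    """Queue high-risk actions for review; let the rest proceed automatically."""
    if needs_human(action):
        queue.append(action)
        return "queued_for_review"
    return "auto_execute"
```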
Full tracing and audit
LangSmith traces every node run, and action logs mirror to the warehouse so compliance can reconstruct any decision.
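For the warehouse-side action log, one simple shape is an append-only stream of JSON lines, one record per decision. The sketch below assumes that format; it does not use or represent the LangSmith API, and the field names and `write_audit_record` helper are hypothetical.

```python
import io
import json
from datetime import datetime, timezone
from typing import Any, Dict, TextIO


def write_audit_record(
    sink: TextIO,
    *,
    ticket_id: str,
    node: str,
    decision: str,
    payload: Dict[str, Any],
) -> Dict[str, Any]:
    """Append one JSON line describing a single node decision."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "ticket_id": ticket_id,
        "node": node,
        "decision": decision,
        "payload": payload,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Usage: an in-memory sink stands in for a file or warehouse loader.
buffer = io.StringIO()
write_audit_record(buffer, ticket_id="T1", node="policy_check",
                   decision="escalate", payload={"amount": 200.0})
write_audit_record(buffer, ticket_id="T1", node="hitl_queue",
                   decision="queued", payload={"reviewer": None})
```

One record per line makes the log easy to load in bulk and to filter by `ticket_id` when reconstructing a decision.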
Technology stack
Picked for latency, cost, and long-term maintainability, not for novelty.
AI / Agent
Tools & Integrations
Observability
Infra
Results
Business impact
The support team shifted from clearing a queue to handling edge cases and VIP customers. CSAT held steady at 4.6/5, and the retailer redirected two FTEs from ticket triage to proactive retention work.
Key takeaways
- Supervisor graphs beat monolithic agents for anything that touches production systems
- Policy belongs in code, not in the prompt - LLMs will confidently violate soft constraints
- An eval harness catches regressions that human spot-checks miss, especially on long-tail tickets
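A golden-ticket eval harness can be quite small. The sketch below assumes each golden ticket pairs an input with one expected action and that the agent can be called as a function from text to action name; the real harness and its 120 tickets are not shown in this case study, so `GoldenTicket`, `EvalReport`, and `run_evals` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class GoldenTicket:
    ticket_id: str
    text: str
    expected_action: str


@dataclass
class EvalReport:
    total: int
    passed: int
    failures: List[str]  # IDs of tickets whose action did not match

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def run_evals(agent: Callable[[str], str], tickets: Iterable[GoldenTicket]) -> EvalReport:
    """Run the agent on every golden ticket and compare against the expected action."""
    total = 0
    passed = 0
    failures: List[str] = []
    for ticket in tickets:
        total += 1
        if agent(ticket.text) == ticket.expected_action:
            passed += 1
        else:
            failures.append(ticket.ticket_id)
    return EvalReport(total=total, passed=passed, failures=failures)
```

Running this on every prompt or graph change and failing the build when `pass_rate` drops below an agreed bar is one way to catch long-tail regressions before they reach customers.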