Built a stateful multi-agent courtroom architecture with a moderator node scheduling parallel advocate/critic perspectives. Implemented a self-correcting RAG sub-graph combining Pinecone/BM25 search, Jina Reranking, and Self-RAG hallucination auditing.
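The moderator's fan-out can be pictured with a minimal stdlib sketch. It assumes each perspective is an independent async call and uses `asyncio.gather` in place of LangGraph's own parallel branches. `CaseState`, `perspective_agent`, and `moderator` are hypothetical names, and the agent returns a canned string instead of calling a model.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CaseState:
    """Hypothetical shared state passed between courtroom nodes."""
    question: str
    arguments: dict[str, str] = field(default_factory=dict)


async def perspective_agent(role: str, question: str) -> str:
    """Stand-in for an LLM call; a real node would prompt a model with its role."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{role}] position on: {question}"


async def moderator(state: CaseState, roles: list[str]) -> CaseState:
    """Schedule one task per perspective and await them concurrently."""
    results = await asyncio.gather(
        *(perspective_agent(role, state.question) for role in roles)
    )
    # gather preserves input order, so zip pairs each role with its own output
    state.arguments.update(zip(roles, results))
    return state


if __name__ == "__main__":
    final = asyncio.run(
        moderator(CaseState("Is the contract enforceable?"), ["advocate", "critic"])
    )
    for role, text in final.arguments.items():
        print(role, "->", text)
```

Because the advocate and critic calls overlap, wall-clock time for a round is bounded by the slowest perspective rather than the sum of all of them.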
RabbitHole
Messy public debates and complex legal scenarios resist simple, consensus-driven answers, and LLMs steered only by prompts frequently overrun their token budgets and drift from their assigned perspectives.

Hierarchical LangGraph orchestration featuring State-Based Schema Constraints to restrict token usage, a Corrective RAG (CRAG) fallback using Jina Web Search, and dual-tier model routing (Llama-3.3-70B and smaller models).
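A minimal sketch of what a state-based schema constraint could look like, assuming the state stores each argument as a typed record that rejects unknown roles and over-budget claims at construction time. `Argument`, `Role`, `MAX_ARGUMENT_TOKENS`, and the whitespace-based `approx_tokens` are hypothetical; a real budget would be counted with the serving model's tokenizer.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical role set and per-argument budget; not taken from the project.
Role = Literal["advocate", "critic"]
MAX_ARGUMENT_TOKENS = 120


def approx_tokens(text: str) -> int:
    """Crude whitespace proxy for token count (assumption, not a tokenizer)."""
    return len(text.split())


@dataclass(frozen=True)
class Argument:
    role: Role
    claim: str
    evidence_ids: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Perspective compliance: only declared roles may write to the state.
        if self.role not in get_args(Role):
            raise ValueError(f"unknown role: {self.role!r}")
        # Token discipline: oversized claims are rejected, not silently kept.
        if approx_tokens(self.claim) > MAX_ARGUMENT_TOKENS:
            raise ValueError(f"claim exceeds {MAX_ARGUMENT_TOKENS}-token budget")
```

Validating in the state object rather than in the prompt means an out-of-role or oversized output fails loudly at the node boundary, where the graph could retry or truncate it.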
Cut execution latency (MTTV) by 51% (from 19.8 s to 9.8 s) while eliminating token overruns. Delivered a modular, inspectable Python engine alongside a Next.js/React debug interface.
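The dual-tier model routing can be sketched as a function that reserves the large model for reasoning-heavy steps while its daily allowance lasts and sends everything else to a smaller model. This is an assumption-laden illustration: the task names, the client-side `TokenBudget`, and the model labels are hypothetical (they are not Groq API model IDs), and it makes no claim about reproducing the latency figures.

```python
from dataclasses import dataclass

LARGE_MODEL = "Llama-3.3-70B"  # tier named in the text; label only, not an API ID
SMALL_MODEL = "small-model"    # hypothetical placeholder for the smaller tier

HEAVY_TASKS = frozenset({"verdict", "cross_examination"})  # hypothetical task names


@dataclass
class TokenBudget:
    """Hypothetical client-side tracker for the large tier's daily allowance."""
    daily_limit: int
    used: int = 0

    def remaining(self) -> int:
        return self.daily_limit - self.used


def route(task: str, est_tokens: int, budget: TokenBudget) -> str:
    """Pick a model tier; only large-tier calls are charged to the budget."""
    if task in HEAVY_TASKS and budget.remaining() >= est_tokens:
        budget.used += est_tokens
        return LARGE_MODEL
    # Light tasks, or heavy tasks once the large tier is exhausted, degrade gracefully.
    return SMALL_MODEL
```

Charging only large-tier calls against the budget assumes rate limits are tracked per model; if a provider pools them, the tracker would need to count both tiers.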
What to inspect.
Strict State-Based Schema Constraints resolving prompt-based perspective compliance issues.
Self-RAG hallucination-auditor loop using Jina Rerank, with a corrective web-search fallback.
Dynamic model routing and moderator partitioning reducing daily token consumption under Groq rate limits.
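The corrective retrieval step can be sketched with plain callables standing in for the Pinecone, BM25, Jina Rerank, grader, and Jina Web Search calls; none of those clients is invoked here. The source does not say how dense and BM25 results are merged, so this sketch swaps in reciprocal rank fusion (with the conventional k = 60) as one common choice. `corrective_retrieve`, `is_grounded`, and `top_n` are hypothetical names, and the sketch performs a single corrective pass rather than the full auditor loop.

```python
from typing import Callable, Sequence

Retriever = Callable[[str], list[str]]


def reciprocal_rank_fusion(rankings: Sequence[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists; each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def corrective_retrieve(
    query: str,
    dense: Retriever,                                # stand-in for vector search
    sparse: Retriever,                               # stand-in for BM25
    rerank: Callable[[str, list[str]], list[str]],   # stand-in for a reranker
    is_grounded: Callable[[str, list[str]], bool],   # stand-in for the auditor's check
    web_search: Retriever,                           # stand-in for web-search fallback
    top_n: int = 3,
) -> tuple[list[str], str]:
    """Hybrid search, rerank, grade; fall back to web search if grading fails."""
    fused = reciprocal_rank_fusion([dense(query), sparse(query)])
    context = rerank(query, fused)[:top_n]
    if is_grounded(query, context):
        return context, "index"
    return web_search(query)[:top_n], "web"
```

Injecting the retrievers as callables keeps the control flow testable offline: a grader that always returns `False` exercises the web fallback without any network access.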