Overview
When we built Loquent, our AI voice platform for healthcare, we had to make dozens of architectural decisions. This post walks through our tech stack, why we chose each piece, and what we'd do differently.
This is written for technical founders and engineers evaluating similar builds.
The Stack at a Glance
| Layer | Technology | Why |
|-------|-----------|-----|
| Backend API | NestJS (Node.js) | Modular architecture, TypeScript-native, great for complex business logic |
| Database | PostgreSQL + Prisma ORM | Relational data with type-safe queries, excellent migration tooling |
| Voice/Telephony | Twilio | Programmable voice with WebSocket streaming, proven reliability |
| AI/LLM | OpenAI GPT-4 + Anthropic Claude | Best-in-class language understanding, used for different tasks |
| Frontend | Next.js + React | Dashboard for clinic staff, SSR for performance |
| Infrastructure | AWS (EC2, RDS, S3) | Full control, HIPAA-eligible services |
| Payments | Stripe | Subscription billing for clinic customers |
| Deployment | PM2 + Vercel | API on EC2 with PM2, dashboard on Vercel |
Backend: NestJS
We chose NestJS for the backend API. The main reasons:
Module system: NestJS's module architecture maps perfectly to our domain — calls, agents, sessions, billing, webhooks, integrations. Each module owns its logic cleanly.
TypeScript-native: Full type safety from database to API response. Prisma generates types, NestJS validates DTOs, and the whole chain is type-checked.
Decorator-based routing: Guards, interceptors, and pipes give us clean middleware patterns for authentication, rate limiting, and request validation.
Testing: Built-in testing utilities make unit and integration testing straightforward.
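To make the module-per-domain idea concrete, here's a sketch of what such a layout can look like (directory and file names are illustrative, not our actual tree):

```
src/
  calls/                 # call lifecycle: webhooks, session state
    calls.module.ts
    calls.controller.ts
    calls.service.ts
  agents/                # per-clinic agent configuration
  sessions/              # live call sessions and transcripts
  billing/               # Stripe subscriptions and usage
  webhooks/              # inbound Twilio/Stripe webhooks
  integrations/          # scheduling APIs, SMS
```

Each module bundles its controller, service, and providers, and only exports what other modules are allowed to depend on.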
Database: PostgreSQL + Prisma
PostgreSQL handles our relational data — users, organizations, agents, call sessions, billing records. Prisma provides:
- Type-safe queries: the generated client catches query mistakes at compile time instead of at runtime
- Schema-as-code: Prisma schema is the source of truth
- Migrations: Predictable, reviewable database changes
- Relations: Clean handling of complex relationships (Organization → Agents → Sessions → Transcripts)
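As a sketch of how that relationship chain looks in a Prisma schema (model and field names are illustrative):

```prisma
model Organization {
  id     String  @id @default(cuid())
  name   String
  agents Agent[]
}

model Agent {
  id             String        @id @default(cuid())
  organizationId String
  organization   Organization  @relation(fields: [organizationId], references: [id])
  sessions       CallSession[]
}

model CallSession {
  id          String       @id @default(cuid())
  agentId     String
  agent       Agent        @relation(fields: [agentId], references: [id])
  transcripts Transcript[]
}

model Transcript {
  id        String      @id @default(cuid())
  sessionId String
  session   CallSession @relation(fields: [sessionId], references: [id])
  text      String
}
```

From this single file Prisma generates both the migrations and the typed client, so a query like `organization.agents` is checked against the schema.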
Voice: Twilio
Twilio's programmable voice handles telephony — receiving calls, streaming audio, and managing call state. The critical feature is WebSocket media streaming, which lets us process audio in real-time:
1. Call comes in → Twilio routes to our webhook
2. We establish a WebSocket connection for audio streaming
3. Audio flows to speech-to-text in real-time
4. LLM processes the transcript and generates a response
5. Text-to-speech converts the response to audio
6. Audio streams back to the caller via WebSocket
The entire loop happens in under 2 seconds — fast enough to feel like a natural conversation.
AI Layer
We use multiple AI models for different tasks:
Conversation handling: GPT-4 and Claude for understanding patient intent, maintaining conversation context, and generating natural responses. We use system prompts tuned for healthcare conversations.
Intent classification: Lightweight models to quickly classify what the caller wants (book appointment, reschedule, ask a question, speak to human).
Action execution: Custom logic that takes the AI's decision and executes it — calling scheduling APIs, sending SMS confirmations, updating records.
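The glue between classification and execution is a small dispatcher: the classifier emits an intent, and each intent maps to one concrete action. A sketch under assumed names (our real intents and actions differ):

```typescript
// The intents our classifier can emit (illustrative set).
type Intent = "book" | "reschedule" | "question" | "human";

interface Actions {
  bookAppointment(): string;
  reschedule(): string;
  answerQuestion(): string;
  transferToHuman(): string;
}

// Map a classified intent to a concrete action. Returning the result
// keeps the dispatcher trivial to unit-test with stubbed actions.
function dispatch(intent: Intent, actions: Actions): string {
  switch (intent) {
    case "book":
      return actions.bookAppointment();
    case "reschedule":
      return actions.reschedule();
    case "question":
      return actions.answerQuestion();
    case "human":
      return actions.transferToHuman();
  }
}
```

Keeping the actions behind an interface also means the scheduling API, SMS sender, and record updates can each be stubbed out in tests.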
RAG (Retrieval-Augmented Generation): For clinic-specific information (services, doctors, hours, preparation instructions), we use vector search to provide relevant context to the LLM.
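At its core, the retrieval step ranks pre-embedded clinic documents by similarity to the embedded query and hands the top matches to the LLM as context. A self-contained sketch using cosine similarity (in production a vector database does this, but the math is the same):

```typescript
interface Chunk {
  text: string;       // e.g. "Dr. Lee sees patients Mon–Thu"
  embedding: number[]; // produced by an embedding model at index time
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding; their text
// is then prepended to the LLM prompt as grounding context.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

The important design point is that the LLM only ever sees retrieved text, so clinic-specific answers stay grounded in the clinic's own documents.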
Frontend: Next.js
The dashboard where clinic staff manage their AI agents is built with Next.js:
- Server-side rendering for fast initial loads
- Real-time updates for active call monitoring
- Responsive design — staff use it on desktop and mobile
Infrastructure Decisions
Why AWS over serverless? Voice processing needs consistent, low-latency compute. Serverless cold starts would add unacceptable delay to voice conversations. EC2 gives us predictable performance.
Why PM2? Simple process management for the NestJS API. Handles restarts, clustering, and log management without the complexity of Kubernetes (which would be overkill at our scale).
Why Vercel for the frontend? Next.js deploys perfectly on Vercel. Automatic previews, edge caching, and zero-config deployment. No reason to self-host the dashboard.
What We'd Do Differently
Start with better observability: We should have added structured logging and call tracing from day one. Debugging voice AI issues across the speech-to-text → LLM → text-to-speech chain is painful without good observability.
Abstract the LLM layer earlier: We initially hardcoded OpenAI calls. When we added Claude support, we had to refactor. An LLM abstraction layer from the start would have saved time.
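The abstraction we wished we'd had is a thin provider interface that call sites depend on, so adding a model means writing one new class instead of touching every call site. A minimal sketch (interface and method names are illustrative):

```typescript
// A minimal provider interface — names are illustrative, not our actual code.
interface LlmProvider {
  complete(system: string, user: string): Promise<string>;
}

class OpenAiProvider implements LlmProvider {
  async complete(system: string, user: string): Promise<string> {
    // Call the OpenAI chat completions API here.
    throw new Error("not implemented in this sketch");
  }
}

// Call sites only see the interface, so swapping GPT-4 for Claude
// (or A/B testing both) never touches conversation logic.
async function answerCaller(llm: LlmProvider, transcript: string): Promise<string> {
  return llm.complete("You are a clinic receptionist.", transcript);
}
```

A stub implementation of the same interface also makes conversation logic testable without network calls.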
Load testing: Voice calls are inherently concurrent. We should have stress-tested concurrent call handling earlier in development.
Key Takeaways
- NestJS + Prisma + PostgreSQL is a solid foundation for complex AI applications
- Twilio's WebSocket streaming is essential for real-time voice AI
- Use multiple AI models for different tasks — don't force one model to do everything
- Invest in observability early — AI systems are harder to debug than traditional software
- Ship fast, iterate — our first version was far from perfect, but it was in production handling real calls within 8 weeks
If you're building something similar and want to talk architecture, reach out. We're happy to share what we've learned.