Overview
When we built Loquent, our AI voice platform for healthcare, we had to make dozens of architectural decisions. This post walks through our tech stack, why we chose each piece, and what we'd do differently.
This is written for technical founders and engineers evaluating similar builds.
The Stack at a Glance
| Layer | Technology | Why |
|-------|-----------|-----|
| Backend API | NestJS (Node.js) | Modular architecture, TypeScript-native, great for complex business logic |
| Database | PostgreSQL + Prisma ORM | Relational data with type-safe queries, excellent migration tooling |
| Voice/Telephony | Twilio | Programmable voice with WebSocket streaming, proven reliability |
| AI/LLM | OpenAI GPT-4 + Anthropic Claude | Best-in-class language understanding, used for different tasks |
| Frontend | Next.js + React | Dashboard for clinic staff, SSR for performance |
| Infrastructure | AWS (EC2, RDS, S3) | Full control, HIPAA-eligible services |
| Payments | Stripe | Subscription billing for clinic customers |
| Deployment | PM2 + Vercel | API on EC2 with PM2, dashboard on Vercel |
Backend: NestJS
We chose NestJS for the backend API. The main reasons:
Module system: NestJS's module architecture maps perfectly to our domain — calls, agents, sessions, billing, webhooks, integrations. Each module owns its logic cleanly.
TypeScript-native: Full type safety from database to API response. Prisma generates types, NestJS validates DTOs, and the whole chain is type-checked.
Decorator-based routing: Guards, interceptors, and pipes give us clean middleware patterns for authentication, rate limiting, and request validation.
Testing: Built-in testing utilities make unit and integration testing straightforward.
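To make the module-per-domain idea concrete, here's a sketch of what such a layout can look like (directory and file names are illustrative, not our actual tree):

```
src/
  calls/                 # call lifecycle: webhooks, session state
    calls.module.ts
    calls.controller.ts
    calls.service.ts
  agents/                # per-clinic agent configuration
  sessions/              # live call sessions and transcripts
  billing/               # Stripe subscriptions and usage
  webhooks/              # inbound Twilio/Stripe webhooks
  integrations/          # scheduling APIs, SMS
```

Each module bundles its controller, service, and providers, and only exports what other modules are allowed to depend on.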
Database: PostgreSQL + Prisma
PostgreSQL handles our relational data — users, organizations, agents, call sessions, billing records. Prisma provides:
- Type-safe queries: the generated client catches query mistakes at compile time instead of at runtime
- Schema-as-code: Prisma schema is the source of truth
- Migrations: Predictable, reviewable database changes
- Relations: Clean handling of complex relationships (Organization → Agents → Sessions → Transcripts)
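As a sketch of how that relationship chain looks in a Prisma schema (model and field names are illustrative):

```prisma
model Organization {
  id     String  @id @default(cuid())
  name   String
  agents Agent[]
}

model Agent {
  id             String        @id @default(cuid())
  organizationId String
  organization   Organization  @relation(fields: [organizationId], references: [id])
  sessions       CallSession[]
}

model CallSession {
  id          String       @id @default(cuid())
  agentId     String
  agent       Agent        @relation(fields: [agentId], references: [id])
  transcripts Transcript[]
}

model Transcript {
  id        String      @id @default(cuid())
  sessionId String
  session   CallSession @relation(fields: [sessionId], references: [id])
  text      String
}
```

From this single file Prisma generates both the migrations and the typed client, so a query like `organization.agents` is checked against the schema.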
Voice: Twilio
Twilio's programmable voice handles telephony — receiving calls, streaming audio, and managing call state. The critical feature is WebSocket media streaming, which lets us process audio in real-time:
1. Call comes in → Twilio routes to our webhook
2. We establish a WebSocket connection for audio streaming
3. Audio flows to speech-to-text in real-time
4. LLM processes the transcript and generates a response
5. Text-to-speech converts the response to audio
6. Audio streams back to the caller via WebSocket
The entire loop happens in under 2 seconds — fast enough to feel like a natural conversation.
AI Layer
We use multiple AI models for different tasks:
Conversation handling: GPT-4 and Claude for understanding patient intent, maintaining conversation context, and generating natural responses. We use system prompts tuned for healthcare conversations.
Intent classification: Lightweight models to quickly classify what the caller wants (book appointment, reschedule, ask a question, speak to human).
Action execution: Custom logic that takes the AI's decision and executes it — calling scheduling APIs, sending SMS confirmations, updating records.
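The glue between classification and execution is a small dispatcher: the classifier emits an intent, and each intent maps to one concrete action. A sketch under assumed names (our real intents and actions differ):

```typescript
// The intents our classifier can emit (illustrative set).
type Intent = "book" | "reschedule" | "question" | "human";

interface Actions {
  bookAppointment(): string;
  reschedule(): string;
  answerQuestion(): string;
  transferToHuman(): string;
}

// Map a classified intent to a concrete action. Returning the result
// keeps the dispatcher trivial to unit-test with stubbed actions.
function dispatch(intent: Intent, actions: Actions): string {
  switch (intent) {
    case "book":
      return actions.bookAppointment();
    case "reschedule":
      return actions.reschedule();
    case "question":
      return actions.answerQuestion();
    case "human":
      return actions.transferToHuman();
  }
}
```

Keeping the actions behind an interface also means the scheduling API, SMS sender, and record updates can each be stubbed out in tests.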
RAG (Retrieval-Augmented Generation): For clinic-specific information (services, doctors, hours, preparation instructions), we use vector search to provide relevant context to the LLM.
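At its core, the retrieval step ranks pre-embedded clinic documents by similarity to the embedded query and hands the top matches to the LLM as context. A self-contained sketch using cosine similarity (in production a vector database does this, but the math is the same):

```typescript
interface Chunk {
  text: string;       // e.g. "Dr. Lee sees patients Mon–Thu"
  embedding: number[]; // produced by an embedding model at index time
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding; their text
// is then prepended to the LLM prompt as grounding context.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

The important design point is that the LLM only ever sees retrieved text, so clinic-specific answers stay grounded in the clinic's own documents.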
Frontend: Next.js
The dashboard where clinic staff manage their AI agents is built with Next.js:
- Server-side rendering for fast initial loads
- Real-time updates for active call monitoring
- Responsive design — staff use it on desktop and mobile
Infrastructure Decisions
Why AWS over serverless? Voice processing needs consistent, low-latency compute. Serverless cold starts would add unacceptable delay to voice conversations. EC2 gives us predictable performance.
Why PM2? Simple process management for the NestJS API. Handles restarts, clustering, and log management without the complexity of Kubernetes (which would be overkill at our scale).
Why Vercel for the frontend? Next.js deploys perfectly on Vercel. Automatic previews, edge caching, and zero-config deployment. No reason to self-host the dashboard.
What We'd Do Differently
Start with better observability: We should have added structured logging and call tracing from day one. Debugging voice AI issues across the speech-to-text → LLM → text-to-speech chain is painful without good observability.
Abstract the LLM layer earlier: We initially hardcoded OpenAI calls. When we added Claude support, we had to refactor. An LLM abstraction layer from the start would have saved time.
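The abstraction we wished we'd had is a thin provider interface that call sites depend on, so adding a model means writing one new class instead of touching every call site. A minimal sketch (interface and method names are illustrative):

```typescript
// A minimal provider interface — names are illustrative, not our actual code.
interface LlmProvider {
  complete(system: string, user: string): Promise<string>;
}

class OpenAiProvider implements LlmProvider {
  async complete(system: string, user: string): Promise<string> {
    // Call the OpenAI chat completions API here.
    throw new Error("not implemented in this sketch");
  }
}

// Call sites only see the interface, so swapping GPT-4 for Claude
// (or A/B testing both) never touches conversation logic.
async function answerCaller(llm: LlmProvider, transcript: string): Promise<string> {
  return llm.complete("You are a clinic receptionist.", transcript);
}
```

A stub implementation of the same interface also makes conversation logic testable without network calls.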
Load testing: Voice calls are inherently concurrent. We should have stress-tested concurrent call handling earlier in development.
Key Takeaways
- NestJS + Prisma + PostgreSQL is a solid foundation for complex AI applications
- Twilio's WebSocket streaming is essential for real-time voice AI
- Use multiple AI models for different tasks — don't force one model to do everything
- Invest in observability early — AI systems are harder to debug than traditional software
- Ship fast, iterate — our first version was far from perfect, but it was in production handling real calls within 8 weeks
If you're building something similar and want to talk architecture, reach out. We're happy to share what we've learned.