Can users switch between voice and text mid-conversation without losing context?

Yes. The Unified Context architecture maintains a single session state across both modalities, allowing seamless switching between voice and text input within the same conversation thread.

Can the voice assistant detect AI-generated deepfake voices?

Yes. An anti-spoofing layer analyzes acoustic features including spectral patterns, pitch consistency, and micro-pauses to identify synthesized speech and protect against AI-driven voice fraud.

What is the system's response latency for text and voice interactions?

Text responses average 0.4 seconds and voice responses average 0.9 seconds end-to-end, achieved through edge-deployed LLM inference across 12 regional nodes.
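As a rough illustration of how a request might be steered to the closest of those regional nodes, the sketch below picks the healthy region with the lowest measured round-trip time. The function name, region labels, and RTT figures are all hypothetical; the case study does not describe the actual routing logic.

```python
def pick_edge_node(rtt_ms: dict[str, float], healthy: set[str]) -> str:
    """Pick the healthy region with the lowest measured round-trip time."""
    candidates = {region: rtt for region, rtt in rtt_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy edge node available")
    return min(candidates, key=candidates.get)


# Made-up measurements for three of the regions (illustrative only)
measured = {"region-a": 38.0, "region-b": 120.0, "region-c": 210.0}

nearest = pick_edge_node(measured, healthy=set(measured))
fallback = pick_edge_node(measured, healthy={"region-b", "region-c"})
```

With every region healthy the nearest node wins; if it drops out, the next-closest region takes over rather than the request failing.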

How does the system handle high-traffic spikes without degrading performance?

Auto-scaling inference pods with priority routing ensure high-risk sessions get dedicated compute while routine queries use shared pools, maintaining sub-second responses at 5x normal traffic.
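One way to picture the priority routing described above is the small sketch below. It is a simplification under stated assumptions: the `Pool` type, the spill-over of high-risk sessions into the shared pool, and the `"queue"` fallback are illustrative choices, not details taken from the deployed system.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    capacity: int
    in_use: int = 0

    def has_room(self) -> bool:
        return self.in_use < self.capacity


def route(high_risk: bool, dedicated: Pool, shared: Pool) -> str:
    """Return the name of the pool that should serve the session."""
    # Assumption: high-risk sessions prefer dedicated compute and spill
    # over to the shared pool only when the dedicated pool is full.
    preferred = [dedicated, shared] if high_risk else [shared]
    for pool in preferred:
        if pool.has_room():
            pool.in_use += 1
            return pool.name
    return "queue"  # no capacity left: wait until the autoscaler adds pods


dedicated = Pool("dedicated", capacity=1)
shared = Pool("shared", capacity=1)

first = route(True, dedicated, shared)    # high-risk session
second = route(False, dedicated, shared)  # routine query
third = route(False, dedicated, shared)   # routine query, shared pool now full
```

Routine queries never touch the dedicated pool, so a spike in ordinary traffic cannot starve the sessions that matter most.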

What kinds of case studies are published here?

Deep-dives on AI agents we have shipped: voice AI for telecalling, dental, real-estate, and travel concierge use cases; multi-modal chatbots; content-automation pipelines; cross-cultural tone-checkers; Reddit lead-capture; and ScrapCRM. Each walks through architecture, decisions, and measurable outcomes.

How are case studies different from blog posts?

Case studies are anchored to a specific shipped product with a real client, real metrics, and a real testimonial. Blog posts are commentary, opinions, and engineering notes that are not tied to a single project.

What stack do you typically use for AI agents?

It depends on the constraints. Common picks: ElevenLabs or Twilio for voice; OpenAI, Anthropic, or open-weight models for the LLM layer; n8n or custom orchestration for the agent loop; Postgres and Redis for state; plus the integration layer (CRMs, calendars, telephony) tailored per project. Each case study lists the exact stack.

Can I use these as a buying signal for my own AI build?

Yes — that is exactly what they are for. Each one names the problem we were solving, the architecture we chose and why, and the measurable result. If your situation rhymes with one of them, that is a strong indicator we can help.

How do I start a conversation about a similar build?

Email contact@thworks.org with the case study that is closest to what you are trying to ship, plus your top three unknowns. We will book a 30-minute scoping call and return with a concrete plan, timeline, and price.

Real-time chatbot dashboard showing concurrent text and voice interactions with bot-threat detection heatmap and session analytics

Fintech & Financial Services, AI & Machine Learning, Cybersecurity, Full-Stack Development

Secure Multimodal AI: Seamless Text & Voice Support with Integrated Anti-Bot Protection

Achieved 0.4s text and 0.9s voice response times with 99.2% bot mitigation — without adding friction for real users.

0.4s Text Response Time

0.9s Voice Response Time

99.2% Bot Block Rate

+42% User Trust Score

THWORKS developed a multimodal AI assistant that unifies text and voice interfaces within a single secure session for a fintech client. The system delivers 0.4-second text responses and 0.9-second voice responses while blocking 99.2% of automated bot attacks through behavioral-first security — using session-based CAPTCHA and biometric voice analysis to protect high-risk flows like password resets and fund transfers without disrupting legitimate user experience.

The Challenge: Bot Attacks Surging 400% While User Engagement Dropped 20%

A fintech platform experienced a 400% surge in account takeover attempts via its support chat, while mobile user engagement simultaneously dropped 20% due to cumbersome text-only interfaces. Static CAPTCHAs frustrated 15% of legitimate users and were easily bypassed by advanced headless browsers, creating a security-UX deadlock where tightening protection actively drove away real customers.

In fintech, every friction point in the support flow directly impacts conversion and compliance. The client needed a system that allowed seamless voice-to-text switching during sensitive operations (password resets, fund transfers) while ensuring 100% of high-risk actions were protected by non-intrusive bot defense — targeting 80% automation of routine verification tasks.

Our Solution: Unified Context Architecture with Edge-Deployed Security

We implemented a 'Unified Context' architecture where conversation state is shared between a WebSocket-based voice stream and a React-based text UI. Users can start a query via text, provide an address via voice, and enter a PIN via text — all within a single continuous session without losing context or requiring re-authentication.
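The shared-state idea can be sketched in a few lines. This is a minimal illustration, not the production code: the `SessionContext` and `Turn` types, the `slots` field, and the in-memory `SESSIONS` store are all assumptions made for the example, and a real deployment would keep this state in a shared store rather than a Python dict.

```python
from dataclasses import dataclass, field
from typing import Literal

Modality = Literal["text", "voice"]


@dataclass
class Turn:
    modality: Modality
    content: str  # typed text, or the transcript of a voice utterance


@dataclass
class SessionContext:
    """One conversation state shared by the text UI and the voice stream."""
    session_id: str
    authenticated: bool = False
    turns: list[Turn] = field(default_factory=list)
    slots: dict[str, str] = field(default_factory=dict)

    def add_turn(self, modality: Modality, content: str) -> None:
        self.turns.append(Turn(modality, content))


# Hypothetical in-memory store, keyed by session ID
SESSIONS: dict[str, SessionContext] = {}


def get_session(session_id: str) -> SessionContext:
    # Both channels resolve the same ID to the same context object
    return SESSIONS.setdefault(session_id, SessionContext(session_id))


# The text channel authenticates and opens the request...
text_view = get_session("abc123")
text_view.authenticated = True
text_view.add_turn("text", "I want to update my mailing address")

# ...and the voice channel continues it without re-authenticating.
voice_view = get_session("abc123")
voice_view.add_turn("voice", "42 Harbour Street, Flat 3")
voice_view.slots["address"] = "42 Harbour Street, Flat 3"
```

Because both handlers look up the same context by session ID, the history and authentication flag travel with the conversation rather than with the channel.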

Security was integrated at the network edge, not bolted on as an afterthought. A behavioral analysis layer monitors interaction velocity, input patterns, and device fingerprints in real time. When bot-like behavior is detected during a high-risk flow, the system dynamically triggers step-up verification — challenging only suspicious sessions instead of blocking all users with a login wall.
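The step-up decision described above might look something like the following. Treat it as a sketch under loud assumptions: the signal names mirror the paragraph, but the 0-to-1 scoring, the equal weighting, the intent labels, and the 0.7 threshold are invented for illustration.

```python
from dataclasses import dataclass

HIGH_RISK_INTENTS = {"fund_transfer", "password_reset"}  # hypothetical intent labels
BOT_SCORE_THRESHOLD = 0.7  # assumed cut-off, not a figure from the project


@dataclass
class BehaviorSignals:
    # Each signal is a hypothetical score in [0, 1]; higher means more bot-like
    interaction_velocity: float
    input_pattern: float
    device_fingerprint: float


def bot_score(s: BehaviorSignals) -> float:
    # Equal weights are an assumption made for illustration
    return (s.interaction_velocity + s.input_pattern + s.device_fingerprint) / 3


def needs_step_up(intent: str, signals: BehaviorSignals) -> bool:
    """Challenge only when a high-risk intent coincides with bot-like behavior."""
    return intent in HIGH_RISK_INTENTS and bot_score(signals) >= BOT_SCORE_THRESHOLD


human = BehaviorSignals(0.1, 0.2, 0.1)
bot = BehaviorSignals(0.9, 0.8, 0.9)
```

Under these assumptions a human making a transfer and a bot checking a balance both pass unchallenged; only a bot-like session inside a high-risk flow is asked to verify.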

Key Technical Decisions

Edge-Inference Routing: Deployed LLM inference at regional edge nodes to achieve 0.4s text latency, minimizing round-trip time for mobile users across 12 geographic regions.

Just-In-Time Verification: Implemented 'Step-Up Authentication' that only challenges users entering high-risk intent zones (fund transfers, password changes), maintaining an 87% session completion rate versus 72% with traditional global CAPTCHAs.

Biometric Voice Analysis: Integrated spectral analysis in the voice pipeline to distinguish between synthesized deepfake voices and genuine human speech — adding a security layer invisible to legitimate users.
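To give a feel for what "spectral analysis" means in the voice pipeline, the sketch below computes spectral flatness for a single audio frame using only the standard library. This is one textbook acoustic feature chosen for illustration; the case study does not say which features or models the production system uses, and a single feature like this is not a deepfake detector on its own.

```python
import cmath
import math


def power_spectrum(frame: list[float]) -> list[float]:
    """Naive DFT power spectrum for bins 1..N/2-1 (DC and Nyquist excluded)."""
    n = len(frame)
    if n < 4:
        raise ValueError("frame must contain at least 4 samples")
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame: list[float], eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = [p + eps for p in power_spectrum(frame)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# A pure tone concentrates energy in one bin, so flatness is close to 0.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
# A single impulse spreads energy evenly across bins, so flatness is close to 1.
click = [1.0] + [0.0] * 63

tone_flatness = spectral_flatness(tone)
click_flatness = spectral_flatness(click)
```

In practice a feature like this would be computed per frame with an FFT library and fed, alongside many others, into a trained classifier.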

Results: 99.2% Bot Mitigation with Zero UX Degradation

99.2% Bot Defense Accuracy

60% Voice Latency Improvement

87% Onboarding Completion

Before

Rigid text-only bot with static CAPTCHAs. 15% legitimate user frustration rate. Easily bypassed by advanced headless browsers. No voice support. 72% session completion.

After

Fluid multimodal experience with invisible behavioral security. 99.2% bot block rate. Seamless voice-text switching. 87% session completion. User trust scores up 42%.

Technology Stack

GPT / LLM: Powers conversational intelligence for understanding user intent and generating contextual responses across both text and voice modalities.

LangChain: Orchestrates LLM chains, memory management, and tool-calling workflows for complex multi-turn conversations with state persistence.

FastAPI: High-performance async Python backend serving 50,000+ concurrent sessions with native WebSocket support and OpenAPI documentation.

WebSockets: Full-duplex real-time communication for streaming text and voice responses without polling overhead or connection re-establishment.

Whisper (STT): OpenAI Whisper provides accurate multilingual speech recognition with fine-tuning for fintech-specific terminology.

TTS Engine: Converts AI-generated text into natural-sounding speech with configurable voice personas matching the client's brand identity.

Behavioral CAPTCHA: Adaptive bot detection combining interaction velocity analysis, device fingerprinting, and challenge-response verification for high-risk flows.

"The speed is incredible. We were worried that adding bot security would slow down the experience, but THWORKS built a system that's actually faster than our previous unsecured bot. The voice-to-text handoff is magic — our users love it."

Marcus Chen, Head of Platform Security, FinEdge Systems

Frequently Asked Questions

Common questions about this project and our approach.

Q: How does the chatbot decide when to trigger a CAPTCHA or security challenge?

The system uses Intent-Based Security. It monitors for high-risk actions (withdrawals, email changes, password resets) and combines this with behavioral telemetry: mouse movement patterns, typing cadence, and session timing. CAPTCHAs are triggered only when risk signals converge, so legitimate users rarely see them while bots are caught 99.2% of the time.
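As an example of what one of these telemetry signals could look like, the sketch below scores typing cadence by how much the gaps between keystrokes vary. It is a hypothetical feature with an assumed threshold (`min_cv`), shown only to make the idea concrete; the real system's signals and cut-offs are not published here.

```python
from statistics import mean, pstdev


def inter_key_gaps(key_times_ms: list[float]) -> list[float]:
    """Gaps between consecutive keystroke timestamps, in milliseconds."""
    return [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]


def cadence_variability(gaps: list[float]) -> float:
    """Coefficient of variation of the gaps (0 means perfectly regular)."""
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg > 0 else 0.0


def looks_scripted(key_times_ms: list[float], min_cv: float = 0.15) -> bool:
    # min_cv is an assumed threshold; it would need tuning on real traffic
    gaps = inter_key_gaps(key_times_ms)
    if len(gaps) < 2:
        return False  # too little evidence to judge either way
    return cadence_variability(gaps) < min_cv


scripted = [0, 50, 100, 150, 200]   # perfectly even 50 ms gaps
human = [0, 120, 310, 390, 640]     # irregular gaps
```

A signal like this would be only one input to the risk score, combined with the intent of the action and the other telemetry before any challenge is shown.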

Related Case Studies

24/7 Intelligent Voice AI: Automating Inbound Customer Care & Seamless Handoffs

Enterprise Customer Support

Cross-Cultural Tone Sentinel: Mastering Multilingual Sentiment & Escalation Precision

Global Retail & E-Commerce

Secure Your AI Interactions

Let's discuss how we can solve your technical challenges with the same precision and impact.
