
Secure Multimodal AI: Seamless Text & Voice Support with Integrated Anti-Bot Protection
Achieved 0.4s text and 0.9s voice response times with 99.2% bot mitigation — without adding friction for real users.
For a fintech client, THWORKS developed a multimodal AI assistant that unifies text and voice interfaces within a single secure session. The system delivers 0.4-second text responses and 0.9-second voice responses while blocking 99.2% of automated bot attacks. Its behavior-first security uses session-based CAPTCHA and biometric voice analysis to protect high-risk flows such as password resets and fund transfers without disrupting legitimate users.
The Challenge: Bot Attacks Surged 400% While User Engagement Dropped 20%
A fintech platform experienced a 400% surge in account takeover attempts through its support chat, while mobile user engagement simultaneously dropped 20% because of a cumbersome text-only interface. Static CAPTCHAs frustrated 15% of legitimate users and were easily bypassed by advanced headless browsers. The result was a security-UX deadlock in which tightening protection actively drove away real customers.
In fintech, every friction point in the support flow directly impacts conversion and compliance. The client needed a system that allowed seamless switching between voice and text during sensitive operations (password resets, fund transfers) while ensuring that 100% of high-risk actions were protected by non-intrusive bot defense. The project also targeted 80% automation of routine verification tasks.
Our Solution: Unified Context Architecture with Edge-Deployed Security
We implemented a 'Unified Context' architecture where conversation state is shared between a WebSocket-based voice stream and a React-based text UI. Users can start a query via text, provide an address via voice, and enter a PIN via text — all within a single continuous session without losing context or requiring re-authentication.
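To make the shared-state idea concrete, here is a minimal TypeScript sketch of a session store that both channels could write to. It is an illustration under stated assumptions, not the production design: the names UnifiedContextStore, SessionContext, Turn, and the slots field are hypothetical, and an in-memory Map stands in for whatever shared storage the real system uses.

```typescript
// Hypothetical types for illustration; none of these names come from the original system.
type Channel = "text" | "voice";

interface Turn {
  channel: Channel;
  content: string;   // typed text, or the transcript of a voice utterance
  timestamp: number; // milliseconds since epoch
}

interface SessionContext {
  sessionId: string;
  authenticated: boolean;
  turns: Turn[];
  slots: Record<string, string>; // values collected so far, e.g. an address
}

class UnifiedContextStore {
  // Assumption: an in-memory map; a real deployment would need shared storage.
  private sessions = new Map<string, SessionContext>();

  // Return the existing session or create it. Both channels pass the same id.
  getOrCreate(sessionId: string): SessionContext {
    let context = this.sessions.get(sessionId);
    if (!context) {
      context = { sessionId, authenticated: false, turns: [], slots: {} };
      this.sessions.set(sessionId, context);
    }
    return context;
  }

  // Append a turn from either channel to the single shared history.
  addTurn(
    sessionId: string,
    channel: Channel,
    content: string,
    now: number = Date.now()
  ): SessionContext {
    const context = this.getOrCreate(sessionId);
    context.turns.push({ channel, content, timestamp: now });
    return context;
  }

  // Record a value extracted from a turn so later turns can reuse it.
  setSlot(sessionId: string, slotName: string, value: string): void {
    this.getOrCreate(sessionId).slots[slotName] = value;
  }
}

// Usage: a text turn, a voice turn, then another text turn in one session.
const demoStore = new UnifiedContextStore();
demoStore.addTurn("session-1", "text", "I need to update my address");
demoStore.addTurn("session-1", "voice", "12 Example Street");
demoStore.setSlot("session-1", "address", "12 Example Street");
const demoContext = demoStore.addTurn("session-1", "text", "confirm");
```

Because the WebSocket handler and the text UI handler would both address the same sessionId, the address captured by voice is still available when the user returns to typing.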
Security was integrated at the network edge, not bolted on as an afterthought. A behavioral analysis layer monitors interaction velocity, input patterns, and device fingerprints in real time. When bot-like behavior is detected during a high-risk flow, the system dynamically triggers step-up verification — challenging only suspicious sessions instead of blocking all users with a login wall.
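One of the signals named above, interaction velocity, can be approximated from keystroke timing. The sketch below is a simplified illustration, not the deployed behavioral layer: it assumes scripted input tends to be either very fast or unnaturally uniform, and the function names and both thresholds (MIN_HUMAN_MEAN_MS, MIN_HUMAN_CV) are hypothetical values chosen for the example.

```typescript
// Gaps (ms) between successive key events in one input field.
function interKeyIntervals(timestamps: number[]): number[] {
  const intervals: number[] = [];
  for (let i = 1; i < timestamps.length; i++) {
    intervals.push(timestamps[i] - timestamps[i - 1]);
  }
  return intervals;
}

interface CadenceStats {
  meanMs: number;
  coefficientOfVariation: number; // standard deviation divided by mean
}

// Summarise typing rhythm; returns null when there is too little evidence.
function cadenceStats(timestamps: number[]): CadenceStats | null {
  const intervals = interKeyIntervals(timestamps);
  if (intervals.length < 2) return null;
  const mean = intervals.reduce((sum, v) => sum + v, 0) / intervals.length;
  const variance =
    intervals.reduce((sum, v) => sum + (v - mean) ** 2, 0) / intervals.length;
  const cv = mean > 0 ? Math.sqrt(variance) / mean : 0;
  return { meanMs: mean, coefficientOfVariation: cv };
}

// Hypothetical thresholds; real values would be tuned on observed traffic.
const MIN_HUMAN_MEAN_MS = 40;
const MIN_HUMAN_CV = 0.15;

// Flag input that is implausibly fast or implausibly regular.
function looksAutomated(timestamps: number[]): boolean {
  const stats = cadenceStats(timestamps);
  if (stats === null) return false; // do not penalise short inputs
  return (
    stats.meanMs < MIN_HUMAN_MEAN_MS ||
    stats.coefficientOfVariation < MIN_HUMAN_CV
  );
}
```

A flag like this would be only one input to the risk decision. On its own it would misjudge password managers and paste events, which is why the description above combines it with input patterns and device fingerprints.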
Key Technical Decisions
Edge-Inference Routing: Deployed LLM inference at regional edge nodes to achieve 0.4s text latency, minimizing round-trip time for mobile users across 12 geographic regions.
Just-In-Time Verification: Implemented 'Step-Up Authentication' that only challenges users entering high-risk intent zones (fund transfers, password changes), maintaining an 87% session completion rate versus 72% with traditional global CAPTCHAs.
Biometric Voice Analysis: Integrated spectral analysis in the voice pipeline to distinguish between synthesized deepfake voices and genuine human speech — adding a security layer invisible to legitimate users.
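The case study does not say which spectral features the voice pipeline uses. As a hedged illustration of what "spectral analysis" of an audio frame can look like, the sketch below computes spectral flatness, a classic descriptor equal to the geometric mean of the power spectrum divided by its arithmetic mean. The function names are hypothetical, the naive DFT is chosen for readability rather than speed, and a single feature like this is not a deepfake detector by itself; it would at most feed a trained classifier.

```typescript
// Power spectrum of one audio frame using a naive O(n^2) DFT.
// Assumption: short frames only; a real pipeline would use an FFT.
function powerSpectrum(frame: number[]): number[] {
  const n = frame.length;
  const half = Math.floor(n / 2);
  const power: number[] = [];
  for (let k = 1; k <= half; k++) { // skip the DC bin at k = 0
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    power.push(re * re + im * im);
  }
  return power;
}

// Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
// Values near 1 indicate a noise-like spectrum, values near 0 a tonal one.
function spectralFlatness(frame: number[]): number {
  const power = powerSpectrum(frame);
  if (power.length === 0) return 0;
  const eps = 1e-12; // keeps log() finite for empty bins
  const logMean =
    power.reduce((sum, p) => sum + Math.log(p + eps), 0) / power.length;
  const mean = power.reduce((sum, p) => sum + p, 0) / power.length;
  return mean > 0 ? Math.exp(logMean) / (mean + eps) : 0;
}
```

For example, a single-sample impulse has a perfectly flat spectrum and scores close to 1, while a pure tone concentrates its energy in one bin and scores close to 0.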
Results: 99.2% Bot Mitigation with Zero UX Degradation
Before
Rigid text-only bot with static CAPTCHAs. 15% legitimate user frustration rate. Easily bypassed by advanced headless browsers. No voice support. 72% session completion.
After
Fluid multimodal experience with invisible behavioral security. 99.2% bot block rate. Seamless voice-text switching. 87% session completion. User trust scores up 42%.
"The speed is incredible. We were worried that adding bot security would slow down the experience, but THWORKS built a system that's actually faster than our previous unsecured bot. The voice-to-text handoff is magic — our users love it."
Frequently Asked Questions
Common questions about this project and our approach.
The system uses Intent-Based Security. It monitors for high-risk actions (withdrawals, email changes, password resets) and combines this with behavioral telemetry — mouse movement patterns, typing cadence, and session timing. CAPTCHAs are triggered only when risk signals converge, so legitimate users rarely see them while bots are caught 99.2% of the time.
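The "risk signals converge" rule can be sketched as a small decision function. This is an assumption-laden illustration rather than the actual policy: the Intent values, the per-signal anomaly scores in the range 0 to 1, and both constants (SIGNAL_THRESHOLD, MIN_CONVERGING_SIGNALS) are hypothetical.

```typescript
// Hypothetical intent labels; the high-risk ones mirror the actions named above.
type Intent = "balance_inquiry" | "withdrawal" | "email_change" | "password_reset";

// A Record forces every intent to be classified explicitly.
const IS_HIGH_RISK: Record<Intent, boolean> = {
  balance_inquiry: false,
  withdrawal: true,
  email_change: true,
  password_reset: true,
};

// Hypothetical per-signal anomaly scores, each in the range 0..1.
interface Telemetry {
  mouseAnomaly: number;  // mouse movement patterns
  typingAnomaly: number; // typing cadence
  timingAnomaly: number; // session timing
}

type Decision = "allow" | "challenge";

// Hypothetical tuning constants.
const SIGNAL_THRESHOLD = 0.7;
const MIN_CONVERGING_SIGNALS = 2;

// Challenge only when the intent is high-risk AND several signals agree.
function decide(intent: Intent, telemetry: Telemetry): Decision {
  if (!IS_HIGH_RISK[intent]) return "allow"; // low-risk flows are never challenged
  const elevated = [
    telemetry.mouseAnomaly,
    telemetry.typingAnomaly,
    telemetry.timingAnomaly,
  ].filter((score) => score >= SIGNAL_THRESHOLD).length;
  return elevated >= MIN_CONVERGING_SIGNALS ? "challenge" : "allow";
}
```

Requiring at least two elevated signals is the design choice that keeps challenges rare for legitimate users: one unusual reading, such as a pasted password, does not trigger a CAPTCHA on its own.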
Secure Your AI Interactions
Let's discuss how we can solve your technical challenges with the same precision and impact.

