Model Code: AGENTIC_NC
Sample Rate: 16 kHz
Use Case: Voice agents, IVR systems, ASR pipelines
Introduction
AI Voice Agent is a specialized noise cancellation model designed to improve Automatic Speech Recognition (ASR) accuracy in noisy environments. Unlike traditional noise cancellation optimized for human perception, this model preserves the acoustic and phonetic information that ASR systems depend on while removing conversational disruptions.
Key Advantages
| Feature | AI Voice Agent | Traditional Noise Cancellation |
|---|---|---|
| Optimization Goal | ASR accuracy (WER reduction) | Human listening quality |
| Acoustic Preservation | Preserves phonetic features | May strip ASR-critical info |
| ASR Compatibility | All major ASR systems | May degrade ASR performance |
| WER Improvement | 5-30% in noisy conditions | Often increases WER |
| Clean Audio Impact | No degradation | No degradation |
Key Benefits
Improved ASR Accuracy
5-30% average WER improvement across multiple ASR systems on noisy data, with no degradation on clean audio.
ASR-Agnostic Design
Works seamlessly with any ASR pipeline—open-source or commercial—without requiring retraining or modification.
Enhanced Turn-Taking
Reduces false triggers from ambient sounds (background chatter, environmental noise) that cause agents to interrupt.
Real-World Robustness
Trained on diverse acoustic environments to handle production variability from call centers to mobile devices.
Use Cases
| Use Case | Benefits | Ideal For |
|---|---|---|
| Voice Agents | Improves speech recognition in noisy environments; preserves the primary speaker only | Conversational AI, virtual assistants, chatbots |
| IVR Systems | Enhances recognition accuracy for automated phone systems | Interactive voice response, phone menus, call routing |
| Customer Service Bots | Reduces transcription errors in call center environments | Support automation, call analysis, and quality monitoring |
| Phone Assistants | Optimizes speech-to-text in telephony conditions | Mobile assistants, hands-free applications |
| Real-Time Transcription | Clean audio input for live transcription services | Meeting transcription, live captioning, and note-taking |
| ASR Preprocessing | Purpose-built enhancement for downstream ASR systems | Any speech-to-text pipeline in noisy conditions |
Model Configuration
To use the AI Voice Agent model, specify its model code (AGENTIC_NC) when creating your audio processor.
Recommended Configuration
- Sample Rate: 16 kHz for best quality
For complete setup instructions, see the Quickstart Guide.
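Before streaming recorded audio to the model, it can help to confirm that the file actually matches the recommended 16 kHz sample rate. The following is a minimal sketch using only Python's standard `wave` module; it does not call the Sanas SDK, and the helper names `read_sample_rate` and `has_recommended_sample_rate` are hypothetical, introduced here for illustration. It assumes the input is an uncompressed PCM WAV file.

```python
import wave

# Recommended rate taken from the configuration above.
RECOMMENDED_SAMPLE_RATE_HZ = 16_000


def read_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) declared in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def has_recommended_sample_rate(path: str) -> bool:
    """True if the WAV file is already at the recommended 16 kHz."""
    return read_sample_rate(path) == RECOMMENDED_SAMPLE_RATE_HZ
```

A file that fails this check would need to be resampled before processing; how to do that depends on your audio stack and is outside the scope of this sketch.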
Performance Benchmarks
Real-World Test Results
Test Environment:
- Background: Restaurant with cutlery sounds and conversation
- Microphone: Speakerphone at 10 cm distance
- ASR System: Deepgram Nova3 Streaming
Transcription Accuracy & Audio Samples
| Oracle Transcript | Source Transcript | Generic BVC | Sanas AI Voice Agent |
|---|---|---|---|
| it was in the spring of the year eighteen ninety four that all london was interested and the fashionable world dismayed by the murder of the honorable ronald adair under most unusual and inexplicable circumstances the public has already learned those particulars of the crime which came out in the police investigation but a good deal was suppressed upon that occasion since the case for the prosecution was so overwhelmingly strong that it was not necessary to bring forward all the facts only now at the end of nearly ten years am i allowed to supply those missing links which make up the whole of the remarkable chain the crime was of interest in itself but that interest was as nothing to me compared to the incons inconceivable sequence which afforded me the greatest shock and surprise of any event in my adventurous life | it was in the spring of the year one eight nine four that all london was interested and the fashionable world dismayed by the mother of the honorable rona adair under most unusual and those particulars of the crime which came out in the police investigation but a good deal was suppressed upon the occasion since the case for the prosecution strong that it was not necessary necessary to bring forward all the facts only now at the end of nearly ten years am i allowed to supplement these basic things which make the crime was of interest in itself but that interest was as nothing to me compared to the incon inconceivable sequence which afforded me the greatest shock and of any event in my adventurous life let’s create magic | it was in the one eight nine four that all london was interested and the fashionable world dismayed by the mother of the honorable rona adair and the most unusual and inexplicable circumstances the public has already learned those particulars of the crime which came out in the police investigation but a good deal was suppressed upon the occasion since the case for the prosecution was so overwhelmingly strong that it was not necessary to bring forward all the facts only now at the end of nearly ten years number allowed to supply these basic wings which make up the board of the remarkable chain the crime was of interest in itself but that interest was as nothing to me compared to the incomes inconsistencyable sequence which afforded me the greatest shock and surprise of any event in my adventurous life | it was in the spring of the year one eight nine four that all london was interested and the fashionable world dismayed by the murder of the honorable rona adair and the most unusual and inexplicable circumstances the public has already learned those particulars of the crime which came out in the police investigation but a good deal was surprised upon the occasion since the case for the prosecution was so overwhelmingly strong that it was not necessary to bring forward all the facts only now at the end of nearly ten years am i allowed to supply these missing links of which make up the whole of the remarkable chain the crime was of interest in itself but that interest was as nothing to me compared to the inconceivable sequence which afforded me the greatest shock and surprise of any event in my adventurous life |
| — | 22.9% WER | 13.2% WER | 7.5% WER |
- Source Audio
- Generic BVC
- Sanas AI Voice Agent
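For readers who want to score their own recordings, WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the reference and the ASR output, divided by the number of reference words. The sketch below implements that standard definition in plain Python. The text normalization and scoring tool behind the figures in the table above are not stated in this document, so this sketch is not guaranteed to reproduce those exact percentages; the function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER that was eliminated."""
    return (baseline_wer - improved_wer) / baseline_wer


# A fragment from the transcripts above: two substituted words out of seven.
fragment_wer = word_error_rate(
    "the murder of the honorable ronald adair",
    "the mother of the honorable rona adair",
)
```

Applied to the reported figures, `relative_wer_reduction(22.9, 7.5)` evaluates to roughly 0.67, meaning about two thirds of the source audio's word errors are removed in this test.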
Latency & Performance
| Metric | Value | Description |
|---|---|---|
| Streaming Latency | ~100ms | End-to-end processing time. Note: Actual latency also depends on the selected Sanas Cloud region and network conditions. |
| Concurrency | Scalable | Limited only by your infrastructure. |
Next Steps
Quickstart Guide
Complete setup and initialization instructions
API Reference
Learn more about the core references, enums, and callbacks
Multi-Stream Tutorial
Handle multiple concurrent voice streams
Need help? Contact our support team at support@sanas.ai or raise a ticket.