Overview of Sanas - Sanas Developer Hub

What is the Sanas SDK?

The Sanas SDK processes speech audio in real time. You feed it audio frames and it returns processed audio frames: for example, noise-cancelled, speech-enhanced, or accent-converted speech, or speech translated into another language. For audio processing (noise cancellation, speech enhancement, and accent conversion), you pick the specific model you want, and the system cleans up or transforms the audio with minimal latency.
Language Translation works a little differently. Instead of choosing a model, you specify the language the speaker is using and the language you want them to be heard in. Behind the scenes it performs three steps (it transcribes the speech, translates it, and re-speaks it in the target language) and returns both the translated audio and a text transcript. Because of these extra steps, it adds a few seconds of latency compared with the audio-processing features, so it is best suited to conversations where a slight delay is acceptable.

Setup is the same for both modes: the connection, activation, and start-up steps are identical. Language Translation simply takes language settings in place of a model choice.

The Python package ships as a self-contained wheel with everything the SDK needs bundled inside, so there is no separate native library to install or manage. Create a virtual environment and install the wheel with pip.

You work with interleaved float32 PCM audio: push frames in with process_frame and read processed frames back. The SDK does not open microphones or speakers for you; you bring your own audio (from a file, a socket, a media stream, and so on).

The SDK is enterprise-ready, with security, compliance, and support options.
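Because you bring your own audio, a typical integration needs a small step that turns what you have into interleaved float32 frames. The sketch below is a minimal example under stated assumptions: it reads a 16-bit PCM WAV file with Python's standard wave module, scales each sample to a float in [-1.0, 1.0), and yields fixed 10 ms frames. The frame duration and the helper names (int16_bytes_to_float32, iter_frames, read_wav_frames) are illustrative and not part of the SDK; wrapping each frame in an AudioFrame and passing it to process_frame is left out because those signatures are covered in the API Reference.

```python
import array
import sys
import wave
from typing import BinaryIO, Iterator, Union


def int16_bytes_to_float32(raw: bytes) -> array.array:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1.0, 1.0)."""
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return array.array("f", (s / 32768.0 for s in pcm))


def iter_frames(samples: array.array, channels: int,
                frame_samples: int) -> Iterator[array.array]:
    """Yield interleaved frames of frame_samples per channel, zero-padding the last one."""
    step = frame_samples * channels
    for start in range(0, len(samples), step):
        frame = samples[start:start + step]
        if len(frame) < step:
            frame.extend([0.0] * (step - len(frame)))
        yield frame


def read_wav_frames(source: Union[str, BinaryIO],
                    frame_ms: int = 10) -> Iterator[array.array]:
    """Read a 16-bit PCM WAV (path or file object) and yield float32 frames.

    The 10 ms default is an assumption for illustration, not an SDK requirement.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    frame_samples = rate * frame_ms // 1000  # e.g. 160 samples at 16 kHz
    yield from iter_frames(int16_bytes_to_float32(raw), channels, frame_samples)
```

Each yielded array holds frame_samples * channels values in L, R, L, R order for stereo input, which is what "interleaved" means here; the last frame is zero-padded so every frame has the same length.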

What can you do with the Sanas SDK?

Audio processing — run a model (speech enhancement, accent conversion,…) over your audio, selected by a model key.
Language Translation (LT) — translate speech from one language to another and receive both translated audio and transcript segments.
Pipeline lifecycle — an audio_pipeline_state_notify callback reports the processing pipeline state (PipelineState); wait for RUNNING before feeding frames.
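Waiting for RUNNING usually means handing a callback to the SDK and blocking the thread that feeds audio until that callback fires. The sketch below shows one way to do this with threading.Event. It assumes the audio_pipeline_state_notify callback receives a single PipelineState value, and it defines a local stand-in enum because only RUNNING is named on this page; the STARTING and STOPPED members are hypothetical.

```python
import enum
import threading
from typing import List


class PipelineState(enum.Enum):
    """Local stand-in for the SDK's PipelineState; only RUNNING comes from the docs."""
    STARTING = "starting"  # hypothetical member
    RUNNING = "running"
    STOPPED = "stopped"    # hypothetical member


class PipelineStateWaiter:
    """Records state notifications and lets the feeding thread block until RUNNING."""

    def __init__(self) -> None:
        self._running = threading.Event()
        self._lock = threading.Lock()
        self.history: List[PipelineState] = []

    def on_state(self, state: PipelineState) -> None:
        # Intended to be registered as audio_pipeline_state_notify. The single-argument
        # signature is an assumption; check the API Reference for the real one.
        with self._lock:
            self.history.append(state)
        if state is PipelineState.RUNNING:
            self._running.set()
        else:
            self._running.clear()  # leaving RUNNING means frames should stop

    def wait_until_running(self, timeout: float) -> bool:
        """Return True once RUNNING has been reported, False if the timeout expires."""
        return self._running.wait(timeout)
```

If wait_until_running returns False, treat it as a startup failure rather than feeding frames anyway.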

Audio processing vs. Language Translation. The SDK supports two modes that differ in how you select them and what they return. Audio processing (speech enhancement, noise cancellation, accent conversion) is chosen by setting a model key on AudioAttributes.model_name — for example SE2.2 or AT5.2 — and returns processed audio frames with minimal latency. Language Translation (LT) is selected instead by supplying a LanguageTranslationConfig (lt_config) and leaving model_name empty; you specify a source and target language rather than a model, and the pipeline returns both translated audio and transcript segments via a callback. Because LT runs speech-to-text, translation, and text-to-speech end to end, it carries higher latency than the audio-processing models, so plan for a longer buffer/drain window. Everything else stays the same: you use the identical create_sdk → activate_api_key → create_audio_processor flow and wait for PipelineState.RUNNING before feeding frames.
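The selection rule described above (a model key for audio processing, an lt_config with an empty model_name for Language Translation) can be checked in a small validation step before you create the processor. This is a sketch with stand-in dataclasses: the real AudioAttributes and LanguageTranslationConfig may carry more fields and may not place lt_config on AudioAttributes, and the source_language/target_language field names and "en-US" style codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageTranslationConfig:
    """Stand-in; the real field names and language-code format may differ."""
    source_language: str  # hypothetical field, e.g. "en-US"
    target_language: str  # hypothetical field, e.g. "es-ES"


@dataclass
class AudioAttributes:
    """Stand-in holding only the two fields this sketch needs."""
    model_name: str = ""
    lt_config: Optional[LanguageTranslationConfig] = None


def processing_mode(attrs: AudioAttributes) -> str:
    """Return the mode implied by attrs, rejecting empty or ambiguous configs."""
    has_model = bool(attrs.model_name)
    has_lt = attrs.lt_config is not None
    if has_model and has_lt:
        raise ValueError("set model_name or lt_config, not both")
    if has_model:
        return "audio_processing"       # e.g. SE2.2 or AT5.2
    if has_lt:
        return "language_translation"   # model_name stays empty
    raise ValueError("set model_name (e.g. SE2.2) or supply lt_config")
```

Failing fast here keeps a misconfigured processor from reaching the create_sdk, activate_api_key, create_audio_processor flow at all.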

This page describes Language Translation within the Python SDK (lt_config). For real-time speech-to-speech in the browser, Sanas also offers a standalone Language Translation API: a WebRTC / WebSocket / JavaScript client with an init/configure protocol. Use that instead of the SDK if you’re building a web app.

Key types: InitParams, create_sdk, Sdk, AudioAttributes, ProcessorAttributes, AudioProcessor, AudioFrame, LanguageTranslationConfig, CloudInferencingParams, PipelineState, SdkResult.

How Audio Processing Works

The Sanas SDK Connector sits between your application and Sanas models, managing the connection to the inference engine. It sends your input audio, receives the processed output, and streams data in real time using SIP (Session Initiation Protocol) to set up and manage sessions and RTP (Real-time Transport Protocol) to carry the audio. The Sanas SDK can be deployed via Sanas Cloud (recommended) or self-hosted on your own infrastructure (coming soon).
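The Connector handles SIP signaling and RTP transport for you, so you never build packets yourself. For readers unfamiliar with RTP, the sketch below packs and parses the 12-byte fixed header defined in RFC 3550, which is where each chunk of audio gets its sequence number (for loss and reordering detection) and its timestamp (in sampling-clock units). It illustrates the protocol, not Sanas internals, and it deliberately rejects padding and header extensions to stay short.

```python
import struct
from typing import NamedTuple, Tuple

RTP_VERSION = 2
FIXED_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order


class RtpHeader(NamedTuple):
    payload_type: int  # 7 bits, e.g. 0 = PCMU in the RFC 3551 audio profile
    sequence: int      # 16 bits, +1 per packet
    timestamp: int     # 32 bits, in sampling-clock ticks
    ssrc: int          # 32-bit stream identifier
    marker: bool = False


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Build a packet with no padding, no extension, and no CSRC list."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    return FIXED_HEADER.pack(byte0, byte1, header.sequence & 0xFFFF,
                             header.timestamp & 0xFFFFFFFF,
                             header.ssrc & 0xFFFFFFFF) + payload


def unpack_rtp(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Parse the fixed header, skip any CSRC entries, and return the payload."""
    if len(packet) < FIXED_HEADER.size:
        raise ValueError("packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, ts, ssrc = FIXED_HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    if byte0 & 0x30:
        raise ValueError("padding/extension bits set; not handled in this sketch")
    offset = FIXED_HEADER.size + 4 * (byte0 & 0x0F)  # CC counts 32-bit CSRC entries
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))
    return header, packet[offset:]
```

With 8 kHz audio sent in 20 ms packets, for example, the sequence number increases by 1 and the timestamp by 160 from one packet to the next.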

Core Capabilities

Speech Enhancement

Isolate intended speech by removing background noise and voices — no quality degradation.

Speech Enhancement for Agentic Noise Cancellation

Clean audio before it reaches ASR/STT. Isolate primary speakers and lower downstream word error rate, measured as relative word error rate reduction (RWERR).

Speech Enhancement with Full Fidelity

Reconstruct and restore speech quality degraded by compression and network conditions.

Live Language Translation

Real-time cross-language voice communication.

Accent Translation

Convert accents while preserving voice identity.

Sanas currently offers Speech Enhancement (audio clarification), Accent Translation, and Language Translation; Speech Intelligence is coming soon.

Available Models

Voice Isolation (General)

VI_G_SE3.0: Isolates intended speech by removing background noise and voices. Optimized for human listeners.

Speech Enhancement · Standard

SE2.1: Restores and enhances voice quality for telephony audio. Low CPU footprint.

Speech Enhancement · Full-Fidelity

SE2.2: Full-fidelity speech enhancement with bandwidth extension to ultra-fidelity 24 kHz.

Agentic Speech Enhancement · Voice Isolation (General)

AGENTIC_VI_G_SE: Removes background noise and distant voices for complete voice isolation of the primary speaker’s audio stream.

Agentic Noise Clarification · Voice Isolation (Telephony)

AGENTIC_VI_GT_SE: Telephony-optimized variant of Voice Isolation for 8 kHz narrowband audio.

Agentic Speech Enhancement · Standard

AGENTIC_ST_SE: Removes background noise while preserving all human speech, for multi-speaker environments.

Accent Translation

AT5.2: Accent Translation modifies global accents in real time, allowing your teams to be instantly understood while preserving what makes every voice unique.

Language Translation

Real-time language translation that preserves your speakers’ voices, tone, and intent. See the LT documentation.

See the full model comparison →
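The sample rates in the catalog affect buffer sizing: AGENTIC_VI_GT_SE targets 8 kHz narrowband audio, and SE2.2 extends bandwidth to 24 kHz. The helpers below compute how many samples and float32 bytes one frame holds. The 10 ms frame length used in the examples is an assumption, and so is the idea that SE2.2 returns audio at a higher rate than it receives; if it does, each output frame would carry three times as many samples as an 8 kHz input frame.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Samples per channel in one frame; the duration must map to whole samples."""
    total = sample_rate_hz * frame_ms
    if total % 1000:
        raise ValueError("frame duration does not map to a whole number of samples")
    return total // 1000


def frame_bytes_float32(sample_rate_hz: int, frame_ms: int, channels: int = 1) -> int:
    """Bytes in one interleaved float32 frame (4 bytes per sample per channel)."""
    return samples_per_frame(sample_rate_hz, frame_ms) * channels * 4


# Example sizes for a hypothetical 10 ms frame:
narrowband = samples_per_frame(8000, 10)    # 80 samples, 320 bytes as float32
fullband = samples_per_frame(24000, 10)     # 240 samples, 960 bytes as float32
```

Sizing buffers from the rate and frame duration, rather than hard-coding sample counts, keeps the same code working if you switch models.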

What You Can Build

Voice Agents

Enhance voice agent pipelines with real-time speech and audio processing for clearer, more accurate interactions.

Contact Centers

Power contact center audio at scale with concurrent stream processing and enterprise-grade reliability.

Conferencing & Gaming

Deliver high-quality voice experiences across communication and interactive platforms.

STT Pipelines

Improve speech-to-text accuracy by processing audio before it reaches your transcription engine.
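A common way to quantify that gain is to transcribe the same recordings with and without processing and compare word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below computes WER and the relative reduction between a baseline run and a processed run. These are the standard textbook definitions, not Sanas's benchmarking methodology, and the whitespace tokenization is a simplification (real evaluations usually normalize case and punctuation first).

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion of r
                          curr[j - 1] + 1,          # insertion of h
                          prev[j - 1] + (r != h))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, processed_wer: float) -> float:
    """Fraction of baseline word errors removed; 0.25 means 25% fewer errors."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - processed_wer) / baseline_wer
```

For example, if WER drops from 0.20 on raw audio to 0.15 on processed audio, the relative reduction is 0.25, i.e. 25% fewer word errors.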

Key Features

Real-Time Streaming

Low-latency processing: live audio processing with SIP/RTP.

High Concurrency

Scalable: process multiple audio streams simultaneously.

Session Management

SIP protocol: reliable connection establishment and management.

Seamless Communication

Enable clearer human-machine interactions across any environment.

Easy Integration

Simple API: initialize, create a processor, stream audio.

Secure

Enterprise security: encrypted transmission and secure authentication.

Ready to Build?

Request your API keys and integration credentials, then use the Quick Start to start streaming clean audio in minutes.

Sign up for an account

Get your developer account to start building.

Quick Start

Get up and running with the Sanas SDK in under 5 minutes.

Resources

Pricing

Usage-based pricing.

Enterprise

Data residency, compliance, security, and support.

API Reference

Complete API documentation for the Sanas SDK.

Change Log

Latest updates, releases, and fixes.
