Quick Start

The Sanas SDK processes speech audio in real time. You feed it interleaved float32 PCM frames and it returns processed frames — noise-cancelled, speech-enhanced, accent-converted, or translated into another language. The Python package ships as a self-contained wheel: everything the SDK needs is bundled inside it, so there’s no separate native library to install. You create a virtual environment, pip install the wheel, and start streaming audio.

The SDK does not open microphones or speakers for you — you bring your own audio (from a file, socket, or media stream). You push frames in with process_frame and read processed frames back.
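To make "interleaved float32 PCM" concrete, here is a minimal standard-library sketch. The helper names `interleave` and `samples_per_frame` are illustrative only and are not part of the SDK; the frame-length arithmetic mirrors the 20 ms framing used in Step 4.

```python
import array

def interleave(*channels):
    """Interleave per-channel sample lists into one float32 array (L0, R0, L1, R1, ...)."""
    if len({len(ch) for ch in channels}) != 1:
        raise ValueError("all channels must have the same number of samples")
    out = array.array("f")
    for group in zip(*channels):       # one sample from every channel per step
        out.extend(group)
    return out

def samples_per_frame(sample_rate, channels, frame_ms=20):
    """Number of float32 values (across all channels) in one frame."""
    return sample_rate * channels * frame_ms // 1000

left = [0.0, 0.25, 0.5]
right = [1.0, 0.75, 0.5]
print(list(interleave(left, right)))    # [0.0, 1.0, 0.25, 0.75, 0.5, 0.5]
print(samples_per_frame(16000, 1))      # 320
print(samples_per_frame(48000, 2))      # 1920
```

A mono stream is already "interleaved" in this sense, so the extra step only matters for two or more channels.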

Two ways to use the SDK

The SDK runs in one of two modes. The setup flow is identical in both: you always call create_sdk, then activate_api_key, then create_audio_processor, and wait for PipelineState.RUNNING. What differs is how you select the mode and what you get back:

	Audio Processing	Language Translation (LT)
Selected by	a model key on `AudioAttributes.model_name` (e.g. `SE2.2`, `AT5.2`)	the presence of `lt_config`; `model_name` left empty
What it does	speech enhancement, noise cancellation, accent conversion	translates speech from one language to another
Returns	processed audio frames only	translated audio frames plus transcript segments (via the `lt_config` callback)
Extra config	none	`LanguageTranslationConfig(language_in, language_out, …)`
Latency	low (~tens of ms) — short tail drain	higher (~seconds) — drain with a longer silence window

Everything else — activation, the pipeline-state callback, real-time 20 ms frame pacing — works the same way in both modes. Pick the tab that matches your use case in Step 4.

This page covers Language Translation inside the Python SDK (selected via lt_config). Sanas also offers a standalone, browser-friendly Language Translation API (WebRTC / WebSocket / JavaScript client) for real-time speech-to-speech in the browser — use that instead of the SDK if you’re building a web app.

Prerequisites

Before You Begin

You only need a supported CPython interpreter and pip — no extra tooling required. The wheel in your archive targets the Python version named in its filename:

Python version	Wheel tag	Notes
3.10	`cp310-cp310`	version-specific
3.11	`cp311-cp311`	version-specific
3.12, 3.13, 3.14	`cp312-abi3`	one Stable-ABI wheel covers 3.12+

Pick the archive whose <pytag> matches your interpreter (e.g. cp310 for Python 3.10; cp312 for Python 3.12 or newer).
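The table above can be expressed as a small lookup. This is a sketch, not an SDK utility: `wheel_tag` is a hypothetical helper name, and it encodes only the three rows listed in the table.

```python
import sys

def wheel_tag(version_info=sys.version_info):
    """Return the wheel tag from the table above for a CPython version."""
    major, minor = version_info[0], version_info[1]
    if major != 3 or minor < 10:
        raise RuntimeError(f"Python {major}.{minor} is not covered by the table")
    if minor >= 12:
        return "cp312-abi3"             # one Stable-ABI wheel covers 3.12+
    return f"cp3{minor}-cp3{minor}"     # version-specific wheels for 3.10 and 3.11

print(wheel_tag((3, 10)))   # cp310-cp310
print(wheel_tag((3, 13)))   # cp312-abi3
```

Calling `wheel_tag()` with no argument checks the interpreter that is currently running, which is useful inside the virtual environment you are about to install into.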

Get SDK Credentials

Step 1: Install the SDK

Your archive is laid out like this:

sanas-<version>-<pytag>-<os>-<arch>/
├── examples/                 # runnable examples + shared helpers
├── sanas-<version>-….whl     # the Python wheel (one, matching <pytag>)
└── USAGE.md

Create a virtual environment and install the wheel from inside the extracted archive directory.

# macOS / Linux
# Option A: the built-in venv module (use the python that matches the wheel tag)
python3.10 -m venv .venv
# Option B: uv instead of Option A (manages the Python version; --seed adds pip)
uv venv .venv --seed --python 3.10
. .venv/bin/activate
pip install --upgrade pip
pip install sanas-*.whl

# Windows (PowerShell)
py -3.10 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install (Get-ChildItem sanas-*.whl).FullName

Step 2: Verify Installation

python -c "import sanas; print('sanas', sanas.__version__)"
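If the import fails and you want to know whether the wheel was installed at all, the package metadata can be queried without importing the package. This is a sketch that assumes the distribution is named `sanas`, as the wheel filename suggests; `installed_version` is an illustrative helper, not an SDK function.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Distribution name assumed from the wheel filename (sanas-<version>-....whl).
version = installed_version("sanas")
print("sanas", version if version else "is not installed in this environment")
```

A `None` result usually means the wheel went into a different interpreter or virtual environment than the one you are running.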

Step 3: Select Your Model

For audio processing, set model_name to one of the model keys enabled for your account. Language Translation is selected differently — by the presence of an lt_config (see Step 4).

Category	Model keys
Agentic Speech Enhancement	`AGENTIC_ST_SE`, `AGENTIC_VI_GT_SE`, `AGENTIC_VI_G_SE`
Accent Translation (AT)	`AT5.2`
Speech Enhancement	`SE1.2`, `SE2.1`, `SE2.2`, `VI_G_SE`
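Because a mistyped model key only shows up once the pipeline tries to start, it can help to check the key against the table first. The sketch below copies the keys from the table above; `MODEL_KEYS` and `category_of` are hypothetical names, and the keys actually enabled for your account may be a subset.

```python
# Keys copied from the table above; your account may enable only some of them.
MODEL_KEYS = {
    "Agentic Speech Enhancement": ("AGENTIC_ST_SE", "AGENTIC_VI_GT_SE", "AGENTIC_VI_G_SE"),
    "Accent Translation (AT)": ("AT5.2",),
    "Speech Enhancement": ("SE1.2", "SE2.1", "SE2.2", "VI_G_SE"),
}

def category_of(model_key):
    """Return the category a model key belongs to, or raise on an unknown key."""
    for category, keys in MODEL_KEYS.items():
        if model_key in keys:
            return category
    raise ValueError(f"unknown model key: {model_key!r}")

print(category_of("SE2.2"))   # Speech Enhancement
print(category_of("AT5.2"))   # Accent Translation (AT)
```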

Step 4: Process Your First Audio Stream

Two things to understand before reading the code:

The pipeline initializes asynchronously. After you create a processor, wait for PipelineState.RUNNING (via the audio_pipeline_state_notify callback) before feeding frames.
process_frame is synchronous but expects real-time audio. Feed frames at the rate they would arrive live (one 20 ms frame every 20 ms). The snippets below pace with a monotonic-clock deadline so timing doesn’t drift.
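The monotonic-clock deadline idea can be isolated from the SDK entirely. The generator below is a sketch with a hypothetical name (`paced`); the clock and sleep functions are injectable only so the behavior can be checked without waiting in real time.

```python
import time

def paced(items, period, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per `period` seconds, without drift.

    The deadline advances by a fixed step, so time spent by the caller on
    each item is absorbed into the following sleep instead of accumulating.
    """
    deadline = clock()
    for item in items:
        yield item                      # caller does its work here
        deadline += period              # when the next item is due
        sleep(max(0.0, deadline - clock()))

# Five items in 20 ms slots; the loop body stands in for process_frame.
for n in paced(range(5), 0.020):
    pass
```

Sleeping for a fixed 20 ms after each frame would instead add the processing time on top of every slot, so a long file would gradually fall behind real time.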

The two tabs below show the only difference between the modes: Audio Processing sets a model_name; Language Translation leaves it empty and passes an lt_config instead. Compare them side by side.

Audio Processing

Create sdk_example.py and set model_name to one of your enabled model keys.

import array
import threading
import time
import sanas
from examples.wav_utils import read_wav, save_wav  # or copy these helpers

# 1. Create + activate (activation is synchronous and returns an SdkResult).
sdk = sanas.create_sdk(sanas.InitParams(storage_dir="./storage"))
res = sdk.activate_api_key("YOUR_API_KEY")
if not res.success:
    raise RuntimeError(f"activation failed ({res.error_type}): {res.message}")

# 2. Read input audio as interleaved float32.
float32_bytes, sample_rate, channels = read_wav("input.wav")
samples = array.array("f", float32_bytes)   # bytes initializer is decoded as float32

# 3. Get notified when the pipeline is ready. The state callback fires on a
#    background thread, so publish each state under a Condition and let the
#    main thread wait for a terminal state (RUNNING = go, NOT_RUNNING = failed).
cond = threading.Condition()
state = {"value": None}
def on_state(s):
    with cond:
        state["value"] = s
        cond.notify_all()

attrs = sanas.ProcessorAttributes(
    audio_attributes=sanas.AudioAttributes(
        sampling_rate=sample_rate,
        channels=channels,
        model_name="<your-model-key>",    # e.g. "AT5.2", "SE2.2", "VI_G_SE"
        audio_pipeline_state_notify=on_state,
    )
)

# 4. Feed 20 ms frames at real time; process_frame returns the processed frame.
frame_len = sample_rate * channels // 50      # samples in 20 ms
frame_period = 0.020                          # seconds per frame
out_bytes = []
_terminal = (sanas.PipelineState.RUNNING, sanas.PipelineState.NOT_RUNNING)
with sdk.create_audio_processor(attrs) as proc:
    with cond:                                # wait until the pipeline settles
        cond.wait_for(lambda: state["value"] in _terminal, timeout=30)
    if state["value"] != sanas.PipelineState.RUNNING:
        raise RuntimeError(f"pipeline did not start (state: {state['value']})")

    # `deadline` is the monotonic-clock time at which the NEXT frame is due.
    deadline = time.monotonic()
    # Only whole 20 ms frames are sent; a trailing partial frame is skipped.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = proc.process_frame(sanas.AudioFrame(samples=samples[i:i + frame_len]))
        if frame.frame_count:
            out_bytes.append(bytes(frame.samples))   # memoryview is transient; copy
        deadline += frame_period
        time.sleep(max(0.0, deadline - time.monotonic()))

    # Drain the buffered tail by pushing 10 frames (200 ms) of silence, still paced at real time.
    silence = array.array("f", bytes(frame_len * 4))
    for _ in range(10):
        frame = proc.process_frame(sanas.AudioFrame(samples=silence))
        if frame.frame_count:
            out_bytes.append(bytes(frame.samples))
        deadline += frame_period
        time.sleep(max(0.0, deadline - time.monotonic()))

save_wav("output.wav", b"".join(out_bytes), sample_rate, channels)

examples/helpers.py provides PipelineWaiter, sleep_until, and feed_and_drain, which wrap steps 3–4 above — prefer them over hand-rolling the loop.

Language Translation

Language Translation is selected by the presence of lt_config (not by a model key), so leave model_name empty. Translated audio comes back from process_frame; transcripts arrive on the lt_config callback.

import array
import threading
import time
import sanas
from examples.wav_utils import read_wav, save_wav

sdk = sanas.create_sdk(sanas.InitParams(storage_dir="./storage"))
res = sdk.activate_api_key("YOUR_API_KEY")
if not res.success:
    raise RuntimeError(f"activation failed ({res.error_type}): {res.message}")

float32_bytes, sample_rate, channels = read_wav("input.wav")
samples = array.array("f", float32_bytes)   # bytes initializer is decoded as float32

def on_transcript(tf):
    kind = "Translation" if tf.type_ == sanas.TranscriptType.TRANSLATION else "Transcription"
    for seg in tf.transcript_data_.complete:
        print(f"[{kind}] {seg.text}")

cond = threading.Condition()
state = {"value": None}
def on_state(s):
    with cond:
        state["value"] = s
        cond.notify_all()

attrs = sanas.ProcessorAttributes(
    audio_attributes=sanas.AudioAttributes(
        sampling_rate=sample_rate,
        channels=channels,
        model_name="",                    # LT is selected by lt_config, not a key
        audio_pipeline_state_notify=on_state,
    ),
    lt_config=sanas.LanguageTranslationConfig(
        language_in="en-US",
        language_out="es-ES",
        conversation_id="",               # optional; links two-party sessions
        callback=on_transcript,
    ),
)

frame_len = sample_rate * channels // 50      # samples in 20 ms
frame_period = 0.020                          # seconds per frame
out_bytes = []
_terminal = (sanas.PipelineState.RUNNING, sanas.PipelineState.NOT_RUNNING)
with sdk.create_audio_processor(attrs) as proc:
    with cond:
        cond.wait_for(lambda: state["value"] in _terminal, timeout=30)
    if state["value"] != sanas.PipelineState.RUNNING:
        raise RuntimeError(f"pipeline did not start (state: {state['value']})")

    deadline = time.monotonic()
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = proc.process_frame(sanas.AudioFrame(samples=samples[i:i + frame_len]))
        if frame.frame_count:
            out_bytes.append(bytes(frame.samples))
        deadline += frame_period
        time.sleep(max(0.0, deadline - time.monotonic()))

    # Translation adds several seconds of latency, so keep pushing silence for a
    # few seconds after the input ends to pull back the translated tail.
    silence = array.array("f", bytes(frame_len * 4))
    for _ in range(int(5.0 / frame_period)):  # ~5 s of silence
        frame = proc.process_frame(sanas.AudioFrame(samples=silence))
        if frame.frame_count:
            out_bytes.append(bytes(frame.samples))
        deadline += frame_period
        time.sleep(max(0.0, deadline - time.monotonic()))

save_wav("translated.wav", b"".join(out_bytes), sample_rate, channels)
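The on_transcript callback above prints each segment as it arrives. If you would rather keep the text for later, a callable object can collect it. This is a sketch under assumptions: the source does not say which thread invokes the callback, so the lock is defensive, and the stand-in objects below only imitate the `transcript_data_.complete` and `text` attributes used in the example.

```python
import threading
from types import SimpleNamespace

class TranscriptCollector:
    """Callable that stores completed segment text in arrival order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lines = []

    def __call__(self, tf):
        texts = [seg.text for seg in tf.transcript_data_.complete]
        with self._lock:                # callback thread is unspecified; be safe
            self._lines.extend(texts)

    def lines(self):
        with self._lock:
            return list(self._lines)    # copy so callers cannot mutate state

# Stand-in object shaped like the callback argument (not a real SDK type).
fake = SimpleNamespace(
    transcript_data_=SimpleNamespace(
        complete=[SimpleNamespace(text="hola"), SimpleNamespace(text="mundo")]
    )
)
collector = TranscriptCollector()
collector(fake)
print(collector.lines())   # ['hola', 'mundo']
```

Assuming the SDK accepts any callable for `callback`, an instance could be passed in place of `on_transcript` and read after the drain loop finishes.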

Because translation runs speech-to-text, translation, and text-to-speech end to end, it adds several seconds of latency. Keep the extended silence-drain loop (~5 s) so the translated tail is pulled back after your input ends — this is the main runtime difference from audio processing.
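The two drain lengths used on this page (10 frames for audio processing, about 5 s for translation) follow from the same arithmetic. The helper below is an illustrative sketch, not an SDK function; it works in integer milliseconds to avoid floating-point rounding when dividing by the frame period.

```python
def drain_frames(drain_ms, frame_ms=20):
    """Whole number of silence frames needed to cover `drain_ms` milliseconds."""
    if drain_ms < 0 or frame_ms <= 0:
        raise ValueError("drain_ms must be >= 0 and frame_ms must be > 0")
    return (drain_ms + frame_ms - 1) // frame_ms   # ceiling division

print(drain_frames(200))    # 10   (audio processing tail)
print(drain_frames(5000))   # 250  (language translation tail)
```

If translated audio is still cut off at the end, lengthening the drain window is the first thing to try; the right value depends on how long your final utterance is.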

Building for the browser instead of a Python backend? Use the standalone Language Translation API (WebRTC / WebSocket / JavaScript client) rather than the SDK path shown here.

Cloud inference tuning (optional)

When the SDK runs on cloud / remote inference (the default), you can pass a CloudInferencingParams on AudioAttributes.cloud_inferencing_params to tune the remote path. These parameters are only meaningful for cloud inference and are ignored for local inference. The field is optional — omit it to keep the defaults.

Field	Type	Default	Meaning
`use_pcm16`	`bool`	`False`	Negotiate L16 (raw 16-bit linear PCM) at the session sample rate instead of the rate-default codec. Preserves fidelity and avoids server-side transcoding, at the cost of more bandwidth.

cloud_params = sanas.CloudInferencingParams()
cloud_params.use_pcm16 = True          # opt into raw 16-bit linear PCM uplink
attrs = sanas.ProcessorAttributes(
    audio_attributes=sanas.AudioAttributes(
        sampling_rate=sample_rate,
        channels=channels,
        model_name="<your-model-key>",
        cloud_inferencing_params=cloud_params,   # optional; omit for defaults
        audio_pipeline_state_notify=on_state,
    )
)

Leave cloud_inferencing_params unset (or use_pcm16=False) unless you specifically need raw PCM uplink — the rate-default codec is the recommended default for most sessions.
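To put a number on "more bandwidth", the raw payload rate of 16-bit linear PCM is simply sample rate times channels times 16 bits. The sketch below covers payload only; packet and transport overhead are excluded, and the rate-default codec is not named in this guide, so no comparison figure is given.

```python
def l16_payload_kbps(sample_rate, channels):
    """Raw 16-bit linear PCM payload bitrate in kbit/s (no packet overhead)."""
    return sample_rate * channels * 16 / 1000

def l16_bytes_per_frame(sample_rate, channels, frame_ms=20):
    """Payload bytes in one frame of 16-bit PCM."""
    return sample_rate * channels * 2 * frame_ms // 1000

print(l16_payload_kbps(16000, 1))       # 256.0
print(l16_payload_kbps(48000, 2))       # 1536.0
print(l16_bytes_per_frame(16000, 1))    # 640
```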

Step 5: Run the Examples

The examples/ folder is self-contained (standard-library helpers only, no numpy). With your venv active, set the required environment variables and run an example from inside the extracted archive:

export SANAS_API_KEY="your-key"
export SANAS_STORAGE_DIR=./storage     # SDK data + logs (created if missing)
export SANAS_INPUT_WAV=input.wav       # 16-bit PCM or 32-bit float WAV
python examples/sdk_example.py                       # audio processing
python examples/language_translation_example.py      # language translation
python examples/multi_stream_example.py --streams 4  # concurrency benchmark

SANAS_STORAGE_DIR is where the SDK keeps its data and writes logs (under storage_dir/logs).
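If you adapt the examples into your own script, it helps to fail early when a variable is missing. The sketch below treats all three variables shown above as required, which is an assumption based on the commands in this step; `load_settings` is a hypothetical helper and is not taken from the examples folder.

```python
import os

# Variable names from the commands above; treating all three as required is an assumption.
REQUIRED = ("SANAS_API_KEY", "SANAS_STORAGE_DIR", "SANAS_INPUT_WAV")

def load_settings(env=os.environ):
    """Return the required settings as a dict, or raise listing what is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}

# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "SANAS_API_KEY": "your-key",
    "SANAS_STORAGE_DIR": "./storage",
    "SANAS_INPUT_WAV": "input.wav",
})
print(settings["SANAS_STORAGE_DIR"])   # ./storage
```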

Key Types

InitParams, create_sdk, Sdk, AudioAttributes, ProcessorAttributes, AudioProcessor, AudioFrame, LanguageTranslationConfig, TranscriptType, CloudInferencingParams, PipelineState, SdkResult.

Next Steps

Looking for more examples? Try the Tutorials/Examples:

Processing Multiple Streams

Scale to multiple concurrent audio streams with a shared SDK instance.

Need Help?

Email Support

support@sanas.ai (response time: 1 business day)

Support Portal

Raise a support ticket for urgent issues
