For Windows-based meeting transcription tools, WASAPI (Windows Audio Session API) loopback capture is the standard mechanism for capturing system audio. Unlike microphone capture, loopback recording captures the audio output — everything you hear through your speakers or headphones. Here's how it works at a technical level.

WASAPI Architecture Overview

WASAPI is the interface applications use to reach the Windows audio engine, which mixes shared-mode streams before they go to the hardware:

Application (Zoom/Teams) → WASAPI → Audio Engine → Audio Driver → Audio Hardware
                                         ↓
                            Loopback Capture (your tool)

In shared mode, WASAPI allows multiple applications to share the audio endpoint. Loopback capture taps into the mixed output of all applications playing through a given audio device.

Implementation in Python

Using PyAudioWPatch (a fork of PyAudio that adds WASAPI loopback support, imported as pyaudiowpatch):

import pyaudiowpatch as pyaudio
import numpy as np

def find_loopback_device(p: pyaudio.PyAudio):
    # Find the default loopback device for WASAPI
    wasapi_info = p.get_host_api_info_by_type(pyaudio.paWASAPI)

    default_speakers = p.get_device_info_by_index(
        wasapi_info["defaultOutputDevice"]
    )

    for i in range(p.get_device_count()):
        device = p.get_device_info_by_index(i)
        if (device.get("isLoopbackDevice")
            and device["name"].startswith(default_speakers["name"])):
            return device

    raise RuntimeError("No loopback device found")

def capture_audio():
    p = pyaudio.PyAudio()
    device = find_loopback_device(p)

    stream = p.open(
        format=pyaudio.paInt16,
        channels=device["maxInputChannels"],
        rate=int(device["defaultSampleRate"]),
        input=True,
        input_device_index=device["index"],
        frames_per_buffer=512,  # Low latency buffer
    )

    print(f"Capturing from: {device['name']}")
    print(f"Sample rate: {device['defaultSampleRate']} Hz")
    print(f"Channels: {device['maxInputChannels']}")

    try:
        while True:
            data = stream.read(512, exception_on_overflow=False)
            audio_array = np.frombuffer(data, dtype=np.int16)
            # Process or forward the audio...
            yield data
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
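
A generator like capture_audio() is easiest to sanity-check by dumping a few seconds to a WAV file and listening to it. The following is a minimal sketch using only the standard library; write_wav and its parameters are hypothetical names, not part of PyAudioWPatch, and it assumes 16-bit PCM chunks as produced above.

```python
import wave
from typing import Iterable


def write_wav(chunks: Iterable[bytes], path: str, channels: int,
              rate: int, max_seconds: float = 10.0) -> int:
    """Write 16-bit PCM chunks to a WAV file; return the frames written."""
    bytes_per_frame = 2 * channels  # paInt16 is 2 bytes per sample
    max_frames = int(max_seconds * rate)
    frames_written = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            frames_written += len(chunk) // bytes_per_frame
            if frames_written >= max_frames:
                break  # stop after roughly max_seconds of audio
    return frames_written
```

Because capture_audio() does not return the device's channel count or sample rate, the caller would need to look them up separately (for example via find_loopback_device) and pass them in. If the loop exits early, calling close() on the generator runs its finally block and releases the stream.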

Buffer Size and Latency Trade-offs

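Latency here is the duration of one buffer (frames / sample rate) at 48 kHz:

Buffer Size    Latency    CPU Usage    Reliability
128 frames     ~3 ms      High         May overflow
512 frames     ~11 ms     Moderate     Good
1024 frames    ~21 ms     Low          Excellent
4096 frames    ~85 ms     Very low     Excellent

At the typical 48 kHz sample rate, 512 frames (about 11 ms of capture latency) is a good default.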
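
The latency of a buffer is simply its length in frames divided by the sample rate. A small sketch (buffer_latency_ms is a hypothetical helper name) makes it easy to recompute the figures for whatever rate your device reports:

```python
def buffer_latency_ms(frames: int, sample_rate: int) -> float:
    """Duration of one buffer in milliseconds."""
    return frames / sample_rate * 1000.0


for frames in (128, 512, 1024, 4096):
    latency = buffer_latency_ms(frames, 48000)
    print(f"{frames:>5} frames @ 48 kHz: {latency:.1f} ms")
```

This covers only the buffer itself; driver and engine overhead add to the end-to-end delay. At 44.1 kHz the same buffers last slightly longer (1024 frames is about 23 ms).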

Handling Common Issues

Sample Rate Mismatch

The loopback device's sample rate matches the system's audio output format, which is often 48kHz. If your ASR provider expects 16kHz, you'll need to resample:

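import librosa
import numpy as np

def resample_audio(audio_data, original_rate=48000, target_rate=16000):
    # Expects mono int16 samples; downmix interleaved stereo before calling this
    audio_float = audio_data.astype(np.float32) / 32768.0
    resampled = librosa.resample(audio_float, orig_sr=original_rate, target_sr=target_rate)
    # Clip before converting back so full-scale peaks cannot wrap around in int16
    return np.clip(resampled * 32768.0, -32768, 32767).astype(np.int16)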
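
Resampling every 512-frame read on its own can introduce small discontinuities at chunk boundaries, because the resampling filter has no context from neighboring chunks. One common mitigation, shown here as an assumption rather than a description of any particular pipeline, is to batch reads into longer blocks first. ChunkAccumulator and block_frames are hypothetical names:

```python
from typing import List


class ChunkAccumulator:
    """Collects small PCM chunks and releases fixed-size blocks."""

    def __init__(self, block_frames: int, channels: int = 1,
                 sample_width: int = 2):
        # Size of one output block in bytes (16-bit samples by default)
        self.block_bytes = block_frames * channels * sample_width
        self._buffer = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a chunk; return every complete block now available."""
        self._buffer.extend(chunk)
        blocks = []
        while len(self._buffer) >= self.block_bytes:
            blocks.append(bytes(self._buffer[:self.block_bytes]))
            del self._buffer[:self.block_bytes]
        return blocks
```

With block_frames=4800 at 48 kHz, each released block holds 100 ms of audio, which trades a little extra latency for fewer block boundaries.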

Channel Downmixing

System audio is often stereo (2 channels), but ASR providers typically expect mono. Downmix by averaging the channels:

def stereo_to_mono(stereo_data):
    stereo = np.frombuffer(stereo_data, dtype=np.int16)
    left = stereo[0::2]
    right = stereo[1::2]
    mono = ((left.astype(np.int32) + right.astype(np.int32)) // 2).astype(np.int16)
    return mono.tobytes()
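
Loopback devices are not always stereo; a surround output can report six or eight channels. The same averaging idea generalizes to any interleaved channel count. This is a minimal sketch with a hypothetical downmix_to_mono helper, and it weights every channel equally, which is a simplification compared with how surround mixes are usually folded down:

```python
import numpy as np


def downmix_to_mono(data: bytes, channels: int) -> bytes:
    """Average interleaved int16 channels into a single mono channel."""
    samples = np.frombuffer(data, dtype=np.int16)
    frames = samples.reshape(-1, channels)  # one row per audio frame
    # Sum in int32 so adding several channels cannot overflow int16
    mono = frames.astype(np.int32).sum(axis=1) // channels
    return mono.astype(np.int16).tobytes()
```

Passing channels=device["maxInputChannels"] keeps the downmix in step with whatever channel count the stream was opened with.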

How Voxclar handles this: Voxclar's Windows audio capture module takes care of device discovery, sample rate conversion, channel downmixing, and buffer management automatically. Users just click "Start" and the audio pipeline does the rest.

Exclusive vs. Shared Mode

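WASAPI supports two modes:

Shared mode: the application's stream passes through the Windows audio engine, which mixes it with the streams of other applications using the same endpoint. Loopback capture is only available for shared-mode streams.

Exclusive mode: a single application takes direct control of the endpoint and bypasses the audio engine, usually to get lower latency. Audio played this way does not go through the engine's mix, so a loopback stream generally cannot capture it.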

macOS Equivalent: Core Audio Taps

On macOS, the equivalent technology is Core Audio process taps (available since macOS 14.2). While the API is different, the concept is the same: tapping into the audio output of specific applications without affecting playback. Read more about the cross-platform challenges in our Electron development guide.

"WASAPI loopback is one of those APIs that's simple in concept but tricky in practice. The buffer management and sample rate handling are where most developers get stuck." — Audio Engineer, Voxclar

For more on the complete audio pipeline, see our guide to real-time speech-to-text for meetings and our AI interview assistant technical guide.