# FunASR Python Client

[![PyPI version](https://badge.fury.io/py/funasr-python.svg)](https://badge.fury.io/py/funasr-python)
[![Python versions](https://img.shields.io/pypi/pyversions/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![License](https://img.shields.io/pypi/l/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![Tests](https://github.com/your-org/funasr-python/workflows/Tests/badge.svg)](https://github.com/your-org/funasr-python/actions)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

A high-performance, enterprise-grade Python client for the FunASR WebSocket speech recognition service. Built for production use with comprehensive error handling, automatic reconnection, and extensive customization options.

## 📖 Table of Contents

### Getting Started
- [Features](#features) - What this client offers
- [Installation](#installation) - Get up and running
- [Quick Start](#quick-start) - 5-minute tutorial
- [Common Use Cases](#common-use-cases) - Ready-to-use examples for your scenario

### Core Concepts
- [Recognition Mode Selection](#recognition-mode-selection-guide) - Choose the right mode
- [Configuration](#configuration) - Environment variables & settings
- [Error Handling](#error-handling) - Exception handling
- [Troubleshooting](#troubleshooting) - Fix common issues

### Advanced Topics
- [Advanced Usage](#advanced-usage) - Custom configurations, callbacks, streaming
- [Performance Optimization](#performance-optimization) - Tuning for production
- [API Reference](#api-reference) - Complete API documentation
- [Command Line Interface](#command-line-interface) - CLI usage

### Development
- [Testing](#testing) - Run tests
- [Development](#development) - Contribute to the project
- [Documentation & Guides](#documentation--guides) - Additional resources

## Features

### 🚀 **High Performance**
- **Asynchronous I/O**: Built on asyncio for maximum concurrency
- **Connection Pooling**: Efficient WebSocket connection management
- **Streaming Recognition**: Real-time speech recognition with minimal latency
- **Memory Efficient**: Optimized audio processing with configurable buffering

### 🔧 **Production Ready**
- **Robust Error Handling**: Comprehensive exception handling and recovery
- **Automatic Reconnection**: Smart reconnection with exponential backoff
- **Health Monitoring**: Built-in connection health checks
- **Resource Management**: Automatic cleanup and resource deallocation

### 📊 **Recognition Modes for Different Scenarios**
- **Offline Mode**: Best for complete audio files, highest accuracy
- **Online Mode**: Ultra-low latency streaming, suitable for interactive applications
- **Two-Pass Mode** ⭐: **Recommended for real-time scenarios** - combines streaming speed with offline accuracy

### 🎯 **Enterprise Features**
- **Configuration Management**: Flexible configuration with .env support
- **Comprehensive Logging**: Structured logging with configurable levels
- **Metrics & Monitoring**: Built-in performance metrics
- **Type Safety**: Full type hints for better IDE support

### 🎵 **Audio Processing**
- **Multiple Formats**: Support for WAV, FLAC, MP3, and more
- **Automatic Resampling**: Smart audio format conversion
- **Voice Activity Detection**: Optional VAD for improved efficiency
- **Microphone Integration**: Real-time microphone recording support

## Installation

### Basic Installation

```bash
pip install funasr-python
```

### With Optional Dependencies

```bash
# Audio processing capabilities
# (quotes keep shells such as zsh from expanding the brackets)
pip install "funasr-python[audio]"

# Performance optimizations
pip install "funasr-python[performance]"

# Development tools
pip install "funasr-python[dev]"

# Everything
pip install "funasr-python[all]"
```

### From Source

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python
pip install -e .
```

## Quick Start

### Step 1: Install

```bash
pip install funasr-python
```

### Step 2: Recognize Your First Audio File

```python
import asyncio
from funasr_client import AsyncFunASRClient

async def main():
    # Create client with default settings (Two-Pass mode)
    client = AsyncFunASRClient(
        server_url="ws://localhost:10095"  # Your FunASR server
    )
    
    # Recognize an audio file
    result = await client.recognize_file("path/to/audio.wav")
    print(f"Recognition result: {result.text}")
    
    await client.close()

if __name__ == "__main__":
    asyncio.run(main())
```

**That's it!** You've just transcribed your first audio file. 🎉

### Step 3: Choose the Right Mode for Your Use Case

The client supports three recognition modes. See [Recognition Mode Selection Guide](#recognition-mode-selection-guide) to choose the best one for your scenario:

- 🎯 **Two-Pass Mode** (Default, Recommended) - Best balance for real-time apps
- ⚡ **Online Mode** - Lowest latency for interactive apps  
- 🎓 **Offline Mode** - Highest accuracy for batch processing

### Next Steps

- 📖 [Common Use Cases](#common-use-cases) - See complete examples for your scenario
- ⚙️ [Configuration Guide](#configuration) - Customize behavior
- 🔧 [Advanced Topics](#advanced-usage) - Streaming, callbacks, and more

## Common Use Cases

This section provides complete, ready-to-use examples for common scenarios.

### Batch File Transcription

Process multiple audio files efficiently:

```python
import asyncio
from funasr_client import AsyncFunASRClient

async def batch_transcribe():
    client = AsyncFunASRClient(server_url="ws://localhost:10095")
    
    files = ["file1.wav", "file2.wav", "file3.wav"]
    
    # Process files concurrently
    tasks = [client.recognize_file(f) for f in files]
    results = await asyncio.gather(*tasks)
    
    for filename, result in zip(files, results):
        print(f"{filename}: {result.text}")
    
    await client.close()

asyncio.run(batch_transcribe())
```
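
`asyncio.gather` starts every task at once, which may be more load than a server can take for a long file list. A minimal sketch of capping concurrency with a semaphore follows. `gather_bounded` and `fake_recognize` are hypothetical names used only for illustration; in practice `client.recognize_file(path)` would take the place of `fake_recognize`.

```python
import asyncio


async def gather_bounded(coros, limit):
    """Run coroutines concurrently, but at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        async with semaphore:
            return await coro

    # gather() returns results in the same order as its inputs
    return await asyncio.gather(*(run(c) for c in coros))


async def fake_recognize(path):
    """Hypothetical stand-in for client.recognize_file(path)."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


async def demo():
    files = [f"file{i}.wav" for i in range(1, 6)]
    return await gather_bounded((fake_recognize(f) for f in files), limit=2)


results = asyncio.run(demo())
print(results)
```

A suitable `limit` depends on server capacity, so treat the value `2` as a placeholder.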

### Real-time Customer Service (Audio Stream)

Stream audio from customer service calls for real-time transcription:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def customer_service_transcription():
    """Real-time transcription for customer service calls."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,  # Best for real-time + accuracy
        enable_vad=True,                # Detect speech/silence
        chunk_interval=10               # Balanced latency
    )
    
    client = AsyncFunASRClient(config=config)
    
    # Callback to handle transcription results
    def on_partial(result):
        # Show live transcription to agent
        print(f"[LIVE] {result.text}")
    
    def on_final(result):
        # Save to database for quality assurance
        # (save_to_qa_system is a placeholder for your own persistence code)
        save_to_qa_system(result.text, result.confidence)
        print(f"[FINAL] {result.text} (confidence: {result.confidence:.2f})")
    
    callback = SimpleCallback(on_partial=on_partial, on_final=on_final)
    
    # Simulate audio stream from telephony system
    async def audio_stream_from_call():
        """Stream audio chunks from phone call (e.g., WebRTC, SIP)."""
        # In production, this would be from:
        # - WebRTC media stream
        # - SIP/RTP packets
        # - Twilio/Asterisk audio feed
        
        # Example: Read from audio buffer or network socket
        import wave
        with wave.open("customer_call.wav", 'rb') as wav:
            frames_per_chunk = 1600  # 100ms at 16kHz (3200 bytes of 16-bit mono)
            while True:
                chunk = wav.readframes(frames_per_chunk)
                if not chunk:
                    break
                yield chunk
                await asyncio.sleep(0.1)  # Real-time simulation
    
    await client.start()
    await client.recognize_stream(audio_stream_from_call(), callback)
    await client.close()

asyncio.run(customer_service_transcription())
```
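
The chunk sizes in the streaming examples follow from simple arithmetic on the audio format. The sketch below works it out for raw PCM; `pcm_chunk_size` is a hypothetical helper, not part of the client.

```python
def pcm_chunk_size(sample_rate=16000, sample_width=2, channels=1, duration_ms=100):
    """Return (frames, bytes) for one chunk of raw PCM audio."""
    frames = sample_rate * duration_ms // 1000
    return frames, frames * sample_width * channels


frames, num_bytes = pcm_chunk_size()
print(frames, num_bytes)  # 1600 3200
```

So 100 ms of 16 kHz, 16-bit mono audio is 1600 frames, or 3200 bytes. `wave.readframes()` counts frames, while reads from a raw byte stream count bytes.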

### Live Meeting/Conference Transcription

Real-time transcription for online meetings, with a wall-clock timestamp on each transcript line:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def meeting_transcription():
    """Real-time meeting transcription with timestamp."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,
        enable_vad=True,
        chunk_interval=10
    )
    
    client = AsyncFunASRClient(config=config)
    
    transcript_buffer = []
    
    def on_final(result):
        from datetime import datetime
        timestamp = datetime.now().strftime("%H:%M:%S")
        line = f"[{timestamp}] {result.text}"
        transcript_buffer.append(line)
        print(line)
    
    callback = SimpleCallback(on_final=on_final)
    
    # Stream audio from meeting platform (Zoom, Teams, etc.)
    async def meeting_audio_stream():
        """Stream audio from meeting platform API."""
        # In production, integrate with:
        # - Zoom SDK: https://marketplace.zoom.us/docs/sdk/native-sdks/audio
        # - Teams Bot: https://docs.microsoft.com/en-us/microsoftteams/platform/bots/calls-and-meetings/
        # - Agora: https://docs.agora.io/en/voice-call-4.x-preview/landing-page
        
        # Example: Streaming from audio input device
        import pyaudio
        
        CHUNK = 1600  # 100ms at 16kHz
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000
        
        p = pyaudio.PyAudio()
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        
        try:
            print("🎤 Meeting recording started...")
            while True:
                data = await asyncio.get_running_loop().run_in_executor(
                    None, stream.read, CHUNK
                )
                yield data
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
            
            # Save transcript to file
            with open("meeting_transcript.txt", "w") as f:
                f.write("\n".join(transcript_buffer))
            print(f"\n📝 Transcript saved: {len(transcript_buffer)} lines")
    
    await client.start()
    await client.recognize_stream(meeting_audio_stream(), callback)
    await client.close()

asyncio.run(meeting_transcription())
```

### Voice Command Recognition (Streaming)

Low-latency streaming recognition for voice-controlled IoT devices:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def voice_control_device():
    """Voice commands for smart home devices."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.ONLINE,  # Lowest latency for commands
        chunk_interval=5,
        enable_vad=True
    )
    
    client = AsyncFunASRClient(config=config)
    
    def execute_command(result):
        if not result.is_final:
            return
        
        command = result.text.lower()
        print(f"Heard: {command}")
        
        # Command matching
        if "turn on" in command and "light" in command:
            print("✅ Turning on lights")
            # control_device("light", "on")
        elif "turn off" in command and "light" in command:
            print("✅ Turning off lights")
        elif "temperature" in command:
            print("🌡️  Current temperature: 22°C")
        elif "play music" in command:
            print("🎵 Starting music playback")
        else:
            print("❓ Command not recognized")
    
    callback = SimpleCallback(on_final=execute_command)
    
    # Stream from microphone with wake word detection
    async def voice_stream_with_wakeword():
        """Stream audio only after wake word detected."""
        import pyaudio
        
        CHUNK = 1600
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000
        
        p = pyaudio.PyAudio()
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        
        print("🎤 Say 'Hey Assistant' to start...")
        
        try:
            while True:
                # In production, use wake word detection here
                # e.g., Porcupine, Snowboy, or custom model
                
                data = await asyncio.get_running_loop().run_in_executor(
                    None, stream.read, CHUNK
                )
                yield data
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
    
    await client.start()
    await client.recognize_stream(voice_stream_with_wakeword(), callback)
    await client.close()

asyncio.run(voice_control_device())
```

### Live Broadcast Subtitle Generation

Generate real-time subtitles for live streaming platforms:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def live_broadcast_subtitles():
    """Generate real-time subtitles for live streams."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,
        enable_vad=True,
        chunk_interval=8  # Balance between latency and accuracy
    )
    
    client = AsyncFunASRClient(config=config)
    
    subtitle_queue = []
    
    def on_partial(result):
        # Show live preview (may change)
        print(f"[PREVIEW] {result.text}", end='\r')
    
    def on_final(result):
        # Send to subtitle overlay system
        subtitle = {
            'text': result.text,
            'confidence': result.confidence,
            'timestamp': result.timestamp
        }
        subtitle_queue.append(subtitle)
        
        # Push to OBS, RTMP overlay, or subtitle service
        send_to_subtitle_overlay(subtitle)
        print(f"\n[SUBTITLE] {result.text}")
    
    callback = SimpleCallback(on_partial=on_partial, on_final=on_final)
    
    # Stream from broadcast source
    async def broadcast_audio_stream():
        """Stream audio from broadcast source (RTMP, HLS, etc.)."""
        # In production, integrate with:
        # - FFmpeg for RTMP streams
        # - OBS WebSocket for local capture
        # - Media server APIs (Wowza, Ant Media, etc.)
        
        # Example: Stream from RTMP using FFmpeg subprocess
        import subprocess
        
        ffmpeg_command = [
            'ffmpeg',
            '-i', 'rtmp://live-server/stream/key',  # Input stream
            '-f', 's16le',          # Output format: signed 16-bit little-endian
            '-ar', '16000',         # Sample rate: 16kHz
            '-ac', '1',             # Channels: mono
            '-'                     # Output to stdout
        ]
        
        process = subprocess.Popen(
            ffmpeg_command,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        
        chunk_size = 3200  # 100ms of 16-bit mono at 16kHz
        
        try:
            while True:
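                # Read in a worker thread so the blocking pipe read
                # does not stall the event loop
                chunk = await asyncio.get_running_loop().run_in_executor(
                    None, process.stdout.read, chunk_size
                )
                if not chunk:
                    break
                yield chunk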
        finally:
            process.terminate()
    
    await client.start()
    
    try:
        await client.recognize_stream(broadcast_audio_stream(), callback)
    finally:
        # Runs when the stream ends and also on Ctrl+C or task cancellation
        print("\n\n📝 Broadcast ended. Saving subtitles...")
        # Save to SRT file
        save_to_srt_file(subtitle_queue, "broadcast_subtitles.srt")
        await client.close()

def send_to_subtitle_overlay(subtitle):
    """Send subtitle to overlay system (OBS, WebSocket, etc.)."""
    # Example: Send to OBS via WebSocket
    # obs_client.send_command("SetTextGDIPlusText", {"text": subtitle['text']})
    pass

def save_to_srt_file(subtitles, filename):
    """Save subtitles to SRT format."""
    with open(filename, 'w', encoding='utf-8') as f:
        for i, sub in enumerate(subtitles, 1):
            # Placeholder timing: real SRT cues need per-subtitle start/end times
            f.write(f"{i}\n")
            f.write("00:00:00,000 --> 00:00:05,000\n")
            f.write(f"{sub['text']}\n\n")

asyncio.run(live_broadcast_subtitles())
```
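
The simplified `save_to_srt_file` above writes the same fixed time range for every cue. If start and end times in seconds are available for each subtitle, they could be formatted roughly as follows. This is an assumption: the format of `result.timestamp` is not specified here, and `srt_timestamp` and `format_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, 1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)


print(srt_timestamp(3661.5))  # 01:01:01,500
print(format_srt([(0, 1.25, "Hello"), (1.25, 3, "world")]))
```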

### Podcast/Meeting Transcription (File-based)

High-accuracy transcription for long-form content:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode

async def transcribe_podcast():
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.OFFLINE,  # Highest accuracy
        enable_itn=True  # Normalize numbers, dates, etc.
    )
    
    client = AsyncFunASRClient(config=config)
    
    result = await client.recognize_file("podcast_episode.wav")
    
    # Save to file
    with open("transcript.txt", "w") as f:
        f.write(result.text)
    
    print(f"Transcription saved. Confidence: {result.confidence:.2%}")
    
    await client.close()

asyncio.run(transcribe_podcast())
```

### Environment Configuration

Load settings from `.env` file for easy deployment:

```bash
# .env file
FUNASR_WS_URL=ws://production-server:10095
FUNASR_MODE=2pass
FUNASR_ENABLE_VAD=true
```

```python
import asyncio
from funasr_client import create_async_client

async def use_env_config():
    # Automatically loads from .env
    client = create_async_client()
    
    result = await client.recognize_file("audio.wav")
    print(result.text)
    
    await client.close()

asyncio.run(use_env_config())
```

## Recognition Mode Examples

Example configurations for common real-time scenarios:

### Real-time Recognition (Microphone)

For real-time applications, we recommend **Two-Pass Mode** which provides the best balance of speed and accuracy:

```python
import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.models import RecognitionMode, ClientConfig

async def realtime_recognition():
    # Two-Pass Mode: Optimal for real-time scenarios
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,  # Recommended for real-time
        enable_vad=True,  # Voice activity detection
        chunk_interval=10  # Balanced latency/accuracy
    )

    client = AsyncFunASRClient(config=config)
    
    # recognize_file() is used here to test the configuration with a sample file
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(f"Recognition: {result.text}")
    
    await client.close()

if __name__ == "__main__":
    asyncio.run(realtime_recognition())
```

### Ultra-Low Latency (Interactive Applications)

For scenarios requiring minimal latency (e.g., voice assistants):

```python
async def ultra_low_latency():
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.ONLINE,  # Ultra-low latency
        chunk_interval=5,  # Faster processing
        enable_vad=True
    )

    client = AsyncFunASRClient(config=config)
    
    # recognize_file() is used here to test the configuration with a sample file
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(f"Recognition: {result.text}")
    
    await client.close()
```

### Configuration with Environment Variables

Create a `.env` file:

```env
FUNASR_WS_URL=ws://localhost:10095
# Recommended: Two-Pass Mode for optimal real-time performance
FUNASR_MODE=2pass
FUNASR_SAMPLE_RATE=16000
FUNASR_ENABLE_ITN=true
# Recommended for real-time scenarios
FUNASR_ENABLE_VAD=true
```

```python
import asyncio
from funasr_client import create_async_client

async def main():
    # Configuration is loaded automatically from .env
    # Note: create_async_client() is a synchronous function, so it is not awaited
    client = create_async_client()
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(result.text)
    await client.close()

asyncio.run(main())
```

## Advanced Usage

### Custom Configuration

```python
from funasr_client import AsyncFunASRClient, ClientConfig, AudioConfig
from funasr_client.models import RecognitionMode, AudioFormat

config = ClientConfig(
    server_url="ws://your-server:10095",  # Must specify server URL
    mode=RecognitionMode.TWO_PASS,
    timeout=30.0,
    max_retries=3,
    audio=AudioConfig(
        sample_rate=16000,
        format=AudioFormat.PCM,
        channels=1
    )
)

client = AsyncFunASRClient(config=config)
```

### Callback Handlers

```python
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import SimpleCallback

def on_result(result):
    print(f"Received: {result.text}")

def on_error(error):
    print(f"Error: {error}")

callback = SimpleCallback(
    on_result=on_result,
    on_error=on_error
)

client = AsyncFunASRClient(callback=callback)
```

### Multiple Recognition Sessions

```python
async def recognize_multiple():
    # Use Two-Pass Mode for optimal performance
    client = AsyncFunASRClient(
        mode=RecognitionMode.TWO_PASS  # ⭐ Recommended
    )

    # Process multiple files concurrently
    tasks = [
        client.recognize_file("examples/audio/asr_example.wav"),
        client.recognize_file("examples/audio/61-70970-0001.wav"),
        client.recognize_file("examples/audio/61-70970-0016.wav")
    ]

    results = await asyncio.gather(*tasks)
    for i, result in enumerate(results, 1):
        print(f"File {i}: {result.text}")

    await client.close()
```

### Real-time Applications Examples

#### Live Streaming Transcription

```python
async def live_transcription():
    """Real-time transcription for live streams."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Optimal for live streaming
        enable_vad=True,                # Filter silence
        chunk_interval=8,               # Balanced performance
        auto_reconnect=True             # Handle network issues
    )

    client = AsyncFunASRClient(config=config)

    def on_result(result):
        if result.is_final:
            # Send to subtitle system
            send_subtitle(result.text, result.confidence)
        else:
            # Show live preview
            show_live_text(result.text)

    from funasr_client.callbacks import SimpleCallback
    callback = SimpleCallback(on_final=on_result, on_partial=on_result)

    await client.start()
    session = await client.start_realtime(callback)

    # Your audio streaming implementation here
    await stream_audio_to_session(session)
```

#### Voice Assistant Integration

```python
async def voice_assistant():
    """Voice assistant with Two-Pass optimization."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Best for voice assistants
        enable_vad=True,                # Automatic speech detection
        chunk_interval=10               # Good responsiveness
    )

    client = AsyncFunASRClient(config=config)

    async def process_command(result):
        if result.is_final and result.confidence > 0.8:
            # Process voice command
            response = await process_voice_command(result.text)
            await speak_response(response)

    from funasr_client.callbacks import AsyncSimpleCallback
    callback = AsyncSimpleCallback(on_final=process_command)

    await client.start()
    session = await client.start_realtime(callback)

    print("🎤 Voice assistant ready. Speak now...")
    # Your microphone streaming logic here
```

## Command Line Interface

The package includes a full-featured CLI:

```bash
# Basic recognition
funasr-client recognize examples/audio/asr_example.wav

# Real-time recognition from microphone
funasr-client stream --source microphone

# Batch processing
funasr-client batch examples/audio/*.wav --output results.jsonl

# Server configuration
funasr-client configure --server-url ws://localhost:10095

# Test connection
funasr-client test-connection
```

## Recognition Mode Selection Guide

Choose the optimal recognition mode for your use case:

| Mode | Latency | Accuracy | Best For | Use Cases |
|------|---------|----------|----------|-----------|
| **Two-Pass** ⭐ | Medium | **High** | **Real-time applications** | Live streaming, real-time subtitles, voice assistants |
| **Online** | **Low** | Medium | Interactive apps | Voice commands, quick responses |
| **Offline** | High | **Highest** | File processing | Transcription services, post-processing |

### Two-Pass Mode Advantages ⭐

**Recommended for real-time scenarios** because it provides:

- ✅ **Fast partial results** for immediate user feedback (Phase 1: Online)
- ✅ **High-accuracy final results** using 2-pass optimization (Phase 2: Offline)
- ✅ **Balanced resource usage** with smart buffering
- ✅ **Production-ready** with robust error handling

```python
# Recommended configuration for real-time applications
config = ClientConfig(
    mode=RecognitionMode.TWO_PASS,  # Best balance
    enable_vad=True,                # Improves efficiency
    chunk_interval=10,              # Optimal for most cases
    auto_reconnect=True             # Production reliability
)
```
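
Because Two-Pass mode delivers a fast partial result and later a more accurate final one, a display layer usually has to replace the preview rather than append to it. A minimal sketch of that bookkeeping follows. `TranscriptView` is a hypothetical class, and the `SimpleNamespace` objects stand in for the client's result objects, of which only the `.text` attribute is assumed.

```python
from types import SimpleNamespace


class TranscriptView:
    """Keeps committed (final) text separate from the in-progress partial."""

    def __init__(self):
        self.final_segments = []
        self.partial = ""

    def on_partial(self, result):
        # Partial text may still change, so it only replaces the preview
        self.partial = result.text

    def on_final(self, result):
        # The final result supersedes whatever preview was showing
        self.final_segments.append(result.text)
        self.partial = ""

    def render(self):
        tail = [self.partial] if self.partial else []
        return " ".join(self.final_segments + tail)


view = TranscriptView()
view.on_partial(SimpleNamespace(text="hello wor"))
view.on_final(SimpleNamespace(text="Hello world."))
view.on_partial(SimpleNamespace(text="how are"))
print(view.render())  # Hello world. how are
```

The two methods could be passed as `on_partial` and `on_final` to `SimpleCallback`, in the same way as the callbacks in the examples above.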

> ⚠️ **Important**: To ensure you receive **both** partial (online) and final (offline) results in Two-Pass mode:
> - ✅ Use `recognize_file()` for complete audio files (handles end-of-speech automatically)
> - ✅ Call `end_realtime_session()` after each utterance in streaming scenarios
> - ✅ Enable VAD (`enable_vad=True`) for better speech boundary detection
> - ✅ Include sufficient silence (0.5-1s) at the end of speech segments
> 
> 📖 **See detailed guide**: [Two-Pass Best Practices](docs/TWO_PASS_BEST_PRACTICES_zh.md)

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `FUNASR_WS_URL` | WebSocket server URL | `ws://localhost:10095` |
| `FUNASR_MODE` | Recognition mode (`offline`, `online`, `2pass`) | `2pass` ⭐ |
| `FUNASR_TIMEOUT` | Connection timeout (seconds) | `30.0` |
| `FUNASR_MAX_RETRIES` | Max retry attempts | `3` |
| `FUNASR_SAMPLE_RATE` | Audio sample rate | `16000` |
| `FUNASR_ENABLE_ITN` | Enable inverse text normalization | `true` |
| `FUNASR_ENABLE_VAD` | Enable voice activity detection | `true` |
| `FUNASR_DEBUG` | Enable debug logging | `false` |

> 💡 **Tip**: Two-Pass Mode (`2pass`) is recommended for most real-time applications as it provides the best balance between latency and accuracy.
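
To make the table concrete, the sketch below reads the same variables with the documented defaults. It is not the library's own loader: `load_settings`, the dictionary keys, and the accepted boolean spellings are assumptions made for illustration.

```python
import os


def _as_bool(value):
    """Interpret common truthy spellings (an assumption, not the client's rule)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Read FUNASR_* variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        "server_url": env.get("FUNASR_WS_URL", "ws://localhost:10095"),
        "mode": env.get("FUNASR_MODE", "2pass"),
        "timeout": float(env.get("FUNASR_TIMEOUT", "30.0")),
        "max_retries": int(env.get("FUNASR_MAX_RETRIES", "3")),
        "sample_rate": int(env.get("FUNASR_SAMPLE_RATE", "16000")),
        "enable_itn": _as_bool(env.get("FUNASR_ENABLE_ITN", "true")),
        "enable_vad": _as_bool(env.get("FUNASR_ENABLE_VAD", "true")),
        "debug": _as_bool(env.get("FUNASR_DEBUG", "false")),
    }


settings = load_settings({"FUNASR_MODE": "online", "FUNASR_TIMEOUT": "60"})
print(settings["mode"], settings["timeout"], settings["enable_vad"])  # online 60.0 True
```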

### Configuration File

```python
from funasr_client import AsyncFunASRClient, ConfigManager

# Load from custom config file
config = ConfigManager.from_file("my_config.json")
client = AsyncFunASRClient(config=config.client_config)
```

## Error Handling

```python
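# Note: ConnectionError and TimeoutError imported here shadow
# Python's built-in exceptions of the same names in this module
from funasr_client.errors import (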
    FunASRError,
    ConnectionError,
    AudioError,
    TimeoutError
)

try:
    result = await client.recognize_file("examples/audio/asr_example.wav")
except ConnectionError:
    print("Failed to connect to server")
except AudioError:
    print("Audio processing failed")
except TimeoutError:
    print("Request timed out")
except FunASRError as e:
    print(f"Recognition error: {e}")
```

## Troubleshooting

### Connection Issues

**Problem**: `ConnectionError: Failed to connect to server`

**Solutions**:
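1. Verify the server is listening: `curl -v http://localhost:10095`. A plain HTTP request will not complete the WebSocket handshake, but any HTTP response (even an error status) shows the port is reachable, while "connection refused" means nothing is listening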
2. Check server URL in config: `ws://localhost:10095` (not `http://`)
3. Test network connectivity: `ping localhost`
4. Check firewall settings

```python
# Enable debug logging to see connection details
import logging
logging.basicConfig(level=logging.DEBUG)

client = AsyncFunASRClient(
    server_url="ws://localhost:10095",
    timeout=60.0,  # Increase timeout
    max_retries=5   # More retry attempts
)
```
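
The reachability checks listed above can also be scripted. The sketch below only tests whether a TCP connection to the URL's host and port succeeds. It does not perform a WebSocket handshake, and `port_is_open` is a hypothetical helper, not part of the client.

```python
import socket
from urllib.parse import urlparse


def port_is_open(ws_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(ws_url)
    # Fall back to the scheme's default port when the URL has none
    port = parsed.port or (443 if parsed.scheme == "wss" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    print(port_is_open("ws://localhost:10095"))
```

A `False` result points at the network, firewall, or server process rather than at the client configuration.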

### Empty or No Recognition Results

**Problem**: `recognize_file()` returns empty text or no final result

**Common Causes**:
1. **Two-Pass Mode**: Missing end-of-speech signal
2. **Audio Format**: Incorrect sample rate or channels
3. **Audio Quality**: Too quiet, noisy, or non-speech content

**Solutions**:

```python
# Solution 1: Ensure proper audio format
from funasr_client import AudioProcessor, AudioConfig

processor = AudioProcessor(target_config=AudioConfig(
    sample_rate=16000,  # Match server expectation
    channels=1          # Mono audio
))
audio_data, sr = processor.load_audio_file("audio.wav")

# Solution 2: Enable VAD for better speech detection
config = ClientConfig(
    enable_vad=True,      # Detect speech boundaries
    chunk_interval=10      # Adequate processing time
)

# Solution 3: Check audio has sufficient silence at end (Two-Pass mode)
# Add 0.5-1 second silence to audio file, or use offline mode:
config = ClientConfig(mode=RecognitionMode.OFFLINE)
```
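
For Solution 3, trailing silence can be added with the standard-library `wave` module. This is a minimal sketch assuming an uncompressed PCM WAV input; `append_silence` is a hypothetical helper, not part of the client.

```python
import wave


def append_silence(src_path, dst_path, seconds=0.5):
    """Copy a PCM WAV file and add `seconds` of silence at the end."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    silent_frames = int(params.framerate * seconds)
    # 8-bit WAV is unsigned (silence = 0x80); wider samples are signed (silence = 0)
    pad_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = pad_byte * (silent_frames * params.sampwidth * params.nchannels)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames + silence)
    return silent_frames
```

For example, `append_silence("audio.wav", "audio_padded.wav", seconds=1.0)` would write a padded copy that can then be passed to `recognize_file()`.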

### High Latency / Slow Recognition

**Problem**: Recognition takes too long

**Solutions**:

```python
# Use Online mode for lowest latency
config = ClientConfig(
    mode=RecognitionMode.ONLINE,  # Fastest mode
    chunk_interval=5,              # Smaller chunks
    buffer_size=4096               # Smaller buffer
)

# Or optimize Two-Pass mode
config = ClientConfig(
    mode=RecognitionMode.TWO_PASS,
    chunk_interval=8,     # Reduce from default 10
    enable_vad=True       # Skip non-speech
)
```

### Audio Format Errors

**Problem**: `AudioError: Unsupported audio format`

**Solutions**:

```python
# Check supported formats
from funasr_client import AudioProcessor

processor = AudioProcessor()

# Supported: WAV, FLAC, MP3, OGG, M4A, etc.
# If format unsupported, convert first:

# Option 1: Use AudioProcessor to convert
audio_data, sr = processor.load_audio_file("audio.mp3")
processed = processor.convert_to_target_format(audio_data, sr)

# Option 2: Pre-convert with ffmpeg
# ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```

### Timeout Errors

**Problem**: `TimeoutError: Request timed out`

**Solutions**:

```python
# Increase timeout for large files
config = ClientConfig(
    timeout=120.0,        # 2 minutes
    max_retries=3,
    retry_delay=2.0
)

# For very large files, consider chunking or batch processing
async def process_large_file():
    client = AsyncFunASRClient(config=config)
    # Process in segments if possible
```
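
The `max_retries` and `retry_delay` settings above, together with the exponential backoff mentioned under Features, suggest a retry schedule like the one sketched below. The doubling factor, the cap, and the optional jitter are assumptions rather than the client's documented behaviour, and `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(max_retries=3, base_delay=2.0, factor=2.0,
                   max_delay=30.0, jitter=0.0, rng=random):
    """Return the wait in seconds before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        delay = min(base_delay * factor ** attempt, max_delay)
        # Optional jitter spreads out clients that fail at the same moment
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays())  # [2.0, 4.0, 8.0]
```

With the values shown in the configuration above, the waits would be 2, 4, and 8 seconds under this assumed schedule.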

### Common Error Reference

| Error | Meaning | Solution |
|-------|---------|----------|
| `ConnectionError` | Cannot connect to server | Check server URL, network, firewall |
| `AudioFileNotFoundError` | File path incorrect | Verify file exists, check path |
| `AudioError` | Audio processing failed | Check format, sample rate, channels |
| `TimeoutError` | Request took too long | Increase timeout, check file size |
| `InvalidConfigurationError` | Config invalid | Check parameter values, types |
| `ResourceExhaustedError` | Connection pool full | Increase `connection_pool_size` |

### Getting Help

If problems persist:
1. Enable debug logging: `FUNASR_DEBUG=true`
2. Check server logs for errors
3. See [Two-Pass Best Practices](docs/TWO_PASS_BEST_PRACTICES_zh.md) for mode-specific issues
4. Open an issue: [GitHub Issues](https://github.com/alibaba-damo-academy/FunASR/issues)

## Performance Optimization

### Real-time Performance Best Practices

For optimal real-time performance, follow these recommendations:

```python
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode, AudioConfig

# Optimized configuration for real-time scenarios
config = ClientConfig(
    # Core settings
    mode=RecognitionMode.TWO_PASS,  # ⭐ Best balance for real-time
    enable_vad=True,                # Reduces processing load
    chunk_interval=10,              # Optimal latency/accuracy trade-off

    # Performance settings
    auto_reconnect=True,            # Production reliability
    connection_pool_size=5,         # Connection reuse
    buffer_size=8192,               # Optimal buffer size

    # Audio optimization
    audio=AudioConfig(
        sample_rate=16000,          # Standard ASR rate
        channels=1,                 # Mono for efficiency
        sample_width=2              # 16-bit PCM
    )
)

client = AsyncFunASRClient(config=config)
```

### Performance Tuning Guidelines

| Parameter | Recommended Value | Impact |
|-----------|------------------|---------|
| `mode` | `TWO_PASS` ⭐ | Best accuracy/latency balance |
| `chunk_interval` | `10` | Standard real-time performance |
| `chunk_interval` | `5` | Lower latency, higher CPU usage |
| `chunk_interval` | `20` | Higher latency, lower CPU usage |
| `enable_vad` | `True` | Reduces unnecessary processing |
| `sample_rate` | `16000` | Optimal for most ASR models |
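
To get a feel for what the audio settings mean in practice, it helps to work out the raw PCM data rate. The sketch below is plain arithmetic; the helper names are hypothetical, and interpreting `buffer_size` as a number of raw PCM bytes is an assumption, not something this README confirms.

```python
def pcm_bytes_per_second(sample_rate=16000, channels=1, sample_width=2):
    """Raw PCM data rate: samples per second x channels x bytes per sample."""
    return sample_rate * channels * sample_width


def buffer_duration_ms(buffer_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Milliseconds of audio held by a buffer of buffer_bytes raw PCM bytes."""
    rate = pcm_bytes_per_second(sample_rate, channels, sample_width)
    return 1000.0 * buffer_bytes / rate


print(pcm_bytes_per_second())    # 32000 bytes per second for 16 kHz mono 16-bit
print(buffer_duration_ms(8192))  # 256.0 ms, assuming buffer_size counts PCM bytes
```

Under that assumption, stereo input would double the data rate and halve the audio duration per buffer, which is one reason mono is recommended.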

### Connection Pooling

```python
from funasr_client import AsyncFunASRClient, ClientConfig

# Create configuration with connection pool size
config = ClientConfig(
    server_url="ws://localhost:10095",  # Specify server URL
    connection_pool_size=10
)

# Create clients with shared configuration
client1 = AsyncFunASRClient(config=config)
client2 = AsyncFunASRClient(config=config)

# Both clients will use the same pool size configuration
```
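
When many files are recognized concurrently, it can help to cap the number of in-flight requests so they do not exceed the pool size. Below is a minimal sketch with `asyncio.Semaphore`; `recognize_many` is a hypothetical helper, and `recognize` stands for any coroutine function that takes a path, for example `client.recognize_file`.

```python
import asyncio


async def recognize_many(paths, recognize, max_concurrency=10):
    """Run recognize(path) for every path, at most max_concurrency at a time.

    Hypothetical helper (not part of funasr-python). Results are returned in
    the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path):
        # Wait for a free slot before starting the request
        async with semaphore:
            return await recognize(path)

    # gather() preserves the input order in its result list
    return await asyncio.gather(*(worker(p) for p in paths))
```

Setting `max_concurrency` to the same value as `connection_pool_size` is a reasonable starting point, assuming each recognition occupies one pooled connection.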

### Audio Processing

```python
import asyncio

from funasr_client import AudioProcessor, AudioConfig, AsyncFunASRClient

# Create audio configuration
audio_config = AudioConfig(
    sample_rate=16000,
    channels=1
)

# Pre-process audio for better performance
processor = AudioProcessor(target_config=audio_config)

# Load and process audio file
audio_data, sample_rate = processor.load_audio_file("examples/audio/asr_example.wav")
processed_audio = processor.convert_to_target_format(audio_data, sample_rate)

async def main():
    # Use standard file recognition (this reads the file from disk, not processed_audio)
    client = AsyncFunASRClient()
    return await client.recognize_file("examples/audio/asr_example.wav")

# `await` is only valid inside a coroutine, so run it through asyncio
result = asyncio.run(main())
```
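
Before converting, it can be worth checking whether a file already matches the target format, so conversion is skipped when it is not needed. The sketch below reads only the WAV header with the standard-library `wave` module; `wav_matches_target` is a hypothetical helper, and it applies to uncompressed PCM WAV files only.

```python
import wave


def wav_matches_target(path, sample_rate=16000, channels=1, sample_width=2):
    """Return True if the WAV header already matches the target PCM format.

    Hypothetical helper (not part of funasr-python). sample_width is in bytes,
    so 2 means 16-bit samples.
    """
    with wave.open(str(path), "rb") as reader:
        return (
            reader.getframerate() == sample_rate
            and reader.getnchannels() == channels
            and reader.getsampwidth() == sample_width
        )
```

Files that fail this check are candidates for `convert_to_target_format` or the `ffmpeg` command shown in the troubleshooting section.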

## Testing

Run the test suite:

```bash
# Install test dependencies
pip install "funasr-python[test]"  # Quotes keep shells like zsh from expanding the brackets

# Run all tests
pytest

# Run with coverage
pytest --cov=funasr_client

# Run specific test categories
pytest -m unit
pytest -m integration
```

## Development

### Setup Development Environment

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"  # Quotes keep shells like zsh from expanding the brackets

# Install pre-commit hooks
pre-commit install
```

### Code Quality

```bash
# Format code
ruff format src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

# Run all quality checks
pre-commit run --all-files
```

## API Reference

### Core Classes

- **`AsyncFunASRClient`**: Main asynchronous client
- **`FunASRClient`**: Synchronous client wrapper
- **`ClientConfig`**: Client configuration
- **`AudioConfig`**: Audio processing configuration
- **`RecognitionResult`**: Recognition result container

### Callback System

- **`RecognitionCallback`**: Abstract callback interface
- **`SimpleCallback`**: Basic callback implementation
- **`LoggingCallback`**: Logging-based callback
- **`MultiCallback`**: Combines multiple callbacks

### Audio Processing

- **`AudioProcessor`**: Audio processing utilities
- **`AudioRecorder`**: Microphone recording
- **`AudioFileStreamer`**: File-based audio streaming

### Utilities

- **`ConfigManager`**: Configuration management
- **`ConnectionManager`**: Connection pooling
- **`Timer`**: Performance timing utilities

## Documentation & Guides

### Quick References ⚡
- [Two-Pass Quick Reference](docs/TWO_PASS_QUICK_REFERENCE.md) - Fast solutions for common Two-Pass mode issues
- [Examples Directory](examples/) - Comprehensive usage examples

### Detailed Guides 📖
- [Two-Pass Best Practices (in Chinese)](docs/TWO_PASS_BEST_PRACTICES_zh.md) - Complete guide to avoiding empty Phase 2 results
- API Reference (Coming soon)
- Configuration Guide (Coming soon)
- Performance Optimization (Coming soon)

### Architecture Documentation
- [FunASR WebSocket Protocol](../../runtime/docs/websocket_protocol.md)
- [Two-Pass Architecture](../../runtime/docs/funasr-wss-server-2pass-architecture.puml)

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

### Development Process

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Support

- **Documentation**: [FunASR Documentation](https://github.com/alibaba-damo-academy/FunASR)
- **Issues**: [GitHub Issues](https://github.com/alibaba-damo-academy/FunASR/issues)
- **Discussions**: [GitHub Discussions](https://github.com/alibaba-damo-academy/FunASR/discussions)

## Acknowledgments

- Built on the excellent [FunASR](https://github.com/alibaba-damo-academy/FunASR) speech recognition toolkit
- Inspired by best practices from the Python asyncio ecosystem
- Thanks to all contributors and users for feedback and improvements