# FunASR Python Client

[![PyPI version](https://badge.fury.io/py/funasr-python.svg)](https://badge.fury.io/py/funasr-python)
[![Python versions](https://img.shields.io/pypi/pyversions/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![License](https://img.shields.io/pypi/l/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![Tests](https://github.com/your-org/funasr-python/workflows/Tests/badge.svg)](https://github.com/your-org/funasr-python/actions)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

A high-performance, enterprise-grade Python client for the FunASR WebSocket speech recognition service, built for production use with comprehensive error handling, automatic reconnection, and extensive customization options.

## Features

### 🚀 **High Performance**
- **Asynchronous I/O**: Built on asyncio for maximum concurrency
- **Connection Pooling**: Efficient WebSocket connection management
- **Streaming Recognition**: Real-time speech recognition with minimal latency
- **Memory Efficient**: Optimized audio processing with configurable buffering

### 🔧 **Production Ready**
- **Robust Error Handling**: Comprehensive exception handling and recovery
- **Automatic Reconnection**: Smart reconnection with exponential backoff
- **Health Monitoring**: Built-in connection health checks
- **Resource Management**: Automatic cleanup and resource deallocation
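
Exponential backoff is the usual way to space out reconnection attempts. The sketch below shows the general idea: the upper bound on each delay doubles up to a cap, and the actual delay is drawn at random ("full jitter") so that many clients do not reconnect in lockstep. The `backoff_delays` helper, its base delay, cap, and jitter strategy are illustrative assumptions, not the client's actual reconnection schedule.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    retries: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one delay (in seconds) per retry attempt.

    Illustrative schedule only: the upper bound doubles each attempt
    (base, 2*base, 4*base, ...) and is clamped to `cap`; the delay itself
    is drawn uniformly from [0, bound] ("full jitter").
    """
    rng = rng or random.Random()
    for attempt in range(retries):
        bound = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, bound)


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(5, rng=random.Random(42)), 1):
        print(f"retry {i}: wait {delay:.2f}s")
```

Jitter trades a predictable schedule for less synchronized load on the server when many clients lose their connection at the same moment.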

### 📊 **Recognition Modes for Different Scenarios**
- **Offline Mode**: Best for complete audio files, highest accuracy
- **Online Mode**: Ultra-low latency streaming, suitable for interactive applications
- **Two-Pass Mode** ⭐: **Recommended for real-time scenarios** - combines streaming speed with offline accuracy

### 🎯 **Enterprise Features**
- **Configuration Management**: Flexible configuration with .env support
- **Comprehensive Logging**: Structured logging with configurable levels
- **Metrics & Monitoring**: Built-in performance metrics
- **Type Safety**: Full type hints for better IDE support

### 🎵 **Audio Processing**
- **Multiple Formats**: Support for WAV, FLAC, MP3, and more
- **Automatic Resampling**: Smart audio format conversion
- **Voice Activity Detection**: Optional VAD for improved efficiency
- **Microphone Integration**: Real-time microphone recording support

## Installation

### Basic Installation

```bash
pip install funasr-python
```

### With Optional Dependencies

```bash
# Quote the extras so shells such as zsh don't treat the brackets as a glob
# Audio processing capabilities
pip install "funasr-python[audio]"

# Performance optimizations
pip install "funasr-python[performance]"

# Development tools
pip install "funasr-python[dev]"

# Everything
pip install "funasr-python[all]"
```

### From Source

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python
pip install -e .
```

## Quick Start

### Basic Usage

```python
import asyncio
from funasr_client import AsyncFunASRClient

async def main():
    client = AsyncFunASRClient()
    try:
        # Recognize an audio file
        result = await client.recognize_file("examples/audio/asr_example.wav")
        print(f"Recognition result: {result.text}")
    finally:
        # Release the connection even if recognition fails
        await client.close()

if __name__ == "__main__":
    asyncio.run(main())
```

### Stream Recognition (Async Iterator)

Use `recognize_stream` to process custom audio streams from any source:

```python
import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import SimpleCallback

async def stream_recognition_demo():
    """Recognize audio from custom async stream."""
    client = AsyncFunASRClient(
        server_url="ws://localhost:10095",
        mode="2pass"  # Two-Pass mode for best results
    )

    def on_partial_result(result):
        print(f"Partial: {result.text}")

    def on_final_result(result):
        print(f"Final: {result.text} (confidence: {result.confidence:.2f})")

    callback = SimpleCallback(
        on_partial=on_partial_result,
        on_final=on_final_result
    )

    await client.start()

    # Example 1: Stream from file in chunks
    async def audio_stream_from_file(file_path, chunk_size=3200):
        """Read audio file and yield chunks."""
        with open(file_path, 'rb') as f:
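            # Skip the canonical 44-byte WAV header (assumes a plain PCM WAV
            # file with no extra chunks before the audio data)
            f.read(44)
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
                # With the default chunk_size, each chunk is 100 ms of 16 kHz,
                # 16-bit mono audio; sleep that long to approximate real time
                await asyncio.sleep(0.1)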

    # Start streaming recognition
    await client.recognize_stream(
        audio_stream_from_file("examples/audio/asr_example.wav"),
        callback
    )

    await client.close()

# Example 2: Stream from network source
async def stream_from_network():
    """Stream audio from network source (e.g., RTP, RTSP)."""
    import aiohttp

    client = AsyncFunASRClient(server_url="ws://localhost:10095")

    async def network_audio_stream(url):
        """Stream audio from HTTP/network source."""
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                async for chunk in response.content.iter_chunked(3200):
                    yield chunk

    def on_result(result):
        if result.is_final:
            print(f"Transcription: {result.text}")

    callback = SimpleCallback(on_final=on_result)

    await client.start()
    await client.recognize_stream(
        network_audio_stream("http://example.com/audio.pcm"),
        callback
    )
    await client.close()

# Example 3: Stream from microphone (PyAudio)
async def stream_from_microphone():
    """Real-time recognition from microphone using PyAudio."""
    import pyaudio

    client = AsyncFunASRClient(server_url="ws://localhost:10095")

    async def microphone_stream():
        """Capture audio from microphone and yield chunks."""
        CHUNK = 1600  # 100ms at 16kHz
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000

        p = pyaudio.PyAudio()
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )

        print("🎤 Recording... Press Ctrl+C to stop")
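        try:
            loop = asyncio.get_running_loop()
            while True:
                # stream.read() blocks, so run it in a worker thread
                data = await loop.run_in_executor(None, stream.read, CHUNK)
                yield data
        finally:
            # Runs when iteration stops or the task is cancelled
            # (e.g. on Ctrl+C under asyncio.run())
            stream.stop_stream()
            stream.close()
            p.terminate()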

    def on_result(result):
        if result.is_final:
            print(f"You said: {result.text}")
        else:
            print(f"Hearing: {result.text}")

    callback = SimpleCallback(
        on_partial=on_result,
        on_final=on_result
    )

    await client.start()
    await client.recognize_stream(microphone_stream(), callback)
    await client.close()

if __name__ == "__main__":
    # Run different examples
    asyncio.run(stream_recognition_demo())
    # asyncio.run(stream_from_network())
    # asyncio.run(stream_from_microphone())
```
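
The first example skips a fixed 44-byte header, which only works for canonical PCM WAV files; files with extra chunks (for example `LIST` metadata) have longer headers. A more robust sketch uses the standard-library `wave` module to locate the PCM data and check the format before streaming. The 16 kHz / mono / 16-bit expectation mirrors the defaults used elsewhere in this README; `wav_pcm_chunks` is a hypothetical helper, not part of `funasr_client`.

```python
import asyncio
import wave
from typing import AsyncIterator


async def wav_pcm_chunks(
    path: str,
    chunk_ms: int = 100,
    expected_rate: int = 16000,
) -> AsyncIterator[bytes]:
    """Yield raw PCM chunks of `chunk_ms` milliseconds from a WAV file.

    The `wave` module parses the RIFF header, so the yielded bytes are
    pure PCM samples regardless of how long the header is.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        frames_per_chunk = expected_rate * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
            # Give other tasks a chance to run between chunks
            await asyncio.sleep(0)
```

It can be passed wherever the examples above pass a chunk generator, e.g. `await client.recognize_stream(wav_pcm_chunks("examples/audio/asr_example.wav"), callback)`.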

### Real-time Recognition (Microphone)

For real-time applications, we recommend **Two-Pass Mode**, which provides the best balance of speed and accuracy:

```python
import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.models import RecognitionMode, ClientConfig

async def realtime_recognition():
    # Two-Pass Mode: Optimal for real-time scenarios
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,  # Recommended for real-time
        enable_vad=True,  # Voice activity detection
        chunk_interval=10  # Balanced latency/accuracy
    )

    client = AsyncFunASRClient(config=config)

    def on_partial_result(result):
        print(f"Partial: {result.text}")

    def on_final_result(result):
        print(f"Final: {result.text} (confidence: {result.confidence:.2f})")

    from funasr_client.callbacks import SimpleCallback
    callback = SimpleCallback(
        on_partial=on_partial_result,
        on_final=on_final_result
    )

    await client.start()

    # Start real-time session
    session = await client.start_realtime(callback)

    # Your audio streaming logic here
    # In practice, you would stream from microphone or audio source

    await client.close()

if __name__ == "__main__":
    asyncio.run(realtime_recognition())
```
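
The example above leaves the audio feed as a placeholder. When replaying recorded audio into a real-time session, a fixed sleep between chunks only approximates real time, because processing and scheduling delays accumulate. The sketch below paces chunks against a fixed start time instead, releasing each chunk once its audio would have finished recording, as a live microphone would. It assumes 16 kHz mono 16-bit PCM (32,000 bytes per second); `paced` is a hypothetical helper.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def paced(
    chunks: Iterable[bytes],
    bytes_per_second: int = 16000 * 2,  # 16 kHz * 2 bytes (16-bit mono)
) -> AsyncIterator[bytes]:
    """Yield PCM chunks no faster than real time."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    sent = 0  # bytes of audio released so far, including this chunk
    for chunk in chunks:
        sent += len(chunk)
        # Release the chunk when its audio would have finished recording,
        # measured from a fixed start so small delays don't accumulate.
        delay = start + sent / bytes_per_second - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        yield chunk
```

Scheduling against the start time, rather than sleeping a fixed amount after each send, keeps the stream's long-run rate at real time even if individual iterations run late.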

### Ultra-Low Latency (Interactive Applications)

For scenarios requiring minimal latency (e.g., voice commands):

```python
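from funasr_client import AsyncFunASRClient
from funasr_client.models import RecognitionMode, ClientConfig

async def ultra_low_latency():
    config = ClientConfig(
        mode=RecognitionMode.ONLINE,  # Ultra-low latency
        chunk_interval=5,  # Lower latency, higher CPU usage
        enable_vad=True
    )

    client = AsyncFunASRClient(config=config)
    # Callbacks and session handling are the same as in the Two-Pass example above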
```

### Configuration with Environment Variables

Create a `.env` file:

```env
FUNASR_WS_URL=ws://localhost:10095
# Two-Pass mode is recommended for optimal real-time performance
FUNASR_MODE=2pass
FUNASR_SAMPLE_RATE=16000
FUNASR_ENABLE_ITN=true
# VAD is recommended for real-time scenarios
FUNASR_ENABLE_VAD=true
```

```python
import asyncio
from funasr_client import create_async_client

async def main():
    # Configuration is loaded automatically from .env
    client = await create_async_client()
    try:
        result = await client.recognize_file("examples/audio/asr_example.wav")
        print(result.text)
    finally:
        await client.close()

asyncio.run(main())
```

## Advanced Usage

### Custom Configuration

```python
from funasr_client import AsyncFunASRClient, ClientConfig, AudioConfig
from funasr_client.models import RecognitionMode, AudioFormat

config = ClientConfig(
    server_url="ws://your-server:10095",
    mode=RecognitionMode.TWO_PASS,
    timeout=30.0,
    max_retries=3,
    audio=AudioConfig(
        sample_rate=16000,
        format=AudioFormat.PCM,
        channels=1
    )
)

client = AsyncFunASRClient(config=config)
```

### Callback Handlers

```python
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import SimpleCallback

def on_result(result):
    print(f"Received: {result.text}")

def on_error(error):
    print(f"Error: {error}")

callback = SimpleCallback(
    on_result=on_result,
    on_error=on_error
)

client = AsyncFunASRClient(callback=callback)
```
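
The callback objects accept plain functions, and the voice-assistant example below uses `AsyncSimpleCallback` for coroutine handlers. If you write your own callback class, one way to support both styles is to call the handler and await its return value only when it is awaitable. A minimal sketch; the `Result` dataclass and `dispatch` function are hypothetical stand-ins, not `funasr_client` types.

```python
import inspect
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Union


@dataclass
class Result:
    """Hypothetical stand-in for a recognition result."""
    text: str
    is_final: bool
    confidence: float = 0.0


# A handler may be a plain function or a coroutine function
Handler = Callable[[Result], Union[None, Awaitable[None]]]


async def dispatch(
    result: Result,
    on_partial: Optional[Handler] = None,
    on_final: Optional[Handler] = None,
) -> None:
    """Route a result to the matching handler, sync or async."""
    handler = on_final if result.is_final else on_partial
    if handler is None:
        return
    outcome = handler(result)
    # Coroutine functions return an awaitable; plain functions return None
    if inspect.isawaitable(outcome):
        await outcome
```

Checking the return value, rather than the handler's type, also covers callables such as `functools.partial` objects wrapping a coroutine function.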

### Multiple Recognition Sessions

```python
import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.models import RecognitionMode

async def recognize_multiple():
    # Two-Pass mode shown here; for pure batch file transcription,
    # Offline mode offers the highest accuracy (see the mode guide below)
    client = AsyncFunASRClient(
        mode=RecognitionMode.TWO_PASS
    )

    try:
        # Process multiple files concurrently
        tasks = [
            client.recognize_file("examples/audio/asr_example.wav"),
            client.recognize_file("examples/audio/61-70970-0001.wav"),
            client.recognize_file("examples/audio/61-70970-0016.wav")
        ]

        results = await asyncio.gather(*tasks)
        for i, result in enumerate(results, 1):
            print(f"File {i}: {result.text}")
    finally:
        await client.close()
```
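
`asyncio.gather` starts every task at once. For a large batch you may want to cap how many recognitions run concurrently, for example to stay within the server's capacity. A sketch using `asyncio.Semaphore`; `bounded_gather` is hypothetical, `recognize` stands in for a call such as `client.recognize_file`, and the default limit of 4 is an arbitrary example value.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def bounded_gather(
    paths: Sequence[str],
    recognize: Callable[[str], Awaitable[T]],
    limit: int = 4,
) -> List[T]:
    """Run `recognize(path)` for every path with at most `limit` in flight.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(path: str) -> T:
        # Wait for a free slot before starting this recognition
        async with semaphore:
            return await recognize(path)

    return await asyncio.gather(*(run_one(p) for p in paths))
```

Usage, assuming the client from the example above: `results = await bounded_gather(paths, client.recognize_file, limit=4)`.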

### Real-time Applications Examples

#### Live Streaming Transcription

```python
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import SimpleCallback
from funasr_client.models import RecognitionMode, ClientConfig

async def live_transcription():
    """Real-time transcription for live streams."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Optimal for live streaming
        enable_vad=True,                # Filter silence
        chunk_interval=8,               # Balanced performance
        auto_reconnect=True             # Handle network issues
    )

    client = AsyncFunASRClient(config=config)

    def on_result(result):
        if result.is_final:
            # Send to subtitle system (send_subtitle is your own function)
            send_subtitle(result.text, result.confidence)
        else:
            # Show live preview (show_live_text is your own function)
            show_live_text(result.text)

    callback = SimpleCallback(on_final=on_result, on_partial=on_result)

    await client.start()
    try:
        session = await client.start_realtime(callback)
        # Feed audio into the session (stream_audio_to_session is your own function)
        await stream_audio_to_session(session)
    finally:
        await client.close()
```
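
In Two-Pass mode, partial results are provisional and are superseded by the final result for the same segment, so a display layer typically keeps committed (final) text separate from the current preview. A minimal sketch of such a buffer, which could sit behind `show_live_text`/`send_subtitle`; `LiveTranscript` is hypothetical, and whether partials carry only new text or the whole segment so far is an assumption you should check against your server, hence the `cumulative_partials` switch.

```python
from dataclasses import dataclass


@dataclass
class LiveTranscript:
    """Keep committed (final) text separate from the provisional preview.

    Set `cumulative_partials=True` if each partial repeats the whole
    segment so far; leave it False if partials carry only new text.
    Segments are concatenated without separators.
    """
    cumulative_partials: bool = False
    committed: str = ""
    pending: str = ""

    def on_partial(self, text: str) -> None:
        if self.cumulative_partials:
            self.pending = text
        else:
            self.pending += text

    def on_final(self, text: str) -> None:
        # The final result replaces every partial for this segment
        self.committed += text
        self.pending = ""

    @property
    def display(self) -> str:
        return self.committed + self.pending
```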

#### Voice Assistant Integration

```python
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import AsyncSimpleCallback
from funasr_client.models import RecognitionMode, ClientConfig

async def voice_assistant():
    """Voice assistant with Two-Pass optimization."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Best for voice assistants
        enable_vad=True,                # Automatic speech detection
        chunk_interval=10               # Good responsiveness
    )

    client = AsyncFunASRClient(config=config)

    async def process_command(result):
        # Act only on confident final results
        if result.is_final and result.confidence > 0.8:
            # process_voice_command and speak_response are your own functions
            response = await process_voice_command(result.text)
            await speak_response(response)

    callback = AsyncSimpleCallback(on_final=process_command)

    await client.start()
    session = await client.start_realtime(callback)

    print("🎤 Voice assistant ready. Speak now...")
    # Your microphone streaming logic here
```
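
If command handling is slow (for example, calling another service and then speaking a reply), doing it directly inside the result callback may hold up delivery of later results, depending on how the client invokes callbacks. One common pattern is to have the callback only enqueue confident final results and let a separate worker task process them. A sketch with `asyncio.Queue`; `CommandWorker` is hypothetical and `handle_command` is a placeholder for your own logic such as `process_voice_command`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CommandWorker:
    """Decouple result callbacks from slow command processing.

    Create it while an event loop is running (e.g. inside your async
    main), since it owns an asyncio.Queue.
    """

    def __init__(
        self,
        handle_command: Callable[[str], Awaitable[None]],
        min_confidence: float = 0.8,
    ) -> None:
        self._handle = handle_command
        self._min_confidence = min_confidence
        self._queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    def submit(self, text: str, confidence: float) -> None:
        """Call from the result callback; returns immediately."""
        if confidence > self._min_confidence:
            self._queue.put_nowait(text)

    def stop(self) -> None:
        """Ask the worker to exit after draining queued commands."""
        self._queue.put_nowait(None)

    async def run(self) -> None:
        while True:
            text = await self._queue.get()
            if text is None:  # sentinel from stop()
                break
            await self._handle(text)
```

Run `worker.run()` with `asyncio.create_task` alongside the session, and call `worker.submit(result.text, result.confidence)` from the final-result callback.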

## Command Line Interface

The package includes a full-featured CLI:

```bash
# Basic recognition
funasr-client recognize examples/audio/asr_example.wav

# Real-time recognition from microphone
funasr-client stream --source microphone

# Batch processing
funasr-client batch examples/audio/*.wav --output results.jsonl

# Server configuration
funasr-client configure --server-url ws://localhost:10095

# Test connection
funasr-client test-connection
```

## Recognition Mode Selection Guide

Choose the optimal recognition mode for your use case:

| Mode | Latency | Accuracy | Best For | Use Cases |
|------|---------|----------|----------|-----------|
| **Two-Pass** ⭐ | Medium | **High** | **Real-time applications** | Live streaming, real-time subtitles, voice assistants |
| **Online** | **Low** | Medium | Interactive apps | Voice commands, quick responses |
| **Offline** | High | **Highest** | File processing | Transcription services, post-processing |

### Two-Pass Mode Advantages ⭐

**Recommended for real-time scenarios** because it:

- ✅ **Fast partial results** for immediate user feedback (Phase 1: Online)
- ✅ **High-accuracy final results** using 2-pass optimization (Phase 2: Offline)
- ✅ **Balanced resource usage** with smart buffering
- ✅ **Production-ready** with robust error handling

```python
# Recommended configuration for real-time applications
config = ClientConfig(
    mode=RecognitionMode.TWO_PASS,  # Best balance
    enable_vad=True,                # Improves efficiency
    chunk_interval=10,              # Optimal for most cases
    auto_reconnect=True             # Production reliability
)
```

> ⚠️ **Important**: To ensure you receive **both** partial (online) and final (offline) results in Two-Pass mode:
> - ✅ Use `recognize_file()` for complete audio files (handles end-of-speech automatically)
> - ✅ Call `end_realtime_session()` after each utterance in streaming scenarios
> - ✅ Enable VAD (`enable_vad=True`) for better speech boundary detection
> - ✅ Include sufficient silence (0.5-1s) at the end of speech segments
> 
> 📖 **See detailed guide**: [Two-Pass Best Practices](docs/TWO_PASS_BEST_PRACTICES_zh.md)
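
To follow the trailing-silence recommendation above when your audio source ends abruptly, you can append digital silence yourself before ending the session. For signed 16-bit PCM, silence is zero-valued samples, so 0.8 s at 16 kHz mono is 16000 × 0.8 × 2 = 25,600 zero bytes. A sketch; `pad_with_silence` is a hypothetical helper and 0.8 s is simply a value inside the suggested 0.5-1 s range.

```python
def pad_with_silence(
    pcm: bytes,
    seconds: float = 0.8,
    sample_rate: int = 16000,
    sample_width: int = 2,  # bytes per sample; 2 = 16-bit
    channels: int = 1,
) -> bytes:
    """Append `seconds` of silence to signed PCM audio.

    Signed PCM silence is all zero bytes. (8-bit WAV PCM is unsigned,
    with silence at 0x80, so this helper does not apply to it.)
    """
    frames = int(round(seconds * sample_rate))
    return pcm + b"\x00" * (frames * sample_width * channels)
```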

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `FUNASR_WS_URL` | WebSocket server URL | `ws://localhost:10095` |
| `FUNASR_MODE` | Recognition mode (`offline`, `online`, `2pass`) | `2pass` ⭐ |
| `FUNASR_TIMEOUT` | Connection timeout, in seconds | `30.0` |
| `FUNASR_MAX_RETRIES` | Max retry attempts | `3` |
| `FUNASR_SAMPLE_RATE` | Audio sample rate | `16000` |
| `FUNASR_ENABLE_ITN` | Enable inverse text normalization | `true` |
| `FUNASR_ENABLE_VAD` | Enable voice activity detection | `true` |
| `FUNASR_DEBUG` | Enable debug logging | `false` |

> 💡 **Tip**: Two-Pass Mode (`2pass`) is recommended for most real-time applications as it provides the best balance between latency and accuracy.
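
`create_async_client()` reads these variables for you. To illustrate how such string values map onto typed settings using the defaults in the table above, here is a hedged sketch of a parser. `load_funasr_env` is hypothetical, its accepted boolean spellings are an assumption, and it does not read `.env` files itself (a loader such as `python-dotenv` would populate the environment first).

```python
import os
from typing import Any, Dict, Mapping, Optional

# Defaults copied from the environment-variable table above
_DEFAULTS = {
    "FUNASR_WS_URL": "ws://localhost:10095",
    "FUNASR_MODE": "2pass",
    "FUNASR_TIMEOUT": "30.0",
    "FUNASR_MAX_RETRIES": "3",
    "FUNASR_SAMPLE_RATE": "16000",
    "FUNASR_ENABLE_ITN": "true",
    "FUNASR_ENABLE_VAD": "true",
    "FUNASR_DEBUG": "false",
}
_MODES = {"offline", "online", "2pass"}


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the real client may accept a different set
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_funasr_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read FUNASR_* settings from `env` (default: os.environ)."""
    source = os.environ if env is None else env
    raw = {key: source.get(key, default) for key, default in _DEFAULTS.items()}
    mode = raw["FUNASR_MODE"].strip().lower()
    if mode not in _MODES:
        raise ValueError(f"FUNASR_MODE must be one of {sorted(_MODES)}, got {mode!r}")
    return {
        "server_url": raw["FUNASR_WS_URL"],
        "mode": mode,
        "timeout": float(raw["FUNASR_TIMEOUT"]),
        "max_retries": int(raw["FUNASR_MAX_RETRIES"]),
        "sample_rate": int(raw["FUNASR_SAMPLE_RATE"]),
        "enable_itn": _as_bool(raw["FUNASR_ENABLE_ITN"]),
        "enable_vad": _as_bool(raw["FUNASR_ENABLE_VAD"]),
        "debug": _as_bool(raw["FUNASR_DEBUG"]),
    }
```

Validating the mode up front turns a typo such as `FUNASR_MODE=2-pass` into an immediate error instead of a confusing server response later.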

### Configuration File

```python
from funasr_client import AsyncFunASRClient, ConfigManager

# Load from custom config file
config = ConfigManager.from_file("my_config.json")
client = AsyncFunASRClient(config=config.client_config)
```

## Error Handling

```python
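import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.errors import (
    FunASRError,
    AudioError,
    # Aliased so they don't shadow Python's built-in exceptions
    ConnectionError as FunASRConnectionError,
    TimeoutError as FunASRTimeoutError,
)

async def main():
    client = AsyncFunASRClient()
    try:
        result = await client.recognize_file("examples/audio/asr_example.wav")
        print(result.text)
    except FunASRConnectionError:
        print("Failed to connect to server")
    except AudioError:
        print("Audio processing failed")
    except FunASRTimeoutError:
        print("Request timed out")
    except FunASRError as e:
        # Catch-all for other FunASR errors; listed last so the
        # specific handlers above match first
        print(f"Recognition error: {e}")
    finally:
        await client.close()

asyncio.run(main())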
```
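
The client already exposes `max_retries` and `timeout` settings. If you need an extra application-level policy, for example retrying only on connection failures while surfacing audio errors immediately, a sketch like the following can wrap any call. `with_retries` is hypothetical; pass the exception classes you want retried (such as the connection error class from `funasr_client.errors`), and add `asyncio.TimeoutError` if per-attempt timeouts should also be retried.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    timeout: float = 30.0,
    base_delay: float = 0.5,
) -> T:
    """Await `make_call()` with a per-attempt timeout, retrying on `retry_on`.

    Exceptions not listed in `retry_on` propagate immediately. The delay
    between attempts doubles: base_delay, 2*base_delay, ...
    """
    for attempt in range(attempts):
        try:
            # A fresh coroutine per attempt: coroutines can't be awaited twice
            return await asyncio.wait_for(make_call(), timeout)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Taking a zero-argument factory (e.g. `lambda: client.recognize_file(path)`) rather than a coroutine object is what makes retrying possible.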

## Performance Optimization

### Real-time Performance Best Practices

For optimal real-time performance, follow these recommendations:

```python
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode, AudioConfig

# Optimized configuration for real-time scenarios
config = ClientConfig(
    # Core settings
    mode=RecognitionMode.TWO_PASS,  # ⭐ Best balance for real-time
    enable_vad=True,                # Reduces processing load
    chunk_interval=10,              # Optimal latency/accuracy trade-off

    # Performance settings
    auto_reconnect=True,            # Production reliability
    connection_pool_size=5,         # Connection reuse
    buffer_size=8192,               # Optimal buffer size

    # Audio optimization
    audio=AudioConfig(
        sample_rate=16000,          # Standard ASR rate
        channels=1,                 # Mono for efficiency
        sample_width=2              # 16-bit PCM
    )
)

client = AsyncFunASRClient(config=config)
```

### Performance Tuning Guidelines

| Parameter | Recommended Value | Impact |
|-----------|------------------|---------|
| `mode` | `TWO_PASS` ⭐ | Best accuracy/latency balance |
| `chunk_interval` | `10` | Standard real-time performance |
| `chunk_interval` | `5` | Lower latency, higher CPU usage |
| `chunk_interval` | `20` | Higher latency, lower CPU usage |
| `enable_vad` | `True` | Reduces unnecessary processing |
| `sample_rate` | `16000` | Optimal for most ASR models |
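
The relationship between sample rate, sample width, and chunk size is plain arithmetic, which helps when choosing chunk sizes for your own streams: 16 kHz mono 16-bit audio is 32,000 bytes per second, so the 3200-byte chunks and 1600-frame microphone buffers in the streaming examples above each cover 100 ms. A small helper pair (hypothetical, not part of `funasr_client`):

```python
def pcm_chunk_bytes(
    duration_ms: int,
    sample_rate: int = 16000,
    sample_width: int = 2,  # bytes per sample; 2 = 16-bit
    channels: int = 1,
) -> int:
    """Number of bytes of raw PCM covering `duration_ms` milliseconds."""
    frames, remainder = divmod(sample_rate * duration_ms, 1000)
    if remainder:
        raise ValueError("duration does not map to a whole number of frames")
    return frames * sample_width * channels


def pcm_duration_ms(
    num_bytes: int,
    sample_rate: int = 16000,
    sample_width: int = 2,
    channels: int = 1,
) -> float:
    """Inverse of pcm_chunk_bytes: milliseconds of audio in `num_bytes`."""
    return num_bytes * 1000 / (sample_rate * sample_width * channels)
```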

### Connection Pooling

```python
import asyncio
from funasr_client import AsyncFunASRClient, ConnectionManager, ClientConfig

async def main():
    # Create configuration with custom pool size
    config = ClientConfig(connection_pool_size=10)
    manager = ConnectionManager(config)

    # Start the connection manager
    await manager.start()

    # Share one connection manager between multiple clients
    client1 = AsyncFunASRClient(connection_manager=manager)
    client2 = AsyncFunASRClient(connection_manager=manager)

asyncio.run(main())
```
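
Conceptually, a connection pool hands out a bounded set of reusable connections and takes them back when a caller is done. The sketch below shows that idea with a generic `asyncio.Queue`-based pool. It is not how `ConnectionManager` is implemented; `SimplePool` is hypothetical and `factory` stands in for whatever coroutine opens a connection.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Bounded pool of reusable resources, created lazily by `factory`.

    Create it while an event loop is running, since it owns an asyncio.Queue.
    """

    def __init__(self, factory: Callable[[], Awaitable[T]], size: int) -> None:
        self._factory = factory
        self._size = size
        self._created = 0
        self._idle: "asyncio.Queue[T]" = asyncio.Queue()

    @asynccontextmanager
    async def connection(self) -> AsyncIterator[T]:
        if self._idle.empty() and self._created < self._size:
            # Under the limit: open a new resource instead of waiting
            self._created += 1
            try:
                conn = await self._factory()
            except BaseException:
                self._created -= 1  # give the slot back if opening fails
                raise
        else:
            # At the limit: wait until another caller releases one
            conn = await self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)
```

Usage: `async with pool.connection() as conn: ...`; the resource returns to the pool when the block exits, even on error.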

### Audio Processing

```python
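import asyncio
from funasr_client import AsyncFunASRClient, AudioProcessor

async def main():
    # Pre-process audio for better performance
    processor = AudioProcessor(
        target_sample_rate=16000,
        enable_vad=True,
        chunk_size=1024
    )
    processed_audio = processor.process_file("examples/audio/asr_example.wav")

    client = AsyncFunASRClient()
    try:
        result = await client.recognize_audio(processed_audio)
        print(result.text)
    finally:
        await client.close()

asyncio.run(main())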
```
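
Voice activity detection decides which stretches of audio contain speech so that silence need not be processed. The sketch below is only a crude energy-threshold illustration of that idea, not the VAD FunASR uses, e.g. for spotting long silences client-side. It assumes 16-bit little-endian mono PCM; `speech_frames`, the 30 ms frame length, and the RMS threshold of 500 are arbitrary illustrative choices.

```python
import array
import math
import sys
from typing import List


def speech_frames(
    pcm: bytes,
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 500.0,
) -> List[bool]:
    """Label each full frame of 16-bit mono PCM as speech (True) or silence.

    A frame counts as speech when its RMS amplitude exceeds `threshold`
    on the int16 scale (full scale is 32767). A trailing partial frame
    is ignored.
    """
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM is little-endian
    frame_len = sample_rate * frame_ms // 1000
    flags: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        flags.append(rms > threshold)
    return flags
```

A fixed threshold breaks down with background noise or changing microphone gain, which is why production systems use trained VAD models instead.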

## Testing

Run the test suite:

```bash
# Install test dependencies
pip install "funasr-python[test]"

# Run all tests
pytest

# Run with coverage
pytest --cov=funasr_client

# Run specific test categories
pytest -m unit
pytest -m integration
```

## Development

### Setup Development Environment

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

### Code Quality

```bash
# Format code
ruff format src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

# Run all quality checks
pre-commit run --all-files
```

## API Reference

### Core Classes

- **`AsyncFunASRClient`**: Main asynchronous client
- **`FunASRClient`**: Synchronous client wrapper
- **`ClientConfig`**: Client configuration
- **`AudioConfig`**: Audio processing configuration
- **`RecognitionResult`**: Recognition result container

### Callback System

- **`RecognitionCallback`**: Abstract callback interface
- **`SimpleCallback`**: Basic callback implementation
- **`LoggingCallback`**: Logging-based callback
- **`MultiCallback`**: Combines multiple callbacks

### Audio Processing

- **`AudioProcessor`**: Audio processing utilities
- **`AudioRecorder`**: Microphone recording
- **`AudioFileStreamer`**: File-based audio streaming

### Utilities

- **`ConfigManager`**: Configuration management
- **`ConnectionManager`**: Connection pooling
- **`Timer`**: Performance timing utilities

## Documentation & Guides

### Quick References ⚡
- [Two-Pass Quick Reference](docs/TWO_PASS_QUICK_REFERENCE.md) - Fast solutions for common Two-Pass mode issues
- [Examples Directory](examples/) - Comprehensive usage examples

### Detailed Guides 📖
- [Two-Pass Best Practices (中文)](docs/TWO_PASS_BEST_PRACTICES_zh.md) - Complete guide to avoid empty Phase 2 results
- API Reference (Coming soon)
- Configuration Guide (Coming soon)
- Performance Optimization (Coming soon)

### Architecture Documentation
- [FunASR WebSocket Protocol](../../runtime/docs/websocket_protocol.md)
- [Two-Pass Architecture](../../runtime/docs/funasr-wss-server-2pass-architecture.puml)

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

### Development Process

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Support

- **Documentation**: [FunASR Documentation](https://github.com/alibaba-damo-academy/FunASR)
- **Issues**: [GitHub Issues](https://github.com/alibaba-damo-academy/FunASR/issues)
- **Discussions**: [GitHub Discussions](https://github.com/alibaba-damo-academy/FunASR/discussions)

## Acknowledgments

- Built on the excellent [FunASR](https://github.com/alibaba-damo-academy/FunASR) speech recognition toolkit
- Inspired by best practices from the Python asyncio ecosystem
- Thanks to all contributors and users for feedback and improvements