# TOON Python

Token-Oriented Object Notation (TOON) is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. This Python implementation typically uses 30-60% fewer tokens than equivalent JSON and can encode any JSON-serializable value.

This project is a Python port of [toon](https://github.com/johannschopplich/toon) and is currently based on `toon v0.3.1`.

## Features

- **Token Efficiency**: 30-60% fewer tokens than equivalent JSON
- **LLM-Friendly**: Explicit lengths and field lists help models validate output
- **Minimal Syntax**: Removes redundant punctuation (braces, brackets, most quotes)
- **Pythonic API**: Simple, intuitive interface following Python conventions
- **Type Support**: Handles Python-specific types (datetime, Decimal, UUID, bytes)
- **Flexible Formatting**: Configurable indentation, delimiters, and length markers
- **Pure Python**: No runtime dependencies

## Quick Start

```python
from toon_python import encode

# Basic encoding
data = {
    "user": {
        "id": 123,
        "name": "Ada Lovelace",
        "active": True
    },
    "tags": ["python", "llm", "data"]
}

toon_output = encode(data)
print(toon_output)
```

Output:
```
user:
  id: 123
  name: Ada Lovelace
  active: true
tags[3]: python,llm,data
```

## Installation

```bash
pip install toon-python
```

## Usage

### Basic Encoding

```python
from toon_python import encode

# Simple objects
encode({"name": "Alice", "age": 30})
# → name: Alice
#   age: 30

# Arrays
encode({"items": [1, 2, 3]})
# → items[3]: 1,2,3

# Nested structures
encode({"user": {"id": 1, "roles": ["admin", "user"]}})
# → user:
#     id: 1
#     roles[2]: admin,user
```

### Formatting Options

```python
from toon_python import encode, EncodeOptions, Delimiter

options = EncodeOptions(
    indent=4,                    # 4 spaces instead of 2
    delimiter=Delimiter.PIPE,    # Use | as delimiter
    length_marker="#"            # Add # prefix to array lengths
)

data = {"tags": ["a", "b", "c"]}
encode(data, options)
# → tags[#3|]: a|b|c
```
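
To make the effect of `delimiter` and `length_marker` concrete, the following is a minimal standalone sketch of how an inline array line could be assembled. The `encode_inline_array` and `format_primitive` helpers are hypothetical and not part of `toon_python`; they only reproduce the shapes shown in this README and ignore quoting and indentation.

```python
from typing import Optional, Sequence, Union

Primitive = Union[str, int, float, bool, None]


def format_primitive(value: Primitive) -> str:
    """Render a primitive as the README examples show it (no quoting logic)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    return str(value)


def encode_inline_array(
    key: str,
    values: Sequence[Primitive],
    delimiter: str = ",",
    length_marker: Optional[str] = None,
) -> str:
    """Hypothetical helper: build `key[N]: v1,v2,...` for a primitive array."""
    marker = length_marker or ""
    # Assumption: the delimiter is echoed in the header only when it is not
    # a comma, matching the `tags[#3|]` example above.
    header_delim = "" if delimiter == "," else delimiter
    header = f"{key}[{marker}{len(values)}{header_delim}]"
    body = delimiter.join(format_primitive(v) for v in values)
    return f"{header}: {body}"


print(encode_inline_array("tags", ["a", "b", "c"], delimiter="|", length_marker="#"))
# → tags[#3|]: a|b|c
print(encode_inline_array("items", [1, 2, 3]))
# → items[3]: 1,2,3
```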

### Type Support

The library automatically normalizes Python types to JSON-compatible representations:

```python
from datetime import datetime, date
from decimal import Decimal
from uuid import UUID

data = {
    "timestamp": datetime(2023, 1, 1, 12, 0, 0),
    "date_only": date(2023, 1, 1),
    "price": Decimal("19.99"),
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "binary": b"hello world"
}

encode(data)
# → timestamp: 2023-01-01T12:00:00
#   date_only: 2023-01-01
#   price: 19.99
#   id: 12345678-1234-5678-1234-567812345678
#   binary: aGVsbG8gd29ybGQ=
```
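
The normalization step can be pictured as a recursive mapping from Python-specific types to JSON-compatible values. The sketch below is an illustration only: `normalize` is a hypothetical name, and the choice to turn `Decimal` into a `float` is an assumption made to match the unquoted `19.99` in the example above. The library's actual handling may differ.

```python
import base64
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID


def normalize(value):
    """Hypothetical normalizer: map Python-specific types to JSON-compatible values."""
    if isinstance(value, (datetime, date)):
        # datetime is a subclass of date; both provide isoformat()
        return value.isoformat()
    if isinstance(value, Decimal):
        # Assumption: converted to float (this can lose precision)
        return float(value)
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        # Binary data becomes a Base64 string
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value  # str, int, float, bool, None pass through unchanged


print(normalize({"binary": b"hello world", "day": date(2023, 1, 1)}))
# → {'binary': 'aGVsbG8gd29ybGQ=', 'day': '2023-01-01'}
```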

### Array Optimization

TOON automatically chooses the best encoding strategy for arrays:

```python
# Primitive arrays (inline)
encode({"numbers": [1, 2, 3, 4, 5]})
# → numbers[5]: 1,2,3,4,5

# Tabular arrays (uniform objects)
encode({"users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
]})
# → users[2]{id,name}:
#     1,Alice
#     2,Bob

# Mixed arrays (list format)
encode({"mixed": [1, {"a": 2}, "three"]})
# → mixed[3]:
#     - 1
#     - a: 2
#     - three
```
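
The selection logic can be sketched in a few lines. This is not the library's code: `choose_array_strategy` and `encode_tabular` are hypothetical helpers, and the rule that "uniform" means every element is a dict with the same keys and only primitive values is an assumption. Quoting is ignored.

```python
from typing import Any, Dict, List

PRIMITIVES = (str, int, float, bool, type(None))


def is_primitive(value: Any) -> bool:
    return isinstance(value, PRIMITIVES)


def format_primitive(value: Any) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is an int subclass, so test it first
        return "true" if value else "false"
    return str(value)


def choose_array_strategy(items: List[Any]) -> str:
    """Return 'inline', 'tabular', or 'list' for a Python list."""
    if all(is_primitive(v) for v in items):
        return "inline"
    if all(isinstance(v, dict) for v in items):
        keys = set(items[0].keys())
        uniform = all(
            set(v.keys()) == keys and all(is_primitive(x) for x in v.values())
            for v in items
        )
        if uniform and keys:
            return "tabular"
    return "list"


def encode_tabular(key: str, rows: List[Dict[str, Any]], indent: int = 2) -> str:
    """Render uniform rows as a header line followed by one line per row."""
    fields = list(rows[0].keys())  # field order is taken from the first row
    header = f"{key}[{len(rows)}]" + "{" + ",".join(fields) + "}:"
    pad = " " * indent
    lines = [pad + ",".join(format_primitive(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(choose_array_strategy(users))  # → tabular
print(encode_tabular("users", users))
```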

## API Reference

### `encode(data, options=None)`

Convert any JSON-serializable value to TOON format.

**Parameters:**
- `data`: Any Python data structure to encode
- `options`: Optional `EncodeOptions` instance for configuration

**Returns:** TOON format string

**Raises:**
- `ToonEncodingError`: If data cannot be encoded
- `CircularReferenceError`: If circular references are detected
- `DatasetTooLargeError`: If data exceeds the 10MB limit
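
To illustrate the circular-reference case, here is a sketch of one common detection approach: tracking the identity of every container on the current traversal path. The `CircularReferenceError` class below is a local stand-in and `check_no_cycles` is a hypothetical helper; neither is taken from `toon_python`, whose internal check may work differently.

```python
class CircularReferenceError(ValueError):
    """Local stand-in for illustration; not imported from toon_python."""


def check_no_cycles(value, _path=None):
    """Raise CircularReferenceError if a container appears inside itself."""
    path = set() if _path is None else _path
    if not isinstance(value, (dict, list, tuple)):
        return  # primitives cannot form cycles
    marker = id(value)
    if marker in path:
        raise CircularReferenceError("circular reference detected")
    path.add(marker)
    children = value.values() if isinstance(value, dict) else value
    for child in children:
        check_no_cycles(child, path)
    # Leaving this container: the same object may still appear elsewhere
    # (a shared reference), which is not a cycle.
    path.discard(marker)


shared = [1, 2]
check_no_cycles({"a": shared, "b": shared})  # passes: shared, but not cyclic

loop = {"name": "node"}
loop["self"] = loop
try:
    check_no_cycles(loop)
except CircularReferenceError as exc:
    print(f"rejected: {exc}")
# → rejected: circular reference detected
```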

### `EncodeOptions`

Configuration options for TOON encoding:

```python
@dataclass
class EncodeOptions:
    indent: int = 2                    # Spaces per nesting level
    delimiter: Delimiter = Delimiter.COMMA  # Array delimiter
    length_marker: Optional[str] = None  # '#' or None for array length prefix
```

### `Delimiter`

Enum for array delimiters:
- `Delimiter.COMMA` (default): `,`
- `Delimiter.TAB`: `\t`
- `Delimiter.PIPE`: `|`

## Development

### Setup

```bash
# Clone repository
git clone https://github.com/your-username/toon-python.git
cd toon-python

# Install development dependencies
pip install -e ".[dev]"
```

### Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/toon_python

# Run specific test file
pytest tests/test_encoder.py
```

### Code Quality

```bash
# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type checking
mypy src/
```

## Token Efficiency

TOON achieves significant token reduction through:

1. **Minimal Punctuation**: Removes braces, brackets, and most quotes
2. **Smart Quoting**: Only quotes when necessary for parsing
3. **Inline Arrays**: Primitive arrays use comma-separated values
4. **Tabular Format**: Uniform object arrays use table-like layout
5. **Compact Syntax**: Eliminates redundant characters

Example comparison for a typical user object:

**JSON (45 tokens):**
```json
{
  "id": 123,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "active": true,
  "roles": ["user", "admin"]
}
```

**TOON (28 tokens, 38% reduction):**
```
id: 123
name: Alice Smith
email: alice@example.com
active: true
roles[2]: user,admin
```
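
Exact token counts depend on the tokenizer in use. As a rough, tokenizer-free sanity check, the sketch below compares character counts for the two snippets above using only the standard library. Character counts are a proxy and will not match the token figures quoted here.

```python
import json

# The JSON snippet from this section, kept as literal text
json_text = """{
  "id": 123,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "active": true,
  "roles": ["user", "admin"]
}"""

# The TOON snippet from this section
toon_text = "\n".join([
    "id: 123",
    "name: Alice Smith",
    "email: alice@example.com",
    "active: true",
    "roles[2]: user,admin",
])

# Confirm the JSON text is valid before comparing sizes
record = json.loads(json_text)

saving = 1 - len(toon_text) / len(json_text)
print(f"JSON: {len(json_text)} chars")
print(f"TOON: {len(toon_text)} chars")
print(f"Character saving: {saving:.0%}")
```

For real measurements, run both strings through the tokenizer of the target model rather than counting characters.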

## Limitations

- Maximum dataset size: 10MB (configurable)
- No circular reference support
- Pure Python implementation (not optimized for speed)
- Encoding only (no decoding functionality)

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite and ensure all tests pass
6. Submit a pull request

## License

MIT License. See the LICENSE file for details.

## Related Projects

- [toon](https://github.com/johannschopplich/toon) - Original TypeScript implementation (v0.3.1)
