# Architecture

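NLQL is built around a three-stage execution model (parsing, routing, and execution) that balances flexibility, performance, and extensibility. The reshaping step in the overview diagram (granularity transformation, ordering, and limiting) is described as part of Stage 3.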

## Overview

```
NLQL Query String
       ↓
   [Parsing]
       ↓
      AST
       ↓
   [Routing]
       ↓
  Query Plan (Push-down + In-memory)
       ↓
  [Execution]
       ↓
   Raw Results
       ↓
  [Reshaping]
       ↓
  Final Results
```

## Stage 1: Parsing

### Grammar

NLQL uses [Lark](https://github.com/lark-parser/lark) for parsing. The grammar is defined in `parser/grammar.lark` and supports:

- SELECT with granularity control (DOCUMENT, CHUNK, SENTENCE, SPAN)
- WHERE with boolean logic (AND, OR, NOT)
- Operators (MATCH, SIMILAR_TO, CONTAINS, IS, META)
- Functions (LENGTH, NOW, COUNT, custom functions)
- ORDER BY with multiple fields
- LIMIT clause

### AST Structure

The parser produces an Abstract Syntax Tree (AST) with nodes defined in `ast/nodes.py`:

- `SelectStatement` - Root node
- `WhereClause` - Filter conditions
- `LogicalExpr` - AND/OR/NOT expressions
- `ComparisonExpr` - Comparison operations
- `OperatorCall` - Custom operator invocations
- `FunctionCall` - Function invocations
- `Literal`, `Identifier` - Leaf nodes
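
As a rough illustration of how these nodes might nest, the sketch below models a few of them as dataclasses and builds a small tree by hand. The field names (`granularity`, `where`, `condition`, `op`, `operands`, `left`, `right`) are assumptions chosen for the example, not the definitions in `ast/nodes.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union

# Hypothetical, simplified node shapes; the real ones live in ast/nodes.py.

@dataclass
class Identifier:
    name: str

@dataclass
class Literal:
    value: Any

@dataclass
class ComparisonExpr:
    left: Identifier
    op: str
    right: Literal

@dataclass
class LogicalExpr:
    op: str  # "AND", "OR", or "NOT"
    operands: list = field(default_factory=list)  # ComparisonExpr or LogicalExpr

@dataclass
class WhereClause:
    condition: Union[ComparisonExpr, LogicalExpr]

@dataclass
class SelectStatement:
    granularity: str
    where: Optional[WhereClause] = None
    limit: Optional[int] = None

# A hand-built tree: two comparisons joined by AND under the root node.
tree = SelectStatement(
    granularity="CHUNK",
    where=WhereClause(
        LogicalExpr("AND", [
            ComparisonExpr(Identifier("priority"), ">", Literal(3)),
            ComparisonExpr(Identifier("lang"), "==", Literal("en")),
        ])
    ),
    limit=10,
)
```

Keeping leaf nodes (`Literal`, `Identifier`) separate from expression nodes lets later stages walk the tree without inspecting raw strings.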

### Error Handling

Parse errors include:

- Line and column numbers
- Context lines showing the error location
- Helpful error messages
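
To make the shape of such a report concrete, here is a minimal sketch of rendering a line number, column number, and context line with a caret. The `format_parse_error` helper, the message text, and the sample input are all hypothetical; NLQL's actual error output may look different.

```python
def format_parse_error(query: str, line: int, column: int, message: str) -> str:
    """Render an error with the offending line and a caret under the column.

    `line` and `column` are 1-indexed, as is common in parser diagnostics.
    """
    source_line = query.splitlines()[line - 1]
    pointer = " " * (column - 1) + "^"
    header = f"Parse error at line {line}, column {column}: {message}"
    return "\n".join([header, source_line, pointer])

# Illustrative input only: a two-line string with a misspelled keyword.
report = format_parse_error(
    "SELECT CHUNK\nWHER x", line=2, column=1, message="unexpected token"
)
print(report)
```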

## Stage 2: Routing

### Push-down vs In-memory

The routing stage analyzes the WHERE clause and decides which conditions the data source can evaluate and which must be evaluated in memory after retrieval:

**Push-down** (executed by data source):
- Semantic similarity (`SIMILAR_TO`) - if adapter supports it
- Metadata filters (`META`) - if adapter supports it
- Simple comparisons on indexed fields

**In-memory** (executed after retrieval):
- Complex boolean logic
- Custom operators
- Text pattern matching (MATCH, CONTAINS)
- Functions that require full text access

### Adapter Capabilities

Each adapter declares its capabilities:

```python
class BaseAdapter:
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...
```

The router uses these capabilities to build a query plan that pushes down as much work as the adapter supports.
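
The capability check can be sketched as a simple partition of conditions. This is a simplified illustration, not the router's actual code: `Condition`, `DemoAdapter`, and `split_conditions` are hypothetical names, and the sketch assumes the conditions are joined by AND so that each one can be routed independently.

```python
from dataclasses import dataclass

@dataclass
class Condition:
    """Hypothetical flattened WHERE condition: an operator name plus its argument."""
    operator: str  # e.g. "SIMILAR_TO", "META", "CONTAINS"
    argument: object

class DemoAdapter:
    """Stand-in adapter exposing the two capability methods shown above."""
    def supports_semantic_search(self) -> bool:
        return True

    def supports_metadata_filter(self) -> bool:
        return False

def split_conditions(conditions, adapter):
    """Partition conditions into (push_down, in_memory) by adapter capability."""
    push_down, in_memory = [], []
    for cond in conditions:
        if cond.operator == "SIMILAR_TO" and adapter.supports_semantic_search():
            push_down.append(cond)
        elif cond.operator == "META" and adapter.supports_metadata_filter():
            push_down.append(cond)
        else:
            # Anything the adapter cannot handle is evaluated after retrieval.
            in_memory.append(cond)
    return push_down, in_memory

conditions = [
    Condition("SIMILAR_TO", "vector databases"),
    Condition("META", {"lang": "en"}),
    Condition("CONTAINS", "index"),
]
push_down, in_memory = split_conditions(conditions, DemoAdapter())
```

Because `DemoAdapter` reports no metadata filtering, the `META` condition falls back to in-memory evaluation alongside `CONTAINS`.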

### Query Plan

The routing stage produces a `QueryPlan`:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class QueryPlan:
    filters: dict[str, Any] | None  # Push-down filters
    query_text: str | None          # Semantic search text
    limit: int | None               # Result limit
    metadata: dict[str, Any]        # Adapter-specific params
```

## Stage 3: Execution

### Adapter Execution

The executor sends the query plan to the adapter:

```python
units = adapter.query(plan)
```

Adapters return `TextUnit` objects (typically `Chunk` instances from vector databases).

### In-memory Filtering

After retrieval, the executor applies in-memory filters:

1. Evaluate WHERE clause conditions that couldn't be pushed down
2. Apply custom operators and functions
3. Filter results based on evaluation
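
The three steps above can be sketched as a list of residual predicates applied to each retrieved unit. The `Unit` class is a minimal stand-in for `TextUnit`, and the `contains` and `match` helpers assume substring and regular-expression semantics; the real operators may behave differently.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Unit:
    """Minimal stand-in for a TextUnit; the real class may carry more fields."""
    content: str
    metadata: dict = field(default_factory=dict)

def contains(needle):
    """Assumed CONTAINS semantics: case-insensitive substring test."""
    return lambda unit: needle.lower() in unit.content.lower()

def match(pattern):
    """Assumed MATCH semantics: regular-expression search."""
    compiled = re.compile(pattern)
    return lambda unit: compiled.search(unit.content) is not None

def apply_filters(units, predicates):
    """Keep only the units for which every residual predicate is true."""
    return [u for u in units if all(p(u) for p in predicates)]

units = [
    Unit("Vector indexes speed up search."),
    Unit("Released in 2021 with index support."),
    Unit("Unrelated text."),
]
kept = apply_filters(units, [contains("index"), match(r"\d{4}")])
```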

### Granularity Transformation

Based on the SELECT clause, results are transformed:

- `DOCUMENT` - Group chunks back into documents
- `CHUNK` - Return as-is (default from vector DBs)
- `SENTENCE` - Split chunks into sentences
- `SPAN(unit, window=N)` - Create sliding windows with context
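
A minimal sketch of the `SENTENCE` and `SPAN` transformations follows. It assumes a naive punctuation-based sentence splitter and reads `window=N` as "include up to N neighbouring units on each side"; both are assumptions for illustration, and the real splitter and window semantics may differ.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sliding_spans(units: list[str], window: int) -> list[str]:
    """For each unit, join it with up to `window` neighbours on either side."""
    spans = []
    for i in range(len(units)):
        start = max(0, i - window)
        end = min(len(units), i + window + 1)
        spans.append(" ".join(units[start:end]))
    return spans

chunk = "NLQL parses queries. It routes them. Then it executes."
sentences = split_sentences(chunk)
spans = sliding_spans(sentences, window=1)
```

Each span is centred on one sentence, so a match on that sentence is returned together with its surrounding context.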

### Ordering and Limiting

Finally:

1. Apply ORDER BY (similarity score or metadata fields)
2. Apply LIMIT to get top-N results
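
These two steps can be sketched with Python's stable sort. The `order_and_limit` helper and the dictionary-shaped results are hypothetical; the point is only that multi-field ordering followed by truncation yields the top-N results.

```python
def order_and_limit(results, keys, limit=None):
    """Sort by several (field, descending) pairs, then keep the first `limit` rows."""
    ordered = list(results)
    # Sort by the least significant key first. Python's sort is stable,
    # so the later (more significant) sorts preserve earlier tie-breaks.
    for field_name, descending in reversed(keys):
        ordered.sort(key=lambda row: row[field_name], reverse=descending)
    return ordered if limit is None else ordered[:limit]

results = [
    {"id": "a", "score": 0.71, "year": 2020},
    {"id": "b", "score": 0.93, "year": 2019},
    {"id": "c", "score": 0.71, "year": 2023},
    {"id": "d", "score": 0.40, "year": 2024},
]
# Highest score first; ties broken by most recent year.
top = order_and_limit(results, [("score", True), ("year", True)], limit=3)
```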

## Extensibility Points

### 1. Custom Adapters

Implement `BaseAdapter` to support new data sources:

```python
class MyAdapter(BaseAdapter):
    def query(self, plan: QueryPlan) -> list[TextUnit]: ...
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...
```

### 2. Custom Operators

Register operators for domain-specific logic:

```python
import re

@nlql.register_operator("HAS_EMAIL")
def has_email_operator(text: str) -> bool:
    return bool(re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text))
```

### 3. Custom Functions

Add query functions:

```python
@nlql.register_function("word_count")
def word_count(text: str) -> int:
    return len(text.split())
```

### 4. Custom Types

Define metadata field types for type-safe comparisons:

```python
from nlql import register_meta_field, NumberType

register_meta_field("priority", NumberType)
```

### 5. Custom Splitters

Implement language-specific text splitting:

```python
@nlql.register_splitter("SENTENCE")
def german_sentence_splitter(text: str) -> list[str]:
    import nltk
    return nltk.sent_tokenize(text, language='german')
```

### 6. Custom Embedding

Use your own embedding model:

```python
@nlql.register_embedding_provider
def my_embedding(texts: list[str]) -> list[list[float]]:
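    # Replace with your embedding logic; return one vector per input text
    embeddings = [[0.0] for _ in texts]  # placeholder vectors
    return embeddings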
```

## Type System

NLQL uses an implicit type system for metadata fields:

- `NumberType` - Numeric comparisons
- `TextType` - String comparisons
- `DateType` - Date/time comparisons

Types are registered per field and used during WHERE clause evaluation to ensure type-safe comparisons.
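
To show why per-field types matter, the sketch below coerces both sides of a comparison before evaluating it. The `COERCERS` table, `FIELD_TYPES` registry, and `compare_gt` helper are assumptions for illustration; only the type names come from the list above.

```python
from datetime import datetime

# Hypothetical coercion table keyed by the type names listed above.
COERCERS = {
    "NumberType": float,
    "TextType": str,
    "DateType": datetime.fromisoformat,
}

# Hypothetical registry: field name -> registered type name.
FIELD_TYPES = {"priority": "NumberType", "created": "DateType"}

def compare_gt(field_name, stored, literal):
    """Evaluate `stored > literal` after coercing both to the field's type."""
    # Unregistered fields fall back to string comparison in this sketch.
    coerce = COERCERS[FIELD_TYPES.get(field_name, "TextType")]
    return coerce(stored) > coerce(literal)

numeric = compare_gt("priority", "10", "9")           # compares 10.0 > 9.0
textual = compare_gt("title", "10", "9")              # compares "10" > "9" as strings
dated = compare_gt("created", "2024-05-01", "2023-12-31")
```

The same pair of values gives different answers as numbers and as strings, which is the kind of mismatch a registered field type is meant to prevent.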

## Performance Considerations

1. **Push-down optimization**: Maximize what's pushed to the data source
2. **Lazy evaluation**: Only compute what's needed
3. **Lazy imports**: Optional dependencies loaded on-demand
4. **Batch processing**: Embeddings computed in batches
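
As an illustration of the batching point, the sketch below groups texts into fixed-size batches before calling an embedding function. `batched`, `embed_all`, the default batch size, and the fake embedder are hypothetical; the real batching logic and sizes are not described here.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(texts, embed_batch, batch_size=32):
    """Embed `texts` by calling `embed_batch` once per batch, preserving order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# A fake embedder that records how many texts each call received.
calls = []

def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

vectors = embed_all(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
```

Five texts with a batch size of two result in three calls to the embedder instead of five.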

## Next Steps

- Explore [Query Syntax](user-guide/syntax.md) details
- Learn about [Data Sources](user-guide/data-sources.md)
- Dive into [Extensibility](user-guide/extensibility.md)

