# Data Sources and Adapters

NLQL uses an **explicit adapter pattern** for data sources. This design ensures:

- **Clear separation of concerns**: Data source logic is separate from query execution
- **High extensibility**: Easy to add support for new data sources
- **Type safety**: Users know exactly which adapter they're using
- **No magic**: No auto-detection or hidden behavior

Users must explicitly create an adapter for their data source and pass it to NLQL.

## Built-in Adapters

### MemoryAdapter

Simple in-memory storage for testing, prototyping, and small datasets.

The MemoryAdapter provides several convenient methods for adding data:

```python
from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter
adapter = MemoryAdapter()

# Method 1: Add single text
adapter.add_text("AI agents are autonomous systems", {"topic": "AI"})

# Method 2: Add multiple texts at once
texts = [
    "Machine learning powers modern AI",
    "Natural language processing enables text understanding"
]
metadatas = [{"topic": "ML"}, {"topic": "NLP"}]
adapter.add_texts(texts, metadatas)

# Method 3: Add a long document with automatic chunking
long_document = """
This is a very long document that will be automatically split into chunks.
Each chunk will be around 500 characters by default.

You can customize the chunk size and overlap to suit your needs.
"""
adapter.add_document(
    long_document,
    metadata={"source": "paper.pdf"},
    chunk_size=500,
    chunk_overlap=50
)

# Method 4: Add individual chunks with full control
adapter.add_chunk("Custom chunk", {"custom": "metadata"}, chunk_id="my_id")

# Use with NLQL
nlql = NLQL(adapter=adapter)
results = nlql.execute("SELECT CHUNK LIMIT 10")

# Check adapter size
print(f"Total chunks: {len(adapter)}")

# Clear all data
adapter.clear()
```

**Capabilities:**
- ✅ Metadata filtering
- ✅ Batch operations
- ✅ Automatic document chunking
- ❌ Semantic search (no embeddings by default)
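
To make the `chunk_size` and `chunk_overlap` parameters of `add_document` more concrete, here is a minimal sketch of fixed-size character chunking with overlap. The `chunk_text` helper is hypothetical and is not taken from NLQL; the library's actual splitter may respect sentence or paragraph boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts `step` characters after the previous one,
    # so consecutive windows share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With a small overlap, text that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some characters twice.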

## Vector Database Adapters

### ChromaDB (Coming Soon)

```python
import chromadb
from nlql import NLQL
from nlql.adapters import ChromaAdapter  # Coming soon

# Create ChromaDB collection
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents to ChromaDB
collection.add(
    documents=["AI agents are autonomous", "ML powers AI"],
    ids=["1", "2"],
    metadatas=[{"topic": "AI"}, {"topic": "ML"}],
)

# Create adapter and use with NLQL
adapter = ChromaAdapter(collection)
nlql = NLQL(adapter=adapter)

results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("artificial intelligence") > 0.7
    LIMIT 5
""")
```

**Capabilities:**
- ✅ Semantic search
- ✅ Metadata filtering
- ✅ Hybrid queries

### FAISS (Coming Soon)

```python
import faiss
from nlql import NLQL
from nlql.adapters import FAISSAdapter  # Coming soon

# Create FAISS index
index = faiss.IndexFlatL2(384)  # 384 = embedding dimension

# Create adapter and use with NLQL
adapter = FAISSAdapter(index)
nlql = NLQL(adapter=adapter)
```

### Qdrant (Coming Soon)

```python
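from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from nlql import NLQL
from nlql.adapters import QdrantAdapter  # Coming soon

client = QdrantClient(":memory:")

# An in-memory client starts empty, so create the collection before fetching it
# (vector size and distance are example values)
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
collection = client.get_collection("my_collection")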

# Create adapter and use with NLQL
adapter = QdrantAdapter(collection)
nlql = NLQL(adapter=adapter)
```

## Custom Adapters

Create custom adapters for your data sources:

```python
from nlql.adapters import BaseAdapter, QueryPlan
from nlql.text.units import TextUnit, Chunk

class MyCustomAdapter(BaseAdapter):
    def __init__(self, my_data_source):
        self.source = my_data_source

    def query(self, plan: QueryPlan) -> list[TextUnit]:
        # Raw records from your data source, e.g. dicts with
        # "id", "text", and optional "metadata" keys
        raw_items = []

        # Apply filters from plan
        if plan.filters:
            # Filter raw_items by metadata
            pass

        if plan.query_text:
            # Perform semantic search and populate raw_items
            pass

        # Convert raw records to TextUnit objects
        units: list[TextUnit] = []
        for item in raw_items:
            chunk = Chunk(
                content=item["text"],
                metadata=item.get("metadata", {}),
                chunk_id=item["id"],
                position=0,
            )
            units.append(chunk)

        # Apply limit
        if plan.limit:
            units = units[:plan.limit]

        return units

    def supports_semantic_search(self) -> bool:
        return True  # If your source supports it

    def supports_metadata_filter(self) -> bool:
        return True  # If your source supports it
```

Use your custom adapter:

```python
from nlql import NLQL

# my_data_source is whatever object your adapter wraps
adapter = MyCustomAdapter(my_data_source)
nlql = NLQL(adapter=adapter)
```
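
The filter and limit steps of a custom adapter can be tried out without NLQL installed. The sketch below is one possible way to fill them in over a plain list of records. `SketchPlan` and `SketchChunk` are hypothetical stand-ins that mirror only the fields used in the skeleton above, and the exact-match filter semantics are an assumption, not NLQL's documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchPlan:
    """Stand-in for QueryPlan (hypothetical; only the fields used here)."""
    filters: dict[str, Any] = field(default_factory=dict)
    query_text: Optional[str] = None
    limit: Optional[int] = None


@dataclass
class SketchChunk:
    """Stand-in for Chunk with the same four fields as in the skeleton."""
    content: str
    metadata: dict[str, Any]
    chunk_id: str
    position: int = 0


class ListAdapter:
    """Serves chunks from a list of dicts with "id", "text", "metadata" keys."""

    def __init__(self, records: list[dict[str, Any]]):
        self.records = records

    def query(self, plan: SketchPlan) -> list[SketchChunk]:
        units = []
        for position, item in enumerate(self.records):
            metadata = item.get("metadata", {})
            # Assumed filter semantics: every filter key must match exactly.
            if any(metadata.get(key) != value for key, value in plan.filters.items()):
                continue
            units.append(SketchChunk(item["text"], metadata, item["id"], position))
        # Truncate only when a limit was actually requested.
        if plan.limit is not None:
            units = units[:plan.limit]
        return units

    def supports_semantic_search(self) -> bool:
        return False  # this sketch has no embeddings

    def supports_metadata_filter(self) -> bool:
        return True


records = [
    {"id": "1", "text": "AI agents are autonomous", "metadata": {"topic": "AI"}},
    {"id": "2", "text": "ML powers AI", "metadata": {"topic": "ML"}},
    {"id": "3", "text": "Agents plan and act", "metadata": {"topic": "AI"}},
]
adapter = ListAdapter(records)
hits = adapter.query(SketchPlan(filters={"topic": "AI"}, limit=1))
print([hit.chunk_id for hit in hits])
```

A real adapter would subclass `BaseAdapter` and return NLQL's own `Chunk` objects; the stand-ins only keep the example self-contained.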

## Adapter Capabilities

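Adapters declare their capabilities, for example through the `supports_semantic_search()` and `supports_metadata_filter()` methods shown in the custom adapter above, so that NLQL can optimize queries: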

| Capability | Description | Impact |
|------------|-------------|--------|
| `semantic_search` | Supports SIMILAR_TO | Enables push-down of similarity queries |
| `metadata_filter` | Supports META filters | Enables push-down of metadata conditions |

NLQL automatically routes queries based on these capabilities:

- **Supported operations** → Pushed down to data source (fast)
- **Unsupported operations** → Executed in-memory (slower, but flexible)
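
The routing rule above can be pictured as a partition of the query's conditions. The following is a rough sketch under stated assumptions, not NLQL's actual planner: `Condition`, `StubAdapter`, and `split_conditions` are hypothetical names, and each condition is tagged by hand with the capability it needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One WHERE condition, tagged with the capability it needs (hypothetical)."""
    text: str
    needs: str  # "semantic_search", "metadata_filter", or "other"


class StubAdapter:
    """Minimal adapter exposing only the two capability methods."""

    def __init__(self, semantic: bool, metadata: bool):
        self._semantic = semantic
        self._metadata = metadata

    def supports_semantic_search(self) -> bool:
        return self._semantic

    def supports_metadata_filter(self) -> bool:
        return self._metadata


def split_conditions(adapter, conditions):
    """Partition conditions into (pushed_down, in_memory) lists."""
    supported = {
        "semantic_search": adapter.supports_semantic_search(),
        "metadata_filter": adapter.supports_metadata_filter(),
    }
    pushed, residual = [], []
    for cond in conditions:
        # Anything the adapter does not declare support for stays in memory.
        target = pushed if supported.get(cond.needs, False) else residual
        target.append(cond)
    return pushed, residual


conditions = [
    Condition('SIMILAR_TO("AI") > 0.8', "semantic_search"),
    Condition('META("topic") == "ML"', "metadata_filter"),
    Condition("LENGTH(content) > 100", "other"),
]

# An adapter with metadata filtering but no semantic search,
# matching the MemoryAdapter capability list.
memory_like = StubAdapter(semantic=False, metadata=True)
pushed, residual = split_conditions(memory_like, conditions)
print([c.text for c in pushed])
print([c.text for c in residual])
```

In this picture, an adapter that supports more capabilities leaves a smaller residual list for in-memory evaluation.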

## Best Practices

### 1. Choose the Right Adapter

- **MemoryAdapter**: Testing, small datasets, no semantic search needed
- **Vector DB adapters**: Production, large datasets, semantic search required

### 2. Leverage Push-down

Structure queries to maximize push-down:

```sql
-- Good: Filters pushed to vector DB
SELECT CHUNK
WHERE SIMILAR_TO("AI") > 0.8 AND META("topic") == "ML"

-- Less optimal: Complex logic requires in-memory execution
SELECT CHUNK
WHERE (SIMILAR_TO("AI") > 0.8 OR CONTAINS("machine learning"))
  AND LENGTH(content) > 100
```

### 3. Use Appropriate Limits

Always use LIMIT to avoid retrieving too much data:

```sql
SELECT CHUNK
WHERE SIMILAR_TO("query") > 0.7
LIMIT 100  -- Reasonable limit
```

### 4. Index Metadata Fields

For vector databases, ensure that the metadata fields used in `META(...)` conditions are indexed, so that pushed-down filters do not force a full scan of the collection.

## Next Steps

- Learn about [Extensibility](extensibility.md)
- Check [API Reference](../api/adapters.md)

