# Getting Started with NLQL

This guide walks you through installing NLQL and running your first queries.

## Installation

### Basic Installation

Install NLQL with pip:

```bash
pip install python-nlql
```

This installs the core NLQL engine with the Lark parser. For semantic search capabilities, you'll also need an embedding provider.

### With Embedding Support

For semantic similarity operations (`SIMILAR_TO`), install the `text` extra:

```bash
# Quote the requirement so shells such as zsh don't treat the brackets as a glob
pip install "python-nlql[text]"
```

This extra pulls in `sentence-transformers`, which backs the default embedding provider.

### With Vector Database Adapters

Install with specific vector database support:

```bash
# ChromaDB
pip install "python-nlql[chroma]"

# FAISS
pip install "python-nlql[faiss]"

# Qdrant
pip install "python-nlql[qdrant]"

# All adapters
pip install "python-nlql[all]"
```
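
After installing, it can be handy to confirm which optional dependencies are actually importable in the current environment. The sketch below uses only the standard library. The mapping from extra name to import name is an assumption (this guide only names `sentence-transformers`), so adjust it to match what your installation provides.

```python
from importlib.util import find_spec

# Assumed import names for each extra; only `sentence-transformers`
# is named in this guide, the others are educated guesses.
OPTIONAL_MODULES = {
    "text": "sentence_transformers",
    "chroma": "chromadb",
    "faiss": "faiss",
    "qdrant": "qdrant_client",
}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra_name: bool} telling whether each module can be imported."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


# Print a small report without importing the (potentially heavy) packages
for extra, ok in available_extras().items():
    print(f"{extra:8s} {'installed' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when large packages are installed.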

## Your First Query

### Using In-Memory Data

The simplest way to get started is with the built-in `MemoryAdapter`:

```python
from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter and add data
adapter = MemoryAdapter()
adapter.add_text("AI agents are autonomous systems", {"topic": "AI"})
adapter.add_text("Machine learning powers modern AI", {"topic": "ML"})
adapter.add_text("Natural language processing", {"topic": "NLP"})

# Initialize NLQL with explicit adapter
nlql = NLQL(adapter=adapter)

# Execute a simple query
results = nlql.execute("SELECT CHUNK LIMIT 2")

# Print results
for result in results:
    print(result.content)
```

### Using a Vector Database

The example below shows the planned ChromaDB integration. It requires `ChromaAdapter`, which is not yet available (coming soon):

```python
import chromadb
from nlql import NLQL
from nlql.adapters import ChromaAdapter  # Coming soon

# Create ChromaDB client and collection
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents
collection.add(
    documents=["AI agents are autonomous", "ML powers modern AI"],
    ids=["doc1", "doc2"],
)

# Create adapter and initialize NLQL
adapter = ChromaAdapter(collection)
nlql = NLQL(adapter=adapter)

# Query with semantic search
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("artificial intelligence")
    LIMIT 5
""")
```

## Basic Query Syntax

### SELECT Clause

Choose the granularity of results:

```sql
-- Full documents
SELECT DOCUMENT

-- Chunks (the default unit returned by vector DBs)
SELECT CHUNK

-- Individual sentences
SELECT SENTENCE

-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
```
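
To give a feel for what a sentence-level sliding window produces, here is a plain-Python sketch. It assumes `window=3` means "groups of three consecutive sentences" and uses a naive punctuation-based splitter. Both are illustrative assumptions: NLQL's actual `SPAN` semantics and sentence segmentation may differ, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_spans(text, window=3):
    """Return overlapping groups of `window` consecutive sentences."""
    sentences = split_sentences(text)
    if len(sentences) <= window:
        # Too short to slide: return the whole text as one span (or nothing)
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[i:i + window])
        for i in range(len(sentences) - window + 1)
    ]


text = "AI agents act. They perceive. They plan. They learn."
for span in sentence_spans(text, window=3):
    print(span)
```

Each span overlaps its neighbor by `window - 1` sentences, which is what gives a matched sentence its surrounding context.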

### WHERE Clause

Filter results with various operators:

```sql
-- Semantic similarity
WHERE SIMILAR_TO("AI agents") > 0.8

-- Text matching
WHERE CONTAINS("machine learning")

-- Metadata filtering
WHERE META("date") > "2024-01-01"

-- Combine conditions
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
```
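
The date filter above compares against the string `"2024-01-01"`. The sketch below mimics that filter in plain Python to show why zero-padded ISO 8601 dates (`YYYY-MM-DD`) are a safe choice for such metadata: their string order matches calendar order. Whether NLQL compares dates as strings or parses them is not stated in this guide, so treat the string comparison here as an assumption. The `docs` list and helper names are hypothetical.

```python
docs = [
    {"content": "Older note", "metadata": {"date": "2023-12-31", "topic": "ML"}},
    {"content": "Newer note", "metadata": {"date": "2024-01-15", "topic": "ML"}},
    {"content": "Other topic", "metadata": {"date": "2024-02-01", "topic": "NLP"}},
]


def after(items, cutoff):
    """Keep items whose ISO date string sorts after `cutoff`."""
    return [d for d in items if d["metadata"]["date"] > cutoff]


# Rough equivalent of: WHERE META("date") > "2024-01-01"
recent = after(docs, "2024-01-01")
print([d["content"] for d in recent])

# Rough equivalent of: ... AND META("topic") == "ML"
recent_ml = [d for d in recent if d["metadata"]["topic"] == "ML"]
print([d["content"] for d in recent_ml])

# Non-ISO formats break string ordering: January 2024 sorts *before*
# December 2023 here, so this prints False.
print("1/15/2024" > "12/31/2023")
```

Storing dates in ISO format keeps range filters predictable regardless of how the comparison is implemented.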

### ORDER BY and LIMIT

```sql
-- Order by similarity score
ORDER BY SIMILARITY DESC

-- Order by metadata field
ORDER BY META("date") DESC

-- Limit results
LIMIT 10
```

## Complete Example

```python
from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter
adapter = MemoryAdapter()

# Add documents with metadata
adapter.add_text(
    "AI agents can perceive their environment and take actions.",
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"}
)
adapter.add_text(
    "Machine learning models learn from data without explicit programming.",
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"}
)
adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"}
)

# Alternatively, add the same documents in one batch call.
# Use this instead of the add_text calls above: running both would
# likely store every document twice and duplicate the query results.
texts = [
    "AI agents can perceive their environment and take actions.",
    "Machine learning models learn from data without explicit programming.",
    "Natural language processing enables computers to understand human language.",
]
metadatas = [
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"},
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"},
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"},
]
# adapter.add_texts(texts, metadatas)  # uncomment when using the batch form

# Initialize NLQL
nlql = NLQL(adapter=adapter)

# Execute a query that filters on metadata
results = nlql.execute("""
    SELECT CHUNK
    WHERE META("author") == "Alice"
    LIMIT 5
""")

# Process results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} ---")
    print(f"Content: {result.content}")
    print(f"Metadata: {result.metadata}")
```

## Semantic Search Example

NLQL supports semantic search through the `SIMILAR_TO` operator, which compares vector embeddings to find content with similar meaning:

```python
from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter with AI-related content
adapter = MemoryAdapter()

adapter.add_text(
    "Artificial intelligence and machine learning are revolutionizing technology.",
    {"category": "AI", "author": "Alice", "year": 2024}
)

adapter.add_text(
    "Neural networks form the foundation of modern deep learning systems.",
    {"category": "ML", "author": "Bob", "year": 2024}
)

adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"category": "NLP", "author": "Alice", "year": 2023}
)

nlql = NLQL(adapter=adapter)

# Semantic search query
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("deep learning and neural networks") > 0.5
    ORDER BY SIMILARITY DESC
    LIMIT 3
""")

# Results are ordered by semantic similarity
for i, result in enumerate(results, 1):
    similarity = result.metadata['similarity']
    print(f"{i}. [{similarity:.3f}] {result.content}")
    print(f"   Category: {result.metadata['category']}\n")
```

**Output** (exact scores may vary with the embedding model version):
```
1. [0.814] Neural networks form the foundation of modern deep learning systems.
   Category: ML

2. [0.777] Artificial intelligence and machine learning are revolutionizing technology.
   Category: AI

3. [0.609] Natural language processing enables computers to understand human language.
   Category: NLP
```

### How Semantic Search Works

1. **Automatic Vectorization**: When you use `SIMILAR_TO("query")`, NLQL automatically:
   - Embeds the query text using the default model (`all-MiniLM-L6-v2`)
   - Embeds all text chunks in your data
   - Computes cosine similarity scores

2. **Similarity Scores**: The similarity score (0-1) is stored in `metadata["similarity"]` and can be:
   - Used in WHERE clause: `WHERE SIMILAR_TO("query") > 0.8`
   - Used in ORDER BY: `ORDER BY SIMILARITY DESC`
   - Accessed in results: `result.metadata['similarity']`

3. **Hybrid Queries**: Combine semantic search with metadata filtering:

```python
results = nlql.execute("""
    SELECT CHUNK
    WHERE
        SIMILAR_TO("AI technology") > 0.6
        AND META("year") == 2024
        AND META("author") == "Alice"
    ORDER BY SIMILARITY DESC
""")
```
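
The embed, score, filter, and sort steps described above can be sketched without any dependencies. This is not NLQL's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real model such as `all-MiniLM-L6-v2`, so it only rewards shared words rather than shared meaning. The cosine similarity, threshold, ordering, and limit logic follow the same shape as a `SIMILAR_TO(...) > t ORDER BY SIMILARITY DESC LIMIT n` query.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: word counts (NOT a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, threshold=0.0, limit=None):
    """Score every chunk, keep those above `threshold`, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(chunk)), chunk) for chunk in chunks]
    hits = [(score, chunk) for score, chunk in scored if score > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:limit]  # limit=None keeps every hit


chunks = [
    "neural networks and deep learning",
    "cooking pasta at home",
]
for score, chunk in search("deep learning networks", chunks, threshold=0.1):
    print(f"[{score:.3f}] {chunk}")
```

With a real embedding model the vectors are dense floats rather than word counts, but the scoring and ranking steps stay the same.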

### Installation for Semantic Search

To use semantic search, install with text support:

```bash
pip install "python-nlql[text]"
```

This installs `sentence-transformers` for the default embedding provider.

## Extensibility (Optional)

NLQL lets you register custom functions, operators, and embedding providers to fit your needs:

```python
import re

from nlql import register_function, register_operator

# Add custom function for WHERE/ORDER BY
@register_function("word_count")
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

# Add domain-specific operator
@register_operator("HAS_EMAIL")
def has_email(text: str) -> bool:
    """Return True if the text contains something shaped like an email address."""
    return bool(re.search(r'[\w\.-]+@[\w\.-]+', text))

# Use in queries (`nlql` is an NLQL instance created as in the earlier examples)
results = nlql.execute("""
    SELECT CHUNK
    WHERE word_count(content) > 50 AND HAS_EMAIL(content)
""")
```
```
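
Because the helper logic in the example above is ordinary Python, it can be sanity-checked on its own before being registered. The sketch below repeats the two functions without the NLQL decorators (so it runs without the library installed) and tries them on a few sample strings. The sample data is made up for illustration.

```python
import re

# Same permissive pattern as in the example above: it matches anything
# shaped like "name@host", not only fully valid addresses.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+")


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def has_email(text: str) -> bool:
    """Return True if the text contains an email-like token."""
    return bool(EMAIL_RE.search(text))


samples = [
    "Contact alice@example.com for details",
    "No address in this sentence",
    "Short a@b",  # also matches, since the pattern is deliberately loose
]
for text in samples:
    print(f"{word_count(text):2d} words, email={has_email(text)}: {text}")
```

If stricter matching matters for your data, tighten the pattern (for example by requiring a dot in the domain part) before registering the operator.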

**What you can extend:**
- 🔧 **Custom Functions**: Add reusable logic for WHERE and ORDER BY clauses
- 🎯 **Custom Operators**: Create domain-specific operators (e.g., `HAS_EMAIL`, `REGEX`)
- 🤖 **Embedding Providers**: Use your own embedding models (OpenAI, Cohere, etc.)
- 🏢 **Instance-Level Registration**: Different NLQL instances can have different implementations

📚 **Learn More**: See the [Extensibility Guide](user-guide/extensibility.md) for complete documentation and examples. Check the `examples/` directory in the repository for runnable code samples.

## Next Steps

- Learn about [Query Syntax](user-guide/syntax.md) in detail
- Explore [Data Sources](user-guide/data-sources.md) and adapters
- Discover [Extensibility](user-guide/extensibility.md) options for advanced customization
- Check the [API Reference](api/nlql.md)

