# Design Document

## Overview

The Transaction Pattern Analysis feature provides a framework for analyzing vendor transactions using pluggable strategies. Multiple analysis algorithms can be registered with configurable weights, and their results are combined into a final probability score.

This spec focuses on the framework/plumbing only. Specific analysis algorithms (recurrence detection, anomaly detection, etc.) are implemented as separate strategies that conform to the DetectionStrategy interface.

## Architecture

```mermaid
flowchart TD
    subgraph Setup
        R1[register_strategy] --> REG[Strategy Registry]
        R2[register_strategy] --> REG
        R3[register_strategy] --> REG
    end
    
    subgraph Execution
        A[Vendor Transactions DataFrame] --> B[StrategyRunner.run]
        B --> REG
        REG --> D1[Strategy 1.detect]
        REG --> D2[Strategy 2.detect]
        REG --> D3[Strategy N.detect]
        D1 --> E1[StrategyResult + matching_indices]
        D2 --> E2[StrategyResult or None]
        D3 --> E3[StrategyResult + matching_indices]
    end
    
    subgraph Combination
        E1 --> F[Weighted Average]
        E2 -.->|excluded if None| F
        E3 --> F
        F --> G[CombinedResult]
    end
```

### Component Flow

1. **Setup**: Register strategies with weights via `register_strategy(strategy, weight)`
2. **Execution**: Call `run(transactions)`, which iterates over all registered strategies
3. **Strategy Execution**: Each strategy's `detect()` method is called with the transactions DataFrame
4. **Result Collection**: Each strategy returns `StrategyResult` (with probability + matching_indices) or `None`
5. **Combination**: Weighted average is calculated, excluding strategies that returned `None`
6. **Output**: `CombinedResult` with final probability and individual strategy breakdowns

## Components and Interfaces

### DetectionStrategy (Abstract Base)

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Dict, Any, List

import pandas as pd

@dataclass
class StrategyResult:
    """Result from a single analysis strategy."""
    probability: float  # 0.0 to 1.0
    matching_indices: List[int]  # DataFrame indices of transactions that triggered this strategy
    metadata: Optional[Dict[str, Any]] = None  # Optional strategy-specific details

class DetectionStrategy(ABC):
    """Abstract base class for transaction analysis strategies."""
    
    @property
    @abstractmethod
    def name(self) -> str:
        """Return the strategy name."""
        
    @abstractmethod
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        """Analyze transactions and return probability score.
        
        Args:
            transactions: DataFrame with columns: date, description, amount
            
        Returns:
            StrategyResult with probability, matching_indices, and metadata, or None if no opinion
        """
```
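To illustrate the contract, here is a minimal conforming strategy. It is not one of the planned strategies: `LargeAmountStrategy`, its `threshold` parameter, and its probability formula (the share of rows above the threshold) are hypothetical, and the interface classes are repeated in compact form so the block runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

import pandas as pd


@dataclass
class StrategyResult:  # compact copy of the interface above
    probability: float
    matching_indices: List[int]
    metadata: Optional[Dict[str, Any]] = None


class DetectionStrategy(ABC):  # compact copy of the interface above
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]: ...


class LargeAmountStrategy(DetectionStrategy):
    """Hypothetical example: flags transactions above a fixed amount."""

    def __init__(self, threshold: float = 100.0):
        self.threshold = threshold

    @property
    def name(self) -> str:
        return "large_amount"

    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        # No rows means no evidence either way, so the strategy abstains.
        if transactions.empty:
            return None
        mask = transactions["amount"] > self.threshold
        # Index labels of the rows that triggered the strategy.
        matching = transactions.loc[mask].index.tolist()
        return StrategyResult(
            probability=len(matching) / len(transactions),
            matching_indices=matching,
            metadata={"threshold": self.threshold},
        )
```

Returning `None` for an empty frame, rather than a result with probability 0.0, lets the runner drop the strategy from the weighted average instead of counting it as evidence against.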

### StrategyRunner

```python
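from dataclasses import dataclass
from typing import Dict, List, Tuple

import pandas as pd

# StrategyResult and DetectionStrategy are defined in the DetectionStrategy (Abstract Base) block.

@dataclass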
class CombinedResult:
    """Combined result from all analysis strategies."""
    probability: float  # Final weighted probability (0.0 to 1.0)
    strategy_results: Dict[str, StrategyResult]  # Individual results by strategy name
    strategy_weights: Dict[str, float]  # Weights used for each strategy

class StrategyRunner:
    """Orchestrates analysis strategies and combines results."""
    
    def __init__(self):
        """Initialize the strategy runner."""
        self._strategies: List[Tuple[DetectionStrategy, float]] = []
        
    def register_strategy(self, strategy: DetectionStrategy, weight: float = 1.0) -> None:
        """Register an analysis strategy with optional weight.
        
        Args:
            strategy: Strategy instance conforming to DetectionStrategy interface
            weight: Non-negative weight for this strategy in combination (default 1.0)

        Raises:
            ValueError: If weight is negative.
        """
        
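    def run(self, transactions: pd.DataFrame) -> CombinedResult:
        """Apply all strategies and return combined result.
        
        Args:
            transactions: DataFrame of transactions for a single vendor
            
        Returns:
            CombinedResult with weighted probability and breakdown. The
            probability is 0.0 if transactions is empty or no strategies
            are registered. A strategy that raises is logged and treated
            as having returned None.
        """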
        
    def _combine_results(
        self, 
        results: Dict[str, StrategyResult], 
        weights: Dict[str, float]
    ) -> float:
        """Calculate weighted average of strategy probabilities.
        
        Excludes strategies that returned None.
        Returns 0.0 if all strategies returned None.
        """
```
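The method bodies above are left open. One way to fill them in, following the Component Flow and the Error Handling table, is sketched below. Assumptions not fixed by the design: a `typing.Protocol` stands in for the abstract base so the block is self-contained, strategy names are assumed unique (a duplicate name overwrites the earlier entry), and a zero total weight among responding strategies falls back to 0.0 just like the all-`None` case.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Protocol, Tuple

import pandas as pd

logger = logging.getLogger(__name__)


@dataclass
class StrategyResult:
    probability: float
    matching_indices: List[int]
    metadata: Optional[Dict[str, Any]] = None


class DetectionStrategy(Protocol):
    """Structural stand-in for the abstract base: anything with name and detect()."""

    @property
    def name(self) -> str: ...

    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]: ...


@dataclass
class CombinedResult:
    probability: float
    strategy_results: Dict[str, StrategyResult]
    strategy_weights: Dict[str, float]


class StrategyRunner:
    def __init__(self) -> None:
        self._strategies: List[Tuple[DetectionStrategy, float]] = []

    def register_strategy(self, strategy: DetectionStrategy, weight: float = 1.0) -> None:
        # Negative weights are rejected at registration time.
        if weight < 0:
            raise ValueError(f"weight must be non-negative, got {weight}")
        self._strategies.append((strategy, weight))

    def run(self, transactions: pd.DataFrame) -> CombinedResult:
        # Assumption: names are unique; a duplicate name overwrites earlier entries.
        weights = {strategy.name: weight for strategy, weight in self._strategies}
        # Empty input or no registered strategies: probability 0.0.
        if transactions.empty or not self._strategies:
            return CombinedResult(0.0, {}, weights)

        results: Dict[str, StrategyResult] = {}
        for strategy, _ in self._strategies:
            try:
                result = strategy.detect(transactions)
            except Exception:
                # Log the failure, treat it as a None result, and continue.
                logger.exception("strategy %r raised; treating as None", strategy.name)
                continue
            if result is not None:
                results[strategy.name] = result

        return CombinedResult(self._combine_results(results, weights), results, weights)

    def _combine_results(
        self, results: Dict[str, StrategyResult], weights: Dict[str, float]
    ) -> float:
        # Only strategies present in `results` (i.e. not None) contribute.
        total_weight = sum(weights[name] for name in results)
        if total_weight == 0:
            # Every strategy returned None, or (assumption) all their weights are 0.
            return 0.0
        weighted_sum = sum(r.probability * weights[name] for name, r in results.items())
        return weighted_sum / total_weight
```

In this sketch `strategy_weights` records the weight of every registered strategy, including those that abstained or failed; recording only the contributing weights would be an equally valid reading of the data model.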

## Data Models

### Input DataFrame Schema (Vendor Transactions)

| Column | Type | Description |
|--------|------|-------------|
| date | string | Transaction date in YYYY-MM-DD format |
| description | string | Vendor/merchant name (same for all rows) |
| amount | float | Transaction amount |

### StrategyResult

| Field | Type | Description |
|-------|------|-------------|
| probability | float | Probability score (0.0 to 1.0) |
| matching_indices | List[int] | DataFrame indices of transactions that triggered this strategy |
| metadata | Optional[Dict[str, Any]] | Optional strategy-specific details (defaults to None) |

### CombinedResult

| Field | Type | Description |
|-------|------|-------------|
| probability | float | Final weighted probability (0.0 to 1.0) |
| strategy_results | Dict[str, StrategyResult] | Individual results by strategy name |
| strategy_weights | Dict[str, float] | Weights used for each strategy |

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system.*

### Property 1: All Strategies Applied

*For any* set of registered strategies and input transactions, the StrategyRunner SHALL call the detect method on every registered strategy.

**Validates: Requirements 1.1**

### Property 2: Weighted Average Correctness

*For any* set of strategy results with associated weights, the combined probability SHALL equal the weighted average `sum(probability_i * weight_i) / sum(weight_i)`, where both sums range over the strategies that returned a result.
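In notation, with $S$ the set of strategies that returned a result, $p_i$ their probabilities and $w_i$ their weights, together with an illustrative calculation for two strategies (probability 0.9 at weight 2.0, and 0.3 at weight 1.0):

```latex
P = \frac{\sum_{i \in S} w_i \, p_i}{\sum_{i \in S} w_i},
\qquad
\frac{2.0 \cdot 0.9 + 1.0 \cdot 0.3}{2.0 + 1.0} = \frac{2.1}{3.0} = 0.7
```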

**Validates: Requirements 1.3, 2.2**

### Property 3: Probability Bounds

*For any* combination of strategy results and weights, provided each strategy's probability lies in [0.0, 1.0] and all weights are non-negative, the final probability score SHALL be between 0.0 and 1.0 inclusive.
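The bound follows from the weighted-average formula: if every $p_i \in [0, 1]$, every $w_i \ge 0$, and the total weight $W = \sum_{i \in S} w_i$ of the responding strategies is positive, then

```latex
0 = \frac{\sum_{i \in S} w_i \cdot 0}{W}
\;\le\; \frac{\sum_{i \in S} w_i \, p_i}{W}
\;\le\; \frac{\sum_{i \in S} w_i \cdot 1}{W} = 1
```

When $W = 0$ the formula is undefined. The design returns 0.0 when every strategy returned `None`; the all-zero-weights case listed among the test edge cases needs a similar explicit fallback.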

**Validates: Requirements 1.4**

### Property 4: None Exclusion

*For any* set of strategies where some return None, the weighted average SHALL be calculated using only the strategies that returned a StrategyResult; strategies that returned None contribute to neither the numerator nor the denominator.
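Why exclusion matters, with illustrative numbers: strategy A returns 0.9 at weight 2.0, strategy B returns 0.3 at weight 1.0, and strategy C (weight 1.0) returns `None`. Excluding C leaves the score unchanged, whereas treating `None` as a probability of 0.0 would pull it down:

```latex
P_{\text{exclude}} = \frac{2.0 \cdot 0.9 + 1.0 \cdot 0.3}{2.0 + 1.0} = 0.7
\qquad
P_{\text{as zero}} = \frac{2.0 \cdot 0.9 + 1.0 \cdot 0.3 + 1.0 \cdot 0}{2.0 + 1.0 + 1.0} = \frac{2.1}{4.0} = 0.525
```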

**Validates: Requirements 2.3**

### Property 5: Matching Indices Included

*For any* strategy that returns a result, the StrategyResult SHALL include the matching_indices list containing the DataFrame indices of transactions that triggered the strategy.
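This property can be checked mechanically. The sketch below is a hypothetical test helper (`check_matching_indices` is not part of the spec). It assumes "DataFrame indices" means index labels (as in `df.index`) rather than positional offsets, which matters when the vendor frame is a filtered slice of a larger frame.

```python
from typing import List

import pandas as pd


def check_matching_indices(matching_indices: List[int], transactions: pd.DataFrame) -> None:
    """Hypothetical helper for Property 5: indices must be labels of the input frame."""
    assert isinstance(matching_indices, list), "matching_indices must be a list"
    unknown = set(matching_indices) - set(transactions.index)
    assert not unknown, f"indices not in the input frame: {sorted(unknown)}"


# A vendor frame sliced from a larger frame keeps its original labels (0 and 2 here).
all_tx = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-06", "2024-02-05", "2024-02-06"],
    "description": ["ACME", "OTHER", "ACME", "OTHER"],
    "amount": [9.99, 5.00, 9.99, 5.00],
})
acme = all_tx[all_tx["description"] == "ACME"]
check_matching_indices([0, 2], acme)  # passes: both are labels in the ACME slice
```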

**Validates: Requirements 4.3**

## Strategy Stubs (Future Implementation)

The following strategy stubs are planned for future implementation. Each conforms to the DetectionStrategy interface.

### SameAmountPeriodicallyStrategy

```python
class SameAmountPeriodicallyStrategy(DetectionStrategy):
    """Detects transactions with the same (or nearly the same) amount recurring at regular intervals.
    
    High probability when:
    - Amounts are within tolerance (e.g., 10%)
    - Transactions occur at regular intervals (monthly, yearly)
    """
    
    @property
    def name(self) -> str:
        return "same_amount_periodically"
        
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        # TODO: Implement
        pass
```
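One possible shape for this strategy, sketched under assumptions the design does not fix: amounts are compared with the median amount using the docstring's example tolerance of 10%; "regular" means consecutive gaps fall inside hypothetical day windows (27 to 33 days for monthly, 358 to 372 for yearly); at least three occurrences are required; and the probability is the share of regular gaps multiplied by the share of rows with a matching amount. The interface classes are repeated in compact form so the block is self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

import pandas as pd


@dataclass
class StrategyResult:  # compact copy of the interface above
    probability: float
    matching_indices: List[int]
    metadata: Optional[Dict[str, Any]] = None


class DetectionStrategy(ABC):  # compact copy of the interface above
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]: ...


# Hypothetical gap windows (in days) that count as a regular billing interval.
PERIOD_WINDOWS = {"monthly": (27, 33), "yearly": (358, 372)}


class SameAmountPeriodicallyStrategy(DetectionStrategy):
    def __init__(self, tolerance: float = 0.10, min_occurrences: int = 3):
        self.tolerance = tolerance  # relative tolerance around the median amount
        self.min_occurrences = min_occurrences

    @property
    def name(self) -> str:
        return "same_amount_periodically"

    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        if len(transactions) < self.min_occurrences:
            return None  # too little history to form an opinion
        tx = transactions.assign(_date=pd.to_datetime(transactions["date"])).sort_values("_date")
        median = tx["amount"].median()
        # Keep rows whose amount is within the tolerance of the median amount.
        same = tx[(tx["amount"] - median).abs() <= self.tolerance * abs(median)]
        if len(same) < self.min_occurrences:
            return StrategyResult(0.0, [], {"reason": "amounts differ"})
        # Day gaps between consecutive same-amount charges (the first diff is NaN).
        gaps = same["_date"].diff().dt.days.dropna()
        best_period, best_share = None, 0.0
        for period, (low, high) in PERIOD_WINDOWS.items():
            share = float(gaps.between(low, high).mean())
            if share > best_share:
                best_period, best_share = period, share
        return StrategyResult(
            probability=best_share * len(same) / len(tx),
            matching_indices=same.index.tolist() if best_share > 0 else [],
            metadata={"period": best_period, "median_amount": float(median)},
        )
```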

### SimilarDayOfMonthStrategy

```python
class SimilarDayOfMonthStrategy(DetectionStrategy):
    """Detects transactions occurring on similar calendar days.
    
    High probability when:
    - Transactions occur on similar days of month (within ±3 days)
    - Pattern repeats across multiple months
    """
    
    @property
    def name(self) -> str:
        return "similar_day_of_month"
        
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        # TODO: Implement
        pass
```
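A possible sketch of this strategy, again under assumptions not fixed by the design: the reference day is the median day of month; a row matches if its day is within `window_days` (3, from the docstring) of that reference; "repeats across multiple months" means at least `min_months` distinct calendar months among the matches (a hypothetical default of 3); and the probability is the share of matching rows. This simple version does not wrap across month boundaries, so charges on the 30th and the 1st count as 29 days apart.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

import pandas as pd


@dataclass
class StrategyResult:  # compact copy of the interface above
    probability: float
    matching_indices: List[int]
    metadata: Optional[Dict[str, Any]] = None


class DetectionStrategy(ABC):  # compact copy of the interface above
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]: ...


class SimilarDayOfMonthStrategy(DetectionStrategy):
    def __init__(self, window_days: int = 3, min_months: int = 3):
        self.window_days = window_days
        self.min_months = min_months

    @property
    def name(self) -> str:
        return "similar_day_of_month"

    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        if transactions.empty:
            return None
        dates = pd.to_datetime(transactions["date"])
        days = dates.dt.day
        reference_day = days.median()
        # Rows whose day of month is within the window of the reference day.
        close = (days - reference_day).abs() <= self.window_days
        months = int(dates[close].dt.to_period("M").nunique())
        if months < self.min_months:
            # The pattern does not repeat across enough months.
            return StrategyResult(0.0, [], {"months": months})
        return StrategyResult(
            probability=float(close.mean()),
            matching_indices=transactions.loc[close].index.tolist(),
            metadata={"reference_day": float(reference_day), "months": months},
        )
```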

### VariableAmountSameDateStrategy

```python
class VariableAmountSameDateStrategy(DetectionStrategy):
    """Detects recurring bills with varying amounts (like utilities).
    
    High probability when:
    - Transactions occur on similar dates
    - Amounts vary but vendor is consistent
    """
    
    @property
    def name(self) -> str:
        return "variable_amount_same_date"
        
    def detect(self, transactions: pd.DataFrame) -> Optional[StrategyResult]:
        # TODO: Implement
        pass
```

### Future Strategy Ideas

- **LevenshteinDistanceStrategy**: Group similar vendor names using edit distance
- **VectorEmbeddingStrategy**: Use embeddings to cluster similar merchants
- **VendorDatabaseStrategy**: Cross-reference against known subscription vendors
- **LLMEvaluationStrategy**: Send transaction patterns to LLM for classification

## Error Handling

| Error Type | Handling Strategy |
|------------|-------------------|
| Strategy throws exception | Log error, treat as None result, continue |
| Empty transactions DataFrame | Return probability 0.0 |
| No strategies registered | Return probability 0.0 |
| Invalid weight (negative) | Raise ValueError on registration |

## Testing Strategy

### Property-Based Testing

Use **Hypothesis** with a minimum of 100 examples per property test (`@settings(max_examples=100)`; 100 is also Hypothesis's default).

### Test Approach

- Create mock strategies with deterministic outputs for testing
- Test weighted combination formula with various weight configurations
- Test edge cases: all None results, single strategy, zero weights

### Test Annotations

```python
# **Feature: transaction-pattern-analysis, Property {number}: {property_text}**
```
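A sketch of how two of the properties might be written with Hypothesis, using the annotation format above. `combine` is a hypothetical free-function stand-in for `StrategyRunner._combine_results` that takes raw probabilities (`None` for an abstaining strategy) so the block is self-contained; real tests would drive `StrategyRunner.run` with mock strategies. Weights are drawn as non-negative, matching the registration check.

```python
from typing import Dict, Optional

from hypothesis import given, settings, strategies as st


def combine(probs: Dict[str, Optional[float]], weights: Dict[str, float]) -> float:
    """Hypothetical stand-in for _combine_results; None means 'no opinion'."""
    used = {k: p for k, p in probs.items() if p is not None}
    total = sum(weights[k] for k in used)
    if total == 0:
        return 0.0
    return sum(p * weights[k] for k, p in used.items()) / total


# One entry per mock strategy: name -> (probability or None, non-negative weight).
strategy_outputs = st.dictionaries(
    keys=st.text(min_size=1, max_size=8),
    values=st.tuples(
        st.none() | st.floats(min_value=0.0, max_value=1.0),
        st.floats(min_value=0.0, max_value=10.0),
    ),
    max_size=6,
)


# **Feature: transaction-pattern-analysis, Property 3: Probability Bounds**
@settings(max_examples=100)
@given(strategy_outputs)
def test_probability_bounds(outputs):
    probs = {k: p for k, (p, _) in outputs.items()}
    weights = {k: w for k, (_, w) in outputs.items()}
    assert 0.0 <= combine(probs, weights) <= 1.0


# **Feature: transaction-pattern-analysis, Property 4: None Exclusion**
@settings(max_examples=100)
@given(strategy_outputs, st.floats(min_value=0.0, max_value=10.0))
def test_none_exclusion(outputs, extra_weight):
    probs = {k: p for k, (p, _) in outputs.items()}
    weights = {k: w for k, (_, w) in outputs.items()}
    baseline = combine(probs, weights)
    # Add an abstaining strategy; generated keys are at most 8 chars, so no collision.
    probs["abstaining"] = None
    weights["abstaining"] = extra_weight
    assert combine(probs, weights) == baseline
```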
