# Requirements Document

## Introduction

This feature provides vendor clustering capabilities for credit card transactions using the transaction-pattern-analysis framework. It instantiates a StrategyRunner configured with vendor-matching strategies to group transactions by merchant. The clustering goes beyond exact string matching to handle variations in vendor names (e.g., "AMAZON.COM" vs "AMZN MKTP" vs "Amazon Prime").

This feature depends on the transaction-pattern-analysis framework being implemented first.

## Glossary

- **Transaction_DataFrame**: A pandas DataFrame containing parsed transaction data with columns: date, description, amount
- **Vendor_Clustering_Runner**: A StrategyRunner instance configured with vendor-matching strategies
- **Vendor_Matching_Strategy**: A DetectionStrategy implementation that identifies whether transactions belong to the same vendor
- **Vendor_Cluster**: A group of transactions identified as belonging to the same merchant
- **Vendor_Summary**: Aggregated statistics for a vendor cluster (count, total, date range)

## Requirements

### Requirement 1

**User Story:** As a user, I want transactions clustered by vendor using multiple matching strategies, so that variations in merchant names are grouped together.

#### Acceptance Criteria

1. WHEN transactions are provided THEN the Vendor_Clustering_Runner SHALL apply all registered vendor-matching strategies
2. WHEN a strategy identifies transactions as belonging to the same vendor THEN the strategy SHALL return matching_indices for those transactions
3. WHEN multiple strategies agree on a vendor match THEN the combined probability SHALL reflect higher confidence
4. WHEN clustering completes THEN the system SHALL group transactions into vendor clusters based on combined probability threshold

### Requirement 2

**User Story:** As a user, I want vendor clusters to include summary statistics, so that I can see aggregated spending per merchant.

#### Acceptance Criteria

1. WHEN a vendor cluster is created THEN the system SHALL calculate total expenditure (sum of amounts) for the cluster
2. WHEN a vendor cluster is created THEN the system SHALL calculate the count of transactions in the cluster
3. WHEN a vendor cluster is created THEN the system SHALL determine the earliest and latest transaction dates
4. WHEN a vendor cluster is created THEN the system SHALL assign a representative vendor name for the cluster

### Requirement 3

**User Story:** As a user, I want the vendor summary exported to CSV, so that I can review spending by merchant in a spreadsheet.

#### Acceptance Criteria

1. WHEN clustering completes THEN the system SHALL return a DataFrame with columns: vendor_name, transaction_count, total_amount, earliest_date, latest_date
2. WHEN exporting vendor summary THEN the system SHALL save results to vendor_amounts.csv
3. WHEN saving the CSV THEN the system SHALL use UTF-8 encoding to preserve special characters
4. WHEN saving the CSV THEN the system SHALL include column headers

### Requirement 4

**User Story:** As a user, I want to start with exact string matching as the baseline strategy, so that the system works immediately with more strategies added later.

#### Acceptance Criteria

1. WHEN the Vendor_Clustering_Runner is initialized THEN the system SHALL register an ExactMatchStrategy by default
2. WHERE the ExactMatchStrategy is applied THEN the strategy SHALL group transactions with identical description fields
3. WHEN additional strategies are registered THEN the system SHALL combine their results with the baseline strategy

