# Project Structure

```text
├── statement_processor/           # Main package (installable)
│   ├── __init__.py               # Public API exports
│   ├── cli.py                    # CLI entry point
│   ├── processor.py              # Main StatementProcessor orchestrator
│   ├── models.py                 # Core Pydantic models (Transaction, Statement)
│   ├── parsing_utils.py          # Date/amount parsing utilities
│   ├── core/                     # Plugin interfaces
│   │   ├── base_transaction.py   # BaseTransaction interface
│   │   ├── base_parser.py        # BaseStatementParser interface
│   │   └── registry.py           # Plugin discovery via entry points
│   ├── extraction/               # PDF extraction
│   │   ├── pdf_scanner.py        # PDF file discovery
│   │   └── pdf_markdown_extractor.py  # Docling-based PDF to markdown extraction
│   ├── parsers/                  # Built-in parsers
│   │   └── markdown_table.py     # Markdown table parser
│   └── analytics/                # Transaction analysis
│       ├── clustering.py         # ClusteringStrategy interface
│       ├── cluster_runner.py     # Strategy orchestration
│       ├── vendor_clusterer.py   # Vendor clustering
│       ├── vendor_cluster.py     # VendorCluster data model
│       ├── strategies/           # Built-in strategies
│       │   ├── exact_match.py
│       │   └── regex_vendor.py
│       └── vendor_patterns/      # YAML pattern files
├── src/                          # Legacy code (to be removed)
├── tests/                        # Test modules
├── inputs/                       # PDF statement files
└── results/                      # Generated CSV output
```

## Plugin Architecture

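Third-party packages extend the framework by registering parsers and clustering strategies under entry-point groups in their own `pyproject.toml`. The registry in `core/registry.py` discovers them through those entry points: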

```toml
# Example: statement-processor-chase/pyproject.toml
[project.entry-points."statement_processor.parsers"]
chase_sapphire = "sp_chase:SapphireParser"

[project.entry-points."statement_processor.strategies"]
chase_rewards = "sp_chase.strategies:ChaseRewardsStrategy"
```
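
On the consuming side, discovery of these groups can be sketched with the standard library's `importlib.metadata`. This is a minimal illustration, not the actual contents of `core/registry.py`: the helper names `discover_plugins` and `load_plugin` are hypothetical, and only the two group names are taken from the example above.

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Dict

# Group names as they appear in the pyproject.toml example.
PARSER_GROUP = "statement_processor.parsers"
STRATEGY_GROUP = "statement_processor.strategies"


def discover_plugins(group: str) -> Dict[str, EntryPoint]:
    """Map each entry-point name in `group` to its (not yet imported) EntryPoint.

    Hypothetical helper: the real registry may expose a different API.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict of group name -> entry points.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_plugin(group: str, name: str):
    """Import and return the object registered as `name`, e.g. a parser class."""
    plugins = discover_plugins(group)
    if name not in plugins:
        raise KeyError(f"no plugin {name!r} registered under {group!r}")
    # EntryPoint.load() imports the module and resolves the "module:attr" target.
    return plugins[name].load()


if __name__ == "__main__":
    for group in (PARSER_GROUP, STRATEGY_GROUP):
        print(group, sorted(discover_plugins(group)))
```

Keeping discovery (`discover_plugins`) separate from import (`load_plugin`) means a broken third-party plugin only fails when it is actually requested, not when the available plugins are listed.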

## Processing Pipeline

1. `PDFScanner` - Finds PDF files in the input directories
2. `PdfMarkdownExtractor` - Extracts markdown text from each PDF via Docling
3. `auto_detect_parser()` - Selects the parser that matches the extracted document
4. `BaseStatementParser.parse()` - Parses the markdown text into `Transaction` objects
5. `VendorClusterer` - Clusters transactions by vendor using the registered strategies
6. `CSVExporter` - Writes the results to CSV files
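
Steps 3 through 6 can be sketched end to end with stand-in code. Everything here is a simplified assumption rather than the package's real implementation: the `Transaction` dataclass replaces the Pydantic model, `can_parse` is a hypothetical detection hook, clustering is a plain case-insensitive exact match on the description, and the export writes to a string instead of a file. Steps 1 and 2 are omitted because they depend on the file system and Docling.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Transaction:
    """Simplified stand-in for the Transaction model in models.py."""
    date: str
    description: str
    amount: float


class MarkdownTableParser:
    """Toy parser for three-column tables: date | description | amount."""

    def can_parse(self, text: str) -> bool:
        # Hypothetical detection hook: accept any text containing a table row.
        return any(line.lstrip().startswith("|") for line in text.splitlines())

    def parse(self, text: str) -> List[Transaction]:
        transactions = []
        for line in text.splitlines():
            cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
            if len(cells) != 3:
                continue
            try:
                amount = float(cells[2].replace(",", "").replace("$", ""))
            except ValueError:
                continue  # header or separator row, not a transaction
            transactions.append(Transaction(cells[0], cells[1], amount))
        return transactions


def auto_detect_parser(text: str, parsers: Sequence[MarkdownTableParser]) -> Optional[MarkdownTableParser]:
    """Step 3: return the first parser that accepts the text, or None."""
    for parser in parsers:
        if parser.can_parse(text):
            return parser
    return None


def cluster_by_vendor(transactions: Sequence[Transaction]) -> Dict[str, List[Transaction]]:
    """Step 5: group transactions whose descriptions match case-insensitively."""
    clusters = defaultdict(list)
    for tx in transactions:
        clusters[tx.description.casefold()].append(tx)
    return dict(clusters)


def export_csv(clusters: Dict[str, List[Transaction]]) -> str:
    """Step 6: render clustered transactions as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["vendor", "date", "description", "amount"])
    for vendor, txs in clusters.items():
        for tx in txs:
            writer.writerow([vendor, tx.date, tx.description, f"{tx.amount:.2f}"])
    return buffer.getvalue()


# Made-up sample input, standing in for the markdown produced by step 2.
SAMPLE_MARKDOWN = """\
| Date | Description | Amount |
|------|-------------|--------|
| 2024-01-03 | COFFEE SHOP | -4.50 |
| 2024-01-05 | Coffee Shop | -5.25 |
| 2024-01-09 | GROCER | -42.10 |
"""

if __name__ == "__main__":
    parser = auto_detect_parser(SAMPLE_MARKDOWN, [MarkdownTableParser()])
    transactions = parser.parse(SAMPLE_MARKDOWN) if parser else []  # step 4
    print(export_csv(cluster_by_vendor(transactions)))
```

Each stage takes the previous stage's output as plain data, so a parser or clustering strategy can be swapped without touching the other steps.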
