# Implementation Plan

- [x] 1. Set up project structure and dependencies
  - Create directory structure: `src/`, `tests/`, `tests/fixtures/`
  - Initialize project with `uv init` and configure `pyproject.toml`
  - Add dependencies: docling, pandas, hypothesis, pytest
  - Create `src/__init__.py` and `src/models.py` with Transaction and ProcessingResult dataclasses
  - _Requirements: 2.1, 2.4_

- [x] 2. Implement PDF Scanner
  - [x] 2.1 Create PDFScanner class in `src/pdf_scanner.py`
    - Implement `__init__`, `scan()`, and `validate_directory()` methods
    - Return list of Path objects for all .pdf files in directory
    - _Requirements: 1.1_
  - [x] 2.2 Write property test for PDF scanner
    - **Property 1: PDF Scanner Returns Only PDF Files**
    - **Validates: Requirements 1.1**
  - [x] 2.3 Write unit tests for PDF scanner edge cases
    - Test empty directory, nested directories, non-existent directory
    - _Requirements: 1.1_
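
A minimal sketch of the scanner from task 2.1, assuming a missing directory raises an error, nested directories are searched, and the `.pdf` match is case-insensitive. None of these behaviors is specified above; only the method names come from the plan.

```python
from __future__ import annotations

from pathlib import Path


class PDFScanner:
    """Hypothetical sketch of the scanner described in task 2.1."""

    def __init__(self, directory: str | Path) -> None:
        self.directory = Path(directory)

    def validate_directory(self) -> bool:
        # Assumption: "valid" means the path exists and is a directory.
        return self.directory.is_dir()

    def scan(self) -> list[Path]:
        # Assumption: an invalid directory raises instead of returning [].
        if not self.validate_directory():
            raise FileNotFoundError(f"Not a directory: {self.directory}")
        # Assumption: search recursively and match the suffix case-insensitively.
        return sorted(
            p
            for p in self.directory.rglob("*")
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
```

If the real scanner should ignore nested directories, `rglob("*")` would become `iterdir()`; the edge-case tests in 2.3 are where that choice gets pinned down.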

- [x] 3. Implement Text Extractor
  - [x] 3.1 Create TextExtractor class in `src/text_extractor.py`
    - Implement Docling integration for PDF text extraction
    - Implement `extract()` and `extract_batch()` methods
    - Handle extraction errors gracefully with logging
    - _Requirements: 1.2, 1.3, 1.4_
  - [x] 3.2 Write integration tests for text extractor
    - Test with sample PNC statement PDF
    - Test error handling with corrupted file
    - _Requirements: 1.2, 1.4_

- [x] 4. Implement Transaction Parser
  - [x] 4.1 Create TransactionParser class in `src/transaction_parser.py`
    - Implement `parse()` method to extract transactions from raw text
    - Implement `normalize_date()` for ISO format conversion
    - Implement `parse_amount()` for numeric extraction with sign handling
    - _Requirements: 2.1, 2.2, 2.3, 2.5_
  - [x] 4.2 Write property test for transaction parsing round-trip
    - **Property 2: Transaction Parsing Round-Trip**
    - **Validates: Requirements 2.1**
  - [x] 4.3 Write property test for date normalization
    - **Property 3: Date Normalization Consistency**
    - **Validates: Requirements 2.2**
  - [x] 4.4 Write property test for amount parsing
    - **Property 4: Amount Parsing Preserves Value**
    - **Validates: Requirements 2.3**
  - [x] 4.5 Write unit tests for parser edge cases
    - _Requirements: 2.5_
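
The two helpers from task 4.1 could look roughly like the sketch below. The accepted date formats and the sign conventions (parentheses, leading minus, trailing minus) are assumptions for illustration, and the functions are shown standalone rather than as `TransactionParser` methods.

```python
from datetime import datetime
from decimal import Decimal

# Assumption: statement dates appear in one of these formats.
_DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d")


def normalize_date(raw: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date: {raw!r}")


def parse_amount(raw: str) -> Decimal:
    """Parse a currency string into a signed Decimal."""
    text = raw.strip()
    # Assumption: parentheses, a leading minus, or a trailing minus
    # all mark a negative amount.
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    digits = text.strip("()-+ ").replace("$", "").replace(",", "")
    value = Decimal(digits)
    return -value if negative else value
```

`Decimal` is used here instead of `float` so that the value-preservation property in 4.4 can be checked with exact equality.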

- [x] 5. Implement Structured Extractor with Pydantic Models
  - [x] 5.1 Create Pydantic extraction models in `src/extraction_models.py`
    - Define TransactionExtraction, StatementExtraction models
    - _Requirements: 6.2_
  - [x] 5.2 Create StructuredExtractor class in `src/structured_extractor.py`
    - Implement `extract_statement()` and `extract_transactions()` methods
    - Handle multi-page PDFs
    - _Requirements: 5.1, 5.2, 6.1_
  - [ ]* 5.3 Write property test for Pydantic validation round-trip
    - **Property 9: Pydantic Validation Round-Trip**
    - **Validates: Requirements 6.3**
  - [x] 5.4 Update TransactionParser to support structured extraction
    - Add `parse_from_extraction()` method
    - _Requirements: 5.1, 6.3_
  - [x] 5.5 Write property test for multi-page extraction completeness
    - **Property 8: Multi-Page Extraction Completeness**
    - **Validates: Requirements 5.1, 5.4**

- [ ] 6. Checkpoint - Ensure all tests pass
  - Run the full test suite and confirm it passes; ask the user if questions arise.

- [x] 7. Implement CSV Exporter
  - [x] 7.1 Create CSVExporter class in `src/csv_exporter.py`
    - Implement `export_all()` method
    - Use UTF-8 encoding, include column headers
    - _Requirements: 3.1, 3.2, 3.3, 3.4_
  - [ ]* 7.2 Write property test for CSV round-trip
    - **Property 5: CSV Round-Trip Preserves Data**
    - **Validates: Requirements 3.1, 3.3**
  - [ ]* 7.3 Write property test for transaction order preservation
    - **Property 6: Transaction Order Preservation**
    - **Validates: Requirements 3.4**
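
The round-trip and ordering properties in 7.2 and 7.3 can be illustrated with the standard-library `csv` module. This is a sketch under assumptions: the `Transaction` fields below are hypothetical stand-ins for the dataclass in `src/models.py`, `read_back()` is a helper invented for the example, and the real `export_all()` may use pandas instead.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields; the real dataclass lives in src/models.py.
    date: str
    description: str
    amount: str  # kept as text so the round-trip is exact


def export_all(transactions, stream) -> None:
    """Write a header row, then one row per transaction in input order."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Transaction)])
    writer.writeheader()
    for tx in transactions:
        writer.writerow(asdict(tx))


def read_back(stream) -> list:
    """Hypothetical inverse of export_all, used only to check the round-trip."""
    return [Transaction(**row) for row in csv.DictReader(stream)]


original = [
    Transaction("2024-01-05", 'Café, "Luna"', "-12.50"),
    Transaction("2024-01-03", "Payroll", "2500.00"),
]
buffer = io.StringIO(newline="")
export_all(original, buffer)
buffer.seek(0)
restored = read_back(buffer)
```

When writing to disk rather than to an in-memory buffer, opening the file with `open(path, "w", encoding="utf-8", newline="")` gives the UTF-8 output the task calls for.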

- [x] 8. Implement Statement Processor Orchestrator
  - [x] 8.1 Create StatementProcessor class in `src/statement_processor.py`
    - Implement `process()` method to run extraction pipeline
    - Implement `process_directory()` for single directory processing
    - Implement `combine_datasets()` for multi-directory support
    - _Requirements: 4.1, 4.2_
  - [x] 8.2 Write property test for multi-directory combination
    - **Property 7: Multi-Directory Combination**
    - **Validates: Requirements 4.1**

- [x] 9. Create CLI Entry Point
  - [x] 9.1 Create `main.py` with command-line interface
    - Accept input directory paths as arguments
    - Accept optional output directory
    - Print processing summary on completion
    - _Requirements: 1.1, 3.1_
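
One way to shape the command-line interface from task 9.1 is with `argparse`. The `--output-dir` flag name, its default, and the summary wording are assumptions; the call into `StatementProcessor` is left as a placeholder because its signature is not given above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract transactions from statement PDFs."
    )
    # One or more input directories, as positional arguments.
    parser.add_argument("input_dirs", nargs="+", type=Path,
                        help="directories containing statement PDFs")
    # Assumption: the optional output directory defaults to the current directory.
    parser.add_argument("--output-dir", type=Path, default=Path("."),
                        help="directory for the exported CSV")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder: the real entry point would run the StatementProcessor here.
    print(f"Processed {len(args.input_dirs)} input directories; "
          f"output directory: {args.output_dir}")
    return 0
```

Accepting `argv` as a parameter keeps `main()` callable from tests without touching `sys.argv`.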

- [ ] 10. Final Checkpoint - Ensure all tests pass
  - Run the full test suite and confirm it passes; ask the user if questions arise.
