Metadata-Version: 2.4
Name: diff-diff
Version: 2.0.1
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
Requires-Dist: black>=23.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0 ; extra == 'dev'
Requires-Dist: sphinx>=6.0 ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.0 ; extra == 'docs'
Provides-Extra: dev
Provides-Extra: docs
Summary: A library for Difference-in-Differences causal inference analysis
Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects
Author: diff-diff contributors
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://diff-diff.readthedocs.io
Project-URL: Homepage, https://github.com/igerber/diff-diff
Project-URL: Issues, https://github.com/igerber/diff-diff/issues
Project-URL: Repository, https://github.com/igerber/diff-diff

# diff-diff

A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.

## Installation

```bash
pip install diff-diff
```

Or install from source:

```bash
git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e .
```

## Quick Start

```python
import pandas as pd
from diff_diff import DifferenceInDifferences

# Create sample data
data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1]
})

# Fit the model
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')

# View results
print(results)  # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
results.print_summary()
```

Output:
```
======================================================================
             Difference-in-Differences Estimation Results
======================================================================

Observations:                      8
Treated units:                     4
Control units:                     4
R-squared:                    0.9055

----------------------------------------------------------------------
Parameter           Estimate    Std. Err.     t-stat      P>|t|
----------------------------------------------------------------------
ATT                   3.0000       1.7321      1.732     0.1583
----------------------------------------------------------------------

95% Confidence Interval: [-1.8089, 7.8089]

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
======================================================================
```

## Features

- **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
- **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
- **Multiple interfaces**: Column names or R-style formulas
- **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Event study plots**: Publication-ready visualization of treatment effects
- **Parallel trends testing**: Multiple methods including equivalence tests
- **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons
- **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
- **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
- **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
- **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
- **Data prep utilities**: Helper functions for common data preparation tasks
- **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))

## Tutorials

We provide Jupyter notebook tutorials in `docs/tutorials/`:

| Notebook | Description |
|----------|-------------|
| `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap |
| `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition |
| `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
| `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
| `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
| `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
| `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
| `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |

## Data Preparation

diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.

### Generate Sample Data

Create synthetic data with a known treatment effect for testing and learning:

```python
from diff_diff import generate_did_data, DifferenceInDifferences

# Generate panel data with 100 units, 4 periods, and a treatment effect of 5
data = generate_did_data(
    n_units=100,
    n_periods=4,
    treatment_effect=5.0,
    treatment_fraction=0.5,  # 50% of units are treated
    treatment_period=2,       # Treatment starts at period 2
    seed=42
)

# Verify the estimator recovers the treatment effect
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")
```

### Create Treatment Indicators

Convert categorical variables or numeric thresholds to binary treatment indicators:

```python
from diff_diff import make_treatment_indicator

# From categorical variable
df = make_treatment_indicator(
    data,
    column='state',
    treated_values=['CA', 'NY', 'TX']  # These states are treated
)

# From numeric threshold (e.g., firms above median size)
df = make_treatment_indicator(
    data,
    column='firm_size',
    threshold=data['firm_size'].median()
)

# Treat units below threshold
df = make_treatment_indicator(
    data,
    column='income',
    threshold=50000,
    above_threshold=False  # Units with income <= 50000 are treated
)
```

### Create Post-Treatment Indicators

Convert time/date columns to binary post-treatment indicators:

```python
from diff_diff import make_post_indicator

# From specific post-treatment periods
df = make_post_indicator(
    data,
    time_column='year',
    post_periods=[2020, 2021, 2022]
)

# From treatment start date
df = make_post_indicator(
    data,
    time_column='year',
    treatment_start=2020  # All years >= 2020 are post-treatment
)

# Works with datetime columns
df = make_post_indicator(
    data,
    time_column='date',
    treatment_start='2020-01-01'
)
```

### Reshape Wide to Long Format

Convert wide-format data (one row per unit, multiple time columns) to long format:

```python
from diff_diff import wide_to_long

# Wide format: columns like sales_2019, sales_2020, sales_2021
wide_df = pd.DataFrame({
    'firm_id': [1, 2, 3],
    'industry': ['tech', 'retail', 'tech'],
    'sales_2019': [100, 150, 200],
    'sales_2020': [110, 160, 210],
    'sales_2021': [120, 170, 220]
})

# Convert to long format for DiD
long_df = wide_to_long(
    wide_df,
    value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
    id_column='firm_id',
    time_name='year',
    value_name='sales',
    time_values=[2019, 2020, 2021]
)
# Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
```

### Balance Panel Data

Ensure all units have observations for all time periods:

```python
from diff_diff import balance_panel

# Keep only units with complete data (drop incomplete units)
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='inner'
)

# Include all unit-period combinations (creates NaN for missing)
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='outer'
)

# Fill missing values
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='fill',
    fill_value=0  # Or None for forward/backward fill
)
```

### Validate Data

Check that your data meets DiD requirements before fitting:

```python
from diff_diff import validate_did_data

# Validate and get informative error messages
result = validate_did_data(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    unit='firm_id',      # Optional: for panel-specific validation
    raise_on_error=False  # Return dict instead of raising
)

if result['valid']:
    print("Data is ready for DiD analysis!")
    print(f"Summary: {result['summary']}")
else:
    print("Issues found:")
    for error in result['errors']:
        print(f"  - {error}")

for warning in result['warnings']:
    print(f"Warning: {warning}")
```

### Summarize Data by Groups

Get summary statistics for each treatment-time cell:

```python
from diff_diff import summarize_did_data

summary = summarize_did_data(
    data,
    outcome='sales',
    treatment='treated',
    time='post'
)
print(summary)
```

Output:
```
                        n      mean       std       min       max
Control - Pre        250  100.5000   15.2340   65.0000  145.0000
Control - Post       250  105.2000   16.1230   68.0000  152.0000
Treated - Pre        250  101.2000   14.8900   67.0000  143.0000
Treated - Post       250  115.8000   17.5600   72.0000  165.0000
DiD Estimate           -    9.9000         -         -         -
```

### Create Event Time for Staggered Designs

For designs where treatment occurs at different times:

```python
from diff_diff import create_event_time

# Add event-time column relative to treatment timing
df = create_event_time(
    data,
    time_column='year',
    treatment_time_column='treatment_year'
)
# Result: event_time = -2, -1, 0, 1, 2 relative to treatment
```

### Aggregate to Cohort Means

Aggregate unit-level data for visualization:

```python
from diff_diff import aggregate_to_cohorts

cohort_data = aggregate_to_cohorts(
    data,
    unit_column='firm_id',
    time_column='year',
    treatment_column='treated',
    outcome='sales'
)
# Result: mean outcome by treatment group and period
```

### Rank Control Units

Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:

```python
from diff_diff import rank_control_units, generate_did_data

# Generate sample data
data = generate_did_data(n_units=50, n_periods=6, seed=42)

# Rank control units by their similarity to treated units
ranking = rank_control_units(
    data,
    unit_column='unit',
    time_column='period',
    outcome_column='outcome',
    treatment_column='treated',
    n_top=10  # Return top 10 controls
)

print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])
```

Output:
```
   unit  quality_score  pre_trend_rmse
0    35         1.0000          0.4521
1    42         0.9234          0.5123
2    28         0.8876          0.5892
...
```

With covariates for matching:

```python
# Add covariate-based matching
ranking = rank_control_units(
    data,
    unit_column='unit',
    time_column='period',
    outcome_column='outcome',
    treatment_column='treated',
    covariates=['size', 'age'],  # Match on these too
    outcome_weight=0.7,          # 70% weight on outcome trends
    covariate_weight=0.3         # 30% weight on covariate similarity
)
```

Filter data for SyntheticDiD using top controls:

```python
from diff_diff import SyntheticDiD

# Get top control units
top_controls = ranking['unit'].tolist()

# Filter data to treated + top controls
filtered_data = data[
    (data['treated'] == 1) | (data['unit'].isin(top_controls))
]

# Fit SyntheticDiD with selected controls
sdid = SyntheticDiD()
results = sdid.fit(
    filtered_data,
    outcome='outcome',
    treatment='treated',
    unit='unit',
    time='period',
    post_periods=[3, 4, 5]
)
```

## Usage

### Basic DiD with Column Names

```python
from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences(robust=True, alpha=0.05)
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post_policy'
)

# Access results
print(f"ATT: {results.att:.4f}")
print(f"Standard Error: {results.se:.4f}")
print(f"P-value: {results.p_value:.4f}")
print(f"95% CI: {results.conf_int}")
print(f"Significant: {results.is_significant}")
```

### Using Formula Interface

```python
# R-style formula syntax
results = did.fit(data, formula='outcome ~ treated * post')

# Explicit interaction syntax
results = did.fit(data, formula='outcome ~ treated + post + treated:post')

# With covariates
results = did.fit(data, formula='outcome ~ treated * post + age + income')
```

### Including Covariates

```python
results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post',
    covariates=['age', 'income', 'education']
)
```

### Fixed Effects

Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables):

```python
# State and industry fixed effects
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    fixed_effects=['state', 'industry']
)

# Access fixed effect coefficients
state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
```

Use `absorb` for high-dimensional fixed effects (more efficient, uses within-transformation):

```python
# Absorb firm-level fixed effects (efficient for many firms)
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    absorb=['firm_id']
)
```

Combine covariates with fixed effects:

```python
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    covariates=['size', 'age'],           # Linear controls
    fixed_effects=['industry'],            # Low-dimensional FE (dummies)
    absorb=['firm_id']                     # High-dimensional FE (absorbed)
)
```

### Cluster-Robust Standard Errors

```python
did = DifferenceInDifferences(cluster='state')
results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post'
)
```

### Wild Cluster Bootstrap

When you have few clusters (<50), standard cluster-robust SEs are biased. Wild cluster bootstrap provides valid inference even with 5-10 clusters.

```python
# Use wild bootstrap for inference
did = DifferenceInDifferences(
    cluster='state',
    inference='wild_bootstrap',
    n_bootstrap=999,
    bootstrap_weights='rademacher',  # or 'webb' for <10 clusters, 'mammen'
    seed=42
)
results = did.fit(data, outcome='y', treatment='treated', time='post')

# Results include bootstrap-based SE and p-value
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
print(f"P-value: {results.p_value:.4f}")
print(f"95% CI: {results.conf_int}")
print(f"Inference method: {results.inference_method}")
print(f"Number of clusters: {results.n_clusters}")
```

**Weight types:**
- `'rademacher'` - Default, ±1 with p=0.5, good for most cases
- `'webb'` - 6-point distribution, recommended for <10 clusters
- `'mammen'` - Two-point distribution, alternative to Rademacher

Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators.

### Two-Way Fixed Effects (Panel Data)

```python
from diff_diff import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(
    panel_data,
    outcome='outcome',
    treatment='treated',
    time='year',
    unit='firm_id'
)
```

### Multi-Period DiD (Event Study)

For settings with multiple pre- and post-treatment periods:

```python
from diff_diff import MultiPeriodDiD

# Fit with multiple time periods
did = MultiPeriodDiD()
results = did.fit(
    panel_data,
    outcome='sales',
    treatment='treated',
    time='period',
    post_periods=[3, 4, 5],      # Periods 3-5 are post-treatment
    reference_period=0           # Reference period for comparison
)

# View period-specific treatment effects
for period, effect in results.period_effects.items():
    print(f"Period {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")

# View average treatment effect across post-periods
print(f"Average ATT: {results.avg_att:.3f}")
print(f"Average SE: {results.avg_se:.3f}")

# Full summary with all period effects
results.print_summary()
```

Output:
```
================================================================================
            Multi-Period Difference-in-Differences Estimation Results
================================================================================

Observations:                      600
Pre-treatment periods:             3
Post-treatment periods:            3

--------------------------------------------------------------------------------
Average Treatment Effect
--------------------------------------------------------------------------------
Average ATT       5.2000       0.8234      6.315      0.0000
--------------------------------------------------------------------------------
95% Confidence Interval: [3.5862, 6.8138]

Period-Specific Effects:
--------------------------------------------------------------------------------
Period            Effect     Std. Err.     t-stat      P>|t|
--------------------------------------------------------------------------------
3                 4.5000       0.9512      4.731      0.0000***
4                 5.2000       0.8876      5.858      0.0000***
5                 5.9000       0.9123      6.468      0.0000***
--------------------------------------------------------------------------------

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
================================================================================
```

### Staggered Difference-in-Differences (Callaway-Sant'Anna)

When treatment is adopted at different times by different units, traditional TWFE estimators can be biased. The Callaway-Sant'Anna estimator provides unbiased estimates with staggered adoption.

```python
from diff_diff import CallawaySantAnna

# Panel data with staggered treatment
# 'first_treat' = period when unit was first treated (0 if never treated)
cs = CallawaySantAnna()
results = cs.fit(
    panel_data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat',  # 0 for never-treated, else first treatment year
    aggregate='event_study'      # Compute event study effects
)

# View results
results.print_summary()

# Access group-time effects ATT(g,t)
for (group, time), effect in results.group_time_effects.items():
    print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")

# Event study effects (averaged by relative time)
for rel_time, effect in results.event_study_effects.items():
    print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")

# Convert to DataFrame
df = results.to_dataframe(level='event_study')
```

Output:
```
=====================================================================================
          Callaway-Sant'Anna Staggered Difference-in-Differences Results
=====================================================================================

Total observations:                     600
Treated units:                           35
Control units:                           15
Treatment cohorts:                        3
Time periods:                             8
Control group:                never_treated

-------------------------------------------------------------------------------------
                  Overall Average Treatment Effect on the Treated
-------------------------------------------------------------------------------------
Parameter         Estimate     Std. Err.     t-stat      P>|t|   Sig.
-------------------------------------------------------------------------------------
ATT                 2.5000       0.3521       7.101     0.0000   ***
-------------------------------------------------------------------------------------

95% Confidence Interval: [1.8099, 3.1901]

-------------------------------------------------------------------------------------
                          Event Study (Dynamic) Effects
-------------------------------------------------------------------------------------
Rel. Period       Estimate     Std. Err.     t-stat      P>|t|   Sig.
-------------------------------------------------------------------------------------
0                   2.1000       0.4521       4.645     0.0000   ***
1                   2.5000       0.4123       6.064     0.0000   ***
2                   2.8000       0.5234       5.349     0.0000   ***
-------------------------------------------------------------------------------------

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
=====================================================================================
```

**When to use Callaway-Sant'Anna vs TWFE:**

| Scenario | Use TWFE | Use Callaway-Sant'Anna |
|----------|----------|------------------------|
| All units treated at same time | ✓ | ✓ |
| Staggered adoption, homogeneous effects | ✓ | ✓ |
| Staggered adoption, heterogeneous effects | ✗ | ✓ |
| Need event study with staggered timing | ✗ | ✓ |
| Fewer than ~20 treated units | ✓ | Depends on design |

**Parameters:**

```python
CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    anticipation=0,                  # Periods before treatment with effects
    estimation_method='dr',          # 'dr', 'ipw', or 'reg'
    alpha=0.05,                      # Significance level
    cluster=None,                    # Column for cluster SEs
    n_bootstrap=0,                   # Bootstrap iterations (0 = analytical SEs)
    bootstrap_weight_type='rademacher',  # 'rademacher', 'mammen', or 'webb'
    seed=None                        # Random seed
)
```

**Multiplier bootstrap for inference:**

With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021).

```python
# Bootstrap inference with 999 iterations
cs = CallawaySantAnna(
    n_bootstrap=999,
    bootstrap_weight_type='rademacher',  # or 'mammen', 'webb'
    seed=42
)
results = cs.fit(
    data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat',
    aggregate='event_study'
)

# Access bootstrap results
print(f"Overall ATT: {results.overall_att:.3f}")
print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")

# Event study bootstrap inference
for rel_time, se in results.bootstrap_results.event_study_ses.items():
    ci = results.bootstrap_results.event_study_cis[rel_time]
    print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
```

**Bootstrap weight types:**
- `'rademacher'` - Default, ±1 with p=0.5, good for most cases
- `'mammen'` - Two-point distribution matching first 3 moments
- `'webb'` - Six-point distribution, recommended for very few clusters (<10)

**Covariate adjustment for conditional parallel trends:**

When parallel trends only holds conditional on covariates, use the `covariates` parameter:

```python
# Doubly robust estimation with covariates
cs = CallawaySantAnna(estimation_method='dr')  # 'dr', 'ipw', or 'reg'
results = cs.fit(
    data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat',
    covariates=['size', 'age', 'industry'],  # Covariates for conditional PT
    aggregate='event_study'
)
```

### Sun-Abraham Interaction-Weighted Estimator

The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators serves as a useful robustness check—when they agree, results are more credible.

```python
from diff_diff import SunAbraham

# Basic usage
sa = SunAbraham()
results = sa.fit(
    panel_data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat'  # 0 for never-treated, else first treatment year
)

# View results
results.print_summary()

# Event study effects (by relative time to treatment)
for rel_time, effect in results.event_study_effects.items():
    print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")

# Overall ATT
print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})")

# Cohort weights (how each cohort contributes to each event-time estimate)
for rel_time, weights in results.cohort_weights.items():
    print(f"e={rel_time}: {weights}")
```

**Parameters:**

```python
SunAbraham(
    control_group='never_treated',  # or 'not_yet_treated'
    anticipation=0,                  # Periods before treatment with effects
    alpha=0.05,                      # Significance level
    cluster=None,                    # Column for cluster SEs
    n_bootstrap=0,                   # Bootstrap iterations (0 = analytical SEs)
    bootstrap_weights='rademacher',  # 'rademacher', 'mammen', or 'webb'
    seed=None                        # Random seed
)
```

**Bootstrap inference:**

```python
# Bootstrap inference with 999 iterations
sa = SunAbraham(
    n_bootstrap=999,
    bootstrap_weights='rademacher',
    seed=42
)
results = sa.fit(
    data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat'
)

# Access bootstrap results
print(f"Overall ATT: {results.overall_att:.3f}")
print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
```

**When to use Sun-Abraham vs Callaway-Sant'Anna:**

| Aspect | Sun-Abraham | Callaway-Sant'Anna |
|--------|-------------|-------------------|
| Approach | Interaction-weighted regression | 2x2 DiD aggregation |
| Efficiency | More efficient under homogeneous effects | More robust to heterogeneity |
| Weighting | Weights by cohort share at each relative time | Weights by sample size |
| Use case | Robustness check, regression-based inference | Primary staggered DiD estimator |

**Both estimators should give similar results when:**
- Treatment effects are relatively homogeneous across cohorts
- Parallel trends holds

**Running both as robustness check:**

```python
from diff_diff import CallawaySantAnna, SunAbraham

# Callaway-Sant'Anna
cs = CallawaySantAnna()
cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')

# Sun-Abraham
sa = SunAbraham()
sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')

# Compare
print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")

# If results differ substantially, investigate heterogeneity
```

### Triple Difference (DDD)

Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).

```python
from diff_diff import TripleDifference, triple_difference

# Basic usage
ddd = TripleDifference(estimation_method='dr')  # doubly robust (recommended)
results = ddd.fit(
    data,
    outcome='wages',
    group='policy_state',       # 1=state enacted policy, 0=control state
    partition='female',         # 1=women (affected by policy), 0=men
    time='post'                 # 1=post-policy, 0=pre-policy
)

# View results
results.print_summary()
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")

# With covariates (properly incorporated, unlike naive DDD)
results = ddd.fit(
    data,
    outcome='wages',
    group='policy_state',
    partition='female',
    time='post',
    covariates=['age', 'education', 'experience']
)
```

**Estimation methods:**

| Method | Description | When to use |
|--------|-------------|-------------|
| `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct |
| `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
| `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |

```python
# Compare estimation methods
for method in ['reg', 'ipw', 'dr']:
    est = TripleDifference(estimation_method=method)
    res = est.fit(data, outcome='y', group='g', partition='p', time='t')
    print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
```

**Convenience function:**

```python
# One-liner estimation
results = triple_difference(
    data,
    outcome='wages',
    group='policy_state',
    partition='female',
    time='post',
    covariates=['age', 'education'],
    estimation_method='dr'
)
```

**Why use DDD instead of DiD?**

DDD allows for violations of parallel trends that are:
- Group-specific (e.g., economic shocks in treatment states)
- Partition-specific (e.g., trends affecting women everywhere)

As long as these biases are additive, DDD differences them out. The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups.

### Event Study Visualization

Create publication-ready event study plots:

```python
from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham

# From MultiPeriodDiD
did = MultiPeriodDiD()
results = did.fit(data, outcome='y', treatment='treated',
                  time='period', post_periods=[3, 4, 5])
plot_event_study(results, title="Treatment Effects Over Time")

# From CallawaySantAnna (with event study aggregation)
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit', time='period',
                 first_treat='first_treat', aggregate='event_study')
plot_event_study(results, title="Staggered DiD Event Study (CS)")

# From SunAbraham
sa = SunAbraham()
results = sa.fit(data, outcome='y', unit='unit', time='period',
                 first_treat='first_treat')
plot_event_study(results, title="Staggered DiD Event Study (SA)")

# From a DataFrame
df = pd.DataFrame({
    'period': [-2, -1, 0, 1, 2],
    'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
    'se': [0.3, 0.25, 0.0, 0.4, 0.45]
})
plot_event_study(df, reference_period=0)

# With customization
ax = plot_event_study(
    results,
    title="Dynamic Treatment Effects",
    xlabel="Years Relative to Treatment",
    ylabel="Effect on Sales ($1000s)",
    color="#2563eb",
    marker="o",
    shade_pre=True,           # Shade pre-treatment region
    show_zero_line=True,      # Horizontal line at y=0
    show_reference_line=True, # Vertical line at reference period
    figsize=(10, 6),
    show=False                # Don't call plt.show(), return axes
)
```

### Synthetic Difference-in-Differences

Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.

```python
from diff_diff import SyntheticDiD

# Fit Synthetic DiD model
sdid = SyntheticDiD()
results = sdid.fit(
    panel_data,
    outcome='gdp_growth',
    treatment='treated',
    unit='state',
    time='year',
    post_periods=[2015, 2016, 2017, 2018]
)

# View results
results.print_summary()
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")

# Examine unit weights (which control units matter most)
weights_df = results.get_unit_weights_df()
print(weights_df.head(10))

# Examine time weights
time_weights_df = results.get_time_weights_df()
print(time_weights_df)
```

Output:
```
===========================================================================
         Synthetic Difference-in-Differences Estimation Results
===========================================================================

Observations:                      500
Treated units:                       1
Control units:                      49
Pre-treatment periods:               6
Post-treatment periods:              4
Regularization (lambda):        0.0000
Pre-treatment fit (RMSE):       0.1234

---------------------------------------------------------------------------
Parameter         Estimate     Std. Err.     t-stat      P>|t|
---------------------------------------------------------------------------
ATT                 2.5000       0.4521      5.530      0.0000
---------------------------------------------------------------------------

95% Confidence Interval: [1.6139, 3.3861]

---------------------------------------------------------------------------
                   Top Unit Weights (Synthetic Control)
---------------------------------------------------------------------------
  Unit state_12: 0.3521
  Unit state_5: 0.2156
  Unit state_23: 0.1834
  Unit state_8: 0.1245
  Unit state_31: 0.0892
  (8 units with weight > 0.001)

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
===========================================================================
```

#### When to Use Synthetic DiD Over Vanilla DiD

Use Synthetic DiD instead of standard DiD when:

1. **Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.

   ```python
   # Example: California passed a policy, want to estimate its effect
   # Standard DiD would compare CA to the average of all other states
   # Synthetic DiD finds states that together best match CA's pre-treatment trend
   ```

2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.

   ```python
   # Example: A tech hub city vs rural areas
   # Rural areas may not be a good comparison on average
   # Synthetic DiD can weight urban/suburban controls more heavily
   ```

3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.

   ```python
   # Example: Comparing a treated developing country to other countries
   # Some control countries may be much more similar economically
   # Synthetic DiD upweights the most comparable controls
   ```

4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.

   ```python
   # See exactly which units are driving the counterfactual
   print(results.get_unit_weights_df())
   ```

**Key differences from standard DiD:**

| Aspect | Standard DiD | Synthetic DiD |
|--------|--------------|---------------|
| Control weighting | Equal (1/N) | Optimized to match pre-treatment |
| Time weighting | Equal across periods | Can emphasize informative periods |
| N treated required | Can be many | Works with 1 treated unit |
| Parallel trends | Assumed | Partially relaxed via matching |
| Interpretability | Simple average | Explicit weights |

**Parameters:**

```python
SyntheticDiD(
    lambda_reg=0.0,     # Regularization toward uniform weights (0 = no reg)
    zeta=1.0,           # Time weight regularization (higher = more uniform)
    alpha=0.05,         # Significance level
    n_bootstrap=200,    # Bootstrap iterations for SE (0 = placebo-based)
    seed=None           # Random seed for reproducibility
)
```

## Working with Results

### Export Results

```python
# As dictionary
results.to_dict()
# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}

# As DataFrame
df = results.to_dataframe()
```

### Check Significance

```python
if results.is_significant:
    print(f"Effect is significant at {did.alpha} level")

# Get significance stars
print(f"ATT: {results.att}{results.significance_stars}")
# ATT: 3.5000*
```

### Access Full Regression Output

```python
# All coefficients
results.coefficients
# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}

# Variance-covariance matrix
results.vcov

# Residuals and fitted values
results.residuals
results.fitted_values

# R-squared
results.r_squared
```

## Checking Assumptions

### Parallel Trends

**Simple slope-based test:**

```python
from diff_diff.utils import check_parallel_trends

trends = check_parallel_trends(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated'
)

print(f"Treated trend: {trends['treated_trend']:.4f}")
print(f"Control trend: {trends['control_trend']:.4f}")
print(f"Difference p-value: {trends['p_value']:.4f}")
```

**Robust distributional test (Wasserstein distance):**

```python
from diff_diff.utils import check_parallel_trends_robust

results = check_parallel_trends_robust(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated',
    unit='firm_id',              # Unit identifier for panel data
    pre_periods=[2018, 2019],    # Pre-treatment periods
    n_permutations=1000          # Permutations for p-value
)

print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
print(f"KS test p-value: {results['ks_p_value']:.4f}")
print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
```

The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means. This is more robust to:
- Non-normal distributions
- Heterogeneous effects across units
- Outliers

**Equivalence testing (TOST):**

```python
from diff_diff.utils import equivalence_test_trends

results = equivalence_test_trends(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated',
    unit='firm_id',
    equivalence_margin=0.5       # Define "practically equivalent"
)

print(f"Mean difference: {results['mean_difference']:.4f}")
print(f"TOST p-value: {results['tost_p_value']:.4f}")
print(f"Trends equivalent: {results['equivalent']}")
```

### Honest DiD Sensitivity Analysis (Rambachan-Roth)

Pre-trends tests have low power and can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides sensitivity analysis showing how robust your results are to violations of parallel trends.

```python
from diff_diff import HonestDiD, MultiPeriodDiD

# First, fit a standard event study
did = MultiPeriodDiD()
event_results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='period',
    post_periods=[5, 6, 7, 8, 9]
)

# Compute honest bounds with relative magnitudes restriction
# M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
honest = HonestDiD(method='relative_magnitude', M=1.0)
honest_results = honest.fit(event_results)

print(honest_results.summary())
print(f"Original estimate: {honest_results.original_estimate:.4f}")
print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
print(f"Effect robust to violations: {honest_results.is_significant}")
```

**Sensitivity analysis over M values:**

```python
# How do results change as we allow larger violations?
sensitivity = honest.sensitivity_analysis(
    event_results,
    M_grid=[0, 0.5, 1.0, 1.5, 2.0]
)

print(sensitivity.summary())
print(f"Breakdown value: M = {sensitivity.breakdown_M}")
# Breakdown = smallest M where the robust CI includes zero
```

**Breakdown value:**

The breakdown value tells you how robust your conclusion is:

```python
breakdown = honest.breakdown_value(event_results)
if breakdown >= 1.0:
    print("Result holds even if post-treatment violations are as bad as pre-treatment")
else:
    print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
```

**Smoothness restriction (alternative approach):**

```python
# Bounds second differences of trend violations
# M=0 means linear extrapolation of pre-trends
honest_smooth = HonestDiD(method='smoothness', M=0.5)
smooth_results = honest_smooth.fit(event_results)
```

**Visualization:**

```python
from diff_diff import plot_sensitivity, plot_honest_event_study

# Plot sensitivity analysis
plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")

# Event study with honest confidence intervals
plot_honest_event_study(event_results, honest_results)
```

### Pre-Trends Power Analysis (Roth 2022)

A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?"

```python
from diff_diff import PreTrendsPower, MultiPeriodDiD

# First, fit an event study
did = MultiPeriodDiD()
event_results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='period',
    post_periods=[5, 6, 7, 8, 9]
)

# Analyze pre-trends test power
pt = PreTrendsPower(alpha=0.05, power=0.80)
power_results = pt.fit(event_results)

print(power_results.summary())
print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}")
print(f"Power to detect violations of size MDV: {power_results.power:.1%}")
```

**Key concepts:**

- **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size.
- **Power**: Probability of detecting a violation of given size if it exists.
- **Violation types**: Linear trend, constant violation, last-period only, or custom patterns.

**Power curve visualization:**

```python
from diff_diff import plot_pretrends_power

# Generate power curve across violation magnitudes
curve = pt.power_curve(event_results)

# Plot the power curve
plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")

# Or from the curve object directly
curve.plot()
```

**Different violation patterns:**

```python
# Linear trend violations (default) - most common assumption
pt_linear = PreTrendsPower(violation_type='linear')

# Constant violation in all pre-periods
pt_constant = PreTrendsPower(violation_type='constant')

# Violation only in the last pre-period (sharp break)
pt_last = PreTrendsPower(violation_type='last_period')

# Custom violation pattern
custom_weights = np.array([0.1, 0.3, 0.6])  # Increasing violations
pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
```

**Combining with HonestDiD:**

Pre-trends power analysis and HonestDiD are complementary:
1. **Pre-trends power** tells you what the test could have detected
2. **HonestDiD** tells you how robust your results are to violations

```python
from diff_diff import HonestDiD, PreTrendsPower

# If MDV is large relative to your estimated effect, be cautious
pt = PreTrendsPower()
power_results = pt.fit(event_results)
sensitivity = pt.sensitivity_to_honest_did(event_results)
print(sensitivity['interpretation'])

# Use HonestDiD for robust inference
honest = HonestDiD(method='relative_magnitude', M=1.0)
honest_results = honest.fit(event_results)
```

### Placebo Tests

Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).

**Fake timing test:**

```python
from diff_diff import run_placebo_test

# Test: Is there an effect before treatment actually occurred?
# Actual treatment is at period 3 (post_periods=[3, 4, 5])
# We test if a "fake" treatment at period 1 shows an effect
results = run_placebo_test(
    data,
    outcome='outcome',
    treatment='treated',
    time='period',
    test_type='fake_timing',
    fake_treatment_period=1,  # Pretend treatment was in period 1
    post_periods=[3, 4, 5]    # Actual post-treatment periods
)

print(results.summary())
# If parallel trends hold, placebo_effect should be ~0 and not significant
print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})")
print(f"Is significant (bad): {results.is_significant}")
```

**Fake group test:**

```python
# Test: Is there an effect among never-treated units?
# Get some control unit IDs to use as "fake treated"
control_units = data[data['treated'] == 0]['firm_id'].unique()[:5]

results = run_placebo_test(
    data,
    outcome='outcome',
    treatment='treated',
    time='period',
    unit='firm_id',
    test_type='fake_group',
    fake_treatment_group=list(control_units),  # List of control unit IDs
    post_periods=[3, 4, 5]
)
```

**Permutation test:**

```python
# Randomly reassign treatment and compute distribution of effects
# Note: requires binary post indicator (use 'post' column, not 'period')
results = run_placebo_test(
    data,
    outcome='outcome',
    treatment='treated',
    time='post',           # Binary post-treatment indicator
    unit='firm_id',
    test_type='permutation',
    n_permutations=1000,
    seed=42
)

print(f"Original effect: {results.original_effect:.3f}")
print(f"Permutation p-value: {results.p_value:.4f}")
# Low p-value indicates the effect is unlikely to be due to chance
```

**Leave-one-out sensitivity:**

```python
# Test sensitivity to individual treated units
# Note: requires binary post indicator (use 'post' column, not 'period')
results = run_placebo_test(
    data,
    outcome='outcome',
    treatment='treated',
    time='post',           # Binary post-treatment indicator
    unit='firm_id',
    test_type='leave_one_out'
)

# Check if any single unit drives the result
print(results.leave_one_out_effects)  # Effect when each unit is dropped
```

**Run all placebo tests:**

```python
from diff_diff import run_all_placebo_tests

# Comprehensive diagnostic suite
# Note: This function runs fake_timing tests on pre-treatment periods.
# The permutation and leave_one_out tests require a binary post indicator,
# so they may return errors if the data uses multi-period time column.
all_results = run_all_placebo_tests(
    data,
    outcome='outcome',
    treatment='treated',
    time='period',
    unit='firm_id',
    pre_periods=[0, 1, 2],
    post_periods=[3, 4, 5],
    n_permutations=500,
    seed=42
)

for test_name, result in all_results.items():
    if hasattr(result, 'p_value'):
        print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}")
    elif isinstance(result, dict) and 'error' in result:
        print(f"{test_name}: Error - {result['error']}")
```

## API Reference

### DifferenceInDifferences

```python
DifferenceInDifferences(
    robust=True,      # Use HC1 robust standard errors
    cluster=None,     # Column for cluster-robust SEs
    alpha=0.05        # Significance level for CIs
)
```

**Methods:**

| Method | Description |
|--------|-------------|
| `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
| `summary()` | Get formatted summary string |
| `print_summary()` | Print summary to stdout |
| `get_params()` | Get estimator parameters (sklearn-compatible) |
| `set_params(**params)` | Set estimator parameters (sklearn-compatible) |

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Input data |
| `outcome` | str | Outcome variable column name |
| `treatment` | str | Treatment indicator column (0/1) |
| `time` | str | Post-treatment indicator column (0/1) |
| `formula` | str | R-style formula (alternative to column names) |
| `covariates` | list | Linear control variables |
| `fixed_effects` | list | Categorical FE columns (creates dummies) |
| `absorb` | list | High-dimensional FE (within-transformation) |

### DiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `att` | Average Treatment effect on the Treated |
| `se` | Standard error of ATT |
| `t_stat` | T-statistic |
| `p_value` | P-value for H0: ATT = 0 |
| `conf_int` | Tuple of (lower, upper) confidence bounds |
| `n_obs` | Number of observations |
| `n_treated` | Number of treated units |
| `n_control` | Number of control units |
| `r_squared` | R-squared of regression |
| `coefficients` | Dictionary of all coefficients |
| `is_significant` | Boolean for significance at alpha |
| `significance_stars` | String of significance stars |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

### MultiPeriodDiD

```python
MultiPeriodDiD(
    robust=True,      # Use HC1 robust standard errors
    cluster=None,     # Column for cluster-robust SEs
    alpha=0.05        # Significance level for CIs
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Input data |
| `outcome` | str | Outcome variable column name |
| `treatment` | str | Treatment indicator column (0/1) |
| `time` | str | Time period column (multiple values) |
| `post_periods` | list | List of post-treatment period values |
| `covariates` | list | Linear control variables |
| `fixed_effects` | list | Categorical FE columns (creates dummies) |
| `absorb` | list | High-dimensional FE (within-transformation) |
| `reference_period` | any | Omitted period for time dummies |

### MultiPeriodDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `period_effects` | Dict mapping periods to PeriodEffect objects |
| `avg_att` | Average ATT across post-treatment periods |
| `avg_se` | Standard error of average ATT |
| `avg_t_stat` | T-statistic for average ATT |
| `avg_p_value` | P-value for average ATT |
| `avg_conf_int` | Confidence interval for average ATT |
| `n_obs` | Number of observations |
| `pre_periods` | List of pre-treatment periods |
| `post_periods` | List of post-treatment periods |

**Methods:**

| Method | Description |
|--------|-------------|
| `get_effect(period)` | Get PeriodEffect for specific period |
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

### PeriodEffect

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `period` | Time period identifier |
| `effect` | Treatment effect estimate |
| `se` | Standard error |
| `t_stat` | T-statistic |
| `p_value` | P-value |
| `conf_int` | Confidence interval |
| `is_significant` | Boolean for significance at 0.05 |
| `significance_stars` | String of significance stars |

### SyntheticDiD

```python
SyntheticDiD(
    lambda_reg=0.0,     # L2 regularization for unit weights
    zeta=1.0,           # Regularization for time weights
    alpha=0.05,         # Significance level for CIs
    n_bootstrap=200,    # Bootstrap iterations for SE
    seed=None           # Random seed for reproducibility
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Panel data |
| `outcome` | str | Outcome variable column name |
| `treatment` | str | Treatment indicator column (0/1) |
| `unit` | str | Unit identifier column |
| `time` | str | Time period column |
| `post_periods` | list | List of post-treatment period values |
| `covariates` | list | Covariates to residualize out |

### SyntheticDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `att` | Average Treatment effect on the Treated |
| `se` | Standard error (bootstrap or placebo-based) |
| `t_stat` | T-statistic |
| `p_value` | P-value |
| `conf_int` | Confidence interval |
| `n_obs` | Number of observations |
| `n_treated` | Number of treated units |
| `n_control` | Number of control units |
| `unit_weights` | Dict mapping control unit IDs to weights |
| `time_weights` | Dict mapping pre-treatment periods to weights |
| `pre_periods` | List of pre-treatment periods |
| `post_periods` | List of post-treatment periods |
| `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period |
| `placebo_effects` | Array of placebo effect estimates |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |
| `get_unit_weights_df()` | Get unit weights as DataFrame |
| `get_time_weights_df()` | Get time weights as DataFrame |

### SunAbraham

```python
SunAbraham(
    control_group='never_treated',  # or 'not_yet_treated'
    anticipation=0,           # Periods of anticipation effects
    alpha=0.05,               # Significance level for CIs
    cluster=None,             # Column for cluster-robust SEs
    n_bootstrap=0,            # Bootstrap iterations (0 = analytical SEs)
    bootstrap_weights='rademacher',  # 'rademacher', 'mammen', or 'webb'
    seed=None                 # Random seed
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Panel data |
| `outcome` | str | Outcome variable column name |
| `unit` | str | Unit identifier column |
| `time` | str | Time period column |
| `first_treat` | str | Column with first treatment period (0 for never-treated) |
| `covariates` | list | Covariate column names |
| `min_pre_periods` | int | Minimum pre-treatment periods to include |
| `min_post_periods` | int | Minimum post-treatment periods to include |

### SunAbrahamResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `event_study_effects` | Dict mapping relative time to effect info |
| `overall_att` | Overall average treatment effect |
| `overall_se` | Standard error of overall ATT |
| `overall_t_stat` | T-statistic for overall ATT |
| `overall_p_value` | P-value for overall ATT |
| `overall_conf_int` | Confidence interval for overall ATT |
| `cohort_weights` | Dict mapping relative time to cohort weights |
| `groups` | List of treatment cohorts |
| `time_periods` | List of all time periods |
| `n_obs` | Total number of observations |
| `n_treated_units` | Number of ever-treated units |
| `n_control_units` | Number of never-treated units |
| `is_significant` | Boolean for significance at alpha |
| `significance_stars` | String of significance stars |
| `bootstrap_results` | SABootstrapResults (if bootstrap enabled) |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |

### TripleDifference

```python
TripleDifference(
    estimation_method='dr',   # 'dr' (doubly robust), 'reg', or 'ipw'
    robust=True,              # Use HC1 robust standard errors
    cluster=None,             # Column for cluster-robust SEs
    alpha=0.05,               # Significance level for CIs
    pscore_trim=0.01          # Propensity score trimming threshold
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Input data |
| `outcome` | str | Outcome variable column name |
| `group` | str | Group indicator column (0/1): 1=treated group |
| `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
| `time` | str | Time indicator column (0/1): 1=post-treatment |
| `covariates` | list | Covariate column names for adjustment |

### TripleDifferenceResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `att` | Average Treatment effect on the Treated |
| `se` | Standard error of ATT |
| `t_stat` | T-statistic |
| `p_value` | P-value for H0: ATT = 0 |
| `conf_int` | Tuple of (lower, upper) confidence bounds |
| `n_obs` | Total number of observations |
| `n_treated_eligible` | Obs in treated group & eligible partition |
| `n_treated_ineligible` | Obs in treated group & ineligible partition |
| `n_control_eligible` | Obs in control group & eligible partition |
| `n_control_ineligible` | Obs in control group & ineligible partition |
| `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
| `group_means` | Dict of cell means for diagnostics |
| `pscore_stats` | Propensity score statistics (IPW/DR only) |
| `is_significant` | Boolean for significance at alpha |
| `significance_stars` | String of significance stars |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

### HonestDiD

```python
HonestDiD(
    method='relative_magnitude',  # 'relative_magnitude' or 'smoothness'
    M=None,               # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
    alpha=0.05,           # Significance level for CIs
    l_vec=None            # Linear combination vector for target parameter
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
| `M` | float | Restriction parameter (overrides constructor value) |

**Methods:**

| Method | Description |
|--------|-------------|
| `fit(results, M)` | Compute bounds for given event study results |
| `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
| `breakdown_value(results, tol)` | Find smallest M where CI includes zero |

### HonestDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `original_estimate` | Point estimate under parallel trends |
| `lb` | Lower bound of identified set |
| `ub` | Upper bound of identified set |
| `ci_lb` | Lower bound of robust confidence interval |
| `ci_ub` | Upper bound of robust confidence interval |
| `ci_width` | Width of robust CI |
| `M` | Restriction parameter used |
| `method` | Restriction method ('relative_magnitude' or 'smoothness') |
| `alpha` | Significance level |
| `is_significant` | True if robust CI excludes zero |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary()` | Get formatted summary string |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

### SensitivityResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `M_grid` | Array of M values analyzed |
| `results` | List of HonestDiDResults for each M |
| `breakdown_M` | Smallest M where CI includes zero (None if always significant) |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary()` | Get formatted summary string |
| `plot(ax)` | Plot sensitivity analysis |
| `to_dataframe()` | Convert to pandas DataFrame |

### PreTrendsPower

```python
PreTrendsPower(
    alpha=0.05,           # Significance level for pre-trends test
    power=0.80,           # Target power for MDV calculation
    violation_type='linear',  # 'linear', 'constant', 'last_period', 'custom'
    violation_weights=None    # Custom weights (required if violation_type='custom')
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `results` | MultiPeriodDiDResults | Results from event study |
| `M` | float | Specific violation magnitude to evaluate |

**Methods:**

| Method | Description |
|--------|-------------|
| `fit(results, M)` | Compute power analysis for given event study |
| `power_at(results, M)` | Compute power for specific violation magnitude |
| `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
| `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |

### PreTrendsPowerResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `power` | Power to detect the specified violation |
| `mdv` | Minimum detectable violation at target power |
| `violation_magnitude` | Violation magnitude (M) tested |
| `violation_type` | Type of violation pattern |
| `alpha` | Significance level |
| `target_power` | Target power level |
| `n_pre_periods` | Number of pre-treatment periods |
| `test_statistic` | Expected test statistic under violation |
| `critical_value` | Critical value for pre-trends test |
| `noncentrality` | Non-centrality parameter |
| `is_informative` | Heuristic check if test is informative |
| `power_adequate` | Whether power meets target |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary()` | Get formatted summary string |
| `print_summary()` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

### PreTrendsPowerCurve

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `M_values` | Array of violation magnitudes |
| `powers` | Array of power values |
| `mdv` | Minimum detectable violation |
| `alpha` | Significance level |
| `target_power` | Target power level |
| `violation_type` | Type of violation pattern |

**Methods:**

| Method | Description |
|--------|-------------|
| `plot(ax, show_mdv, show_target)` | Plot power curve |
| `to_dataframe()` | Convert to DataFrame with M and power columns |

### Data Preparation Functions

#### generate_did_data

```python
generate_did_data(
    n_units=100,          # Number of units
    n_periods=4,          # Number of time periods
    treatment_effect=5.0, # True ATT
    treatment_fraction=0.5,  # Fraction treated
    treatment_period=2,   # First post-treatment period
    unit_fe_sd=2.0,       # Unit fixed effect std dev
    time_trend=0.5,       # Linear time trend
    noise_sd=1.0,         # Idiosyncratic noise std dev
    seed=None             # Random seed
)
```

Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.

#### make_treatment_indicator

```python
make_treatment_indicator(
    data,                 # Input DataFrame
    column,               # Column to create treatment from
    treated_values=None,  # Value(s) indicating treatment
    threshold=None,       # Numeric threshold for treatment
    above_threshold=True, # If True, >= threshold is treated
    new_column='treated'  # Output column name
)
```

#### make_post_indicator

```python
make_post_indicator(
    data,                  # Input DataFrame
    time_column,           # Time/period column
    post_periods=None,     # Specific post-treatment period(s)
    treatment_start=None,  # First post-treatment period
    new_column='post'      # Output column name
)
```

#### wide_to_long

```python
wide_to_long(
    data,                  # Wide-format DataFrame
    value_columns,         # List of time-varying columns
    id_column,             # Unit identifier column
    time_name='period',    # Name for time column
    value_name='value',    # Name for value column
    time_values=None       # Values for time periods
)
```

#### balance_panel

```python
balance_panel(
    data,                  # Panel DataFrame
    unit_column,           # Unit identifier column
    time_column,           # Time period column
    method='inner',        # 'inner', 'outer', or 'fill'
    fill_value=None        # Value for filling (if method='fill')
)
```

#### validate_did_data

```python
validate_did_data(
    data,                  # DataFrame to validate
    outcome,               # Outcome column name
    treatment,             # Treatment column name
    time,                  # Time/post column name
    unit=None,             # Unit column (for panel validation)
    raise_on_error=True    # Raise ValueError or return dict
)
```

Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.

#### summarize_did_data

```python
summarize_did_data(
    data,                  # Input DataFrame
    outcome,               # Outcome column name
    treatment,             # Treatment column name
    time,                  # Time/post column name
    unit=None              # Unit column (optional)
)
```

Returns DataFrame with summary statistics by treatment-time cell.

#### create_event_time

```python
create_event_time(
    data,                  # Panel DataFrame
    time_column,           # Calendar time column
    treatment_time_column, # Column with treatment timing
    new_column='event_time' # Output column name
)
```

#### aggregate_to_cohorts

```python
aggregate_to_cohorts(
    data,                  # Unit-level panel data
    unit_column,           # Unit identifier column
    time_column,           # Time period column
    treatment_column,      # Treatment indicator column
    outcome,               # Outcome variable column
    covariates=None        # Additional columns to aggregate
)
```

#### rank_control_units

```python
rank_control_units(
    data,                          # Panel data in long format
    unit_column,                   # Unit identifier column
    time_column,                   # Time period column
    outcome_column,                # Outcome variable column
    treatment_column=None,         # Treatment indicator column (0/1)
    treated_units=None,            # Explicit list of treated unit IDs
    pre_periods=None,              # Pre-treatment periods (default: first half)
    covariates=None,               # Covariate columns for matching
    outcome_weight=0.7,            # Weight for outcome trend similarity (0-1)
    covariate_weight=0.3,          # Weight for covariate distance (0-1)
    exclude_units=None,            # Units to exclude from control pool
    require_units=None,            # Units that must appear in output
    n_top=None,                    # Return only top N controls
    suggest_treatment_candidates=False,  # Identify treatment candidates
    n_treatment_candidates=5,      # Number of treatment candidates
    lambda_reg=0.0                 # Regularization for synthetic weights
)
```

Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.

## Requirements

- Python >= 3.9
- numpy >= 1.20
- pandas >= 1.3
- scipy >= 1.7

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black diff_diff tests
ruff check diff_diff tests
```

## References

This library implements methods from the following scholarly works:

### Difference-in-Differences

- **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)

- **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)

- **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.

### Two-Way Fixed Effects

- **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.

- **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)

### Robust Standard Errors

- **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)

- **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)

- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)

### Wild Cluster Bootstrap

- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)

- **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)

- **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)

### Placebo Tests and DiD Diagnostics

- **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)

### Synthetic Control Method

- **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)

- **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)

- **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)

### Synthetic Difference-in-Differences

- **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)

### Triple Difference (DDD)

- **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)

  This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.

- **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)

  Classic paper introducing the Triple Difference design for policy evaluation.

- **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)

### Parallel Trends and Pre-Trend Testing

- **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)

- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)

### Honest DiD / Sensitivity Analysis

The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:

- **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)

  This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
  - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
  - **Smoothness (ΔSD)**: Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends
  - **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions
  - **Robust Confidence Intervals**: Valid inference under partial identification

- **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)

  Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.

### Multi-Period and Staggered Adoption

- **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)

- **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)

- **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)

- **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)

- **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)

### Power Analysis

- **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)

- **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. [https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)

  Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.

- **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)

### General Causal Inference

- **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.

- **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)

## License

MIT License

