Metadata-Version: 2.4
Name: plannotate-python
Version: 1.2.7
Summary: Automated annotation of engineered plasmids using sequence similarity searches
Project-URL: Homepage, https://github.com/McClain-Thiel/pLannotate
Project-URL: Repository, https://github.com/McClain-Thiel/pLannotate
Project-URL: Bug Tracker, https://github.com/McClain-Thiel/pLannotate/issues
Project-URL: Documentation, https://github.com/McClain-Thiel/pLannotate#readme
Project-URL: Original Project, https://github.com/mmcguffi/pLannotate
Author-email: McClain Thiel <mcclain.thiel@gmail.com>
License: GPL-3.0-or-later
License-File: LICENSE
Keywords: annotation,bioinformatics,molecular-biology,plasmid,sequence-analysis
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Requires-Dist: biopython>=1.70
Requires-Dist: pandas>=1.0.0
Requires-Dist: platformdirs>=3.0
Requires-Dist: pyyaml>=5.0.0
Requires-Dist: reportlab>=3.5.0
Requires-Dist: tqdm>=4.0.0
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-mock; extra == 'dev'
Requires-Dist: pytest>=6.0; extra == 'dev'
Description-Content-Type: text/markdown

# pLannotate-python

[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
![Python 3](https://img.shields.io/badge/Language-Python_3-steelblue.svg)
[![DOI](https://zenodo.org/badge/DOI/10.1093/nar/gkab374.svg)](https://doi.org/10.1093/nar/gkab374)

<img width="400" alt="pLannotate_logo" src="plannotate/data/images/pLannotate.png">

**Automated annotation of engineered plasmids**

`pLannotate-python` is a Python package for automatically annotating engineered plasmids using sequence similarity searches against curated databases. It runs its searches in parallel and can download and configure the required databases automatically.

## Features

- **Fast, parallel annotation**: Uses Diamond, BLAST, and Infernal concurrently
- **Multiple databases**: Protein (fpbase, swissprot), nucleotide (snapgene), RNA (Rfam)
- **Circular plasmid support**: Handles origin-crossing features
- **Automatic database setup**: Downloads and configures databases (~900MB)
- **Flexible output**: GenBank files, CSV reports, or pandas DataFrames
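
To make "origin-crossing" concrete: on a circular plasmid a feature can start near the end of the sequence and finish near the beginning. The sketch below is a generic illustration of that coordinate arithmetic, not pLannotate's internal implementation. The function name `split_at_origin` and the 0-based, half-open coordinate convention are assumptions made for this example.

```python
def split_at_origin(start, end, plasmid_length):
    """Split a feature on a circular sequence into linear intervals.

    Coordinates are 0-based and half-open: [start, end).
    A feature with start >= end is treated as wrapping past the origin.
    (Hypothetical helper for illustration only.)
    """
    if not (0 <= start < plasmid_length and 0 < end <= plasmid_length):
        raise ValueError("coordinates must lie within the plasmid")
    if start < end:
        # Ordinary feature: one contiguous interval.
        return [(start, end)]
    # Wrapping feature: one piece up to the end of the sequence,
    # a second piece from the origin onward.
    return [(start, plasmid_length), (0, end)]


# A 250 bp feature on a 5000 bp plasmid that crosses the origin
parts = split_at_origin(4900, 150, 5000)
total_length = sum(stop - begin for begin, stop in parts)
print(parts, total_length)
```

A linear sequence has no such wrap-around, which is why the `linear` argument matters when annotating.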

## Installation

```bash
# Install with uv (recommended)
uv add plannotate-python

# Or with pip
pip install plannotate-python
```

### External Tools Required

```bash
# macOS (Homebrew)
brew install diamond blast infernal ripgrep

# Linux (conda/mamba); ripgrep is distributed through conda-forge
conda install -c conda-forge -c bioconda diamond blast infernal ripgrep

# Ubuntu/Debian
sudo apt install diamond-aligner ncbi-blast+ infernal ripgrep
```
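
Before the first annotation run, it can help to confirm that the tools are actually visible on `PATH`. The following is a minimal sketch using only the standard library. The executable names (`diamond`, `blastn`, `cmscan`, `rg`) are assumptions about which binaries each package provides, so adjust them if your installation differs.

```python
import shutil

# Assumed executable names: Diamond -> diamond, BLAST+ -> blastn,
# Infernal -> cmscan, ripgrep -> rg.
REQUIRED_TOOLS = ("diamond", "blastn", "cmscan", "rg")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

An empty list means every listed executable was found. Any names that are returned point to the package that still needs to be installed or added to `PATH`.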

### SSL Certificate Fix (macOS)

If you encounter SSL certificate errors during database download:

```bash
# Replace X.Y with your Python version (e.g., 3.11)
open "/Applications/Python X.Y/Install Certificates.command"
```

## Quick Start

**Automatic Database Setup:**
```python
import os
os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"  # Enable auto-download
from plannotate.annotate import annotate

# First run will download databases (~900MB with progress bars)
sequence = "ATGGTGAGCAAGGGCGAGGAGCTG"  # Your plasmid sequence
result = annotate(sequence, linear=False)  # False for circular plasmids

print(f"Found {len(result)} annotations")
print(result[['Feature', 'Type', 'qstart', 'qend', 'pident']].head())
```
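
Because the result is a pandas DataFrame, ordinary pandas operations apply to it. The sketch below keeps only high-identity hits and orders them by start position. The helper `filter_hits` is hypothetical, the rows in `demo` are made-up stand-ins for real output, and the code assumes `pident` is a percentage on a 0 to 100 scale.

```python
import pandas as pd


def filter_hits(hits: pd.DataFrame, min_pident: float = 95.0) -> pd.DataFrame:
    """Keep hits at or above `min_pident`, ordered by start position."""
    kept = hits[hits["pident"] >= min_pident]
    return kept.sort_values("qstart").reset_index(drop=True)


# Made-up rows standing in for the DataFrame returned by annotate();
# only the column names are taken from the example above.
demo = pd.DataFrame(
    {
        "Feature": ["AmpR", "ori", "lacZ-alpha"],
        "Type": ["CDS", "rep_origin", "CDS"],
        "qstart": [1200, 300, 50],
        "qend": [2060, 888, 290],
        "pident": [100.0, 99.5, 82.0],
    }
)

strong = filter_hits(demo)
print(strong[["Feature", "qstart", "qend", "pident"]])
```

With a real annotation, pass `result` in place of `demo`.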

**Manual Database Setup:**
```python
from plannotate.resources import download_db
download_db()  # Downloads with progress bars and SSL error handling
```

**Generate GenBank Files:**
```python
from plannotate.resources import get_gbk

# `result` and `sequence` come from the annotation example above
gbk_content = get_gbk(result, sequence, is_linear=False)
with open("my_plasmid.gbk", "w") as f:
    f.write(gbk_content)
```

## Configuration

**Environment Variables:**
- `PLANNOTATE_AUTO_DOWNLOAD=1` - Auto-download databases without prompting
- `PLANNOTATE_DB_DIR=/path` - Custom database directory
- `PLANNOTATE_SKIP_DB_DOWNLOAD=1` - Skip database downloads entirely

**Core Functions:**
- `annotate(sequence, linear=False)` - Annotate DNA sequence
- `get_gbk(annotations, sequence, is_linear=False)` - Generate GenBank file contents as a string
- `download_db()` - Download databases with progress bars
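
The environment variables can also be set from Python, as the Quick Start does for `PLANNOTATE_AUTO_DOWNLOAD`. The sketch below wraps that in a small helper. The function name `configure_plannotate` and the directory path are hypothetical, and the code assumes the variables should be set before `plannotate` is imported, mirroring the Quick Start ordering.

```python
import os


def configure_plannotate(db_dir, auto_download=True):
    """Set pLannotate's environment variables for the current process.

    Hypothetical helper: call it before importing `plannotate`.
    """
    os.environ["PLANNOTATE_DB_DIR"] = str(db_dir)
    if auto_download:
        # Download databases without prompting.
        os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"
        os.environ.pop("PLANNOTATE_SKIP_DB_DOWNLOAD", None)
    else:
        # Never download; rely on databases that are already present.
        os.environ["PLANNOTATE_SKIP_DB_DOWNLOAD"] = "1"
        os.environ.pop("PLANNOTATE_AUTO_DOWNLOAD", None)


# Placeholder path; substitute a directory with enough free space (~900MB)
configure_plannotate("/path/to/plannotate_db")
```

After this call, `from plannotate.annotate import annotate` can be used as shown in the Quick Start.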

## Troubleshooting

- **SSL Certificate Errors:** Run the SSL certificate fix command above
- **Empty Results:** The sequence may not match any database features
- **Tool Errors:** Ensure the external tools are installed and on your `PATH`

## Citation

If you use `pLannotate-python` in your research, please cite the original pLannotate paper:

> McGuffin, M.J., Thiel, M.C., Pineda, D.L. et al. pLannotate: automated annotation of engineered plasmids. *Nucleic Acids Research* (2021). https://doi.org/10.1093/nar/gkab374

## License

This project is licensed under the GNU General Public License v3 or later (GPL-3.0-or-later). See the `LICENSE` file for details.

## Links

- **Original pLannotate**: https://github.com/mmcguffi/pLannotate
- **Web server**: http://plannotate.barricklab.org/
- **This Fork**: https://github.com/McClain-Thiel/pLannotate
- **Issues**: https://github.com/McClain-Thiel/pLannotate/issues