Metadata-Version: 2.1
Name: kb-python
Version: 0.28.0
Summary: Python wrapper around kallisto | bustools for scRNA-seq analysis
Home-page: https://github.com/pachterlab/kb_python
Author: Kyung Hoi (Joseph) Min
Author-email: phoenixter96@gmail.com
Maintainer: Pachter Lab
Maintainer-email: lpachter@caltech.edu
License: BSD
Description: # kb-python
        ![github version](https://img.shields.io/badge/Version-0.28.0-informational)
        [![pypi version](https://img.shields.io/pypi/v/kb-python)](https://pypi.org/project/kb-python/0.28.0/)
        ![python versions](https://img.shields.io/pypi/pyversions/kb_python)
        ![status](https://github.com/pachterlab/kb_python/workflows/CI/badge.svg)
        [![codecov](https://codecov.io/gh/pachterlab/kb_python/branch/master/graph/badge.svg)](https://codecov.io/gh/pachterlab/kb_python)
        [![pypi downloads](https://img.shields.io/pypi/dm/kb-python)](https://pypi.org/project/kb-python/)
        [![docs](https://readthedocs.org/projects/kb-python/badge/?version=latest)](https://kb-python.readthedocs.io/en/latest/?badge=latest)
        [![license](https://img.shields.io/pypi/l/kb-python)](LICENSE)
        
        `kb-python` is a Python package for processing single-cell RNA-sequencing data. It wraps the [`kallisto` | `bustools`](https://www.kallistobus.tools) single-cell RNA-seq command line tools in order to unify multiple processing workflows.
        
        `kb-python` was developed by [Kyung Hoi (Joseph) Min](https://twitter.com/lioscro) and [A. Sina Booeshaghi](https://twitter.com/sinabooeshaghi) while in [Lior Pachter](https://twitter.com/lpachter)'s lab at Caltech. If you use `kb-python` in a publication please [cite*](#cite):
        ```
        Melsted, P., Booeshaghi, A.S., et al. 
        Modular, efficient and constant-memory single-cell RNA-seq preprocessing. 
        Nat Biotechnol 39, 813–818 (2021).
        https://doi.org/10.1038/s41587-021-00870-2
        ```
        
        ## Installation
        The latest release can be installed with
        
        ```bash
        pip install kb-python
        ```
        
        The development version can be installed with
        ```bash
        pip install git+https://github.com/pachterlab/kb_python
        ```
        
        There are no prerequisite packages to install. The `kallisto` and `bustools` binaries are included with the package.
        
        ## Usage
        
        `kb` consists of four subcommands:
        ```bash
        $ kb
        usage: kb [-h] [--list] <CMD> ...
        positional arguments:
          <CMD>
            info      Display package and citation information
            compile   Compile `kallisto` and `bustools` binaries from source
            ref       Build a kallisto index and transcript-to-gene mapping
            count     Generate count matrices from a set of single-cell FASTQ files
        ```
        
        ### `kb ref`: generate a pseudoalignment index
        
        The `kb ref` command takes in a species annotation file (GTF) and associated genome (FASTA) and builds a species-specific index for pseudoalignment of reads. This must be run before `kb count`. Internally, `kb ref` extracts the coding regions from the GTF and builds a transcriptome FASTA that is then indexed with `kallisto index`.
        
        ```bash
        kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa <GENOME> <GENOME_ANNOTATION>
        ```
        - `<GENOME>` refers to a genome file (FASTA).
        	- For example, the zebrafish genome is hosted by [Ensembl](https://uswest.ensembl.org/Danio_rerio/Info/Index) and can be downloaded [here](http://ftp.ensembl.org/pub/release-107/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz).
        - `<GENOME_ANNOTATION>` refers to a genome annotation file (GTF).
        	- For example, the zebrafish genome annotation file is hosted by [Ensembl](https://uswest.ensembl.org/Danio_rerio/Info/Index) and can be downloaded [here](http://ftp.ensembl.org/pub/release-107/gtf/danio_rerio/Danio_rerio.GRCz11.107.gtf.gz).
        - **Note:** The latest genome annotation and genome file for every species on Ensembl can be found with the [`gget`](https://github.com/pachterlab/gget) command-line tool.
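        
        To get a feel for the transcript-to-gene mapping that `kb ref` writes, the sketch below counts transcripts and distinct genes in a mapping file. It assumes the file is tab-delimited with the transcript ID in the first column and the gene ID in the second. The file `example_t2g.txt` and its contents are made up so that the snippet runs on its own; to inspect a real mapping, point `t2g` at your own `t2g.txt` instead.
        
        ```bash
        # Build a tiny made-up mapping (transcript_id<TAB>gene_id) for illustration only.
        t2g=example_t2g.txt
        printf '%s\t%s\n' "tx1" "geneA" "tx2" "geneA" "tx3" "geneB" > "$t2g"
        # Count mapped transcripts (lines) and distinct genes (unique values in column 2).
        # Assumption: column 1 is the transcript ID and column 2 is the gene ID.
        n_transcripts=$(wc -l < "$t2g" | tr -d ' ')
        n_genes=$(cut -f2 "$t2g" | sort -u | wc -l | tr -d ' ')
        echo "$n_transcripts transcripts, $n_genes genes"
        ```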
        
        Prebuilt indices are available at https://github.com/pachterlab/kallisto-transcriptome-indices
        
        #### Examples
        ```bash
        # Index the transcriptome from genome FASTA (genome.fa.gz) and GTF (annotation.gtf.gz)
        $ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa genome.fa.gz annotation.gtf.gz
        # An example for downloading a prebuilt reference for mouse
        $ kb ref -d mouse -i index.idx -g t2g.txt
        ```
        ---
        ### `kb count`: pseudoalign and count reads
        
        The `kb count` command takes in the pseudoalignment index (built with `kb ref`) and the sequencing reads (FASTQ files) to generate a count matrix. Internally, `kb count` runs numerous [`kallisto`](https://github.com/pachterlab/kallisto) and [`bustools`](https://github.com/BUStools/bustools/) commands comprising a single-cell workflow for the specified technology that generated the sequencing reads.
        
        ```bash
        kb count -i index.idx -g t2g.txt -o out/ -x <TECHNOLOGY> <FASTQ FILE[s]>
        ```
        - `<TECHNOLOGY>` refers to the assay that generated the sequencing reads.
        	- For a list of supported assays, run `kb --list`.
        - `<FASTQ FILE[s]>` refers to the list of FASTQ files generated by the sequencer.
        	- Different assays produce different numbers of FASTQ files.
        	- Different assays place the different features in different FASTQ files.
        		- For example, sequencing a 10xv3 library on an Illumina NextSeq sequencer usually results in two FASTQ files.
        		- The `R1.fastq.gz` file (colloquially called "read 1") contains a 16 base-pair cell barcode and a 12 base-pair unique molecular identifier (UMI).
        		- The `R2.fastq.gz` file (colloquially called "read 2") contains the cDNA associated with the cell barcode-UMI pair in read 1.
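        
        The read 1 layout described above can be illustrated with a minimal sketch. It assumes the 10xv3 layout stated here (a 16 base-pair cell barcode followed by a 12 base-pair UMI). The sequence is made up for illustration, and `kb count` handles this parsing internally, so you do not need to run anything like this yourself.
        
        ```bash
        # Hypothetical 28 bp read 1 sequence (made up for illustration).
        read1_seq="AAACCCAAGAAACACTGGGTTTCATGCA"
        # Bases 1-16: cell barcode; bases 17-28: UMI (10xv3 layout described above).
        barcode=$(printf '%s' "$read1_seq" | cut -c1-16)
        umi=$(printf '%s' "$read1_seq" | cut -c17-28)
        echo "barcode=$barcode umi=$umi"
        ```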
        
        #### Examples
        ```bash
        # Quantify 10xv3 reads read1.fastq.gz and read2.fastq.gz
        $ kb count -i index.idx -g t2g.txt -o out/ -x 10xv3 read1.fastq.gz read2.fastq.gz
        ```
        ---
        ### `kb info`: display package and citation information
        
        The `kb info` command prints out package information including the version of `kb-python`, `kallisto`, and `bustools` along with their installation location.
        
        ```bash
        $ kb info
        kb_python 0.28.0 ...
        kallisto: 0.50.1 ...
        bustools: 0.43.1 ...
        ...
        ```
        ---
        ### `kb compile`: compile `kallisto` and `bustools` binaries from source
        The `kb compile` command grabs the latest `kallisto` and `bustools` source and compiles the binaries. **Note**: this is not required to run `kb-python`.
        
        ## Use cases
        `kb-python` facilitates fast and uniform pre-processing of single-cell sequencing data to answer relevant research questions. 
        ```bash
        $ pip install kb-python gget ffq
        
        # Goal: quantify publicly available scRNAseq data
        $ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa $(gget ref --ftp -w dna,gtf homo_sapiens)
        $ kb count -i index.idx -g t2g.txt -x 10xv3 -o out $(ffq --ftp SRR10668798 | jq -r '.[] | .url' | tr '\n' ' ')
        # -> count matrix in out/ folder
        
        # Goal: quantify 10xv2 feature barcode data, feature_barcodes.txt is a tab-delimited file
        # containing barcode_sequence<tab>barcode_name
        $ kb ref -i index.idx -g f2g.txt -f1 features.fa --workflow kite feature_barcodes.txt
        $ kb count -i index.idx -g f2g.txt -x 10xv2 -o out/ --workflow kite --h5ad R1.fastq.gz R2.fastq.gz
        # -> count matrix in out/ folder
        ```
        Submitted by [@sbooeshaghi](https://github.com/sbooeshaghi/).
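        
        As a side note on the feature barcode use case, the tab-delimited `feature_barcodes.txt` input could be prepared along the following lines. This is only a sketch: the two barcode sequences and the names `feature_A` and `feature_B` are hypothetical placeholders, and the only format assumed is the `barcode_sequence<tab>barcode_name` layout stated in the comment above.
        
        ```bash
        # Write two hypothetical feature barcodes (sequence<TAB>name), one per line.
        printf '%s\t%s\n' \
          "ACGTACGTACGTACG" "feature_A" \
          "TGCATGCATGCATGC" "feature_B" > feature_barcodes.txt
        # Sanity check: every line should have exactly two tab-separated fields.
        awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad }' feature_barcodes.txt && echo "format OK"
        ```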
        
        Do you have a cool use case for `kb-python`? Submit a PR (including the goal, code snippet, and your username) so that we can feature it here.
        
        ## Tutorials
        For a list of tutorials that use `kb-python` please see [https://www.kallistobus.tools/](https://www.kallistobus.tools/).
        
        ## Documentation
        Developer documentation is hosted on [Read the Docs](https://kb-python.readthedocs.io/en/latest/).
        
        ## Contributing
        Thank you for wanting to improve `kb-python`! If you believe you've found a bug, please submit an issue.
        
        If you have a new feature you'd like to add to `kb-python`, please create a pull request. Pull requests should contain a message detailing the exact changes made, the reasons for the change, and tests that check for the correctness of those changes.
        
        ## Cite
        If you use `kb-python` in a publication, please cite the following papers:
        
        `kb-python` & `bustools`
        ```tex
        @article{melsted2021modular,
          title={\href{https://doi.org/10.1038/s41587-021-00870-2}{Modular, efficient and constant-memory single-cell RNA-seq preprocessing}},
          author={Melsted, P{\'a}ll and Booeshaghi, A. Sina and Liu, Lauren and Gao, Fan and Lu, Lambda and Min, Kyung Hoi Joseph and da Veiga Beltrame, Eduardo and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Gehring, Jase and Pachter, Lior},
          author+an={1=first;2=first,highlight},
          journal={Nature biotechnology},
          year={2021},
          month={4},
          day={1},
          doi={https://doi.org/10.1038/s41587-021-00870-2}
        }
        ```
        
        `kallisto` 
        ```tex
        @article{bray2016near,
          title={Near-optimal probabilistic RNA-seq quantification},
          author={Bray, Nicolas L and Pimentel, Harold and Melsted, P{\'a}ll and Pachter, Lior},
          journal={Nature biotechnology},
          volume={34},
          number={5},
          pages={525--527},
          year={2016},
          publisher={Nature Publishing Group}
        }
        ```
        
        `BUS` format
        ```tex
        @article{melsted2019barcode,
          title={The barcode, UMI, set format and BUStools},
          author={Melsted, P{\'a}ll and Ntranos, Vasilis and Pachter, Lior},
          journal={Bioinformatics},
          volume={35},
          number={21},
          pages={4472--4473},
          year={2019},
          publisher={Oxford University Press}
        }
        ```
        
        `kb-python` was inspired by Sten Linnarsson’s `loompy fromfq` command (http://linnarssonlab.org/loompy/kallisto/index.html).
        
Keywords: kallisto bustools
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Utilities
Requires-Python: >=3.7
Description-Content-Type: text/markdown
