Metadata-Version: 2.4
Name: code-tokenizer
Version: 1.0.0
Summary: A professional CLI tool for counting AI model tokens in code projects
Project-URL: Documentation, https://github.com/org-hex/code-tokenizer#readme
Author: Code Tokenizer Contributors
Maintainer: Code Tokenizer Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: ai,claude,cli,code-analysis,gpt,llm,openai,tokens
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: click>=8.3.0
Requires-Dist: rich>=14.2.0
Requires-Dist: tiktoken>=0.12.0
Provides-Extra: dev
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: flake8>=7.0.0; extra == 'dev'
Requires-Dist: isort>=5.13.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: safety>=3.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# Code Tokenizer

**Language:** [English](README.md) | [中文](README_CN.md)

![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)
![PyPI Version](https://img.shields.io/pypi/v/code-tokenizer.svg)
![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)
![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)

A simple command-line tool that quickly calculates AI model token usage for an entire project, helping you determine whether your project is suitable for direct AI analysis.

Modern LLMs (such as GPT-4 Turbo and Claude 4) have context windows of 200k+ tokens, large enough to load an entire project codebase at once. If your project's total token count is under 200k, you can submit the whole project to the model in a single request rather than having it read files one by one. This tool provides a one-click feature to package all code into a single file, making that workflow easy.
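
The one-click packaging idea can be sketched in a few lines. This is a hypothetical illustration, not the tool's actual implementation; the directory names, suffix list, and `package_project` function are assumptions chosen for the example.

```python
from pathlib import Path

# Illustrative sketch of the packaging step: walk a project tree, skip
# common junk directories, and concatenate source files into one text
# file with a header per file so the model can tell files apart.
EXCLUDED_DIRS = {"node_modules", ".git", "__pycache__", "dist"}
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".java"}

def package_project(root: str, output: str) -> int:
    """Merge matching source files under `root` into `output`; return the file count."""
    root_path = Path(root)
    count = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in sorted(root_path.rglob("*")):
            if any(part in EXCLUDED_DIRS for part in path.parts):
                continue
            if path.is_file() and path.suffix in SOURCE_SUFFIXES:
                out.write(f"===== {path.relative_to(root_path)} =====\n")
                out.write(path.read_text(encoding="utf-8", errors="replace"))
                out.write("\n\n")
                count += 1
    return count
```

The per-file header line is what lets the model reconstruct file boundaries after everything is flattened into one document.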

## 🎯 Features

- **Token Statistics** - Accurately calculate token counts for your entire project's code across different AI models
- **Context Analysis** - Display the percentage of each AI model's context window used by your project to determine if it exceeds limits
- **One-Click Packaging** - Merge all code files into a single file for easy one-time submission to AI models
- **Smart Filtering** - Automatically exclude irrelevant files (node_modules, .git, etc.) and keep only core code
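
The context-analysis feature above amounts to a simple percentage calculation. The sketch below is illustrative only: the window sizes and model names are assumptions for the example, not values taken from this tool.

```python
# Hypothetical context-window sizes (tokens) for a couple of models.
# These numbers are assumptions for illustration, not authoritative limits.
CONTEXT_WINDOWS = {
    "gpt-4-turbo": 128_000,
    "claude": 200_000,
}

def context_usage(total_tokens: int) -> dict[str, float]:
    """Return the percentage of each model's context window a project would use."""
    return {
        model: round(100 * total_tokens / window, 1)
        for model, window in CONTEXT_WINDOWS.items()
    }
```

A project totaling 64,000 tokens would use half of a 128k window, so it comfortably fits in a single request for either model.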

## 📦 Installation

```bash
pip install code-tokenizer
```

## 🚀 Usage

```bash
# Count tokens for the current project
code-tokenizer

# Count tokens for a specified project
code-tokenizer /path/to/project

# Count tokens and package all code into a single file
code-tokenizer --package my_project.txt

# Show only the top 5 largest files
code-tokenizer --max-show 5
```

## 📊 Example Output

![Code Tokenizer Output](docs/images/screenshot.png)

## 🔧 Supported File Types

Go, Python, JavaScript, TypeScript, Java, C/C++, Swift, Kotlin, PHP, Ruby, Vue, HTML, CSS, YAML, JSON, XML, SQL, Shell scripts, Markdown, and more

## ⚠️ Disclaimer

This project is built on [OpenAI tiktoken](https://github.com/openai/tiktoken). Token counts are for reference only and may vary because different AI models use different tokenizers.

**Privacy Protection:** This project runs locally only and does not upload any code information to external servers, protecting your code privacy and security.

## 📄 License

MIT License