# GitHub Code Retrieval MCP Server

An MCP (Model Context Protocol) server that gives Claude the ability to search and analyze any GitHub repository using semantic code search.

**Works with:** Claude Desktop, Claude Code CLI, Cursor IDE

**Fully local** - No API keys required. All embeddings run locally.

## Quick Start (2 Steps)

### Step 1: Install

```bash
pip install github-code-retrieval
```

### Step 2: Add to Claude Config

**For Claude Desktop** - Add to your config file:

| OS | Config File Location |
|----|---------------------|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |
| Linux | `~/.config/claude/claude_desktop_config.json` |

```json
{
  "mcpServers": {
    "github-code-retrieval": {
      "command": "github-code-retrieval"
    }
  }
}
```
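If the config file already lists other MCP servers, the new entry should be merged in rather than pasted over them. The following is a minimal sketch of that merge. The helper name `add_mcp_server` is hypothetical and not part of this package, and the demo writes to a temporary file instead of a real config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one entry under "mcpServers", keeping all other keys."""
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8")
        if raw.strip():
            config = json.loads(raw)
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file so no real config is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(
        '{"mcpServers": {"other-server": {"command": "other"}}}',
        encoding="utf-8",
    )
    merged = add_mcp_server(demo_path, "github-code-retrieval", "github-code-retrieval")
    print(sorted(merged["mcpServers"]))
```

To apply it for real, pass the path from the table above that matches your OS.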

**For Claude Code CLI** - Add to `~/.claude.json` (all projects) or to `.mcp.json` in your project root (that project only):

```json
{
  "mcpServers": {
    "github-code-retrieval": {
      "command": "github-code-retrieval"
    }
  }
}
```

**Restart Claude** after adding the config. Done!

## Usage

Once configured, just ask Claude to analyze any GitHub repo:

```
"Analyze https://github.com/pallets/flask and explain how routing works"

"How does authentication work in https://github.com/tiangolo/fastapi?"

"Find the database connection code in https://github.com/django/django"
```

Claude will automatically use the tool to:
1. Clone the repository
2. Index all code files locally
3. Search for relevant code snippets
4. Return the most relevant code to answer your question

## Features

- **No API Keys** - Everything runs locally on your machine
- **Semantic Search** - Finds code by meaning, not just keywords
- **Fast** - The first query clones and indexes the repo (~30-60s); subsequent queries on the same repo return in ~0.01s
- **Private Repos** - Works with private repos if you have git credentials set up
- **20+ Languages** - Python, JavaScript, TypeScript, Go, Rust, Java, and more

## How It Works

```
┌─────────────────────────────────────────────────────────────────┐
│  You: "How does routing work in this Flask app?"               │
│                              │                                  │
│                              ▼                                  │
│  Claude calls: analyze_github_repo(                            │
│      repo_url="https://github.com/pallets/flask",              │
│      question="How does routing work?"                         │
│  )                                                             │
│                              │                                  │
│                              ▼                                  │
│  MCP Server (runs locally on your machine):                    │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ 1. Clone repo (cached for 24h)                            │ │
│  │ 2. Index code with local embeddings (sentence-transformers)│ │
│  │ 3. Semantic search (ChromaDB)                             │ │
│  │ 4. Return top relevant code snippets                      │ │
│  └───────────────────────────────────────────────────────────┘ │
│                              │                                  │
│                              ▼                                  │
│  Claude receives code snippets and explains them to you        │
└─────────────────────────────────────────────────────────────────┘
```
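Steps 2 to 4 in the diagram boil down to "turn chunks and the question into vectors, then rank chunks by similarity". The sketch below illustrates only that ranking idea. It deliberately swaps in word-count vectors and an in-memory dict for the sentence-transformers embeddings and ChromaDB store named above, so it runs with the standard library alone. The function names and sample chunks are made up for illustration and are not the server's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: dict, question: str, top_k: int = 10) -> list:
    """Return the top_k (path, score) pairs, highest score first."""
    query_vec = embed(question)
    scored = [(path, cosine(embed(code), query_vec)) for path, code in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical chunks standing in for an indexed repository.
chunks = {
    "app/routing.py": "def add_url_rule(rule, endpoint): register routing rule for url",
    "app/db.py": "def connect(dsn): open database connection",
}
results = search(chunks, "how does url routing work", top_k=1)
print(results[0][0])
```

A real embedding model also matches text that shares meaning but no words, which word counts cannot do.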

## Tool Schema

The MCP server exposes one tool: `analyze_github_repo`

**Input:**
```json
{
  "repo_url": "https://github.com/owner/repo",
  "question": "How does authentication work?",
  "top_k": 10
}
```

**Output:**
```json
{
  "success": true,
  "repository": {
    "url": "https://github.com/owner/repo",
    "owner": "owner",
    "name": "repo",
    "total_files_indexed": 45,
    "total_chunks": 230
  },
  "query": "How does authentication work?",
  "code_snippets": [
    {
      "file_path": "src/auth/handler.py",
      "content": "def authenticate(request):\n    ...",
      "start_line": 45,
      "end_line": 78,
      "language": "python",
      "relevance_score": 0.89
    }
  ],
  "total_results": 10
}
```
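If you consume the tool output from your own script, a small formatter can help. The sketch below parses a response shaped like the example above and prints one line per snippet, best match first. The `summarize` helper is hypothetical, and treating `"success": false` as an error is an assumption, since the failure shape is not documented here.

```python
import json

# Sample response with the same fields as the documented output.
response_text = """
{
  "success": true,
  "query": "How does authentication work?",
  "code_snippets": [
    {
      "file_path": "src/auth/handler.py",
      "content": "def authenticate(request):\\n    ...",
      "start_line": 45,
      "end_line": 78,
      "language": "python",
      "relevance_score": 0.89
    }
  ],
  "total_results": 1
}
"""


def summarize(response: dict) -> list:
    """Return one 'path:start-end (language, score)' line per snippet."""
    if not response.get("success"):
        # Assumption: a failed call reports success=false.
        raise RuntimeError("tool call did not succeed")
    snippets = sorted(
        response["code_snippets"],
        key=lambda s: s["relevance_score"],
        reverse=True,
    )
    return [
        f'{s["file_path"]}:{s["start_line"]}-{s["end_line"]} '
        f'({s["language"]}, score {s["relevance_score"]:.2f})'
        for s in snippets
    ]


summary = summarize(json.loads(response_text))
print("\n".join(summary))
```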

## Configuration (Optional)

Create a `.env` file to customize settings:

```bash
# Embedding model (default: all-MiniLM-L6-v2)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Device for embeddings: cpu, cuda (NVIDIA), mps (Apple Silicon)
EMBEDDING_DEVICE=cpu

# Storage paths
VECTOR_STORE_PATH=./data/vector_db
REPO_STORAGE_PATH=./data/repos

# Indexing settings
CHUNK_SIZE=1500
CHUNK_OVERLAP=200
```
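To get a feel for how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters and that chunking is a plain fixed-size window; the server's actual chunker may differ (for example by splitting on function boundaries). `chunk_text` is a hypothetical helper, and the fallback values are the ones shown in the `.env` example above.

```python
import os


def chunk_text(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Read settings the same way a .env-driven process would see them.
chunk_size = int(os.environ.get("CHUNK_SIZE", "1500"))
chunk_overlap = int(os.environ.get("CHUNK_OVERLAP", "200"))

sample = "x" * 4000
pieces = chunk_text(sample, chunk_size, chunk_overlap)
print([len(p) for p in pieces])
```

Under these assumptions, a smaller `CHUNK_SIZE` yields more, shorter chunks, and a larger `CHUNK_OVERLAP` repeats more text between neighbouring chunks.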

## GPU Acceleration

For faster embeddings, set `EMBEDDING_DEVICE` in your `.env` file to match your hardware:

**NVIDIA GPU:**
```bash
EMBEDDING_DEVICE=cuda
```

**Apple Silicon:**
```bash
EMBEDDING_DEVICE=mps
```
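Before setting `EMBEDDING_DEVICE`, it can be worth checking which devices PyTorch actually sees on your machine. The sketch below does that and falls back to `cpu` when the requested device is missing or PyTorch is not installed. Both helper names are hypothetical; this is a standalone check, not the server's own device-selection logic.

```python
import os
from typing import Optional


def available_devices() -> list:
    """List the device names PyTorch reports as usable; always includes cpu."""
    devices = ["cpu"]
    try:
        import torch
    except ImportError:
        return devices
    if torch.cuda.is_available():
        devices.append("cuda")
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        devices.append("mps")
    return devices


def pick_device(requested: Optional[str], available: list) -> str:
    """Use the requested device if it is available, otherwise fall back to cpu."""
    return requested if requested in available else "cpu"


device = pick_device(os.environ.get("EMBEDDING_DEVICE"), available_devices())
print(device)
```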

## Private Repositories

The tool uses your system's git credentials. To access private repos:

```bash
# Option 1: GitHub CLI (recommended)
gh auth login

# Option 2: SSH key
# Just make sure your SSH key is set up for GitHub

# Option 3: Credential helper
# Note: "store" saves credentials unencrypted in ~/.git-credentials
git config --global credential.helper store
```

## Alternative: HTTP Server Mode

If you want to run the server on a network (e.g., shared team server):

```bash
# Start HTTP server
github-code-retrieval-http

# Server listens on all interfaces (0.0.0.0), port 8000
# API docs at http://localhost:8000/docs
```

Then configure clients to connect:
```json
{
  "mcpServers": {
    "github-code-retrieval": {
      "command": "http",
      "args": ["http://SERVER_IP:8000"]
    }
  }
}
```

## Supported Languages

Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Go, Rust, C, C++, C#, Ruby, PHP, Swift, Scala, SQL, GraphQL, YAML, JSON, TOML, Markdown, and more.

## Troubleshooting

### "Command not found: github-code-retrieval"
Make sure pip's bin directory is in your PATH:
```bash
# Find where pip installs scripts
python -m site --user-base
# Add that path + /bin to your PATH
```

Or bypass the PATH lookup by running the server as a Python module:
```json
{
  "mcpServers": {
    "github-code-retrieval": {
      "command": "python",
      "args": ["-m", "mcp_stdio_server"]
    }
  }
}
```
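To see quickly whether the command is visible, and where pip would have put it, a short check like the one below can help. It uses only the standard library; `locate_command` is a hypothetical helper name. Run it with the same Python interpreter you used for `pip install`.

```python
import shutil
import site
import sysconfig


def locate_command(name: str) -> dict:
    """Report whether `name` is on PATH and where this Python installs scripts."""
    return {
        # Full path to the executable, or None if PATH does not contain it.
        "on_path": shutil.which(name),
        # Script directory for this interpreter (or virtual environment).
        "scripts_dir": sysconfig.get_path("scripts"),
        # Same value that `python -m site --user-base` prints.
        "user_base": site.getuserbase(),
    }


info = locate_command("github-code-retrieval")
for key, value in info.items():
    print(f"{key}: {value}")
```

If `on_path` is `None`, add `scripts_dir` (or the script directory under `user_base` for `--user` installs) to your PATH.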

### Slow first query
The first query for a repo takes ~30-60 seconds because the server has to:
1. Clone the repository
2. Load the embedding model (~400MB)
3. Index all code files

Subsequent queries are fast (~0.01s).

### Out of memory
Large repos may need more RAM. Try:
- Reducing `CHUNK_SIZE` in `.env`
- Using a smaller embedding model

## Development

```bash
# Clone the repo
git clone https://github.com/yourusername/github-code-retrieval
cd github-code-retrieval

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest
```

## License

MIT
