Metadata-Version: 2.4
Name: sqlshell
Version: 0.2.3
Summary: A powerful SQL shell with GUI interface for data analysis
Author: SQLShell Team
License-Expression: MIT
Project-URL: Homepage, https://github.com/oyvinrog/SQLShell
Keywords: sql,data analysis,gui,duckdb
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: PyQt6>=6.4.0
Requires-Dist: duckdb>=0.9.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyarrow>=14.0.1
Requires-Dist: fastparquet>=2023.10.1
Requires-Dist: xlrd>=2.0.1
Requires-Dist: deltalake
Requires-Dist: Pillow>=10.0.0
Requires-Dist: shap
Requires-Dist: xgboost
Requires-Dist: scikit-learn
Requires-Dist: matplotlib>=3.10.0
Requires-Dist: scipy>=1.15.0

# SQLShell

<div align="center">

<img src="https://github.com/oyvinrog/SQLShell/raw/main/sqlshell_logo.png" alt="SQLShell Logo" width="180" height="auto">

**A powerful SQL shell with GUI interface for data analysis**

<img src="https://github.com/oyvinrog/SQLShell/raw/main/sqlshell_demo.png" alt="SQLShell Interface" width="80%" height="auto">

</div>

## 🚀 Key Features

- **Interactive SQL Interface** - Rich syntax highlighting for enhanced query writing
- **Context-Aware Suggestions** - Intelligent SQL autocompletion based on query context and schema
- **DuckDB Integration** - Powerful analytical queries powered by DuckDB
- **Multi-Format Support** - Import and query Excel (.xlsx, .xls), CSV, and Parquet files effortlessly
- **Modern UI** - Clean, tabular results display with intuitive controls
- **Table Preview** - Quick view of imported data tables
- **Test Data Generation** - Built-in sample data for testing and learning
- **Multiple Views** - Support for multiple concurrent table views
- **Productivity Tools** - Streamlined workflow with keyboard shortcuts (e.g., Ctrl+Enter for query execution)
- **Explain Column** - Analyze relationships between data columns directly from query results

## 📦 Installation

### Using pip (Recommended)

```bash
pip install sqlshell
```

### Linux Setup with Virtual Environment

```bash
# Create and activate virtual environment
python3 -m venv ~/.venv/sqlshell
source ~/.venv/sqlshell/bin/activate

# Install SQLShell
pip install sqlshell

# Configure shell alias
echo 'alias sqls="~/.venv/sqlshell/bin/sqls"' >> ~/.bashrc  # or ~/.zshrc for Zsh
source ~/.bashrc  # or source ~/.zshrc
```

### Development Installation

```bash
git clone https://github.com/oyvinrog/SQLShell.git
cd SQLShell
pip install -e .
```

## 🎯 Getting Started

1. **Launch the Application**
   ```bash
   sqls
   ```
   
   If the `sqls` command doesn't work (e.g., "access denied" on Windows), you can use this alternative:
   ```bash
   python -c "import sqlshell; sqlshell.start()"
   ```

2. **Database Connection**
   - SQLShell automatically connects to a local DuckDB database named 'pool.db'

3. **Working with Data Files**
   - Click "Load Files" to select your Excel, CSV, or Parquet files
   - File contents are loaded as queryable SQL tables
   - Query using standard SQL syntax

4. **Query Execution**
   - Enter SQL in the editor
   - Execute using Ctrl+Enter or the "Execute" button
   - View results in the structured output panel

5. **Test Data**
   - Load sample test data using the "Test" button for quick experimentation
   
6. **Using Context-Aware Suggestions**
   - Press Ctrl+Space to manually trigger suggestions
   - Suggestions appear automatically as you type
   - Context-specific suggestions based on your query position:
     - After SELECT: columns and functions
     - After FROM/JOIN: tables with join conditions
     - After WHERE: columns with appropriate operators
     - Inside functions: relevant column suggestions

7. **Column Analysis**
   - Right-click on column headers in the results pane
   - Access features like sorting, filtering, and the "Explain Column" analysis tool

## 📝 Query Examples

### Basic Join Operation
```sql
SELECT *
FROM sample_sales_data cd
INNER JOIN product_catalog pc ON pc.productid = cd.productid
LIMIT 3;
```

### Multi-Statement Queries
```sql
-- Create a temporary view
CREATE OR REPLACE TEMPORARY VIEW test_v AS
SELECT *
FROM sample_sales_data cd
INNER JOIN product_catalog pc ON pc.productid = cd.productid;

-- Query the view
SELECT DISTINCT productid
FROM test_v;
```

## 💡 Pro Tips

- Use temporary views for complex query organization
- Leverage keyboard shortcuts for efficient workflow
- Explore the multi-format support for various data sources
- Create multiple tabs for parallel query development
- The context-aware suggestions learn from your query patterns
- Type `table_name.` to see all columns for a specific table
- After JOIN keyword, the system suggests relevant tables and join conditions

## 📊 Table Profiling

SQLShell provides powerful table profiling tools to help you understand your data. These tools are accessible from the left-hand side table menu via right-click on any table:

<div align="center">
<img src="https://github.com/oyvinrog/SQLShell/raw/main/column_profiler.png" alt="Column Profiler" width="80%" height="auto">
</div>

### Table Profiling Options

Right-click on any table in the left panel to access these profiling tools:

1. **Analyze Column Importance**
   - Calculates entropy for each column to identify the most information-rich fields
   - Visualizes column importance with color-coded bars
   - Helps identify which columns are most useful for analysis and modeling

2. **Profile Table Structure**
   - Identifies candidate keys and functional dependencies
   - Discovers potential primary keys and relationships between columns
   - Suggests possible normalized table structures
   - Helps understand table organization and optimize schema design

3. **Analyze Column Distributions**
   - Generates histograms, box plots, and other statistical visualizations
   - Identifies the distribution pattern of each column (normal, uniform, etc.)
   - Provides detailed statistics like min, max, mean, median, skewness
   - Helps identify outliers and understand data patterns

4. **Analyze Foreign Keys** (multi-table selection)
   - Select multiple tables by holding Ctrl or Shift while clicking
   - Right-click to access "Analyze Foreign Keys Between X Tables"
   - Automatically discovers potential foreign key relationships between tables
   - Identifies matching columns that could serve as join conditions
   - Helps understand cross-table relationships in your data model

### Using the Profilers

1. **Access the Profilers**
   - Right-click on any table in the schema browser
   - Select the desired profiling option from the context menu
   - For foreign key analysis, select multiple tables first

2. **Interpret the Results**
   - Each profiler provides interactive visualizations
   - Hover over charts for detailed information
   - Switch between different views using the tabs
   - Sort and filter results to focus on specific columns

3. **Benefits**
   - Quickly understand data composition without writing queries
   - Identify data quality issues and outliers
   - Discover relationships between columns
   - Make informed decisions about query optimization

The table profiling tools are invaluable for exploratory data analysis, helping you gain insights before writing complex queries.

## 📊 Column Analysis

SQLShell provides powerful tools to analyze individual columns directly from your query results:

### Explain Column Feature

The "Explain Column" feature helps you understand the relationships between columns in your query results:

1. **How to Access**:
   - Right-click on any column header in the query results table
   - Select "Explain Column" from the context menu

2. **What It Does**:
   - Analyzes the selected column's relationship with other columns in the result set
   - Identifies correlations and dependencies between columns
   - Provides visualizations to help understand the column's importance and distribution

3. **Benefits**:
   - Quickly identify which columns are most related to your target column
   - Discover hidden patterns and relationships in your data
   - Make data-driven decisions without writing complex analytical queries

### Multivariate Analysis Feature

The Column Profiler now offers in-depth multivariate analysis to explore relationships between columns:

1. **How to Access**:
   - In the Column Profiler, double-click on any feature in the importance table
   - A detailed visualization window will appear showing the relationship between the selected feature and the target column

2. **Smart Visualizations**:
   - Automatically selects the most appropriate visualization based on data types:
     - **Numeric vs. Numeric**: Scatter plot with regression line
     - **Categorical vs. Numeric**: Bar chart showing average values
     - **Numeric vs. Categorical**: Box plot showing distribution
     - **Categorical vs. Categorical**: Heatmap showing relationship strength

3. **Benefits**:
   - Gain deeper insights into how features relate to your target variable
   - Understand which features have strong predictive relationships
   - Identify patterns and outliers in multivariate relationships
   - Make better decisions about feature selection for analysis and modeling

This feature is particularly useful for exploratory data analysis, helping you understand your data structure and relationships on the fly.

## 📋 Requirements

- Python 3.8 or higher
- Dependencies (automatically installed):
  - PyQt6 ≥ 6.4.0
  - DuckDB ≥ 0.9.0
  - Pandas ≥ 2.0.0
  - NumPy ≥ 1.24.0
  - openpyxl ≥ 3.1.0 (Excel support)
  - pyarrow ≥ 14.0.1 (Parquet support)
  - fastparquet ≥ 2023.10.1 (Alternative parquet engine)
  - xlrd ≥ 2.0.1 (Support for older .xls files)

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
