Metadata-Version: 2.1
Name: python-fastdata
Version: 0.0.1
Summary: Easiest and fastest way to 1B synthetic tokens
Home-page: https://github.com/AnswerDotAI/fastdata
Author: ncoop57
Author-email: nc@answer.ai
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: claudette
Requires-Dist: fastcore
Requires-Dist: ratelimit
Requires-Dist: tqdm
Provides-Extra: dev
Requires-Dist: black ; extra == 'dev'
Requires-Dist: datasets ; extra == 'dev'
Requires-Dist: ipykernel ; extra == 'dev'
Requires-Dist: nbdev ; extra == 'dev'

# fastdata


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Developer Guide

If you are new to `nbdev`, here are some useful pointers to get you
started.

### Install fastdata in Development mode

``` sh
# make sure fastdata package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to fastdata
$ nbdev_prepare
```

## Usage

### Installation

Install the latest version from the GitHub
[repository](https://github.com/AnswerDotAI/fastdata):

``` sh
$ pip install git+https://github.com/AnswerDotAI/fastdata.git
```

or from [conda](https://anaconda.org/AnswerDotAI/fastdata):

``` sh
$ conda install -c AnswerDotAI fastdata
```

or from [PyPI](https://pypi.org/project/python-fastdata/), where the
package is published as `python-fastdata`:

``` sh
$ pip install python-fastdata
```

### Documentation

Documentation is hosted on this GitHub
[repository](https://github.com/AnswerDotAI/fastdata)'s
[pages](https://AnswerDotAI.github.io/fastdata/). Package-manager-specific
guidelines are available on
[conda](https://anaconda.org/AnswerDotAI/fastdata) and
[PyPI](https://pypi.org/project/python-fastdata/).

## How to use

First, define the structure of the data you want to generate. fastdata
relies on `claudette` to call the model and needs a schema describing
each record it should produce. Here the schema is defined as a pydantic
model.

``` python
from pydantic import BaseModel, Field

class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")
```
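
To see what a schema like this carries, the sketch below prints the JSON schema pydantic derives from the model and validates one hand-written record. It assumes pydantic v2 (`model_json_schema`, `model_validate`). How fastdata itself consumes the class is not shown here, and the sample record is made up for illustration.

``` python
import json

from pydantic import BaseModel, Field


class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")


# JSON schema derived from the model (pydantic v2 API).
# The field descriptions end up next to each property.
schema = Translation.model_json_schema()
print(json.dumps(schema["properties"], indent=2))

# Validate a hand-written record with the same shape (illustrative data only).
record = Translation.model_validate({"english": "Good morning", "german": "Guten Morgen"})
print(record.german)
```

Clear field descriptions matter because they are the only per-field guidance the schema contains.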

Next, you need to define the prompt that will be used to generate the
data and any inputs you want to pass to the prompt.

``` python
prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
```
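
Before spending API calls, it can help to preview the prompts. The sketch below assumes each input dict fills the template's named placeholders the way `str.format` does; fastdata's internal rendering may differ. It also checks that every input supplies all placeholders found in the template.

``` python
from string import Formatter

prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# Collect the placeholder names used in the template, e.g. {"topic"}.
fields = {name for _, name, _, _ in Formatter().parse(prompt_template) if name}

# Indices of inputs that do not provide every placeholder.
missing = [i for i, inp in enumerate(inputs) if not fields.issubset(inp)]
if missing:
    raise ValueError(f"inputs at positions {missing} lack some of {sorted(fields)}")

# Assumption: rendering is equivalent to str.format with the dict as keyword arguments.
prompts = [prompt_template.format(**inp) for inp in inputs]
for prompt in prompts:
    print(prompt)
```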

Finally, we can generate some data with fastdata.

> [!NOTE]
>
> Only Anthropic models are supported at the moment. Make sure you have
> an Anthropic API key, then either set the appropriate environment
> variable or pass the key directly to the
> [`FastData`](https://AnswerDotAI.github.io/fastdata/core.html#fastdata)
> class: `FastData(api_key="sk-ant-api03-...")`.
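
One way to fail early when no key is configured is a small lookup helper. In this sketch, `resolve_api_key` is a hypothetical function, not part of fastdata. It assumes the key lives in `ANTHROPIC_API_KEY`, the variable the Anthropic Python SDK reads by default; whether fastdata relies on the same variable is an assumption.

``` python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Hypothetical helper: prefer an explicit key, else fall back to ANTHROPIC_API_KEY."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "No Anthropic API key found: set ANTHROPIC_API_KEY or pass api_key=..."
        )
    return key


# Demonstration with a stand-in environment so the snippet runs without a real key.
demo_env = {"ANTHROPIC_API_KEY": "sk-ant-example"}
print(resolve_api_key(env=demo_env))
```

The resolved value could then be passed along as `FastData(api_key=resolve_api_key())`.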

``` python
from fastdata.core import FastData

import pprint

# Create a pretty printer object with custom settings
pp = pprint.PrettyPrinter(indent=4, width=100, compact=False)

fast_data = FastData()
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    response_model=Translation,
    model="claude-3-haiku-20240307"
)

# Pretty print the translations
print("Translations:")
pp.pprint(translations)
```

    100%|██████████| 2/2 [00:00<00:00,  2.21it/s]

    Translations:
    [   {'english': 'Otters are cute', 'german': 'Otter sind süß'},
        {'english': 'I love programming', 'german': 'Ich liebe das Programmieren'}]
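
Generated records are often written to disk for later use. The sketch below assumes the result is a list of plain dicts, as in the output printed above, and stores it as JSON Lines. `save_jsonl` and `load_jsonl` are illustrative helpers, not fastdata functions, and the temporary output path is arbitrary.

``` python
import json
import tempfile
from pathlib import Path

# Stand-in for the result of fast_data.generate(...), copied from the output above.
translations = [
    {"english": "Otters are cute", "german": "Otter sind süß"},
    {"english": "I love programming", "german": "Ich liebe das Programmieren"},
]


def save_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


out_path = Path(tempfile.mkdtemp()) / "translations.jsonl"
save_jsonl(translations, out_path)
reloaded = load_jsonl(out_path)
print(len(reloaded), "records written to", out_path)
```

Passing `ensure_ascii=False` keeps characters such as `ü` and `ß` as-is in the file instead of escaping them.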

For guidance on how best to generate data with fastdata, see our
[blog post](https://www.answer.ai/blog/introducing-fastdata) and the
examples in the
[examples](https://github.com/AnswerDotAI/fastdata/tree/main/examples)
directory.
