Metadata-Version: 2.1
Name: python-fastdata
Version: 0.0.4
Summary: Easiest and fastest way to 1B synthetic tokens
Home-page: https://github.com/AnswerDotAI/fastdata
Author: ncoop57
Author-email: nc@answer.ai
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: claudette
Requires-Dist: fastcore
Requires-Dist: ratelimit
Requires-Dist: tqdm
Requires-Dist: huggingface-hub
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: datasets; extra == "dev"
Requires-Dist: ipykernel; extra == "dev"
Requires-Dist: nbdev; extra == "dev"

# fastdata



`fastdata` is a minimal library for generating synthetic data for
training deep learning models. For example, below is how you can
generate a dataset to train a language model to translate from English
to Spanish.

First, define the structure of the data you want to generate.
`claudette`, the library fastdata uses to generate data, requires a
schema for the output. Here that schema is a plain Python class with a
docstring and typed `__init__` parameters.

``` python
from fastcore.utils import *
```

``` python
class Translation:
    "Translation from an English phrase to a Spanish phrase"
    # store_attr (from fastcore) saves each __init__ argument as an attribute
    def __init__(self, english: str, spanish: str): store_attr()
    # __repr__ controls how each generated example is displayed
    def __repr__(self): return f"{self.english} ➡ *{self.spanish}*"

Translation("Hello, how are you today?", "Hola, ¿cómo estás hoy?")
```

    Hello, how are you today? ➡ *Hola, ¿cómo estás hoy?*
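Roughly speaking, a class like this can act as a schema because its name, docstring, and typed `__init__` parameters can be turned into a tool definition that the model fills in. The sketch below illustrates that idea with only the standard library. `class_to_schema` is a hypothetical helper written for this example, not claudette's implementation; its output follows the shape of Anthropic tool definitions (`name`, `description`, `input_schema`).

```python
import inspect

# Same fields as the Translation class above, written without fastcore's store_attr
class Translation:
    "Translation from an English phrase to a Spanish phrase"
    def __init__(self, english: str, spanish: str):
        self.english, self.spanish = english, spanish

# Hypothetical mapping from Python annotations to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def class_to_schema(cls) -> dict:
    "Hypothetical helper: build a tool-style JSON schema from a class's docstring and __init__."
    params = {n: p for n, p in inspect.signature(cls.__init__).parameters.items() if n != "self"}
    # Unknown annotations fall back to "string" in this sketch
    props = {n: {"type": _JSON_TYPES.get(p.annotation, "string")} for n, p in params.items()}
    # Parameters without a default value are treated as required fields
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": cls.__name__,
        "description": inspect.getdoc(cls),
        "input_schema": {"type": "object", "properties": props, "required": required},
    }

schema = class_to_schema(Translation)
```

In this sketch, giving an `__init__` parameter a default value is what makes the corresponding field optional.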

Next, define the prompt template used to generate the data and the
inputs to pass to it. Each input is a dictionary whose keys match the
placeholders in the template (here, `{topic}`).

``` python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
```
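Because the keys of each input must line up with the template's placeholders, it can help to check them before spending any API calls. The sketch below uses the standard library's `string.Formatter` to list the placeholders and then fills the template with `str.format`; that fastdata fills templates in essentially this way is an assumption made for illustration.

```python
from string import Formatter

prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""
inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# Collect the {placeholder} names used in the template (here just "topic")
fields = {name for _, name, _, _ in Formatter().parse(prompt_template) if name}

# Every input must supply each placeholder, otherwise str.format raises KeyError
for inp in inputs:
    missing = fields.difference(inp)
    assert not missing, f"input {inp} is missing {missing}"

# Assumption: one prompt is built per input, roughly like this
prompts = [prompt_template.format(**inp) for inp in inputs]
print(prompts[0])
```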

Finally, we can generate some data with fastdata.

> [!NOTE]
>
> Only Anthropic models are supported at the moment. Make sure you have
> an Anthropic API key, and either set the `ANTHROPIC_API_KEY`
> environment variable or pass the key directly to the
> [`FastData`](https://AnswerDotAI.github.io/fastdata/core.html#fastdata)
> class: `FastData(api_key="sk-ant-api03-...")`.
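If you want to support both options from the note (an explicit key or the environment variable) and fail early with a clear message, a small helper is enough. `resolve_api_key` is hypothetical and not part of fastdata; it only reads `ANTHROPIC_API_KEY`, the variable the Anthropic SDK looks for by default.

```python
import os
from typing import Optional

def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicitly passed key, otherwise fall back
    to the ANTHROPIC_API_KEY environment variable."""
    key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        # Fail before any generation starts rather than at the first API call
        raise RuntimeError("Set ANTHROPIC_API_KEY or pass api_key=... explicitly")
    return key
```

The result can then be passed through, for example `FastData(model="claude-3-haiku-20240307", api_key=resolve_api_key())`.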

``` python
from fastdata.core import FastData
```

``` python
fast_data = FastData(model="claude-3-haiku-20240307")
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    schema=Translation,
)
```

    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.57it/s]
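To build intuition for what a call like `generate` involves, the sketch below is a simplified, offline model of the workflow and not fastdata's actual implementation: it fills the template for each input, runs the model calls on a thread pool, and builds one schema instance per response. `fake_model_call` and this `generate` function are hypothetical; the fake call stands in for the real Claude request made through claudette so the example runs without an API key.

```python
from concurrent.futures import ThreadPoolExecutor

class Translation:
    def __init__(self, english: str, spanish: str):
        self.english, self.spanish = english, spanish

# Hypothetical stand-in for the model: a real call would send `prompt` to Claude
# and return the arguments of the tool call it makes
def fake_model_call(prompt: str) -> dict:
    topic = prompt.split("<topic>")[1].split("</topic>")[0]
    return {"english": topic, "spanish": f"<translation of {topic!r}>"}

def generate(prompt_template, inputs, schema, call=fake_model_call, max_workers=4):
    "Hypothetical sketch: fill the template per input, call the model concurrently, build schema objects."
    prompts = [prompt_template.format(**inp) for inp in inputs]
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # executor.map yields results in input order, even if calls finish out of order
        raw = list(ex.map(call, prompts))
    return [schema(**r) for r in raw]

template = "Translate this topic:\n<topic>{topic}</topic>\n"
results = generate(template, [{"topic": "Otters are cute"}, {"topic": "I love programming"}], Translation)
```

This sketch keeps results in input order, but the real run shown below lists the translations in a different order from `inputs`, so it is safer not to assume results line up with `inputs` by position.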

``` python
from IPython.display import Markdown
```

``` python
Markdown("\n".join(f'- {t}' for t in translations))
```

- I love programming ➡ *Me encanta la programación*
- Otters are cute ➡ *Las nutrias son lindas*

## Installation

Install the latest version from the GitHub
[repository](https://github.com/AnswerDotAI/fastdata):

``` sh
$ pip install git+https://github.com/AnswerDotAI/fastdata.git
```

or from [PyPI](https://pypi.org/project/python-fastdata/):

``` sh
$ pip install python-fastdata
```
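Note that the distribution is named `python-fastdata` (the name passed to `pip`), while the package you import is `fastdata`. To confirm which version is installed, the standard library's `importlib.metadata` looks packages up by distribution name; `installed_version` below is a small hypothetical wrapper around it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist: str = "python-fastdata") -> Optional[str]:
    "Hypothetical helper: return the installed version of a distribution, or None."
    try:
        # Looks up the distribution name ("python-fastdata"), not the import name ("fastdata")
        return version(dist)
    except PackageNotFoundError:
        return None

# Prints the version string, or None if python-fastdata isn't installed
print(installed_version())
```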

For guidance on how best to generate data with fastdata, see our
[blog post](https://www.answer.ai/blog/introducing-fastdata) and the
examples in the
[examples](https://github.com/AnswerDotAI/fastdata/tree/main/examples)
directory.

## Developer Guide

If you are new to `nbdev`, here are some useful pointers to get you
started.

### Install fastdata in development mode

``` sh
# install fastdata in editable mode, with the dev extras (which include nbdev)
$ pip install -e ".[dev]"

# make changes under the nbs/ directory
# ...

# export the notebooks to the fastdata package, run the tests, and clean the notebooks
$ nbdev_prepare
```
