# Apify Dataset Loader

[Apify](https://apify.com/) is a cloud platform for web scraping and data extraction,
which provides an [ecosystem](https://apify.com/store) of more than a thousand
ready-made apps called _Actors_ for various scraping, crawling, and extraction use cases.

This loader loads documents from an existing [Apify dataset](https://docs.apify.com/platform/storage/dataset).

## Usage

In this example, we’ll load a dataset generated by
the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor,
which can deeply crawl websites such as documentation, knowledge bases, help centers,
or blogs, and extract text content from the web pages.
The extracted text can then be fed to a vector index or a language model such as GPT
to answer questions about the content.

To use this loader, you need to have a (free) Apify account
and set your [Apify API token](https://console.apify.com/account/integrations) in the code.
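
If you would rather not hard-code the token, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `APIFY_API_TOKEN` and the helper `get_apify_token` are assumptions made for this example, not something the loader reads or provides itself.

```python
import os


def get_apify_token(env_var: str = "APIFY_API_TOKEN") -> str:
    """Return the Apify API token stored in the given environment variable.

    The variable name is a hypothetical convention chosen for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Apify API token"
        )
    return token
```

The returned string can then be passed to the loader in place of the literal token.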

```python
from dotagent import download_loader
from dotagent.schema import DocumentNode

# Converts a single record from the Apify dataset to a DocumentNode
def transform_dataset_item(item):
    return DocumentNode(
        text=item.get("text"),
        extra_info={
            "url": item.get("url"),
        },
    )

# Fetch the loader class by name
ApifyDataset = download_loader("ApifyDataset")

reader = ApifyDataset("<Your Apify API token>")
documents = reader.load_data(
    dataset_id="<Apify Dataset ID>",
    dataset_mapping_function=transform_dataset_item,
)
```
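
To see what the mapping function does without calling the Apify API, you can run it on a couple of hand-written records. The following is a minimal sketch under stated assumptions: `SimpleDocument` is a hypothetical stand-in for `DocumentNode` so that the example runs on its own, the sample items are made up, and only the `text` and `url` fields used in the example above are assumed to exist. This variant also falls back to an empty string when a record has no `text`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for DocumentNode, used only in this sketch."""
    text: str
    extra_info: Dict[str, Any] = field(default_factory=dict)


def transform_dataset_item(item: Dict[str, Any]) -> SimpleDocument:
    # Fall back to an empty string if the record has no (or empty) "text"
    return SimpleDocument(
        text=item.get("text") or "",
        extra_info={"url": item.get("url")},
    )


# Made-up records shaped like the fields used above
sample_items = [
    {"url": "https://example.com/docs", "text": "Getting started guide."},
    {"url": "https://example.com/empty"},  # record without a "text" field
]

documents = [transform_dataset_item(item) for item in sample_items]
for doc in documents:
    print(doc.extra_info["url"], repr(doc.text))
```

The loader applies the mapping function to each dataset item in the same way, so any per-record cleanup or filtering of fields belongs in this function.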
