Metadata-Version: 2.3
Name: salesforce-data-customcode
Version: 0.1.2
Summary: Data Cloud Custom Code SDK
License: Apache-2.0
Requires-Python: >=3.10,<3.12
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: click (>=8.1.8,<9.0.0)
Requires-Dist: loguru (>=0.7.3,<0.8.0)
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pydantic (>=1.8.2,<3.0.0)
Requires-Dist: pyspark
Requires-Dist: pyyaml (>=6.0,<7.0)
Requires-Dist: salesforce-cdp-connector
Requires-Dist: setuptools_scm (>=7.1.0,<8.0.0)
Description-Content-Type: text/markdown

# Data Cloud Custom Code SDK

<img src="https://img.shields.io/badge/version-0.1.0-blue" alt="license">

This package provides a development kit for creating custom data transformations in [Data Cloud](https://www.salesforce.com/data/). It allows you to write your own data processing logic in Python while leveraging Data Cloud's infrastructure for data access and running data transformations, mapping execution into Data Cloud data structures like [Data Model Objects](https://help.salesforce.com/s/articleView?id=data.c360_a_data_model_objects.htm&type=5) and [Data Lake Objects](https://help.salesforce.com/s/articleView?id=sf.c360_a_data_lake_objects.htm&language=en_US&type=5).

More specifically, this codebase gives you ability to test code locally before pushing to Data Cloud's remote execution engine, greatly reducing how long it takes to develop.

Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_USE.md)

## Installation
The SDK can be downloaded directly from PyPI with `pip`:
```
pip install salesforce-data-customcode
```

You can verify it was properly installed via CLI:
```
datacustomcode version
```

## Development Setup
We offer two built-in development interfaces: `devcontainers` and Jupyter, but you can set up any tool you would like manually.

To get started, use the CLI to initialize a new development environment:
```
datacustomcode init [DIRECTORY TO DUMP NEW REPO]
```

This will yield all necessary files to get started:
```
.
├── Dockerfile
├── README.md
├── requirements.txt
├── requirements-dev.txt
├── payload
│   ├── config.json
│   ├── entrypoint.py
├── jupyterlab.sh
└── requirements.txt
```
* `Dockerfile` <span style="color:grey;font-style:italic;">(Do not update)</span> – Development container emulating the remote execution environment.
* `requirements-dev.txt` <span style="color:grey;font-style:italic;">(Do not update)</span> – These are the dependencies for the development environment.
* `jupyterlab.sh` <span style="color:grey;font-style:italic;">(Do not update)</span> – Helper script for setting up Jupyter.
* `requirements.txt` – Here you define the requirements that you will need remotely
* `payload` – This folder will be compressed and deployed to the remote execution environment.
  * `config.json` – This config defines permissions on the back and can be generated programmatically with `scan` CLI method.
  * `entrypoint.py` – The script that defines the data transformation logic.

## API

You entry point script will define logic using the `Client` object which wraps data access layers.

You should only need the following methods:
* `read_dlo(name)` – Read from a Data Lake Object by name
* `read_dmo(name)` – Read from a Data Model Object by name
* `write_to_dlo(name, spark_dataframe, write_mode)` – Write to a Data Model Object by name with a Spark dataframe
* `write_to_dmo(name, spark_dataframe, write_mode)` – Write to a Data Lake Object by name with a Spark dataframe

For example:
```
from datacustomcode import Client

client = Client()

sdf = client.read_dlo('my_DLO')
# some transformations
# ...
client.write_to_dlo('output_DLO')
```


> [!WARNING]
> Currently we only support reading from DMOs and writing to DMOs or reading from DLOs and writing to DLOs, but they cannot mix.


## CLI

The Data Cloud Custom Code SDK provides a command-line interface (CLI) with the following commands:

### Global Options
- `--debug`: Enable debug-level logging

### Commands

#### `datacustomcode version`
Display the current version of the package.

#### `datacustomcode configure`
Configure credentials for connecting to Data Cloud.

Options:
- `--profile TEXT`: Credential profile name (default: "default")
- `--username TEXT`: Salesforce username
- `--password TEXT`: Salesforce password
- `--client-id TEXT`: Connected App Client ID
- `--client-secret TEXT`: Connected App Client Secret
- `--login-url TEXT`: Salesforce login URL

#### `datacustomcode deploy`
Deploy a transformation job to Data Cloud.

Options:
- `--profile TEXT`: Credential profile name (default: "default")
- `--path TEXT`: Path to the code directory (default: ".")
- `--name TEXT`: Name of the transformation job [required]
- `--version TEXT`: Version of the transformation job (default: "0.0.1")
- `--description TEXT`: Description of the transformation job (default: "")

#### `datacustomcode init`
Initialize a new development environment with a template.

Argument:
- `DIRECTORY`: Directory to create project in (default: ".")

#### `datacustomcode scan`
Scan a Python file to generate a Data Cloud configuration.

Argument:
- `FILENAME`: Python file to scan

Options:
- `--config TEXT`: Path to save the configuration file (default: same directory as FILENAME)
- `--dry-run`: Preview the configuration without saving to a file

#### `datacustomcode run`
Run an entrypoint file locally for testing.

Argument:
- `ENTRYPOINT`: Path to entrypoint Python file

Options:
- `--config-file TEXT`: Path to configuration file
- `--dependencies TEXT`: Additional dependencies (can be specified multiple times)

