Metadata-Version: 2.1
Name: ujeebu-python
Version: 0.1.4b0
Summary: Ujeebu Python SDK to interact with Ujeebu API
Home-page: https://github.com/ujeebu/ujeebu-python
Author: Ujeebu
Author-email: y.alhyane@gmail.com
License: GNU General Public License v3
Keywords: ujeebu_python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: requests

# Ujeebu API Python SDK
[Ujeebu](https://ujeebu.com) is a set of powerful APIs for web data scraping and automatic content extraction. This SDK provides an easy-to-use interface for interacting with the Ujeebu API. It is built in Python and uses the [requests](https://requests.readthedocs.io/) library to make HTTP requests.

## Installation
You can install the SDK using pip:
```shell
pip install ujeebu-python
```


## Usage
To use the SDK, you first need to create an instance of it with your API credentials:
```python
from ujeebu_python import UjeebuClient
import json
ujeebu = UjeebuClient(api_key="__YOUR-API-KEY__")
url = "https://ujeebu.com/blog/scraping-javascript-heavy-pages-using-puppeteer/"
response = ujeebu.extract(url=url)
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['article'], indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
```

## APIs
The SDK provides the following methods:

### Core Methods

- `scrape(url, params, headers)`
    - `url`: The URL to scrape (required).
    - `params`: Dict of [scrape API params](https://ujeebu.com/docs/scrape#parameters) (optional).
    - `headers`: Dict of headers to forward (optional).


- `extract(url, params, headers)`
    - `url`: The URL to extract (required).
    - `params`: Dict of [extract API params](https://ujeebu.com/docs/extract#parameters) (optional).
    - `headers`: Dict of headers to forward (optional).


- `preview(url, params, headers)`
    - `url`: The URL to preview (required).
    - `params`: Dict of [preview API params](https://ujeebu.com/docs/extract#parameters-1) (optional).
    - `headers`: Dict of headers to forward (optional).

- `serp(params, headers)`
    - `params`: Dict of [SERP API params](https://ujeebu.com/docs/serp#parameters) (optional).
    - `headers`: Dict of headers to forward (optional).

- `account()`
    - Returns account information including usage, balance, and plan details.
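Among these, `preview` is the only method without a dedicated example further down; it follows the same call/response pattern as `extract`. A minimal sketch, guarded by an environment variable so it only calls the API when a key is configured (the response fields depend on the preview API and are not shown here):

```python
import json
import os

url = "https://ujeebu.com/blog/scraping-javascript-heavy-pages-using-puppeteer/"
params = {"js": True}

# only hit the API when a key is configured
api_key = os.environ.get("UJEEBU_API_KEY")
if api_key:
    from ujeebu_python import UjeebuClient

    ujeebu = UjeebuClient(api_key=api_key)
    response = ujeebu.preview(url=url, params=params)
    if response.status_code == 200:
        print(json.dumps(response.json(), indent=2))
    else:
        print("Error:\n", json.dumps(response.json(), indent=2))
```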

### Scrape Helper Methods

- `get_pdf(url, params, headers)`
    - Gets a PDF of a web page using the Scrape API.
    - `url`: The URL to create a PDF from (required).
    - `params`: Additional parameters for the PDF generation (optional).
    - `headers`: Headers to forward to the request (optional).
    - Automatically sets `response_type` to 'pdf' and `json` to True.

- `get_screenshot(url, params, headers)`
    - Gets a screenshot of a web page using the Scrape API.
    - `url`: The URL to take a screenshot of (required).
    - `params`: Additional parameters for the screenshot (optional).
    - `headers`: Headers to forward to the request (optional).
    - Automatically sets `response_type` to 'screenshot' and `json` to True.

- `get_html(url, params, headers)`
    - Gets the HTML of a web page using the Scrape API.
    - `url`: The URL to get HTML from (required).
    - `params`: Additional parameters for the request (optional).
    - `headers`: Headers to forward to the request (optional).
    - Automatically sets `response_type` to 'html' and `json` to True.

- `scrape_with_rules(url, extract_rules, params, headers)`
    - Extracts data from a web page using extraction rules with the Scrape API.
    - `url`: The URL to extract data from (required).
    - `extract_rules`: The rules to extract data with (required).
    - `params`: Additional parameters for the extraction (optional).
    - `headers`: Headers to forward to the request (optional).
    - Automatically sets `json` to True.

### SERP Helper Methods

- `search_text(search, params)`
    - Performs a Google text search using the SERP API.
    - `search`: The search query to perform on Google (required).
    - `params`: Additional parameters for the search (optional).

- `search_news(search, params)`
    - Performs a Google news search using the SERP API.
    - `search`: The search query to perform on Google News (required).
    - `params`: Additional parameters for the search (optional).

- `search_images(search, params)`
    - Performs a Google images search using the SERP API.
    - `search`: The search query to perform on Google Images (required).
    - `params`: Additional parameters for the search (optional).

- `search_videos(search, params)`
    - Performs a Google videos search using the SERP API.
    - `search`: The search query to perform on Google Videos (required).
    - `params`: Additional parameters for the search (optional).

- `search_maps(search, params)`
    - Performs a Google Maps search using the SERP API.
    - `search`: The search query to perform on Google Maps (required).
    - `params`: Additional parameters for the search (optional).


#### Examples
- Example: scrape the HTML of a page with infinite scroll
```python
url = "https://scrape.li/load-more"
response = ujeebu.scrape(url, params={
    # define the response type: 'raw', 'html', 'screenshot' or 'pdf'
    "response_type": "html",

    # return the response in JSON format
    "json": False,
    # user-agent header to forward
    "useragent": "Ujeebu-Python",

    # cookies to forward
    "cookies": {"Cookie1": "Cookie Value"},

    # execute js
    "js": True,

    # wait for selector or time in ms
    "wait_for": ".products-list",

    # if the selector doesn't appear in 5000ms, ignore and continue
    "wait_for_timeout": 5000,

    # scroll the page down
    "scroll_down": True,

    # wait 2000ms between two scrolls
    "scroll_wait": 2000,

    # scroll to this element in each scroll
    "scroll_to_selector": ".load-more-section",

    # scroll condition: scrolling continues while this returns true
    "scroll_callback": "() => (document.querySelector('.no-more-products') === null)",

    "proxy_type": "premium",

    # proxy country code
    "proxy_country": "US",

    # device type "desktop" or "mobile"
    "device": "desktop",

    "window_width": 1200,

    "window_height": 900,

    "block_ads": True,

    "block_resources": True,
}, headers={
    # forwarded headers
    "Authorization": "Basic eWSjaW5lnlhY4luZUdxMDE2"
})
if response.status_code == 200:
    print(response.content)
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
```

- Example: take a screenshot of a URL
```python
url = "https://scrape.li/load-more"

response = ujeebu.scrape(url, {
    "response_type": "screenshot",
    "screenshot_fullpage": True,
    "js": True,
    # CSS selector to screenshot or coordinates of the rect to screenshot
    # screenshot_partial: {
    #     x: 0,
    #     y: 0,
    # },
    # if json is set to True, the screenshot is returned base64-encoded
    "json": False,
    "wait_for": 4000,
    "block_ads": True,
})
if response.status_code == 200:
    from pathlib import Path
    Path('screenshot.png').write_bytes(response.content)
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
```


- Example: extract a list of products from a page
```python
url = "https://example.com/products"

response = ujeebu.scrape(url=url, params={
    "wait_for": 5000,
    "block_resources": False,
    "js": True,
    "extract_rules": {
        "products": {
            "selector": ".product-card",
            "type": "obj",
            "multiple": True,
            "children": {
                "name": {
                    "selector": ".title",
                    "type": "text"
                },
                "description": {
                    "selector": ".description",
                    "type": "text"
                },
                "price": {
                    "selector": ".price",
                    "type": "text"
                },
                "image": {
                    "selector": ".card__image > img",
                    "type": "image",
                }
            }
        }
    }
})
if response.status_code == 200:
    print(json.dumps(response.json(), indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
```
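The `extract_rules` above form a recursive schema: an `obj` rule nests further rules under `children`, and `multiple` turns the value into a list. A quick way to see the shape of the JSON such rules produce is to walk them (a hypothetical helper for illustration, not part of the SDK):

```python
def rule_shape(rules):
    """Return a skeleton of the JSON the given extract_rules would yield."""
    shape = {}
    for name, rule in rules.items():
        if rule.get("type") == "obj":
            # an obj rule yields a dict built from its children rules
            inner = rule_shape(rule.get("children", {}))
        else:
            # leaf rules (text, image, ...) yield a single value
            inner = rule.get("type", "text")
        # multiple wraps the value in a list
        shape[name] = [inner] if rule.get("multiple") else inner
    return shape

rules = {
    "products": {
        "selector": ".product-card",
        "type": "obj",
        "multiple": True,
        "children": {
            "name": {"selector": ".title", "type": "text"},
            "price": {"selector": ".price", "type": "text"},
        },
    }
}
print(rule_shape(rules))
# → {'products': [{'name': 'text', 'price': 'text'}]}
```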

- Example: extract the main body of an article
```python
url = "https://thenextweb.com/news/european-space-agency-unveils-new-plan-for-growing-plants-on-the-moon"

response = ujeebu.extract(url=url, params={
    "js": True
})
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['article'], indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))

```

### Helper Functions Examples

#### Using Scrape Helper Methods

- Get PDF using helper function:
```python
from ujeebu_python import UjeebuClient

ujeebu = UjeebuClient(api_key="__YOUR-API-KEY__")

# Get PDF
response = ujeebu.get_pdf(
    "https://ujeebu.com/blog/scraping-javascript-heavy-pages-using-puppeteer/"
)
if response.status_code == 200:
    result = response.json()
    # PDF is base64 encoded
    print(result['pdf'][:100])
```
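Since `get_pdf` sets `json` to True, the PDF bytes come back base64-encoded under the `pdf` field (as in the example above), so saving the document is a single `base64.b64decode` call. A sketch with a stand-in string in place of the real response:

```python
import base64

# stand-in for: pdf_b64 = response.json()['pdf']
pdf_b64 = base64.b64encode(b"%PDF-1.4 sample document").decode("ascii")

# decode the base64 payload back to raw PDF bytes and save it
pdf_bytes = base64.b64decode(pdf_b64)
with open("page.pdf", "wb") as f:
    f.write(pdf_bytes)
```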

- Get screenshot using helper function:
```python
# Take a full page screenshot
response = ujeebu.get_screenshot(
    "https://ujeebu.com",
    params={
        "screenshot_fullpage": True
    }
)
if response.status_code == 200:
    result = response.json()
    # Screenshot is base64 encoded
    print(result['screenshot'][:100])
```

- Get HTML using helper function:
```python
# Get HTML with JavaScript execution
response = ujeebu.get_html(
    "https://ujeebu.com",
    params={
        "js": True,
        "wait_for": 2000
    }
)
if response.status_code == 200:
    result = response.json()
    print(result['html'][:100])
```

- Scrape with extraction rules using helper function:
```python
# Extract product data using helper function
extract_rules = {
    "products": {
        "selector": ".product-card",
        "type": "obj",
        "multiple": True,
        "children": {
            "name": {"selector": ".title", "type": "text"},
            "price": {"selector": ".price", "type": "text"}
        }
    }
}

response = ujeebu.scrape_with_rules(
    "https://example.com/products",
    extract_rules=extract_rules,
    params={"js": True, "wait_for": 3000}
)
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['result'], indent=2))
```

#### Using SERP Helper Methods

- Google text search:
```python
# Perform a text search
response = ujeebu.search_text(
    "Nikola Tesla",
    params={"results_count": 10, "lang": "en"}
)
if response.status_code == 200:
    result = response.json()
    for item in result['organic_results']:
        print(f"{item['title']}: {item['link']}")
```

- Google news search:
```python
# Search for news articles
response = ujeebu.search_news(
    "Donald Trump",
    params={"results_count": 20}
)
if response.status_code == 200:
    result = response.json()
    for news in result['news']:
        print(f"{news['title']}: {news['link']}")
```

- Google images search:
```python
# Search for images
response = ujeebu.search_images(
    "Coffee",
    params={"results_count": 10}
)
if response.status_code == 200:
    result = response.json()
    for image in result['images']:
        print(f"{image['title']}: {image['image']}")
```

- Google videos search:
```python
# Search for videos
response = ujeebu.search_videos(
    "Bitcoin",
    params={"results_count": 10}
)
if response.status_code == 200:
    result = response.json()
    for video in result['videos']:
        print(f"{video['title']}: {video['url']}")
```

- Google Maps search:
```python
# Search for places on Google Maps
response = ujeebu.search_maps(
    "Italian restaurant",
    params={"results_count": 10, "location": "ca"}
)
if response.status_code == 200:
    result = response.json()
    for place in result['maps_results']:
        print(f"{place['title']} - Rating: {place['rating']}")
```

#### Using Account Method

```python
# Get account information
response = ujeebu.account()
if response.status_code == 200:
    account_info = response.json()
    print(f"Plan: {account_info['plan']}")
    print(f"Used: {account_info['used']} / {account_info['quota']}")
    print(f"Used Percent: {account_info['used_percent']}%")
```


## Contributing
Contributions are welcome! If you find a bug or have a feature request, please open an issue or submit a pull request.

## License
This library is licensed under the GNU General Public License v3 (GPLv3). See the LICENSE file for more information.

## Credits
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [`audreyr/cookiecutter-pypackage`](https://github.com/audreyr/cookiecutter-pypackage) project template.
