Metadata-Version: 2.1
Name: docarray
Version: 0.1.0.dev5
Summary: A lightweight data structure for unstructured data
Home-page: https://github.com/jina-ai/docarray
Author: Jina AI
Author-email: hello@jina.ai
License: Apache 2.0
Download-URL: https://github.com/jina-ai/docarray/tags
Project-URL: Documentation, https://docarray.jina.ai
Project-URL: Source, https://github.com/jina-ai/docarray/
Project-URL: Tracker, https://github.com/jina-ai/docarray/issues
Description: # docarray
        
        <!-- start elevator-pitch -->
        
        The data structure for unstructured data.
        
        🌌 **All data types**: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.
        
        🧑‍🔬 **Data science powerhouse**: easy-to-use functions for facilitating data scientists work on embedding, matching, visualizing, evaluating unstructured data via Torch/Tensorflow/ONNX/PaddlePaddle.
        
        🚡 **Portable**: ready to wire with efficient conversion from/to Protobuf, binary, JSON, CSV, dataframe.
        
        <!-- end elevator-pitch -->
        
        
        ## Install 
        
        Requires Python 3.7+ and `numpy`:
        ```
        pip install docarray
        ```
        To install full dependencies, please use `pip install docarray[full]`.
        
        ## [Documentation](https://docarray.jina.ai)
        
        ## Get Started
        
        Let's use DocArray and ResNet50 to build a meme image search on [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset). This dataset contains 6016 image-pairs stored in `/left` and `/right`. Images that shares the same filename are labeled as perceptually similar. For example, 
        
        | `/left` | `/right` | `/left` | `/right` |
        |---------|----------|---------|----------|
        |
        
        Our problem is given an image from `/left` and find its most-similar image in `/right` (without looking at the filename of course).
        
        ### Load images
        
        First load images and preprocess them with standard computer vision techniques:
        
        ```python
        from docarray import DocumentArray, Document
        
        left_da = DocumentArray.from_files('left/*.jpg')
        ```
        
        To get a feeling of the data you will handle, plot them in one sprite image:
        
        ```python
        left_da.plot_image_sprites()
        ```
        
        <p align="center">
        <a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/docarray/blob/master/.github/README-img/sprite.png?raw=true" alt="Load totally looks like dataset with docarray API" width="70%"></a>
        </p>
        
        ### Apply preprocessing
        
        Let's do some standard computer vision preprocessing:
        
        ```python
        def preproc(d: Document):
            return (d.load_uri_to_image_blob()  # load
                     .set_image_blob_normalization()  # normalize color 
                     .set_image_blob_channel_axis(-1, 0))  # switch color axis
        
        left_da.apply(preproc)
        ```
        
        Did I mention `apply` work in parallel?
        
        ### Embed images
        
        Now convert images into embeddings using a pretrained ResNet50:
        
        ```python
        import torchvision
        model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
        left_da.embed(model, device='cuda')  # embed via GPU to speedup
        ```
        
        ### Visualize embeddings
        
        You can visualize the embeddings via tSNE in an interactive embedding projector:
        
        ```python
        left_da.plot_embeddings()
        ```
        
        <p align="center">
        <a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/docarray/blob/master/.github/README-img/tsne.gif?raw=true" alt="Visualizing embedding via tSNE and embedding projector" width="90%"></a>
        </p>
        
        Fun is fun, but recall our goal is to match left images against right images and so far we have only handled the left. Let's repeat the same procedure for the right:
        
        ```python
        right_da = (DocumentArray.from_files('right/*.jpg')
                                 .apply(preproc)
                                 .embed(model, device='cuda'))
        ```
        
        ### Match nearest neighbours
        
        We can now match the left to the right and take the top-9 results.
        
        ```python
        left_da.match(right_da, limit=9)
        ```
        
        Let's inspect what's inside `left_da` now:
        
        ```python
        for d in left_da:
            for m in d.matches:
                print(d.uri, m.uri, m.scores['cosine'].value)
        ```
        
        ```text
        left/02262.jpg right/03459.jpg 0.21102
        left/02262.jpg right/02964.jpg 0.13871843
        left/02262.jpg right/02103.jpg 0.18265384
        left/02262.jpg right/04520.jpg 0.16477376
        ...
        ```
        
        Better see it.
        
        ```python
        (DocumentArray(left_da[12].matches, copy=True)
            .apply(lambda d: d.set_image_blob_channel_axis(0, -1)
                              .set_image_blob_inv_normalization())
            .plot_image_sprites('result.png'))
        ```
        
        <p align="center">
        <a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/docarray/blob/master/.github/README-img/9nn.png?raw=true" alt="Visualizing top-9 matches using DocArray API" width="50%"></a>
        </p>
        
        ### Quantitative evaluation
        
        
        
Keywords: jina cloud-native neural-search query search index elastic neural-network encoding embedding serving docker container image video audio deep-learning
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Unix Shell
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Provides-Extra: full
