Clarifai Python Data Utils

This is a collection of utilities for handling multimedia data, designed to be used together with the Clarifai Python SDK. It covers both visual use cases (loading and converting annotated image datasets) and textual use cases (partitioning, cleaning, and chunking documents before uploading them to the Clarifai Platform).

Installation

Install from PyPI:

pip install clarifai-datautils

Install from Source:

git clone https://github.com/Clarifai/clarifai-python-datautils
cd clarifai-python-datautils
python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt

Getting started

Quick intro to Image Annotation Conversion feature

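from clarifai_datautils import ImageAnnotations

# 'folder_path' and 'annotation_format' are placeholders: pass your dataset
# folder and a supported format name such as 'coco_detection'.
annotated_dataset = ImageAnnotations.import_from(path='folder_path', format='annotation_format')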

Features

Image Utils

  • Annotation Loader

    • Load annotated image datasets in various formats and export them to the Clarifai Platform
    • Convert between supported annotation formats
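
To make the idea of format conversion concrete, here is a minimal standalone sketch of one thing such a conversion involves: translating bounding boxes between the COCO convention ([x, y, width, height]) and the Pascal VOC convention ([xmin, ymin, xmax, ymax]). The function names are hypothetical and this is not how clarifai-datautils implements it; the library handles whole datasets through ImageAnnotations. The sketch also ignores pixel-indexing differences between tools.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def coco_to_voc_bbox(bbox: Box) -> Box:
    """Convert a COCO box (x, y, width, height) to VOC corners (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def voc_to_coco_bbox(bbox: Box) -> Box:
    """Convert VOC corners (xmin, ymin, xmax, ymax) back to a COCO box (x, y, width, height)."""
    xmin, ymin, xmax, ymax = bbox
    return (xmin, ymin, xmax - xmin, ymax - ymin)


if __name__ == "__main__":
    coco_box = (10, 20, 30, 40)
    voc_box = coco_to_voc_bbox(coco_box)
    print(voc_box)
    print(voc_to_coco_bbox(voc_box))
```

Converting to VOC and back should return the original box, which is a quick sanity check when comparing exported annotations.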

Data Ingestion Pipeline

  • Easy-to-use pipelines that load data from files and ingest it into the Clarifai Platform.
  • Load text files (PDF, DOC, etc.), transform and chunk them, and upload the result to the Clarifai Platform.
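
The transform and chunk steps listed above can be pictured with a small plain-Python sketch: collapse extra whitespace, then split the text into chunks that stay under a character limit. Both functions are hypothetical illustrations written for this README, not the library's implementation; the real pipeline uses transformations such as PDFPartition and Clean_extra_whitespace, and its chunking strategy may differ from this word-based one.

```python
import re
from typing import List


def clean_extra_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_max_characters(text: str, max_characters: int = 1024) -> List[str]:
    """Greedily pack whole words into chunks of at most max_characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is split hard.
        while len(word) > max_characters:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_characters])
            word = word[max_characters:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_characters:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    raw = "  Hello   world \n\n this is   a test "
    cleaned = clean_extra_whitespace(raw)
    for chunk in chunk_by_max_characters(cleaned, max_characters=11):
        print(repr(chunk))
```

Cleaning before chunking keeps stray whitespace from counting against the character limit.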

Usage

Image Annotation Loader

from clarifai.client.dataset import Dataset
from clarifai_datautils import ImageAnnotations

# Import a COCO detection dataset from a folder
coco_dataset = ImageAnnotations.import_from(path='folder_path', format='coco_detection')

# Upload to the Clarifai Platform with the Clarifai SDK.
# Set your personal access token (PAT) as an environment variable first:
#   export CLARIFAI_PAT={your personal access token}
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")
dataset.upload_dataset(dataloader=coco_dataset.dataloader)

# Show information about the loaded dataset
coco_dataset.get_info()

# Export to another supported format
coco_dataset.export_to('voc_detection')
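
The upload step relies on the CLARIFAI_PAT environment variable being set. As a minimal sketch, a script could check for it up front and fail with a clear message instead of partway through an upload. The helper name require_clarifai_pat is hypothetical and is not part of the Clarifai SDK or of clarifai-datautils.

```python
import os
from typing import Mapping, Optional


def require_clarifai_pat(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the PAT from the environment, or raise if it is missing or blank."""
    env = os.environ if environ is None else environ
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        raise RuntimeError(
            "CLARIFAI_PAT is not set; export it before uploading a dataset."
        )
    return pat


if __name__ == "__main__":
    try:
        require_clarifai_pat()
        print("CLARIFAI_PAT found")
    except RuntimeError as err:
        print(err)
```

Passing a mapping explicitly, instead of reading os.environ directly, makes the check easy to exercise in tests.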

Data Ingestion Pipelines

Setup

To use the Data Ingestion Pipeline, first install the additional dependencies:

pip install -r requirements-dev.txt

Then define a pipeline and upload its output:

from clarifai.client import Dataset
from clarifai_datautils.text import Pipeline, PDFPartition
from clarifai_datautils.text.pipeline.cleaners import Clean_extra_whitespace

# Define the pipeline: partition PDFs into chunks by title, then clean whitespace
pipeline = Pipeline(
    name='pipeline-1',
    transformations=[
        PDFPartition(chunking_strategy="by_title", max_characters=1024),
        Clean_extra_whitespace(),
    ],
)

# Upload with the Clarifai SDK.
# dataset_url and file_path are placeholders: set them to your dataset's URL
# and to the file(s) you want to ingest.
dataset = Dataset(dataset_url)
dataset.upload_dataset(pipeline.run(files=file_path, loader=True))

More Examples

See many more code examples in this repo.
