Load any mixture of text to text data in one line of code

Unitxt is a Python library for getting data ready to use. In one line of code, it prepares a dataset, or a mixture of datasets, in an input-output format for training and evaluation. We aspire to be simple, adaptable and transparent.

Unitxt is built on separation of concerns. Separation lets you add a dataset without knowing anything about the models that will use it. It also lets you train without worrying about preprocessing, switch models without changing how the data is loaded, and change formats (instruction, ICL, etc.) without changing anything else.
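As a toy illustration of this separation (plain Python, not Unitxt's actual API; every function name and the `source`/`target` keys here are hypothetical), the dataset step and the format step can be written so that neither knows about the other:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def load_toy_sentiment() -> List[Record]:
    """Hypothetical dataset step: map a dataset-specific schema to common fields."""
    raw = [
        {"review": "Great film", "stars": "5"},
        {"review": "Dull plot", "stars": "1"},
    ]
    return [
        {
            "text": row["review"],
            "label": "positive" if int(row["stars"]) >= 3 else "negative",
        }
        for row in raw
    ]


def plain_format(record: Record) -> Record:
    """Hypothetical format step: pass the text through unchanged."""
    return {"source": record["text"], "target": record["label"]}


def instruction_format(record: Record) -> Record:
    """Hypothetical format step: wrap the text in an instruction prompt."""
    prompt = (
        "Classify the sentiment of the text.\n"
        f"Text: {record['text']}\n"
        "Sentiment:"
    )
    return {"source": prompt, "target": record["label"]}


def prepare(records: List[Record], formatter: Callable[[Record], Record]) -> List[Record]:
    """Combine any dataset with any format; neither side depends on the other."""
    return [formatter(record) for record in records]


plain_examples = prepare(load_toy_sentiment(), plain_format)
instruction_examples = prepare(load_toy_sentiment(), instruction_format)
```

Switching from `plain_format` to `instruction_format` changes only the argument passed to `prepare`; the dataset function is untouched, which is the property the paragraph above describes.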

Where to start? 🦄

Why Unitxt? 🦄

🦄 Simplicity

Everything in Unitxt is simple and designed to feel natural and self-explanatory.

🦄 Adaptability

Adding new datasets, loading recipes, instructions and formatters is possible and encouraged!

🦄 Transparency

The resources and formatters of Unitxt are stored as shared datasets and can therefore easily be reviewed by the crowd. Moreover, when a dataset is assembled with Unitxt, it is clear to others what it contains.

Contributors

Please install Unitxt from source:

git clone git@github.com:IBM/unitxt.git
cd unitxt
pip install -e ".[dev]"
pre-commit install

Run Unitxt Exploration Dashboard

To launch the Unitxt graphical user interface, run:

unitxt-explore

Ensuring a Linear Git History

Configure your Git to maintain a linear history with these commands:

  1. Automatic Rebasing for Pulls:

    • Command: git config --global pull.rebase true
    • This sets git pull to rebase changes, keeping your history linear without unnecessary merge commits.
  2. Fast-Forward Merges Only:

    • Command: git config --global merge.ff only
    • This allows only fast-forward merges, preventing merge commits when branches diverge, to maintain a linear history.
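
To confirm that both settings took effect, a contributor could run a small check like the sketch below. It is not part of Unitxt: the helper names are hypothetical, it reads values with `git config --global --get`, and it compares them as literal strings, so an equivalent spelling such as `yes` for `pull.rebase` would be reported as a mismatch.

```python
import subprocess
from typing import Dict, Optional

# The two settings recommended above and the values they should have.
REQUIRED = {"pull.rebase": "true", "merge.ff": "only"}


def read_git_setting(key: str) -> Optional[str]:
    """Return the global Git value for `key`, or None if unset or Git is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return None
    return result.stdout.strip() or None


def missing_settings(current: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return the required settings whose current value differs from the expected one."""
    return {
        key: expected
        for key, expected in REQUIRED.items()
        if current.get(key) != expected
    }


if __name__ == "__main__":
    current = {key: read_git_setting(key) for key in REQUIRED}
    for key, expected in missing_settings(current).items():
        print(f"{key} is {current[key]!r}; run: git config --global {key} {expected}")
```

The comparison is kept in `missing_settings`, separate from the subprocess call, so it can be checked without touching a real Git configuration.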
