AI21 Labs Tokenizer

A SentencePiece-based tokenizer for production use with AI21's models.


Installation

pip

pip install ai21-tokenizer

poetry

poetry add ai21-tokenizer

Usage

Tokenizer Creation

from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()
# Your code here

Alternatively, you can instantiate the Jurassic tokenizer directly:

from ai21_tokenizer import JurassicTokenizer

model_path = "<Path to your vocab file. This is usually a binary file that ends with .model>"
config = {}  # Dictionary with the contents of your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
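The config dictionary above is described as the contents of your config.json file. A minimal sketch of loading it with the standard library, assuming the file is plain JSON; load_tokenizer_config and the paths in the usage comment are hypothetical names for illustration, not part of ai21_tokenizer:

```python
import json
from pathlib import Path


def load_tokenizer_config(config_path):
    """Read a config.json file and return its contents as a dict.

    Hypothetical helper for illustration; not part of ai21_tokenizer.
    """
    with Path(config_path).open(encoding="utf-8") as f:
        config = json.load(f)
    # JurassicTokenizer is given a dictionary, so reject other JSON values early.
    if not isinstance(config, dict):
        raise ValueError(f"expected a JSON object in {config_path}")
    return config


# Usage (requires the ai21-tokenizer package and your own files):
# from ai21_tokenizer import JurassicTokenizer
# config = load_tokenizer_config("path/to/config.json")
# tokenizer = JurassicTokenizer(model_path="path/to/vocab.model", config=config)
```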

Functions

Encode and Decode

These functions encode text into a list of token ids and decode the ids back into plain text.

text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
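A common use of encode and decode is counting tokens and trimming text to a token budget. The sketch below relies only on the encode and decode calls shown above. The helper names and the WhitespaceTokenizer stand-in are hypothetical; the stand-in exists only so the example runs without the package installed, and it splits on whitespace rather than using SentencePiece.

```python
def count_tokens(tokenizer, text):
    """Return the number of token ids the tokenizer produces for text."""
    return len(tokenizer.encode(text))


def truncate_to_token_budget(tokenizer, text, max_tokens):
    """Return text unchanged if it fits, else decode its first max_tokens ids."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])


class WhitespaceTokenizer:
    """Toy stand-in with encode/decode; NOT the real AI21 tokenizer."""

    def __init__(self):
        self._vocab = {}   # word -> id
        self._tokens = []  # id -> word

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self._vocab:
                self._vocab[word] = len(self._tokens)
                self._tokens.append(word)
            ids.append(self._vocab[word])
        return ids

    def decode(self, ids):
        return " ".join(self._tokens[i] for i in ids)


toy = WhitespaceTokenizer()
print(count_tokens(toy, "apple orange banana"))                 # 3
print(truncate_to_token_budget(toy, "apple orange banana", 2))  # apple orange
```

With the real package installed, passing the tokenizer created above in place of toy should work the same way, though token counts will differ because real tokens are subword pieces rather than whole words.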

You can also convert token ids to their tokens, and tokens back to ids:

tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = tokenizer.convert_tokens_to_ids(tokens)
print(f"Tokens correspond to IDs: {ids}")
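To inspect how a string was split, it can help to pair each id with its token and to confirm that the two conversions are inverses. The sketch below uses only the encode, convert_ids_to_tokens, and convert_tokens_to_ids calls shown above. The helper names and the FixedVocabTokenizer stand-in are hypothetical and exist only so the example runs without the package installed.

```python
def token_table(tokenizer, text):
    """Return a list of (id, token) pairs for text."""
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(zip(ids, tokens))


def ids_round_trip(tokenizer, text):
    """Check that ids -> tokens -> ids returns the original ids."""
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return tokenizer.convert_tokens_to_ids(tokens) == ids


class FixedVocabTokenizer:
    """Toy stand-in with a three-word vocabulary; NOT the real AI21 tokenizer."""

    _vocab = {"apple": 0, "orange": 1, "banana": 2}
    _tokens = ["apple", "orange", "banana"]

    def encode(self, text):
        return [self._vocab[word] for word in text.split()]

    def convert_ids_to_tokens(self, ids):
        return [self._tokens[i] for i in ids]

    def convert_tokens_to_ids(self, tokens):
        return [self._vocab[t] for t in tokens]


toy = FixedVocabTokenizer()
print(token_table(toy, "apple orange banana"))
# [(0, 'apple'), (1, 'orange'), (2, 'banana')]
print(ids_round_trip(toy, "banana apple"))  # True
```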

For more examples, please see our examples folder.
