
AI21 Labs Tokenizer

A SentencePiece-based tokenizer for production use



Installation

pip

pip install ai21-tokenizer

poetry

poetry add ai21-tokenizer

Usage

Tokenizer Creation

from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()
# Your code here

Alternatively, you can instantiate the Jurassic tokenizer directly:

from ai21_tokenizer import JurassicTokenizer

model_path = "<Path to your vocab file. This is usually a binary file that ends with .model>"
config = {}  # dictionary object of your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
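The `config` argument is the plain dict loaded from the model's `config.json`. A minimal sketch of reading it from disk; the file contents and keys below are illustrative, not the library's required schema:

```python
import json
import tempfile
from pathlib import Path

# Illustrative config contents; real keys come from your model's config.json.
sample = {"vocab_size": 256000, "add_dummy_prefix": True}

with tempfile.TemporaryDirectory() as d:
    config_path = Path(d) / "config.json"
    config_path.write_text(json.dumps(sample))

    # Load the JSON file into the plain dict that JurassicTokenizer expects.
    config = json.loads(config_path.read_text())

print(config["vocab_size"])  # → 256000
```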

Functions

Encode and Decode

These functions let you encode text into a list of token ids and decode it back to plaintext.

text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
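The contract here is a round trip: decoding the encoded ids reproduces the original text. As a stand-in for the real SentencePiece model, that contract can be sketched with a toy whitespace tokenizer (the class below is purely illustrative and not part of ai21_tokenizer):

```python
class ToyTokenizer:
    """Illustrative whitespace tokenizer with the same encode/decode shape."""

    def __init__(self, vocab):
        self.vocab = vocab                               # token -> id
        self.inverse = {i: t for t, i in vocab.items()}  # id -> token

    def encode(self, text):
        return [self.vocab[token] for token in text.split()]

    def decode(self, ids):
        return " ".join(self.inverse[i] for i in ids)


tok = ToyTokenizer({"apple": 0, "orange": 1, "banana": 2})
encoded = tok.encode("apple orange banana")
print(encoded)                                   # [0, 1, 2]
print(tok.decode(encoded))                       # apple orange banana
```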

To convert ids to tokens, or vice versa:

tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"Tokens corresponding to ids: {tokens}")

ids = tokenizer.convert_tokens_to_ids(tokens)
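These conversions are plain vocabulary lookups in the two directions. A minimal self-contained sketch; the vocabulary fragment and its ids are made up (real ids come from the `.model` file):

```python
# Hypothetical vocabulary fragment; "▁" is SentencePiece's word-boundary marker.
vocab = {"▁apple": 10, "▁orange": 11, "▁banana": 12}
inverse = {i: t for t, i in vocab.items()}

def convert_tokens_to_ids(tokens):
    return [vocab[t] for t in tokens]

def convert_ids_to_tokens(ids):
    return [inverse[i] for i in ids]

ids = convert_tokens_to_ids(["▁apple", "▁banana"])
print(ids)                          # [10, 12]
print(convert_ids_to_tokens(ids))   # ['▁apple', '▁banana']
```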

For more examples, please see our examples folder.



Download files

Download the file for your platform.

Source Distribution

ai21_tokenizer-0.3.11.tar.gz (2.6 MB)

Uploaded Source

Built Distribution


ai21_tokenizer-0.3.11-py3-none-any.whl (2.7 MB)

Uploaded Python 3

File details

Details for the file ai21_tokenizer-0.3.11.tar.gz.

File metadata

  • Download URL: ai21_tokenizer-0.3.11.tar.gz
  • Upload date:
  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for ai21_tokenizer-0.3.11.tar.gz
  • SHA256: ec11ce4e46d24f71f1c2756ad0de34e0adfd51b5bcd81b544aea13d6935ec905
  • MD5: 6e88fcb97268661c1e5dcc0a060f3e79
  • BLAKE2b-256: 96257ebf855efdc7643e0fc131055a0899318cd069c6d805c04ea8fef96b1259


File details

Details for the file ai21_tokenizer-0.3.11-py3-none-any.whl.

File metadata

File hashes

Hashes for ai21_tokenizer-0.3.11-py3-none-any.whl
  • SHA256: 80d332c51cab3fa88f0fea7493240a6a5bc38fd24a3d0806d28731d8fc97691f
  • MD5: 7a720f88ab46f602dbd8bf55c6d9f9c8
  • BLAKE2b-256: 65103796cca35f777b04eb4a7f603da50f808068dfe8be6b5e45459c2e2edd27

