AI21 Labs Tokenizer

A SentencePiece-based tokenizer for production use

Installation

pip

pip install ai21-tokenizer

poetry

poetry add ai21-tokenizer

Usage

Tokenizer Creation

from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()
# Your code here

Alternatively, you can instantiate the Jurassic tokenizer directly:

from ai21_tokenizer import JurassicTokenizer

model_path = "<Path to your vocab file. This is usually a binary file that ends with .model>"
config = {}  # dictionary object of your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
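
Since `config` is described as the dictionary form of your config.json file, one way to obtain it is to read the file with the standard `json` module. The following is a minimal sketch, not part of the package: the helper names `load_tokenizer_config` and `build_jurassic_tokenizer` are hypothetical, and the keys expected inside config.json are not specified here.

```python
import json
from pathlib import Path


def load_tokenizer_config(config_path):
    """Read a config.json file and return its contents as a dict."""
    with Path(config_path).open(encoding="utf-8") as f:
        config = json.load(f)
    # The tokenizer expects a dictionary, so reject any other JSON value early.
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} must contain a JSON object")
    return config


def build_jurassic_tokenizer(model_path, config_path):
    """Hypothetical helper: build a JurassicTokenizer from a .model file and a config.json file."""
    # Imported lazily so load_tokenizer_config can be used without the package installed.
    from ai21_tokenizer import JurassicTokenizer

    return JurassicTokenizer(
        model_path=model_path,
        config=load_tokenizer_config(config_path),
    )
```

Call `build_jurassic_tokenizer` with the paths to your own .model and config.json files.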

Functions

Encode and Decode

These functions let you encode text into a list of token IDs and decode the IDs back to plain text.

text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
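
The encode/decode round trip above can be wrapped in a small helper that works with any object exposing `encode` and `decode`. In the sketch below, `WhitespaceTokenizer` is a toy stand-in so the example runs without the package installed. It is not how ai21-tokenizer splits text, and `round_trip` is a hypothetical helper name.

```python
class WhitespaceTokenizer:
    """Toy stand-in exposing encode/decode; NOT the ai21_tokenizer implementation."""

    def __init__(self):
        self._token_to_id = {}
        self._id_to_token = []

    def encode(self, text):
        ids = []
        for token in text.split():
            # Assign the next free id the first time a token is seen.
            if token not in self._token_to_id:
                self._token_to_id[token] = len(self._id_to_token)
                self._id_to_token.append(token)
            ids.append(self._token_to_id[token])
        return ids

    def decode(self, ids):
        return " ".join(self._id_to_token[i] for i in ids)


def round_trip(tokenizer, text):
    """Encode text, decode the resulting ids, and return (ids, decoded_text)."""
    ids = tokenizer.encode(text)
    return ids, tokenizer.decode(ids)


ids, decoded = round_trip(WhitespaceTokenizer(), "apple orange banana")
print(ids)      # [0, 1, 2]
print(decoded)  # apple orange banana
```

With the package installed, you would pass the object returned by `Tokenizer.get_tokenizer()` instead of the stand-in. Whether the decoded text matches the input character for character depends on the tokenizer's normalization.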

You can also convert token IDs to their token strings and back:

tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = tokenizer.convert_tokens_to_ids(tokens)
print(f"Tokens correspond to IDs: {ids}")
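
When inspecting how a string was split, it can help to see each ID next to its token. The sketch below pairs them using only `encode` and `convert_ids_to_tokens`. `FixedVocabTokenizer` is a toy stand-in with a hand-written vocabulary so the example runs on its own, and `pair_ids_with_tokens` is a hypothetical helper, not part of the package.

```python
class FixedVocabTokenizer:
    """Toy stand-in with a fixed vocabulary; NOT the ai21_tokenizer implementation."""

    def __init__(self, vocab):
        self._tokens = list(vocab)
        self._ids = {token: i for i, token in enumerate(self._tokens)}

    def encode(self, text):
        return [self._ids[word] for word in text.split()]

    def convert_ids_to_tokens(self, ids):
        return [self._tokens[i] for i in ids]

    def convert_tokens_to_ids(self, tokens):
        return [self._ids[token] for token in tokens]


def pair_ids_with_tokens(tokenizer, text):
    """Return a list of (id, token) pairs for the given text."""
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(zip(ids, tokens))


toy = FixedVocabTokenizer(["apple", "orange", "banana"])
pairs = pair_ids_with_tokens(toy, "banana apple")
print(pairs)  # [(2, 'banana'), (0, 'apple')]
```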

For more examples, please see our examples folder.

