AI21 Labs Tokenizer
A SentencePiece-based tokenizer for production use.
Installation
pip
pip install ai21-tokenizer
poetry
poetry add ai21-tokenizer
Usage
Tokenizer Creation
from ai21_tokenizer import Tokenizer
tokenizer = Tokenizer.get_tokenizer()
# Your code here
Alternatively, you can instantiate our Jurassic tokenizer directly:
from ai21_tokenizer import JurassicTokenizer
model_path = "<Path to your vocab file. This is usually a binary file that ends with .model>"
config = {}  # dictionary loaded from your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
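The `config` argument above is described as the dictionary form of your `config.json` file. A minimal sketch of loading it with the standard library follows. The helper name `load_tokenizer_config` and the file paths are hypothetical, not part of `ai21_tokenizer`, and the `JurassicTokenizer` call is left as a comment because it needs your own model and config files.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_tokenizer_config(config_path: str) -> Dict[str, Any]:
    """Read a config.json file and return its contents as a dict.

    Hypothetical helper for illustration; not part of ai21_tokenizer.
    """
    with Path(config_path).open(encoding="utf-8") as f:
        config = json.load(f)
    # The tokenizer expects a dictionary, so reject any other JSON value.
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} must contain a JSON object")
    return config


# Intended use (placeholder paths; requires your own files):
# from ai21_tokenizer import JurassicTokenizer
# config = load_tokenizer_config("path/to/config.json")
# tokenizer = JurassicTokenizer(model_path="path/to/vocab.model", config=config)
```

Failing early on a non-object JSON file keeps a malformed config from surfacing later as a harder-to-read error inside the tokenizer.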
Functions
Encode and Decode
These functions let you encode text into a list of token IDs and decode those IDs back into plain text.
text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")
decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
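A common use of `encode` and `decode` is keeping text within a token budget. The sketch below relies only on the two methods shown above. The helper names `count_tokens` and `truncate_to_budget` are hypothetical, and it assumes `encode` returns a plain list of IDs that `decode` accepts back.

```python
from typing import List, Protocol


class SupportsEncodeDecode(Protocol):
    """Any object exposing the encode/decode methods shown above."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, token_ids: List[int]) -> str: ...


def count_tokens(tokenizer: SupportsEncodeDecode, text: str) -> int:
    """Return how many token IDs the tokenizer produces for text."""
    return len(tokenizer.encode(text))


def truncate_to_budget(
    tokenizer: SupportsEncodeDecode, text: str, max_tokens: int
) -> str:
    """Return text unchanged if it fits, else decode its first max_tokens IDs."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    token_ids = tokenizer.encode(text)
    if len(token_ids) <= max_tokens:
        return text
    return tokenizer.decode(token_ids[:max_tokens])


# Intended use with the tokenizer created earlier:
# print(count_tokens(tokenizer, "apple orange banana"))
# print(truncate_to_budget(tokenizer, "apple orange banana", max_tokens=2))
```

Truncating at the ID level rather than by characters means the cut always falls on a token boundary, though the decoded result may differ slightly from a prefix of the original string.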
You can also convert token IDs to their tokens, and vice versa:
tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")
ids = tokenizer.convert_tokens_to_ids(tokens)
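To inspect how a string was split, it can help to view each ID next to its token. The following is a small sketch built on the conversion methods above. The helper names `id_token_pairs` and `ids_round_trip` are hypothetical, and it assumes that passing a list to `convert_ids_to_tokens` or `convert_tokens_to_ids` returns a list of the same length.

```python
from typing import List, Tuple


def id_token_pairs(tokenizer, text: str) -> List[Tuple[int, str]]:
    """Encode text and pair every token ID with its token string."""
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(zip(ids, tokens))


def ids_round_trip(tokenizer, text: str) -> bool:
    """Check that IDs -> tokens -> IDs gives back the original IDs."""
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(tokenizer.convert_tokens_to_ids(tokens)) == list(ids)


# Intended use with the tokenizer created earlier:
# for token_id, token in id_token_pairs(tokenizer, "apple orange banana"):
#     print(token_id, repr(token))
# print(ids_round_trip(tokenizer, "apple orange banana"))
```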
For more examples, please see our examples folder.
File details
Details for the file ai21_tokenizer-0.3.11.tar.gz.
File metadata
- Download URL: ai21_tokenizer-0.3.11.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec11ce4e46d24f71f1c2756ad0de34e0adfd51b5bcd81b544aea13d6935ec905 |
| MD5 | 6e88fcb97268661c1e5dcc0a060f3e79 |
| BLAKE2b-256 | 96257ebf855efdc7643e0fc131055a0899318cd069c6d805c04ea8fef96b1259 |
File details
Details for the file ai21_tokenizer-0.3.11-py3-none-any.whl.
File metadata
- Download URL: ai21_tokenizer-0.3.11-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80d332c51cab3fa88f0fea7493240a6a5bc38fd24a3d0806d28731d8fc97691f |
| MD5 | 7a720f88ab46f602dbd8bf55c6d9f9c8 |
| BLAKE2b-256 | 65103796cca35f777b04eb4a7f603da50f808068dfe8be6b5e45459c2e2edd27 |