What is Tokenization?

Tokenization is both the first and final step in language model processing. Since machine learning models can only work with numerical data, text must be converted into numbers that models can understand and manipulate.

The tokenization process breaks down text into smaller units called tokens, which can represent:

  • Words or subwords: “hello” → [15496]
  • Characters: “AI” → [32, 73]
  • Byte-level representations: For handling any Unicode text across all languages

Each token is assigned a unique numerical ID from the model’s vocabulary.

When is Tokenization Used?

Tokenization serves as both the entry point and exit point of text processing in language models. Since models can only work with numerical data, text must be converted into tokens with corresponding numerical indices from the tokenizer’s vocabulary.

In a standard language model workflow:

  1. Encoding Phase: We first convert input text into tokens using a tokenizer. Each token receives a unique index number that the model can process.

  2. Model Processing: The tokenized input flows through the model architecture:

    • Embedding layer: Transforms tokens into dense vector representations that capture semantic relationships
    • Transformer blocks: Process these vectors to understand context, relationships, and generate meaningful responses
  3. Decoding Phase: Finally, we convert the model’s output tokens back into readable text by mapping token indices back to their corresponding words or subwords using the tokenizer’s vocabulary.

This encode → process → decode cycle ensures seamless conversion between human language and machine-readable formats, enabling effective communication with language models.
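As a rough illustration, this cycle can be sketched with a toy vocabulary and a stub in place of the model. The vocabulary and the whitespace-splitting "tokenizer" here are invented for demonstration; real tokenizers use learned subword vocabularies with tens of thousands of entries:

```python
# Toy vocabulary invented for illustration only.
vocab = {"Hello": 0, ",": 1, "world": 2, "!": 3}
inverse_vocab = {idx: tok for tok, idx in vocab.items()}

def encode(text):
    # Naive whitespace split stands in for real subword tokenization
    return [vocab[tok] for tok in text.split()]

def model(token_ids):
    # Stub: a real model would run embeddings and transformer blocks here
    return token_ids

def decode(token_ids):
    # Map token IDs back to their string forms
    return " ".join(inverse_vocab[idx] for idx in token_ids)

text = "Hello , world !"
print(encode(text))                  # [0, 1, 2, 3]
print(decode(model(encode(text))))   # round-trips back to the input
```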

AI21’s Tokenizer

We provide the AI21 Tokenizer, a library specifically engineered for Jamba models.

Key Features

  • Jamba Mini and Large support: Tokenizers for both model sizes
  • Async/sync operations: Both synchronous and asynchronous APIs
  • Production-ready: Enterprise-grade reliability

Installation

Prerequisites

To use the Jamba tokenizers, you’ll need access to the relevant model’s Hugging Face repository.
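If you haven’t authenticated with Hugging Face yet, one common way to do so (assuming the `huggingface_hub` CLI is installed) is:

```shell
# Install the Hugging Face hub CLI and log in with your access token.
# You may also need to request access on the model's Hugging Face page.
pip install huggingface_hub
huggingface-cli login
```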

Install the Tokenizer

pip install ai21-tokenizer

Model-Specific Tokenizers

Choose the appropriate tokenizer for your Jamba model:

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_MINI_TOKENIZER)

text = "Jamba Mini model says hello"
encoded = tokenizer.encode(text)
print(f"Jamba Mini encoded: {encoded}")

Basic Usage

1. Encode Text to Tokens

from ai21_tokenizer import Tokenizer

# Create tokenizer (defaults to Jamba Mini)
tokenizer = Tokenizer.get_tokenizer()

# Convert text to token IDs
text = "Hello, world!"
encoded = tokenizer.encode(text)
print(f"Encoded: {encoded}")
# Example output: Encoded: [15496, 11, 1917, 0] (exact IDs vary by tokenizer)
2. Decode Tokens to Text

from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()

# Encode first, then convert the token IDs back to text
encoded = tokenizer.encode("Hello, world!")
decoded = tokenizer.decode(encoded)
print(f"Decoded: {decoded}")
# Output: Decoded: Hello, world!

Asynchronous Usage

For high-performance or server applications, use the async tokenizer:

import asyncio
from ai21_tokenizer import Tokenizer

async def main():
    tokenizer = await Tokenizer.get_async_tokenizer()
    
    text = "Async tokenization for async operations!"
    encoded = await tokenizer.encode(text)
    decoded = await tokenizer.decode(encoded)
    
    print(f"Original: {text}")
    print(f"Encoded: {encoded}")
    print(f"Decoded: {decoded}")

asyncio.run(main())

Practical Use Cases

  • Cost estimation: Calculate API usage costs based on token consumption
  • Prompt optimization: Ensure prompts fit within model context limits
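For example, token counts from the tokenizer can feed a simple cost and context-limit check. The price and context limit below are placeholders, not AI21’s actual rates; with the real tokenizer you would obtain the count via `len(tokenizer.encode(prompt))`:

```python
def estimate_cost(num_tokens, price_per_million_tokens):
    """Rough cost estimate: token count times per-token price."""
    return num_tokens * price_per_million_tokens / 1_000_000

def fits_in_context(num_tokens, context_limit):
    """True if the prompt's token count is within the model's window."""
    return num_tokens <= context_limit

# Placeholder numbers for illustration only.
num_tokens = 1_200
print(estimate_cost(num_tokens, 0.20))      # hypothetical $0.20 per 1M tokens
print(fits_in_context(num_tokens, 256_000)) # hypothetical context limit
```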

For more advanced usage examples, visit the AI21 tokenizer examples folder.