import asyncio

from ai21 import AsyncAI21Client
from ai21.models.chat import ChatMessage

messages = [ChatMessage(content="What is the meaning of life?", role="user")]

client = AsyncAI21Client()


async def main():
    response = await client.chat.completions.create(
        messages=messages,
        model="jamba-large",
        stream=True,
    )
    async for chunk in response:
        # delta.content is None on the first (role-only) chunk; guard it
        print(chunk.choices[0].delta.content or "", end="")


asyncio.run(main())

Response details

Non-streaming results

A successful non-streamed response includes the following members:

id
string

Unique ID for each request (not message). Same ID for all responses in a streaming response.

model
string

The model used to generate the response.

choices
object[]

One or more responses, depending on the n parameter from the request. Each response includes the following members.

finish_reason
string

Why the message ended.

usage
object

The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.
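As a sketch, the members above can be read off a non-streamed response body like this. The JSON values are illustrative, and the message object inside each choice is an assumption following the usual chat-completion shape; only id, model, choices, finish_reason, and usage are documented above:

```python
import json

# A hypothetical non-streamed response body, shaped after the members
# documented above (values are illustrative, not from a real call).
sample = json.loads("""
{
  "id": "chat-abc123",
  "model": "jamba-large",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "42."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
""")

print(sample["model"])                        # model used to generate the response
print(sample["choices"][0]["finish_reason"])  # why the message ended
print(sample["usage"]["total_tokens"])        # basis for per-token billing
```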

Streamed results

Setting stream=True in the request returns a stream of messages, each containing one token. You can read more about streaming calls using the SDK.

The final message will be data: [DONE]. All other messages will have data set to a JSON object with the following fields:

data
object

An object containing either an object with the following members, or the string "[DONE]" for the last message.

id
string

Unique ID for each request (not message). Same ID for all streaming responses.

choices
object[]

An array with one object containing the following fields:

index
integer

Always zero.

delta
object
  • The first message in the stream will be an object set to {"role": "assistant"}.
  • Subsequent messages will have an object {"content": <token>} with the generated token.
finish_reason
string

Why the message ended.

usage will be null except for the last chunk, which contains the token usage statistics for the entire request.
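The wire format described above can be sketched with a small parser over "data:" lines. The chunk JSON below is hand-written to match the field list, not captured from a real stream:

```python
import json


def parse_sse_line(line: str):
    """Parse one 'data: ...' line from the raw event stream.

    Returns None for the terminal [DONE] sentinel, otherwise the
    decoded JSON chunk object.
    """
    payload = line.removeprefix("data: ")
    if payload.strip() == "[DONE]":
        return None
    return json.loads(payload)


# Hypothetical stream, shaped after the field list above: a role-only
# first chunk, one content chunk, a final chunk carrying usage, then [DONE].
lines = [
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"total_tokens": 12}}',
    'data: [DONE]',
]

text = ""
for raw in lines:
    chunk = parse_sse_line(raw)
    if chunk is None:
        break  # [DONE] sentinel ends the stream
    text += chunk["choices"][0]["delta"].get("content", "")
print(text)  # -> Hi
```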

import asyncio

from ai21 import AsyncAI21Client
from ai21.models.chat import ChatMessage

messages = [ChatMessage(content="What is the meaning of life?", role="user")]

client = AsyncAI21Client()


async def main():
    response = await client.chat.completions.create(
        messages=messages,
        model="jamba-large",
        stream=True,
    )
    async for chunk in response:
        choice = chunk.choices[0]
        # delta.content is None on the first (role-only) chunk
        if choice.delta.content:
            print(choice.delta.content, end="")
        if choice.finish_reason is not None:
            print(f"\nfinish_reason: {choice.finish_reason}")
        # usage is null on every chunk except the last
        if chunk.usage is not None:
            print(f"total tokens: {chunk.usage.total_tokens}")


asyncio.run(main())

Error Codes

500 - Internal Server Error
429 - Too Many Requests (You are sending requests too quickly.)
503 - Service Unavailable (The engine is currently overloaded, please try again later)
401 - Unauthorized (Incorrect API key provided/Invalid Authentication)
403 - Access Denied
422 - Unprocessable Entity (Request body is malformed)
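Of the codes above, 429, 500, and 503 are transient and usually worth retrying with exponential backoff, while 401, 403, and 422 indicate a problem with the request itself. A minimal sketch, assuming errors expose a status_code attribute (ApiError here is a stand-in, since the SDK's real exception types are not specified above):

```python
import time


class ApiError(Exception):
    """Stand-in for an SDK error carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


RETRYABLE = {429, 500, 503}  # transient statuses from the table above


def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying retryable statuses with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as exc:
            # 401/403/422 (and a final failed attempt) are re-raised as-is
            if exc.status_code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Demo: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ApiError(429)
    return "ok"


print(call_with_retries(flaky, base_delay=0.01))  # -> ok
```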

