import asyncio

from ai21 import AsyncAI21Client
from ai21.models.chat import ChatMessage

messages = [ChatMessage(content="What is the meaning of life?", role="user")]

client = AsyncAI21Client()


async def main():
    response = await client.chat.completions.create(
        messages=messages,
        model="jamba-large",
        stream=True,
    )
    async for chunk in response:
        # delta.content is None on the first (role-only) chunk; guard it
        print(chunk.choices[0].delta.content or "", end="")


asyncio.run(main())

Response details

Non-streaming results

A successful non-streamed response includes the following members:

id
string

Unique ID for each request (not message). Same ID for all responses in a streaming response.

model
string

The model used to generate the response.

choices
object[]

One or more responses, depending on the n parameter from the request. Each response includes the following members.

finish_reason
string

Why the message ended.

usage
object

The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.
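As a sketch, the members above can be read off a non-streamed response body like this. The JSON values are illustrative, and the message object inside each choice is an assumption following the usual chat-completion shape; only id, model, choices, finish_reason, and usage are documented above:

```python
import json

# A hypothetical non-streamed response body, shaped after the members
# documented above (values are illustrative, not from a real call).
sample = json.loads("""
{
  "id": "chat-abc123",
  "model": "jamba-large",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "42."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
""")

print(sample["model"])                        # model used to generate the response
print(sample["choices"][0]["finish_reason"])  # why the message ended
print(sample["usage"]["total_tokens"])        # basis for per-token billing
```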

Streamed results

Setting stream=True in the request returns a stream of messages, each containing one token. You can read more about streaming calls using the SDK.

The final message will be data: [DONE]. All other messages will have data set to a JSON object with the following fields:

data
object

An object containing either an object with the following members, or the string "[DONE]" for the last message.

id
string

Unique ID for each request (not message). Same ID for all streaming responses.

choices
object[]

An array with one object containing the following fields:

index
integer

Always zero.

delta
object
  • The first message in the stream will be an object set to {"role": "assistant"}.
  • Subsequent messages will have an object {"content": <token>} with the generated token.
finish_reason
string

Why the message ended.

usage will be null except for the last chunk, which contains the token usage statistics for the entire request.
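The wire format described above can be sketched with a small parser over "data:" lines. The chunk JSON below is hand-written to match the field list, not captured from a real stream:

```python
import json


def parse_sse_line(line: str):
    """Parse one 'data: ...' line from the raw event stream.

    Returns None for the terminal [DONE] sentinel, otherwise the
    decoded JSON chunk object.
    """
    payload = line.removeprefix("data: ")
    if payload.strip() == "[DONE]":
        return None
    return json.loads(payload)


# Hypothetical stream, shaped after the field list above: a role-only
# first chunk, one content chunk, a final chunk carrying usage, then [DONE].
lines = [
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-1", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"total_tokens": 12}}',
    'data: [DONE]',
]

text = ""
for raw in lines:
    chunk = parse_sse_line(raw)
    if chunk is None:
        break  # [DONE] sentinel ends the stream
    text += chunk["choices"][0]["delta"].get("content", "")
print(text)  # -> Hi
```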

import asyncio

from ai21 import AsyncAI21Client
from ai21.models.chat import ChatMessage

messages = [ChatMessage(content="What is the meaning of life?", role="user")]

client = AsyncAI21Client()


async def main():
    response = await client.chat.completions.create(
        messages=messages,
        model="jamba-large",
        stream=True,
    )
    async for chunk in response:
        choice = chunk.choices[0]
        # delta.content is None on the first (role-only) chunk
        if choice.delta.content:
            print(choice.delta.content, end="")
        if choice.finish_reason is not None:
            print(f"\nfinish_reason: {choice.finish_reason}")
        # usage is null on every chunk except the last
        if chunk.usage is not None:
            print(f"total tokens: {chunk.usage.total_tokens}")


asyncio.run(main())

Error Codes

500 - Internal Server Error
429 - Too Many Requests (You are sending requests too quickly.)
503 - Service Unavailable (The engine is currently overloaded, please try again later)
401 - Unauthorized (Incorrect API key provided/Invalid Authentication)
403 - Access Denied
422 - Unprocessable Entity (Request body is malformed)
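Of the codes above, 429, 500, and 503 are transient and usually worth retrying with exponential backoff, while 401, 403, and 422 indicate a problem with the request itself. A minimal sketch, assuming errors expose a status_code attribute (ApiError here is a stand-in, since the SDK's real exception types are not specified above):

```python
import time


class ApiError(Exception):
    """Stand-in for an SDK error carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


RETRYABLE = {429, 500, 503}  # transient statuses from the table above


def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying retryable statuses with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as exc:
            # 401/403/422 (and a final failed attempt) are re-raised as-is
            if exc.status_code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Demo: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ApiError(429)
    return "ok"


print(call_with_retries(flaky, base_delay=0.01))  # -> ok
```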

