Build conversational experiences that interact with your organizational documents
The Conversational RAG endpoint enables you to build conversational experiences that interact with your organizational data. Using a chat interface allows your users to refine or ask follow-up questions, with context retained between questions. The solution leverages AI21’s RAG Engine, which ensures that answers are based solely on information from your documents.
This system is designed to effortlessly extract the right information from your organizational data.
Just ask your question (and any follow-ups), and get clear, accurate answers—no need for prompt engineering or detailed system messages.
The message history of this chat, from oldest (index 0) to newest. Messages must be alternating user
/assistant
messages, starting with a user message. Maximum total size for the array is about 256K tokens. Each message includes the following fields.
If specified, only looks in documents with a path metadata that matches this path or path prefix. That is, specifying “/pets/” would match documents with the path “/pets/” or “/pets/dogs/”. Use to focus the question on specific path in your library. See filtering documents.
Specify labels to restrict answers to documents with any of these labels. Labels are exact, case-sensitive matches, no substring matches. See filtering documents.
Specify which files should be included in the results. See filtering documents.
The maximum total number of document segments to use in formulating the answer. More segments means greater potential accuracy at the cost of speed and more tokens. If not specified, the system uses an optimal default.
Range: 0.5 – 1.5 How “similar” a source segment should be to the query in order to be added to the context used to answer the question. Similarity is judged by the RAG engine’s embedding values of the question and the source. If not specified, the system uses an optimal default.
Determines the scope of text segments added to the context during retrieval.
Used only when retrieval_strategy = add_neighbors
. Specifies how many neighbor segments to combine with each candidate segment when generating the context during the retrieval step. Neighbors have a different topic then the candidate segment, but including them can add more context to the LLM, and potentially provide a more coherent answer. The actual number of neighbors added might be less if the segment is close to the beginning or end of the document.
Defines the ratio between dense and sparse retrieval values used when evaluating segments in the library for eligibility for the context. Dense values are the embedding value, a conceptual or topical representation of the segment, as represented by a large vector. Sparse values is more like keyword search within the segment. 1.0 means using only dense embeddings; 0.0 means using only sparse embeddings. If you want to limit your sources to those that use specific terms, and your answers seem too broad, you might lower this value slightly. If not specified, the system uses an optimal default. Range: 0.0 – 1.0
Filtering documents
You can filter the pool of potential documents by document ID, label, or path. Note that these are intersection filters — that is, if you specify both a label value and a path value, only documents with both the label and the path will be matched. The only variation is the labels
parameter, where any label in the list can be matched.
Filters | Matching docs |
---|---|
labels=["red", "green", "blue"] | matches label “red” OR “green” OR “blue” |
labels=["red", "green", "blue"] AND path="/colors/" | matches (label “red” OR “green” OR “blue”) AND path=“/colors/any/other/suffix” |
A successful response includes the following fields:
True if an answer was found in the provided documents, False if an answer could not be found. It can be simpler to check this value rather than to look at the response text and evaluate if it includes an answer.
True if the RAG engine was able to find segments related to the user’s query.
An array with one object that holds the generated response.
Contains the following fields.
A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.
The questions that the model extracted from the user input thread. The model extracts the question from the most recent user message, taking into account the entire message history. If there isn’t a relevant query in the message, this will be null
and nothing will be retrieved.
Each object represents a segment used to generate the answer. Each source object contains the following fields.
The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.
Build conversational experiences that interact with your organizational documents
The Conversational RAG endpoint enables you to build conversational experiences that interact with your organizational data. Using a chat interface allows your users to refine or ask follow-up questions, with context retained between questions. The solution leverages AI21’s RAG Engine, which ensures that answers are based solely on information from your documents.
This system is designed to effortlessly extract the right information from your organizational data.
Just ask your question (and any follow-ups), and get clear, accurate answers—no need for prompt engineering or detailed system messages.
The message history of this chat, from oldest (index 0) to newest. Messages must be alternating user
/assistant
messages, starting with a user message. Maximum total size for the array is about 256K tokens. Each message includes the following fields.
If specified, only looks in documents with a path metadata that matches this path or path prefix. That is, specifying “/pets/” would match documents with the path “/pets/” or “/pets/dogs/”. Use to focus the question on specific path in your library. See filtering documents.
Specify labels to restrict answers to documents with any of these labels. Labels are exact, case-sensitive matches, no substring matches. See filtering documents.
Specify which files should be included in the results. See filtering documents.
The maximum total number of document segments to use in formulating the answer. More segments means greater potential accuracy at the cost of speed and more tokens. If not specified, the system uses an optimal default.
Range: 0.5 – 1.5 How “similar” a source segment should be to the query in order to be added to the context used to answer the question. Similarity is judged by the RAG engine’s embedding values of the question and the source. If not specified, the system uses an optimal default.
Determines the scope of text segments added to the context during retrieval.
Used only when retrieval_strategy = add_neighbors
. Specifies how many neighbor segments to combine with each candidate segment when generating the context during the retrieval step. Neighbors have a different topic then the candidate segment, but including them can add more context to the LLM, and potentially provide a more coherent answer. The actual number of neighbors added might be less if the segment is close to the beginning or end of the document.
Defines the ratio between dense and sparse retrieval values used when evaluating segments in the library for eligibility for the context. Dense values are the embedding value, a conceptual or topical representation of the segment, as represented by a large vector. Sparse values is more like keyword search within the segment. 1.0 means using only dense embeddings; 0.0 means using only sparse embeddings. If you want to limit your sources to those that use specific terms, and your answers seem too broad, you might lower this value slightly. If not specified, the system uses an optimal default. Range: 0.0 – 1.0
Filtering documents
You can filter the pool of potential documents by document ID, label, or path. Note that these are intersection filters — that is, if you specify both a label value and a path value, only documents with both the label and the path will be matched. The only variation is the labels
parameter, where any label in the list can be matched.
Filters | Matching docs |
---|---|
labels=["red", "green", "blue"] | matches label “red” OR “green” OR “blue” |
labels=["red", "green", "blue"] AND path="/colors/" | matches (label “red” OR “green” OR “blue”) AND path=“/colors/any/other/suffix” |
A successful response includes the following fields:
True if an answer was found in the provided documents, False if an answer could not be found. It can be simpler to check this value rather than to look at the response text and evaluate if it includes an answer.
True if the RAG engine was able to find segments related to the user’s query.
An array with one object that holds the generated response.
Contains the following fields.
A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.
The questions that the model extracted from the user input thread. The model extracts the question from the most recent user message, taking into account the entire message history. If there isn’t a relevant query in the message, this will be null
and nothing will be retrieved.
Each object represents a segment used to generate the answer. Each source object contains the following fields.
The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.