Module: LLM Client
- class diskurs.llm_client.BaseOaiApiLLMClient
Bases: LLMClient
- __init__(client, model, tokenizer, max_tokens, max_repeat=3)
- Parameters:
client (AsyncOpenAI) – The OpenAI client instance used to interact with the OpenAI API.
model (str) – The model identifier string that specifies which version/model of the OpenAI API to use for generating responses.
tokenizer (Callable[[str], int] | Encoding)
max_tokens (int)
max_repeat (int)
- classmethod concatenate_user_prompt_with_llm_response(conversation, completion)
Creates a list of ChatMessages that combines the user prompt with the LLM response. Ensures a flat list, even if there are multiple messages in the user prompt (as is the case when multiple tools are executed in a single pass).
- Parameters:
conversation (Conversation) – the conversation containing the user prompt
completion (ChatCompletion) – the response from the LLM model
- Returns:
Flat list of ChatMessages containing the user prompt and LLM response
- Return type:
list[ChatMessage]
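The flattening described above can be sketched as follows. ChatMessage here is a hypothetical two-field stand-in, and the LLM response is assumed to have already been converted into a ChatMessage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Hypothetical stand-in for diskurs' ChatMessage; only role and content are modelled.
    role: str
    content: str


def concatenate(user_prompt: ChatMessage | list[ChatMessage], response: ChatMessage) -> list[ChatMessage]:
    """Combine the user prompt and the LLM response into one flat list."""
    # A single prompt message is wrapped; several tool responses are copied as-is.
    prompt_messages = user_prompt if isinstance(user_prompt, list) else [user_prompt]
    return [*prompt_messages, response]
```

With two tool responses in one pass, the result is a flat list of three messages rather than a nested list.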
- count_tokens(text)
Counts the number of tokens in a text string.
- Parameters:
text (str) – The text string to tokenize.
- Returns:
The number of tokens in the text string.
- Return type:
int
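The tokenizer passed to __init__ may be either a counting callable or an Encoding. A minimal sketch of how count_tokens could dispatch over both, assuming Encoding refers to tiktoken.Encoding (whose encode method returns a list of token ids); SupportsEncode is a hypothetical protocol name.

```python
from __future__ import annotations

from typing import Callable, Protocol


class SupportsEncode(Protocol):
    # Anything with an encode() method returning token ids, e.g. tiktoken.Encoding.
    def encode(self, text: str) -> list[int]: ...


def count_tokens(text: str, tokenizer: Callable[[str], int] | SupportsEncode) -> int:
    # Prefer an encode() method (tiktoken-style); otherwise treat the tokenizer as a counting function.
    if hasattr(tokenizer, "encode"):
        return len(tokenizer.encode(text))
    return tokenizer(text)
```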
- count_tokens_in_conversation(messages)
Counts the number of tokens used by a list of messages, i.e., the chat history. The implementation follows OpenAI’s token counting guidelines.
- Parameters:
messages (list[dict])
- Return type:
int
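OpenAI's token counting guidance charges a fixed overhead per message, the tokens of every field, an extra token when a name field is present, and a constant for priming the reply. A sketch of that scheme, assuming the constants the OpenAI cookbook uses for gpt-3.5-turbo-0613 and gpt-4 models (they vary by model) and a caller-supplied count function:

```python
from typing import Callable

TOKENS_PER_MESSAGE = 3  # per-message overhead (model-dependent)
TOKENS_PER_NAME = 1     # extra cost when a message carries a "name" field
REPLY_PRIMING = 3       # every reply is primed with <|start|>assistant<|message|>


def count_tokens_in_conversation(messages: list[dict], count: Callable[[str], int]) -> int:
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += count(str(value))  # role, content, name, ... are all tokenized
            if key == "name":
                total += TOKENS_PER_NAME
    return total + REPLY_PRIMING
```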
- count_tokens_of_tool_descriptions(tool_descriptions)
Returns the number of tokens used by the tool (i.e., function) descriptions. There is no documented way of counting these tokens, so this is a best-effort approach that is intended, but not guaranteed, to be a true upper bound. The implementation is taken from: https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11
- Parameters:
tool_descriptions (list[dict[str, Any]]) – The description of all the tools
- Returns:
The number of tokens used by the tools
- Return type:
int
- count_tokens_recursively(value)
- count_tokens_tool_responses(user_prompt_tool_responses)
- Return type:
tuple[int, list[tuple[ChatMessage, int]]]
- abstractmethod classmethod create(**kwargs)
Creates a new instance of the LLM client.
This factory method initializes a new LLM client with the provided configuration. It handles authentication, connection setup, and any other initialization needed to establish a working connection to the language model service.
- Parameters:
kwargs – Configuration parameters for the client, such as API keys, endpoint URLs, model names, and other provider-specific settings.
- Returns:
A properly initialized instance of the LLM client.
- Return type:
Self
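The factory contract above follows the standard abstract-classmethod pattern. A sketch with hypothetical class names (LLMClientBase, EchoClient) and hypothetical keyword names; the real create() accepts provider-specific settings not shown here.

```python
from abc import ABC, abstractmethod
from typing import Any, TypeVar

ClientT = TypeVar("ClientT", bound="LLMClientBase")


class LLMClientBase(ABC):
    """Hypothetical stand-in for the abstract client base class."""

    @classmethod
    @abstractmethod
    def create(cls: type[ClientT], **kwargs: Any) -> ClientT:
        """Build a ready-to-use client from provider-specific settings."""


class EchoClient(LLMClientBase):
    """Hypothetical concrete client used only to illustrate the factory."""

    def __init__(self, model: str, max_tokens: int) -> None:
        self.model = model
        self.max_tokens = max_tokens

    @classmethod
    def create(cls, **kwargs: Any) -> "EchoClient":
        # Pick out the settings this client needs; the 4096 default is illustrative.
        return cls(model=kwargs["model"], max_tokens=kwargs.get("max_tokens", 4096))
```

Because create is abstract, the base class itself cannot be instantiated; only subclasses that implement it can.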
- format_conversation_for_llm(conversation, tools=None, message_type=MessageType.CONVERSATION)
Formats the conversation object into a dictionary that can be sent to the LLM. This comprises the user prompt, the chat history, and the tool descriptions.
- Parameters:
conversation (Conversation) – Contains all interactions so far
tools (list[ToolDescription] | None) – The descriptions of all tools that the agent can use
message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages are rendered
- Returns:
A JSON-serializable dictionary containing the conversation data, ready for the LLM
- Return type:
dict[str, Any]
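A sketch of how message_type could filter the history before the request body is assembled. It assumes that non-conductor types keep only messages of the same type and that an empty tool list is left out of the body; Msg, its kind field, and the enum values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class MessageType(Enum):
    # Only the two members named in this reference; the values are hypothetical.
    CONVERSATION = "conversation"
    CONDUCTOR = "conductor"


@dataclass
class Msg:
    # Hypothetical history entry that records which message type it belongs to.
    role: str
    content: str
    kind: MessageType = MessageType.CONVERSATION


def format_conversation(
    history: list[Msg],
    tools: list[dict[str, Any]] | None = None,
    message_type: MessageType = MessageType.CONVERSATION,
) -> dict[str, Any]:
    # CONDUCTOR renders every message; any other type keeps only matching messages.
    kept = [m for m in history if message_type is MessageType.CONDUCTOR or m.kind is message_type]
    body: dict[str, Any] = {"messages": [{"role": m.role, "content": m.content} for m in kept]}
    if tools:
        body["tools"] = tools  # omitted when the agent has no tools
    return body
```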
- static format_message_for_llm(message)
Formats a ChatMessage object into a dictionary that can be sent to the LLM model. Used by the format_conversation_for_llm method to prepare individual messages for the LLM.
- Parameters:
message (ChatMessage) – Message to be formatted
- Returns:
JSON-serializable dictionary containing the message data
- Return type:
dict[str, str]
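In the Chat Completions API a message is a plain dictionary with role and content keys, plus an optional name. A minimal sketch of the conversion, using a hypothetical ChatMessage stand-in:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Hypothetical stand-in; the real ChatMessage has more fields.
    role: str
    content: str
    name: str = ""


def format_message_for_llm(message: ChatMessage) -> dict[str, str]:
    # Chat Completions messages are {"role": ..., "content": ...} dicts.
    formatted = {"role": message.role, "content": message.content}
    if message.name:
        formatted["name"] = message.name  # optional participant name
    return formatted
```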
- format_messages_for_llm(conversation, message_type)
- static format_tool_description_for_llm(tool)
Formats a ToolDescription object into a dictionary that can be sent to the LLM model.
- Parameters:
tool (ToolDescription) – Tool description to be formatted
- Returns:
JSON-serializable dictionary containing the tool data
- Return type:
dict[str, Any]
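The Chat Completions API describes each tool as {"type": "function", "function": {...}}, with the arguments given as a JSON Schema object. A sketch of the conversion, assuming a hypothetical ToolDescription with a name, a description, and a mapping of parameter schemas that are all treated as required:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDescription:
    # Hypothetical stand-in; these field names are assumptions.
    name: str
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)


def format_tool_description_for_llm(tool: ToolDescription) -> dict[str, Any]:
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            # Arguments are a JSON Schema object; every parameter is assumed required.
            "parameters": {
                "type": "object",
                "properties": tool.parameters,
                "required": list(tool.parameters),
            },
        },
    }
```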
- async generate(conversation, tools=None, message_type=MessageType.CONVERSATION)
Generates a response from the LLM model for the given conversation. Handles conversion from Conversation to LLM request format, sending the request to the LLM model, and converting the response back to a Conversation object.
- Parameters:
conversation (Conversation) – The conversation object containing the user prompt and chat history.
tools (ToolDescription | None) – Description of all the tools that the agent can use
message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages will be rendered
- Returns:
Updated conversation object with the LLM response appended to the chat history.
- Return type:
Conversation
- classmethod is_tool_call(completion)
- Parameters:
completion (ChatCompletion)
- Return type:
bool
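In the OpenAI Python SDK, requested tool calls appear on completion.choices[0].message.tool_calls. A sketch of the check under that assumption, using SimpleNamespace objects in place of a real ChatCompletion:

```python
from types import SimpleNamespace


def is_tool_call(completion) -> bool:
    # tool_calls is None (or empty) when the model answered in plain text.
    message = completion.choices[0].message
    return bool(getattr(message, "tool_calls", None))


# Minimal fakes shaped like a ChatCompletion, for illustration only.
text_reply = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="Hi", tool_calls=None))])
tool_reply = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=None, tool_calls=[{"id": "call_1"}]))])
```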
- static is_tool_response(user_prompt)
Checks whether the user prompt contains tool responses.
- Parameters:
user_prompt (ChatMessage | list[ChatMessage])
- Return type:
bool
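A sketch of the check, assuming that tool responses are recognisable by the role "tool" and that a single such message is enough; ChatMessage is a hypothetical stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Hypothetical stand-in with only the fields this check needs.
    role: str
    content: str


def is_tool_response(user_prompt: ChatMessage | list[ChatMessage]) -> bool:
    # Normalise to a list, then look for any message produced by a tool.
    messages = user_prompt if isinstance(user_prompt, list) else [user_prompt]
    return any(m.role == "tool" for m in messages)
```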
- classmethod llm_response_to_chat_message(completion, agent_name, message_type)
Converts the message returned by the LLM into a typed ChatMessage.
- Parameters:
completion (ChatCompletion) – The response from the LLM model
agent_name (str) – The name of the agent whose question the completion is a response to
message_type (MessageType) – The type of message to be created
- Returns:
A ChatMessage object containing the structured response
- Return type:
ChatMessage
- max_tokens: int
- async send_request(body)
- Parameters:
body (dict[str, Any])
- Return type:
ChatCompletion
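send_request has no description here. Given the max_repeat parameter of __init__, one plausible reading is a bounded retry around the Chat Completions call. The sketch below assumes exactly that; its retry policy and back-off are hypothetical, while client.chat.completions.create is the coroutine exposed by openai.AsyncOpenAI.

```python
import asyncio
from typing import Any


async def send_request(client: Any, body: dict[str, Any], max_repeat: int = 3, delay: float = 1.0) -> Any:
    """Send a Chat Completions request, retrying up to max_repeat attempts (assumed policy)."""
    if max_repeat < 1:
        raise ValueError("max_repeat must be at least 1")
    for attempt in range(1, max_repeat + 1):
        try:
            # body holds model, messages, tools, ... as keyword arguments.
            return await client.chat.completions.create(**body)
        except Exception:  # in practice, narrow this to the SDK's transient error types
            if attempt == max_repeat:
                raise
            await asyncio.sleep(delay * attempt)  # linear back-off before the next attempt
```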
- should_truncate_tool_response(tool_responses, fraction=4)
Determines whether truncating the tool responses should be attempted as the first strategy. This applies only when the token limit is exceeded and the tool responses account for a significant share of the tokens.
- Parameters:
tool_responses (ChatMessage | list[ChatMessage]) – The tool response messages to check
fraction – The fraction of max tokens to use as a threshold for truncation
- Returns:
True if the tool responses contain enough tokens to make truncation worthwhile
- Return type:
bool
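A sketch of the threshold test, assuming that "significant" means the tool responses together exceed max_tokens / fraction; checking that the overall limit is exceeded is left to the caller, and taking pre-computed token counts is a simplification.

```python
def should_truncate_tool_response(tool_token_counts: list[int], max_tokens: int, fraction: int = 4) -> bool:
    # With the default fraction of 4, truncation pays off once the tool
    # responses use more than a quarter of the token budget.
    return sum(tool_token_counts) > max_tokens / fraction
```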
- truncate_chat_history(messages, n_tokens_tool_descriptions)
Truncate the chat history to fit within the maximum token limit while preserving context and essential messages.
- Return type:
list[dict]
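One common way to truncate a history while preserving context is to keep system messages and the newest message and drop the oldest remaining ones first. The sketch below shows that strategy; it is an assumption rather than a description of truncate_chat_history, and the reserved and count parameters are hypothetical.

```python
from typing import Callable


def truncate_chat_history(
    messages: list[dict],
    max_tokens: int,
    reserved: int,
    count: Callable[[dict], int],
) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    budget = max_tokens - reserved  # e.g. reserved = tokens taken by the tool descriptions
    kept = list(messages)
    total = sum(count(m) for m in kept)
    i = 0
    # The newest message (last index) is never dropped; system messages are skipped.
    while total > budget and i < len(kept) - 1:
        if kept[i]["role"] == "system":
            i += 1
            continue
        total -= count(kept.pop(i))
    return kept  # may still exceed the budget if only protected messages remain
```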
- truncate_tool_responses(tool_responses, fraction=2)
Truncates tool responses to fit within the maximum token limit. The token count of each tool response is obtained first, sorted by size; the largest tool responses are then truncated until the total fits within the limit. The number of tokens that can be removed is estimated anew in each round.
- Parameters:
tool_responses (ChatMessage | list[ChatMessage]) – The tool responses to truncate
fraction (int) – The fraction, relative to the max tokens, by which the tool responses should be reduced
- Return type:
list[ChatMessage]
- async use_as_tool(prompt, content)
Summarizes content so that it fits within the token limit.
- Parameters:
prompt (str) – Prompt to use for summarization
content (str) – Content to summarize
fraction – Fraction of the max tokens to use for summarization
- Returns:
Summarized content
- Return type:
str
- water_filling_truncate_responses(tool_responses_with_counts, total_allowed_tokens, truncation_message='\n[Response truncated]')
Given a list of (message, token_count) tuples, returns a new list of messages whose token counts have been reduced with a water-filling (iterative thresholding) approach so that together they meet the total_allowed_tokens budget.
This function does not alter the text directly. It assumes a helper such as _truncate_text(content, new_token_count) that returns a version of the content limited to new_token_count tokens.
- Parameters:
tool_responses_with_counts – List of tuples (ChatMessage, token_count)
total_allowed_tokens – The total number of tokens allowed after reduction
truncation_message (str) – Message to append to truncated responses
- Returns:
List of truncated ChatMessages
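The water-filling step can be sketched on token counts alone: small responses keep their full length, and all larger ones are capped at a common level chosen so that the total fits the budget. The resulting caps are what would be handed to a truncation helper such as _truncate_text. The function name water_filling_caps is hypothetical, and the tokens consumed by truncation_message are not modelled.

```python
def water_filling_caps(token_counts: list[int], total_allowed: int) -> list[int]:
    """Return per-response token caps whose sum stays within total_allowed."""
    if sum(token_counts) <= total_allowed:
        return list(token_counts)  # already within budget: nothing to cut
    order = sorted(range(len(token_counts)), key=lambda i: token_counts[i])
    caps = list(token_counts)
    remaining = total_allowed
    for position, idx in enumerate(order):
        # Budget left per response if all remaining ones were equal in size.
        fair_share = remaining // (len(order) - position)
        if token_counts[idx] <= fair_share:
            remaining -= token_counts[idx]  # small responses are kept whole
        else:
            # Every response from here on is at least this large: cap them all at the level.
            for j in order[position:]:
                caps[j] = fair_share
            break
    return caps
```

For example, counts of 100, 500 and 900 with a budget of 900 yield caps of 100, 400 and 400.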
- class diskurs.llm_client.OpenAILLMClient
Bases: BaseOaiApiLLMClient
- __init__(client, model, tokenizer, max_tokens, max_repeat=3)
- Parameters:
client (AsyncOpenAI) – The OpenAI client instance used to interact with the OpenAI API.
model (str) – The model identifier string that specifies which version/model of the OpenAI API to use for generating responses.
tokenizer (Callable[[str], int] | Encoding)
max_tokens (int)
max_repeat (int)
- classmethod concatenate_user_prompt_with_llm_response(conversation, completion)
Creates a list of ChatMessages that combines the user prompt with the LLM response. Ensures a flat list, even if there are multiple messages in the user prompt (as is the case when multiple tools are executed in a single pass).
- Parameters:
conversation (Conversation) – the conversation containing the user prompt
completion (ChatCompletion) – the response from the LLM model
- Returns:
Flat list of ChatMessages containing the user prompt and LLM response
- Return type:
list[ChatMessage]
- count_tokens(text)
Counts the number of tokens in a text string.
- Parameters:
text (str) – The text string to tokenize.
- Returns:
The number of tokens in the text string.
- Return type:
int
- count_tokens_in_conversation(messages)
Counts the number of tokens used by a list of messages, i.e., the chat history. The implementation follows OpenAI’s token counting guidelines.
- Parameters:
messages (list[dict])
- Return type:
int
- count_tokens_of_tool_descriptions(tool_descriptions)
Returns the number of tokens used by the tool (i.e., function) descriptions. There is no documented way of counting these tokens, so this is a best-effort approach that is intended, but not guaranteed, to be a true upper bound. The implementation is taken from: https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11
- Parameters:
tool_descriptions (list[dict[str, Any]]) – The description of all the tools
- Returns:
The number of tokens used by the tools
- Return type:
int
- count_tokens_recursively(value)
- count_tokens_tool_responses(user_prompt_tool_responses)
- Return type:
tuple[int, list[tuple[ChatMessage, int]]]
- classmethod create(**kwargs)
Creates a new instance of the LLM client.
This factory method initializes a new LLM client with the provided configuration. It handles authentication, connection setup, and any other initialization needed to establish a working connection to the language model service.
- Parameters:
kwargs – Configuration parameters for the client, such as API keys, endpoint URLs, model names, and other provider-specific settings.
- Returns:
A properly initialized instance of the LLM client.
- Return type:
Self
- format_conversation_for_llm(conversation, tools=None, message_type=MessageType.CONVERSATION)
Formats the conversation object into a dictionary that can be sent to the LLM. This comprises the user prompt, the chat history, and the tool descriptions.
- Parameters:
conversation (Conversation) – Contains all interactions so far
tools (list[ToolDescription] | None) – The descriptions of all tools that the agent can use
message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages are rendered
- Returns:
A JSON-serializable dictionary containing the conversation data, ready for the LLM
- Return type:
dict[str, Any]
- static format_message_for_llm(message)
Formats a ChatMessage object into a dictionary that can be sent to the LLM model. Used by the format_conversation_for_llm method to prepare individual messages for the LLM.
- Parameters:
message (ChatMessage) – Message to be formatted
- Returns:
JSON-serializable dictionary containing the message data
- Return type:
dict[str, str]
- format_messages_for_llm(conversation, message_type)
- static format_tool_description_for_llm(tool)
Formats a ToolDescription object into a dictionary that can be sent to the LLM model.
- Parameters:
tool (ToolDescription) – Tool description to be formatted
- Returns:
JSON-serializable dictionary containing the tool data
- Return type:
dict[str, Any]
- async generate(conversation, tools=None, message_type=MessageType.CONVERSATION)
Generates a response from the LLM model for the given conversation. Handles conversion from Conversation to LLM request format, sending the request to the LLM model, and converting the response back to a Conversation object.
- Parameters:
conversation (Conversation) – The conversation object containing the user prompt and chat history.
tools (ToolDescription | None) – Description of all the tools that the agent can use
message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages will be rendered
- Returns:
Updated conversation object with the LLM response appended to the chat history.
- Return type:
Conversation
- classmethod is_tool_call(completion)
- Parameters:
completion (ChatCompletion)
- Return type:
bool
- static is_tool_response(user_prompt)
Checks whether the user prompt contains tool responses.
- Parameters:
user_prompt (ChatMessage | list[ChatMessage])
- Return type:
bool
- classmethod llm_response_to_chat_message(completion, agent_name, message_type)
Converts the message returned by the LLM into a typed ChatMessage.
- Parameters:
completion (ChatCompletion) – The response from the LLM model
agent_name (str) – The name of the agent whose question the completion is a response to
message_type (MessageType) – The type of message to be created
- Returns:
A ChatMessage object containing the structured response
- Return type:
ChatMessage
- max_tokens: int
- async send_request(body)
- Parameters:
body (dict[str, Any])
- Return type:
ChatCompletion
- should_truncate_tool_response(tool_responses, fraction=4)
Determines whether truncating the tool responses should be attempted as the first strategy. This applies only when the token limit is exceeded and the tool responses account for a significant share of the tokens.
- Parameters:
tool_responses (ChatMessage | list[ChatMessage]) – The tool response messages to check
fraction – The fraction of max tokens to use as a threshold for truncation
- Returns:
True if the tool responses contain enough tokens to make truncation worthwhile
- Return type:
bool
- truncate_chat_history(messages, n_tokens_tool_descriptions)
Truncate the chat history to fit within the maximum token limit while preserving context and essential messages.
- Return type:
list[dict]
- truncate_tool_responses(tool_responses, fraction=2)
Truncates tool responses to fit within the maximum token limit. The token count of each tool response is obtained first, sorted by size; the largest tool responses are then truncated until the total fits within the limit. The number of tokens that can be removed is estimated anew in each round.
- Parameters:
tool_responses (ChatMessage | list[ChatMessage]) – The tool responses to truncate
fraction (int) – The fraction, relative to the max tokens, by which the tool responses should be reduced
- Return type:
list[ChatMessage]
- async use_as_tool(prompt, content)
Summarizes content so that it fits within the token limit.
- Parameters:
prompt (str) – Prompt to use for summarization
content (str) – Content to summarize
fraction – Fraction of the max tokens to use for summarization
- Returns:
Summarized content
- Return type:
str
- water_filling_truncate_responses(tool_responses_with_counts, total_allowed_tokens, truncation_message='\n[Response truncated]')
Given a list of (message, token_count) tuples, returns a new list of messages whose token counts have been reduced with a water-filling (iterative thresholding) approach so that together they meet the total_allowed_tokens budget.
This function does not alter the text directly. It assumes a helper such as _truncate_text(content, new_token_count) that returns a version of the content limited to new_token_count tokens.
- Parameters:
tool_responses_with_counts – List of tuples (ChatMessage, token_count)
total_allowed_tokens – The total number of tokens allowed after reduction
truncation_message (str) – Message to append to truncated responses
- Returns:
List of truncated ChatMessages