Module: LLM Client

class diskurs.llm_client.BaseOaiApiLLMClient

Bases: LLMClient

__init__(client, model, tokenizer, max_tokens, max_repeat=3)
Parameters:
  • client (AsyncOpenAI) – The OpenAI client instance used to interact with the OpenAI API.

  • model (str) – The model identifier string that specifies which version/model of the OpenAI API to use for generating responses.

  • tokenizer (Callable[[str], int] | Encoding)

  • max_tokens (int)

  • max_repeat (int)

classmethod concatenate_user_prompt_with_llm_response(conversation, completion)

Creates a list of ChatMessages that combines the user prompt with the LLM response. Ensures a flat list, even if there are multiple messages in the user prompt (as is the case when multiple tools are executed in a single pass).

Parameters:
  • conversation (Conversation) – the conversation containing the user prompt

  • completion (ChatCompletion) – the response from the LLM model

Returns:

Flat list of ChatMessages containing the user prompt and LLM response

Return type:

list[ChatMessage]

count_tokens(text)

Counts the number of tokens in a text string. :param text: The text string to tokenize. :return: The number of tokens in the text string.

Parameters:

text (str)

Return type:

int

count_tokens_in_conversation(messages)

Count the number of tokens used by a list of messages i.e. chat history. The implementation is based on OpenAI’s token counting guidelines.

Parameters:

messages (list[dict])

Return type:

int

count_tokens_of_tool_descriptions(tool_descriptions)

Return the number of tokens used by the tool i.e. function description. Unfortunately, there’s no documented way of counting those tokens, therefore we resort to best effort approach, hoping this implementation is a true upper bound. The implementation is taken from: https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11

Parameters:

tool_descriptions (list[dict[str, Any]]) – The description of all the tools

Returns:

The number of tokens used by the tools

Return type:

int

count_tokens_recursively(value)
count_tokens_tool_responses(user_prompt_tool_responses)
Return type:

tuple[int, list[tuple[ChatMessage, int]]]

abstractmethod classmethod create(**kwargs)

Creates a new instance of the LLM client.

This factory method initializes a new LLM client with the provided configuration. It handles authentication, connection setup, and any other initialization needed to establish a working connection to the language model service.

Parameters:

kwargs – Configuration parameters for the client, such as API keys, endpoint URLs, model names, and other provider-specific settings.

Returns:

A properly initialized instance of the LLM client.

Return type:

Self

format_conversation_for_llm(conversation, tools=None, message_type=MessageType.CONVERSATION)

Formats the conversation object into a dictionary that can be sent to the LLM model. This comprises the user prompt, chat history, and tool descriptions. :param conversation: Contains all interactions so far :param tools: The descriptions of all tools that the agent can use :param message_type: The message type used to filter the chat history. If MessageType.CONDUCTOR,

all messages will be rendered

Returns:

A JSON-serializable dictionary containing the conversation data ready for the LLM

Parameters:
Return type:

dict[str, Any]

static format_message_for_llm(message)

Formats a ChatMessage object into a dictionary that can be sent to the LLM model. Used by the format_conversation_for_llm method to prepare individual messages for the LLM.

Parameters:

message (ChatMessage) – Message to be formatted

Returns:

JSON-serializable dictionary containing the message data

Return type:

dict[str, str]

format_messages_for_llm(conversation, message_type)
static format_tool_description_for_llm(tool)

Formats a ToolDescription object into a dictionary that can be sent to the LLM model. :param tool: Tool description to be formatted :return: JSON-serializable dictionary containing the tool data

Parameters:

tool (ToolDescription)

Return type:

dict[str, Any]

async generate(conversation, tools=None, message_type=MessageType.CONVERSATION)

Generates a response from the LLM model for the given conversation. Handles conversion from Conversation to LLM request format, sending the request to the LLM model, and converting the response back to a Conversation object.

Parameters:
  • conversation (Conversation) – The conversation object containing the user prompt and chat history.

  • tools (ToolDescription | None) – Description of all the tools that the agent can use

  • message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages will be rendered

Returns:

Updated conversation object with the LLM response appended to the chat history.

Return type:

Conversation

classmethod is_tool_call(completion)
Parameters:

completion (ChatCompletion)

Return type:

bool

static is_tool_response(user_prompt)

Check if the conversation contains tool responses.

Parameters:

user_prompt (ChatMessage | list[ChatMessage])

Return type:

bool

classmethod llm_response_to_chat_message(completion, agent_name, message_type)

Converts the message returned by the LLM to a typed ChatMessage. :param completion: The response from the LLM model :param agent_name: The name of the agent whose question the completion is a response to :param message_type: The type of message to be created :return: A ChatMessage object containing the structured response

Parameters:
  • completion (ChatCompletion)

  • agent_name (str)

  • message_type (MessageType)

Return type:

ChatMessage

max_tokens: int
async send_request(body)
Parameters:

body (dict[str, Any])

Return type:

ChatCompletion

should_truncate_tool_response(tool_responses, fraction=4)

Determine if we should attempt to truncate a tool response as a first strategy. Only applies when over token limits and the tool responses contain significant tokens.

Parameters:
  • tool_responses (ChatMessage | list[ChatMessage]) – The tool response messages to check

  • fraction – The fraction of max tokens to use as a threshold for truncation

Returns:

True if the tool responses contain enough tokens to make truncation worthwhile

Return type:

bool

truncate_chat_history(messages, n_tokens_tool_descriptions)

Truncate the chat history to fit within the maximum token limit while preserving context and essential messages.

Return type:

list[dict]

truncate_tool_responses(tool_responses, fraction=2)

Truncates tool responses to fit within the maximum token limit. We first obtain the token count for each tool response sorted by size. Then we truncate the largest tool responses until we fit within the limit. We intelligently estimate the number of tokens that can be removed in each turn. :param tool_responses: The tool responses to truncate :param fraction: The fraction the tool response should be reduced by in relation to the max tokens

Parameters:
Return type:

list[ChatMessage]

async use_as_tool(prompt, content)

Summarizes content to fit within token limit.

Parameters:
  • prompt (str) – Prompt to use for summarization

  • content (str) – Content to summarize

  • fraction – Fraction of the max tokens to use for summarization

Returns:

Summarized content

Return type:

str

water_filling_truncate_responses(tool_responses_with_counts, total_allowed_tokens, truncation_message='\n[Response truncated]')

Given a list of tuples (message, token_count), return a new list of messages where the token counts have been reduced using a water-filling / iterative thresholding approach to meet a total_allowed_tokens budget.

This function doesn’t directly alter the text. It assumes that you have a function like _truncate_text(content, new_token_count) that returns a version of the content limited to new_token_count tokens.

Parameters:
  • tool_responses_with_counts – List of tuples (ChatMessage, token_count)

  • total_allowed_tokens – The total number of tokens allowed after reduction

  • truncation_message (str) – Message to append to truncated responses

Returns:

List of truncated ChatMessages

class diskurs.llm_client.OpenAILLMClient

Bases: BaseOaiApiLLMClient

__init__(client, model, tokenizer, max_tokens, max_repeat=3)
Parameters:
  • client (AsyncOpenAI) – The OpenAI client instance used to interact with the OpenAI API.

  • model (str) – The model identifier string that specifies which version/model of the OpenAI API to use for generating responses.

  • tokenizer (Callable[[str], int] | Encoding)

  • max_tokens (int)

  • max_repeat (int)

classmethod concatenate_user_prompt_with_llm_response(conversation, completion)

Creates a list of ChatMessages that combines the user prompt with the LLM response. Ensures a flat list, even if there are multiple messages in the user prompt (as is the case when multiple tools are executed in a single pass).

Parameters:
  • conversation (Conversation) – the conversation containing the user prompt

  • completion (ChatCompletion) – the response from the LLM model

Returns:

Flat list of ChatMessages containing the user prompt and LLM response

Return type:

list[ChatMessage]

count_tokens(text)

Counts the number of tokens in a text string. :param text: The text string to tokenize. :return: The number of tokens in the text string.

Parameters:

text (str)

Return type:

int

count_tokens_in_conversation(messages)

Count the number of tokens used by a list of messages i.e. chat history. The implementation is based on OpenAI’s token counting guidelines.

Parameters:

messages (list[dict])

Return type:

int

count_tokens_of_tool_descriptions(tool_descriptions)

Return the number of tokens used by the tool i.e. function description. Unfortunately, there’s no documented way of counting those tokens, therefore we resort to best effort approach, hoping this implementation is a true upper bound. The implementation is taken from: https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11

Parameters:

tool_descriptions (list[dict[str, Any]]) – The description of all the tools

Returns:

The number of tokens used by the tools

Return type:

int

count_tokens_recursively(value)
count_tokens_tool_responses(user_prompt_tool_responses)
Return type:

tuple[int, list[tuple[ChatMessage, int]]]

classmethod create(**kwargs)

Creates a new instance of the LLM client.

This factory method initializes a new LLM client with the provided configuration. It handles authentication, connection setup, and any other initialization needed to establish a working connection to the language model service.

Parameters:

kwargs – Configuration parameters for the client, such as API keys, endpoint URLs, model names, and other provider-specific settings.

Returns:

A properly initialized instance of the LLM client.

Return type:

Self

format_conversation_for_llm(conversation, tools=None, message_type=MessageType.CONVERSATION)

Formats the conversation object into a dictionary that can be sent to the LLM model. This comprises the user prompt, chat history, and tool descriptions. :param conversation: Contains all interactions so far :param tools: The descriptions of all tools that the agent can use :param message_type: The message type used to filter the chat history. If MessageType.CONDUCTOR,

all messages will be rendered

Returns:

A JSON-serializable dictionary containing the conversation data ready for the LLM

Parameters:
Return type:

dict[str, Any]

static format_message_for_llm(message)

Formats a ChatMessage object into a dictionary that can be sent to the LLM model. Used by the format_conversation_for_llm method to prepare individual messages for the LLM.

Parameters:

message (ChatMessage) – Message to be formatted

Returns:

JSON-serializable dictionary containing the message data

Return type:

dict[str, str]

format_messages_for_llm(conversation, message_type)
static format_tool_description_for_llm(tool)

Formats a ToolDescription object into a dictionary that can be sent to the LLM model. :param tool: Tool description to be formatted :return: JSON-serializable dictionary containing the tool data

Parameters:

tool (ToolDescription)

Return type:

dict[str, Any]

async generate(conversation, tools=None, message_type=MessageType.CONVERSATION)

Generates a response from the LLM model for the given conversation. Handles conversion from Conversation to LLM request format, sending the request to the LLM model, and converting the response back to a Conversation object.

Parameters:
  • conversation (Conversation) – The conversation object containing the user prompt and chat history.

  • tools (ToolDescription | None) – Description of all the tools that the agent can use

  • message_type – The message type used to filter the chat history. If MessageType.CONDUCTOR, all messages will be rendered

Returns:

Updated conversation object with the LLM response appended to the chat history.

Return type:

Conversation

classmethod is_tool_call(completion)
Parameters:

completion (ChatCompletion)

Return type:

bool

static is_tool_response(user_prompt)

Check if the conversation contains tool responses.

Parameters:

user_prompt (ChatMessage | list[ChatMessage])

Return type:

bool

classmethod llm_response_to_chat_message(completion, agent_name, message_type)

Converts the message returned by the LLM to a typed ChatMessage. :param completion: The response from the LLM model :param agent_name: The name of the agent whose question the completion is a response to :param message_type: The type of message to be created :return: A ChatMessage object containing the structured response

Parameters:
  • completion (ChatCompletion)

  • agent_name (str)

  • message_type (MessageType)

Return type:

ChatMessage

max_tokens: int
async send_request(body)
Parameters:

body (dict[str, Any])

Return type:

ChatCompletion

should_truncate_tool_response(tool_responses, fraction=4)

Determine if we should attempt to truncate a tool response as a first strategy. Only applies when over token limits and the tool responses contain significant tokens.

Parameters:
  • tool_responses (ChatMessage | list[ChatMessage]) – The tool response messages to check

  • fraction – The fraction of max tokens to use as a threshold for truncation

Returns:

True if the tool responses contain enough tokens to make truncation worthwhile

Return type:

bool

truncate_chat_history(messages, n_tokens_tool_descriptions)

Truncate the chat history to fit within the maximum token limit while preserving context and essential messages.

Return type:

list[dict]

truncate_tool_responses(tool_responses, fraction=2)

Truncates tool responses to fit within the maximum token limit. We first obtain the token count for each tool response sorted by size. Then we truncate the largest tool responses until we fit within the limit. We intelligently estimate the number of tokens that can be removed in each turn. :param tool_responses: The tool responses to truncate :param fraction: The fraction the tool response should be reduced by in relation to the max tokens

Parameters:
Return type:

list[ChatMessage]

async use_as_tool(prompt, content)

Summarizes content to fit within token limit.

Parameters:
  • prompt (str) – Prompt to use for summarization

  • content (str) – Content to summarize

  • fraction – Fraction of the max tokens to use for summarization

Returns:

Summarized content

Return type:

str

water_filling_truncate_responses(tool_responses_with_counts, total_allowed_tokens, truncation_message='\n[Response truncated]')

Given a list of tuples (message, token_count), return a new list of messages where the token counts have been reduced using a water-filling / iterative thresholding approach to meet a total_allowed_tokens budget.

This function doesn’t directly alter the text. It assumes that you have a function like _truncate_text(content, new_token_count) that returns a version of the content limited to new_token_count tokens.

Parameters:
  • tool_responses_with_counts – List of tuples (ChatMessage, token_count)

  • total_allowed_tokens – The total number of tokens allowed after reduction

  • truncation_message (str) – Message to append to truncated responses

Returns:

List of truncated ChatMessages