Tools#

gptme’s tools enable AI agents to execute code, edit files, browse the web, process images, and interact with your computer.

Overview#

📁 File System#

Read - Read files in any format
Save - Create and overwrite files
Patch - Apply precise changes to existing files

💻 Code & Development#

Python - Execute Python code interactively with full library access
Shell - Run shell commands and manage system processes

🌐 Web & Research#

Browser - Browse websites, take screenshots, and read web content
RAG - Index and search through documentation and codebases
Chats - Search past conversations for context and references

👁️ Visual & Interactive#

Vision - Analyze images, diagrams, and visual content
Screenshot - Capture your screen for visual context
Computer - Control desktop applications through visual interface

⚡ Advanced Workflows#

Tmux - Manage long-running processes in terminal sessions
Subagent - Delegate subtasks to specialized agent instances
TTS - Convert responses to speech for hands-free interaction

Combinations#

The real power emerges when tools work together:

Web Research + Code: Browser + Python - Browse documentation and implement solutions
Visual Development: Vision + Patch - Analyze UI mockups and update code accordingly
System Automation: Shell + Python - Combine system commands with data processing
Interactive Debugging: Screenshot + Computer - Visual debugging and interface automation
Knowledge-Driven Development: RAG + Chats - Learn from documentation and past conversations

Shell#

The assistant can execute shell commands with bash by outputting code blocks with shell as the language.

Instructions

The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.

Shell commands can be configured to timeout by setting the GPTME_SHELL_TIMEOUT environment variable.
- Set GPTME_SHELL_TIMEOUT=30 for a 30-second timeout
- Set GPTME_SHELL_TIMEOUT=0 to disable timeout
- Invalid values default to 60 seconds
- If not set, commands run without timeout

These programs are available, among others:
- apt-get
- docker
- git
- pandoc

Examples

User	list the current directory
Assistant	To list the files in the current directory, use `ls`: shell ls
System	Ran command: `ls` shell file1.txt file2.txt

The assistant can learn context by exploring the filesystem

User	learn about the project
Assistant	Lets start by checking the files shell git ls-files
System	stdout README.md main.py
Assistant	Now lets check the README shell cat README.md
System	stdout (contents of README.md)
Assistant	Now we check main.py shell cat main.py
System	stdout (contents of main.py)
Assistant	The project is...

Create vue project

User

Create a new vue project with typescript and pinia named fancy-project

Assistant

Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project:shell
npm init vue@latest fancy-project --yes -- --typescript --pinia

System

stdout
> npx
> create-vue

Vue.js - The Progressive JavaScript Framework

Scaffolding project in ./fancy-project...

Proper quoting for complex content

User

add a comment with backticks and special characters

Assistant

When passing complex content with special characters, use single quotes to prevent shell interpretation:shell
echo 'Content with `backticks` and $variables that should not be interpreted' > example.txt

gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Executes a shell command and returns the output.

gptme.tools.shell.execute_shell_impl(cmd: str, _: Path | None, confirm: Callable[[str], bool], timeout: float | None = None) → Generator[Message, None, None]: Execute shell command and format output.

gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) → str: Get the shell command from code/args/kwargs.

gptme.tools.shell.preview_shell(cmd: str, _: Path | None) → str: Prepare preview for shell command.

gptme.tools.shell.strip_ansi_codes(text: str) → str: Strip ANSI escape sequences from text.

Python#

The assistant can execute Python code blocks.

It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.

Instructions

Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.

Examples

Result of the last expression will be returned

User	What is 2 + 2?
Assistant	ipython 2 + 2
System	Executed code block. result 4

Write a function and call it

User

compute fib 10

Assistant

To compute the 10th Fibonacci number, we can run the following code:ipython
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(10)

System

Executed code block.result
55

class gptme.tools.python.TeeIO

__init__(original_stream)

write(s)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.

gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) → Generator[Message, None, None]: Executes a python codeblock and returns the output.

gptme.tools.python.get_installed_python_libraries() → list[str]: Check if a select list of Python libraries are installed.

gptme.tools.python.register_function(func: T) → T: Decorator to register a function to be available in the IPython instance.

Tmux#

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.

This tool is suitable to run long-running commands or interactive applications that require user input. Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc. It allows for inspecting pane contents and sending input.

Instructions

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.

This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.

Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions

Examples

Managing a dev server

User	Start the dev server
Assistant	Certainly! To start the dev server we should use tmux: tmux new_session 'npm run dev'
System	Running `npm run dev` in session gptme_1

User

Can you show me the current content of the pane?

Assistant

Of course! Let's inspect the pane content:tmux
inspect_pane gptme_1

System

output
Server is running on localhost:5600

User	Stop the dev server
Assistant	I'll send 'Ctrl+C' to the pane to stop the server: tmux send_keys 0 C-c
System	Sent 'C-c' to pane 0

Get info from ncurses applications

User	start top and give me a summary
Assistant	Sure! Let's start the top command in a tmux session: tmux new_session 'top'
System	Running `top` in session gptme_1. output (output from top shown here)
Assistant	The load is...

Send keys to a session

User	start ipython
Assistant	Let's start an ipython session: tmux new_session 'ipython'
System	Running `ipython` in session 2. output (output from ipython shown here)

User	Run 'print("Hello, world!")' in the ipython session tmux send_keys 2 'print("Hello, world!")' Enter

gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Executes a command in tmux and returns the output.

Subagent#

A subagent tool for gptme

Lets gptme break down a task into smaller parts, and delegate them to subagents.

Examples

User	compute fib 13 using a subagent
Assistant	Starting a subagent to compute the 13th Fibonacci number. ipython subagent("fib-13", "compute the 13th Fibonacci number")
System	Subagent started successfully.
Assistant	Now we need to wait for the subagent to finish the task. ipython subagent_wait("fib-13")
System	{"status": "success", "result": "The 13th Fibonacci number is 233"}.

class gptme.tools.subagent.ReturnType

ReturnType(status: Literal[‘running’, ‘success’, ‘failure’], result: str | None = None)

__init__(status: Literal['running', 'success', 'failure'], result: str | None = None) → None

class gptme.tools.subagent.Subagent

Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)

__init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) → None

gptme.tools.subagent.subagent(agent_id: str, prompt: str): Runs a subagent and returns the resulting JSON output.

gptme.tools.subagent.subagent_status(agent_id: str) → dict: Returns the status of a subagent.

gptme.tools.subagent.subagent_wait(agent_id: str) → dict: Waits for a subagent to finish. Timeout is 1 minute.

Read#

Read the contents of a file.

Instructions

Read the content of the given file. Use the `cat` command with the `shell` tool.

Examples

User	read file.txt
Assistant	shell cat file.txt

Save#

Gives the assistant the ability to save whole files, or append to them.

Instructions

Create or overwrite a file with the given content.

The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.

Examples

User	write a hello world script to hello.py
Assistant	save hello.py print("Hello world")
System	Saved to `hello.py`

User	make it all-caps
Assistant	save hello.py print("HELLO WORLD")
System	Saved to `hello.py`

Instructions

Append the given content to a file.`.

Examples

User	append a print "Hello world" to hello.py
Assistant	append hello.py print("Hello world")
System	Appended to `hello.py`

gptme.tools.save.check_for_placeholders(content: str) → bool: Check if content contains placeholder lines.

gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Append code to a file.

gptme.tools.save.execute_append_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Actual append implementation.

gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Save code to a file.

gptme.tools.save.execute_save_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Actual save implementation.

gptme.tools.save.preview_append(content: str, path: Path | None) → str | None: Prepare preview content for append operation.

gptme.tools.save.preview_save(content: str, path: Path | None) → str | None: Prepare preview content for save operation.

Patch#

Gives the LLM agent the ability to patch text files, by using a adapted version git conflict markers.

Environment Variables:

GPTME_PATCH_RECOVERY: If set to “true” or “1”, returns the file content in error messages: when patches don’t match. This helps the assistant recover faster by seeing the actual file contents.

Instructions

To patch/modify files, we use an adapted version of git conflict markers.

This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per tool use. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.

To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.

Examples

User

patch `src/hello.py` to ask for the name of the usersrc/hello.py
def hello():
    print("Hello world")

if __name__ == "__main__":
    hello()

Assistant

patch src/hello.py
<<<<<<< ORIGINAL
    print("Hello world")
=======
    name = input("What is your name? ")
    print(f"Hello {name}")
>>>>>>> UPDATED

System

Patch applied

class gptme.tools.patch.Patch

Patch(original: str, updated: str)

__init__(original: str, updated: str) → None

diff_minimal(strip_context=False) → str: Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.

gptme.tools.patch.apply(codeblock: str, content: str) → str: Applies multiple patches in codeblock to content.

gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) → Generator[Message, None, None]: Applies the patch.

gptme.tools.patch.execute_patch_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]: Actual patch implementation.

gptme.tools.patch.preview_patch(content: str, path: Path | None) → str | None: Prepare preview content for patch operation.

Vision#

Tools for viewing images, giving the assistant vision.

Requires a model which supports vision, such as GPT-4o, Anthropic, and Llama 3.2.

gptme.tools.vision.view_image(image_path: Path | str) → Message: View an image. Large images (>1MB) will be automatically scaled down.

Screenshot#

A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.

gptme.tools.screenshot.screenshot(path: Path | None = None) → Path: Take a screenshot and save it to a file.

Browser#

Tools to let the assistant control a browser, including:

loading pages
reading their contents
searching the web
taking screenshots (Playwright only)

Two backends are available:

Playwright backend:

Full browser automation with screenshots

Installation:

pipx install 'gptme[browser]'
# We need to use the same version of Playwright as the one installed by gptme
# when downloading the browser binaries. gptme will attempt this automatically
PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
pipx run playwright==$PW_VERSION install chromium-headless-shell

Lynx backend:

Text-only browser for basic page reading and searching
No screenshot support

Installation:

# On Ubuntu
sudo apt install lynx
# On macOS
brew install lynx
# or any other way that gets you the `lynx` command

Note

This is an experimental feature. It needs some work to be more robust and useful.

Examples

Reading docs

User	how does gptme work?
Assistant	Let's read the docs. ipython read_url('https://gptme.org/docs/')

Answer question from URL with browsing

User	find out which is the latest ActivityWatch version from superuserlabs.org
Assistant	Let's browse the site. ipython read_url('https://superuserlabs.org/')
System	https://superuserlabs.org/ ... [ActivityWatch](https://activitywatch.net/) ...
Assistant	Couldn't find the answer on the page. Following link to the ActivityWatch website. ipython read_url('https://activitywatch.net/')
System	https://activitywatch.net/ ... Download latest version v0.12.2 ...
Assistant	The latest version of ActivityWatch is v0.12.2

Searching

User	who is the founder of ActivityWatch?
Assistant	Let's search for that. ipython search('ActivityWatch founder')
System	results 1. [ActivityWatch](https://activitywatch.net/) ...
Assistant	Following link to the ActivityWatch website. ipython read_url('https://activitywatch.net/')
System	https://activitywatch.net/ ... The ActivityWatch project was founded by Erik Bjäreholt in 2016. ...
Assistant	The founder of ActivityWatch is Erik Bjäreholt.

Searching with Perplexity

User	what are the latest developments in AI?
Assistant	Let me search for that using Perplexity AI. ipython search('latest developments in AI', 'perplexity')
System	result Based on recent developments, AI has seen significant advances...
Assistant	Based on the search results, here are the latest AI developments...

Take screenshot of page

User

take a screenshot of the ActivityWatch website

Assistant

Certainly! I'll use the browser tool to screenshot the ActivityWatch website.ipython
screenshot_url('https://activitywatch.net')

System

result
Screenshot saved to screenshot.png

Read URL and check browser logs

User	read this page and check if there are any console errors
Assistant	I'll read the page first and then check the browser logs. ipython read_url('https://example.com')
System	https://example.com This domain is for use in illustrative examples...
Assistant	Now let me check the browser console logs: ipython read_logs()
System	result No logs or errors captured.

gptme.tools.browser.read_logs() → str: Read browser console logs from the last read URL.

gptme.tools.browser.read_url(url: str) → str: Read a webpage in a text format.

gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) → Path: Take a screenshot of a webpage.

gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] = 'google') → str: Search for a query on a search engine.

gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] = 'google') → str: Search for a query on a search engine using Playwright.

Chats#

List, search, and summarize past conversation logs.

Examples

Search for a specific topic in past conversations

User

Can you find any mentions of "python" in our past conversations?

Assistant

Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.ipython
search_chats('python')

gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) → None

List recent chat conversations and optionally summarize them using an LLM.

Parameters:

max_results (int) – Maximum number of conversations to display.
include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.

gptme.tools.chats.read_chat(id: str, max_results: int = 5, incl_system=False) → None

Read a specific conversation log.

Parameters:

id (str) – The id of the conversation to read.
max_results (int) – Maximum number of messages to display.
incl_system (bool) – Whether to include system messages.

gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False, sort: Literal['date', 'count'] = 'date') → None

Search past conversation logs for the given query and print a summary of the results.

Parameters:

query (str) – The search query.
max_results (int) – Maximum number of conversations to display.
system (bool) – Whether to include system messages in the search.

Computer#

Warning

The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.

Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.

The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.

Features

Keyboard input simulation
Mouse control (movement, clicks, dragging)
Screen capture with automatic scaling
Cursor position tracking

Installation

On Linux, requires X11 and xdotool:

# On Debian/Ubuntu
sudo apt install xdotool

# On Arch Linux
sudo pacman -S xdotool

On macOS, uses native screencapture and external tool cliclick:

brew install cliclick

You need to give your terminal both screen recording and accessibility permissions in System Preferences.

Configuration

The tool uses these environment variables:

DISPLAY: X11 display to use (default: “:1”, Linux only)
WIDTH: Screen width (default: 1024)
HEIGHT: Screen height (default: 768)

Usage

The tool supports these actions:

Keyboard:

key: Send key sequence (e.g., “Return”, “Control_L+c”)
type: Type text with realistic delays

Mouse:

mouse_move: Move mouse to coordinates
left_click: Click left mouse button
right_click: Click right mouse button
middle_click: Click middle mouse button
double_click: Double click left mouse button
left_click_drag: Click and drag to coordinates

Screen:

screenshot: Take and view a screenshot
cursor_position: Get current mouse position

The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.

Tips for Complex Operations

For complex operations involving multiple keypresses, you can use semicolon-separated sequences with key:

Examples

Filling a login form: t:username;kp:tab;t:password;kp:return
Switching applications: cmd+tab on macOS, alt+Tab on Linux
(macOS) Opening Spotlight and searching: cmd+space;t:firefox;return

Using a single sequence for complex operations ensures proper timing and recognition of keyboard shortcuts.

Instructions

You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.

The key input syntax works consistently across platforms with:

Available actions:
- key: Send key sequence using a unified syntax:
  - Type text: "t:Hello World"
  - Press key: "return", "esc", "tab"
  - Key combination: "ctrl+c", "cmd+space"
  - Chain commands: "cmd+space;t:firefox;return"
- type: Type text with realistic delays (legacy method)
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position

Note: Key names are automatically mapped between platforms.
Common modifiers (ctrl, alt, cmd/super, shift) work consistently across platforms.

Examples

User	Take a screenshot of the desktop
Assistant	I'll capture the screen using the screenshot tool. ipython computer("screenshot")
System	Viewing image...

User	Type "Hello, World!" into the active window
Assistant	I'll type the text with realistic delays. ipython computer("type", text="Hello, World!")
System	Typed text: Hello, World!

User	Move the mouse to coordinates (100, 200) and click
Assistant	I'll move the mouse and perform a left click. ipython computer("mouse_move", coordinate=(100, 200))
System	Moved mouse to 100,200 ipython computer("left_click")
System	Performed left_click

User	Get the current mouse position
Assistant	I'll get the cursor position. ipython computer("cursor_position")
System	Cursor position: X=512,Y=384

User	Double-click at current position
Assistant	I'll perform a double-click. ipython computer("double_click")
System	Performed double_click

User	Open a new browser tab
Assistant	I'll open a new browser tab. ipython computer("key", text="ctrl+t")
System	Sent key sequence: ctrl+t

class gptme.tools.computer.ComboOperation

class gptme.tools.computer.KeyOperation

class gptme.tools.computer.TextOperation

gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) → Message | None

Perform computer interactions in X11 or macOS environments.

Parameters:

action – The type of action to perform
text – Text to type or key sequence to send
coordinate – X,Y coordinates for mouse actions

RAG#

RAG (Retrieval-Augmented Generation) tool for context-aware assistance.

The RAG tool provides context-aware assistance by indexing and semantically searching text files.

Installation

The RAG tool requires the gptme-rag CLI to be installed:

pipx install gptme-rag

Configuration

Configure RAG in your gptme.toml:

[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.

Features

Manual Search and Indexing
- Index project documentation with rag_index
- Search indexed documents with rag_search
- Check index status with rag_status
Automatic Context Enhancement
- Retrieves semantically similar documents
- Preserves conversation flow with hidden context messages

Instructions

Use RAG to index and semantically search through text files such as documentation and code.

Examples

User	Index the current directory
Assistant	Let me index the current directory with RAG. ipython rag_index()
System	Indexed 1 paths

User

Search for documentation about functions

Assistant

I'll search for function-related documentation.ipython
rag_search("function documentation")

System

### docs/api.md
Functions are documented using docstrings...

User	Show index status
Assistant	I'll check the current status of the RAG index. ipython rag_status()
System	Index contains 42 documents

gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) → Message: Get relevant context chunks from RAG for the user query.

gptme.tools.rag.init() → ToolSpec: Initialize the RAG tool.

gptme.tools.rag.rag_enhance_messages(messages: list[Message], workspace: Path | None = None) → list[Message]: Enhance messages with context from RAG.

gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) → str: Index documents in specified paths.

gptme.tools.rag.rag_search(query: str, return_full: bool = False) → str: Search indexed documents.

gptme.tools.rag.rag_status() → str: Show index status.

TTS#

Text-to-speech (TTS) tool for generating audio from text.

Uses Kokoro for local TTS generation.

Usage

# Install gptme with TTS extras
pipx install gptme[tts]

# Clone gptme repository
git clone https://github.com/gptme/gptme.git
cd gptme

# Run the Kokoro TTS server (needs uv installed)
./scripts/tts_server.py

# Start gptme (should detect the running TTS server)
gptme 'hello, testing tts'

Environment Variables

GPTME_TTS_VOICE: Set the voice to use for TTS. Available voices depend on the TTS server.
GPTME_VOICE_FINISH: If set to “true” or “1”, waits for speech to finish before exiting. This is useful when you want to ensure the full message is spoken.

gptme.tools.tts.clean_for_speech(content: str) → str

Clean content for speech by removing:

<thinking> tags and their content
Tool use blocks (`tool ...`)
Italic markup
Additional (details) that may not need to be spoken
Emojis and other non-speech content
Hash symbols from Markdown headers (e.g., “# Header” → “Header”)

Returns the cleaned content suitable for speech.

gptme.tools.tts.ensure_tts_thread(): Ensure TTS processor thread is running.

gptme.tools.tts.is_available() → bool: Check if the TTS server is available.

gptme.tools.tts.join_short_sentences(sentences: list[str], min_length: int = 100, max_length: int | None = 300) → list[str]

Join consecutive sentences that are shorter than min_length, or up to max_length.

Parameters:

sentences – List of sentences to potentially join
min_length – Minimum length threshold for joining short sentences
max_length – Maximum length for combined sentences. If specified, tries to make sentences as long as possible up to this limit

Returns:

List of sentences, with short ones combined or optimized for max length

gptme.tools.tts.set_speed(speed): Set the speaking speed (0.5 to 2.0, default 1.3).

gptme.tools.tts.set_volume(volume): Set the volume for TTS playback (0.0 to 1.0).

gptme.tools.tts.speak(text, block=False, interrupt=True, clean=True)

Speak text using Kokoro TTS server.

The TTS system supports:

Speed control via set_speed(0.5 to 2.0)
Volume control via set_volume(0.0 to 1.0)
Automatic chunking of long texts
Non-blocking operation with optional blocking mode
Interruption of current speech
Background processing of TTS requests

Parameters:

text – Text to speak
block – If True, wait for audio to finish playing
interrupt – If True, stop current speech and clear queue before speaking
clean – If True, clean text for speech (remove markup, emojis, etc.)

Example

>>> from gptme.tools.tts import speak, set_speed, set_volume
>>> set_volume(0.8)  # Set comfortable volume
>>> set_speed(1.2)   # Slightly faster speech
>>> speak("Hello, world!")  # Non-blocking by default
>>> speak("Important message!", interrupt=True)  # Interrupts previous speech

gptme.tools.tts.split_text(text: str) → list[str]

Split text into sentences, respecting paragraphs, markdown lists, and decimal numbers.

This function handles: - Paragraph breaks - Markdown list items (-, *, 1.) - Decimal numbers (won’t split 3.14) - Sentence boundaries (.!?)

Returns:: List of sentences and paragraph breaks (empty strings)

gptme.tools.tts.stop() → None: Stop audio playback and clear queues.

MCP#

The Model Context Protocol (MCP) allows you to extend gptme with custom tools through external servers. See MCP for configuration and usage details.

Tools

Contents

Tools#

Overview#

📁 File System#

💻 Code & Development#

🌐 Web & Research#

👁️ Visual & Interactive#

⚡ Advanced Workflows#

Combinations#

Shell#

Python#

Tmux#

Subagent#

Read#

Save#

Patch#

Vision#

Screenshot#

Browser#

Chats#

Computer#

RAG#

TTS#

MCP#