Tools#
Tools available in gptme.
The tools can be grouped into the following categories:
- Execution: Shell, Python, Tmux, Subagent
- Files: Read, Save, Patch
- Network: Browser (access the web)
- Vision: Vision (view images), Screenshot (take screenshots), Computer (control the computer)
- Other: Chats, RAG, TTS
Shell#
The assistant can execute shell commands with bash by outputting code blocks with shell as the language.
Instructions
The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.
These programs are available, among others:
- apt-get
- docker
- git
- pandoc
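For instance, the first example below would have the assistant emit a code block like this (a minimal sketch of the format; the exact command depends on the conversation):

```shell
ls
```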
Examples
User: list the current directory
Assistant: To list the files in the current directory, use `ls`:
System: Ran command: `ls`
The assistant can learn context by exploring the filesystem
User: learn about the project
Assistant: Let's start by checking the files
System: …
Assistant: Now let's check the README
System: …
Assistant: Now let's check main.py
System: …
Assistant: The project is...
Create Vue project
User: Create a new Vue project with TypeScript and Pinia named fancy-project
Assistant: Sure! Let's create a new Vue project with TypeScript and Pinia named fancy-project:
System: …
- gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Executes a shell command and returns the output.
- gptme.tools.shell.execute_shell_impl(cmd: str, _: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Execute shell command and format output.
- gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) → str
Get the shell command from code/args/kwargs.
- gptme.tools.shell.preview_shell(cmd: str, _: Path | None) → str
Prepare preview for shell command.
Python#
The assistant can execute Python code blocks.
It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.
Instructions
Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.
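The fib example below corresponds to a code block along these lines (a hedged sketch, assuming the `ipython` block tag; the exact code the assistant writes may differ):

```ipython
def fib(n: int) -> int:
    # Iterative Fibonacci: fib(0) = 0, fib(1) = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

fib(10)  # the result of the last expression is returned: 55
```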
Examples
Result of the last expression will be returned
User: What is 2 + 2?
Assistant: …
System: Executed code block.
Write a function and call it
User: compute fib 10
Assistant: To compute the 10th Fibonacci number, we can run the following code:
System: Executed code block.
- class gptme.tools.python.TeeIO
- __init__(original_stream)
- write(s)
Write string to file.
Returns the number of characters written, which is always equal to the length of the string.
- gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) → Generator[Message, None, None]
Executes a python codeblock and returns the output.
- gptme.tools.python.get_installed_python_libraries() → list[str]
Check if a select list of Python libraries are installed.
- gptme.tools.python.register_function(func: T) → T
Decorator to register a function to be available in the IPython instance.
Tmux#
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands: `npm run dev`, `python3 server.py`, `python3 train.py`, etc.
It allows for inspecting pane contents and sending input.
Instructions
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.
Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions
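A sketch of what a tmux invocation might look like as a code block, using the commands listed above (quoting conventions may differ):

```tmux
new_session 'npm run dev'
```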
Examples
Managing a dev server
User: Start the dev server
Assistant: Certainly! To start the dev server we should use tmux:
System: Running `npm run dev` in session gptme_1
User: Can you show me the current content of the pane?
Assistant: Of course! Let's inspect the pane content:
System: …
User: Stop the dev server
Assistant: I'll send 'Ctrl+C' to the pane to stop the server:
System: Sent 'C-c' to pane 0
Get info from ncurses applications
User: start top and give me a summary
Assistant: Sure! Let's start the top command in a tmux session:
System: Running `top` in session gptme_1.
Assistant: The load is...
Send keys to a session
User: start ipython
Assistant: Let's start an ipython session:
System: Running `ipython` in session 2.
User: Run 'print("Hello, world!")' in the ipython session
- gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Executes a command in tmux and returns the output.
Subagent#
A subagent tool for gptme
Lets gptme break down a task into smaller parts, and delegate them to subagents.
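Judging from the API below, delegating to a subagent from the Python tool might look like this (a sketch; the agent id and prompt are illustrative):

```ipython
subagent("fib-13", "Compute the 13th Fibonacci number.")
result = subagent_wait("fib-13")  # blocks until done, 1 minute timeout
print(result)  # e.g. {"status": "success", "result": "..."}
```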
Examples
User: compute fib 13 using a subagent
Assistant: Starting a subagent to compute the 13th Fibonacci number.
System: Subagent started successfully.
Assistant: Now we need to wait for the subagent to finish the task.
System: {"status": "success", "result": "The 13th Fibonacci number is 233"}
- class gptme.tools.subagent.ReturnType
ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)
- __init__(status: Literal['running', 'success', 'failure'], result: str | None = None) → None
- class gptme.tools.subagent.Subagent
Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)
- __init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) → None
- gptme.tools.subagent.subagent(agent_id: str, prompt: str)
Runs a subagent and returns the resulting JSON output.
- gptme.tools.subagent.subagent_status(agent_id: str) → dict
Returns the status of a subagent.
- gptme.tools.subagent.subagent_wait(agent_id: str) → dict
Waits for a subagent to finish. Timeout is 1 minute.
Read#
Read the contents of a file.
Instructions
Read the content of the given file. Use the `cat` command with the `shell` tool.
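The example below would thus have the assistant emit a shell block like this (a minimal sketch):

```shell
cat file.txt
```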
Examples
User: read file.txt
Assistant: …
Save#
Gives the assistant the ability to save whole files, or append to them.
Instructions
Create or overwrite a file with the given content.
The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.
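A sketch of the block format, assuming the target path is given on the fence line (the script contents are illustrative):

```save hello.py
print("Hello world")
```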
Examples
User: write a hello world script to hello.py
Assistant: …
System: Saved to `hello.py`
User: make it all-caps
Assistant: …
System: Saved to `hello.py`
Instructions
Append the given content to a file.
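Presumably the same block format applies, with an `append` tag (a sketch; contents illustrative):

```append hello.py
print("Hello world")
```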
Examples
User: append a print "Hello world" to hello.py
Assistant: …
System: Appended to `hello.py`
- gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Append code to a file.
- gptme.tools.save.execute_append_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Actual append implementation.
- gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Save code to a file.
- gptme.tools.save.execute_save_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Actual save implementation.
- gptme.tools.save.preview_append(content: str, path: Path | None) → str | None
Prepare preview content for append operation.
- gptme.tools.save.preview_save(content: str, path: Path | None) → str | None
Prepare preview content for save operation.
Patch#
Gives the LLM agent the ability to patch text files, using an adapted version of git conflict markers.
- Environment Variables:
- GPTME_PATCH_RECOVERY: If set to “true” or “1”, returns the file content in error messages
when patches don’t match. This helps the assistant recover faster by seeing the actual file contents.
Instructions
To patch/modify files, we use an adapted version of git conflict markers.
This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per tool use. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.
To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.
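A sketch of the conflict-marker format, matching the example below (the file contents shown are hypothetical):

```patch src/hello.py
<<<<<<< ORIGINAL
print("Hello world")
=======
name = input("What is your name? ")
print(f"Hello {name}")
>>>>>>> UPDATED
```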
Examples
User: patch `src/hello.py` to ask for the name of the user
Assistant: …
System: Patch applied
- class gptme.tools.patch.Patch
Patch(original: str, updated: str)
- __init__(original: str, updated: str) → None
- diff_minimal(strip_context=False) → str
Show a minimal diff of the patch. Note that a minimal diff isn't necessarily a unique diff.
- gptme.tools.patch.apply(codeblock: str, content: str) → str
Applies multiple patches in `codeblock` to `content`.
- gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) → Generator[Message, None, None]
Applies the patch.
- gptme.tools.patch.execute_patch_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) → Generator[Message, None, None]
Actual patch implementation.
- gptme.tools.patch.preview_patch(content: str, path: Path | None) → str | None
Prepare preview content for patch operation.
Vision#
Tools for viewing images, giving the assistant vision.
Requires a model with vision support, such as GPT-4o, Anthropic's Claude, or Llama 3.2.
- gptme.tools.vision.view_image(image_path: Path | str) → Message
View an image. Large images (>1MB) will be automatically scaled down.
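Assuming the function is exposed in the Python tool, viewing an image might look like this (a sketch; the path is illustrative):

```ipython
view_image("image.png")
```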
Screenshot#
A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.
- gptme.tools.screenshot.screenshot(path: Path | None = None) → Path
Take a screenshot and save it to a file.
Browser#
Tools to let the assistant control a browser, including:
- loading pages
- reading their contents
- searching the web
- taking screenshots (Playwright only)
Two backends are available:
- Playwright backend:
Full browser automation with screenshots.
Installation:
pipx install 'gptme[browser]'
# We need to use the same version of Playwright as the one installed by gptme
# when downloading the browser binaries. gptme will attempt this automatically.
PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
pipx run playwright==$PW_VERSION install chromium-headless-shell
- Lynx backend:
Text-only browser for basic page reading and searching; no screenshot support.
Installation:
# On Ubuntu
sudo apt install lynx
# On macOS
brew install lynx
# or any other way that gets you the `lynx` command
Note
This is an experimental feature. It needs some work to be more robust and useful.
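Assuming the functions listed at the end of this section are exposed in the Python tool, the examples below might translate to calls like these (a sketch; URLs and queries are illustrative):

```ipython
read_url("https://superuserlabs.org")  # read a page as text
search("ActivityWatch founder", engine="duckduckgo")
screenshot_url("https://activitywatch.net")  # Playwright backend only
```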
Examples
Answer question from URL with browsing
User: find out which is the latest ActivityWatch version from superuserlabs.org
Assistant: Let's browse the site.
System: …
Assistant: Couldn't find the answer on the page. Following link to the ActivityWatch website.
System: …
Assistant: The latest version of ActivityWatch is v0.12.2
Searching
User: who is the founder of ActivityWatch?
Assistant: Let's search for that.
System: …
Assistant: Following link to the ActivityWatch website.
System: …
Assistant: The founder of ActivityWatch is Erik Bjäreholt.
Take screenshot of page
User: take a screenshot of the ActivityWatch website
Assistant: Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
System: …
- gptme.tools.browser.read_url(url: str) → str
Read a webpage in a text format.
- gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) → Path
Take a screenshot of a webpage.
- gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo'] = 'google') → str
Search for a query on a search engine.
- gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo'] = 'google') → str
Search for a query on a search engine using Playwright.
Chats#
List, search, and summarize past conversation logs.
Examples
Search for a specific topic in past conversations
User: Can you find any mentions of "python" in our past conversations?
Assistant: Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
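The call the assistant would emit here is presumably along these lines (a sketch based on the API below):

```ipython
search_chats("python")
```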
- gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) → None
List recent chat conversations and optionally summarize them using an LLM.
- Parameters:
max_results (int) – Maximum number of conversations to display.
include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.
- gptme.tools.chats.read_chat(conversation: str, max_results: int = 5, incl_system=False) → None
Read a specific conversation log.
- Parameters:
conversation (str) – The name of the conversation to read.
max_results (int) – Maximum number of messages to display.
incl_system (bool) – Whether to include system messages.
- gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False, sort: Literal['date', 'count'] = 'date') → None
Search past conversation logs for the given query and print a summary of the results.
- Parameters:
query (str) – The search query.
max_results (int) – Maximum number of conversations to display.
system (bool) – Whether to include system messages in the search.
Computer#
Warning
The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.
Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.
The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.
Features
- Keyboard input simulation
- Mouse control (movement, clicks, dragging)
- Screen capture with automatic scaling
- Cursor position tracking
Installation
On Linux, requires X11 and xdotool:
# On Debian/Ubuntu
sudo apt install xdotool
# On Arch Linux
sudo pacman -S xdotool
On macOS, uses the native screencapture command and the external tool cliclick:
brew install cliclick
You need to give your terminal both screen recording and accessibility permissions in System Preferences.
Configuration
The tool uses these environment variables:
- DISPLAY: X11 display to use (default: ":1", Linux only)
- WIDTH: Screen width (default: 1024)
- HEIGHT: Screen height (default: 768)
Usage
The tool supports these actions:
- Keyboard:
  - key: Send key sequence (e.g., "Return", "Control_L+c")
  - type: Type text with realistic delays
- Mouse:
  - mouse_move: Move mouse to coordinates
  - left_click: Click left mouse button
  - right_click: Click right mouse button
  - middle_click: Click middle mouse button
  - double_click: Double click left mouse button
  - left_click_drag: Click and drag to coordinates
- Screen:
  - screenshot: Take and view a screenshot
  - cursor_position: Get current mouse position
The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
Instructions
You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.
Available actions:
- key: Send key sequence (e.g., "Return", "Control_L+c")
- type: Type text with realistic delays
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
Note: Key names are automatically mapped between platforms.
Common modifiers like Control_L, Alt_L, Super_L work on both platforms.
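Based on the `computer` signature documented below, the examples that follow might correspond to calls like these (a sketch):

```ipython
computer("screenshot")                         # take and view a screenshot
computer("type", text="Hello, World!")         # type text with realistic delays
computer("mouse_move", coordinate=(100, 200))  # move mouse to coordinates
computer("left_click")                         # click left mouse button
computer("key", text="Control_L+c")            # send a key sequence
```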
Examples
User: Take a screenshot of the desktop
Assistant: I'll capture the screen using the screenshot tool.
System: Viewing image...
User: Type "Hello, World!" into the active window
Assistant: I'll type the text with realistic delays.
System: Typed text: Hello, World!
User: Move the mouse to coordinates (100, 200) and click
Assistant: I'll move the mouse and perform a left click.
System: Moved mouse to 100,200
System: Performed left_click
User: Press Ctrl+C
Assistant: I'll send the Control+C key sequence.
System: Sent key sequence: Control_L+c
User: Get the current mouse position
Assistant: I'll get the cursor position.
System: Cursor position: X=512,Y=384
User: Double-click at current position
Assistant: I'll perform a double-click.
System: Performed double_click
- gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) → Message | None
Perform computer interactions in X11 or macOS environments.
- Parameters:
action – The type of action to perform
text – Text to type or key sequence to send
coordinate – X,Y coordinates for mouse actions
RAG#
RAG (Retrieval-Augmented Generation) tool for context-aware assistance.
The RAG tool provides context-aware assistance by indexing and searching project documentation.
Installation
The RAG tool requires the gptme-rag CLI to be installed:
pipx install gptme-rag
Configuration
Configure RAG in your gptme.toml:
[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.
Features
- Manual Search and Indexing
  - Index project documentation with rag_index
  - Search indexed documents with rag_search
  - Check index status with rag_status
- Automatic Context Enhancement
  - Retrieves semantically similar documents
  - Preserves conversation flow with hidden context messages
Instructions
Use RAG to index and search project documentation.
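Based on the functions documented below, the examples that follow might correspond to calls like these (a sketch; paths and queries are illustrative):

```ipython
rag_index("docs/")       # index documents under docs/
rag_search("functions")  # search the indexed documents
rag_status()             # show index status
```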
Examples
User: Index the current directory
Assistant: Let me index the current directory with RAG.
System: Indexed 1 paths
User: Search for documentation about functions
Assistant: I'll search for function-related documentation.
System: ### docs/api.md Functions are documented using docstrings...
User: Show index status
Assistant: I'll check the current status of the RAG index.
System: Index contains 42 documents
- gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) → Message
Get relevant context chunks from RAG for the user query.
- gptme.tools.rag.init() → ToolSpec
Initialize the RAG tool.
- gptme.tools.rag.rag_enhance_messages(messages: list[Message], workspace: Path | None = None) → list[Message]
Enhance messages with context from RAG.
- gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) → str
Index documents in specified paths.
- gptme.tools.rag.rag_search(query: str, return_full: bool = False) → str
Search indexed documents.
- gptme.tools.rag.rag_status() → str
Show index status.
TTS#
Text-to-speech (TTS) tool for generating audio from text.
Uses Kokoro for local TTS generation.
Note
To use this tool, you also need to run the Kokoro TTS server: ./scripts/tts_server.py
- Environment Variables:
  - GPTME_TTS_VOICE: Set the voice to use for TTS. Available voices depend on the TTS server.
  - GPTME_VOICE_FINISH: If set to "true" or "1", waits for speech to finish before exiting. This is useful when you want to ensure the full message is spoken.
- gptme.tools.tts.clean_for_speech(content: str) → str
Clean content for speech by removing:
- <thinking> tags and their content
- Tool use blocks (`tool ...`)
- Italic markup
- Additional (details) that may not need to be spoken
- Emojis and other non-speech content
Returns the cleaned content suitable for speech.
- gptme.tools.tts.clear_queue() → None
Clear the audio queue without stopping current playback.
- gptme.tools.tts.ensure_threads()
Ensure both playback and TTS processor threads are running.
- gptme.tools.tts.get_output_device() → tuple[int, int]
Get the best available output device and its sample rate.
- Returns:
(device_index, sample_rate)
- Return type:
tuple
- Raises:
RuntimeError – If no suitable output device is found
- gptme.tools.tts.join_short_sentences(sentences: list[str], min_length: int = 100) → list[str]
Join consecutive sentences that are shorter than min_length.
- Parameters:
sentences – List of sentences to potentially join
min_length – Minimum length threshold for joining
- Returns:
List of sentences, with short ones combined
- gptme.tools.tts.set_speed(speed)
Set the speaking speed (0.5 to 2.0, default 1.3).
- gptme.tools.tts.set_volume(volume)
Set the volume for TTS playback (0.0 to 1.0).
- gptme.tools.tts.speak(text, block=False, interrupt=True, clean=True)
Speak text using Kokoro TTS server.
The TTS system supports:
- Speed control via set_speed (0.5 to 2.0)
- Volume control via set_volume (0.0 to 1.0)
- Automatic chunking of long texts
- Non-blocking operation with optional blocking mode
- Interruption of current speech
- Background processing of TTS requests
- Parameters:
text – Text to speak
block – If True, wait for audio to finish playing
interrupt – If True, stop current speech and clear queue before speaking
clean – If True, clean text for speech (remove markup, emojis, etc.)
Example
>>> from gptme.tools.tts import speak, set_speed, set_volume
>>> set_volume(0.8)  # Set comfortable volume
>>> set_speed(1.2)  # Slightly faster speech
>>> speak("Hello, world!")  # Non-blocking by default
>>> speak("Important message!", interrupt=True)  # Interrupts previous speech
- gptme.tools.tts.split_text(text: str) → list[str]
Split text into sentences, respecting paragraphs, markdown lists, and decimal numbers.
This function handles:
- Paragraph breaks
- Markdown list items (-, *, 1.)
- Decimal numbers (won't split 3.14)
- Sentence boundaries (.!?)
- Returns:
List of sentences and paragraph breaks (empty strings)
- gptme.tools.tts.stop() → None
Stop audio playback and clear queues.