Tools#

Tools available in gptme.

The tools can be grouped into the following categories:

Shell#

The assistant can execute shell commands with bash by outputting code blocks with shell as the language.

Instructions

The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.

These programs are available, among others:
- apt-get
- docker
- git
- pandoc

Examples

User
list the current directory
Assistant
To list the files in the current directory, use `ls`:
shell
ls
System
Ran command: `ls`
shell
file1.txt
file2.txt

The assistant can learn context by exploring the filesystem

User
learn about the project
Assistant
Let's start by checking the files
shell
git ls-files
System
stdout
README.md
main.py
Assistant
Now let's check the README
shell
cat README.md
System
stdout
(contents of README.md)
Assistant
Now we check main.py
shell
cat main.py
System
stdout
(contents of main.py)
Assistant
The project is...

Create vue project

User
Create a new vue project with typescript and pinia named fancy-project
Assistant
Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project:
shell
npm init vue@latest fancy-project --yes -- --typescript --pinia
System
stdout
> npx
> create-vue

Vue.js - The Progressive JavaScript Framework

Scaffolding project in ./fancy-project...
gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Executes a shell command and returns the output.

gptme.tools.shell.execute_shell_impl(cmd: str, _: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Execute shell command and format output.

gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) str

Get the shell command from code/args/kwargs.

gptme.tools.shell.preview_shell(cmd: str, _: Path | None) str

Prepare preview for shell command.
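To illustrate what "stateful" means for the shell tool, here is a minimal sketch that keeps one bash process alive across commands, so that variables and the working directory persist between calls. It is not gptme's implementation: the `StatefulShell` class and the end-of-output marker are assumptions made for this example.

```python
import subprocess
import uuid


class StatefulShell:
    """Hypothetical helper: one long-lived bash process shared by all commands."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            text=True,
            bufsize=1,
        )

    def run(self, cmd: str) -> str:
        # A unique marker line tells us where this command's output ends.
        # Assumes the command's output ends with a newline.
        marker = f"__done_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{cmd}\necho {marker}\n")
        self.proc.stdin.flush()
        lines = []
        for line in iter(self.proc.stdout.readline, ""):
            if line.rstrip("\n") == marker:
                break
            lines.append(line)
        return "".join(lines)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
```

Because every call goes to the same process, an `export` or `cd` in one command is still in effect for the next one.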

Python#

The assistant can execute Python code blocks.

It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.

Instructions

Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.

Examples

Result of the last expression will be returned

User
What is 2 + 2?
Assistant
ipython
2 + 2
System
Executed code block.
result
4

Write a function and call it

User
compute fib 10
Assistant
To compute the 10th Fibonacci number, we can run the following code:
ipython
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(10)
System
Executed code block.
result
55
class gptme.tools.python.TeeIO
__init__(original_stream)
write(s)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.

gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool] = <function <lambda>>) Generator[Message, None, None]

Executes a python codeblock and returns the output.

gptme.tools.python.get_installed_python_libraries() list[str]

Check if a select list of Python libraries are installed.

gptme.tools.python.register_function(func: T) T

Decorator to register a function to be available in the IPython instance.
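The REPL-like behaviour described above (state persists between calls, and the value of the last expression is returned) can be sketched with the standard library alone. This is an illustration under stated assumptions, not how gptme drives IPython: `run_cell`, `register`, and the `_registered` dict are hypothetical names.

```python
import ast
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T", bound=Callable[..., Any])

# Hypothetical registry of helpers to expose inside the session.
_registered: Dict[str, Callable[..., Any]] = {}


def register(func: T) -> T:
    """Decorator: record func so it can be injected into the session namespace."""
    _registered[func.__name__] = func
    return func


def run_cell(source: str, namespace: Dict[str, Any]) -> Any:
    """Execute source in namespace; return the last expression's value, if any."""
    namespace.update(_registered)
    tree = ast.parse(source)
    last_expr = None
    # If the cell ends with a bare expression, evaluate it separately
    # so its value can be returned, as a REPL would display it.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(tree.body.pop().value)
    exec(compile(tree, "<cell>", "exec"), namespace)
    if last_expr is not None:
        return eval(compile(last_expr, "<cell>", "eval"), namespace)
    return None
```

Reusing the same `namespace` dict across calls is what makes a function defined in one cell callable from the next.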

Tmux#

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.

This tool is suitable to run long-running commands or interactive applications that require user input. Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc. It allows for inspecting pane contents and sending input.

Instructions

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.

This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.

Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions

Examples

Managing a dev server

User
Start the dev server
Assistant
Certainly! To start the dev server we should use tmux:
tmux
new_session 'npm run dev'
System
Running `npm run dev` in session gptme_1
User
Can you show me the current content of the pane?
Assistant
Of course! Let's inspect the pane content:
tmux
inspect_pane gptme_1
System
output
Server is running on localhost:5600
User
Stop the dev server
Assistant
I'll send 'Ctrl+C' to the pane to stop the server:
tmux
send_keys gptme_1 C-c
System
Sent 'C-c' to pane gptme_1

Get info from ncurses applications

User
start top and give me a summary
Assistant
Sure! Let's start the top command in a tmux session:
tmux
new_session 'top'
System
Running `top` in session gptme_1.
output
(output from top shown here)
Assistant
The load is...

Send keys to a session

User
start ipython
Assistant
Let's start an ipython session:
tmux
new_session 'ipython'
System
Running `ipython` in session 2.
output
(output from ipython shown here)
User
Run 'print("Hello, world!")' in the ipython session
Assistant
tmux
send_keys 2 'print("Hello, world!")' Enter
gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Executes a command in tmux and returns the output.
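The tool commands listed above correspond closely to subcommands of the tmux CLI. The sketch below shows one plausible mapping; the `tmux_argv` helper is hypothetical, and unlike the tool's `new_session` it takes the session name explicitly rather than generating one. It only builds argument lists, so it can be inspected without tmux installed.

```python
from typing import List


def tmux_argv(command: str, *args: str) -> List[str]:
    """Hypothetical mapping from tool commands to tmux CLI invocations."""
    if command == "new_session":
        session_id, shell_command = args
        # -d starts the session detached; -s names it.
        return ["tmux", "new-session", "-d", "-s", session_id, shell_command]
    if command == "send_keys":
        session_id, *keys = args
        return ["tmux", "send-keys", "-t", session_id, *keys]
    if command == "inspect_pane":
        (session_id,) = args
        # -p prints the pane contents to stdout.
        return ["tmux", "capture-pane", "-p", "-t", session_id]
    if command == "kill_session":
        (session_id,) = args
        return ["tmux", "kill-session", "-t", session_id]
    if command == "list_sessions":
        return ["tmux", "list-sessions"]
    raise ValueError(f"unknown command: {command}")
```

Each returned list could be passed to `subprocess.run(argv, capture_output=True, text=True)` to execute it.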

Subagent#

A subagent tool for gptme

Lets gptme break down a task into smaller parts, and delegate them to subagents.

Examples

User
compute fib 13 using a subagent
Assistant
Starting a subagent to compute the 13th Fibonacci number.
ipython
subagent("fib-13", "compute the 13th Fibonacci number")
System
Subagent started successfully.
Assistant
Now we need to wait for the subagent to finish the task.
ipython
subagent_wait("fib-13")
System
{"status": "success", "result": "The 13th Fibonacci number is 233"}.
class gptme.tools.subagent.ReturnType

ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)

__init__(status: Literal['running', 'success', 'failure'], result: str | None = None) None
class gptme.tools.subagent.Subagent

Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)

__init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) None
gptme.tools.subagent.subagent(agent_id: str, prompt: str)

Runs a subagent and returns the resulting JSON output.

gptme.tools.subagent.subagent_status(agent_id: str) dict

Returns the status of a subagent.

gptme.tools.subagent.subagent_wait(agent_id: str) dict

Waits for a subagent to finish. Timeout is 1 minute.
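The start/status/wait pattern documented above can be sketched with a background thread per task and a status record that moves from running to success or failure. This is only an illustration: `WorkerResult`, `start_worker`, `worker_status`, and `wait_worker` are hypothetical names, and the task here is a plain callable rather than an LLM conversation.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Literal, Optional


@dataclass
class WorkerResult:
    status: Literal["running", "success", "failure"]
    result: Optional[str] = None


_results: Dict[str, WorkerResult] = {}
_threads: Dict[str, threading.Thread] = {}


def start_worker(worker_id: str, task: Callable[[], str]) -> None:
    """Run task in a background thread and track its outcome."""
    _results[worker_id] = WorkerResult("running")

    def run() -> None:
        try:
            _results[worker_id] = WorkerResult("success", task())
        except Exception as exc:  # record the failure instead of losing it
            _results[worker_id] = WorkerResult("failure", str(exc))

    thread = threading.Thread(target=run, daemon=True)
    _threads[worker_id] = thread
    thread.start()


def worker_status(worker_id: str) -> WorkerResult:
    return _results[worker_id]


def wait_worker(worker_id: str, timeout: float = 60.0) -> WorkerResult:
    """Block until the worker finishes or the timeout (in seconds) expires."""
    _threads[worker_id].join(timeout)
    return _results[worker_id]
```

If the timeout expires first, `wait_worker` returns the record still marked as running, so the caller can decide whether to wait again.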

Read#

Read the contents of a file.

Instructions

Read the content of the given file. Use the `cat` command with the `shell` tool.

Examples

User
read file.txt
Assistant
shell
cat file.txt

Save#

Gives the assistant the ability to save whole files, or append to them.

Instructions

Create or overwrite a file with the given content.

The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.

Examples

User
write a hello world script to hello.py
Assistant
save hello.py
print("Hello world")
System
Saved to `hello.py`
User
make it all-caps
Assistant
save hello.py
print("HELLO WORLD")
System
Saved to `hello.py`

Instructions

Append the given content to a file.

Examples

User
append a print "Hello world" to hello.py
Assistant
append hello.py
print("Hello world")
System
Appended to `hello.py`
gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Append code to a file.

gptme.tools.save.execute_append_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Actual append implementation.

gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Save code to a file.

gptme.tools.save.execute_save_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Actual save implementation.

gptme.tools.save.preview_append(content: str, path: Path | None) str | None

Prepare preview content for append operation.

gptme.tools.save.preview_save(content: str, path: Path | None) str | None

Prepare preview content for save operation.
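As a rough sketch of the save and append semantics described above (overwrite versus add to the end, with relative paths resolved against the current directory at call time), the two operations might look like this with `pathlib`. The helper names `save_text` and `append_text` are hypothetical, and creating missing parent directories is an assumption of this example.

```python
from pathlib import Path
from typing import Union


def _resolve(path: Union[str, Path]) -> Path:
    """Resolve a relative path against the current working directory."""
    target = Path(path).expanduser()
    return target if target.is_absolute() else Path.cwd() / target


def save_text(path: Union[str, Path], content: str) -> Path:
    """Create or overwrite the file with content."""
    target = _resolve(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target


def append_text(path: Union[str, Path], content: str) -> Path:
    """Add content to the end of the file, creating it if needed."""
    target = _resolve(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a") as f:
        f.write(content)
    return target
```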

Patch#

Gives the LLM agent the ability to patch text files by using an adapted version of git conflict markers.

Instructions

To patch/modify files, we use an adapted version of git conflict markers.

This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per tool use. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.

To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.

Examples

User
patch `src/hello.py` to ask for the name of the user
src/hello.py
def hello():
    print("Hello world")

if __name__ == "__main__":
    hello()
Assistant
patch src/hello.py
<<<<<<< ORIGINAL
    print("Hello world")
=======
    name = input("What is your name? ")
    print(f"Hello {name}")
>>>>>>> UPDATED
System
Patch applied
class gptme.tools.patch.Patch

Patch(original: str, updated: str)

__init__(original: str, updated: str) None
diff_minimal(strip_context=False) str

Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.

gptme.tools.patch.apply(codeblock: str, content: str) str

Applies multiple patches in codeblock to content.

gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool] = <function <lambda>>) Generator[Message, None, None]

Applies the patch.

gptme.tools.patch.execute_patch_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]

Actual patch implementation.

gptme.tools.patch.preview_patch(content: str, path: Path | None) str | None

Prepare preview content for patch operation.
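To make the ORIGINAL/UPDATED format concrete, here is a minimal sketch of applying a single patch block to a file's content. It is not gptme's implementation: `apply_patch` is a hypothetical helper that handles exactly one block, replaces only the first match, and fails loudly when the original text is not found.

```python
ORIGINAL = "<<<<<<< ORIGINAL\n"
DIVIDER = "\n=======\n"
UPDATED = "\n>>>>>>> UPDATED"


def apply_patch(patch: str, content: str) -> str:
    """Apply one ORIGINAL/UPDATED block to content (hypothetical helper)."""
    patch = patch.strip()
    if not patch.startswith(ORIGINAL) or DIVIDER not in patch or UPDATED not in patch:
        raise ValueError("malformed patch block")
    # Everything between the opening and closing markers.
    body = patch[len(ORIGINAL):patch.index(UPDATED)]
    original, updated = body.split(DIVIDER, 1)
    if original not in content:
        raise ValueError("original text not found in content")
    # Replace only the first occurrence to keep the change scoped.
    return content.replace(original, updated, 1)
```

The match is exact, including indentation, which is one reason placeholders in the ORIGINAL part make a patch fail.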

Screenshot#

A simple screenshot tool, using screencapture on macOS and scrot on Linux.

gptme.tools.screenshot.screenshot(path: Path | None = None) Path

Take a screenshot and save it to a file.
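The platform split mentioned above (screencapture on macOS, scrot on Linux) could be expressed roughly as follows. The `screenshot_argv` helper is hypothetical and only builds the command line; both programs accept the output file path as an argument.

```python
import platform
from pathlib import Path
from typing import List, Optional


def screenshot_argv(path: Path, system: Optional[str] = None) -> List[str]:
    """Pick the screenshot command for the given (or current) platform."""
    system = system or platform.system()
    if system == "Darwin":  # macOS
        return ["screencapture", str(path)]
    if system == "Linux":
        return ["scrot", str(path)]
    raise NotImplementedError(f"no screenshot command known for {system}")
```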

Browser#

Tools to let the assistant control a browser, including:
  • loading pages

  • reading their contents

  • searching the web

  • taking screenshots (Playwright only)

Two backends are available:

Playwright backend:
  • Full browser automation with screenshots

  • Installation:

    pipx install 'gptme[browser]'
    # We need to use the same version of Playwright as the one installed by gptme
    # when downloading the browser binaries. gptme will attempt this automatically
    PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
    pipx run playwright==$PW_VERSION install chromium-headless-shell
    
Lynx backend:
  • Text-only browser for basic page reading and searching

  • No screenshot support

  • Installation:

    # On Ubuntu
    sudo apt install lynx
    # On macOS
    brew install lynx
    # or any other way that gets you the `lynx` command
    

Note

This is an experimental feature. It needs some work to be more robust and useful.

Examples

Answer question from URL with browsing

User
find out which is the latest ActivityWatch version from superuserlabs.org
Assistant
Let's browse the site.
ipython
read_url('https://superuserlabs.org/')
System
https://superuserlabs.org/
... [ActivityWatch](https://activitywatch.net/) ...
Assistant
Couldn't find the answer on the page. Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
https://activitywatch.net/
... Download latest version v0.12.2 ...
Assistant
The latest version of ActivityWatch is v0.12.2

Searching

User
who is the founder of ActivityWatch?
Assistant
Let's search for that.
ipython
search('ActivityWatch founder')
System
results
1. [ActivityWatch](https://activitywatch.net/) ...
Assistant
Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
https://activitywatch.net/
... The ActivityWatch project was founded by Erik Bjäreholt in 2016. ...
Assistant
The founder of ActivityWatch is Erik Bjäreholt.

Take screenshot of page

User
take a screenshot of the ActivityWatch website
Assistant
Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
ipython
screenshot_url('https://activitywatch.net')
System
result
Screenshot saved to screenshot.png
gptme.tools.browser.read_url(url: str) str

Read a webpage in a text format.

gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) Path

Take a screenshot of a webpage.

gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo'] = 'google') str

Search for a query on a search engine.

gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo'] = 'google') str

Search for a query on a search engine using Playwright.
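With two backends available, something has to decide which one to use. The sketch below shows one way to detect them; the preference for Playwright over Lynx and the helper names `pick_backend` and `lynx_dump_argv` are assumptions of this example, not documented gptme behaviour. `lynx -dump` is the standard Lynx option for printing a rendered page as text.

```python
import importlib.util
import shutil
from typing import List, Optional


def pick_backend(
    has_playwright: Optional[bool] = None, has_lynx: Optional[bool] = None
) -> str:
    """Return 'playwright' or 'lynx' depending on what is installed."""
    if has_playwright is None:
        has_playwright = importlib.util.find_spec("playwright") is not None
    if has_lynx is None:
        has_lynx = shutil.which("lynx") is not None
    if has_playwright:
        return "playwright"  # assumed preference: the fuller backend first
    if has_lynx:
        return "lynx"
    raise RuntimeError("no browser backend available")


def lynx_dump_argv(url: str) -> List[str]:
    """Command line that prints a page as plain text with Lynx."""
    return ["lynx", "-dump", url]
```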

Vision#

Tools for viewing images, giving the assistant vision.

Requires a model that supports vision, such as GPT-4o, Anthropic's Claude models, or Llama 3.2.

gptme.tools.vision.view_image(image_path: Path | str) Message

View an image.

Chats#

List, search, and summarize past conversation logs.

Examples

Search for a specific topic in past conversations

User
Can you find any mentions of "python" in our past conversations?
Assistant
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
ipython
search_chats('python')
gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) None

List recent chat conversations and optionally summarize them using an LLM.

Parameters:
  • max_results (int) – Maximum number of conversations to display.

  • include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.

gptme.tools.chats.read_chat(conversation: str, max_results: int = 5, incl_system=False) None

Read a specific conversation log.

Parameters:
  • conversation (str) – The name of the conversation to read.

  • max_results (int) – Maximum number of messages to display.

  • incl_system (bool) – Whether to include system messages.

gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False) None

Search past conversation logs for the given query and print a summary of the results.

Parameters:
  • query (str) – The search query.

  • max_results (int) – Maximum number of conversations to display.

  • system (bool) – Whether to include system messages in the search.
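The `query`, `max_results`, and `system` parameters above suggest a simple filter-and-rank search. As a sketch only, assuming conversations are held in memory as lists of `(role, text)` pairs (a made-up representation, not gptme's log format), a case-insensitive search might look like this. The `search_logs` name is hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, text); an assumed shape for this example


def search_logs(
    logs: Dict[str, List[Message]],
    query: str,
    max_results: int = 5,
    system: bool = False,
) -> List[Tuple[str, int]]:
    """Return (conversation name, match count) pairs, best matches first."""
    needle = query.lower()
    hits = []
    for name, messages in logs.items():
        count = sum(
            1
            for role, text in messages
            # Skip system messages unless explicitly requested.
            if (system or role != "system") and needle in text.lower()
        )
        if count:
            hits.append((name, count))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:max_results]
```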

Computer#

Warning

The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.

Tool for computer interaction through X11, including screen capture, keyboard, and mouse control.

The computer tool provides direct interaction with the desktop environment through X11. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.

Features

  • Keyboard input simulation

  • Mouse control (movement, clicks, dragging)

  • Screen capture with automatic scaling

  • Cursor position tracking

Installation

Requires X11 and xdotool:

# On Debian/Ubuntu
sudo apt install xdotool

# On Arch Linux
sudo pacman -S xdotool

Configuration

The tool uses these environment variables:

  • DISPLAY: X11 display to use (default: ":1")

  • WIDTH: Screen width (default: 1024)

  • HEIGHT: Screen height (default: 768)

Usage

The tool supports these actions:

Keyboard:
  • key: Send key sequence (e.g., "Return", "Control_L+c")

  • type: Type text with realistic delays

Mouse:
  • mouse_move: Move mouse to coordinates

  • left_click: Click left mouse button

  • right_click: Click right mouse button

  • middle_click: Click middle mouse button

  • double_click: Double click left mouse button

  • left_click_drag: Click and drag to coordinates

Screen:
  • screenshot: Take and view a screenshot

  • cursor_position: Get current mouse position

The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.

Instructions

You can interact with the computer through X11 with the `computer` Python function.
Available actions:
- key: Send key sequence (e.g., "Return", "Control_L+c")
- type: Type text with realistic delays
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position

Examples

User
Take a screenshot of the desktop
Assistant
I'll capture the current screen.
ipython
computer("screenshot")
System
Viewing image...
User
Type "Hello, World!" into the active window
Assistant
I'll type the text with realistic delays.
ipython
computer("type", text="Hello, World!")
System
Typed text: Hello, World!
User
Move the mouse to coordinates (100, 200) and click
Assistant
I'll move the mouse and perform a left click.
ipython
computer("mouse_move", coordinate=(100, 200))
System
Moved mouse to 100,200
ipython
computer("left_click")
System
Performed left_click
User
Press Ctrl+C
Assistant
I'll send the Control+C key sequence.
ipython
computer("key", text="Control_L+c")
System
Sent key sequence: Control_L+c
gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) Message | None

Perform computer interactions through X11.

Parameters:
  • action – The type of action to perform

  • text – Text to type or key sequence to send

  • coordinate – X,Y coordinates for mouse actions
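The configuration defaults, the resolution scaling, and the mapping of actions to xdotool can be sketched together. This is a hedged illustration rather than the tool's actual code: the helper names, the 50 ms typing delay, and the subset of actions covered are all assumptions. The xdotool subcommands used (`key`, `type --delay`, `mousemove`, `click`) are standard.

```python
import os
from typing import List, Mapping, Optional, Tuple

Size = Tuple[int, int]


def screen_size(env: Optional[Mapping[str, str]] = None) -> Size:
    """Read WIDTH/HEIGHT from the environment, with the documented defaults."""
    env = os.environ if env is None else env
    return int(env.get("WIDTH", 1024)), int(env.get("HEIGHT", 768))


def scale_coordinate(coordinate: Size, source: Size, target: Size) -> Size:
    """Map a point from the source resolution to the target resolution."""
    x, y = coordinate
    return round(x * target[0] / source[0]), round(y * target[1] / source[1])


def xdotool_argv(
    action: str, text: Optional[str] = None, coordinate: Optional[Size] = None
) -> List[str]:
    """Hypothetical mapping from a few actions to xdotool command lines."""
    if action == "key" and text is not None:
        return ["xdotool", "key", text]
    if action == "type" and text is not None:
        # --delay is in milliseconds between keystrokes (value assumed here).
        return ["xdotool", "type", "--delay", "50", text]
    if action == "mouse_move" and coordinate is not None:
        x, y = coordinate
        return ["xdotool", "mousemove", str(x), str(y)]
    if action == "left_click":
        return ["xdotool", "click", "1"]  # button 1 is the left button
    raise ValueError(f"unsupported action or missing argument: {action}")
```

Scaling matters because the coordinates a model reports refer to the downscaled screenshot it saw, not to the physical screen.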

RAG#

RAG (Retrieval-Augmented Generation) tool for context-aware assistance.

The RAG tool provides context-aware assistance by indexing and searching project documentation.

Installation

The RAG tool requires the gptme-rag CLI to be installed:

pipx install gptme-rag

Configuration

Configure RAG in your gptme.toml:

[rag]
enabled = true

Features

  1. Manual Search and Indexing

     • Index project documentation with rag_index

     • Search indexed documents with rag_search

     • Check index status with rag_status

  2. Automatic Context Enhancement

     • Retrieves semantically similar documents

     • Preserves conversation flow with hidden context messages

Instructions

Use RAG to index and search project documentation.

Examples

User
Index the current directory
Assistant
Let me index the current directory with RAG.
ipython
rag_index()
System
Indexed 1 paths
User
Search for documentation about functions
Assistant
I'll search for function-related documentation.
ipython
rag_search("function documentation")
System
### docs/api.md
Functions are documented using docstrings...
User
Show index status
Assistant
I'll check the current status of the RAG index.
ipython
rag_status()
System
Index contains 42 documents
gptme.tools.rag.init() ToolSpec

Initialize the RAG tool.

gptme.tools.rag.rag_enhance_messages(messages: list[Message]) list[Message]

Enhance messages with context from RAG.

gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) str

Index documents in specified paths.

gptme.tools.rag.rag_search(query: str, return_full: bool = False) str

Search indexed documents.

gptme.tools.rag.rag_status() str

Show index status.
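To give a feel for the index-then-search flow, here is a toy retrieval sketch. It swaps in bag-of-words cosine similarity as a stand-in for semantic search, so it only matches exact words; the real tool delegates to the gptme-rag CLI, whose internals are not described here. The `rank_documents` function and its in-memory `docs` mapping are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _tokens(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (path, score) pairs with a non-zero score, best first."""
    q = _tokens(query)
    scored = [(path, _cosine(q, _tokens(text))) for path, text in docs.items()]
    scored = [(path, score) for path, score in scored if score > 0]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```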