Tools#

Tools available in gptme.

The tools can be grouped into the following categories:

Shell#

The assistant can execute shell commands with bash by outputting code blocks with shell as the language.

Instructions

When you send a message containing bash code, it will be executed in a stateful bash shell.
The shell will respond with the output of the execution.
Do not use EOF/HereDoc syntax to send multiline commands, as the assistant will not be able to handle it.

These programs are available, among others:
- git
- apt-get
- pandoc
- docker

Examples

User
list the current directory
Assistant
To list the files in the current directory, use `ls`:
shell
ls
System
Ran command: `ls`
shell
file1.txt
file2.txt

The assistant can learn context by exploring the filesystem

User
learn about the project
Assistant
Let's start by checking the files
shell
git ls-files
System
stdout
README.md
main.py
Assistant
Now let's check the README
shell
cat README.md
System
stdout
(contents of README.md)
Assistant
Now we check main.py
shell
cat main.py
System
stdout
(contents of main.py)
Assistant
The project is...

Create vue project

User
Create a new vue project with typescript and pinia named fancy-project
Assistant
Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project:
shell
npm init vue@latest fancy-project --yes -- --typescript --pinia
System
stdout
> npx
> create-vue

Vue.js - The Progressive JavaScript Framework

Scaffolding project in ./fancy-project...

gptme.tools.shell.execute_shell(code: str, args: list[str], confirm: Callable[[str], bool]) -> Generator[Message, None, None]

Executes a shell command and returns the output.
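
The stateful shell described in the instructions can be pictured as one long-lived bash process that receives every command, so exported variables and directory changes survive between calls. The following is a minimal sketch under that assumption, not gptme's implementation; the `StatefulShell` class and its sentinel marker are hypothetical names, and it requires `bash` on the PATH.

```python
import subprocess


class StatefulShell:
    """Toy persistent bash session (illustrative sketch only)."""

    MARKER = "__END_OF_COMMAND__"  # hypothetical sentinel marking end of output

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> str:
        # Send the command, then print a newline plus the sentinel so the
        # sentinel always sits on a line of its own.
        self.proc.stdin.write(f"{command}\nprintf '\\n%s\\n' {self.MARKER}\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "" or line.rstrip("\n") == self.MARKER:
                break
            lines.append(line)
        # Drop the single newline that was added in front of the sentinel.
        return "".join(lines)[:-1]

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
```

Because both calls below reach the same process, a variable exported by the first is still visible to the second:

```python
shell = StatefulShell()
shell.run("export GREETING=hello")
print(shell.run("echo $GREETING"))
shell.close()
```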

Python#

The assistant can execute Python code blocks.

It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.

Instructions

To execute Python code in an interactive IPython session, send a codeblock using the `ipython` language tag.
It will respond with the output and result of the execution.
If you first write the code in a normal python codeblock, remember to also execute it with the ipython codeblock.

Examples

Results of the last expression will be displayed, IPython-style:

User
What is 2 + 2?
Assistant
ipython
2 + 2
System
Executed code block.
result
4

It can write an example and then execute it:

User
compute fib 10
Assistant
To compute the 10th Fibonacci number, we can execute this code:
ipython
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(10)
System
Executed code block.
result
55

gptme.tools.python.execute_python(code: str, args: list[str], confirm: Callable[[str], bool] = <function <lambda>>) -> Generator[Message, None, None]

Executes a python codeblock and returns the output.
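
The REPL-like behavior described above (state persisting between calls, and the value of the last expression being shown) can be approximated with the standard library alone. This is a sketch of the idea, assuming a single shared namespace; the `MiniRepl` class is hypothetical and it does not use IPython the way gptme does.

```python
import ast


class MiniRepl:
    """Toy stand-in for a persistent interactive session (not gptme's code)."""

    def __init__(self) -> None:
        # One namespace shared by every call, so definitions persist.
        self.namespace: dict = {}

    def run(self, code: str):
        tree = ast.parse(code)
        last_expr = None
        # If the block ends with a bare expression, evaluate it separately
        # so its value can be returned, as an interactive shell would show it.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), self.namespace)
        if last_expr is not None:
            return eval(compile(last_expr, "<cell>", "eval"), self.namespace)
        return None
```

A function defined in one call remains available in the next:

```python
repl = MiniRepl()
repl.run("def fib(n):\n    return n if n <= 1 else fib(n - 1) + fib(n - 2)")
print(repl.run("fib(10)"))
```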

gptme.tools.python.get_installed_python_libraries() -> set[str]

Check if a select list of Python libraries are installed.

gptme.tools.python.register_function(func: T) -> T

Decorator to register a function to be available in the IPython instance.
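
A registering decorator of this shape typically records the function in a lookup table and returns it unchanged. The sketch below illustrates that pattern only; the `registered` dictionary is a hypothetical stand-in, and how gptme actually exposes the functions to IPython is not shown here.

```python
from typing import Callable, TypeVar

T = TypeVar("T", bound=Callable)

# Hypothetical registry; a real session would make these names available
# inside its interactive namespace.
registered: dict[str, Callable] = {}


def register_function(func: T) -> T:
    """Record the function under its own name and return it unchanged."""
    registered[func.__name__] = func
    return func


@register_function
def fib(n: int) -> int:
    return n if n <= 1 else fib(n - 1) + fib(n - 2)
```

Because the decorator returns the original function, `fib` can still be called directly as well as looked up through `registered["fib"]`.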

Tmux#

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.

This tool is suitable to run long-running commands or interactive applications that require user input. Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc. It allows for inspecting pane contents and sending input.

Instructions

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.

This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.

Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions

Examples

Managing a dev server

User
Start the dev server
Assistant
Certainly! To start the dev server we should use tmux:
tmux
new_session 'npm run dev'
System
Running `npm run dev` in session 0
User
Can you show me the current content of the pane?
Assistant
Of course! Let's inspect the pane content:
tmux
inspect_pane 0
System
output
Server is running on localhost:5600
User
Stop the dev server
Assistant
I'll send 'Ctrl+C' to the pane to stop the server:
tmux
send_keys 0 C-c
System
Sent 'C-c' to pane 0

Get info from ncurses applications

User
start top and give me a summary
Assistant
Sure! Let's start the top command in a tmux session:
tmux
new_session 'top'
System
Running `top` in session 1.
output
(output from top shown here)
Assistant
The load is...

gptme.tools.tmux.execute_tmux(code: str, args: list[str], confirm: Callable[[str], bool]) -> Generator[Message, None, None]

Executes a command in tmux and returns the output.
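
Each of the tool commands listed above corresponds naturally to a tmux CLI subcommand. The following sketch builds the argument lists without running them; the `tmux_argv` helper is hypothetical, and it assumes the caller supplies the session name, whereas the tool itself assigns session IDs.

```python
def tmux_argv(command: str, *args: str) -> list[str]:
    """Translate a tool command into a tmux command line (sketch only)."""
    if command == "new_session":
        session, shell_command = args
        # -d starts the session detached; -s names it.
        return ["tmux", "new-session", "-d", "-s", session, shell_command]
    if command == "send_keys":
        session, *keys = args
        return ["tmux", "send-keys", "-t", session, *keys]
    if command == "inspect_pane":
        (session,) = args
        # -p prints the pane contents to stdout instead of a paste buffer.
        return ["tmux", "capture-pane", "-p", "-t", session]
    if command == "kill_session":
        (session,) = args
        return ["tmux", "kill-session", "-t", session]
    if command == "list_sessions":
        return ["tmux", "list-sessions"]
    raise ValueError(f"unknown command: {command}")
```

The resulting list could be passed to `subprocess.run` on a machine with tmux installed.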

Subagent#

A subagent tool for gptme.

Lets gptme break down a task into smaller parts, and delegate them to subagents.

Examples

User
compute fib 13 using a subagent
Assistant
Starting a subagent to compute the 13th Fibonacci number.
ipython
subagent("fib-13", "compute the 13th Fibonacci number")
System
Subagent started successfully.
Assistant
Now we need to wait for the subagent to finish the task.
ipython
subagent_wait("fib-13")
System
{"status": "success", "result": "The 13th Fibonacci number is 233"}.

class gptme.tools.subagent.ReturnType

ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)

__init__(status: Literal['running', 'success', 'failure'], result: str | None = None) -> None

class gptme.tools.subagent.Subagent

Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)

__init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) -> None

gptme.tools.subagent.subagent(agent_id: str, prompt: str)

Runs a subagent and returns the resulting JSON output.

gptme.tools.subagent.subagent_status(agent_id: str) -> dict

Returns the status of a subagent.

gptme.tools.subagent.subagent_wait(agent_id: str) -> dict

Waits for a subagent to finish. Timeout is 1 minute.
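
The start, status, and wait lifecycle can be sketched with a background thread per agent. This is an illustration under stated assumptions rather than gptme's code: the `start`, `status`, and `wait` helpers are hypothetical, and a plain callable stands in for the prompt that a real subagent would receive. The default timeout mirrors the one-minute limit mentioned above.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Literal, Optional


@dataclass
class ReturnType:
    status: Literal["running", "success", "failure"]
    result: Optional[str] = None


# agent_id -> (worker thread, one-element list that receives the outcome)
_agents: dict[str, tuple[threading.Thread, list]] = {}


def start(agent_id: str, task: Callable[[], str]) -> None:
    """Run `task` in the background (hypothetical stand-in for a prompt)."""
    outcome: list = []

    def run() -> None:
        try:
            outcome.append(ReturnType("success", task()))
        except Exception as exc:
            outcome.append(ReturnType("failure", str(exc)))

    thread = threading.Thread(target=run, daemon=True)
    _agents[agent_id] = (thread, outcome)
    thread.start()


def status(agent_id: str) -> ReturnType:
    _, outcome = _agents[agent_id]
    return outcome[0] if outcome else ReturnType("running")


def wait(agent_id: str, timeout: float = 60.0) -> ReturnType:
    thread, _ = _agents[agent_id]
    thread.join(timeout)  # gives up after `timeout` seconds
    return status(agent_id)
```

If the timeout expires first, `wait` returns a result whose status is still `"running"`.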

Read#

Read the contents of a file.

Instructions

Read files using `cat`

Examples

User
read file.txt
Assistant
shell
cat file.txt

Save#

Gives the assistant the ability to save whole files, or append to them.

Instructions

To write to a file, use a code block with the language tag: `save <path>`

The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.

Examples

User
write a hello world script to hello.py
Assistant
save hello.py
print("Hello world")
System
Saved to `hello.py`
User
make it all-caps
Assistant
save hello.py
print("HELLO WORLD")
System
Saved to `hello.py`

Instructions

To append to a file, use a code block with the language tag: `append <path>`

Examples

User
append a print "Hello world" to hello.py
Assistant
append hello.py
print("Hello world")
System
Appended to `hello.py`

gptme.tools.save.execute_append(code: str, args: list[str], confirm: Callable[[str], bool]) -> Generator[Message, None, None]

Append code to a file.

gptme.tools.save.execute_save(code: str, args: list[str], confirm: Callable[[str], bool]) -> Generator[Message, None, None]

Save code to a file.
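
The difference between saving and appending, and the way a relative path follows the current directory, can be shown with `pathlib`. This is a minimal sketch; the `save` and `append` helpers are hypothetical and omit the confirmation step that the real tool functions take as a parameter.

```python
from pathlib import Path


def save(path: str, content: str) -> Path:
    """Overwrite the file with `content`."""
    # A relative path is resolved against the working directory at call time.
    target = Path(path).expanduser()
    target.write_text(content)
    return target


def append(path: str, content: str) -> Path:
    """Add `content` to the end of the file, creating it if needed."""
    target = Path(path).expanduser()
    with target.open("a") as f:
        f.write(content)
    return target
```

Calling `save` twice on the same path leaves only the second content, while `append` keeps whatever was already there.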

Patch#

Gives the LLM agent the ability to patch text files by using an adapted version of git conflict markers.

Instructions

To patch/modify files, we use an adapted version of git conflict markers.

This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per codeblock. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.

To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.

The $FILENAME parameter MUST be on the same line as the code block start, not on the line after.

The patch block should be written in the following format:

```patch $FILENAME
<<<<<<< ORIGINAL
$ORIGINAL_CONTENT
=======
$UPDATED_CONTENT
>>>>>>> UPDATED
```

Examples

User
patch the file `hello.py` to ask for the name of the user
Assistant
patch hello.py
<<<<<<< ORIGINAL
def hello():
    print("Hello world")
=======
def hello():
    name = input("What is your name? ")
    print(f"Hello {name}")
>>>>>>> UPDATED
System
Patch applied

class gptme.tools.patch.Patch

Patch(original: str, updated: str)

__init__(original: str, updated: str) -> None

diff_minimal(strip_context=False) -> str

Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.

gptme.tools.patch.apply(codeblock: str, content: str) -> str

Applies multiple patches in codeblock to content.

gptme.tools.patch.execute_patch(code: str, args: list[str], confirm: Callable[[str], bool]) -> Generator[Message, None, None]

Applies the patch.
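
The ORIGINAL/UPDATED format lends itself to a simple search-and-replace. The sketch below handles a single patch block and is an illustration only, not gptme's implementation; the `parse_patch` and `apply_patch` helpers are hypothetical, and the requirement that the original text match exactly once is an assumption.

```python
def parse_patch(codeblock: str) -> tuple[str, str]:
    """Split one patch block into its original and updated text."""
    lines = codeblock.splitlines()
    start = lines.index("<<<<<<< ORIGINAL")
    divider = lines.index("=======", start)
    end = lines.index(">>>>>>> UPDATED", divider)
    original = "\n".join(lines[start + 1:divider])
    updated = "\n".join(lines[divider + 1:end])
    return original, updated


def apply_patch(codeblock: str, content: str) -> str:
    """Replace the original text in `content` with the updated text."""
    original, updated = parse_patch(codeblock)
    if content.count(original) != 1:
        # Assumption: refuse ambiguous or missing matches instead of guessing.
        raise ValueError("original text must occur exactly once in the file")
    return content.replace(original, updated)
```

This also shows why placeholders make a patch fail: the ORIGINAL section has to appear in the file verbatim for the replacement to find it.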

Screenshot#

A simple screenshot tool, using screencapture on macOS and scrot on Linux.

gptme.tools.screenshot.screenshot(path: Path | None = None) -> Path

Take a screenshot and save it to a file.
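
Choosing between the two capture programs comes down to a platform check. The following sketch only builds the command line and does not run it; the `screenshot_command` helper is hypothetical, and the real tool may pass additional flags.

```python
import platform
from pathlib import Path
from typing import Optional


def screenshot_command(path: Path, system: Optional[str] = None) -> list[str]:
    """Return the capture command for the given (or current) platform."""
    system = system or platform.system()
    if system == "Darwin":
        return ["screencapture", str(path)]
    if system == "Linux":
        return ["scrot", str(path)]
    raise NotImplementedError(f"no screenshot command known for {system}")
```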

Browser#

Tools to let the assistant control a browser, including:
  • loading pages

  • reading their contents

  • viewing them through screenshots

  • searching

Note

This is an experimental feature. It needs some work to be more robust and useful.

To use the browser tool, you need to have the playwright Python package installed along with gptme, which you can install with:

pipx install 'gptme[browser]'
gptme '/shell playwright install chromium'

Instructions

To browse the web, you can use the `read_url`, `search`, and `screenshot_url` functions in Python.

Examples

Answer question from URL with browsing

User
find out which is the latest ActivityWatch version from superuserlabs.org
Assistant
Let's browse the site.
ipython
read_url('https://superuserlabs.org/')
System
https://superuserlabs.org/
... [ActivityWatch](https://activitywatch.net/) ...
Assistant
Couldn't find the answer on the page. Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
https://activitywatch.net/
... Download latest version v0.12.2 ...
Assistant
The latest version of ActivityWatch is v0.12.2

Searching

User
who is the founder of ActivityWatch?
Assistant
Let's search for that.
ipython
search('ActivityWatch founder')
System
results
1. [ActivityWatch](https://activitywatch.net/) ...
Assistant
Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
https://activitywatch.net/
... The ActivityWatch project was founded by Erik Bjäreholt in 2016. ...
Assistant
The founder of ActivityWatch is Erik Bjäreholt.

Take screenshot of page

User
take a screenshot of the ActivityWatch website
Assistant
Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
ipython
screenshot_url('https://activitywatch.net')
System
result
Screenshot saved to screenshot.png

gptme.tools.browser.read_url(url: str) -> str

Read a webpage in a text format.

gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) -> Path

Take a screenshot of a webpage.

gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo'] = 'google') -> str

Search for a query on a search engine.

gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo'] = 'google') -> str

Search for a query on a search engine using Playwright.
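
The `engine` parameter selects which search site the query is sent to. As a rough sketch of that step, the hypothetical `search_url` helper below builds the results-page URL for each engine; fetching and parsing the page, which the browser tool does, is left out.

```python
from urllib.parse import urlencode

# Public search endpoints for the two supported engine names.
SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def search_url(query: str, engine: str = "google") -> str:
    """Build the results-page URL for a query (sketch only)."""
    if engine not in SEARCH_ENDPOINTS:
        raise ValueError(f"unsupported engine: {engine}")
    return f"{SEARCH_ENDPOINTS[engine]}?{urlencode({'q': query})}"
```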

Vision#

Tools for viewing images, giving the assistant vision.

Requires a model that supports vision, such as GPT-4o, Anthropic's Claude models, or Llama 3.2.

gptme.tools.vision.view_image(image_path: Path | str) -> Message

View an image.

Chats#

List, search, and summarize past conversation logs.

Instructions

The chats tool allows you to list, search, and summarize past conversation logs.

Examples

Search for a specific topic in past conversations

User
Can you find any mentions of "python" in our past conversations?
Assistant
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
ipython
search_chats('python')

gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) -> None

List recent chat conversations and optionally summarize them using an LLM.

Parameters:
  • max_results (int) – Maximum number of conversations to display.

  • include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.

gptme.tools.chats.read_chat(conversation: str, max_results: int = 5, incl_system=False) -> None

Read a specific conversation log.

Parameters:
  • conversation (str) – The name of the conversation to read.

  • max_results (int) – Maximum number of messages to display.

  • incl_system (bool) – Whether to include system messages.

gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False) -> None

Search past conversation logs for the given query and print a summary of the results.

Parameters:
  • query (str) – The search query.

  • max_results (int) – Maximum number of conversations to display.

  • system (bool) – Whether to include system messages in the search.
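
The effect of the `max_results` and `system` parameters can be illustrated on an in-memory set of conversations. This is a sketch under assumptions: the `find_chats` helper and the log structure are hypothetical, matching is case-insensitive by choice, and the real function reads stored logs and prints a summary instead of returning a list.

```python
def find_chats(logs, query, max_results=5, system=False):
    """Return (conversation name, match count) pairs, most matches first.

    `logs` maps a conversation name to a list of (role, text) messages.
    """
    hits = []
    for name, messages in logs.items():
        count = sum(
            query.lower() in text.lower()
            for role, text in messages
            if system or role != "system"  # skip system messages unless asked
        )
        if count:
            hits.append((name, count))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:max_results]
```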

Computer#

Warning

The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.

Tool for computer interaction through X11, including screen capture, keyboard, and mouse control.

The computer tool provides direct interaction with the desktop environment through X11. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.

Features

  • Keyboard input simulation

  • Mouse control (movement, clicks, dragging)

  • Screen capture with automatic scaling

  • Cursor position tracking

Installation

Requires X11 and xdotool:

# On Debian/Ubuntu
sudo apt install xdotool

# On Arch Linux
sudo pacman -S xdotool

Configuration

The tool uses these environment variables:

  • DISPLAY: X11 display to use (default: “:1”)

  • WIDTH: Screen width (default: 1024)

  • HEIGHT: Screen height (default: 768)
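
Reading these settings with their documented defaults might look like the sketch below. The `display_config` helper is hypothetical; only the variable names and default values come from the list above.

```python
import os
from typing import Mapping


def display_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the display settings, falling back to the documented defaults."""
    return {
        "display": env.get("DISPLAY", ":1"),
        "width": int(env.get("WIDTH", "1024")),
        "height": int(env.get("HEIGHT", "768")),
    }
```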

Usage

The tool supports these actions:

Keyboard:
  • key: Send key sequence (e.g., “Return”, “Control_L+c”)

  • type: Type text with realistic delays

Mouse:
  • mouse_move: Move mouse to coordinates

  • left_click: Click left mouse button

  • right_click: Click right mouse button

  • middle_click: Click middle mouse button

  • double_click: Double click left mouse button

  • left_click_drag: Click and drag to coordinates

Screen:
  • screenshot: Take and view a screenshot

  • cursor_position: Get current mouse position

The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
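
Resolution scaling of this kind usually means converting coordinates between the size of the image the model sees and the size of the real screen. The sketch below shows that conversion only; the `scale_point` helper is hypothetical and the tool's actual scaling rules may differ.

```python
def scale_point(point, from_size, to_size):
    """Map an (x, y) point from one resolution to another."""
    x, y = point
    from_width, from_height = from_size
    to_width, to_height = to_size
    return round(x * to_width / from_width), round(y * to_height / from_height)
```

For example, the center of a 1024x768 screenshot maps to the center of a 2048x1536 screen:

```python
print(scale_point((512, 384), (1024, 768), (2048, 1536)))  # (1024, 768)
```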

Instructions

Use this tool to interact with the computer through X11.
Available actions:
- key: Send key sequence (e.g., "Return", "Control_L+c")
- type: Type text with realistic delays
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position

Examples

User
Take a screenshot of the desktop
Assistant
I'll capture the current screen.
ipython
computer("screenshot")
System
Viewing image...
User
Type "Hello, World!" into the active window
Assistant
I'll type the text with realistic delays.
ipython
computer("type", text="Hello, World!")
System
Typed text: Hello, World!
User
Move the mouse to coordinates (100, 200) and click
Assistant
I'll move the mouse and perform a left click.
ipython
computer("mouse_move", coordinate=(100, 200))
System
Moved mouse to 100,200
ipython
computer("left_click")
System
Performed left_click
User
Press Ctrl+C
Assistant
I'll send the Control+C key sequence.
ipython
computer("key", text="Control_L+c")
System
Sent key sequence: Control_L+c

gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) -> Message | None

Perform computer interactions through X11.

Parameters:
  • action – The type of action to perform

  • text – Text to type or key sequence to send

  • coordinate – X,Y coordinates for mouse actions
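
Since the tool relies on xdotool, most actions map onto a single xdotool invocation. The sketch below builds those command lines without running them. The `xdotool_argv` helper and the 12 ms typing delay are assumptions for illustration, and the `left_click_drag` and `screenshot` actions are left out because they need more than one step.

```python
# xdotool numbers mouse buttons 1 (left), 2 (middle), 3 (right).
CLICKS = {
    "left_click": ["click", "1"],
    "middle_click": ["click", "2"],
    "right_click": ["click", "3"],
    "double_click": ["click", "--repeat", "2", "1"],
}


def xdotool_argv(action, text=None, coordinate=None):
    """Translate a simple action into an xdotool command line (sketch only)."""
    if action == "key":
        return ["xdotool", "key", text]
    if action == "type":
        # --delay is the pause between keystrokes in milliseconds (assumed value).
        return ["xdotool", "type", "--delay", "12", text]
    if action == "mouse_move":
        x, y = coordinate
        return ["xdotool", "mousemove", str(x), str(y)]
    if action in CLICKS:
        return ["xdotool", *CLICKS[action]]
    if action == "cursor_position":
        return ["xdotool", "getmouselocation"]
    raise ValueError(f"action not covered by this sketch: {action}")
```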

RAG#

RAG (Retrieval-Augmented Generation) tool for context-aware assistance.

The RAG tool provides context-aware assistance by indexing and searching project documentation.

Installation

The RAG tool requires the gptme-rag package. Install it with:

pip install "gptme[rag]"

Configuration

Configure RAG in your gptme.toml:

[rag]
# Storage configuration
index_path = "~/.cache/gptme/rag"  # Where to store the index
collection = "gptme_docs"          # Collection name for documents

# Context enhancement settings
max_tokens = 2000                  # Maximum tokens for context window
auto_context = true               # Enable automatic context enhancement
min_relevance = 0.5               # Minimum relevance score for including context
max_results = 5                   # Maximum number of results to consider

# Cache configuration
[rag.cache]
max_embeddings = 10000            # Maximum number of cached embeddings
max_searches = 1000               # Maximum number of cached search results
embedding_ttl = 86400             # Embedding cache TTL in seconds (24h)
search_ttl = 3600                # Search cache TTL in seconds (1h)

Features

  1. Manual Search and Indexing
       • Index project documentation with rag_index
       • Search indexed documents with rag_search
       • Check index status with rag_status

  2. Automatic Context Enhancement
       • Automatically adds relevant context to user messages
       • Retrieves semantically similar documents
       • Manages token budget to avoid context overflow
       • Preserves conversation flow with hidden context messages

  3. Performance Optimization
       • Intelligent caching system for embeddings and search results
       • Configurable cache sizes and TTLs
       • Automatic cache invalidation
       • Memory-efficient storage
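
The interaction between `min_relevance`, `max_results`, and `max_tokens` in automatic context enhancement can be sketched as a filter followed by a budgeted selection. This is an illustration under assumptions, not the gptme-rag algorithm: the `select_context` helper is hypothetical, and counting whitespace-separated words is a crude stand-in for real token counting.

```python
def select_context(results, max_tokens=2000, min_relevance=0.5, max_results=5):
    """Pick the documents to include as context.

    `results` is a list of (text, relevance score) pairs.
    """
    # Keep only sufficiently relevant results, best first, capped in number.
    ranked = sorted(
        (r for r in results if r[1] >= min_relevance),
        key=lambda r: r[1],
        reverse=True,
    )[:max_results]

    chosen, used = [], 0
    for text, _score in ranked:
        cost = len(text.split())  # crude token estimate (assumption)
        if used + cost > max_tokens:
            continue  # skip anything that would overflow the budget
        chosen.append(text)
        used += cost
    return chosen
```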

Benefits

  • Better informed responses through relevant documentation

  • Reduced need for manual context inclusion

  • Automatic token management

  • Seamless integration with conversation flow

Instructions

Use RAG to index and search project documentation.

Examples

User
Index the current directory
Assistant
Let me index the current directory with RAG.
ipython
rag_index()
System
Indexed 1 paths
User
Search for documentation about functions
Assistant
I'll search for function-related documentation.
ipython
rag_search("function documentation")
System
### docs/api.md
Functions are documented using docstrings...
User
Show index status
Assistant
I'll check the current status of the RAG index.
ipython
rag_status()
System
Index contains 42 documents

gptme.tools.rag.init() -> ToolSpec

Initialize the RAG tool.

gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) -> str

Index documents in specified paths.

gptme.tools.rag.rag_search(query: str) -> str

Search indexed documents.

gptme.tools.rag.rag_status() -> str

Show index status.