Tools#
Tools available in gptme.
The tools can be grouped into the following categories:
- Execution
- Files
- Network
- Vision
- Chat management
- Context management
Shell#
The assistant can execute shell commands with bash by outputting code blocks with shell as the language.
Instructions
The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.
These programs are available, among others:
- apt-get
- docker
- git
- pandoc
Examples
User |
list the current directory |
Assistant |
To list the files in the current directory, use `ls`: |
System |
Ran command: `ls` |
The assistant can learn context by exploring the filesystem
User |
learn about the project |
Assistant |
Let's start by checking the files |
System |
|
Assistant |
Now let's check the README |
System |
|
Assistant |
Now let's check `main.py` |
System |
|
Assistant |
The project is... |
Create a Vue project
User |
Create a new vue project with typescript and pinia named fancy-project |
Assistant |
Sure! Let's create a new Vue project with TypeScript and Pinia named fancy-project: |
System |
|
- gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Executes a shell command and returns the output.
- gptme.tools.shell.execute_shell_impl(cmd: str, _: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Execute shell command and format output.
- gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) str
Get the shell command from code/args/kwargs.
- gptme.tools.shell.preview_shell(cmd: str, _: Path | None) str
Prepare preview for shell command.
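For illustration, here is a minimal sketch of driving the shell tool programmatically via the documented `execute_shell` generator. The command string and the auto-confirm callback are assumptions made for the sketch; in a chat session gptme supplies both.

```python
from gptme.tools.shell import execute_shell

# Auto-approve the confirmation prompt for this sketch; the CLI normally asks the user.
confirm = lambda _preview: True

# Run a command in the stateful shell and print each resulting Message.
for msg in execute_shell("ls -la", args=None, kwargs=None, confirm=confirm):
    print(msg)
```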
Python#
The assistant can execute Python code blocks.
It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.
Instructions
Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.
Examples
The result of the last expression will be returned
User |
What is 2 + 2? |
Assistant |
|
System |
Executed code block. |
Write a function and call it
User |
compute fib 10 |
Assistant |
To compute the 10th Fibonacci number, we can run the following code: |
System |
Executed code block. |
- class gptme.tools.python.TeeIO
- __init__(original_stream)
- write(s)
Write string to file.
Returns the number of characters written, which is always equal to the length of the string.
- gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) Generator[Message, None, None]
Executes a python codeblock and returns the output.
- gptme.tools.python.get_installed_python_libraries() list[str]
Check if a select list of Python libraries are installed.
- gptme.tools.python.register_function(func: T) T
Decorator to register a function to be available in the IPython instance.
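As a hedged example of the documented `register_function` decorator, the sketch below registers a hypothetical helper so that it becomes callable from the assistant's IPython session:

```python
from gptme.tools.python import register_function

@register_function
def fib(n: int) -> int:
    """Hypothetical helper, callable from the assistant's python codeblocks."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```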
Tmux#
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands: `npm run dev`, `python3 server.py`, `python3 train.py`, etc.
It allows for inspecting pane contents and sending input.
Instructions
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.
Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions
Examples
Managing a dev server
User |
Start the dev server |
Assistant |
Certainly! To start the dev server we should use tmux: |
System |
Running `npm run dev` in session gptme_1 |
User |
Can you show me the current content of the pane? |
Assistant |
Of course! Let's inspect the pane content: |
System |
|
User |
Stop the dev server |
Assistant |
I'll send 'Ctrl+C' to the pane to stop the server: |
System |
Sent 'C-c' to pane 0 |
Get info from ncurses applications
User |
start top and give me a summary |
Assistant |
Sure! Let's start the top command in a tmux session: |
System |
Running `top` in session gptme_1. |
Assistant |
The load is... |
Send keys to a session
User |
start ipython |
Assistant |
Let's start an ipython session: |
System |
Running `ipython` in session 2. |
User |
Run 'print("Hello, world!")' in the ipython session |
- gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Executes a command in tmux and returns the output.
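A minimal sketch of the `execute_tmux` entry point follows. It assumes (not confirmed by the signature alone) that the code argument carries one of the commands listed above; the dev-server command is illustrative.

```python
from gptme.tools.tmux import execute_tmux

confirm = lambda _preview: True  # auto-approve for this sketch

# Start a long-running dev server in a new tmux session.
for msg in execute_tmux("new_session npm run dev", args=None, kwargs=None, confirm=confirm):
    print(msg)  # e.g. which session the command is running in
```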
Subagent#
A subagent tool for gptme
Lets gptme break down a task into smaller parts and delegate them to subagents.
Examples
User |
compute fib 13 using a subagent |
Assistant |
Starting a subagent to compute the 13th Fibonacci number. |
System |
Subagent started successfully. |
Assistant |
Now we need to wait for the subagent to finish the task. |
System |
{"status": "success", "result": "The 13th Fibonacci number is 233"}. |
- class gptme.tools.subagent.ReturnType
ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)
- __init__(status: Literal['running', 'success', 'failure'], result: str | None = None) None
- class gptme.tools.subagent.Subagent
Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)
- __init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) None
- gptme.tools.subagent.subagent(agent_id: str, prompt: str)
Runs a subagent and returns the resulting JSON output.
- gptme.tools.subagent.subagent_status(agent_id: str) dict
Returns the status of a subagent.
- gptme.tools.subagent.subagent_wait(agent_id: str) dict
Waits for a subagent to finish. Timeout is 1 minute.
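A sketch of the subagent workflow using the documented functions; the agent id and prompt are illustrative, and the comments reflect the example dialogue above rather than guaranteed behavior.

```python
from gptme.tools.subagent import subagent, subagent_status, subagent_wait

# Start a subagent for a self-contained task (it runs in a background thread).
subagent("fib-13", "Compute the 13th Fibonacci number and return it as JSON.")

print(subagent_status("fib-13"))  # e.g. {"status": "running"}
print(subagent_wait("fib-13"))    # blocks until done; times out after 1 minute
```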
Read#
Read the contents of a file.
Instructions
Read the content of the given file. Use the `cat` command with the `shell` tool.
Examples
User |
read file.txt |
Assistant |
|
Save#
Gives the assistant the ability to save whole files, or append to them.
Instructions
Create or overwrite a file with the given content.
The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.
Examples
User |
write a hello world script to hello.py |
Assistant |
|
System |
Saved to `hello.py` |
User |
make it all-caps |
Assistant |
|
System |
Saved to `hello.py` |
Instructions
Append the given content to a file.
Examples
User |
append a print "Hello world" to hello.py |
Assistant |
|
System |
Appended to `hello.py` |
- gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Append code to a file.
- gptme.tools.save.execute_append_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Actual append implementation.
- gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Save code to a file.
- gptme.tools.save.execute_save_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Actual save implementation.
- gptme.tools.save.preview_append(content: str, path: Path | None) str | None
Prepare preview content for append operation.
- gptme.tools.save.preview_save(content: str, path: Path | None) str | None
Prepare preview content for save operation.
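For illustration, a sketch using the documented low-level `execute_save_impl`; the content and path mirror the hello.py example above, and the auto-confirm callback is an assumption for the sketch.

```python
from pathlib import Path
from gptme.tools.save import execute_save_impl

content = 'print("Hello world")\n'

# Write (or overwrite) hello.py, auto-approving the confirmation for this sketch.
for msg in execute_save_impl(content, Path("hello.py"), confirm=lambda _preview: True):
    print(msg)  # e.g. "Saved to `hello.py`"
```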
Patch#
Gives the LLM agent the ability to patch text files, using an adapted version of git conflict markers.
Instructions
To patch/modify files, we use an adapted version of git conflict markers.
This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per tool use. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.
To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.
Examples
User |
patch `src/hello.py` to ask for the name of the user |
Assistant |
|
System |
Patch applied |
- class gptme.tools.patch.Patch
Patch(original: str, updated: str)
- __init__(original: str, updated: str) None
- diff_minimal(strip_context=False) str
Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.
- gptme.tools.patch.apply(codeblock: str, content: str) str
Applies multiple patches in codeblock to content.
- gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) Generator[Message, None, None]
Applies the patch.
- gptme.tools.patch.execute_patch_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) Generator[Message, None, None]
Actual patch implementation.
- gptme.tools.patch.preview_patch(content: str, path: Path | None) str | None
Prepare preview content for patch operation.
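A small sketch of the `Patch` dataclass documented above; the original/updated snippets are illustrative, and `diff_minimal` renders the minimal diff between them.

```python
from gptme.tools.patch import Patch

patch = Patch(
    original='print("Hello world")',
    updated='name = input("What is your name? ")\nprint(f"Hello {name}")',
)
print(patch.diff_minimal())
```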
Screenshot#
A simple screenshot tool, using screencapture on macOS and scrot on Linux.
- gptme.tools.screenshot.screenshot(path: Path | None = None) Path
Take a screenshot and save it to a file.
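A minimal usage sketch of the documented `screenshot` helper (with `path=None` it picks a default location); it assumes `screencapture` (macOS) or `scrot` (Linux) is available.

```python
from gptme.tools.screenshot import screenshot

saved = screenshot()  # returns the Path the image was saved to
print(f"Screenshot saved to {saved}")
```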
Browser#
Tools to let the assistant control a browser, including:
- loading pages
- reading their contents
- searching the web
- taking screenshots (Playwright only)
Two backends are available:
- Playwright backend:
Full browser automation with screenshots
Installation:
pipx install 'gptme[browser]'
# We need to use the same version of Playwright as the one installed by gptme
# when downloading the browser binaries. gptme will attempt this automatically
PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
pipx run playwright==$PW_VERSION install chromium-headless-shell
- Lynx backend:
Text-only browser for basic page reading and searching
No screenshot support
Installation:
# On Ubuntu
sudo apt install lynx
# On macOS
brew install lynx
# or any other way that gets you the `lynx` command
Note
This is an experimental feature. It needs some work to be more robust and useful.
Examples
Answer question from URL with browsing
User |
find out which is the latest ActivityWatch version from superuserlabs.org |
Assistant |
Let's browse the site. |
System |
|
Assistant |
Couldn't find the answer on the page. Following link to the ActivityWatch website. |
System |
|
Assistant |
The latest version of ActivityWatch is v0.12.2 |
Searching
User |
who is the founder of ActivityWatch? |
Assistant |
Let's search for that. |
System |
|
Assistant |
Following link to the ActivityWatch website. |
System |
|
Assistant |
The founder of ActivityWatch is Erik Bjäreholt. |
Take screenshot of page
User |
take a screenshot of the ActivityWatch website |
Assistant |
Certainly! I'll use the browser tool to screenshot the ActivityWatch website. |
System |
|
- gptme.tools.browser.read_url(url: str) str
Read a webpage in a text format.
- gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) Path
Take a screenshot of a webpage.
- gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo'] = 'google') str
Search for a query on a search engine.
- gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo'] = 'google') str
Search for a query on a search engine using Playwright.
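A hedged sketch of the browser helpers using the documented signatures; the query and URL are illustrative, and `screenshot_url` requires the Playwright backend.

```python
from gptme.tools.browser import read_url, screenshot_url, search

print(search("ActivityWatch latest version", engine="duckduckgo"))  # search results as text
print(read_url("https://activitywatch.net"))                        # page contents in text form
print(screenshot_url("https://activitywatch.net"))                  # Path to the screenshot (Playwright only)
```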
Vision#
Tools for viewing images, giving the assistant vision.
Requires a model with vision support, such as GPT-4o, Anthropic's Claude models, or Llama 3.2.
- gptme.tools.vision.view_image(image_path: Path | str) Message
View an image.
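A minimal sketch of `view_image`; the path is illustrative, and the returned `Message` presumably carries the image so a vision-capable model can see it.

```python
from gptme.tools.vision import view_image

msg = view_image("screenshot.png")  # returns a Message with the image attached
```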
Chats#
List, search, and summarize past conversation logs.
Examples
Search for a specific topic in past conversations
User |
Can you find any mentions of "python" in our past conversations? |
Assistant |
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function. |
- gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) None
List recent chat conversations and optionally summarize them using an LLM.
- Parameters:
max_results (int) – Maximum number of conversations to display.
include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.
- gptme.tools.chats.read_chat(conversation: str, max_results: int = 5, incl_system=False) None
Read a specific conversation log.
- Parameters:
conversation (str) – The name of the conversation to read.
max_results (int) – Maximum number of messages to display.
incl_system (bool) – Whether to include system messages.
- gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False) None
Search past conversation logs for the given query and print a summary of the results.
- Parameters:
query (str) – The search query.
max_results (int) – Maximum number of conversations to display.
system (bool) – Whether to include system messages in the search.
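For illustration, a sketch of the chat-log helpers with the documented defaults; these functions print their results rather than returning them, and the conversation name is hypothetical.

```python
from gptme.tools.chats import list_chats, read_chat, search_chats

list_chats(max_results=5)                       # most recent conversations
search_chats("python", max_results=5)           # past conversations mentioning "python"
read_chat("2024-01-01-example", max_results=5)  # hypothetical conversation name
```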
Computer#
Warning
The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.
Tool for computer interaction through X11, including screen capture, keyboard, and mouse control.
The computer tool provides direct interaction with the desktop environment through X11. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.
Features
Keyboard input simulation
Mouse control (movement, clicks, dragging)
Screen capture with automatic scaling
Cursor position tracking
Installation
Requires X11 and xdotool:
# On Debian/Ubuntu
sudo apt install xdotool
# On Arch Linux
sudo pacman -S xdotool
Configuration
The tool uses these environment variables:
DISPLAY: X11 display to use (default: “:1”)
WIDTH: Screen width (default: 1024)
HEIGHT: Screen height (default: 768)
Usage
The tool supports these actions:
- Keyboard:
key: Send key sequence (e.g., “Return”, “Control_L+c”)
type: Type text with realistic delays
- Mouse:
mouse_move: Move mouse to coordinates
left_click: Click left mouse button
right_click: Click right mouse button
middle_click: Click middle mouse button
double_click: Double click left mouse button
left_click_drag: Click and drag to coordinates
- Screen:
screenshot: Take and view a screenshot
cursor_position: Get current mouse position
The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
Instructions
You can interact with the computer through X11 with the `computer` Python function.
Available actions:
- key: Send key sequence (e.g., "Return", "Control_L+c")
- type: Type text with realistic delays
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
Examples
User |
Take a screenshot of the desktop |
Assistant |
I'll capture the current screen. |
System |
Viewing image... |
User |
Type "Hello, World!" into the active window |
Assistant |
I'll type the text with realistic delays. |
System |
Typed text: Hello, World! |
User |
Move the mouse to coordinates (100, 200) and click |
Assistant |
I'll move the mouse and perform a left click. |
System |
Moved mouse to 100,200 |
System |
Performed left_click |
User |
Press Ctrl+C |
Assistant |
I'll send the Control+C key sequence. |
System |
Sent key sequence: Control_L+c |
- gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) Message | None
Perform computer interactions through X11.
- Parameters:
action – The type of action to perform
text – Text to type or key sequence to send
coordinate – X,Y coordinates for mouse actions
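A sketch mirroring the example dialogue above with the documented `computer` signature (action, text, coordinate); it assumes a running X11 display and `xdotool`.

```python
from gptme.tools.computer import computer

computer(action="screenshot")                         # capture and view the screen
computer(action="type", text="Hello, World!")         # type with realistic delays
computer(action="mouse_move", coordinate=(100, 200))  # move the cursor
computer(action="left_click")                         # click at the current position
computer(action="key", text="Control_L+c")            # send a key sequence
```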
RAG#
RAG (Retrieval-Augmented Generation) tool for context-aware assistance.
The RAG tool provides context-aware assistance by indexing and searching project documentation.
Installation
The RAG tool requires the `gptme-rag` CLI to be installed:
pipx install gptme-rag
Configuration
Configure RAG in your `gptme.toml`:
[rag]
enabled = true
Features
Manual Search and Indexing
- Index project documentation with `rag_index`
- Search indexed documents with `rag_search`
- Check index status with `rag_status`
Automatic Context Enhancement
- Retrieves semantically similar documents
- Preserves conversation flow with hidden context messages
Instructions
Use RAG to index and search project documentation.
Examples
User |
Index the current directory |
Assistant |
Let me index the current directory with RAG. |
System |
Indexed 1 paths |
User |
Search for documentation about functions |
Assistant |
I'll search for function-related documentation. |
System |
### docs/api.md Functions are documented using docstrings... |
User |
Show index status |
Assistant |
I'll check the current status of the RAG index. |
System |
Index contains 42 documents |
- gptme.tools.rag.init() ToolSpec
Initialize the RAG tool.
- gptme.tools.rag.rag_enhance_messages(messages: list[Message]) list[Message]
Enhance messages with context from RAG.
- gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) str
Index documents in specified paths.
- gptme.tools.rag.rag_search(query: str, return_full: bool = False) str
Search indexed documents.
- gptme.tools.rag.rag_status() str
Show index status.
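Finally, a sketch of the RAG helpers with the documented signatures; it assumes the `gptme-rag` CLI is installed and `[rag]` is enabled in `gptme.toml`.

```python
from gptme.tools.rag import rag_index, rag_search, rag_status

print(rag_index("."))           # index the current directory
print(rag_search("functions"))  # search the indexed documents
print(rag_status())             # e.g. "Index contains 42 documents"
```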