Tools#
Tools available in gptme.
The tools can be grouped into the following categories:
- Execution
- Files
- Network
- Vision
- Chat management
- Context management
Shell#
The assistant can execute shell commands with bash by outputting code blocks with `shell` as the language.
Instructions
When you send a message containing bash code, it will be executed in a stateful bash shell.
The shell will respond with the output of the execution.
Do not use EOF/HereDoc syntax to send multiline commands, as the assistant will not be able to handle it.
These programs are available, among others:
- git
- apt-get
- pandoc
- docker
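For example, asking the assistant to list the current directory might produce a `shell` codeblock like the following (a minimal illustration):

```shell
# list files in the current directory, including hidden ones
ls -la
```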
Examples
User: list the current directory
Assistant: To list the files in the current directory, use `ls`:
System: Ran command: `ls`
The assistant can learn context by exploring the filesystem
User: learn about the project
Assistant: Let's start by checking the files
System:
Assistant: Now let's check the README
System:
Assistant: Now we check main.py
System:
Assistant: The project is...
Create a Vue project
User: Create a new vue project with typescript and pinia named fancy-project
Assistant: Sure! Let's create a new Vue project with TypeScript and Pinia named fancy-project:
System:
- gptme.tools.shell.execute_shell(code: str, args: list[str], confirm: Callable[[str], bool]) → Generator[Message, None, None]
Executes a shell command and returns the output.
Python#
The assistant can execute Python code blocks.
It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.
Instructions
To execute Python code in an interactive IPython session, send a codeblock using the `ipython` language tag.
It will respond with the output and result of the execution.
If you first write the code in a normal python codeblock, remember to also execute it with the ipython codeblock.
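For example, a simple calculation might be sent as an `ipython` codeblock like this (a minimal sketch); the value of the last expression is echoed back:

```ipython
# the result of the last expression is displayed, IPython-style
sum(i * i for i in range(10))
```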
Examples
Results of the last expression will be displayed, IPython-style:
User: What is 2 + 2?
Assistant:
System: Executed code block.
It can write an example and then execute it:
User: compute fib 10
Assistant: To compute the 10th Fibonacci number, we can execute this code:
System: Executed code block.
- gptme.tools.python.execute_python(code: str, args: list[str], confirm: ~collections.abc.Callable[[str], bool] = <function <lambda>>) → Generator[Message, None, None]
Executes a python codeblock and returns the output.
- gptme.tools.python.get_installed_python_libraries() → set[str]
Check if a select list of Python libraries are installed.
- gptme.tools.python.register_function(func: T) → T
Decorator to register a function to be available in the IPython instance.
Tmux#
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands: `npm run dev`, `python3 server.py`, `python3 train.py`, etc.
It allows for inspecting pane contents and sending input.
Instructions
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.
Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions
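As an illustration, starting a dev server might look like the following, assuming the commands above are sent in a codeblock tagged `tmux` (in line with the other tools; the command and session id are illustrative):

```tmux
new_session 'npm run dev'
```

The pane's output can then be checked with `inspect_pane 0` and the session stopped with `kill_session 0`.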
Examples
Managing a dev server
User: Start the dev server
Assistant: Certainly! To start the dev server we should use tmux:
System: Running `npm run dev` in session 0
User: Can you show me the current content of the pane?
Assistant: Of course! Let's inspect the pane content:
System:
User: Stop the dev server
Assistant: I'll send 'Ctrl+C' to the pane to stop the server:
System: Sent 'C-c' to pane 0
Get info from ncurses applications
User: start top and give me a summary
Assistant: Sure! Let's start the top command in a tmux session:
System: Running `top` in session 1.
Assistant: The load is...
- gptme.tools.tmux.execute_tmux(code: str, args: list[str], confirm: Callable[[str], bool]) → Generator[Message, None, None]
Executes a command in tmux and returns the output.
Subagent#
A subagent tool for gptme
Lets gptme break down a task into smaller parts, and delegate them to subagents.
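A minimal sketch of how this might look, assuming the subagent functions documented below are exposed in the `ipython` tool (the agent id and prompt are illustrative):

```ipython
# start a subagent on a well-scoped task
subagent("fib-13", "Compute the 13th Fibonacci number and return it as the result")
# block until it finishes and read its JSON result
subagent_wait("fib-13")
```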
Examples
User: compute fib 13 using a subagent
Assistant: Starting a subagent to compute the 13th Fibonacci number.
System: Subagent started successfully.
Assistant: Now we need to wait for the subagent to finish the task.
System: {"status": "success", "result": "The 13th Fibonacci number is 233"}.
- class gptme.tools.subagent.ReturnType
ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)
- __init__(status: Literal['running', 'success', 'failure'], result: str | None = None) → None
- class gptme.tools.subagent.Subagent
Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)
- __init__(agent_id: str, prompt: str, thread: Thread, logdir: Path) → None
- gptme.tools.subagent.subagent(agent_id: str, prompt: str)
Runs a subagent and returns the resulting JSON output.
- gptme.tools.subagent.subagent_status(agent_id: str) → dict
Returns the status of a subagent.
- gptme.tools.subagent.subagent_wait(agent_id: str) → dict
Waits for a subagent to finish. Timeout is 1 minute.
Read#
Read the contents of a file.
Instructions
Read files using `cat`
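For example (the filename is illustrative):

```shell
cat file.txt
```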
Examples
User: read file.txt
Assistant:
Save#
Gives the assistant the ability to save whole files, or append to them.
Instructions
To write to a file, use a code block with the language tag: `save <path>`
The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.
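A minimal sketch of writing a script to `hello.py` (the contents are illustrative):

```save hello.py
print("Hello world")
```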
Examples
User: write a hello world script to hello.py
System: Saved to `hello.py`
User: make it all-caps
System: Saved to `hello.py`
Instructions
To append to a file, use a code block with the language tag: `append <path>`
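A minimal sketch of appending a line to `hello.py` (the contents are illustrative):

```append hello.py
print("Hello again")
```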
Examples
User: append a print "Hello world" to hello.py
Assistant:
System: Appended to `hello.py`
- gptme.tools.save.execute_append(code: str, args: list[str], confirm: Callable[[str], bool]) → Generator[Message, None, None]
Append code to a file.
- gptme.tools.save.execute_save(code: str, args: list[str], confirm: Callable[[str], bool]) → Generator[Message, None, None]
Save code to a file.
Patch#
Gives the LLM agent the ability to patch text files, using an adapted version of git conflict markers.
Instructions
To patch/modify files, we use an adapted version of git conflict markers.
This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per codeblock. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.
To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.
The $FILENAME parameter MUST be on the same line as the code block start, not on the line after.
The patch block should be written in the following format:
```patch $FILENAME
<<<<<<< ORIGINAL
$ORIGINAL_CONTENT
=======
$UPDATED_CONTENT
>>>>>>> UPDATED
```
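As an illustration, a patch that changes `hello.py` to greet the user by name might look like this (the file contents are hypothetical):

```patch hello.py
<<<<<<< ORIGINAL
print("Hello world")
=======
name = input("What is your name? ")
print(f"Hello {name}")
>>>>>>> UPDATED
```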
Examples
User: patch the file `hello.py` to ask for the name of the user
Assistant:
System: Patch applied
- class gptme.tools.patch.Patch
Patch(original: str, updated: str)
- __init__(original: str, updated: str) → None
- diff_minimal(strip_context=False) → str
Show a minimal diff of the patch. Note that a minimal diff isn't necessarily a unique diff.
- gptme.tools.patch.apply(codeblock: str, content: str) → str
Applies multiple patches in `codeblock` to `content`.
- gptme.tools.patch.execute_patch(code: str, args: list[str], confirm: Callable[[str], bool]) → Generator[Message, None, None]
Applies the patch.
Screenshot#
A simple screenshot tool, using screencapture on macOS and scrot on Linux.
- gptme.tools.screenshot.screenshot(path: Path | None = None) → Path
Take a screenshot and save it to a file.
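A minimal sketch of calling it from an `ipython` codeblock, assuming the function is exposed there (the path is illustrative):

```ipython
from pathlib import Path
screenshot(Path("screenshot.png"))  # path is optional; omit it to use a default location
```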
Browser#
Tools to let the assistant control a browser, including:
- loading pages
- reading their contents
- viewing them through screenshots
- searching
Note
This is an experimental feature. It needs some work to be more robust and useful.
To use the browser tool, you need to have the playwright Python package installed along with gptme, which you can install with:
pipx install gptme[browser]
gptme '/shell playwright install chromium'
Instructions
To browse the web, you can use the `read_url`, `search`, and `screenshot_url` functions in Python.
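A minimal sketch of using these functions from an `ipython` codeblock (the URL and query are illustrative):

```ipython
read_url("https://superuserlabs.org")                    # read a page as text
search("ActivityWatch latest version", "duckduckgo")     # search the web
screenshot_url("https://superuserlabs.org", "page.png")  # save a screenshot of the page
```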
Examples
Answer question from URL with browsing
User: find out which is the latest ActivityWatch version from superuserlabs.org
Assistant: Let's browse the site.
System:
Assistant: Couldn't find the answer on the page. Following link to the ActivityWatch website.
System:
Assistant: The latest version of ActivityWatch is v0.12.2
Searching
User: who is the founder of ActivityWatch?
Assistant: Let's search for that.
System:
Assistant: Following link to the ActivityWatch website.
System:
Assistant: The founder of ActivityWatch is Erik Bjäreholt.
Take screenshot of page
User: take a screenshot of the ActivityWatch website
Assistant: Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
System:
- gptme.tools.browser.read_url(url: str) → str
Read a webpage in a text format.
- gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) → Path
Take a screenshot of a webpage.
- gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo'] = 'google') → str
Search for a query on a search engine.
- gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo'] = 'google') → str
Search for a query on a search engine using Playwright.
Vision#
Tools for viewing images, giving the assistant vision.
Requires a model that supports vision, such as GPT-4o, Anthropic's Claude models, or Llama 3.2.
- gptme.tools.vision.view_image(image_path: Path | str) → Message
View an image.
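A minimal sketch of viewing an image, assuming the function is exposed in the `ipython` tool (the path is illustrative):

```ipython
view_image("screenshot.png")
```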
Chats#
List, search, and summarize past conversation logs.
Instructions
The chats tool allows you to list, search, and summarize past conversation logs.
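A minimal sketch of using these functions from an `ipython` codeblock, assuming they are exposed there (the query is illustrative):

```ipython
list_chats(max_results=3)  # list recent conversations
search_chats("python")     # search past conversations for a query
```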
Examples
Search for a specific topic in past conversations
User: Can you find any mentions of "python" in our past conversations?
Assistant: Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
- gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) → None
List recent chat conversations and optionally summarize them using an LLM.
- Parameters:
max_results (int) – Maximum number of conversations to display.
include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.
- gptme.tools.chats.read_chat(conversation: str, max_results: int = 5, incl_system=False) → None
Read a specific conversation log.
- Parameters:
conversation (str) – The name of the conversation to read.
max_results (int) – Maximum number of messages to display.
incl_system (bool) – Whether to include system messages.
- gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False) → None
Search past conversation logs for the given query and print a summary of the results.
- Parameters:
query (str) – The search query.
max_results (int) – Maximum number of conversations to display.
system (bool) – Whether to include system messages in the search.
Computer#
Warning
The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.
Tool for computer interaction through X11, including screen capture, keyboard, and mouse control.
The computer tool provides direct interaction with the desktop environment through X11. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.
Features
- Keyboard input simulation
- Mouse control (movement, clicks, dragging)
- Screen capture with automatic scaling
- Cursor position tracking
Installation
Requires X11 and xdotool:
# On Debian/Ubuntu
sudo apt install xdotool
# On Arch Linux
sudo pacman -S xdotool
Configuration
The tool uses these environment variables:
- DISPLAY: X11 display to use (default: ":1")
- WIDTH: Screen width (default: 1024)
- HEIGHT: Screen height (default: 768)
Usage
The tool supports these actions:
- Keyboard:
  - key: Send key sequence (e.g., "Return", "Control_L+c")
  - type: Type text with realistic delays
- Mouse:
  - mouse_move: Move mouse to coordinates
  - left_click: Click left mouse button
  - right_click: Click right mouse button
  - middle_click: Click middle mouse button
  - double_click: Double click left mouse button
  - left_click_drag: Click and drag to coordinates
- Screen:
  - screenshot: Take and view a screenshot
  - cursor_position: Get current mouse position
The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
Instructions
Use this tool to interact with the computer through X11.
Available actions:
- key: Send key sequence (e.g., "Return", "Control_L+c")
- type: Type text with realistic delays
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
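A minimal sketch of invoking these actions from an `ipython` codeblock, assuming the `computer` function documented below is exposed there (text and coordinates are illustrative):

```ipython
computer("screenshot")                         # capture and view the screen
computer("type", text="Hello, World!")         # type into the active window
computer("mouse_move", coordinate=(100, 200))  # move the mouse
computer("left_click")                         # click the left mouse button
computer("key", text="Control_L+c")            # send a key sequence
```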
Examples
User: Take a screenshot of the desktop
Assistant: I'll capture the current screen.
System: Viewing image...
User: Type "Hello, World!" into the active window
Assistant: I'll type the text with realistic delays.
System: Typed text: Hello, World!
User: Move the mouse to coordinates (100, 200) and click
Assistant: I'll move the mouse and perform a left click.
System: Moved mouse to 100,200
System: Performed left_click
User: Press Ctrl+C
Assistant: I'll send the Control+C key sequence.
System: Sent key sequence: Control_L+c
- gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) → Message | None
Perform computer interactions through X11.
- Parameters:
action – The type of action to perform
text – Text to type or key sequence to send
coordinate – X,Y coordinates for mouse actions
RAG#
RAG (Retrieval-Augmented Generation) tool for context-aware assistance.
The RAG tool provides context-aware assistance by indexing and searching project documentation.
Installation
The RAG tool requires the gptme-rag package. Install it with:
pip install "gptme[rag]"
Configuration
Configure RAG in your `gptme.toml`:
[rag]
# Storage configuration
index_path = "~/.cache/gptme/rag" # Where to store the index
collection = "gptme_docs" # Collection name for documents
# Context enhancement settings
max_tokens = 2000 # Maximum tokens for context window
auto_context = true # Enable automatic context enhancement
min_relevance = 0.5 # Minimum relevance score for including context
max_results = 5 # Maximum number of results to consider
# Cache configuration
[rag.cache]
max_embeddings = 10000 # Maximum number of cached embeddings
max_searches = 1000 # Maximum number of cached search results
embedding_ttl = 86400 # Embedding cache TTL in seconds (24h)
search_ttl = 3600 # Search cache TTL in seconds (1h)
Features
Manual Search and Indexing
- Index project documentation with `rag_index`
- Search indexed documents with `rag_search`
- Check index status with `rag_status`
Automatic Context Enhancement
- Automatically adds relevant context to user messages
- Retrieves semantically similar documents
- Manages token budget to avoid context overflow
- Preserves conversation flow with hidden context messages
Performance Optimization
- Intelligent caching system for embeddings and search results
- Configurable cache sizes and TTLs
- Automatic cache invalidation
- Memory-efficient storage
Benefits
- Better informed responses through relevant documentation
- Reduced need for manual context inclusion
- Automatic token management
- Seamless integration with conversation flow
Instructions
Use RAG to index and search project documentation.
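A minimal sketch of using the RAG functions from an `ipython` codeblock, assuming they are exposed there (the path and query are illustrative):

```ipython
rag_index("docs/")       # index documents under a path
rag_search("functions")  # search the indexed documents
rag_status()             # show how many documents are indexed
```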
Examples
User: Index the current directory
Assistant: Let me index the current directory with RAG.
System: Indexed 1 paths
User: Search for documentation about functions
Assistant: I'll search for function-related documentation.
System: ### docs/api.md Functions are documented using docstrings...
User: Show index status
Assistant: I'll check the current status of the RAG index.
System: Index contains 42 documents
- gptme.tools.rag.init() → ToolSpec
Initialize the RAG tool.
- gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) → str
Index documents in specified paths.
- gptme.tools.rag.rag_search(query: str) → str
Search indexed documents.
- gptme.tools.rag.rag_status() → str
Show index status.