Tools
gptme’s tools enable AI agents to execute code, edit files, browse the web, process images, and interact with your computer.
Overview
📁 File System
💻 Code & Development
🌐 Web & Research
👁️ Visual & Interactive
- Vision - Analyze images, diagrams, and visual content 
- Screenshot - Capture your screen for visual context 
- Computer - Control desktop applications through visual interface 
⚡ Advanced Workflows
Combinations
The real power emerges when tools work together:
- Web Research + Code: Browser + Python - Browse documentation and implement solutions 
- Visual Development: Vision + Patch - Analyze UI mockups and update code accordingly 
- System Automation: Shell + Python - Combine system commands with data processing 
- Interactive Debugging: Screenshot + Computer - Visual debugging and interface automation 
- Knowledge-Driven Development: RAG + Chats - Learn from documentation and past conversations 
Shell
The assistant can execute shell commands with bash by outputting code blocks with shell as the language.
- Configuration:
  - GPTME_SHELL_TIMEOUT: Environment variable that sets the command timeout (set it before starting gptme)
    - Set to a number (e.g., 30) for a timeout in seconds
    - Set to 0 to disable the timeout
    - Invalid values fall back to 1200 seconds (20 minutes)
    - If not set, defaults to 1200 seconds (20 minutes)
 
 
Instructions
The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.
These programs are available, among others:
- apt-get
- docker
- git
- pandoc
Examples
| User | list the current directory | 
| Assistant | To list the files in the current directory, use `ls`: | 
| System | Ran command: `ls` | 
The assistant can learn context by exploring the filesystem
| User | learn about the project | 
| Assistant | Let's start by checking the files | 
| System | 
 | 
| Assistant | Now let's check the README | 
| System | 
 | 
| Assistant | Now we check main.py | 
| System | 
 | 
| Assistant | The project is... | 
Create Vue project
| User | Create a new vue project with typescript and pinia named fancy-project | 
| Assistant | Sure! Let's create a new Vue project with TypeScript and Pinia named fancy-project: | 
| System | 
 | 
Proper quoting for complex content
| User | add a comment with backticks and special characters | 
| Assistant | When passing complex content with special characters, use single quotes to prevent shell interpretation: | 
- gptme.tools.shell.check_with_shellcheck(cmd: str) -> tuple[bool, bool, str]
  - Run shellcheck on the command if available.
  - Returns: Tuple of (has_issues: bool, should_block: bool, message: str)
    - has_issues: True if any shellcheck issues are found
    - should_block: True if critical error codes are found that should prevent execution
    - message: Description of the issues found
  - Note:
    - Requires shellcheck (sudo apt install shellcheck)
    - Can be disabled with GPTME_SHELLCHECK=off
    - Non-blocking if shellcheck is unavailable
    - SC2164 (cd error handling) is excluded by default
    - Custom excludes via GPTME_SHELLCHECK_EXCLUDE (comma-separated codes)
    - Error codes via GPTME_SHELLCHECK_ERROR_CODES (comma-separated, default: SC2006)
    - Error codes block execution; other codes only show warnings
 
- gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) -> Generator[Message, None, None]
  - Executes a shell command and returns the output.
- gptme.tools.shell.execute_shell_impl(cmd: str, logdir: Path | None, confirm: Callable[[str], bool], timeout: float | None = None) -> Generator[Message, None, None]
  - Execute shell command and format output.
- gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) -> str
  - Get the shell command from code/args/kwargs.
Python
The assistant can execute Python code blocks.
It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.
Instructions
Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.
Examples
Result of the last expression will be returned
| User | What is 2 + 2? | 
| Assistant | 
 | 
| System | Executed code block. | 
Write a function and call it
| User | compute fib 10 | 
| Assistant | To compute the 10th Fibonacci number, we can run the following code: | 
| System | Executed code block. | 
- class gptme.tools.python.TeeIO
  - __init__(original_stream)
  - write(s)
    - Write string to file.
    - Returns the number of characters written, which is always equal to the length of the string.
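The `write` contract above (forward the text, keep a copy, return its length) can be illustrated with a small stand-in built on `io.StringIO`. This is a sketch under that reading of the docs, not gptme's actual class; keeping the copy in a `StringIO` buffer is an assumption.

```python
import contextlib
import io
import sys
from typing import TextIO


class TeeIO(io.StringIO):
    """Forward writes to an original stream while keeping a copy in memory."""

    def __init__(self, original_stream: TextIO) -> None:
        super().__init__()
        self.original_stream = original_stream

    def write(self, s: str) -> int:
        self.original_stream.write(s)  # pass output through unchanged
        return super().write(s)  # StringIO.write returns len(s)


# Usage: output still reaches the terminal, and a copy is kept for later.
with contextlib.redirect_stdout(TeeIO(sys.stdout)) as captured:
    print("hi")
```

After the `with` block, `captured.getvalue()` holds everything printed inside it.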
 
- gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool] = <function <lambda>>) -> Generator[Message, None, None]
  - Executes a Python code block and returns the output.
- gptme.tools.python.get_installed_python_libraries() -> list[str]
  - Check which libraries from a select list of Python libraries are installed.
- gptme.tools.python.register_function(func: T) -> T
  - Decorator to register a function to be available in the IPython instance.
Tmux
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc.
It allows for inspecting pane contents and sending input.
Instructions
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.
Available commands:
- new_session <command>: Start a new tmux session with the given command
- send_keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect_pane <session_id>: Show the current content of the specified pane
- kill_session <session_id>: Terminate the specified tmux session
- list_sessions: Show all active tmux sessions
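Each of these commands maps naturally onto a standard tmux subcommand. The sketch below shows one plausible mapping; `tmux_argv` and `run` are hypothetical, and unlike the tool (which names sessions itself, e.g. gptme_1) it takes the session name as an explicit argument.

```python
import subprocess


def tmux_argv(command: str, *args: str) -> list[str]:
    """Build the tmux argv for one tool command (hypothetical mapping)."""
    if command == "new_session":
        session_id, shell_command = args
        # -d: start detached so the call returns immediately
        return ["tmux", "new-session", "-d", "-s", session_id, shell_command]
    if command == "send_keys":
        session_id, *keys = args
        return ["tmux", "send-keys", "-t", session_id, *keys]
    if command == "inspect_pane":
        (session_id,) = args
        # -p: print the pane contents to stdout instead of a paste buffer
        return ["tmux", "capture-pane", "-p", "-t", session_id]
    if command == "kill_session":
        (session_id,) = args
        return ["tmux", "kill-session", "-t", session_id]
    if command == "list_sessions":
        return ["tmux", "list-sessions"]
    raise ValueError(f"unknown command: {command}")


def run(command: str, *args: str) -> str:
    """Run the command and return tmux's output (requires tmux installed)."""
    result = subprocess.run(
        tmux_argv(command, *args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run("send_keys", "gptme_1", "C-c")` would send Ctrl+C to that session, like the dev-server example does.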
Examples
Managing a dev server
| User | Start the dev server | 
| Assistant | Certainly! To start the dev server we should use tmux: | 
| System | Running `npm run dev` in session gptme_1 | 
| User | Can you show me the current content of the pane? | 
| Assistant | Of course! Let's inspect the pane content: | 
| System | 
 | 
| User | Stop the dev server | 
| Assistant | I'll send 'Ctrl+C' to the pane to stop the server: | 
| System | Sent 'C-c' to pane 0 | 
Get info from ncurses applications
| User | start top and give me a summary | 
| Assistant | Sure! Let's start the top command in a tmux session: | 
| System | Running `top` in session gptme_1. | 
| Assistant | The load is... | 
Send keys to a session
| User | start ipython | 
| Assistant | Let's start an ipython session: | 
| System | Running `ipython` in session 2. | 
| User | Run 'print("Hello, world!")' in the ipython session | 
Subagent
A subagent tool for gptme.
It lets gptme break down a task into smaller parts and delegate them to subagents.
Examples
| User | compute fib 13 using a subagent | 
| Assistant | Starting a subagent to compute the 13th Fibonacci number. | 
| System | Subagent started successfully. | 
| Assistant | Now we need to wait for the subagent to finish the task. | 
| System | {"status": "success", "result": "The 13th Fibonacci number is 233"}. | 
- class gptme.tools.subagent.ReturnType
  - ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)
- class gptme.tools.subagent.Subagent
  - Subagent(agent_id: str, prompt: str, thread: threading.Thread, logdir: pathlib.Path)
Read
Read the contents of a file.
Instructions
Read the content of the given file. Use the `cat` command with the `shell` tool.
Examples
| User | read file.txt | 
| Assistant | 
 | 
Save
Gives the assistant the ability to save whole files, or append to them.
Instructions
Create or overwrite a file with the given content.
The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.
Examples
| User | write a hello world script to hello.py | 
| Assistant | 
 | 
| System | Saved to `hello.py` | 
| User | make it all-caps | 
| Assistant | 
 | 
| System | Saved to `hello.py` | 
Instructions
Append the given content to a file.
Examples
| User | append a print "Hello world" to hello.py | 
| Assistant | 
 | 
| System | Appended to `hello.py` | 
- gptme.tools.save.check_for_placeholders(content: str) -> bool
  - Check if content contains placeholder lines.
- gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) -> Generator[Message, None, None]
  - Append code to a file.
- gptme.tools.save.execute_append_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) -> Generator[Message, None, None]
  - Actual append implementation.
- gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool]) -> Generator[Message, None, None]
  - Save code to a file.
- gptme.tools.save.execute_save_impl(content: str, path: Path | None, confirm: Callable[[str], bool]) -> Generator[Message, None, None]
  - Actual save implementation.
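The docs do not say how `check_for_placeholders` recognizes a placeholder. As an illustration only, the sketch below assumes a placeholder is a comment line whose text starts with `...`, such as `# ...` or `// ... rest of the file`; `has_placeholders` is a hypothetical stand-in.

```python
import re

# Assumed heuristic: a comment line whose text begins with "..."
PLACEHOLDER_LINE = re.compile(r"^\s*(?:#|//|/\*|--)\s*\.\.\.")


def has_placeholders(content: str) -> bool:
    """Return True if any line looks like an elided-code placeholder."""
    return any(PLACEHOLDER_LINE.match(line) for line in content.splitlines())
```

A heuristic like this will miss other placeholder styles, so it is best treated as a warning signal rather than a guarantee.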
Patch
Gives the LLM agent the ability to patch text files, using an adapted version of git conflict markers.
- Environment Variables:
  - GPTME_PATCH_RECOVERY: If set to “true” or “1”, returns the file content in error messages when patches don’t match. This helps the assistant recover faster by seeing the actual file contents.
 
Instructions
To patch/modify files, we use an adapted version of git conflict markers.
This can be used to edit files, without having to rewrite the whole file.
Only one patch block can be written per tool use. Extra ORIGINAL/UPDATED blocks will be ignored.
Try to keep the patch as small as possible. Avoid placeholders, as they may make the patch fail.
To keep the patch small, try to scope the patch to imports/function/class.
If the patch is large, consider using the save tool to rewrite the whole file.
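As a sketch of how a single ORIGINAL/UPDATED block could be applied, assuming the markers `<<<<<<< ORIGINAL`, `=======`, and `>>>>>>> UPDATED` each sit on their own line: the original text must appear in the file, and only its first occurrence is replaced. `apply_patch` is a hypothetical helper, not `gptme.tools.patch.apply`.

```python
import re

PATCH_BLOCK = re.compile(
    r"<<<<<<< ORIGINAL\n(.*?)=======\n(.*?)>>>>>>> UPDATED", re.DOTALL
)


def apply_patch(codeblock: str, content: str) -> str:
    """Apply the first ORIGINAL/UPDATED block in codeblock to content."""
    match = PATCH_BLOCK.search(codeblock)  # later blocks are ignored
    if match is None:
        raise ValueError("no ORIGINAL/UPDATED block found")
    original, updated = match.group(1), match.group(2)
    if original not in content:
        raise ValueError("original text not found in file")
    return content.replace(original, updated, 1)
```

Requiring an exact match is also why small, tightly scoped patches without placeholders are more likely to apply.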
Examples
| User | patch `src/hello.py` to ask for the name of the user | 
| Assistant | 
 | 
| System | Patch applied | 
- class gptme.tools.patch.Patch
  - Patch(original: str, updated: str)
  - diff_minimal(strip_context=False) -> str
    - Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.
- gptme.tools.patch.apply(codeblock: str, content: str) -> str
  - Applies multiple patches in `codeblock` to `content`.
- gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None, confirm: Callable[[str], bool] = <function <lambda>>) -> Generator[Message, None, None]
  - Applies the patch.
Vision
Tools for viewing images, giving the assistant vision.
Requires a model that supports vision, such as GPT-4o, Anthropic’s Claude models, or Llama 3.2.
Screenshot
A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.
Browser
- Tools to let the assistant control a browser, including:
  - loading pages
  - reading their contents
  - searching the web
  - taking screenshots (Playwright only)
Two backends are available:
- Playwright backend:
  - Full browser automation with screenshots
  - Installation:
    pipx install 'gptme[browser]'
    # We need to use the same version of Playwright as the one installed by gptme
    # when downloading the browser binaries. gptme will attempt this automatically
    PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
    pipx run playwright==$PW_VERSION install chromium-headless-shell
- Lynx backend:
  - Text-only browser for basic page reading and searching
  - No screenshot support
  - Installation:
    # On Ubuntu
    sudo apt install lynx
    # On macOS
    brew install lynx
    # or any other way that gets you the `lynx` command
Note
This is an experimental feature. It needs some work to be more robust and useful.
Examples
Reading docs
| User | how does gptme work? | 
| Assistant | Let's read the docs. | 
Answer question from URL with browsing
| User | find out which is the latest ActivityWatch version from superuserlabs.org | 
| Assistant | Let's browse the site. | 
| System | 
 | 
| Assistant | Couldn't find the answer on the page. Following link to the ActivityWatch website. | 
| System | 
 | 
| Assistant | The latest version of ActivityWatch is v0.12.2 | 
Searching
| User | who is the founder of ActivityWatch? | 
| Assistant | Let's search for that. | 
| System | 
 | 
| Assistant | Following link to the ActivityWatch website. | 
| System | 
 | 
| Assistant | The founder of ActivityWatch is Erik Bjäreholt. | 
Searching with Perplexity
| User | what are the latest developments in AI? | 
| Assistant | Let me search for that using Perplexity AI. | 
| System | 
 | 
| Assistant | Based on the search results, here are the latest AI developments... | 
Take screenshot of page
| User | take a screenshot of the ActivityWatch website | 
| Assistant | Certainly! I'll use the browser tool to screenshot the ActivityWatch website. | 
| System | 
 | 
Read URL and check browser logs
| User | read this page and check if there are any console errors | 
| Assistant | I'll read the page first and then check the browser logs. | 
| System | 
 | 
| Assistant | Now let me check the browser console logs: | 
| System | 
 | 
- gptme.tools.browser.read_logs() -> str
  - Read browser console logs from the last read URL.
- gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) -> Path
  - Take a screenshot of a webpage.
Chats
List, search, and summarize past conversation logs.
Examples
Search for a specific topic in past conversations
| User | Can you find any mentions of "python" in our past conversations? | 
| Assistant | Certainly! I'll search our past conversations for mentions of "python" using the search_chats function. | 
- gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) -> None
  - List recent chat conversations and optionally summarize them using an LLM.
- gptme.tools.chats.read_chat(id: str, max_results: int = 5, incl_system=False) -> None
  - Read a specific conversation log.
Computer
Warning
The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.
Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.
The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.
Features
- Keyboard input simulation 
- Mouse control (movement, clicks, dragging) 
- Screen capture with automatic scaling 
- Cursor position tracking 
Installation
On Linux, requires X11 and xdotool:
# On Debian/Ubuntu
sudo apt install xdotool
# On Arch Linux
sudo pacman -S xdotool
On macOS, it uses the native screencapture and the external tool cliclick:
brew install cliclick
You need to give your terminal both screen recording and accessibility permissions in System Preferences.
Configuration
The tool uses these environment variables:
- DISPLAY: X11 display to use (default: “:1”, Linux only) 
- WIDTH: Screen width (default: 1024) 
- HEIGHT: Screen height (default: 768) 
Usage
The tool supports these actions:
- Keyboard:
- key: Send key sequence (e.g., “Return”, “Control_L+c”) 
- type: Type text with realistic delays 
 
- Mouse:
- mouse_move: Move mouse to coordinates 
- left_click: Click left mouse button 
- right_click: Click right mouse button 
- middle_click: Click middle mouse button 
- double_click: Double click left mouse button 
- left_click_drag: Click and drag to coordinates 
 
- Screen:
- screenshot: Take and view a screenshot 
- cursor_position: Get current mouse position 
 
The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
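The source does not describe the scaling method. A common approach, assumed here, is proportional scaling between the resolution the model sees (WIDTH × HEIGHT, 1024 × 768 by default) and the physical screen; `scale_coordinates` is a hypothetical helper.

```python
def scale_coordinates(
    point: tuple[int, int],
    src: tuple[int, int],
    dst: tuple[int, int],
) -> tuple[int, int]:
    """Map a point from a src-sized screen to a dst-sized screen proportionally."""
    x, y = point
    src_w, src_h = src
    dst_w, dst_h = dst
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


# A click the model requests at (512, 384) on a 1024x768 view lands at the
# center of a 2048x1536 physical screen.
physical = scale_coordinates((512, 384), src=(1024, 768), dst=(2048, 1536))
```

The same function, with `src` and `dst` swapped, maps physical cursor positions back into the model's coordinate space.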
Tips for Complex Operations
For complex operations involving multiple keypresses, you can use semicolon-separated sequences with key:
Examples
- Filling a login form: t:username;kp:tab;t:password;kp:return
- Switching applications: cmd+tab on macOS, alt+Tab on Linux
- (macOS) Opening Spotlight and searching: cmd+space;t:firefox;return
Using a single sequence for complex operations ensures proper timing and recognition of keyboard shortcuts.
Instructions
You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.
The key input syntax works consistently across platforms.
Available actions:
- key: Send key sequence using a unified syntax:
  - Type text: "t:Hello World"
  - Press key: "return", "esc", "tab"
  - Key combination: "ctrl+c", "cmd+space"
  - Chain commands: "cmd+space;t:firefox;return"
- type: Type text with realistic delays (legacy method)
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
Note: Key names are automatically mapped between platforms.
Common modifiers (ctrl, alt, cmd/super, shift) work consistently across platforms.
Examples
| User | Take a screenshot of the desktop | 
| Assistant | I'll capture the screen using the screenshot tool. | 
| System | Viewing image... | 
| User | Type "Hello, World!" into the active window | 
| Assistant | I'll type the text with realistic delays. | 
| System | Typed text: Hello, World! | 
| User | Move the mouse to coordinates (100, 200) and click | 
| Assistant | I'll move the mouse and perform a left click. | 
| System | Moved mouse to 100,200 | 
| System | Performed left_click | 
| User | Get the current mouse position | 
| Assistant | I'll get the cursor position. | 
| System | Cursor position: X=512,Y=384 | 
| User | Double-click at current position | 
| Assistant | I'll perform a double-click. | 
| System | Performed double_click | 
| User | Open a new browser tab | 
| Assistant | I'll open a new browser tab. | 
| System | Sent key sequence: ctrl+t | 
- class gptme.tools.computer.ComboOperation
- class gptme.tools.computer.KeyOperation
- class gptme.tools.computer.TextOperation
- gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) -> Message | None
  - Perform computer interactions in X11 or macOS environments.
  - Parameters:
    - action – The type of action to perform
    - text – Text to type or key sequence to send
    - coordinate – X,Y coordinates for mouse actions
RAG
RAG (Retrieval-Augmented Generation) tool for context-aware assistance.
The RAG tool provides context-aware assistance by indexing and semantically searching text files.
Installation
The RAG tool requires the gptme-rag CLI to be installed:
pipx install gptme-rag
Configuration
Configure RAG in your gptme.toml:
[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.
Features
- Manual Search and Indexing
  - Index project documentation with rag_index
  - Search indexed documents with rag_search
  - Check index status with rag_status
- Automatic Context Enhancement
  - Retrieves semantically similar documents
  - Preserves conversation flow with hidden context messages
Instructions
Use RAG to index and semantically search through text files such as documentation and code.
Examples
| User | Index the current directory | 
| Assistant | Let me index the current directory with RAG. | 
| System | Indexed 1 paths | 
| User | Search for documentation about functions | 
| Assistant | I'll search for function-related documentation. | 
| System | ### docs/api.md Functions are documented using docstrings... | 
| User | Show index status | 
| Assistant | I'll check the current status of the RAG index. | 
| System | Index contains 42 documents | 
- gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) -> Message
  - Get relevant context chunks from RAG for the user query.
- gptme.tools.rag.init() -> ToolSpec
  - Initialize the RAG tool.
- gptme.tools.rag.rag_enhance_messages(messages: list[Message], workspace: Path | None = None) -> list[Message]
  - Enhance messages with context from RAG.
- gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) -> str
  - Index documents in specified paths.
- gptme.tools.rag.rag_status() -> str
  - Show index status.
TTS
Text-to-speech (TTS) tool for generating audio from text.
Uses Kokoro for local TTS generation.
Usage
# Install gptme with TTS extras
pipx install 'gptme[tts]'
# Clone gptme repository
git clone https://github.com/gptme/gptme.git
cd gptme
# Run the Kokoro TTS server (needs uv installed)
./scripts/tts_server.py
# Start gptme (should detect the running TTS server)
gptme 'hello, testing tts'
Environment Variables
- GPTME_TTS_VOICE: Set the voice to use for TTS. Available voices depend on the TTS server.
- GPTME_VOICE_FINISH: If set to “true” or “1”, waits for speech to finish before exiting. This is useful when you want to ensure the full message is spoken.
- gptme.tools.tts.clean_for_speech(content: str) -> str
  - Clean content for speech by removing:
    - <thinking> tags and their content
    - Tool use blocks (`tool ...`)
    - Italic markup
    - Additional (details) that may not need to be spoken
    - Emojis and other non-speech content
    - Hash symbols from Markdown headers (e.g., “# Header” → “Header”)
  - Returns the cleaned content suitable for speech.
- gptme.tools.tts.ensure_tts_thread()
  - Ensure TTS processor thread is running.
- gptme.tools.tts.is_available() -> bool
  - Check if the TTS server is available.
- gptme.tools.tts.join_short_sentences(sentences: list[str], min_length: int = 100, max_length: int | None = 300) -> list[str]
  - Join consecutive sentences that are shorter than min_length, or up to max_length.
  - Parameters:
    - sentences – List of sentences to potentially join
    - min_length – Minimum length threshold for joining short sentences
    - max_length – Maximum length for combined sentences. If specified, tries to make sentences as long as possible up to this limit
  - Returns:
    - List of sentences, with short ones combined or optimized for max length
 
- gptme.tools.tts.set_speed(speed)
  - Set the speaking speed (0.5 to 2.0, default 1.3).
- gptme.tools.tts.set_volume(volume)
  - Set the volume for TTS playback (0.0 to 1.0).
- gptme.tools.tts.speak(text, block=False, interrupt=True, clean=True)
  - Speak text using the Kokoro TTS server.
  - The TTS system supports:
    - Speed control via set_speed (0.5 to 2.0)
    - Volume control via set_volume (0.0 to 1.0)
    - Automatic chunking of long texts
    - Non-blocking operation with optional blocking mode
    - Interruption of current speech
    - Background processing of TTS requests
  - Parameters:
    - text – Text to speak
    - block – If True, wait for audio to finish playing
    - interrupt – If True, stop current speech and clear queue before speaking
    - clean – If True, clean text for speech (remove markup, emojis, etc.)
  - Example:
    >>> from gptme.tools.tts import speak, set_speed, set_volume
    >>> set_volume(0.8)  # Set comfortable volume
    >>> set_speed(1.2)  # Slightly faster speech
    >>> speak("Hello, world!")  # Non-blocking by default
    >>> speak("Important message!", interrupt=True)  # Interrupts previous speech
- gptme.tools.tts.split_text(text: str) -> list[str]
  - Split text into sentences, respecting paragraphs, markdown lists, and decimal numbers.
  - This function handles:
    - Paragraph breaks
    - Markdown list items (-, *, 1.)
    - Decimal numbers (won’t split 3.14)
    - Sentence boundaries (. ! ?)
  - Returns:
    - List of sentences and paragraph breaks (empty strings)
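A simplified, regex-based sketch of the splitting rules above. `split_sentences` is hypothetical and handles fewer cases than the real function may: it treats a paragraph as a list only when every line is a list item, and it will still split after abbreviations such as "e.g.".

```python
import re

PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
# Split after ., ! or ? only when whitespace follows, so "3.14" stays intact.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
LIST_ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, with "" marking paragraph breaks."""
    result: list[str] = []
    for i, paragraph in enumerate(PARAGRAPH_BREAK.split(text.strip())):
        if i > 0:
            result.append("")  # paragraph break marker
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        if lines and all(LIST_ITEM.match(line) for line in lines):
            result.extend(lines)  # keep each list item as its own chunk
            continue
        joined = " ".join(lines)
        result.extend(s for s in SENTENCE_END.split(joined) if s)
    return result
```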
 
- gptme.tools.tts.stop() -> None
  - Stop audio playback and clear queues.
MCP
The Model Context Protocol (MCP) allows you to extend gptme with custom tools through external servers. See MCP for configuration and usage details.
