Tools#
gptme’s tools enable AI agents to execute code, edit files, browse the web, process images, and interact with your computer.
Overview#
📁 File System#
💻 Code & Development#
Python - Execute Python code interactively with full library access
Shell - Run shell commands and manage system processes
GH - Interact with GitHub issues, PRs, and repositories
Precommit - Automatically run pre-commit checks after file saves
Autocommit - Automatically prompt for git commits after file modifications
🌐 Web & Research#
👁️ Visual & Interactive#
Vision - Analyze images, diagrams, and visual content
Screenshot - Capture your screen for visual context
Computer - Control desktop applications through visual interface
🤝 User Interaction#
⚡ Advanced Workflows#
Tmux - Manage long-running processes in terminal sessions
Subagent - Delegate subtasks to specialized agent instances
Complete - Signal that the autonomous session is finished
Restart - Restart the gptme process after configuration changes
Vent - Emit in-the-moment friction signals to a durable ledger
🧠 Knowledge & Planning#
🔌 Extensions#
MCP - Discover and connect Model Context Protocol servers
Combinations#
The real power emerges when tools work together:
Web Research + Code: Browser + Python - Browse documentation and implement solutions
Visual Development: Vision + Patch - Analyze UI mockups and update code accordingly
System Automation: Shell + Python - Combine system commands with data processing
Interactive Debugging: Screenshot + Computer - Visual debugging and interface automation
Knowledge-Driven Development: RAG + Chats - Learn from documentation and past conversations
Shell#
The assistant can execute shell commands with bash by outputting code blocks with shell as the language.
- Configuration:
- GPTME_SHELL_TIMEOUT: Environment variable to configure command timeout (set before starting gptme)
Set to a number (e.g., 30) for timeout in seconds
Set to 0 to disable timeout
Invalid values default to 1200 seconds (20 minutes)
If not set, defaults to 1200 seconds (20 minutes)
- GPTME_SHELL_TRUNC_PRE_TOKENS / GPTME_SHELL_TRUNC_POST_TOKENS: Override the head/tail token budget for stdout truncation. Defaults: 2000 / 8000.
- GPTME_SHELL_TRUNC_STDERR_PRE_TOKENS / GPTME_SHELL_TRUNC_STDERR_POST_TOKENS: Same overrides for stderr. Defaults: 2000 / 2000.
Lowering these makes the truncation path fire on smaller outputs, which surfaces savings telemetry in context-savings.jsonl and the /context command. Invalid values fall back to defaults.
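The timeout rules listed above can be sketched as a small parser. This is an illustration only, not gptme's actual code: `parse_shell_timeout` is a hypothetical helper, and treating negative or non-finite values as invalid is an assumption the text does not state.

```python
import math
import os

DEFAULT_TIMEOUT = 1200.0  # seconds (20 minutes), the documented default


def parse_shell_timeout(raw):
    """Return the timeout in seconds, or None when the timeout is disabled."""
    if raw is None:  # variable not set
        return DEFAULT_TIMEOUT
    try:
        value = float(raw)
    except ValueError:  # not a number: fall back to the default
        return DEFAULT_TIMEOUT
    if not math.isfinite(value) or value < 0:  # assumption: treated as invalid
        return DEFAULT_TIMEOUT
    return None if value == 0 else value  # 0 disables the timeout


# Read the variable the same way a process would at startup.
timeout = parse_shell_timeout(os.environ.get("GPTME_SHELL_TIMEOUT"))
```

Because the variable is read once, it has to be set before gptme starts, as noted above.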
Instructions
The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.
These programs are available, among others:
- apt-get
- docker
- git
- hyperfine
- pandoc
### When to use the shell
Use the shell when you need to inspect the workspace, search or examine files,
check git state, or run existing commands and tests. Prefer the shell over
answering from memory when the repo can tell you the answer directly.
### Background Jobs
For long-running commands (dev servers, builds, etc.), use background jobs:
- `bg <command>` - Start command in background, returns job ID
- `jobs` - List all background jobs with status
- `output <id>` - Show accumulated output from a job
- `kill <id>` - Terminate a background job
This prevents blocking on commands like `npm run dev` that run indefinitely.
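To make the `bg` / `jobs` / `output` / `kill` workflow concrete, here is a minimal Python sketch of a job registry built on `subprocess`. It is not how gptme implements background jobs; `JobManager` and its methods are hypothetical names chosen to mirror the commands above.

```python
import subprocess
import sys
import threading
import time


class JobManager:
    """Toy registry of background processes, keyed by an integer job ID."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def bg(self, argv):
        """Start a command without blocking and return its job ID."""
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        lines = []
        # A reader thread accumulates output so the pipe never fills up.
        reader = threading.Thread(target=self._drain, args=(proc, lines), daemon=True)
        reader.start()
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = (proc, lines, reader)
        return job_id

    @staticmethod
    def _drain(proc, lines):
        for line in proc.stdout:
            lines.append(line)

    def jobs(self):
        """Map each job ID to 'running' or 'exited (<code>)'."""
        return {
            job_id: "running" if proc.poll() is None else f"exited ({proc.returncode})"
            for job_id, (proc, _, _) in self._jobs.items()
        }

    def output(self, job_id):
        """Return everything the job has printed so far."""
        return "".join(self._jobs[job_id][1])

    def kill(self, job_id):
        """Terminate the job and wait for it to exit."""
        proc, _, reader = self._jobs[job_id]
        proc.terminate()
        proc.wait(timeout=5)
        reader.join(timeout=5)


# Demo: a stand-in for a dev server that prints once and then keeps running.
manager = JobManager()
job_id = manager.bg(
    [sys.executable, "-c", "import time; print('ready', flush=True); time.sleep(30)"]
)
deadline = time.monotonic() + 10
while "ready" not in manager.output(job_id) and time.monotonic() < deadline:
    time.sleep(0.05)
status_before = manager.jobs()[job_id]
captured = manager.output(job_id)
manager.kill(job_id)
status_after = manager.jobs()[job_id]
```

The key design point is that starting the process and reading its output are decoupled, so the caller gets control back immediately and can poll later.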
Examples
| User |
list the current directory |
| Assistant |
To list the files in the current directory, use `ls`: |
| System |
Ran command: `ls` |
The assistant can learn context by exploring the filesystem
| User |
learn about the project |
| Assistant |
Let's start by checking the files |
| System |
|
| Assistant |
Now let's check the README |
| System |
|
| Assistant |
Now we check main.py |
| System |
|
| Assistant |
The project is... |
Create vue project
| User |
Create a new vue project with typescript and pinia named fancy-project |
| Assistant |
Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project: |
| System |
|
Proper quoting for complex content
| User |
add a comment with backticks and special characters |
| Assistant |
When passing complex content with special characters, use single quotes to prevent shell interpretation: |
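When a command string is assembled programmatically, the same single-quote strategy can be applied from Python with the standard library's `shlex.quote`. The commit message below is a made-up example, not taken from the original.

```python
import shlex

# Text containing backticks, a dollar sign, and single quotes.
comment = "uses `backticks`, $VARS and 'single quotes'"

# shlex.quote wraps the text so a POSIX shell passes it through literally.
command = "git commit -m " + shlex.quote(comment)

# Splitting the command the way a POSIX shell would recovers the original text.
round_trip = shlex.split(command)
```

Without quoting, the shell would run the backticked text as a command substitution and expand `$VARS` before the program ever sees the argument.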
Background jobs for long-running commands
| User |
start a dev server |
| Assistant |
I'll start the dev server as a background job so it doesn't block: |
| System |
Started background job **#1**: `npm run dev`
>
> Use these commands to manage it:
> - `jobs` - List all background jobs
> - `output 1` - Show output from job #1
> - `kill 1` - Terminate job #1 |
| User |
check the server output |
| Assistant |
Let me check the output from the background job: |
| System |
**Job #1** - Running (45.2s)
> Command: `npm run dev`
>
> ```stdout
> > project@1.0.0 dev
> > vite
>
> VITE v5.0.0 ready in 300 ms
>
> ➜ Local: http://localhost:5173/
> ``` |
| User |
stop the server |
| Assistant |
I'll terminate the background job: |
| System |
Terminated job #1: `npm run dev` |
- gptme.tools.shell.close_conversation_shell(conversation_id: str) None
Close and remove the shell session for a conversation.
Called by the SESSION_END hook to clean up shell file descriptors when a conversation’s last session is removed.
- gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Executes a shell command and returns the output.
- gptme.tools.shell.execute_shell_impl(cmd: str, logdir: Path | None, timeout: float | None = None) Generator[Message, None, None]
Execute shell command and format output.
- gptme.tools.shell.get_shell() ShellSession
Get the shell session for the current context, creating it if necessary.
Uses ContextVar to provide context-local state, allowing each conversation to have its own shell session with independent working directory.
In server contexts (where current_conversation_id is set), also registers the shell in a conversation-level registry for cleanup via SESSION_END hooks.
- gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) str
Get the shell command from code/args/kwargs.
- gptme.tools.shell.get_workspace_cwd() str | None
Get the workspace directory for the current context, if set.
- gptme.tools.shell.set_shell(shell: ShellSession) None
Set the shell session for the current context (for testing).
- gptme.tools.shell.set_workspace_cwd(cwd: str) None
Set the workspace directory for the current context (thread-safe).
Call this before any shell creation to ensure the shell subprocess starts in the correct directory, even with concurrent sessions. This is the thread-safe replacement for os.chdir() in server contexts.
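The context-local state described for `get_shell` and `set_workspace_cwd` can be illustrated with the standard library's `contextvars`. This is a sketch under assumptions, not the module's code: `_workspace_cwd`, `set_cwd`, `get_cwd`, and `run_conversation` are hypothetical names, and the paths are placeholders.

```python
import contextvars

# One variable, but each context sees its own value.
_workspace_cwd = contextvars.ContextVar("workspace_cwd", default=None)


def set_cwd(path):
    _workspace_cwd.set(path)


def get_cwd():
    return _workspace_cwd.get()


def run_conversation(path):
    """Set the workspace for this context only, then read it back."""
    set_cwd(path)
    return get_cwd()


# Two conversations run in separate copies of the current context.
ctx_a = contextvars.copy_context()
ctx_b = contextvars.copy_context()
result_a = ctx_a.run(run_conversation, "/tmp/project-a")
result_b = ctx_b.run(run_conversation, "/tmp/project-b")

# The outer context is untouched by either conversation.
outer = get_cwd()
```

Unlike `os.chdir()`, which changes process-wide state, a `ContextVar` set inside one context does not leak into concurrent sessions.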
Python#
The assistant can execute Python code blocks.
It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.
Instructions
Use this tool to execute Python code in an interactive IPython session.
It responds with the execution output and final result.
### When to use the python tool
Use `python` for computation, structured data, and file-processing automation.
Prefer it over the shell for pure computation or when you need persistent
state or Python libraries.
Examples
Result of the last expression will be returned
| User |
What is 2 + 2? |
| Assistant |
|
| System |
Executed code block. |
Write a function and call it
| User |
compute fib 10 |
| Assistant |
To compute the 10th Fibonacci number, we can run the following code: |
| System |
Executed code block. |
- class gptme.tools.python.TeeIO
- __init__(original_stream)
- write(s)
Write string to file.
Returns the number of characters written, which is always equal to the length of the string.
- gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Executes a python codeblock and returns the output.
- gptme.tools.python.get_installed_python_libraries() list[str]
Check if a select list of Python libraries are installed.
- gptme.tools.python.register_function(func: T) T
Decorator to register a function to be available in the IPython instance.
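A registration decorator of the kind described for `register_function` usually records the function in a lookup table and returns it unchanged. The sketch below shows that general pattern; `register` and `registered` are hypothetical names, and how gptme actually injects functions into IPython is not shown here.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T", bound=Callable)

# Functions collected here could later be copied into an interpreter namespace.
registered: Dict[str, Callable] = {}


def register(func: T) -> T:
    """Record func under its own name and return it unchanged."""
    registered[func.__name__] = func
    return func


@register
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


result = registered["fib"](10)
```

Returning the original function keeps it usable under its normal name while also making it discoverable through the registry.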
Tmux#
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.
This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc.
It allows for inspecting pane contents and sending input.
Instructions
You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.
### When to use tmux
Use tmux for interactive applications requiring ongoing keyboard input or output
inspection: REPLs, TUIs, interactive installers, and persistent processes you
need to revisit. Prefer tmux over the shell's `bg` command when the application
requires send-keys interaction or repeated pane inspection. Use `wait
<session_id> [timeout] [stable_time]` when you need output to stabilize before
continuing.
Examples
Running subagents
| User |
start subagent to fix lints in parallel |
| Assistant |
Let's start a subagent in a new tmux session: |
Running specific agent
| User |
Ask Bob about his latest work |
| Assistant |
Sure! Let's start a tmux session running Bob (~/bob/): |
Managing a dev server
| User |
Start the dev server |
| Assistant |
Certainly! To start the dev server we should use tmux: |
| System |
Running `npm run dev` in session gptme_1 |
| User |
Can you show me the current content of the pane? |
| Assistant |
Of course! Let's inspect the pane content: |
| System |
|
| User |
Stop the dev server |
| Assistant |
I'll send 'Ctrl+C' to the pane to stop the server: |
| System |
Sent 'C-c' to pane 0 |
Get info from ncurses applications
| User |
start top and give me a summary |
| Assistant |
Sure! Let's start the top command in a tmux session: |
| System |
Running `top` in session gptme_1. |
| Assistant |
The load is... |
Send keys to a session
| User |
start ipython |
| Assistant |
Let's start an ipython session: |
| System |
Running `ipython` in session 2. |
| User |
Run 'print("Hello, world!")' in the ipython session
|
Listing active sessions
| User |
List all active tmux sessions |
| System |
Active tmux sessions ['0', 'gptme_1'] |
- gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Executes a command in tmux and returns the output.
- gptme.tools.tmux.inspect_pane(pane_id: str, logdir: Path | None = None) Message
Inspect the content of a tmux pane.
- Parameters:
pane_id – The tmux pane ID to inspect
logdir – Optional directory to save full output if truncated
- Returns:
Message with pane content (truncated if too long)
- gptme.tools.tmux.wait_for_output(session_id: str, timeout: int = 60, stable_time: int = 3, logdir: Path | None = None) Message
Wait for command output to stabilize in a tmux session.
Monitors the pane output and waits until it remains unchanged for stable_time seconds, or until timeout is reached.
- Parameters:
session_id – The tmux session ID to monitor
timeout – Maximum time to wait in seconds (default: 60)
stable_time – Seconds of unchanged output to consider stable (default: 3)
- Returns:
Message with the final output and status
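The stabilization rule behind `wait_for_output` (return once the output has been unchanged for `stable_time` seconds, or when `timeout` is reached) can be sketched as a polling loop. This is an assumed reading of the description, not the actual implementation: `wait_until_stable`, the `read` callback, and the `poll` interval are hypothetical, and the clock is injectable so the demo runs instantly.

```python
import time


def wait_until_stable(read, timeout=60.0, stable_time=3.0, poll=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll read() until its value stops changing; return (output, is_stable)."""
    start = clock()
    last = read()
    last_change = start
    while True:
        now = clock()
        if now - last_change >= stable_time:
            return last, True  # unchanged for long enough
        if now - start >= timeout:
            return last, False  # gave up waiting
        sleep(poll)
        current = read()
        if current != last:
            last, last_change = current, clock()


# Demo with a fake clock and a fake pane that changes once.
fake_now = [0.0]
frames = iter(["$ npm run dev", "$ npm run dev\nready"])
latest = [""]


def fake_read():
    latest[0] = next(frames, latest[0])
    return latest[0]


def fake_sleep(seconds):
    fake_now[0] += seconds


output, stable = wait_until_stable(
    fake_read, timeout=10, stable_time=2,
    clock=lambda: fake_now[0], sleep=fake_sleep,
)
```

Resetting the "last change" timestamp on every difference is what distinguishes this from a plain sleep: a command that keeps printing postpones the return until it goes quiet.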
Subagent#
Subagent tool — spawn, monitor, and coordinate child agents.
Extracted from a single 1100-line module into a package for maintainability.
Package structure:
- types.py - Data classes and module-level state (Subagent, ReturnType, etc.)
- hooks.py - Completion notification system (LOOP_CONTINUE hook)
- api.py - Public API (subagent, subagent_status, subagent_wait, etc.)
- batch.py - Batch execution (BatchJob, subagent_batch)
- execution.py - Execution backends (thread, subprocess, process monitoring)
Instructions
You can create, check status, wait for, and read logs from subagents.
Subagents support a "fire-and-forget-then-get-alerted" pattern:
- Call subagent() to start an async task (returns immediately)
- Continue with other work
- Receive completion messages via the LOOP_CONTINUE hook
- Optionally use subagent_wait() for explicit synchronization
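The pattern above amounts to a queue that background workers fill and the main loop drains between steps. The following is a minimal sketch of that idea, not gptme's hook code: `notify` and `drain_notifications` are hypothetical names, and the failure wording is an assumption (only the success message format appears in this document).

```python
import queue

_notifications = queue.Queue()  # thread-safe, so workers can post from any thread


def notify(agent_id, status, summary):
    """Called by a worker when its task finishes."""
    _notifications.put((agent_id, status, summary))


def drain_notifications():
    """Called by the main loop between steps; returns pending messages."""
    messages = []
    while True:
        try:
            agent_id, status, summary = _notifications.get_nowait()
        except queue.Empty:
            return messages
        if status == "success":
            messages.append(f"✅ Subagent '{agent_id}' completed: {summary}")
        else:  # assumed wording for failures
            messages.append(f"❌ Subagent '{agent_id}' failed: {summary}")


notify("compute-demo", "success", "pi = 3.14159")
first_drain = drain_notifications()
second_drain = drain_notifications()
```

Draining without blocking is the important property: the orchestrator never waits on the queue, it only picks up whatever has arrived.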
Key features:
- Agent profiles: Use profile names as agent_id for automatic profile detection
- model="provider/model": Override parent's model (route cheap tasks to faster models)
- use_subprocess=True: Run subagent in subprocess for output isolation
- use_acp=True: Run subagent via ACP protocol (supports any ACP-compatible agent)
- acp_command="claude-code-acp": Use a different ACP agent (default: gptme-acp)
- isolated=True: Run subagent in a git worktree for filesystem isolation
- subagent_batch(): Start multiple subagents in parallel
- Hook-based notifications: Completions delivered as system messages
## Agent Profiles for Subagents
Use profiles to create specialized subagents with appropriate capabilities.
When agent_id matches a profile name, the profile is auto-applied:
- explorer: Read-only analysis (tools: read)
- researcher: Web research without file modification (tools: browser, read)
- developer: Full development capabilities (all tools)
- verifier: Critical review & validation (tools: read, shell, ipython, chats)
- isolated: Restricted processing for untrusted content (tools: read, ipython)
- computer-use: Visual UI testing specialist (tools: computer, vision, ipython, shell)
- browser-use: Web interaction and testing specialist (tools: browser, screenshot, vision, shell) — supports interactive browsing (open_page, click, fill, scroll) and one-shot reads
Example: `subagent("explorer", "Explore codebase")`
With model override: `subagent("researcher", "Find docs", model="openai/gpt-4o-mini")`
Computer-use example: `subagent("computer-use", "Click the Submit button, wait for the modal, and screenshot the result")`
Browser-use example: `subagent("browser-use", "Open localhost:5173, fill the chat input, click send, and report the result")`
Use subagent_read_log() to inspect a subagent's conversation log for debugging.
## Structured Delegation Template
For complex delegations, use this 7-section template for clear task handoff:
TASK: [What the subagent should do]
EXPECTED OUTCOME: [Specific deliverable - format, structure, quality bars]
REQUIRED SKILLS: [What capabilities the subagent needs]
REQUIRED TOOLS: [Specific tools the subagent should use]
MUST DO: [Non-negotiable requirements]
MUST NOT DO: [Explicit constraints and forbidden actions]
CONTEXT: [Background info, dependencies, related work]
Example prompt using the template:
'''
TASK: Implement the user authentication feature
EXPECTED OUTCOME: auth.py with login/logout endpoints, passing tests
REQUIRED SKILLS: Python, FastAPI, JWT tokens
REQUIRED TOOLS: save, shell (for pytest)
MUST DO: Use bcrypt for password hashing, return proper HTTP status codes
MUST NOT DO: Store plaintext passwords, skip input validation
CONTEXT: This is for the gptme server API, see existing endpoints in server.py
'''
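If delegation prompts are built in code, the seven sections can be captured in a small data class so none is forgotten. This is only a convenience sketch; `Delegation` and `render` are hypothetical names and not part of the subagent API.

```python
from dataclasses import dataclass


@dataclass
class Delegation:
    """One field per section of the delegation template."""
    task: str
    expected_outcome: str
    required_skills: str
    required_tools: str
    must_do: str
    must_not_do: str
    context: str

    def render(self) -> str:
        sections = [
            ("TASK", self.task),
            ("EXPECTED OUTCOME", self.expected_outcome),
            ("REQUIRED SKILLS", self.required_skills),
            ("REQUIRED TOOLS", self.required_tools),
            ("MUST DO", self.must_do),
            ("MUST NOT DO", self.must_not_do),
            ("CONTEXT", self.context),
        ]
        return "\n".join(f"{label}: {text}" for label, text in sections)


prompt = Delegation(
    task="Implement the user authentication feature",
    expected_outcome="auth.py with login/logout endpoints, passing tests",
    required_skills="Python, FastAPI, JWT tokens",
    required_tools="save, shell (for pytest)",
    must_do="Use bcrypt for password hashing, return proper HTTP status codes",
    must_not_do="Store plaintext passwords, skip input validation",
    context="This is for the gptme server API, see existing endpoints in server.py",
).render()
```

The rendered string can then be passed as the `prompt` argument when starting a subagent.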
Examples
Executor Mode (single task)
| User |
compute fib 13 using a subagent |
| Assistant |
Starting a subagent to compute the 13th Fibonacci number. |
| System |
Subagent started successfully. |
| Assistant |
Now we need to wait for the subagent to finish the task. |
| System |
{"status": "success", "result": "The 13th Fibonacci number is 233"}.
|
Planner Mode (multi-task delegation)
| User |
implement feature X with tests |
| Assistant |
I'll use planner mode to delegate implementation and testing to separate subagents. |
| System |
Planner spawned 2 executor subagents. |
| Assistant |
Now I'll wait for both subtasks to complete. |
| System |
{"status": "success", "result": "Implementation complete in feature_x.py"}.
|
| System |
{"status": "success", "result": "Tests complete in test_feature_x.py, all passing"}.
### Context Modes
|
Full Context (default)
| User |
analyze this codebase |
| Assistant |
I'll use full context mode for comprehensive analysis. |
Selective Context (choose specific components)
| User |
write tests using pytest |
| Assistant |
I'll use subprocess mode so selective context can include workspace files without inheriting the full parent context. |
Subprocess Mode (output isolation)
| User |
run a subagent without output mixing with parent |
| Assistant |
I'll use subprocess mode for better output isolation. |
| System |
Subagent started in subprocess mode. |
ACP Mode (multi-harness support)
| User |
delegate this task to a Claude Code agent |
| Assistant |
I'll use ACP mode to run this via a different agent harness. |
| System |
Started subagent "claude-task" in ACP mode. |
Batch Execution (parallel tasks)
| User |
implement, test, and document a feature in parallel |
| Assistant |
I'll use subagent_batch for parallel execution with fire-and-gather pattern. |
| System |
Started batch of 3 subagents: ['impl', 'test', 'docs'] impl: success test: success docs: success |
Fire-and-Forget with Hook Notifications
| User |
start a subagent and continue working |
| Assistant |
I'll spawn a subagent. Completion will be delivered via the LOOP_CONTINUE hook. |
| System |
Started subagent "compute-demo" |
| System |
✅ Subagent 'compute-demo' completed: pi = 3.14159265358979... |
Profile-Based Subagents (auto-detected from agent_id)
| User |
explore this codebase and summarize the architecture |
| Assistant |
I'll use the explorer profile for a read-only analysis. |
| System |
Subagent started successfully. |
Profile with Model Override
| User |
research best practices for error handling |
| Assistant |
I'll spawn a researcher subagent with a faster model for web research. |
| System |
Subagent started successfully. |
Structured Delegation Template
| User |
implement a robust auth feature |
| Assistant |
I'll use the structured delegation template for clear task handoff. |
| System |
Subagent started successfully. |
Isolated Subagent (Worktree)
| User |
implement a feature without affecting my working directory |
| Assistant |
I'll run the subagent in an isolated git worktree so it won't modify your files. |
| System |
Subagent started successfully. |
- class gptme.tools.subagent.BatchJob
Manages a batch of subagents for parallel execution.
Note: With the hook-based notification system, the orchestrator will receive completion messages automatically via the LOOP_CONTINUE hook. This class provides additional utilities for explicit synchronization when needed.
- __init__(agent_ids: list[str], results: dict[str, ~gptme.tools.subagent.types.ReturnType] = <factory>, _lock: ~_thread.allocate_lock = <factory>) None
- is_complete() bool
Check if all subagents have completed.
- class gptme.tools.subagent.ReturnType
ReturnType(status: Literal[‘running’, ‘success’, ‘failure’], result: str | None = None)
- class gptme.tools.subagent.Subagent
Represents a running or completed subagent.
Supports both thread-based (default) and subprocess-based execution modes. Subprocess mode provides better output isolation.
- Communication Model (Phase 1):
One-way: Parent sends prompt, child executes independently
No runtime updates from child to parent
Results retrieved after completion via status()/subagent_wait()
- Future (Phase 2/3):
Support for progress notifications from child → parent
Clarification requests when child encounters ambiguity
See module docstring for full design intent
- __init__(agent_id: str, prompt: str, thread: Thread | None, logdir: Path, model: str | None, output_schema: type | None = None, process: Popen | None = None, execution_mode: Literal['thread', 'subprocess', 'acp'] = 'thread', acp_command: str | None = None, isolated: bool = False, worktree_path: Path | None = None, repo_path: Path | None = None, timeout: int = 1800) None
- is_running() bool
Check if the subagent is still running.
- class gptme.tools.subagent.SubtaskDef
Definition of a subtask for planner mode.
- gptme.tools.subagent.notify_completion(agent_id: str, status: Literal['running', 'success', 'failure'], summary: str) None
Add a subagent completion to the notification queue.
Called by the monitor thread when a subagent finishes. The queued notification will be delivered via the subagent_completion hook during the next LOOP_CONTINUE cycle.
- Parameters:
agent_id – The subagent’s identifier
status – “success” or “failure”
summary – Brief summary of the result
- gptme.tools.subagent.subagent(agent_id: str, prompt: str, mode: Literal['executor', 'planner'] = 'executor', subtasks: list[SubtaskDef] | None = None, execution_mode: Literal['parallel', 'sequential'] = 'parallel', context_mode: Literal['full', 'selective'] = 'full', context_include: list[str] | None = None, output_schema: type | None = None, use_subprocess: bool | None = None, use_acp: bool = False, acp_command: str = 'gptme-acp', profile: str | None = None, model: str | None = None, isolated: bool | None = None, timeout: int = 1800, role: Literal['general', 'explore', 'implement', 'verify'] | None = None)
Starts an asynchronous subagent. Returns None immediately.
Subagent completions are delivered via the LOOP_CONTINUE hook, enabling a “fire-and-forget-then-get-alerted” pattern where the orchestrator can continue working and get notified when subagents finish.
Profile auto-detection: If agent_id matches a known profile name (e.g. "explorer", "researcher", "developer", "verifier") or a common role alias ("explore"→"explorer", "research"→"researcher", "impl"/"dev"→"developer", "verify"→"verifier"), the profile is applied automatically, so there is no need to pass profile separately.
Role-based defaults (role parameter):
"explore": Defaults profile to explorer (read-only analysis)
"implement": Defaults profile to developer (full capability)
"verify": Defaults profile to verifier plus use_subprocess=True and isolated=True (read-only validation in isolation)
Explicit arguments override role defaults.
- Parameters:
agent_id – Unique identifier for the subagent. If it matches a known profile name (or a common alias like impl/dev), that profile is auto-applied (unless profile is explicitly set to something else).
prompt – Task prompt for the subagent (used as context for planner mode)
mode – “executor” for single task, “planner” for delegating to multiple executors
subtasks – List of subtask definitions for planner mode (required when mode=”planner”)
execution_mode – “parallel” (default) runs all subtasks concurrently, “sequential” runs subtasks one after another. Only applies to planner mode.
context_mode – Controls what context is shared with the subagent:
- “full” (default): Share complete context (agent identity, tools, workspace)
- “selective”: Share only specified context components (requires context_include)
context_include – For selective mode, list of context components to include:
- Thread mode supports “agent” and “tools”
- Subprocess mode also supports “workspace”, which maps to the CLI’s “files” context
Legacy subprocess values like “files”, “cmd”, and “all” are still accepted.
use_subprocess – If True, run subagent in subprocess for output isolation. Subprocess mode captures stdout/stderr separately from the parent.
use_acp – If True, run subagent via ACP (Agent Client Protocol). This enables multi-harness support: the subagent can be any ACP-compatible agent (gptme, Claude Code, Cursor, etc.). Requires the acp package: pip install 'gptme[acp]'.
acp_command – ACP agent command to invoke (default: “gptme-acp”). Only used when use_acp=True. Can be any ACP-compatible CLI.
profile – Agent profile name to apply. Profiles provide:
- System prompt customization (behavioral hints)
- Tool access restrictions (which tools the subagent can use)
- Behavior rules (read-only, no-network, etc.)
Use ‘gptme-util profile list’ to see available profiles. Built-in profiles: default, explorer, researcher, developer, verifier, isolated, computer-use, browser-use. If not set, auto-detected from agent_id when it matches a profile name.
model – Model to use for the subagent. Overrides parent’s model. Useful for routing cheap tasks to faster/cheaper models.
isolated – If True, run the subagent in a git worktree for filesystem isolation. The subagent gets its own copy of the repository and can modify files without affecting the parent. The worktree is automatically cleaned up after the subagent completes. Falls back to a temporary directory if not in a git repo.
timeout – Maximum seconds before the subprocess monitor kills the subagent (default 1800 = 30 min). Only applies to subprocess mode.
- Returns:
- Starts asynchronous execution.
In executor mode, starts a single task execution. In planner mode, starts execution of all subtasks using the specified execution_mode.
Executors use the complete tool to signal completion with a summary. The full conversation log is available at the logdir path.
- Return type:
None
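The profile auto-detection and role defaults described for `subagent()` can be summarized as two lookups. The sketch below encodes only the names and aliases listed in this section; `detect_profile` and `apply_role_defaults` are hypothetical helpers, and how the real function combines the two mechanisms is not specified here.

```python
PROFILES = {
    "default", "explorer", "researcher", "developer",
    "verifier", "isolated", "computer-use", "browser-use",
}
ALIASES = {
    "explore": "explorer",
    "research": "researcher",
    "impl": "developer",
    "dev": "developer",
    "verify": "verifier",
}
ROLE_DEFAULTS = {
    "explore": {"profile": "explorer"},
    "implement": {"profile": "developer"},
    "verify": {"profile": "verifier", "use_subprocess": True, "isolated": True},
}


def detect_profile(agent_id, profile=None):
    """An explicit profile wins; otherwise match agent_id by name, then by alias."""
    if profile is not None:
        return profile
    if agent_id in PROFILES:
        return agent_id
    return ALIASES.get(agent_id)  # None when nothing matches


def apply_role_defaults(role, **explicit):
    """Start from the role's defaults, then let explicit arguments override them."""
    settings = dict(ROLE_DEFAULTS.get(role, {}))
    settings.update({key: value for key, value in explicit.items() if value is not None})
    return settings


by_alias = detect_profile("impl")
no_match = detect_profile("my-task")
overridden = apply_role_defaults("verify", isolated=False)
```

Treating `None` as "not provided" is why the signature above uses `bool | None` for `use_subprocess` and `isolated`: it lets a role default apply only when the caller stayed silent.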
- gptme.tools.subagent.subagent_batch(tasks: list[tuple[str, str]], use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp') BatchJob
Start multiple subagents in parallel and return a BatchJob to manage them.
This is a convenience function for fire-and-gather patterns where you want to run multiple independent tasks concurrently.
With the hook-based notification system, completion messages are delivered automatically via the LOOP_CONTINUE hook. The BatchJob provides additional utilities for explicit synchronization when needed.
- Parameters:
tasks – List of (agent_id, prompt) tuples
use_subprocess – If True, run subagents in subprocesses for output isolation
use_acp – If True, run subagents via ACP protocol
acp_command – ACP agent command (default: “gptme-acp”)
- Returns:
A BatchJob instance for managing the parallel subagents. The BatchJob provides wait_all(timeout) to wait for completion, is_complete() to check status, and get_completed() for partial results.
Example:
    job = subagent_batch([
        ("impl", "Implement feature X"),
        ("test", "Write tests for feature X"),
        ("docs", "Document feature X"),
    ])
    # Orchestrator continues with other work...
    # Completion messages delivered via LOOP_CONTINUE hook:
    # "✅ Subagent 'impl' completed: Feature implemented"
    # "✅ Subagent 'test' completed: 5 tests added"
    #
    # Or explicitly wait for all if needed:
    results = job.wait_all(timeout=300)
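For readers curious how a fire-and-gather helper with `wait_all`, `is_complete`, and `get_completed` might work internally, here is a thread-based sketch. It is a stand-in under assumptions, not the real `BatchJob`: `MiniBatch`, `record`, and `worker` are hypothetical, and the workers return canned strings instead of running agents.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class MiniBatch:
    """Collects one result per agent ID from concurrently running workers."""
    agent_ids: list
    results: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)
    _done: threading.Event = field(default_factory=threading.Event)

    def record(self, agent_id, result):
        """Store a worker's result; signal completion when all have reported."""
        with self._lock:
            self.results[agent_id] = result
            if len(self.results) == len(self.agent_ids):
                self._done.set()

    def is_complete(self):
        with self._lock:
            return len(self.results) == len(self.agent_ids)

    def get_completed(self):
        """Return a snapshot of the results gathered so far."""
        with self._lock:
            return dict(self.results)

    def wait_all(self, timeout=None):
        """Block until every worker has reported or the timeout expires."""
        self._done.wait(timeout)
        return self.get_completed()


def worker(batch, agent_id):
    batch.record(agent_id, "success")  # a real worker would run a task here


batch = MiniBatch(["impl", "test", "docs"])
for agent_id in batch.agent_ids:
    threading.Thread(target=worker, args=(batch, agent_id)).start()
gathered = batch.wait_all(timeout=5)
```

The lock protects the shared results dictionary, while the event lets `wait_all` sleep instead of polling.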
- gptme.tools.subagent.subagent_read_log(agent_id: str, max_messages: int = 50, include_system: bool = False, message_filter: str | None = None) str
Read the conversation log of a subagent.
- Parameters:
agent_id – The subagent to read logs from
max_messages – Maximum number of messages to return
include_system – Whether to include system messages
message_filter – Filter messages by role (user/assistant/system) or None for all
- Returns:
Formatted log output showing the conversation
Read#
Read the contents of one or more files, or list the contents of a directory.
Provides a sandboxed file reading capability that works without shell access.
Useful for restricted tool sets (e.g., --tools read,patch,save).
Multiple paths can be passed in the code block (one per line) to read several files in a single tool call, reducing roundtrips when exploring a codebase.
Instructions
Read the content of one or more files, or list the contents of a directory.
Paths can be relative or absolute.
For files, output includes line numbers for easy reference.
For directories, output shows a flat listing of immediate files and subdirectories.
### When to use read
Reading a file directly gives you its exact, current content with line numbers —
eliminating guesswork from memory, file names, or comments. Prefer `read` over
those shortcuts when the file itself is the source of truth.
To read multiple files in a single call, put one path per line in the code block.
Lines beginning with '#' are treated as comments and skipped.
The line-range parameters (start_line, end_line) only apply when reading a single file.
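The multi-path rules above (one path per line, `#` lines skipped) are simple to express in code. The sketch below is an illustration, not the tool's parser: `parse_paths` and `number_lines` are hypothetical helpers, skipping blank lines is an assumption, and the exact spacing of the line-number prefix is a guess based on the examples in this section.

```python
def parse_paths(block):
    """Extract file paths from a code block, one per line."""
    paths = []
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines (assumed) and comments are skipped
        paths.append(stripped)
    return paths


def number_lines(text):
    """Prefix each line with its 1-based line number."""
    return "\n".join(
        f"{number} {line}" for number, line in enumerate(text.splitlines(), start=1)
    )


paths = parse_paths("# source files\nhello.py\n\ngoodbye.py\n")
numbered = number_lines('print("Hello world")\nprint("Goodbye world")\n')
```

Batching paths this way is what lets several files be read in a single tool call.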
Examples
| User |
read hello.py |
| Assistant |
|
| System |
```hello.py
> 1 print("Hello world")
> 2 print("Goodbye world")
> ```
|
| User |
read both source files |
| Assistant |
|
| System |
```hello.py
> 1 print("Hello world")
> ```
> ```goodbye.py
> 1 print("Goodbye world")
> ```
|
Save#
Gives the assistant the ability to save whole files, or append to them.
Instructions
Create or overwrite a file with the given content.
The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.
### When to use save vs patch
Use `save` for new files, full rewrites, or edits that touch most of a file.
Use `patch` for targeted edits to existing files; it keeps surrounding content intact.
Examples
| User |
write a hello world script to hello.py |
| Assistant |
|
| System |
Saved to `hello.py` |
| User |
make it all-caps |
| Assistant |
|
| System |
Saved to `hello.py` |
Instructions
Append the given content to a file.
Examples
| User |
append a print "Hello world" to hello.py |
| Assistant |
|
| System |
Appended to `hello.py` |
- gptme.tools.save.check_for_placeholders(content: str) bool
Check if content contains placeholder lines.
- gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Append code to a file.
- gptme.tools.save.execute_append_impl(content: str, path: Path | None) Generator[Message, None, None]
Actual append implementation.
- gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Save code to a file.
- gptme.tools.save.execute_save_impl(content: str, path: Path | None) Generator[Message, None, None]
Actual save implementation.
Patch#
Gives the LLM agent the ability to patch text files using an adapted version of git conflict markers.
- Environment Variables:
- GPTME_PATCH_RECOVERY: If set to “true” or “1”, returns the file content in error messages when patches don’t match. This helps the assistant recover faster by seeing the actual file contents.
Instructions
To patch/modify files, we use an adapted version of git conflict markers.
Multiple ORIGINAL/UPDATED blocks can make several changes in one patch.
Keep patches small. Scope each change to a function/class. Avoid placeholders
in ORIGINAL blocks; they must match the file exactly or the patch will fail.
### When to use patch vs save
Use `patch` for targeted edits to existing files.
Use `save` for new files, full rewrites, or changes too large for patch markers.
Note: When patching markdown, avoid replacing partial codeblocks (just the opening
or closing backticks). The parser needs complete codeblocks. For simple
codeblock-boundary changes (like a language tag), use `sed` or `perl` instead.
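To show why ORIGINAL blocks must match the file exactly, here is a small sketch of applying such a patch by literal search and replace. The marker lines used (`<<<<<<< ORIGINAL`, `=======`, `>>>>>>> UPDATED`) are an assumption about the adapted format, since the example patch bodies are not reproduced in this document, and `apply_patch` is a hypothetical function rather than the tool's own `apply`.

```python
import re

# Assumed marker syntax: ORIGINAL text, a separator, then UPDATED text.
BLOCK = re.compile(
    r"<<<<<<< ORIGINAL\n(.*?)\n=======\n(.*?)\n>>>>>>> UPDATED",
    re.DOTALL,
)


def apply_patch(patch, content):
    """Apply each ORIGINAL/UPDATED block in order using exact text matching."""
    blocks = BLOCK.findall(patch)
    if not blocks:
        raise ValueError("no ORIGINAL/UPDATED blocks found")
    for original, updated in blocks:
        if original not in content:
            # A placeholder or paraphrase in ORIGINAL ends up here.
            raise ValueError(f"ORIGINAL block not found in file:\n{original}")
        content = content.replace(original, updated, 1)
    return content


source = 'def hello():\n    print("Hello world")\n'
patch = (
    "<<<<<<< ORIGINAL\n"
    '    print("Hello world")\n'
    "=======\n"
    '    name = input("What is your name? ")\n'
    '    print(f"Hello {name}")\n'
    ">>>>>>> UPDATED"
)
patched = apply_patch(patch, source)
```

Because matching is literal, a small and precisely scoped ORIGINAL block is less likely to fail than a large one that must agree with the file character for character.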
Examples
| User |
patch `src/hello.py` to ask for the name of the user |
| Assistant |
|
| System |
Patch applied |
- class gptme.tools.patch.Patch
Patch(original: str, updated: str)
- diff_minimal(strip_context=False) str
Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.
- gptme.tools.patch.apply(codeblock: str, content: str) str
Applies multiple patches in codeblock to content. Provides detailed error messages when patches fail.
- gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Applies the patch.
Vision#
Tools for viewing images, giving the assistant vision.
Requires a model that supports vision, such as GPT-4o, Anthropic's Claude models, or Llama 3.2.
Screenshot#
A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.
Browser#
- Tools to let the assistant control a browser, including:
loading pages
reading their contents
searching the web
taking screenshots (Playwright only)
getting ARIA accessibility snapshots (Playwright only)
interactive browsing: click, fill forms, scroll (Playwright only)
reading PDFs (with page limits and vision fallback hints)
converting PDFs to images (using pdftoppm, ImageMagick, or vips)
Two backends are available:
- Playwright backend:
Full browser automation with screenshots
Installation:
pipx install 'gptme[browser]'
# We need to use the same version of Playwright as the one installed by gptme
# when downloading the browser binaries. gptme will attempt this automatically
PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
pipx run playwright==$PW_VERSION install chromium-headless-shell
- Lynx backend:
Text-only browser for basic page reading and searching
No screenshot support
Installation:
# On Ubuntu
sudo apt install lynx
# On macOS
brew install lynx
# or any other way that gets you the `lynx` command
- Provider Native Search:
When using Anthropic Claude models, native web search can be enabled
This uses Anthropic’s built-in web search instead of web scraping
More reliable than Google/DuckDuckGo scraping (which is blocked by bot detection)
Configuration:
export GPTME_ANTHROPIC_WEB_SEARCH=true
export GPTME_ANTHROPIC_WEB_SEARCH_MAX_USES=5  # Optional, default is 5
Note
This is an experimental feature. It needs some work to be more robust and useful.
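To make the choice between the Playwright and Lynx backends concrete, here is a minimal sketch of how availability could be detected. The function names and the preference for Playwright over Lynx are assumptions for illustration, not gptme's actual detection logic.

```python
import importlib.util
import shutil
from typing import Optional


def playwright_installed() -> bool:
    """True if the `playwright` Python package can be found."""
    return importlib.util.find_spec("playwright") is not None


def lynx_installed() -> bool:
    """True if a `lynx` executable is on PATH."""
    return shutil.which("lynx") is not None


def pick_backend(playwright: bool, lynx: bool) -> Optional[str]:
    """Prefer the fuller-featured backend (an assumption of this sketch)."""
    if playwright:
        return "playwright"
    if lynx:
        return "lynx"
    return None
```

Calling `pick_backend(playwright_installed(), lynx_installed())` returns the name of the backend to use, or `None` when neither is installed.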
Instructions
### When to use browser
Use browser to fetch live web content, search the web, interact with pages, or
take screenshots. Prefer browser over memory for current information. For
services with APIs, prefer shell or Python over scraping.
Examples
Reading docs
| User |
how does gptme work? |
| Assistant |
Let's read the docs. |
Answer question from URL with browsing
| User |
find out which is the latest ActivityWatch version from superuserlabs.org |
| Assistant |
Let's browse the site. |
| System |
|
| Assistant |
Couldn't find the answer on the page. Following link to the ActivityWatch website. |
| System |
|
| Assistant |
The latest version of ActivityWatch is v0.12.2 |
Searching
| User |
who is the founder of ActivityWatch? |
| Assistant |
Let's search for that. |
| System |
|
| Assistant |
The founder of ActivityWatch is Erik Bjäreholt. |
Searching for latest information
| User |
what are the latest developments in AI? |
| Assistant |
Let me search for that. |
| System |
|
| Assistant |
Based on the search results, here are the latest AI developments... |
Take screenshot of page
| User |
take a screenshot of the ActivityWatch website |
| Assistant |
Certainly! I'll use the browser tool to screenshot the ActivityWatch website. |
| System |
|
Get ARIA snapshot to see interactive elements
| User |
what interactive elements are on example.com? |
| Assistant |
Let me get the accessibility snapshot of the page. |
| System |
|
| Assistant |
The page has a heading "Example Domain", a paragraph with description text, and a link "More information...". |
Interactive browsing: open page, click, fill
| User |
search for gptme on example.com's search form |
| Assistant |
I'll open the page, fill the search form, and click submit. |
| System |
|
| Assistant |
I can see a search box and button. Let me fill in the search and click Go. |
| System |
|
| System |
|
| Assistant |
The search was submitted and the page now shows results for "gptme". |
Read full text content of interactive page
| User |
what does the article say? |
| Assistant |
Let me read the full text content of the current page. |
| System |
|
| Assistant |
The article covers [summary of content]. |
Read URL and check browser logs
| User |
read this page and check if there are any console errors |
| Assistant |
I'll read the page first and then check the browser logs. |
| System |
|
| Assistant |
Now let me check the browser console logs: |
| System |
|
Read PDF document
| User |
read this research paper from arxiv |
| Assistant |
I'll read the PDF and extract its text content. |
| System |
|
| Assistant |
I've extracted the text from the PDF. The paper discusses [summary of key points]... |
- gptme.tools.browser.click_element(selector: str) str
Click an element on the current page and return updated ARIA snapshot.
Requires open_page() to be called first.
- Parameters:
selector – Playwright selector to find the element. Supports:
- CSS: "#submit-btn", ".nav-link", "button"
- Text: "text=Submit", "text=Log in"
- Role: "role=button[name='Submit']"
- Chained: "form >> text=Submit"
- gptme.tools.browser.close_page() str
Close the current interactive browsing page.
Frees browser resources. A new page can be opened with open_page().
- gptme.tools.browser.fill_element(selector: str, value: str) str
Fill a form field on the current page and return updated ARIA snapshot.
Requires open_page() to be called first. Clears any existing value before filling.
- Parameters:
selector – Playwright selector for the input/textarea element.
value – Text to fill into the field.
- gptme.tools.browser.has_lynx() bool
Check if lynx is available.
- gptme.tools.browser.has_playwright() bool
Check if playwright is available.
- gptme.tools.browser.open_page(url: str) str
Open a page for interactive browsing. Returns ARIA accessibility snapshot.
Use this instead of read_url() when you need to interact with the page (click buttons, fill forms, scroll). The page stays open for subsequent click_element(), fill_element(), and scroll_page() calls.
The output includes a metadata header with the page title and current URL.
- gptme.tools.browser.pdf_to_images(url_or_path: str, output_dir: str | Path | None = None, pages: tuple[int, int] | None = None, dpi: int = 150) list[Path]
Convert PDF pages to images using auto-detected CLI tools.
Auto-detects and uses the first available tool: pdftoppm, ImageMagick convert, or vips.
- Parameters:
url_or_path – URL or local path to PDF file
output_dir – Directory to save images (default: creates temp directory)
pages – Optional tuple of (first_page, last_page) to convert (1-indexed). If None, converts all pages.
dpi – Resolution for output images (default: 150)
- Returns:
List of paths to generated PNG images
- Raises:
RuntimeError – If no PDF-to-image tools are available
subprocess.CalledProcessError – If conversion fails
Example
>>> images = pdf_to_images("https://example.com/doc.pdf")
>>> for img in images:
...     view_image(img)  # Analyze with vision tool
- gptme.tools.browser.read_logs() str
Read browser console logs from the last read URL.
- gptme.tools.browser.read_page_text() str
Read the full text content of the current interactive page as Markdown.
Requires open_page() to be called first. Returns the page body converted to Markdown, preserving text formatting. Useful for reading article text, documentation, or other content after navigating to a page.
Unlike read_url(), this reads from the current interactive session — so it reflects the page state after any clicks, form fills, or navigation.
- gptme.tools.browser.read_url(url: str, max_pages: int | None = None) str
Read a webpage or PDF in a text format.
- Parameters:
url – URL to read
max_pages – For PDFs only - maximum pages to read (default: 10). Set to 0 to read all pages. Ignored for web pages.
- gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) Path
Take a screenshot of a webpage.
- gptme.tools.browser.scroll_page(direction: str = 'down', amount: int = 500) str
Scroll the current page and return updated ARIA snapshot.
Requires open_page() to be called first.
- Parameters:
direction – “up” or “down” (default: “down”)
amount – Pixels to scroll (default: 500)
- gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] | None = None) str
Search for a query on a search engine.
If no engine is specified, automatically chooses the best available backend and falls back to the next usable backend on failure.
- gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] = 'google') str
Search for a query on a search engine using Playwright.
- gptme.tools.browser.snapshot_url(url: str) str
Get the ARIA accessibility snapshot of a webpage.
Returns a structured text representation of the page’s accessibility tree, showing interactive elements (buttons, links, inputs) with their roles and names. Useful for understanding page structure and finding elements to interact with.
The output includes a metadata header with the page title and current URL (which may differ from the requested URL after redirects).
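The fallback behaviour described for search() (try the best available backend, then move to the next usable one on failure) can be sketched as follows. The backends are passed in as plain callables; `search_with_fallback` and its signature are hypothetical and only illustrate the control flow.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SearchFn = Callable[[str], str]


def search_with_fallback(
    query: str,
    backends: Sequence[Tuple[str, SearchFn]],
    engine: Optional[str] = None,
) -> str:
    """Try backends in order; return the first successful result."""
    if engine is not None:
        # An explicit engine disables fallback to other backends.
        candidates = [(name, fn) for name, fn in backends if name == engine]
        if not candidates:
            raise ValueError(f"unknown engine: {engine}")
    else:
        candidates = list(backends)

    errors: List[str] = []
    for name, fn in candidates:
        try:
            return fn(query)
        except Exception as exc:  # a failed backend must not abort the search
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all search backends failed: " + "; ".join(errors))
```

Collecting the per-backend errors keeps the final message useful when every backend fails, for example when scraping is blocked by bot detection.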
Chats#
List, search, and summarize past conversation logs.
Instructions
### When to use chats
Use chats when the user asks about or wants to reference a past conversation:
- "remember when we discussed X?" → search_chats('X')
- "find our earlier chat about Y" → search_chats('Y')
- "what did we say about Z last week?" → search_chats('Z')
- Listing recent sessions to give the user an overview → list_chats()
- Reading a specific prior conversation by ID → read_chat(id)
Do **not** use chats for:
- The current conversation — its content is already in the context window.
- Searching files or code — use the shell or read tool instead.
- Web or documentation search — use the browser tool.
Examples
Search for a specific topic in past conversations
| User |
Can you find any mentions of "python" in our past conversations? |
| Assistant |
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function. |
- gptme.tools.chats.conversation_stats(since: str | None = None, as_json: bool = False) None
Show statistics about conversation history.
- Parameters:
since – Only include conversations since this date (YYYY-MM-DD or Nd).
as_json – Output as JSON instead of formatted text.
- gptme.tools.chats.find_empty_conversations(max_messages: int = 1, include_test: bool = False) list[dict]
Find conversations with few or no messages.
Scans all conversations and returns those with at most max_messages messages. Useful for cleaning up abandoned or empty conversation logs.
- Parameters:
max_messages – Maximum message count to consider “empty” (default: 1, system-only).
include_test – Whether to include test/eval conversations.
- Returns:
List of dicts with conversation metadata and disk size.
- gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) None
List recent chat conversations and optionally summarize them using an LLM.
- gptme.tools.chats.read_chat(id: str, max_results: int = 5, incl_system: bool = False, context_messages: int = 0, start_message: int | None = None) None
Read a specific conversation log.
- Parameters:
id (str) – The id of the conversation to read.
max_results (int) – Maximum number of messages to display.
incl_system (bool) – Whether to include system messages.
context_messages (int) – Number of messages to show before start_message.
start_message (int | None) – Start from this message number (1-indexed), if specified.
- gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False, sort: Literal['date', 'count'] = 'date', context_size: int = 50, max_matches: int = 1) None
Search past conversation logs for the given query and print a summary of the results.
- Parameters:
query (str) – The search query.
max_results (int) – Maximum number of conversations to display.
system (bool) – Whether to include system messages in the search.
context_size (int) – Number of characters to show around each match.
max_matches (int) – Maximum number of matches to show per conversation.
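The `since` argument of conversation_stats() accepts either a date (YYYY-MM-DD) or a relative form (Nd). A minimal sketch of how such a value could be parsed is shown below; reading `Nd` as "N days before today" and the helper name `parse_since` are assumptions for illustration.

```python
import re
from datetime import date, timedelta


def parse_since(value: str, today: date) -> date:
    """Turn 'YYYY-MM-DD' or 'Nd' into a concrete cutoff date."""
    match = re.fullmatch(r"(\d+)d", value)
    if match:
        # Relative form: assumed to mean N days before `today`.
        return today - timedelta(days=int(match.group(1)))
    # Absolute form: raises ValueError if the string is not an ISO date.
    return date.fromisoformat(value)
```

Passing `today` explicitly keeps the function deterministic and easy to test.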
Computer#
Warning
The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.
Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.
The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.
Features
Keyboard input simulation
Mouse control (movement, clicks, dragging)
Screen capture with automatic scaling
Cursor position tracking
Installation
On Linux, requires X11 and xdotool:
# On Debian/Ubuntu
sudo apt install xdotool
# On Arch Linux
sudo pacman -S xdotool
On macOS, uses the native screencapture command and the external tool cliclick:
brew install cliclick
You need to give your terminal both screen recording and accessibility permissions in System Preferences.
Configuration
The tool uses these environment variables:
DISPLAY: X11 display to use (default: “:1”, Linux only)
WIDTH: Screen width (default: 1024)
HEIGHT: Screen height (default: 768)
Usage
The tool supports these actions:
- Keyboard:
key: Send key sequence (e.g., “Return”, “Control_L+c”)
type: Type text with realistic delays
- Mouse:
mouse_move: Move mouse to coordinates
left_click: Click left mouse button
right_click: Click right mouse button
middle_click: Click middle mouse button
double_click: Double click left mouse button
left_click_drag: Click and drag to coordinates
- Screen:
screenshot: Take and view a screenshot
cursor_position: Get current mouse position
The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
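One way to picture the scaling: the model works in a fixed coordinate space (the WIDTH x HEIGHT defaults of 1024 x 768 above), and coordinates are mapped onto the physical screen before an action is sent. The sketch below assumes independent linear scaling per axis and an example 1920 x 1080 screen; the real tool may scale differently.

```python
from typing import Tuple


def scale_to_screen(
    x: int,
    y: int,
    api_size: Tuple[int, int] = (1024, 768),
    screen_size: Tuple[int, int] = (1920, 1080),
) -> Tuple[int, int]:
    """Map a point from the model's coordinate space to physical pixels."""
    scale_x = screen_size[0] / api_size[0]
    scale_y = screen_size[1] / api_size[1]
    return round(x * scale_x), round(y * scale_y)
```

With these example sizes, the centre of the model's view maps to the centre of the physical screen.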
Tips for Complex Operations
For complex operations involving multiple keypresses, you can use semicolon-separated sequences with key:
Examples
Filling a login form:
t:username;kp:tab;t:password;kp:return
Switching applications:
cmd+tab on macOS, alt+Tab on Linux
(macOS) Opening Spotlight and searching:
cmd+space;t:firefox;return
Using a single sequence for complex operations ensures proper timing and recognition of keyboard shortcuts.
Instructions
You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.
### When to use the computer tool
Use computer for GUI interactions that cannot be done through the shell: clicking
elements in running applications, typing into GUI windows, taking screenshots to
verify visual state, and keyboard shortcuts in desktop apps. Prefer the shell or
tmux over computer for anything that has a CLI equivalent. Use computer when the
task requires direct screen interaction — for example, operating a browser UI,
a desktop app, or an interactive installer that has no headless mode.
The key input syntax works consistently across platforms.
Available actions:
- key: Send key sequence using a unified syntax:
- Type text: "t:Hello World"
- Press key: "return", "esc", "tab"
- Key combination: "ctrl+c", "cmd+space"
- Chain commands: "cmd+space;t:firefox;return"
- type: Type text with realistic delays (legacy method)
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
Note: Key names are automatically mapped between platforms.
Common modifiers (ctrl, alt, cmd/super, shift) work consistently across platforms.
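To show how the unified key syntax breaks down, here is a small parser sketch covering the forms listed above: `t:` text, single keys, `+` combinations, and `;` chaining. The class names loosely mirror the documented TextOperation, KeyOperation, and ComboOperation but are defined here for illustration only; text containing `;` is not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Text:
    text: str


@dataclass(frozen=True)
class Key:
    key: str


@dataclass(frozen=True)
class Combo:
    modifiers: Tuple[str, ...]
    key: str


Operation = Union[Text, Key, Combo]


def parse_key_sequence(sequence: str) -> List[Operation]:
    """Split a chained sequence into typed operations."""
    operations: List[Operation] = []
    for part in sequence.split(";"):
        if not part:
            continue
        if part.startswith("t:"):
            operations.append(Text(part[2:]))
        elif "+" in part:
            # Everything before the last "+" is treated as a modifier.
            *modifiers, key = part.split("+")
            operations.append(Combo(tuple(m.lower() for m in modifiers), key))
        else:
            operations.append(Key(part))
    return operations
```

A platform layer would then translate each operation into xdotool or cliclick arguments.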
Examples
| User |
Take a screenshot of the desktop |
| Assistant |
I'll capture the screen using the screenshot tool. |
| System |
Viewing image... |
| User |
Type "Hello, World!" into the active window |
| Assistant |
I'll type the text with realistic delays. |
| System |
Typed text: Hello, World! |
| User |
Move the mouse to coordinates (100, 200) and click |
| Assistant |
I'll move the mouse and perform a left click. |
| System |
Moved mouse to 100,200 |
| System |
Performed left_click |
| User |
Get the current mouse position |
| Assistant |
I'll get the cursor position. |
| System |
Cursor position: X=512,Y=384 |
| User |
Double-click at current position |
| Assistant |
I'll perform a double-click. |
| System |
Performed double_click |
| User |
Open a new browser tab |
| Assistant |
I'll open a new browser tab. |
| System |
Sent key sequence: ctrl+t |
- class gptme.tools.computer.ComboOperation
- class gptme.tools.computer.KeyOperation
- class gptme.tools.computer.TextOperation
- gptme.tools.computer.computer(action: Action, text: str | None = None, coordinate: tuple[int, int] | None = None) Message | None
Perform computer interactions in X11 or macOS environments.
- Parameters:
action – The type of action to perform
text – Text to type or key sequence to send
coordinate – X,Y coordinates for mouse actions
RAG#
RAG (Retrieval-Augmented Generation) tool for context-aware assistance.
The RAG tool provides context-aware assistance by indexing and semantically searching text files.
Installation
The RAG tool requires the gptme-rag CLI to be installed:
pipx install gptme-rag
Configuration
Configure RAG in your gptme.toml:
[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.
Features
Manual Search and Indexing
Index project documentation with rag_index
Search indexed documents with rag_search
Check index status with rag_status
Automatic Context Enhancement
Retrieves semantically similar documents
Preserves conversation flow with hidden context messages
Instructions
### When to use RAG
Use RAG for semantic search across indexed documents when you do not know the
exact file location or keyword. Prefer `shell` with grep/ripgrep for exact
string or pattern matching. Use `read` when you already know the file path.
Index first with `rag_index`, then search with `rag_search`.
Examples
| User |
Index the current directory |
| Assistant |
Let me index the current directory with RAG. |
| System |
Indexed 1 paths |
| User |
Search for documentation about functions |
| Assistant |
I'll search for function-related documentation. |
| System |
### docs/api.md Functions are documented using docstrings... |
| User |
Show index status |
| Assistant |
I'll check the current status of the RAG index. |
| System |
Index contains 42 documents |
- gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) Message
Get relevant context chunks from RAG for the user query.
- gptme.tools.rag.init() ToolSpec
Initialize the RAG tool.
- gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) str
Index documents in specified paths.
- gptme.tools.rag.rag_status() str
Show index status.
Morph#
Gives the LLM agent the ability to edit files using Morph Fast Apply v2.
Morph is a specialized code-patching LLM that applies edits at 4000+ tokens per second. It uses a different format than the patch tool: <code>original</code><update>changes</update>
- Environment Variables:
OPENROUTER_API_KEY: Required for accessing Morph via OpenRouter
Instructions
Use this tool to propose an edit to an existing file.
### When to use morph vs patch
Use morph for large or complex edits where the changed lines are scattered
across a file and patch context markers would be verbose. Prefer patch for
small, targeted edits with clear context. Morph requires OPENROUTER_API_KEY.
Write a clear edit while minimizing unchanged code.
List each edit in sequence, using `// ... existing code ...` for untouched spans.
Repeat only enough original context to disambiguate the change.
If you delete a section, include surrounding context to show what is removed.
Examples
```morph example.py
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
```
- gptme.tools.morph.execute_morph(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Applies the morph edit.
- gptme.tools.morph.execute_morph_impl(content: str, path: Path | None, expected_original_content: str) Generator[Message, None, None]
Actual morph implementation - writes the edited content to file.
- gptme.tools.morph.is_openrouter_available() bool
Check if OpenRouter is available for Morph tool.
GH#
GitHub integration tool.
Use native handlers only when they help the assistant succeed more reliably
than a raw gh command in the shell tool. The native path is worth it when
it collapses several API calls into one response, keeps CI state structured and
actionable, or adds merge safety guards that are easy to miss in ad-hoc CLI use.
Native operations that materially help:
issue view: combines issue body and comments in one call
pr view: combines PR body, comments, review-thread resolution, CI, and mergeability in one call
pr status: structured check-run summary with actionable run IDs
pr checks: polls CI until completion with live progress updates
pr merge: squash default, --match-head-commit guard, auto-merge
run view: extracts and structures failed log sections from CI runs
Adding a new native wrapper#
Before wrapping a gh subcommand, ask: “Will this help the assistant do
better than a single gh command in the shell tool?” If not, don’t add it
— the pass-through already covers it without bloating instructions.
Good candidates combine multiple API calls into one response, add safety guards, or poll/wait for completion.
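As an example of the "poll/wait for completion" pattern, the sketch below waits until no check is pending. The `fetch` callable, the state strings, and `wait_for_checks` itself are hypothetical; a real wrapper would obtain the states from the GitHub API or the gh CLI.

```python
import time
from typing import Callable, Dict


def wait_for_checks(
    fetch: Callable[[], Dict[str, str]],
    interval: float = 10.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll `fetch` until every check has left the 'pending' state."""
    deadline = time.monotonic() + timeout
    while True:
        states = fetch()  # maps check name -> 'pending' | 'success' | 'failure'
        if all(state != "pending" for state in states.values()):
            return states
        if time.monotonic() >= deadline:
            raise TimeoutError("checks still pending after timeout")
        sleep(interval)
```

Injecting `sleep` makes the loop testable without real delays.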
- gptme.tools.gh.execute_gh(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Execute GitHub operations.
Native handlers for high-value operations (issue view, pr view/status/checks/merge, run view). Everything else passes through to the gh CLI unchanged.
Choice#
Gives the assistant the ability to present multiple-choice options to the user for selection.
Instructions
The options can be provided as a question on the first line and each option on a separate line.
When using the ``options`` keyword argument, options may also be comma-separated.
The tool will present an interactive menu allowing the user to select an option using arrow keys and Enter, or by typing the number of the option.
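The input format described above (question on the first line, one option per following line, or comma-separated options) can be sketched with a few helpers. All function names here are hypothetical and only illustrate the parsing and the "type the number of the option" selection path.

```python
from typing import List, Tuple


def parse_choice(block: str) -> Tuple[str, List[str]]:
    """First non-empty line is the question; the rest are options."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("need a question and at least one option")
    return lines[0], lines[1:]


def split_options(options: str) -> List[str]:
    """Split a comma-separated options string."""
    return [opt.strip() for opt in options.split(",") if opt.strip()]


def resolve(options: List[str], typed: str) -> str:
    """Resolve a typed 1-based number to the matching option."""
    index = int(typed) - 1
    if not 0 <= index < len(options):
        raise ValueError("selection out of range")
    return options[index]
```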
### When to use choice
Use when you need to present the user with a discrete set of named alternatives and free-text input would be ambiguous. Don't use for simple yes/no confirmations; don't use when the next step is already clear from context.
Examples
Basic usage with options
| User |
What should we do next? |
| Assistant |
Let me present you with some options: |
| System |
User selected: Add new features |
| User |
What should we do next? |
| Assistant |
Let me present you with some options: |
| System |
User selected: Option two |
- gptme.tools.choice.execute_choice(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Present multiple-choice options to the user and return their selection.
Elicit#
Gives the assistant the ability to request structured input from the user.
Elicitation supports multiple input types:
- text: Free-form text input
- choice: Single selection from options
- multi_choice: Multiple selections from options
- secret: Hidden input (API keys, passwords) with UI redaction
- confirmation: Yes/No question
- form: Multiple fields collected at once
Secret values are handled specially: the value is hidden from the chat display
(hide=True) so it does not appear on screen, but it is passed to the LLM
in-context so the agent can act on it (e.g. set it as an env var). The value
is stored in the on-disk conversation log.
Instructions
### When to use elicit
Use elicit when you need structured user input that a plain text reply cannot
cleanly provide:
- **secret** — API keys, passwords, tokens. The value is hidden from the chat
display so it does not appear over someone's shoulder; the LLM still receives
it in-context so it can act on it (e.g. export as an env var or pass to a
command). The conversation log on disk will contain the value.
- **choice / multi_choice** — present a fixed set of options so the user
selects rather than types a free-form answer that you then have to parse.
- **confirmation** — ask yes/no before a destructive or irreversible action.
- **form** — collect several related fields in one interaction instead of a
back-and-forth sequence.
Do **not** use elicit for simple open-ended questions that read naturally in
chat — a plain assistant message is clearer and less disruptive in those cases.
### Input types
- text: Free-form text input
- choice: Single selection from a list (specify options)
- multi_choice: Multiple selections from a list (specify options)
- secret: Hidden from display; LLM receives the value in-context to act on it
- confirmation: Yes/No question
- form: Multiple fields at once (specify JSON field definitions)
Examples
Ask for a secret API key
| User |
Set up the OpenAI integration |
| Assistant |
I need your OpenAI API key to proceed. It will be hidden from normal chat display. |
| System |
User provided secret value (not shown) |
Ask user to choose an option
| User |
Which database should we use? |
| Assistant |
Let me ask the user their preference. |
| System |
User selected: PostgreSQL |
Collect project setup information via form
| User |
Set up a new project |
| Assistant |
Let me gather some details about the project. |
| System |
Form submitted: {"name": "my-project", "language": "python", "tests": true}
|
Form#
Gives the assistant the ability to present a form with multiple fields for user input.
This tool enables structured data collection from users through an interactive form with support for different field types: text, select, boolean, and number.
Instructions
### When to use the form tool
Use the form tool when you need to collect multiple related fields from the
user in a single interaction. It is well suited for structured data collection
where field types are text, select (choose from a list), boolean (yes/no), or
number.
For **secrets** (API keys, passwords) use the ``elicit`` tool instead — form
does not have a secret type, so credentials would appear in the chat display.
For a single question or free-form input, a plain assistant message is simpler.
Prefer form when collecting two or more related fields at once.
### Form syntax
Each field is specified on a separate line with the format:
field_name: Prompt text [options]
Field types are inferred from the prompt:
- Text field (default): `name: What's your name?`
- Select field: `priority: Priority level [low, medium, high]`
- Boolean field: `confirm: Are you sure? [yes/no]`
- Number field: `count: How many? (number)`
The tool will present an interactive form and return the collected data as JSON.
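A minimal sketch of how field types might be inferred from the line syntax above. The helper `parse_field` and the returned dictionary shape are assumptions for illustration; only the four documented patterns are recognised.

```python
import re
from typing import Any, Dict


def parse_field(line: str) -> Dict[str, Any]:
    """Parse 'field_name: Prompt text [options]' into a field description."""
    name, _, prompt = line.partition(":")
    prompt = prompt.strip()
    field: Dict[str, Any] = {"name": name.strip(), "prompt": prompt, "type": "text"}

    bracket = re.search(r"\[(.+)\]\s*$", prompt)
    if bracket:
        inner = bracket.group(1).strip()
        field["prompt"] = prompt[: bracket.start()].strip()
        if inner.lower() == "yes/no":
            field["type"] = "boolean"
        else:
            field["type"] = "select"
            field["options"] = [opt.strip() for opt in inner.split(",")]
    elif re.search(r"\(number\)\s*$", prompt, re.IGNORECASE):
        field["type"] = "number"
        field["prompt"] = re.sub(r"\s*\(number\)\s*$", "", prompt, flags=re.IGNORECASE)
    return field
```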
Examples
Collect project information
| User |
I want to start a new project |
| Assistant |
Let me gather some information about your project: |
| System |
Form submitted: |
Simple confirmation form
| User |
Deploy to production |
| Assistant |
Please confirm the deployment details: |
| System |
Form submitted: |
- gptme.tools.form.execute_form(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Present a form to the user and collect their responses.
Precommit#
Pre-commit hook tool that automatically runs pre-commit checks after file saves.
This tool automatically runs pre-commit checks in two scenarios:
Per-file checks (FILE_SAVE_POST (file.save.post)): After each file is saved
- Runs pre-commit on the specific saved file
- Provides immediate feedback on formatting/linting issues
Full checks (TURN_POST (turn.post)): After message processing completes
- Runs pre-commit on all modified files
- Ensures all changes pass checks before auto-commit
Commands:
- /pre-commit: Manually run pre-commit checks
Pre-commit checks include:
- Code formatting (black, prettier, etc.)
- Linting (ruff, eslint, etc.)
- Type checking (mypy, etc.)
- Other configured hooks
The tool will report any failures and suggest fixes.
Enable with: --tools precommit
Or configure pre-commit checks via: GPTME_CHECK=true
Autocommit#
Autocommit hook tool that automatically provides hints for committing changes after message processing.
When GPTME_AUTOCOMMIT=true is set, after each message is processed:
1. Checks if there are file modifications
2. If modifications exist, returns a message asking the LLM to review and commit
The tool hooks into TURN_POST (turn.post) and runs with low priority (after pre-commit checks and other validation).
To enable autocommit:
export GPTME_AUTOCOMMIT=true
Vent#
Vent/feedback tool — emits in-the-moment friction signals to a durable ledger.
- The agent calls this when stuck or frustrated. Signals are written to:
~/.local/share/gptme/friction-ledger.jsonl
Rate-limited to one vent per turn to prevent recursive venting spirals (Lovable found agents can spiral into 43+ vents without this guard).
Usage:
```vent
pytest exits 0 with "no tests found" even though tests/test_vent.py
exists. Tried --co and an explicit path; the discovery config is wrong.
Owner: tooling
```
Resolution owner (axis 1 — who/what unblocks this) is an optional, small, stable enum captured at vent time. Richer theme/cause clustering happens later at analysis time, so keep the capture label thin:
- self: Solvable now with better prompting / context / reasoning
- tooling: Needs a tool / permission / config / env change
- operator: Needs a human (decision, credential, approval, account action)
- upstream: Needs a fix in a dependency we don’t own
- architectural: Not solvable in the current stack design
The Type keyword accepts both deprecated legacy aliases
(Type1->self, Type2a->tooling, Type2b->architectural, Type0->operator) and
current taxonomy values (e.g. Type: self, Type: tooling).
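The capture flow described above (parse an optional owner label, normalise legacy aliases, append one JSON line to the ledger, allow at most one vent per turn) can be sketched as follows. The record fields and all names in this block are assumptions for illustration, not the tool's actual schema.

```python
import json
from pathlib import Path
from typing import Dict, Optional

OWNERS = {"self", "tooling", "operator", "upstream", "architectural"}
LEGACY = {"type1": "self", "type2a": "tooling", "type2b": "architectural", "type0": "operator"}


def parse_vent(block: str) -> Dict[str, Optional[str]]:
    """Separate the free-text body from an optional Owner:/Type: line."""
    owner = None
    body = []
    for line in block.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("owner", "type"):
            label = value.strip().lower()
            label = LEGACY.get(label, label)  # map deprecated aliases
            if label in OWNERS:
                owner = label
                continue
        body.append(line)
    return {"text": "\n".join(body).strip(), "owner": owner}


def append_signal(ledger: Path, signal: dict) -> None:
    """Append one signal as a single JSON line (JSONL)."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(signal) + "\n")


class TurnLimiter:
    """Allow at most one vent per turn."""

    def __init__(self) -> None:
        self._last_turn: Optional[int] = None

    def allow(self, turn: int) -> bool:
        if turn == self._last_turn:
            return False
        self._last_turn = turn
        return True
```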
Complete#
Complete tool - signals that the autonomous session is finished.
- exception gptme.tools.complete.SessionCompleteException
Exception raised to signal that the session should end.
- gptme.tools.complete.auto_reply_hook(manager: LogManager, interactive: bool, prompt_queue: Any) Generator[Message | StopPropagation, None, None]
Hook that implements auto-reply mechanism for autonomous operation.
If in non-interactive mode and last assistant message had no tools, inject an auto-reply to ensure the assistant does work.
This is called via LOOP_CONTINUE hook, which receives interactive and prompt_queue.
- Parameters:
manager – Conversation manager with log and workspace
interactive – Whether in interactive mode
prompt_queue – Queue of pending prompts
- gptme.tools.complete.complete_hook(messages: list[Message], **kwargs) Generator[Message | StopPropagation, None, None]
Hook that detects complete tool call and prevents next generation.
Runs at GENERATION_PRE (before generating response) to stop the session immediately after complete tool is called.
- Parameters:
messages – List of conversation messages
**kwargs – Additional arguments (workspace, manager - currently unused)
Note: GENERATION_PRE hooks are called with messages as first positional arg, not manager as the Protocol suggests. This is a known type safety issue.
Restart#
Restart the gptme process.
This tool allows restarting gptme from within a conversation, which can be useful for applying configuration changes, reloading tools, or recovering from state issues.
- gptme.tools.restart.execute_restart(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]
Execute restart by confirming intent.
The actual restart happens in the restart_hook (GENERATION_PRE), after all messages have been saved to the log.
- gptme.tools.restart.restart_hook(messages: list[Message], **kwargs) Generator[Message, None, None]
Hook that detects restart tool call and performs the restart.
Runs at GENERATION_PRE (before generating response) to restart immediately after the restart tool is called.
By this point, all messages (including the assistant’s restart message and the system confirmation) have been saved to the log.
Lessons#
Lesson system tool for gptme.
Provides structured lessons with metadata that can be automatically included in context. Similar to .cursorrules or “Claude Skills”. Has keyword-based triggering.
Commands provided:
/lesson list - View all available lessons
/lesson search <query> - Find lessons matching query
/lesson show <id> - Display a specific lesson
/lesson refresh - Reload lessons from disk
- class gptme.tools.lessons.LessonSessionStats
Statistics about lessons matched during a session.
- __init__(total_matched: int = 0, unique_lessons: set[str] = <factory>, lesson_titles: dict[str, str] = <factory>) None
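The `<factory>` defaults in the signature above are what a dataclass with `default_factory` fields would produce. The sketch below assumes that shape; the class name and the `record` helper are hypothetical and not part of the documented API.

```python
from dataclasses import dataclass, field


@dataclass
class LessonSessionStatsSketch:
    """Illustrative equivalent of the documented constructor signature."""

    total_matched: int = 0
    # default_factory gives every instance its own set/dict instead of a shared one
    unique_lessons: set[str] = field(default_factory=set)
    lesson_titles: dict[str, str] = field(default_factory=dict)

    def record(self, lesson_id: str, title: str) -> None:  # hypothetical helper
        """Count one match and remember the lesson's title."""
        self.total_matched += 1
        self.unique_lessons.add(lesson_id)
        self.lesson_titles[lesson_id] = title
```

Recording the same lesson twice increments `total_matched` twice but leaves `unique_lessons` with a single entry.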
- gptme.tools.lessons.auto_include_lessons_hook(manager: LogManager) Generator[Message | StopPropagation, None, None]
Hook to automatically include relevant lessons in context.
Extracts keywords from both user and assistant messages to trigger lessons.
- Parameters:
manager – Conversation manager with log and workspace
- Returns:
Generator of messages to prepend (lessons as system message)
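Keyword-based triggering over user and assistant messages might look roughly like this. The `Lesson` class, the `(role, content)` tuple format, and the case-insensitive substring rule are assumptions made for illustration; gptme's actual matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Lesson:  # hypothetical lesson record
    id: str
    title: str
    keywords: list[str]


def match_lessons(lessons: list[Lesson], messages: list[tuple[str, str]]) -> list[Lesson]:
    """Return lessons with at least one keyword in the user/assistant text."""
    # Only user and assistant messages contribute text; system messages are ignored.
    text = " ".join(
        content.lower() for role, content in messages if role in ("user", "assistant")
    )
    return [
        lesson
        for lesson in lessons
        if any(keyword.lower() in text for keyword in lesson.keywords)
    ]
```

A hook built on this would then wrap the matched lessons in a system message and yield it for inclusion in context.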
- gptme.tools.lessons.handle_lesson_command(ctx: CommandContext) Generator[Message, None, None]
Handle /lesson command.
- gptme.tools.lessons.session_end_lessons_hook(manager: LogManager, **kwargs) Generator[Message | StopPropagation, None, None]
Hook to print lesson statistics at end of session.
- Parameters:
manager – Conversation manager with log and workspace
**kwargs – Additional arguments (e.g., logdir)
- Yields:
Nothing (just logs statistics)
Todo#
A working memory todo tool for conversation-scoped task planning.
This tool provides a lightweight todo list that exists within the current conversation context, complementing the existing persistent task management system in gptme-agent-template.
Key principles:
- Working Memory Layer: Ephemeral todos for current conversation context
- Complements Persistent Tasks: Works alongside existing task files without conflicts
- Simple State Model: pending, in_progress, completed, paused
- Conversation Scoped: Resets between conversations, doesn’t persist to disk
- Auto-replay: Automatically restores todo state when resuming conversations
- gptme.tools.todo.get_incomplete_todos_summary() str
Get a summary of incomplete todos for continuation prompts.
- Returns:
A formatted string listing incomplete todos, or empty string if none.
- gptme.tools.todo.has_incomplete_todos() bool
Check if there are any incomplete todos in working memory.
Used by auto_reply_hook to determine if the agent should continue working instead of being asked about completion.
- Returns:
True if there are any pending or in_progress todos.
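The two helpers above can be pictured with a small in-memory model. This is a sketch under assumptions: the dictionary layout (`id -> (state, text)`) and the summary line format are hypothetical, while the rule that only pending and in_progress count as incomplete follows the description above.

```python
# States that count as incomplete, per the has_incomplete_todos description.
INCOMPLETE_STATES = {"pending", "in_progress"}


def has_incomplete_todos_sketch(todos: dict[str, tuple[str, str]]) -> bool:
    """True if any todo is pending or in_progress (paused and completed do not count)."""
    return any(state in INCOMPLETE_STATES for state, _text in todos.values())


def incomplete_summary_sketch(todos: dict[str, tuple[str, str]]) -> str:
    """List incomplete todos one per line, or return an empty string if there are none."""
    lines = [
        f"- [{state}] {text}"  # hypothetical line format
        for state, text in todos.values()
        if state in INCOMPLETE_STATES
    ]
    return "\n".join(lines)
```

For example, a list holding one in_progress item and one completed item reports as incomplete, and its summary contains only the in_progress line.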
MCP#
The Model Context Protocol (MCP) allows you to extend gptme with custom tools through external servers. See MCP for configuration and usage details.
MCP server discovery and management tool.
Allows searching for MCP servers in registries and dynamically loading/unloading them.
Available Commands:
- /mcp search [query] - Search for MCP servers across all registries
- /mcp info <server-name> - Get detailed information about a specific server
- /mcp load <server-name> [config-json] - Dynamically load an MCP server into the current session
- /mcp unload <server-name> - Unload a previously loaded MCP server
- /mcp list - List all currently configured and loaded MCP servers
The search command queries the Official MCP Registry (registry.modelcontextprotocol.io).
Once loaded, server tools are available as <server-name>.<tool-name>.
Instructions
### When to use the mcp tool
Use mcp to discover, load, inspect, and manage MCP servers:
- `/mcp search <capability>` finds new servers
- `/mcp load <server-name>` loads one
- `/mcp list` shows what is already loaded
Do not use mcp to call loaded tools; invoke them directly as
`<server-name>.<tool-name>`.
Search uses the Official MCP Registry (registry.modelcontextprotocol.io).
Other commands:
- `info <server>` shows server details
- `resources list/read` browses server resources
- `templates list` lists resource templates
- `prompts list/get` lists or fetches prompts
- `roots list/add/remove` manages advisory workspace roots
Examples
```mcp
search sqlite
```
```mcp
info sqlite
```
```mcp
load sqlite
```
```mcp
list
```
```mcp
unload sqlite
```
```mcp
load my-server
{"command": "uvx", "args": ["my-mcp-server", "--option"]}
```
```mcp
resources list sqlite
```
```mcp
resources read sqlite db://main/users
```
```mcp
templates list sqlite
```
```mcp
prompts list sqlite
```
```mcp
prompts get sqlite create-query {"table": "users"}
```
```mcp
roots list
```
```mcp
roots add filesystem file:///home/user/project Project
```
```mcp
roots remove filesystem file:///home/user/project
```
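To make the block layout in the examples concrete, here is a rough sketch of splitting an `mcp` block into a command, its arguments, and an optional JSON config supplied on the following lines (as in the `load my-server` example). `parse_mcp_block` is a hypothetical helper, not gptme's parser, and it deliberately does not handle JSON given inline on the command line, as in the `prompts get` example.

```python
import json


def parse_mcp_block(block: str) -> tuple[str, list[str], dict]:
    """Split an mcp block into (command, args, config)."""
    lines = [line for line in block.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty mcp block")
    # First line: the command followed by whitespace-separated arguments.
    command, *args = lines[0].split()
    # Remaining lines, if any, are treated as a single JSON config object.
    config = json.loads("\n".join(lines[1:])) if len(lines) > 1 else {}
    return command, args, config
```

Under these assumptions, the `load my-server` example yields the command `load`, the argument list `["my-server"]`, and a config dictionary with `command` and `args` keys.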