Tools#

gptme’s tools enable AI agents to execute code, edit files, browse the web, process images, and interact with your computer.

Overview#

📁 File System#

  • Read - Read files in any format

  • Save - Create and overwrite files

  • Patch - Apply precise changes to existing files

  • Morph - Apply fast targeted edits using Morph Fast Apply

💻 Code & Development#

  • Python - Execute Python code interactively with full library access

  • Shell - Run shell commands and manage system processes

  • GH - Interact with GitHub issues, PRs, and repositories

  • Precommit - Automatically run pre-commit checks after file saves

  • Autocommit - Automatically prompt for git commits after file modifications

🌐 Web & Research#

  • Browser - Browse websites, take screenshots, and read web content

  • RAG - Index and search through documentation and codebases

  • Chats - Search past conversations for context and references

👁️ Visual & Interactive#

  • Vision - Analyze images, diagrams, and visual content

  • Screenshot - Capture your screen for visual context

  • Computer - Control desktop applications through visual interface

🤝 User Interaction#

  • Choice - Present multiple-choice options to the user

  • Elicit - Request structured single-field input from the user

  • Form - Present a multi-field form for structured user input

⚡ Advanced Workflows#

  • Tmux - Manage long-running processes in terminal sessions

  • Subagent - Delegate subtasks to specialized agent instances

  • Complete - Signal that the autonomous session is finished

  • Restart - Restart the gptme process after configuration changes

  • Vent - Emit in-the-moment friction signals to a durable ledger

🧠 Knowledge & Planning#

  • Lessons - Access contextual lessons and behavioral guidance

  • Todo - Manage a conversation-scoped working memory task list

🔌 Extensions#

  • MCP - Discover and connect Model Context Protocol servers

Combinations#

The real power emerges when tools work together:

  • Web Research + Code: Browser + Python - Browse documentation and implement solutions

  • Visual Development: Vision + Patch - Analyze UI mockups and update code accordingly

  • System Automation: Shell + Python - Combine system commands with data processing

  • Interactive Debugging: Screenshot + Computer - Visual debugging and interface automation

  • Knowledge-Driven Development: RAG + Chats - Learn from documentation and past conversations

Shell#

The assistant can execute shell commands with bash by outputting code blocks with shell as the language.

Configuration:
GPTME_SHELL_TIMEOUT: Environment variable to configure command timeout (set before starting gptme)
  • Set to a number (e.g., 30) for timeout in seconds

  • Set to 0 to disable timeout

  • Invalid values default to 1200 seconds (20 minutes)

  • If not set, defaults to 1200 seconds (20 minutes)

GPTME_SHELL_TRUNC_PRE_TOKENS / GPTME_SHELL_TRUNC_POST_TOKENS: Override the head/tail token budget for stdout truncation. Defaults: 2000 / 8000. GPTME_SHELL_TRUNC_STDERR_PRE_TOKENS / GPTME_SHELL_TRUNC_STDERR_POST_TOKENS: Same overrides for stderr. Defaults: 2000 / 2000. Lowering these makes the truncation path fire on smaller outputs, which surfaces savings telemetry in context-savings.jsonl and the /context command. Invalid values fall back to defaults.

Instructions

The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.

These programs are available, among others:
- apt-get
- docker
- git
- hyperfine
- pandoc

### When to use the shell

Use the shell when you need to inspect the workspace, search or examine files,
check git state, or run existing commands and tests. Prefer the shell over
answering from memory when the repo can tell you the answer directly.

### Background Jobs

For long-running commands (dev servers, builds, etc.), use background jobs:
- `bg <command>` - Start command in background, returns job ID
- `jobs` - List all background jobs with status
- `output <id>` - Show accumulated output from a job
- `kill <id>` - Terminate a background job

This prevents blocking on commands like `npm run dev` that run indefinitely.

Examples

User
list the current directory
Assistant
To list the files in the current directory, use `ls`:
shell
ls
System
Ran command: `ls`
`stdout
file1.txt
file2.txt
`

The assistant can learn context by exploring the filesystem

User
learn about the project
Assistant
Lets start by checking the files
shell
git ls-files
System
`stdout
README.md
main.py
`
Assistant
Now lets check the README
shell
cat README.md
System
`stdout
(contents of README.md)
`
Assistant
Now we check main.py
shell
cat main.py
System
`stdout
(contents of main.py)
`
Assistant
The project is...

Create vue project

User
Create a new vue project with typescript and pinia named fancy-project
Assistant
Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project:
shell
npm init vue@latest fancy-project --yes -- --typescript --pinia
System
`stdout
> npx
> create-vue

Vue.js - The Progressive JavaScript Framework

Scaffolding project in ./fancy-project...
`

Proper quoting for complex content

User
add a comment with backticks and special characters
Assistant
When passing complex content with special characters, use single quotes to prevent shell interpretation:
shell
echo 'Content with `backticks` and $variables that should not be interpreted' > example.txt

Background jobs for long-running commands

User
start a dev server
Assistant
I'll start the dev server as a background job so it doesn't block:
shell
bg npm run dev
System
Started background job **#1**: `npm run dev`
>
> Use these commands to manage it:
> - `jobs` - List all background jobs
> - `output 1` - Show output from job #1
> - `kill 1` - Terminate job #1
User
check the server output
Assistant
Let me check the output from the background job:
shell
output 1
System
**Job #1** - Running (45.2s)
Command: `npm run dev`
>
> ```stdout
> > project@1.0.0 dev
> > vite
>
> VITE v5.0.0  ready in 300 ms
>
> ➜  Local:   http://localhost:5173/
> ```
User
stop the server
Assistant
I'll terminate the background job:
shell
kill 1
System
Terminated job #1: `npm run dev`
gptme.tools.shell.close_conversation_shell(conversation_id: str) None

Close and remove the shell session for a conversation.

Called by the SESSION_END hook to clean up shell file descriptors when a conversation’s last session is removed.

gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a shell command and returns the output.

gptme.tools.shell.execute_shell_impl(cmd: str, logdir: Path | None, timeout: float | None = None) Generator[Message, None, None]

Execute shell command and format output.

gptme.tools.shell.get_shell() ShellSession

Get the shell session for the current context, creating it if necessary.

Uses ContextVar to provide context-local state, allowing each conversation to have its own shell session with independent working directory.

In server contexts (where current_conversation_id is set), also registers the shell in a conversation-level registry for cleanup via SESSION_END hooks.

gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) str

Get the shell command from code/args/kwargs.

gptme.tools.shell.get_workspace_cwd() str | None

Get the workspace directory for the current context, if set.

gptme.tools.shell.preview_shell(cmd: str, _: Path | None) str

Prepare preview for shell command.

gptme.tools.shell.set_shell(shell: ShellSession) None

Set the shell session for the current context (for testing).

gptme.tools.shell.set_workspace_cwd(cwd: str) None

Set the workspace directory for the current context (thread-safe).

Call this before any shell creation to ensure the shell subprocess starts in the correct directory, even with concurrent sessions. This is the thread-safe replacement for os.chdir() in server contexts.

gptme.tools.shell.strip_ansi_codes(text: str) str

Strip ANSI escape sequences from text.

gptme.tools.shell.trim_blank_lines(text: str) str

Trim only leading and trailing blank (whitespace-only) lines.

Interior blank lines are kept. Unlike str.strip(), the first/last contentful lines are returned verbatim, so indentation (e.g. from sed/head/tail of indented code) is preserved.

Python#

The assistant can execute Python code blocks.

It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.

Instructions

Use this tool to execute Python code in an interactive IPython session.
It responds with the execution output and final result.

### When to use the python tool

Use `python` for computation, structured data, and file-processing automation.
Prefer it over the shell for pure computation or when you need persistent
state or Python libraries.

Examples

Result of the last expression will be returned

User
What is 2 + 2?
Assistant
ipython
2 + 2
System
Executed code block.
`result
4
`

Write a function and call it

User
compute fib 10
Assistant
To compute the 10th Fibonacci number, we can run the following code:
ipython
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(10)
System
Executed code block.
`result
55
`
class gptme.tools.python.TeeIO
__init__(original_stream)
write(s)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.

gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a python codeblock and returns the output.

gptme.tools.python.get_installed_python_libraries() list[str]

Check if a select list of Python libraries are installed.

gptme.tools.python.register_function(func: T) T

Decorator to register a function to be available in the IPython instance.

Tmux#

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.

This tool is suitable to run long-running commands or interactive applications that require user input. Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc. It allows for inspecting pane contents and sending input.

Instructions

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.

### When to use tmux

Use tmux for interactive applications requiring ongoing keyboard input or output
inspection: REPLs, TUIs, interactive installers, and persistent processes you
need to revisit. Prefer tmux over the shell's `bg` command when the application
requires send-keys interaction or repeated pane inspection. Use `wait
<session_id> [timeout] [stable_time]` when you need output to stabilize before
continuing.

Examples

Running subagents

User
start subagent to fix lints in parallel
Assistant
Let's start a subagent in a new tmux session:
tmux
new-session gptme --non-interactive "fix lint 1"
new-session gptme --non-interactive "fix lint 2"

Running specific agent

User
Ask Bob about his latest work
Assistant
Sure! Let's start a tmux session running Bob (~/bob/):
tmux
new-session cd ~/bob && gptme --non-interactive 'What is your latest work?'

Managing a dev server

User
Start the dev server
Assistant
Certainly! To start the dev server we should use tmux:
tmux
new-session npm run dev
System
Running `npm run dev` in session gptme_1
User
Can you show me the current content of the pane?
Assistant
Of course! Let's inspect the pane content:
tmux
inspect-pane gptme_1
System
`output
Server is running on localhost:5600
`
User
Stop the dev server
Assistant
I'll send 'Ctrl+C' to the pane to stop the server:
tmux
send-keys 0 C-c
System
Sent 'C-c' to pane 0

Get info from ncurses applications

User
start top and give me a summary
Assistant
Sure! Let's start the top command in a tmux session:
tmux
new-session top
System
Running `top` in session gptme_1.
`output
(output from top shown here)
`
Assistant
The load is...

Send keys to a session

User
start ipython
Assistant
Let's start an ipython session:
tmux
new-session ipython
System
Running `ipython` in session 2.
`output
(output from ipython shown here)
`
User
Run 'print("Hello, world!")' in the ipython session
tmux
send-keys 2 'print("Hello, world!")' Enter

Listing active sessions

User
List all active tmux sessions
tmux
list-sessions
System
Active tmux sessions ['0', 'gptme_1']
gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a command in tmux and returns the output.

gptme.tools.tmux.inspect_pane(pane_id: str, logdir: Path | None = None) Message

Inspect the content of a tmux pane.

Parameters:
  • pane_id – The tmux pane ID to inspect

  • logdir – Optional directory to save full output if truncated

Returns:

Message with pane content (truncated if too long)

gptme.tools.tmux.wait_for_output(session_id: str, timeout: int = 60, stable_time: int = 3, logdir: Path | None = None) Message

Wait for command output to stabilize in a tmux session.

Monitors the pane output and waits until it remains unchanged for stable_time seconds, or until timeout is reached.

Parameters:
  • session_id – The tmux session ID to monitor

  • timeout – Maximum time to wait in seconds (default: 60)

  • stable_time – Seconds of unchanged output to consider stable (default: 3)

Returns:

Message with the final output and status

Subagent#

Subagent tool — spawn, monitor, and coordinate child agents.

Extracted from a single 1100-line module into a package for maintainability.

Package structure: - types.py — Data classes and module-level state (Subagent, ReturnType, etc.) - hooks.py — Completion notification system (LOOP_CONTINUE hook) - api.py — Public API (subagent, subagent_status, subagent_wait, etc.) - batch.py — Batch execution (BatchJob, subagent_batch, subagent_parallel, subagent_pipeline) - execution.py — Execution backends (thread, subprocess, process monitoring)

Instructions

You can create, check status, wait for, and read logs from subagents.

Subagents support a "fire-and-forget-then-get-alerted" pattern:
- Call subagent() to start an async task (returns immediately)
- Continue with other work
- Receive completion messages via the LOOP_CONTINUE hook
- Optionally use subagent_wait() for explicit synchronization

Key features:
- Agent profiles: Use profile names as agent_id for automatic profile detection
- model="provider/model": Override parent's model (route cheap tasks to faster models)
- use_subprocess=True: Run subagent in subprocess for output isolation
- use_acp=True: Run subagent via ACP protocol (supports any ACP-compatible agent)
- acp_command="claude-code-acp": Use a different ACP agent (default: gptme-acp)
- isolated=True: Run subagent in a git worktree for filesystem isolation
- workdir="/path/to/dir": Set the working directory for the subagent (defaults to cwd)
- redact_secrets=True (default): Redact API keys, tokens, and passwords from workspace context
- context_window=0: Minimal context — only agent identity + tools, no workspace files (strongest isolation)
- context_window=N: Limit workspace context to at most N messages
- subagent_parallel(tasks, timeout): Fan out N subagents and wait for all — returns ordered list of results
- subagent_pipeline(items, *stages, timeout): Multi-stage fan-out with no barrier between stages — item A advances to stage 2 while item B is still in stage 1; each stage callable receives (item_prompt, prev_result) and returns the next stage's prompt
- subagent_batch(): Start multiple subagents and return a BatchJob for explicit synchronization
- subagent_cancel(): Cancel a running subagent (SIGTERM for subprocess, marks result for threads)
- subagent_reply(agent_id, reply): Answer a clarification request and re-spawn the subagent
- Hook-based notifications: Completions (and clarification requests) delivered as system messages

## Context Isolation

Subagents do NOT inherit the parent's conversation history — they always start
with a fresh context. What subagents DO inherit (in context_mode="full"):

- Workspace files listed in gptme.toml [prompt] files (e.g. AGENTS.md, README)
- Dynamic context_cmd output (if configured in gptme.toml)
- User-level config files from ~/.config/gptme

This means secrets stored in workspace config files or produced by context_cmd
can reach the subagent. Secret patterns (API_KEY, TOKEN, PASSWORD, etc.) are
redacted by default (redact_secrets=True). Pass redact_secrets=False to disable
if legitimate config values are incorrectly redacted.

### Controlling context depth with context_window

Limit how much workspace context flows to the subagent. Reach for these when
you need tighter control over what the subagent sees:

- `context_window=None` (default): The subagent sees your full workspace (files,
  tools, recent conversation). Best for tasks that benefit from maximum awareness.
- `context_window=0`: **Strongest isolation** — the subagent gets only agent
  identity and tool descriptions, no workspace files or context_cmd output. Use
  this when the subagent handles sensitive data (secrets, prompts) that should
  not leak into verification or analysis tasks. The subagent knows only what
  you explicitly tell it in the task prompt.
- `context_window=N`: Limits workspace to at most N context messages. Useful when
  the default is too bloated but you still want the subagent to see some workspace
  history — trim without fully isolating.

`context_window=0` is equivalent to `context_mode="selective", context_include=["agent", "tools"]`
but is a simpler one-parameter alternative. Only applies to thread-mode subagents.

## Agent Profiles for Subagents

Use profiles to create specialized subagents with appropriate capabilities.
When agent_id matches a profile name, the profile is auto-applied:
- explorer: Read-only analysis (tools: read)
- researcher: Web research without file modification (tools: browser, read)
- developer: Full development capabilities (all tools)
- verifier: Critical review & validation (tools: read, shell, ipython, chats)
- isolated: Restricted processing for untrusted content (tools: read, ipython)
- computer-use: Visual UI testing specialist (tools: computer, vision, ipython, shell)
- browser-use: Web interaction and testing specialist (tools: browser, screenshot, vision, shell) — supports interactive browsing (open_page, click, fill, scroll) and one-shot reads

Example: `subagent("explorer", "Explore codebase")`
With model override: `subagent("researcher", "Find docs", model="openai/gpt-4o-mini")`
Computer-use example: `subagent("computer-use", "Click the Submit button, wait for the modal, and screenshot the result")`
Browser-use example: `subagent("browser-use", "Open localhost:5173, fill the chat input, click send, and report the result")`

Use subagent_read_log() to inspect a subagent's conversation log for debugging.

## Structured Delegation Template

For complex delegations, use this 7-section template for clear task handoff:

TASK: [What the subagent should do]
EXPECTED OUTCOME: [Specific deliverable - format, structure, quality bars]
REQUIRED SKILLS: [What capabilities the subagent needs]
REQUIRED TOOLS: [Specific tools the subagent should use]
MUST DO: [Non-negotiable requirements]
MUST NOT DO: [Explicit constraints and forbidden actions]
CONTEXT: [Background info, dependencies, related work]

Example prompt using the template:
'''
TASK: Implement the user authentication feature
EXPECTED OUTCOME: auth.py with login/logout endpoints, passing tests
REQUIRED SKILLS: Python, FastAPI, JWT tokens
REQUIRED TOOLS: save, shell (for pytest)
MUST DO: Use bcrypt for password hashing, return proper HTTP status codes
MUST NOT DO: Store plaintext passwords, skip input validation
CONTEXT: This is for the gptme server API, see existing endpoints in server.py
'''

## Clarification Requests

When a subagent ends with a ``clarify`` block, it signals that it needs more
information from the parent before it can continue:

```clarify
Which output format should I use: JSON or CSV?
```

The parent receives a hook notification:
  ❓ Subagent 'X' needs clarification: Which output format should I use: JSON or CSV?
  Call subagent_reply('X', '<your answer>') to continue.

Use ``subagent_reply(agent_id, reply)`` to answer and re-spawn the subagent.
The re-spawned subagent receives the original prompt plus the Q&A so it can
complete the task without losing context.

Examples

Executor Mode (single task)

User
compute fib 13 using a subagent
Assistant
Starting a subagent to compute the 13th Fibonacci number.
ipython
subagent("fib-13", "compute the 13th Fibonacci number")
System
Subagent started successfully.
Assistant
Now we need to wait for the subagent to finish the task.
ipython
subagent_wait("fib-13")
System
{"status": "success", "result": "The 13th Fibonacci number is 233"}.

Planner Mode (multi-task delegation)

User
implement feature X with tests
Assistant
I'll use planner mode to delegate implementation and testing to separate subagents.
ipython
subtasks = [
    {{"id": "implement", "description": "Write implementation for feature X"}},
    {{"id": "test", "description": "Write comprehensive tests"}},
]
subagent("feature-planner", "Feature X adds new functionality", mode="planner", subtasks=subtasks)
System
Planner spawned 2 executor subagents.
Assistant
Now I'll wait for both subtasks to complete.
ipython
subagent_wait("feature-planner-implement")
System
{"status": "success", "result": "Implementation complete in feature_x.py"}.
ipython
subagent_wait("feature-planner-test")
System
{"status": "success", "result": "Tests complete in test_feature_x.py, all passing"}.

### Context Modes

Full Context (default)

User
analyze this codebase
Assistant
I'll use full context mode for comprehensive analysis.
ipython
subagent("analyze", "Analyze code quality and suggest improvements", context_mode="full")

Selective Context (choose specific components)

User
write tests using pytest
Assistant
I'll use subprocess mode so selective context can include workspace files without inheriting the full parent context.
ipython
subagent("tests", "Write pytest tests for the calculate function", context_mode="selective", context_include=["workspace"], use_subprocess=True)

Subprocess Mode (output isolation)

User
run a subagent without output mixing with parent
Assistant
I'll use subprocess mode for better output isolation.
ipython
subagent("isolated", "Compute complex calculation", use_subprocess=True)
System
Subagent started in subprocess mode.

Workspace-Aware Subagent (explicit workdir)

User
I just cd'd into /path/to/project which has a gptme.toml — spawn a subagent there
Assistant
I'll use workdir to make the subagent operate in that workspace and load its config.
ipython
subagent("project", "Add feature X", workdir="/path/to/project", use_subprocess=True)
System
Subagent started in subprocess mode (workdir=/path/to/project).

ACP Mode (multi-harness support)

User
delegate this task to a Claude Code agent
Assistant
I'll use ACP mode to run this via a different agent harness.
ipython
subagent("claude-task", "Analyze and refactor the auth module", use_acp=True, acp_command="claude-code-acp")
System
Started subagent "claude-task" in ACP mode.

List Subagents (observability)

User
what subagents are currently running?
Assistant
I'll use subagent_list to check running agents.
ipython
subagent_list()
System
Listing 2 subagents::
  - analyze (running, 42s) -- "Analyze the codebase architecture..."
  - fib-13 (success, 120s) -- "compute the 13th Fibonacci number"

Parallel Fan-out (wait for all, ordered results)

User
implement, test, and document a feature in parallel and collect all results
Assistant
I'll use subagent_parallel to fan out all tasks and wait for them together.
ipython
tasks = [
    ("impl", "Implement the user authentication feature"),
    ("test", "Write tests for authentication"),
    ("docs", "Document the authentication API"),
]
results = subagent_parallel(tasks, timeout=300)
for (agent_id, _), result in zip(tasks, results):
    print(f"{{agent_id}}: {{result['status']}} — {{result['result'][:60]}}")
System
impl: success — Authentication feature implemented in auth.py
test: success — 12 tests added, all passing
docs: success — API documented in docs/auth.md

Pipeline (multi-stage fan-out, no barrier between stages)

User
review these files in two stages — first find issues, then verify each finding
Assistant
I'll use subagent_pipeline so file B's review starts while file A's verification is running.
ipython
results = subagent_pipeline(
    [("auth", "Review auth.py for bugs"), ("db", "Review db.py for bugs")],
    # Stage 0: review each file
    lambda item, _: item,
    # Stage 1: adversarially verify the review findings
    lambda item, prev: "Verify these findings, keep only real bugs: " + prev,
    timeout=300,
)

# auth advances to stage 1 as soon as its stage 0 finishes,

# while db may still be in stage 0.
for (prefix, _), stage_results in zip([("auth", ...), ("db", ...)], results):
    print(f"{prefix}: {stage_results[-1]['status']}")
System
auth-s0 done → auth-s1 started; db-s0 done → db-s1 started
System
auth: success, db: success

Batch Execution (fire-and-forget with explicit sync)

User
start tasks in background and continue working
Assistant
I'll use subagent_batch to start tasks in the background. Completion hooks will notify me.
ipython
job = subagent_batch([
    ("impl", "Implement the user authentication feature"),
    ("test", "Write tests for authentication"),
])

# Do other work while subagents run — hook notifications arrive automatically:

# "✅ Subagent 'impl' completed: ..."

# Or explicitly wait for all when needed:

results = job.wait_all(timeout=300)
System
Started batch of 2 subagents: ['impl', 'test']

Fire-and-Forget with Hook Notifications

User
start a subagent and continue working
Assistant
I'll spawn a subagent. Completion will be delivered via the LOOP_CONTINUE hook.
ipython
subagent("compute-demo", "Compute pi to 100 digits")

# I can continue with other work now

# When the subagent completes, I'll receive a system message like:

# "✅ Subagent 'compute-demo' completed: pi = 3.14159..."
System
Started subagent "compute-demo"
System
✅ Subagent 'compute-demo' completed: pi = 3.14159265358979...

Profile-Based Subagents (auto-detected from agent_id)

User
explore this codebase and summarize the architecture
Assistant
I'll use the explorer profile for a read-only analysis.
ipython
subagent("explorer", "Analyze the codebase architecture and summarize key patterns")
System
Subagent started successfully.

Profile with Model Override

User
research best practices for error handling
Assistant
I'll spawn a researcher subagent with a faster model for web research.
ipython
subagent("researcher", "Research error handling best practices in Python", model="openai/gpt-4o-mini")
System
Subagent started successfully.

Structured Delegation Template

User
implement a robust auth feature
Assistant
I'll use the structured delegation template for clear task handoff.
ipython
subagent("auth-impl", "TASK: Implement JWT auth | OUTCOME: auth.py with tests | MUST: bcrypt, validation | MUST NOT: plaintext passwords")
System
Subagent started successfully.

Isolated Subagent (Worktree)

User
implement a feature without affecting my working directory
Assistant
I'll run the subagent in an isolated git worktree so it won't modify your files.
ipython
subagent("feature-impl", "Implement the new caching layer in cache.py", isolated=True)
System
Subagent started successfully.

Context-Isolated Subagent (no workspace context)

User
verify this output without exposing our workspace secrets to the subagent
Assistant
I'll use context_window=0 so the subagent only sees what I explicitly give it in the prompt, with no workspace files or secrets inherited.
ipython
subagent("verifier", "Check that the output file has no syntax errors", context_window=0)
System
Subagent started successfully.
class gptme.tools.subagent.BatchJob

Manages a batch of subagents for parallel execution.

Note: With the hook-based notification system, the orchestrator will receive completion messages automatically via the LOOP_CONTINUE hook. This class provides additional utilities for explicit synchronization when needed.

__init__(agent_ids: list[str], results: dict[str, ~gptme.tools.subagent.types.ReturnType] = <factory>, output_schema: type | None = None, _lock: ~_thread.allocate_lock = <factory>) None
get_completed() dict[str, dict]

Get results of completed subagents so far.

When the BatchJob was created with an output_schema (via subagent_batch(output_schema=...)) the results are automatically parsed through _parse_result() before being returned, matching the behaviour of wait_all().

is_complete() bool

Check if all subagents have completed.

total_tokens() dict[str, int | None]

Return aggregated token counts across all completed subagents.

Sums input_tokens and output_tokens from each completed result. Any subagent whose log has no usage metadata contributes None to its part — the aggregate is None when no completed subagent has token data, otherwise it is the sum of available counts.

Returns:

Dict with keys "input_tokens" and "output_tokens". Values are integers (sum of available counts) or None when no usage metadata was found in any completed subagent’s log.

Example:

job = subagent_batch([("a", "task A"), ("b", "task B")])
results = job.wait_all()
stats = job.total_tokens()
print(f"Tokens used: {stats['input_tokens']} in / {stats['output_tokens']} out")
wait_all(timeout: int = 300) dict[str, dict]

Wait for all subagents to complete concurrently.

Uses a thread pool to wait for all subagents simultaneously, so the wall-clock time is bounded by the slowest agent, not the sum of all agent times.

When the BatchJob was created with an output_schema (via subagent_batch(output_schema=...)) the results are automatically parsed through _parse_result() before being returned, matching the auto-parse behaviour of subagent_parallel(output_schema=...).

Parameters:

timeout – Maximum seconds to wait for all subagents

Returns:

Dict mapping agent_id to status dict. When output_schema is set, the "result" value is the parsed/validated object rather than a raw JSON string.

class gptme.tools.subagent.ReturnType

ReturnType(status: Literal[‘running’, ‘success’, ‘failure’, ‘clarification_needed’, ‘timeout’], result: str | dict[str, object] | None = None, input_tokens: int | None = None, output_tokens: int | None = None)

__init__(status: Literal['running', 'success', 'failure', 'clarification_needed', 'timeout'], result: str | dict[str, object] | None = None, input_tokens: int | None = None, output_tokens: int | None = None) None
class gptme.tools.subagent.Subagent

Represents a running or completed subagent.

Supports both thread-based (default) and subprocess-based execution modes. Subprocess mode provides better output isolation.

Communication Model:
  • Parent sends prompt, child executes independently

  • Results retrieved after completion via status()/subagent_wait()

  • Subagents can use the clarify code block to signal ambiguity; the parent receives a hook notification and can call subagent_reply() to re-spawn with the question answered.

__init__(agent_id: str, prompt: str, thread: ~threading.Thread | None, logdir: ~pathlib.Path, model: str | None, context_mode: ~typing.Literal['full', 'selective'] = 'full', context_include: list[str] | None = None, profile: str | None = None, output_schema: type | None = None, use_acp: bool = False, process: ~subprocess.Popen | None = None, execution_mode: ~typing.Literal['thread', 'subprocess', 'acp'] = 'thread', acp_command: str | None = None, isolated: bool = False, worktree_path: ~pathlib.Path | None = None, repo_path: ~pathlib.Path | None = None, timeout: int = 1800, role: ~typing.Literal['general', 'explore', 'implement', 'verify'] | None = None, redact_secrets: bool = True, context_window: int | None = None, context_turns: int | None = None, started_at: float = <factory>, max_time: float | None = None) None
is_running() bool

Check if the subagent is still running.

class gptme.tools.subagent.SubtaskDef

Definition of a subtask for planner mode.

gptme.tools.subagent.get_current_agent_id() str | None

Return the agent_id of the currently running subagent thread, or None.

Set by _create_subagent_thread before calling chat(). Used by the progress tool to identify which subagent is sending a progress update. Only populated in thread-mode subagents.

gptme.tools.subagent.notify_completion(agent_id: str, status: Literal['running', 'success', 'failure', 'clarification_needed', 'timeout'], summary: str) None

Add a subagent completion to the notification queue.

Called by the monitor thread when a subagent finishes. The queued notification will be delivered via the subagent_completion hook during the next LOOP_CONTINUE cycle.

Parameters:
  • agent_id – The subagent’s identifier

  • status – “success” or “failure”

  • summary – Brief summary of the result

gptme.tools.subagent.notify_progress(agent_id: str, message: str) None

Add a subagent progress update to the notification queue.

Called by the progress tool when a subagent sends an intermediate update. The parent’s LOOP_CONTINUE hook delivers it as a system message so the orchestrator can react without blocking on subagent_wait().

Note: Only works for thread-mode subagents (same process). Subprocess-mode subagents cannot share the in-process queue.

Parameters:
  • agent_id – The subagent’s identifier

  • message – Progress update message

gptme.tools.subagent.subagent(agent_id: str, prompt: str, mode: Literal['executor', 'planner'] = 'executor', subtasks: list[SubtaskDef] | None = None, execution_mode: Literal['parallel', 'sequential'] = 'parallel', context_mode: Literal['full', 'selective'] = 'full', context_include: list[str] | None = None, output_schema: type | None = None, use_subprocess: bool | None = None, use_acp: bool = False, acp_command: str = 'gptme-acp', profile: str | None = None, model: str | None = None, isolated: bool | None = None, timeout: int = 1800, role: Literal['general', 'explore', 'implement', 'verify'] | None = None, redact_secrets: bool = True, context_window: int | None = None, max_time: float | None = None, context_turns: int | None = None, workdir: str | Path | None = None)

Starts an asynchronous subagent. Returns None immediately.

Subagent completions are delivered via the LOOP_CONTINUE hook, enabling a “fire-and-forget-then-get-alerted” pattern where the orchestrator can continue working and get notified when subagents finish.

Profile auto-detection: If agent_id matches a known profile name (e.g. “explorer”, “researcher”, “developer”, “verifier”) or a common role alias (“explore”→”explorer”, “research”→”researcher”, “impl”/”dev”→”developer”, “verify”→”verifier”), the profile is applied automatically — no need to pass profile separately.

Role-based defaults (role parameter):

  • "explore": Defaults profile to explorer (read-only analysis)

  • "implement": Defaults profile to developer (full capability)

  • "verify": Defaults profile to verifier plus use_subprocess=True and isolated=True (read-only validation in isolation)

Explicit arguments override role defaults.

Parameters:
  • agent_id – Unique identifier for the subagent. If it matches a known profile name (or a common alias like impl/dev), that profile is auto-applied (unless profile is explicitly set to something else).

  • prompt – Task prompt for the subagent (used as context for planner mode)

  • mode – “executor” for single task, “planner” for delegating to multiple executors

  • subtasks – List of subtask definitions for planner mode (required when mode=”planner”)

  • execution_mode – “parallel” (default) runs all subtasks concurrently, “sequential” runs subtasks one after another. Only applies to planner mode.

  • context_mode – Controls what context is shared with the subagent: - “full” (default): Share complete context (agent identity, tools, workspace) - “selective”: Share only specified context components (requires context_include)

  • context_include – For selective mode, list of context components to include: - Thread mode supports “agent” and “tools” - Subprocess mode also supports “workspace”, which maps to the CLI’s “files” context Legacy subprocess values like “files”, “cmd”, and “all” are still accepted.

  • use_subprocess – If True, run subagent in subprocess for output isolation. Subprocess mode captures stdout/stderr separately from the parent.

  • use_acp – If True, run subagent via ACP (Agent Client Protocol). This enables multi-harness support — the subagent can be any ACP-compatible agent (gptme, Claude Code, Cursor, etc.). Requires the acp package: pip install ‘gptme[acp]’.

  • acp_command – ACP agent command to invoke (default: “gptme-acp”). Only used when use_acp=True. Can be any ACP-compatible CLI.

  • profile – Agent profile name to apply. Profiles provide: - System prompt customization (behavioral hints) - Tool access restrictions (which tools the subagent can use) - Behavior rules (read-only, no-network, etc.) Use ‘gptme-util profile list’ to see available profiles. Built-in profiles: default, explorer, researcher, developer, verifier, isolated, computer-use, browser-use. If not set, auto-detected from agent_id when it matches a profile name.

  • model – Model to use for the subagent. Overrides parent’s model. Useful for routing cheap tasks to faster/cheaper models.

  • isolated – If True, run the subagent in a git worktree for filesystem isolation. The subagent gets its own copy of the repository and can modify files without affecting the parent. The worktree is automatically cleaned up after the subagent completes. Falls back to a temporary directory if not in a git repo.

  • timeout – Maximum seconds before the subprocess monitor kills the subagent (default 1800 = 30 min). Only applies to subprocess mode.

  • redact_secrets

    If True (default), scrub common secret patterns from workspace context messages before they are passed to the subagent. Redacts values from lines where the variable name matches patterns like API_KEY, TOKEN, PASSWORD, PRIVATE_KEY, etc.

    Note: subagents do NOT inherit the parent’s conversation history — they always start with a fresh context containing only the task prompt and workspace context (files from gptme.toml [prompt] files, and context_cmd output when context_mode=”full”). This option sanitizes that inherited workspace context.

    Only applies to thread-mode subagents (subprocess and ACP modes run as a separate gptme process and handle their own context). Set to False to disable redaction if legitimate config values are being incorrectly redacted.

  • context_window

    Limit workspace context messages passed to the subagent. Controls how much of the workspace context (files from gptme.toml [prompt] files, context_cmd output) is shared with the subagent.

    • None (default): no limit — full workspace context is shared.

    • 0: minimal context — only agent identity and tools; no workspace files or context_cmd output. Equivalent to context_mode="selective", context_include=["agent", "tools"].

    • N > 0: at most N workspace context messages are passed.

    Use context_window=0 when the subagent does not need the parent workspace configuration (e.g. a verification task that should only see what the orchestrator explicitly tells it).

    Only applies to thread-mode subagents; has no effect in subprocess or ACP modes (which build their own context as a separate process).

  • max_time

    Wall-clock time limit in seconds. When set, a watchdog timer marks the subagent result as "timeout" after max_time seconds and delivers a timeout status notification via the LOOP_CONTINUE hook. In subprocess mode the child process is terminated. In thread mode the background thread is not force-stopped; callers see the cached timeout result immediately while the thread continues until it finishes naturally. Defaults to None (no limit).

    Use this for defensive orchestration (prevent a stuck subagent from blocking the parent) or hard time budgets in autonomous sessions. max_time=None is fully backwards-compatible — no change in behavior.

  • context_turns

    Number of recent parent conversation turns to forward to the subagent as context. A “turn” starts at a user message and includes all subsequent assistant and tool-result (system) messages until the next user message, so the total message count per turn varies with the number of tool calls. The messages are injected as a system message so the subagent understands what the parent has been doing without confusing its own conversation flow.

    • None (default): no parent context forwarded (current behavior).

    • N > 0: forward the last N turns from the parent’s active log.

    The parent log is fetched automatically from the currently active LogManager (set by the chat loop via ContextVar). This works when subagent() is called from within the ipython tool during a running chat session.

    Use this when the subagent needs awareness of what the parent has already done (e.g. “the parent tried A and B, now try C”) or when the task prompt alone doesn’t provide enough context.

    Only applies to thread-mode subagents; has no effect in subprocess or ACP modes.

  • workdir

    Working directory for the subagent. Defaults to the current working directory (Path.cwd()) when None.

    Use this when you want the subagent to operate in a specific directory — for example, when a cd into a project with a gptme.toml triggers workspace detection and you want the subagent to load that workspace’s config:

    subagent("impl", "Add feature X", workdir="/path/to/project",
             use_subprocess=True)
    

    In subprocess mode the subagent process starts with this as its cwd, so it picks up the gptme.toml from that directory. In thread mode the workspace context (files, context_cmd) is loaded relative to this path.

Returns:

Starts asynchronous execution.

In executor mode, starts a single task execution. In planner mode, starts execution of all subtasks using the specified execution_mode.

Executors use the complete tool to signal completion with a summary. The full conversation log is available at the logdir path.

Return type:

None

gptme.tools.subagent.subagent_batch(tasks: list[tuple[str, str]], use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp', model: str | None = None, profile: str | None = None, isolated: bool = False, output_schema: type | None = None, workdir: str | Path | None = None, context_turns: int | None = None, redact_secrets: bool = True) BatchJob

Start multiple subagents in parallel and return a BatchJob to manage them.

This is a convenience function for fire-and-gather patterns where you want to run multiple independent tasks concurrently.

With the hook-based notification system, completion messages are delivered automatically via the LOOP_CONTINUE hook. The BatchJob provides additional utilities for explicit synchronization when needed.

Parameters:
  • tasks – List of (agent_id, prompt) tuples

  • use_subprocess – If True, run subagents in subprocesses for output isolation

  • use_acp – If True, run subagents via ACP protocol

  • acp_command – ACP agent command (default: “gptme-acp”)

  • model – Model override applied to every subagent.

  • profile – Agent profile name applied to every subagent.

  • isolated – If True, run each subagent in its own git worktree so file edits don’t conflict between agents or with the parent.

  • output_schema – Optional Pydantic model class. When set, subagents are instructed to return JSON matching the schema in their complete block. Results are automatically parsed when wait_all() is called — the "result" value in each returned dict will be the parsed/validated object rather than a raw JSON string, matching the behaviour of subagent_parallel(output_schema=...).

  • workdir – Working directory passed to every subagent. Useful when running subagents against a specific project directory.

  • context_turns – Number of recent parent conversation turns to forward to each subagent as context prefix. Pass None (default) to use no parent context.

  • redact_secrets – If True (default), redact secrets from workspace context passed to subagents. Pass False only if you need subagents to see config values that are incorrectly flagged as secrets.

Returns:

A BatchJob instance for managing the parallel subagents. The BatchJob provides wait_all(timeout) to wait for completion, is_complete() to check status, and get_completed() for partial results.

Example:

job = subagent_batch([
    ("impl", "Implement feature X"),
    ("test", "Write tests for feature X"),
    ("docs", "Document feature X"),
])
# Orchestrator continues with other work...
# Completion messages delivered via LOOP_CONTINUE hook:
#   "✅ Subagent 'impl' completed: Feature implemented"
#   "✅ Subagent 'test' completed: 5 tests added"
#
# Or explicitly wait for all if needed:
results = job.wait_all(timeout=300)
gptme.tools.subagent.subagent_cancel(agent_id: str) str

Cancel a running subagent.

For subprocess-mode subagents, sends SIGTERM (then SIGKILL after 5s) to the process. For thread-mode subagents, marks the result as cancelled — the thread continues until its next natural checkpoint but the result is already recorded as failure so callers won’t block waiting for it.

Parameters:

agent_id – The subagent to cancel

Returns:

A human-readable status message

gptme.tools.subagent.subagent_list() list[dict]

Returns a list of all subagents with their current status.

Each entry contains: - agent_id: The subagent identifier - status: running/success/failure/clarification_needed - model: The model used (or None) - execution_mode: thread/subprocess/acp - elapsed_s: Seconds since the subagent started (from started_at timestamp) - prompt_preview: First 100 characters of the prompt

Useful for: - Interactive sessions: “what’s running right now?” - Orchestrators deciding whether to spawn more agents - Debugging runaway subagent fans

gptme.tools.subagent.subagent_parallel(tasks: list[tuple[str, str]], timeout: int = 300, use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp', model: str | None = None, profile: str | None = None, isolated: bool = False, output_schema: type | None = None, workdir: str | Path | None = None, context_turns: int | None = None, redact_secrets: bool = True) list[dict]

Fan out N subagents in parallel, wait for all, return results as an ordered list.

This is the simplest way to run independent tasks concurrently and collect all results. Unlike subagent_batch(), this function blocks until every subagent has finished (or timed out) and returns the results in the same order as the input tasks.

Waits for all subagents concurrently — wall-clock time is bounded by the slowest agent, not the sum of all agent times.

Parameters:
  • tasks – List of (agent_id, prompt) tuples. Each agent_id must be unique within this call.

  • timeout – Maximum seconds to wait for all subagents to finish. Agents that exceed this deadline are reported with status "timeout".

  • use_subprocess – If True, run each subagent in a subprocess for output isolation. Subprocess mode captures stdout/stderr separately and supports hard-kill on timeout.

  • use_acp – If True, run each subagent via the ACP protocol.

  • acp_command – ACP agent command (default: “gptme-acp”). Only used when use_acp=True.

  • model – Model override applied to every subagent. Pass None to inherit the parent’s model.

  • profile – Agent profile name applied to every subagent (e.g. "explorer", "developer", "verifier").

  • isolated – If True, run each subagent in its own git worktree so file edits don’t conflict between agents or with the parent.

  • output_schema – Optional Pydantic model class. When set, subagents are instructed to return valid JSON matching the schema in their complete block. Results are automatically parsed: on success the "result" value is the parsed/validated object (a dict for Pydantic models) rather than a raw JSON string. A "parse_error" key is added to any result that cannot be parsed.

  • workdir – Working directory passed to every subagent. Useful when running subagents against a specific project directory.

  • context_turns – Number of recent parent conversation turns to forward to each subagent as context prefix. Pass None (default) to use no parent context.

  • redact_secrets – If True (default), scrub common secret patterns from workspace context before passing it to subagents.

Returns:

List of result dicts in the same order as tasks. Each dict has "status" ("success" / "failure" / "timeout") and "result" (parsed object when output_schema is set, else the summary text from the subagent’s complete block).

Example:

# Process three independent tasks in parallel
results = subagent_parallel([
    ("researcher", "Research the top 5 Python async frameworks"),
    ("coder",      "Implement a basic async HTTP client"),
    ("tester",     "Write pytest tests for an async HTTP client"),
])
for (agent_id, _), result in zip(tasks, results):
    print(f"{agent_id}: {result['status']}{result['result'][:80]}")

# With worktree isolation for concurrent file edits
results = subagent_parallel(
    [("fix-a", "Fix bug in module A"), ("fix-b", "Fix bug in module B")],
    isolated=True,
)

# With structured output (Pydantic model)
from pydantic import BaseModel

class AnalysisResult(BaseModel):
    summary: str
    score: int
    issues: list[str]

results = subagent_parallel(
    [("a1", "Analyze module A"), ("a2", "Analyze module B")],
    output_schema=AnalysisResult,
)
for r in results:
    if r["status"] == "success":
        analysis = r["result"]  # already a validated dict
        print(f"Score: {analysis['score']}, Issues: {analysis['issues']}")
gptme.tools.subagent.subagent_pipeline(items: list[tuple[str, str]], *stages: Callable[[str, str], str], timeout: float = 600, use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp', model: str | None = None, profile: str | None = None, isolated: bool = False, output_schema: type | None = None, workdir: str | Path | None = None, context_turns: int | None = None, redact_secrets: bool = True) list[list[dict]]

Process items through multiple stages with no barrier between stages.

Each item is processed through all stages sequentially. Items at different stages run concurrently — item A can be in stage 2 while item B is still in stage 1. This is the “pipeline” pattern as opposed to repeated subagent_parallel() calls which add a full barrier between stages.

Wall-clock time is bounded by the slowest single-item chain, not the sum of the slowest per-stage.

Parameters:
  • items – List of (agent_id_prefix, initial_prompt) tuples.

  • *stages – Callables of the form stage(item_prompt, prev_result) -> str where item_prompt is the original item prompt and prev_result is the raw result text from the previous stage (empty string for the first stage). Each callable returns the prompt to use for the next subagent in the chain.

  • timeout – Maximum seconds to wait for the entire pipeline to finish.

  • use_subprocess – If True, run each subagent in a subprocess.

  • use_acp – If True, run each subagent via the ACP protocol.

  • acp_command – ACP agent command (default: “gptme-acp”). Only used when use_acp=True.

  • model – Model override applied to every subagent.

  • profile – Agent profile name applied to every subagent.

  • isolated – If True, run each subagent in its own git worktree.

  • output_schema – Optional Pydantic model class. When set, each final-stage subagent is instructed to return JSON matching the schema and results are automatically parsed.

  • workdir – Working directory passed to every subagent.

  • context_turns – Number of recent parent turns to forward to each subagent.

  • redact_secrets – If True (default), redact secrets from workspace context.

Returns:

List of lists of result dicts. results[i][j] is the result dict for item i at stage j. Each dict has "status" and "result" keys (plus "input_tokens" / "output_tokens" when available). When output_schema is set, the final-stage "result" value is the parsed/validated object rather than a raw JSON string.

Example:

# Two-stage review pipeline: find issues, then verify each finding
results = subagent_pipeline(
    [("file-auth", "Review auth.py"), ("file-db", "Review db.py")],
    # Stage 0: review
    lambda item, _: f"Review this file for bugs: {item}",
    # Stage 1: verify each review finding
    lambda item, prev: (
        f"Adversarially verify each finding in this review:\n{prev}\n"
        f"Original file to review: {item}"
    ),
)
# file-auth advances to stage 1 as soon as its stage 0 completes,
# while file-db may still be in stage 0.
for (prefix, _), stage_results in zip(items, results):
    final = stage_results[-1]
    print(f"{prefix}: {final['status']}{final['result'][:80]}")

# With isolated worktrees so concurrent file edits don't conflict
results = subagent_pipeline(
    [("impl-a", "Implement feature A"), ("impl-b", "Implement feature B")],
    lambda item, _: item,
    lambda item, prev: f"Write tests for: {prev}",
    isolated=True,
)
gptme.tools.subagent.subagent_read_log(agent_id: str, max_messages: int = 50, include_system: bool = False, message_filter: str | None = None) str

Read the conversation log of a subagent.

Parameters:
  • agent_id – The subagent to read logs from

  • max_messages – Maximum number of messages to return

  • include_system – Whether to include system messages

  • message_filter – Filter messages by role (user/assistant/system) or None for all

Returns:

Formatted log output showing the conversation

gptme.tools.subagent.subagent_reply(agent_id: str, reply: str) None

Re-spawn a subagent that requested clarification.

When a subagent ends with a clarify block, it stops and asks the parent a question. Call this function with your answer to re-start the subagent. The new run receives the original prompt plus an appended Q&A block so it has full context.

Parameters:
  • agent_id – The subagent that raised the clarification request.

  • reply – Your answer to the subagent’s question.

gptme.tools.subagent.subagent_status(agent_id: str) dict

Returns the status of a subagent.

gptme.tools.subagent.subagent_wait(agent_id: str, timeout: int = 60, max_result_chars: int = 2000) dict

Waits for a subagent to finish.

Parameters:
  • agent_id – The subagent to wait for

  • timeout – Maximum seconds to wait (default 60)

  • max_result_chars – Truncate result text to this many characters (default 2000). Long subagent outputs are truncated to keep the parent’s context clean. Call subagent_read_log(agent_id) to read the full output.

Returns:

Status dict with ‘status’ and ‘result’ keys

Subagent Isolation Contract#

When spawning a subagent you need to know exactly what it inherits from the parent and what it starts fresh. There are four dimensions:

1. Workspace config loading

Thread mode (default, use_subprocess=False):

The subagent inherits the parent’s already-assembled workspace context — the [prompt] files from gptme.toml and the context_cmd output as they were loaded for the parent session. It does not re-read from the subagent’s working directory, so a subdirectory with its own gptme.toml will not be picked up automatically.

Subprocess mode (use_subprocess=True):

Spawns a fresh gptme process with workdir as the CWD, which naturally loads that directory’s gptme.toml. Use this when you want subagents to pick up directory-local workspace config.

Fine-grained control:

  • context_mode="selective" + context_include — share only specific components ("agent", "tools", "workspace") instead of the full workspace.

    Behavior by mode:

    Thread: fully supported — filters the inherited context to the specified components.

    Subprocess: context_mode is ignored (the child loads its own workspace from gptme.toml); context_include=["workspace"] maps to the --context files CLI flag to include workspace files. Other context_include values are ignored in this mode.

    ACP: both parameters are ignored.

  • context_window=N — limit how many inherited context messages are forwarded (0 = none, None = all). Thread mode only; ignored in subprocess and ACP.

  • context_turns=N — forward the last N turns of the parent conversation. Thread mode only; ignored in subprocess and ACP.

2. Tool and state inheritance

By default the subagent starts with the same tool list as the parent (both threads share the same initial snapshot; contextvars are thread-isolated so the parent’s tool state cannot be mutated by the subagent).

Three ways to restrict tools:

  • profile="explorer" (or any built-in profile) — applies a tool allowlist at spawn time. Built-in profiles: explorer (read-only), researcher, developer (full), verifier (read-only). Note: role="verify" forces use_subprocess=True and isolated=True in addition to the verifier profile.

  • isolated=True — runs the subagent in a git worktree so filesystem writes don’t affect the parent repo. The worktree is auto-cleaned after completion.

  • redact_secrets=True (default) — scrubs common secret patterns (API keys, tokens, passwords) from workspace context messages before they reach the subagent. Thread-mode only; has no effect in subprocess or ACP modes (the child process’s own gptme.toml controls its secret handling).

Signal tools are loaded regardless of allowlist so the subagent can communicate back. Thread-mode subagents get complete, clarify, and progress. Subprocess subagents get complete and clarify; progress is not loaded because it depends on the parent’s in-process notification queue.

3. Cancellation and timeout

  • max_time (seconds) — a watchdog timer that marks the subagent result as "timeout" after the specified duration and delivers a timeout status notification. In subprocess mode the child process is terminated. In thread mode the background thread is not force-stopped; callers see the cached timeout result immediately while the thread continues until it finishes naturally.

  • timeout (default 1800 s) — subprocess monitor kills the child process after this many seconds. Only applies in subprocess mode.

  • The parent does not block waiting for subagents. Completion is delivered via the LOOP_CONTINUE hook, which re-enters the parent’s loop with a notification message.

4. Child transcript and result delivery

Subagents always start with a fresh conversation — they do not inherit the parent’s message history by default. The result/transcript lifecycle:

  • context_turns=N — the parent’s last N turns are prepended to the subagent’s conversation as context.

  • On completion the subagent calls the complete signal tool with a summary; this is queued back to the parent via the LOOP_CONTINUE hook.

  • subagent_read_log(agent_id) — retrieve the full child transcript from the parent after the subagent completes.

  • subagent_status(agent_id) — poll completion/error state without waiting.

Fan-out and Parallel Execution#

Two helpers make it easy to run multiple independent tasks concurrently:

subagent_parallel(tasks, ...)

Fan out N subagents in parallel and block until all complete. Returns results in the same order as the input tasks. Wall-clock time is bounded by the slowest agent, not their sum. Use this for straightforward parallel delegation where the parent needs all results before continuing:

results = subagent_parallel([
    ("researcher", "Research async Python frameworks"),
    ("coder",      "Implement a basic async HTTP client"),
    ("tester",     "Write pytest tests for an async HTTP client"),
])

Key parameters: isolated=True (each agent gets its own git worktree), output_schema (structured output — see below), model, profile, context_turns, workdir.

subagent_batch(tasks, ...)

Non-blocking variant. Launches all subagents and returns a BatchJob object immediately so the parent can continue working while agents run. Call job.wait_all() later to collect results. Useful when the parent has its own work to interleave:

job = subagent_batch([
    ("a", "..."),
    ("b", "..."),
])
# ... parent does other work ...
results = job.wait_all()

Structured Output (output_schema)#

Both subagent_parallel() and subagent_batch() accept an output_schema parameter (a Pydantic model class). When set, each subagent is instructed to return valid JSON matching the schema inside its complete block. Results are automatically parsed and validated — the "result" value in each result dict is the parsed/validated object rather than a raw string:

from pydantic import BaseModel

class AnalysisResult(BaseModel):
    summary: str
    score: int
    issues: list[str]

results = subagent_parallel(
    [("a1", "Analyze module A"), ("a2", "Analyze module B")],
    output_schema=AnalysisResult,
)
for r in results:
    if r["status"] == "success":
        analysis = r["result"]  # already a validated dict
        print(f"Score: {analysis['score']}")

The output_schema parameter is also available on the low-level subagent() call for single-agent structured output.

Token Budget Tracking#

subagent_wait() and BatchJob.wait_all() include token usage in their result dicts:

result = subagent_wait("my-agent")
# result["input_tokens"]  — tokens consumed by the subagent's prompts
# result["output_tokens"] — tokens generated by the subagent

This lets the parent track cumulative cost across a fleet of delegated tasks and gate further spawning when a budget limit is reached.

Read#

Read the contents of one or more files, or list the contents of a directory.

Provides a sandboxed file reading capability that works without shell access. Useful for restricted tool sets (e.g., --tools read,patch,save).

Multiple paths can be passed in the code block (one per line) to read several files in a single tool call, reducing roundtrips when exploring a codebase.

Instructions

Read the content of one or more files, or list the contents of a directory.
Paths can be relative or absolute.
For files, output includes line numbers for easy reference.
For directories, output shows a flat listing of immediate files and subdirectories.

### When to use read

Reading a file directly gives you its exact, current content with line numbers —
eliminating guesswork from memory, file names, or comments. Prefer `read` over
those shortcuts when the file itself is the source of truth.

To read multiple files in a single call, put one path per line in the code block.
Lines beginning with '#' are treated as comments and skipped.
The line-range parameters (start_line, end_line) only apply when reading a single file.

Examples

User
read hello.py
Assistant
read hello.py
System
```hello.py
>    1       print("Hello world")
>    2       print("Goodbye world")
> ```
User
read both source files
Assistant
read
hello.py
goodbye.py
System
```hello.py
>    1       print("Hello world")
> ```
> ```goodbye.py
>    1       print("Goodbye world")
> ```
gptme.tools.read.execute_read(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Read one or more files (or a directory) and return their contents.

Save#

Gives the assistant the ability to save whole files, or append to them.

Instructions

Create or overwrite a file with the given content.

The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.

### When to use save vs patch

Use `save` for new files, full rewrites, or edits that touch most of a file.
Use `patch` for targeted edits to existing files; it keeps surrounding content intact.

Examples

User
write a hello world script to hello.py
Assistant
save hello.py
print("Hello world")
System
Saved to `hello.py`
User
make it all-caps
Assistant
save hello.py
print("HELLO WORLD")
System
Saved to `hello.py`

Instructions

Append the given content to a file.

Examples

User
append a print "Hello world" to hello.py
Assistant
append hello.py
print("Hello world")
System
Appended to `hello.py`
gptme.tools.save.check_for_placeholders(content: str) bool

Check if content contains placeholder lines.

gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Append code to a file.

gptme.tools.save.execute_append_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual append implementation.

gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Save code to a file.

gptme.tools.save.execute_save_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual save implementation.

gptme.tools.save.preview_append(content: str, path: Path | None) str | None

Prepare preview content for append operation.

gptme.tools.save.preview_save(content: str, path: Path | None) str | None

Prepare preview content for save operation.

Patch#

Gives the LLM agent the ability to patch text files, by using a adapted version git conflict markers.

Environment Variables:
GPTME_PATCH_RECOVERY: If set to “true” or “1”, returns the file content in error messages

when patches don’t match. This helps the assistant recover faster by seeing the actual file contents.

Instructions

To patch/modify files, we use an adapted version of git conflict markers.

Multiple ORIGINAL/UPDATED blocks can make several changes in one patch.
Keep patches small. Scope each change to a function/class. Avoid placeholders
in ORIGINAL blocks; they must match the file exactly or the patch will fail.

### When to use patch vs save

Use `patch` for targeted edits to existing files.
Use `save` for new files, full rewrites, or changes too large for patch markers.

Note: When patching markdown, avoid replacing partial codeblocks (just the opening
or closing backticks). The parser needs complete codeblocks. For simple
codeblock-boundary changes (like a language tag), use `sed` or `perl` instead.

Examples

User
patch `src/hello.py` to ask for the name of the user
src/hello.py
def hello():
    print("Hello world")

if __name__ == "__main__":
    hello()
Assistant
patch src/hello.py
<<<<<<< ORIGINAL
    print("Hello world")
=======
    name = input("What is your name? ")
    print(f"Hello {name}")
>>>>>>> UPDATED
System
Patch applied
class gptme.tools.patch.Patch

Patch(original: str, updated: str)

__init__(original: str, updated: str) None
diff_minimal(strip_context=False) str

Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.

gptme.tools.patch.apply(codeblock: str, content: str) str

Applies multiple patches in codeblock to content. Provides detailed error messages when patches fail.

gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Applies the patch.

gptme.tools.patch.execute_patch_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual patch implementation.

gptme.tools.patch.preview_patch(content: str, path: Path | None) str | None

Prepare preview content for patch operation.

Vision#

Tools for viewing images, giving the assistant vision.

Requires a model which supports vision, such as GPT-4o, Anthropic, and Llama 3.2.

gptme.tools.vision.view_image(image_path: Path | str | Image) Message

View an image. Large images (>1MB) will be automatically scaled down.

Screenshot#

A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.

gptme.tools.screenshot.screenshot(path: Path | None = None) Path

Take a screenshot and save it to a file.

Browser#

Tools to let the assistant control a browser, including:
  • loading pages

  • reading their contents

  • searching the web

  • taking screenshots (Playwright only)

  • getting ARIA accessibility snapshots (Playwright only)

  • interactive browsing: click, fill forms, scroll (Playwright only)

  • reading PDFs (with page limits and vision fallback hints)

  • converting PDFs to images (using pdftoppm, ImageMagick, or vips)

Two backends are available:

Playwright backend:
  • Full browser automation with screenshots

  • Installation:

    pipx install 'gptme[browser]'
    # We need to use the same version of Playwright as the one installed by gptme
    # when downloading the browser binaries. gptme will attempt this automatically
    PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
    pipx run playwright==$PW_VERSION install chromium-headless-shell
    
  • To use Firefox instead of Chromium (useful for pages that block headless Chromium):

    pipx run playwright==$PW_VERSION install firefox
    export GPTME_BROWSER_ENGINE=firefox
    
  • To use an existing Chromium-compatible browser over Chrome DevTools Protocol instead of launching Playwright’s bundled Chromium, start it with remote debugging enabled and set GPTME_BROWSER_CDP_URL:

    chromium --remote-debugging-port=9222
    export GPTME_BROWSER_CDP_URL=http://127.0.0.1:9222
    
Lynx backend:
  • Text-only browser for basic page reading and searching

  • No screenshot support

  • Installation:

    # On Ubuntu
    sudo apt install lynx
    # On macOS
    brew install lynx
    # or any other way that gets you the `lynx` command
    
Provider Native Search:
  • When using Anthropic Claude models, native web search can be enabled

  • This uses Anthropic’s built-in web search instead of web scraping

  • More reliable than Google/DuckDuckGo scraping (which is blocked by bot detection)

  • Configuration:

    export GPTME_ANTHROPIC_WEB_SEARCH=true
    export GPTME_ANTHROPIC_WEB_SEARCH_MAX_USES=5  # Optional, default is 5
    

Note

This is an experimental feature. It needs some work to be more robust and useful.

Instructions

### When to use browser

Use browser to fetch live web content, search the web, interact with pages, or
take screenshots. Prefer browser over memory for current information. For
services with APIs, prefer shell or Python over scraping.

Examples

Reading docs

User
how does gptme work?
Assistant
Let's read the docs.
ipython
read_url('https://gptme.org/docs/')

Answer question from URL with browsing

User
find out which is the latest ActivityWatch version from superuserlabs.org
Assistant
Let's browse the site.
ipython
read_url('https://superuserlabs.org/')
System
`https://superuserlabs.org/
... [ActivityWatch](https://activitywatch.net/) ...
`
Assistant
Couldn't find the answer on the page. Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
`https://activitywatch.net/
... Download latest version v0.12.2 ...
`
Assistant
The latest version of ActivityWatch is v0.12.2

Searching

User
who is the founder of ActivityWatch?
Assistant
Let's search for that.
ipython
search('ActivityWatch founder')
System
`result
ActivityWatch was founded by Erik Bjäreholt in 2016...
`
Assistant
The founder of ActivityWatch is Erik Bjäreholt.

Searching for latest information

User
what are the latest developments in AI?
Assistant
Let me search for that.
ipython
search('latest developments in AI')
System
`result
Based on recent developments, AI has seen significant advances...
`
Assistant
Based on the search results, here are the latest AI developments...

Take screenshot of page

User
take a screenshot of the ActivityWatch website
Assistant
Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
ipython
screenshot_url('https://activitywatch.net')
System
`result
Screenshot saved to screenshot.png
`

Get ARIA snapshot to see interactive elements

User
what interactive elements are on example.com?
Assistant
Let me get the accessibility snapshot of the page.
ipython
snapshot_url('https://example.com')
System
`result
Page: Example Domain
URL: https://example.com/

- WebArea "Example Domain":
  - heading "Example Domain" [level=1]
  - text "This domain is for use in illustrative examples..."
  - link "More information..."
`
Assistant
The page has a heading "Example Domain", a paragraph with description text, and a link "More information...".

Interactive browsing: open page, click, fill

User
search for gptme on example.com's search form
Assistant
I'll open the page, fill the search form, and click submit.
ipython
open_page('https://example.com')
System
`result
Page: Example
URL: https://example.com/

- WebArea "Example":
  - textbox "Search" [name="q"]
  - button "Go"
`
Assistant
I can see a search box and button. Let me fill in the search and click Go.
ipython
fill_element('input[name="q"]', 'gptme')
System
`result
Page: Example
URL: https://example.com/

- WebArea "Example":
  - textbox "Search" [name="q"]: gptme
  - button "Go"
`
ipython
click_element('text=Go')
System
`result
Page: Search Results
URL: https://example.com/search?q=gptme

- WebArea "Search Results":
  - heading "Results for: gptme"
  - link "gptme on GitHub"
`
Assistant
The search was submitted and the page now shows results for "gptme".

Read full text content of interactive page

User
what does the article say?
Assistant
Let me read the full text content of the current page.
ipython
read_page_text()
System
`result
# Article Title

The article discusses...
`
Assistant
The article covers [summary of content].

Read URL and check browser logs

User
read this page and check if there are any console errors
Assistant
I'll read the page first and then check the browser logs.
ipython
read_url('https://example.com')
System
`https://example.com
This domain is for use in illustrative examples...
`
Assistant
Now let me check the browser console logs:
ipython
read_logs()
System
`result
No logs or errors captured.
`

Read PDF document

User
read this research paper from arxiv
Assistant
I'll read the PDF and extract its text content.
ipython
read_url('https://arxiv.org/pdf/2410.12361v2')
System
`result
--- Page 1 ---
[PDF text content...]

--- Page 2 ---
[More content...]

---
**Note**: This PDF has 42 pages. Showing first 10 pages.
To read more pages, use: `read_url('...', max_pages=N)` where N is the desired count, or 0 for all pages.

**Tip**: If this text extraction seems incomplete or garbled (common with scanned documents, complex layouts, or image-heavy PDFs), try vision-based reading: convert pages to images using a PDF-to-image tool, then use the vision tool to analyze them.
`
Assistant
I've extracted the text from the PDF. The paper discusses [summary of key points]...
gptme.tools.browser.click_element(selector: str) str

Click an element on the current page and return updated ARIA snapshot.

Requires open_page() to be called first.

Parameters:

selector – Playwright selector to find the element. Supports: - CSS: “#submit-btn”, “.nav-link”, “button” - Text: “text=Submit”, “text=Log in” - Role: “role=button[name=’Submit’]” - Chained: “form >> text=Submit”

gptme.tools.browser.close_page() str

Close the current interactive browsing page.

Frees browser resources. A new page can be opened with open_page().

gptme.tools.browser.fill_element(selector: str, value: str) str

Fill a form field on the current page and return updated ARIA snapshot.

Requires open_page() to be called first. Clears any existing value before filling.

Parameters:
  • selector – Playwright selector for the input/textarea element.

  • value – Text to fill into the field.

gptme.tools.browser.get_current_url() str

Return the URL of the currently open browser page.

Useful after a redirect, navigation, or login flow to confirm where the browser ended up.

Returns:

The current URL as a string.

Raises:

RuntimeError – If no page is currently open.

Example:

open_page("https://example.com/login")
fill_element("#username", "alice")
click_element("text=Log in")
url = get_current_url()   # confirm redirect to /dashboard
gptme.tools.browser.has_lynx() bool

Check if lynx is available.

gptme.tools.browser.has_playwright() bool

Check if playwright is available.

gptme.tools.browser.hover_element(selector: str) str

Hover over an element on the current page and return updated ARIA snapshot.

Triggers mouseover/mouseenter events, revealing hover-only content such as dropdown menus, tooltips, and contextual buttons. Use before clicking a menu item that only appears on hover.

Requires open_page() to be called first.

Parameters:

selector – Playwright selector for the element to hover over.

Returns:

Updated ARIA snapshot of the page after the hover.

Example:

open_page("https://example.com")
hover_element("text=Products")   # reveal dropdown
click_element("text=Pricing")    # click item that appeared
gptme.tools.browser.load_browser_state(path: str) str

Load a previously saved browser session (cookies, localStorage) from a file.

In-session complement to save_browser_state(). Restores authentication state without requiring a process restart or setting the GPTME_BROWSER_STORAGE_STATE environment variable.

After calling this, call open_page(url) to start a browser session with the restored cookies and localStorage.

Typical workflow:

# First session — log in and save state:
open_page("https://x.com/login")
fill_element("#username", "you@example.com")
fill_element("#password", "hunter2")
click_element("text=Log in")
save_browser_state("~/.config/gptme/twitter-session.json")

# Later in the same session (or a new one):
load_browser_state("~/.config/gptme/twitter-session.json")
open_page("https://x.com")           # opens already logged in
click_element("text=What is happening?!")
fill_element('[data-testid="tweetTextarea_0"]', "hello from gptme!")
click_element('[data-testid="tweetButtonInline"]')
Parameters:

path – Path to the session JSON previously written by save_browser_state(). ~ is expanded to the home directory.

Returns:

Confirmation string. The next open_page() will use the restored state.

Raises:

FileNotFoundError – If path does not exist.

gptme.tools.browser.open_page(url: str) str

Open a page for interactive browsing. Returns ARIA accessibility snapshot.

Use this instead of read_url() when you need to interact with the page (click buttons, fill forms, scroll). The page stays open for subsequent click_element(), fill_element(), and scroll_page() calls.

The output includes a metadata header with the page title and current URL.

gptme.tools.browser.pdf_to_images(url_or_path: str, output_dir: str | Path | None = None, pages: tuple[int, int] | None = None, dpi: int = 150) list[Path]

Convert PDF pages to images using auto-detected CLI tools.

Auto-detects and uses the first available tool: pdftoppm, ImageMagick convert, or vips.

Parameters:
  • url_or_path – URL or local path to PDF file

  • output_dir – Directory to save images (default: creates temp directory)

  • pages – Optional tuple of (first_page, last_page) to convert (1-indexed). If None, converts all pages.

  • dpi – Resolution for output images (default: 150)

Returns:

List of paths to generated PNG images

Raises:

Example

>>> images = pdf_to_images("https://example.com/doc.pdf")
>>> for img in images:
...     view_image(img)  # Analyze with vision tool
gptme.tools.browser.press_key(key: str) str

Press a keyboard key or shortcut in the current browser page.

Dispatches the key event to the focused element (or document). Use for submitting forms (Enter), navigating menus (ArrowDown), dismissing dialogs (Escape), or triggering shortcuts (e.g. Control+a).

Requires open_page() to be called first.

Parameters:

key – Playwright key name. Examples: "Enter", "Tab", "Escape", "ArrowDown", "Control+a", "Meta+k".

Returns:

Updated ARIA snapshot of the page after the key press.

Example:

open_page("https://example.com/search")
fill_element("[name='q']", "gptme")
press_key("Enter")
gptme.tools.browser.read_logs() str

Read browser console logs from the last read URL.

gptme.tools.browser.read_page_text() str

Read the full text content of the current interactive page as Markdown.

Requires open_page() to be called first. Returns the page body converted to Markdown, preserving text formatting. Useful for reading article text, documentation, or other content after navigating to a page.

Unlike read_url(), this reads from the current interactive session — so it reflects the page state after any clicks, form fills, or navigation.

gptme.tools.browser.read_url(url: str, max_pages: int | None = None) str

Read a webpage or PDF in a text format.

Parameters:
  • url – URL to read

  • max_pages – For PDFs only - maximum pages to read (default: 10). Set to 0 to read all pages. Ignored for web pages.

gptme.tools.browser.save_browser_state(path: str) str

Save the current browser session (cookies, localStorage) to a file.

Captures the full authentication state of the active browser context so it can be restored in a future session via GPTME_BROWSER_STORAGE_STATE.

Call this after logging in to a site with open_page() + fill_element() + click_element() so you don’t have to re-authenticate next time.

Typical workflow:

open_page("https://x.com/login")
fill_element("#username", "you@example.com")
fill_element("#password", "hunter2")
click_element("text=Log in")
save_browser_state("~/.config/gptme/twitter-session.json")
# Next session: export GPTME_BROWSER_STORAGE_STATE=~/.config/gptme/twitter-session.json
Parameters:

path – File path to write the session JSON. Directories are created automatically. ~ is expanded to the home directory.

Returns:

Confirmation string with the absolute path where the state was saved.

gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) Path

Take a screenshot of a webpage.

gptme.tools.browser.scroll_page(direction: str = 'down', amount: int = 500) str

Scroll the current page and return updated ARIA snapshot.

Requires open_page() to be called first.

Parameters:
  • direction – “up” or “down” (default: “down”)

  • amount – Pixels to scroll (default: 500)

gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] | None = None) str

Search for a query on a search engine.

If no engine is specified, automatically chooses the best available backend and falls back to the next usable backend on failure.

gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] = 'google') str

Search for a query on a search engine using Playwright.

gptme.tools.browser.select_option(selector: str, value: str) str

Select an option from a <select> dropdown on the current page.

Requires open_page() to be called first.

Parameters:
  • selector – Playwright selector for the <select> element.

  • value – The option value attribute or visible text to select.

Returns:

Updated ARIA snapshot of the page after the selection.

Example:

open_page("https://example.com/order")
select_option("[name='size']", "large")
click_element("text=Add to cart")
gptme.tools.browser.snapshot_page() str

Get the ARIA accessibility snapshot of the current interactive page.

Returns the structured accessibility tree of the page open via open_page(), reflecting all DOM changes made by subsequent interactions. Use to re-read the current page state without triggering any action.

Returns:

Structured ARIA snapshot including page title and current URL.

Raises:

RuntimeError – If no page is currently open.

Example:

open_page("https://example.com/form")
fill_element("[name='email']", "user@example.com")
state = snapshot_page()   # verify the field was filled before submitting
click_element("text=Submit")
gptme.tools.browser.snapshot_url(url: str) str

Get the ARIA accessibility snapshot of a webpage.

Returns a structured text representation of the page’s accessibility tree, showing interactive elements (buttons, links, inputs) with their roles and names. Useful for understanding page structure and finding elements to interact with.

The output includes a metadata header with the page title and current URL (which may differ from the requested URL after redirects).

gptme.tools.browser.wait_for_element(selector: str, timeout_ms: int = 5000) str

Wait for a DOM element to become visible on the current page.

Blocks until the element matching selector is visible, then returns the updated ARIA snapshot. Use after actions that trigger dynamic content loading (modals, async renders, redirects).

Requires open_page() to be called first.

Parameters:
  • selector – Playwright selector for the element to wait for.

  • timeout_ms – Maximum wait time in milliseconds (default: 5000).

Returns:

Updated ARIA snapshot once the element is visible.

Example:

open_page("https://x.com/compose/tweet")
wait_for_element("[data-testid='tweetTextarea_0']", timeout_ms=8000)
fill_element("[data-testid='tweetTextarea_0']", "Hello from gptme!")
click_element("[data-testid='tweetButtonInline']")

Browser FAQ#

Does the browser tool bypass CAPTCHAs?

No. The Playwright backend is a real browser engine (headless Chromium or Firefox), so it behaves the same as any headless browser — some CAPTCHAs will block it. gptme does not currently expose a headed-mode toggle for the built-in Playwright launcher. To improve success on sites that detect headless Chromium, try Firefox:

pipx run playwright==$PW_VERSION install firefox
export GPTME_BROWSER_ENGINE=firefox

You can also connect to an existing Chromium-compatible browser over Chrome DevTools Protocol:

chromium --remote-debugging-port=9222
export GPTME_BROWSER_CDP_URL=http://127.0.0.1:9222

Can I use a full GUI browser with extensions?

Yes — via the How to Automate GUIs with Computer Use Docker image, which runs a real Chromium browser inside a VNC-accessible desktop. Extensions, GUI interaction, and anything that needs a visible browser window all work there. See the Computer tool and How to Automate GUIs with Computer Use for setup details.

Can I run the browser tool inside Docker?

The standard Playwright backend works in Docker (headless mode, no display required). For headed/GUI mode inside Docker, use the computer-use Docker image which bundles a VNC server and a full desktop environment. See How to Automate GUIs with Computer Use for details.

The page is blocking my scrape — what should I try?

In order:

  1. Switch backends: GPTME_BROWSER_ENGINE=firefox (different fingerprint than Chromium)

  2. Connect to an existing Chromium browser: GPTME_BROWSER_CDP_URL=http://127.0.0.1:9222

  3. Use Anthropic native search (Claude models only): GPTME_ANTHROPIC_WEB_SEARCH=true

  4. Use the Computer tool with the VNC Docker image for full GUI browser control

Chats#

List, search, and summarize past conversation logs.

Instructions

### When to use chats

Use chats when the user asks about or wants to reference a past conversation:
- "remember when we discussed X?" → search_chats('X')
- "find our earlier chat about Y" → search_chats('Y')
- "what did we say about Z last week?" → search_chats('Z')
- Listing recent sessions to give the user an overview → list_chats()
- Reading a specific prior conversation by ID → read_chat(id)

Do **not** use chats for:
- The current conversation — its content is already in the context window.
- Searching files or code — use the shell or read tool instead.
- Web or documentation search — use the browser tool.

Examples

Search for a specific topic in past conversations

User
Can you find any mentions of "python" in our past conversations?
Assistant
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
chats
search_chats('python')
gptme.tools.chats.conversation_stats(since: str | None = None, as_json: bool = False, conversation_id: str | None = None) None

Show statistics about conversation history.

Parameters:
  • since – Only include conversations since this date (YYYY-MM-DD or Nd).

  • as_json – Output as JSON instead of formatted text.

  • conversation_id – Optional conversation ID to inspect in detail.

gptme.tools.chats.find_empty_conversations(max_messages: int = 1, include_test: bool = False) list[dict]

Find conversations with few or no messages.

Scans all conversations and returns those with at most max_messages messages. Useful for cleaning up abandoned or empty conversation logs.

Parameters:
  • max_messages – Maximum message count to consider “empty” (default: 1, system-only).

  • include_test – Whether to include test/eval conversations.

Returns:

List of dicts with conversation metadata and disk size.

gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) None

List recent chat conversations and optionally summarize them using an LLM.

Parameters:
  • max_results (int) – Maximum number of conversations to display.

  • include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.

gptme.tools.chats.read_chat(id: str, max_results: int = 5, incl_system: bool = False, context_messages: int = 0, start_message: int | None = None) None

Read a specific conversation log.

Parameters:
  • id (str) – The id of the conversation to read.

  • max_results (int) – Maximum number of messages to display.

  • incl_system (bool) – Whether to include system messages.

  • context_messages (int) – Number of messages to show before start_message.

  • start_message (int | None) – Start from this message number (1-indexed), if specified.

gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False, sort: Literal['date', 'count'] = 'date', context_lines: int = 1, max_matches: int = 1) None

Search past conversation logs for the given query and print a summary of the results.

Parameters:
  • query (str) – The search query.

  • max_results (int) – Maximum number of conversations to display.

  • system (bool) – Whether to include system messages in the search.

  • context_lines (int) – Number of lines to show around each match.

  • max_matches (int) – Maximum number of matches to show per conversation.

gptme.tools.chats.search_external_chats(query: str, max_results: int = 5, include_cursor: bool = True, include_codex: bool = True, cursor_dir: Path | None = None, codex_dir: Path | None = None) None

Search external agent sessions (Cursor, Codex) for query.

Prints matching snippets grouped by source agent, with a [Cursor] or [Codex] label so results are clearly distinguished from gptme native results.

Parameters:
  • query – The search query (case-insensitive substring match).

  • max_results – Maximum number of sessions to display.

  • include_cursor – Whether to search Cursor sessions.

  • include_codex – Whether to search Codex sessions.

  • cursor_dir – Override the default Cursor conversations directory.

  • codex_dir – Override the default Codex sessions directory.

Computer#

Warning

The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.

See How to Automate GUIs with Computer Use for practical recipes: prerequisites, backend selection, web vs. native automation, and the observe-act-verify loop.

Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.

The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.

Features

  • Keyboard input simulation

  • Mouse control (movement, clicks, dragging)

  • Screen capture with automatic scaling

  • Cursor position tracking

Installation

On Linux, requires X11 and xdotool:

# On Debian/Ubuntu
sudo apt install xdotool

# On Arch Linux
sudo pacman -S xdotool

On macOS, uses native screencapture and external tool cliclick:

brew install cliclick

You need to give your terminal both screen recording and accessibility permissions in System Preferences.

Configuration

The tool uses these environment variables:

  • DISPLAY: X11 display to use (default: “:1”, Linux only)

  • WIDTH: Screen width (default: 1024)

  • HEIGHT: Screen height (default: 768)

  • GPTME_COMPUTER_CONFIRM_SENSITIVE: Pre-execution gate for sensitive actions (type, key, left_click_drag, fill_element). Values: - unset / “0”: gate disabled (default, back-compatible) - “1”: gate enabled; interactive sessions prompt the user, non-interactive sessions block - “auto-allow”: gate enabled but approves silently (useful in automated scripts)

Usage

The tool supports these actions:

Keyboard:
  • key: Send key sequence (e.g., “Return”, “Control_L+c”)

  • type: Type text with realistic delays

Mouse:
  • mouse_move: Move mouse to coordinates

  • left_click: Click left mouse button

  • right_click: Click right mouse button

  • middle_click: Click middle mouse button

  • double_click: Double click left mouse button

  • left_click_drag: Click and drag to coordinates

Screen:
  • screenshot: Take and view a screenshot

  • cursor_position: Get current mouse position

  • wait_for_change: Poll until screen changes, then return one screenshot

Window management:
  • window_focus: Wait for a window matching a name pattern to appear and focus it

Accessibility (cross-platform):
  • accessibility_tree: Dump the native accessibility tree for all visible apps. On Linux uses AT-SPI2 (role names like “push button”, “entry”). On macOS uses System Events via AppleScript (role names like “AXButton”, “AXTextField”).

  • click_accessible_element: Find and click an element by role and name (text=’role:name’). Linux example: text=’push button:Submit’ macOS example: text=’AXButton:Submit’

The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.

Tips for Complex Operations

For complex operations involving multiple keypresses, you can use semicolon-separated sequences with key:

Examples

  • Filling a login form: t:username;kp:tab;t:password;kp:return

  • Switching applications: cmd+tab on macOS, alt+Tab on Linux

  • (macOS) Opening Spotlight and searching: cmd+space;t:firefox;return

Using a single sequence for complex operations ensures proper timing and recognition of keyboard shortcuts.

Instructions

You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.

### When to use the computer tool

Use computer for GUI interactions that cannot be done through the shell: clicking
elements in running applications, typing into GUI windows, taking screenshots to
verify visual state, and keyboard shortcuts in desktop apps. Prefer the shell or
tmux over computer for anything that has a CLI equivalent. Use computer when the
task requires direct screen interaction — for example, operating a browser UI,
a desktop app, or an interactive installer that has no headless mode.

The key input syntax works consistently across platforms with:

Available actions:
- key: Send key sequence using a unified syntax:
  - Type text: "t:Hello World"
  - Press key: "return", "esc", "tab"
  - Key combination: "ctrl+c", "cmd+space"
  - Chain commands: "cmd+space;t:firefox;return"
- type: Type text with realistic delays (legacy method)
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- scroll: Scroll the mouse wheel at coordinates (text="up"/"down"/"left"/"right")
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position
- wait_for_change: Wait until the screen changes, then return a single screenshot.
  Loops internally until ≥1% of pixels differ from the initial capture, or the
  timeout (text="<seconds>", default 10) elapses. Returns one screenshot regardless
  of how many internal polls were needed — avoids stacking redundant screenshots in
  the conversation context. Use after triggering an action that produces a visual
  response (page load, dialog open, animation finish).
- window_focus: Wait for a window whose title contains text=<pattern> to appear,
  then focus it. On Linux/X11 this uses xdotool --sync so no screenshot polling
  is needed. Use after opening a new application to avoid guessing where to click.
- accessibility_tree: Dump the native accessibility tree for all open applications.
  On Linux (AT-SPI2): role names like 'push button', 'entry', 'check box'.
    Requires: pip install pyatspi (and AT-SPI2 accessibility stack).
  On macOS (System Events): role names like 'AXButton', 'AXTextField', 'AXCheckBox'.
    Requires Accessibility permission for the terminal in System Preferences.
  Use this to discover element names and roles before using click_accessible_element.
- click_accessible_element: Find and click an element by role and name without
  needing screen coordinates. Use text='role:name' where role is the platform role
  name and name is a substring of the element's accessible name. Examples:
    Linux:  computer('click_accessible_element', text='push button:Submit')
    macOS:  computer('click_accessible_element', text='AXButton:Submit')

### Accessibility-first for native apps

Prefer click_accessible_element over coordinate-based clicks for native apps:

  computer("accessibility_tree")                               # inspect available elements
  # Linux:
  computer("click_accessible_element", text="entry:Username")  # fill username field
  computer("type", text="user@example.com")
  computer("click_accessible_element", text="push button:Log In")
  # macOS:
  computer("click_accessible_element", text="AXTextField:Username")
  computer("type", text="user@example.com")
  computer("click_accessible_element", text="AXButton:Log In")

This is more robust than coordinate guessing: element names don't shift when
window size or position changes. Use coordinate-based clicks only when the app
lacks accessibility support (e.g. electron apps, games, canvas-based UIs).

### Efficient action-verify loops

Prefer ``act_and_observe()`` over separate ``computer()`` + ``wait_for_change``:

  act_and_observe("left_click", coordinate=(760, 540))  # trigger action, see result

This combines the action and observation into one call, preventing the conversation
from accumulating multiple nearly-identical screenshots during transitions.
Only call ``screenshot()`` directly when you need the current state without waiting.

### Opening new windows without guessing their position

Prefer window_focus over clicking at a guessed coordinate after launching a window:

  computer("key", text="ctrl+alt+t")         # open terminal
  computer("window_focus", text="Terminal")   # wait for it, then focus it
  computer("type", text="echo hello")         # type into the now-focused window

This avoids the delay/click-at-random pattern that fails when window position
varies across sessions or virtual displays.

Note: Key names are automatically mapped between platforms.
Common modifiers (ctrl, alt, cmd/super, shift) work consistently across platforms.

### Observation helpers (structured-first policy)

Three higher-level helpers are available that implement the structured-first observation policy:

- ``observe_web(url, screenshot_too=False)`` — observe a web page using ARIA snapshots first
  (no vision tokens), with automatic fallback to a browser screenshot, then desktop screenshot.
  Pass ``screenshot_too=True`` to get both an ARIA snapshot AND a screenshot side by side.
- ``observe_desktop()`` — thin wrapper around ``computer('screenshot')`` that signals intent
  clearly for native apps and non-browser surfaces.
- ``act_and_observe(action, text=None, coordinate=None, timeout=3.0)`` — perform a desktop
  action **and** automatically observe the result. Combines ``computer(action, ...)`` with
  ``wait_for_change`` in one call — the complete "act then look" loop without separate
  screenshot calls. Use this for tight interaction loops where you want to see the screen
  after every click, keypress, or scroll.
- ``computer_task(task, timeout=300, model=None)`` — run a multi-step computer-use task
  in a **context-isolated subagent** and block until done. All screenshots and intermediate
  steps are kept inside the subagent's own context — the caller's context stays lean. Use
  this for long, multi-step automations (filling forms, navigating multi-page flows, running
  GUI apps) where piling dozens of screenshots into the current context would be wasteful.
  Returns a status dict with ``status`` and ``result`` keys.

These helpers are preferred over calling ``computer("screenshot")`` directly when observing
web pages, because ARIA snapshots avoid costly vision tokens and give a DOM-addressable tree.

Examples

User
Take a screenshot of the desktop
Assistant
I'll capture the screen using the screenshot tool.
ipython
computer("screenshot")
System
Viewing image...
User
Type "Hello, World!" into the active window
Assistant
I'll type the text with realistic delays.
ipython
computer("type", text="Hello, World!")
System
Typed text: Hello, World!
User
Move the mouse to coordinates (100, 200) and click
Assistant
I'll move the mouse and perform a left click.
ipython
computer("mouse_move", coordinate=(100, 200))
System
Moved mouse to 100,200
ipython
computer("left_click")
System
Performed left_click
User
Get the current mouse position
Assistant
I'll get the cursor position.
ipython
computer("cursor_position")
System
Cursor position: X=512,Y=384
User
Double-click at current position
Assistant
I'll perform a double-click.
ipython
computer("double_click")
System
Performed double_click
User
Scroll down in the page at (512, 400)
Assistant
I'll scroll down at those coordinates.
ipython
computer("scroll", coordinate=(512, 400), text="down")
System
Scrolled down at 512,400
User
Click the Submit button then wait for the result page to load
Assistant
I'll use act_and_observe to click Submit and automatically get a screenshot once the screen settles.
ipython
act_and_observe("left_click", coordinate=(760, 540))
System
Screen changed (23.4% pixels differ)
Viewing image...
User
Open a terminal and run a command
Assistant
I'll open a terminal with a keyboard shortcut, then use act_and_observe for window_focus so the shell prompt has time to appear before I type.
ipython
computer("key", text="ctrl+alt+t")
System
Sent key sequence: ctrl+alt+t
ipython
act_and_observe("window_focus", text="Terminal")
System
Screen changed (18.7% pixels differ)
Viewing image...
ipython
act_and_observe("type", text="ls -la" + chr(10))
System
Screen changed (12.3% pixels differ)
Viewing image...
User
Read the content of https://news.ycombinator.com
Assistant
I'll use observe_web to get a structured ARIA snapshot of the page — faster and cheaper than a screenshot.
ipython
observe_web("https://news.ycombinator.com")
System
[ARIA snapshot of Hacker News front page...]
User
Check what's on my desktop right now
Assistant
I'll capture a screenshot of the desktop.
ipython
observe_desktop()
System
Viewing image...
User
Navigate to https://example.com and verify both the text content and visual layout
Assistant
I'll use observe_web with screenshot_too=True to get both the ARIA snapshot and a screenshot.
ipython
observe_web("https://example.com", screenshot_too=True)
System
[ARIA snapshot + screenshot of example.com]
User
Open Firefox, go to https://x.com/compose/tweet, type "Hello from gptme!" and submit it — without filling up my context with screenshots
Assistant
I'll delegate this to computer_task() so all the intermediate screenshots stay in a subagent context rather than here.
ipython
computer_task("Open Firefox, navigate to https://x.com/compose/tweet, type 'Hello from gptme!', and click the Tweet button.", timeout=120)
System
{"status": "success", "result": "Tweet submitted successfully. Firefox opened, x.com/compose/tweet loaded, typed the message, clicked Tweet. Confirmed tweet posted.", "agent_id": "computer-task-a1b2c3d4"}
User
Open a new browser tab
Assistant
I'll open a new browser tab.
ipython
computer("key", text="ctrl+t")
System
Sent key sequence: ctrl+t
class gptme.tools.computer.ComboOperation
class gptme.tools.computer.KeyOperation
class gptme.tools.computer.ScreenRecording

Handle for an in-progress screen recording.

Returned by start_recording(). Call .stop() to finish the recording and get the output path. Also usable as a context manager:

with start_recording("session.mp4") as rec:
    # ... do things on screen ...
    pass  # recording stops here
print(rec.output_path)  # path to the MP4
output_path

Destination file path (set at construction time).

__init__(process: Popen, output_path: Path, stderr: IO[bytes] | None = None) None
stop() Path

Stop the recording. Safe to call more than once.

Returns:

Path to the completed video file.

class gptme.tools.computer.TextOperation
gptme.tools.computer.act_and_observe(action: Action, text: str | None = None, coordinate: tuple[int, int] | None = None, timeout: float = 3.0, settle_time: float = 0.2) list[Message]

Perform a desktop action then automatically observe the result.

Implements the “act → look” half of the computer-use loop in one call, eliminating the separate computer('wait_for_change') step after every interaction. The screen is polled until it settles (up to timeout seconds), then a single screenshot is returned — exactly the same behaviour as computer('wait_for_change') but wired directly after the requested action.

For observation-only actions ("screenshot", "cursor_position", "accessibility_tree", "wait_for_change") the call is passed through unchanged: no extra screenshot is appended.

Parameters:
  • action – Desktop action to perform — same values as computer().

  • text – Text to type or key sequence (forwarded to computer()).

  • coordinate – Mouse coordinates (forwarded to computer()).

  • timeout – Seconds to wait for a screen change after the action (default 3 s).

  • settle_time – After detecting the first screen change, keep polling until the screen stops changing for settle_time consecutive seconds (default 0.2 s). This handles multi-phase UI transitions — e.g. a terminal frame appearing first and the shell prompt rendering shortly after — so the returned screenshot always shows the final settled state rather than a transient intermediate frame. Set to 0.0 to get the original behaviour (return on first change).

Returns:

  • For state-changing actions: zero or one action-output message (if the action itself produces output) plus a screenshot of the settled screen after the change.

  • For observation-only actions: just the output of computer().

Return type:

List of Message objects

Example (from IPython in a computer-use session):

# Click a button and see the screen update — one call, no polling
msgs = act_and_observe("left_click", coordinate=(760, 540))

# Type text and immediately verify what appeared
msgs = act_and_observe("type", text="hello world")

# Open a terminal and wait for the shell prompt (multi-phase transition)
# act_and_observe uses settle_time=0.2 by default: frame appears first,
# then the shell prompt, then 0.2s of quiet → returned screenshot shows prompt
msgs = act_and_observe("window_focus", text="Terminal")

# Observation-only actions are passed through unchanged
msgs = act_and_observe("screenshot")  # same as [computer("screenshot")]
gptme.tools.computer.computer(action: Action, text: str | None = None, coordinate: tuple[int, int] | None = None) Message | None

Perform computer interactions in X11 or macOS environments.

Parameters:
  • action – The type of action to perform

  • text – Text to type or key sequence to send

  • coordinate – X,Y coordinates for mouse actions

gptme.tools.computer.computer_task(task: str, timeout: int = 300, model: str | None = None) dict

Run a computer-use task in a context-isolated subagent.

Spawns a child agent with the computer-use profile and blocks until it completes (or times out). All screenshots and intermediate steps stay inside the subagent’s own context, so the caller’s context remains lean — this is the “context-efficient tool-use loop until goal is achieved” pattern described in gptme/gptme#216.

Use this instead of issuing a long chain of computer() + act_and_observe() calls directly when the task has many steps, or when you don’t want dozens of screenshots piling up in the current context.

Parameters:
  • task – Natural-language description of what to accomplish.

  • timeout – Maximum seconds to wait before giving up (default 300 = 5 min).

  • model – Optional model override for the subagent.

Returns:

Status mapping with keys:

  • status: "success" / "failure" / "clarification_needed" / "timeout"

  • result: text summary from the subagent

  • agent_id: subagent identifier — pass to subagent_read_log() for the full transcript

  • conversation: conversation name for the audit CLI (gptme-util computer audit-log CONVERSATION)

  • logdir: absolute path to the subagent’s conversation directory (str)

"clarification_needed" is returned if the subagent needs more information before it can complete the task. "timeout" is returned when the wall-clock deadline is reached before the subagent finishes. The worker thread may still wind down in the background, but callers immediately see the terminal timeout result.

Return type:

dict

Example (from IPython in a gptme session):

# Compose a tweet without piling screenshots into this context
result = computer_task(
    "Open Firefox, navigate to https://x.com/compose/tweet, "
    "type 'Hello from gptme!', and click Tweet.",
    timeout=120,
)
print(result["status"], result["result"])

# Audit what the subagent actually did (computer-use actions only)
import subprocess
subprocess.run(["gptme-util", "computer", "audit-log", result["conversation"]])

# Read the full step-by-step transcript
from gptme.tools.subagent import subagent_read_log
print(subagent_read_log(result["agent_id"]))
gptme.tools.computer.observe_desktop() Message | None

Observe the current desktop state via screenshot.

Thin wrapper around computer('screenshot') that makes the structured-first / screenshot-fallback policy explicit: call this when there is no URL to snapshot (native apps, the raw desktop, or any non-browser surface).

Returns:

Screenshot Message, or None if capture failed.

Example (from IPython in a computer-use session):

msg = observe_desktop()
# Equivalent to computer("screenshot"), but signals intent clearly.
gptme.tools.computer.observe_web(url: str, screenshot_too: bool = False) list[Message]

Observe a web page: structured ARIA snapshot first, screenshot as fallback.

Implements the structured-first observation policy: prefer accessibility snapshots for web targets — they avoid vision-token cost and give a DOM-addressable tree. Use screenshot_too=True when you need pixel-level visual confirmation alongside the structured snapshot (e.g. to verify layout or canvas content).

Falls back to a browser screenshot, then to a desktop screenshot, if Playwright is not available.

Parameters:
  • url – Page URL to observe.

  • screenshot_too – If True, also take a screenshot even when a snapshot succeeded.

Returns:

List of Message objects (snapshot and/or screenshots). Always returns at least one message; if all observation paths fail, returns a single system message explaining what failed and how to fix it.

Example (from IPython in a computer-use session):

msgs = observe_web("https://news.ycombinator.com")
# Returns one Message containing the ARIA snapshot text.

msgs = observe_web("https://example.com", screenshot_too=True)
# Returns snapshot Message + screenshot Message side-by-side.
gptme.tools.computer.record_screen(output: str | Path | None = None, duration: float = 10.0, fps: int = 10, display: str | None = None) Path

Record the screen for a fixed duration and return the output path.

Synchronous wrapper around start_recording() / ScreenRecording.stop(). Blocks for duration seconds.

Parameters:
  • output – Destination path for the MP4 file. Defaults to a timestamped file in the system temp directory.

  • duration – How many seconds to record (default 10).

  • fps – Frames per second (default 10).

  • display – X11 display string (Linux only). Defaults to $DISPLAY.

Returns:

Path to the finished MP4 file.

Raises:

RuntimeError – If ffmpeg is not found or recording fails to start.

Example (from IPython in a computer-use session):

path = record_screen("tweet-demo.mp4", duration=30)
print(f"Recording saved to {path}")
gptme.tools.computer.start_recording(output: str | Path | None = None, fps: int = 10, display: str | None = None) ScreenRecording

Start recording the screen to an MP4 file.

Uses ffmpeg with x11grab (Linux) or avfoundation (macOS). Returns a ScreenRecording handle — call .stop() to finish or use it as a context manager.

Parameters:
  • output – Destination path for the MP4 file. Defaults to a timestamped file in the system temp directory.

  • fps – Frames per second (default 10 — suitable for UI demos; increase to 24+ for smooth game recordings).

  • display – X11 display string (Linux only). Defaults to $DISPLAY.

Returns:

ScreenRecording handle. Call .stop() when done.

Raises:

RuntimeError – If ffmpeg is not found or recording fails to start.

Example (from IPython in a computer-use session):

rec = start_recording("tweet-demo.mp4")
# ... interact with the browser ...
rec.stop()  # saves tweet-demo.mp4

# Or as a context manager:
with start_recording("demo.mp4") as rec:
    computer_task("open Firefox and navigate to https://example.com")
print(rec.output_path)

RAG#

RAG (Retrieval-Augmented Generation) tool for context-aware assistance.

The RAG tool provides context-aware assistance by indexing and semantically searching text files.

Installation

The RAG tool requires the gptme-rag CLI to be installed:

pipx install gptme-rag

Configuration

Configure RAG in your gptme.toml:

[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.

Features

  1. Manual Search and Indexing

    • Index project documentation with rag_index

    • Search indexed documents with rag_search

    • Check index status with rag_status

  2. Conversation Indexing

    • Index past gptme conversations with rag_index_conversations

    • Only indexes user and assistant messages (skips system prompts)

    • Enables semantic search across your conversation history

  3. Automatic Context Enhancement

    • Retrieves semantically similar documents

    • Preserves conversation flow with hidden context messages

Instructions

### When to use RAG

Use RAG for semantic search across indexed documents when you do not know the
exact file location or keyword. Prefer `shell` with grep/ripgrep for exact
string or pattern matching. Use `read` when you already know the file path.
Index first with `rag_index`, then search with `rag_search`.

Examples

User
Index the current directory
Assistant
Let me index the current directory with RAG.
ipython
rag_index()
System
Indexed 1 paths
User
Search for documentation about functions
Assistant
I'll search for function-related documentation.
ipython
rag_search("function documentation")
System
### docs/api.md
Functions are documented using docstrings...
User
Show index status
Assistant
I'll check the current status of the RAG index.
ipython
rag_status()
System
Index contains 42 documents
User
Index my past conversations so I can search them
Assistant
I'll index your recent conversations with RAG.
ipython
rag_index_conversations()
System
Indexed 47 conversations.
Indexed 47 paths
User
Index only the last 10 conversations
Assistant
I'll index just the 10 most recent conversations.
ipython
rag_index_conversations(n=10)
System
Indexed 10 conversations.
Indexed 10 paths
gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) Message

Get relevant context chunks from RAG for the user query.

gptme.tools.rag.init() ToolSpec

Initialize the RAG tool.

gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) str

Index documents in specified paths.

gptme.tools.rag.rag_index_conversations(n: int = 100, output_dir: str | None = None) str

Index past gptme conversations for semantic search.

Exports user and assistant messages from conversation logs (skipping system prompts) into text files and indexes them with gptme-rag.

Parameters:
  • n – Maximum number of recent conversations to index (default: 100).

  • output_dir – Directory to write exported conversation files. Defaults to a temporary directory managed by gptme-rag.

Returns:

Status message from the indexing operation.

gptme.tools.rag.rag_search(query: str, return_full: bool = False, top_k: int | None = None) str

Search indexed documents.

gptme.tools.rag.rag_status() str

Show index status.

Morph#

Gives the LLM agent the ability to edit files using Morph Fast Apply v2.

Morph is a specialized code-patching LLM that applies edits at 4000+ tokens per second. It uses a different format than the patch tool: <code>original</code><update>changes</update>

Environment Variables:

OPENROUTER_API_KEY: Required for accessing Morph via OpenRouter

Instructions

Use this tool to propose an edit to an existing file.

### When to use morph vs patch

Use morph for large or complex edits where the changed lines are scattered
across a file and patch context markers would be verbose. Prefer patch for
small, targeted edits with clear context. Morph requires OPENROUTER_API_KEY.

Write a clear edit while minimizing unchanged code.
List each edit in sequence, using `// ... existing code ...` for untouched spans.
Repeat only enough original context to disambiguate the change.
If you delete a section, include surrounding context to show what is removed.

Examples

```morph example.py
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
```
gptme.tools.morph.execute_morph(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Applies the morph edit.

gptme.tools.morph.execute_morph_impl(content: str, path: Path | None, expected_original_content: str) Generator[Message, None, None]

Actual morph implementation - writes the edited content to file.

gptme.tools.morph.is_openrouter_available() bool

Check if OpenRouter is available for Morph tool.

gptme.tools.morph.preview_morph(content: str, path: Path | None) str | None

Prepare preview content for morph operation.

GH#

GitHub integration tool.

Use native handlers only when they help the assistant succeed more reliably than a raw gh command in the shell tool. The native path is worth it when it collapses several API calls into one response, keeps CI state structured and actionable, or adds merge safety guards that are easy to miss in ad-hoc CLI use.

Native operations that materially help:

  • issue view — combines issue body and comments in one call

  • pr view — combines PR body, comments, review-thread resolution, CI,

    and mergeability in one call

  • pr status — structured check-run summary with actionable run IDs

  • pr checks — polls CI until completion with live progress updates

  • pr merge — squash default, --match-head-commit guard, auto-merge

  • run view — extracts and structures failed log sections from CI runs

Adding a new native wrapper#

Before wrapping a gh subcommand, ask: “Will this help the assistant do better than a single gh command in the shell tool?” If not, don’t add it — the pass-through already covers it without bloating instructions.

Good candidates combine multiple API calls into one response, add safety guards, or poll/wait for completion.

gptme.tools.gh.execute_gh(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Execute GitHub operations.

Native handlers for high-value operations (issue view, pr view/status/checks/merge, run view). Everything else passes through to the gh CLI unchanged.

Choice#

Gives the assistant the ability to present multiple-choice options to the user for selection.

Instructions

The options can be provided as a question on the first line and each option on a separate line.
When using the ``options`` keyword argument, options may also be comma-separated.

The tool will present an interactive menu allowing the user to select an option using arrow keys and Enter, or by typing the number of the option.

### When to use choice

Use when you need to present the user with a discrete set of named alternatives and free-text input would be ambiguous. Don't use for simple yes/no confirmations; don't use when the next step is already clear from context.

Examples

Basic usage with options

User
What should we do next?
Assistant
Let me present you with some options:
choice
What would you like to do next?
Write documentation
Fix bugs
Add new features
Run tests
System
User selected: Add new features
User
What should we do next?
Assistant
Let me present you with some options:
choice
Example question?
1. Option one
2. Option two
System
User selected: Option two
gptme.tools.choice.execute_choice(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Present multiple-choice options to the user and return their selection.

gptme.tools.choice.parse_options_from_content(content: str) tuple[str | None, list[str]]

Parse options from content, returning (question, options).

gptme.tools.choice.parse_options_from_kwargs(kwargs: dict[str, str]) tuple[str | None, list[str]]

Parse options from args and kwargs, returning (question, options).

Elicit#

Gives the assistant the ability to request structured input from the user.

Elicitation supports multiple input types: - text: Free-form text input - choice: Single selection from options - multi_choice: Multiple selections from options - secret: Hidden input (API keys, passwords) with UI redaction - confirmation: Yes/No question - form: Multiple fields collected at once

Secret values are handled specially: the value is hidden from the chat display (hide=True) so it does not appear on screen, but it is passed to the LLM in-context so the agent can act on it (e.g. set it as an env var). The value is stored in the on-disk conversation log.

Instructions

### When to use elicit

Use elicit when you need structured user input that a plain text reply cannot
cleanly provide:
- **secret** — API keys, passwords, tokens. The value is hidden from the chat
  display so it does not appear over someone's shoulder; the LLM still receives
  it in-context so it can act on it (e.g. export as an env var or pass to a
  command). The conversation log on disk will contain the value.
- **choice / multi_choice** — present a fixed set of options so the user
  selects rather than types a free-form answer that you then have to parse.
- **confirmation** — ask yes/no before a destructive or irreversible action.
- **form** — collect several related fields in one interaction instead of a
  back-and-forth sequence.

Do **not** use elicit for simple open-ended questions that read naturally in
chat — a plain assistant message is clearer and less disruptive in those cases.

### Input types

- text: Free-form text input
- choice: Single selection from a list (specify options)
- multi_choice: Multiple selections from a list (specify options)
- secret: Hidden from display; LLM receives the value in-context to act on it
- confirmation: Yes/No question
- form: Multiple fields at once (specify JSON field definitions)

Examples

Ask for a secret API key

User
Set up the OpenAI integration
Assistant
I need your OpenAI API key to proceed. It will be hidden from normal chat display.
elicit
{
  "type": "secret",
  "prompt": "Enter your OpenAI API key:",
  "description": "Required for the OpenAI integration. Hidden from normal chat display."
}
System
User provided secret value (not shown)

Ask user to choose an option

User
Which database should we use?
Assistant
Let me ask the user their preference.
elicit
{
  "type": "choice",
  "prompt": "Which database should we use?",
  "options": [
    "PostgreSQL",
    "SQLite",
    "MySQL",
    "MongoDB"
  ]
}
System
User selected: PostgreSQL

Collect project setup information via form

User
Set up a new project
Assistant
Let me gather some details about the project.
elicit
{
  "type": "form",
  "prompt": "New project setup:",
  "fields": [
    {
      "name": "name",
      "prompt": "Project name?",
      "type": "text"
    },
    {
      "name": "language",
      "prompt": "Primary language?",
      "type": "choice",
      "options": [
        "python",
        "typescript",
        "rust"
      ]
    },
    {
      "name": "tests",
      "prompt": "Include tests?",
      "type": "boolean"
    }
  ]
}
System
Form submitted: {"name": "my-project", "language": "python", "tests": true}
gptme.tools.elicit.execute_elicit(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Execute elicitation and return user’s response.

For secret types, the value is returned to the agent with UI redaction.

gptme.tools.elicit.parse_elicitation_spec(code: str) ElicitationRequest | None

Parse an elicitation spec from JSON.

Form#

Gives the assistant the ability to present a form with multiple fields for user input.

This tool enables structured data collection from users through an interactive form with support for different field types: text, select, boolean, and number.

Instructions

### When to use the form tool

Use the form tool when you need to collect multiple related fields from the
user in a single interaction. It is well suited for structured data collection
where field types are text, select (choose from a list), boolean (yes/no), or
number.

For **secrets** (API keys, passwords) use the ``elicit`` tool instead — form
does not have a secret type, so credentials would appear in the chat display.

For a single question or free-form input, a plain assistant message is simpler.
Prefer form when collecting two or more related fields at once.

### Form syntax

Each field is specified on a separate line with the format:
  field_name: Prompt text [options]

Field types are inferred from the prompt:
- Text field (default): `name: What's your name?`
- Select field: `priority: Priority level [low, medium, high]`
- Boolean field: `confirm: Are you sure? [yes/no]`
- Number field: `count: How many? (number)`

The tool will present an interactive form and return the collected data as JSON.

Examples

Collect project information

User
I want to start a new project
Assistant
Let me gather some information about your project:
form
name: Project name?
description: Brief description?
language: Primary language [python, javascript, rust, go, other]
priority: Priority level [low, medium, high]
System
Form submitted:
json
{"name": "my-awesome-project", "description": "A CLI tool for productivity", "language": "python", "priority": "high"}

Simple confirmation form

User
Deploy to production
Assistant
Please confirm the deployment details:
form
environment: Target environment [staging, production]
confirm: Proceed with deployment? [yes/no]
System
Form submitted:
json
{"environment": "production", "confirm": true}
gptme.tools.form.execute_form(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Present a form to the user and collect their responses.

gptme.tools.form.parse_field(line: str) dict | None

Parse a field definition line into a field specification.

gptme.tools.form.parse_form_content(content: str) list[dict]

Parse form content into a list of field specifications.

Precommit#

Pre-commit hook tool that automatically runs pre-commit checks after file saves.

This tool automatically runs pre-commit checks in two scenarios:

  1. Per-file checks (FILE_SAVE_POST (file.save.post)): After each file is saved - Runs pre-commit on the specific saved file - Provides immediate feedback on formatting/linting issues

  2. Full checks (TURN_POST (turn.post)): After message processing completes - Runs pre-commit on all modified files - Ensures all changes pass checks before auto-commit

Commands: - /pre-commit: Manually run pre-commit checks

Pre-commit checks include: - Code formatting (black, prettier, etc.) - Linting (ruff, eslint, etc.) - Type checking (mypy, etc.) - Other configured hooks

The tool will report any failures and suggest fixes.

Enable with: –tools precommit Or configure pre-commit checks via: GPTME_CHECK=true

Autocommit#

Autocommit hook tool that automatically provides hints for committing changes after message processing.

When GPTME_AUTOCOMMIT=true is set, after each message is processed: 1. Checks if there are file modifications 2. If modifications exist, returns a message asking the LLM to review and commit

The tool hooks into TURN_POST (turn.post) and runs with low priority (after pre-commit checks and other validation).

To enable autocommit: `bash export GPTME_AUTOCOMMIT=true `

Vent#

Vent/feedback tool — emits in-the-moment friction signals to a durable ledger.

The agent calls this when stuck or frustrated. Signals are written to:

~/.local/share/gptme/friction-ledger.jsonl

Rate-limited to one vent per turn to prevent recursive venting spirals (Lovable found agents can spiral into 43+ vents without this guard).

Usage:

```vent
pytest exits 0 with "no tests found" even though tests/test_vent.py
exists. Tried --co and an explicit path; the discovery config is wrong.
Owner: tooling
```

Resolution owner (axis 1 — who/what unblocks this) is an optional, small, stable enum captured at vent time. Richer theme/cause clustering happens later at analysis time, so keep the capture label thin:

self Solvable now with better prompting / context / reasoning tooling Needs a tool / permission / config / env change operator Needs a human (decision, credential, approval, account action) upstream Needs a fix in a dependency we don’t own architectural Not solvable in the current stack design

The Type keyword accepts both deprecated legacy aliases (Type1->self, Type2a->tooling, Type2b->architectural, Type0->operator) and current taxonomy values (e.g. Type: self, Type: tooling).

gptme.tools.vent.execute_vent(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Message

Append a friction entry to the ledger.

Complete#

Complete tool - signals that the autonomous session is finished.

exception gptme.tools.complete.SessionCompleteException

Exception raised to signal that the session should end.

gptme.tools.complete.auto_reply_hook(manager: LogManager, interactive: bool, prompt_queue: Any, no_confirm: bool = False) Generator[Message | StopPropagation, None, None]

Hook that implements auto-reply mechanism for autonomous operation.

If in non-interactive mode and last assistant message had no tools, inject an auto-reply to ensure the assistant does work.

In interactive + no_confirm mode (gptme -y), inject a quiet nudge once to avoid piling on, then let the loop continue naturally.

This is called via LOOP_CONTINUE hook, which receives interactive, prompt_queue, and no_confirm.

Parameters:
  • manager – Conversation manager with log and workspace

  • interactive – Whether in interactive mode

  • prompt_queue – Queue of pending prompts

  • no_confirm – Whether tool confirmations are skipped (–no-confirm / -y mode)

gptme.tools.complete.complete_hook(messages: list[Message], **kwargs) Generator[Message | StopPropagation, None, None]

Hook that detects complete tool call and prevents next generation.

Runs at GENERATION_PRE (before generating response) to stop the session immediately after complete tool is called.

Parameters:
  • messages – List of conversation messages

  • **kwargs – Additional arguments (workspace, manager - currently unused)

Note: GENERATION_PRE hooks are called with messages as first positional arg, not manager as the Protocol suggests. This is a known type safety issue.

gptme.tools.complete.execute_complete(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Message

Signal that the autonomous session is complete and ready to exit.

gptme.tools.complete.stuck_detect_hook(manager: LogManager, interactive: bool, prompt_queue: Any, no_confirm: bool = False) Generator[Message | StopPropagation, None, None]

Detect a stuck agent that keeps issuing the same tool call(s).

Unlike auto_reply_hook (which only acts when the last assistant message has no tool uses), this hook fires when the agent does emit tool uses but keeps repeating an identical action without progress — a silent failing loop that would otherwise run until the budget or session timeout is hit.

Registered as a separate LOOP_CONTINUE hook at higher priority than auto_reply_hook so it can observe the yes-tool-but-repeating case the latter early-returns on. Mutates nothing; only yields a system nudge and, after repeated escalations, raises SessionCompleteException.

See gptme/gptme#2725 and the design note in Bob’s workspace.

Restart#

Restart the gptme process.

This tool allows restarting gptme from within a conversation, which can be useful for applying configuration changes, reloading tools, or recovering from state issues.

gptme.tools.restart.execute_restart(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Execute restart by confirming intent.

The actual restart happens in the restart_hook (GENERATION_PRE), after all messages have been saved to the log.

gptme.tools.restart.restart_hook(messages: list[Message], **kwargs) Generator[Message, None, None]

Hook that detects restart tool call and performs the restart.

Runs at GENERATION_PRE (before generating response) to restart immediately after the restart tool is called.

By this point, all messages (including the assistant’s restart message and the system confirmation) have been saved to the log.

Lessons#

Lesson system tool for gptme.

Provides structured lessons with metadata that can be automatically included in context. Similar to .cursorrules or “Claude Skills”. Has keyword-based triggering.

Commands provided:

  • /lesson list - View all available lessons

  • /lesson search <query> - Find lessons matching query

  • /lesson show <id> - Display a specific lesson

  • /lesson refresh - Reload lessons from disk

class gptme.tools.lessons.LessonSessionStats

Statistics about lessons matched during a session.

__init__(total_matched: int = 0, unique_lessons: set[str] = <factory>, lesson_titles: dict[str, str] = <factory>) None
gptme.tools.lessons.auto_include_lessons_hook(manager: LogManager) Generator[Message | StopPropagation, None, None]

Hook to automatically include relevant lessons in context.

Extracts keywords from both user and assistant messages to trigger lessons.

Parameters:

manager – Conversation manager with log and workspace

Returns:

Generator of messages to prepend (lessons as system message)

gptme.tools.lessons.handle_lesson_command(ctx: CommandContext) Generator[Message, None, None]

Handle /lesson command.

gptme.tools.lessons.session_end_lessons_hook(manager: LogManager, **kwargs) Generator[Message | StopPropagation, None, None]

Hook to print lesson statistics at end of session.

Parameters:
  • manager – Conversation manager with log and workspace

  • **kwargs – Additional arguments (e.g., logdir)

Yields:

Nothing (just logs statistics)

Todo#

A working memory todo tool for conversation-scoped task planning.

This tool provides a lightweight todo list that exists within the current conversation context, complementing the existing persistent task management system in gptme-agent-template.

Key principles: - Working Memory Layer: Ephemeral todos for current conversation context - Complements Persistent Tasks: Works alongside existing task files without conflicts - Simple State Model: pending, in_progress, completed, paused - Conversation Scoped: Resets between conversations, doesn’t persist to disk - Auto-replay: Automatically restores todo state when resuming conversations

gptme.tools.todo.get_incomplete_todos_summary() str

Get a summary of incomplete todos for continuation prompts.

Returns:

A formatted string listing incomplete todos, or empty string if none.

gptme.tools.todo.has_incomplete_todos() bool

Check if there are any incomplete todos in working memory.

Used by auto_reply_hook to determine if the agent should continue working instead of being asked about completion.

Returns:

True if there are any pending or in_progress todos.

MCP#

The Model Context Protocol (MCP) allows you to extend gptme with custom tools through external servers. See MCP for configuration and usage details.

MCP server discovery and management tool.

Allows searching for MCP servers in registries and dynamically loading/unloading them.

Available Commands: - /mcp search [query] - Search for MCP servers across all registries - /mcp info <server-name> - Get detailed information about a specific server - /mcp load <server-name> [config-json] - Dynamically load an MCP server into the current session - /mcp unload <server-name> - Unload a previously loaded MCP server - /mcp list - List all currently configured and loaded MCP servers

The search command queries the Official MCP Registry (registry.modelcontextprotocol.io). Once loaded, server tools are available as <server-name>.<tool-name>.

Instructions

### When to use the mcp tool

Use mcp to discover, load, inspect, and manage MCP servers:
- `/mcp search <capability>` finds new servers
- `/mcp load <server-name>` loads one
- `/mcp list` shows what is already loaded

Do not use mcp to call loaded tools; invoke them directly as
`<server-name>.<tool-name>`.

Search uses the Official MCP Registry (registry.modelcontextprotocol.io).

Other commands:
- `info <server>` shows server details
- `resources list/read` browses server resources
- `templates list` lists resource templates
- `prompts list/get` lists or fetches prompts
- `roots list/add/remove` manages advisory workspace roots

Examples

```mcp
search sqlite
```

```mcp
info sqlite
```

```mcp
load sqlite
```

```mcp
list
```

```mcp
unload sqlite
```

```mcp
load my-server
{"command": "uvx", "args": ["my-mcp-server", "--option"]}
```

```mcp
resources list sqlite
```

```mcp
resources read sqlite db://main/users
```

```mcp
templates list sqlite
```

```mcp
prompts list sqlite
```

```mcp
prompts get sqlite create-query {"table": "users"}
```

```mcp
roots list
```

```mcp
roots add filesystem file:///home/user/project Project
```

```mcp
roots remove filesystem file:///home/user/project
```
gptme.tools.mcp.examples(tool_format: str) str

Return example usage.

gptme.tools.mcp.execute_mcp(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Execute MCP management commands.

Tool Selection & Allowlists#

By default gptme loads its full built-in toolset. You can restrict which tools are active for a given run — either to reduce the agent’s surface area or to build read-only / sandboxed profiles.

Basic usage#

Pass a comma-separated list of tool names to --tools (CLI) or set the TOOL_ALLOWLIST environment variable:

# Exact names — only these tools are loaded
gptme --tools save,patch,shell,python "refactor this file"

# Additive: start from defaults and add more
gptme --tools +rag,browser "research this topic"

# Subtractive: start from defaults and remove specific tools
gptme --tools -shell,computer "safer mode"

# Disable all tools (pure conversation)
gptme --tools "" "just talk to me"

Glob patterns (*, ?, [...]) are also supported, matched against tool names with fnmatch.fnmatchcase().

Hint-based patterns#

Tools can carry capability hints — semantic tags that describe what a tool does. Hint-based allowlist entries let you match entire categories of tools at once using the hint: prefix:

# Allow only tools annotated as read-only (safe for untrusted workspaces)
gptme --tools "hint:read-only" "summarise this repo"

# Mix exact names with hint patterns
gptme --tools "shell,patch,hint:read-only" "analyse and fix"

The following hints are defined:

Hint

Meaning

read-only

Tool only reads state; never writes, creates, or deletes.

destructive

Tool may modify or delete state. Use with caution in automated runs.

idempotent

Tool is safe to call multiple times with the same arguments.

closed-world

Tool affects only local state; it does not make network requests or reach outside the current environment.

Note

Built-in gptme tools do not carry hints in the current release. Hints are currently populated automatically from MCP tool annotations (see below). You can also set hints explicitly when writing custom tools.

MCP tool annotations#

When gptme connects to an MCP server, each tool’s ToolAnnotations are mapped to gptme hints:

MCP annotation

Value

gptme hint

readOnlyHint

true

read-only

destructiveHint

true (and not read-only)

destructive

idempotentHint

true

idempotent

openWorldHint

false

closed-world

Example MCP server configuration that exposes a read-only filesystem tool:

{
  "name": "my-tools",
  "description": "My safe read-only tools",
  "tools": [
    {
      "name": "read_file",
      "description": "Read a file from disk",
      "annotations": {
        "readOnlyHint": true,
        "idempotentHint": true
      }
    }
  ]
}

Once connected, gptme --tools "hint:read-only" will include read_file while excluding any MCP tools without the read-only annotation.

Example profiles#

Read-only research agent — cannot write files or run commands:

gptme --tools "browser,rag,chats,hint:read-only" "research X"

Minimal coding agent — file editing only, no shell or browser:

gptme --tools "read,save,patch,morph,python" "refactor this module"

Safe MCP integration — built-in defaults plus only read-only MCP tools:

gptme --tools "+hint:read-only" "help me explore this codebase"

Subagent with restricted tool set — useful in [agent] config or when spawning subagents programmatically:

# gptme.toml
[env]
TOOL_ALLOWLIST = "shell,patch,save,read,hint:read-only"