model context protocol · host

Watch an LLM actually
use your tools.

Relay connects to remote MCP servers and runs the orchestration loop itself — the model calls tools, reads the results, and decides what to do next, streamed step by step.

Streamable HTTP · multi-server · provider-agnostic

orchestration stream
thinking
tool_call github__search_issues { q: "streaming" }
tool_result → 3 open issues
tool_call github__get_issue { number: 58213 }
tool_result → "App Router streaming stalls behind proxy"
final → Top streaming issue is #58213 — proxy buffering.

The host orchestration loop

Most demos let a desktop client do this. Relay builds it: decide → call tool → feed the result back → decide again. Capped iterations, multiple tool calls per turn, and tool errors recovered without crashing the turn.

decide→ call tool→ feed result→ repeat✓ final answer

Streamed live

Every thinking step, tool call, and result streams to the UI as it happens — not a spinner.

Many servers

Aggregate tools across servers; names are namespaced and routed so two servers can both expose search.

Secure & pluggable

SSRF-guarded URLs, bearer-token auth, and a provider-agnostic adapter — Azure, OpenAI, Groq, Gemini.

Yours, persisted

Sign in with Google or email; conversations save per-user in Firestore with full context memory.

How it works

01

Connect servers

Add remote MCP URLs (optionally with a token). Tools are discovered over Streamable HTTP.

02

Run the loop

Your message plus the aggregated tools go to the model; results feed back until it answers.

03

Watch it stream

Each step renders live; the conversation persists to your account.