Architecture¶

Overview¶

MCP Hangar manages MCP providers with explicit lifecycle, health monitoring, and automatic cleanup.

Key concepts:

- Providers: Subprocesses or containers exposing tools via JSON-RPC
- State machine: COLD → INITIALIZING → READY → DEGRADED → DEAD
- Health monitoring: Failure detection with circuit breaker
- GC: Automatic shutdown of idle providers

State Machine¶

     COLD
       │ ensure_ready()
       ▼
  INITIALIZING
       │
       ├─► SUCCESS ──► READY
       │                 │ failures >= threshold
       │                 ▼
       │              DEGRADED
       │                 │ backoff + retry
       │                 └──► READY
       │
       └─► FAILURE ──► DEAD
                         │ retry < max
                         └──► INITIALIZING
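
One way to read the diagram above is as a transition table. The following is a minimal sketch, not MCP Hangar's actual code: the names `ProviderState`, `ALLOWED`, and `transition` are hypothetical, and only the edges drawn in the diagram are encoded.

```python
from enum import Enum


class ProviderState(Enum):
    COLD = "cold"
    INITIALIZING = "initializing"
    READY = "ready"
    DEGRADED = "degraded"
    DEAD = "dead"


# Edges copied from the diagram; a real implementation may allow more
# (for example, returning to COLD after an idle shutdown).
ALLOWED = {
    ProviderState.COLD: {ProviderState.INITIALIZING},
    ProviderState.INITIALIZING: {ProviderState.READY, ProviderState.DEAD},
    ProviderState.READY: {ProviderState.DEGRADED},
    ProviderState.DEGRADED: {ProviderState.READY},
    ProviderState.DEAD: {ProviderState.INITIALIZING},
}


def transition(current: ProviderState, target: ProviderState) -> ProviderState:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Keeping the edges in one table makes illegal jumps (such as COLD directly to READY) fail loudly instead of silently corrupting provider state.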

Components¶

┌─────────────────────────────────────────────────────────┐
│                    MCP Hangar                           │
│  (FastMCP server, registry.* tools)                    │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│              Provider Manager                           │
│  - State machine    - Health tracking                  │
│  - Lock management  - Tool cache                       │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                Stdio Client                             │
│  - Message correlation    - Timeout management         │
│  - Reader thread          - JSON-RPC                   │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│           Provider Process                              │
│  (subprocess / docker / podman)                        │
└─────────────────────────────────────────────────────────┘

Background:
┌─────────────────────────────────────────────────────────┐
│  GC Worker: idle cleanup    Health Worker: checks      │
└─────────────────────────────────────────────────────────┘

Threading¶

Lock Hierarchy¶

Acquire locks in this order to avoid deadlocks:

1. Provider.lock (per-provider)
2. StdioClient.pending_lock (per-client)
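
The ordering rule can be illustrated with nested `with` blocks. This is a sketch under assumptions: the class bodies and the `cancel_all` method are hypothetical, and only the two lock names come from the hierarchy above.

```python
import threading


class StdioClient:
    def __init__(self):
        self.pending_lock = threading.Lock()  # level 2: per-client
        self.pending = {}


class Provider:
    def __init__(self):
        self.lock = threading.Lock()  # level 1: per-provider
        self.client = StdioClient()

    def cancel_all(self):
        """Hypothetical helper: drop pending requests, locking in hierarchy order."""
        with self.lock:  # 1. Provider.lock first
            with self.client.pending_lock:  # 2. then StdioClient.pending_lock
                dropped = len(self.client.pending)
                self.client.pending.clear()
        return dropped
```

Code that already holds `pending_lock` should never try to take `Provider.lock`; two threads taking the locks in opposite orders is the classic deadlock this rule prevents.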

Threads¶

Thread	Purpose
Main	FastMCP server, tool calls
Reader (per provider)	Read stdout, dispatch responses
GC Worker	Idle provider cleanup
Health Worker	Periodic health checks

Critical Section¶

# Fast path — check state without I/O
with lock:
    if state == READY and tool in cache:
        client = conn.client

# I/O outside lock
response = client.call(...)
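
A runnable version of this pattern might look as follows. It is a sketch only: `FakeClient`, the `Provider` dataclass, and `invoke` are hypothetical stand-ins, and raising an error replaces whatever slow path the real implementation takes when the provider is not READY.

```python
import threading
from dataclasses import dataclass, field


class FakeClient:
    """Stand-in for the stdio client; a real call would block on I/O."""

    def call(self, tool, arguments):
        return {"tool": tool, "arguments": arguments}


@dataclass
class Provider:
    state: str = "READY"
    cache: set = field(default_factory=lambda: {"add"})
    client: FakeClient = field(default_factory=FakeClient)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def invoke(self, tool, arguments):
        # Fast path: read shared state under the lock, no I/O.
        with self.lock:
            if self.state != "READY" or tool not in self.cache:
                # Simplification: a real provider would start or refresh here.
                raise RuntimeError(f"{tool!r} unavailable in state {self.state}")
            client = self.client
        # The lock is released here, so other threads can proceed during I/O.
        return client.call(tool, arguments)
```

Copying the client reference out of the locked region is what lets the blocking call run without holding `lock`.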

Error Handling¶

Category	Strategy
Transient (timeout)	Retry with backoff
Permanent (not found)	Fail fast, mark DEAD
Provider (app error)	Propagate, track metrics
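
The three strategies in the table could be expressed as one wrapper around a call. This is an illustrative sketch: the exception classes, `call_with_policy`, and the `metrics` dictionary are hypothetical names, not part of MCP Hangar.

```python
import time


class TransientError(Exception):
    """For example a timeout; worth retrying."""


class PermanentError(Exception):
    """For example a missing executable; retrying cannot help."""


class ProviderError(Exception):
    """Application-level error returned by the provider."""


def call_with_policy(fn, metrics, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted
            sleep(base_delay * 2 ** attempt)  # exponential backoff
        except PermanentError:
            metrics["state"] = "DEAD"  # fail fast, no retry
            raise
        except ProviderError:
            metrics["provider_errors"] = metrics.get("provider_errors", 0) + 1
            raise  # propagate to the caller unchanged
```

Passing `sleep` as a parameter is a design choice for testability: tests can substitute a no-op and avoid real delays.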

Circuit Breaker¶

READY (failures: 0)
  │ failure
READY (failures: N)
  │ threshold reached
DEGRADED (backoff)
  │ wait
COLD (retry eligible)
  │ ensure_ready()
READY (failures: 0)
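
The sequence above can be sketched as a small class. The names `CircuitBreaker`, `threshold`, `backoff_s`, and `poll` are assumptions for illustration, and `ensure_ready()` here only flips the state where a real implementation would relaunch the provider.

```python
import time


class CircuitBreaker:
    def __init__(self, threshold=3, backoff_s=5.0, clock=time.monotonic):
        self.threshold = threshold
        self.backoff_s = backoff_s
        self.clock = clock
        self.failures = 0
        self.state = "READY"
        self.retry_at = 0.0

    def record_failure(self):
        self.failures += 1
        if self.state == "READY" and self.failures >= self.threshold:
            self.state = "DEGRADED"
            self.retry_at = self.clock() + self.backoff_s

    def record_success(self):
        self.failures = 0

    def poll(self):
        """Move DEGRADED -> COLD once the backoff window has elapsed."""
        if self.state == "DEGRADED" and self.clock() >= self.retry_at:
            self.state = "COLD"
        return self.state

    def ensure_ready(self):
        """Restart from COLD; the actual provider launch is omitted here."""
        if self.state == "COLD":
            self.state = "READY"
            self.failures = 0
        return self.state
```

Injecting the clock keeps the backoff logic deterministic in tests.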

Message Correlation¶

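import json
from queue import Queue
from typing import Dict
from uuid import uuid4

class StdioClient:
    pending: Dict[str, PendingRequest]  # guarded by pending_lock

    def call(self, method, params, timeout):
        request_id = str(uuid4())  # str so the id survives JSON encoding
        queue = Queue(maxsize=1)
        with self.pending_lock:
            self.pending[request_id] = PendingRequest(queue)
        try:
            self.write({"id": request_id, "method": method, ...})
            return queue.get(timeout=timeout)  # raises queue.Empty on timeout
        finally:
            with self.pending_lock:
                self.pending.pop(request_id, None)  # no stale entry after a timeout

    def _reader_loop(self):
        while not self.closed:
            msg = json.loads(self.read_stdout())
            with self.pending_lock:
                request = self.pending.pop(msg.get("id"), None)
            if request is not None:  # skip notifications and late replies
                request.queue.put(msg)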
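
To see the correlation scheme work end to end, the sketch below replaces the provider's stdin/stdout with an in-memory queue and an echo responder. `LoopbackClient` and its `wire` attribute are hypothetical; only the request-id bookkeeping mirrors the pattern described here.

```python
import json
import queue
import threading
import uuid


class LoopbackClient:
    """Correlates replies to requests by id; a queue stands in for stdio."""

    def __init__(self):
        self.pending = {}
        self.pending_lock = threading.Lock()
        self.wire = queue.Queue()  # stand-in for the provider's pipes
        threading.Thread(target=self._reader_loop, daemon=True).start()

    def call(self, method, params, timeout=1.0):
        request_id = str(uuid.uuid4())
        reply = queue.Queue(maxsize=1)
        with self.pending_lock:
            self.pending[request_id] = reply
        try:
            request = {"id": request_id, "method": method, "params": params}
            self.wire.put(json.dumps(request))
            return reply.get(timeout=timeout)
        finally:
            with self.pending_lock:
                self.pending.pop(request_id, None)

    def _reader_loop(self):
        while True:
            request = json.loads(self.wire.get())
            # Echo provider: answer every request with its own params.
            msg = {"id": request["id"], "result": request["params"]}
            with self.pending_lock:
                reply = self.pending.pop(msg["id"], None)
            if reply is not None:
                reply.put(msg)
```

Because each caller waits on its own single-slot queue, concurrent calls from several threads each receive only the reply carrying their own id.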

Health Checks¶

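Health checks call tools/list: it is fast, it is a standard MCP method, and a successful reply verifies the full stack (provider process, stdio transport, and JSON-RPC handling).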

class ProviderHealth:
    consecutive_failures: int
    last_success_at: float
    total_invocations: int
    total_failures: int
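
The counters above might be updated as follows. This is a sketch that reuses the four field names from the class; the methods `record_success`, `record_failure`, and `should_degrade`, as well as the default threshold of 3, are assumptions.

```python
import time
from dataclasses import dataclass


@dataclass
class ProviderHealth:
    consecutive_failures: int = 0
    last_success_at: float = 0.0
    total_invocations: int = 0
    total_failures: int = 0

    def record_success(self):
        self.total_invocations += 1
        self.consecutive_failures = 0  # any success resets the streak
        self.last_success_at = time.time()

    def record_failure(self):
        self.total_invocations += 1
        self.total_failures += 1
        self.consecutive_failures += 1

    def should_degrade(self, threshold=3):
        """Hypothetical rule: degrade after `threshold` failures in a row."""
        return self.consecutive_failures >= threshold
```

Tracking the consecutive count separately from the lifetime totals lets the circuit breaker react to a failure streak without being skewed by old errors.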

Performance¶

Hot path:

# Good — state check without I/O
with lock:
    if state == READY:
        client = conn.client
response = client.call(...)  # Outside lock

# Bad — I/O under lock
with lock:
    response = client.call(...)  # Blocks other threads

Recommended TTL:

- Subprocess: 180-300s
- Container: 300-600s
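
As an illustration of how a GC worker might apply these values, the sketch below checks whether a provider has been idle longer than its TTL. `DEFAULT_TTL_S` and `is_idle` are hypothetical names, and the defaults are simply the midpoints of the recommended ranges.

```python
import time
from typing import Optional

# Midpoints of the recommended ranges; illustrative defaults only.
DEFAULT_TTL_S = {"subprocess": 240, "container": 450}


def is_idle(last_used_at: float, mode: str, now: Optional[float] = None) -> bool:
    """True when the provider has been unused for longer than its TTL."""
    current = time.monotonic() if now is None else now
    return current - last_used_at > DEFAULT_TTL_S[mode]
```

Containers get the longer TTL in this sketch, following the recommended ranges.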