
Agents that actually do work — connected to your systems via MCP

Language models become productive the moment they get tools. We build agentic systems on top of the Model Context Protocol (MCP), orchestrate them with n8n, and talk to your callers through voice stacks built on VAPI and ElevenLabs. What sounds like hype becomes a serious architecture question in regulated industries — and that's where we advise.

What is the Model Context Protocol (MCP)?

A language model that can only answer remains a well-spoken advisor without hands. Only when it can call tools, read resources and trigger actions does it become an agent. The Model Context Protocol — released by Anthropic as an open standard in late 2024 — describes exactly what that connection between model and outside world looks like.

Before MCP, every platform maintained its own tool format: OpenAI function calls, Anthropic tools, a dozen proprietary SDKs, half-documented webhooks in between. Anyone who built a calendar connector got to rewrite it for the next platform. MCP replaces this fragmentation with a single, JSON-RPC-based protocol. An AI application (the host) connects through MCP clients to any number of MCP servers, and a server you build once is available to every MCP-capable application.

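Technically, MCP is deliberately lean. JSON-RPC 2.0 is the wire format, and there are two standard transports: stdio for local processes, and HTTP for remote servers. The HTTP transport was originally HTTP with Server-Sent Events; the March 2025 specification revision replaced it with Streamable HTTP, which can still stream via SSE. On top of that comes a clear initialization handshake with capability negotiation. Authentication is not a proprietary invention of the protocol: for HTTP transports the specification builds on OAuth 2.1, and mechanisms such as mTLS can be layered on at the network level.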
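The following sketch shows the JSON-RPC 2.0 messages exchanged when a client calls a tool. The method names (`initialize`, `tools/call`) and the result shape follow the MCP specification. The tool name, the arguments and the client info are hypothetical examples. The protocol version string is an assumption and depends on the spec revision your SDK implements.

```typescript
// Minimal JSON-RPC 2.0 envelope types as used on the MCP wire.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;          // correlates the response with this request
  method: string;               // e.g. "initialize", "tools/list", "tools/call"
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

let nextId = 1;

// Builds a request with a fresh, increasing id.
function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// 1. Handshake: the client announces protocol version and capabilities.
const init = request("initialize", {
  protocolVersion: "2025-06-18",   // assumption: the revision your SDK speaks
  capabilities: {},
  clientInfo: { name: "example-client", version: "0.1.0" },
});

// 2. Tool call: tool name plus arguments matching the tool's input schema.
const call = request("tools/call", {
  name: "find_appointment_slots",  // hypothetical tool
  arguments: { service: "passport", from: "2025-07-01", to: "2025-07-14" },
});

// 3. A typical result: content blocks the host hands back to the model.
const reply: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: call.id,
  result: { content: [{ type: "text", text: "[\"2025-07-03T09:30\"]" }], isError: false },
};

console.log(JSON.stringify(init));
console.log(JSON.stringify(call));
console.log(JSON.stringify(reply));
```

After the `initialize` response, a real client also sends a `notifications/initialized` notification before making further requests.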

Tools

Callable functions with typed parameters. From "create appointment" to "database query" to "generate PDF" — anything the model is allowed to actively trigger.

Resources

Readable content the model can request as context: a knowledge base, documents, the current CRM record, the log of an ongoing case.

Prompts

Server-provided, parameterizable prompt templates. A good fit for predefined workflows ("summarize citizen case", "draft escalation email").

Sampling

The reverse channel: the server requests a model completion from the client — useful when a tool itself needs a model decision (self-reflection, classification).

Adoption is now broad: Claude (Desktop, IDE integrations, API), Cursor, VS Code, Zed, n8n, many agent frameworks and commercial platforms speak MCP. Building an integration today buys you a connector that won't have to be rewritten tomorrow.

Architectures for agentic systems

An agent is more than "model plus tool list". It's a control pattern that links perception, decision and action in a loop. Which variant of that pattern fits depends on the task, the risk and the latency budget.

Single-agent — one model, one toolbox

Simple · easy to audit

When to use

When the domain is well bounded and a single model with ten to twenty tools can cover the entire job — typical for service desk automation, citizen information, simple back-office operations.

How it works

One central loop: user request → model decides whether a tool is needed → tool call via MCP → result back to model → answer or next tool call. Ends when the model signals "done" or a hard limit is reached.

Strengths

Easy to test · clear audit trails · low token cost · manageable failure modes

Weaknesses

Doesn't scale indefinitely · one model has to do everything · hallucination risk grows with the tool count
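The central single-agent loop from "How it works" can be sketched in a few lines. This is a synchronous illustration. A scripted stand-in (`scriptedModel`, hypothetical) replaces the model. In a real system the decision comes from an LLM API and the tool call goes through an MCP client. The hard step limit is the safety net mentioned above.

```typescript
// A model decision is either a tool call or a final answer.
type Decision =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Tool = (args: Record<string, unknown>) => string;

// Runs the decide-act loop until the model answers or the step limit is hit.
function runAgent(
  userRequest: string,
  model: (history: Message[]) => Decision,
  tools: Record<string, Tool>,
  maxSteps = 5,
): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: userRequest }];
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(history);
    if (decision.kind === "final") {
      history.push({ role: "assistant", content: decision.answer });
      return { answer: decision.answer, history };
    }
    const tool = tools[decision.name];
    // Unknown (e.g. hallucinated) tools are reported back instead of crashing the loop.
    const result = tool ? tool(decision.args) : `error: unknown tool ${decision.name}`;
    history.push({ role: "assistant", content: `call ${decision.name}` });
    history.push({ role: "tool", content: result });
  }
  return { answer: "Step limit reached; handing over to a human.", history };
}

// Hypothetical scripted "model": calls one tool, then answers with its result.
const scriptedModel = (history: Message[]): Decision => {
  const last = history[history.length - 1];
  return last.role === "tool"
    ? { kind: "final", answer: `Next free slot: ${last.content}` }
    : { kind: "tool", name: "find_slot", args: { service: "passport" } };
};

const out = runAgent("I need a passport appointment", scriptedModel, {
  find_slot: () => "2025-07-03 09:30",
});
console.log(out.answer);
```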

Supervisor — one conductor, several specialists

Stable for complex cases

When to use

As soon as a task spans multiple domains — for example "analyze customer enquiry, check contract, trigger refund, draft email". Each domain gets a specialist agent with its own toolbox; a supervisor distributes and consolidates.

How it works

The supervisor has a bird's-eye view of the case and calls sub-agents as tools. Sub-agents work in their own context windows — that saves tokens and reduces drift, because domain knowledge doesn't bleed into the central model.

Strengths

Separation of concerns · each specialist small and testable · context windows stay short · good for compliance-relevant steps

Weaknesses

More tool roundtrips · latency rises · supervisor prompt becomes a critical single point of failure
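Here is a minimal sketch of the supervisor pattern. Each specialist is exposed to the supervisor as an ordinary tool and keeps its own private context. The specialists are hypothetical stubs (plain functions). In practice each would be a full agent loop with its own model and toolbox, and the supervisor model would produce the plan. The sketch shows the shape of the pattern: the supervisor only ever sees compact results, never the specialists' working context.

```typescript
// A specialist agent keeps its own private context (message log).
class Specialist {
  private context: string[] = [];
  constructor(
    readonly name: string,
    private readonly work: (task: string) => string, // stand-in for a full agent loop
  ) {}

  handle(task: string): string {
    this.context.push(`task: ${task}`);
    const result = this.work(task);
    this.context.push(`result: ${result}`);
    return result; // only the compact result leaves the specialist
  }

  get contextSize(): number {
    return this.context.length;
  }
}

// The supervisor treats specialists as tools: name and task in, short result out.
class Supervisor {
  private readonly log: string[] = [];
  constructor(private readonly specialists: Map<string, Specialist>) {}

  // plan: ordered (specialist, task) pairs; in reality produced by the supervisor model.
  run(plan: Array<[string, string]>): string[] {
    for (const [name, task] of plan) {
      const specialist = this.specialists.get(name);
      if (!specialist) throw new Error(`no specialist named ${name}`);
      this.log.push(`${name}: ${specialist.handle(task)}`);
    }
    return [...this.log];
  }
}

const contracts = new Specialist("contracts", (t) => `contract ok for ${t}`);
const refunds = new Specialist("refunds", (t) => `refund queued for ${t}`);
const supervisor = new Supervisor(new Map([
  [contracts.name, contracts],
  [refunds.name, refunds],
]));

const summary = supervisor.run([
  ["contracts", "customer 4711"],
  ["refunds", "customer 4711"],
]);
console.log(summary);
```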

Workflow-driven — model as a step in the process

Pragmatic · suited to regulated domains

When to use

When the business process is fixed and you only want to make individual steps intelligent — classification, extraction, draft replies. Instead of "agent finds its own way", the rule is "workflow calls the model when prediction logic is needed".

How it works

A workflow engine like n8n, Camunda or Temporal coordinates the steps. Model calls are regular nodes in the graph — with input, output, retry behavior and compensating actions like every other node.

Strengths

Determinism where it matters · model only as decision component · perfect for approval and escalation paths · simple audits

Weaknesses

Less flexible than a free-running agent · new paths require workflow changes · less suitable for open-ended exploration
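The following tiny step runner shows how a model call becomes just another node. Every step, model-backed or not, gets the same retry policy. Each step can also have an optional compensating action that runs, in reverse order, if a later step fails permanently (a saga-style rollback). This illustrates the pattern only, not how n8n, Camunda or Temporal implement it internally. The step names and the flaky classifier stub are hypothetical.

```typescript
interface Step {
  name: string;
  run: (input: string) => string;   // a model call is just one kind of run()
  compensate?: () => void;          // undo action if a later step fails
  retries?: number;                 // extra attempts after the first failure
}

// Runs steps in order; on a permanent failure, compensates completed steps in reverse.
function runWorkflow(steps: Step[], input: string): { ok: boolean; trail: string[] } {
  const trail: string[] = [];
  const done: Step[] = [];
  let value = input;
  for (const step of steps) {
    let attempt = 0;
    for (;;) {
      try {
        value = step.run(value);
        trail.push(`${step.name}: ok`);
        done.push(step);
        break;
      } catch {
        attempt++;
        trail.push(`${step.name}: failed (attempt ${attempt})`);
        if (attempt > (step.retries ?? 0)) {
          for (const s of done.reverse()) {
            if (s.compensate) {
              s.compensate();
              trail.push(`${s.name}: compensated`);
            }
          }
          return { ok: false, trail };
        }
      }
    }
  }
  return { ok: true, trail };
}

// Hypothetical process: classify (model step, fails once), create case, send mail (down).
let calls = 0;
const result = runWorkflow(
  [
    { name: "classify", retries: 2, run: (t) => { if (calls++ === 0) throw new Error("timeout"); return `complaint:${t}`; } },
    { name: "create_case", run: (c) => `case-1 ${c}`, compensate: () => { /* cancel case in backend */ } },
    { name: "send_mail", run: () => { throw new Error("smtp down"); } },
  ],
  "my bill is wrong",
);
console.log(result);
```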

Cross-cutting concerns to settle early

- Memory: what does the agent remember between turns (vector store, session cache, persistent history)?
- Tool selection: statically in the prompt, dynamically via embeddings, or hybrid?
- Human-in-the-loop: which decisions need human confirmation before they take effect?
- Model routing: a small model for classification, a large one for synthesis, which saves money and latency.
- Observability: tracing every tool call, token cost per case, detecting model drift.
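Two of these concerns, model routing and cost tracking, fit in a few lines. The following sketch routes narrow tasks to a small model and synthesis to a large one, and books the token cost per case. The model names and per-token prices are placeholders, not real price lists. Use your providers' current figures.

```typescript
type Task = "classify" | "extract" | "synthesize";

interface ModelSpec {
  name: string;
  eurPer1kInput: number;   // placeholder prices
  eurPer1kOutput: number;
}

const SMALL: ModelSpec = { name: "small-model", eurPer1kInput: 0.0002, eurPer1kOutput: 0.0008 };
const LARGE: ModelSpec = { name: "large-model", eurPer1kInput: 0.003, eurPer1kOutput: 0.015 };

// Narrow, high-volume tasks go to the small model; open-ended synthesis to the large one.
function route(task: Task): ModelSpec {
  return task === "synthesize" ? LARGE : SMALL;
}

// Accumulates cost per case id so it can be reported as a business KPI.
class CostLedger {
  private readonly perCase = new Map<string, number>();

  book(caseId: string, model: ModelSpec, inputTokens: number, outputTokens: number): number {
    const cost =
      (inputTokens / 1000) * model.eurPer1kInput +
      (outputTokens / 1000) * model.eurPer1kOutput;
    this.perCase.set(caseId, (this.perCase.get(caseId) ?? 0) + cost);
    return cost;
  }

  total(caseId: string): number {
    return this.perCase.get(caseId) ?? 0;
  }
}

const ledger = new CostLedger();
ledger.book("case-42", route("classify"), 1000, 10);
ledger.book("case-42", route("synthesize"), 2000, 500);
console.log(route("classify").name, ledger.total("case-42").toFixed(4));
```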

n8n as orchestrator — what years of practice taught us

n8n is an open-source workflow platform that has accompanied us through years of integration work — from the first webhook-to-database hops to fully orchestrated agent architectures with tool calls, memory and MCP integration. We know the platform from projects where it isn't a sandbox: the cases have to run reliably every day.

The appeal of n8n lies in its dual nature: a low-code canvas with more than 400 ready-made integrations on the surface, and a full Node.js / JavaScript runtime underneath. A case can start as a click diagram and be refined where click diagrams hit their limits — with real code, sub-workflows or custom nodes. That's the point at which low-code platforms typically break; with n8n the bridge holds.

For agents, n8n has become particularly attractive since the AI Agent Node arrived: it wraps a tool-calling agent as a workflow node, with configurable model, swappable memory provider (buffer, window, vector store), and a list of registered tools. Those tools can be other n8n workflows, HTTP requests, database queries — or, since early 2025, MCP servers as first-class tool providers.

Self-hosting works

n8n runs on-premises or in your own data center — decisive when data isn't allowed to leave the tenant. License: Sustainable Use License plus commercial editions.

Workflows as code

Workflows are JSON. We treat them like code: pull requests, code review, automated import per stage, migrations via CLI. No click drift between test and production.

Extensible with custom nodes

If a connector is missing, we write it as a TypeScript node — from SAP interfaces to internal service layers to specific public-sector authentication schemes.

Observable

Per-case execution history, with inputs, outputs and error stacks. Combined with OpenTelemetry export you get a complete trace of every tool call.

We typically place n8n as the orchestration backbone: the voice platform talks to the caller, the LLM agent decides which step is needed, and n8n runs the deterministic stretches (database updates, ERP calls, document generation, email dispatch). The split is intentional: the agent stays flexible, while the business process stays deterministic and auditable.

Field note

Treat n8n workflows like production code: version in Git, validate in CI, deploy via CLI. Whoever uses the editor as a production cockpit is building shadow IT — and losing the auditability that regulated industries demand.
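"Validate in CI" can start very small. The following sketch checks an exported n8n workflow for two common breakages: connections pointing at nodes that no longer exist, and duplicate node names. It assumes the usual export shape: a `nodes` array with `name` and `type`, and a `connections` object keyed by source node name with `main` output arrays. The interface models only the fields used here, and the example workflow is hypothetical.

```typescript
// Only the fields this check needs; real exports contain much more.
interface N8nWorkflow {
  name: string;
  nodes: Array<{ name: string; type: string }>;
  connections: Record<string, { main?: Array<Array<{ node: string; type: string; index: number }>> }>;
}

// Returns human-readable problems; an empty list means the check passed.
function validateWorkflow(wf: N8nWorkflow): string[] {
  const problems: string[] = [];
  const names = new Set<string>();
  for (const node of wf.nodes) {
    if (names.has(node.name)) problems.push(`duplicate node name "${node.name}"`);
    names.add(node.name);
  }
  for (const [source, outputs] of Object.entries(wf.connections)) {
    if (!names.has(source)) problems.push(`connection from unknown node "${source}"`);
    for (const output of outputs.main ?? []) {
      for (const target of output) {
        if (!names.has(target.node)) {
          problems.push(`"${source}" connects to unknown node "${target.node}"`);
        }
      }
    }
  }
  return problems;
}

const workflow: N8nWorkflow = {
  name: "citizen-case-followup",
  nodes: [
    { name: "Webhook", type: "n8n-nodes-base.webhook" },
    { name: "Send confirmation", type: "n8n-nodes-base.emailSend" },
  ],
  connections: {
    Webhook: { main: [[{ node: "Send confirmation", type: "main", index: 0 }]] },
    "Send confirmation": { main: [[{ node: "Escalate", type: "main", index: 0 }]] },
  },
};

const issues = validateWorkflow(workflow);
console.log(issues);
if (issues.length > 0) console.log("a CI job would fail the pipeline at this point");
```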

Voice agents with VAPI and ElevenLabs

Voice is the most natural interface — and the most demanding. A citizen who won't wait. A policyholder describing a problem. A caller switching between standard German, dialect and Turkish. Voice agents need sub-second latency, a voice that doesn't slip, and a backend connection that triggers real work. We build these stacks from two components proven in projects: VAPI for the real-time telephony layer, ElevenLabs for production-grade speech output.

VAPI — the real-time telephony platform for LLM agents

Voice · function calling · WebRTC + SIP

When to use

Whenever a language model has to talk to callers — inbound hotlines, outbound appointment confirmations, after-hours triage, order intake. VAPI takes the painful real-time pipeline off your plate: speech-to-text, model call, text-to-speech, barge-in (caller interrupts the bot), SIP trunk integration with classic phone systems.

How it works

VAPI orchestrates the real-time loop on a sub-second budget. You pick the model (GPT, Claude, or self-hosted open-weight models behind a custom endpoint), the voice (ElevenLabs, Cartesia, Azure Neural) and the tools, either classic webhooks or MCP servers. During the call VAPI talks to the model, calls tools, gets results back, and hands the answer to TTS and on to the caller. End-to-end latency is typically below 800 milliseconds, depending on model, voice and network path.

Strengths

Ready-to-use telephony incl. SIP · interchangeable models and voices · function calling and webhooks · barge-in by default · solid per-call observability

Weaknesses

SaaS platform with data-residency questions · voice quality varies for rare accents · per-minute pricing adds up at peak load

ElevenLabs — speech output at studio quality

TTS · voice cloning · multilingual

When to use

Whenever speech output has to sound natural enough that the caller doesn't immediately notice they're talking to a machine. ElevenLabs delivers speech synthesis that, in many scenarios, is barely distinguishable from a human speaker, including German voices with natural prosody, multilingual support (more than 30 languages) and the option to clone a brand voice.

How it works

In the voice stack, ElevenLabs slots in as the TTS provider. VAPI streams the model's text output to ElevenLabs as it is generated, and ElevenLabs streams audio frames back into the call. Time-to-first-audio is typically 100–250 milliseconds, so with the streaming API the caller hears the beginning of the answer long before the model has finished generating it.

Strengths

Outstanding voice quality · very low time-to-first-byte · voice cloning for brand consistency · solid multilingual coverage · clean API

Weaknesses

Cloud-only · per-1000-character cost can pinch at high volumes · voice cloning requires careful consent and abuse controls
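A small chunker shows why streaming matters. Instead of waiting for the complete answer, text is flushed to the TTS engine at phrase boundaries. Synthesis of the first phrase can therefore start while the model is still generating. This is a generic illustration of the technique, not VAPI's or ElevenLabs' internal implementation. The boundary rule and the minimum chunk length are assumptions.

```typescript
// Splits a stream of model tokens into speakable chunks at punctuation boundaries.
// Generator-based, so each chunk can go to TTS as soon as it is complete.
function* speakableChunks(tokens: Iterable<string>, minChars = 20): Generator<string> {
  let buffer = "";
  for (const token of tokens) {
    buffer += token;
    const trimmed = buffer.trim();
    // Flush at sentence or clause ends, but avoid sending tiny fragments.
    if (/[.!?;:,]$/.test(trimmed) && trimmed.length >= minChars) {
      yield trimmed;
      buffer = "";
    }
  }
  if (buffer.trim()) yield buffer.trim(); // whatever is left when the stream ends
}

const tokens = ["Good ", "morning, ", "this is ", "the citizen ", "office. ",
                "Your appointment ", "is on ", "Thursday."];
for (const chunk of speakableChunks(tokens)) {
  console.log("to TTS:", chunk);
}
```

The `minChars` threshold trades latency against prosody. Very short fragments start sooner, but they give the voice model less context for natural intonation.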

Tools, MCP and backends — how the caller reaches real action

A pleasant voice alone is worth little. The value appears the moment the voice agent actually does something: books an appointment in a case management system, triggers a refund, files an issue in a public-sector portal. This is where MCP comes back into the picture. VAPI acts as MCP client and calls tools provided by an MCP server we run. The server is the controlled door to backends (ERP, CRM, case management, databases) and implements business logic, validation and audit trails.

A typical citizen telephony deployment as we build it: VAPI accepts the call, routes audio through ElevenLabs for output, and calls our MCP server as tool provider. The MCP server exposes functions like create_case, find_appointment_slots, check_citizen_status — each with schema, validation and connection to a case management system or database. n8n runs alongside for deterministic follow-up: confirmation emails, hand-off to case workers, escalations.

```
// Sketch: VAPI assistant config with an MCP server.
// Field names are illustrative; check them against VAPI's current assistant schema.
{
  "model": {
    "provider": "anthropic",
    "model":    "claude-sonnet-4-5",
    "systemPrompt": "You are the friendly voice of the City of Sample Town …"
  },
  "voice": {
    "provider": "11labs",
    "voiceId":  "de-formal-female-01"
  },
  "tools": [{
    "type":        "mcp",
    "serverUrl":   "https://mcp.sample-town.de/mcp",
    "auth": { "type": "oauth2", "clientId": "vapi-citizen-line" }
  }]
}
```

Tutorial: Building and operating your own MCP server

The fastest way to truly understand MCP is to build a server once. This guide takes you concisely from your first tool through authentication to production. Example: an MCP server for a municipal citizen-appointment service.

Prerequisites

A Node.js environment from version 20 (alternatively Python 3.11+). For production: a TLS reverse proxy, an OAuth-2.0/2.1-capable authorization server (e.g. Keycloak; see our IAM page), and a connectable backend (database or case-management API).

Install the SDK and scaffold the server

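Official MCP SDKs exist for TypeScript and Python, among other languages. We use TypeScript here. n8n and VAPI talk to the server over the protocol, so the server's implementation language does not restrict which clients can connect.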

```
npm init -y
npm pkg set type=module
npm install @modelcontextprotocol/sdk zod express
npm install -D typescript tsx @types/node @types/express
```

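```
// src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport }
  from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "sample-town-citizen-service",
  version: "1.0.0"   // semantic version of the tool contract
});

// Local stdio variant: once the tools are registered, the host spawns this
// process and speaks JSON-RPC over stdin/stdout:
//   await server.connect(new StdioServerTransport());
```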

Define your first tool

Tools are registered with name, description and a Zod schema for the parameters. The description isn't decoration — the LLM reads it and decides from it when to use the tool. Invest the time to write precise, dialogue-fit descriptions.

```
server.tool(
  "find_appointment_slots",
  "Finds free slots at a citizen office for a given service. " +
  "Returns at most 5 suggestions in the requested time range.",
  {
    service:     z.enum(["id_card", "passport", "residence_cert"]),
    from:        z.string().describe("ISO date, earliest"),
    to:          z.string().describe("ISO date, latest"),
    location_id: z.number().optional()
  },
  async (input) => {
    // backend: hypothetical adapter to the appointment / case-management system
    const slots = await backend.findSlots(input);
    // The description promises at most 5 suggestions, so enforce it here
    return { content: [{ type: "text", text: JSON.stringify(slots.slice(0, 5)) }] };
  }
);
```
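The Zod schema above checks types and enum values. Business rules such as "from must not be after to" still have to be enforced somewhere, ideally at the top of the tool handler before any backend call. The following dependency-free sketch of that gate uses the same parameter names as the tool. The concrete rules (date format, a maximum range of 90 days) are illustrative assumptions.

```typescript
type Service = "id_card" | "passport" | "residence_cert";
const SERVICES: readonly Service[] = ["id_card", "passport", "residence_cert"];

interface SlotQuery {
  service: Service;
  from: string;          // ISO date, earliest
  to: string;            // ISO date, latest
  location_id?: number;
}

const MAX_RANGE_DAYS = 90;  // assumption: business rule for this example
const MAX_SUGGESTIONS = 5;  // matches the promise in the tool description

// Returns the validated query or throws with a message the model can act on.
function validateSlotQuery(raw: Record<string, unknown>): SlotQuery {
  const { service, from, to, location_id } = raw;
  if (typeof service !== "string" || !SERVICES.includes(service as Service)) {
    throw new Error(`service must be one of ${SERVICES.join(", ")}`);
  }
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  if (typeof from !== "string" || typeof to !== "string" || !isoDate.test(from) || !isoDate.test(to)) {
    throw new Error("from and to must be ISO dates (YYYY-MM-DD)");
  }
  const start = Date.parse(from);
  const end = Date.parse(to);
  // Rejects strings the Date parser cannot interpret.
  if (Number.isNaN(start) || Number.isNaN(end)) throw new Error("invalid calendar date");
  if (start > end) throw new Error("from must not be after to");
  if ((end - start) / 86_400_000 > MAX_RANGE_DAYS) {
    throw new Error(`range must not exceed ${MAX_RANGE_DAYS} days`);
  }
  if (location_id !== undefined && !Number.isInteger(location_id)) {
    throw new Error("location_id must be an integer");
  }
  return { service: service as Service, from, to, location_id: location_id as number | undefined };
}

// Enforces the "at most 5 suggestions" promise on whatever the backend returns.
function capSuggestions<T>(slots: T[]): T[] {
  return slots.slice(0, MAX_SUGGESTIONS);
}

console.log(validateSlotQuery({ service: "passport", from: "2025-07-01", to: "2025-07-14" }));
```

Descriptive error messages are deliberate. The model receives them as the tool result and can correct its arguments on the next attempt.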

Choose the transport — stdio or HTTP/SSE

Locally embedded MCP servers (e.g. an IDE spawning the server as a child process) use stdio: minimal, no network overhead. For remotely reachable servers, which includes anything serving VAPI or a shared n8n hub, we use Streamable HTTP. It is the current remote transport, can stream responses via Server-Sent Events, and supersedes the older HTTP+SSE transport. A lean Express binding is enough to start.

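```
// HTTP variant (stateless Streamable HTTP)
import express from "express";
import { StreamableHTTPServerTransport }
  from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless mode: a fresh server and transport per request, so concurrent
  // clients cannot collide on JSON-RPC request ids. buildServer() is a
  // hypothetical factory wrapping the new McpServer(...) and server.tool(...)
  // setup from the previous steps.
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => { transport.close(); server.close(); });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);
```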

Authentication via OAuth 2.1

The MCP specification defines an OAuth-2.1-based authorization flow for HTTP transports. It was introduced in the March 2025 revision, and since the June 2025 revision the MCP server is explicitly an OAuth resource server. An existing authorization server (e.g. Keycloak) issues the access tokens. Concretely: the MCP client (VAPI, n8n, Claude) obtains a token, and the MCP server validates it on every call, including issuer and audience. This is standard resource-server logic; MCP plays no special role here.

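```
// Express middleware (excerpt). Register it BEFORE the /mcp route: Express runs
// handlers in registration order, so a later app.use would never see the request.
const issuer = process.env.ISSUER; // e.g. https://iam.sample-town.de/realms/citizens

app.use("/mcp", async (req, res, next) => {
  const header = req.headers.authorization ?? "";
  const token = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : undefined;
  if (!token) return res.status(401).set(
    "WWW-Authenticate",
    'Bearer resource_metadata="https://mcp.example/.well-known/oauth-protected-resource"'
  ).end();

  try {
    // verifyJwt: your JWT helper (hypothetical name) checking signature, expiry, issuer, audience
    res.locals.user = await verifyJwt(token, { issuer, audience: "mcp-citizen-service" });
    next();
  } catch {
    res.status(401).end(); // invalid, expired, or issued for another audience
  }
});
```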
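The `resource_metadata` URL in the 401 response points to a Protected Resource Metadata document (RFC 9728). MCP clients fetch it to discover which authorization server issues tokens for this server. The following minimal sketch builds that document. The field names come from RFC 9728. The URLs and scope names are placeholders for the sample-town setup. You would serve the result as JSON at `/.well-known/oauth-protected-resource`, outside the protected `/mcp` path, so clients can read it without a token.

```typescript
// Field names as defined in RFC 9728 (OAuth 2.0 Protected Resource Metadata).
interface ProtectedResourceMetadata {
  resource: string;                    // the MCP server's resource identifier
  authorization_servers: string[];     // issuers clients may obtain tokens from
  scopes_supported?: string[];
  bearer_methods_supported?: string[]; // here: Authorization header only
}

function buildResourceMetadata(
  resourceUrl: string,
  issuer: string,
  scopes: string[],
): ProtectedResourceMetadata {
  // RFC 9728: the resource identifier is an https URL without a fragment.
  const url = new URL(resourceUrl);
  if (url.protocol !== "https:" || url.hash) {
    throw new Error("resource must be an https URL without fragment");
  }
  return {
    resource: url.toString(),
    authorization_servers: [issuer],
    scopes_supported: scopes,
    bearer_methods_supported: ["header"],
  };
}

// Placeholder values for the sample-town setup (hypothetical scope names).
const metadata = buildResourceMetadata(
  "https://mcp.sample-town.de/mcp",
  "https://iam.sample-town.de/realms/citizens",
  ["appointments:read", "cases:write"],
);
console.log(JSON.stringify(metadata, null, 2));
```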

Connect clients

Three typical clients, all with native MCP support:
- Claude Desktop / Claude Code: register the remote server URL in the client's MCP or connector settings; the OAuth flow runs on first connection. Tools are then immediately available in chat.
- n8n: the MCP client node registers the server; tools can be used directly inside the AI Agent Node or in a classic workflow.
- VAPI: the MCP server is registered as a tool provider in the assistant configuration JSON (see the example above).

Production hardening

Reverse proxy with TLS and rate limits in front. Token validation with audience check. Structured logging of every tool call (who called with which arguments, what came back, how long it took). Idempotency keys for write operations. Schema validation with Zod as a hard gate before backend calls. Health endpoints, tracing via OpenTelemetry, a dashboard for latency, error rate and token consumption per case.
```
docker run -d --name mcp-server \
  -p 3000:3000 \
  -e ISSUER=https://iam.sample-town.de/realms/citizens \
  -e AUDIENCE=mcp-citizen-service \
  -e DB_URL=postgresql://… \
  --restart unless-stopped \
  registry.example/mcp-citizen-service:1.0.0

# In front: nginx with TLS, optional mTLS for M2M clients
```
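Two items from the hardening list, structured per-call logging and idempotency keys for writes, can share one wrapper around each tool handler. The following sketch assumes an in-memory key store (production would use Redis or the database) and a caller-supplied `idempotencyKey` argument. Both the wrapper and that argument name are hypothetical conventions, not part of the MCP SDK.

```typescript
interface AuditRecord {
  timestamp: string;
  tool: string;
  caller: string;                 // client identity from the validated token
  args: Record<string, unknown>;
  outcome: "ok" | "error" | "replayed";
  durationMs: number;
}

type Handler = (args: Record<string, unknown>) => Promise<string>;

// Wraps a tool handler: one structured audit record per call, and replay of the
// stored result when a write is retried with the same idempotency key.
function withAudit(
  tool: string,
  handler: Handler,
  log: (record: AuditRecord) => void,
  seen: Map<string, string> = new Map(), // in-memory store (assumption)
) {
  return async (caller: string, args: Record<string, unknown>): Promise<string> => {
    const started = Date.now();
    const record = (outcome: AuditRecord["outcome"]) =>
      log({
        timestamp: new Date(started).toISOString(),
        tool, caller, args, outcome,
        durationMs: Date.now() - started,
      });

    const rawKey = args["idempotencyKey"];
    const key = typeof rawKey === "string" ? `${caller}:${rawKey}` : undefined;
    if (key !== undefined && seen.has(key)) {
      record("replayed");
      return seen.get(key)!; // retry: return the stored result, do not write twice
    }
    try {
      const result = await handler(args);
      if (key !== undefined) seen.set(key, result);
      record("ok");
      return result;
    } catch (err) {
      record("error");
      throw err;
    }
  };
}

// Usage with a hypothetical write tool; the second call is a retry.
let created = 0;
const createCase = withAudit(
  "create_case",
  async () => `case-${++created}`,
  (r) => console.log(JSON.stringify(r)),
);
createCase("vapi-citizen-line", { idempotencyKey: "call-123" })
  .then(() => createCase("vapi-citizen-line", { idempotencyKey: "call-123" }))
  .then((id) => console.log(`retry returned ${id}; backend writes: ${created}`));
```

The sketch logs arguments verbatim. In regulated settings you would mask personal data before logging. It also does not cover two identical requests arriving at the same moment; a unique constraint on the key in the database handles that case.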
Versioning and evolution

Tools are part of your public API: they change behavior in production cases. Treat schema changes with the discipline of a REST API: backwards-compatible additions, deprecation phases, semantic versioning of the server (its `version` field), automated contract checks via snapshot tests. With that discipline you can switch model providers without the business logic suffering.
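An automated contract check can be as simple as diffing the tool schemas of the last release against the current build. The following sketch works on a simplified, JSON-Schema-like shape (only `properties` and `required`). It classifies changes along the rules above and derives the required semantic-version bump. Real tool schemas are richer, so treat this as an illustration of the rule set rather than a complete checker.

```typescript
interface ToolSchema {
  properties: Record<string, { type: string }>;
  required: string[];
}

type Change = { kind: "breaking" | "additive"; detail: string };

// Compares an old and a new version of one tool's input schema.
function diffToolSchema(oldS: ToolSchema, newS: ToolSchema): Change[] {
  const changes: Change[] = [];
  for (const [name, prop] of Object.entries(oldS.properties)) {
    const next = newS.properties[name];
    if (!next) {
      changes.push({ kind: "breaking", detail: `parameter "${name}" removed` });
    } else if (next.type !== prop.type) {
      changes.push({ kind: "breaking", detail: `parameter "${name}" changed type ${prop.type} -> ${next.type}` });
    }
  }
  for (const name of Object.keys(newS.properties)) {
    if (oldS.properties[name]) continue;
    changes.push(newS.required.includes(name)
      ? { kind: "breaking", detail: `new required parameter "${name}"` }
      : { kind: "additive", detail: `new optional parameter "${name}"` });
  }
  for (const name of newS.required) {
    if (oldS.properties[name] && !oldS.required.includes(name)) {
      changes.push({ kind: "breaking", detail: `parameter "${name}" became required` });
    }
  }
  return changes;
}

// Semantic-versioning consequence: any breaking change needs a major bump.
function requiredBump(changes: Change[]): "major" | "minor" | "none" {
  if (changes.some((c) => c.kind === "breaking")) return "major";
  return changes.length > 0 ? "minor" : "none";
}

const v1: ToolSchema = {
  properties: { service: { type: "string" }, from: { type: "string" }, to: { type: "string" } },
  required: ["service", "from", "to"],
};
const v2: ToolSchema = {
  properties: { ...v1.properties, location_id: { type: "number" } },
  required: ["service", "from", "to"],
};
const v3: ToolSchema = { properties: { ...v2.properties }, required: [...v2.required, "location_id"] };

console.log(requiredBump(diffToolSchema(v1, v2))); // "minor"
console.log(requiredBump(diffToolSchema(v2, v3))); // "major"
```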

Field note

Tool descriptions are the single most important lever. In our projects, a precise, actively phrased description has consistently moved tool selection accuracy from "okay" to "reliable" — more impactful than any model upgrade.

Architecture patterns, security and operations

Agentic systems bring risks classic web applications don't have. Prompt injection in inputs, tool misuse through hallucinated arguments, slow model drift, opaque token cost. Approached soberly, most issues can be prevented before they appear.

Six principles we enforce in every project

1. Least privilege at the tool level. Each MCP server gets only the backend rights its tools need — no database admin, no all-scope API key. Tools themselves are gated per client identity: a voice agent sees different operations than an internal back-office agent.

2. Validate inputs, tame outputs. Schema validation at the tool boundary is mandatory. Beyond that: sanitize content that flows from the model back into other models — a single hostile record otherwise becomes a prompt-injection bomb for every downstream step.

3. Human-in-the-loop where consequences hurt. Moving money, changing permissions, deleting citizen data — no tool executes that without explicit confirmation. In the simplest case: a "confirm" step in the workflow that escalates the action to a human for approval.

4. Complete audit trails. Every tool call gets structured logging — timestamp, identity, input, output, latency, token consumption. In regulated industries this isn't optional, it's part of supervisory obligations.

5. Keep models swappable. Models get better monthly, prices fluctuate, providers go down. Whoever builds against a provider interface (instead of directly against one vendor's SDK) saves themselves expensive migrations.

6. Measure cost from day one. Token cost per case is a business KPI, not an implementation detail. We log it per sub-agent, per tool, per case type — and can produce an honest economic view of the system at any time.
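Principle 1 can be expressed in a few lines of code. The server decides per client identity which tools are listed at all, and it re-checks on every call. A voice agent therefore cannot invoke a back-office tool just because a model hallucinated its name. The following sketch uses a hypothetical static policy keyed by the OAuth client id, such as the `vapi-citizen-line` client from the VAPI config sketch. In practice the policy would come from token scopes or configuration.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

const ALL_TOOLS: ToolDef[] = [
  { name: "find_appointment_slots", description: "Finds free slots at a citizen office." },
  { name: "create_case", description: "Creates a citizen case." },
  { name: "delete_citizen_record", description: "Deletes a citizen's personal data." },
];

// Hypothetical policy: which OAuth client may see and call which tools.
const POLICY: Record<string, ReadonlySet<string>> = {
  "vapi-citizen-line": new Set(["find_appointment_slots", "create_case"]),
  "backoffice-agent": new Set(["find_appointment_slots", "create_case", "delete_citizen_record"]),
};

// What tools/list returns for this client; unknown clients see nothing.
function listToolsFor(clientId: string): ToolDef[] {
  const allowed = POLICY[clientId];
  return allowed ? ALL_TOOLS.filter((t) => allowed.has(t.name)) : [];
}

// Re-checked on every tools/call: hiding a tool from the list is not access control.
function assertMayCall(clientId: string, tool: string): void {
  if (!POLICY[clientId]?.has(tool)) {
    throw new Error(`client ${clientId} is not allowed to call ${tool}`);
  }
}

console.log(listToolsFor("vapi-citizen-line").map((t) => t.name));
```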

Prompt-injection defense

Clear separation of system, developer and user instructions. Never execute tool calls based on instructions embedded in data. For sensitive actions, require confirmation outside the model.
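"Confirmation outside the model" means the model can only request a sensitive action. Executing it requires a token that reaches the server through a separate channel, such as a case worker's UI or an n8n approval step. The following minimal sketch illustrates this. The in-memory store, the token format, the four-eyes rule and the function names are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

interface PendingAction {
  tool: string;
  args: Record<string, unknown>;
  requestedBy: string;   // identity of the agent/client that asked for the action
}

const pending = new Map<string, PendingAction>();

// Called by the tool handler: the model is only told that approval is pending.
function requestConfirmation(action: PendingAction): string {
  const token = randomUUID();   // unguessable reference for the approval channel
  pending.set(token, action);
  return token;
}

// Called from the approval channel (UI, n8n approval step), never by the model.
function confirmAndExecute(
  token: string,
  approver: string,
  execute: (action: PendingAction) => string,
): string {
  const action = pending.get(token);
  if (!action) throw new Error("unknown or already used confirmation token");
  if (approver === action.requestedBy) throw new Error("requester cannot approve its own action");
  pending.delete(token); // single use
  return execute(action);
}

const token = requestConfirmation({
  tool: "delete_citizen_record",        // hypothetical sensitive tool
  args: { citizenId: "C-17" },
  requestedBy: "vapi-citizen-line",
});
console.log(confirmAndExecute(token, "caseworker.meier", (a) => `executed ${a.tool}`));
```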

Data residency

Where are prompt, response and tool inputs processed? For regulated data, treat EU-hosted services (e.g. Azure OpenAI in EU regions) or on-premises models (served with vLLM or Ollama) as a hard constraint before any other architecture choice.

Observability

Tracing across all layers: voice → agent → MCP server → backend. OpenTelemetry as the shared language. Dashboards for P95 latency, tool error rate, token cost per case type.
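P95 latency is simple to compute but easy to get subtly wrong, for example with JavaScript's default string sort. The following sketch uses the nearest-rank method over raw per-call durations. Monitoring backends that work with histograms estimate percentiles from buckets instead, so their figures can differ slightly. The durations are made-up sample values.

```typescript
// Nearest-rank percentile: smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);     // numeric sort, not string sort
  const rank = Math.ceil((p * sorted.length) / 100);     // 1-based rank, integer math first
  return sorted[rank - 1];
}

// Made-up tool-call durations in milliseconds, e.g. taken from the audit log.
const durationsMs = [120, 95, 310, 150, 101, 98, 2200, 130, 140, 125,
                     110, 105, 99, 160, 175, 135, 115, 108, 145, 190];

console.log("P50:", percentile(durationsMs, 50), "ms");
console.log("P95:", percentile(durationsMs, 95), "ms");
```

The single 2200 ms outlier does not show up in the P95 of these 20 samples (310 ms). That is why dashboards usually show the maximum or P99 alongside P95.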

Evaluation as a discipline

Golden test conversations, automated replays on every model or prompt update, drift detection in production. Without an eval harness there's no controlled evolution.
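A golden-conversation replay can start as a table of user utterances, each paired with the tool the agent is expected to pick. The table is replayed against the agent on every model or prompt change. The following sketch measures tool-selection accuracy and reports a failed release gate below a threshold. The `Agent` signature, the cases and the 0.9 threshold are hypothetical. The toy keyword agent stands in for a real model-backed agent. The tool names are those of the citizen-service example.

```typescript
interface GoldenCase {
  utterance: string;
  expectedTool: string | null; // null: the agent should answer without a tool
}

// Any agent under test: returns the tool it would call first (or null).
type Agent = (utterance: string) => string | null;

interface EvalReport {
  accuracy: number;
  failures: Array<{ utterance: string; expected: string | null; actual: string | null }>;
}

function evaluate(agent: Agent, cases: GoldenCase[]): EvalReport {
  const failures: EvalReport["failures"] = [];
  for (const c of cases) {
    const actual = agent(c.utterance);
    if (actual !== c.expectedTool) {
      failures.push({ utterance: c.utterance, expected: c.expectedTool, actual });
    }
  }
  return { accuracy: (cases.length - failures.length) / cases.length, failures };
}

const golden: GoldenCase[] = [
  { utterance: "I need a new passport, when can I come in?", expectedTool: "find_appointment_slots" },
  { utterance: "Please open a case about my residence certificate", expectedTool: "create_case" },
  { utterance: "What are your opening hours?", expectedTool: null },
  { utterance: "Is my application already being processed?", expectedTool: "check_citizen_status" },
];

// Toy keyword agent standing in for the real model-backed agent.
const toyAgent: Agent = (u) =>
  /passport|appointment|come in/i.test(u) ? "find_appointment_slots"
  : /case/i.test(u) ? "create_case"
  : null;

const report = evaluate(toyAgent, golden);
const THRESHOLD = 0.9; // hypothetical release gate
console.log(`tool selection accuracy: ${(report.accuracy * 100).toFixed(0)}%`);
if (report.accuracy < THRESHOLD) console.log("release gate failed:", report.failures);
```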

What we typically bring

Experience from projects with n8n, Keycloak and custom backends, years of practice in Java, .NET and Python ecosystems, a clear view of what holds up in regulated industries — and the willingness not to sell a prototype as the finished product. Agents are a powerful tool class, not a universal cure. We tell you openly where they make sense and where a classic architecture is the more honest answer.

Workshop or prototype in the agentic space?

In two to three days we'll work with your team on a grounded assessment: from use case to fitting architecture to a slim prototype that makes the economics directly visible.

Schedule a conversation
