Prompt engineering — from trial-and-error to a pattern catalog

Why the way we talk to language models has become a discipline of its own — and which reusable patterns close the gap between "sometimes works" and "works reliably".

Why prompt engineering is a discipline

With the production use of large language models (LLMs), a new interface has entered software engineering: natural language. Unlike classical APIs, there is no fixed syntactic contract — the same thing can be phrased in hundreds of ways, and small changes in wording produce large differences in the result.

That is both the appeal and the problem. Anyone who asks an LLM the same thing in three different ways and gets three different answers does not have a reproducible solution; they have a streak of luck. In research and industry alike, a dedicated discipline has therefore emerged: prompt engineering, the craft of phrasing instructions to a language model so that the result becomes predictable.

In regulated industries, a second layer sits on top: where audits happen, LLM answers have to remain traceable. "We asked the bot something nice" is not an acceptable procedure. Prompts become artefacts — versioned, documented, tested like code.

A pattern catalog turns this discipline into something teachable. What the "Gang of Four" did for software design in 1994, publications such as the "Prompt Pattern Catalog" by White et al. (2023) now do for interaction with language models: reusable solutions to recurring problems, described in a shared vocabulary.

Anatomy of a prompt pattern

A prompt pattern follows a clear structure — modelled on design patterns from software engineering. Six elements are enough to make it teachable and shareable:

Name — the pattern identifier; it already hints at the task and serves as shared vocabulary in reviews.
Context — the domain or problem the pattern addresses; deliberately phrased in a domain-independent way.
Motivation — which problem the pattern solves and why it helps.
Structure and key technique — how relevant information is conveyed to the model, often with reusable phrasing fragments.
Example — a concrete example prompt that shows the pattern in action.
Consequences — strengths and weaknesses plus ways to adapt the pattern to different tasks.
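
The six elements above map naturally onto a small record type. The following is a minimal sketch, assuming a team wants to keep catalog entries as data; the class name `PromptPattern`, its field names and the wording of the sample entry are our own illustration, not something the catalog prescribes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptPattern:
    """One catalog entry. Field names are illustrative, not prescribed."""
    name: str
    context: str
    motivation: str
    structure: str          # reusable phrasing fragment with {placeholders}
    example: str
    consequences: list[str] = field(default_factory=list)


persona = PromptPattern(
    name="Persona",
    context="The answer should come from a defined professional point of view.",
    motivation="A role gives register and knowledge focus a clear direction.",
    structure="Act like {role} and {task}.",
    example="Act like an IT security expert and review this code-review proposal.",
    consequences=[
        "Strength: targeted knowledge focus.",
        "Weakness: more prone to hallucination if the role demands "
        "knowledge the model lacks.",
    ],
)
```

Entries of this shape can live next to the code in the repository and be rendered into a wiki page or glossary.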

The charm of this form lies in the shared vocabulary. When a team says "persona pattern" or "cognitive verifier", everyone knows what is meant, without having to re-explain — much like "Strategy" or "Observer" in classical object-oriented code. Reviews become shorter, onboarding faster, knowledge less dependent on individuals.

The five categories

The catalog organises the patterns into five categories that also reflect phases of an LLM interaction — from the question through the answer to validation.

Input semantics — how the input is understood or restated, e.g. via a meta language that compresses structures.
Output customisation — how the answer is constrained or shaped: specific formats, personae, templates, step-by-step instructions.
Error identification — how errors in the output can be found and corrected, e.g. via fact lists or by asking the model to justify itself.
Prompt improvement — how the initial question is improved by the model itself, e.g. via clarifying questions or alternative suggestions.
Interaction — how the dialogue between user and model is shaped: the model as questioner, as game, as infinite generator.

This categorisation helps with selection. With a concrete problem — say "the answer hallucinates facts" — the relevant category (error identification) and the right patterns (fact check list, reflection) are quickly found. The ordering in the catalog is not hierarchical but serves as a map.
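
The "map" idea can be sketched as two lookup tables. This is an assumption-laden illustration: `CATALOG` only contains the category assignments named in this article, while the `SYMPTOMS` table and the function `candidate_patterns` are hypothetical and would be maintained by each team.

```python
# Category -> patterns, limited to the assignments named in this article.
CATALOG = {
    "input semantics": ["meta language creation"],
    "output customisation": ["persona", "template", "recipe"],
    "error identification": ["fact check list", "reflection"],
    "prompt improvement": ["question refinement", "alternative approach"],
    "interaction": ["flipped interaction", "game play", "infinite generation"],
}

# Hypothetical symptom -> category table; a team would maintain its own.
SYMPTOMS = {
    "answer hallucinates facts": "error identification",
    "output is not machine-readable": "output customisation",
    "question is too vague": "prompt improvement",
}


def candidate_patterns(symptom: str) -> list[str]:
    """Return the patterns worth trying first for a known symptom."""
    category = SYMPTOMS.get(symptom)
    return CATALOG.get(category, [])  # unknown symptom -> empty list


first_tries = candidate_patterns("answer hallucinates facts")
```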

Sixteen patterns at a glance

The catalog comprises sixteen patterns in total. We pick out the most important ones — the ones that regularly make a difference in our projects — and name the others for completeness.

Persona — the model takes on a role

Probably the best-known pattern. By having the model adopt a defined role — IT security expert, clerk at a citizen office, senior architect — the answer, register and knowledge focus get a clear direction. Example: "Act like an IT security expert and review this code-review proposal." Strength: targeted knowledge focus. Weakness: the model is more prone to hallucination if the role demands detailed knowledge it doesn't have.

Template — the answer follows a predefined structure

The model is told to cast its answer in a strictly defined form. Particularly useful when the output is processed machine-side: "I'll give you a template for your output, where X is the placeholder for the content. Produce a valid JSON object based on first name, last name and address." Upside: machine-readable output. Downside: structure sometimes dominates over substantive depth.
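
When the output is consumed by a program, it helps to check the reply against the template before passing it on. A minimal sketch, assuming the template keys `first_name`, `last_name` and `address`; the helper names `template_prompt` and `parse_reply` are hypothetical, and `sample_reply` is a hand-written stand-in, not real model output.

```python
import json

# Assumed template; X marks the placeholders, as in the example prompt.
TEMPLATE = '{"first_name": "X", "last_name": "X", "address": "X"}'


def template_prompt(data: str) -> str:
    """Ask for output in a fixed JSON shape (template pattern)."""
    return (
        "I'll give you a template for your output, where X is the placeholder "
        f"for the content. Template: {TEMPLATE}\n"
        f"Produce a valid JSON object based on: {data}"
    )


def parse_reply(reply: str) -> dict:
    """Reject replies that are not JSON or deviate from the template keys."""
    obj = json.loads(reply)  # raises json.JSONDecodeError (a ValueError)
    expected = set(json.loads(TEMPLATE))
    if not isinstance(obj, dict) or set(obj) != expected:
        raise ValueError(f"reply does not match template keys {sorted(expected)}")
    return obj


# Hand-written stand-in for a model reply.
sample_reply = '{"first_name": "Ada", "last_name": "Lovelace", "address": "London"}'
record = parse_reply(sample_reply)
```

A reply that fails the check can be retried or routed to a human instead of silently entering downstream systems.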

Recipe — the answer as a numbered step-by-step guide

For situations where the goal is clear but the path is not. "I want to achieve X by carrying out steps A, B, C. Produce a complete step sequence, fill in gaps and remove unnecessary steps." Upside: a traceable process. Downside: the model sometimes adds steps based on its own assumptions — read the list, don't follow it blindly.

Question refinement — let the model suggest a better question

Whoever asks, wins — whoever asks better, wins faster. Pattern: "When I ask a question about topic X, suggest a better version of the question and ask whether I want to use that instead." Reduces the classical trial-and-error and often leads to more precise follow-up questions.

Cognitive verifier — break the question into sub-questions

This pattern forces the model to decompose a complex question into several simpler ones and to combine the sub-answers into an overall response. Who answers the sub-questions is a design choice: in the following example it is the user, but the model can also answer them itself. Example: "When I ask you a question, produce three additional questions that help answer the original better. Once I've answered the three additional ones, combine the answers and produce an output for my original question." Upside: well-reasoned answers on complex topics. Downside: the sub-questions are not always answerable.
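
The flow of the pattern can be sketched as a small function. This is an illustration under assumptions: `ask_model` stands for whatever call sends a prompt to an LLM, `answer_sub_question` for whoever supplies the sub-answers (user or model), and `fake_model` is a stub that replaces a real model so the sketch runs on its own.

```python
from typing import Callable


def cognitive_verifier(
    question: str,
    ask_model: Callable[[str], str],
    answer_sub_question: Callable[[str], str],
    n: int = 3,
) -> str:
    """Generate n sub-questions, collect answers, ask for a combined answer."""
    raw = ask_model(
        f"Produce {n} additional questions, one per line, "
        f"that help answer: {question}"
    )
    sub_questions = [line.strip() for line in raw.splitlines() if line.strip()][:n]
    answered = [f"Q: {q}\nA: {answer_sub_question(q)}" for q in sub_questions]
    return ask_model(
        "Combine these answers into a response to the original question.\n"
        f"Original question: {question}\n" + "\n".join(answered)
    )


def fake_model(prompt: str) -> str:
    """Stub standing in for a real LLM call."""
    if prompt.startswith("Produce"):
        return "How many users?\nWhich region?\nWhat budget?"
    return f"combined answer from {prompt.count('Q: ')} sub-answers"


result = cognitive_verifier(
    "Which database should we use?", fake_model, lambda q: "unknown"
)
```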

Fact check list — a fact list for verification

The model is asked to attach a list of the facts its answer rests on. That list can then be checked one by one. This is one of the most effective levers against hallucination, and it combines well with question refinement and persona.
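
To check the facts one by one, the list first has to be separated from the answer. A minimal sketch, assuming the prompt asks for a trailing section introduced by the marker `FACTS:`; the marker, the names `FACT_SUFFIX` and `split_facts`, and the sample reply are our own illustration, not real model output.

```python
# Assumed instruction appended to the actual question.
FACT_SUFFIX = (
    "\nAfter your answer, add a line 'FACTS:' followed by one bullet "
    "per fact the answer rests on."
)


def split_facts(reply: str) -> tuple[str, list[str]]:
    """Separate the answer from the trailing fact list."""
    answer, _, fact_block = reply.partition("FACTS:")
    facts = [
        line.strip().lstrip("-* ").strip()  # drop bullet characters
        for line in fact_block.splitlines()
        if line.strip()
    ]
    return answer.strip(), facts


# Hand-written stand-in for a model reply.
sample_reply = (
    "TLS 1.3 removes RSA key exchange.\n"
    "FACTS:\n"
    "- TLS 1.3 is specified in RFC 8446\n"
    "- Static RSA key exchange was removed"
)
answer, facts = split_facts(sample_reply)
```

Each entry in `facts` can then be handed to a reviewer or a lookup step; a reply without the marker yields an empty list, which is itself a signal worth flagging.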

Reflection — explain the reasoning

The model is asked to explain why it answered the way it did. "For every answer, please explain the reasons and assumptions that led to it." Upside: makes the output checkable for domain experts. Downside: the model can invent reasons that have nothing to do with the actual reasoning — explanation is not proof.

Flipped interaction — let the model ask the questions

For complex tasks, the model often doesn't know enough to give a useful answer. The pattern reverses the roles: "From now on, ask me questions about deploying a Python application on AWS. Once you have enough information, produce a Python script." The model leads the conversation until enough context is in place. Result, in our experience: a noticeably higher hit rate on ambiguous tasks.
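
In application code, the reversed roles become a loop that runs until the model signals it has enough context. A sketch under assumptions: the stop marker `READY` is a convention we invent here, `ask_model` and `ask_user` are placeholders for the real model call and the real user interface, and `fake_model` is a stub so the example runs without an LLM.

```python
from typing import Callable

DONE = "READY"  # hypothetical marker the model emits before its final answer


def flipped_interaction(
    goal: str,
    ask_model: Callable[[list[str]], str],
    ask_user: Callable[[str], str],
    max_turns: int = 10,
) -> str:
    """Let the model ask questions until it signals it has enough context."""
    transcript = [
        f"From now on, ask me questions about {goal}, one at a time. "
        f"Once you have enough information, reply with '{DONE}' "
        "followed by the result."
    ]
    for _ in range(max_turns):
        reply = ask_model(transcript)
        if reply.startswith(DONE):
            return reply[len(DONE):].strip()
        transcript += [f"MODEL: {reply}", f"USER: {ask_user(reply)}"]
    raise RuntimeError("model did not finish within max_turns")


def fake_model(transcript: list[str]) -> str:
    """Stub: asks two questions, then finishes."""
    questions = ["Which AWS service?", "Which Python version?"]
    turn = (len(transcript) - 1) // 2
    return questions[turn] if turn < len(questions) else f"{DONE} deploy.py ready"


result = flipped_interaction(
    "deploying a Python application on AWS", fake_model, lambda q: "n/a"
)
```

The `max_turns` limit is a design choice: without it, a model that never emits the marker would keep the dialogue going indefinitely.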

Meta language creation — a mini-language for the problem

When the domain has a natural notation, it pays to calibrate the model on it. "2B,4N denotes user names with two random letters followed by four random digits. Generate ten user names." Saves description overhead and reduces ambiguity — provided the notation is clearly documented.
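
A documented notation has a second benefit: the model's output can be checked mechanically. The sketch below assumes one possible reading of the notation (a count followed by `B` for a letter or `N` for a digit, tokens separated by commas); the names `SYMBOLS`, `spec_to_regex` and `conforming` are hypothetical.

```python
import re

# Assumed reading of the notation: B = letter, N = digit, with a count prefix.
SYMBOLS = {"B": "[A-Za-z]", "N": "[0-9]"}


def spec_to_regex(spec: str) -> re.Pattern:
    """Translate a spec such as '2B,4N' into a compiled regular expression."""
    parts = []
    for token in spec.split(","):
        match = re.fullmatch(r"(\d+)([A-Z])", token.strip())
        if match is None or match.group(2) not in SYMBOLS:
            raise ValueError(f"unknown token: {token!r}")
        parts.append(f"{SYMBOLS[match.group(2)]}{{{int(match.group(1))}}}")
    return re.compile("".join(parts))


def conforming(names: list[str], spec: str) -> list[str]:
    """Keep only the names that match the spec exactly."""
    pattern = spec_to_regex(spec)
    return [name for name in names if pattern.fullmatch(name)]


# Hand-written candidates standing in for model output.
valid = conforming(["ab1234", "abc123", "xy0007", "a1b234"], "2B,4N")
```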

The others in the catalog

The catalog also defines seven further patterns: output automater (the LLM produces a script that automates tasks), visualization generator (output suitable for a graphics tool such as Graphviz or DALL·E), game play (the model generates a game on the chosen topic), infinite generation (unbounded output without re-prompting, useful for test data), alternative approaches (the model suggests alternative solution paths), refusal breaker (rewording a request to get around filter rules; it comes with clear ethical caveats and high abuse potential), and context manager (narrowing or widening the topical focus). Sixteen patterns in total: not a required reading list, but a toolbox you pick from as the situation demands.

Combination and the iterative process

Patterns are useful on their own, but they show their real strength in combination. Three pairings that show up repeatedly in project work:

Persona + cognitive verifier — "Act like a senior architect. When I ask you a question, produce three sub-questions, answer them, and combine the answers." Result: deeper answers in a coherent professional voice.
Template + fact check list — structured JSON output with an attached fact list. Machine-readable and verifiable — the foundation for any audit process.
Flipped interaction + recipe — the model asks questions until enough context is in place, then delivers a numbered guide. Works particularly well for complex setup tasks.
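
Pairings like these can be assembled from reusable phrasing fragments. A minimal sketch, assuming the fragments are kept in a dictionary; the `PATTERNS` table, its keys and the `compose` helper are our own illustration.

```python
# Hypothetical fragment library; wording follows the examples in this article.
PATTERNS = {
    "persona": "Act like {role}.",
    "cognitive_verifier": (
        "When I ask you a question, produce three sub-questions, "
        "answer them, and combine the answers."
    ),
    "fact_check_list": "Attach a list of the facts your answer rests on.",
}


def compose(names: list[str], **slots: str) -> str:
    """Join the fragments of the chosen patterns into one prompt preamble."""
    return " ".join(PATTERNS[name].format(**slots) for name in names)


preamble = compose(["persona", "cognitive_verifier"], role="a senior architect")
```

Because the order of `names` is preserved, reordering patterns during iteration is a one-line change.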

One important caveat: prompt engineering is iterative. The first attempt is rarely the final one. The cycle typically runs:

Idea
A task to be solved with an LLM — from an application assistant to a code-review helper. Clarity about the desired outcome comes first, not the choice of model.
Implementation
A first prompt, built from the patterns that fit. This is where a catalog earns its keep: instead of starting with a blank page, you combine known building blocks.
Output analysis
Does the model actually answer the original question? Are there hallucinations, formatting errors, omissions, unwanted pleasantries? Most of the work tends to live here.
Improvement
Swap patterns, reorder, add examples, sharpen constraints. Then back to implementation. Three to five iterations are normal before a prompt is production-ready.

What matures gets versioned — as an artefact in the repository, with tests that compare its output quality against earlier versions. The same level of discipline as for code.
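
What such a test might look like, as a sketch under assumptions: each prompt version is scored by the share of test cases whose output passes a check, and a new version must not score lower than its predecessor. The names `PromptVersion`, `pass_rate` and `no_regression` are hypothetical, and `fake_model` is a deterministic stub. Real models are not deterministic, so a real suite would likely run each case several times.

```python
from dataclasses import dataclass
from typing import Callable

Case = tuple[str, Callable[[str], bool]]  # (input, check on the model output)


@dataclass(frozen=True)
class PromptVersion:
    version: str
    text: str


def pass_rate(
    prompt: PromptVersion, cases: list[Case], ask_model: Callable[[str], str]
) -> float:
    """Share of test cases whose model output passes its check."""
    passed = sum(check(ask_model(f"{prompt.text}\n{inp}")) for inp, check in cases)
    return passed / len(cases)


def no_regression(
    old: PromptVersion,
    new: PromptVersion,
    cases: list[Case],
    ask_model: Callable[[str], str],
) -> bool:
    """True if the new version scores at least as well as the old one."""
    return pass_rate(new, cases, ask_model) >= pass_rate(old, cases, ask_model)


def fake_model(prompt: str) -> str:
    """Stub standing in for a real LLM call."""
    return '{"ok": true}' if "JSON" in prompt else "Sure! ok"


cases = [("Is the service up?", lambda out: out.startswith("{"))]
v1 = PromptVersion("1.0.0", "Answer briefly.")
v2 = PromptVersion("1.1.0", "Answer briefly, as a JSON object.")
ok = no_regression(v1, v2, cases, fake_model)
```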

Practical note

Patterns are not a guarantee of good results. They are a vocabulary that structures the discussion — and a starting point that shortens the first iteration. The second, third, fourth iterations remain real work on the concrete use case.

Recommendation — how we use patterns in projects

Three observations from project work that mark the difference between "knowing the patterns" and "using them effectively".

Introduce patterns at team level, not in solitude. The value of a pattern catalog comes from the shared vocabulary. If three engineers use three different names for the same pattern, the value evaporates. A shared glossary entry and one or two examples in the wiki are often worth more than a thick concept paper.
Prompts are artefacts, not throwaway inputs. Anyone shipping LLM-backed features should treat prompts like code: versioned in the repository, with tests, with code reviews. A mature prompt is not retyped ad hoc each time, and it never reaches production unreviewed.
Patterns complement, they don't replace. Threat modelling, data-protection review and hallucination testing remain separate duties. A neat persona pattern does not guarantee output quality if nobody checks whether the answer is correct.

Recommendation

In our projects with AI agents, from citizen-service telephony to internal support bots to agentic workflow systems, prompt engineering is not a side activity but a dedicated track in the backlog. We maintain an internal pattern catalog that adds a layer on top of the White et al. patterns with our project-specific domain findings: citizen communication, application handling, OZG forms. That turns a collection of generic patterns into a library that actually holds up in concrete cases and grows with every project.

Prompt-engineering workshop or pattern library for your team?

We work with your team on a first set of patterns for your concrete use cases — customer communication, line-of-business workflows, internal assistants. The outcome: a library that can be evolved in sprints rather than a one-off concept paper.
