The Published Language an LLM Cannot Give You
This is the first deep dive in a series that follows from an opening post called "Context Maps in the Age of AI." If you have not read it, the one-paragraph version is enough to follow along. The argument there was that Context Maps were never really about systems, they were about coupling between models, teams, and influence, and that AI does not invalidate the patterns so much as reveal which of them were load-bearing all along and which were only scaffolding. That opening post walked the whole catalog at survey depth and deliberately refused to resolve the question pattern by pattern. The deep dives are where the resolving happens. Each one takes a pattern, or a pair of them, and does the work the survey only set up.
This is the first of those, and it takes the pair where AI changes not the economics of a relationship but the actor on the other end of it.
The opening post made two claims in passing and never stopped to defend either. The first is that a raw language model is, on a Context Map, an upstream Big Ball of Mud: an upstream you sit downstream of, with no clean boundary of its own. The second is that every MCP server an organization stands up is an Open-Host Service commitment, whether anyone called it that or not. The survey could give each of them a sentence. This post pays them both off.
Let me be precise about how I am going to treat the LLM, because the loose version of this argument is the wrong one. I do not mean that a language model behaves a bit like a Big Ball of Mud, as a turn of phrase. I mean it is one, in the exact sense the catalog intends: a region where models are mixed, boundaries are inconsistent, and the honest response is to draw a boundary around the mess and refuse to let it spread. Evans arrives at the same place from the other direction and treats the LLM as a bounded context in its own right, with its own language and its own consistency model. Put the two together and the shape of this post falls out. The LLM is an upstream Big Ball of Mud. The MCP server is the Open-Host Service you publish in front of it. And the thing worth staring at is that those are one artifact seen from two sides. Published for others to build on, it is a host service that puts you upstream of consumers you will never meet. Wrapped around a language model, it is also the only boundary standing between you and an upstream that has no stable boundary of its own. Keep both sides in view, because the rest of this post is about what that position costs you and what it buys you.
The connection is the cheap part
When we talk about MCP, the conversation almost always goes toward connectivity. You have tools and data on one side, a language model that needs to reach them on the other, and MCP is the standard socket between them. The line everyone repeats is that it is the USB-C of AI. Plug things in and they work.
That is true, and it is the part that matters least. Plugging things in is call flow, and call flow is exactly the flow the opening post argued AI has driven close to free. If lowering the cost of wiring were all MCP did, it would earn a paragraph in a release note, not a deep dive.
What the connectivity framing hides is on the other side of the socket. When you stand up an MCP server you are not consuming an integration, you are publishing one. You are opening a service to consumers you do not control, cannot enumerate, and will never meet, and you are promising, implicitly, that the shape of what you expose is stable enough to build on. That is not a connectivity decision, it is a strategic position on your Context Map. The server is the Open-Host Service from the intro: published so that others can integrate against it, which is the move that puts you upstream of everyone who connects.
Most teams I work with have never said it in those words, much as the architects in the opening post had never named their instinct as sorting load-bearing from scaffolding. They exposed some tools and moved on. And the doubleness from the intro is sitting right there in the same server: the capability behind those tools is, more often than not, a language model rather than code the team wrote. So the server faces two ways at once. Take the familiar direction first, the Open-Host Service, because even that side is less settled than it looks.
The pattern is older than the protocol
Read the definition of an Open-Host Service as if you had never heard of MCP. It says to define a protocol that exposes your subsystem as a set of services, to open that protocol so anyone who needs to integrate can use it, and to keep growing it to meet new integration needs rather than bending it to any single consumer's quirks. That is a specification for an MCP server, and it was written in 2003.
One interface for all consumers, not a bespoke translation per client. A contract you publish once rather than renegotiate caller by caller. In my book I argued that the provider of an Open-Host Service is almost always upstream, for a reason that takes one sentence: the actions of the team providing the service land directly on everyone consuming it. Change the service and your consumers inherit the change, asked for or not. That is what upstream means, and it is the seat you take the moment you publish a server.
So the protocol did not invent the pattern. It industrialized it. Before MCP you could stand up an Open-Host Service, and plenty of us did, by publishing a stable REST resource or a well-known event stream. What MCP changed is the cost. The socket is standard now, and you can generate an MCP server straight from an OpenAPI description, which is worth pausing on: a tool reads an interface you already published and emits the host service for you. The Open-Host Service used to be a deliberate design act. It is turning into a build step. That is what industrialization does to a pattern, and it is the same collapse in call-flow cost the opening post described, arriving at the boundary this time.
Cheap is good news right until you read the rest of the old definition. The pattern assumes the host can enhance and expand the protocol over time, and can hand a single consumer a one-off translator when its needs are idiosyncratic, so that the shared contract stays coherent for everyone else. Every one of those clauses assumes an actor doing the work: a team with intent, keeping the thing simple on purpose. Which sets up the question this whole post turns on. When the capability behind your Open-Host Service is not code your team wrote but a language model you call, who is enhancing the protocol? Who is keeping it coherent? Who, in any meaningful sense, is the host?
The first thing to be clear about is that the server cannot be the host. A server is a surface, and a surface accepts no obligations. The host is a team. The MCP server is only the place through which some team accepts the upstream responsibility, and if no team has accepted it, then you have published an Open-Host Service with no host behind it, which is the most dangerous reading of the question, and one a later post in this series comes back to.
The mud runs to the edge of the boundary
So take the other direction now, the one behind the server.
The Big Ball of Mud came into the context-mapping catalog through Vaughn Vernon. In Implementing Domain-Driven Design he puts it this way:
"As we survey existing systems, we find that, in fact, there are parts of systems, often large ones, where models are mixed and boundaries are inconsistent. Draw a boundary around the entire mess and designate it a Big Ball of Mud. Do not try to apply sophisticated modeling within this Context. Be alert to the tendency for such systems to sprawl into other Contexts."
We usually point that at a legacy system, some decade-old billing monolith where several domain models grew into each other until nobody can say where one ends.
A large language model fits that description, and I mean it structurally, not as a quip. Inside the monolith are a handful of tangled domain models. Inside an LLM is every domain model at once: every vocabulary, every contradictory way anything in its training data was ever described, with no boundaries between them whatsoever. The monolith at least had a schema you could read. The LLM has weights. By Vernon's own test, models mixed and boundaries inconsistent, a language model is a Big Ball of Mud, and a thoroughgoing one.
On its own, that internal mess is not the claim that matters to an integrator. It matters because of how it surfaces on the map: as an upstream context with no usable boundary to integrate against, which is the only thing your Context Map can actually represent about it.
The mixing inside is not even the dangerous part, because you were never going to integrate against the internals of a legacy system either. The danger is at the boundary. A legacy system, however foul its internals, answers you in a fixed shape: a row, a record, a SOAP envelope, something whose boundary has a definite edge you can write a translation against. Ask a raw language model for a credit decision and you get back natural language, maximally expressive, structured by nobody, and different on the next call. The boundary has no edge. The mud does not stop and present you with an interface, it runs right up to where the boundary should be and keeps going. That is what makes a language model not merely messy but a different kind of upstream, and it is why "we will just parse the response" misreads the problem. There is nothing stable to parse until you have built the thing that makes it stable.
This is the one point where I want to borrow Evans's eyes, because he names something here more precisely than my own framing did. Working through his own small AI integration, he insists that "LLM" is the wrong name for the bounded context on the other side. The context is not the abstract category, it is Claude Sonnet 3.5 specifically, with its own capabilities and interface and quirks, a different context from GPT or Mistral or even a later Claude. That precision matters to my argument more than it does to his. If the upstream Big Ball of Mud were "the LLM" in general, you could imagine writing one durable boundary against it. But you are not downstream of a category, you are downstream of one frozen language model with one set of behaviours, and when the provider swaps it for the next version you are downstream of a different Big Ball of Mud wearing the same URL. The two problems compound: an upstream with no stable surface of its own, that is also replaced underneath you without notice.
Put those together and the consequence is flat and load-bearing. A raw language model has no Published Language at its boundary, because it establishes none, and what surface it does present is not stable across versions. The expressive natural-language surface is the opposite of an agreed exchange format. So if a Published Language is going to exist at that boundary, something downstream has to bring it and hold the LLM to it. The LLM will not meet you halfway. It does not know there is a contract.
Be precise about what the LLM cannot do here, because the seductive part is that it looks like it can. Ask a language model to propose a vocabulary for credit ratings and it will give you a plausible one. But proposing terms is not the same act as establishing them. A Published Language is a social artifact before it is a technical one: it exists because several parties, the people who actually own the domain, have agreed to use words in a fixed way and to hold each other to it. The LLM can draft candidate language all day. It cannot be a party to the agreement, because it cannot make or keep a commitment. Which means the work that turns a schema into a Published Language is not a modelling task you can hand to the upstream. It stays with the domain experts on your side of the boundary, exactly where it always was.
Which closes the loop with the first half of the post. The MCP server is where that brought-in boundary lives: the structured surface, the tool contract, the schema, laid over an upstream that has no surface of its own. From the front it is an Open-Host Service to your consumers. From the back it is the one place the mud is forced out in a fixed shape. One artifact, both jobs. And that is what turns the schema into the real question of the post: a schema gives the mud a shape, but a shape is not yet an agreement. My own book says so. The clearest way to see why is to read an actual contract.
A contract, read closely
Here is a deliberately small one, built on the case study I use in my book and my trainings: Big Pug Loans, a retail bank's platform for selling mortgage loans to private customers. One of its bounded contexts, Scoring, assesses the risk of each applicant. In the original case study Scoring gets a creditworthiness rating from a Credit Agency over a conventional interface. Imagine that rating now comes from a language model that reads the applicant's submitted documents instead. Sanitized and simplified from work I have seen, this is the kind of tool contract a team exposes through an MCP server:
```json
{
  "name": "assess_credit_risk",
  "description": "Returns a credit risk assessment for a loan applicant based on submitted financial documents.",
  "input_schema": {
    "type": "object",
    "properties": {
      "applicant_documents": {
        "type": "array",
        "items": { "type": "string", "description": "Document text" }
      }
    },
    "required": ["applicant_documents"]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "risk_rating": { "type": "string" },
      "rationale": { "type": "string" }
    },
    "required": ["risk_rating", "rationale"]
  }
}
```

This validates. It will pass review, ship, and serve traffic. It is also, in the sense that matters, not yet a Published Language, and the gap is the whole point.
In my book I make a distinction that does all the work here. A formalized interface is not the same thing as a Published Language. You can express the model of an Open-Host Service as an XSD, or here as a JSON Schema, and have something perfectly translatable and still not very expressive, because formalization alone is a low bar. What actually makes a Published Language is that several parties have agreed on it and established it. The shape is necessary. The agreement is what was missing.
Look at risk_rating. The schema says it is a string. It does not say which strings. So the LLM returns "high" one week, "High Risk" after a version change, "elevated" for an applicant a loan officer would have called moderate, and every one of those passes validation, because the schema constrains the shape and says nothing about the meaning. The contract is formalized. It is not agreed. Nobody on either side has established what the permitted ratings are, what each one means in this bank's lending policy, or what has to be true of an applicant before "high" is allowed.
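A minimal sketch of that gap, in Python. It does not use a real JSON Schema validator; `conforms_to_output_schema` is a hypothetical hand-rolled check that mirrors only what the loose output schema above states (two required string fields), and the three responses are the illustrative ones from the paragraph above.

```python
# Hand-rolled stand-in for validating against the loose output schema.
# It checks exactly what that schema states: both fields present, both strings.
REQUIRED_STRING_FIELDS = ("risk_rating", "rationale")


def conforms_to_output_schema(response: dict) -> bool:
    """Return True if the response has the shape the loose schema demands."""
    return all(
        isinstance(response.get(field), str) for field in REQUIRED_STRING_FIELDS
    )


# Three illustrative upstream responses, each using a different vocabulary.
responses = [
    {"risk_rating": "high", "rationale": "Two missed payments."},
    {"risk_rating": "High Risk", "rationale": "Two missed payments."},
    {"risk_rating": "elevated", "rationale": "Irregular income."},
]

results = [conforms_to_output_schema(r) for r in responses]
distinct_ratings = {r["risk_rating"] for r in responses}

print(results)                 # [True, True, True]
print(len(distinct_ratings))   # 3
```

Every response passes, and the consumer is left holding three different words with no agreed meaning for any of them.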
A schema gives the mud a shape. It does not yet give the boundary a language.
Now tighten it toward one:
```json
"risk_rating": {
  "type": "string",
  "enum": ["LOW", "MODERATE", "ELEVATED", "HIGH"],
  "description": "Maps to Big Pug Bank lending tiers. ELEVATED requires a documented compensating factor in `rationale`."
}
```

The enum is not doing type-checking. It is carrying the bank's lending tiers across the boundary as fixed vocabulary the LLM is not permitted to extend. The description is not documentation, it is the agreement: ELEVATED carries a rule, a compensating factor has to be named in the rationale before the tier is allowed. These are the bank's terms, established by the bank, and the MCP server is where the LLM is held to them. My book makes a point of saying a Published Language need not be industry-wide; it can live inside a single organization, as long as the parties establish it. These four tiers are exactly that: a Published Language with an audience of one bank. (Evans reaches the same mechanism in his own example from the other end, conforming to a national standard taxonomy and rejecting anything the LLM returns outside it. National standard or in-house enum, the move is identical: the vocabulary is established, and the boundary enforces it.)
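One way the enforcing side of that boundary might look, sketched in Python. The names `RiskRating`, `ContractViolation`, and `accept_assessment` are hypothetical, and the ELEVATED check is an assumption: a non-empty rationale is only a crude stand-in for "a documented compensating factor", which a real boundary would need the domain experts to define.

```python
from dataclasses import dataclass
from enum import Enum


class RiskRating(Enum):
    """The bank's four lending tiers, as fixed in the tightened schema."""
    LOW = "LOW"
    MODERATE = "MODERATE"
    ELEVATED = "ELEVATED"
    HIGH = "HIGH"


class ContractViolation(ValueError):
    """Raised when an upstream response breaks the agreed language."""


@dataclass(frozen=True)
class CreditRiskAssessment:
    risk_rating: RiskRating
    rationale: str


def accept_assessment(raw: dict) -> CreditRiskAssessment:
    """Admit a raw upstream response only if it speaks the agreed vocabulary."""
    rating_text = raw.get("risk_rating")
    rationale = raw.get("rationale")
    if not isinstance(rating_text, str) or not isinstance(rationale, str):
        raise ContractViolation("risk_rating and rationale must both be strings")
    try:
        # Deliberately no normalising of near-misses such as "High Risk":
        # the upstream is not permitted to extend the vocabulary.
        rating = RiskRating(rating_text)
    except ValueError:
        raise ContractViolation(f"{rating_text!r} is not an agreed lending tier") from None
    # ASSUMPTION: non-empty rationale as a crude proxy for a compensating factor.
    if rating is RiskRating.ELEVATED and not rationale.strip():
        raise ContractViolation("ELEVATED requires a compensating factor in rationale")
    return CreditRiskAssessment(rating, rationale)


accepted = accept_assessment({"risk_rating": "HIGH", "rationale": "Two missed payments."})
print(accepted.risk_rating.name)  # HIGH
```

Rejecting `"High Risk"` outright, rather than mapping it quietly onto `HIGH`, is the design choice that matters here: a silent mapping would let the upstream's vocabulary leak back in through the translator.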
It is worth being honest that this description is carrying two different things at once. There is the meaning of ELEVATED, what the tier denotes in the bank's domain, and there is a policy about its use, that it may not be applied without a compensating factor. The first is squarely Published Language. The second is closer to a business rule, and a fair reader can object that business rules do not belong in an integration contract's prose. The boundary between the two is real, and worth keeping an eye on, because the failure mode it warns against is a team treating the schema description as the place where domain policy lives. For the small example it is fine to let the agreed vocabulary carry a thin layer of the semantics that makes it meaningful. The point at which the policy gets heavy is the point at which it has outgrown the contract and belongs in the domain instead.
And here is the part to sit with, because it is why this is a beginning and not an end. Everything I just tightened is structural. The enum constrains what the LLM may say. It does not constrain when the LLM is right. A schema-valid, vocabulary-conformant "HIGH" can come back for an applicant who is plainly low risk, and nothing in this contract notices, because the contract describes the shape of a correct answer and has no grip on the correctness of any particular one. That gap is not a flaw in the example. It is the limit of what a schema can do for a probabilistic upstream, even a schema promoted into a Published Language. It is also where the next post in the series begins.
A word on the example I deliberately chose
A careful reader will have an objection by now, and it is the right one: the LLM should not be producing risk_rating at all. The domain-aligned design has the LLM do what it is genuinely good at, reading documents and surfacing evidence, returning something like "documented missed payments in March and April," and leaves the Scoring context to apply its own rules and decide the tier. On that design the credit decision stays inside the bounded context that owns it, and the LLM supplies observations rather than judgments. That instinct is correct, and if you take one piece of design advice from this post, take that one.
I chose the riskier version on purpose, because it is the one teams actually ship and the one where the boundary's limits show most clearly. When the LLM returns evidence, a wrong reading is a wrong fact, and your domain logic still owns the decision. When the LLM returns HIGH, the decision has moved upstream, into an actor that establishes no language, drifts without notice, and cannot be held to account, and a perfectly conformant answer can still be wrong in a way nothing structural will catch. That is the difference between using a Big Ball of Mud as a source of observations and letting it make your domain decisions for you. Which design you are actually running, and who therefore owns the decision, is a question large enough that two later posts in the series are built on it. For this post it is enough to notice that the Published Language sits at the same boundary either way, and that it protects you far better when the thing crossing it is evidence than when it is judgment.
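For contrast, a sketch of the evidence-returning design in Python. Everything specific in it is an assumption made for illustration: the `Observation` shape, the `decide_tier` name, and above all the thresholds, which are invented and are not the bank's policy or part of the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One piece of evidence the LLM surfaced from the documents (hypothetical shape)."""
    kind: str     # e.g. "missed_payment"
    detail: str   # e.g. "documented missed payment in March"


def decide_tier(observations: list[Observation]) -> str:
    """Scoring's own rule: the tier is decided here, not upstream.

    ASSUMPTION: the thresholds below are invented for illustration only.
    """
    missed = sum(1 for o in observations if o.kind == "missed_payment")
    if missed == 0:
        return "LOW"
    if missed == 1:
        return "MODERATE"
    return "HIGH"


# The upstream supplies observations; it never names a tier.
evidence = [
    Observation("missed_payment", "documented missed payment in March"),
    Observation("missed_payment", "documented missed payment in April"),
]
tier = decide_tier(evidence)
print(tier)  # HIGH
```

If the LLM misreads a document here, the result is a wrong `Observation`, which can be checked against the document, while the rule that turns observations into a tier stays in code the Scoring team owns.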
What the boundary buys you when the upstream drifts
Everything so far has been about standing the boundary up. The harder question is what it does for you over time, once it exists and the upstream starts to move. Open-Host Service and Published Language have always travelled together, the established pairing on the upstream side; my own book notes how routinely you see the two combined. What AI changes is not the pairing but the strain on it, because a Published Language is the one thing on the map that can hold still while the thing behind it does not, and a language model behind it does not hold still at all.
I learned this the unglamorous way, long before any of it involved a language model. Years ago I ran a German music magazine, and we wanted concert listings without maintaining them by hand, so we integrated against the API of a well-known music platform. I thought their domain model was excellent, a genuinely good account of concerts, tours, and festivals, exactly how I would have modelled it myself. So I did something I would now flag in a review: I let their model run straight through my application, from the database up to the UI. We were a textbook Conformist, by conviction rather than laziness, and for a while it was wonderful. Then the platform slimmed down its API and discontinued the part we depended on, and the cost of that decision arrived all at once, everywhere their concepts had reached, which was everywhere.
That is the ordinary version of the risk, and it is worth holding next to the LLM version because the difference is the whole point. When the music platform changed, it broke loudly. Endpoints disappeared, calls failed, and however painful the cleanup, I knew the moment it happened and I knew where to look. A Conformist to a conventional upstream at least breaks where you can see it. A language-model upstream does not give you that. When the provider swaps the version behind your MCP server, nothing fails. The calls still return, the schema still validates, the ratings still come back as members of your enum. What changes is that "ELEVATED" now means something slightly different than it did last month, and your Published Language keeps reporting conformance while the meaning underneath it has shifted. The loud break would almost be a mercy. What you get instead is the silent one.
A provider version change is the most obvious source of that silent drift, but it is not the only one, and it may not even be the most common. The same boundary sits in front of a prompt you can edit, a retrieval corpus you can update, a set of examples someone can swap out, and any of those can move the behaviour underneath a conformant schema just as quietly as a version upgrade. Which means the boundary is not only protecting you from an upstream you do not own. It is also where your own organization can introduce drift without noticing, which complicates the neat picture of an external provider as the only thing that moves. That is a thread for a later post, where the question of who actually owns all the moving parts gets its own treatment. For here, the point is narrower: whatever the source, drift registers against a Published Language and hides behind a bare schema.
So the boundary buys you two different things against two different failures, and it is worth separating them. Against a conventional upstream, the value of a Published Language is decoupling: each side translates into the shared vocabulary on its own terms, and the agreement absorbs ordinary change. Against a language-model upstream, the Published Language does something subtler. It cannot stop the upstream from drifting, nothing you own can do that, but it gives the drift a fixed thing to register against. If the bank's four tiers are established and enforced at the boundary, then when the LLM's behaviour moves you have a stable yardstick that the movement can be measured against, instead of a vocabulary that quietly moves along with it. The Conformist has no yardstick, which is why conforming to a language model is the trap the earlier post warned about: you drift with the upstream and never feel the floor move. The imposed Published Language is the floor.
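A sketch of what registering drift against the four tiers could look like, in Python. It assumes something this post has not described: a fixed set of reference applications that you replay through the boundary before and after a suspected change. The function names and the sample counts are hypothetical, and total variation distance is simply one convenient way to compare two distributions.

```python
from collections import Counter

TIERS = ("LOW", "MODERATE", "ELEVATED", "HIGH")


def tier_shares(ratings: list[str]) -> dict[str, float]:
    """Share of each agreed tier in a batch of conformant ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    return {tier: counts[tier] / len(ratings) for tier in TIERS}


def total_variation(baseline: list[str], current: list[str]) -> float:
    """Total variation distance between two tier distributions (0 = identical)."""
    p, q = tier_shares(baseline), tier_shares(current)
    return 0.5 * sum(abs(p[t] - q[t]) for t in TIERS)


# Hypothetical replay of the same 100 reference applications, a month apart.
last_month = ["LOW"] * 50 + ["MODERATE"] * 30 + ["ELEVATED"] * 15 + ["HIGH"] * 5
this_month = ["LOW"] * 40 + ["MODERATE"] * 30 + ["ELEVATED"] * 25 + ["HIGH"] * 5

drift = total_variation(last_month, this_month)
print(round(drift, 3))  # 0.1
```

The comparison only works because the vocabulary is fixed: with free-text ratings there would be no stable categories to count. It also says only that behaviour moved, not which month was right.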
Notice this does not make the boundary a pure Anticorruption Layer, and I do not want to pretend it is cleaner than it is. In practice you conform to some of what the LLM gives you and you isolate yourself from the rest. You accept its natural-language rationale more or less as written, you work with its notion of confidence, you shape your prompts around the way this particular version behaves. On other things, the tiers above all, you refuse to bend and force the upstream to meet your terms. Evans, mapping his own integration, ran into exactly this and decided it was not worth pretending the relationship was all one thing: part Conformist, part Anticorruption Layer, and you pick the aspect worth emphasising. The Published Language is what lets you choose where the line falls. It marks the concepts you will defend, and by omission the ones you have decided you can afford to let through.
What a schema still cannot do
I will not develop this here, because it belongs to its own post, but the boundary of the argument needs naming so the next one has somewhere to stand.
Everything the Published Language does in this post, it does structurally. It fixes the vocabulary, it enforces the shape, it gives drift a yardstick. What it does not do is tell you whether a given answer is correct. A language model can return a perfectly conformant rating that is simply wrong about this applicant, and no schema, however well established as a Published Language, has any grip on that. For a probabilistic upstream the contract about shape turns out to be only half the contract. The other half is a contract about behaviour, the kinds of situations the upstream should handle in particular ways, and that half does not live in a schema at all. It lives in evaluation suites, curated sets of cases where a domain expert has already decided what the right answer is, and the datasets behind them. That is the argument of a later post in this series, and it is the one that I think reframes the Published Language pattern more deeply than anything else AI does to the catalog. For now it is enough to have found the edge of what the schema can carry.
Back to the server
Which brings the two halves back together one last time.
The teams I described at the start were not wrong to invest where they invested. Standing up clean Open-Host Services with stable Published Languages is the right instinct, and under AI it is more right than it was, not less. What I would add to their instinct is only this: when you publish an MCP server, look at both sides of it. In front of you is the Open-Host Service, with all the upstream obligations that have always come with that pattern and do not soften because a protocol made the pattern cheap to stand up. Behind you is an upstream Big Ball of Mud with no boundary of its own, no Published Language you did not impose, and a habit of changing without telling you. The same server is doing both jobs. It is the host you offer outward and the boundary you hold against the mud.
That is why these two patterns are where AI changes the actor rather than the economics. The Open-Host Service and the Published Language did not change their definitions. What changed is who, or what, is standing behind them, and a pattern designed for an upstream you could call on the phone now faces one that does not take calls. The patterns still work. They just have more to hold up than they used to, which is the whole thesis of this series in a single boundary: the wall did not get less necessary because the upstream got stranger. It got more.
What comes next
This post took the two patterns where AI changes the actor and followed them to a single boundary. It also left several threads hanging on purpose, and each one is a post.
The one I am most wary of, and most drawn to, is the one I opened when I chose the riskier example deliberately. Once a language model is producing a domain decision rather than supplying evidence, the question of who actually owns that decision stops being rhetorical. You can have a team that believes it owns its model, with its own code, its own language, its own boundary on the map, while a meaningful share of that model's decisions are quietly shaped by an upstream the team never consciously integrated with and cannot negotiate with. The map says the team is in control. The behaviour says otherwise, and nothing on the map shows the gap. That is the strangest thing I think AI does to the catalog, strange enough that it may need a pattern the catalog does not have yet, and it is the post I am most nervous about getting right.
Closer at hand is the gap I spent this post refusing to close: a schema, even one promoted into a Published Language, fixes what the upstream may say and not whether it is right. For a probabilistic upstream the other half of the contract is about behaviour, and that half does not live in a schema. It lives in the evaluation suites I just mentioned, and I think those artifacts are quietly becoming the real Published Language for a probabilistic upstream. That is the next post, and it follows directly from where this one stopped.
And underneath both is the part of drift I named and handed off: that the behaviour beneath your boundary can move because of your own prompts, your own retrieval corpus, your own examples, not only because a provider shipped a new version. The question stops being how to stay isolated from an upstream you do not own and becomes who owns all the moving parts on your own side, which turns out to be harder to answer than it looks.
References
Eric Evans. Context Mapping with an AI-based Component. Domain Language, January 2026. https://www.domainlanguage.com/articles/context-mapping-an-ai-based-component/ The worked example treating Claude Sonnet 3.5 as a bounded context in its own right, conforming to NAICS as a Published Language and isolating the rest behind a translation boundary. The precise-naming-of-the-context point and the part-Conformist-part-ACL reading in this post both engage with it directly.
Vaughn Vernon. Implementing Domain-Driven Design. Addison-Wesley, 2013. Source of the Big Ball of Mud definition quoted here, and of the canonical catalog phrasings this series builds on.
Michael Plöd. Hands-on Domain-driven Design by Example. The schema-is-not-a-Published-Language distinction, the in-house Published Language argument, the Conformist-by-conviction reading, and the Big Pug Loans case study are all developed there.
Michael Plöd. Context Maps in the Age of AI. The opening post in this series. The three flows, the load-bearing and scaffolding distinction, and the one-line versions of every pattern this post takes up.