CORE IDEA: We should have one schema everywhere, rather than scattering schemas all across an application.
Schemas exist to facilitate communication and aid comprehension. A schema defines expectations about what kinds of content (below I will call them "types") will be sent or received, and helps with the interpretation of the message.
Schemas exist in human life wherever communication takes place; they even predate writing. Even when communication is oral, there are certain expectations about the "type" of response one might give to a question. Consider the following exchange:
Indeed, humor often leverages a "schema violation" to surprise the hearer as the core of the joke. See Abbot and Costello's famous "Who's on first?" sketch.
After writing, schemas became even more important, because readers consume written content asynchronously. The writer wouldn't necessarily be around to clarify the statement in the case of a schema violation.
Over time, we developed conventions for specifying the "type" a writer was using when they wrote down a message. Tables and spreadsheets are one such convention: each column has a specific "type."
(picture of a ledger)
Consider the following:
(picture of a check with the incorrect "type" in all of the fields)
When the bank receives the above check, it will be unable to deposit it: the check fails "schema validation." The problem is not whether a specific account holds enough money. Rather, the message is nonsensical because the types of the fields do not conform to the expected schema.
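To make "schema validation" concrete, here is a minimal sketch in Python. The field names (`date`, `payee`, `amount`, `memo`) are invented for illustration, not a real banking format; the point is that validation rejects the message on type grounds alone, before any business logic runs.

```python
# A minimal sketch of "schema validation" for a check-like record.
# Field names are illustrative, not a real banking format.
CHECK_SCHEMA = {
    "date": str,      # e.g. "2024-03-01"
    "payee": str,     # who the check is made out to
    "amount": float,  # numeric dollar amount
    "memo": str,
}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(record[field]).__name__}"
            )
    return errors

# A well-formed check passes...
ok = {"date": "2024-03-01", "payee": "Acme", "amount": 45.0, "memo": "rent"}
assert validate(ok, CHECK_SCHEMA) == []

# ...but a check with the wrong "type" in a field fails,
# regardless of what's in the account.
bad = {"date": "2024-03-01", "payee": "Acme", "amount": "forty-five", "memo": "rent"}
assert validate(bad, CHECK_SCHEMA) == ["amount: expected float, got str"]
```

Note that the validator never looks at an account balance; it only checks that the message conforms to the expected shape.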
What are these "kinds" of data, anyway? In computing, we frequently refer to them as "types." There are only a few "basic types," the raw building blocks: numbers, text, booleans, and the like.
Looking carefully at the basic types, we discover that things are much more interesting than they first appear. Text alone can describe the name of a city or a person, a brand of automobile, a type of cloud, a command, and so much more.
We build up a set of complex types from these basic building blocks: cities, states, company names, categories of things, even parts of speech.
So a schema simply defines the type of response that's expected when communicating a message to someone, whether orally or in writing.
Schemas are everywhere in computing systems. Schemas are the types in our programming languages that allow us to communicate with the compiler and with other software developers. Schemas are the descriptions of our databases, defining how data is represented. Schemas are the public fields and methods of a class, describing what functions exist and what argument types each expects. Schemas are our REST APIs, our GraphQL APIs, our tRPC. Any process-to-process communication, passing data back and forth between different computing systems, is described by one of a variety of schema languages. Even event systems within a single program leverage schemas.
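To see how the same logical shape gets restated at each of these layers, here is a hedged sketch (all names invented) of one "User" record described three times: as a language type, as a database schema, and as an API schema.

```python
from dataclasses import dataclass

# One logical shape, described three times in three schema languages --
# the duplication this essay argues against. All names are illustrative.

# 1. The programming-language type (communicates with the compiler and other developers):
@dataclass
class User:
    id: int
    name: str

# 2. The database schema (communicates with the storage layer):
USER_DDL = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"

# 3. The API schema (communicates with external programs), JSON-Schema-style:
USER_API_SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}

# Each layer restates the same two fields; change one and you must change all three.
assert set(User.__dataclass_fields__) == set(USER_API_SCHEMA["properties"])
```

The final assertion only checks that the field names agree; in a real system, nothing enforces that agreement, which is exactly the problem.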
Our user interfaces are also rich in schema. Think about forms, or any data we present on screen: there are certain schemas we expect or accept for input, and certain schemas we expect for output. Just look at a tweet and break down its different elements.
So schemas are everywhere in computing, and we have a variety of ways of making explicit the types of things.
We've established that schemas are everywhere in computing systems. Let's focus on schemas in one specific domain: web and mobile applications.
(diagram of client-server architecture: front-end <> API <> backend <> database)
The default architecture for web and mobile systems today is client-server. With a client-server architecture we disperse the schema across the system in order to avoid over-coupling and to let independent systems develop more quickly. We only allow one program, often called the "backend," to interact with the data directly. Data is stored in a database layer, and the backend usually interacts with it through some type of ORM with a specified schema. The backend knows the schema of the data at rest in the database and knows how to change that data over time. That lets the backend move quickly: when it needs a new feature, it updates the database schema and ships the feature.
That describes merely a single program operating over its own data, with no other programs reading from or writing to it. How do you enable interop with multiple applications? Add an API layer, of course.
And that API layer has a schema as well, usually one different from the database layer's. The argument for creating this additional layer is that you can provide an API to others and hold it steady while modifying the program underneath: migrate the schema, add columns, remove columns, rename columns, all while the API stays fixed. You're making a commitment, a contract, to external readers and writers. Again, it's the communication thing: you're saying, hey, we can communicate with outside parties while also changing the way we operate on the inside. That separation of concerns is considered valuable.
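Here is a hedged sketch of that "hold the API steady" pattern. Assume the database migrated a column from `fullname` to `display_name` (both names invented); the API layer absorbs the change so external clients keep seeing the field they were promised.

```python
# A sketch of the "hold the API steady" pattern: the database renames a
# column, but the API layer keeps exposing the old field name to external
# readers. All names are invented for illustration.

# Before the migration, rows looked like {"id": 7, "fullname": "Ada Lovelace"}.
# After a migration renames fullname -> display_name:
db_row = {"id": 7, "display_name": "Ada Lovelace"}

def api_response(row: dict) -> dict:
    """Translate the internal (new) schema back into the public (old) contract."""
    return {"id": row["id"], "fullname": row["display_name"]}

# External clients still see the field they were promised:
assert api_response(db_row) == {"id": 7, "fullname": "Ada Lovelace"}
```

Note where this translation code lives: in the API layer, as a separate body of code that must be written and maintained for every field the contract covers.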
That API layer is an entirely separate body of code from your main program and your database layer, and it typically exposes a limited set of capabilities relative to the program and the data store themselves.
To use an analogy: the data you want is on the other side of a wall, and an API is like punching holes in that wall to make little windows, when what you really want is a door. Through a window you can only peek at one thing at a time, examining each item individually and reconstructing your own copy of the data on your side of the wall. If you don't have enough windows, the whole process is hopelessly inefficient. What you really want is a door to walk through, so you can fiddle and twiddle with all of the data at will. That's not the way APIs work.
What might be a better way? This is the heart of the schema problem: today, the code for reading and writing lives in the API layer. A better way, I believe, is a single schema used by the main writing program and by every other potential reader and writer of the data. We flip the model. Instead of the data living inside the program's purview, the data lives outside it, and the program asks permission to read and write that data. How could that possibly work, when every time the program changes it needs to change the shape of the data? Build the tools into the schema layer so that schemas can diverge while readers and writers can still read and write all versions. The schema becomes a graph of related schemas and the transformations between versions. We transform on read, and whenever data is written, it's written with a specific version attached.
Imagine the Roman state transitioning from Roman numerals to Arabic numerals. A clerk receives a form they're used to seeing with Arabic numerals on it, but this one is still in Roman numerals, and they think, ah, crud. "Does anybody remember how to decipher Roman numerals?" Somebody way back in the office still does, and they decode the form. The important thing is that a reader was around who knew the old version and could transform that data forward into the new Arabic system on read.
I think that's where we're headed: we can change the architecture and simplify the way we build software by removing the API layer and replacing it with a schema migration layer, with good tooling around it, that allows programs to access the data directly. That would let us build much more powerful programs that are natively interoperable with one another and can migrate the schema at will.
Use cases to solve for:
We want our schema to do several things:
There are several places where we need to leverage schema in order to unify around a single usage of schema across the architecture.
Roughly, communication takes place at several junctures:
programmer <> compiler
user <> user interface
program <> persistence
program <> program
Solving the different schema communication problems in the post-app architecture
Evolving the schema over time
Schemas for system interop
As an aside, LLMs are restoring some of the flexibility we had when humans were the primary interpreters of schemas. Today, our computing systems are very bad at dealing with schema violations[1]. There are long-running debates among programming-language communities about how to handle data that is incorrect from the schema's perspective: compile-time type checking versus dynamic typing. Even at runtime there are different ways of handling type errors: bubble them up to the user, bubble them up to the program, decide how the program should fail when a schema is violated.
The interesting thing is that humans are really good at reinterpreting schemas in the case of a violation; we can deal with fuzziness. LLMs have this same capability. Putting an LLM into the mix brings back the fuzzy interpreter we had when humans did all the interpretation of messages. Marrying an LLM as a fuzzy interpreter with a precise interpreter like a computer program is a huge benefit that should extend the power and range of our computing systems and make them easier to program and to understand. It will make them more non-deterministic, but also more permissive in what they accept. See Postel's Law: "Be conservative in what you do, be liberal in what you accept from others."
What I care most about is enabling multiple web applications, mobile applications, command-line tools, all of these things, to interoperate over a shared set of user data that spans all sorts of different types, schemas, and expectations, while the applications that are reading, modifying, and writing it all work over the same set of data. How do we enable this kind of thing? What does schema have to say about it?
See Code as Law paper↩︎