Unifying the Schema with Local-first Architecture

Have you ever needed to add a single field to a form on a web app and, several hours later, found yourself with ten different files open across multiple repositories, adding a line here and a line there just to get the data to flow through the whole stack properly?

Today's web and mobile apps spread schema and type definitions all across the stack. The shape of the data is defined in many different places, often with small variations between them, leading to duplication, subtle bugs, and general developer unhappiness. Local-first app architecture, however, offers an opportunity to unify all of these schema and type definitions into a single definition. This article shows how that works.

In this article, we will:

Look at an example client-server app and consider how a simple change propagates through the app.
Introduce a local-first architecture and describe how it differs from client-server architecture.
Show the same example app as a local-first app and see how unifying the schema makes the same change much simpler.
Consider some other use cases that a unified schema definition makes possible.

Before we dig into the examples, let's define the term "schema" as I use it in these articles.

Schema defines the shape of the data

When you hear the word "schema," what comes to mind? Chances are, you think of a database schema. In a relational database, data is stored in rows within tables, and we typically define the schema using SQL or in code using an ORM. The database's schema defines the shape of the data in the database: the names of the columns, which columns are in which tables, what constitutes valid data, and even the default values for columns.

But schema doesn't just show up in databases. Schema turns up everywhere in software. Rather than thinking of schema as simply a specification for the database, consider this expanded definition:

A schema is an explicit, formal definition of the shape of some data: the names and types of fields and what qualifies as valid data for those fields.

By this definition, schema shows up in many places. Let's look at three: database schemas, domain objects, and types.

Database Schemas, Domain Objects, and Types are all about Schema

Traditionally, there has been a divide between database schemas, enriched domain objects, and types.

Database schemas act as the database's specification: they define the shape of the data at rest, including tables, column names, column types, and default values.

Domain objects tend to define how the data is held in memory and passed around in the program. Think of ActiveRecord, or the models in your favorite JavaScript ORM such as Drizzle. Their validations may be richer than the validations in the database, using arbitrary code such as regular expressions to ensure the data is valid. They may also bundle utility functions for working with the object, such as parsing it from JSON or representing it as a string.

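Types are yet another form of schema, mainly used to help the developer write correct code. Type checkers are usually less expressive than database validations: a TypeScript type can say that a field is a string, but not easily that it is a well-formed email address. And because TypeScript types are erased at compile time, they cannot reject malformed data that arrives at runtime.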

Each of these three expresses facts about the schema: the shape of the data. However, the differences between them lead to subtle differences in capabilities and a lot of duplication. Because each is defined separately in code, changing the shape of the data requires coordinated changes and deployments. You also have to keep them compatible with one another as the 'emergent schema' implied by all of them together evolves.
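To make the duplication concrete, here is a minimal sketch of one Contact shape written three ways in TypeScript. The table definition, the `ContactType` type, and the `Contact` class are hypothetical illustrations, not code from the example app. Notice that the three already disagree about what counts as valid: an empty `name` satisfies the type checker and the database's `NOT NULL` constraint, yet fails the domain object's validation.

```typescript
// Hypothetical Contact shape, written three times as it often is in practice.

// 1. Database schema: the shape of the data at rest (SQL DDL kept as a string here).
const createContactsTable = `
  CREATE TABLE contacts (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email VARCHAR(255)
  );
`;

// 2. Type: checked at compile time only, then erased.
type ContactType = {
  id: number;
  name: string;
  email?: string;
};

// 3. Domain object: runtime validation plus utility functions.
class Contact {
  id: number;
  name: string;
  email?: string;

  constructor(id: number, name: string, email?: string) {
    this.id = id;
    this.name = name;
    this.email = email;
  }

  // Richer than the database: rejects blank names and applies a simplified email regex.
  validate(): string[] {
    const errors: string[] = [];
    if (this.name.trim() === "") errors.push("name must not be empty");
    if (this.email !== undefined && !/^[^@\s]+@[^@\s]+$/.test(this.email)) {
      errors.push("email is malformed");
    }
    return errors;
  }

  // Utility: build a Contact from JSON (trusts the input's shape; the type cannot check it).
  static fromJSON(json: string): Contact {
    const data = JSON.parse(json) as ContactType;
    return new Contact(data.id, data.name, data.email);
  }

  // Utility: human-readable representation.
  toString(): string {
    return this.email ? `${this.name} <${this.email}>` : this.name;
  }
}

// Valid for the type and for the NOT NULL column, but not for the domain object.
const blank: ContactType = { id: 1, name: "" };
const blankErrors = Contact.fromJSON(JSON.stringify(blank)).validate(); // ["name must not be empty"]
```

Every one of these definitions has to be edited when the shape changes, and nothing but discipline keeps their rules aligned.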

Let's look at an example application and observe the effects of a simple change as it touches all of those different definitions of schema.

Simple changes with a client-server application

For the sake of this article, we'll use a simple Contact management app, similar to Contacts on macOS and iOS. The interface looks like so:

The app is built with Next.js and React on the front-end, along with a Prisma client and a few other pieces. I scaffolded it with the t3.gg generator, which worked quite nicely.

Now that we've introduced our example application, let's make a simple change: we need to add a new field to the Contact. Let's walk through each of the changes.

We started by updating the user interface to add the new field to the form. When adding the form field, we had to choose the correct type of input field. Even the input field's type is a form of schema. Some form builders are able to infer the type of input field from a data schema.

Furthermore, we had to add client-side validation logic for the field, so that the user gets immediate feedback while entering data instead of waiting on a round trip to the server. That validation logic will end up getting duplicated a few more times.

Next, we had to add the field to our front-end state management solution, Redux, and update the front-end types.

The API layer needed the new field as well, and once again we had to specify its type. We also had to duplicate the validation logic. Even though the input has already been validated on the front-end, we can't assume that every client calling the API validates its data; other clients might send invalid data. Furthermore, the front-end validation logic might drift out of sync with the API's, and the API needs to catch that.

Do we need to update the backend's TypeScript types?

In the ORM, we had to add a column to the table, once again specifying its type and, once again, its validation logic. Just as with the API, validation here is not optional: we can't risk bad data getting into the database, because that could lead to runtime errors wherever the application later manipulates that data.

Notice the number of places we had to update the schema in order to make this simple change. The data type of the new field is repeated in the UI, the front-end, the API, the ORM, and the database schema. The validation logic exists in at least three places: the front-end state management solution, the API, and the database schema. Furthermore, all these duplicated definitions must be kept in sync and rolled out simultaneously.
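The validation half of that duplication is where bugs tend to hide. The sketch below assumes a hypothetical `phone` field (the article does not say which field is added) and gives each layer its own hand-maintained copy of the rule. In this sketch the API's copy was tightened later without updating the front-end copy, so a value the form happily accepts is rejected only after a round trip to the server.

```typescript
// Hypothetical `phone` field: three separate copies of "what is a valid phone number".

// Front-end copy (form / Redux): digits, spaces, and dashes, optional leading "+".
function clientValidatePhone(phone: string): boolean {
  return /^\+?[0-9 -]{7,20}$/.test(phone);
}

// API copy: later tightened to digits only, but the front-end copy was never updated.
function apiValidatePhone(phone: string): boolean {
  return /^\+?[0-9]{7,15}$/.test(phone);
}

// ORM / database copy: only a length limit, mirroring a VARCHAR(20) column.
function dbValidatePhone(phone: string): boolean {
  return phone.length <= 20;
}

type Outcome = "saved" | "rejected by client" | "rejected by API" | "rejected by database";

// Simulates one submission flowing through every layer of the stack.
function submit(phone: string): Outcome {
  if (!clientValidatePhone(phone)) return "rejected by client";
  if (!apiValidatePhone(phone)) return "rejected by API";
  if (!dbValidatePhone(phone)) return "rejected by database";
  return "saved";
}

submit("5551234567");   // "saved"
submit("555-123-4567"); // "rejected by API", although the form accepted it
submit("call me");      // "rejected by client"
```

No single layer is wrong on its own; the bug only exists in the gap between the copies.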

This pull request shows all of these changes made against our example application.

What are some ways we could make this better?

Existing solutions for unifying the schema across the stack

There are some approaches that help reduce the duplication.

"Full-stack frameworks" like Ruby on Rails and, more recently, Redwood.js and others. These frameworks focus on defining the database schema using an ORM like ActiveRecord or Prisma and then generating the database migrations and the API layer, and sometimes even the front-end logic and forms. These solutions have fallen out of fashion in favor of splitting the front-end and the back-end with solutions like a Next.js front-end and an entirely separate back-end.

Kent C. Dodds wrote an article entitled "Fully Typed Web Apps" in which he demonstrates using Remix to get end-to-end type safety, starting from a Prisma database definition.

These solutions manage to unify the schema across some layers of the stack, though not all of them.

The reality is that the mainstream client-server architecture makes it challenging to fully unify the schema across the entire application. To fully unify the schema, we need to change the architecture.

Local-first Architecture Flattens the Stack

The typical client-server application has four parts: a client (sometimes called the frontend) running in the browser or as a native app; a database running on a server; a backend containing the business logic, which typically wraps the database; and an API between the frontend and the backend.

Software built according to local-first software principles follows a different architecture. It starts with one simple change that has far-reaching impact: move the database to the client. Local-first principles require the software to run on the user's device and work offline, so the database must live on the client. Every client now has a local copy of the database, and we rely on special data structures, such as CRDTs (conflict-free replicated data types), to keep those copies synchronized across clients. Synchronization can happen either peer-to-peer or via a generic sync server.
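Libraries such as Automerge implement rich replicated data structures for keeping local copies in sync. As a much simpler illustration of the idea, and not Automerge's algorithm, here is a sketch of a last-writer-wins (LWW) map: every write carries a timestamp and a replica ID, and merging keeps the newest write for each field. The `Stamp` shape and the use of plain numeric timestamps are assumptions made for brevity; real systems usually rely on logical or hybrid clocks rather than trusting device wall clocks.

```typescript
// Minimal last-writer-wins (LWW) map: a simplified CRDT, not Automerge's algorithm.
// Assumes every write gets a unique (time, replica) stamp.
type Stamp = { time: number; replica: string };
type Entry<V> = { value: V; stamp: Stamp };
type LWWMap<V> = Record<string, Entry<V>>;

// Total order on stamps: later time wins; replica ID breaks ties deterministically.
function newer(a: Stamp, b: Stamp): boolean {
  return a.time !== b.time ? a.time > b.time : a.replica > b.replica;
}

// Local write: only replaces the current entry if the new stamp is newer.
function set<V>(map: LWWMap<V>, key: string, value: V, stamp: Stamp): void {
  const current = map[key];
  if (current === undefined || newer(stamp, current.stamp)) {
    map[key] = { value, stamp };
  }
}

// Merge two replicas' states. The winner for each key depends only on the stamps,
// so merging is commutative, associative, and idempotent: replicas that have seen
// the same writes converge regardless of the order in which they received them.
function merge<V>(a: LWWMap<V>, b: LWWMap<V>): LWWMap<V> {
  const result: LWWMap<V> = { ...a };
  for (const key of Object.keys(b)) {
    const entry = b[key];
    set(result, key, entry.value, entry.stamp);
  }
  return result;
}

// Two devices edit the same contact while offline, then sync in either order.
const laptop: LWWMap<string> = {};
const phone: LWWMap<string> = {};
set(laptop, "name", "Ada Lovelace", { time: 1, replica: "laptop" });
set(phone, "name", "Ada King", { time: 2, replica: "phone" });
set(laptop, "email", "ada@example.com", { time: 3, replica: "laptop" });

const onLaptop = merge(laptop, phone);
const onPhone = merge(phone, laptop);
// Both now hold name "Ada King" (the later write) and email "ada@example.com".
```

LWW silently discards the losing concurrent write, which is one reason production libraries offer richer types (lists, text, counters) on top of the same convergence guarantee.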

Moving the database to the client means the business logic moves as well. Now that the frontend and backend run on the same system, there's no need for an API layer, so it simply disappears. The diagram now looks like this.

This architecture is becoming more common. Several projects and frameworks have already adopted it:

Jazz.tools
Evolu
ElectricSQL?
DXOS

Let's take another look at the Contacts application, but this time as a local-first application with a single, unified schema.

A local-first Contacts app with a unified schema

The DXOS framework follows the emerging local-first architectural pattern described above. The data is stored in ECHO, a key-value store based on Automerge, and is synchronized peer-to-peer over WebRTC using libp2p. Identity and encryption are provided via public-private keypairs. ... some more description.
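The core idea of a unified schema can be sketched without any particular framework. The following is plain TypeScript, not the DXOS or ECHO API; `FieldDef`, `ContactSchema`, `Infer`, `validate`, and `formFields` are hypothetical names. A single `ContactSchema` object is the only place the Contact's shape is written down, and the static type, the runtime validator, and the form's input types are all derived from it. With the database on the client and no API layer in between, this one validator can run right before each write to the local store. Adding a hypothetical `phone` field is a one-line change.

```typescript
// Plain-TypeScript sketch of a unified schema; not the DXOS/ECHO API.
type FieldDef = {
  kind: "string" | "number" | "boolean"; // runtime type of the value
  input: string;                          // HTML <input> type for generated forms
  required: boolean;
  pattern?: RegExp;                       // optional extra validation for strings
};

// The single definition of a Contact's shape.
const ContactSchema = {
  name: { kind: "string", input: "text", required: true },
  email: { kind: "string", input: "email", required: false, pattern: /^[^@\s]+@[^@\s]+$/ },
  phone: { kind: "string", input: "tel", required: false }, // the new field: one line
} as const;

// Derive the static type: required fields become required keys, the rest optional.
type KindToType = { string: string; number: number; boolean: boolean };
type Infer<S extends Record<string, FieldDef>> = {
  -readonly [K in keyof S as S[K]["required"] extends true ? K : never]: KindToType[S[K]["kind"]];
} & {
  -readonly [K in keyof S as S[K]["required"] extends true ? never : K]?: KindToType[S[K]["kind"]];
};
type Contact = Infer<typeof ContactSchema>; // { name: string } & { email?: string; phone?: string }

// Derive runtime validation from the same schema.
function validate(schema: Record<string, FieldDef>, data: object): string[] {
  const record = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of Object.keys(schema)) {
    const def = schema[key];
    const value = record[key];
    if (value === undefined) {
      if (def.required) errors.push(`${key} is required`);
      continue;
    }
    if (typeof value !== def.kind) {
      errors.push(`${key} must be a ${def.kind}`);
    } else if (def.pattern && typeof value === "string" && !def.pattern.test(value)) {
      errors.push(`${key} is malformed`);
    }
  }
  return errors;
}

// Derive form inputs from the same schema.
function formFields(schema: Record<string, FieldDef>): { name: string; type: string; required: boolean }[] {
  return Object.keys(schema).map((name) => ({
    name,
    type: schema[name].input,
    required: schema[name].required,
  }));
}

const ada: Contact = { name: "Ada", email: "ada@example.com" };
validate(ContactSchema, ada);                 // []
validate(ContactSchema, { email: "nope" });   // ["name is required", "email is malformed"]
formFields(ContactSchema).map((f) => f.type); // ["text", "email", "tel"]
```

In this sketch, changing the Contact means editing `ContactSchema` alone: the type, the validation, and the form all follow, because there are no separate API or server-side database definitions left to update.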
