Unifying Schema in Local-first Software Systems, Draft 3

Why read this article?

A single resource for understanding schema as it relates to local-first software systems
- Targeted at typical web and mobile developers, and grounded in tools and patterns they already know.
Broaden the idea of what a schema is and where it is used
Demonstrate how the local-first architecture allows a single schema definition to unify the many uses of schema in a software system
Show off some compelling use cases enabled by this single schema definition

Tone:

Conversational, encouraging, discovering something new
Do: try out DXOS and systems like it. Consider how you might unify schema around a single definition.

Consider a typical web app consisting of a front-end, an API, and a back-end. If you wanted to add a new input field to an existing page in your app and store that field in the database, how many places in the codebase would you have to change? How many different systems would have to be redeployed? How many times would you have to retype the name of that new variable and specify its type?

Is there another way?

What if you could define the schema of your data in a single place, and any change to that schema would propagate throughout the system? That's the promise of unifying the schema.

In this article, we will define data schemas broadly and consider how they are used in software systems. Next, we will look at an emerging architecture that "flattens the stack" and show examples of how that architecture lets the schema be defined once and then used throughout the system. Finally, we'll discuss some of the trade-offs of this architecture and describe ongoing research aimed at making it practical.

Schemas describe the shape of data in a software system

When you hear the word "schema", what comes to mind first? Probably a database schema, which defines the shape of the data stored in the database. For relational databases, this is usually done in SQL, one table at a time, like so:

CREATE TABLE Contact (
    Name VARCHAR(255),
    Email VARCHAR(255),
    Color VARCHAR(50), -- Color in hexadecimal, e.g., #FF00CC
    Emoji VARCHAR(8) CHARACTER SET utf8mb4 -- A single valid emoji
);

Rather than thinking of schema as simply a specification for the database, consider this expanded definition:

A schema is an explicit, formal definition of the shape of some data: the names and types of the fields and what it means for the field to contain valid data of that type.
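To make the definition concrete, here is a minimal sketch of a schema written down as plain TypeScript data: each field has a name, a type, and an optional predicate describing what counts as valid. The names `FieldSchema`, `RecordSchema`, `ContactSchema`, and `check` are hypothetical, and the regular expressions are deliberately simplistic.

```typescript
// A hypothetical, minimal encoding of "schema": field names, field types,
// and what it means for a value of that type to be valid.
type FieldType = "string" | "number" | "boolean";

interface FieldSchema {
  name: string;
  type: FieldType;
  // Optional constraint beyond the type, e.g. "is a hex color".
  isValid?: (value: unknown) => boolean;
}

interface RecordSchema {
  typename: string;
  fields: FieldSchema[];
}

const ContactSchema: RecordSchema = {
  typename: "Contact",
  fields: [
    { name: "name", type: "string" },
    {
      name: "email",
      type: "string",
      // Deliberately loose: something@something.something
      isValid: (v) => typeof v === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(v),
    },
    {
      name: "color",
      type: "string",
      isValid: (v) => typeof v === "string" && /^#[0-9A-Fa-f]{6}$/.test(v),
    },
  ],
};

// Returns one message per field that has the wrong type or fails its constraint.
function check(schema: RecordSchema, data: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of schema.fields) {
    const value = data[field.name];
    if (typeof value !== field.type) {
      problems.push(`${field.name}: expected ${field.type}`);
    } else if (field.isValid && !field.isValid(value)) {
      problems.push(`${field.name}: invalid value`);
    }
  }
  return problems;
}

console.log(check(ContactSchema, { name: "Ada", email: "not-an-email", color: "pink" }));
```

Here `name` passes, while `email` and `color` have the right type but are reported as invalid: the type alone is only part of the schema.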

When we consider schema as defining the shape of data, we see that schema shows up in many places in software systems. In fact, schema shows up anywhere there is data, whether stored at rest or passing between two systems. Types, APIs, database schemas, even function calls and user interfaces are rife with schema. Schema is everywhere.

APIs express schema

A common architectural pattern is to place an API layer between the database and the other systems that need access to it. Whether the approach is REST or GraphQL, the general pattern is the same: there are named endpoints, and each endpoint accepts and returns fields with types and constraints. The API layer may also validate fields against rules that go beyond their types, for example checking that an email field actually contains an email address.

example?
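One possible example: a sketch of the validation a hand-written JSON endpoint might perform, assuming a hypothetical `POST /contacts` endpoint. The function name `parseCreateContactBody` and the error messages are made up for illustration; a REST or GraphQL framework would usually express this with its own schema language.

```typescript
// Hypothetical request body for a `POST /contacts` endpoint. This is the
// API layer's own copy of the Contact schema.
interface CreateContactBody {
  name: string;
  email: string;
  color: string;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };

const EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately loose
const HEX_COLOR = /^#[0-9A-Fa-f]{6}$/;

function parseCreateContactBody(body: unknown): ParseResult<CreateContactBody> {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const { name, email, color } = body as Record<string, unknown>;
  const errors: string[] = [];

  // A type check...
  if (typeof name !== "string" || name.length === 0) errors.push("name is required");
  // ...and checks that go beyond the type.
  if (typeof email !== "string" || !EMAIL.test(email)) errors.push("email must be an email address");
  if (typeof color !== "string" || !HEX_COLOR.test(color)) errors.push("color must look like #FF00CC");

  if (errors.length > 0) return { ok: false, errors };
  // The casts are safe: every field was checked above.
  return { ok: true, value: { name: name as string, email: email as string, color: color as string } };
}

console.log(parseCreateContactBody({ name: "Ada", email: "nope", color: "#FF00CC" }));
```

Notice that the field names, the types, and the email and color rules are all restated here, independently of the SQL table.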

Types express schema

In statically typed languages like TypeScript or Java, declaring the types of variables, parameters, and return values is a way of describing schema.
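For example, a TypeScript interface for the `Contact` record from the SQL above captures the field names and their types, but not what makes a value valid. The interface and the sample values are illustrative; making `emoji` optional is an assumption of this sketch.

```typescript
// The TypeScript view of the Contact table: names and types. The
// "hex color" and "single emoji" rules survive only as comments.
interface Contact {
  name: string;
  email: string;
  color: string; // hex color, e.g. "#FF00CC"
  emoji?: string; // optional; the type cannot say "exactly one emoji"
}

const ada: Contact = { name: "Ada", email: "ada@example.com", color: "#FF00CC" };

// This also type-checks, even though "bob" is not an email and "pink" is not a hex color:
const alsoAccepted: Contact = { name: "Bob", email: "bob", color: "pink" };

// Uncommenting this line is a compile-time error, because `email` and `color` are missing:
// const rejected: Contact = { name: "Eve" };

console.log(ada.color, alsoAccepted.color);
```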

Function signatures express schema

The arguments and return types of a function also express schema. Frequently, a group of arguments actually expresses an implicit aggregate type that is not formally represented anywhere else.
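A small illustration: the four parameters of the hypothetical `formatContactLine` below only make sense together, so they amount to an unnamed Contact type. Naming that aggregate (here as the hypothetical `ContactSummary`) turns the implicit schema into an explicit one.

```typescript
// Implicit schema: these four parameters together describe a contact,
// but nothing names that shape, and callers must get the order right.
function formatContactLine(name: string, email: string, color: string, emoji: string): string {
  return `${emoji} ${name} <${email}> (${color})`;
}

// Explicit schema: the same aggregate, named and reusable.
interface ContactSummary {
  name: string;
  email: string;
  color: string;
  emoji: string;
}

function formatContact(c: ContactSummary): string {
  return formatContactLine(c.name, c.email, c.color, c.emoji);
}

// Both calls compile; the second swaps name and email, and the compiler cannot tell.
console.log(formatContactLine("Ada", "ada@example.com", "#FF00CC", "🙂"));
console.log(formatContactLine("ada@example.com", "Ada", "#FF00CC", "🙂"));

// With a named type, each value is attached to its field name.
console.log(formatContact({ name: "Ada", email: "ada@example.com", color: "#FF00CC", emoji: "🙂" }));
```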

User interfaces express schema

The user interface is also riddled with schema, from the display of information to forms used for inputting and updating information.
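As an illustration, here is a hand-written contact form rendered as an HTML string. The field names, input types, and the `required`, `pattern`, and `maxlength` attributes are yet another restatement of the schema. The `renderContactForm` function and the `/contacts` action URL are hypothetical.

```typescript
// A hand-written form restates the Contact schema once more: field names,
// input types, and constraints (required, pattern, maxlength).
function renderContactForm(): string {
  return [
    `<form method="post" action="/contacts">`,
    `  <input name="name" type="text" required>`,
    `  <input name="email" type="email" required>`,
    `  <input name="color" type="text" pattern="#[0-9A-Fa-f]{6}" placeholder="#FF00CC">`,
    `  <input name="emoji" type="text" maxlength="8">`,
    `  <button type="submit">Save</button>`,
    `</form>`,
  ].join("\n");
}

console.log(renderContactForm());
```

Even the `8` in `maxlength` does not quite match the `8` in the SQL: HTML measures `maxlength` in UTF-16 code units, while MySQL measures `VARCHAR` length in characters.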

It should come as no surprise that we find schema littered throughout software systems. Software is about data: displaying it, transforming it, moving it, and storing it. What should be surprising is how many different ways we have of representing the schema of that data.

Let's look at a simple software system and examine the different places where schema is defined.

A typical web application

Consider the architecture of a typical web application: a frontend, an API, a backend, and a database. There are typically many other components, but these form the core of the architecture. In a typical client-server application, the schema is defined separately in each of those individual systems.

A New Outline

Local-first software systems have an opportunity to unify schema across the system because they use a different architectural pattern
- This creates a simpler developer experience and some novel user experiences
- In this article, we will:
  - define schema and consider its usage in software systems
  - differentiate between the layered architecture and local-first architecture
  - explore schema within layered and local-first architectures
  - demonstrate a local-first system with a unified schema
Schemas are an explicit, formalized representation of the shape of data in a software system
- Software systems have many places where we utilize the shape of the data to specify the system
  - Functions
  - APIs
  - Database schemas
  - Types
  - User interfaces
- In most software systems, there exists no unified way to describe the schema for all of the ways that we use it
  - This leads to a lot of "boilerplate" where we have to translate between different shapes of the data, parse, validate, etc.
  - This leads to schema divergence, where different aspects of schema are described or enforced in different areas of the architecture
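To make schema divergence concrete, here is a toy sketch: three hand-maintained copies of the Contact field list, one per layer, plus a hypothetical `findDrift` helper that reports which layers are missing which fields. In a real codebase the copies would be a SQL table, an API validator, and a form, and nothing would run this check for you.

```typescript
// Hypothetical hand-maintained field lists, one per layer. Someone added
// `emoji` to the database and the form but forgot the API.
const layers: Record<string, string[]> = {
  database: ["name", "email", "color", "emoji"],
  api: ["name", "email", "color"],
  form: ["name", "email", "color", "emoji"],
};

// For each layer, list the fields that some other layer has but it lacks.
function findDrift(byLayer: Record<string, string[]>): Record<string, string[]> {
  const allFields = new Set(Object.values(byLayer).flat());
  const missing: Record<string, string[]> = {};
  for (const [layer, fields] of Object.entries(byLayer)) {
    const gaps = [...allFields].filter((f) => !fields.includes(f));
    if (gaps.length > 0) missing[layer] = gaps;
  }
  return missing;
}

console.log(findDrift(layers));
```

Running it reports that the `api` layer is missing `emoji`; the other two layers agree with each other.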
Local-first software systems choose a different architectural pattern than the typical layered architecture of today's web and mobile apps
- For web and mobile software today, the default architecture is a layered approach
  - Front-end - API - backend - database
  - How did we get here?
    - Many early web apps were Perl CGI scripts
    - Then came PHP, which interspersed SQL with the HTML of the front-end. There was no API or "backend" because the front-end accessed the database directly
      - This was super unmaintainable for large software systems!
        - You need a backend to do things that your front-end doesn't do
        - What if the database schema needed to change?
        - SQL is tightly coupled to the schema, so you have to change it all over the place
        - So we introduced ORMs
        - We also need backends to do things like background jobs
        - Offload work from the client
        - Now we have SQL everywhere
        - ORMs everywhere
        - ORMs have their own schema
    - This was OK for a while because the web model was:
      - the browser asks the backend for a page
      - the backend queries the DB, renders the HTML, and sends it to the browser
    - Web apps became more interactive
      - That round-trip became too slow
      - And we wanted apps to work offline
      - Now we have full application state, and even databases, in the client
  - The challenge for web apps is keeping state synchronized across a rich front-end and a database backend: it's a distributed system!
- Local-first architecture moves the database to the client and relies on special data structures, such as CRDTs (conflict-free replicated data types), to keep the database synchronized across clients
  - While the original "Local-first software" essay doesn't prescribe a specific architecture, it does lay out principles that constrain the possibilities
  - We seem to be seeing an architectural pattern emerge across local-first software systems and frameworks
    - Automerge
    - ElectricSQL?
    - Jazz.tools
    - Evolu
    - DXOS
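The "special data structures" that keep replicas in sync are typically CRDTs (conflict-free replicated data types). As a toy illustration, not how Automerge or any of the frameworks above actually work, here is a last-writer-wins map in TypeScript. It assumes every write carries a timestamp and a replica ID, and it breaks timestamp ties by replica ID so that all replicas agree on a winner. The names `Stamp`, `LWWMap`, `set`, and `merge` are hypothetical.

```typescript
// A toy last-writer-wins (LWW) map, one of the simplest CRDTs.
// Assumption: every write is stamped with (time, replica), and no two
// writes share the same stamp.
interface Stamp {
  time: number;
  replica: string;
}
interface LWWEntry<V> {
  value: V;
  stamp: Stamp;
}
type LWWMap<V> = Map<string, LWWEntry<V>>;

// Total order on stamps: later time wins; ties go to the larger replica ID,
// so every replica picks the same winner.
function newer(a: Stamp, b: Stamp): boolean {
  return a.time !== b.time ? a.time > b.time : a.replica > b.replica;
}

// Apply a local or remote write: keep whichever entry is newer.
function set<V>(map: LWWMap<V>, key: string, value: V, stamp: Stamp): void {
  const current = map.get(key);
  if (!current || newer(stamp, current.stamp)) map.set(key, { value, stamp });
}

// Merge is commutative, associative, and idempotent, so replicas that
// exchange state in any order end up with the same contents.
function merge<V>(a: LWWMap<V>, b: LWWMap<V>): LWWMap<V> {
  const out: LWWMap<V> = new Map(a);
  for (const [key, entry] of b) set(out, key, entry.value, entry.stamp);
  return out;
}

// Two replicas edit the same contact while offline...
const laptop: LWWMap<string> = new Map();
const phone: LWWMap<string> = new Map();
set(laptop, "color", "#FF00CC", { time: 1, replica: "laptop" });
set(phone, "color", "#00CCFF", { time: 2, replica: "phone" });
set(laptop, "emoji", "\u{1F642}", { time: 3, replica: "laptop" });

// ...and converge no matter which side merges first.
const m1 = merge(laptop, phone);
const m2 = merge(phone, laptop);
console.log(m1.get("color")?.value, m2.get("color")?.value); // #00CCFF #00CCFF
```

Real CRDT libraries handle far more (nested documents, lists, text, causality tracking), but the core property is the same: merging in any order converges.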
Now that we have described these two architectures, let’s compare where schema is defined and used within each of them
- The layered architecture has schema at every layer: it is spread across a rich front-end with state management, an API layer, a backend, and the database itself
  - DIAGRAM
  - Some frameworks and libraries have attempted to unify the schema across layers
    - Prisma
    - Rails
    - RedwoodJS
    - RSC?
    - Remix + Zod: "Fully Typed Web Apps"
- Local-first software systems can define the schema once and use it in several places
  - DIAGRAM
  - define the data at rest
  - types
  - validating data entry
  - generation of the UI
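Here is a minimal sketch of what "define once, use in several places" could look like, assuming a hypothetical schema format of our own rather than any particular framework's API. One definition yields a static TypeScript type (via a mapped type with key remapping), runtime validation, and a generated form. A real local-first framework would also use the same definition for the data at rest; `FieldDef`, `Infer`, `validate`, and `renderForm` are all illustrative names.

```typescript
// A hypothetical schema format: every field is a string with a label,
// a required flag, and an optional pattern.
interface FieldDef {
  label: string;
  required: boolean;
  pattern?: RegExp;
}
type Schema = Record<string, FieldDef>;

// 1. Define the schema once.
const ContactSchema = {
  name: { label: "Name", required: true },
  email: { label: "Email", required: true, pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/ },
  color: { label: "Color", required: false, pattern: /^#[0-9A-Fa-f]{6}$/ },
} as const;

// 2. Derive a static type: required fields become required properties,
//    the rest become optional.
type Infer<S extends Schema> = {
  [K in keyof S as S[K]["required"] extends true ? K : never]: string;
} & {
  [K in keyof S as S[K]["required"] extends true ? never : K]?: string;
};
type Contact = Infer<typeof ContactSchema>;

// 3. Derive runtime validation from the same definition.
function validate(schema: Schema, data: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  for (const [key, def] of Object.entries(schema)) {
    const value = data[key];
    if (value === undefined || value === "") {
      if (def.required) errors.push(`${def.label} is required`);
    } else if (def.pattern && !def.pattern.test(value)) {
      errors.push(`${def.label} is not valid`);
    }
  }
  return errors;
}

// 4. Derive a form from the same definition.
function renderForm(schema: Schema): string {
  return Object.entries(schema)
    .map(([key, def]) => {
      const attrs = [`name="${key}"`];
      if (def.required) attrs.push("required");
      if (def.pattern) attrs.push(`pattern="${def.pattern.source}"`);
      return `<label>${def.label} <input ${attrs.join(" ")}></label>`;
    })
    .join("\n");
}

const ada: Contact = { name: "Ada", email: "ada@example.com" };
console.log(validate(ContactSchema, ada)); // []
console.log(renderForm(ContactSchema));
```

Under these assumptions, adding the `emoji` field from our opening challenge would be a single new entry in `ContactSchema`, and the type, the validation, and the form would all pick it up.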
Examples from a local-first system
- defining the schema
- inferring types
- instantiating, mutating, and replicating an object
- reactive objects update automatically
- validations
- dynamic errors
- schema serialization and discovery
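To give a flavor of the "reactive objects", "validations", and "dynamic errors" items, here is a framework-agnostic sketch built on JavaScript's `Proxy`. It is not the API of DXOS or any other local-first framework; `reactive`, `Rule`, `Listener`, and `subscribe` are hypothetical names, and replication is left out entirely.

```typescript
// A rule returns an error message, or null if the value is acceptable.
type Rule = (value: unknown) => string | null;
type Listener<T> = (obj: T, key: keyof T) => void;

// Wraps `initial` so that every write is validated against `rules`.
// Valid writes are applied and announced to subscribers; invalid writes
// are rejected without throwing and recorded in `errors` instead.
function reactive<T extends object>(initial: T, rules: Partial<Record<keyof T, Rule>>) {
  const listeners = new Set<Listener<T>>();
  const errors: Partial<Record<keyof T, string>> = {};

  const obj: T = new Proxy(initial, {
    set(target, prop, value) {
      const key = prop as keyof T;
      const rule = rules[key];
      const message = rule ? rule(value) : null;
      if (message !== null) {
        errors[key] = message; // keep the old value, surface the error
        return true; // returning false would throw a TypeError in strict mode
      }
      delete errors[key];
      Reflect.set(target, prop, value);
      listeners.forEach((fn) => fn(obj, key));
      return true;
    },
  });

  return {
    obj,
    errors,
    subscribe(fn: Listener<T>): () => void {
      listeners.add(fn);
      return () => {
        listeners.delete(fn);
      };
    },
  };
}

const hexColor: Rule = (v) =>
  typeof v === "string" && /^#[0-9A-Fa-f]{6}$/.test(v) ? null : "color must look like #FF00CC";

const contact = reactive({ name: "Ada", color: "#FF00CC" }, { color: hexColor });
const seen: string[] = [];
contact.subscribe((obj, key) => seen.push(`${String(key)}=${String(obj[key])}`));

contact.obj.color = "#00CCFF"; // valid: applied, subscriber notified
contact.obj.color = "pink"; // invalid: ignored, error recorded
console.log(contact.obj.color, contact.errors.color, seen);
```

Recording errors instead of throwing lets a UI bound to the object display the problem next to the field while the stored value stays valid.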
Demos

Scrap Pile

In most software systems, there exists no unified way to fully describe the schema. Consider the Contact table above. The emoji field is specified as a VARCHAR(8)[1]. Any characters could be stored in that column, including ones that are not emojis. Where do we define the set of acceptable emojis? What about email? We need an email-validation regular expression. And color? Another regular expression.
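A quick way to see why a length limit is a poor stand-in for "a single valid emoji" is to count code points and UTF-8 bytes for a few strings. This sketch computes both by hand; the `measure` helper and the sample labels are hypothetical. MySQL's `VARCHAR(8)` limits characters (code points), so every sample below fits in the column, including the plain letter.

```typescript
// Counts Unicode code points and UTF-8 bytes in a string.
function measure(s: string): { codePoints: number; utf8Bytes: number } {
  let codePoints = 0;
  let utf8Bytes = 0;
  // for...of over a string yields whole code points, not UTF-16 code units.
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    codePoints += 1;
    utf8Bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return { codePoints, utf8Bytes };
}

const samples: Record<string, string> = {
  letter: "A", // not an emoji, but a VARCHAR column happily stores it
  smile: "\u{1F642}", // 🙂 one code point
  thumbsUpMediumSkin: "\u{1F44D}\u{1F3FD}", // 👍🏽 emoji plus skin-tone modifier
  // family: man, woman, girl, boy joined by zero-width joiners (U+200D)
  family: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}",
};

for (const [label, s] of Object.entries(samples)) {
  const { codePoints, utf8Bytes } = measure(s);
  console.log(`${label}: ${codePoints} code point(s), ${utf8Bytes} UTF-8 byte(s)`);
}
```

The family emoji is 7 code points and 25 bytes yet displays as one emoji, while "A" passes any length check. "Exactly one emoji" is a rule that a plain VARCHAR cannot express.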

With database schemas, the typical solution is to use an ORM like ActiveRecord or Sequelize and define validations on a field.

Let's consider our opening challenge: adding a field to the Contact record. Where would we have to make changes?

Walk through all of them.

What about a different architecture? What if we colocated the database with the front-end?

Describe the different architecture.

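[1] A single emoji is often one Unicode code point, which takes 3 or 4 bytes in UTF-8, but many emoji are sequences of several code points (skin-tone modifiers, flags, and zero-width-joiner combinations such as the family emoji), and these can take 8 bytes or many more. Note also that in MySQL, the 8 in VARCHAR(8) counts characters, not bytes.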