Unifying Schema in Local-first Software

# Unifying Schema in a Local-first Web App

_Last Updated: June 4, 2024_

> [!NOTE] Drafting
> This is the third draft of an article on schema. Here's the [[Solving the schema problem|first draft]] and the [[Towards a Unified Schema for Software|second draft]], for reference.

**tl;dr:** Local-first software systems have an opportunity to unify schema across the system due to a different architectural pattern than the traditional layered architecture, enabling a better DX and novel user experiences.

## Why read this article?

- Single resource for understanding schema as it relates to local-first software systems
- Slightly enlarge the idea of what a schema is and where it's used
- Demonstrate how the local-first architecture allows you to use a single schema definition to unify many usages of schema in the software system
- Show off some novel user experiences enabled by this single schema definition

## Adding a field to a client-server web app

> [!NOTE] Example application forthcoming
> My plan is to build a little Next.js demo app that we can use in the following section for realistic code samples.

Let's look at an example application for managing contacts built with a common web app stack: a React front-end with Redux?, a GraphQL API, an Express backend, and a Postgres database. Here's the UI:

%% UI of the thing %%

We need to add a new field to Contacts. Where do we have to make changes?

- Add a field to the form
- Update the front-end store
- Update the TypeScript types in the front-end
- Add a field to the API layer
- Update the TypeScript types on the back-end
- Migration for the database

That's a lot of changes!
### Define the schema in one place

What if, instead of all those changes, you could define the shape of the data in one place using a syntax that looks something like this:

```typescript
export const Contact = {
  name: S.string, // required
  email: S.string, // required, validation regex
  phone: S.string, // valid phone number
  website: S.string, // optional, valid URL
};
```

When you make changes to this schema, including adding a new field, the application automatically updates, from the database to the front-end.

How might we achieve something like that? First, we need to expand how we think about defining the shape of data. Second, we need to change the architecture. Once we have a single schema throughout the application, it enables some interesting scenarios, which we will look at towards the end.

## Schema defines the shape of the data

When you hear the word "schema", what comes to mind? Probably a database schema. In a database, the schema defines the shape of the data. For relational databases, data is stored in rows within tables, and we typically define the tables using SQL or an ORM %%define ORM%% such as Prisma, as we did in our example above.

Rather than thinking of schema as simply a specification for the database, consider this expanded definition:

> A schema is an explicit, formal definition of the shape of some data: the names and types of fields and what qualifies as valid data for those fields.

By this definition, schema shows up in many places in software systems. Traditionally, there has been a divide between data schemas, enriched domain objects, and types. This divide leads to subtle differences in capabilities and a lot of duplication. Because all of these are defined in code, changing them requires coordinated changes and deployments. And you have to manage API compatibility between every layer as the 'emergent schema' evolves between them.

Let's unify our thinking about types and validity in this one term: schema.
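
To make the expanded definition concrete, here is a minimal sketch in plain TypeScript of a schema that carries field names, types, and validity rules in one place, with the static type derived from it. The helpers `str`, `optional`, `Infer`, and `invalidFields` are hypothetical names invented for this illustration; they are not part of Effect Schema or DXOS.

```typescript
// A field pairs a static type T with a runtime check for that type.
type Field<T> = { check: (value: unknown) => value is T };

// Hypothetical helper: a string field, optionally constrained by a pattern.
function str(pattern?: RegExp): Field<string> {
  return {
    check: (value: unknown): value is string =>
      typeof value === "string" && (pattern === undefined || pattern.test(value)),
  };
}

// Hypothetical helper: wraps a field so that `undefined` is also valid.
function optional<T>(field: Field<T>): Field<T | undefined> {
  return {
    check: (value: unknown): value is T | undefined =>
      value === undefined || field.check(value),
  };
}

// One definition: names, types, and validity together.
const ContactSchema = {
  name: str(),
  email: str(/^[^@\s]+@[^@\s]+\.[^@\s]+$/), // simplified pattern for illustration
  website: optional(str(/^https?:\/\//)),
};

// The static type is derived from the schema rather than written by hand.
type Infer<S> = { [K in keyof S]: S[K] extends Field<infer T> ? T : never };
type Contact = Infer<typeof ContactSchema>;

// Returns the names of the fields whose values fail their check.
function invalidFields(input: Record<string, unknown>): string[] {
  return (Object.keys(ContactSchema) as (keyof typeof ContactSchema)[]).filter(
    (key) => !ContactSchema[key].check(input[key]),
  );
}

const ada: Contact = { name: "Ada", email: "ada@example.com", website: undefined };
const problems = invalidFields({ name: "Ada", email: "nope", website: "ftp://x" });
```

Here `Contact` and `invalidFields` both follow from `ContactSchema`, so adding a field to the schema updates the type and the validation together.
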
To put this definition to use, let's return to our example of adding a new field to the Contacts app. For each change, look for the changes to schema we had to make.

We started by updating the user interface to add the new field to the form. When adding the form field, we had to choose the correct type of input field. Even the input field's type is a form of schema. Some form builders are able to infer the type of input field from a data schema. Furthermore, we had to add client-side validation logic for the field to provide that snappy experience when entering data. That validation logic will end up getting duplicated a few more times.

Next, we had to add the field to our front-end state management solution, Redux, and update the front-end types.

The API layer needed the new field as well, and we once again had to specify the type. We also had to duplicate the validation logic. While the input field has been validated on the front-end before entry, other clients without validation might call the API with invalid data. Furthermore, the validation logic on the front-end might get out of sync with the back-end, and we need to catch that. Do we need to update the backend's TypeScript types?

In the ORM, we have to add a column to the table, once again specifying the type of that column, and once again specifying the validation logic. Just as with the API, we _must_ validate. We can't take a chance on bad data getting into the database, as that could lead to runtime application errors when manipulating that data.

This broader definition of schema represents both type and validity. Note how many times we specified the type and validation logic in this simple example. The "type" of the field is represented in five places and the validation logic is duplicated three times. All these definitions are specified in code, often in different codebases, requiring coordinated changes and deployments to keep them in sync.
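
The point that an input field's type is itself a form of schema can be sketched as follows. This is a hypothetical illustration in plain TypeScript, not the API of any particular form builder: `FieldSpec`, `contactFields`, and `formInputs` are invented names, and marking `phone` as not required is an assumption made for the example.

```typescript
// Hypothetical field descriptor: `format` and `required` are the only schema
// information this sketch uses.
type FieldFormat = "text" | "email" | "phone" | "url";
type FieldSpec = { format: FieldFormat; required: boolean };

const contactFields: Record<string, FieldSpec> = {
  name: { format: "text", required: true },
  email: { format: "email", required: true },
  phone: { format: "phone", required: false }, // assumption: phone is optional
  website: { format: "url", required: false },
};

// Standard HTML <input type="..."> values for each format.
const INPUT_TYPES: Record<FieldFormat, string> = {
  text: "text",
  email: "email",
  phone: "tel",
  url: "url",
};

type InputProps = { name: string; type: string; required: boolean };

// Derives the form inputs from the schema instead of hand-writing each one.
function formInputs(fields: Record<string, FieldSpec>): InputProps[] {
  return Object.entries(fields).map(([name, spec]) => ({
    name,
    type: INPUT_TYPES[spec.format],
    required: spec.required,
  }));
}

const inputs = formInputs(contactFields);
```

With a mapping like this, adding a field to the schema also adds a correctly typed input to the form, which removes one of the places where the field's type would otherwise be restated.
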
### Existing solutions for unifying the schema across systems

I know what you are thinking: there are existing approaches to reduce the duplication! Here are a few:

- Prisma
- Rails
- RedwoodJS
- RSC?
- Remix + Zod: "Fully Typed Web Apps"

These solutions manage to unify types or schemas in a few places, though not all. The current mainstream client-server architecture makes it challenging to fully unify schema across the entire application. To fully unify it, we'll need to change the architecture.

## Change the architecture: move the database to the client

The typical client-server application has a client, sometimes called a frontend, running in the browser or a native app; a database running on a server; a backend with the business logic that typically wraps the database; and an API between frontend and backend.

![[Pasted image 20240302065112.png]]

Let's make one change to this architecture: simply move the database to the client.

In this architecture, every client now has a local copy of the database, and we rely on special data structures such as CRDTs (conflict-free replicated data types) to keep the database synchronized across the clients. The synchronization can be done either peer-to-peer or via a generic sync server.

Moving the database to the client means the business logic moves as well. Now that the frontend and backend are on the same system, there's no need for an API layer, so that simply disappears. The diagram now looks like this.

This architecture is becoming more common with the rise of [local-first software](https://www.inkandswitch.com/local-first/). There are already several projects and frameworks which have adopted this general shape:

- Jazz.tools
- Evolu
- ElectricSQL?
- DXOS

With this local-first architecture, schema is still necessary.
We need schema in order to:

- Define the shape of the data at rest and in memory
- Enable compile-time type checking
- Validate that the data conforms to the schema to maintain data integrity
- Serialize the schema along with the data in order to distribute it to other systems

Let's take another look at the Contacts application, but this time as a local-first application with a single, unified schema.

## A local-first Contacts app with a unified schema

The [DXOS framework](https://dxos.org) follows the emerging local-first architectural pattern described above. The data is stored in ECHO, a key-value store based on [Automerge](https://automerge.org/), and is synchronized peer-to-peer over WebRTC using libp2p. Identity and encryption are provided via public-private keypairs. ... some more description.

%% DXOS architecture diagram %%

We use the [Effect Schema library](https://effect.website/docs/schema/introduction/), combined with a signals library for reactivity, to add schema to the [ECHO key-value store](https://docs.dxos.org/guide/echo/).

> [!warning] Warning: API under active development
> The code samples below feature an API that we're actively iterating on as we wrap up the migration to Effect Schema.

You can find the full [Contacts sample app on the DXOS GitHub](https://github.com/dxos/contacts-app) if you'd like to browse the source directly.

Let's start by looking at how we define schema. This syntax leans heavily on [Effect Schema](https://effect.website/docs/schema/introduction/), a schema library in the same vein as [Zod](https://zod.dev/), but we add our reactive ECHO layer on top of it.
```ts
import * as S from "@effect/schema/Schema";
import { TypedObject } from '@dxos/echo-schema';

export class Contact extends TypedObject({
  typename: 'dxos.app.contacts.Contact',
  version: '0.1.0'
})({
  name: S.string, // required
  email: S.string, // required, validation regex
  phone: S.string, // valid phone number
  website: S.string // optional, valid URL
}) {}
```

From that schema definition you can instantiate objects.

```ts
import { create } from "@dxos/echo-schema";

// Create a Contact instance from the schema defined above.
const contact = create(Contact, {
  name: "John Doe",
});
```

Objects are fully reactive and can be manipulated directly:

```ts
import { Filter } from "@dxos/echo-schema";

// query example (`space` is assumed to be an already opened DXOS space)
const contacts = space.db.query(Filter.schema(Contact));

// mutation example
contact.email = "john.doe@example.com";
```

Validations are specified like so:

```ts
const EMAIL_REGEX = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/;

export class Contact extends TypedObject({
  typename: 'dxos.app.contacts.Contact',
  version: '0.1.0'
})({
  // ... omitted
  email: S.string.pipe(S.pattern(EMAIL_REGEX)),
  // ... omitted
}) {}
```

Validations run whenever the object's values are changed. Effect Schema's `ParseError`s are really nice! Think of dynamically adding errors to a UI, building error messages for API layers, etc. Echoes of ActiveRecord...

```ts
// 🔥 throws a ParseError
contact.email = "NOT_AN_EMAIL";

/*
The @effect/schema ParseError

Parsing failed:
{ email: "NOT_AN_EMAIL" }
└─ ["email"]
   └─ does not match pattern
*/
```

We serialize the schema to JSON Schema and save it to the data store. This enables any reader of the data store to inspect the shape of the data, while maintaining data integrity through validations.

Schemas are also dynamic. They can be modified at runtime like so.
```ts
// Add example for run-time schema modification
```

### Adding a field to a local-first web app with a unified schema

Now that we've described the changes to the system architecture and how schema is defined, how would the changes to add a single field work?

We start by extending the schema with the new field. We add the validations in line with the field's definition.

```ts
export class Contact extends TypedObject({
  typename: 'dxos.app.contacts.Contact',
  version: '0.1.0'
})({
  // ...
  someField: S.string // What field should we add??
}) {}
```

Now that the field is added, we need to update the UI with the new field:

```jsx
// adding the field
```

And that's it! TypeScript types update automatically. Client-side input validations are handled by the same validations that ensure valid data in the database. There's no API to update, and because the data store is on the client, there's no need for a state management solution. The ECHO data store will handle data with the new field seamlessly, no migration needed. Finally, these changes can be rolled out with a single deployment rather than having to coordinate across multiple systems.

## Interesting Schema-enabled Scenarios

When you have a single unified schema across the entire architecture, new scenarios are enabled.

- Tables can define dynamic schema at runtime
- Tables can open existing schemas, even schemas from other apps like Contacts
- Changes in the database change the data in both places
- Apps can also be plugins inside Composer to provide additional functionality
- Drag-n-drop a Contact into a Stack and visualize
- Can use "schema shape" to inform an LLM and extract Contacts from a document

### Cross-app interop with runtime schema discovery

By serializing the schema to the database, we enable cross-app interop scenarios where two different applications can read from and write to the same data store simultaneously while maintaining schema integrity.
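
The idea of runtime schema discovery can be sketched with a small in-memory stand-in for a shared data store. Everything here is an assumption made for illustration: `SharedStore`, `StoredSchema`, and `validate` are hypothetical names, the validator covers only a tiny subset of JSON Schema (string properties, `pattern`, `required`), and none of it is the ECHO API.

```typescript
// A tiny subset of JSON Schema: string properties with an optional pattern.
type StoredSchema = {
  type: "object";
  properties: Record<string, { type: "string"; pattern?: string }>;
  required: string[];
};

// Returns a list of human-readable errors; an empty list means valid.
function validate(schema: StoredSchema, data: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required) {
    if (data[key] === undefined) errors.push(`${key} is required`);
  }
  for (const [key, rule] of Object.entries(schema.properties)) {
    const value = data[key];
    if (value === undefined) continue;
    if (typeof value !== "string") {
      errors.push(`${key} must be a string`);
    } else if (rule.pattern !== undefined && !new RegExp(rule.pattern).test(value)) {
      errors.push(`${key} does not match pattern`);
    }
  }
  return errors;
}

// Hypothetical stand-in for a shared data store (not the ECHO API).
class SharedStore {
  private schemas = new Map<string, string>(); // typename -> serialized schema
  private objects: { typename: string; data: Record<string, unknown> }[] = [];

  registerSchema(typename: string, schema: StoredSchema): void {
    this.schemas.set(typename, JSON.stringify(schema));
  }

  // Any app can discover the shape of a type at runtime.
  getSchema(typename: string): StoredSchema | undefined {
    const raw = this.schemas.get(typename);
    return raw === undefined ? undefined : (JSON.parse(raw) as StoredSchema);
  }

  // Writes are validated against the stored schema, whichever app makes them.
  insert(typename: string, data: Record<string, unknown>): void {
    const schema = this.getSchema(typename);
    if (schema === undefined) throw new Error(`Unknown type: ${typename}`);
    const errors = validate(schema, data);
    if (errors.length > 0) throw new Error(errors.join("; "));
    this.objects.push({ typename, data });
  }

  count(typename: string): number {
    return this.objects.filter((o) => o.typename === typename).length;
  }
}

const store = new SharedStore();

// App A (for example, Contacts) publishes its schema and writes an object.
store.registerSchema("dxos.app.contacts.Contact", {
  type: "object",
  properties: {
    name: { type: "string" },
    email: { type: "string", pattern: "^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}$" },
  },
  required: ["name", "email"],
});
store.insert("dxos.app.contacts.Contact", { name: "John Doe", email: "john.doe@example.com" });

// App B knows only the typename and discovers the fields at runtime.
const discovered = store.getSchema("dxos.app.contacts.Contact");
const fieldNames = discovered === undefined ? [] : Object.keys(discovered.properties);
```

Because the schema travels with the data, the second app never imports the first app's code, yet its writes are held to the same rules.
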
### Inter-app interop via drag-and-drop

Composer's drag-and-drop functionality is schema-aware and uses an inversion of control model to enable plugins to expose functionality about themselves to other plugins.

### Schema-shaped responses from LLMs

We can write LLM prompts that request responses in the shape of the schema.

## Future Improvements

### Schema migration

If you have any experience with schemas in software systems, you have probably been wondering how we handle schema change. The short answer is we do not handle this case yet. At this time, we do not do automatic data migrations based on changes to the schema. We see the above work as creating a foundation on which we can build a robust schema migration system.

### Schema serialization with arbitrary code execution

As mentioned above, we currently serialize schemas to JSON Schema and store them in the data store. However, JSON Schema does not serialize filters (Effect Schema validations) that contain arbitrary code, so some schema validity will not be maintained across systems. We intend to investigate distributing schemas as JavaScript packages and dynamically loading those packages from a schema repository at runtime, enabling arbitrary code to be serialized as well.

We do support adding metadata to fields which could be used to look up functions for a field. We already use this in our Table plugin where we add metadata about how many digits you need when representing a number.
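
One way the metadata lookup could work is sketched below in plain TypeScript. This is an assumption about a possible design, not the DXOS implementation: the `meta.validator` key, the `validators` registry, and the `isValid` helper are hypothetical names, and the phone pattern is a deliberately simple heuristic.

```typescript
// Serializable field description: metadata names a validator instead of
// embedding code, so it survives a round trip through JSON.
type FieldMeta = { type: "string"; meta?: { validator?: string } };
type SerializableSchema = Record<string, FieldMeta>;

// Hypothetical registry shipped with each app; the names are the shared contract.
const validators: Record<string, (value: string) => boolean> = {
  phone: (value) => /^\+?[0-9 ()-]{7,}$/.test(value), // simple heuristic only
  httpUrl: (value) => /^https?:\/\/\S+$/.test(value),
};

const contactSchema: SerializableSchema = {
  name: { type: "string" },
  phone: { type: "string", meta: { validator: "phone" } },
  website: { type: "string", meta: { validator: "httpUrl" } },
};

// Simulate storing the schema and loading it in another app.
const loaded = JSON.parse(JSON.stringify(contactSchema)) as SerializableSchema;

// Checks each present field with the validator named in its metadata.
// Fields without metadata, or with an unknown validator name, are accepted.
function isValid(schema: SerializableSchema, data: Record<string, string>): boolean {
  return Object.entries(schema).every(([key, field]) => {
    const value = data[key];
    if (value === undefined) return true; // absent fields are not checked here
    const name = field.meta?.validator;
    const check = name === undefined ? undefined : validators[name];
    return check === undefined ? true : check(value);
  });
}

const ok = isValid(loaded, { name: "John Doe", phone: "+1 555 0100" });
const bad = isValid(loaded, { name: "John Doe", phone: "abc" });
```

In this sketch an unknown validator name is silently accepted; a stricter app might prefer to reject the write so that validity is never weakened by a missing registry entry.
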