
Avro

Binary encoding formats offer many advantages, such as compact storage, fast encoding/decoding, and well-thought-out ways to evolve your schema. But they often require a great deal of developer buy-in, with DSLs and distinct compilation steps. Avro has the same advantages as other binary encoding formats but offers a superior developer experience.

Binary Encoding

Binary encoding formats are a way to encode an object into a compact binary representation. This is generally done by defining a schema in a domain-specific language required by the format. That schema tells the encoders and decoders which fields need to be encoded, the types of those fields, and the order in which those fields appear.

These formats have a lot of advantages:

  • Compact: The encoded binary is compact; this means it costs less to pass around and store.
  • Efficient: The encoding and decoding are fast; this means that senders and receivers spend less time executing.
  • Schema: The encoding is done with a strongly typed schema; this means that the receiver can feel more confident in the payloads it decodes.
  • Evolution: There are explicit mechanisms to evolve the schema; this means that both forwards and backwards compatibility can be maintained.

We will be looking at two of these formats: Protocol Buffers (protobuf) and Avro.

Example

For the example we will be encoding a simple object:

{
  "id": "1"
}

We'll start with how this is done with protobuf. We first need to define a schema for that object in a domain-specific schema definition language:

syntax = "proto3";

message Example {
  string id = 1;
}

Next we need to use a CLI tool to do some code generation:

npm install ts-proto
protoc --plugin=./node_modules/.bin/protoc-gen-ts_proto --ts_proto_out=. ./example.proto

This will generate an example.ts file that can be used to encode and decode:

import { Example } from "./example";

const encoded = Example.encode({
  id: "1",
}).finish();

const decoded = Example.decode(encoded);

This all works, but I don't love that I have to use a DSL to define the schema, and a discrete tool to generate code, before any encoding and decoding can occur.

The Avro approach is different. We can define our schema in plain old JSON:

import avro from "avsc";

const type = avro.Type.forSchema({
  type: "record",
  name: "Example",
  fields: [{ name: "id", type: "string" }],
});

There also isn't a discrete compilation step. The type variable is ready for encoding and decoding:

const encoded = type.toBuffer({ id: "1" });
const decoded = type.fromBuffer(encoded);
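
To sanity-check the round trip, you can log the decoded value; with the schema above, this should print something like:

console.log(decoded);
// Example { id: '1' }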

Reader and Writer Schemas

In most of these binary encoding formats there is an expectation that the sender and receiver are using the same schema. They can be on different versions, but it should be the same schema. This is not the case with Avro.

With Avro the reader and writer can have different schemas, called the reader and writer schemas. The writer writes out every field in its schema, but there is some flexibility on the reader side:

  • The reader schema does not have to have every field in the writer schema.
  • The fields in the reader schema do not have to appear in the same order as they do in the writer schema.
  • The fields in the reader schema do not have to be in the encoded binary (as long as they have a default value).

This gives you a ton of flexibility. For example:

import avro from "avsc";

const writer = avro.Type.forSchema({
  type: "record",
  name: "Example",
  fields: [
    { name: "name", type: "string" },
    { name: "id", type: "string" },
  ],
});

const encoded = writer.toBuffer({ name: "example", id: "1" });

const reader = avro.Type.forSchema({
  type: "record",
  name: "Example",
  fields: [
    { name: "id", type: "string" },
    { name: "address", type: ["null", "string"], default: null },
  ],
});

const resolver = reader.createResolver(writer);
const decoded = reader.fromBuffer(encoded, resolver);

console.log(decoded);
// Example { id: '1', address: null }

The reader schema does not have name, so that field is ignored, and the writer schema does not have address, so its default value is used.

Note the createResolver call. We can't resolve a payload without the writer schema, so you might be wondering how the reader would get it. There are a couple of common ways:

  • Embed the writer schema alongside the data, as Avro's object container file format does in its header.
  • Keep schemas in a registry and tag each encoded record with a schema version or fingerprint that the reader can use to look the writer schema up.
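
As a minimal sketch of the first approach (the file name examples.avro and the reuse of the writer type from above are just for illustration), avsc can write and read Avro object container files, which carry the writer schema in the file header:

import avro from "avsc";

// Writer side: the container file's header embeds the writer schema.
const fileEncoder = avro.createFileEncoder("examples.avro", writer);
fileEncoder.write({ name: "example", id: "1" });
fileEncoder.end();

// Reader side (once the file has been flushed): the decoder reads the writer
// schema back out of the header, so no separate schema exchange is needed.
avro.createFileDecoder("examples.avro").on("data", (record) => {
  console.log(record);
});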

Conclusion

Avro gives you the advantages of a binary encoding format while offering a uniquely excellent developer experience.