When designing APIs consumed by clients (mobile or web applications), a common problem is how to enforce the structure of the data messages exchanged between the server and the client.

As of today, JSON is probably the most widely used exchange format for APIs. JSON is good because it's simple, easy to read and write, and has implementations in almost every programming language.

However, JSON's strengths can become weaknesses as the complexity of your data structures increases with time and growth. Because it has no out-of-the-box way to enforce data structure validation, this important aspect is usually left by the wayside.
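To make this concrete, here is a small Ruby sketch showing that a JSON parser only checks syntax, not shape. The payloads and field names are made up for illustration:

```ruby
require 'json'

# Two payloads that are supposed to describe the same kind of record.
# The second one has a string id and a misspelled key, yet it is
# still perfectly valid JSON.
good = '{"id": 1, "name": "Nina Simone"}'
bad  = '{"id": "one", "nmae": "Nina Simone"}'

good_data = JSON.parse(good)
bad_data  = JSON.parse(bad) # no error: only the syntax is checked

puts good_data['id'].class    # Integer
puts bad_data['id'].class     # String
puts bad_data['name'].inspect # nil, the typo went unnoticed
```

Any check on types or required keys has to be written by hand, or delegated to an extra validation layer.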

Google's Protocol Buffers (Protobuf) format takes a different approach. According to the official documentation, Protobuf can be described as:

A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

It's a structured and compiled format, which means that you have to define precisely the structure of your data message, and that the actual data cannot break this schema.

Protobuf serialization and deserialization work in a similar way to JSON, with added structural checks on your data message. On one side of the communication, a message is built and encoded by the Protobuf library into a compact binary format, which makes it very light to transmit over the network. On the other side, the message is decoded into the right types for the language you use.
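To give an idea of why the binary form is so small, here is a hand-rolled sketch of the Protobuf wire format in plain Ruby. It is only an illustration, not how the library is implemented: the helper names (varint, int_field, string_field) are invented for this example, and it only handles non-negative integers and strings.

```ruby
require 'json'

# Encode a non-negative integer as a Protobuf varint:
# 7 bits per byte, least significant group first,
# high bit set on every byte except the last one.
def varint(n)
  bytes = []
  loop do
    byte = n & 0x7f
    n >>= 7
    if n.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack('C*')
end

# Every field starts with a key: (field number << 3) | wire type.
# Wire type 0 is a varint.
def int_field(number, value)
  varint((number << 3) | 0) + varint(value)
end

# Wire type 2 is length-delimited: key, byte length, raw bytes.
def string_field(number, value)
  bytes = value.b
  varint((number << 3) | 2) + varint(bytes.bytesize) + bytes
end

# A message with an integer in field 1 and a string in field 2.
binary = int_field(1, 150) + string_field(2, 'Nina')
json   = JSON.generate(id: 150, name: 'Nina')

puts binary.unpack1('H*') # 08960112044e696e61
puts binary.bytesize      # 9
puts json.bytesize        # 24
```

Field names never travel over the wire, only field numbers do, which is why both sides need the same message definition to make sense of the bytes.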

An interesting thing is also that you now need only one definition of your messages across all your applications, server or client. You could put your Protobuf definitions in a separate code repository that would be retrieved by each project, and then compiled to the right language format. By keeping a single data definition across all of your application components, you enforce schema consistency and avoid data exchange errors.

The first thing we need is to install the Protobuf compiler, which compiles the Protobuf definition format to types in the language of your choice. On macOS, you can install it using Homebrew:

$ brew install protobuf

Or, on Ubuntu:

$ sudo apt-get install -y protobuf-compiler

Bootstrapping

As a sample project, let's say we want to build a small application to browse live music venues. We'll be using Ruby on Rails for the backend API and a web app written in JavaScript for the frontend.

For the sake of simplicity, we'll assume that instead of having separate code repositories, we have the two projects (API and frontend) sitting side-by-side on our machine. Since we want to share the same Protobuf definitions between the two projects, we'll put them in a third shared directory:

protobuf-project/  
  api/
  web/
  shared/

Let's bootstrap the Rails application first. We'll need three models: Artist, Location, and Venue, the latter holding references to both an artist and a location entity.

$ cd api/
$ rails new . --api
$ rails g model Artist name:string genre:string bio:text
$ rails g model Location country:string city:string place:string
$ rails g model Venue venue_date:datetime artist:references location:references
$ rake db:migrate RAILS_ENV=development

Creating the data message definitions

We'll define three data messages, one for each model:

// shared/protobuf/artist_message.proto

syntax = "proto3";

message ArtistMessage {  
  int32  id    = 1;
  string name  = 2;
  string bio   = 3;
  string genre = 4;
}

// shared/protobuf/location_message.proto

syntax = "proto3";

message LocationMessage {  
  string country = 1;
  string city    = 2;
  string place   = 3;
}

// shared/protobuf/venue_message.proto

syntax = "proto3";

import "artist_message.proto";  
import "location_message.proto";

message VenueMessage {  
  int32  id                = 1;
  ArtistMessage artist     = 2;
  LocationMessage location = 3;
  string venue_date        = 4;
}

We now have to compile these definitions to Ruby classes using the following command:

$ mkdir api/app/messages
$ protoc --proto_path=shared/protobuf --ruby_out=api/app/messages shared/protobuf/*.proto

This will generate, in the api directory, the Ruby classes ArtistMessage, LocationMessage and VenueMessage inside the respective files app/messages/artist_message_pb.rb, app/messages/location_message_pb.rb and app/messages/venue_message_pb.rb. If you take a look at the contents of these files, you'll note that the Google protobuf gem is actually used to dynamically generate the plain Ruby classes. We have to install this gem in our Rails project to make it work:

# api/Gemfile

gem 'google-protobuf'  

Note that, as the generated files don't follow Rails naming conventions, they cannot be autoloaded by the framework, so we have to manually require all the files in the app/messages directory. Also, you will probably have to change the require statements at the top of venue_message_pb.rb so that they load the included files artist_message_pb.rb and location_message_pb.rb by their full path, to avoid load errors:

# config/application.rb

Dir["#{Rails.root}/app/messages/*.rb"].each { |file| require file }  
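An alternative to editing the generated files is to put the messages directory on Ruby's load path, so that bare requires like require 'artist_message_pb' resolve on their own. The sketch below is standalone and only simulates the situation: it uses a temporary directory as a stand-in for app/messages, and the two fake "generated" files just define constants instead of real message classes.

```ruby
require 'tmpdir'

# Stand-in for api/app/messages, filled with two fake generated files.
# The real files would come from protoc; these only mimic their requires.
messages_dir = Dir.mktmpdir('messages')
File.write(File.join(messages_dir, 'artist_message_pb.rb'),
           "ARTIST_LOADED = true\n")
File.write(File.join(messages_dir, 'venue_message_pb.rb'),
           "require 'artist_message_pb'\nVENUE_LOADED = true\n")

# Make bare requires such as `require 'artist_message_pb'` resolvable.
$LOAD_PATH.unshift(messages_dir) unless $LOAD_PATH.include?(messages_dir)

# Load every generated file; dependencies are pulled in on demand.
Dir[File.join(messages_dir, '*_pb.rb')].sort.each { |file| require file }

puts ARTIST_LOADED && VENUE_LOADED # true
```

In the Rails project, the same idea would mean adding the app/messages path to $LOAD_PATH before the Dir[...] line, which has the advantage that the generated files can be regenerated without manual edits.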

We can now create endpoints to retrieve artist and venue data, serializing the models to Protobuf data using the classes that were generated for us.

# config/routes.rb
Rails.application.routes.draw do  
  resources :artists
  resources :venues
end  

We're going to add some convenience methods to the Artist model. from_message and to_message will be used respectively to create a new model from a message and to convert a model to a message object. We'll also have two methods, serialize and unserialize, to generate raw Protobuf data from a model and to build a model back from raw data.

We can use these last two methods in the controller to handle requests and send Protobuf data to the client. On the other end of the connection, the client code is responsible for its own implementation to read and generate Protobuf data.

# app/models/artist.rb
class Artist < ActiveRecord::Base  
  has_many :venues

  ###
  # Create a model from a message object
  def self.from_message(message)
    Artist.new.tap do |a|
      a.id = message.id
      a.name = message.name
      a.bio = message.bio
      a.genre = message.genre
    end
  end

  ###
  # Create a message object from model
  def to_message
    ArtistMessage.new(
      :id => self.id,
      :name => self.name,
      :bio => self.bio,
      :genre => self.genre
    )
  end

  ###
  # Encode model data in protobuf format
  def serialize
    ArtistMessage.encode(self.to_message)
  end

  ###
  # Decode protobuf data and hydrate a new model
  def self.unserialize(data)
    message = ArtistMessage.decode(data)
    Artist.from_message(message)
  end
end  

# app/controllers/artists_controller.rb
class ArtistsController < ApplicationController  
  before_action :find_artist

  def show
    send_data @artist.serialize
  end

  private

  # Load the requested artist before each action
  def find_artist
    @artist = Artist.find params[:id]
  end
end  

We can add similar methods to the Location and Venue models to build the rest of the API.

# app/models/venue.rb

class Venue < ActiveRecord::Base  
  belongs_to :artist
  belongs_to :location

  def to_message
    VenueMessage.new(
      :id => self.id,
      :artist => self.artist.to_message,
      :location => self.location.to_message,
      :venue_date => self.venue_date.to_s
    )
  end

  # Encode model data in protobuf format (same name as in Artist)
  def serialize
    VenueMessage.encode(self.to_message)
  end
end  
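Since venue_date travels as a plain string, both sides have to agree on its format. One possible refinement, which is an assumption of this sketch and not part of the code above, is to use ISO 8601 so that the client can parse the value unambiguously:

```ruby
require 'time'

# A sample date standing in for a venue_date column value.
venue_date = Time.utc(2016, 5, 1, 20, 30, 0)

# Encode to an ISO 8601 string before putting it in the message...
encoded = venue_date.iso8601

# ...and parse it back on the receiving side.
decoded = Time.iso8601(encoded)

puts encoded               # 2016-05-01T20:30:00Z
puts decoded == venue_date # true
```

In the Venue model this would amount to calling iso8601 instead of to_s when building the message.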

Dealing with collections

We can now serialize a single entity to Protobuf format and send it to the client. However, we should also be able to serialize collections of entities in case the client needs a list of artists or venues.

Protobuf cannot serialize a list of messages out of the box. Instead, such a list must be wrapped in another type of message. This means we have to create another top-level message to hold a collection of single messages. Let's create these two new types, ArtistCollectionMessage and VenueCollectionMessage:

// shared/protobuf/artist_collection_message.proto

syntax = "proto3";  
import "artist_message.proto";

message ArtistCollectionMessage {  
  repeated ArtistMessage artists = 1;
}

// shared/protobuf/venue_collection_message.proto

syntax = "proto3";  
import "venue_message.proto";

message VenueCollectionMessage {  
  repeated VenueMessage venues = 1;
}

The repeated keyword indicates that the field holds a list of values of the specified type.
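On the wire, a repeated message field is simply the same field number written once per element, each element being a length-delimited record. The following plain Ruby sketch encodes such a collection by hand to show the layout. It is an illustration only: the helpers (varint, bytes_field, artist) are invented for this example, and each artist carries only its name.

```ruby
# Protobuf varint: 7 bits per byte, low group first,
# high bit set when more bytes follow.
def varint(n)
  out = []
  while n > 0x7f
    out << ((n & 0x7f) | 0x80)
    n >>= 7
  end
  out << n
  out.pack('C*')
end

# Length-delimited field (wire type 2): key, byte length, payload.
def bytes_field(number, payload)
  payload = payload.b
  varint((number << 3) | 2) + varint(payload.bytesize) + payload
end

# Hand-encoded ArtistMessage carrying only the name (field 2).
def artist(name)
  bytes_field(2, name)
end

# ArtistCollectionMessage: field 1 is emitted once per artist.
collection = %w[Nina Miles].map { |name| bytes_field(1, artist(name)) }.join

puts collection.unpack1('H*')
# 0a0612044e696e610a0712054d696c6573
```

Each 0a byte in the output is the key of field 1, followed by the length of the embedded artist message. A decoder collects every occurrence of that key into the artists list.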

We can now compile the new definitions into plain Ruby classes again, and add some code to generate a Protobuf collection message from a list of records.

# app/models/artist.rb
class Artist < ActiveRecord::Base

  # ...

  ###
  # Encode all models to ArtistCollectionMessage
  # protobuf message
  def self.serialize_all
    message = ArtistCollectionMessage.new(
      :artists => Artist.all.map {|a|
        a.to_message
      }
    )
    ArtistCollectionMessage.encode(message)
  end

  ###
  # Decode an ArtistCollectionMessage protobuf
  # to a collection of Artist models
  def self.unserialize_all(data)
    message = ArtistCollectionMessage.decode(data)
    message.artists.map {|a|
      Artist.from_message(a)
    }
  end
end  

# app/controllers/artists_controller.rb
class ArtistsController < ApplicationController  
  before_action :find_artist, except: [:index]

  def index
    send_data Artist.serialize_all
  end

  def show
    send_data @artist.serialize
  end

  private

  # Load the requested artist for member actions
  def find_artist
    @artist = Artist.find params[:id]
  end
end  

We now have an API ready! Since Protobuf has implementations for a lot of languages and platforms, you could for example use JavaScript on the client side to build a web app that talks Protobuf with your backend, or use it for native iOS or Android applications as well.