Protocol Buffer - A Beginner's Walkthrough

MehreenTahir


Sep 18, 2018

CPOL



This article introduces a third option for data serialization. It is a beginner’s walkthrough of Google’s Protocol Buffers. Let's move beyond XML and JSON.

Protocol Buffers (protobuf) is a language-agnostic binary data format developed by Google to serialize structured data exchanged between services. If those heavy terms didn’t all click at first, that’s fine. Allow me to walk you through it until everything becomes clear. In this article, we’ll talk about:

What Protocol Buffers actually are
Why Protocol Buffer?
How do they work?
General Structure
Demo

Protocol Buffer

To fully understand protobuf, we first need to understand what serialization is and which problems needed to be solved. Wikipedia explains serialization as follows:

“Serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment)”.

In simple words, we need to serialize data if we want to store it or transfer it. But what format should the serialized data take? Here are a few options:

The raw, in-memory data structures can be sent or saved in binary form. But what if I want to read that data back on a machine or build with a different memory layout? That simply doesn't work: the reading code must be compiled with the same memory layout, endianness, and so on. This format is also really hard to extend.
Serialize the data to XML (Extensible Markup Language) or JSON (JavaScript Object Notation). This approach is great for the front end since it is human readable, but what about back-end, server-to-server communication? These formats are space-intensive, and encoding and decoding them can impose a significant performance penalty on an application.

So what now? Sure, you can go ahead and invent your own serialization technique. This gives you flexibility, but it is not a good idea when you have a tremendous amount of data; you need a well-specified protocol to handle huge traffic. That’s exactly where protobuf comes into the picture. Protocol buffers are like XML, but more flexible, efficient, automated, smaller, faster and simpler. You define the structure of your data once, and then specially generated source code reads and writes that structured data to and from a variety of data streams, in a variety of languages.

Why Protobuf?

I have pretty much laid the foundation for why we should already be considering protocol buffers, but if that’s not convincing enough, let’s look at a few more reasons to use them.

There are usually two types of considerations associated with data storage and transmission:

Size
Efficiency

XML and JSON are designed to be human readable and self-describing, which means they are text-based. This comes with a cost: the data must be encoded as text to transport the message and then decoded on the other end. It also increases message size, because schema information (such as every field name) has to be included with each message for it to make sense on its own.

Protocol buffers, on the other hand, are not self-describing. Instead, they use binary serialization: the data is encoded into a compact binary stream (small field numbers and variable-length integers instead of field names and text), which is extremely lightweight and easy to transfer. Reportedly, protobuf messages are roughly one third the size of the equivalent XML and half the size of the equivalent JSON. Smaller messages also take less time to transfer, which contributes to efficiency; protobuf is reportedly about 6 times faster than JSON.

In addition to the advantages mentioned above, here are a few more:

Validation
Easily extensible
Guaranteed type-safety
Backward compatibility
Language independence
Faster serialization/deserialization

Why Now?

Now you might be wondering why you haven’t heard of them before and, if they’ve been around, why we’re talking about them now. Here’s a bit of trivia for you.

Yep! Although they’ve been around for about ten years, most people do not know about them because Google first used them internally. And the reason for the attention now is the game “Pokemon Go”.

Image Credit: PokeMon Go

The game was created by Niantic and uses protobufs for data transfer. Its success is what brought protobufs to wider public attention.

Any Cons?

Well, I wouldn’t really call these cons, but there are situations where protobufs might not be that helpful. As we already discussed, they don’t target human readability, so if you want your data to be human readable, protobufs are not a good fit. If a browser is directly consuming the data from a service, another serialization technique might be a better option. Also, people tend to use XML or JSON more often because they have good community support, whereas protobufs lag behind there: don’t expect very detailed documentation, or as many blog posts and articles about developing with protobufs. But the project is open source, so you can go ahead and experiment.

How Does Protobuf Work?

You specify the structure of the data you’re serializing (and, optionally, services) by defining message types in a .proto file. Think of each message as a logical record of information made up of named, typed fields. The protoc compiler then compiles the .proto file into source code for your target language, and that generated code uses this predetermined schema to encode and decode messages.

Image Credit: Researchgate

Now that the workflow makes sense, let me give you a very basic .proto message example to illustrate the structure.

message Movie {
  required string title = 1;
  required string genre = 2;
}

Here we defined a message named Movie with two fields, title and genre, identified by the field numbers 1 and 2. Each field can be marked as required, optional, or repeated. Keep in mind that this is only the text description of the schema: the actual messages are encoded in binary, where each field is identified by its number rather than its name.

Demo

Now that the terms make sense, enough talk; let’s get our hands dirty.

Environment Setup

Note: Although protobufs support many languages and all the major platforms, such as Linux and Windows, I’ll be covering only the C++ installation on Windows. If you’re interested in working with another language or platform, see the official Protocol Buffers documentation.

In order to build protobuf with MSVC on Windows, you need the following tools:

CMake
Visual Studio
Git (optional)

Go ahead and install the above tools to follow along.

Once you have everything installed, open the Visual Studio command prompt and navigate to your working directory. Once there, execute the following command:

$ mkdir install

This creates the folder where protobuf will be installed after the build. Before going ahead, make sure cmake and git are added to the system PATH variable. If not, you can add them to the PATH by executing the following commands:

$ set PATH=%PATH%;C:\Program Files (x86)\CMake\bin

$ set PATH=%PATH%;C:\Program Files\Git\cmd

Now you’re ready to clone the protobuf repository locally.

$ git clone https://github.com/protocolbuffers/protobuf.git

If you choose not to use git, you can simply download a release package from https://github.com/protocolbuffers/protobuf/releases/latest.

Once you have the repo locally, navigate to the project folder protobuf, and then to cmake folder.

$ cd protobuf

$ cd cmake

Now we need to configure the build with CMake. To do that, execute the following commands:

$ mkdir build & cd build

If you cloned the repository with git, you also need to update its submodules:

$ git submodule update --init --recursive

The NMake Makefiles generator can build the project in only one configuration, so a separate folder is required for each configuration.

For Release configuration:

$ mkdir release & cd release

$ cmake -G "NMake Makefiles" ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_PREFIX=../../../../install ^
../..

For Debug configuration (run from the build folder):

$ mkdir debug & cd debug

$ cmake -G "NMake Makefiles" ^
-DCMAKE_BUILD_TYPE=Debug ^
-DCMAKE_INSTALL_PREFIX=../../../../install ^
../..

Either of the above commands generates an NMake Makefile in the current directory. If you’d also like a Visual Studio solution, navigate back to the build folder and execute the following command, specifying the Visual Studio version that you are using. I’m using VS 2017 Community Edition.

$ mkdir solution & cd solution

$ cmake -G "Visual Studio 15 2017 Win64" ^
-DCMAKE_INSTALL_PREFIX=../../../../install ^
../..

Time to compile protobuf. Use the folder that matches the configuration you generated earlier: navigate to build/release (or build/debug) and execute:

$ nmake

Once compiled, you can run the unit tests as:

$ nmake check

If all the tests pass, install it:

$ nmake install

Now we are ready to create our project. Create a new folder in which you want to keep your project. To use protobufs, you first need to define a .proto file. Let’s take the example of a simple project management system. Use any text editor of your choice to create a file named projectmanagement.proto; make sure the file name has the .proto extension.

Add the following to the file:

//projectmanagement.proto
syntax = "proto2";

package projectmanagement;

message Developer
{
    required string first_name = 1;
    required string last_name = 2;
    required string email = 3;
}

message Project
{
    required string title = 1;
    optional string url = 2;
    repeated Developer developer = 3;
}

The above code specifies two messages along with fields. This should be familiar to you now as we have already talked about the structure of the message.

You need the protoc compiler to compile the above file. The build produces protoc.exe in the release/debug folder (and nmake install copies it to the install\bin folder). Either add that folder to the system PATH or copy protoc.exe to the current working directory, then execute the following command to generate the C++ code:

$ protoc --cpp_out=. projectmanagement.proto

Once the command executes successfully, you’ll see two generated files:

projectmanagement.pb.h
projectmanagement.pb.cc

Let’s look at some of the generated code in the header file. If you scroll down, you’ll see the accessors defined for you.

  // accessors -------------------------------------------------------

  // required string first_name = 1;
  bool has_first_name() const;
  void clear_first_name();
  static const int kFirstNameFieldNumber = 1;
  const ::std::string& first_name() const;
  void set_first_name(const ::std::string& value);
  #if LANG_CXX11
  void set_first_name(::std::string&& value);
  #endif
  void set_first_name(const char* value);
  void set_first_name(const char* value, size_t size);
  ::std::string* mutable_first_name();
  ::std::string* release_first_name();
  void set_allocated_first_name(::std::string* first_name);

  // required string last_name = 2;
  bool has_last_name() const;
  void clear_last_name();
  static const int kLastNameFieldNumber = 2;
  const ::std::string& last_name() const;
  void set_last_name(const ::std::string& value);
  #if LANG_CXX11
  void set_last_name(::std::string&& value);
  #endif
  void set_last_name(const char* value);
  void set_last_name(const char* value, size_t size);
  ::std::string* mutable_last_name();
  ::std::string* release_last_name();
  void set_allocated_last_name(::std::string* last_name);

  // required string email = 3;
  bool has_email() const;
  void clear_email();
  static const int kEmailFieldNumber = 3;
  const ::std::string& email() const;
  void set_email(const ::std::string& value);
  #if LANG_CXX11
  void set_email(::std::string&& value);
  #endif
  void set_email(const char* value);
  void set_email(const char* value, size_t size);
  ::std::string* mutable_email();
  ::std::string* release_email();
  void set_allocated_email(::std::string* email);

The same accessors are generated for our second message too. This is another great feature of protobufs: you can use these setters and getters just like the members of any ordinary class. Below is a small program that shows how easy the accessors are to use.

//protobuf_sample.cc
#include <iostream>
#include "projectmanagement.pb.h"

using namespace std;

int main()
{
    // Verify that the headers we compiled against match the linked library.
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    projectmanagement::Project project;
    project.set_title("Sample");
    project.set_url("http://www.sample.com");

    // add_developer() appends a new element to the repeated field
    // and returns a pointer to it.
    projectmanagement::Developer *developer = project.add_developer();
    developer->set_first_name("ABC");
    developer->set_last_name("XYZ");
    developer->set_email("someone@example.com");

    cout << "Project: " << project.title() << endl;
    cout << "URL: " << (project.has_url() ? project.url() : "N/A") << endl;
    cout << "Developers:" << endl;
    for (const auto& dev : project.developer())
    {
        cout << "  First name: " << dev.first_name() << endl;
        cout << "  Last name: " << dev.last_name() << endl;
        cout << "  Email: " << dev.email() << endl;
    }

    // Optional: free the global objects allocated by the protobuf library.
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}

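The above code is self-explanatory: we assign some values to the fields using setters and then print them using getters. To build it, compile protobuf_sample.cc together with the generated projectmanagement.pb.cc and link against the libprotobuf library installed earlier.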

// output:
// Project: Sample
// URL: http://www.sample.com
// Developers:
//   First name: ABC
//   Last name: XYZ
//   Email: someone@example.com

Last Words

That’s not all protobufs can do. You can encode the above data to binary (for example, with SerializeToString() or SerializeToOstream()) and parse it back with ParseFromString(), or dump it to a human-readable text format, which is pretty sweet. I highly encourage you to experiment and see how things turn out. The official Protocol Buffers documentation also contains tutorials for other languages. Go ahead and get a taste.