Protobuf (short for Protocol Buffers) is an efficient and fast mechanism for serializing structured data. Many companies are adopting it for communication between microservices.

If you haven’t used protocol buffers before, head here and come back.

One of the use cases I encountered while using protocol buffers was converting multiple Protocol Buffers Kafka streams to JSON Kafka streams for logging.

The problem with this was that every time I added a new message to the .proto file, I had to recompile the package to generate the necessary classes. One workaround was to run the protoc command at job startup, but that doesn't seem like a neat solution.

Dynamic Messages come to the rescue

The Protocol Buffers authors saw this coming, which is why they created what is known as a FileDescriptorSet.

A FileDescriptorSet is basically a description of the proto file, i.e. its name, its package name, its dependencies, and the messages it contains. To generate a FileDescriptorSet, just use:
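The exact invocation depends on your layout; assuming a file named abc.proto, the command looks like this (paths are placeholders):

```shell
# Write a FileDescriptorSet for abc.proto to abc.desc.
# --include_imports also embeds descriptors for every imported .proto,
# which you need later to resolve dependencies.
protoc --include_imports \
       --descriptor_set_out=/absolute/path/to/abc.desc \
       /absolute/path/to/abc.proto
```

If abc.proto imports other files, you may also need `-I`/`--proto_path` so protoc can locate them.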

It’s important to note that the path should be an absolute path, not a relative one. The above command will generate a .desc file containing the descriptor of the abc.proto file.

Once you have the descriptor file, you can read it in any language using a standard file reader and use the InputStream to create a FileDescriptor object. I’ll be using Scala here for this purpose.
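A minimal sketch of that flow, assuming the descriptor file was generated as abc.desc and that abc.proto declares a message named TestProtoMessage (both names are placeholders for this example):

```scala
import java.io.FileInputStream
import com.google.protobuf.DescriptorProtos.FileDescriptorSet
import com.google.protobuf.Descriptors.{Descriptor, FileDescriptor}
import scala.jdk.CollectionConverters._

object DescriptorLoader {
  // Read the .desc file produced by protoc and parse it
  // into a FileDescriptorSet.
  def loadDescriptorSet(path: String): FileDescriptorSet = {
    val in = new FileInputStream(path)
    try FileDescriptorSet.parseFrom(in)
    finally in.close()
  }

  // Pick out the FileDescriptorProto for our proto file by name and
  // build a FileDescriptor from it. This sketch assumes abc.proto has
  // no imports; with imports, each dependency's FileDescriptor must be
  // built first and passed as the second argument to buildFrom.
  def fileDescriptor(set: FileDescriptorSet, fileName: String): FileDescriptor = {
    val proto = set.getFileList.asScala.find(_.getName == fileName).get
    FileDescriptor.buildFrom(proto, Array.empty[FileDescriptor])
  }

  // Look up the descriptor for a single message by name.
  def messageDescriptor(fd: FileDescriptor, messageName: String): Descriptor =
    fd.findMessageTypeByName(messageName)
}
```

This needs the protobuf-java library on the classpath; for a file with imports you would build the dependencies' FileDescriptors in topological order before calling `buildFrom`.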

So what I’ve done here is first read the .desc file as an input stream in Scala. Then I parsed that input stream into a FileDescriptorSet object.

A FileDescriptorSet contains multiple FileDescriptors, corresponding to the main proto file as well as its dependencies. In the next step, I extract the FileDescriptor for the abc.proto file using its name.

Once you have the FileDescriptor, you can get the descriptor for a particular message, again using its name, which is what I have done.

Now any serialized TestProtoMessage can be deserialized using DynamicMessage and the TestProtoMessage descriptor.
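A sketch of that step, including the protobuf-to-JSON conversion from the logging use case mentioned earlier (JsonFormat lives in the separate protobuf-java-util artifact; the names follow the example above):

```scala
import com.google.protobuf.DynamicMessage
import com.google.protobuf.Descriptors.Descriptor
import com.google.protobuf.util.JsonFormat

object DynamicJson {
  // Deserialize raw bytes with the message's Descriptor, then render
  // the resulting DynamicMessage as JSON (e.g. for a logging stream).
  def toJson(descriptor: Descriptor, bytes: Array[Byte]): String = {
    val message = DynamicMessage.parseFrom(descriptor, bytes)
    JsonFormat.printer().print(message)
  }
}
```

No generated classes are involved: the Descriptor extracted from the .desc file is all the schema information DynamicMessage needs.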

But how does it help me?

Well, it makes deployment a lot easier for me. Earlier, I had to recompile the jar every time I added a new message to the .proto file and push it to production.

Now, if there is no code change, I can simply reuse the same jar. I just generate a descriptor file, push it somewhere on the production machine, and update the config to point to that file, along with the proto file name and message name used for extracting the descriptors.
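For illustration, such a config might look something like this (the keys and paths are hypothetical, not from any particular framework):

```
# Hypothetical job config -- keys and values are illustrative only
descriptor.file.path=/etc/myjob/descriptors/abc.desc
descriptor.proto.file=abc.proto
descriptor.message.name=TestProtoMessage
```

Adding a new message then becomes a config change plus a descriptor-file push, with no redeployment of the jar.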

What about performance?

When you generate code for serialization/deserialization using protoc (with default options), it is optimized for speed (other modes are available via the optimize_for file option). So it’s obvious that its performance will be much better than using a descriptor to parse a message at run time.
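For reference, the mode is controlled by a file-level option in the .proto file itself:

```
// In abc.proto -- SPEED is the default; CODE_SIZE and LITE_RUNTIME
// trade generated-code size and runtime footprint against performance.
option optimize_for = SPEED;
```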

The real question is how much worse it is compared to generated code.

Serialization/Deserialization Performance (in nanoseconds/op)

As we can see, during deserialization DynamicMessage takes almost 5x as long as the generated message, while during serialization it takes only about 2x more time. This is still better than JSON, which is used everywhere.

The performance test was done using JMH benchmarks on Java 8, on a MacBook Pro (13-inch, Early 2015) with the following specs:

  • 2.7 GHz Intel Core i5
  • 8 GB 1867 MHz DDR3

The library used for JSON processing is Jackson. You can definitely achieve better JSON performance using libraries such as DSL-JSON or RapidJSON, but these are not as popular as Jackson, which is used by most libraries these days and supports almost all datatypes, including Scala objects and joda-time.

However, I’d advise you to use this approach only when you frequently have to add, remove, or change messages in the proto file with respect to upstream or downstream systems. Otherwise, stick to generated code for better performance as well as memory usage.

If you need any more comparisons or more details on how to use protobuf, drop me a mail, and connect with me on LinkedIn if you are interested in working on interesting stuff like this.