Most business applications in use today communicate with each other using REST over HTTP/1.1. This has become the industry standard because of its simplicity and the minimal effort needed to integrate applications.

However, once your application scales to handling a million requests per second or more, the shortcomings of this approach become apparent. Let’s address those first:

No Multiplexing

A single HTTP/1.1 connection can carry only one request and response at a time. The connection is blocked until the response arrives, which is highly inefficient.

Limited compression

HTTP/1.1 headers are not compressed, which needlessly increases the size of every request. In HTTP/2, headers are compressed using the HPACK algorithm, which is reported to reach compression ratios as high as 99% when the same headers repeat across requests.
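
To make the idea concrete, here is a minimal sketch of one HPACK mechanism, the indexed header field described in RFC 7541. A header that appears in the protocol's static table is sent as a single byte (the high bit plus the table index) instead of as text. The class and method names are made up for illustration, only a handful of static-table entries are included, and the dynamic table and Huffman coding that a real encoder uses are left out.

```java
import java.util.Map;
import java.util.Optional;

// Toy illustration of HPACK's indexed header field; not a complete encoder.
final class HpackIndexSketch {
    // Small subset of the HPACK static table: "name: value" -> table index.
    private static final Map<String, Integer> STATIC_TABLE = Map.of(
        ":method: GET", 2,
        ":method: POST", 3,
        ":path: /", 4,
        ":scheme: http", 6,
        ":scheme: https", 7,
        ":status: 200", 8);

    // Returns the one-byte indexed representation if the header is in our subset.
    // A real encoder would fall back to a literal representation otherwise.
    static Optional<byte[]> encodeIndexed(String name, String value) {
        Integer index = STATIC_TABLE.get(name + ": " + value);
        if (index == null) {
            return Optional.empty();
        }
        // Indexed header field: high bit set, remaining 7 bits hold the index.
        return Optional.of(new byte[] {(byte) (0x80 | index)});
    }
}
```

With this sketch, `HpackIndexSketch.encodeIndexed(":method", "GET")` yields the single byte `0x82`, where the textual form needs a dozen characters.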

Text-based transmission

HTTP/1.1 is text-based, which is inefficient for transmitting data. Text messages need more complex parsing and do not compress as well as a compact binary encoding.

HTTP/2 was designed to remove these limitations. It supports multiplexing, which allows a client to send multiple parallel requests over a single connection. Headers are compressed using the HPACK algorithm, and data is transmitted in binary frames.
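
As a rough illustration of what multiplexing looks like from the client side, the sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) rather than gRPC. The class name `Http2ClientSketch` and its methods are hypothetical. The client asks for HTTP/2 and fires every request asynchronously without waiting for earlier responses; when the server supports HTTP/2, requests to the same origin can travel as separate streams over one connection, and if it does not, the JDK client falls back to HTTP/1.1.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class Http2ClientSketch {
    // Build a client that prefers HTTP/2 for every request it sends.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();
    }

    // Start all requests at once; each future completes when its response arrives.
    static List<CompletableFuture<HttpResponse<String>>> sendAll(HttpClient client, List<URI> uris) {
        return uris.stream()
            .map(uri -> HttpRequest.newBuilder(uri).GET().build())
            .map(request -> client.sendAsync(request, HttpResponse.BodyHandlers.ofString()))
            .collect(Collectors.toList());
    }
}
```

Calling `sendAll` with several URIs on the same host and then joining the returned futures is enough to try this against any HTTP/2-capable server; `HttpResponse.version()` tells you which protocol was actually negotiated.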

The webmaster article HTTP/2 vs HTTP/1 provides a detailed comparison along with performance metrics for both protocols.

In this article, we’ll be exploring how you can use HTTP/2 in your service. Currently, the most popular way to do this is to use gRPC by Google.


In simple words, gRPC is an RPC framework used to connect multiple services over HTTP/2. It is currently used by hundreds of companies to efficiently connect microservices.

gRPC services differ from REST in that they expose methods (procedures) rather than endpoints. The client simply calls a method as if it were implemented locally; in the background, an RPC call is sent to the server, which contains the actual implementation of the method.
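
The stub idea can be sketched in plain Java without any gRPC dependency. Everything here is hypothetical and simplified: `GreeterApi`, `GreeterStubSketch`, and `GreeterServerSketch` are illustrative names, a `Function<byte[], byte[]>` stands in for the HTTP/2 channel, and raw UTF-8 bytes stand in for real serialization. The point is only that the caller sees an ordinary method while the stub handles the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// What application code sees: an ordinary Java interface.
interface GreeterApi {
    String sayHello(String name);
}

// Client-side stub: turns the method call into bytes and hands them to a transport.
final class GreeterStubSketch implements GreeterApi {
    private final Function<byte[], byte[]> transport; // stands in for the network channel

    GreeterStubSketch(Function<byte[], byte[]> transport) {
        this.transport = transport;
    }

    @Override
    public String sayHello(String name) {
        byte[] request = name.getBytes(StandardCharsets.UTF_8);  // "serialize" the argument
        byte[] response = transport.apply(request);              // "send" it and wait for the reply
        return new String(response, StandardCharsets.UTF_8);     // "deserialize" the result
    }
}

// Server side: the actual implementation of the method.
final class GreeterServerSketch {
    static byte[] handle(byte[] request) {
        String name = new String(request, StandardCharsets.UTF_8);
        return ("Hello " + name).getBytes(StandardCharsets.UTF_8);
    }
}
```

Wiring the two together in-process, `new GreeterStubSketch(GreeterServerSketch::handle).sayHello("World")` returns `"Hello World"`; in a real deployment the transport would carry the bytes to another machine.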

The interface of the service and its methods is defined in a .proto file. gRPC uses Protocol Buffers both to serialize the data exchanged between services and to generate client and server stubs. Because the encoding is a compact binary format, serialization and deserialization are typically much faster than with text formats such as JSON.
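
To see why the binary encoding is compact, the following sketch hand-encodes a message with one string field using the Protocol Buffers wire format: a tag byte built from the field number and wire type, a varint length, then the UTF-8 bytes. The class `ProtoWireSketch` is hypothetical, and in practice the generated classes do this for you. The usage note assumes a request message with a single string field `name` at field number 1, as in the upstream gRPC hello-world example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ProtoWireSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set on all bytes except the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes a single string field as: tag, length, UTF-8 bytes.
    static byte[] encodeStringField(int fieldNumber, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

`ProtoWireSketch.encodeStringField(1, "World")` produces the 7 bytes `0A 05 57 6F 72 6C 64`, compared with 16 bytes for the JSON text `{"name":"World"}`.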

The RPC call is done over HTTP/2. This allows gRPC users to automatically leverage all the features of the above protocol.

Let’s create a simple Hello World service in gRPC using Java. The example provided here can be found on

Define the service

First, we define the interface of the service in a .proto file. Let’s name this file greeter.proto

We name the service Greeter. This service contains a method SayHello. The method accepts a HelloRequest object and returns a HelloReply object.

Next, we define the format of the HelloRequest and HelloReply Protocol Buffers messages.

Generate the interfaces

Next, we’ll generate the service interface and client stub using the protobuf compiler. Create your usual Maven Java project, then copy the greeter.proto file into the src/main/proto folder.

Once it is there, we can use the Maven protobuf plugin to generate Java classes from this file.

The classes will be generated in the target/generated-sources/protobuf directory.

Implement the service

We can now extend the generated service interface to implement its methods.

Implement the client

Clients can now simply call this sayHello method, and the service will return the response over HTTP/2.

Run the server

You can either use the server bundled with gRPC or use an external framework that already provides gRPC bindings, such as Vert.x for Java.

Now, you have successfully implemented a basic gRPC service.

Load balancing

The non-trivial part of running an HTTP/2 service in production is load balancing. gRPC breaks the standard connection-level load balancing provided by default in Kubernetes or HAProxy, in which each new connection is routed to a different instance. This is because gRPC is built on HTTP/2, and HTTP/2 is designed around a single long-lived TCP connection: once that connection is open, every request sent over it lands on the same instance.

The solution is request-level load balancing: create long-lived connections to the instances and then distribute requests across those connections.

The easiest and most effective way to do this is to use Linkerd2. It is a service mesh that can run alongside your Kubernetes, Mesos, or any other cluster. Linkerd2 serves as a proxy for incoming requests. Since it is written in Rust, it adds very little delay (<1ms), and it load balances requests across host machines, which it can detect through the Kubernetes API or DNS.

You don’t need to configure anything extra in Linkerd2; it handles HTTP/1.1 and HTTP/2 traffic by default.
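
A minimal sketch of request-level balancing, assuming the simplest possible policy (round robin): the balancer holds one long-lived connection per backend and picks the next one for every request. `RoundRobinSketch` is a hypothetical name, and the type parameter `C` stands for whatever connection or channel object your client library provides. Production proxies typically use smarter policies and also handle health checks and reconnects, which are omitted here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch<C> {
    private final List<C> connections;                   // long-lived, one per backend instance
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<C> connections) {
        if (connections.isEmpty()) {
            throw new IllegalArgumentException("need at least one connection");
        }
        this.connections = List.copyOf(connections);
    }

    // Choose the connection for the next request.
    // floorMod keeps the index valid even after the counter overflows.
    C pick() {
        int index = Math.floorMod(next.getAndIncrement(), connections.size());
        return connections.get(index);
    }
}
```

With three connections `a`, `b`, and `c`, four consecutive calls to `pick()` return `a`, `b`, `c`, and then `a` again, so requests are spread evenly even though no new connections are opened.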

If you want to learn more about gRPC, Linkerd, or HTTP/2, you can refer to the links below:

  1. gRPC Load Balancing on Kubernetes without Tears
  2. HTTP/2: the difference between HTTP/1.1, benefits and how to use it
  3. gRPC Java — Basics