Service Discovery as a Service is the missing serverless lynchpin

Changing a function's dependent resources after deployment is a critical step towards feature parity with traditional architectures

Ben Kehoe
May 24, 2017

When we talk about serverless architectures, we often talk about the natural fit between serverless and microservices. We’re already partitioning code into individually-deployed functions — and the current focus on RPC-based microservices will eventually give way to event-driven microservices as serverless-native architectures mature.

We can generally draw nice microservice boundaries around components in our architecture diagrams. But that doesn’t mean we can actually implement those microservices in a way that achieves two important goals: 1) loose coupling between services, and 2) low end-to-end latency in our system.

This post is the first in a series exploring the missing pieces needed to achieve a vision of loosely-coupled, high-performance serverless architecture, using AWS as an avatar for all serverless platform providers.

Missing Piece #1: Service Discovery as a Service

In this post, I’ll focus on loose coupling. In particular, I propose that the lack of Service Discovery as a Service as part of a provider’s ecosystem causes customers to implement their own partial solutions.

I’ll define loose coupling as the ability to change the resources that a given function uses after deployment. There are two important use cases for this:

  1. circular dependencies between services — meaning that all the resource names generally cannot be known until after deployment
  2. the ability to update a microservice without requiring redeployment of the dependent microservices

Serverless deployments without Service Discovery as a Service

In serverless deployments without Service Discovery as a Service, the functions exist in the same namespace. They are connected to other functions within their deployment through environment variables, which are fixed at deployment time. Updating one function requires updating and redeploying all of its callers — and every function must be deployed with the full physical resource ids of everything it uses.

Service discovery allows us to keep our code from having to know exactly where to find the resources it depends on. An important part of this is the service registry, which gives us the ability to turn a logical name (e.g., UsersDatabase) into a physical resource id (arn:aws:dynamodb:us-east-1:123456789012:table/UsersDatabase-MVX3P).
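
To make the idea concrete, here is a minimal sketch of the registry interface implied above; the `ServiceRegistry` name and `resolve` method are illustrative, not an existing API:

```python
# A minimal sketch of a service registry: resolve a logical name to a
# physical resource id at runtime. The interface is hypothetical; the
# backing store could be anything that is updatable after deployment.
from typing import Protocol


class ServiceRegistry(Protocol):
    def resolve(self, logical_name: str) -> str:
        """Return the current physical resource id for a logical name."""
        ...


# Calling code depends only on the logical name:
#   registry.resolve("UsersDatabase")
#   -> "arn:aws:dynamodb:us-east-1:123456789012:table/UsersDatabase-MVX3P"
```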

If this mapping is known at deployment time, serverless platform providers generally have a way of including it in the deployment; for example, environment variables in AWS Lambda functions. But these mechanisms don’t allow for change without redeploying the function, so they don’t fulfill our need.

My experience has been that everybody who has implemented a serverless system has built their own way of solving this — which is pretty much the definition of undifferentiated heavy lifting. Any remote parameter store that is updatable at runtime will suffice. The EC2 Systems Manager parameter store is a good option.
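
As a sketch of that option, a lookup against Parameter Store might look like the following; the parameter names are illustrative, and a real implementation would likely add caching:

```python
# A sketch using EC2 Systems Manager Parameter Store as a
# runtime-updatable service registry.
import boto3

ssm = boto3.client("ssm")


def resolve(logical_name: str) -> str:
    # Look the value up at call time (perhaps behind a short-lived
    # cache), so the mapping can change without redeploying functions.
    response = ssm.get_parameter(Name=logical_name)
    return response["Parameter"]["Value"]


# Operators repoint a dependency out of band, e.g.:
#   aws ssm put-parameter --name UsersDatabase \
#     --value arn:aws:dynamodb:us-east-1:123456789012:table/UsersDatabase-MVX3P \
#     --overwrite
```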

At iRobot, we have solved this by using our tooling to inject a DynamoDB table into every deployment (i.e., it writes it into the CloudFormation template) to act as a runtime-updatable key-value store. The auto-generated name of this table is injected into each Lambda function’s environment variables using the CloudFormation resource support for env vars.
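
A sketch in the spirit of that approach follows; the table schema and environment variable name are illustrative, not iRobot's actual implementation:

```python
# A runtime-updatable key-value store backed by a DynamoDB table whose
# auto-generated name is injected at deployment time.
import os

import boto3

dynamodb = boto3.client("dynamodb")

# Set by the deployment tooling (e.g., via CloudFormation).
REGISTRY_TABLE = os.environ["REGISTRY_TABLE_NAME"]


def resolve(logical_name: str) -> str:
    # Each item maps a logical name to a physical resource id, and can
    # be rewritten at runtime without touching any Lambda function.
    item = dynamodb.get_item(
        TableName=REGISTRY_TABLE,
        Key={"logical_name": {"S": logical_name}},
    )["Item"]
    return item["physical_id"]["S"]
```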

Service discovery in a traditional architecture

With service discovery in a traditional architecture, the service registry provides a mapping from logical name (e.g., “A”) to a physical resource id (e.g., v1.a.domain in Deployment 1, v2.a.domain in Deployment 2). The isolation by VPC or subnet provides separation between the deployments.

Separation of Environments

To step back a bit further, there’s another advantage provided by service discovery mechanisms in traditional microservice architectures: separation of environments.

Infrastructure as a Service offerings like EC2 have comprehensive mechanisms for separating groups of resources. On AWS, this is accomplished at the highest level with Virtual Private Clouds (VPCs), which, as the name implies, completely partition EC2 resources into separate silos. Within a VPC, subnets can be used to further isolate instances from each other.

This separation is leveraged to create independent sets of service discovery information, such that the service discovery information itself can have a well-known name, rather than also needing some sort of lookup. For example, it can be accomplished through DNS, which works because the networks of different VPCs are isolated, so the DNS lookups for the same name in each can have different results.

Another option is a configuration manager like ZooKeeper, etcd, or Consul — which works because the configuration manager deployments in different VPCs don’t know about each other. As a result, they don’t conflict, but each has a well-known name within its VPC/subnet.

As noted by Martin Fowler, this separation isn’t currently present in any provider’s offering. On AWS, Lambda functions can be run in a VPC, but that is heavy-handed and complicated just to gain logical separation between the functions. This means that, for whatever remote parameter store is being used, there still needs to be a mechanism for separating those parameters between deployments.

With EC2 Systems Manager parameter store, this means the Lambda functions need to understand prefixing, and that prefix needs to be delivered to each function through its env vars. In iRobot’s solution, we create a DynamoDB table with each deployment, inject its name into an environment variable in every Lambda, and include a library in each packaged Lambda that uses the table as a parameter store.
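
Here is a sketch of the prefixing approach, with an illustrative environment variable name:

```python
# Per-deployment separation on top of Parameter Store: a
# deployment-specific prefix arrives via an environment variable and
# is prepended to every lookup.
import os

import boto3

ssm = boto3.client("ssm")

PREFIX = os.environ["PARAMETER_PREFIX"]  # e.g., "/env1/"


def resolve(logical_name: str) -> str:
    # Two deployments resolve the same logical name to different
    # parameters: /env1/UsersDatabase vs. /env2/UsersDatabase.
    response = ssm.get_parameter(Name=PREFIX + logical_name)
    return response["Parameter"]["Value"]
```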

Azure actually provides this capability in Azure Service Fabric, but it is currently not available for use with Azure Functions.

Service Discovery as a Service

With Service Discovery as a Service, a function makes a non-namespaced call to the service discovery service (e.g., Get(“A”)), which uses a tag on the function to index into the right namespace (e.g., Env1). At deployment time, the functions need only be tagged with an immutable identifier.

The functionality that is really needed is a new feature or service as part of the providers’ platforms. We need Service Discovery as a Service (SDaaS) — or more precisely, Service Registry as a Service.

What would this look like? I see it as relatively simple: a key-value store with multiple distinct namespaces. But the crux is this: when making a Get call, the namespace is chosen based on some property of the caller, rather than selected explicitly. Of course, explicit selection would also be available.
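
Here is a hypothetical sketch of those semantics; the class and its data layout are my illustration, not a real provider API:

```python
# A toy model of SDaaS: a namespaced key-value store where the
# namespace is chosen from the caller's identity unless explicitly
# overridden.
from typing import Dict, Optional


class ServiceDiscoveryService:
    def __init__(self) -> None:
        # namespace -> (logical name -> physical resource id)
        self.namespaces: Dict[str, Dict[str, str]] = {}
        # caller identity (IAM role, function tag, ...) -> namespace
        self.caller_index: Dict[str, str] = {}

    def get(self, caller: str, key: str,
            namespace: Optional[str] = None) -> str:
        # The crux: absent an explicit namespace, it is derived from a
        # property of the caller.
        ns = namespace or self.caller_index[caller]
        return self.namespaces[ns][key]
```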

For example, a standalone version of this service could use the IAM role of the caller. This would have the added advantage of being usable by server-based implementations as well. A version integrated into AWS Lambda could leverage the recently-added tagging functionality.
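
To illustrate the standalone, IAM-based variant, here is a hypothetical Lambda handler behind API Gateway with AWS_IAM auth; the in-memory tables are stand-ins for durable storage:

```python
# Hypothetical standalone registry: API Gateway delivers the caller's
# IAM identity in the request context, so the namespace is never
# passed explicitly. The data below is illustrative.
ROLE_TO_NAMESPACE = {
    "arn:aws:iam::123456789012:role/env1-app": "Env1",
}
NAMESPACES = {
    "Env1": {
        "UsersDatabase":
            "arn:aws:dynamodb:us-east-1:123456789012:table/UsersDatabase-MVX3P",
    },
}


def handler(event, context):
    # API Gateway populates userArn for IAM-authenticated requests.
    caller_arn = event["requestContext"]["identity"]["userArn"]
    namespace = ROLE_TO_NAMESPACE[caller_arn]
    key = event["pathParameters"]["key"]
    return {"statusCode": 200, "body": NAMESPACES[namespace][key]}
```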

To be fully functional as SDaaS, the service would have to allow phased rollouts of changes to the namespace selections. That is, it should support blue-green updates to the values that a given caller receives.
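
One way such a rollout could work is weighted resolution between the old and new values; the record shape here is purely an assumption for illustration:

```python
# A sketch of blue-green value resolution: callers shift gradually
# from the old ("blue") value to the new ("green") one as the weight
# is raised from 0.0 to 1.0.
import random


def resolve_weighted(record: dict) -> str:
    # record = {"blue": "v1.a.domain", "green": "v2.a.domain",
    #           "green_weight": 0.1}
    if random.random() < record.get("green_weight", 0.0):
        return record["green"]
    return record["blue"]
```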

Whatever form this service takes, it would eliminate the need for customers to build their own solutions, allowing them to focus on the tasks specific to their needs and reducing the barrier to entry in the serverless space. As a critical step towards feature parity with traditional architectures, Service Discovery as a Service is the missing lynchpin for serverless.

Update: Tim Wagner, the GM of AWS Lambda and API Gateway, asked some good questions and I wrote a long response that forms an appendix to this post.

Update 2: Paul Johnston wrote about what he thinks is missing, the (loose-coupling-related) concept of event routers.
