Implementing a Custom gRPC Resolver
For the last few years, I have been working on a project within Vitess called VTAdmin1. It’s an operator-friendly API and UI replacement for the older vtctld UI control panel. In order to function, it needs to make gRPC requests to both `vtctld` and `vtgate` components in one (or more!) Vitess deployments.

To connect to `vtgate` and `vtctld` components, VTAdmin relies on a `Discovery` abstraction, which defines an interface for fetching lists of VTGate and Vtctld addresses:
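Roughly, the interface looks like this (a sketch with illustrative method names, not the exact Vitess definitions):

```go
package discovery

import "context"

// Discovery is an approximation of VTAdmin's discovery abstraction; the
// method names here are illustrative rather than copied from Vitess.
type Discovery interface {
	// DiscoverVTGateAddrs returns the current set of vtgate addresses
	// matching the given tags.
	DiscoverVTGateAddrs(ctx context.Context, tags []string) ([]string, error)
	// DiscoverVtctldAddrs returns the current set of vtctld addresses
	// matching the given tags.
	DiscoverVtctldAddrs(ctx context.Context, tags []string) ([]string, error)
}
```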
VTAdmin wraps these gRPC connections within structs called, respectively, `vtsql.DB` (which proxies to VTGates) and `vtctldclient.Proxy` (which proxies to Vtctlds). In the original implementation, when calling `Dial` on one of these components, VTAdmin would use the `Discovery` implementation for that cluster to look up a single address, and call `grpc.Dial` to that address.
This works great for a proof-of-concept, but has several fundamental problems that would surface over time.
Problem 1: Uneven Load
The model of `vtadmin-api`, by design, is to `Dial` a proxy to a VTGate or Vtctld the first time we need a connection to that component. Then, in future `Dial` calls, the proxy sees it already has a connection; if it’s still “healthy”, we reuse the connection and return immediately. But if the connection ever becomes “unhealthy”, we throw it away, discover a new dial address, and make a new connection (see “Problem 2” for how this went in practice).
In effect, this meant that for the life of the process, VTAdmin would send every single RPC to a single host. For small use cases, this is Fine™, but it’s certainly not great, and could knock over components at high enough usage.
Of course, you can spin up multiple `vtadmin-api` processes, and hope that they each pick a different gate, but the API is so lightweight that this feels like a very silly and unnecessarily expensive “solution.”
Problem 2: gRPC vs Proxies
By far the biggest issue with this approach happens when a VTGate or a Vtctld goes away.
Sometimes, there’s network weather, and gRPC’s `ClientConn` will internally handle the business of re-establishing a connection to the server, and we’re all good. The word “internally” there, though, is both a blessing and a curse. You don’t have to worry about — or manage — these details at all, but at the same time, gRPC2 exposes almost none of this to the end user of a `ClientConn`, no matter how much they legitimately need it.
Which makes extremely relevant the fact that both VTGates and Vtctlds are stateless components. Stateless components are awesome! You can put them in an auto-scaling group (ASG) and let your cloud provider make sure you have sufficient capacity, and cycle your hosts so they don’t stick around forever. You can easily implement a blue-green deploy mechanism based on setting the version you want to run and simply terminating hosts in the ASG.
And all of this is great, and will make your life as an operator simpler. You’ll also break VTAdmin.
You see, gRPC’s “transient failure retry” mechanism can’t, as far as I was ever able to observe, tell the difference between network weather (“this host is gone for now, but it’ll come back, just keep trying to reconnect”) and more permanent failure (“this host has gone to a server farm upstate, you can try all you like but it’s never coming back”).
So, if VTAdmin had `Dial`-ed a Vtctld, and the ASG cycled it (or you did a deploy, or you manually terminated it, or, or, or), the only way to get VTAdmin working again for that cluster is to completely restart the `vtadmin-api` process.
What we actually need is for VTAdmin to:
- Realize the connection is gone and fully `Close` it.
- Re-discover (via its `Discovery` implementation) a new host to dial, and `grpc.Dial` that address.
But all we have is the sledgehammer of a full restart. That’s a big bummer, and also requires manual intervention.
We took a few attempts to address or mitigate this problem, which worked … well enough for an MVP, but they weren’t perfect, and required our proxies to leak details about gRPC internals in some cases.
Problem 3: We are Lying to You
I will admit this one is not a technical problem3, per se, but falls within my endless crusade of “words mean things.”
`vtsql.DB` and `vtctldclient.Proxy` describe themselves as proxies. And in one sense, they are — they do sit between VTAdmin and another component, shuttling requests and responses back and forth.

But, they are the thinnest proxies imaginable. They are really more passthroughs than proxies, and, while having a passthrough is completely fine, and in many cases useful, calling a passthrough a proxy is confusing at best, and might create false expectations from both users and developers.
The `grpc.Resolver` API
Luckily for us, gRPC has a (not widely-advertised4) API to plug in your own mechanism for providing a list of addresses (a “resolver”) to a client connection (a `ClientConn`). gRPC will then use a second layer (a “balancer”; also pluggable, also not widely-advertised4) to select one or more of those addresses to create actual, for-realsies `net.Conn` connections to (`SubConn`s) and route RPCs to.

gRPC’s defaults are to use a resolver called `passthrough`, and a balancer called `pick_first`. The `pick_first` balancer (roughly) walks through the address list it gets from the resolver and creates a `SubConn` to the first address that successfully connects. All future RPCs are sent to that `SubConn`. You can also use a service config to specify other load balancer policies, like `round_robin`, which does what it sounds like.
But, we’re not here to talk about balancers; we’re here to talk about resolvers!
Here’s what that `passthrough` resolver does5:
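In paraphrased form (this is a reconstruction of grpc-go’s implementation for illustration, not the verbatim source):

```go
package passthrough

import "google.golang.org/grpc/resolver"

const scheme = "passthrough"

type passthroughBuilder struct{}

func (*passthroughBuilder) Build(target resolver.Target, cc resolver.ClientConn, _ resolver.BuildOptions) (resolver.Resolver, error) {
	r := &passthroughResolver{target: target, cc: cc}
	// Report the dial target itself as the one and only address. (Endpoint
	// is a plain field rather than a method in older grpc-go versions.)
	r.cc.UpdateState(resolver.State{
		Addresses: []resolver.Address{{Addr: r.target.Endpoint()}},
	})
	return r, nil
}

func (*passthroughBuilder) Scheme() string { return scheme }

type passthroughResolver struct {
	target resolver.Target
	cc     resolver.ClientConn
}

// ResolveNow is a no-op: there is nothing to look up, ever.
func (*passthroughResolver) ResolveNow(resolver.ResolveNowOptions) {}

func (*passthroughResolver) Close() {}
```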
There’s a builder pattern in play here. gRPC expects you to register `resolver.Builder` implementations, either in a global registry via `resolver.Register` or per-connection, with the dial option `grpc.WithResolvers(...resolver.Builder)`. gRPC will determine the right kind of resolver to build for a given `ClientConn` at dial-time, based on the naming scheme6 of the dial target.
Dial targets are parsed as follows:
`(<scheme>://)?(<authority>/)?<endpoint>`

If a scheme is specified, then the `resolver.Builder` registered for that scheme is used; otherwise gRPC falls back to its default `passthrough` resolver. In addition, the parsed target gets passed to the `Builder.Build()` call, so the resolver can inspect its target if needed.
After building a resolver for the target, gRPC wraps the `ClientConn` a bunch, to make a nice burrito of a `grpc.ClientConn` inside a `balancer.ClientConn` inside a `resolver.ClientConn`7.

Then, for the life of the connection, gRPC will take different actions based on the connectivity state of the different `SubConn`s. The details don’t matter for our purposes, beyond “occasionally, including on transient connection failures, gRPC will request the resolver to re-resolve.” This is when `ResolveNow` gets called. The contract here is that the resolver will then produce a new set of addresses, and communicate that back to the balancer via `cc.UpdateState`. After its state is updated by the resolver, the balancer can then create/destroy/reuse `SubConn`(s) as it sees fit, and communicate this back to the `grpc.ClientConn` via its own `cc.UpdateState` method call.
Phew.
With all that preamble, let’s return to the original problem.
VTAdmin has these thin proxies, which use a `Discovery` interface to discover a single address to “proxy” to. They don’t distribute load over the N hosts that they could have used, and furthermore have no way to inspect the connection state because that is internal to gRPC, so they can’t discover different hosts on (permanent) “transient” failures.

But, if we use the resolver APIs, we can write our own resolver that uses our `Discovery` interface under the hood to look up the most up-to-date set of addresses whenever gRPC decides it needs to refresh the addresses for a dial target!
Then, balancing and connection management fall strictly under the purview of gRPC — which is better equipped to handle that anyway — and not VTAdmin.
Let’s get into it.
Implementation
Here is a link to the PR with the full implementation. I’ve omitted some details for brevity and clarity.
Devising a scheme
The first thing we need to do is figure out how to make gRPC use our resolver.
There’s no way I could see to change the default resolver used by `grpc.Dial`, so we’ll need to come up with our own URL scheme. We also want to use different resolvers for different clusters (since each cluster has its own discovery implementation). We already require clusters to have unique IDs, so we’ll go ahead and just use the cluster ID as our scheme8.

Then, our `Dial`s go from using an `addr` that we get back from discovery to a somewhat magical string of `${clusterID}://${componentName}` (more on the component name in a bit). We do need to thread the cluster’s discovery through to the resolvers used by both the Vtctld and VTGate proxies, like so:
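Something along these lines (a sketch; the package, type, and constructor names approximate the VTAdmin code rather than copy it, and the `Build` method follows in the next snippet):

```go
package vtadminresolver

import (
	"time"

	"google.golang.org/grpc/resolver"

	// Hypothetical import path for the Discovery interface sketched earlier.
	"example.com/vtadmin/discovery"
)

// Options plays the role of ResolverOptions: it carries the cluster's
// Discovery implementation (plus any knobs) into every resolver we build.
type Options struct {
	Discovery        discovery.Discovery
	DiscoveryTimeout time.Duration
}

// builder implements resolver.Builder. Its scheme is the cluster ID, so a
// dial target like "<clusterID>://vtctld/" is routed to this builder.
type builder struct {
	scheme string // the cluster ID
	opts   Options
}

// NewBuilder returns the resolver.Builder for one cluster. Both the Vtctld
// and VTGate proxies for that cluster hold a reference to it and pass it to
// grpc.WithResolvers at dial-time.
func NewBuilder(clusterID string, opts Options) resolver.Builder {
	return &builder{scheme: clusterID, opts: opts}
}

func (b *builder) Scheme() string { return b.scheme }
```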
This ensures that each proxy (I’m focusing on the Vtctld proxy, but the VTGate proxy is structurally identical) has a builder with the cluster ID as the scheme, as well as a reference to the `Discovery` implementation (via the `ResolverOptions`).

The builder will hand that `Discovery` to the actual resolver at build-time:
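Continuing the sketch above (same hypothetical package, with `"fmt"` added to the imports; the `discoveryResolver` struct itself appears under “ResolveNow” below):

```go
// Build inspects the dial target to decide whether this connection is for
// vtgates or vtctlds, and hands the matching discovery function to the
// resolver it constructs.
func (b *builder) Build(target resolver.Target, cc resolver.ClientConn, _ resolver.BuildOptions) (resolver.Resolver, error) {
	r := &discoveryResolver{
		cc:   cc,
		opts: b.opts,
	}

	// The "authority"/host portion of the target picks the component:
	// "<clusterID>://vtctld/" vs "<clusterID>://vtgate/".
	switch target.URL.Host {
	case "vtctld":
		r.fn = b.opts.Discovery.DiscoverVtctldAddrs
	case "vtgate":
		r.fn = b.opts.Discovery.DiscoverVTGateAddrs
	default:
		return nil, fmt.Errorf("unsupported component %q in dial target", target.URL.Host)
	}

	// Resolve once up front so the balancer starts with a full address list.
	r.ResolveNow(resolver.ResolveNowOptions{})
	return r, nil
}
```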
As you can see, we’re abusing the URL `Host` fragment (formerly “authority” in the gRPC name resolution spec) to switch between Vtctld and VTGate components. The only difference, functionally, is which discovery function we use, which gets set as the `fn` field in our built resolver.
Then, the underlying Dial in the Vtctld proxy becomes:
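Something like this (a sketch; the proxy struct and its `clusterID`, `resolverBuilder`, and `conn` fields are approximations sketched further below, and the credentials are whatever the cluster is actually configured with):

```go
import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dial is a sketch of the Vtctld proxy's underlying dial; all of the magic is
// in the target string and the WithResolvers option.
func (proxy *ClientProxy) dial(ctx context.Context) error {
	conn, err := grpc.DialContext(
		ctx,
		fmt.Sprintf("%s://vtctld/", proxy.clusterID), // scheme = cluster ID, "host" = component
		grpc.WithTransportCredentials(insecure.NewCredentials()), // illustrative; use the cluster's real credentials
		grpc.WithResolvers(proxy.resolverBuilder), // the per-cluster builder from above
	)
	if err != nil {
		return err
	}

	proxy.conn = conn
	return nil
}
```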
ResolveNow
Perhaps the nicest thing about this change is that, after all that convoluted preparatory work, the actual guts of the resolver are an almost verbatim copy of the discovery bits that were formerly in the body of the `Dial` methods of our proxies. This time, instead of getting a single address and passing that to a `grpc.Dial` call, we instead get a list of addresses, and inform the wrapped `ClientConn` of the new address set.
In code:
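Sketched out, continuing the hypothetical names from the earlier snippets:

```go
// discoveryResolver is the resolver our builder constructs.
type discoveryResolver struct {
	cc   resolver.ClientConn
	opts Options

	// fn is the discovery function chosen at Build time:
	// DiscoverVtctldAddrs or DiscoverVTGateAddrs.
	fn func(ctx context.Context, tags []string) ([]string, error)
}

// ResolveNow is called by gRPC whenever it wants a fresh set of addresses,
// including after transient connection failures.
func (r *discoveryResolver) ResolveNow(resolver.ResolveNowOptions) {
	ctx, cancel := context.WithTimeout(context.Background(), r.opts.DiscoveryTimeout)
	defer cancel()

	// This is (roughly) the code that used to live in the proxies' Dial
	// methods, except we keep every address instead of picking just one.
	addrs, err := r.fn(ctx, nil)
	if err != nil {
		r.cc.ReportError(err)
		return
	}

	state := resolver.State{Addresses: make([]resolver.Address, 0, len(addrs))}
	for _, addr := range addrs {
		state.Addresses = append(state.Addresses, resolver.Address{Addr: addr})
	}

	// Hand the full address set to the balancer; it decides which SubConns
	// to create, reuse, or tear down.
	r.cc.UpdateState(state)
}

func (r *discoveryResolver) Close() {}
```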
And that’s pretty much it! This is a working implementation of a custom resolver, based on VTAdmin’s cluster discovery interface. It handles ASG-cycling gracefully, and I think it’s pretty neat!
Everything else I want to talk about is extra considerations and future improvements.
debug.Debuggable
VTAdmin aims to be operator-friendly, and to that end, it includes some debug endpoints to inspect the state of VTAdmin itself. Here’s how it works.
First, we define a very small interface that any debuggable component will implement:
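It looks more or less like this (sketched; the real definition lives in VTAdmin’s debug package):

```go
package debug

// Debuggable is implemented by any component that can report a map of debug
// information about itself, suitable for JSON-encoding.
type Debuggable interface {
	Debug() map[string]interface{}
}
```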
This interface is implemented by `*cluster.Cluster`, as well as the Vtctld and VTGate proxies. VTAdmin then defines two HTTP-only endpoints, `/debug/clusters/` and `/debug/cluster/${clusterID}`, whose handlers munge together all these maps into a JSON payload that operators can inspect.
Here’s a subset, as an example:
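Something with roughly this shape (the field names and values below are illustrative, not the exact payload):

```json
{
  "cluster": {
    "id": "local",
    "name": "local"
  },
  "vtctld": {
    "is_connected": true
  },
  "vtsql": {
    "is_connected": true
  }
}
```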
Prior to the custom resolver, the proxies used to track which host they were connected to, as well as the last time they pinged that host as a crude healthcheck. The resolver vastly improved the stability of the API process, but as a result the discovery all happens in the background, completely opaquely to the proxies. This meant we lost that small, but useful, piece of introspection.
That’s okay though! If you recall back to how the proxies hook themselves into the resolver builders, they actually maintain a reference to the builder for that cluster scheme:
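That reference looks something like this (again with approximated names):

```go
// ClientProxy is a stand-in for the Vtctld (or VTGate) proxy struct.
type ClientProxy struct {
	clusterID string

	// resolverBuilder is the per-cluster builder handed to grpc.WithResolvers
	// at dial-time. Keeping the reference means we can also ask it questions
	// later, e.g. for debug output.
	resolverBuilder resolver.Builder

	conn *grpc.ClientConn
}
```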
This means that if we have our builder also implement `debug.Debuggable`, then we can update each proxy’s `Debug()` method to merge those maps into the final JSON payload.
And that’s exactly what we do:
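A sketch of that merge (names approximated; see footnote 9 for why a type assertion is involved in the real code):

```go
// Debug implements debug.Debuggable for the proxy.
func (proxy *ClientProxy) Debug() map[string]interface{} {
	m := map[string]interface{}{
		"is_connected": proxy.conn != nil,
	}

	// If the builder can describe itself, fold its state into our map.
	if d, ok := proxy.resolverBuilder.(debug.Debuggable); ok {
		m["resolver"] = d.Debug()
	}

	return m
}
```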
Now9, hitting the `/debug/` endpoint for that cluster shows the resolver states:
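Again illustrative rather than exact, but the shape is something like:

```json
{
  "cluster": {
    "id": "local",
    "name": "local"
  },
  "vtctld": {
    "is_connected": true,
    "resolver": {
      "addr_list": [
        "10.0.0.1:15999",
        "10.0.0.2:15999"
      ]
    }
  }
}
```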
Of course, all we can provide is the list of potential addresses; which one(s) gRPC has connection(s) to depends on the balancer policy used. Still, it’s better than nothing, and I think the tradeoff of debuggability for reliability is worth it.
Future Work
While this is stable and working now, I also studied the implementation of the `dns` resolver in grpc-go. Two things jumped out at me:
- The resolver continuously (on an interval) re-resolves the target in a background goroutine, and uses its `ResolveNow` method only to jump the interval wait time and trigger a re-resolution immediately.
  - (It’s not strictly “immediate”, because it also enforces a minimum, non-configurable 30s wait between DNS lookups “to prevent constantly re-resolving”.)
- It uses a backoff mechanism to try to re-resolve when a DNS lookup fails, or if the call to `cc.UpdateState` fails, which can help in transient failure modes.
I want both of these things for our discovery resolver, as I think they’ll have a significant impact on reliability.
What I’d really like is for `grpc-go` to expose a different interface API that provides the scaffolding of “background loop” + “(configurable) min-wait” + “backoff on failure” + “update state or report error”, and you, as the consumer of this API, provide a function that takes a `resolver.Target`, `resolver.BuildOptions`, and `resolver.ResolveNowOptions` and returns a `(*resolver.State, error)`.

Failing that, I’ll just have to copy-paste the guts of the DNS watcher loop and replace the `d.lookup()` call with our `r.resolve()` method.
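Heavily simplified, and with a plain inline backoff standing in for grpc-go’s internal backoff package, that loop could look something like this (assuming we also add a `resolveNow` channel that `ResolveNow` signals):

```go
// watch is a rough sketch of a dns-resolver-style background loop adapted to
// our discovery resolver. It is illustrative, not the grpc-go source.
func (r *discoveryResolver) watch(ctx context.Context) {
	const minInterval = 30 * time.Second
	backoff := time.Second

	for {
		addrs, err := r.fn(ctx, nil)
		if err != nil {
			r.cc.ReportError(err)
		} else {
			state := resolver.State{}
			for _, addr := range addrs {
				state.Addresses = append(state.Addresses, resolver.Address{Addr: addr})
			}
			err = r.cc.UpdateState(state)
		}

		var wait time.Duration
		if err != nil {
			// Failure (lookup or UpdateState): retry with backoff.
			wait, backoff = backoff, backoff*2
		} else {
			// Success: reset the backoff and wait out the minimum interval.
			wait, backoff = minInterval, time.Second
		}

		select {
		case <-ctx.Done():
			return
		case <-r.resolveNow: // ResolveNow pokes this channel to skip the wait
		case <-time.After(wait):
		}
	}
}
```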
1. You can see me give an overview of VTAdmin at KubeCon 2021 here.
2. I’m referring to `grpc-go` specifically here (and throughout the post), since Vitess, and by extension `vtadmin-api`, are written in Go. Different languages may have implementations that differ on this point.
3. More accurately: it’s more of a semantic issue than a technical problem, and the technical problems that do lie within the semantic issue are covered by the other problems discussed prior.
4. In my opinion, which you are free to disagree with.
5. This is not the actual code, which I have modified for both brevity and clarity.
6. These docs are not actually accurate for `grpc-go`, which defaults to the `passthrough` resolver I’ve been showing here. Of course, this ends up using various `net.Lookup` calls to turn non-IP addresses into IPs, but this happens during the creation of the HTTP transport, not at resolver-time. There’s also a `dns` resolver which does this at resolver-time and hands IP addresses back to the balancer.
7. Or maybe it’s a conn-ducken.
8. In the spirit of completeness, since we are using the connection-local registry, we actually don’t have to care about cross-cluster uniqueness in the URL scheme. We could instead have just used a scheme such as `vtadmin://` everywhere, but I didn’t, so here we are.
9. There are code changes required in the proxies that aren’t very interesting. The one thing to note is that because we don’t export the resolver type (because I wanted to have the proxies store an interface type, to support dependency injection for testability), we have to make a type assertion before adding the debug info. Here is the Vtctld version.