Very, very special thanks go out to Jeff Mitchell with Hashicorp. Without his help, I never would have been able to accomplish what I had originally set out to do.
The Use Case
- You have more than one physical datacenter.
- You need to have an HA Vault setup in each datacenter.
- You don’t want to lose functionality in any datacenter if the connection between them is severed.
- You want to have all Vault auth tokens, secrets, and policies available in all datacenters.
- You have one read+write “source” datacenter where you will be creating Vault tokens, policies, and secrets.
- You have one or more read-only “destination” datacenters where Vault data will be replicated to, but never modified. We’ll use one in our example, but I use 3 and see no real scalability problems with anything less than 20.
Context and Assumptions
Throughout this post, I’ll refer to two datacenters: Datacenter A and Datacenter B. Datacenter A will represent our “source” datacenter, and B will represent our “destination” datacenter. For brevity, we’ll assume that these are both physically separate datacenters as well as separate Consul datacenters. We’ll also assume that there is VPN or an alternative connectivity solution in place.
Lastly, this post assumes that you’re familiar with working with both Vault and Consul. Please read all the docs on both products (Consul, Vault). Hashicorp has gone to great pains to produce some really good documentation; please don’t let their efforts go to waste.
While Consul itself supports multiple datacenters, there are some quirks in the way it handles them that cause problems when you put Vault in front of it.
- Your first thought might be to use multiple datacenters in Consul, and point all of your Vaults at the same datacenter. This technically will work, but if Datacenter A loses connectivity to Datacenter B, Vault will not function in Datacenter B until connectivity is restored.
- You might next come upon consul-replicate. This is the right tool for the job, but the devil’s in the details. First, you can’t replicate the entire k/v store – you have to exclude some things from being replicated.
- If you get all that figured out, you’ll quickly discover that even though everything seems to be working fine via consul-replicate, your changes don’t show up in Datacenter B until after you restart and unseal Vault.
The solution is to set up Vault to point at a local Consul cluster in each datacenter. We’ll use consul-replicate to replicate specific data from Consul in the source datacenter (A) to the destination datacenter (B).
For performance, Vault makes use of a read cache. Since only one Vault instance is actually marked as active, and that active instance is the only one that sees any operations, it caches every read indefinitely in process memory. Because we’re using consul-replicate to change the data underneath Vault without going through the Vault APIs, we need to disable that read cache in the destination datacenter.
Step 1: Set up Consul clusters in each datacenter
By following the awesome documentation on Consul, you’ll have a Consul cluster up and running in each datacenter in no time.
NOTE: You need to set acl_master_token to the same value in both datacenters in order to make multiple datacenters work in Consul.
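For example, the Consul server config in both datacenters would carry the same value for this key (the UUID below is just a placeholder; generate your own, and keep the rest of your ACL settings as your environment requires):

```json
{
  "acl_master_token": "11111111-2222-3333-4444-555555555555"
}
```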
Step 2: Make the Consul clusters aware of the nodes in the other datacenters
Run the consul join -wan <<hostname of Consul host in other datacenter>> command to establish communication between the two Consul clusters.
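For example (the hostname here is a placeholder for one of your Datacenter B Consul servers):

```sh
# from a Consul server in Datacenter A, join the WAN pool of Datacenter B
consul join -wan consul-1.dcb.example.internal

# verify that servers from both datacenters show up
consul members -wan
```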
Step 3: Install HA Vault in each datacenter
Use the awesome documentation on Vault to build HA Vault clusters in each datacenter. Point each Vault instance at its local Consul datacenter.
Vault instance 1 in Datacenter A would have a vault.hcl file like this:
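(Addresses, hostnames, tokens, and TLS paths below are placeholders; recent Vault releases use the storage and api_addr settings shown here, while older ones spell them differently.)

```hcl
# vault.hcl -- Vault instance 1 in Datacenter A (the read+write "source")
storage "consul" {
  address = "127.0.0.1:8500"              # local Consul agent in Datacenter A
  path    = "vault"                       # k/v prefix Vault stores its data under
  token   = "<<your Consul ACL token>>"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/vault.crt"
  tls_key_file  = "/etc/vault/tls/vault.key"
}

api_addr = "https://vault-1.dca.example.internal:8200"
```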
Vault instance 1 in Datacenter B would have a vault.hcl like this:
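(Same placeholders as above; the meaningful difference is the disable_cache setting.)

```hcl
# vault.hcl -- Vault instance 1 in Datacenter B (the read-only "destination")

# The critical difference: turn off Vault's in-process read cache so reads
# always go through to Consul, where consul-replicate is updating the data.
disable_cache = true

storage "consul" {
  address = "127.0.0.1:8500"              # local Consul agent in Datacenter B
  path    = "vault"
  token   = "<<your Consul ACL token>>"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/vault.crt"
  tls_key_file  = "/etc/vault/tls/vault.key"
}

api_addr = "https://vault-1.dcb.example.internal:8200"
```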
The big thing to notice here is that we turn off the read cache in our destination datacenter.
Step 4: Set Up Consul-Replicate
At this point, you will have all of the endpoints set up in the proper manner. The last piece of this puzzle is to replicate the data from the source to the destination(s) via consul-replicate. Consul-replicate is a very specific tool written for a very specific purpose, and I’ll admit that its docs aren’t up to par with those for the other tools in Hashicorp’s suite. Rest assured, it works perfectly once you’ve got it set up.
Consul-replicate works by connecting to our destination Consul cluster, querying the keys we instruct it to sync from the source datacenter using Consul’s built-in remote-datacenter capabilities, and writing those key-value pairs into the destination cluster’s local datacenter.
Download the latest consul-replicate release, and install it on one of the Consul nodes in Datacenter B. Your /etc/consul-replicate.hcl should look something like this:
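(A sketch only: the source datacenter name dca, the ACL token, and the prefix list are assumptions based on Jeff’s advice quoted below, and stanza syntax varies a bit between consul-replicate releases, so check the README for the version you installed.)

```hcl
# /etc/consul-replicate.hcl -- runs on a Consul node in Datacenter B

# How to reach the local (destination) Consul agent.
consul {
  address = "127.0.0.1:8500"
  token   = "<<your Consul ACL token>>"
}

# Each prefix block pulls keys from the source datacenter ("dca" here) and
# writes them into the local datacenter under the destination path.

# Vault's secrets...
prefix {
  source      = "vault/logical@dca"
  destination = "vault/logical"
}

# ...its tokens...
prefix {
  source      = "vault/sys/token@dca"
  destination = "vault/sys/token"
}

# ...and its policies.
prefix {
  source      = "vault/sys/policy@dca"
  destination = "vault/sys/policy"
}

# Replicate what you need from vault/core as well (see the note below),
# but NEVER vault/core/leader, vault/core/lock, or vault/sys/expire.
```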
The key here:
DON’T REPLICATE THE ENTIRE /vault TREE!!!
It’s not so much about what you DO replicate as it is what you DON’T replicate. Jeff helped me out tremendously here:
The values I gave above should be a good baseline. You definitely do not want /core/leader or /core/lock and if you replicate /sys/expire you’ll have multiple DCs all trying to revoke the same leases, which is a very bad idea.
From some looking at a basic layout with the file physical backend, you’d want to copy /sys/token but not /sys/expire, yes to /logical/, and yes much of /core but not /core/leader and not /core/lock.
Once you have your prefixes defined, you can start up consul-replicate: /usr/local/bin/consul-replicate -config=/etc/consul-replicate.hcl -log-level=info. Within a few seconds, all of your Vault data should be replicated up to your destination datacenter.
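If you want a quick sanity check that replication is flowing, you can list the replicated keys straight out of Consul in Datacenter B (assuming Vault’s storage path is vault/):

```sh
# on a Consul node in Datacenter B: list the replicated Vault keys
curl -s "http://127.0.0.1:8500/v1/kv/vault/?keys"

# pass your ACL token if your ACL policy requires it
curl -s -H "X-Consul-Token: <<your Consul ACL token>>" "http://127.0.0.1:8500/v1/kv/vault/?keys"
```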
While unsupported, I’ve found this setup to be fast and stable in our use case. We use Vault more as an encrypted secret keystore for passwords that applications need access to, and don’t use any of the more esoteric features such as the various auth backends. We simply use token auth and generic secret storage along with the file audit backend. Replication is near-immediate, and the performance penalty incurred by disabling the read cache has been acceptable.