Every vault command or API call mentioned below assumes you have previously exported the VAULT_ADDR and VAULT_TOKEN environment variables:

```shell
export VAULT_ADDR="..."
export VAULT_TOKEN="..."
```

or logged in via another auth method such as LDAP:

```shell
vault token lookup 1>/dev/null || vault login -method=ldap username="..."
```
| Variable | Description | Example value |
|---|---|---|
| hashicorpvault_version | pinned hashicorp vault apt package version | 1.11.2-1 |
| hashicorpvault_cluster_name | name of the cluster, must match ansible group name in case of a cluster | secret-management-staging |
| Variable | Description | Default value |
|---|---|---|
| hashicorpvault_tls_enable | Enable TLS. If enabled, certificates will be pulled from the URL specified with hashicorpvault_tls_remote_cert. | True |
| hashicorpvault_tls_remote_cert | URL to pull the certificates from | https://pub-auth-certificate.cosium.com |
| hashicorpvault_listen_address | Specifies the address to bind to for listening | 127.0.0.1 |
| hashicorpvault_backup | Enable backups. Local only if you don't define the hashicorpvault_backup_sftp dict. | True |
| hashicorpvault_backup_sftp | Define this dict to enable remote backups. Requires hashicorpvault_backup_sftp.server and hashicorpvault_backup_sftp.port. | Undefined |
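To show how these variables fit together, here is a minimal sketch of group variables for the cluster group. The file path and the concrete values are illustrative, not taken from this role:

```yaml
# file: group_vars/secret-management-staging.yml (illustrative path)
hashicorpvault_version: "1.11.2-1"                        # pin the apt package version
hashicorpvault_cluster_name: "secret-management-staging"  # must match the inventory group name
hashicorpvault_tls_enable: true
hashicorpvault_listen_address: "0.0.0.0"                  # illustrative: bind beyond localhost for a cluster
hashicorpvault_backup: true
```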
This is the simplest case. Launch this role, then initialize and unseal Vault, and you are good to go.
Note that the only way to guarantee consistent snapshots is to use `raft snapshot`; a backup solution will be implemented in a future PR.
hashicorpvault_tls_remote_cert

```yaml
- name: "secret-management-staging"
  raw_config: |
    option httpchk GET /v1/sys/health
    http-check expect status 200
    default-server check check-ssl verify none
  server:
    - name: "secret-management-staging-1"
      fqdn: "secret-management-staging-1.cosium.com"
      port: "8200"
    - name: "secret-management-staging-2"
      fqdn: "secret-management-staging-2.cosium.com"
      port: "8200"
    - name: "secret-management-staging-3"
      fqdn: "secret-management-staging-3.cosium.com"
      port: "8200"
```
An ansible inventory group named after hashicorpvault_cluster_name, with all nodes defined:

```ini
[secret-management-staging]
secret-management-staging-1 ansible_host=secret-management-staging-1.cosium.com
secret-management-staging-2 ansible_host=secret-management-staging-2.cosium.com
secret-management-staging-3 ansible_host=secret-management-staging-3.cosium.com
```
Initialize Vault with vault operator init and unseal it with vault operator unseal. The unseal keys are valid for the whole cluster. The node will be the leader.

```shell
root@secret-management-staging-1:~ # vault operator init
...
root@secret-management-staging-1:~ # vault operator unseal
Unseal Key (will be hidden):
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    1/3
Unseal Nonce       ba09a8d2-e8cc-dbc7-05b9-a3f802cc68b2
Version            1.6.3
Storage Type       raft
HA Enabled         true
...
```
```shell
root@secret-management-staging-1:~ # vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.3
Storage Type            raft
Cluster Name            secret-management-staging
Cluster ID              82c02125-57fb-91cf-cb41-c4627806d04b
HA Enabled              true
HA Cluster              https://10.12.1.8:8201
HA Mode                 standby
Active Node Address     https://10.12.1.8:8200
Raft Committed Index    7966
Raft Applied Index      7966
```
```shell
root@secret-management-staging-2:~ # vault operator unseal
Unseal Key (will be hidden):
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    1/3
Unseal Nonce       88a6750c-9670-ab4e-9a33-9cebafd5a8f5
Version            1.6.3
Storage Type       raft
HA Enabled         true
...
```
```shell
root@secret-management-staging-2:~ # vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.3
Storage Type            raft
Cluster Name            secret-management-staging
Cluster ID              82c02125-57fb-91cf-cb41-c4627806d04b
HA Enabled              true
HA Cluster              https://10.12.1.8:8201
HA Mode                 standby
Active Node Address     https://10.12.1.8:8200
Raft Committed Index    7972
Raft Applied Index      7972
```
```shell
root@secret-management-staging-1:~ # vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.3
Storage Type            raft
Cluster Name            secret-management-staging
Cluster ID              82c02125-57fb-91cf-cb41-c4627806d04b
HA Enabled              true
HA Cluster              https://10.12.1.8:8201
HA Mode                 active
Raft Committed Index    7972
Raft Applied Index      7972
root@secret-management-staging-1:~ # vault operator raft list-peers
Node                           Address            State       Voter
----                           -------            -----       -----
secret-management-staging-1    10.12.1.8:8201     leader      true
secret-management-staging-3    10.12.1.10:8201    follower    true
secret-management-staging-2    10.12.1.9:8201     follower    true
```
In this example, a raft election occurred once the second node was unsealed, so it's just a matter of luck that the first node is the leader; you could have this instead:
```shell
root@secret-management-staging-1:~ # vault operator raft list-peers
Node                           Address            State       Voter
----                           -------            -----       -----
secret-management-staging-1    10.12.1.8:8201     follower    true
secret-management-staging-3    10.12.1.10:8201    follower    true
secret-management-staging-2    10.12.1.9:8201     leader      true
```
For a complete view of the cluster status, use the API (the endpoint was introduced in version 1.10, which was not installed at the time of testing, so the output below is a sample from the documentation):
```shell
root@secret-management-staging-1:~ # curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request GET $VAULT_ADDR/v1/sys/ha-status
{
  "Nodes": [
    {
      "hostname": "node1",
      "api_address": "http://10.0.0.2:8200",
      "cluster_address": "https://10.0.0.2:8201",
      "active_node": true,
      "last_echo": null
    },
    {
      "hostname": "node2",
      "api_address": "http://10.0.0.3:8200",
      "cluster_address": "https://10.0.0.3:8201",
      "active_node": false,
      "last_echo": "2021-11-29T10:29:09.202235-05:00"
    },
    {
      "hostname": "node3",
      "api_address": "http://10.0.0.4:8200",
      "cluster_address": "https://10.0.0.4:8201",
      "active_node": false,
      "last_echo": "2021-11-29T10:29:07.402548-05:00"
    }
  ]
}
```
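To pick the active node out of such a payload programmatically, a jq filter works (jq is already used elsewhere in this document); a sketch against an inlined sample payload:

```shell
# Select the active node's hostname from a sys/ha-status payload (sample data inlined);
# against a live cluster you would pipe the curl output shown above into the same filter.
HA_STATUS='{"Nodes":[{"hostname":"node1","active_node":true},{"hostname":"node2","active_node":false}]}'
echo "$HA_STATUS" | jq -r '.Nodes[] | select(.active_node) | .hostname'   # prints node1
```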
The first step is to increment the hashicorpvault_version variable to the desired version for the group, then:
Run the role on the standby nodes (using ansible's limit option: -l, --limit) and check that vault status shows the correct Version and that HA Mode is standby. At this point all standby nodes will be updated and ready to take over. The update will not be complete until one of the updated standby nodes takes over active duty.
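One standard way to hand over active duty is the Vault CLI's step-down command, run on (or pointed at) the current active node; a sketch, assuming your token has the required privileges:

```shell
# Ask the active node to step down; one of the unsealed, updated standbys
# wins the next raft election and becomes active.
vault operator step-down
```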
Then run the role on the remaining node (again with -l, --limit) and check that vault status shows the correct Version and that HA Mode is standby. Internal update tasks will happen after one of the updated standby nodes takes over active duty.
The leader's raft storage is the source of truth for the cluster, so you must always snapshot the leader's storage.
```shell
root@secret-management-staging-1:~ # curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request GET $VAULT_ADDR/v1/sys/leader | jq .is_self
true
```
or with:

```shell
root@secret-management-staging-1:~ # vault operator raft list-peers
Node                           Address            State       Voter
----                           -------            -----       -----
secret-management-staging-1    10.12.1.8:8201     leader      true
secret-management-staging-3    10.12.1.10:8201    follower    true
secret-management-staging-2    10.12.1.9:8201     follower    true
```

Then take the snapshot on the leader:

```shell
root@secret-management-staging-1:~ # vault operator raft snapshot save /tmp/test.snap
```
or from anywhere using the API, making sure to query the leader:
```shell
root@secret-management-staging-2:~ # export LEADER_ADDR="https://secret-management-staging-1.cosium.com:8200"
root@secret-management-staging-2:~ # curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request GET $LEADER_ADDR/v1/sys/storage/raft/snapshot > test.snap
```
It is pointless to compress the snapshot (with e.g. zstd) as the data is encrypted.
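This is easy to convince yourself of: encrypted data is indistinguishable from random bytes, and high-entropy data does not compress. A quick local demonstration, using /dev/urandom as a stand-in for snapshot contents:

```shell
# gzip cannot shrink high-entropy data: the .gz output ends up at least as
# large as the input (plus container overhead).
head -c 1048576 /dev/urandom > sample.bin
gzip -kf sample.bin
wc -c sample.bin sample.bin.gz
```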
Copy your vault raft snapshot file onto the leader node and run the below command, replacing the filename with that of your snapshot file.
```shell
vault operator raft snapshot restore test.snap
```
or from anywhere with:

```shell
export LEADER_ADDR="https://secret-management-staging-1.cosium.com:8200"
curl -s --header "X-Vault-Token: $VAULT_TOKEN" --data-binary @test.snap --request POST $LEADER_ADDR/v1/sys/storage/raft/snapshot
```
This procedure assumes that keyholders with access to the unseal keys for each cluster are available, and that you have access to tokens with sufficient privileges for the origin cluster.
Stop the vault service with:

```shell
systemctl stop vault.service
```
The -force option is required here since the keys will not be consistent with the snapshot data, as you will be restoring a snapshot from a different cluster:

```shell
vault operator raft snapshot restore -force test.snap
```

or with:

```shell
curl -s --header "X-Vault-Token: $VAULT_TOKEN" --data-binary @test.snap --request POST $VAULT_ADDR/v1/sys/storage/raft/snapshot-force
```
This procedure assumes that keyholders with access to the unseal keys for each cluster are available, and that you have access to tokens with sufficient privileges for both clusters. It is useful when bringing a staging cluster up with data from a prod cluster, for example to test an upgrade.
Wipe the raft storage at /opt/vault/data/raft with:

```shell
systemctl stop vault.service
rm -rf /opt/vault/data/raft/*
```
The -force option is required here since the keys will not be consistent with the snapshot data, as you will be restoring a snapshot from a different cluster:

```shell
vault operator raft snapshot restore -force test.snap
```

or with:

```shell
curl -s --header "X-Vault-Token: $VAULT_TOKEN" --data-binary @test.snap --request POST $VAULT_ADDR/v1/sys/storage/raft/snapshot-force
```
This role will enable automated backups of the raft storage if hashicorpvault_backup is set to true.
For automated backups to be effective, some manual steps are necessary:
```shell
vault policy write snapshot snapshot_policy.hcl
```

with snapshot_policy.hcl being:

```hcl
# file: snapshot_policy.hcl
path "/sys/storage/raft/snapshot"
{
  capabilities = ["read"]
}
```
```shell
# Enable the AppRole auth method and create a role bound to the snapshot policy
vault auth enable approle
vault write auth/approle/role/snapshot token_policies="snapshot"
# Retrieve the role-id and generate a secret-id for it
vault read auth/approle/role/snapshot/role-id
vault write -f auth/approle/role/snapshot/secret-id
```
Store the resulting role-id and secret-id in /root/.bash_profile as VAULT_ROLE_ID and VAULT_SECRET_ID on each node:

```shell
# file: /root/.bash_profile
export VAULT_ROLE_ID="..."
export VAULT_SECRET_ID="..."
```
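For reference, these credentials let a backup job trade the role-id/secret-id pair for a short-lived token via the standard AppRole login endpoint; a sketch (how this role's backup script actually consumes the variables may differ):

```shell
# Exchange AppRole credentials for a Vault token using the standard
# auth/approle/login endpoint, then export it for subsequent vault commands.
VAULT_TOKEN=$(vault write -field=token auth/approle/login \
  role_id="$VAULT_ROLE_ID" secret_id="$VAULT_SECRET_ID")
export VAULT_TOKEN
```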
To learn more about the AppRole auth method and why it was chosen, see the Vault docs.
By default, Hashicorp Vault does not enable logging. It can only be enabled via CLI or API once Vault is started and unsealed. Execute the following command to enable logging on the leader node:
```shell
vault audit enable syslog tag="vault" local="true"
```
Explanation:
- syslog: audit logs go to the system log, typically /var/log/auth.log
- tag="vault" enables easier parsing with e.g. elasticsearch
- local="true" means only the leader node will log requests, instead of replicating logs across the cluster. This avoids duplicates. If a raft election occurs, the new leader node will start logging.

Just in case, here are some useful links in case of cluster outage (lost quorum...):