February 5, 2019

Credential Generation and User-centric Policy with Open Source Software

On my plate I have two immediate, core goals:

  • Easy credential generation
  • Easy user management

Looking at what exists in the popular open source categories, I can see two core projects: Cloud Foundry's Credhub and Hashicorp's Vault.

One of my favourite things about Cloud Foundry's Credhub project is its built-in credential generation. It's a brilliant feature, as it can generate passwords, SSH keys, and certificates. However, it has some flaws: lack of discoverability, a deployment story that more or less requires BOSH, lack of sane user management, and the fact that no one outside of the Cloud Foundry ecosystem really knows about it. What does everyone generally think of when it comes to open source credential management? Vault.

Vault, like Credhub, is a great piece of credential management software, especially for open source projects. The problem is that Vault also has some flaws: operator deployment is painful, credential generation is limited, and the command line interface isn't great since it feels very in-flux. But Vault has a really strong policy management engine, which can make user management pretty simple. So if you look at these pieces of software, there's some overlap.

Okay, so we can see the overlap, but what's the difference? Here's how the Credhub team sees the differences:

The goal of the CredHub project is to provide an extendable and well integrated open source credential management solution for Cloud Foundry use-cases. We felt that we could cover these use-cases better by implementing a new product as opposed to wrapping Vault. Beyond UX and dependency concerns, some of the features that CF users want, e.g. HSM support, are only available in the commercial distribution of Vault.

Well, alright. The Cloud Foundry use cases are pretty straightforward: credential generation, HSMs, and BOSH-deployable. All of those are commendable, but how do you add users into the mix? There's a Credhub CLI tool, but let's be honest, while CLIs have value, a standard enterprise developer may not get terribly excited about it. Vault has a Web UI, and while confusing, it's better than nothing. Okay, now how do you begin to manage credential access? Vault has policies to manage access, but it can't really generate credentials in the most helpful way, and while Credhub is awesome at generation, it fails miserably at the core user experience.

Okay, so we know Credhub gives us easy credential generation, and Vault gives us sane user policy management, so how can I use them together? I'm willing to bet money I made security experts cry, but I decided to use Credhub as the source of generational truth and manage user access to what it creates with Vault.

I'm like 95% confident I made the right decision with that workflow, because I'm not really losing anything; I'm just gaining policy management on top of Credhub. I decided, for some god-awful reason, to write it in Python, because let's face it, Python is the infrastructure scripting language and it has its merits.

So, what does it look like? It's pretty straightforward.

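import json
import logging
import os
import shlex
import subprocess
import sys

import hvac


# Assumed definition; the post does not show how SyncerFailure is declared.
class SyncerFailure(Exception):
    """Raised when a sync step fails."""


class Syncer(object):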

    def __init__(self, vault_mount_point="", **kwargs):
        """
        Base module for syncing between Cloud Foundry's Credhub and Hashicorp's Vault.
        :param str vault_mount_point: Base path for Vault's sync'd credentials. Defaults to 'bosh'.
        :param kwargs: Optional settings; 'logging_level' overrides the default INFO level.
        """
        self.creds_to_download = []
        self.creds = []
        self.hvac_client = hvac.Client
        if vault_mount_point != "":
            self.vault_mount_point = vault_mount_point
        else:
            self.vault_mount_point = "bosh"

        self.logger = logging.getLogger(__name__)
        console_handler = logging.StreamHandler(sys.stdout)
        self.logger.addHandler(console_handler)
        self.logger.setLevel(logging.INFO)
        # kwargs is always a dict, so use .get() to avoid a KeyError
        # when 'logging_level' was not passed.
        if kwargs.get("logging_level"):
            self.logger.setLevel(kwargs["logging_level"])

        self._check_env()
        self._login()

Here I'm really just setting up the baseline class, mostly so I can do a bunch of self-referencing. I know the OO guys are cringing right now, but if you give me mutability, I'm gonna use it. This class definition really isn't anything crazy: I'm just setting up some variables and a placeholder for the HVAC client, then running self._check_env() to check the environment variables and self._login() to log in to Credhub and Vault.

    def _check_env(self):
        """
        Verify the needed environment variables exist.
        :return:
        """
        target_vars = ["CREDHUB_CERTIFICATE", "BOSH_CA", "CREDHUB_CLIENT", "CREDHUB_SECRET",
                       "CREDHUB_SERVER", "VAULT_ADDR", "VAULT_CLIENT_ID", "VAULT_CLIENT_SECRET"]
        for v in target_vars:
            try:
                _ = os.environ[v]  # just test for existence.
                self.logger.debug("found {}".format(v))
            except Exception:
                self.logger.exception("missing env var: {0}".format(v))
                raise SyncerFailure("missing env var: {0}".format(v))
        credhub_cert = os.environ["CREDHUB_CERTIFICATE"]
        bosh_ca = os.environ["BOSH_CA"]
        os.environ["CREDHUB_CA_CERT"] = credhub_cert + "\n" + bosh_ca

    def _login(self):
        """
        Log into both Credhub and Vault.
        :return:
        """
        login_command = shlex.split("credhub login")
        self.logger.debug("running: {}".format(login_command))
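        # Capture stderr so the failure message below has something to report.
        login_result = subprocess.run(login_command, stderr=subprocess.PIPE)
        if login_result.returncode != 0:
            # logger.error rather than logger.exception: no exception is active here.
            self.logger.error("non-zero return code on credhub login: {}".format(login_result.stderr))
            raise SyncerFailure("non-zero return code on credhub login: {}".format(login_result.stderr))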
        self.hvac_client = hvac.Client(url=os.environ["VAULT_ADDR"])
        self.hvac_client.auth_approle(role_id=os.environ["VAULT_CLIENT_ID"],
                                      secret_id=os.environ["VAULT_CLIENT_SECRET"], mount_point="clients")

Looking at self._check_env(), it's not terribly complicated: it just checks for the existence of our target environment variables, and then combines the CREDHUB_CERTIFICATE and BOSH_CA variables into a single CREDHUB_CA_CERT variable for the Credhub CLI. I will admit that I am wrapping the Credhub CLI. I can dive into this in a later post, but it was the easiest MVP. self._login() really just calls credhub login and tells HVAC to log into Vault.
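
To make the check-and-combine step concrete, here is a minimal standalone sketch that works on a plain dict instead of os.environ, so it runs without any real credentials. The helper names missing_vars and build_ca_bundle and the sample values are hypothetical and not part of the project.

```python
REQUIRED_VARS = [
    "CREDHUB_CERTIFICATE", "BOSH_CA", "CREDHUB_CLIENT", "CREDHUB_SECRET",
    "CREDHUB_SERVER", "VAULT_ADDR", "VAULT_CLIENT_ID", "VAULT_CLIENT_SECRET",
]


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are absent from env."""
    return [name for name in required if name not in env]


def build_ca_bundle(env):
    """Join the Credhub certificate and the BOSH CA into one bundle string."""
    return env["CREDHUB_CERTIFICATE"] + "\n" + env["BOSH_CA"]


# Hypothetical sample environment; real values would come from os.environ.
sample_env = {name: "placeholder" for name in REQUIRED_VARS}
sample_env["CREDHUB_CERTIFICATE"] = "CREDHUB-PEM"
sample_env["BOSH_CA"] = "BOSH-PEM"
del sample_env["VAULT_ADDR"]  # simulate one missing variable

print(missing_vars(sample_env))
print(build_ca_bundle(sample_env))
```

Collecting every missing name before failing is a small departure from the method above, which raises on the first one; it just makes the error message more useful when several variables are unset.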

    def sync(self):
        """
        Syncs all credentials from Credhub and sends them to Vault.
        :return:
        """
        list_command = shlex.split("credhub find --output-json")
        self.logger.debug("running: {}".format(list_command))
        raw_creds_list = subprocess.run(list_command, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        if raw_creds_list.returncode != 0:
            self.logger.error("non-zero return code when listing credentials: {0}".format(raw_creds_list.stderr))
            raise SyncerFailure("non-zero return code when listing credentials: {0}".format(raw_creds_list.stderr))
        creds_to_load = json.loads(raw_creds_list.stdout)
        for cred in creds_to_load["credentials"]:
            self._fetch(cred["name"])

This is the core user-facing method we care about. Syncer.sync() is what begins to get things done. To explain simply, here's what it does:

  1. Run credhub find --output-json. Credhub doesn't have a list subcommand, but a blank find command works; we're passing the --output-json flag because we need to parse the output into something common and YAML fucking sucks.
  2. Deserialise the JSON response from Credhub.
  3. For each credential, call self._fetch().
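
The parsing half of those steps can be sketched without touching the CLI at all. The payload below is made up for illustration; the only things it shares with the real output are the two keys the sync loop above relies on, "credentials" and "name". The helper name credential_names is hypothetical.

```python
import json

# Illustrative payload, not captured from a real Credhub server.
sample_output = """
{
  "credentials": [
    {"name": "/bosh/concourse/worker_key"},
    {"name": "/bosh/minio/minio_secretkey"}
  ]
}
"""


def credential_names(raw_json):
    """Parse the CLI's JSON output and return the credential names in order."""
    parsed = json.loads(raw_json)
    return [cred["name"] for cred in parsed["credentials"]]


names = credential_names(sample_output)
for name in names:
    print(name)
```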

    def _fetch(self, c: str):
        """
        Retrieves a credential from Credhub.
        :param str c: Credential name. Ex: /bosh/concourse/worker_key
        :return:
        """
        get_command = shlex.split("credhub get -n {} --output-json".format(c))
        self.logger.debug("running: {}".format(get_command))
        local_credential = subprocess.run(get_command, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        if local_credential.returncode != 0:
            self.logger.error("non-zero return code for getting {0}: {1}".format(c, local_credential.stderr))
            raise SyncerFailure(
                "non-zero return code for getting {0}: {1}".format(c, local_credential.stderr))
        self._sync(json.loads(local_credential.stdout))

Here we're really just fetching a given credential from Credhub. Once we have the credential, we move on to the part we really care about: self._sync().
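
A side note on how the get command is built: the name is formatted into a string and then split with shlex.split, so a name containing whitespace would be broken into several arguments. I don't know whether Credhub even accepts such names, so treat this as a defensive sketch rather than a fix for a known problem. The function names and the spaced example name are hypothetical.

```python
import shlex


def get_command_via_split(name):
    """Build the command as above: format the name in, then shlex.split."""
    return shlex.split("credhub get -n {} --output-json".format(name))


def get_command_via_list(name):
    """Build the argument list directly so the name stays one argument."""
    return ["credhub", "get", "-n", name, "--output-json"]


plain = "/bosh/concourse/worker_key"
spaced = "/bosh/my app/key"  # hypothetical name containing a space

# Both approaches agree for an ordinary name.
print(get_command_via_split(plain) == get_command_via_list(plain))
# They differ once the name contains whitespace.
print(get_command_via_split(spaced))
print(get_command_via_list(spaced))
```

Since subprocess.run accepts a list of arguments, the list form can be passed to it unchanged.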

    def _sync(self, cred: dict):
        """
        Internal syncing mechanism.
        :param dict cred: Credhub credential in JSON/Dict form.
        :return:
        """
        c = self._fix_cred_string(cred["name"])
        if cred["type"] == "value":
            self.logger.debug("synchronising value for {0}".format(c))
            self.hvac_client.secrets.kv.v2.create_or_update_secret(
                path=c,
                secret=dict(value=cred["value"]),
                mount_point=self.vault_mount_point
            )
        if cred["type"] == "password":
            self.logger.debug("synchronising password for {0}".format(c))
            self.hvac_client.secrets.kv.v2.create_or_update_secret(
                path=c,
                secret=dict(password=cred["value"]),
                mount_point=self.vault_mount_point
            )
        if cred["type"] == "certificate":
            self.logger.debug("synchronising certificate for {0}".format(c))
            self.hvac_client.secrets.kv.v2.create_or_update_secret(
                path=c,
                secret=dict(
                    ca=cred["value"]["ca"],
                    certificate=cred["value"]["certificate"],
                    private_key=cred["value"]["private_key"]
                ),
                mount_point=self.vault_mount_point
            )
        if cred["type"] == "ssh":
            self.logger.debug("synchronising ssh key pair for {0}".format(c))
            self.hvac_client.secrets.kv.v2.create_or_update_secret(
                path=c,
                secret=dict(
                    private_key=cred["value"]["private_key"],
                    public_key=cred["value"]["public_key"],
                    public_key_fingerprint=cred["value"]["public_key_fingerprint"]
                ),
                mount_point=self.vault_mount_point
            )
        if cred["type"] == "rsa":
            self.logger.debug("synchronising rsa key pair for {0}".format(c))
            self.hvac_client.secrets.kv.v2.create_or_update_secret(
                path=c,
                secret=dict(
                    private_key=cred["value"]["private_key"],
                    public_key=cred["value"]["public_key"]
                ),
                mount_point=self.vault_mount_point
            )

And here is the crux: parsing and sending the credential from Credhub to Vault. There are a few design decisions I've made that I think I'm okay with:

  • I'm using the Key/Value secrets engine with Vault, therefore SSH keys and TLS certificates are kinda a bust in Vault-land. Maybe I'll fix this in v2.
  • In Credhub, a password and value are essentially the same thing with different labels, so I have to figure out how to represent that in Vault. I am getting by but it's not great.
  • I will support multiple versions. Both Credhub and Vault support credential versioning, so why not?

When looking at the core workflow, it's pretty easy: if a credential is of type T, store it in X way. For SSH keys and TLS certificates, I just store the values from Credhub as properties in the secret key. There's likely a better way to do it; I just haven't figured that out yet.
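
The "type T, store it in X way" rule can also be written as a lookup table. This is only a sketch of an alternative layout, not how the project is written: the field lists mirror the branches in _sync(), while the names STRUCTURED_FIELDS, SCALAR_TYPES, and to_vault_secret are hypothetical. Unlike _sync(), which silently skips types it doesn't know, this version raises, which is a deliberate change for the sake of the example.

```python
# Fields copied out of cred["value"] for each structured Credhub type.
STRUCTURED_FIELDS = {
    "certificate": ("ca", "certificate", "private_key"),
    "ssh": ("private_key", "public_key", "public_key_fingerprint"),
    "rsa": ("private_key", "public_key"),
}

# Scalar types are stored under a single key named after the type.
SCALAR_TYPES = ("value", "password")


def to_vault_secret(cred):
    """Return the dict that would be stored in Vault for a Credhub credential."""
    cred_type = cred["type"]
    if cred_type in SCALAR_TYPES:
        return {cred_type: cred["value"]}
    if cred_type in STRUCTURED_FIELDS:
        return {field: cred["value"][field] for field in STRUCTURED_FIELDS[cred_type]}
    raise ValueError("unsupported credential type: {}".format(cred_type))


# Made-up credentials for illustration.
print(to_vault_secret({"type": "password", "value": "not-a-real-password"}))
print(to_vault_secret({
    "type": "rsa",
    "value": {"private_key": "PRIVATE", "public_key": "PUBLIC"},
}))
```

The resulting dict is what would be handed to create_or_update_secret as the secret argument.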

The good news is now I can very easily dish out policies to the folks that need them! For example, I want to share my Minio credentials with some folks, but I don't want to give them internal access to Credhub and its weak user management. Now I can create a read-only policy in my public Vault, so it's easy for them to get access to the system. What does that look like, you ask?

# can read s3/minio credentials.
path "bosh/data/minio/*" {
  capabilities = ["read"]
}

path "bosh/metadata/minio/" {
  capabilities = ["list"]
}

I have my Key/Value v2 secrets engine for syncing secrets mounted at bosh so that it matches how Credhub handles pathing with deployments, hence the similarities. This policy, which I've called s3-access, lets me share out these credentials when needed, like so: vault kv get bosh/minio/minio_secretkey, or through the Vault Web UI if a user prefers that. The Web UI does require extra permissions (such as list), so do be aware of that.
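
If you're wondering why the policy paths contain data/ and metadata/ segments that don't show up in the vault kv get command: the Key/Value v2 engine keeps secret contents under <mount>/data/<path> and version metadata (which listing uses) under <mount>/metadata/<path>, and the CLI adds the segment for you. A tiny sketch of that mapping, where the helper name kv2_api_paths is hypothetical:

```python
def kv2_api_paths(mount, secret_path):
    """Return the data and metadata API paths for a KV v2 secret."""
    mount = mount.strip("/")
    secret_path = secret_path.strip("/")
    return {
        "data": "{}/data/{}".format(mount, secret_path),
        "metadata": "{}/metadata/{}".format(mount, secret_path),
    }


paths = kv2_api_paths("bosh", "minio/minio_secretkey")
print(paths["data"])      # bosh/data/minio/minio_secretkey
print(paths["metadata"])  # bosh/metadata/minio/minio_secretkey
```

Policies are matched against these API paths, not the path typed into the CLI, which is why the read rule sits on bosh/data/minio/* and the list rule on bosh/metadata/minio/.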

So, in the end, here's what I can do with this system:

  • Easily generate credentials with Credhub.
  • Sync those credentials to Vault.
  • Create and assign access policies to users that need them.
  • Not give users access to the core infrastructure just for credentials.

Show Me the Code

Feel free to send me feedback on Twitter. :)