Puppet Hiera
Organization
Every variable that puppet looks up in hiera is searched via one or more backends (at the moment, two YAML-based ones), according to what we configured in the hiera.yaml file in the base puppet directory.
Depending on the kind of search (string, array or hash), hiera will search hierarchically within the sources and either stop at the first occurrence of the variable, or (in case of the hash and array searches) try to merge results from the different hierarchy levels.
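The difference between the three kinds of search can be sketched in Python. This is an illustrative model, not hiera's implementation: the levels list is hypothetical data standing in for hierarchy files ordered from highest to lowest priority, and the hash case assumes a shallow merge in which the higher-priority level wins on conflicting keys.

```python
# Illustrative model only: each dict stands in for one hierarchy level,
# ordered from highest to lowest priority (hypothetical data).
levels = [
    {"ntp::servers": ["a.example"], "limits": {"nofile": 8192}},
    {"ntp::servers": ["b.example", "a.example"],
     "limits": {"nofile": 1024, "nproc": 512}},
]


def priority_lookup(key, levels):
    """Return the value from the first level that defines the key."""
    for data in levels:
        if key in data:
            return data[key]
    return None


def array_lookup(key, levels):
    """Collect values from every level into one flat list without duplicates."""
    merged = []
    for data in levels:
        if key not in data:
            continue
        value = data[key]
        for item in (value if isinstance(value, list) else [value]):
            if item not in merged:
                merged.append(item)
    return merged


def hash_lookup(key, levels):
    """Shallow-merge dicts from every level; higher-priority levels win."""
    merged = {}
    for data in reversed(levels):  # apply the lowest priority first
        merged.update(data.get(key, {}))
    return merged
```

With this data, the string-style search stops at the first level, while the array and hash searches combine both levels.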
Our hiera config file can be seen here.
The lookup hierarchy
This backend is searched first. This hierarchy is organized as follows:
- hieradata/hosts/${::hostname}.yaml. Here you should basically only include host-specific overrides. It is useful for things like testing new features on a single host of a cluster.
- hieradata/regex.yaml. If one of the entries in this file matches the $::fqdn of the host, hiera will look up keys here. For the format, see below. This is useful if you need to set up a key for multiple hosts. Using this for cluster-wide configurations is deprecated; you should use the role keyword and the role backend instead.
- hieradata/${::site}/$classpath.yaml. If you need to configure something differently per site (so, eqiad or codfw) globally, it should go here, but only for very general base classes. $classpath is computed as in the puppet autoload mechanism, so foo::params::param would be searched inside hieradata/${::site}/foo/params.yaml as param.
- private/hieradata/${::site}/$classpath.yaml, where $classpath is evaluated as explained above. Since this data is in the private repository, this is the perfect place to store passwords (without the need to define private classes at all; this should allow us to wipe most of the labs/private repo out in the end). This is the site-specific version.
- hieradata/common/$classpath.yaml where $classpath is again computed as explained above. This is useful for global configurations that may be overridden at higher levels.
- private/hieradata/common/$classpath.yaml
- hieradata/role/${::site}/$rolepath.yaml, where $rolepath is computed like $classpath above, but based on each of the roles declared in site.pp for the current node. For example, if we declared
node 'foobar.eqiad.wmnet' {
    role mediawiki::appserver, monitoring::aggregator
}
the key will be searched in hieradata/role/eqiad/mediawiki/appserver.yaml and hieradata/role/eqiad/monitoring/aggregator.yaml. If the key is found in either of the two, or in both (and it's equal), it will be used. More on this below.
- hieradata/role/common/$rolepath.yaml, where $rolepath is computed as above.
- private/hieradata/role/${::site}/$rolepath.yaml and private/hieradata/role/common/$rolepath.yaml, which are in the private repository and work exactly as the two above.
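The $classpath computation used throughout this list can be sketched in Python. The helpers below are hypothetical and only mirror the rule as described here: everything before the last '::' of the key names the class, and the class name becomes a relative path. Keys without a '::' are not covered by this sketch.

```python
def classpath(key):
    """Map a lookup key such as 'foo::params::param' to its $classpath.

    Hypothetical helper: the part before the last '::' is the class name,
    and each remaining '::' becomes a directory separator.
    """
    class_name, _, _param = key.rpartition("::")
    return class_name.replace("::", "/")


def site_file(site, key):
    """Relative path of the per-site file searched for the given key."""
    return "hieradata/{}/{}.yaml".format(site, classpath(key))
```

For example, site_file("eqiad", "foo::params::param") gives hieradata/eqiad/foo/params.yaml.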
Role-based lookup
We have created a new parser function/keyword, called role, that can be used only at the node scope to include role classes. It also activates role-based hiera lookup. So for example, when you do
node 'foobar.eqiad.wmnet' {
    role mediawiki::appserver, monitoring::aggregator
}
you are doing the following:
- including role::mediawiki::appserver and role::monitoring::aggregator
- making hiera look up the hieradata/role/eqiad/mediawiki/appserver.yaml and hieradata/role/eqiad/monitoring/aggregator.yaml files, and so on
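The two effects of the role keyword can be modelled in a few lines of Python. This is a hypothetical sketch, not the parser function itself: expand_roles is an invented name, and the file paths assume the hieradata/role/ layout used in the practical example further down.

```python
def expand_roles(site, roles):
    """Return the classes included and the role files consulted for a node.

    Hypothetical model: `roles` holds the names passed to the role keyword.
    """
    classes = ["role::" + role for role in roles]
    files = []
    for scope in (site, "common"):  # site-specific files come first
        for role in roles:
            files.append(
                "hieradata/role/{}/{}.yaml".format(scope, role.replace("::", "/"))
            )
    return classes, files
```

Calling expand_roles("eqiad", ["mediawiki::appserver", "monitoring::aggregator"]) yields the two role classes and four role files, the eqiad ones ahead of the common ones.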
Limitations
This system, which is basically abusing puppet internals, comes with its fair share of limitations, namely:
- Only one role keyword is allowed per node, so while this is good:
node 'foobar.eqiad.wmnet' {
    role mediawiki::appserver, monitoring::aggregator
}
this will raise an error:
node 'foobar.eqiad.wmnet' {
    role mediawiki::appserver
    role monitoring::aggregator
}
- Any hiera lookup that happens before the role keyword is declared will not use the role lookups. For example, any lookup in the top scope will not work. Likewise, the following
node 'foobar.eqiad.wmnet' {
    role mediawiki::appserver
    include admin
}
will look up the admin class parameters in the roles, but
node 'foobar.eqiad.wmnet' {
    include admin
    role mediawiki::appserver
}
will not do that.
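Both limitations come down to the role keyword setting state that later lookups depend on. A toy Python model makes the ordering visible; NodeScope and its methods are hypothetical names invented for this sketch and do not correspond to puppet internals.

```python
class NodeScope:
    """Toy model of a node block (hypothetical, for illustration only)."""

    def __init__(self):
        self.roles = None   # set once the role keyword has been evaluated
        self.lookups = []   # (class name, whether role files were consulted)

    def role(self, *names):
        # Only one role keyword is allowed per node.
        if self.roles is not None:
            raise RuntimeError("role keyword used more than once in a node")
        self.roles = list(names)

    def include(self, class_name):
        # Role-based lookup is only active after role() has run.
        self.lookups.append((class_name, self.roles is not None))
```

In this model, calling role() and then include("admin") records a lookup that consults the role files, whereas the reverse order records one that does not, and a second role() call raises an error.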
Regex matching
Since we have large clusters of almost-identical servers, instead of having to write out hiera data for each server we added a special file called regex.yaml where variables matching a whole cluster can be assigned; the format of the file is:
LABEL:
  __regex: !ruby/regexp PATTERN
  var1: value1
  var2: value2
where LABEL is a unique identifier of the cluster and PATTERN is a Ruby regular expression that will be matched against the $::fqdn puppet fact. Continuing with the preceding example, we have in regex.yaml
appservers:
  __regex: !ruby/regexp /^mw1[0-2][0-9]{2}\.eqiad\.wmnet$/
  cluster: hhvm_appservers
to ensure the 'cluster' variable is defined consistently. As of now, the regex.yaml file should be used only rarely, and the role backend should be used instead.
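To check which hosts a pattern covers before committing it, the matching can be approximated in Python. This is a sketch under assumptions: Python's re module stands in for Ruby's regex engine (the two agree on this pattern for single-line hostnames), the data is transcribed by hand from the example, and returning the first matching entry is a guess about how overlapping patterns would be handled.

```python
import re

# The appservers entry from the example above, transcribed by hand.
REGEX_DATA = {
    "appservers": {
        "__regex": re.compile(r"^mw1[0-2][0-9]{2}\.eqiad\.wmnet$"),
        "cluster": "hhvm_appservers",
    },
}


def regex_lookup(fqdn, key, data=REGEX_DATA):
    """Return the value of key from the first entry whose pattern matches fqdn.

    Assumption: the first matching entry wins; the real backend may differ.
    """
    for entry in data.values():
        if entry["__regex"].search(fqdn) and key in entry:
            return entry[key]
    return None
```

Here mw1017.eqiad.wmnet resolves cluster to hhvm_appservers, while mw1300.eqiad.wmnet and mw1017.codfw.wmnet match nothing.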
Practical example
Say we are searching for the value of admin::always_groups for the node mw1017.eqiad.wmnet, which is defined as follows in site.pp:
node /^mw1(01[7-9]|0[2-9][0-9]|10[0-9]|11[0-3])\.eqiad\.wmnet$/ {
    role mediawiki::appserver
    include ::admin
}
This will search for the value in the following files, in order:
- hieradata/hosts/mw1017.yaml ($::hostname)
- hieradata/regex.yaml (where the $::fqdn will be matched against the regexes, see above)
- hieradata/eqiad/admin.yaml ($::site)
- private/hieradata/eqiad/admin.yaml ($::site)
- hieradata/common/admin.yaml
- private/hieradata/common/admin.yaml
- hieradata/role/eqiad/mediawiki/appserver.yaml
- hieradata/role/common/mediawiki/appserver.yaml
- private/hieradata/role/eqiad/mediawiki/appserver.yaml
- private/hieradata/role/common/mediawiki/appserver.yaml
If it doesn't find any value in any of those files, puppet will use the class default value for that variable.
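The search order listed above can be reproduced with a short Python function, which is handy for checking where a key for another host or class would be looked up. search_order is a hypothetical helper written for this page; it only builds the list of paths and does not read any file.

```python
def search_order(hostname, site, class_name, roles):
    """List the files consulted for a class parameter, highest priority first."""
    class_path = class_name.replace("::", "/")
    files = [
        "hieradata/hosts/{}.yaml".format(hostname),
        "hieradata/regex.yaml",
    ]
    # Per-class files: site-specific before common, public before private.
    for scope in (site, "common"):
        files.append("hieradata/{}/{}.yaml".format(scope, class_path))
        files.append("private/hieradata/{}/{}.yaml".format(scope, class_path))
    # Role files: public repository first, then the private one.
    for prefix in ("hieradata", "private/hieradata"):
        for scope in (site, "common"):
            for role in roles:
                files.append(
                    "{}/role/{}/{}.yaml".format(prefix, scope, role.replace("::", "/"))
                )
    return files
```

Calling search_order("mw1017", "eqiad", "admin", ["mediawiki::appserver"]) returns the ten paths above in the same order.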
In Labs
Hiera support for labs is still being completed. Currently, you can set only project-wide hiera data. You can do this by creating/editing a wiki page on wikitech, with the page name Hiera:<projectname> - or putting it under hieradata/labs/<projectname>/common.yaml in the ops/puppet repo. Per-host hiera data can be specified on wikitech, with a page named Hiera:<projectname>/host/<hostname> or in the ops/puppet repo, by creating a file under hieradata/labs/<projectname>/<hostname>.yaml. Only project administrators can create / modify hiera data for a project.
Here, the resolution order is [1]
- Wikitech
- "labs/%{::instanceproject}/host/%{::hostname}.yaml"
- "labs/%{::instanceproject}/common.yaml"
- "labs.yaml"
- "private/%{::instanceproject}.yaml"
- common.yaml
- "private/common.yaml"
Where %{::instanceproject} is the project (e.g. 'tools') and %{::hostname} the instance name (e.g. 'tools-mail').
Note that providing Hiera settings per role is not possible on labs.
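To see what the labs resolution order expands to for a given instance, the %{::...} tokens can be substituted by hand or with a small Python helper. The sketch below is illustrative: interpolate and labs_sources are hypothetical names, and only the two variables mentioned above are handled.

```python
import re

# Resolution order as listed above; "Wikitech" is kept as a label because it
# is a wiki page source rather than a file path.
LABS_HIERARCHY = [
    "Wikitech",
    "labs/%{::instanceproject}/host/%{::hostname}.yaml",
    "labs/%{::instanceproject}/common.yaml",
    "labs.yaml",
    "private/%{::instanceproject}.yaml",
    "common.yaml",
    "private/common.yaml",
]


def interpolate(template, facts):
    """Replace each %{::name} token with the matching entry in facts."""
    return re.sub(r"%\{::(\w+)\}", lambda m: facts[m.group(1)], template)


def labs_sources(project, hostname):
    """Expand the labs hierarchy for one instance, highest priority first."""
    facts = {"instanceproject": project, "hostname": hostname}
    return [interpolate(entry, facts) for entry in LABS_HIERARCHY]
```

For the project 'tools' and the instance 'tools-mail', the second entry expands to labs/tools/host/tools-mail.yaml.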
How to use it
Removing node-scope variables from manifests
Our historical take on role classes was 'do not parametrize, use node-scope variables to configure'. With hiera, we are going in the exact opposite direction. The new advice I can give is 'parametrize your classes as much as possible, and define values only in hiera'. To give you an example, here is a simplified version of how we did things until now:
class hithere {
    notice($msg)
    if $punchline != undef {
        notice($punchline)
    }
}
if ! defined($msg) {
    $msg = 'hello world'
}
node 'foo.local' {
    # note how $msg seems actually unrelated to the hithere class;
    # also, we're polluting manifests with data
    $msg = 'Hello goofy!'
    include hithere
}
node 'bar.local' {
    $msg = 'Hello goofy!'
    $punchline = 'we <3 puppet'
    include hithere
}
This has a series of disadvantages, like having mysterious node-scope overrides of parameters that are not explicitly declared here.
Hiera is used by puppet as the default lookup method for class parameters, so each time you include or declare the class explicitly, puppet will look up all undeclared parameters in hiera:
class hithere ( $msg = 'hello world!', $punchline = undef) {
    notice($msg)
    if $punchline != undef {
        notice($punchline)
    }
}
# the pattern is anchored and the dot escaped, so only foo.local and bar.local match
node /^(foo|bar)\.local$/ {
    include hithere # looks up hithere::msg and hithere::punchline in hiera, if either is defined
}
Then we will have in hieradata/common/hithere.yaml
hithere::msg: "Hello Goofy!"
and in hieradata/hosts/bar.yaml
hithere::punchline: "we <3 puppet"
In this way, code and data are neatly separated. It is very clear what you are doing when you set the msg and punchline values, as you're guaranteed to change them for this class only, and we get a nice deduplication of the node declarations as well.
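To make the outcome concrete, the example can be traced in Python. This is a simplified model, not puppet's data binding code: it only considers the host file and the common file, uses data transcribed from the example, and assumes $::hostname is 'foo' or 'bar' for the two nodes.

```python
# Class defaults and hiera data transcribed from the example above.
CLASS_DEFAULTS = {"hithere::msg": "hello world!", "hithere::punchline": None}
HIERA = {
    "hieradata/common/hithere.yaml": {"hithere::msg": "Hello Goofy!"},
    "hieradata/hosts/bar.yaml": {"hithere::punchline": "we <3 puppet"},
}


def class_params(hostname):
    """Resolve hithere's parameters: host file, then common file, then default."""
    order = [
        "hieradata/hosts/{}.yaml".format(hostname),
        "hieradata/common/hithere.yaml",
    ]
    params = {}
    for key, default in CLASS_DEFAULTS.items():
        params[key] = default  # used when no file defines the key
        for path in order:
            data = HIERA.get(path, {})
            if key in data:
                params[key] = data[key]
                break
    return params
```

Under these assumptions both hosts get the msg from the common file, and only bar gets a punchline; foo keeps the class default of undef (None in the model).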