Puppet coding

From Wikitech

This page is about writing puppet code: how to write it, when to write it, where to put it. For information about how to install or manage puppet, visit this page.

Getting the source

As simple as

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

When we use puppet

Puppet is our configuration management system. Anything related to the configuration files & state of a server should be puppetized. There are a few cases where configurations are deployed into systems without involving puppet but these are the exceptions rather than the rule (MediaWiki configuration, Squid configurations etc.); all package installs and configurations should happen via puppet in order to ensure peer review and reproducibility.

However, Puppet is not used as a deployment system at Wikimedia. Pushing code via puppet, e.g. with git::clone, should be avoided. Depending on the case, Debian packages or our deployment system (Git-deploy) should be used instead. Nowadays, deploying software via Trebuchet/git-deploy can be achieved in puppet by using the 'trebuchet' package provider.
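
As a rough sketch of what using that provider might look like: the repository name 'example/service' below is made up for illustration, and the exact semantics of ensure with this provider are an assumption, so check an existing manifest before copying it.

```puppet
# Hypothetical example: 'example/service' is not a real repository.
# Only the provider name 'trebuchet' comes from the text above.
package { 'example/service':
    ensure   => present,
    provider => 'trebuchet',
}
```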

labs

Labs users have considerable leeway in the configuration of their systems, and the state of labs machines is frequently not puppetized. Specific projects (e.g. toollabs) often have their own systems for maintaining code and configuration.

That said, any labs system being used for semi-production purposes (e.g. public test sites, bot hosts, etc.) should be fully puppetized. Labs users should always be ready for their instance to vanish at a moment's notice, and have a plan to reproduce the same functionality on a new instance -- generally this is accomplished using puppet.

The node definition for a labs machine is not stored in site.pp -- it lives in LDAP, and can be configured via a web interface on the instance configuration page.

Style, organization and class conventions

Organization

Our code is, as of November 2014, in transition from a system of global manifests to a system of modules and roles. Ultimately our code should be organized like this:

  1. Nodes defined in site.pp can include role classes. They should avoid defining non-role classes; if that is happening then you probably need to create a role.
  2. Use of global variables is highly discouraged. A few actually global settings (e.g. $::site, $::realm) will crop up, but we probably don't need any new ones.
  3. All manifests are declared inside modules.
  4. Role classes are never parameterized, and all the classes they include are configured only through hiera: you simply declare the class parameters in the relevant yaml file. This also makes it possible to drop role classes, or branches within classes, that exist only to handle differences in $::site, $::realm, etc. The old parametrization relied on node-scope variables (e.g. $cluster) which had a default value as a top-scope variable. That should go away as soon as possible.
  5. All "main" roles - that is, the ones we apply directly to a server, as opposed to things like role::ntp or role::diamond - should include the standard class, that does most of the boilerplate base install of the system.
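
The points above could be sketched as the following role class. The 'example' module, its port parameter and the hiera file path in the comment are hypothetical and only illustrate the pattern; standard and system::role are the only names taken from this page.

```puppet
# Class: role::example::server
#
#  Hypothetical "main" role: it takes no parameters and pulls in the
#  standard class plus the module that does the real work.
#
#  The included classes are configured through hiera, for instance
#  (assumed path and key, by analogy with other roles):
#
#    hieradata/role/common/example/server.yaml
#      example::port: 8080
class role::example::server {
    include standard
    include example

    system::role { 'role::example::server':
        description => 'Example application server',
    }
}
```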

Nodes in site.pp

Our traditional node definitions included a lot of global variables and boilerplate that needed to be added. Nowadays, your node definitions should be as simple as possible, so ideally they should just include one role definition:

node 'redis01.codfw.wmnet' {
    role db::redis
}

This will include the class role::db::redis and look for role-related hiera configs in hieradata/role/codfw/db/redis.yaml and then in hieradata/role/common/db/redis.yaml, which may for example contain:

cluster: redis
admin::groups:
  - redis-admins

Some machines, in particular miscellaneous servers, are "unique" and include a unique combination of classes; in that case, the best thing to do is to not use the role keyword, and to configure the variables at the host level in hiera:

node 'install2001.codfw.wmnet' {
    include standard
    include role::install-server
    include ganglia_new::collector
}

and then put the hiera definition in hieradata/hosts/install2001.yaml:

admin::groups:
  - mechanical-hamsters

Coding Style

Please read the upstream style guide, and install puppet-lint.

format

Many existing manifests use two-spaces (as suggested in the style guide) instead of our 4 space indent standard; when working on existing code always follow the existing whitespace style of the file or module you are editing. Please do not mix cleanup changes with functional changes in a single patch.

Spacing, Indentation, & Whitespace

  • Must use four-space soft tabs.
  • Must not use literal tab characters.
  • Must not contain trailing white space
  • Must align fat comma arrows (=>) within blocks of attributes.

Quoting

  • Must use single quotes unless interpolating variables.
  • All variables should be enclosed in braces ({}) when being interpolated in a string, e.g. "${variable}".
  • Variables standing by themselves should not be quoted, e.g. $variable rather than "$variable".
  • Must not quote booleans: true is ok, but not 'true' or "true"
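
A short sketch of these quoting rules in one place. The service name, file path and variable names are made up for illustration.

```puppet
# Single quotes: nothing to interpolate.
$service_name = 'example'
$motd_text    = 'Managed by puppet'

# Double quotes with braces: the variable is interpolated.
$config_file  = "/etc/${service_name}.conf"

# Variables standing by themselves are not quoted.
file { $config_file:
    ensure  => present,
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    content => $motd_text,
}

# Booleans are never quoted.
service { $service_name:
    ensure => running,
    enable => true,
}
```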

Resources

  • Must single quote all resource names and their attribute values, except for ensure (and unless they contain a variable, of course).
  • Ensure must always be the first attribute.
  • Put a trailing comma after the final resource parameter.
  • Again: Must align fat comma arrows (=>) within blocks of attributes.
  • Don't group resources of the same type (a.k.a compression) :

do

       file { '/etc/default/exim4':
           require => Package['exim4-config'],
           owner   => 'root',
           group   => 'root',
           mode    => '0444',
           content => template('exim/exim4.default.erb'),
       }
       file { '/etc/exim4/aliases/':
           ensure  => directory,
           require => Package['exim4-config'],
           mode    => '0755',
           owner   => 'root',
           group   => 'root',
       }

don't do

       file { '/etc/default/exim4':
           require => Package['exim4-config'],
           owner   => 'root',
           group   => 'root',
           mode    => '0444',
           content => template('exim/exim4.default.erb');
      '/etc/exim4/aliases/':
           ensure  => directory,
           require => Package['exim4-config'],
           mode    => '0755',
           owner   => 'root',
           group   => 'root',
       }
  • Keep the resource name and the resource type on the same line. No need for extra indentation.

Conditionals

  • Don't use selectors inside resources:

good:

    $file_mode = $::operatingsystem ? {
        debian => '0007',
        redhat => '0776',
        fedora => '0007',
    }
    file { '/tmp/readme.txt':
        content => "Hello World\n",
        mode    => $file_mode,
    }

bad:

    file { '/tmp/readme.txt':
        mode => $::operatingsystem ? {
            debian => '0777',
            redhat => '0776',
            fedora => '0007',
        }
    }
  • Case statements should have default cases.
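
A minimal sketch of a case statement with a default case. The variable name $apache_package is chosen only for illustration; failing in the default branch is one option, setting a sensible fallback value is another.

```puppet
case $::operatingsystem {
    'Debian', 'Ubuntu': {
        $apache_package = 'apache2'
    }
    'RedHat', 'Fedora': {
        $apache_package = 'httpd'
    }
    default: {
        # Refuse to compile the catalog rather than guess a package name.
        fail("Unsupported operating system: ${::operatingsystem}")
    }
}
```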

Classes

All classes and resource type definitions must be in separate files in the manifests directory of their module.

  • Do not nest classes.
  • Avoid inheritance as far as possible; puppet is not good at that. Also, inheritance will make your life harder when you need to use hiera - really, don't.
  • Try not to use top-scope variables, but if you do use them, scope them correctly:
$::operatingsystem

but not

$operatingsystem
  • Do not use dashes in class names; preferably use alphabetic names only.
  • In parameterized class and defined resource type declarations, parameters that are required should be listed before optional parameters.
  • It is in general better to avoid parameters that don't have a default; that will only make your life harder as you need to define that variable for every host that includes it.
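
To illustrate the parameter ordering rule, here is a sketch of a defined type. The name example::vhost, its parameters and the file path are all hypothetical.

```puppet
# Define: example::vhost
#
#  Hypothetical defined type. The required parameter ($docroot, which
#  has no default) comes first, followed by the optional ones.
define example::vhost (
    $docroot,
    $port   = 80,
    $ensure = 'present'
) {
    file { "/etc/example/sites/${title}.conf":
        ensure  => $ensure,
        owner   => 'root',
        group   => 'root',
        mode    => '0444',
        content => "# ${title}: docroot=${docroot} port=${port}\n",
    }
}
```

Per the last point above, a class that is included on many hosts is usually easier to live with if every parameter has a default; a required parameter fits better on a defined type like this one, where each declaration supplies its own value.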

Including

  • One include per line.
  • One class per include.
  • Include only the class you need, not the entire scope.
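
For example, using class names that appear elsewhere on this page:

```puppet
# One include per line, one class per include.
include standard
include role::ntp
include role::diamond

# Avoid listing several classes in one statement:
#   include standard, role::ntp, role::diamond
```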

WMF Design conventions

  • Always include the 'base' class for every node (note that standard includes base and should be used in most cases)
  • For every service deployed, please use a system::role definition (defined in modules/system/manifests/role.pp) to indicate what a server is running. This will be put in the MOTD. As the definition name, you should normally use the relevant puppet class. For example:
system::role { 'role::cache::bits': description => 'bits Varnish cache server' }
  • Files that are fully deployed by Puppet using the file type, should generally use a read-only file mode (i.e., 0444 or 0555). This makes it more obvious that this file should not be modified, as Puppet will overwrite it anyway.
  • For each service, create a class named service::decommission (e.g. apache::decommission), in its own file within the service's module, which removes any configuration and prepares a host for decommissioning.
  • For each service, create a class named service::monitoring (e.g. squid::monitoring), in its own file within the service's module, which sets up any required (Nagios) monitoring configuration on the monitoring server.
  • Any top-level class definitions should be documented with a descriptive header, like this:
 # Mediawiki_singlnode: A one-step class for setting up a single-node MediaWiki install,
 #  running from a Git tree.
 #
 #  Roles can insert additional lines into LocalSettings.php via the
 #  $role_requires and $role_config_lines vars.
 #  etc.

Such descriptions are especially important for role classes. Comments like these are used to generate our online puppet documentation.
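
Several of these conventions can be sketched together. The 'example' service, its package name, config path and template are hypothetical, and in a real module each class would live in its own file as noted in the comments.

```puppet
# modules/example/manifests/init.pp (hypothetical)
#
# Class: example
#
#  Installs and configures the 'example' service.
class example {
    package { 'example':
        ensure => present,
    }
    # Fully managed by puppet, hence the read-only mode.
    file { '/etc/example.conf':
        ensure  => present,
        owner   => 'root',
        group   => 'root',
        mode    => '0444',
        content => template('example/example.conf.erb'),
        require => Package['example'],
    }
}

# modules/example/manifests/decommission.pp (hypothetical)
#
# Class: example::decommission
#
#  Stops the 'example' service and removes its configuration so that
#  the host can be decommissioned.
class example::decommission {
    service { 'example':
        ensure => stopped,
        enable => false,
    }
    file { '/etc/example.conf':
        ensure  => absent,
        require => Service['example'],
    }
}
```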

Useful global variables

These are useful variables you can refer to from anywhere in the Puppet manifests. Most of these get defined in realm.pp or base.pp.

$::realm
The "realm" the system belongs to. As of July 2013 we have the realms production, fundraising and labs.
$::site
Contains the 5-letter site name of the server, e.g. "pmtpa", "eqiad" or "esams".
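
A small sketch of referring to these variables with explicit top-scope qualification. The resource title is arbitrary; note that the organization guidelines above prefer hiera over branching on these variables inside classes.

```puppet
# Prints the realm and site of the host during the puppet run.
notify { 'where-am-i':
    message => "This host is in realm ${::realm} at site ${::site}.",
}
```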

Before you submit a patch

Parser validation

You can syntax check your changes with:

# puppet parser validate filename-here

Lint

You can locally install puppet-lint and use it to check your code before submitting, or enhance existing code by fixing puppet-lint errors/warnings. puppet-lint on github

install puppet-lint

on Debian/Ubuntu [1], [2]:
apt-get install puppet-lint

on Mac OS X:
sudo gem install puppet-lint

generic (Ruby gem):
gem install puppet-lint

how to use

puppet-lint manifest.pp

or

puppet-lint --with-filename /etc/puppet/modules

common errors

tab character found on line ..

FIX: do not use tabs, use 4-space soft tabs

Have this in your vim config (.vimrc):

set tabstop=4
set shiftwidth=4
set softtabstop=4
set smarttab
set expandtab

or use something like this (if your local path is ./wmf/puppet/) to apply it to puppet files only.

" Wikimedia style uses 4 spaces for indentation
autocmd BufRead */wmf/puppet/* set sw=4 ts=4 et

Open the file, run :retab, then :wq, and you are done. Make sure to review the resulting change carefully before submitting it.

Or put this in your emacs config (.emacs)

;; Puppet config with 4 spaces
(setq puppet-indent-level 4)
(setq puppet-include-indent 4)

double quoted string containing no variables

FIX: use single quotes (') for all strings unless there are variables to parse in it

unquoted file mode

FIX: always quote file modes with single quotes, like: mode => '0750'

line has more than 80 characters

FIX: wrap your lines at fewer than 80 characters; if you have to, there is \<newline>. Vim can help when writing. Place this in your .vimrc:

set textwidth=80

not in autoload module layout

FIX: turn your code into a puppet module (Module Fundamentals)

ensure found on line but it's not the first attribute

FIX: move your "ensure =>" to the top of the resource section. (don't forget to turn a ; into a , if it was the last attribute before)

unquoted resource title

FIX: quote all resource titles, single quotes

top-scope variable being used without an explicit namespace

FIX: use an explicit namespace in variable names (Scope and Puppet)

class defined inside a class

FIX: don't define classes inside classes

quoted boolean value

FIX: do NOT quote boolean values ( => true/ => false)

case statement without a default case

FIX: add a default case to your case statement

labs testing

Nontrivial puppet changes should be applied to a labs instance before being merged into production. This can uncover some behaviors and code interactions that don't appear during individual file tests -- for example, puppet runs frequently fail due to duplicate resource definitions that aren't obvious to the naked eye.

To test a puppet patch:

1. Create a self-hosted puppetmaster instance.

2. Configure that instance so that it defines the class you're working on. You can do this either via the 'configure instance' page or by editing /var/lib/git/operations/puppet/manifests/site.pp to contain something like this:

   node 'this-is-my-hostname' {
       include class::I::am::working::on
   }

3. Run puppet a couple of times ('$ sudo puppetd -tv') until each subsequent puppet run is clean and doesn't modify anything.

4. Apply your patch to /var/lib/git/operations/puppet. Do this by cherry-picking from gerrit or by rsyncing from a local working directory.

5. Run puppet again, and note the changes that this puppet run makes. Does puppet succeed? Are the changes what you expected?

Manual module testing

A relatively simple and crude way to test is:

puppet apply --noop --modulepath /path/to/modules <manifest>.pp

Do note however that this might not work if you reference anything outside of the module hierarchy.

You can get around the missing module hierarchy problem by cloning a local copy of the puppet repo and symlinking in your new module directory.

e.g.

git clone --branch production https://git.wikimedia.org/git/operations/puppet.git
cd puppet/modules
ln -s /path/to/mymodule .
puppet apply --verbose --noop --modulepath=/home/${USER}/puppet/modules /path/to/mymodule/manifests/init.pp


Unit Testing modules

Rake tests

Some modules have unit tests already written -- the other modules still need them! Modules with tests have 'rakefile' in their top dir, and a subdir called 'spec'.

Modules imported from upstream typically have tests that run against other upstream modules -- we, however, seek to have tests pass when running only against our own repository. In order to set that up you'll need to run

 $ rake spec

once in the top level directory to get things set up properly. After that you can test individual modules by running

 $ rake spec_standalone

in specific module subdirs.

If you want to compare your results with the test results on our official testing box, check out this page: https://integration.wikimedia.org/ci/job/test-puppet-rspec/


Custom tests

If you are testing a module it makes sense to group these simple tests in a tests/ directory in your module's hierarchy. Your tests can be as simple as

#

include myclass

in a file called myclass.pp in the tests directory.

or a lot more complex, calling parameterized classes and definitions.
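
A more involved test manifest might look like the sketch below. It assumes, purely for illustration, that myclass accepts a port parameter and that the module also ships a defined type myclass::vhost with a docroot parameter; neither is defined on this page.

```puppet
# tests/vhost.pp (hypothetical)

# Declare the class with an explicit parameter instead of a bare include.
class { 'myclass':
    port => 8080,
}

# Exercise a defined type from the same module.
myclass::vhost { 'test.example.org':
    docroot => '/srv/test',
    require => Class['myclass'],
}
```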

All this can also be automated by including in the tests directory the following Makefile

MANIFESTS=$(wildcard *.pp)
OBJS=$(MANIFESTS:.pp=.po)
TESTS_DIR=$(dir $(CURDIR))
MODULE_DIR=$(TESTS_DIR:/=)
MODULES_DIR=$(dir $(MODULE_DIR))

all:	test

test:	$(OBJS)

%.po:	%.pp
	puppet parser validate $<
	puppet apply --noop --modulepath $(MODULES_DIR) $<

and running make from the command line (assuming you have make installed)

Please note that --noop does not mean that no code will be executed. It means puppet won't change the state of any resource. At the very least, exec resources' conditionals, as well as puppet parser functions and facts, will execute normally. So don't go around testing untrusted code.

Roles

A role is a particular kind of manifest that defines a specific job for a server. A role may pull in multiple other manifests, modules, etc. If you need to configure class variables specifically for this role, define one corresponding file in hieradata/mainrole/ROLE_NAME.yaml and define those variables there. We don't need global variables anymore.

In the ideal future, all manifests will either be roles or part of modules. Top-level node definitions will /only/ include roles. Any stand-alone manifests are anachronisms. No manifest (including those in modules and the roles) should make use of global variables.

Role classes should be named something like role::<function>[::subfunction]. They live in manifests/role, generally in files named after <function>. Sometimes a ::labs suffix is used, for example, modules/ldap/manifests/role/server.pp defines ldap::role::server::labs. But a ::labs suffix shouldn't really be necessary now that we have hiera - just define a parameter for the role class for each functionality you may or may not want to activate, and use Hiera to configure them depending on the realm or the site. Modules that can only be used in labs, however, should include a labs prefix, for example manifests/role/labsquarry.pp defines role::labs::quarry::web. In most cases, just defining class parameters for classes in modules per-environment should be enough.
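
The hiera-based alternative to a ::labs suffix could be sketched as follows. The role name, the $enable_monitoring parameter and the example classes are hypothetical, and the hiera key in the comment assumes automatic class parameter lookup is in use.

```puppet
# Class: role::example::web
#
#  Hypothetical role with one switch instead of a separate ::labs
#  variant. The value can be overridden per realm or site in hiera,
#  for instance with the key:
#
#    role::example::web::enable_monitoring: false
class role::example::web (
    $enable_monitoring = true
) {
    include standard
    include example

    if $enable_monitoring {
        include example::monitoring
    }
}
```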

Puppet Modules

There are currently two high level types of modules. For most things, modules should not contain anything that is specific to the Wikimedia Foundation. Non-WMF-specific modules could be usable in another puppet repository at any other organization. A WMF-specific module is different: it may contain configurations specific to WMF (duh), but remember that it is still a module, so it must be usable on its own as well. Users of either type of module should be able to use the module without editing anything inside of the module. WMF-specific modules will probably be higher level abstractions of services that use and depend on other modules, but they may not refer to anything inside of the top level manifests/ directory. E.g. the 'applicationserver' module abstracts usages of apache, php and pybal to set up a WMF application server.

Often it will be difficult to choose between creating role classes and creating a WMF specific module. There isn't a hard rule on this. You should use your best judgement. If role classes start to get overly complicated, you might consider creating a WMF specific module instead.

3rd party or upstream modules

There are so many great modules out there! Why spend time writing your own?!

Well, for good reasons. Puppet runs as root on the production nodes. We can't import just any 3rd party module, as we can't be sure to trust them. Not because they would do something malicious (although they might), but because they might do something stupid.

All 3rd party modules must be reviewed in the same manner that we review our own code before it goes to production.

git submodules

Even so, since puppet modules are supposed to be their own projects, it is sometimes improper to maintain them as subdirectories inside of the operations/puppet repository. This goes for 3rd party modules as well as non-WMF specific modules. WMF specific modules can and probably should remain as subdirectories inside of operations/puppet.

We are starting to use git submodules to manage puppet modules. Puppet modules must go through the same review process as anything in operations/puppet.

Adding a new puppet module as a git submodule

First up is adding a new puppet module. 'git submodule add' will modify the .gitmodules file, and also take care of cloning the remote into the local directory you specify.

 otto@localhost:~/puppet# git submodule add https://gerrit.wikimedia.org/r/p/operations/puppet/<my-module> modules/<my-module>

git status shows the modified .gitmodules file, as well as a new 'file' at modules/<my-module>. This new file is a pointer to a specific commit in the <my-module> repository.

 otto@localhost:~/puppet# git status
 # On branch master
 # Changes to be committed:
 #   (use "git reset HEAD <file>..." to unstage)
 #
 #	modified:   .gitmodules
 #	new file:   modules/<my-module>
 #

Commit the changes and post them to gerrit for review.

 otto@localhost:~/puppet# git commit && git review

This will show up as a change in the operations/puppet repository. This change will not show the actual code that is added, but instead, only show diffs with the SHA1s that the new submodule points at. Once this change has been reviewed, approved, and merged, those with operations/puppet checked out will have to run

 git submodule update --init

to be sure that they get the changes required in their local working copies. This is only really necessary if other users want to view the submodule's content locally.

Making changes to a submodule

You should never edit a submodule directly in the subdirectory of an operations/puppet working copy. If you want to make changes to a submodule, clone that submodule elsewhere directly. Edit there and submit changes for review.

 git clone https://gerrit.wikimedia.org/r/p/operations/puppet/<my-module>
 cd <my-module>
 # edit stuff, push to gerrit for review.
 git commit && git review

Once your module change has been approved, you can update your operations/puppet working copy so that it points to the SHA1 you want it to.

 cd path/to/operations/puppet/modules/<my-module>
 git pull # or whatever you want to checkout your desired SHA1.
 cd ../..
 git commit modules/<my-module> && git review

This will push the update of the new SHA1 to operations/puppet for review.

Miscellaneous

VIM guidelines

The following in ~/.vim/ftdetect/puppet.vim can help with a lot of formatting errors

" detect puppet filetype
autocmd BufRead,BufNewFile *.pp set filetype=puppet
autocmd BufRead,BufNewFile *.pp setlocal tabstop=4 shiftwidth=4 softtabstop=4 expandtab textwidth=80 smarttab

And for proper syntax highlighting, the following can be done

sudo aptitude install vim-puppet
mkdir -p ~/.vim/syntax
cp /usr/share/vim/addons/syntax/puppet.vim ~/.vim/syntax/

And definitely have a look at https://github.com/scrooloose/syntastic which will report puppet errors directly in your buffer whenever you save the file (works for python/php etc as well).

Of course symlinks can be used, or you can just install vim-addon-manager to manage plugins. vim-puppet provides an ftplugin and an indent plugin as well. Maybe they are worth the time, but it is up to each user to decide.

Emacs guidelines

Syntax Highlighting

The puppet-el deb package can be used for emacs syntax highlighting, or the raw emacs libraries can be found here.

puppet-el - syntax highlighting for puppet manifests in emacs

The following two sections can be added to a .emacs file to help with 4 space indentation and trailing whitespace.

;; Puppet config with 4 spaces
(setq puppet-indent-level 4)
(setq puppet-include-indent 4)

;; Remove Trailing Whitespace on Save
(add-hook 'before-save-hook 'delete-trailing-whitespace)

Puppet for Labs

There is currently only one puppet repository, and it is applied both to labs instances and production instances. Classes may be added to the Operations/Puppet repository that are only intended for use on a labs instance. The future is uncertain, though: code is often reused, and labs services are sometimes promoted to production. For that reason, any changes made to Operations/Puppet must be held to the same style and security standards as code that would be applied on a production service.

Packages that use pip, gem, etc.

Other than mediawiki and related extensions, any software installed by puppet should be a Debian package and should come either from the WMF apt repo or from an official upstream Ubuntu repo. Never use source-based packaging systems like pip or gem as these options haven't been properly evaluated and approved as secure by WMF Operations staff.
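
In manifest terms, this roughly means the following. The package python-requests is used only as an example of a library that is available as a Debian package.

```puppet
# Preferred: install the Debian package from an apt repository.
package { 'python-requests':
    ensure => present,
}

# Avoid: source-based providers such as pip or gem.
#   package { 'requests':
#       ensure   => present,
#       provider => 'pip',
#   }
```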