Vagrant and Puppet: testing and prototyping infrastructure

In this post I’m going to show you how to provision multiple VMs with Vagrant and Puppet. This post is useful for your if you either want to learn Puppet in safe environment or already running Puppet in production and would like to learn how to test it before applying changes or new modules to production.

Prior knowledge of Puppet of Vagrant is not required, but would make going through easier. If you would like to familiarise yourself with Vagrant, take a look at an introduction to Vagrant.

For those who are new to Puppet or Vagrant: In a nut shell, Puppet is a tool which allows to automate servers and software provisioning, ensure configuration consistency and allows to define your infrastructure in a modularised and repeatable manner. In essence it allows to ‘code’ your infrastructure and build your systems on either VMs or bare metal. Vagrant, with help of provisioners, allows to build environments (VMs) in a repeatable and automated manner. Puppet is one of multiple provisioners that Vagrant supports.

You may ask: ‘What’s the point?’ or ‘What are the benefits of doing so?’ Myself, I am using Vagrant for development and testing environments (server configurations, services, integration, etc.) before applying changes to production environments.

Let’s start with a story…

It’s a Friday afternoon. Couple of days ago you released a new feature which made your social media platform very popular. That’s great! You noticed on your monitoring dashboard that the application is getting slower and slower under heavy load. Ok, it’s time to spin-off another webapp VM in your cloud platform. You think, “We already have two webapp VMs. Spinning up another is one, max two hours…”. Four hours later you finished installing ‘that last missing library’, and now you just need to copy ‘config file special.properties‘. Finally it’s done, great! Now you are adding newly provisioned VM into the load balancer. It’s 10 p.m. and you are finally done! Last look at Google Analytics – all great, still more on-line visitors than ever before.

At 2 a.m. you are awaken by your mobile phone. One of your colleagues is calling. “We have a problem with our app. Looks like we are loosing sessions and users are being logged out”. It rings a bell and you think, “did I set up session clustering for the new VM?”.

Such story may become a reality for companies where web environments are managed and provisioned manually. Where provisioning and configuration changes are happening ad-hoc or in a non-systematic or non-automated manner. The larger the environment, the more complex, time consuming and error-prone that process could be. You can overcome or mitigate that problem by automating configuration management, systems provisioning and facilitating testing of live-like environments in your development or QA environments. There are multiple systems which allow configuration management and automation. Examples of those are: Chef, CFEngine and Puppet.

Dev environment software requirements

For this tutorial you can use any Linux/OS X as long you have Vagrant and VirtualBox installed. I am going to run the examples in a following environment:

OS: Ubuntu 13.10
Vagrant: 1.4.2
VirtualBox: 4.2.16

Provisioning first VM with Vagrant and Puppet

I will show you how to step by step provision a multi-VM test environment with Vagrant and Puppet. You could either clone the project from my GitHub repository and follow the tutorial or create everything from scratch as we go along. If you are new to Puppet I would suggest you to create directories and files yourself as this will give you better insight into structure of the project. Vagrant will use VirtualBox provider (see requirements) so as soon as you start a VM, you can see it running in your VirtualBox manager.

First, let’s create few directories, Vagrantfile and Puppetfile:

marcin@RDS:~/development/rds/vagrant$ mkdir puppetMultiVMs
marcin@RDS:~/development/rds/vagrant$ cd puppetMultiVMs
marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ mkdir dist modules hieradata manifests site
marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ mkdir -p site/profile/manifests site/role/manifests
marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ touch Puppetfile Vagrantfile

puppetMultiVMs is the root folder of the project. That’s the folder you want to check to git repo. Inside, we created:

dist: here we will put our own modules, config files, properties, etc.
modules: Vagrant will download here all external modules. You want to store your project in a git repo right? Then this folder should be on git ignore list
hieradata: this is optional, we could put all hiera configs here (check out what’s hiera and why it’s a good idea to use it)
manifests: site.pp and nodes.pp
site: here we have another directories: profile and role.
Puppetfile: here, we will define all external Puppet modules we want to use
Vagrantfile: is a Vagrant configuration file
Once you created all directories and touched the files, your project tree should look like:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ tree

.
├── dist
├── hieradata
├── modules
├── Puppetfile
├── site
│   ├── profile
│   │   └── manifests
│   └── role
│       └── manifests
└── Vagrantfile

8 directories, 2 files

In profile we are specifying all services and packages we want to have running within role. Role is composed of profiles and classes and defines what a VM/server should be (i.e. web server, DB, proxy, etc). For example, we can have profiles java and jetty and a role webapp which would include those profiles. We can run multiple VMs with given role, or multiple roles can re-use same profiles. If you are new to Puppet, this may be bit confusing – lots of things are going on here. So let’s get cracking on the example.

### Testing Puppet-provisioned VMs When I was writing this tutorial, the hardest part was to come up with an example which is not trivial, not to complex and could be also useful for a DevOps team. What I came up with is fully set-up quality management platform: SonarQube. I will use PostgreSQL as its DB and Nginx as a reverse proxy running in front of it. Sounds good? In this example I’m going to configure and provision 2 VMs. First VM will act as a reverse proxy (Nginx) and on the second one SonarQube and PostgreSQL will be running. In fact, we don’t need Nginx nor PostgreSQL to run SonarQube but this is fairly standard set-up. I decided to use Nginx, PostgreSQL and SonarQube together to showcase process of testing and provisioning multi-VM environments with Puppet and Vagrant.

In our imaginary live environment, Sonar/PostgreSQL VM has a private network configured and its hostname is sonar.rds.priv (IP irrelevant). The Nginx VM private network hostname is devopsproxy.rds.priv. There is also a public domain sonar.rds.pub which is used by DevOps team to view SonarQube dashboard. A record of sonar.rds.pub is pointing to an external IP of devopsproxy.rds.priv

First, let’s edit Vagrantfile (previously created). Just copy-paste the following content into it:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # CentOS 6.5 for VirtualBox
  config.vm.provider :virtualbox do |virtualbox, override|
    override.vm.box     = 'puppetlabs-centos-65-x64-vbox'
    override.vm.box_url = 'http://puppet-vagrant-boxes.puppetlabs.com/centos-65-x64-virtualbox-puppet.box'
    virtualbox.memory   = 512
    virtualbox.customize ["modifyvm", :id, "--cpus", "1"]
  end

  # VM Settings for reverse proxy, I call it devopsproxy
  config.vm.define :devopsproxy do |devopsproxy|
    devopsproxy.vm.hostname = 'devopsproxy.rds.priv'
    # this IP is Vagrant specific and doesn't need to match your production environment
    devopsproxy.vm.network "private_network", ip: "192.168.56.10"

    # Install git & r10k, followed by all the required puppet modules defined in Puppetfile
    devopsproxy.vm.provision "shell", inline: 'rpm -q git &> /dev/null || yum install -q -y git'
    devopsproxy.vm.provision "shell", inline: 'gem query --name r10k --installed &> /dev/null || gem install --no-rdoc --no-ri r10k -v 1.2.0rc2'
    devopsproxy.vm.provision "shell", inline: 'cd /vagrant && r10k -v info puppetfile install'
    # Production environment should have DNS configured for private network, in Vagrant, we need to hardcode it
    devopsproxy.vm.provision "shell", inline: 'echo \'192.168.56.10 devopsproxy.rds.priv\' >> /etc/hosts'
    devopsproxy.vm.provision "shell", inline: 'echo \'192.168.56.20 sonar.rds.priv\' >> /etc/hosts'

    # Configure the puppet provisioner
    devopsproxy.vm.provision "puppet" do |puppet|
      puppet.manifest_file     = "site.pp"
      puppet.module_path       = ["site", "dist", "modules"]
   #  if you know what hiera is, feel free to uncomment and use it
   #  puppet.hiera_config_path = "hieradata/hiera.yaml"
      puppet.options       = "--verbose"
    end

  end

  # VM Settings for sonar server
  config.vm.define :sonar do |sonar|
    sonar.vm.hostname = 'sonar.rds.priv'
    sonar.vm.network "private_network", ip: "192.168.56.20"

    # Install git & r10k, followed by all the required puppet modules defined in Puppetfile
    sonar.vm.provision "shell", inline: 'rpm -q git &> /dev/null || yum install -q -y git'
    sonar.vm.provision "shell", inline: 'gem query --name r10k --installed &> /dev/null || gem install --no-rdoc --no-ri r10k -v 1.2.0rc2'
    sonar.vm.provision "shell", inline: 'cd /vagrant && r10k -v info puppetfile install'
    # Production environment should have DNS configured for private network, in Vagrant, we need to hardcode it
    sonar.vm.provision "shell", inline: 'echo \'192.168.56.10 devopsproxy.rds.priv\' >> /etc/hosts'
    sonar.vm.provision "shell", inline: 'echo \'192.168.56.20 sonar.rds.priv\' >> /etc/hosts'

    # Configure the puppet provisioner
    sonar.vm.provision "puppet" do |puppet|
      puppet.manifest_file     = "site.pp"
      puppet.module_path       = ["site", "dist", "modules"]
    #  if you know what hiera is, feel free to uncomment and use it
    # puppet.hiera_config_path = "hieradata/hiera.yaml"
      puppet.options       = "--verbose"
    end
  end

end

In Vagrantfile, we defined two VMs. Vagrant recognises them as devopsproxy and sonar. Whatever operation on those VMs we want to do with Vagrant we need to specify the VM’s name as well. Vagrant defined VM names are specific to this particular test environment and have nothing to do with how you name your VMs in a production environment.

There are few things going on in the config: we defined virtualbox as a provider (lines 5-10), declared that we want to use CentOS 6.5 (lines 6-7), each VM will have 512MB of RAM (line 8) and allocated 1 vCPU (line 9). Then, each VM has allocated a hostname (devopsproxy.rds.priv and sonar.rds.priv), configured private network between them and host system. The network configuration will happen during provisioning of VM. Git and r10k will be installed. Also, a Puppet provisioner has been configured so during provisioning of the VM, appropriate Puppet modules will be applied (lines 51-57).

Next comes manifests/site.pp, define it as following

# The filebucket allows for file backups to the server
filebucket { main: server => "puppet" }

# Set global defaults - including backing up all files to the main filebucket and adds a global path
File { backup => main }
Exec { path => "/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin" }

# Purge any unmanaged firewall resources
resources { "firewall":
  purge => true
}

# Setup default firewall rule ordering
Firewall {
  before  => Class['profile::firewall::post'],
  require => Class['profile::firewall::pre'],
}

# Load in all of our nodes
import "nodes"

It’s a basic config which will be read by Vagrant’s Puppet-provisioner during provisioning of VMs or by Puppet once committed to production environment. Pay attention to the last line, it will import nodes definition from manifests/nodes.pp which is the next file we will look at:

# Default (unconfigured) node should only receive base profile
node default {
  include role
}

###########################
## Vagrant test VMs      ##
###########################
# Vagrant nginx
node 'devopsproxy.rds.priv' {
  include role::devopsproxy
}

# Vagrant sonar
node 'sonar.rds.priv' {
  include role::sonar
}

By default each node will have default role (called role). Nodes are matched based on their hostname in a very flexible manner, for example if you have several nodes to which you want assign same role, you can for example use regex to match them. Take a look at node definitions for more examples.

Now we have baseline configs of Vagrant and Puppet ready. It’s time to define roles and profiles. From manifests/nodes.pp you can guess we need to define role devopsproxy and sonar (lines 11, 16). Before we define roles, let’s look at profiles which will be included in those roles. To run earlier described stack we need Nginx, SonarQube and PostgreSQL, we will define one profile for each of those. As mentioned earlier, profiles live in directory site/profile/manifests. Let’s touch each of those:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ cd site/profile/manifests/
marcin@RDS:~/development/rds/vagrant/puppetMultiVMs/site/profile/manifests$ touch postgresql.pp sonarqube.pp nginx.pp

Roles will use either our own modules or external one. In this tutorial we will only use external modules – there are already decent modules for all we need to run our stack. To add the modules (sonarqube, nginx, postgresql), simply edit Puppetfile and declare them along with other required modules:

#Base system
mod "puppetlabs/firewall", "1.0.2"
mod "puppetlabs/java", "1.1.1"

#Sonar with dependencies
mod "puppetlabs/stdlib", "4.1.0"
mod "puppetlabs/concat", "1.0.0"
mod "puppetlabs/inifile", "1.0.0"
mod "maestrodev/wget", "1.4.4"
mod "maestrodev/maven", "1.2.1"
mod "maestrodev/sonarqube", "2.1.1"

mod "jfryman/nginx", "0.0.9"
mod "puppetlabs/postgresql", "3.3.3"

Ok, now we can write first profile: PostgreSQL; your site/profile/manifests/postgresql.pp should be as following:

class profile::postgresql {
  anchor { 'profile::postgresql::begin': } ->
    class { '::postgresql::globals':
      manage_package_repo => true,
      version             => '9.3',
    } ->
      class { '::postgresql::server':
      listen_addresses           => '*',
      ipv4acls                   => [
                               'local      sonar            sonar  md5',
                               'host       sonar            sonar  127.0.1.1/32         md5'],
      } ->
        anchor { 'profile::postgresql::end': }
}

Now, having the first profile defined we can actually use it as a part of a role and provision a VM. Let’s start with a default role. That’s a role which will be inherited by all other roles. Having a default role is a convenient way of assuring that each role has properly initialised shared settings, services and modules (i.e. firewall rules, NFS mount points, etc.). Default role lives in site/role/manifests/init.pp and could be as simple as an empty class:

class role {

}

Ok, now let’s jump bit ahead and define sonar role. First, we will only add profile::postgresql just to fire the VM and see if everything glues together. Define site/role/manifests/sonar.pp as following:

class role::sonar inherits role {
  class { 'profile::postgresql': }

}

Just to check if all files are in order, go to root folder of your project, and check with tree if folder/files structure matches mine:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ tree
.
├── dist
├── hieradata
├── manifests
│   ├── nodes.pp
│   └── site.pp
├── modules
├── Puppetfile
├── site
│   ├── profile
│   │   └── manifests
│   │       ├── nginx.pp
│   │       ├── postgresql.pp
│   │       └── sonarqube.pp
│   └── role
│       └── manifests
│           ├── init.pp
│           └── sonar.pp
└── Vagrantfile

9 directories, 9 files

Now we will build and start sonar VM with vagrant up sonar:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ vagrant up sonar
Bringing machine 'sonar' up with 'virtualbox' provider...
[sonar] Importing base box 'puppetlabs-centos-65-x64-vbox'...
[sonar] Matching MAC address for NAT networking...       
[sonar] Setting the name of the VM...

(dozens of lines of console output here...)

Notice: /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Triggered 'refresh' from 1 events
Info: Creating state file /var/lib/puppet/state/state.yaml
Notice: Finished catalog run in 39.41 seconds

If all goes good, you will see on your console execution of Puppet provisioner as well as stages of provisioning PostgreSQL. If something went wrong (i.e. you got a typo in your configs) and need to re-run the process here is an explanation of few Vagrant commands:

vagrant up [vm name] – will create and start given VM
vagrant destroy [vm name] – will shut down and destroy [vm name] – if you don’t need it any more or would like to rebuild it from scratch
vagrant provision [vm name] – will run provisioners on [vm name] which is already up, that’s useful when you have [vm name] already running but you did some changes to Puppet configs and would like to apply changes without destroying/up’ing the VM (which would take more time)

If you would like to login into the VM, execute following from the root of your project: vagrant ssh sonar. Once you are in you can look around, check network settings, take a look at PostgreSQL configs (take a look at pg_hba.conf and compare with postgresql.pp). If you need root privileges, you can simply sudo su –

If all went good, your partly ready Sonar VM is running. Let’s move on and add SonarQube to it. It’s bit more complex as it depends on Maven and Java. Define all profiles as following:

site/profile/manifests/java.pp :

class profile::java {
  anchor { 'profile::java::begin': } ->
    class { '::java': } ->
      anchor { 'profile::java::end': }
}

site/profile/manifests/maven.pp :

class profile::maven {
  require profile::java

  anchor { 'profile::maven::begin': } ->
    class { '::maven': } ->
      anchor { 'profile::maven::end': }
}

site/profile/manifests/sonarqube.pp :

class profile::sonarqube {
  require profile::maven
  require profile::postgresql

  $jdbc = {
    url               => 'jdbc:postgresql://localhost/sonar',
    username          => 'sonar',
    password          => 'AfAz37my0apr',
  }

  anchor { 'profile::sonarqube::begin': } ->
    postgresql::server::db { 'sonar':
    user     => 'sonar',
    password => postgresql_password('sonar', 'AfAz37my0apr'),
  } ->
    class { '::sonarqube' :
      arch         => 'linux-x86-64',
      version      => '4.3',
      user         => 'sonar',
      group        => 'sonar',
      service      => 'sonar',
      installroot  => '/usr/local',
      home         => '/var/local/sonar',
      download_url => 'http://dist.sonar.codehaus.org',
      jdbc         => $jdbc,
      log_folder   => '/var/local/sonar/logs',
    } ->
               anchor { 'profile::sonarqube::end': }
}

Now we can add profile::sonarqube to role sonar. Edit site/role/manifests/sonar.pp as following:

class role::sonar inherits role {
  class { 'profile::postgresql': }
  class { 'profile::sonarqube': }

}

That was easy, right? Now, if you have sonar VM already running, you can update it by executing vagrant provision sonar from root directory of your project:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ vagrant provision sonar
[sonar] Running provisioner: shell...
[sonar] Running: inline script

(...)

Info: /Stage[main]/Sonarqube/Service[sonarqube]: Unscheduling refresh on Service[sonarqube]
Notice: /Stage[main]/Maven::Maven/Exec[maven-untar]/returns: executed successfully
Notice: /Stage[main]/Maven::Maven/File[/usr/bin/mvn]/ensure: created
Notice: Finished catalog run in 69.91 seconds

You will see on your console, that a new DB and role (sonar) were created, SonarQube, Maven and Java installed.

Ok, do you remember, when we were defining VMs in Vagrantfile? We specified IP 192.168.56.20 as a private network IP for the VM. You can either ping it or ssh into VM (vagrant ssh sonar) and run ifconfig to confirm. Now the moment of truth, point your browser to http://192.168.56.20:9000 (9000 is SonarQube’s default port), you should see:

SonarQube

Ta-daaa! SonarQube up and running! If it was first time you were provisioning something with Puppet, it may seem like a lot of work, your head might be spinning around different config files (Puppetfile, postgresql.pp, maven.pp, java.pp, sonarqube.pp, etc.) but those are reusable configs and once you have your PostgreSQL profile defined, you can reuse it with other applications. Same for Java and Maven (i.e. it’s required for Jenkins for example). So in fact, you define all those once, and then defining new roles and provisioning new VMs it is dead simple and very fast. Migration between data centres and environments, rebuilding VMs in production environments becomes magnitude-levels easier and simpler!

To complete this tutorial, lets finally create profile for Nginx and role for devopsprox. Then we point it to our SonarQube instance.

Define site/profile/manifests/nginx.pp as following:

class profile::nginx {

  anchor { 'profile::nginx::begin': } ->
    class { '::nginx':
      worker_processes   => $::processorcount,
      worker_connections => 10240,
      proxy_buffers      => '32 8k',
      proxy_buffer_size  => '8k',
    } ->
      anchor { 'profile::nginx::end': }
}

The above Nginx profile doesn’t include any SSL profile. For a production environment I would advise to add one. If you do so, just define a SSL profile (tip: in that SSL profile, you can also include rngd to run and generate entropy required for SSL).

Nginx module we are using doesn’t have any dependencies. We can proceed with defining role. Your site/role/manifests/devopsproxy.pp should look like:

class role::devopsproxy inherits role {
    class { 'profile::nginx': }

    Nginx::Resource::Vhost {
        vhost_cfg_append       => {'ignore_invalid_headers' =>  'off',
                                   'gzip_types'             =>  'text/plain text/css application/json text/xml application/xml application/xml+rss text/javascript application/javascript',
                                   'gzip_buffers'           =>  '4  256k',
                                   'gzip_comp_level'        =>  '5', },
        location_cfg_append    => {'proxy_redirect' => 'default'},
    }

    nginx::resource::vhost { 'sonar.rds.pub':
        #private network
        proxy                  => 'http://sonar.rds.priv:9000',
    }

    nginx::resource::location { 'sonar-expires':
        ensure               => present,
        location             => '~* \.(png|jpg|jpeg|gif|ico|js|css)$',
        #private network
        proxy                => 'http://sonar.rds.priv:9000',
        #external host
        vhost                => 'sonar.rds.pub',
        location_cfg_prepend => {'expires' => '1y'},
    }
}

Now, we can start our second VM:

marcin@RDS:~/development/rds/vagrant/puppetMultiVMs$ vagrant up devopsproxy
Bringing machine 'devopsproxy' up with 'virtualbox' provider...
[devopsproxy] Importing base box 'puppetlabs-centos-65-x64-vbox'...
[devopsproxy] Matching MAC address for NAT networking...
[devopsproxy] Setting the name of the VM...
[devopsproxy] Clearing any previously set forwarded ports...

(...)

Notice: /Stage[main]/Nginx::Service/Service[nginx]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Nginx::Service/Service[nginx]: Unscheduling refresh on Service[nginx]
Info: Creating state file /var/lib/puppet/state/state.yaml
Notice: Finished catalog run in 18.24 seconds

As this is a test environment, we need to point sonar.rds.pub to 192.168.56.10 so we can show the proxy works. On your host system simply edit /etc/hosts and add following line at the end:

192.168.56.10 sonar.rds.pub

Once that’s done, you should be able to point your browser to sonar.rds.pub and see SonarQube running in behind Nginx

SonarQube via nginx proxy

Summary

To me, getting infrastructure right is as important as writing well functioning application code. If you think that at the end of the day, your amazing webapp needs to run somewhere else than your laptop and depends on other services (DB, load balancer, indexing engine, etc.) then getting run environment right, in a repeatable manner makes sense. Making process of provisioning and configuration of your infrastructure repeatable, testable and robust will make your WebOps/DevOps team happy 🙂 It will also help you to grow and migrate your infrastructure. In case any VM dies (i.e. hardware failure) you can provision new one in an automated way and have it configured exactly as it should be. Puppet is not the only tool for that. You may also check CFEngine, Chef and Docker.

2014-07-06 INFRASTRUCTURE
infrastructure vagrant VirtualBox virtualisation