Extending Chef with custom plugins

A long time ago, in a galaxy not so far away, there lived a peaceful people who competed with their friends to see whose server had the longest uptime. Back then, it was awesome to have a server with more than 1000 days of uptime, to show how good and stable Linux was.


Nowadays, when I see a server with a long uptime, my thoughts are:

“Gosh, this server must be really outdated, it must have a lot of security issues, and God knows what will happen when we need to restart it! The fsck will take forever to run, if the server survives after the reboot!”

But don’t get me wrong: uptime is still important, just not the uptime of a single machine. What matters now is the availability of the system as a whole. That’s why, here at Movile, we design our platforms to be fault tolerant and distributed across many data centers, because we know that hardware will fail sooner or later.

With this in mind, we are going to build a plugin that shows the uptime of our nodes, with the ability to search by data center and to sort from the highest uptime to the lowest. This is a good opportunity to learn how to extend Chef through a practical example. The plugin is also very handy at a time when the GHOST glibc vulnerability is around, eating your time.

TL;DR

Here is the full version of the plugin, if you’re not interested in theory and examples:

https://gist.github.com/tiago-cruz-movile/00d1eeeb4352b04132bf

And this is an example of its use and output:

[Screenshot: knife movile uptime example]

About Chef and Knife

Chef is an awesome and powerful automation platform that transforms complex infrastructure into code. We use it a lot here at Movile and we love it!


Today, I would like to talk a little bit about knife plugins and how easily we can use them to add new features and extend knife’s power.

If you don’t know knife yet, all you need to know is that it is a command-line tool that provides an interface between a local chef-repo and the Chef server. You can also take a look at the Chef Quick Overview.

When you install knife, you get more than 80 sub-commands to use, such as:

knife client list
knife cookbook upload
knife environment create
...

There are also a lot of plugins available, some maintained by Chef and others maintained by the community.

But if you don’t find what you need, you can always write your own plugin. So, let’s build a practical plugin to use with knife.

Plugin: knife movile uptime

You can find the theory very well documented at the Custom Knife Plugins page, but if you are not a developer, maybe you will learn faster using a practical example.

Our plugin will be named movile_uptime.rb (available on GitHub), and the file begins like this:

require 'chef/knife'

module Movile
  class MovileUptime < Chef::Knife

We already have a lot of plugins named “knife movile XYZ”, so this one will be invoked as “knife movile uptime”.

Note that, if you want, you can use the class name to override an existing knife command. For example, to override “knife cookbook upload”, you can do this:

class CookbookUpload < Chef::Knife
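
To make the naming rule concrete, here is a small standalone sketch of how a CamelCase class name lines up with the words typed after knife. The helper knife_command_for is hypothetical and written only for illustration; it is not Chef's own implementation, just a way to see the convention at a glance.

```ruby
# Hypothetical helper, for illustration only: it mimics the naming
# convention described above and is not part of Chef.
def knife_command_for(class_name)
  # Split "MovileUptime" into its capitalized words: ["Movile", "Uptime"]
  words = class_name.scan(/[A-Z][a-z0-9]*/)
  "knife " + words.map(&:downcase).join(" ")
end

puts knife_command_for("MovileUptime")   # knife movile uptime
puts knife_command_for("CookbookUpload") # knife cookbook upload
```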

Tip: You can find the plugins with a lot of functional examples in the directory “/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-VERSION/lib/chef/knife”.

In our plugin, we would like to have an option to limit the number of nodes shown in the output. For example, we may only want to know the “top 30” machines with the highest uptime. So:

  option :limit,
   :short => "-l INT",
   :long => "--limit INT",
   :description => "The number of rows to return",
   :default => 30,
   :proc => lambda { |i| i.to_i }

The user sets this value with the option “-l number”, and we can read it from config, as in the next code snippet:

  def run   # start run
    all_nodes = []
    count = 0
    limit = config[:limit] # 30 is default
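
One detail worth knowing about the converter used for the limit option: String#to_i returns 0 for non-numeric input, so a typo such as “-l abc” would quietly print no rows. Below is a minimal sketch of a stricter conversion, assuming you would rather fail loudly. The method name parse_limit is hypothetical; it could be called from inside the lambda given to :proc.

```ruby
# Hypothetical stricter converter for the --limit value.
# Integer() raises ArgumentError on input such as "abc",
# whereas "abc".to_i silently returns 0.
def parse_limit(value)
  number = Integer(value, 10)
  raise ArgumentError, "limit must be positive" unless number > 0
  number
end

puts parse_limit("30") # 30
puts "abc".to_i        # 0

begin
  parse_limit("abc")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```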

To do the search using the parameter that the user sent to the plugin, we can do something like this:

    q = Chef::Search::Query.new
    query = @name_args[0] || "chef_environment:production" # search default
    ui.warn "Searching for '#{query}' with #{limit} results ..."

    q.search(:node, query) do |node|
      all_nodes << node
    end

Note that if the user doesn’t specify a search query, we search for nodes in the “production” environment.

After this, we can print a nice header for the user:

    header = "%-2s  %-20s %-20s %-13s %-15s %-15s" % ["ID", "Name", "IP Address", "Uptime", "Virtualization", "Platform OS"]
    puts "#{ui.color(header, :cyan)}"

At this point we already have the result of the query, but in no particular order.

Let’s try to understand how ohai keeps this data:

[Figure: Uptime Flow]

As you can see, once the uptime string is split on spaces, uptime[0] is a number whose unit may be days, hours, minutes or seconds. So, to sort the nodes correctly, we need to check the unit named in uptime[1]. Example:

    #
    ## Begin: ORDER by uptime
    #
    all_nodes.sort! do |n1, n2|

      uptime_split_1 = n1.uptime.split(" ")
      uptime_split_2 = n2.uptime.split(" ")

      uptime_n1_0 = uptime_split_1.first
      uptime_n2_0 = uptime_split_2.first
      uptime_n1_1 = uptime_split_1[1]
      uptime_n2_1 = uptime_split_2[1]

      # order by days, ignore hours and minutes
      if uptime_n1_1 == "days" or uptime_n1_1 == "day"
        updays_n1 = uptime_n1_0
      else
        updays_n1 = 0
      end

      if uptime_n2_1 == "days" or uptime_n2_1 == "day"
        updays_n2 = uptime_n2_0
      else
        updays_n2 = 0
      end

First, we split the uptime string on spaces (" "). If uptime[1] is “day” or “days”, we use the number in uptime[0] as updays. Otherwise the unit is probably “minutes”, “hours” or even “seconds”, and for our purposes it is enough to treat that node as having 0 days of uptime.
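
The same rule can be sketched as a small standalone method. The name uptime_days is hypothetical, and the sample strings are assumptions about what an ohai-style uptime value looks like; the nil check covers nodes that have no uptime attribute.

```ruby
# Hypothetical helper: returns the number of whole days in an
# ohai-style uptime string, or 0 when the leading unit is not days.
def uptime_days(uptime)
  return 0 if uptime.nil?          # node has no uptime attribute
  value, unit = uptime.split(" ")  # "5 days 02 hours" -> "5", "days"
  %w[day days].include?(unit) ? value.to_i : 0
end

puts uptime_days("5 days 02 hours 10 minutes 30 seconds") # 5
puts uptime_days("32 minutes 10 seconds")                 # 0
puts uptime_days(nil)                                     # 0
```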

It is also a good idea to check that n1.uptime exists before trying to split it; I left those checks out here to keep the code short. You can find the full version on GitHub. Now we can do the “order by” with this Ruby code:

      # if both nodes have the uptime property
      if n2.has_key?("uptime") && n1.has_key?("uptime")
        # compare the day counts as integers, highest first
        updays_n2.to_i <=> updays_n1.to_i
      elsif n1.has_key?("uptime") # only n1 has the property
        # n1 should come before n2
        -1
      elsif n2.has_key?("uptime") # only n2 has the property
        # n2 should come before n1
        1
      else # neither node has the property
        0
      end
    end
    #
    ## End: ORDER by uptime
    #

At this point, we already have the result of our search, ordered by uptime, and we just need to display it on the screen:

    all_nodes.each do |node|

      count += 1
      if node.has_key?("ec2")
        ip = node['ec2']['public_ipv4']
      else
        ip = node['ipaddress']
      end

      hostname = node['hostname']

      # x days or hours to display
      if node.has_key?("uptime")
        uptime_split = node['uptime'].split(" ")
        updays = uptime_split[0] + " " + uptime_split[1]
      else
        updays = 0
      end

      # virtual or physical
      if node['virtualization'] && node['virtualization']['role']
         system = node['virtualization']['system']
         role = node['virtualization']['role']
         virt = "#{system}:#{role}"
      else
         virt = "physical"
      end

      platform = node['platform'] + " " + node['platform_version']

      # limit output
      if limit >= count
         output = "%02d  %-20s %-20s %-13s %-15s %-15s" % [ "#{count}","#{hostname}","#{ip}","#{updays}","#{virt}","#{platform}" ]
         ui.msg(output)
      else
        break
      end
    end

    ui.msg "Total #{all_nodes.size} nodes found"

  end # End def run

  end # End class MovileUptime
end # End module Movile
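
If you want to try the row formatting without a Chef server, the sketch below builds a single row from a plain hash. The method name format_row and the sample values are hypothetical; the format string is the one used in the loop above, and the EC2 address lookup is left out to keep it short.

```ruby
# Hypothetical helper: builds one output row from a node-like hash.
def format_row(index, node)
  virt = node["virtualization"]
  virt_label = virt && virt["role"] ? "#{virt['system']}:#{virt['role']}" : "physical"

  # Keep only the first number and its unit, e.g. "5 days".
  uptime = node["uptime"].to_s.split(" ").first(2).join(" ")
  uptime = "0" if uptime.empty?

  platform = "#{node['platform']} #{node['platform_version']}"

  format("%02d  %-20s %-20s %-13s %-15s %-15s",
         index, node["hostname"], node["ipaddress"], uptime, virt_label, platform)
end

# Sample data, invented for this example.
node = {
  "hostname" => "node1",
  "ipaddress" => "192.168.0.1",
  "uptime" => "5 days 02 hours 10 minutes 30 seconds",
  "virtualization" => { "system" => "vmware", "role" => "guest" },
  "platform" => "centos",
  "platform_version" => "6.10"
}

puts format_row(1, node).rstrip
```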

The final output should look something like this:

$ knife movile uptime "name:node*" -l 3
WARNING: Searching for 'name:node*' with 3 results ...
ID  Name                IP Address        Uptime       Virtualization  Platform OS
01  node1               192.168.0.1       5 days       vmware:guest    centos 6.10
02  node2               192.168.0.2       1 days       vmware:guest    centos 6.10
03  node3               192.168.0.3       32 minutes   vmware:guest    centos 6.10

Install the plugin

To install the plugin, just copy the file to your ~/.chef/plugins/knife directory.

We prefer to keep all the plugins in our chef-repo (managed by git) and we just make a link:

$ ln -s ~/chef-repo/plugins ~/.chef/plugins

With this link in place, our whole engineering team has instant access to new plugins and new features.

I really hope you enjoyed this content, but I must remind you that I’m not a developer, and there is probably a better way to do this ;)

