Change the AWS account to launch instances with Chef Test Kitchen

The Chef SDK contains Test Kitchen, that can launch server instances to test your Chef cookbooks. Test Kitchen uses the “chef_zero” provisioner to use your workstation as the virtual Chef server.

To switch Test Kitchen to launch instances in another AWS account

  1. In the .kitchen.yml file update the
    1. availability_zone
    2. subnet_id
    3. aws_ssh_key_id
    4. security_group_ids. Make sure the management ports are open:
      1. for Linux: port 22 for SSH, or
      2. for Windows: port 5985 – 5986 for WinRM and port 3389 for Remote Desktop
    5. ssh_key
  2. In the default.rb attribute file update the
    1. …[‘aws_profile’]
  3. In the ~/.aws/credentials file update the
    1. [default] section

Test Chef cookbooks locally on a virtual workstation

When you use a virtual machine to write your Chef cookbooks you may want to test them locally with Vagrant.

This nested virtual machine cannot use a 64 bit operating system, because to run a 64 bit virtual machine, the host computer’s CPU has to provide the CPU Extensions. Currently only physical CPUs can provide the CPU Extensions, no virtualization software can do that.

The Chef Test Kitchen uses Vagrant to launch test instances. In the background Vagrant uses the VirtualBox engine, so your virtual workstation needs to have Vagrant and VirtualBox installed.

To make sure TestKitchen uses the 32 bit operating system first we will set up the 32 bit virtual machine in Vagrant.

Install the necessary software

  1. Install the Chef Development Kit
  2. Install VirtualBox
  3. Install Vagrant

Create the 32 bit virtual machine

  1. Open a terminal window
  2. Initialize the vagrant virtual machine. Vagrant will download the OS image and start the virtual machine.
     vagrant init hashicorp/precise32
  3. Stop the virtual machine
    vagrant halt
  4. Make sure all Vagrant virtual machines are off
     vagrant global-status

    The result looks like this:

    id name provider state directory
    ----------------------------------------------------------------------------------------------------------------------------
    21590cd default virtualbox running C:/Users/interview/Documents
  5. Shut down the running virtual machines. Add the id of the virtual machine to any command to administer it from any directory.
    vagrant halt 21590cd

Set up the Vagrant connection in the Chef cookbook for Test Kitchen

  1. Navigate to cookbook folder
  2. Edit the .kitchen.yml file:Set the network port forwarding in the driver section:
    driver:
      name: vagrant
      network:
        - ["forwarded_port", {guest: 2200, host: 2222}]
        - ["private_network", {ip: "127.0.0.1"}]
    
    platforms:
      - name: hashicorp/precise32
  3. Test the cookbook
    kitchen converge hashicorp-precise32

 

Migrating from Chef Client version 12 to 13

Chef is under heavy development, every new major version introduces new features, and many times changes, deprecates, or removes some commands or options.

Chef Client 13 introduced a new way of handling reboots and Windows scheduled tasks.

reboot resource

In Chef version 12, the “:reboot_now” action continued the execution of the Chef cookbook but after the specified “delay_mins” the instance was forcefully rebooted even if an application was still running. This could unexpectedly interrupt software installations that did not finish during the specified delay.

This is not a problem if the reboot resource was called with the “:request_reboot” action because in that case, the delay before the reboot starts when the cookbook execution has completed.

In Chef version 13, when the reboot command is ultimately sent to the operating system handling the”:reboot_now” or”:request_reboot” action, a Ruby error is raised to stop the execution of the Chef cookbooks. This prevents the execution of further commands, that would be interrupted by the imminent reboot but puts misleading error messages into the “C:/Chef/Cache/chef-stacktrace.out” file.

Generated at …
Chef::Exceptions::Reboot: Rebooting server at a recipe’s request. Details: {:delay_mins=>1, :reason=>”…”, :timestamp=>…, :requested_by=>”…”}

This is unnecessary in the case of the “:request_reboot” action because the instance is only rebooted when the Chef cookbook execution has already been completed. Chef still raises the error in both cases.

In Test Kitchen

In Chef 12 when the instance was rebooted the DevOps engineer had to wait for the instance to reboot and issue the “kitchen converge” command again to execute the cookbook again.

In Chef 13 Test Kitchen executes the “kitchen converge” command if the cookbook execution failed. This simulates the execution of the Chef Client when the instance starts after the reboot. The error raised by the reboot command helps to test cookbooks with multiple reboots without the need of paying attention to every reboot. This new behavior causes a timeout error when Test Kitchen tries to access the instance while it is still rebooting.

To avoid this we need to add lines to the provisioner section of the .kitchen.yml file

provisioner:
  name: chef_zero
  always_update_cookbooks: true
  retry_on_exit_code: # An array of exit codes that can indicate that kitchen should retry the converge command. Defaults to an empty array.
    - 35              # Reboot is scheduled
    - 20              # The Windows system cannot find the device specified.
    - 1               # Generic failure
  wait_for_retry: 120 # Number of seconds to wait between converge attempts. Defaults to 30.
                     # Chef starts another converge after reboot was requested.
                     # If the 'wait_for_retry' (seconds) is shorter than the 'delay_mins', the converge starts before the reboot happens.
                     # 'wait_for_retry' (set in seconds!) has to be greater than any 'delay_mins' passed to reboot
  max_retries: 10 # Number of times to retry the converge before passing along the failed status. Defaults to 1.

It is very important to set the value of the “wait_for_retry” option greater in seconds, than the longer “delay_mins” set in any “reboot” resources of the cookbook, for Test Kitchen to wait with the retry of the cookbook execution until the instance is rebooted. If Test Kitchen does not wait for the reboot, the next “converge” starts before the instance is rebooted, and the execution will be interrupted by the reboot. After the “wait_for_retry” delay Test Kitchen will try to connect to the instance for “max_retries” times waiting “wait_for_retry” seconds between the tries until the box starts to accept WINRM connections again.

 

windows_task resource

There are two major changes in the windows_task resource.

If the scheduled task already exists and the resource tries to create it again with the same values, an error is thrown

It looks like the error is in the step that tries to parse the start time of the task. Make sure

To change an existing scheduled task

Chef version 13 does not understand the “:change” action. To change an existing scheduled task, use the “:create” action. If you run your cookbooks on multiple Chef versions, you have to make sure those work on the old and the new versions of Chef.

The transition is not so simple, both versions of Chef throw errors when we switch to the :create action.

If you try to modify an existing scheduled task with the “:create” action in Chef 12:

NoMethodError: undefined method `tr’ for nil:NilClass

 

in Chef 13:

Invalid starttime value.

 

Use PowerShell to modify existing scheduled tasks.

To delay the execution of the chef-client scheduled task by two hours to give enough time for the application install between reboots:

$taskpath = "\\"
$taskname = "#{chef_client_task_name}"

$tsDelay = New-TimeSpan -Days 0 -Hours 2 -Minutes 0
$newStartTime = (get-date) + $tsDelay

$tsRepetition = New-TimeSpan -Days 0 -Hours 0 -Minutes 30
$tsDuration = ([timeSpan]::maxvalue)

$trigger = New-ScheduledTaskTrigger -Once -At $newStartTime -RepetitionInterval $tsRepetition -RepetitionDuration $tsDuration

$user = (schtasks.exe /query /s localhost /V /FO CSV | ConvertFrom-Csv | Where { $_.TaskName -eq ($taskpath + $taskname) })."Run As User"
$principal = New-ScheduledTaskPrincipal -UserID $user -LogonType S4U -RunLevel Highest

set-scheduledtask -TaskPath $taskpath -TaskName $taskname -Trigger $trigger -Principal $principal

To set the RunAs user of a Scheduled Task

$user = "NEW_USER_NAME"
$password = "NEW_USER_PASSWORD"

$taskpath = "\\"
$taskname = "TASK_NAME"

set-scheduledtask -TaskPath $taskpath -TaskName $taskname -User $user -Password $password

 

 

 

Calling a resource in the Chef recipe

During major Chef Client version upgrades, some instructions need to be changed based on the version of the Chef Client. For example, upgrading from version 12 to version 13, the “windows_task” resource requires a different action to make changes to existing scheduled tasks.

To call a Chef resource in your cookbook, you need another resource to call from. If you execute the notifies command by itself you get the error message:

NoMethodError
————-
undefined method `declared_key’ for “….[…]”:String

In this example, we call the “windows_task” resource to delay the execution of the chef-client by two hours to give enough time for the server configuration to complete. In Chef version 12 we need to set the action to “:change”, in Chef version 13, it needs to be “:create”.

if ( Chef::VERSION.to_f >= 13.0 )
  puts "Version 13.0 or higher"

  ruby_block "call delay chef-client" do
    block do
      # Do nothing, "block do" has to be here
    end
    notifies :create, "windows_task[delay chef-client]", :immediately
  end
else
  puts "Before version 13.0"

  ruby_block "call delay chef-client" do
    block do
      # Do nothing, "block do" has to be here
    end
    notifies :change, "windows_task[delay chef-client]", :immediately
  end
end

future_time = Time.now.utc + 7200               # Offset is in seconds: 2 hour delay
new_start_day = future_time.strftime('%m/%d/%Y')
new_start_time = future_time.strftime('%H:%M')

windows_task 'delay chef-client' do
  task_name 'chef-client'
  start_day new_start_day
  start_time new_start_time
  action :nothing
  guard_interpreter :powershell_script
  only_if "(Get-ScheduledTask | Where-Object {$_.TaskName -eq 'chef-client'}).TaskName.Length -gt 0"
end

 

Make decisions in your Chef recipe based on the version of the Chef Client

There are times when you have to make a decision in your Chef recipe, based on the version of the Chef Client installed on the node.

There are two ways to get the version of the installed Chef Client:

Chef::VERSION
node['chef_packages']['chef']['version']

To make a decision based on the installed Chef Client version

if ( Chef::VERSION.to_f >= 14.0 )
  puts "Version 14.0 or higher"
else
  puts "Before version 14.0"
end

 

Knife commands

Knife is a ChefDK command line tool to access the Chef server. We use it to upload our cookbooks, environment files, and data bags to the Chef server, and query the server for information on cookbooks and nodes.

Get the list of cookbooks used by a node with version information

knife node show <node-name> -a cookbooks

 

Test your cookbook in Chef Test Kitchen against multiple version of the Chef Client

In large environments, during the Chef Client version change, some older servers still run the prior version of the Chef Client, the newly created servers launch with the new version of the Chef Client.

It is very important to test your cookbooks with the old and the new versions of Chef Client.

To specify the Chef Client version for the Test Kitchen “converge” run, add the highlighted three lines to the provisioner section of the .kitchen.yml file:

provisioner:
  name: chef_zero
  product_name: chef          # Install the Chef Client
  product_version: 14.0.0   # Set the Chef Client version, default=latest
  install_strategy: always    # Forces the installation of the specified Chef Client version even if another version is installed

During “converge”, test kitchen will download and install the specified version of Chef Client.

Warning:

If the installed Chef Client version is newer than the specified version, Test Kitchen downloads the specified version but executes the already installed newer version.

 

windows_task, ArgumentError: invalid date

When the windows_task resource is called to “create” a Windows Scheduled Task that already exists, an error message is returned. In the past, the “modify” action was responsible for the modification of the scheduled tasks, since Chef 13 the “create” action would update the task if exists.

ArgumentError: …[…] (…:… line …) had an error: ArgumentError: windows_task[…] (… line …) had an error: ArgumentError: invalid date

C:/opscode/chef/embedded/lib/ruby/gems/2.4.0/gems/chef-13.6.4-universal-mingw32/lib/chef/provider/windows_task.rb:507:in `strptime’
C:/opscode/chef/embedded/lib/ruby/gems/2.4.0/gems/chef-13.6.4-universal-mingw32/lib/chef/provider/windows_task.rb:507:in `parse_day’
C:/opscode/chef/embedded/lib/ruby/gems/2.4.0/gems/chef-13.6.4-universal-mingw32/lib/chef/provider/windows_task.rb:239:in `convert_user_date_to_system_date’
C:/opscode/chef/embedded/lib/ruby/gems/2.4.0/gems/chef-13.6.4-universal-mingw32/lib/chef/provider/windows_task.rb:99:in `action_create’

It looks like Chef tries to get a date from the existing scheduled task, and cannot parse it successfully.

As we currently cannot update existing tasks, we need to put a guard into the windows_task resource to prevent the execution if the task exists.

 
startup_task_name = 'chef-client-at_startup'

windows_task "create #{startup_task_name}" do
  task_name "#{startup_task_name}"
  ...
  guard_interpreter :powershell_script
  only_if "(Get-ScheduledTask | Where-Object {$_.TaskName -eq "#{startup_task_name}"}).TaskName.Length -eq 0"
end

.NET Framework Detection in the Windows Registry

To determine which .NET framework is installed on the Windows computer check the values in the registry.

The HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full key contains two values you can check:

  • Release
  • Version

You can use InSpec, part of the Chef DK, to check the values:

describe registry_key('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full') do
  it { should have_property 'Release' }
  it { should have_property_value('Release', :dword, 460805) } # For dword use the decimal value, no quotes
end

describe registry_key('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full') do
  it { should have_property 'Version' }
  it { should have_property_value('Version', :string, '4.7.02053') }
end
.NET version Release hexadecimal Release decimal Version
4.6.2 60636 394806 4.6.01590
4.7 70805 460805 4.7.02053
4.7.1 709FE 461310 4.7.02558

For more information see

https://docs.microsoft.com/en-us/dotnet/framework/migration-guide/how-to-determine-which-versions-are-installed

 

Chef exit codes

Chef uses the standard RFC 062 exit codes.

In your .kitchen.yml file, you can supply an array of exit codes in the “retry_on_exit_code” option to retry the operation in case the Chef script execution is interrupted. The usual values are

 retry_on_exit_code: # An array of exit codes that can indicate that kitchen should retry the converge command. Defaults to an empty array.
   - 35              # Reboot is scheduled
   - 20              # The Windows system cannot find the device specified.
   - 1               # Generic failure

Chef exit codes

  0: SUCCESS            - Successful run     - Any successful execution of a Chef utility should return this exit code
  1: GENERIC_FAILURE    - Failed execution   - Generic error during Chef execution.
  2: SIGINT_RECEIVED    - SIGINT received    - Received an interrupt signal
  3: SIGTERM_RECEIVED   - SIGTERM received   - Received an terminate signal
 35: REBOOT_SCHEDULED   - Reboot Scheduled   - Reboot has been scheduled in the run state
 37: REBOOT_NEEDED      - Reboot Needed      - Reboot needs to be completed
 41: REBOOT_FAILED      - Reboot Failed      - Initiated Reboot failed - due to permissions or any other reason
 42: AUDIT_MODE_FAILURE - Audit Mode Failure - Audit mode failed, but chef converged successfully.
213: CLIENT_UPGRADED    - Chef upgrade       - Chef has exited during a client upgrade

Relevant Windows exit codes

 20: ERROR_BAD_UNIT     - The system cannot find the device specified.

The list of all Windows error codes is located at https://msdn.microsoft.com/en-us/library/windows/desktop/ms681381(v=vs.85).aspx

The Linux exit codes for