Error Handling In Ansible Playbooks

In this article, we will focus on an important Ansible topic: error handling in Ansible playbooks. By the end of this article, you should have a fair understanding of Ansible error handling and know how to handle the different errors that come up when running playbooks on Linux.

Error Handling in Playbooks

Errors and bugs are common in any software you use, and Ansible is no exception. When you start working on a real-world project with Ansible, you will face all kinds of errors, whether they are product bugs or human errors.

Some errors can be avoided by implementing best practices and using tools that have linting and debugging features. In this case, my suggestion is to use VS Code, which has an excellent plugin ecosystem to support Ansible development.

Take a look at the following VS Code extensions, which help you implement best practices by enforcing some rules and spotting basic syntax errors.

Below is a sample image from vscode ansible linter.

Now let’s go over a few scenarios to understand different types of failures and how to handle them.

How to Handle Syntax Errors in Ansible

The first thing you need to check before running any playbook is syntax errors. You can use the "--syntax-check" flag along with the ansible-playbook command to check for a syntax error.

Take a look at the playbook below. It has two issues. First, "role" is not a valid keyword; it should be "roles". Second, neither of the two roles exists.

---
- name: Handle Failure
  hosts: localhost

  role:
   - sample-role1
   - sample-role2

When I run the playbook with --syntax-check, it spots the first error in the playbook.

$ ansible-playbook --syntax-check playbook.yml

Once I fix the error and rerun the command, it shows the next error: sample-role1 is not found in any of the roles_path locations.

The drawback of this approach is that when a playbook has several errors, only the first one is reported, so you have to fix it and rerun the command to see the next one.
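For reference, a corrected version of the play might look like the sketch below. It only fixes the keyword; the role names are the placeholders from the example above, so the syntax check will still report a missing role until roles with those names exist in one of your roles_path locations.

```yaml
---
- name: Handle Failure
  hosts: localhost

  # "roles" (plural) is the valid play-level keyword
  roles:
    # Placeholder role names from the example; each one must
    # exist in a roles_path location for the check to pass
    - sample-role1
    - sample-role2
```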

How to Handle Task Failures in Ansible

Before learning how to handle a failed task, you should know how Ansible runs tasks and what happens when a task fails.

Ansible runs tasks in the defined order, executing each task on the target hosts in parallel batches (the default is forks=5). When a task fails on a host, Ansible marks that host as failed and stops running further tasks on it.

You can handle task-level failure using any of the below approaches.

Using ignore_errors directive.
Grouping tasks under the block directive.

1. Handling Failures using ignore_errors Directive

Consider the following playbook which has two tasks to download and install the chrome deb package.

---
- name: Handle Failure with ignore_errors
 hosts: localhost

 tasks:
   - name: Download chrome .deb file
     ansible.builtin.get_url:
       url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
       dest: /tmp/chrome.deb

   - name: Install from /tmp/chrome.deb
     ansible.builtin.apt:
       deb: /tmp/chrome.deb

I purposely gave the wrong URL, which makes the first task fail. As stated already, once a task fails on a host, no further tasks run on that host, so the second task is never run.

fatal: [localhost]: FAILED! => {"changed": false, "dest": "/tmp/chrome.deb", "elapsed": 0, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb"}

Now let me run the same playbook again, but with "ignore_errors: true".
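The modified playbook could look like the sketch below. Placing the directive on the download task is my assumption; the rest is the same playbook as before.

```yaml
---
- name: Handle Failure with ignore_errors
  hosts: localhost

  tasks:
    - name: Download chrome .deb file
      ansible.builtin.get_url:
        url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
        dest: /tmp/chrome.deb
      # Assumed placement: the failure is still reported,
      # but Ansible keeps running tasks on this host
      ignore_errors: true

    - name: Install from /tmp/chrome.deb
      ansible.builtin.apt:
        deb: /tmp/chrome.deb
```

Keep in mind that ignore_errors only applies when the task actually runs and returns a "failed" result. It does not cover undefined variable errors, connection failures, or syntax errors.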

In the output of this run, the first task ran and failed, but the failure was ignored, which allowed the second task to run.

Heads Up: The ignore_errors directive can be added at the task level or play level.

2. Error Handling using Block-Rescue-Always Directives

Sometimes a group of tasks depend on each other. If one of them fails, only its dependent tasks should be skipped, rather than stopping every remaining task on that host.

The playbook used in the previous section is a perfect example of this scenario. The installation task depends on the download task, so both tasks can be grouped under the block directive.

---
- name: Handle Failure with ignore_errors
  hosts: localhost

  tasks:
   - name: Group the tasks
     block:
       - name: Download chrome .deb file
         ansible.builtin.get_url:
           url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
           dest: /tmp/chrome.deb

       - name: Install from /tmp/chrome.deb
         ansible.builtin.apt:
           deb: /tmp/chrome.deb
     ignore_errors: true

I used the same playbook but moved both tasks under the block directive and added ignore_errors at the block level. In the output of this run, both tasks automatically inherit the ignore_errors directive.

You can also add the "rescue" and "always" directives along with the "block" directive. The tasks under the rescue directive run if any task under the block directive fails. This is very useful for cleanup activities. The tasks under the always directive run irrespective of the status of the block and rescue sections.

I have added the rescue and always directives with tasks that print some messages.

tasks:
   - name: Group the tasks
     block:
       - name: Download chrome .deb file
         ansible.builtin.get_url:
           url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
           dest: /tmp/chrome.deb

       - name: Install from /tmp/chrome.deb
         ansible.builtin.apt:
           deb: /tmp/chrome.deb
     rescue:
       - name: Rescue task
         ansible.builtin.debug:
           msg: "Some cleanup activity..."
     always:
       - name: Task that always runs
         ansible.builtin.debug:
           msg: "Task that always run"

The rescue task ran because a task in the block section failed. Finally, the always task ran.
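In a real playbook, the rescue section would usually do actual cleanup instead of printing a message. The sketch below is one possible shape, not the playbook from the run above. It uses the ansible_failed_task variable that Ansible makes available inside rescue, and removing /tmp/chrome.deb as the cleanup step is my assumption.

```yaml
---
- name: Handle failure with block and rescue
  hosts: localhost

  tasks:
    - name: Group the tasks
      block:
        - name: Download chrome .deb file
          ansible.builtin.get_url:
            url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
            dest: /tmp/chrome.deb

        - name: Install from /tmp/chrome.deb
          ansible.builtin.apt:
            deb: /tmp/chrome.deb
      rescue:
        # ansible_failed_task holds the task that triggered the rescue
        - name: Report which task failed
          ansible.builtin.debug:
            msg: "Failed task: {{ ansible_failed_task.name }}"

        # Assumed cleanup step: remove any partially downloaded file
        - name: Remove the downloaded file
          ansible.builtin.file:
            path: /tmp/chrome.deb
            state: absent
```

If the rescue tasks succeed, Ansible treats the failure as handled and the play continues for that host.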

How to Stop All Plays on Failure

In the last section, we saw ways to keep running tasks even when a task fails. In some cases, we need the entire playbook to stop as soon as a single task fails. This can be achieved by setting "any_errors_fatal: true".

Stop Playbook Execution For Any Failures

When a play runs with "any_errors_fatal" set and a task fails, Ansible finishes that task on all hosts in the current batch and then stops the play on all hosts.

Heads Up: The directive any_errors_fatal can be set at the play level or task level.
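As a minimal sketch, the directive can be set at the play level like this. The play name and "hosts: all" are assumptions for illustration; the tasks are the same download and install tasks used earlier, with the deliberately wrong URL.

```yaml
---
- name: Stop the play on any failure
  hosts: all
  # A failure on any host stops the play on every host
  any_errors_fatal: true

  tasks:
    - name: Download chrome .deb file
      ansible.builtin.get_url:
        url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
        dest: /tmp/chrome.deb

    - name: Install from /tmp/chrome.deb
      ansible.builtin.apt:
        deb: /tmp/chrome.deb
```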

How to Handle Unreachable Host Error in Ansible

Another type of error in Ansible is the "unreachable host" error. You get this error when Ansible is not able to connect to a managed host defined in the inventory file. This can happen for many reasons: for example, the host definition may be wrong, or there may be a problem with the managed host itself.

If Ansible finds a node to be unreachable, it removes the node from the list of active hosts and does not run any further tasks on it.

I am running the same playbook used in the previous section but with a random hostname. The task failed with "unreachable".

You can add "ignore_unreachable: true" at the task or play level. Ansible then ignores the unreachable error for the current task and continues with the next task, without removing the host from the active list.
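Here is a minimal sketch of the task-level form. The play name, "hosts: all", and the use of the ping module are assumptions for illustration.

```yaml
---
- name: Handle unreachable hosts
  hosts: all

  tasks:
    # If the host is unreachable, the error is ignored and
    # the host stays in the list of active hosts
    - name: Check connectivity, ignoring unreachable errors
      ansible.builtin.ping:
      ignore_unreachable: true

    # Without the directive, an unreachable error here
    # removes the host from the play
    - name: Check connectivity again
      ansible.builtin.ping:
```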

How to Create User-Defined Failures in Ansible

Other than the Ansible-defined failures, you can also create your own rules to make a task fail using the "failed_when" directive. In fact, the Ansible linter suggests using "failed_when" instead of the "ignore_errors" directive.

You need to register the output of the task before you can run conditional checks on it. Here the registered variable is named "status".

To know more about Ansible registers, refer to the following guide.

Ansible Register Variable

I have added a condition that checks the status code of the registered output. A 404 status code will not be considered a failure; any other status code will.

- name: Download chrome .deb file
 ansible.builtin.get_url:
   url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
   dest: /tmp/chrome.deb
 register: status
 failed_when: status.status_code != 404

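You can also use the "and" and "or" operators to check multiple conditions. For example, to treat both 404 and 401 as non-failures, join the checks with "and"; joining them with "or" would make the condition true for every status code.

failed_when: status.status_code != 404 and status.status_code != 401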
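Multiple conditions can also be written as a YAML list, which Ansible joins with an implicit "and". The sketch below reuses the download task from above and fails only when the status code is neither 404 nor 401.

```yaml
- name: Download chrome .deb file
  ansible.builtin.get_url:
    url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
    dest: /tmp/chrome.deb
  register: status
  # List items are combined with an implicit "and":
  # fail only when the code is neither 404 nor 401
  failed_when:
    - status.status_code != 404
    - status.status_code != 401
```

The list form is easier to read once you have more than two conditions. For "or" logic, you still need a single expression.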

How to Handle Handler Task Failures in Ansible

We have a dedicated article for handlers. Refer to the following link and look out for the “Handling Failures” section to know how to handle failures in handler tasks.

How To Use Handlers In Ansible Playbooks

Conclusion

In this article, we discussed error handling in Ansible playbooks. We also looked at the different types of failures you will encounter in Ansible and some of the ways to handle them. Feedback is welcome in the comment section.

Resource:

Error Handling in Playbooks
