In this article, we will focus on one of the important Ansible topics: "Error Handling in Ansible Playbooks". By the end of this article, you should have a good understanding of Ansible error handling and of how to handle the different errors you may hit when running Ansible playbooks on Linux.
Error Handling in Playbooks
Errors and bugs are very common in any software you use, and Ansible is no exception. When you start working on a real-world project with Ansible, you will face all kinds of errors, whether a product bug or a human error.
Some errors can be avoided by following best practices and using tools that have linting and debugging features. For this, I suggest using VS Code, which has an excellent extension ecosystem to support Ansible development.
Take a look at the following linter and VS Code extension, which help you follow best practices by enforcing rules and spotting basic syntax errors.
- https://github.com/adrienverge/yamllint
- https://marketplace.visualstudio.com/items?itemName=redhat.ansible
Below is a sample image from the VS Code Ansible linter.
Now let’s go over a few scenarios to understand different types of failures and how to handle them.
How to Handle Syntax Errors in Ansible
The first thing you need to check before running any playbook is syntax errors. You can use the "--syntax-check" flag with the ansible-playbook command to check for them.
Take a look at the playbook below. It has two issues. First, role is not a valid keyword; it should be "roles". Second, neither of the listed roles exists.
---
- name: Handle Failure
  hosts: localhost
  role:
    - sample-role1
    - sample-role2
When I run the playbook with --syntax-check, it spots the first error in the playbook.
$ ansible-playbook --syntax-check playbook.yml
Once I fix that error and re-run the check, it shows the next error: sample-role1 is not found in any of the roles_path locations.
The drawback of this approach is that when a playbook has several errors, the syntax check reports only the first one, so you have to fix it and re-run the command to see the next one.
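For reference, a corrected version of the play replaces role with the valid roles keyword. This is a sketch: sample-role1 and sample-role2 are the article's placeholder names, and the syntax check only passes once those roles actually exist in one of the roles_path locations (for example, a roles/ directory next to the playbook).

```yaml
---
- name: Handle Failure
  hosts: localhost
  # "roles" (plural) is the valid play keyword
  roles:
    - sample-role1   # must exist, e.g. roles/sample-role1/tasks/main.yml
    - sample-role2
```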
How to Handle Task Failures in Ansible
Before learning how to handle a failed task, you should know how Ansible runs tasks and what happens when a task fails.
Ansible runs tasks in the order they are defined, executing each task on multiple hosts in parallel batches (controlled by the forks setting, 5 by default). When a task fails on a host, Ansible marks that host as failed and runs no further tasks on it.
You can handle task-level failures using either of the following approaches.
- Using the ignore_errors directive.
- Grouping tasks under the block directive.
1. Handling Failures using the ignore_errors Directive
Consider the following playbook, which has two tasks: one downloads the Chrome .deb package and the other installs it.
---
- name: Handle Failure with ignore_errors
  hosts: localhost
  tasks:
    - name: Download chrome .deb file
      ansible.builtin.get_url:
        url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
        dest: /tmp/chrome.deb

    - name: Install from /tmp/chrome.deb
      ansible.builtin.apt:
        deb: /tmp/chrome.deb
I purposely gave a wrong URL (note the extra "1" in "amd641.deb"), which makes the first task fail. As stated earlier, once a task fails on a host, no further tasks run on that host, so the second task never runs.
fatal: [localhost]: FAILED! => {"changed": false, "dest": "/tmp/chrome.deb", "elapsed": 0, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb"}
Now let me run the same playbook again, but with "ignore_errors: true".
In the above image, you can see that the first task ran and failed, but the failure was ignored, which allowed the second task to run.
Heads Up: The ignore_errors directive can be added at the task, block, or play level.
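The article doesn't show exactly where ignore_errors was added for the run above. A minimal sketch, assuming it is set on the download task only, could look like this, with the play-level alternative left commented out:

```yaml
---
- name: Handle Failure with ignore_errors
  hosts: localhost
  # Play-level alternative: uncomment to ignore errors for every task in the play
  # ignore_errors: true
  tasks:
    - name: Download chrome .deb file
      ansible.builtin.get_url:
        url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
        dest: /tmp/chrome.deb
      # Task-level: only this task's failure is ignored, so the next task still runs
      ignore_errors: true

    - name: Install from /tmp/chrome.deb
      ansible.builtin.apt:
        deb: /tmp/chrome.deb
```

Setting it on a single task keeps the scope narrow; the play-level form also hides failures in tasks you might not have meant to tolerate.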
2. Error Handling using Block-Rescue-Always Directives
Sometimes a group of tasks depends on each other: if one task in the group fails, only the tasks that depend on it should be affected, rather than all the remaining tasks on that host.
The playbook from the previous section is a perfect example of this scenario. The installation task depends on the download task, so both tasks can be grouped under the block directive.
---
- name: Handle Failure with ignore_errors
  hosts: localhost
  tasks:
    - name: Group the tasks
      block:
        - name: Download chrome .deb file
          ansible.builtin.get_url:
            url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
            dest: /tmp/chrome.deb

        - name: Install from /tmp/chrome.deb
          ansible.builtin.apt:
            deb: /tmp/chrome.deb
      ignore_errors: true
I used the same playbook but moved both tasks under the block directive and added ignore_errors at the block level. If you look at the output below, both tasks automatically inherited the ignore_errors directive.
You can also add the "rescue" and "always" directives along with the "block" directive. The tasks under rescue run if any of the tasks under block fails, which is very useful for cleanup activities. The tasks under always run regardless of whether the block and rescue tasks succeeded or failed.
I have added rescue and always sections with tasks that print some messages.
tasks:
  - name: Group the tasks
    block:
      - name: Download chrome .deb file
        ansible.builtin.get_url:
          url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
          dest: /tmp/chrome.deb

      - name: Install from /tmp/chrome.deb
        ansible.builtin.apt:
          deb: /tmp/chrome.deb
    rescue:
      - name: Rescue task
        ansible.builtin.debug:
          msg: "Some cleanup activity..."
    always:
      - name: Task that always runs
        ansible.builtin.debug:
          msg: "Task that always run"
The rescue task ran because a task under the block section failed. Finally, the always task ran.
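The rescue task above only prints a message. As a sketch of a more realistic cleanup, a rescue section can report which task failed using the ansible_failed_task and ansible_failed_result variables that Ansible sets inside rescue, and then remove a partially downloaded file. The wrong URL is reused here to force the failure; the cleanup step is an illustration, not part of the original playbook.

```yaml
tasks:
  - name: Group the tasks
    block:
      - name: Download chrome .deb file
        ansible.builtin.get_url:
          # Deliberately wrong URL, as in the earlier examples
          url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
          dest: /tmp/chrome.deb

      - name: Install from /tmp/chrome.deb
        ansible.builtin.apt:
          deb: /tmp/chrome.deb
    rescue:
      # ansible_failed_task and ansible_failed_result are only set inside rescue
      - name: Report which task failed
        ansible.builtin.debug:
          msg: "'{{ ansible_failed_task.name }}' failed: {{ ansible_failed_result.msg | default('no message') }}"

      - name: Clean up any partial download
        ansible.builtin.file:
          path: /tmp/chrome.deb
          state: absent
    always:
      - name: Task that always runs
        ansible.builtin.debug:
          msg: "Block finished"
```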
How to Stop All Plays on Failure
In the last section, we saw ways to keep running tasks even when a task fails. In some cases, you need the entire playbook to stop as soon as a single task fails. You can achieve this by setting "any_errors_fatal: true".
When "any_errors_fatal" is set and a task fails on any host, Ansible finishes that task on all hosts in the current batch and then stops the play on all hosts.
Heads Up: The directive any_errors_fatal can be set at the play level or task level.
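As a minimal sketch, assuming an inventory group named webservers (a hypothetical name) that contains several hosts, play-level any_errors_fatal could be set like this. If the download fails on any one host, the other hosts in the batch finish that task and then the play stops everywhere, so the install task never runs on any host.

```yaml
---
- name: Stop the play on the first failure
  hosts: webservers        # hypothetical inventory group with several hosts
  any_errors_fatal: true   # a failure on any host ends the play for all hosts
  tasks:
    - name: Download chrome .deb file
      ansible.builtin.get_url:
        # Deliberately wrong URL, as in the earlier examples
        url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd641.deb
        dest: /tmp/chrome.deb

    - name: Install from /tmp/chrome.deb
      ansible.builtin.apt:
        deb: /tmp/chrome.deb
```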
How to Handle Unreachable Host Error in Ansible
Another type of error in Ansible is the "unreachable host" error. You get this error when Ansible cannot connect to a managed host defined in the inventory file. This can happen for many reasons: you might have given a wrong host definition, or there might be a problem with the managed host itself.
If Ansible finds a node to be unreachable, it removes the node from the list of active hosts and runs no further tasks on it.
Here I am running the same playbook from the previous section, but against a random hostname. The task fails with "unreachable".
You can add "ignore_unreachable: true" at the task or play level. Ansible then ignores the unreachable error for that task and runs the next task against the host, without removing the host from the list of active hosts.
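A minimal sketch of task-level ignore_unreachable, assuming a hypothetical host named unreachable-host.example is defined in your inventory but cannot be reached. Fact gathering is turned off because the implicit fact-gathering task would otherwise hit the unreachable host before the tasks below get a chance to run.

```yaml
---
- name: Handle unreachable hosts
  hosts: unreachable-host.example   # hypothetical host defined in the inventory
  gather_facts: false               # avoid failing in fact gathering first
  tasks:
    - name: Ping the host
      ansible.builtin.ping:
      ignore_unreachable: true      # report the error but keep the host active

    - name: Next task still runs for this host
      ansible.builtin.debug:
        msg: "Still processing {{ inventory_hostname }}"
```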
How to Create User-Defined Failures in Ansible
Apart from the failures that Ansible defines, you can also create your own rules for failing a task using the "failed_when" directive. In fact, the Ansible linter suggests using "failed_when" instead of the "ignore_errors" directive.
To check a condition against a task's result, you first need to register the task's output in a variable. In the example below, the registered variable is named "status".
To learn more about Ansible registers, refer to the following guide.
I have added a condition that checks the status code in the registered output. A 404 status code is not considered a failure, while any other status code is.
- name: Download chrome .deb file
  ansible.builtin.get_url:
    url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    dest: /tmp/chrome.deb
  register: status
  failed_when: status.status_code != 404
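When writing a condition like this, it helps to see what the registered variable actually contains. A small sketch: while developing the condition, temporarily use ignore_errors on the download task so that a following debug task always runs and prints the whole registered result.

```yaml
- name: Download chrome .deb file
  ansible.builtin.get_url:
    url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    dest: /tmp/chrome.deb
  register: status
  ignore_errors: true   # temporary, so the debug task below always runs

- name: Show the registered result
  ansible.builtin.debug:
    var: status         # prints every key returned by get_url
```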
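You can also combine multiple conditions with the Jinja2 "and" and "or" operators. For example, to treat both 404 and 401 as acceptable, the task should fail only when the status code is neither of them, so the conditions must be joined with "and". Joining them with "or" would always be true, because no status code can equal both 404 and 401.
failed_when: status.status_code != 404 and status.status_code != 401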
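Ansible also accepts a list of conditions for failed_when and combines the list items with "and". A sketch of the same 404/401 rule in that form, with an equivalent single expression using the Jinja2 "not in" operator shown as a comment:

```yaml
- name: Download chrome .deb file
  ansible.builtin.get_url:
    url: https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    dest: /tmp/chrome.deb
  register: status
  # List items are joined with "and": the task fails only when
  # the status code is neither 404 nor 401.
  failed_when:
    - status.status_code != 404
    - status.status_code != 401
  # Equivalent single expression:
  # failed_when: status.status_code not in [401, 404]
```

The list form scales better as conditions are added, while the "not in" form reads more naturally when you are simply whitelisting status codes.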
How to Handle Handler Task Failures in Ansible
We have a dedicated article on handlers. Refer to the following link and look for the “Handling Failures” section to learn how to handle failures in handler tasks.
Conclusion
In this article, we discussed error handling in Ansible playbooks. We also looked at the different types of failures you may encounter in Ansible and some of the ways to handle them. Feedback is welcome in the comment section.