The certainty of outcome offered by other Chef resources is notably lacking from the execute Resource, for Chef has no way of knowing the consequences of the provided shell script fragment. Fortunately, it’s possible to ensure idempotent behavior with the appropriate application of care. As an example, perhaps one needs to load several database dump files for an unenlightened web-based application that has not adopted a migrations-based strategy.
How to ensure the dump files are loaded in the correct order, but never more than once, while avoiding duplicate work should a dump file fail to import? One can leverage lexically sorted filenames coupled with lock files in such a scenario as demonstrated below.
db_files = ['func.sql', 'schema.sql', 'data.sql.bz2']
db_files_with_idx = db_files.inject({}) do |h, f|
  h[f] = "#{h.keys.length.to_s.rjust(2, '0')}_#{f}"
  h
end
db_files_with_idx.each do |name, name_with_idx|
  db_file = "/root/db_files/#{name_with_idx}"
  remote_file db_file do
    action :create_if_missing
    source "http://example.com/#{name}"
  end
end
For simplicity, the database files are defined directly in the recipe, but could be factored out into an attribute. A hash (a candidate for refactoring into a library later) is then built that maps each dump file to an index-prefixed filename for local storage. Afterward, each file is downloaded using the remote_file Resource.
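The refactoring hinted at above might look something like the following. This is only a sketch: the helper name indexed_names and its width parameter are hypothetical and not part of Chef. It builds the same mapping with each_with_index and makes the zero-padding explicit, since the padding is what keeps lexical order equal to list order.

```ruby
# Hypothetical helper (not part of Chef): map each dump file name to a
# zero-padded, index-prefixed name so that sorting the prefixed names
# lexically reproduces the order of the original list.
def indexed_names(files, width = 2)
  files.each_with_index.map do |name, idx|
    [name, format("%0#{width}d_%s", idx, name)]
  end.to_h
end

db_files = ['func.sql', 'schema.sql', 'data.sql.bz2']
db_files_with_idx = indexed_names(db_files)
```

With the default width of two digits, the lexical ordering holds for up to one hundred files; a longer list would need a wider prefix.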
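Inspecting db_files_with_idx in irb shows the resulting mapping (the order in which the keys are printed is irrelevant; only the numeric prefixes matter):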
irb(main):009:0> pp db_files_with_idx
{"data.sql.bz2"=>"02_data.sql.bz2",
"schema.sql"=>"01_schema.sql",
"func.sql"=>"00_func.sql"}
Next, the execute Resource is called upon, but as it is not idempotent on its own, the behavior must be supplied:
execute "load dump" do
  action :run
  cwd '/root'
  command <<-EOT
    # script from below
  EOT
  not_if { ::File.exist?('/root/db_files/.finished') }
end
A hint of that exists in the not_if block, which checks for the existence of a lock file signaling successful completion of the resource. However, more is required. In particular, a mechanism is necessary to handle a failure in the middle of an import. (MySQL is the database in question in this example.)
for f in $(ls db_files | sort) ; do
  # The final extension decides how the dump is read
  ext=$(echo "$f" | awk -F. '{print $NF}')
  lck="/root/db_files/.seen_${f}"
  # Skip successfully imported dump
  test -f "$lck" && continue
  case "${ext}" in
    bz2)
      cmd=bzcat
      ;;
    *)
      cmd=cat
      ;;
  esac
  echo "Loading database dump file: ${f}"
  # The pipeline's exit status is that of mysql, its last command
  ${cmd} "/root/db_files/${f}" | /usr/bin/mysql -u root my_db
  ret=$?
  if [ $ret -ne 0 ] ; then
    exit $ret
  else
    touch "$lck"
  fi
done
touch /root/db_files/.finished
First, the filenames of the database dumps are sorted to match the order defined earlier in the recipe and committed to disk by the remote_file Resource. To add some flexibility, the final extension is extracted using awk, allowing for bzip2 compressed dumps.
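For comparison, the same extension test is short in Ruby. The sketch below uses a hypothetical helper, reader_for, that is not part of Chef; it relies on File.extname, which returns only the last extension of a filename.

```ruby
# Hypothetical helper: choose the command that streams a dump file to
# stdout, based on its final extension (mirrors the shell case statement).
def reader_for(filename)
  case File.extname(filename)
  when '.bz2' then 'bzcat'
  else 'cat'
  end
end
```

Because only the last extension is examined, a name such as 02_data.sql.bz2 is treated as bzip2 compressed, while anything else falls through to cat.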
Next, a lock file unique to each database dump is tested for existence. If the lock file exists, the dump has been successfully imported and is skipped; as a result, an interruption of the chef-client run by failure or user action will not prevent the recipe from picking up exactly where it left off. Only upon successful importation of the data, as signaled by a return value of 0 from mysql, is a lock file written. Otherwise, the script exits with the non-zero error code, causing the execute Resource to raise an exception.
When success is total, the final lock file referenced in the earlier not_if block is created. Thereafter, the resource shall never run again, unless the lock file is disturbed.
The use of not_if and only_if in Chef resource definitions, along with careful sorting and locking inside the execute Resource, brings the loving embrace of idempotent behavior to shell script fragments. Of course, the above could be rewritten entirely in Ruby and run from within a ruby_block Resource, but the same concepts apply, so that rewrite is left as an exercise for the reader.
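As a starting point for a Ruby version, here is a minimal sketch of the sort-and-lock loop as plain Ruby. It is an illustration under stated assumptions, not a drop-in recipe: load_dumps and its importer argument are hypothetical names, and the importer stands in for whatever actually feeds a dump to the database.

```ruby
require 'fileutils'

# Hypothetical helper, not part of Chef: import every dump in +dir+ in
# lexical order, skipping files that already have a ".seen_" lock file.
# +importer+ is any callable that returns true when a given path was
# imported successfully (an assumption made for this sketch).
def load_dumps(dir, importer)
  finished = File.join(dir, '.finished')
  return :already_finished if File.exist?(finished)

  # Dir.children includes dot files, so lock files are filtered out here.
  dumps = Dir.children(dir).reject { |f| f.start_with?('.') }.sort
  dumps.each do |f|
    lock = File.join(dir, ".seen_#{f}")
    next if File.exist?(lock) # skip successfully imported dump

    raise "import failed: #{f}" unless importer.call(File.join(dir, f))

    FileUtils.touch(lock) # record success only after the import returns
  end

  FileUtils.touch(finished)
  :finished
end
```

Inside a recipe, a call like this could sit in the block of a ruby_block Resource, guarded by the same not_if check on the .finished file. A raised exception plays the role of the non-zero exit code in the shell version, and a later run resumes at the first dump without a lock file.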