Bash Logging Traps

June 3, 2019

At AppNexus, we use Bash to do some important things, and whenever you have a system that does important things, it’s critical to get logging right so that you can find out what happened when something goes wrong.

The strategy we use to log from Bash is outlined in this blog post. To summarize, we redirect STDOUT and STDERR in our Bash scripts to write into [logger](<http://man7.org/linux/man-pages/man1/logger.1.html>), with various options enabled. logger then sends our messages to syslog (as well as the terminal), and with some syslog configuration we make sure that the messages are routed into a centralized file.

The code to achieve this looks like this:

#!/bin/bash

exec 1> >(/usr/bin/logger -s -t "$(basename $0)")

echo "Hi, I'm using syslog!"

Output:

test.sh: Hi, I'm using syslog!

This is all well and good, and we’ve written many Bash scripts that use this approach to great success. However, there is an edge case to consider when doing this that we encountered recently and that I want to point out because it 1) can result in serious bugs, and 2) provides an opportunity to learn more about how Bash handles redirection and signals!

Cleanup Traps

It is a common pattern in our Bash scripts to trigger some cleanup work if the script exits for some reason (be it because an error was encountered, the script was forcibly killed, or we simply got to the end). To implement this, we use [trap](<http://man7.org/linux/man-pages/man1/trap.1p.html>) to set up a signal handler for whichever signals we consider important to intercept.

In this example, we set up a handler for the INT signal, so that if we are killed during execution, we execute the cleanup function:

#!/bin/bash

exec 1> >(/usr/bin/logger -s -t "$(basename $0)")

cleanup() {
    # Do something really important
    echo "Done with cleanup!"
    exit 1
}

trap cleanup INT

echo "Hi, I'm using syslog!"

while true; do
    # Wait indefinitely by running sleep in the background. Reason to run in 
    # the background is because when Bash is executing an external command in
    # the foreground, it does not handle any signals received until the
    # foreground process terminates.
    # See <http://mywiki.wooledge.org/SignalTrap#When_is_the_signal_handled.3F>
    # for more details.
    sleep 10 &
    wait $!
done

When we run the script, we should see:

$ ./test.sh
test.sh: Hi, I'm using syslog!

Now, in another shell, let’s find our process using ps:

$ ps afx
...
28242 pts/18   Ss     0:00  \\_ -bash
31316 pts/18   S+     0:00      \\_ /bin/bash ./test.sh
31317 pts/18   S+     0:00          \\_ /bin/bash ./test.sh
31321 pts/18   S+     0:00          |   \\_ /usr/bin/logger -s -t test.sh
31369 pts/18   S+     0:00          \\_ sleep 10

There are a couple of PIDs here, and it’s worth going into detail about what each one of them are:

28242 is the Bash shell from which we ran test.sh (our program).
31316 is test.sh itself.
31369 is the sleep command that we run in the background that allows us to wait indefinitely for a SIGINT.