March 31, 2014

The Setup

Recently I spent some time setting up my Unix environment on the cluster of Linux machines hosted by the Columbia University Computer Science Department. CS students here at Columbia can gain access to these machines for $50 per semester, which is relatively cheap compared to renting a similarly-sized box from AWS (this would run you about $160 per semester under Amazon’s newly released pricing scheme).

Many, but not all, CS classes require the use of these accounts, so I can understand why they cost extra. On the other hand, you’d think that Columbia might be able to come up with this $50 surcharge somewhere in the $55,000 annual tuition its students pay (pardon the ramblings of a disgruntled student).

My first step in setting up my Unix environment, as has become my default ever since I read this article by Zach Holman, was to copy my customized dotfiles (.bashrc, .vimrc, etc.) from my local machine to the remote machine so that I could quickly and easily achieve continuity between my setups. To do this I used scp, or secure copy, a program that uses the ssh protocol to copy files between hosts on a network.

My scp command looked like this, using the “-r” option to recursively copy the entire directory called “.dotfiles”:

scp -r ~/.dotfiles [email protected]:~/.

Upon executing this command, I got this response in my terminal window:

setting up databases config...
done

Hmm…

Digging

When I ssh’d into my account on the Columbia Linux cluster, I found that the .dotfiles directory had definitely not been copied across the network from my local machine. I was puzzled, and my confusion was compounded by the message I was receiving about setting up some database configuration. What the hell was that?!

Then I opened the .bashrc on my remote machine and, lo and behold, I found the following two lines, leftovers from some old project I had worked on for class more than a year ago:

echo 'setting up databases config...'
...
echo 'done'

This is one of those bugs you hope isn’t true. You really hope that the success of a network file transfer program does not depend on whether you have a print statement in the .bashrc file of your destination computer.

But, indeed, it does.

WAT?

But why is this true? It turns out that scp expects to receive protocol data via the stdout channel. When we have echo statements in our .bashrc, they inject extra output into this channel, because .bashrc is run automatically when the ssh client connects, and that completely screws up the data transfer. Seems like a massive oversight in the design of scp, right?
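To see the effect without touching a real server, here is a minimal local sketch. It is a stand-in rather than the actual scp exchange: it assumes bash is installed and uses BASH_ENV (a file that a non-interactive bash sources at startup) to play the role of the noisy .bashrc. The variable names noisy_rc and leaked are made up for illustration.

```shell
# Local stand-in for a noisy remote .bashrc (hypothetical temp file).
noisy_rc=$(mktemp)
printf '%s\n' "echo 'setting up databases config...'" > "$noisy_rc"

# A non-interactive bash sources the file named by BASH_ENV on startup.
# The command itself ('true') prints nothing, so anything captured here
# came from the startup file.
leaked=$(BASH_ENV="$noisy_rc" bash -c 'true')

if [ -n "$leaked" ]; then
  echo "startup file wrote to stdout: $leaked"
else
  echo "clean"
fi

rm -f "$noisy_rc"
```

Any bytes that show up this way are bytes the program on the other end of the pipe did not expect, which is exactly what trips up scp.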

Well, there are a couple of very easy workarounds for this problem that actually give us a bit of insight into the different ways Unix shells can be used, namely interactively and non-interactively. If we wrap all the echo statements in our .bashrc in an if statement as shown below, they will only be executed when our ssh session is interactive, that is, when there is a human on the other end of the line (instead of, say, an instance of the scp program). This works because sshd sets the SSH_TTY variable only when a terminal is allocated for the session.

if [ "$SSH_TTY" ]
then
  echo 'xyz'
fi
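Another common way to get the same effect is to check whether the shell itself is interactive, which also covers sessions that do not come through ssh. The sketch below is one possible version, not something from the original .bashrc: the function name say_if_interactive is hypothetical, and it relies on the shell's $- special parameter, which contains the letter i when the shell is interactive.

```shell
# Hypothetical helper: print only when the shell is interactive.
say_if_interactive() {
  case "$-" in
    *i*) echo "$@" ;;  # interactive shell: safe to print
    *)   : ;;          # non-interactive (e.g. scp): stay silent
  esac
}

# Prints at a login prompt, stays quiet when scp connects.
say_if_interactive 'setting up databases config...'
```

Run as a script or through scp, this prints nothing; typed into a terminal session, it prints the message as before.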