I want to ssh into a server, run vmstat for a while, and then retrieve vmstat's output, all automatically from a Python program. This turns out to have several pitfalls. Here is a simulation of the problems on the command line.
Take 1
My first thought was to ssh in, run vmstat, and redirect its output to a file on the remote server. Later, I would kill ssh and use scp to retrieve the log file from the remote server. Here is the command-line equivalent:
$ ssh 'couffable@asgard' 'vmstat 1 2>&1 >/home/couffable/remote-vmstat.log' &
[2] 10405
On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
20921 ? Ss 0:00 bash -c vmstat 1 2>&1 >/home/couffable/remote-vmstat.log
20938 ? S 0:00 vmstat 1
This starts things nicely and gives me the PID of the ssh process. The trouble is when I want to stop the process:
On local machine
$ kill 10405
$ fg # to get bash to retrieve ssh's status and remove it from zombie list
But on the remote machine, vmstat is still running!
[couffable@asgard ~]$ ps ax | grep vmstat
20921 ? Ss 0:00 bash -c vmstat 1 2>&1 >/home/couffable/remote-vmstat.log
20938 ? S 0:00 vmstat 1
This won't do at all.
Take 2
I noticed that if I run vmstat through ssh without redirecting vmstat's output on the remote machine, ssh shows vmstat's output on my terminal. So I said, why not redirect ssh's output? Here is what I tried:
On local machine:
$ ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log &
[2] 10569
On the remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
21239 ? Ss 0:00 vmstat 1
Bingo! Now vmstat's output is being stored in a local file, not on the remote machine. Or is it? I notice that on the local machine:
$ # press enter
[2]+ Stopped ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log
That is, vmstat/ssh's output is not getting logged because ssh stops when backgrounded. Puff!
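A likely cause, stated here as an assumption since nothing above diagnoses it: ssh forwards its standard input to the remote command, and a background job that tries to read from the terminal gets stopped (SIGTTIN). Giving the child an empty stdin avoids the read altogether. A minimal Python sketch of that idea, with the Python interpreter standing in for ssh and a made-up helper name `read_child_stdin`:

```python
import subprocess
import sys

def read_child_stdin():
    """Run a child whose stdin is /dev/null and return what it managed to read."""
    # Stand-in for ssh: the child tries to read all of its standard input.
    child_code = "import sys; sys.stdout.write(repr(sys.stdin.read()))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdin=subprocess.DEVNULL,   # child sees EOF at once and never touches the terminal
        stdout=subprocess.PIPE,
        text=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(read_child_stdin())
```

On the command line, `ssh -n` or a `</dev/null` redirection should have the same effect for a backgrounded ssh.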
Take 3
I decided that I won't background ssh.
On local machine:
$ ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log
On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
21624 ? Ss 0:00 vmstat 1
After some time, again on local machine (in another terminal):
$ ps ax | grep vmstat
10880 pts/1 S+ 0:00 ssh couffable@asgard vmstat 1
$ kill 10880
And yes, the vmstat on the remote machine dies too. Now, will this work in Python?
Take 4
The trouble with carrying Take 3 over to Python is that I can't invoke ssh directly with os.spawnlp: the redirection won't happen, because redirection operators are shell syntax. For redirection, I need a shell. So I could do something like:
>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "ssh couffable@asgard vmstat 1", "2>&1", ">vmstat.log")
>>> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 144 82808 231972 1293708 0 0 2 2 0 2 1 0 99 0
What? No redirection? Not good. After killing the ssh processes, I try again, this time with the redirections inside the command string passed to sh -c:
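Why the first attempt printed to the terminal: `sh -c` treats only the first argument after `-c` as the command string. Any further arguments become `$0`, `$1`, and so on for that command, and are never parsed as redirections. The sketch below makes this visible with a harmless `echo` in place of ssh (the helper name `show_extra_args` is made up for illustration):

```python
import subprocess

def show_extra_args():
    """Return what sh sees as $0 and $1 when extra arguments follow the -c string."""
    result = subprocess.run(
        # Same argument layout as the failed os.spawnlp call, with echo instead of ssh.
        ["sh", "-c", 'echo "0=$0 1=$1"', "2>&1", ">vmstat.log"],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(show_extra_args())   # 0=2>&1 1=>vmstat.log
```

No vmstat.log file is created by this call, since the would-be redirections are just strings assigned to positional parameters.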
>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "ssh couffable@asgard vmstat 1 2>&1 >vmstat.log")
>>> pid
11935
Redirection is working now. I kill the process:
>>> os.kill(11935,9)
>>> pid, status = os.waitpid(11935, 0)
>>> status
9
OK, so far so good. But the ssh and vmstat processes are still running! (Killing ssh itself does stop both ssh and vmstat, like Take 3 and unlike Take 1.) What gives? The signal went only to sh, and killing sh does not kill the ssh it spawned as a child. Hmm... I don't really want sh to spawn ssh. I want sh to redirect standard output and error and then exec ssh.
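What `exec` changes can be checked without ssh. In this sketch (the function name `pids_with_exec` is hypothetical), an outer sh prints its PID and then execs an inner sh, which prints its own PID; the inner sh plays the role of ssh:

```python
import subprocess

def pids_with_exec():
    """Start sh, let it exec a second program, and report the PIDs involved."""
    # The outer sh expands the first $$; the single-quoted $$ is left for the
    # inner sh, which replaces the outer one because of exec.
    script = "echo $$; exec sh -c 'echo $$'"
    proc = subprocess.Popen(["sh", "-c", script], stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    before, after = (int(line) for line in out.split())
    return proc.pid, before, after

if __name__ == "__main__":
    print(pids_with_exec())
```

All three numbers should be equal: exec replaces the program running in the process without creating a new process, so a signal sent to the PID returned by the spawn call reaches the exec'ed program directly.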
Take 5
>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "exec ssh couffable@asgard vmstat 1 2>&1 >vmstat.log")
>>> pid
12080
On local machine:
$ ps ax | grep vmstat
12080 pts/4 S+ 0:00 ssh couffable@asgard vmstat 1
On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
24081 ? Ss 0:00 vmstat 1
Well and good. Now let's kill the ssh process.
On local machine:
>>> os.kill(12080, 9)
>>> pid, status = os.waitpid(12080, 0)
>>> status
9
On local machine:
$ ps ax | grep vmstat
{nothing}
On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
{nothing}
Yay!
The question is: all this mucking around with Unix plumbing is wonderful, but could I have done it faster if I'd written my own fork(), redirect, and exec mini-program instead of trying to invoke os.spawnlp(...) using sh magic?
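For comparison, here is a minimal sketch of such a mini-program, assuming a Unix system. The names `spawn_logged` and `stop` are made up for illustration. One deliberate difference: it sends both stdout and stderr to the log, which is what `>vmstat.log 2>&1` would do in the shell. The `2>&1 >vmstat.log` order used above leaves stderr on the original stdout.

```python
import os
import signal

def spawn_logged(argv, log_path):
    """Fork, point stdin at /dev/null and stdout/stderr at log_path, then exec argv."""
    pid = os.fork()
    if pid == 0:
        # Child: rewire the standard descriptors, then become the target program.
        try:
            devnull = os.open(os.devnull, os.O_RDONLY)
            log = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(devnull, 0)   # stdin  <- /dev/null
            os.dup2(log, 1)       # stdout -> log file
            os.dup2(log, 2)       # stderr -> same log file
            os.execvp(argv[0], argv)
        finally:
            # Reached only if something above failed; never fall back into the parent's code.
            os._exit(127)
    return pid

def stop(pid, sig=signal.SIGTERM):
    """Signal the child and reap it; return the raw wait status."""
    os.kill(pid, sig)
    _, status = os.waitpid(pid, 0)
    return status

# Usage sketch (not run here), with the host from the examples above:
#   pid = spawn_logged(["ssh", "couffable@asgard", "vmstat", "1"], "vmstat.log")
#   ... let it collect for a while ...
#   status = stop(pid)
```

Since no shell sits in between, the PID returned is the PID of ssh itself. The standard library's `subprocess.Popen(argv, stdin=subprocess.DEVNULL, stdout=log_file, stderr=subprocess.STDOUT)` performs the same fork, redirect, and exec steps, with `Popen.terminate()` and `Popen.wait()` covering the stop.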