Saturday, December 30, 2006

Tenth deadly sin of project planning

In an IEEE Software 'From the Editor' column (http://www.stevemcconnell.com/ieeesoftware/eic19.htm), Steve McConnell writes that the nine deadly sins of project planning are:



(1) Not planning at all

(2) Failing to account for all project activities

(3) Failure to plan for risk

(4) Using the same plan for every project

(5) Applying pre-packaged plans indiscriminately

(6) Allowing a plan to diverge from project reality

(7) Planning in too much detail too soon

(8) Planning to catch up later

(9) Not learning from past planning sins



I am managing a project right now, and I just can't figure out which sin I am committing: (2) or (7)? I suppose the answer is that it depends on the project and its circumstances. But of course. It always does.



And uh, so at least now I know that these are all the deadly sins of project planning, and other sins are presumably not deadly. They are merely extremely toxic. My project won't exactly die if I forget about some other practical issues; it will just go into convulsions, or stop breathing for a while, or sway gently from side to side when it should be rocking.



In other words, I still have to worry about the non-fatal issues in some particular order that is highly project dependent, and these so-called fatal sins have to be watched out for, but exactly when a sin is being committed also depends on the project.



Duh!



Why do these wise men write these articles with zero information? I just wasted a perfectly good five minutes that I could have spent more usefully reading about an API, a framework, or some such that would actually have solved a few real problems.



The tenth deadly sin of project management: trying to get wisdom from anything other than hard facts when you should be doing the actual work of project management: learning, building consensus, disseminating information, talking to your team, and writing code.












Saturday, December 16, 2006

When unix processes are too much

I want to ssh into a server, run vmstat for a while, and then get back vmstat's output, all automatically, from a Python program. There are many problems. Here is a walkthrough of the problems at the command line.

Take 1

The first thought was to ssh in, run vmstat, and redirect its output to a file on the remote server. Then later, I would kill ssh and use scp to retrieve the logfile from the remote server. Here is the command-line equivalent:

$ ssh 'couffable@asgard' 'vmstat 1 2>&1 >/home/couffable/remote-vmstat.log' &
[2] 10405

On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
20921 ? Ss 0:00 bash -c vmstat 1 2>&1 >/home/couffable/remote-vmstat.log
20938 ? S 0:00 vmstat 1

This starts things nicely and gives me the PID of the ssh process. The trouble is when I want to stop the process:

On local machine
$ kill 10405
$ fg # to get bash to retrieve ssh's status and remove it from the zombie list

But on the remote machine, vmstat is still running!
[couffable@asgard ~]$ ps ax | grep vmstat
20921 ? Ss 0:00 bash -c vmstat 1 2>&1 >/home/couffable/remote-vmstat.log
20938 ? S 0:00 vmstat 1

This won't do at all.

Take 2

I noticed that if I run vmstat through ssh without redirecting vmstat's output on the remote machine, ssh shows vmstat's output on my terminal. So I said, why not redirect ssh's output? Here is what I tried:

On local machine:
$ ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log &
[2] 10569

On the remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
21239 ? Ss 0:00 vmstat 1

Bingo! Now vmstat's output is being stored in a local file, not on the remote machine. Or is it? I notice that on the local machine:
$ # press enter
[2]+ Stopped ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log

That is, vmstat/ssh's output is not getting logged, because ssh stops as soon as it is backgrounded: by default ssh keeps reading from the local terminal so it can forward stdin to the remote command, and a backgrounded job that reads from the terminal gets stopped (SIGTTIN). Puff!

Take 3

I decided not to background ssh.

On local machine:
$ ssh 'couffable@asgard' 'vmstat 1' 2>&1 >/home/couffable/local-vmstat.log

On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
21624 ? Ss 0:00 vmstat 1

After some time, again on local machine (in another terminal):
$ ps ax | grep vmstat
10880 pts/1 S+ 0:00 ssh couffable@asgard vmstat 1
$ kill 10880

And yes, the vmstat on the remote machine dies too. Now, will this work in Python?

Take 4

The trouble with reproducing Take 3 from Python is that I can't invoke ssh directly: the redirection won't happen, because redirection is the shell's job. So I could do something like:

>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "ssh couffable@asgard vmstat 1", "2>&1", ">vmstat.log")
>>> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 144 82808 231972 1293708 0 0 2 2 0 2 1 0 99 0

What? No redirection? Not good. The catch is that with sh -c, only the argument right after -c is treated as the command string; the extra "2>&1" and ">vmstat.log" arguments are just handed to the shell as $0 and $1, so the shell never performs the redirections. After killing the leftover ssh processes, I try again with the redirections inside the command string:
>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "ssh couffable@asgard vmstat 1 2>&1 >vmstat.log")
>>> pid

Redirection is working now. I kill the process:
>>> os.kill(11935,9)
>>> pid, status = os.waitpid(11935, 0)
>>> status
9

OK, so far so good. But the ssh and vmstat processes are still running! (Killing ssh by hand does stop both ssh and vmstat, as in Takes 2 and 3 and unlike Take 1.) What gives? The PID I killed belongs to sh, and killing sh does not kill the ssh child it spawned. Hmm... I don't really want sh to spawn ssh as a child. I want sh to redirect standard output and error and then exec ssh, replacing itself.

Take 5

>>> pid = os.spawnlp(os.P_NOWAIT, "sh", "sh", "-c", "exec ssh couffable@asgard vmstat 1 2>&1 >vmstat.log")
>>> pid
12080

On local machine:
$ ps ax | grep vmstat
12080 pts/4 S+ 0:00 ssh couffable@asgard vmstat 1

On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
24081 ? Ss 0:00 vmstat 1

Well and good. Now let's kill the ssh process.
On local machine:
>>> os.kill(12080, 9)
>>> pid, status = os.waitpid(12080, 0)
>>> status
9

On local machine:
$ ps ax | grep vmstat
{nothing}

On remote machine:
[couffable@asgard ~]$ ps ax | grep vmstat
{nothing}

Yay!

The question is: all this mucking around with Unix plumbing is wonderful, but could I have done it faster if I'd written my own fork(), redirect, and exec mini-program instead of trying to invoke os.spawnlp(...) using sh magic?
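For what it's worth, here is roughly what that mini-program might look like. This is just a sketch I have not battle-tested, reusing the same hypothetical host and command as above: fork, point stdout and stderr at the log file with dup2, then exec ssh directly, so the PID we get back is ssh itself and killing it behaves as in Take 5.

import os, signal

def spawn_ssh_vmstat(logfile="vmstat.log"):
    # Fork, redirect stdout/stderr to logfile, then exec ssh directly.
    # The returned PID is ssh itself (no intermediate sh), so killing it
    # also takes down the remote vmstat, just like Take 5.
    pid = os.fork()
    if pid == 0:
        fd = os.open(logfile, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        os.dup2(fd, 1)   # stdout -> logfile
        os.dup2(fd, 2)   # stderr -> logfile
        os.close(fd)
        try:
            os.execvp("ssh", ["ssh", "couffable@asgard", "vmstat", "1"])
        except OSError:
            os._exit(127)  # exec failed; never fall back into the parent's code
    return pid

# pid = spawn_ssh_vmstat()
# ... let it run for a while ...
# os.kill(pid, signal.SIGTERM)
# os.waitpid(pid, 0)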

Saturday, December 9, 2006

Jeff Darcy's notes on really high performance servers

In his article "High-Performance Server Architecture" (http://pl.atyp.us/content/tech/servers.html), Jeff Darcy talks about what kills performance. He lists the following as the four biggest causes of poor performance, especially at high concurrency:



  • data copies
  • context switches
  • memory allocation
  • lock contention
I did not quite understand his suggestions on reducing lock contention, but perhaps they will become clearer by putting his ideas to work on a real problem.
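If I remember the usual tricks correctly, one standard way to cut lock contention is to split one big lock into many smaller ones, each guarding its own slice of the data, so that threads touching different keys rarely fight over the same lock. This may or may not be what Darcy has in mind; the sketch below is just my guess at the idea in Python.

import threading

class StripedCounter(object):
    # Many small locks instead of one big one: threads working on
    # different keys almost never contend for the same lock.
    def __init__(self, nstripes=16):
        self.locks = [threading.Lock() for _ in range(nstripes)]
        self.tables = [{} for _ in range(nstripes)]

    def incr(self, key):
        i = hash(key) % len(self.locks)   # pick the stripe for this key
        with self.locks[i]:               # contention only within one stripe
            self.tables[i][key] = self.tables[i].get(key, 0) + 1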



He also mentions:

  • How does your storage subsystem perform with larger vs. smaller requests? With sequential vs. random? How well do read-ahead and write-behind work?
  • How efficient is the network protocol you're using? Are there parameters or flags you can set to make it perform better? Are there facilities like TCP_CORK, MSG_PUSH, or the Nagle-toggling trick that you can use to avoid tiny messages?
  • Does your system support scatter/gather I/O (e.g. readv/writev)? Using these can improve performance and also take much of the pain out of using buffer chains.
  • What's your page size? What's your cache-line size? Is it worth it to align stuff on these boundaries? How expensive are system calls or context switches, relative to other things?
  • Are your reader/writer lock primitives subject to starvation? Of whom? Do your events have "thundering herd" problems? Does your sleep/wakeup have the nasty (but very common) behavior that when X wakes Y a context switch to Y happens immediately even if X still has things to do?


I am now itching to try a few things. However, how does one begin in a dynamic language like Python? I can't do much about memory allocation. Umm... also, Python passes objects around by reference and does its own reference counting, so data copies should not be a huge problem: I just have to ensure that my own code does not make unnecessary copies. Context switches and lock contention look like the primary and secondary targets to focus on for performance and design.
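A tiny illustration of what "unnecessary copies" means here (assuming a Python new enough to have memoryview; the older buffer() plays a similar role):

data = b"x" * (10 * 1024 * 1024)   # pretend this is a 10 MB network read

copy_slice = data[1024:2048]       # slicing a bytes object copies that slice
view = memoryview(data)
free_slice = view[1024:2048]       # slicing a memoryview copies nothing

def consume(buf):
    # Arguments are passed by reference, so handing the 10 MB buffer
    # to a function copies nothing either.
    return len(buf)

consume(data)
consume(free_slice)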



This statement is rather interesting:



It's very important to use a "symmetric" approach in which a given thread can go from being a listener to a worker to a listener again without ever changing context. Whether this involves partitioning connections between threads or having all threads take turns being listener for the entire set of connections seems to matter a lot less.
Now, how does one go about doing that and still have clean, understandable, and maintainable code?
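One shape that might fit is sketched below under my own assumptions: this is the leader/follower idea rather than anything taken from Darcy's article, and in CPython the GIL limits how much real parallelism it buys. A pool of identical threads shares an "accept" lock; whichever thread holds it acts as the listener, and after accepting a connection it releases the lock to a peer and keeps working on that same connection itself, with no hand-off to another thread.

import socket, threading

def serve(host="127.0.0.1", port=8000, nthreads=4):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(64)

    accept_lock = threading.Lock()   # whoever holds this is "the listener"

    def handle(conn):
        # Toy "work": echo one request and close.
        try:
            data = conn.recv(4096)
            if data:
                conn.sendall(data)
        finally:
            conn.close()

    def worker():
        while True:
            with accept_lock:                  # take a turn as the listener
                conn, _addr = listener.accept()
            # Lock released: a peer becomes the listener while this thread
            # carries on as the worker for the connection it just accepted.
            handle(conn)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.daemon = True
        t.start()
    return threads

Whether code structured like this stays clean and maintainable once real protocol handling replaces handle() is, of course, exactly the question.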






