Pencil

Pencil is a load-balancing buffering proxy I wrote in C for Newzbin. It was designed to sit between our webserver and our backend application (PHP, running as a FastCGI server) and buffer responses in order to prevent slow-reading web clients tying up expensive PHP processes.

It was inspired by and is named after Pen, another network load balancing proxy daemon (however – at the time at least – it didn’t do buffering).

It is quite old now (dating from around 2006) – a modern nginx would probably be able to replace Pencil entirely.

The source is available on GitHub.

Design

I’m quite keen that any servers I write don’t suffer from the C10k problem (the original C10k article is a good read). This factored quite heavily in my design choices for Pencil.

At the time I had been studying various server architectures: multi-process, multi-threaded, event-driven, etc. I wanted to use an event-driven architecture for Pencil, single-threaded and non-blocking. This is a relatively simple way of achieving high concurrency without the complication of thread management.

As we used FreeBSD, kqueue was the event system of choice.

Buffering

The primary aim of Pencil was to buffer responses from PHP. Our webservers were not generously provisioned with memory (I seem to recall they had 2GB each), which limited us to around 8 PHP processes per webserver, and therefore to 8 simultaneous requests.

Those 8 processes could become swamped pretty easily, either through clients who were simply reading back the response slowly, or malicious clients who would not read it back at all. A PHP process couldn’t handle another request until it was done with the previous one.

Apache’s mod_fastcgi appears to buffer only 8K per request (much less than our average response size), so I wrote Pencil.

Pencil reads the response from PHP and sends it onwards to the client through Apache. If it can do it immediately and entirely then the request is completed and the sockets are closed. If it can’t (Apache’s buffers are full), it allocates sufficient memory to buffer the entire response itself and closes the PHP socket. PHP is now free to service another request, and the only cost is a bit of memory use in Pencil.
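The send-or-buffer decision described above can be sketched roughly as follows. This is not Pencil's actual code: the names `struct pending`, `set_nonblocking` and `send_or_buffer` are made up for illustration, and as a simplification the sketch assumes the response is already in memory and copies only the unsent remainder.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical holder for the part of a response not yet accepted downstream. */
struct pending {
    char  *data;  /* heap copy of the unsent tail, or NULL */
    size_t len;   /* number of bytes in data */
};

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Try to push the whole response to a non-blocking socket.
 * Returns 0 if everything was sent, 1 if the remainder was copied into *out
 * (the backend connection can now be released), or -1 on error.
 */
int send_or_buffer(int fd, const char *resp, size_t len, struct pending *out)
{
    size_t sent = 0;

    assert(out != NULL);
    out->data = NULL;
    out->len = 0;

    while (sent < len) {
        ssize_t n = send(fd, resp + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                      /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                         /* downstream buffers are full */
        return -1;                         /* any other failure */
    }
    if (sent == len)
        return 0;

    /* Keep our own copy of what is left so the producer can move on. */
    out->len = len - sent;
    out->data = malloc(out->len);
    if (out->data == NULL) {
        out->len = 0;
        return -1;
    }
    memcpy(out->data, resp + sent, out->len);
    return 1;
}
```

When the function returns 1, an event loop would wait for the socket to become writable again and drain `out->data` from there, freeing it once it is empty.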

Redundancy

As a secondary benefit, Pencil also offered a level of redundancy.

We found that we frequently needed to restart the PHP daemons for upgrades and so on. Apache with mod_fastcgi could not be configured to prioritise PHP on localhost and fail over to another instance if it was down, so if PHP died, website users would see gateway errors.

Pencil solved this problem with automatic failover. It would prioritise PHP on localhost if it was up, and if not, try a configured failover. Favouring a local PHP first kept network latency to a minimum.
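A minimal sketch of that "local first, then failover" behaviour might look like the following. The `struct backend` type and `connect_with_failover` function are hypothetical, and the sketch uses a plain blocking connect() for brevity, whereas an event-driven daemon would normally connect without blocking.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical backend description: an IPv4 address and a TCP port. */
struct backend {
    const char *ip;
    uint16_t    port;
};

/*
 * Try each backend in order of preference (index 0 would be localhost).
 * Returns a connected socket and stores the index used in *chosen,
 * or returns -1 if no backend accepted the connection.
 */
int connect_with_failover(const struct backend *list, size_t n, size_t *chosen)
{
    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port = htons(list[i].port);
        if (inet_pton(AF_INET, list[i].ip, &sa.sin_addr) != 1)
            continue;                      /* unparsable address, skip it */

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0) {
            if (chosen != NULL)
                *chosen = i;
            return fd;                     /* first backend that answers wins */
        }
        close(fd);                         /* refused or unreachable: try next */
    }
    return -1;
}
```

Putting the local backend first in the list gives it priority whenever it is up, and the later entries are only tried when it refuses the connection.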

This also allowed us to upgrade PHP without losing any live service or even needing to reconfigure which webservers were live.

Retrospective

Things I might have done differently, given what I know now:

kqueue

kqueue was great, but it made Pencil non-portable.

There are abstraction libraries which keep the event-driven benefits while hiding the differences between each platform’s event system (kqueue, epoll and so on). Assuming the performance hit was negligible, I’d probably use libevent for this these days.

Restart System

It was possible to send Pencil a signal to die gracefully: close the listening socket but finish servicing existing clients. The idea was that a new daemon could be started up for new clients, without dropping existing ones.

This still leaves a window in which a new client might not be able to connect to Pencil and would be dropped by the webserver.

To address this and not drop a single connection, it would be better to have Pencil fork+exec and inherit the existing listening socket.
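One way the fork+exec idea could be sketched is below. Pencil never did this, so everything here is an assumption: the `PENCIL_LISTEN_FD` environment variable and the three function names are invented for the example. The old process clears close-on-exec on the listening socket, records the descriptor number in the environment, and execs a fresh copy of the daemon, which picks the socket up again instead of binding a new one.

```c
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FD_ENV "PENCIL_LISTEN_FD"   /* hypothetical variable name */

/* Make fd survive exec() and advertise its number to the next process image. */
int export_listen_fd(int fd)
{
    char num[16];
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0)
        return -1;
    snprintf(num, sizeof num, "%d", fd);
    return setenv(LISTEN_FD_ENV, num, 1);
}

/* In the new process: recover the inherited descriptor, or -1 if there is none. */
int import_listen_fd(void)
{
    const char *s = getenv(LISTEN_FD_ENV);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > INT_MAX)
        return -1;
    /* F_GETFD fails with EBADF if the descriptor is not actually open. */
    return fcntl((int)v, F_GETFD) < 0 ? -1 : (int)v;
}

/* Old daemon: start a replacement that shares the listening socket. */
pid_t spawn_replacement(int listen_fd, char *const argv[])
{
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    if (export_listen_fd(listen_fd) < 0)
        return -1;
    pid = fork();
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                        /* only reached if exec failed */
    }
    return pid;  /* parent: stop accepting, finish existing clients, exit */
}
```

Because both processes hold the same listening socket for a while, connections arriving during the handover simply wait in the kernel's accept queue until one of them accepts.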

[tags: c, newzbin, project]