Newzbin was a Usenet search engine I owned and ran from 2002 to 2010. It aimed to provide a complete searchable index of all binary files on Usenet.
For years, finding anything on Usenet was a time-consuming and tedious affair of downloading headers for many different newsgroups in a Usenet client and browsing through them. With Newzbin, we wanted to make it as simple as a Google search.
Although Newzbin wasn’t my creation, when I came across it there was a happy coincidence: its owners at the time were looking to move it into new hands, and I was pretty confident I could take it to the next stage. In short order I found myself running it.
When I was first involved, Newzbin was a very small informal affair. It had perhaps a few thousand pageviews a day and a few hundred users.
Running PHP on a single webserver, it wasn’t much more than a form for volunteers to fill in (along the lines of “I saw $foo on Usenet, it’s in $this newsgroup”), which was then displayed in a simple listing along with other recent submissions. There was no verification; it was all based on trust.
This early version of Newzbin was a fully manual affair. It placed too much burden on its volunteers to find interesting things on Usenet and create an entry in the website index. Things got missed and there were frequent duplicates. As a result the search listings weren’t thorough enough to bring in new users and hence more volunteers – it hadn’t really reached critical mass.
Automated File Listings (“v2”)
With the full rewrite that we called “v2”, our introduction of file listings on the website changed all that. Essentially, it was a mirror of what you’d see in a Usenet index; we had Perl scripts fetching Usenet headers and maintaining a database to display on the site itself.
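Newzbin’s actual header-fetching scripts were Perl and were never published; purely as an illustration of the idea, here is a sketch in Python that parses NNTP overview (XOVER) response lines – the tab-separated per-article summaries a news server returns – into records a database could index. The field layout follows the common default overview format, and the sample line is invented.

```python
# Sketch of header ingestion, assuming the standard NNTP overview format
# (RFC 2980 XOVER / RFC 3977 OVER): one tab-separated line per article,
# with fields in the common default order:
# number, subject, from, date, message-id, references, bytes, lines.
from dataclasses import dataclass

@dataclass
class Header:
    number: int        # article number within the group
    subject: str
    sender: str
    date: str
    message_id: str
    byte_count: int

def parse_xover_line(line: str) -> Header:
    """Parse one overview line into a Header record for the index."""
    fields = line.split("\t")
    return Header(
        number=int(fields[0]),
        subject=fields[1],
        sender=fields[2],
        date=fields[3],
        message_id=fields[4],
        byte_count=int(fields[6]),
    )

# An invented overview line, shaped like what a server might return:
sample = ("1042\tbigfile.part01.rar (1/50)\tposter@example.com\t"
          "Mon, 01 Mar 2004 12:00:00 GMT\t<abc123@example.net>\t\t"
          "500000\t3900")
hdr = parse_xover_line(sample)
print(hdr.message_id)  # -> <abc123@example.net>
```

In practice the harvesting scripts would run this over every group being tracked and insert the records into the database the website displayed.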
Now, instead of volunteers manually entering all the details of a Usenet file, they could select them on the website and give them a searchable title and some metadata. It was much quicker and far less error-prone (once a set of files was assigned, it could optionally be hidden from view, eliminating duplicates), and we had direct references to the Message-IDs for the messages on Usenet that made up the identified files. This also paved the way for NZB file generation, as discussed below.
As a result of this easier system, the search index coverage rose significantly. Generally if something was on Usenet, it was safe to say it could be found on Newzbin.
NZB Files
NZB files are probably Newzbin’s biggest lasting legacy in the Usenet industry. They were designed around 2003 in cooperation between myself and the authors of Newsleecher, which as a result became the first Usenet client in the world to support them.
NZB files were an instant hit and modernised Usenet downloading. They are still used extensively today and it’s hard to find a Usenet client that doesn’t support them now. Although Newzbin is long gone, many other Usenet sites have adopted NZB files and continue to serve the format.
At heart, they are simply XML files which describe all the Message-IDs associated with one or more files on Usenet. A supporting Usenet client can read them and directly download the messages without any need for headers at all, streamlining what used to be a convoluted and slow process into practically one-click downloading.
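To make the format concrete, here is a sketch in Python: an abridged NZB document (structured per the NZB 1.1 DTD, though the poster, subject, and segment values are invented for illustration) and a small function that extracts the Message-IDs a client would fetch.

```python
import xml.etree.ElementTree as ET

# An abridged NZB document. One <file> groups the Usenet messages
# (segments) that make up a single posted file; the text of each
# <segment> is its Message-ID without the surrounding angle brackets.
NZB = """<?xml version="1.0" encoding="UTF-8"?>
<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="poster@example.com" date="1078142400"
        subject="bigfile.part01.rar (1/2)">
    <groups>
      <group>alt.binaries.example</group>
    </groups>
    <segments>
      <segment bytes="500000" number="1">seg1@example.net</segment>
      <segment bytes="500000" number="2">seg2@example.net</segment>
    </segments>
  </file>
</nzb>"""

NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

def message_ids(nzb_xml: str) -> list[str]:
    """Return every Message-ID a client would download, in document order."""
    root = ET.fromstring(nzb_xml)
    # Re-add the angle brackets the NNTP protocol expects.
    return [f"<{seg.text}>" for seg in root.findall(".//nzb:segment", NS)]

print(message_ids(NZB))  # -> ['<seg1@example.net>', '<seg2@example.net>']
```

A client can feed each of those Message-IDs straight to an ARTICLE or BODY command on any news server carrying the group – no header downloads required.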
Until NZB files were introduced, Newzbin was free. However, with traffic growing thanks to the popularity of NZB files, costs were rising, and by this point I felt we had a product we could charge for. We set a subscription price of £0.25 per week for access to NZB files; everything else on the website remained free and accessible.
This let me turn Newzbin into my full-time job for quite a few years, and it became a limited company with me as controlling director.
I was more interested in the challenges of running a high-volume website, and the engineering required to keep it going, than in the content it actually served. My day-to-day work was more about writing PHP, examining Munin graphs, and tinkering with server operating systems than actually using the search engine we had created.
However, it turns out that a lot of the binary content on Usenet is covered by copyright. The MPA took exception to us indexing movies they represented, and after a couple of years of legal wrangling we had to cease operations.
Although it is a long story for another page and another day, the short version is that I felt we never accurately portrayed our case in court due to underfunded and inadequate legal representation – the MPA had buckets more money to throw at a court case than we did.
Newzbin’s website was written in PHP, running as FastCGI daemons behind Apache frontends. MySQL took care of the database, which eventually grew to a good few hundred million records in some tables (storing every Message-ID posted in the last few years for NZB file creation).
A collection of Ruby and Perl scripts made up the various backend tasks, and there were a couple of specialised C daemons to do time-critical jobs which I’ll detail separately as they warrant further discussion.
Due to the affordable price point, subscriptions proved a tremendous hit. At our peak, we had around 100,000 subscribers and were serving around two million page hits per day.
That much data needs a lot more hardware behind it than the single server Newzbin started with. Over the years the hardware collection grew to three full-size racks of Sun servers, along with Cisco switches, Lantronix terminal servers, APC power switches, UPSes, and so on.
[tags: newzbin, php, project, ruby, webapp]