Friday, December 7, 2012

Radio NZ's New Hosting

Since 2005 Radio NZ has hosted its page server at ICONZ's hosting facility in Auckland. We also had a connection in Wellington to allow us to publish content to the live system via a VPN.

Also, since 2005, we've used Citylink for audio content delivery. Their CDN, initially run as a side project by Richard Naylor, was the only economic way for us to deliver our audio streaming and downloads.

Until this year.

The Background

We rely on computer-based audio tools, and as a lifeline utility we also need resilient infrastructure, including full UPS and generator back-up for our main offices.

So self-hosting was always possible, but the cost of bandwidth to our premises made it uneconomic. In late 2004 we were quoted $87,000 a month for a 1 Gig internet connection.

As late as 2010 we were still being quoted prices of $4,000 per month for local connectivity to all NZ consumers. Fibre connectivity and international data were on top of that!

Earlier this year we issued an RFP for gigabit fibre and bandwidth from our Wellington and Auckland offices. We became an APNIC member and have our own IP addresses and AS number, enabling us to participate directly in the internet at a technical level.

FX Networks won the tender process with an innovative data plan and is now the ISP for all our public-facing web services. These include our website, the live streaming service and our podcasts.

Because of FX's extensive peering arrangements, all traffic is delivered from New Zealand. For customers of two large national ISPs, their content will no longer have to be imported from a server in the U.S.

International visitors make up between 5% and 15% of traffic (depending on the time of day) and they'll get the traffic from NZ as well. This means that for the vast majority of visitors latency will reduce, and the speed of downloads may go up.

The Migration

The major constraint on the migration was downtime - there was to be none. The second was that we could not buy any more servers, having bought eight only three years ago. That meant we had to work out a way to run the existing site at ICONZ and a synchronised clone of it from one of our offices via FX.

As it happens, our old CMS (MySource Matrix) required six servers, and we had two originating servers for the Citylink CDN (one of them a live spare). But we have since replaced our CMS with something that needs fewer resources, freeing up some servers to allow us to build the new system in tandem with the old.

So far, so good.

The new infrastructure uses virtual machines (VMs), running on GNU/Linux (except for Windows Media Server). All VMs must be able to run in Auckland or Wellington so that services can continue if connectivity to one office is lost (such as in a disaster).

Each VM has a static IP address that is used at either location, and a combination of Routing Information Protocol (RIP) and Border Gateway Protocol (BGP) is used to ensure requests go to the right place within our network.
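To make the idea concrete, here is a toy sketch in Python of what "the same address at either site" means for routing. It is not our RIP/BGP configuration: the RouteTable class, the site names as strings, and the 192.0.2.x addresses (a range reserved for documentation) are all assumptions made up for illustration.

```python
from ipaddress import ip_address


class RouteTable:
    """Toy model: records which site currently announces each VM's static IP."""

    def __init__(self):
        self._site_for_ip = {}

    def announce(self, vm_ip, site):
        # The site where the VM is running announces a route for its address.
        self._site_for_ip[ip_address(vm_ip)] = site

    def lookup(self, vm_ip):
        # Requests for the VM follow whichever site announced it last.
        return self._site_for_ip.get(ip_address(vm_ip))

    def fail_over(self, failed_site, surviving_site):
        # VMs from the failed site restart at the other site with the same
        # address, so only the route changes, never the IP.
        for ip, site in self._site_for_ip.items():
            if site == failed_site:
                self._site_for_ip[ip] = surviving_site


routes = RouteTable()
routes.announce("192.0.2.10", "wellington")  # hypothetical web VM
routes.announce("192.0.2.11", "auckland")    # hypothetical streaming VM
routes.fail_over("auckland", "wellington")
```

After the fail-over call, both addresses resolve to Wellington, and nothing that refers to the VM by IP address has to change.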

With two servers relocated to our Wellington office and two to Auckland, a basic cluster was built. At each site, dual edge routers connect to FX Networks over fibre, and an SRX240 handles firewall duties. A number of VPN tunnels run between the sites to allow disk replication and communication between VMs in the cluster.

The backup media and podcast server was moved from the ICONZ address space to the FX connection and prepped to take over from the CDN.

Once the cluster bare-metal hardware was in place, the existing Web servers were cloned into new VMs and reconfigured to work within the new environment.

The audio service change-over was the simplest, requiring a DNS change. The DNS records for streaming.radionz.co.nz and podcast.radionz.co.nz were updated to point to our server, and over the space of a day or so all load moved from Citylink's CDN to our server in Wellington. The second server will be moved from ICONZ to our Auckland office, to act as a live spare.

The website change-over was more complicated. Because we publish so frequently, we could not do a straight DNS change: some people would be getting the old (no longer updated) site for a period of time. There were several ways to keep the sites in sync, but given the complexity (we are still using parts of MySource Matrix), it wasn't worth the pain just to cover the one- to eight-hour DNS propagation time. (And before you ask, not all ISPs honour a short TTL.)
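A rough way to picture that window is the sketch below. The resolver cache lifetimes are invented for illustration (they only span the one to eight hours mentioned above), and the model assumes the worst case, where every resolver cached the old address just before the change.

```python
def fraction_on_old_site(resolver_cache_hours, hours_since_change):
    """Share of resolvers still handing out the old address.

    Worst-case assumption: each resolver refreshed its cache immediately
    before the DNS change, so it serves the old record for its full
    cache lifetime.
    """
    still_old = [h for h in resolver_cache_hours if h > hours_since_change]
    return len(still_old) / len(resolver_cache_hours)


# Hypothetical population of five resolvers; cache lifetimes in hours.
resolvers = [1, 1, 2, 4, 8]

after_one_hour = fraction_on_old_site(resolvers, 1)
after_eight_hours = fraction_on_old_site(resolvers, 8)
```

With these made-up numbers, three of the five resolvers are still sending visitors to the old site an hour after the change, and the last one only catches up after eight hours.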

The option I chose was to set up a Varnish caching server, point it at the current site, and change the DNS to direct traffic to Varnish. As requests moved from the existing site to Varnish, they were forwarded silently back to the old site. Once all requests were arriving at Varnish, we could synchronise the new VM servers with the old live servers and update Varnish to fetch from those.

This approach meant we could synchronise the new servers with the old (which took about 30 minutes), and then instantly switch Varnish to send requests to the new machines. This we did at 8:45 pm last night. During the syncing process our newsroom could still publish updates to the old server, and the first publish on the new server brought everything up to date.
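The switch itself can be sketched in a few lines of Python. This is not Varnish or its configuration, and it leaves out caching entirely; the SwitchableProxy class and the two stand-in site functions are hypothetical names used only to show why visitors never notice the change.

```python
class SwitchableProxy:
    """Toy stand-in for the Varnish host: forwards every request to one backend."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, path):
        # Visitors always talk to the proxy; they never see which backend answered.
        return self.backend(path)

    def switch_backend(self, new_backend):
        # Takes effect on the very next request, with no DNS change involved.
        self.backend = new_backend


def old_site(path):
    # Hypothetical stand-in for the servers at the old hosting facility.
    return f"old:{path}"


def new_site(path):
    # Hypothetical stand-in for the new VM cluster.
    return f"new:{path}"


proxy = SwitchableProxy(old_site)
before = proxy.handle("/news")   # answered by the old servers
proxy.switch_backend(new_site)
after = proxy.handle("/news")    # answered by the new servers
```

Because the DNS already points at the proxy, the only thing that changes at cut-over is one setting on a machine we control.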

It is interesting to note that we started directing traffic to the Varnish host just before a tornado hit Auckland, and during this period we saw four times the usual amount of traffic to the site. All of this was going via Varnish, and there was not a single outage.

We will soon move the remaining servers out of ICONZ and add them to the new cluster, improving redundancy and load-balancing capabilities further. From there we'll be fine-tuning the overall performance of the cluster, and next year launching a new design.

I'm happy to answer any technical questions in the comments.