Saturday, May 7, 2011

Rebuilding Radio NZ - Part 5: The Evolution of News

In previous parts of this series I have looked at the rationale behind replacing our off-the-shelf CMS with a bespoke solution.

This time it is news. The news section of the site has gone through two major iterations in Matrix before moving to ELF. This section has given us the most grief, both in terms of customisations required and performance problems.

Initial Design

Workflow is very important in the news business. Every second counts, and the tools used in a typical newsroom reflect this. Radio NZ uses Avid’s iNews - an enterprise-grade system designed to manage incoming wire content, news bulletin production, and the editing of stories in a high-stress, fast-turnaround operation.

I believed that it was critical for news staff to be able to use this system to also publish to the web. This would minimise training, avoid the need for additional mental tasks (HTML formatting) during peak news events, and give me a level of formatting control that is not possible when staff directly use a WYSIWYG web-based editor.

The publishing process needed to be simple to use and understand and leverage existing skills to the greatest extend possible. The tools needed to get out of the way of the process of writing and editing.

The other factor in adopting a remote-publishing model over direct editing was raw speed. The site was running on a single server it was significantly slower to add, edit and update content via the web then in a native desktop application. Sluggish (relative) performance would have disrupted the flow of writing and editing.

In iNews stories are compiled into a queue - a folder that allows stories to be placed in a specific order. A single hot-key initiates the publishing process.

Iteration 1

When we first started the site in 2005 only one group of news text stories was available at any one time. This was partly due to technological limitations, and partly for licensing reasons; like most other local news organisations some of our text copy is licensed from overseas news agencies.

Stories were edited in iNews and simple markup was used to provide basic HTML formatting. For example [h] at the start of a line is a heading, [[double square brackets]] is italics and [audio] denotes the lines contains a link to audio. These and other commands were developed in consultation with news staff to be simple to use and understand.

When published from iNews, the list of stories was sent via FTP to a folder on another server. From there a Perl script collected all the metadata and content from the individual story files, merged them into one XML file and imported them into Matrix using a second custom script written by Squiz.

There were no news categories, and content was removed after 7 days.

Each time a new group of stories was published, links on the site were updated to the new content. A problem arose when links were shared though - they would not be to the most recent update of a story. (Interesting note: 4 years after we replaced this system, we are still getting requests for these outdated URLs.)

Because each publish created a new batch of stories they appeared on the site quite quickly.

Iteration 2

It is safe to say that version 1 of the news section did not reflect the breadth of depth of our news coverage. In version 2 we added categories and allowed stories to be updated.

This was obviously more in line with public (and company) expectations of a news service, and removed a number of problems, the biggest being that the URL for a story remained the same for its life.

This required some major changes in Matrix and a significantly more complex import script. The second version of the import script (about 2000 lines of PHP code) was written by Mark Brydon at Squiz and allowed us to update (rather than replace) the bodycopy in an existing Matrix asset, and to manipulate the created, published AND the system-controlled updated timestamps. It also allowed content to be categorised into sets of folders, created on the fly as required.

Our Perl script for processing the exported stories also had to be updated to generate the enhanced XML required by Mark’s script.

There were problems though. By this stage we’d upgraded our Matrix infrastructure to use a master-slave configuration, and increased the local caching of pages to improve delivery speed. But these changes meant that updates to news content did not always appear in a timely fashion.

This was a very complex problem to solve and is documented here in all its gory detail. Hat tip to our sysadmin Colin MacDonald for working with us on this problem.

New categories required multiple steps - folders had to be created, new assets listing made to display the content, and the relevant Matrix IDs had to be added to our Perl script.

The script was written and maintained with NO testing framework whatsoever. Nightmare.

As mentioned in Part 1, our need to frequently update content and to have those updates appear immediately did not really work that well with Matrix. As older problems were mitigated new ones arose.

I realised that we were pushing Matrix outside its design parameters. It was no longer the optimal solution for the type of operation we had become. Our expectations of the system had changed. Our visitors expectations had changed. It was time for a system change.

ELF

The design of the new news section started with the URL schema. This is in the form:

/news/category_name/story_id/story_headline_as_url

The old form was:

/news/stories/yyyy/mm/dd/story_id

e.g. /news/stories/2009/01/02/12459765a219

In Matrix stories were created in a dated folder structure to aid administration. In practice, these segments in the URL did not do anything useful - they redirected to the news home page.

Headlines

Over the life of a story the headline will change. What should the canonical URL for a story be? Many sites get around the issue by ignoring the headline altogether and using a unique ID to retrieve the story.  his is a problem, in my view, because it allows anyone to craft their own URLs to the content, sometimes with hilarious results.

We get around this problem by always using the current headline to create the URL, and redirecting to this if an older version is used. The ID of the story never changes.

Our headline URL generator avoids some annoying problems when punctuation is used. For example this headline:

Taxes to increase by 1.2 %

could become:

taxes-to-increase-by-12

our generator does this:

taxes-to-increase-by-1-point-2-percent

Categories are added through a simple admin screen (at right). In Matrix we had a complex set of structures and functions to specify root node ids, relationships between categories and what folder structure to use. Category folders had to be manually created along with links, asset listings and the like.

In ELF we specify the URL, the code for news staff to use in their story template, and hit save. The position and visibility of the new category on the news home page, and in the sidebar links, can be controlled by a drag and drop interface.

It is now possible to move stories between categories, something that was not possible in Matrix.

ELF also allows images to be added to stories. Images are pre-loaded directly to ELF by staff, and the system gives them a picture code to chose from :

[image:1791:third:right]
[image:1791:half:right]
[image:1791:third:left]
[image:1791:half:left]
[image:1791:full]

The editor selects one of these and inserts it into the story copy in iNews. ELF associates the image with the story when it is requested. The editor can change the size and position of the image in the story simply by changing the code and republishing.

This may seem a roundabout way of doing images. It was the only way because iNews does not support this functionality, and adding the images to content in ELF would also not work because the stories were being updated remotely; any remote update would overwrite images added in ELF.

In practice it works well.

Performance

The performance of the new system is outstanding. Previously, we'd occasionally have to restart the web server process during high demand periods.

The ELF news section went on-line a few days before the first Christchurch quake. It served all traffic without issue on an non-optimised database - we had the slow query log enabled and were intending to tweak things the following Monday after a (ahem) quiet weekend! Page rendering times under load were less than 50 mS which is insignificant. This is with no page caching whatsoever.

All published news content is now available within 30 seconds of the news web editor pressing the publish hotkey in iNews.

Regrets

Even though publishing was simple for staff, the multi-step background process to make it happen probably limited what we were able to do on the site in the first few years, and was a maintenance mightmare. This is certainly the case with Iteration 2. To be frank, from a technical point of view, the process was on the edge of instability, and fragile to maintain.

But I still believe that the decision to remote-publish was the right one, and we are bearing the fruit of this in iteration 3. We now have a highly robust system for publishing news quickly, and a software infrastructure that is technically flexible and well covered by unit tests. So, no regrets really.

I am happy to answer questions in the comments.

In the next post I will talk about the schedules section of the site.

No comments: