Tuesday, June 17, 2008

And now the news...

Image of news section of Radio NZ siteOne of the challenges with publishing news onto the Radio NZ site was getting the content from our internal news text system into our Content Management System.

The newsroom uses iNews to prepare all their copy. The website uses MySource Matrix, a CMS used by many Australian organisations and now starting to get some traction in New Zealand since the local office opened.

There were three options:
  1. Use iNews to generate the site.
  2. Get staff to use the CMS directly.
  3. Wire the two systems together in some way.
The first wasn't really an option because we had content from a range of sources (news, audio, schedules, etc) and we wanted to blend those into one cohesive site.

The second was considered, but deemed too hard because of the need to create a large custom editing and management area in the CMS. We did not have the resources to build, maintain and support this, along with the required staff training.

The last option was to write some software to allow iNews to publish directly to the CMS.

How it Works

iNews is able to output stories in a specified order via ftp. Staff compile the stories for each publish and press one key to send the content for processing. The stories end up on an internal server in HTML format, with an extra file noting the original order.

Processing HTML is always a challenge - the code generated by iNews was not always well-formed - although I already had some software that worked courtesy of Radio NZ International. They'd been publishing content from iNews to their site since 2002, and the script's run there with no issues for 5 years.

The script is 750 lines of Perl code, and runs via a cron job. It reads the order file and processes each HTML file. Files are parsed for metadata such as the published time, and the story content is turned into valid pre-formatted HTML. This is wrapped in XML and pushed to the CMS server.

One of the major advantages of this approach is that staff do not have to learn any HTML, and the system generated HTML can be relied on to meet our site style guidelines. We have defined some formatting codes for use in iNews for basic web formatting:
  • [h] Heading
  • [s] Summary
  • [b] Bold paragraph
  • [[text]] Italicize
When we first added news content to the site (in 2005) the summary line was auto-generated - the home page had only four headlines. The current version of the site has a summary under the first story on the home page, so the [s] directive was added to allow one to be crafted by a human.

The script will still add a summary if there is none, as the main news page needs one for every item.

The CMS has an import script that takes the XML data and creates news items on the site. This is activated via a local network connection to our master servers.

I am currently working on an enhanced version of the script that'll allow stories to be categorised and some other cool stuff. More on this in a later post.

No comments: