Monday, July 28, 2008

News Categories at Radio NZ

We've just completed a major update to the news section of the site that allows us to categorise news content.

We use the MySource Matrix CMS, and categorising content is a trivial exercise using the built-in asset listing pages. The tricky part in our case is that none of our news staff actually write their content directly in the CMS, or even have logins. Why?

When we first started using Matrix, the system's functionality was much more limited than it is today (in version 3.18.3). For example, paint layouts - which are used to style news items - had not yet been added to the system.

The available features were, however, perfectly suitable for the service we were able to offer at the time.

The tool used by our journalists is iNews - a system optimised for processing news content in a large news environment - and it was significantly faster than Matrix for this type of work (as it should be). Because staff were already adept at using it, we decided to use iNews to edit and compile stories and export the result to the CMS.

This would also mean that staff didn't have to know any HTML; we could add simple codes to the text to allow headings and other basic web formatting. It would also dramatically simplify initial and ongoing training.

The proposed process required two scripts. The first script captured the individual iNews stories, ordered them, converted the content to HTML, and packaged them with some metadata into an XML file. (iNews can output a group of stories via FTP with a single command.)

The XML was copied to the CMS server, where the second script imported the specified content.
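
I won't reproduce the real XML schema here, but to give a flavour of what script 1 produces, here's a rough sketch in Python. The element names, fields and stamp format are my own illustrative guesses, not our actual format:

import xml.etree.ElementTree as ET
from datetime import datetime

def build_publish_xml(stories):
    # Bundle the ordered, HTML-converted stories into one publish file.
    root = ET.Element('publish', stamp=datetime.now().strftime('%Y%m%d%H%M'))
    for order, story in enumerate(stories, start=1):
        item = ET.SubElement(root, 'story', order=str(order))
        ET.SubElement(item, 'headline').text = story['headline']
        ET.SubElement(item, 'bodycopy').text = story['html']  # already converted
    return ET.tostring(root, encoding='unicode')

print(build_publish_xml([{'headline': 'Example story', 'html': '<p>Body</p>'}]))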

Each block of stories was placed in a folder in the CMS. The most recent folder represented the current publish, and was stamped with the date and time. The import script then changed the settings on the site's home page headline area, the RSS news feed and the News home to list the stories in this new folder.

Stories appeared on the site as they were ordered in the iNews system, and the first five were automatically displayed on the home page.

On the site, story URLs looked like this:

www.radionz.co.nz/news/200807221513/134e5321

and each new publish replaced any previous version:

www.radionz.co.nz/news/200807221545/2ed5432

On the technical side, the iNews processing script ran once a minute via a cron job, but over time we found two problems with this approach.

The first was that the URL for a story was not unique over time - each update got a new URL. RSS readers had trouble working out what was genuinely new content and what was just a new publish of the same story, and people linking to a story could end up pointing at a stale version of the content, depending on when it was updated.

The second related to the one-minute cycle time of the processing script. Most of the time this worked fine, but occasionally we'd get a partial publish when the script started before iNews had finished publishing. On rare occasions we'd end up with two scripts trying to process the same content.

The Update

The first thing we had to do was revise the script for importing content. This work was done by Mark Brydon, one of the developers at Squiz. The resulting script allowed us to:
  • add a new story at a specific location in Matrix
  • update the content in an existing story (keeping the URL)
  • remove a story
  • put stories into a folder structure based on the date
I provided some pseudo-code and XML and Mark did the rest, with a fair bit of testing and discussion along the way to get the script right. Revise actually isn't a strong enough word - Mark merged our four import scripts into one, refactored common code into functions, and brought it all up to Squiz coding standards.

One of the early design decisions was to use SHA1 hashes to compare content when updating. As you'll see later, this made the script more flexible as we fine-tuned the publishing process. Initially the iNews exporter generated SHA1s based on the headline and bodycopy, and these were stored in spare fields in the Matrix news asset. The values could then be checked to determine whether content had changed.
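
As a sketch (illustrative Python - not the exporter's actual code), the fingerprinting is nothing more than:

import hashlib

def content_sha1(headline, bodycopy):
    # Hash only the parts of the story that matter for change detection.
    return hashlib.sha1((headline + bodycopy).encode('utf-8')).hexdigest()

# An unchanged story hashes to the same value on every publish, so it can
# be recognised and left alone.
print(content_sha1('Example headline', '<p>Body copy</p>'))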

The second task was to update the iNews exporter to generate the new XML. This proved to be a small challenge as I wanted to run the old and the new import jobs on the same content at the same time. Live content generated by real users is the best test data you can get, so new attributes were added to the XML where required to support this.

The first 3 weeks of testing were used to streamline the export script and write unit tests for the import script. I also added code to the exporter to process updates and removals of stories.

Add. This mode is simple enough - if the story is not already in the system, add it.

Update. The update function used the headline of a story to determine a match with an existing story on the site. We limited the match to content published in the last 24 hours.

This created a problem though - if the headline was changed, the system would not be able to find the original. To get around this I created a 'replace' mode: staff would go to the site, locate the story they wanted, capture the last segment of its URL, and paste that into the story with a special code.

In practice this proved unwieldy and unworkable. It completely interrupted the flow of news processing, and we dropped it after only 24 hours of testing.

As an aside, the purpose of a long test period is to solve not only technical issues, but also operational ones. The technology should facilitate a simple work-flow that allows staff to get on with their work. The technical side of things should be as transparent as possible; it is the servant, not the master.

What was needed was a unique ID that stayed with a story for its life in the system. iNews does assign a unique ID to every story, but it is lost when the content is duplicated in the system or published. After looking at the system again, I discovered (and I am kicking myself for not noticing earlier) that the creator id and timestamp are unique for every story, and are retained even when copies are made.

It was a simple matter to derive a SHA1 from this data, instead of the headline, and use that for matching stories in the import script. Had I not used a spare field in the CMS to hold the SHA1, we'd have had to rework the code.
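
In sketch form, the import decision now rests on two hashes - the stable identity hash for matching, and the content hash for change detection. Again, this is illustrative Python with made-up field names, not the real script:

import hashlib

def sha1_of(*parts):
    return hashlib.sha1(''.join(parts).encode('utf-8')).hexdigest()

def decide_mode(story, site_index):
    # site_index maps identity hash -> stored content hash for recent stories.
    identity = sha1_of(story['creator_id'], story['timestamp'])  # stable for life
    content = sha1_of(story['headline'], story['bodycopy'])      # changes on edit
    if identity not in site_index:
        return 'add'
    if site_index[identity] != content:
        return 'update'   # same story, new headline or bodycopy
    return 'skip'         # nothing changed

print(decide_mode({'creator_id': 'jbloggs', 'timestamp': '200807221513',
                   'headline': 'Example', 'bodycopy': '<p>Body</p>'}, {}))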

After a couple of days of testing, the new SHA1 worked perfectly - staff could update the headline or bodycopy of any story in iNews, and when published it would update on the test page without any extra work.

This updated process allowed staff to have complete control over the listing order and content of stories simply by publishing them as a group. If only the story order was altered, the updated time on the story was not changed.

It has worked out to be very simple but effective.

Kill. To kill a story, a special code is entered into the body of the story. The export script sets the mode to kill, and the CMS importer purges the story from the system.

Because of all the work done on the iNews export script, I decided to fix the issues mentioned above - partial publishes, the one-minute cycle time, and two scripts working at once.

The new script checks for content every 3 seconds, waits for iNews to finish publishing, and uses locking to avoid multiple jobs clashing. I'll cover the gory details of the script in a later post.
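
The gory details can wait for that post, but the general shape is roughly this - a generic sketch with made-up paths, not the actual script:

import fcntl
import os
import time

LOCK_FILE = '/tmp/news-export.lock'   # made-up paths
SPOOL_DIR = '/var/spool/inews'

def spool_size(path):
    return sum(os.path.getsize(os.path.join(path, f)) for f in os.listdir(path))

def publish_settled(path, wait=3):
    # Treat the publish as complete once the spool stops growing.
    before = spool_size(path)
    time.sleep(wait)
    return before > 0 and spool_size(path) == before

def run_once():
    if not os.path.isdir(SPOOL_DIR):
        return
    with open(LOCK_FILE, 'w') as lock:
        try:
            # Non-blocking exclusive lock: if another run holds it, back off.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return
        if publish_settled(SPOOL_DIR):
            pass  # convert the stories and hand them to the importer here

while True:
    run_once()
    time.sleep(3)   # the 3-second check cycle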

Summary

The new scripts and work processes are now being used for live content. Each story gets a category code plus an optional code to put it in the top 5 on the home page. The story order in iNews is reflected on the site, and it is possible to correct old stories. It's all very simple to use and operate, and doesn't get in the way of publishing news.

And work continues to make the publishing process even simpler - I am looking at ways to remotely move content between categories, and to simplify the process of killing items.

Monday, July 21, 2008

Te Wiki o te Reo Māori - Māori Language Week

I made a few changes to the Radio NZ site this morning as part of the company's initiative for Māori Language Week.

The most obvious is the change in font size on the te Reo versions of headings.

This was achieved across the whole site by changing one line and adding a second to our master CSS file.

From this:

h2.bi .reo{font-size:12px; padding: 0; text-transform:none;}

to this:

h2.bi .eng{font-size:12px; padding: 0; text-transform:none;}

h2.bi .reo{font-size:19px; padding: 0; text-transform:none;}


We've also extended the bi-lingual headings beyond the home page to other parts of the site.

The second change is the substitution of te Reo for English in the left side menu on all pages. Hovering over the headings in most modern browsers displays a tooltip with the English equivalent.

If you look at the code on any page (I could not get Blogger to display it here), there is a span with the class hide. This is for users of screen readers, and ensures that the verbal rendering of the headings remains consistent with other headings on the site.

The other major addition is some bi-lingual pages. The about us page is an example.

There are links at the top of the page that allow visitors to select which language they want, or to see both side-by-side.

The page is laid out with a series of alternating divs - one for each language - and they are styled with CSS to sit alongside each other.

The links use Javascript to show and hide the sections.

To return to the original design at the end of the week, it'll be a simple matter to restore the CSS and swap a few bits of text in the master site template.

Wednesday, July 16, 2008

Some interesting statistics

I have just read the news about Firefox download day, so I thought I'd look at Radio NZ's web stats to see what happened to Firefox 3 usage.

[Graph: Firefox 3 usage at radionz.co.nz around download day]

Yep, looks like a lot of people started using FF3, but were they upgrades or new users? This is the graph of the total number of FF users over the same period.

[Graph: total Firefox users over the same period]

There is no statistically significant change.

I've noticed that FF users upgrade quickly compared with IE users. Of version 2 users, 98% are on the latest release - 2.0.0.14 at the time of writing (end of June). Less than 2% of all FF users are still on version 1.

Looking at IE use in June 2008, we can see the position some 20 months after IE7 was launched:

IE7: 62.1%
IE6: 37.4%
IE5.5: 0.26%
IE5.0: 0.13%

The changes in browser use over time are also interesting.

Browser   Oct '06   Aug '07   Jun '08
IE        73.46     69.15     65.61
FF        19.53     23.45     26.99
Safari     4.69      5.03      5.44
Opera      0.82      0.86      1.09

IE use is down and everything else is up.

I thought I'd also look at operating system use over the same period:

OS        Oct '06   Aug '07   Jun '08
Windows   92.69     91.37     90.51
OS X       6.32      7.40      8.02
Linux      0.83      1.01      1.26

I'm also seeing an increase in platforms like PlayStation, iPhone (starting before the release of the 3G in NZ), Symbian, and even the Nintendo Wii.

From a Webmaster's point of view, these changes suggest that my approach to content is going to have to become more platform agnostic as time passes.

Tuesday, July 15, 2008

Dated URLs for Radio NZ audio

At the moment on the Radio NZ site all the audio for a programme appears in the following format:

www.radionz.co.nz/audio/programme/trackname

This is reflected in our CMS, with all of a programme's audio appearing in one folder. Due to the number of audio items we now have in each folder (and growing each day), it is getting difficult to manage.

Starting today, we are migrating to a date-based system for audio URLs. They'll follow this format:

www.radionz.co.nz/audio/programme/year/month/day/trackname

Some programmes may only have a year or year/month structure, depending on how often they are broadcast.

This is being done programme by programme and may take a month or two to complete, as we have to change the pages that list the audio as well.

The first programme to use this format is Saturday Morning with Kim Hill.

One challenge in the process was the task of moving assets from the old structure to the new.

A Squiz developer has written a script that does the job for us.

move_assets_to_dated_folders.php /path/to/radionz 26411 audio_item created day

This command moves all the audio items under asset number 26411 into year/month/day folders based on the date the asset was created. At the moment the created folders are under construction, but I have asked for a parameter so they can be made live instead.
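
The mapping from creation date to folder is straightforward - something like this sketch (illustrative names, not the Squiz script itself):

from datetime import date

def dated_url(programme, trackname, created):
    # Year/month/day folders derived from the asset's creation date.
    return '/audio/%s/%d/%02d/%02d/%s' % (
        programme, created.year, created.month, created.day, trackname)

print(dated_url('saturday', 'example-interview', date(2008, 7, 15)))
# -> /audio/saturday/2008/07/15/example-interview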

We need to do this now, before we extend the length of time audio is available to the public (one to four weeks at present). It won't be possible to move the content later without breaking links.

Sunday, July 6, 2008

Equal Height Columns with MooTools

There are a number of Javascript solutions around for creating equal height columns in a design. They all work in pretty much the same way - you generally specify the required columns in the height function, and run it after the DOM is available.

On the Radio NZ site I recently ported our old column function to use the Mootools library.

Here is the old function:
/* SetTall by Paul@YellowPencil.com and Scott@YellowPencil.com */

function setTall() {
    if (document.getElementById) {
        var divs = new Array();
        var list = new Array('cont-pri', 'cont-sec', 'hleft', 'hright', 'features', 'snw');
        var count = 0;
        var temp;
        for (var i = 0; i < list.length; i++) {
            if ((temp = document.getElementById(list[i]))) {
                divs[count] = temp;
                count++;
            }
        }

        var mH = 0;
        for (var i = 0; i < divs.length; i++) {
            if (divs[i].offsetHeight > mH) mH = divs[i].offsetHeight;
        }
        for (var i = 0; i < divs.length; i++) {
            divs[i].style.height = mH + 'px';
            if (divs[i].offsetHeight > mH) {
                divs[i].style.height = (mH - (divs[i].offsetHeight - mH)) + 'px';
            }
        }
    }
}
The first part of the function places the ids we want in an array, then checks whether each one exists on the page, storing the valid elements in a second array. This check is required because the hleft and hright divs are only found on the home page.

The second loop finds the tallest div, and the last loop changes the height of each div to that value (taking into account any CSS padding).

The Mootools version is much simpler:
function equalHeights() {
    var height = 0;
    var divs = $$('#cont-pri', '#cont-sec', '#hleft', '#hright', '#features', '#snw');

    divs.each(function(e) {
        if (e.offsetHeight > height) {
            height = e.offsetHeight;
        }
    });

    divs.each(function(e) {
        e.setStyle('height', height + 'px');
        if (e.offsetHeight > height) {
            e.setStyle('height', (height - (e.offsetHeight - height)) + 'px');
        }
    });
}
The first call to $$ returns only the divs that actually exist on the page, eliminating the need for the check. The each iterator is then used twice - first to find the maximum height, and then to set all the divs to that height.

Wednesday, July 2, 2008

Loading Content on Demand with Mootools

In my last post I explained some changes I've made to the Radio New Zealand website to improve page rendering times and reduce bandwidth.

In this post I'll explain another change that loads some page content on-demand with an AJAX call, saving even more bandwidth.

One of the fun features of the site is that the content layout changes slightly depending on the width of the browser viewport. When the browser is less than 800px wide, the listener image in the header reduces in size and the menu moves to the left. When it expands beyond 1200px, an extra content column is added to the layout. At the moment this extra column duplicates the features content from the home page, but in the future we could customise it for each section of the site.

The content is shown (or hidden) by dynamically changing the available CSS rules for the page, based on the viewport size.

There is one disadvantage to this approach though - all the required content has to be served for every page request, regardless of whether the end user ever sees it. Based on site stats, the cost of this is non-trivial.

According to Google Analytics:
  • 44% of RNZ visitors have 1024px as their screen width, with most of the rest being higher.
  • We had 700,000 page impressions in June (excluding the home page).
The average size of the features section is 30k, so at least 300,000 delivered pages (44% of 700,000) carried hidden content that the visitor could never see. That is roughly 20% of each of those pages - a lot of waste.

As of today, no more. If you care to look at the source code for this page with your browser set to 1024px wide, you'll see an empty div with the id "features".

When the page width extends beyond 1200px, a JavaScript call fetches the div's content from the server and inserts it into the page.

A simple and effective way to save 8.5 gigabytes of traffic a month (300,000 pages × 30k is roughly 8.5 GB). Combined with yesterday's effort, that's 30 gigabytes of savings a month.

Tuesday, July 1, 2008

Improving page load speeds

When building a site, I try to make pages as small as practicable. The aim is to reduce load times for visitors and so provide a better experience as they move around the site.

There are several reasons for this:

1. There is a correlation between the speed of the site and the perceived credibility of the company and the content.

2. Many New Zealanders are still on dial-up.

3. Bandwidth costs money, which is always scarce (bandwidth is sometimes scarce too).

4. Serving smaller pages puts less load on servers.

I have just made some changes to www.radionz.co.nz to improve page load times and reduce bandwidth.

The first of these changes was to stop using sIFR. Scalable Inman Flash Replacement (to quote its site) "...is meant to replace short passages of plain browser text with text rendered in your typeface of choice, regardless of whether or not your users have that font installed on their systems."

When we launched the new version of the site 18 months ago, sIFR was used to render all headings in one of our corporate typefaces. The system worked very well, but required a small Flash movie (14kb) and some JavaScript (28kb). There was also an annoying bump of a few pixels when the Flash movie was added to the page - something I was never able to fully resolve.

The advantage of using sIFR over the traditional method of using images for headings is that if any text is changed, or new pages are added, the text is automatically replaced. We add dozens of new pages every day, so this is a big time-saver.

As the site has grown in size and the number of visitors increased, the 42 kb download and the slower rendering time started to annoy me. Even when content on a page didn't change and was cached in the user's browser, there was still a delay while the headings were replaced.

Lastly, the typeface did not have any macronised vowels, so it was not possible to correctly set headings in Māori.

So last week I removed sIFR from the site. It was a very tough call, as the sIFR-replaced headings looked really good and added a certain polish to the site. But with any change you have to weigh all the pros and cons, and at this time the benefits to end-users were overwhelming. (There are also some other changes I'm making in the near future that'll be simpler without sIFR, but more about that later.)

Upon removal, the page rendering improvement was immediately obvious on broadband, and I suspect that on dial-up it will be even more marked.

The other side-effects of this change are slightly reduced server loading (from fewer connections) and a reduction in the amount of bandwidth used by around 800 megabytes per day. (We shift about 8 gigabytes of page traffic a day. The audio is many, many times this figure).

The second phase of the speed improvement was to change the way javascript is loaded. On Monday this week I made two changes.

The first was to use Google's new content delivery network to serve MooTools. This JavaScript library is used for a number of effects on the site, such as the accordion in the audio popup (click on the headphones at the top of any page) and all the picture gallery pages.

There are a number of advantages in doing this, summarised nicely by Steve Souders. In a nutshell, Google's servers are optimised to minimise the size of the content, and the content headers are set to encourage caching at the ISP and in the browser. It works even better if other sites use the same library - it increases the likelihood that the content is already cached somewhere, spreading the benefits more widely.

I could have made these changes on our own server, but it doesn't cost anything to support the initiative, so why not? I don't know how many other NZ sites use MooTools, but a lot of the bigger sites use Prototype, and they could gain better site performance, lower bandwidth use, and an improved user experience by adopting this approach.

The second change that I made was to move all our JavaScript to the bottom of the page. This ensures that the HTML and CSS are loaded first and have a chance to render in the browser. This is one of the recommendations made by the Yahoo performance team for speeding up websites.

The difference in rendering is quite noticeable, and on slower connections you can see the scripts continuing to download after the current page is showing.

In the case of the Radio New Zealand site, we've reduced the rendering time for pages by 3-4 seconds and trimmed bandwidth consumption by about 10%. The changes took three hours to plan, test and implement.

At the rate we consume traffic, the payback period is pretty short. Add to that the future savings from not having to replace the servers quite so soon, and the unquantifiable benefit of delivering pages faster to visitors (who also use less of their data caps), and I'd say it was time well spent.