Friday, December 30, 2011

My take on out-of-office replies

About five years ago, when I got back from holiday, I had 1,500 emails in my inbox. I had checked (but not answered) my email throughout my break, and worried about the growing backlog instead of enjoying the rest.

I did two things to fix this problem. The first was to start using operational accounts instead of personal accounts for work directly related to the website. The entire web team have access to these accounts, meaning if someone is away nothing gets missed.

The second thing was to set up an out-of-office reply. The point of this is NOT to tell people when I am back. The point is to provide the email addresses that they should be using instead, and to say who is on call.

The other thing that I added was a note to say that I was not going to read ANY of the email received while on holiday - so if it was important, please resend the email when I got back.

This message caused a bit of controversy, but the effect was dramatic. While on holiday I knew I could completely forget about my email because those sending the mail would do the work for me! They would decide who to contact instead (from info in the reply), or to resend it later.

That first year I took a hard line: emails were trashed automatically on arrival. Over 1,000 were in the trash when I got back, and only one 'important' email was resent. The second year there were about 200.

Now that things have settled down I no longer trash emails, but I still have an auto-response stating who to contact for operational issues. And holidays are so much more relaxing.

Saturday, December 3, 2011

How to Fix Loud TV commercials - Part 2: Measurements

In part one of this series I gave an overview of the problem of loud TV commercials, or put more clearly, the inconsistent loudness of different items in a broadcast.

Before I dive into some solutions to the problem there are some audio concepts to understand. These relate to how audio is produced and measured. Inconsistent measurement and poor monitoring practices are at the core of the problem, so we need to understand these first.

There are two forms of measurement used when producing audio.

Level Meters

A level meter is a device for measuring variations in the electrical amplitude of the audio signal. The two most commonly used meters in broadcasting are the VU meter and the IEC standard PPM. Both measure audio differently, and experienced engineers know how to use them correctly. Measures of level are generally absolute and repeatable.

There are more sophisticated meters that purport to measure loudness as well; I will talk about these later.

Ears

The second, and by far the most important, tool is the human auditory system. The ears combined with the brain are the most powerful psychoacoustic measuring device on the planet.

The human ear has two attributes that bear on this problem.

The first is that the perceived volume of a sound is based on the average loudness over time. According to Wikipedia:
The perception of loudness is related to both the sound pressure level and duration of a sound. The human auditory system integrates (averages) the effects of sound pressure level (SPL) over a 600–1,000 ms window.
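To make the averaging concrete, here is a minimal sketch. It assumes the audio is an array of sample values between -1.0 and 1.0, and computes a plain RMS level over a sliding window using the 600 ms lower bound from the quote. Real loudness meters also weight frequencies; that is deliberately left out, and the method names are illustrative only.

```ruby
# Sketch: average (RMS) level over a sliding window, in dB relative to full scale.
# Assumes samples are floats in -1.0..1.0. No frequency weighting is applied.
def windowed_level_db(samples, sample_rate, window_ms = 600)
  window = (sample_rate * window_ms / 1000.0).round
  raise ArgumentError, "signal shorter than window" if samples.length < window

  sum_sq = samples.first(window).sum { |s| s * s }
  levels = [to_db(sum_sq / window)]
  (window...samples.length).each do |i|
    # Slide the window by one sample: add the newest sample, drop the oldest.
    sum_sq += samples[i]**2 - samples[i - window]**2
    levels << to_db(sum_sq / window)
  end
  levels
end

# Mean-square power to dB (10 * log10), guarding against log of zero.
def to_db(mean_square)
  10 * Math.log10([mean_square, 1e-12].max)
end

# A 1 kHz sample rate keeps the arrays small: 600 samples = 600 ms.
click = [1.0] * 10 + [0.0] * 590   # 10 ms full-scale peak, then silence
tone  = [0.5] * 600                # 600 ms at half amplitude
puts windowed_level_db(click, 1000).first.round(1)  # -17.8
puts windowed_level_db(tone, 1000).first.round(1)   # -6.0
```

The brief full-scale click averages out about 12 dB quieter than the sustained half-amplitude signal, which is the point of the quote: short peaks barely register, sustained level does.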
The second important attribute is that it is optimised for processing speech. Human speech at 1 metre is typically around 60 dBA, measured with a sound pressure meter.

If you play a recording of speech and ask a group of people to set the volume so it is comfortable, the set volume tends to converge on 60 dBA. This applies when watching TV too.

Audio Processing

The two measures above would be fine except for two types of processing that are applied to audio to change the perception of loudness.

The first is equalization (EQ), which boosts or cuts selected frequencies. The aim in doing this is to improve the intelligibility and impact of the sound.

Mixing desks and digital editing systems have very complex controls that allow specific frequencies to be targeted for enhancement, allowing for fine-grained control.

The second is audio compression or limiting. Put simply, this reduces the dynamic range of the audio so that the difference between the loudest and the quietest sounds is reduced. This allows the average volume to be increased.

Both of these are used in commercial production. The voice is EQed and compressed. Any music may also be compressed, and the finished product compressed again. It is common to see commercials with a dynamic range of less than 2dB.
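A compressor's static behaviour can be sketched as a gain curve. This is a simplified illustration with hypothetical numbers: levels are already in dB, and there is no attack or release timing, which real compressors and limiters have. Above the threshold, every dB of input produces only 1/ratio dB of output, and make-up gain then lifts the whole result.

```ruby
# Static compression curve, levels in dB. Above the threshold the excess
# is divided by the ratio; make-up gain is added afterwards.
def compress_db(level_db, threshold_db: -20.0, ratio: 4.0, makeup_db: 0.0)
  out = if level_db > threshold_db
          threshold_db + (level_db - threshold_db) / ratio
        else
          level_db
        end
  out + makeup_db
end

# Dynamic range: the spread between the loudest and quietest levels.
def dynamic_range(levels_db)
  levels_db.max - levels_db.min
end

speech = [-30.0, -20.0, -10.0, -5.0]  # hypothetical short-term levels, dB
squashed = speech.map { |l| compress_db(l, threshold_db: -30.0, ratio: 10.0) }
puts dynamic_range(speech)    # 25.0
puts dynamic_range(squashed)  # 2.5
```

With 22.5 dB of make-up gain the loudest moment is back at -5 dB, but the quietest has gone from -30 dB to -7.5 dB. That is how an item can sound much louder without its peaks being any higher.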

In Practice

The audio of most TV productions has a dynamic range much greater than 2 dB; 15 to 25 dB is more typical. This difference means that the level (on a meter) has to be set lower to avoid overload on the peaks. Commercials don't have any peaks, so they can be set higher.

This is what happens in the average consumer's lounge:

They turn on the TV, and when the programme starts they adjust the volume control so that the speech is at a comfortable volume. As stated above, this will be close to 60 dBA. The dynamic range of the spoken material will be (say) 15 dB, and its level must be set so that any peaks do not cause overloading in the broadcast equipment.

When a commercial with a 2 dB dynamic range is played, it can be set 13 dB higher (15 dB minus 2 dB) without causing electrical overload.

At the consumer's end, this means that content with a reduced dynamic range (like commercials) will sound a lot louder.
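The arithmetic can be written out as a tiny sketch. It assumes, as the simplified explanation does, that both items are lined up so their peaks just reach the same ceiling, and it treats the dynamic range as the gap between the peak and the average level.

```ruby
# Illustrative arithmetic: with peaks held at the same ceiling, the average
# level sits (dynamic range) dB below that ceiling.
PEAK_CEILING_DB = 0.0

def average_level_db(dynamic_range_db)
  PEAK_CEILING_DB - dynamic_range_db
end

programme  = average_level_db(15.0)  # -15.0
commercial = average_level_db(2.0)   # -2.0
puts commercial - programme          # 13.0 dB louder on average
```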

This is a massive simplification, but hopefully it makes sense. I suspect that I'll need to make a companion video to this series to demonstrate things more clearly.

How to fix this?

Most discussion I've seen suggests that this is either a technical problem, or deliberate.

The technical crowd think that the problem occurs because there are no agreed standards. There are standards, and though they are not always followed, I think this is a side-issue because the problem of differences in loudness is operational in nature.

As stated in part one, I don't believe that most TV stations turn up the volume of the Ads. This is an error of omission. The problem persists either because it is not understood and so is ignored, or because there is a belief that Ads must be louder in order to be effective.

Next time I will explain the solution to this problem by first presenting a simplified version, and then applying it to some real-world situations.

If you want clarification on anything here, use the comments section.

How to Fix Loud Commercials on TV - Part 1

This is the first of a series of non-tech posts about the volume of commercials on TV - why they are too loud and how to stop the problem.

The problem of commercials being too loud is the result of an arms race, of sorts, that has its roots in practices that were set decades ago. But it is not a race to the top, to world domination, to commercial success. It is a race to the bottom, to the lowest possible quality and the worst outcome for all.

The problem exists worldwide. The US have passed a law, and in New Zealand it is Labour Party policy. Debates rage on public, amateur and professional forums about the cause of the problem and what can be done about it.

About Me

Before running Radio NZ's web operation I was a recording engineer. I started out as a Trainee Radio Studio Operator in 1981, and I've worked in commercial and public radio, on sports broadcasts, and music recording of all genres from early music through the classics, world music, jazz and rock. I've also recorded film scores, and mastered and re-mastered albums.

In the 80s I set up the audio processing for a couple of Wellington radio stations, 91ZM (now ZMFM) and 2ZB (now NewstalkZB). I was in the fortunate position of having made commercials, done on-air sound mixing (called panel operating in some countries) and worked on the station sound (via audio processing). This allowed me to try out my ideas for improvement on-air, and hear the results first-hand.

Sometime in the mid-80s I was asked to contribute to discussions at TVNZ about the problem of loud commercials.

The Problem

I will start by defining the problem from the viewer's (and listener's) perspective.

You turn on the television to watch a programme. You sit down, adjust the volume so that it is comfortable for you, and start to enjoy the programme. As the show moves from scene to scene (assuming there is no ad break) the volume remains comfortable; you can hear everything that is being said, and any music and effects are neither too loud nor too quiet.

Then a commercial break arrives. The volume suddenly increases. It is no longer comfortable to listen to - it is intrusive. During the programme you could have a side-conversation with your fellow viewers. That is now impossible. You reach for the remote and mute the audio.

This is the experience of hundreds of millions of television viewers.

The perception is that someone, somewhere, is turning up the commercials.

So, the problem in a nutshell: the volume of broadcast items is inconsistent, to the point of being disruptive and annoying.


Fixing the Problem

It is very unlikely that anyone, anywhere, is turning up the commercials. I've certainly never seen it.

The problem is caused by a number of technical and operational factors, and is (probably) rounded off by management being unwilling to deal with the issues for 'competitive reasons'.

The problem is complex, and I want to make the explanation accessible to people outside the audio industry. I'll spend a few posts looking at various aspects of the problem, explaining some of the basics of audio, hearing and listening before moving onto solutions.

Please leave any questions, or things you want explained, in the comments.

Sunday, November 27, 2011

Latest Radio NZ Browser stats

The latest browser stats for www.radionz.co.nz show some interesting changes when compared to previous years.

Browser (%)   2011   2010   2009   2008
IE            41.2   50.6   56     63
Firefox       23.2   25.52  27.5   27.73
Safari        5.6    13.1   10     5.66
Chrome        13.8   8.75   4.2    1.47
Opera         0.7    0.9    1.02   1.08

IE is in decline, and IE6 is currently 3.6% of total browser share. Operating system use is also changing.

OS (%)        2011   2010   2009   2008
Windows       72     81     84.8   89.3
Mac           15.6   14.2   12.6   8.5
Android       3.6    0.3    0.02   0
iPhone        2.5    1.4    0.56   0.19
iPad          2.4    0.63   0      0
Linux         1.53   1.4    1.45   1.72
iPod          0.5    0.35   0.22   0.08

Windows use is dropping, and mobile operating systems are on the rise. Mac use in 2005 was less than 5%.

The breakdown of mobile device share (% of mobile visits) in the last month:

Android - 38
iPhone - 27
iPad - 26
iPod - 5.33
Blackberry - 0.8
Symbian - 0.7
Sony - 0.19
Windows - 0.12
Nokia - 0.11
Windows Phone - 0.1

iPad users spend on average twice as long on the site.

Tuesday, November 22, 2011

Over on Geekzone Mauricio has posted page speed info for his website.

Here is similar information for Radio New Zealand.

Average web page load time distribution in New Zealand:

All regions: [chart: page load time distribution]

Slowest cities: [chart: page load time distribution]

The site runs Rails 3.1.x on a server hosted at ICONZ.

The key thing we've done to make our site fast is to reduce over-the-wire times by bundling assets, using compression where we can, and setting far-future headers on static content so that it can be cached by browsers and proxies. The markup is also as clean as we can make it to ensure fast rendering.
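As a rough illustration of these three techniques, the sketch below uses only Ruby's standard library. It is not our production configuration (the Rails 3.1 asset pipeline handles this for us); the method names and the one-year cache lifetime are assumptions. Putting a digest of the content into the file name is what makes a far-future expiry safe: when the content changes, the URL changes too.

```ruby
require "digest/md5"
require "zlib"
require "time"

# Assumed cache lifetime for static assets: one year, in seconds.
ONE_YEAR = 365 * 24 * 60 * 60

# Bundling: many small files become one response.
def bundle(contents)
  contents.join("\n")
end

# Fingerprinting: a digest of the content in the file name gives new content a new URL.
def fingerprinted_name(filename, content)
  ext = File.extname(filename)
  base = File.basename(filename, ext)
  "#{base}-#{Digest::MD5.hexdigest(content)}#{ext}"
end

# Far-future headers: safe because a fingerprinted URL never changes content.
def far_future_headers(now = Time.now)
  {
    "Cache-Control" => "public, max-age=#{ONE_YEAR}",
    "Expires" => (now + ONE_YEAR).httpdate
  }
end

# Compression: gzip the bundle once, rather than on every request.
def gzip(content)
  Zlib.gzip(content)
end

css = bundle(["body { margin: 0 }", "p { color: #333 }"] * 50)
puts fingerprinted_name("application.css", css)  # application-<32 hex digits>.css
puts gzip(css).bytesize < css.bytesize           # true
```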

Saturday, August 27, 2011

Radio NZ's embedded player

Last week Radio New Zealand released its new embedded player to the general public.

The graphics were designed by Clemenger BBDO in Wellington and implemented in Flash by PixelDepth in Auckland. The player (and server software) took 3.5 person-days to code, implement and test.

The player can be used by anyone (subject to terms of use) to embed Radio NZ audio content on their web pages.

While some people will be disappointed that the player does not work on iOS devices, the decision to make this first version in Flash was carefully considered. The core web team at RNZ is small - there are just three of us working on a range of projects. At the moment we are working on a complete redesign of the site, and on replacing our CMS, not to mention planning web coverage for the upcoming election.

A few years ago I started work on an HTML5 based player for our audio. The code is still available on Github. One of the impediments to completing this project was the lack of time to debug the application. At the time browser support for HTML5 was patchy. Building that player gave me a good idea of the level of engineering required for a complete package. I take accessibility very seriously. Any player has to work on all platforms, and for screen readers.

I have started planning release 2 of the embedded player, but it will need thorough testing before release, and we don't have the time to do that right now. I'm sure you all understand!

The new CMS is providing the data-backend for the new player, and audio content is delivered from our content delivery network. Both of these are completely platform agnostic. The iframe technique we are using means the player can be updated in future, with the changes appearing everywhere it is embedded.

The player is already being used more widely than I'd expected: by other public broadcasters like NPR, special interest sites like Treetools and Cycling Wellington, and blogs, not to mention local news websites.

Sunday, July 10, 2011

Rebuilding Radio NZ - Part 12: Migrating Episodes

The migration of programme episode content from MySource Matrix into our new Rails-based CMS has proven to be the largest and most complex task. This week I am going to dive heavily into the code we used to do this.

For most programmes we maintain a library of content dating back to the start of 2008; some go back further. There are approximately 10,000 episodes from dozens of programmes, each with links, images and embedded video. The images were all stored in Matrix and also had to be transferred.

Because Matrix pages are only assembled when they are requested publicly, the most practical option was to download each page and scrape the content. We have been very consistent in the HTML mark-up used on the site, so this would be reasonably successful.

The first task was to get an inventory of URLs for each programme. A manifest file was created for each programme that gave a list of all the unique episode URLs for that programme and, if it existed, the episode summary from the metadata. I was not able to rely on programme schedules to get this information because some programmes run specials (Morning Report and Checkpoint) and others were cancelled due to Civil Defence events.

In Matrix I used one Asset Listing for each programme to generate a manifest in XML, and wrote a script to read each of these and cache them locally. Once available, a second script requested every URL listed in the manifests and cached these as well.

Caching the files locally allowed much faster test runs, eliminated unnecessary load on the server and also allowed me to quickly make small tweaks to the HTML if required.
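A minimal sketch of the caching step, assuming the manifests have already been parsed into plain arrays of URLs. The PageCache class and the injected fetcher are illustrative names, not the project's code; the fetcher could be any callable, for example a lambda wrapping Net::HTTP. Because a page is only fetched when no cached file exists, a hand-edited cache file survives later runs.

```ruby
require "digest/md5"
require "fileutils"

# Local page cache: each URL maps to a file named after the MD5 of the URL.
class PageCache
  def initialize(dir, fetcher)
    @dir = dir
    @fetcher = fetcher   # hypothetical: any object responding to call(url)
    FileUtils.mkdir_p(dir)
  end

  def path_for(url)
    File.join(@dir, Digest::MD5.hexdigest(url) + ".html")
  end

  # Return the cached copy if present; otherwise fetch once and store it.
  def get(url)
    path = path_for(url)
    return File.read(path) if File.exist?(path)

    body = @fetcher.call(url)
    File.write(path, body)
    body
  end
end

# Warm the cache for every URL listed in a (pre-parsed) manifest.
def cache_manifest(urls, cache)
  urls.each { |url| cache.get(url) }
end
```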

The next step was to extract the core content from the page. I used Nokogiri as my tool of choice for this task.

All body content on the site sits within a div with id #cont-pri. After this block was extracted from the page, unwanted elements were removed from the DOM.

Audio was removed completely, as this is linked to episodes by association, and rendered dynamically by ELF. Other content to be added later (such as promotions of future content) was also removed.

Host information (in a paragraph with the host class) was extracted, and the spelling of some names was corrected. For each programme I ran the script with debugging code in place to list all episodes where the host could NOT be extracted. Once identified I updated the cache files, or altered the import routine to allow for the variation.
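The original script used Nokogiri for this. The sketch below swaps in a regular expression so it runs with the standard library alone, which is only workable because the markup was so consistent. The corrections table and the names in it are hypothetical. The second method is the kind of debugging report described above: every episode where no host could be found.

```ruby
# Hypothetical spelling corrections: as-published name => correct name.
HOST_CORRECTIONS = {
  "Jon Smith" => "John Smith"
}.freeze

# Pull the text of <p class="host">...</p>, strip inner tags, fix spelling.
def extract_host(html)
  match = html.match(%r{<p class="host">(.*?)</p>}m)
  return nil unless match

  name = match[1].gsub(/<[^>]+>/, "").strip
  HOST_CORRECTIONS.fetch(name, name)
end

# Debug report: URLs of episodes (url => html) with no extractable host.
def episodes_missing_host(pages)
  pages.reject { |_url, html| extract_host(html) }.keys
end
```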

During the actual import the audio and relevant presenter were associated with each episode and saved.

Images

The import script also iterated over all images in the content, cached them locally, then uploaded them into ELF, and changed the link in the HTML to point to the new image.
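One way to do the link rewriting, sketched with a regular expression and an injected uploader. The uploader stands in for "cache locally, then upload into ELF" and is hypothetical: it receives the old image URL and returns the new one.

```ruby
# Replace the src of every <img> with whatever the uploader returns for it.
# Assumes double-quoted src attributes, as in consistent WYSIWYG output.
def rewrite_images(html, uploader)
  html.gsub(/(<img\b[^>]*\bsrc=")([^"]+)(")/i) do
    "#{$1}#{uploader.call($2)}#{$3}"
  end
end

to_elf = ->(old_url) { "/elf" + old_url }  # hypothetical uploader
puts rewrite_images('<p><img alt="x" src="/matrix/a.jpg"></p>', to_elf)
# <p><img alt="x" src="/elf/matrix/a.jpg"></p>
```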

The import routine was designed to run on the same content repeatedly without creating duplicates. This made it simple to re-parse and import the content again if something incorrect was found later.
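Re-runnable imports usually come down to keying each record on something stable from the source. This sketch assumes the original page URL is that key and uses a Hash in place of the database table; in the Rails app the equivalent would be a find-or-initialize lookup on a unique column. Class and method names are illustrative.

```ruby
# In-memory stand-in for the episodes table, keyed by source URL.
class EpisodeStore
  def initialize
    @by_source = {}
  end

  # Create the episode on first import; update it on every later import.
  def import(source_url, attributes)
    episode = @by_source[source_url] ||= { source_url: source_url }
    episode.merge!(attributes)
  end

  def count
    @by_source.size
  end

  def find(source_url)
    @by_source[source_url]
  end
end

store = EpisodeStore.new
store.import("/ntn/2010-06-16", body: "first pass")
store.import("/ntn/2010-06-16", body: "after a markup fix")
puts store.count  # 1
```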

As it happens we did find a couple of markup issues and ran the importer on our live system, under normal load with people browsing the content.

I have posted the code on github. I make no excuses for the ugliness or levels of hackery. This code is of the get-it-going-fast-run-it-once-and-throw-it-away-while-also-learning-ruby variety. Don't expect too much.

The clean_up_episode_html function is of interest, as this is where the captured HTML is cleaned up. A series of regular expressions is used to find and replace mark-up generated by the WYSIWYG in Matrix that is not what we'd ideally want.

The HTML is then sent to HTML Tidy, and a smaller set of regexes is used to fine-tune the code.

This function was fine-tuned by running episodes through it and watching the output for anomalies. Existing regexes were adjusted, or new ones added, to clean the output.
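The shape of such a clean-up function can be sketched as an ordered list of pattern and replacement pairs, applied one after another. The rules shown are hypothetical examples of typical WYSIWYG cruft, not the ones in clean_up_episode_html, and the HTML Tidy step is left out. Order matters: stripping spans first lets the empty-paragraph rule catch a paragraph that only held an empty span.

```ruby
# Hypothetical clean-up rules, applied in order.
CLEANUP_RULES = [
  [/<span[^>]*>(.*?)<\/span>/m, '\1'],       # drop presentational spans
  [/\s+style="[^"]*"/, ""],                  # strip inline styles
  [/<p>(?:\s|&nbsp;)*<\/p>/, ""],            # remove empty paragraphs
  [/<b>(.*?)<\/b>/m, '<strong>\1</strong>']  # prefer <strong> over <b>
].freeze

def clean_up_html(html, rules = CLEANUP_RULES)
  rules.reduce(html) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

dirty = '<p style="color:red"><span class="x">Hi</span> <b>there</b></p><p>&nbsp;</p>'
puts clean_up_html(dirty)  # <p>Hi <strong>there</strong></p>
```

Tuning then means adding a rule, re-running the cached episodes, and inspecting the output, as described above.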

Now we have most regular Radio NZ programmes running in ELF: Morning Report, Nine To Noon, Midday Report, Afternoons, Nights, Saturday, This Way Up, Sunday and Arts on Sunday.

Many of Radio NZ Concert's programmes are in ELF too: Upbeat, Music Alive, and many others.

Now that most of the hard transfer work is over, I am focussing on building out functionality to support programmes that don't fit existing patterns. Examples are Enzology and New Horizons.

I am also working on improving the administration section - more programmes means more users and this is generating good ideas for improvement.

I'll be posting only fortnightly on this topic from now on, as I have nearly caught up with the work currently being done on ELF.

Tuesday, June 28, 2011

Rebuilding Radio NZ - Part 11: Editing Episodes

I am jumping ahead to compare our new episode editing interface with the old one, leaving the implementation and migration of episodes for next time.

This is the top section of the programme episode editor in ELF, showing Nine To Noon for 16 June 2010:


It shows a link to the host, the current status of the page and edit/trash buttons. Audio for the episode is displayed on the page, and can be edited directly from there. In edit-mode text content is entered directly into a WYSIWYG, and the host can be changed.


We are using the CK Editor in ELF, and this is very good at removing HTML cruft from content pasted in from Word and Outlook.

In addition to the native clean-up, I have added a formatting function for the body. This sends the body content back to the server to be pre-formatted by ELF's built-in parser/formatter. The parser takes the input HTML and returns content formatted in a standard way (for example, bold for lines that start with a time). A future version will add links when RNZ programmes are mentioned in the heading.
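A sketch of the "bold for lines that start with a time" rule. It assumes plain text with one paragraph per line and times written like 9:05 or 10.45; the real ELF formatter works on HTML and is not shown here.

```ruby
# A line counts as a time line if it starts with e.g. "9:05" or "10.45".
TIME_AT_START = /\A\d{1,2}[:.]\d{2}\b/

def format_body(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |line|
    line.match?(TIME_AT_START) ? "<p><strong>#{line}</strong></p>" : "<p>#{line}</p>"
  }.join("\n")
end

puts format_body("9:05 Interview\nA chat about books\n\n10:30 News")
# <p><strong>9:05 Interview</strong></p>
# <p>A chat about books</p>
# <p><strong>10:30 News</strong></p>
```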

This may seem a pretty minor feature, but it saves a huge amount of time on a core content task. That is a primary design concern in ELF - eliminating mundane tasks so that users of the system can work on value-added tasks.

Images can be added via a button in the WYSIWYG, and uploaded directly via an image browser. The image browser immediately shows only those images for the current programme.

I should note that everything I have shown is a work-in-progress. The body parsing function was updated last week to just parse the selected content to allow new content to be added to existing content and formatted in place.

ELF generates all programme information, such as the date, host and broadcast times, on the fly for every page.

This process differs a lot from the interface in Matrix. Navigation to an episode is via the tree, and the episode is stored in a standard page (shown here without the tree):



The name of the programme, the host for the day and the broadcast times are text content. The audio content is inserted into the page only when it is publicly requested, and cannot be seen in this view (it can be previewed, but not edited from there). The status is on another screen, accessed via a drop-down menu or by right-clicking on the asset tree, as is the field for the programme summary.

Pasting into the WYSIWYG is hit and miss (in our version), often requiring manual editing even after the built-in cleaner is applied. I suspect we are stricter than most in what we'll accept in our markup.

Images are loaded in a different part of the tree (or via a simple image upload interface). Once images are loaded they can be added via a button in the WYSIWYG. This launches the image browser, which has a tree for navigation.

General versus Custom

Looking at the two approaches, it is clear that in our case an interface that closely matches our workflow and excludes superfluous options results in a better user experience. This was one of the key drivers for change. The cost of the bespoke approach is that the system has to be designed, coded, maintained and supported.

The Matrix approach provides a large number of general tools, allowing sites to be built without any code being written at all. The functionality to build most things is built right in. The downside is that in the administration interface it is not possible to hide unused functionality, and it may not be easy to group related content together in a way that matches business processes. The new version of Matrix (the Mini) does a very good job of fixing both these issues, and the integrated context-sensitive help is very impressive.

Iterations

The ELF interface I have shown is the product of dozens of iterations, many of these based on feedback from colleagues. The first round of feedback came from the web-team - they were the first to use the interface for day-to-day work. After the first round of improvements we've started training producers on the system. Training sessions have ranged from 5 to 30 minutes, and during these we've taken notes on what didn't work so we can improve the system further. 

An example is ELF's episode navigation. It is fine for daily programmes where the navigator has five out of seven days linked. It does not work so well for weekly programmes where one in seven links is active, or yearly programmes where one in 365 links is active. In these cases there is too much scrolling, and in the case of yearly programmes it is hard to find the previous episodes. I don't have a solution yet.


Some other rough edges

While the interface is fine for once-a-day use by producers, there are some problems that only became evident when using the system all the time and repeating the same actions again and again in a short time frame.

While setting up the documentaries section in ELF I noticed that the workflow was not right - for example the layout of settings on the programme setup page was not intuitive. Not all programme types require all options, and the page needs to have these shown and hidden dynamically in response to changes to the programme type setting.

After using the interface many times, the overhead of having to hunt for the right thing to click or set starts to add up. The benefit of a bespoke system is that these things can be changed.

In the next installment I'll cover the huge task of importing thousands of past programme episodes along with audio links, images and presenter information. Stay tuned!

Saturday, June 18, 2011

Rebuilding Radio NZ - Part 10: Going treeless and Modules

One of the biggest time wasters (for us) in Matrix has been navigating the admin section of the site via a tree. Trees can be useful for many tasks, and for seeing the hierarchy of assets (and URLs). But when the number of items in the tree increases above a certain size it becomes harder to get to the one you want.

I should note that Matrix does have a 'simple edit interface' - a way to directly access the editing screen for a particular page - but we've never used it because the site was (and is) growing too fast to set these up for sections and pages. Work is done by site admins (site-wide) or producers (one programme), so the fit was not quite right.

Take Nine To Noon as an example. This programme has 1200 child pages, one for each day’s episode (see image). To get to the current day’s page you have to open the Site node, scroll down, open the station node (National), scroll down and open the programme node (Nine To Noon), wait for the children to load and display, then scroll down to the bottom of the list. Whew! You can use key-strokes to do all this, but the loading and scrolling time is still high. Moving between programmes is even more unwieldy.

Another example is the audio folders. Nine To Noon currently has over 7,000 audio items. These were once stored in a single folder. When we got to about 3,000 items we had to move everything into dated sub-folders - year, then month, then day. A script was written (now part of the Matrix core command line utils) to move existing items into dated subfolders.

This structure made it simpler to get to items based on their broadcast date, but much slower to move between assets. It also added unnecessary URL segments to the audio path.
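The dated sub-folder layout is easy to derive from a broadcast date. This sketch (folder and method names hypothetical, not the Matrix utility) computes the target folder and groups items by it, which is essentially what a move script has to plan before moving anything. It also makes the cost visible: three extra path segments per audio URL.

```ruby
require "date"

# Target folder for an item: root/YYYY/MM/DD from its broadcast date.
def dated_folder(root, broadcast_date)
  File.join(root, broadcast_date.strftime("%Y/%m/%d"))
end

# Plan the moves: folder => names of the items that belong in it.
def plan_moves(root, items)
  items.group_by { |_name, date| dated_folder(root, date) }
       .transform_values { |pairs| pairs.map(&:first) }
end

puts dated_folder("audio/ntn", Date.new(2011, 6, 18))  # audio/ntn/2011/06/18
```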

We are adding and editing a lot of content every day. Moving around a tree has a very high operational overhead given the large amount of existing content.

In the early stages of the project Nigel (from AbleTech) and I debated using a tree for ELF. Nigel was dead against it, I was less so provided it could be designed well. I agreed to go treeless for the admin interface and to-date one has not been needed. Nice one Nigel!

The biggest problem with abandoning trees though, is the question of what to replace them with. The major navigation pain-point for us is navigating content by broadcast date.

Most developers' initial answer to the date navigation question looks like this:


There is nothing inherently wrong with these date pickers - they are used in a few places in ELF's admin section. Personally I find them too busy, they all work slightly differently, they require a lot of markup and solid engineering to be accessible, and they make you think. Don’t make me think. The solution we came up with involves a sliding date range selector, powered by jQuery and ajax:



Clicking the left or right arrows takes you forward or back. Valid content links are passed into the widget via an ajax call. The dark red button is today, beige buttons are days with content, and the blank buttons have none.
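The server-side half of such a widget can be sketched as building a fixed-size window of days, each tagged with the state that decides its button colour. The window size, the state names and the method names are assumptions; in ELF the days with content arrive via an ajax call, and the rendering is done with jQuery.

```ruby
require "date"
require "set"

# One window of days, each tagged :today, :content or :empty.
def date_window(first_day, days_with_content, today:, size: 7)
  content = days_with_content.to_set
  (0...size).map do |i|
    day = first_day + i
    state =
      if day == today then :today
      elsif content.include?(day) then :content
      else :empty
      end
    [day, state]
  end
end

# The arrow buttons move the whole window back (-1) or forward (+1).
def shift_window(first_day, direction, size: 7)
  first_day + direction * size
end
```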

This widget is used right through the admin section of ELF to move between programme episodes, schedule events, and highlights, and is proving both fast and intuitive.

Modules

As the number of asset types increased (Audio, Episodes, Highlights, Schedule Events) we ended up with some duplicated code in our models. These were often slightly different implementations of scopes, coded at different times by different people, all intended to do the same thing, and all tested in slightly different ways. The answer to this problem was to extract the shared functionality into modules.

The first of these modules covered trash. Every trashable asset has a field ‘trashed’, and the module provided consistent methods for trashing and restoring items, and a scope to skip these items in queries.
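A plain-Ruby sketch of the idea. In a Rails app this would be an ActiveSupport::Concern with a real scope on the trashed column; here a class-level array stands in for the table so the example runs on its own. Apart from trashed, the names are hypothetical.

```ruby
# Shared trash behaviour for any class that includes it.
module Trashable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Stand-in for the database table.
    def all_records
      @all_records ||= []
    end

    # Equivalent of a scope that skips trashed items.
    def not_trashed
      all_records.reject(&:trashed)
    end
  end

  def trash!
    self.trashed = true
  end

  def restore!
    self.trashed = false
  end
end

class Episode
  include Trashable
  attr_accessor :title, :trashed

  def initialize(title)
    @title = title
    @trashed = false
    self.class.all_records << self
  end
end
```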

The second module was for status: live (public) and under construction (hidden). This wraps the published_at field and controls visibility.

The third module contained scopes for selecting items based on broadcast date and time - latest, now, between, and for (a specific date).
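A sketch of shared date helpers, written as plain functions over items that respond to broadcast_at. In the app these were scopes; the precise meaning given to each one here (for example, a half-open range for between) is an assumption, and "now" is left out because it depends on knowing when an item ends.

```ruby
require "date"

# Hypothetical item type with a broadcast time.
Item = Struct.new(:title, :broadcast_at)

module BroadcastScopes
  module_function

  # Most recent item that has already gone to air.
  def latest(items, now)
    items.select { |i| i.broadcast_at <= now }.max_by(&:broadcast_at)
  end

  # Items broadcast in the half-open range [from, to).
  def between(items, from, to)
    items.select { |i| i.broadcast_at >= from && i.broadcast_at < to }
  end

  # Items broadcast on a given calendar day (in the times' own zone).
  def for_date(items, date)
    items.select { |i| i.broadcast_at.to_date == date }
  end
end
```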

Having the code in modules simplified testing and fixed some bugs caused by subtle differences between the original implementations. The key lesson is to look for patterns and common code, and extract them into modules.

I put the modules into lib, however DHH published a Gist a week ago suggesting their own folder. And if you are not using ActiveSupport::Concern for this sort of thing, you should be. Here is an explanation.

Rails 3.1

Over the last few weeks I have started work on moving the app to Rails 3.1. The feature that is initially of most interest to us is the asset pipeline. DHH first mentioned this in his 2010 RailsConf keynote, and this talk was the inspiration for our CSS Views Gem, coded by Koz of Southgate Labs and released on GitHub.

The asset pipeline is a no-brainer. It allows better organisation of CSS and Javascript, and packaging and compression/minification in production. CSS and Javascript are now first-class citizens in the framework  and can be mixed in with Erb, SASS, CoffeeScript or the language parser of your choice.

Having a fast site is important to Radio NZ - I have spent many hours improving markup and streamlining how content was packaged and served - all to great effect. Over the last few years we've reduced page size by 30% and halved client-side rendering time. Doing this in Matrix has been quite hard due to the way CSS and JS content handling is abstracted. You have to get under the hood and hard-code stuff in the Matrix templates to get the highest benefits.

Rails on the other hand is moving towards being 'fast by default'. A selection of sensible and safe best-practices that will work for most use-cases are baked into the framework and turned on by default.

Next time I'll look in more detail at the editing interfaces in the ELF admin section, and compare them with the work-flow in Matrix.

Friday, June 17, 2011

Upgrading to Formtastic 2.0

I have been using Formtastic 1.2 for the Radio NZ site rebuild. I have a number of custom input types, customised via extending SemanticFormBuilder and adding my own methods. Justin outlined the changes a few months ago.

Formtastic 2.0 RC just got released, and Justin points out:
Folks who subclassed SemanticFormBuilder and created their own custom inputs as methods will be in for some pain.

OK, I have been through the pain, and it wasn't so bad. Here is what I did:

1. Changed all instances of Formtastic::SemanticFormBuilder in the formtastic initializer file to Formtastic::FormBuilder.

2. Removed the custom builder declaration in the config file.

3. Moved my old custom methods into a module.

(I added include FormtasticExtensions to the top of the initializer file.)

4. Converted my methods to classes.

I have posted a gist of the converted module with the old methods below the new classes.

You should also change the setting of all_fields_required_by_default to false if you did not have it set before, otherwise you'll find that forms that once saved will have HTML5 warnings for all fields. The other gotcha is that validates_length_of without validates_presence_of in the model makes the field required. This might not be what you expect!

My impression of the update is good. The custom code is a lot simpler, and less hackery is required to get what you want.

I found one issue I am still working on fixing - new forms that use the CK Editor for text areas won't save. This happens if the field is required. CK Editor does not write any content to the text area, so the form won't save. A work-around is to mark the field as :required => false and rely on the server-side validation to display the error to the user after trying to save.

Hopefully this helps others with their transition to 2.0.

Saturday, June 11, 2011

Rebuilding Radio NZ - Part 9: Highlights

Each week Radio NZ publishes highlights of upcoming programmes for print media, and these items are also used on the website. On the website these highlights are augmented by content from programmes with production cycles that make it impossible to meet print deadlines.

Highlights appear on the Radio NZ home page, National home page, and Concert home page. Highlights are also displayed on programme pages, for example at the bottom of the Insight page. Their use on programme pages meant this section had to be migrated before I could start on programmes.

Major refactoring

Highlights and Schedules were based on the same model (schedule_events), with a few fields reserved for the exclusive use of one or the other. Schedules had been running for a few months, but when I came to actually use highlights, some extra features were needed. Because the needs of the two were diverging, I decided to split them into separate models.

This made the code simpler to read and avoided having to hide different fields in the edit screens. The other major refactor was the removal of a base model - content - as an association. The content table held common fields such as body, broadcast time, and status.

It is my view that this approach - of using a single table to hold data that is shared between other assets - is almost always wrong. It makes the code more complex and harder to follow. There is an extra join for every database query, and this makes indexing and optimisation more complex. Yes, there is duplication between models - many have a body field - but the benefit is clarity and ease of maintenance. In our case we wanted to keep things simple and avoid overhead.

To avoid code duplication, the functionality built on these fields is extracted into Modules. I'll cover that in a future post.
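As a rough illustration of sharing behaviour through a module rather than a shared table, here is a plain-Ruby sketch. The module name `HasBody`, its `plain_body` and `summary` methods and the 20-character limit are all hypothetical; in ELF the models would be ActiveRecord classes, each with its own `body` column.

```ruby
# Hypothetical module: behaviour built on a +body+ attribute, shared by any
# class that defines its own +body+ (no shared "content" table involved).
module HasBody
  # The body with HTML tags stripped (a naive regex is enough for a sketch).
  def plain_body
    body.to_s.gsub(/<[^>]+>/, "")
  end

  # A short teaser built from the plain-text body.
  def summary(limit = 20)
    text = plain_body
    text.length > limit ? "#{text[0, limit].rstrip}..." : text
  end
end

# Each model keeps its own body field; only the behaviour is shared.
Highlight = Struct.new(:title, :body) { include HasBody }
Recipe    = Struct.new(:title, :body, :serves) { include HasBody }
```

The duplication is limited to a column name in each table, while the code that uses it lives in one place.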

Migration

There was no need to migrate historical content; only the content for the next few weeks is available on the site. This was extracted via the same XML technique mentioned previously, and imported into ELF. For a short period we generated XML content feeds in ELF for use on some Matrix-generated pages.


Implementation

In Matrix highlights are displayed based on their location in the tree. In our case we had folders inside the main highlight folder (right). Each of these programme highlights folders had to be linked to each programme folder. Linking is basically a reference to the same folder.

Each page that required certain highlights had its own asset listing. The disadvantage of all these asset listings is that they each had their own cache expiry based on the most recent build time. It was quite common for a highlight to be added and to appear on some pages but not on others for up to 20 minutes (the cache time).


In Matrix you can keep all assets of one type in one place, but this almost never happens in practice.

In ELF all highlights are (obviously) stored in the highlights table. The relationship to each programme is an explicit association:

  belongs_to :programme

This allows us to display highlights by programme or by station (every programme has a station). This is much simpler than a tree-based system, as you can always tell how many highlights a programme has.
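To show why display by programme or by station falls out of that one association, here is a plain-Ruby sketch with Structs standing in for the models. Only `programme` comes from the post; the `station` field on the programme and the helper names are assumptions, and in Rails the station lookup would be a database join rather than an in-memory `select`.

```ruby
Station   = Struct.new(:name)
Programme = Struct.new(:name, :station)     # every programme has a station
Highlight = Struct.new(:title, :programme)  # mirrors belongs_to :programme

# Highlights for one programme: a direct test on the association.
def highlights_for_programme(highlights, programme)
  highlights.select { |h| h.programme == programme }
end

# Highlights for a station: go through each highlight's programme.
def highlights_for_station(highlights, station)
  highlights.select { |h| h.programme.station == station }
end
```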

Instead of multiple asset listings for duplicated HTML, there is one HTML template (a partial) for the display of all highlights in programme or station context.

Changes to highlights in ELF update everywhere as soon as they are saved.


Highlight Administration

In Matrix, adding highlights was a multistep process.
  1. Go to the correct folder
  2. Create a News Item
  3. Enter a summary and body on the details screen
  4. Change the created date and time to match the broadcast time of the asset  
  5. Update the metadata to provide a link for the RNZ email newsletter
  6. Make the item live
In step 4 we were faking the broadcast time in the created field. Everywhere that highlights appeared, they were sorted by created date and time, and only those after 'today' were displayed.
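The Matrix workaround amounts to a filter-and-sort on a date field. Here is a minimal sketch of the same display rule, assuming a dedicated (hypothetical) `broadcast_at` field instead of an overloaded created date, and reading "after today" as "from the start of today".

```ruby
require "date"

Highlight = Struct.new(:title, :broadcast_at)

# Keep only highlights broadcast from the start of +today+ onwards,
# soonest first. +today+ is a Date; broadcast_at is a Time.
def upcoming(highlights, today)
  start_of_day = Time.new(today.year, today.month, today.day)
  highlights
    .select { |h| h.broadcast_at >= start_of_day }
    .sort_by(&:broadcast_at)
end
```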

Individual programmes could enter their own highlights by adding News Items to a folder in their part of the asset tree (as in this example from Saturday Morning).



ELF Administration

In ELF there are two contexts for adding highlights. The administrator context allows highlights to be entered for any programme, while the programme context allows individual producers to add highlights for just their programme. This is the edit screen:




Everything is on one page, and images can be uploaded directly without having to go to another screen. There is also a widget to move between days (it is not a tree), and I will reveal this in a future post.

Adding highlights is much faster in ELF. I taught a producer to add their own highlights last Friday; it took less than 30 seconds to create and add a highlight.

These are the benefits of a well-designed bespoke system - simpler maintenance of content, faster updating, and less confusion for users.

Next time I'll cover going tree-less, and extracting functionality to Modules.

Sunday, May 29, 2011

Rebuilding Radio NZ - Part 8: Dealing with doubt

Any change is hard. Changing from technology you've known or used for a long time comes with a range of emotions which, if not dealt with, can derail a project as quickly as any technical problem.

During the course of the project to replace MySource Matrix with a bespoke solution (ELF) based on Rails, I've experienced doubt, frustration and regret. Doubt that we'd ever finish the project, frustration at the lack of progress at times, and regrets about the past.

Doubt

Twice, so far, the complexity of ELF development has peaked, and with it doubt has set in. I really did wonder if we were going in the right direction, and if we'd be able to implement all of the functionality we needed.

In theory, the main advantage of the agile development process is building on small wins and the ability to correct mistakes early. Building on the success of previous iterations helps reinforce that you are on track. In practice this is also true, but at times I found myself comparing the two systems, even months after we'd decided to make the change and had devoted weeks to coding the new system.

In the early stages of the project ELF couldn't do that much, whereas Matrix was still doing everything, so the comparison was not very favourable. Things that were quite simple to build in Matrix were requiring quite a lot of thought to implement in ELF.

The reason is that Matrix encapsulates certain high-level patterns in the form of assets. These assets can be bolted together to create complex public-facing pages. In designing ELF we had to look for lower-level patterns, and this took time and effort. Finding the right patterns (and replacing out-dated patterns with better ones) is a constant process.

Once I realised this - that the design of ELF was going to evolve and improve, and that we had complete control of our destiny - the doubt dissipated. We can have the system any way we want it. If a pattern is wrong, we can change it. If some code is slow, we can refactor it. None of these options were available to us with Matrix. (I should note that this is about trade-offs - complete control of the software stack and all that entails versus an out-of-the-box plug-and-play but not quite right fit).

The other area where doubt set in was the migration of content. I started looking at the whole site - tens of thousands of pages, images and audio items - an enormous task. To get over this, I broke the content down into small pieces.

An example of this was image migration. An image linked from a page in Matrix has to be moved over to ELF and relinked in the HTML. I broke this down as follows, getting each step right before moving on:
  • Fetch an image from a URL and cache it
  • Add this image to ELF returning the new URL
  • Get the HTML for a page and cache it
  • Parse the HTML looking for images.
  • Fetch those images, get the new ELF URL and update the HTML
  • Save the HTML to ELF
Being able to complete smaller tasks in minutes and hours enabled a sense of perspective (no fairy cake required) and this helped build momentum and avoid the doldrums.
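The parse-and-relink steps in that list can be sketched in plain Ruby. This is an illustration under assumptions, not the actual migration code: `fetch` and `upload` are hypothetical lambdas standing in for the HTTP fetch and the ELF upload, a Hash plays the role of the cache, and a regular expression (double-quoted `src` attributes only) stands in for a proper HTML parser.

```ruby
# Matches <img ... src="URL"> capturing the text before, the URL, and the quote.
IMG_SRC = /(<img\b[^>]*?\bsrc=")([^"]+)(")/i

# Unique image URLs referenced in a page's HTML.
def image_urls(html)
  html.scan(IMG_SRC).map { |_open, url, _close| url }.uniq
end

# Migrate every image referenced in +html+ and return the relinked HTML.
#   fetch:  ->(url) { image_bytes }    (hypothetical)
#   upload: ->(bytes) { new_elf_url }  (hypothetical)
#   cache:  old URL => new URL, so each image is migrated only once.
def migrate_images(html, fetch:, upload:, cache: {})
  image_urls(html).each do |old_url|
    cache[old_url] ||= upload.call(fetch.call(old_url))
  end
  html.gsub(IMG_SRC) do
    m = Regexp.last_match
    "#{m[1]}#{cache.fetch(m[2])}#{m[3]}"
  end
end
```

Because each step is a separate function, each can be checked on its own before moving to the next, which is the point of breaking the task down.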

Frustration

Our site is big. Really big. You just won't believe how vastly hugely mind-bogglingly big it is.* We don't have the resources to do a big-bang migration to the new system - as you'll have noticed from the rest of the series, we are doing it section by section and, later on, programme by programme.

At times progress has been slow as other projects related to on-air content have to take priority, and we are also working on a new design.

Having to work in two systems is a pain - ELF is much faster to update and simpler to use than Matrix because it has been optimised for our precise use-case. The key for me has been to look forward, not back. Each week more content is run out of ELF and with additional people using it I get many suggestions for improvements. We really are in control of our destiny!

Also, Rails continues to develop and add functionality that we can use to enhance the site and streamline the development process. The built-in testing framework ensures code is robust and that minor changes don't break things. The processes to import content have never been as stable or as well tested.

Regrets

Hindsight is a wonderful thing. In the first draft of this post I listed some of the things I should have done, issues I should have caught earlier and so on. But that was then, this is now. What is done, is done.

A couple of years ago I was involved in a startup that did not succeed. Fact: start-ups fail.

At Webstock that year one of the speakers said they loved hiring people who'd been involved in a startup, even if it had failed, because those people had done something (rather than nothing) and had generally learnt a lot from the experience.

Applying that to this case, what have I learnt?
  • An enormous amount about content caching
  • Linux and basic database administration
  • How to highly optimise our markup and CSS to reduce server load (and speed up page delivery).
  • Lots more!
The key point is to learn and move on.


Summary

Today as I write this I am looking forward to the day that ELF is running our whole site. In the last week I moved 8 more programmes over to ELF. These were One In Five, Spectrum, Te Ahi Kaa, Saturday, Sunday, Ideas, Insight and Mediawatch - between them 1000 pages of content, 250 images, 15 image galleries and about 6,000 pieces of audio. There were no issues with the migration and the new pages were made live in the middle of the day under normal traffic loads.

Right away we could see the huge improvement in page responsiveness, and the other members of the web team could use the much faster administration section of the site for those programmes. That was a real morale booster.

Next time I'll get back to covering the migration of content.

* with apologies to Douglas Adams

Saturday, May 21, 2011

Rebuilding Radio NZ - Part 7: iPhone App Data (and an iPhone app)

In late 2009 we were thinking about releasing an iPhone app. The main impediment was the complexity of providing audio data from Matrix quickly and reliably.

The project was shelved until mid-2010, when ELF was well underway, at which point we chose to work with Wellington company Southgate Labs.

Providing audio data from ELF would have been a relatively simple proposition except for one thing - none of our audio data was being stored in ELF. This presented a major challenge: how do we publish audio to Matrix, where all our programmes were still hosted, have it display in Matrix, but also have that data available to ELF as it was published and updated?

The solution was to create a private XML feed of all audio in Matrix and import that into ELF at regular intervals. The audio item’s unique Matrix ID was stored in ELF and used as a key to allow updates and avoid duplication. The feed spanned the last 24 hours to cover any changes to items in Matrix.
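Keying on the Matrix ID turns each import into an upsert: items seen before are updated, new ones are created, and re-reading the overlapping 24-hour window creates no duplicates. Here is a plain-Ruby sketch of that idea, with a Hash standing in for the ELF audio table; the class name `AudioStore` and the field names (`matrix_id`, `title`) are assumptions, and parsing the XML feed itself is left out.

```ruby
# Hypothetical in-memory store: matrix_id => attributes Hash.
class AudioStore
  def initialize
    @items = {}
  end

  # Insert or update each feed item, keyed on its Matrix ID.
  # Returns counts so an import job could log what happened.
  def import(feed_items)
    created = updated = 0
    feed_items.each do |item|
      id = item.fetch(:matrix_id)
      if @items.key?(id)
        @items[id] = @items[id].merge(item)
        updated += 1
      else
        @items[id] = item.dup
        created += 1
      end
    end
    { created: created, updated: updated }
  end

  def size
    @items.size
  end

  def [](matrix_id)
    @items[matrix_id]
  end
end
```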

Within ELF audio items were associated with the same programmes defined for schedules. The audio part of ELF and the whole audio publishing process is pretty interesting (I think), and I’ll cover that in more detail in a later post.

Once the data was inside ELF it was simple enough to roll a data feed to supply a listing of audio by programme for the app. The app can request data from a specified date, ensuring that there are no gaps in programme data on each device. Koz supplied us with a JSON template and behaviour spec for the data feed, and this was implemented by Nigel from AbleTech.
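The "request from a specified date" behaviour is essentially a filter plus a serialiser. Koz's actual JSON template isn't reproduced in this post, so the keys below (`programme`, `items`, `title`, `published_at`) and the `audio_feed` helper are hypothetical, chosen only to show the shape of the idea.

```ruby
require "json"
require "time"

AudioItem = Struct.new(:programme, :title, :published_at)

# JSON listing of one programme's audio published at or after +since+,
# oldest first, so a device can ask for everything after the last date
# it saw and end up with no gaps.
def audio_feed(items, programme, since)
  selected = items
    .select { |i| i.programme == programme && i.published_at >= since }
    .sort_by(&:published_at)
  JSON.generate(
    "programme" => programme,
    "items" => selected.map { |i| { "title" => i.title, "published_at" => i.published_at.utc.iso8601 } }
  )
end
```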

And that's it.

Building the app itself was another thing entirely, and was achieved with very little hassle from our end.

Amnon at Southgate Labs came up with initial screen shots, and from there I had several conversations with their team about what the user experience would be like. This led quickly to the first alpha version of the app. I was impressed with how quickly Southgate Labs captured the essence of the desired user experience.

Once complete, we tested the app with a wider group of iPhone users, and their bug reports and feedback were incorporated. One special area of attention was accessibility - this was an early criterion and we asked accessibility expert Jonathan Mosen to test the app. Only one minor tweak was required to a button label.

The app was designed to do one thing, and to do it very well. Reviews have been favourable, and downloads have been fairly constant since an initial peak at release. There was a second peak in demand right after the second Christchurch earthquake.

Southgate Labs are working on the next version of the app at the moment (May 2011). This includes the much-requested audio scrubbing feature that enables scrolling to any point in the audio (provided it has downloaded).

Next time I'll move on to the Highlights section of the site.

Saturday, May 14, 2011

Rebuilding Radio NZ - Part 6: Schedules

Schedules have always been an integral part of the Radio NZ site. This was always a popular section of the site, and it got a huge boost when The Listener trimmed its printed listings a few years ago.

Iteration 1

Schedules appeared on our first site in 1998.

The publishing process involved taking a Word document containing a week’s schedule (Saturday to Friday) and posting each day to the site. There was a 3 week cycle with last week, this week and next week. This was done in MS FrontPage.

Iteration 2

The daily schedules were abandoned and replaced with weekly ones because formatting each day’s schedule was too time-consuming. This is how the schedule looked in the site's last week of existence.



Iteration 3

Our second site was launched in 2003, and was based on a custom PHP CMS. I wrote a parser to take the Word HTML and format it for the web, and this was built into the CMS for ease of use. The parser could identify the start of each day and added bookmark links at the top of the page automatically.

I also added code to pre-format highlights, classical chart and the weekly music features.

For the first time ever, here is the code. Pretty ugly code, but it worked well.

Iteration 4

In Matrix we again wanted more granular schedule data, so the parser was rewritten to split the weekly schedules into days. An import script was written by Squiz to import the XML from this step, creating a page for each day of the schedule and setting the created time of the new page to the day of the schedule.
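Splitting a weekly schedule into days mostly comes down to spotting where each day starts. Here is a minimal sketch, assuming (hypothetically) plain-text input where each day begins with a line such as `Saturday 14 May`; the real parser worked on Word HTML and is not reproduced here. Requiring a date number after the day name keeps a programme like "Saturday Morning" from being mistaken for a heading.

```ruby
# A day heading: day name followed by a day-of-month number.
DAY_HEADING = /\A(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+\d{1,2}\b/

# Split schedule lines into { "heading line" => [event lines] }, in order.
# Blank lines, and any lines before the first heading, are ignored.
def split_into_days(lines)
  days = {}
  current = nil
  lines.each do |line|
    line = line.strip
    next if line.empty?
    if line.match?(DAY_HEADING)
      current = line
      days[current] = []
    elsif current
      days[current] << line
    end
  end
  days
end
```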

Forcing the create time of the asset allowed us to show the schedule for the day on the home page of each station - a big leap in functionality. You can see a part of the Matrix asset tree at right.

The new parser (written in PHP) was also able to add linked text for each programme in the schedule. The code was a bit fragile and hard to maintain, so it was rewritten as a Rails app. You can review the latest version of the core parser modules on github.

Because the parser was a separate web application, the generated XML had to be manually uploaded to Matrix, then imported. This was a minor annoyance, but overall it saved a huge amount of time reformatting, creating pages, pasting in content and re-setting the created date on each asset.

I’d estimate that doing all this work by hand would have taken about 6 - 8 hours each week.

For Concert listeners we also would generate a weekly PDF; we had a large number of people contact us after The Listener changes asking for an easy printable format. Quite a number of people go to their local library each week to print these out.

The difficulties with this approach were the inability to change the schedule markup en masse, problems with programme links being statically coded, and only being able to offer one font size in the PDF. If there were any late changes we'd have to regenerate the PDF, and some older listeners found the font size too small.

Iteration 5

In ELF we needed to build in new features that we’d want in the next design of the site. These were:

  • forward and back navigation by day and week
  • automatic generation of PDFs in different sizes
  • the ability to display what was on-air at any time.

The section really needed a complete under-the-hood rebuild.

The first task was to rewrite the parser, and integrate it into ELF. And for the first time, testing was used to ensure the code performed as expected. The in-ELF module would provide a single interface for parsing and importing schedules. When the schedule is parsed, the system provides a preview of the data so that we can check it has worked correctly.

Two major changes in this iteration are the splitting of the schedule into events, and dynamic linking of programmes.

The new parser uses contextual clues to work out the date and time of each event in the schedule. These events are imported into ELF as schedule events. Each schedule event is associated with its programme:

belongs_to :programme

The programme association is made based on cues in the text.
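The post doesn't spell out which cues the parser uses, so the following is only an illustration of the general idea: look for known programme names in the event title and prefer the longest match. The `match_programme` helper and the name list are hypothetical; an event with no match is left without a programme.

```ruby
# Return the programme name found in +event_title+, preferring the longest
# match (so "Saturday Morning" beats "Saturday"), or nil if none matches.
def match_programme(event_title, programme_names)
  title = event_title.downcase
  programme_names
    .select { |name| title.include?(name.downcase) }
    .max_by(&:length)
end
```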

The code for the parser is available under an MIT license here, and the core HTML cleaning class here.

Schedule Events look like this in the Rails console:

>> ScheduleEvent.station(Station.national).current_event
+ ELF: Schedule Event
ID : 2474802
Title : Eight to Noon with Bryan Crump
Body : A holiday morning of information and entertainment
Start At : 2011-04-25 08:10:00
Programme : None (National)

Pro Tip: Every model in ELF has a custom inspect method to improve readability during debugging.
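A custom `inspect` of this kind might look something like the following. It is a plain-Ruby sketch rather than ELF's code: the attribute names mirror the console output above, and the `programme_name`/`station_name` fields are assumptions.

```ruby
class ScheduleEvent
  attr_reader :id, :title, :body, :start_at, :programme_name, :station_name

  def initialize(id:, title:, body:, start_at:, station_name:, programme_name: nil)
    @id, @title, @body = id, title, body
    @start_at, @station_name, @programme_name = start_at, station_name, programme_name
  end

  # One labelled field per line, so console output is easy to scan.
  def inspect
    fields = {
      "ID" => id,
      "Title" => title,
      "Body" => body,
      "Start At" => start_at.strftime("%Y-%m-%d %H:%M:%S"),
      "Programme" => "#{programme_name || 'None'} (#{station_name})"
    }
    (["+ ELF: Schedule Event"] + fields.map { |k, v| "#{k} : #{v}" }).join("\n")
  end
end
```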

The public display routines use the date in the URL (or today for the schedule home page) to collect all the events for that day. These are ordered and sent to the view for formatting. Every schedule event for National and Concert is rendered with the same 8 lines of code. This makes it dead easy to change the markup if we need to (and we will, because the site is being redesigned).
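Both rules, "all events for the day, in order" and the on-air rule behind `current_event` (the latest event that has already started), can be sketched in plain Ruby. Structs stand in for the model and the helper names are hypothetical; the real implementation would run these as database queries.

```ruby
require "date"

Event = Struct.new(:title, :start_at)

# All events starting on +date+, in start-time order.
def events_for(events, date)
  events.select { |e| e.start_at.to_date == date }.sort_by(&:start_at)
end

# The event on air at +now+: the most recent event that has started.
def current_event(events, now)
  events.select { |e| e.start_at <= now }.max_by(&:start_at)
end
```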

The administration section has been optimised for navigating by date, and for editing individual events. Because there is no caching, changes appear on the site immediately.



Public schedule pages now have forward and back navigation by day or week, and PDFs are dynamically generated on demand, with three different size options.

You can append '.xml' to any daily or weekly view to get a dump of that page as XML, and because our schedules are Creative Commons licensed the data can be used subject to a few minor conditions.


Export

Getting the schedules out of Matrix was a breeze via screen scraping. Version 1 of the scraper was given a base URL, and start and end dates in the format yyyymmdd.

The export script grabbed all the historical schedules pages and cached a local copy. The pages were machine generated by the previous parser and almost 100% consistent making it simple to reparse and extract the data.
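Given a base URL and a `yyyymmdd` date range, a natural first step is generating one page URL per day. Here is a sketch of just that step; the URL pattern (base URL plus `/yyyymmdd`) and the `schedule_urls` helper are hypothetical stand-ins for the real Matrix paths, and fetching and caching are left out.

```ruby
require "date"

# Return one schedule-page URL per day from +start+ to +finish+ inclusive.
# Both dates are "yyyymmdd" strings; the "/yyyymmdd" suffix is hypothetical.
def schedule_urls(base_url, start, finish)
  first = Date.strptime(start, "%Y%m%d")
  last  = Date.strptime(finish, "%Y%m%d")
  (first..last).map { |day| "#{base_url.chomp('/')}/#{day.strftime('%Y%m%d')}" }
end
```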

You can have a look at the scraper code on github.

Our first problem

Schedules are not isolated like recipes - today's schedule appears on the home page for National and Concert - and at the time both these pages were still running in Matrix. My first solution was to use some JavaScript to pull the HTML content for this part of the page over from ELF and insert it into the page once it had loaded.

This solution worked, but the 2-second delay before the content appeared looked bad. The second solution was to move these pages into ELF. The content in the right-hand column is still generated in Matrix. A cronjob pulls this and other pieces of Matrix content into the ELF database every 2 minutes. Once in the database, the content can be used anywhere in ELF. This approach provides a simple way to share content while those sections are still being built.

Recap

At this stage we had Recipes, News, Schedules for National and Concert, and the four main home pages: Site Home, National, Concert and News.

Visitors and Google (webmaster tools) were starting to notice the speed improvement, and we were starting to see the benefits of faster administration of pages.

Next time I'll be covering the provision of data by ELF to the Radio NZ iPhone application.

Saturday, May 7, 2011

Rebuilding Radio NZ - Part 5: The Evolution of News

In previous parts of this series I have looked at the rationale behind replacing our off-the-shelf CMS with a bespoke solution.

This time it is news. The news section of the site has gone through two major iterations in Matrix before moving to ELF. This section has given us the most grief, both in terms of customisations required and performance problems.

Initial Design

Workflow is very important in the news business. Every second counts, and the tools used in a typical newsroom reflect this. Radio NZ uses Avid’s iNews - an enterprise-grade system designed to manage incoming wire content, news bulletin production, and the editing of stories in a high-stress, fast-turnaround operation.

I believed that it was critical for news staff to be able to use this system to also publish to the web. This would minimise training, avoid the need for additional mental tasks (HTML formatting) during peak news events, and give me a level of formatting control that is not possible when staff directly use a WYSIWYG web-based editor.

The publishing process needed to be simple to use and understand, and to leverage existing skills to the greatest extent possible. The tools needed to get out of the way of the process of writing and editing.

The other factor in adopting a remote-publishing model over direct editing was raw speed. The site was running on a single server, and it was significantly slower to add, edit and update content via the web than in a native desktop application. Sluggish (relative) performance would have disrupted the flow of writing and editing.

In iNews stories are compiled into a queue - a folder that allows stories to be placed in a specific order. A single hot-key initiates the publishing process.

Iteration 1

When we first started the site in 2005 only one group of news text stories was available at any one time. This was partly due to technological limitations, and partly for licensing reasons; like most other local news organisations some of our text copy is licensed from overseas news agencies.

Stories were edited in iNews and simple markup was used to provide basic HTML formatting. For example, [h] at the start of a line is a heading, [[double square brackets]] is italics, and [audio] denotes that the line contains a link to audio. These and other commands were developed in consultation with news staff to be simple to use and understand.
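To make the markup concrete, here is a sketch of a converter for two of those commands, `[h]` headings and `[[...]]` italics. It is not the production code (which was a Perl script); the output tags (`<h3>`, `<em>`, `<p>`) and helper names are assumptions, and `[audio]` and the other commands are not handled because their output format isn't described here.

```ruby
# Minimal HTML escaping so story text can't inject markup.
def escape_html(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Escape, then turn [[text]] into <em>text</em>.
def inline_markup(line)
  escape_html(line).gsub(/\[\[(.+?)\]\]/, '<em>\1</em>')
end

# Convert one story's iNews-style lines to HTML, one element per line.
def story_to_html(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |line|
    if line.start_with?("[h]")
      "<h3>#{inline_markup(line.delete_prefix('[h]').strip)}</h3>"
    else
      "<p>#{inline_markup(line)}</p>"
    end
  }.join("\n")
end
```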

When published from iNews, the list of stories was sent via FTP to a folder on another server. From there a Perl script collected all the metadata and content from the individual story files, merged them into one XML file and imported them into Matrix using a second custom script written by Squiz.

There were no news categories, and content was removed after 7 days.

Each time a new group of stories was published, links on the site were updated to the new content. A problem arose when links were shared though - they would not be to the most recent update of a story. (Interesting note: 4 years after we replaced this system, we are still getting requests for these outdated URLs.)

Because each publish created a new batch of stories they appeared on the site quite quickly.

Iteration 2

It is safe to say that version 1 of the news section did not reflect the breadth or depth of our news coverage. In version 2 we added categories and allowed stories to be updated.

This was obviously more in line with public (and company) expectations of a news service, and removed a number of problems, the biggest being that the URL for a story remained the same for its life.

This required some major changes in Matrix and a significantly more complex import script. The second version of the import script (about 2000 lines of PHP code) was written by Mark Brydon at Squiz and allowed us to update (rather than replace) the bodycopy in an existing Matrix asset, and to manipulate the created, published AND the system-controlled updated timestamps. It also allowed content to be categorised into sets of folders, created on the fly as required.

Our Perl script for processing the exported stories also had to be updated to generate the enhanced XML required by Mark’s script.

There were problems though. By this stage we’d upgraded our Matrix infrastructure to use a master-slave configuration, and increased the local caching of pages to improve delivery speed. But these changes meant that updates to news content did not always appear in a timely fashion.

This was a very complex problem to solve and is documented here in all its gory detail. Hat tip to our sysadmin Colin MacDonald for working with us on this problem.

New categories required multiple steps - folders had to be created, new asset listings made to display the content, and the relevant Matrix IDs had to be added to our Perl script.

The script was written and maintained with NO testing framework whatsoever. Nightmare.

As mentioned in Part 1, our need to frequently update content and to have those updates appear immediately did not really work that well with Matrix. As older problems were mitigated new ones arose.

I realised that we were pushing Matrix outside its design parameters. It was no longer the optimal solution for the type of operation we had become. Our expectations of the system had changed. Our visitors' expectations had changed. It was time for a system change.

ELF

The design of the new news section started with the URL schema. This is in the form:

/news/category_name/story_id/story_headline_as_url

The old form was:

/news/stories/yyyy/mm/dd/story_id

e.g. /news/stories/2009/01/02/12459765a219

In Matrix stories were created in a dated folder structure to aid administration. In practice, these segments in the URL did not do anything useful - they redirected to the news home page.

Headlines

Over the life of a story the headline will change. What should the canonical URL for a story be? Many sites get around the issue by ignoring the headline altogether and using a unique ID to retrieve the story. This is a problem, in my view, because it allows anyone to craft their own URLs to the content, sometimes with hilarious results.

We get around this problem by always using the current headline to create the URL, and redirecting to this if an older version is used. The ID of the story never changes.

Our headline URL generator avoids some annoying problems when punctuation is used. For example this headline:

Taxes to increase by 1.2 %

could become:

taxes-to-increase-by-12

Our generator does this:

taxes-to-increase-by-1-point-2-percent
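Here is a sketch of both ideas, the punctuation-aware slug and the redirect to the current headline, in plain Ruby. Only the `1.2 %` example comes from the text; the other rule (`&` becoming `and`), the `Story` struct and the `canonical_path`/`resolve` helpers are hypothetical.

```ruby
# Turn a headline into a URL segment, spelling out punctuation that carries
# meaning instead of silently dropping it.
def headline_slug(headline)
  s = headline.downcase
  s = s.gsub(/(\d)\.(\d)/, '\1 point \2') # "1.2" -> "1 point 2"
  s = s.gsub("%", " percent ")
  s = s.gsub("&", " and ")
  s.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
end

Story = Struct.new(:id, :category, :headline)

def canonical_path(story)
  "/news/#{story.category}/#{story.id}/#{headline_slug(story.headline)}"
end

# Look the story up by ID only; if the requested path is not the canonical
# one (an old headline or a hand-crafted slug), answer with a redirect.
def resolve(stories_by_id, category, id, slug)
  story = stories_by_id[id]
  return [:not_found, nil] unless story
  path = canonical_path(story)
  requested = "/news/#{category}/#{id}/#{slug}"
  requested == path ? [:ok, story] : [:redirect, path]
end
```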

Categories are added through a simple admin screen (at right). In Matrix we had a complex set of structures and functions to specify root node ids, relationships between categories and what folder structure to use. Category folders had to be manually created along with links, asset listings and the like.

In ELF we specify the URL, the code for news staff to use in their story template, and hit save. The position and visibility of the new category on the news home page, and in the sidebar links, can be controlled by a drag and drop interface.

It is now possible to move stories between categories, something that was not possible in Matrix.

ELF also allows images to be added to stories. Images are pre-loaded directly to ELF by staff, and the system gives them a picture code to choose from:

[image:1791:third:right]
[image:1791:half:right]
[image:1791:third:left]
[image:1791:half:left]
[image:1791:full]

The editor selects one of these and inserts it into the story copy in iNews. ELF associates the image with the story when it is requested. The editor can change the size and position of the image in the story simply by changing the code and republishing.
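Resolving the picture codes when the story is requested is a small substitution over the story body. In this sketch the code format is taken from the examples above, but the output markup, the CSS class names and the `image_url` lookup (a Hash or lambda from image ID to URL) are hypothetical.

```ruby
# [image:ID:SIZE:ALIGN], or [image:ID:full] with no alignment.
IMAGE_CODE = /\[image:(\d+):(third|half|full)(?::(left|right))?\]/

# Replace each picture code with an <img> tag. Codes whose image ID is
# unknown are removed rather than shown to readers.
def expand_image_codes(body, image_url)
  body.gsub(IMAGE_CODE) do
    id, size, align = Regexp.last_match.captures
    url = image_url[id.to_i]
    next "" unless url
    classes = ["image", size, align].compact.join(" ")
    %(<img src="#{url}" class="#{classes}" alt="">)
  end
end
```

Because the code stays in the iNews copy, a remote republish simply re-applies it, which is why changing the size or position only needs an edit to the code.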

This may seem a roundabout way of doing images. It was the only way because iNews does not support this functionality, and adding the images to content in ELF would also not work because the stories were being updated remotely; any remote update would overwrite images added in ELF.

In practice it works well.

Performance

The performance of the new system is outstanding. Previously, we'd occasionally have to restart the web server process during high demand periods.

The ELF news section went online a few days before the first Christchurch quake. It served all traffic without issue on a non-optimised database - we had the slow query log enabled and were intending to tweak things the following Monday after a (ahem) quiet weekend! Page rendering times under load were less than 50 ms, which is insignificant. This is with no page caching whatsoever.

All published news content is now available within 30 seconds of the news web editor pressing the publish hotkey in iNews.

Regrets

Even though publishing was simple for staff, the multi-step background process to make it happen probably limited what we were able to do on the site in the first few years, and was a maintenance nightmare. This is certainly the case with Iteration 2. To be frank, from a technical point of view, the process was on the edge of instability, and fragile to maintain.

But I still believe that the decision to remote-publish was the right one, and we are bearing the fruit of this in iteration 3. We now have a highly robust system for publishing news quickly, and a software infrastructure that is technically flexible and well covered by unit tests. So, no regrets really.

I am happy to answer questions in the comments.

In the next post I will talk about the schedules section of the site.

Saturday, April 30, 2011

Rebuilding Radio NZ - Part 4: Content Extraction & Recipes

The next group of posts will deal with the migration of content. In each I’ll show how we were managing the particular content type in Matrix, the design of the content type in ELF, how we migrated the content, and how we manage the content now.

Getting the content out

There were two options for getting the content out of Matrix.

The first was a custom script that we could use for exporting the whole site. The difficulty was defining everything I would need up-front for Squiz to code against - there were many types of content and different requirements, most not known. The other issue was cost - a script to extract just news was estimated to take about 30 hours to write.

The second option was to set up pages to display groups of single assets in XML format. An example of this was audio assets. These are self-contained and contain all the required data for importing to ELF. Where this was not possible, screen-scraping would be used. More on this in a future post.

The DIY approach has worked out simpler and cheaper, and I have been able to adjust the export and import to suit each kind of content, building on code from the previous phase.

Recipes

The recipes section was chosen to go first because the section was completely self-contained apart from some in-bound links from programme pages.

Matrix recipes

At the original launch in 2005 we had high expectations for our recipes section - we wanted to divide content into sections based on ingredients and style of cooking. This proved to be more difficult than expected. At right is what the tree in Matrix looked like.

The recipes home page had seasonal ingredients at the bottom, the right-hand section featured recipes based on the season or special events (Christmas, Thanksgiving, Easter, etc), and visitors could search or browse by recipe title.

Managing the content was simplified by putting recipes into lettered folders, however the complex asset structure made it hard to see at a glance what recipes were in which section. Another problem is that the URL structure has an extra (redundant) segment in it with the first letter of the recipe.

When tagging became available we tried it, but this required linking every recipe to a pre-existing tag, and creating new tags on the fly. This would have required a complete reworking of the section, and all-in-all was too unwieldy to use, even for a section that gets only 3 to 5 new recipes a week.

Recipes took about 5-10 minutes to format, link into the correct folders, and set a future status that matched the broadcast time.

Pasting recipe content into the WYSIWYG from email and Word documents was patchy. Often the markup would contain code that could not be removed with the built-in code cleaner. We are very fussy about the quality of our markup, so we developed a separate pre-parser to deal with markup issues. The parser has an FCKeditor and a drop-down to select the type of content. It was able to remove extraneous markup and ensure that the XHTML returned was valid. It could also do basic formatting for some types of content.
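The kind of clean-up a pre-parser like this performs can be sketched in a few lines of Ruby. The rules below are illustrative assumptions, not the rules our tool actually used, and a production cleaner is better built on a proper HTML parser than on regular expressions.

```ruby
# A minimal regex-based sketch of a paste clean-up pass (rules are assumptions).
def clean_pasted_markup(html)
  out = html.dup
  out.gsub!(/<!--.*?-->/m, '')                      # drop comments (Word adds many)
  out.gsub!(%r{</?o:p[^>]*>}i, '')                  # drop Office namespace tags
  out.gsub!(/\s(?:class|style|lang)="[^"]*"/i, '')  # strip presentational attributes
  out.gsub!(%r{</?span[^>]*>}i, '')                 # unwrap spans, keeping their text
  out.gsub!(%r{<(/?)b>}i, '<\1strong>')             # prefer semantic elements
  out.gsub!(%r{<(/?)i>}i, '<\1em>')
  out.gsub!(%r{<p>\s*(?:&nbsp;)?\s*</p>}i, '')      # remove empty paragraphs
  out.strip
end
```

For example, `<p class="MsoNormal"><span style="x">Preheat the <b>oven</b></span><o:p></o:p></p><p>&nbsp;</p>` comes out as `<p>Preheat the <strong>oven</strong></p>`.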

Even a two-step cut and paste process was faster than hand editing code in the Matrix WYSIWYG (or any editor).

Design

Designing recipes was pretty simple. Recipes have a title, body and broadcast date. They have a chef, tags and are broadcast on a particular programme.

In Rails terms (edited for brevity):

class Recipe < ActiveRecord::Base
  has_many :chefs
  belongs_to :programme
  acts_as_taggable_on :tags
end

The programme model contained basic information about the programme such as name and webpath, just to get us started.

Having a chef association and tags allows us to provide navigation by tag and by chef. Since adding both features visitor engagement in that part of the site has increased.

Content Migration

Importing the content was tricky. Each recipe had chef and programme information in the HTML. The import script had to find this information and make the necessary associations.

A rake task was written to parse the content and create recipe assets in ELF. I have posted the code as a gist on GitHub for reference purposes. Note that I was learning Ruby at the time and that it is fairly rough and ready.

As the import script was being written I had it output the recipes from which it could NOT extract this information. These turned out not to be formatted in the standard way, and were edited so that they could be imported.
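The extract-or-report approach can be sketched as follows. This is not the code from the gist; the credit-line format (`Chef: ... | Programme: ...`) and the sample data are hypothetical, chosen only to show how recipes that fail extraction are set aside for manual editing.

```ruby
# Hypothetical convention: each recipe's HTML contains a credit line such as
#   <p class="credit">Chef: Jane Doe | Programme: Example Programme</p>
CREDIT_PATTERN = /Chef:\s*(?<chef>[^|<]+?)\s*\|\s*Programme:\s*(?<programme>[^<]+?)\s*</

# Return { chef:, programme: } or nil when the credit line is missing.
def extract_credits(html)
  m = CREDIT_PATTERN.match(html)
  m && { chef: m[:chef], programme: m[:programme] }
end

# Split recipes (title => html) into importable records and titles to fix by hand.
def partition_recipes(recipes)
  imported = []
  failed = []
  recipes.each do |title, html|
    credits = extract_credits(html)
    if credits
      imported << credits.merge(title: title)
    else
      failed << title
    end
  end
  [imported, failed]
end
```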

Tags were manually added to each recipe.

ELF recipes management

In ELF we wanted a data entry screen designed specifically for recipes. This would need to allow for tagging and specifying a chef and broadcast time. And here it is:



The edit screen is simple to use. The tag list offers auto-completion to avoid duplicate tags, and the 'add chef' option allows new chefs to be added without going to a new screen. A recipe can be added in under 5 minutes.
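Auto-completion avoids duplicates by steering editors towards tags that already exist. A minimal Ruby sketch of that idea, assuming case-insensitive prefix matching (the helper names are hypothetical, not ELF's):

```ruby
# Suggest existing tags that start with what the editor has typed, ignoring
# case, so typing "pu" offers "pumpkin" instead of inviting a new variant.
def suggest_tags(existing, typed, limit: 5)
  prefix = typed.strip.downcase
  return [] if prefix.empty?

  existing.map { |t| t.strip.downcase }
          .uniq
          .select { |t| t.start_with?(prefix) }
          .sort
          .first(limit)
end

# Normalise a comma-separated tag list before saving, dropping blanks and repeats.
def normalise_tag_list(list)
  list.split(',').map { |t| t.strip.downcase }.reject(&:empty?).uniq
end
```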

The WYSIWYG is based on CKEditor. This has powerful built-in routines to clean markup pasted from email and MS Word.

The recipe footers, which contain seasonal ingredients, and the sidebar with special features each have their own manager. This allows the content to be reused and updated each year.

Now that tagging has been simplified, the seasonal ingredient lists (bottom of page) link to relevant tags. The system allows free-tagging, so any new tag is available immediately. Page impressions in the recipes section are double what they were at this time last year, driven by people browsing content by chef and by tag.

An image uploader is built-in, so pictures can be uploaded and added right inside the WYSIWYG.

Handling old URLs

Legacy URLs are passed to the search page, which attempts to extract the recipe title from the URL to use as the basis of a search. Try this broken URL for example. In most cases this will give the visitor the recipe they want.
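A minimal Ruby sketch of the title-extraction step, assuming the last path segment of a legacy URL is a slug of the recipe title (for example `/recipes/p/pumpkin_and_kumara_soup`, with the redundant letter segment). The URLs and the helper name are hypothetical.

```ruby
require 'uri'

# Turn a legacy recipe URL into search terms, assuming the final path
# segment is a slug of the recipe title.
def search_terms_from_legacy_url(url)
  path = URI.parse(url).path
  slug = path.split('/').reject(&:empty?).last.to_s
  slug = slug.sub(/\.\w+\z/, '')                # drop any file extension
  URI.decode_www_form_component(slug)           # undo %XX and '+' encoding
     .gsub(/[-_]+/, ' ')                        # separators become spaces
     .squeeze(' ')
     .strip
end
```

The resulting string is handed to the normal search, so a near-miss still tends to surface the right recipe.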

The new recipes section was soft-launched last year, and has streamlined the entry of recipes and improved the user experience.

It also gave me some confidence that we were on the right path.

In the next post I'll cover the evolution of our news section from a basic service offering only 20-30 stories at a time, to the current version with categories and sophisticated remote management.

Saturday, April 23, 2011

Rebuilding Radio NZ - Part 3: Groundwork

In part 3 of this series I'll be covering setting up our new app, and looking at some of the design considerations. I have bumped recipes to next week.

If you are looking for advice on which CMS to get, or not get, this is the wrong place. This series looks at how we at Radio NZ are solving our particular business problems. Your mileage can and will vary. You have been warned.

Note: I use the term asset to refer to an instance of a piece of content.

Foundations


When we started building ELF, Rails 3 was still a release candidate. The first decision was which version to use - the very stable 2.x branch, or the new-with-cool-features-we-could-use 3.0 branch.

We chose stable because many of the plugins we intended to use were not yet compatible with 3.0, and we did not want to be working around bugs while starting a new application. Rails 3 was also new to our contractor.

The first discussion with contractors Able Tech revolved around what the app was going to be able to do long term, and what core functionality would be required system wide. This would be built first, and everything else would be added on top. One of these features was user authentication for the administration section of the site.

The design of the app had to make maintenance and future development simple because this is likely something we will be using for at least 5 years. While code written in Rails is usually self-documenting, I was keen to have the files well commented so that any future developers could understand why things were done a certain way.

One early decision that was later abandoned was the use of a general content subclass. Many of our planned content types shared some attributes in common - title, webpath, broadcast time, body-copy and so on.

The first few content types built in ELF used this subclass, however this was later abandoned because of the performance impact of extra joins (and the work required to optimise the DB), and ease of maintenance. Having to remember which attributes are delegated to where is bad enough when you are working regularly on a new app, but imagine in a year's time.

This approach brought back memories of Matrix's EAV (Entity-Attribute-Value) database schema (also known as Open Schema). With EAV there is no direct relationship between your data models and tables in the database. EAV makes performance tuning for specific use-cases virtually impossible. It does make development easier though, because you do not have to make changes to the database schema as you add new content over time, and it can be more space-efficient if the data is quite sparse.

This article is an excellent overview but in summary:
A major downside of EAV is its lower efficiency when retrieving data in bulk in comparison to conventional structure. In EAV model the entity data is more fragmented and so selecting an entire entity record requires multiple table joins.
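An in-memory Ruby sketch of the difference: an EAV store keeps one row per attribute, so rebuilding a record means gathering several rows (in SQL, typically one join or lookup per attribute), while a conventional table keeps one row per entity. The structs and data are hypothetical and are not Matrix's actual schema.

```ruby
# EAV: one row per (entity, attribute) pair.
EavRow = Struct.new(:entity_id, :attribute, :value)

# Rebuilding a record means collecting every attribute row for that entity.
def rebuild_entity(rows, entity_id)
  rows.select { |r| r.entity_id == entity_id }
      .each_with_object({}) { |r, record| record[r.attribute] = r.value }
end

# Conventional layout: one row per entity, one column per attribute.
RecipeRow = Struct.new(:id, :title, :broadcast_at)

EAV_ROWS = [
  EavRow.new(1, 'title', 'Pumpkin soup'),
  EavRow.new(1, 'broadcast_at', '2011-04-30 10:00'),
  EavRow.new(2, 'title', 'Pavlova')
].freeze

FLAT_ROWS = [RecipeRow.new(1, 'Pumpkin soup', '2011-04-30 10:00')].freeze
```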
We went with the standard out-of-the-box AREL layer for ELF. The approach taken now is for each model to have all the fields it needs, and for common functionality (as new models are added) to be extracted into Modules. For example, there is a module that handles all the database scopes for selecting assets based on the broadcast time.
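The shared-module pattern can be sketched in plain Ruby. In a Rails app these would be ActiveRecord scopes; here, as an assumption to keep the example self-contained, each class holds an in-memory list, and `BroadcastScopes` and its method names are hypothetical rather than ELF's actual module.

```ruby
# Broadcast-time lookups shared by every content type that includes this module.
module BroadcastScopes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Stand-in for a database table: one list per including class.
    def all_items
      @all_items ||= []
    end

    # Items broadcast in the half-open range [from, to).
    def broadcast_between(from, to)
      all_items.select { |i| i.broadcast_at >= from && i.broadcast_at < to }
    end

    # Items still to be broadcast, soonest first.
    def upcoming(now = Time.now)
      all_items.select { |i| i.broadcast_at > now }.sort_by(&:broadcast_at)
    end
  end
end

Recipe = Struct.new(:title, :broadcast_at) { include BroadcastScopes }
Highlight = Struct.new(:title, :broadcast_at) { include BroadcastScopes }
```

Each model keeps its own fields, and only the behaviour is shared.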

Another big area was the migration of content. This task was going to be mine; I had the best understanding of the content, and of how to extract it from Matrix.

We made the decision to build the application in small pieces (read: agile), and move content over when each piece was ready.

The system would need a set of Rake tasks for cleaning up exported Matrix content and importing it. These tasks would need to extract contextual information automatically from the HTML, as often there was no metadata.

The test framework built into Rails would allow us to write tests to ensure the handling of imported data was consistent and reliable.

The gradual migration of content meant a certain amount of sharing between Matrix and ELF - stylesheets, javascript and some images. It would require very careful planning of each phase to ensure the change-over between apps was seamless to site visitors.

As more content was moved we would need to use XML feeds to share data between systems when it was needed in both (more on this in later posts).

Nginx runs in front of Matrix and serves our static assets, so it could be used to divert requests to the new application, allowing us to pick and choose which app did what.
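The diversion itself lives in the Nginx configuration, which is not reproduced here. As a rough illustration of the decision it makes, here is a Ruby sketch that maps a request path to a backend; the prefix list is hypothetical and would grow as each section migrates.

```ruby
# Sections already served by ELF (hypothetical list; grows per phase).
MIGRATED_PREFIXES = ['/recipes'].freeze

# Match a whole path segment so '/recipesbook' does not count as '/recipes'.
def backend_for(path)
  migrated = MIGRATED_PREFIXES.any? do |prefix|
    path == prefix || path.start_with?("#{prefix}/")
  end
  migrated ? :elf : :matrix
end
```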

As an aside, ELF is now serving the stylesheet for both applications. Matrix is still serving the javascript. The choice is driven by convenience, and by which app is driving the most changes in that file.

Broadcast Timestamps

One critical design feature was the use of broadcast related time-stamps.

Matrix only gives you control of created at and published at times. We'd used both as the broadcast time in different places on the site for different but valid reasons.

Station highlights use the created time, which is set after the item is created and edited. This means the time the item is published does not matter.

Audio items use published time, as these sometimes have a future status change so we needed to use a time that was updated by the system based on the item going live.

These differences created some management issues. If an audio item has to be temporarily removed from public view, and later restored, the listed broadcast time is wrong and has to be reset manually.

Likewise, if you forgot to set the created at time for a highlight, it would not list at all because only future highlights are shown. The site is so big that you can often forget which of the two attributes is the broadcast time.

In ELF we have two attributes to get around this problem.

The published_at attribute serves two purposes. It can be used to sort, and it controls visibility. When published_at is not set, the item is not visible. This gives us two states: 'Live' and 'Under Construction'.

The broadcast_at attribute contains the date and time the item was (or will be) broadcast. It is never changed by the app, although it can be changed manually if required.
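A minimal Ruby sketch of how the two attributes divide the work, using a plain Struct in place of an ActiveRecord model (the `Item` struct and its methods are hypothetical). Taking an item offline only clears `published_at`, so restoring it later never disturbs the broadcast time.

```ruby
# published_at controls visibility; broadcast_at records air time and is
# never touched by publishing or unpublishing.
Item = Struct.new(:title, :published_at, :broadcast_at) do
  def live?
    !published_at.nil?
  end

  def state
    live? ? 'Live' : 'Under Construction'
  end

  def publish!(now = Time.now)
    self.published_at = now
  end

  def unpublish!
    self.published_at = nil
  end
end

# Public listing: only live items, in broadcast order.
def visible_items(items)
  items.select(&:live?).sort_by(&:broadcast_at)
end
```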

Keeping things DRY

Don't repeat yourself, they say. We wanted to maximise the advantages of Rails' MVC (Model View Controller) layout, and use DRY coding practices to avoid repetition and improve the maintainability of the HTML code.

Code Deployment

Deploying new versions of Matrix was hard. This has been improved recently with script-based upgrades.

This is something that is highly optimised in Rails already, where most people seem to use Capistrano. In recent weeks I have been deploying new code to the live server several times a day.

Sensible URLs

Most of the site already had a good URL schema. A few places, like news, were problematic, and these needed to be revamped.

Revision control

Code was going to be worked on by at least 3 people. We needed a system that allowed this and easy branching. Git. No contest, IMHO.

The migration began with the recipes section, which was chosen to go first because it was largely stand-alone. The next post will cover this in detail.

Saturday, April 9, 2011

Rebuilding Radio NZ - Part 2: The Birth of ELF

In this second part I'll be talking about the birth of ELF.

Warning

Even though it's a drag, I'm repeating this disclaimer.

There is no such thing as instant pudding. You cannot copy what someone else does and get the same result.

This post is about a specific site with its own special functional requirements and traffic loads. radionz.co.nz is a public broadcaster's website that includes news, audio and content related to on-air programmes. Traffic loads are very peaky (and high).

This series of posts should NOT be taken as advice for or against any particular system. It deals with our specific pain-points and how we are solving them.

You should do your own research and assessment before choosing any CMS or development framework. A good starting point is Strategic Content Management on A List Apart.

Some Management Theory

The manager is responsible for the system in which his staff work. By system, I mean all aspects of the job that contribute to whatever you are producing. The system includes workspaces, office layout, tools, technology, processes and procedures to name a few components.

It is the manager's responsibility to improve the system. In doing so he must understand the difference between problems which are part of the system (built in), and those that are outliers (from outside).

For example, for knowledge workers their computer is part of the system. No one can be productive if their computer keeps failing, is underpowered or does not have the software they need to do their job.

A one-off power cut that stops people working for a day is probably an outlier that needs special attention. (Or may need no attention at all).

The system itself (and everything in it) needs to be designed and maintained. There is nothing worse than a free-running system where components essentially design themselves, become sub-optimised, and fail to work together as a whole. It is very common for processes to become run-down over time and no longer be fit-for-purpose.

The aim is to have stable, predictable processes where you can be sure that content moving through the system meets quality expectations when it is finally published. Efficiency and replicability are just two aspects of the equation.

The tools that are used to produce and manage web content play a critical role in the system, and one of my roles is to make sure the tools do not get in the way of creating our content.

It is from that base that we considered the suitability of our current CMS tools.

Cracks in the walls

The Radio NZ website was built from scratch - when we started we had no existing processes to support publishing large amounts of web content, and no web infrastructure. We designed new publishing processes and chose our tools (Matrix and a number of custom scripts) based on those processes (I'll be documenting these in later posts).

These processes have been improved iteratively over time. Some of these changes were facilitated by new features in Matrix, others from internal rearrangement. As well as process improvement, we continued to add new content and functionality to the site.

But from late 2009 we found it increasingly difficult to innovate. The modular approach to building sites in Matrix - the very paradigm that got us off the ground so fast - was slowing us down.

Matrix makes The Hard Things simple. Start a new site, set up a home page, a 404 page; all done in 5 minutes. Change content on an About Us page; 1 minute. Set up a form for people to submit queries; done in 10. Display the same content in three different places, auto-generate menu structures; more complex, but still relatively fast to implement.

But for us, some Simple Things were getting harder to do. We were having to create increasingly complex configurations to optimise the display of our content, and create new ways of viewing it. (Examples of this in subsequent posts).

This was largely because our content was stored as individual assets, rather than as modelled data. Each asset knows nothing about any other asset. For example, an audio asset does not know what programme it was broadcast on. A programme asset (a programme's home page) does not know who the hosts of the programme are. And so on.
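By contrast, modelled data carries its relationships with it. A minimal Ruby sketch with hypothetical Structs (not ELF's actual models) shows how "which programme was this on?" and "who hosts it?" become direct lookups rather than configuration work:

```ruby
Host = Struct.new(:name)
Programme = Struct.new(:name, :hosts)
AudioItem = Struct.new(:title, :programme)

# An audio item knows its programme, and the programme knows its hosts.
def hosts_for(audio_item)
  audio_item.programme.hosts.map(&:name)
end

# The reverse question: all audio broadcast on a given programme.
def audio_for(items, programme)
  items.select { |a| a.programme.equal?(programme) }
end
```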

Some of the asset structures required to support certain features require huge amounts of work to implement.

On top of this was system performance. We are a media site with fast-changing content and high performance demands, and I think the only Matrix customer using the system in this way.

Many pages (like our old home page) were built from many pieces, putting a high load on the system when they had to be rebuilt and cached. With frequent publishing we had to expire the cache as often as 10 times an hour.

To deal with our high traffic load it was suggested that a custom caching regime be considered. This would allow us to publish updates 5-10 times an hour, and for the content to be recached more efficiently.

We had already made changes to the operation of the cache (see this old post), and they'd been running for several years, so I had a very good understanding of how this part of the system worked. It was unlikely that these new changes would be of use to other Matrix users and would not become a part of the core product; if implemented, they would be our responsibility to maintain.

The cost of working around these two problems (asset modelling and caching) - problems that may not exist with other systems - was deemed too high. Sadly, Matrix was no longer a good fit for our content or our traffic profile. It was time to consider alternatives.

The decision to change was entirely pragmatic and based on changing business requirements. It was a difficult decision to make, especially after a long history with one product.

ELF is born

Looking at our content, and the sort of features we wanted, it was pretty obvious that a lot of custom code would have to be written.

Very few of our pages are the standard 'edit, upload a photo, update the title' type of content. With this in mind I thought it better to have complete control over all the software, rather than bolt 95% of what we wanted onto an existing product.

Rails looked like a good platform to model and deliver content like ours, and had an excellent local (Wellington) community. There are many development houses and government agencies working with Rails.

So Ruby on Rails it was.

An additional factor was the use of the framework on our company intranet. We had developed a number of powerful modules that could be leveraged for the public website. (In practice, I think we saved about 6 weeks time by recycling existing code).

The name ELF was chosen after a brain-storming session. ELF stands for Eight Legged Freak (i.e. a spider). It was chosen because a spider lives on the web, and because an Elf has 'magical powers' that benefit its users.

In my next post I'll talk about planning the migration of content and the first section we built and made public: Recipes.