Sunday, May 29, 2011

Rebuilding Radio NZ - Part 8: Dealing with doubt

Any change is hard. Changing from technology you've known or used for a long time comes with a range of emotions which if not dealt with can derail a project as quickly as any technical problem.

During the course of the project to replace MySource Matrix with a bespoke solution (ELF) based on Rails I've experienced doubt, frustration and regret. Doubt that we'd ever finish the project, frustration at the lack of progress at times, and regrets about the past.

Doubt

Twice, so far, the complexity of ELF development has peaked, and with it doubt has set in. I really did wonder if we were going in the right direction, and if we'd be able to implement all of the functionality we needed.

In theory, the main advantage of the agile development process is building on small wins and the ability to correct mistake early. Building on the success of previous iterations helps reinforce that you are on track. In practice this is also true, but at times I found my self comparing the two systems, even months after we'd decided to make the change and had devoted weeks to coding the new system.

In the early stages of the project ELF couldn't do that much, whereas Matrix was still doing everything, so the comparison was not very favourable. Things that were quite simple to build in Matrix were requiring quite a lot of thought to implement in ELF.

Looking at why this was, Matrix encapsulates certain high-level patterns in the form of assets. These assets can be bolted together to create complex public-facing pages. In designing ELF we had to look for lower-level patterns, and this took time and effort. Finding the right patterns (and replacing out-dated patterns with better ones) is a constant process.

Once I realised this - that the design of ELF was going to evolve and improve, and that we had complete control of our destiny - the doubt dissipated. We can have the system any way we want it. If a pattern is wrong, we can change it. If some code is slow, we can refactor it. None of these options were available to us with Matrix. (I should note that this is about trade-offs - complete control of the software stack and all that entails versus an out-of-the-box plug-and-play but not quite right fit).

The other area where doubt set in was the migration of content. I started looking at the whole site - tens of thousands of pages, images and audio items - an enormous task. To get over this, I broke the content down into small pieces.

An example of this was image migration. An image linked from a page in Matrix has to be moved over to ELF and relinked in the HTML. I broke this down as follows, getting each step right before moving on:
  • Fetch an image from a URL and cache it
  • Add this image to ELF returning the new URL
  • Get the HTML for a page and cache it
  • Parse the HTML looking for images.
  • Fetch those images, get the new ELF URL and update the HTML
  • Save the HTML to ELF
Being able to complete smaller tasks in minutes and hours enabled a sense of perspective (no fairy cake required) and this helped build momentum and avoid the doldrums.

Frustration

Our site is big. Really big. You just won't believe how vastly hugely mind-bogglingly big it is.* We don't have the resources to do a big-bang migration to the new system - as you've noticed from the rest of the series we are doing it section by section (and later on) programme by programme.

At times progress has been slow as other projects related to on-air content have to take priority, and we are also working on a new design.

Having to work in two systems is a pain - ELF is much faster to update and simpler to use than Matrix because it has been optimised for our precise use-case. The key for me has been to look forward, not back. Each week more content is run out of ELF and with additional people using it I get many suggestions for improvements. We really are in control of our destiny!

Also, Rails continues to develop and add functionality that we can use to enhance the site and streamline the development process. The built-in testing framework ensures code is robust and that minor changes don't break things.  The processes to import content have never been as stable or as well tested.

Regrets

Hindsight is wonderful thing. In the first draft of this post I listed some of the things I should have done, issues I should have caught earlier and so on. But that was then, this is now. What is done, is done.

A couple of year ago I was involved in a startup that did not succeed. Fact: start-ups fail.

At Webstock that year one of the speakers said they loved hiring people who'd been involved in a startup, even if it had failed because those people had done something (rather than nothing) and had generally learnt a lot from the experience.

Applying that to this case what have I learnt?
  • An enormous amount about content caching
  • Linux and basic database administration
  • How to highly optimise our markup and CSS to reduce server load (and speed up page delivery).
  • Lots more!
They key point is to learn and move on.


Summary

Today as I write this I am looking forward to the day that ELF is running our whole site. In the last week I moved 8 more programmes over to ELF. These were One In Five, Spectrum, Te Ahi Kaa, Saturday, Sunday, Ideas, Insight and Mediawatch - between them 1000 pages of content, 250 images, 15 image galleries and about 6,000 pieces of audio. There were no issues with the migration and the new pages were made live in the middle of the day under normal traffic loads.

Right away we could see the huge improvement in page responsiveness, and the other members of the web team could use the much faster administration section of the site for those programme. That was a real morale booster.

Next time I'll get back to covering the migration of content.

* with apologies to Douglas Adams

Saturday, May 21, 2011

Rebuilding Radio NZ - Part 7: iPhone App Data (and an iPhone app)

Late 2009 we were thinking about releasing an iPhone app. The main impediment was the complexity of providing audio data from Matrix quickly and reliably.

The project was shelved until mid 2010 until ELF was well underway, at which point we chose to work with Wellington company Southgate Labs.

Providing audio data from ELF would have been a relatively simple proposition except for one thing - none of our audio data was being stored in ELF. This presented a major challenge: how do we publish audio to Matrix where all our programmes were still hosted, have it display in Matrix, but have that data also available to ELF as it published and updated?

The solution was to create a private XML feed of all audio in Matrix and import that into ELF at regular intervals. The audio item’s unique Matrix ID was stored in ELF and used as a key to allow updates and avoid duplication. The feed spanned the last 24 hours to cover any changes to items in Matrix.

Within ELF audio items were associated with the same programmes defined for schedules. The audio part of ELF and the whole audio publishing process is pretty interesting (I think), and I’ll cover that in more detail in a later post.

Once the data was inside ELF it was simple enough to roll a data feed to supply a listings of audio by programme for the app. The app can request data from a specified date, assuring that there are no gaps in programme data on each device. Koz supplied us with a json template and behaviour spec for the data feed and this was implemented by Nigel from AbleTech.

And that's it.

Building the app itself was another thing entirely, and was achieved with very little hassle from our end.

Amnon at Southgate Labs came up with initial screen shots, and from there I had several conversations with their team about what the user experience would be like. This led quickly to the first alpha version of the app. I was impressed with how quickly Southgate Labs captured the essence of the desired user experience.

Once complete we tested the app with a wider group of iPhone users, their bug reports and feedback being incorporated. One special area of attention was accessibility - this was an early criteria and we asked accessibility expert Jonathan Mosen to test the app. Only one minor tweak was required to a button label.

The app was designed to do one thing, and to do it very well. Reviews have been favourable, and downloads have been fairly constant since an initial peak at release. There was a second peak in demand right after the second Christchurch earthquake.

Southgate Labs are working on the next version of the app at the moment (May 2011). This includes the much-requested audio scrubbing feature that enables scrolling to any point in the audio (provided it has downloaded).

Next time I'll move on to the Highlights section of the site.

Saturday, May 14, 2011

Rebuilding Radio NZ - Part 6: Schedules

Schedules have always been an integral part of the Radio NZ site. This was always a popular section of site, and it got huge boost when The Listener trimmed its printed listings a few years ago.

Iteration 1

Schedules appeared on our first site in 1998.

The publishing process involved taking a Word document containing a week’s schedule (Saturday to Friday) and posting each day to the site. There was a 3 week cycle with last week, this week and next week. This was done in MS FrontPage.

Iteration 2

The daily schedules were abandoned and replaced with weekly ones because formatting each day’s schedule was too time-consuming. This is how the schedule looked in the site's last week of existence.



Iteration 3

Our second site was launched in 2003, and was based on a custom PHP CMS. I wrote a parser to take the Word HTML and format it for the web and this was built-in to the CMS for ease of use. The parser could identify the start of each day and added bookmark links at the top of the page automatically.

I also added code to pre-format highlights, classical chart and the weekly music features.

For the first time ever here is the code. Pretty ugly code, but it worked well.

Iteration 4

In Matrix we wanted to again have more granular schedule data, so the parser was rewritten to spit the weekly schedules into days. An import script was written by Squiz to import the XML from this step, creating a page for each day of the schedule and setting the created time of the new page to the day of the schedule.

Forcing the create time of the asset allowed us to show the schedule for the day on the home page of each station - a big leap in functionality. You can see a part of the Matrix asset tree at right.

The new parser (written in PHP) was also able to add linked text for each programme in the schedule. The code was a bit fragile and hard to maintain, so was rewritten as a Rails app. You can review the lastest version of the core parser modules on github.

As a separate web application the generated XML had to be manually uploaded to Matrix, then imported. A minor annoyance, but in all saving a huge amount of time reformatting, creating pages, pasting in content and re-setting the created date on each asset.

I’d estimate that doing all this work by hand would have taken about 6 - 8 hours each week.

For Concert listeners we also would generate a weekly PDF; we had a large number of people contact us after The Listener changes asking for an easy printable format. Quite a number of people go to their local library each week to print these out.

The difficulties with this approach were the inability to change the schedule markup en-mass, problems with programme links being statically coded and only being able to offer 1 font size in the PDF. If there were any late changes we’d have to regenerate the PDF, and some older listeners found the font size too small.

Iteration 5

In ELF we needed to build in new features that we’d want in the next design of the site. These were:

  • forward and back navigation by day and week
  • automatic generation of PDFs in different sizes
  • the ability to display what was on-air at any time.

The section really needed a complete under-the-hood rebuild.

The first task was to rewrite the parser, and integrate it into ELF. And for the first time, testing was used to ensure the code performed as expected. The in-ELF module would provide a single interface for parsing and importing schedules. When the schedule is parsed, the system provides a preview of the data so that we can check it has worked correctly.

Two major changes in this iteration are the splitting of the schedule into events, and dynamic linking of programmes.

The new parser uses contextual clues to work out the date and time of each event in the schedule. These events are imported into ELF as schedule events. Each schedule event is associated with its programme:

belongs_to :programme

The programme association is made based on cues in the text.

The code for the parser is available under an MIT license here, and the core HTML cleaning class here.
Schedule Events look like this in the Rails console:

>> ScheduleEvent.station(Station.national).current_event
+ ELF: Schedule Event
ID : 2474802
Title : Eight to Noon with Bryan Crump
Body : A holiday morning of information and entertainment
Start At : 2011-04-25 08:10:00
Programme : None (National)

Pro Tip: Every model in ELF has a custom inspect method to improve readability during debugging.

The public display routines use the date in the URL (or today for the schedule home page) to collect all the events for that day. These are ordered and sent to the view for formatting. Every schedule event for National and Concert are rendered with the same 8 lines of code. This makes it dead easy to change the markup if we need to (and we will because the site is being redesigned).

The administration section has been optimised for navigating by date, and for editing individual events. Because there is no caching, changes appear on the site immediately.



Public schedules page now have forward and back navigation by day or week, and PDFs are dynamically generated on-demand allowing 3 difference size options.

You can append '.xml' to any daily or weekly view to get a dump of that page as XML, and because our schedules are Creative Commons licensed the data can be used subject to a few minor conditions.


Export

Getting the schedules out of Matrix was a breeze via screen scraping. Version 1 of the scraper was given a base URL, and start and end dates in the format yyyymmdd.

The export script grabbed all the historical schedules pages and cached a local copy. The pages were machine generated by the previous parser and almost 100% consistent making it simple to reparse and extract the data.

You can have a look at the scraper code on github.

Our first problem

Schedules are not isolated like recipes - today’s schedule appears on the home page for National and Concert - and at the time both these pages were still running in Matrix. My first solution was to get some javascript to pull the HTML content for this part of the page over from ELF and insert it into the page once it had loaded.

This solution worked, but the 2 second delay before the content appeared looked bad. The second solution was to move these pages into ELF. The content in the right hand column is still generated in Matrix. A cronjob pulls this and other pieces of Matrix content into the ELF database every 2 minutes. Once in the database the content can be used anywhere in ELF. This approach provides a simple way to share content while those sections are still being built.

Recap

At this stage we had Recipes, News, Schedules for National and Concert, and the four main home pages: Site Home, National, Concert and News.

Visitors and Google (webmaster tools) were starting to notice the speed improvement, and we starting to see the benefits of faster administration of pages.

Next time I'll be covering the provision of data by ELF to the Radio NZ iPhone application.

Saturday, May 7, 2011

Rebuilding Radio NZ - Part 5: The Evolution of News

In previous parts of this series I have looked at the rationale behind replacing our off-the-shelf CMS with a bespoke solution.

This time it is news. The news section of the site has gone through two major iterations in Matrix before moving to ELF. This section has given us the most grief, both in terms of customisations required and performance problems.

Initial Design

Workflow is very important in the news business. Every second counts, and the tools used in a typical newsroom reflect this. Radio NZ uses Avid’s iNews - an enterprise-grade system designed to manage incoming wire content, news bulletin production, and the editing of stories in a high-stress, fast-turnaround operation.

I believed that it was critical for news staff to be able to use this system to also publish to the web. This would minimise training, avoid the need for additional mental tasks (HTML formatting) during peak news events, and give me a level of formatting control that is not possible when staff directly use a WYSIWYG web-based editor.

The publishing process needed to be simple to use and understand and leverage existing skills to the greatest extend possible. The tools needed to get out of the way of the process of writing and editing.

The other factor in adopting a remote-publishing model over direct editing was raw speed. The site was running on a single server it was significantly slower to add, edit and update content via the web then in a native desktop application. Sluggish (relative) performance would have disrupted the flow of writing and editing.

In iNews stories are compiled into a queue - a folder that allows stories to be placed in a specific order. A single hot-key initiates the publishing process.

Iteration 1

When we first started the site in 2005 only one group of news text stories was available at any one time. This was partly due to technological limitations, and partly for licensing reasons; like most other local news organisations some of our text copy is licensed from overseas news agencies.

Stories were edited in iNews and simple markup was used to provide basic HTML formatting. For example [h] at the start of a line is a heading, [[double square brackets]] is italics and [audio] denotes the lines contains a link to audio. These and other commands were developed in consultation with news staff to be simple to use and understand.

When published from iNews, the list of stories was sent via FTP to a folder on another server. From there a Perl script collected all the metadata and content from the individual story files, merged them into one XML file and imported them into Matrix using a second custom script written by Squiz.

There were no news categories, and content was removed after 7 days.

Each time a new group of stories was published, links on the site were updated to the new content. A problem arose when links were shared though - they would not be to the most recent update of a story. (Interesting note: 4 years after we replaced this system, we are still getting requests for these outdated URLs.)

Because each publish created a new batch of stories they appeared on the site quite quickly.

Iteration 2

It is safe to say that version 1 of the news section did not reflect the breadth of depth of our news coverage. In version 2 we added categories and allowed stories to be updated.

This was obviously more in line with public (and company) expectations of a news service, and removed a number of problems, the biggest being that the URL for a story remained the same for its life.

This required some major changes in Matrix and a significantly more complex import script. The second version of the import script (about 2000 lines of PHP code) was written by Mark Brydon at Squiz and allowed us to update (rather than replace) the bodycopy in an existing Matrix asset, and to manipulate the created, published AND the system-controlled updated timestamps. It also allowed content to be categorised into sets of folders, created on the fly as required.

Our Perl script for processing the exported stories also had to be updated to generate the enhanced XML required by Mark’s script.

There were problems though. By this stage we’d upgraded our Matrix infrastructure to use a master-slave configuration, and increased the local caching of pages to improve delivery speed. But these changes meant that updates to news content did not always appear in a timely fashion.

This was a very complex problem to solve and is documented here in all its gory detail. Hat tip to our sysadmin Colin MacDonald for working with us on this problem.

New categories required multiple steps - folders had to be created, new assets listing made to display the content, and the relevant Matrix IDs had to be added to our Perl script.

The script was written and maintained with NO testing framework whatsoever. Nightmare.

As mentioned in Part 1, our need to frequently update content and to have those updates appear immediately did not really work that well with Matrix. As older problems were mitigated new ones arose.

I realised that we were pushing Matrix outside its design parameters. It was no longer the optimal solution for the type of operation we had become. Our expectations of the system had changed. Our visitors expectations had changed. It was time for a system change.

ELF

The design of the new news section started with the URL schema. This is in the form:

/news/category_name/story_id/story_headline_as_url

The old form was:

/news/stories/yyyy/mm/dd/story_id

e.g. /news/stories/2009/01/02/12459765a219

In Matrix stories were created in a dated folder structure to aid administration. In practice, these segments in the URL did not do anything useful - they redirected to the news home page.

Headlines

Over the life of a story the headline will change. What should the canonical URL for a story be? Many sites get around the issue by ignoring the headline altogether and using a unique ID to retrieve the story.  his is a problem, in my view, because it allows anyone to craft their own URLs to the content, sometimes with hilarious results.

We get around this problem by always using the current headline to create the URL, and redirecting to this if an older version is used. The ID of the story never changes.

Our headline URL generator avoids some annoying problems when punctuation is used. For example this headline:

Taxes to increase by 1.2 %

could become:

taxes-to-increase-by-12

our generator does this:

taxes-to-increase-by-1-point-2-percent

Categories are added through a simple admin screen (at right). In Matrix we had a complex set of structures and functions to specify root node ids, relationships between categories and what folder structure to use. Category folders had to be manually created along with links, asset listings and the like.

In ELF we specify the URL, the code for news staff to use in their story template, and hit save. The position and visibility of the new category on the news home page, and in the sidebar links, can be controlled by a drag and drop interface.

It is now possible to move stories between categories, something that was not possible in Matrix.

ELF also allows images to be added to stories. Images are pre-loaded directly to ELF by staff, and the system gives them a picture code to chose from :

[image:1791:third:right]
[image:1791:half:right]
[image:1791:third:left]
[image:1791:half:left]
[image:1791:full]

The editor selects one of these and inserts it into the story copy in iNews. ELF associates the image with the story when it is requested. The editor can change the size and position of the image in the story simply by changing the code and republishing.

This may seem a roundabout way of doing images. It was the only way because iNews does not support this functionality, and adding the images to content in ELF would also not work because the stories were being updated remotely; any remote update would overwrite images added in ELF.

In practice it works well.

Performance

The performance of the new system is outstanding. Previously, we'd occasionally have to restart the web server process during high demand periods.

The ELF news section went on-line a few days before the first Christchurch quake. It served all traffic without issue on an non-optimised database - we had the slow query log enabled and were intending to tweak things the following Monday after a (ahem) quiet weekend! Page rendering times under load were less than 50 mS which is insignificant. This is with no page caching whatsoever.

All published news content is now available within 30 seconds of the news web editor pressing the publish hotkey in iNews.

Regrets

Even though publishing was simple for staff, the multi-step background process to make it happen probably limited what we were able to do on the site in the first few years, and was a maintenance mightmare. This is certainly the case with Iteration 2. To be frank, from a technical point of view, the process was on the edge of instability, and fragile to maintain.

But I still believe that the decision to remote-publish was the right one, and we are bearing the fruit of this in iteration 3. We now have a highly robust system for publishing news quickly, and a software infrastructure that is technically flexible and well covered by unit tests. So, no regrets really.

I am happy to answer questions in the comments.

In the next post I will talk about the schedules section of the site.