Saturday, April 23, 2011

Rebuilding Radio NZ - Part 3: Groundwork

In part 3 of this series I'll be covering setting up our new app, and looking at some of the design considerations. I have bumped recipes to next week.

If you are looking for advice on which CMS to get, or not get, this is the wrong place. This series looks at how we at Radio NZ are solving our particular business problems. Your mileage can and will vary. You have been warned.

Note: I use the term asset to refer to an instance of a piece of content.

Foundations


When we started building ELF, Rails 3 was in RC. The first decision was which version to use - the very stable 2.x branch, or the new-with-cool-features-we-could-use 3.0 branch.

We chose stable because many of the plugins we intended to use were not yet compatible with 3.0, and we did not want to be working around bugs while starting a new application. Rails 3 was also new to our contractor.

The first discussion with contractors Able Tech revolved around what the app was going to be able to do long term, and what core functionality would be required system wide. This would be built first, and everything else would be added on top. One of these features was user authentication for the administration section of the site.

The design of the app had to make maintenance and future development simple because this is likely something we will be using for at least 5 years. While code written in Rails is usually self-documenting, I was keen to have files well commented so any future developers could understand why things were done a certain way.

One early decision that was later abandoned was the use of a general content subclass. Many of our planned content types shared some attributes in common - title, webpath, broadcast time, body-copy and so on.

The first few content types built in ELF used this subclass. We dropped it because of the performance impact of the extra joins (and the work required to optimise the DB), and because it made maintenance harder. Having to remember which attributes are delegated where is bad enough when you are working regularly on a new app; imagine what it is like a year later.

This approach brought back memories of Matrix's EAV database schema (also known as Open Schema). With EAV there is no direct relationship between your data models and the tables in the database. EAV makes performance tuning for specific use-cases virtually impossible. It does make development easier though, because you do not have to make changes to the database schema as you add new content over time, and it can be more space efficient if the data is quite sparse.
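
To make the trade-off concrete, here is a minimal sketch in plain Ruby rather than SQL. The rows, attribute names and the assemble_entity helper are all made up for illustration; they are not Matrix's actual schema. It shows the same asset stored as one conventional row and as a set of EAV triples, and the extra work needed to put the EAV version back together.

```ruby
# Hypothetical data: the same asset stored two ways.

# Conventional layout: one row per asset, one column per attribute.
conventional_row = { id: 1, title: "Example programme", webpath: "/example-programme" }

# EAV layout: one row per (entity, attribute, value) triple.
eav_rows = [
  { entity_id: 1, attribute: "title",   value: "Example programme" },
  { entity_id: 1, attribute: "webpath", value: "/example-programme" },
  { entity_id: 2, attribute: "title",   value: "Another programme" }
]

# Rebuilding one entity means gathering every triple that belongs to it.
# In a database this is the step that turns into extra joins or queries.
def assemble_entity(rows, entity_id)
  rows
    .select { |row| row[:entity_id] == entity_id }
    .each_with_object({ id: entity_id }) do |row, entity|
      entity[row[:attribute].to_sym] = row[:value]
    end
end

assembled = assemble_entity(eav_rows, 1)
```

Adding a new attribute to the EAV version is just another row, which is the development convenience mentioned above. The cost is that every read has to go through something like assemble_entity.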

This article is an excellent overview but in summary:
A major downside of EAV is its lower efficiency when retrieving data in bulk in comparison to conventional structure. In EAV model the entity data is more fragmented and so selecting an entire entity record requires multiple table joins.

We went with the standard out-of-the-box AREL layer for ELF. The approach taken now is for each model to have all the fields it needs, and for common functionality (as new models are added) to be extracted into Modules. For example, there is a module that handles all the database scopes for selecting assets based on the broadcast time.
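
As a rough illustration of the module approach, here is a sketch in plain Ruby. It is not ELF's code: the module name, the method names and the two Struct-based stand-in models are assumptions, and an in-memory array takes the place of the database so the example runs on its own. In a Rails app the same idea would be expressed as scopes on the models.

```ruby
# Hypothetical module: shared broadcast-time selection logic, mixed into
# each model class rather than inherited from a common parent.
module BroadcastScopes
  # Items broadcast between two times (inclusive).
  def broadcast_between(from, to)
    all.select { |item| item.broadcast_at && item.broadcast_at.between?(from, to) }
  end

  # Items broadcast on or after the given time, e.g. upcoming highlights.
  def broadcast_from(time)
    all.select { |item| item.broadcast_at && item.broadcast_at >= time }
  end
end

# Stand-ins for two models. Each keeps its own full set of fields and
# an in-memory "table" (an assumption made so the sketch is runnable).
AudioItem = Struct.new(:title, :broadcast_at) do
  extend BroadcastScopes

  def self.all
    @all ||= []
  end
end

Highlight = Struct.new(:title, :summary, :broadcast_at) do
  extend BroadcastScopes

  def self.all
    @all ||= []
  end
end

AudioItem.all << AudioItem.new("Interview", Time.utc(2011, 4, 20, 9, 0))
AudioItem.all << AudioItem.new("Feature", Time.utc(2011, 4, 25, 14, 0))
Highlight.all << Highlight.new("Concert", "Live from the studio", Time.utc(2011, 4, 24, 20, 0))

upcoming_audio = AudioItem.broadcast_from(Time.utc(2011, 4, 23))
weekend_highlights = Highlight.broadcast_between(Time.utc(2011, 4, 23), Time.utc(2011, 4, 25))
```

Both models get the same selection methods without sharing a table, so there are no extra joins and no question about where an attribute lives.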

Another big area was the migration of content. This task was going to be mine; I had the best understanding of the content, and how to extract it from Matrix.

We made the decision to build the application in small pieces (read: agile), and move content over when each piece was ready.

The system would need a set of Rake tasks for cleaning up exported Matrix content and importing it. These tasks would need to extract contextual information automatically from the HTML, as often there was no metadata.
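
The kind of extraction involved might look something like the sketch below. Everything in it is hypothetical: the sample markup, the "Broadcast on" wording and the extract_context helper are invented for illustration, since the real Matrix export format is not shown here. It also uses regular expressions to stay within the standard library; a proper HTML parser would be the safer choice for real content.

```ruby
require "date"

# Hypothetical exported fragment with no separate metadata.
SAMPLE_HTML = <<~HTML
  <div class="content">
    <h2>  Pumpkin Soup </h2>
    <p>Broadcast on 12 March 2010</p>
    <p>Serves four.</p>
  </div>
HTML

# Pull a title and a broadcast date out of the markup itself.
# Returns nil for anything that cannot be found.
def extract_context(html)
  title = html[%r{<h2[^>]*>(.*?)</h2>}m, 1]
  date_text = html[/Broadcast on (\d{1,2} \w+ \d{4})/, 1]

  {
    title: title && title.strip.gsub(/\s+/, " "),
    broadcast_on: date_text && Date.strptime(date_text, "%d %B %Y")
  }
end

context = extract_context(SAMPLE_HTML)
```

A Rake task would then loop over the exported files, call a helper like this on each one, and create the matching record.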

The test framework built into Rails would allow us to write tests to ensure the handling of imported data was consistent and reliable.

The gradual migration of content meant a certain amount of sharing between Matrix and ELF - stylesheets, javascript and some images. It would require very careful planning of each phase to ensure the change-over between apps was seamless to site visitors.

As more content was moved we would need to use XML feeds to share data between systems when it was needed in both (more on this in later posts).

Nginx is running in front of Matrix and serves our static assets, so this would be used to divert requests to the new application, allowing us to pick and choose which app did what.

As an aside, ELF is now serving the stylesheet for both applications. Matrix is still serving the javascript. The choice is driven by convenience, and which app is driving the most changes in that file.

Broadcast Timestamps

One critical design feature was the use of broadcast related time-stamps.

Matrix only gives you control of created at and published at times. We'd used both as the broadcast time in different places on the site for different but valid reasons.

Station highlights use the created time, which is set after the item is created and edited. This means the time the item is published does not matter.

Audio items use the published time. These sometimes have a future status change, so we needed a time that is updated by the system when the item goes live.

These differences created some management issues. If an audio item has to be temporarily removed from public view, and later restored, the listed broadcast time is wrong and has to be reset manually.

Likewise, if you forgot to set the created at time for a highlight, it would not list at all because only future highlights are shown. The site is so big that you can often forget which of the two attributes is the broadcast time.

In ELF we have two attributes to get around this problem.

The published_at attribute serves two purposes. It can be used to sort, and it controls visibility. When published_at is not set, the item is not visible. This gives us two states: 'Live' and 'Under Construction'.

The broadcast_at attribute contains the date and time the item was (or will be) broadcast. It is never changed by the app, although it can be changed manually if required.
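
Here is a small sketch of how the two attributes separate the concerns. It is plain Ruby with a Struct standing in for a model, and the method names (live?, status, publish!, unpublish!) are my assumptions rather than ELF's API. It also assumes that any non-empty published_at means the item is visible.

```ruby
# Hypothetical stand-in for an ELF model; only the two timestamps matter here.
Asset = Struct.new(:title, :published_at, :broadcast_at) do
  # Visibility depends only on whether published_at is set.
  def live?
    !published_at.nil?
  end

  def status
    live? ? "Live" : "Under Construction"
  end

  # Taking an item down clears published_at and leaves broadcast_at alone.
  def unpublish!
    self.published_at = nil
  end

  # Restoring it sets a new published_at; broadcast_at is still untouched.
  def publish!(now = Time.now)
    self.published_at = now
  end
end

aired = Time.utc(2011, 4, 20, 9, 5)
item = Asset.new("Interview", Time.utc(2011, 4, 20, 10, 0), aired)

item.unpublish!
status_while_hidden = item.status

item.publish!(Time.utc(2011, 4, 22, 8, 0))
```

After the item is taken down and restored, published_at has moved but broadcast_at still holds the original air time, so nothing has to be reset by hand.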

Keeping things DRY

Don't repeat yourself, they say. We wanted to maximise the advantages of Rails' MVC (Model View Controller) layout, and DRY coding practices to avoid repetition and improve maintainability of the HTML code.

Code Deployment

Deploying new versions of Matrix was hard. This has been improved recently with script-based upgrades.

Deployment is already highly streamlined in the Rails world, where most people seem to use Capistrano. In recent weeks I have been deploying new code to the live server several times a day.

Sensible URLs

Most of the site already had a good URL schema. A few places, like news, were problematic, and these needed to be revamped.

Revision control

Code was going to be worked on by at least 3 people. We needed a system that allowed this and easy branching. Git. No contest, IMHO.

The migration began with the recipes section, which was chosen to go first because it was largely stand-alone. The next post will cover this in detail.
