Trevin Chow

Microsoft Group Program Manager and Seattle Photographer

The perils of no pre-production testing


FASTER! FASTER! FASTER!

This is a war cry we all know too well these days with the advent of “Web 2.0”. How quickly a service can be brought to market can make or break a startup. The benefits of first-mover advantage can’t be discounted, especially when buzz and eyeballs are so important these days for building a brand and growing a user base.

However, in the push for shorter development cycles and a faster path to the “production” environment, companies are finding ways to cut corners compared with traditional methodologies. Most services at Microsoft have at least two staging environments before deploying to the “live site”.


As our development team reaches code complete, we deploy our bits to the test cluster, where our QA team goes through a test pass. After meeting our test pass exit criteria, we roll the bits into an intermediary staging environment, “Pre-production”. This staging environment matches our production hardware as closely as possible, and we use it for final acceptance testing. This model, while seemingly heavyweight, helps ensure the highest-quality release when we finally deploy to our live site (aka “Production”). It’s particularly important when your product integrates with several services you don’t own; in our case, one example is Windows Marketplace’s integration with Windows Live ID.
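The staged flow above can be sketched as a chain of gates: a build only moves to the next environment once the current one’s exit criteria pass. This is a minimal illustration under stated assumptions; the `promote` function and the pass/fail checks are hypothetical stand-ins for a real test pass and acceptance testing, not Microsoft’s actual tooling.

```python
from typing import Callable, Dict, List

# Environment order from the post: Test -> Pre-production -> Production.
PIPELINE: List[str] = ["Test", "Pre-production", "Production"]


def promote(build: str, gates: Dict[str, Callable[[str], bool]]) -> str:
    """Return the furthest environment the build reaches.

    `gates` maps each staging environment (every entry except Production)
    to a hypothetical check that returns True when that stage's exit
    criteria are met. A build only moves on after its current gate passes.
    """
    for env in PIPELINE[:-1]:
        if not gates[env](build):
            return env  # gate failed: the build stays in this environment
    return PIPELINE[-1]  # every gate passed: deploy to the live site


# Hypothetical example: the QA test pass succeeds, acceptance testing does not.
gates = {
    "Test": lambda build: True,
    "Pre-production": lambda build: False,
}
print(promote("build-42", gates))  # Pre-production
```

The point of the structure is that there is no path to Production that skips a gate, which is exactly the rigor (and the extra time) the staged model trades for quality.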

I’ve often been frustrated when I want to push a seemingly “small” feature to production faster, but I have to remind myself that the rigor of staging our releases is worth it if it ensures a higher-quality product, even at the expense of a slower time to market.

That said, I do think many teams (my current one included) could adopt a model that is better at identifying lower-risk features and rolling them directly to the live site in a throttled manner. For example, roll out a feature so that 5% of our users see it, and if it breaks, roll it back. Otherwise, slowly increase the rollout over time until 100% of users have the new feature. The entire time, make sure we monitor the snot out of things.
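One common way to implement a throttled rollout like this is percentage-based feature flagging: hash each user into a stable bucket from 0 to 99 and show the feature only to buckets below the current rollout percentage. The sketch below assumes that hashing scheme, a hypothetical ramp schedule, and an illustrative 1% error-rate threshold for rollback; none of these specifics come from the post, and the function names are made up for illustration.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature.

    Hashing feature+user keeps each user's assignment stable across requests,
    so raising the percentage only adds users and never turns anyone off.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket(user_id, feature) < rollout_percent


# Hypothetical ramp schedule: start at 5%, widen only while monitoring looks healthy.
RAMP = [5, 10, 25, 50, 100]


def next_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Advance one ramp step if healthy; roll back to 0% if errors spike."""
    if error_rate > max_error_rate:
        return 0  # roll back: nobody sees the feature
    for step in RAMP:
        if step > current:
            return step
    return current  # already at 100%


# Hypothetical usage: a healthy 5% rollout advances to 10%; a spike rolls back.
print(next_percent(5, error_rate=0.002))  # 10
print(next_percent(25, error_rate=0.05))  # 0
```

Hashing rather than random sampling matters here: a given user sees the same behavior on every page load, and the 5% who see the feature first stay in the rollout as it widens.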

Facebook has received accolades for how quickly they get new features to market. When I first started using it last year, I was impressed by how much functionality they rolled out in a given week. From Facebook’s job site:

Our development cycle is extremely fast, and we’ve built tools to keep it that way. It’s common to write code and have it running on the live site a few days later. This comes as a pleasant surprise to engineers who have worked at other companies where code takes months or years to see the light of day. If you work for us, you will be able to make an immediate impact.

From speaking with several people who know engineers at Facebook, I gather they don’t have the same QA process as Microsoft; instead, they often go straight to production and use throttling to control a feature’s exposure. Sounds great!

However, based on my experience with Facebook over the past six weeks, the pitfalls of going “Faster! Faster! Faster!” are showing up in a string of very visible problems. Some juicy examples:

No profile, anyone?

I’m signed in, but am invisible and profile-less.


No news feed items?

My biggest beef is usually that there are too many news items to even begin sifting through. But this is ridiculous :)


Site maintenance in the middle of the day, anyone?

This seems like an odd time for a planned outage, given that Facebook is in the same time zone as I am (PST) and it’s smack dab in the middle of the day (11:35am).


Thirty minutes later, same maintenance message, but surprise! It looks like they’ve authenticated me and can tell me how many messages I have in my inbox. Something is amiss.


Awkward error messages

I got this juicy one earlier today when I tried to confirm a new friend request. Is that a debugging message getting piped through to front-end users?


I’m definitely not saying that Facebook’s process is bad and Microsoft’s is better, just that there has to be a happy medium: doing the appropriate amount of testing and monitoring to strike the right balance between quality and time to market. In our quest to ship more features faster, we shouldn’t lose sight of the fact that we can’t screw our end users; otherwise, there is no reason to ship our products at all.


Written by Trevin

September 12th, 2007 at 6:27 am

Posted in Technology


Comments on 'The perils of no pre-production testing'


  1. You may just be one of the unlucky 5% :)

    I’ve heard that Flickr and Newsvine update their production code on the order of daily to multiple times per day, although I don’t recall where I read that.

    RE: the overwhelming Facebook news feed, they don’t show you all events in the feed (that really would be overwhelming); rather, they filter out the stuff they think would be less interesting to you. They’re not public about the algorithms they use to do that (probably because they change), but I thought it was interesting and non-obvious. They really view the news feed more like a newspaper: it’s not everything that happened, just the bits they think would interest you as a reader.

    Since applications have access to the newsfeed (albeit with limits on how many things they can add to it per day) there is some speculation that “Facebook Newsfeed Optimization” is going to become as important as SEO for startups with Facebook apps.

    Tom

    12 Sep 07 at 9:14 am

  2. I should’ve mentioned that the errors didn’t all happen on the same day, but over the course of a few weeks. The stars, moons and galaxies must have really aligned for me to fall into that 5% each time. Sheesh.

    Trevin

    12 Sep 07 at 9:17 am

  3. Having never seen a Facebook error (aside from in non-Facebook-written apps), I suspect that you use the site quite a bit more than I do. Presumably the heavier the user, the better their chance of getting bitten by the 5%.

    I’ve definitely seen my share of Gmail/Google Talk errors though – probably because I’m in that pretty much all day.

    Tom

    12 Sep 07 at 9:27 am
