3G: Google Gmail Glitch

Posted on Mar 1, 2011

I was just asked by an IT publication to comment on the Feb. 27 problem at Google that caused an estimated 35,000 people to lose, apparently for a few days, the entire contents of their Gmail accounts.

Here are my thoughts on this, bearing in mind two things: on the one hand, I have spent the last few months thinking (and writing some) about the risks and benefits of cloud computing; on the other hand, I don’t know the details of what happened, since Google itself isn’t yet saying exactly how this occurred.

  • Running your e-mail in-house does not make you immune to glitches, so people should not indict the entire model because of a relatively limited incident. In fact, if a company’s on-premise e-mail server had crashed on a Sunday, the company would have had only limited resources to apply to fixing the problem. Google, in contrast, has plenty of people whose priorities it can suddenly reassign, if needed, to address the issue.
  • Clients should negotiate their SLAs so that cloud providers have stronger incentives to keep their data safe and secure. Some of the clauses in standard cloud computing agreements (e.g., Amazon’s EC2, as amply documented by my colleague Lou Mazzucchelli during his keynote at the last Cutter Summit) are far too “irresponsible,” as in “we can’t make any assurances of any sort.” Customers should negotiate those clauses and, at a minimum, impose penalties when data becomes unavailable, whether temporarily or permanently. Money talks. There is a paradox here, however: large companies have the power and the skills to negotiate a better agreement, but they are unlikely to outsource their e-mail. Small and medium enterprises (SMEs) are likely to use Gmail or their ISP’s e-mail service for their employees, but they are too small to have either the IT sourcing skills or the leverage to obtain tighter SLAs.
  • Users of cloud-based e-mail services would be well advised to use an e-mail client (such as Outlook or Eudora) that downloads (“pops”) their e-mail into a local store. That way, not only does the cloud provider’s repository serve as the backup if the user’s hard drive gets fried, but conversely, the local copy serves as a safety net if the cloud provider loses the user’s e-mail. Personally, I use Gmail, but all my messages are also on my PC, which is backed up at home. So when I travel with my laptop, I have two backups of my e-mail: my Gmail folders and my backup disk.
  • It’s never good for a provider to minimize or hide an incident. Things will end up being known, and the provider will look worse if it tried to stay silent or is seen as “spinning the numbers” to minimize the impact. It should say: “We had a problem, it affected x clients (a real number, not just a percentage), we expect to have their data restored by such-and-such a time, and we’re investigating the cause and the measures we can take to lower the risk of this ever happening again.” The next message should come when the problem is resolved, or when the original deadline passes without a complete resolution. The third and last public message should be: “Here is the post-mortem on the problem, and here are the measures we have taken.”
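For readers who want to see what “popping” mail into a local store amounts to, here is a minimal sketch using Python’s standard `poplib` and `mailbox` modules. It assumes the provider allows POP3 access over SSL with a plain user name and password; the host name and credentials shown are hypothetical placeholders. It retrieves every message without deleting it, so the provider’s copy stays in place as the second backup, and appends the messages to a local mbox file.

```python
import mailbox
import poplib


def fetch_all(host, user, password):
    """Return every message in a POP3 mailbox as raw bytes.

    Uses RETR only (never DELE), so the messages stay on the server.
    """
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _total_size = conn.stat()
        messages = []
        for i in range(1, count + 1):
            # retr() returns (response, list_of_lines, octets); lines lack terminators.
            _resp, lines, _octets = conn.retr(i)
            messages.append(b"\n".join(lines) + b"\n")
        return messages
    finally:
        conn.quit()


def save_messages(raw_messages, mbox_path):
    """Append raw messages to a local mbox file and return how many were saved.

    No de-duplication: running this twice stores the messages twice.
    """
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return len(raw_messages)


if __name__ == "__main__":
    # Hypothetical server and credentials; replace with your provider's settings.
    raw = fetch_all("pop.example.com", "user@example.com", "app-password")
    print(save_messages(raw, "mail-backup.mbox"), "messages saved")
```

A real mail client also tracks which messages it has already downloaded, for example by their POP3 unique IDs. This sketch skips that step, so it is better suited to a one-off export than to a nightly backup job.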

Please feel free to comment. These are immediate reactions, and I don’t claim to cover all the bases with these thoughts and recommendations.