Salesforce CRM experiences sudden downtime

Salesforce.com (CRM) was down for around 30-40 minutes yesterday, between 12:40 and 1:20 pm US Pacific time. Customers complained that they could not access their accounts or, in some cases, could not reach the website at all. Salesforce's status page posted a brief explanation of the outage:

Service Disruption

Time: 1/6/09 12:40 pm PST

Detail: Service Disruption All Instances

Root cause: Starting at 01/06/2009 20:39 UTC, a core network device failed due to memory allocation errors. The failure caused it to stop passing data but did not properly trigger a graceful failover to the redundant system, as the memory allocation errors were present on the failover system as well. This resulted in a full service failure for all instances. Salesforce.com had to initiate manual recovery steps to bring the service back up. The manual recovery steps were completed at 01/06/2009 21:17 UTC, restoring most services except for AP0 and NA3 search indexing. Search of existing data would work, but new data would not be indexed for searching. Emergency maintenance was performed at 01/06/2009 23:24 UTC to restore search indexing for AP0 and NA3 and to implement a work-around for the memory allocation error. While we are confident the root cause has been addressed by the work-around, the Salesforce.com technology team will continue to work with hardware vendors to fully detail the root cause and identify whether further patching or fixes will be needed. Further updates will be available as the work progresses.

The event has attracted a lot of coverage online and has also triggered discussion about the downsides of relying on remote services. It just goes to reinforce the fact that 100% uptime is practically impossible, even for the top SaaS players!
