Bug in Skype for Windows client downed P2P network

Skype CIO Lars Rabbe says a cluster of critical support servers became overloaded

According to Skype's CIO, Lars Rabbe, the 24-hour outage of the company's services was due to a bug in the Skype for Windows client (version 5.0.0.152).
By  Georgina Enzer Published  December 30, 2010

Skype's chief information officer, Lars Rabbe, has revealed that the recent Skype outage was caused by the peer-to-peer (P2P) network becoming unstable and suffering a critical failure.

The problem began when a cluster of support servers became overloaded.

"On Wednesday, December 22, a cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers. In a version of the Skype for Windows client [version 5.0.0.152], the delayed responses from the overloaded servers were not properly processed, causing Windows clients running the affected version to crash," said Rabbe in his blog post.

Users running the latest Skype for Windows (version 5.0.0.156), older 4.0 versions of Skype for Windows, Skype for Mac, Skype for iPhone, Skype on a TV, Skype Connect or Skype Manager for enterprises were not affected.
Unfortunately, almost half of all Skype users globally were running version 5.0.0.152 of Skype for Windows. The crashes caused around 40% of those clients to fail, according to Rabbe. The failed clients included 25 to 30% of the publicly available supernodes.

"Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25 to 30% fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes," said Rabbe.

The resulting load, 100 times more than normal traffic as users restarted their crashed Windows clients, caused more supernodes to shut down. The cycle repeated until Skype had failed almost completely, a few hours after the initial crisis.
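The feedback loop Rabbe describes, in which each failure pushes more load onto the survivors, can be illustrated with a toy model. The sketch below is not Skype's architecture or code: the function name, the node capacities and the load figure are all invented for illustration, and it assumes load is shared evenly among the surviving nodes.

```python
def simulate_cascade(capacities, total_load, initially_failed):
    """Toy cascading-failure model (illustrative only, not Skype's design).

    capacities       -- hypothetical maximum load each node can carry
    total_load       -- load shared evenly by all surviving nodes
    initially_failed -- set of node indexes that are down at the start

    Returns the number of surviving nodes after each round.
    """
    alive = [c for i, c in enumerate(capacities) if i not in initially_failed]
    history = [len(alive)]
    while alive:
        share = total_load / len(alive)          # load per surviving node
        survivors = [c for c in alive if c >= share]
        if len(survivors) == len(alive):         # nobody overloaded: stable
            break
        alive = survivors                        # overloaded nodes drop out
        history.append(len(alive))
    return history


# Hypothetical network: 100 nodes with capacities 100..199 and a total
# load of 9500, so each node carries 95 units when all are healthy.
capacities = [100 + i for i in range(100)]
total_load = 9500

print(simulate_cascade(capacities, total_load, set()))             # no failures
print(simulate_cascade(capacities, total_load, set(range(10))))    # 10% down
print(simulate_cascade(capacities, total_load, set(range(30))))    # 30% down
```

With these made-up numbers, losing 10 nodes leaves the network stable, while losing 30 sets off a chain of overloads that leaves no nodes standing. The model ignores restarts and the surge in traffic from clients reconnecting, both of which played a part in the real outage.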

To recover, the Skype engineering and operations team introduced hundreds of instances of the Skype software into the P2P network. These acted as dedicated supernodes, providing enough temporary capacity to accelerate the recovery of the peer-to-peer cloud.

The team was able to stabilise the network by Christmas Eve and complete repairs on Christmas Day.

"We are truly grateful to all of our users and humbled by your continued support. We know how much you rely on Skype, and we know that we fell short in both fulfilling your expectations and communicating with you during this incident. Lessons will be learned and we will use this as an opportunity to identify and introduce areas of improvement to our software, further assess and invest in capacity and stability, and develop better processes for outage recovery and communications to our user base. Thank you to everyone," said Rabbe.
