Bitbucket suffers extensive downtime over the weekend

Oct 6, 2009 13:00 GMT

The Bitbucket development project hosting service went down for long periods over the weekend and on Monday morning because of several powerful Distributed Denial of Service (DDoS) attacks directed at its EC2-powered infrastructure. The outages were so severe in part because Amazon's tech support failed to quickly acknowledge that a network problem was ongoing.

Bitbucket was designed for developers looking to host and maintain their projects. It uses the Mercurial distributed version control system and runs on Amazon's Elastic Compute Cloud (EC2) infrastructure. Amazon's Elastic Block Store (EBS) persistent storage is also used to hold databases, log files, and repositories.

According to Jesper Nohr, the owner of the company behind Bitbucket, the site began experiencing problems on Friday evening, when everything slowed almost to a halt. Requests from the website on EC2 to the data stored on EBS were going through at a very slow rate. "We were getting less throughput than you can pull off of a 1.44MB floppy," Nohr writes in a blog post describing the incident.

After the Bitbucket team upgraded to the "Gold" support plan to get the issue resolved faster, an Amazon representative told them that what they were seeing was normal EBS performance variation. After several hours of back-and-forth over what the team should try on its end, another Amazon support person finally acknowledged that there was a problem with the network.

Eventually, after eight hours, a team of engineers was assigned to the case. It then took another eight or nine hours for the problem to be identified and resolved. The issue was determined to be a DDoS attack directed at Bitbucket's IP address on EC2. "We had a massive flood of UDP packets coming in to our IP, basically eating away all bandwidth to the box," Nohr explains.

On Sunday morning, the attackers changed tactics to bypass Amazon's filtering and launched a TCP SYN flood that took an additional two hours to resolve. On Monday, a new attack, this time against an Amazon router, made the Bitbucket service inaccessible to a number of users.

Nohr notes that Bitbucket is considering switching providers after this, as other companies made them tempting hosting offers during the downtime. "Amazon doesn’t entirely deserve the criticism it has received over this outage," he writes, but adds that "I do think they could’ve taken precautions to at least be warned if one of their routers started pumping through millions of bogus UDP packets to one IP, and I also think that 16+ hours is too long to discover the root of the problem."