Vital Statistics
Posted by Dan Creswell in Architecture, Systems, tags: Architecture, infrastructure, Systems
How big does a website have to get before custom infrastructure becomes necessary? When a website reaches this stage, what infrastructure gets built? Before trying to answer these questions we must have some means of measuring the size of a website. I’ve settled on the number of machines as a reasonable approximation because:
- As a codebase grows it must be split up along functional boundaries, and spread across multiple processes. More code equals more processes and more machines to run them on.
- More customers mean more load, which requires more machines to handle it.
- More data means more storage and more processors to chew through it.
Now let’s see how many machines some of the big players are running and what infrastructure they’re talking about:
TicketMaster have at least 3000 machines and have built Spine to help them manage configuration of their infrastructure.
eBay have built a custom deployment tool (Roller), logging infrastructure, configuration management for their software services, messaging software and more. They’re running around 15000 machines across four geographical locations.
Microsoft have built a custom deployment, configuration and monitoring infrastructure called Autopilot focused on many thousands of machines. In fact we’re talking hundreds of thousands.
Google are dealing in a million or more machines and expending effort on software to handle staged, automatic upgrades. Of course they’ve already built GFS, Chubby etc.
Twitter have moved beyond the half-dozen or so machines they used to have to “a lot of servers” (hundreds?). They are seemingly still hiring operations staff, but have already built a custom queue server.
Facebook have at least 10000 webservers, 800 MemcacheD instances and 1800 MySQL instances. They’ve built a custom configuration-serving infrastructure, management and monitoring tools. They contribute to MemcacheD, have built Cassandra and Thrift, and appear to be busy building their own optimized webservers and a replacement for Squid.
Amazon have tens of thousands of servers (surely more?) and have constructed Dynamo, S3, EC2, SQS etc.
A few tentative conclusions:
- It would seem that by the time a website has moved into the thousands of boxes it will have had to address configuration and monitoring, which suggests development efforts started before this threshold (perhaps at a couple of hundred boxes?).
- As the machine count moves towards the tens of thousands, automated deployment becomes essential and there’s a need to develop more service-specific infrastructure.
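To make those tentative conclusions a little more concrete, here is a rough Python sketch that maps a machine count to the kind of infrastructure the conclusions suggest. The exact cut-offs (200, 1000 and 10000) and the function name `expected_infrastructure` are my own assumptions, chosen only to stand in for "a couple of hundred", "thousands" and "tens of thousands"; they are not measured figures.

```python
# Illustrative tiers only: the numeric thresholds are assumptions that
# approximate "a couple of hundred", "thousands" and "tens of thousands".
TIERS = [
    (10_000, "automated deployment and service-specific infrastructure"),
    (1_000, "configuration management and monitoring"),
    (200, "early configuration and monitoring work"),
]


def expected_infrastructure(machine_count: int) -> str:
    """Return the highest tier whose threshold the machine count has reached."""
    for threshold, description in TIERS:  # ordered from largest to smallest
        if machine_count >= threshold:
            return description
    return "no custom infrastructure suggested"


# Machine counts quoted earlier in this post.
for site, machines in [("TicketMaster", 3000), ("eBay", 15000)]:
    print(site, machines, expected_infrastructure(machines))
```

The tiers are cumulative in practice: a site in the top tier will already have dealt with everything in the tiers below it.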
