December 24, 2019

A million-dollar (literally) question.

The most praised attribute of the cloud is elasticity. It's not just capacity elasticity but also configuration elasticity. Do you need a new network? No problem: one click or one API call, and here comes your new network. The same goes for storage, VPN endpoints, or almost anything else. Your own datacenter has elasticity of its own kind: it's elastic only to the degree you planned for in advance, sometimes years in advance. Say you need just 70 servers today, and they would fit in 2 racks, so you order 4 racks in a cage at the facility, using part of the 3rd rack for your networking gear and treating the rest as room to grow. Then your company suddenly grows faster than you anticipated (say, 2x) and oops. You need to lease *another* space in this facility (and you are lucky if they have the necessary space), build a completely new infrastructure there and then seamlessly migrate your workloads. That means paying twice for an extended period, plus mistakes, outages, lots of all-nighters and the whole nine yards.

CTO/BTO servers have lead times from 3 weeks to a couple of months, so if you need spare computing capacity (room to grow, elasticity) you have to keep it in your own storage room. Storage units typically have even longer lead times. Network gear also means purchasing expensive equipment with extended lead times, followed by complex cabling.

With the cloud, this problem just doesn't exist. Well, not entirely: even mighty Amazon regularly has capacity limitations, but nothing comparable to your own datacenter. If you need more infrastructure, you just go and order it. Even if they have capacity shortages, they will resolve them much faster than you could on your own.

So why do people still build and maintain datacenters? Because sometimes they want to have control over the physical infrastructure - servers, switches, cables, routers, interconnects and so on. It could be just paranoia, or it could be a security or privacy requirement.

Sometimes they understand that configuration elasticity does not mean configuration flexibility. Amazon (or Azure, for that matter) offers only a limited number of instance (VM) types. If you need an instance with 22GB of RAM and 4 vCPUs, you have to use either 16GB with 4 vCPUs or 32GB with 8 vCPUs. In the former case your application will suffer; in the latter, you basically pay double. When I rack my own bare-metal servers and install my own VMware ESXi on them, I can slice and dice resources as I want.
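To make the sizing problem concrete, here is a minimal sketch of "pick the cheapest instance type that fits" and of how much of it ends up unused. The catalog, the type names and the hourly prices are all hypothetical and chosen only to mirror the 16GB/32GB example above; they are not real Amazon or Azure offerings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gb: int
    hourly_price: float  # hypothetical price, for illustration only


# Hypothetical two-entry catalog mirroring the example in the text.
CATALOG = [
    InstanceType("small", vcpus=4, ram_gb=16, hourly_price=0.20),
    InstanceType("large", vcpus=8, ram_gb=32, hourly_price=0.40),
]


def cheapest_fit(catalog, vcpus, ram_gb):
    """Return the cheapest type that meets both requirements, or None."""
    candidates = [t for t in catalog if t.vcpus >= vcpus and t.ram_gb >= ram_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.hourly_price)


def unused_fraction(instance, vcpus, ram_gb):
    """Fraction of each paid-for resource that the workload does not need."""
    return {
        "vcpus": 1 - vcpus / instance.vcpus,
        "ram": 1 - ram_gb / instance.ram_gb,
    }


fit = cheapest_fit(CATALOG, vcpus=4, ram_gb=22)
waste = unused_fraction(fit, vcpus=4, ram_gb=22)
print(fit.name, waste)
```

With these made-up numbers, the 22GB/4 vCPU workload lands on the larger type, leaving half of its vCPUs and roughly a third of its RAM paid for but idle.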

Then there is the matter of cost, of course. You pay a lot of money on top of the cost of your own infrastructure to get that cloud elasticity. And these expenses are only fully worth it if your workloads are elastic both ways - they expand as well as contract. If you only grow and never shrink, then you should think about building your own infrastructure for the constant part of your workload and using the cloud for the expansion, building out more private infrastructure slowly. This is the hybrid cloud, and when designed properly it gives you cloud elasticity at _almost_ private infrastructure cost.
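The hybrid argument can be sketched as a toy cost model. Everything here is an assumption made for illustration: demand is measured in "server-equivalents" per period, private capacity is paid for in every period whether it is used or not, the cloud is paid only for the overflow, and the unit costs (including the 4x cloud premium) are invented, not measured.

```python
def total_cost(demand, private_capacity, private_unit_cost, cloud_unit_cost):
    """Cost of serving `demand` with a fixed private base plus cloud overflow.

    demand: list of per-period demand in server-equivalents.
    Private capacity is paid for every period regardless of use;
    cloud capacity is paid only for demand above the private base.
    """
    private = private_capacity * private_unit_cost * len(demand)
    overflow = sum(max(0, d - private_capacity) for d in demand)
    return private + overflow * cloud_unit_cost


# Hypothetical workload: a steady base of 70 with occasional bursts.
demand = [70, 70, 75, 120, 80, 70]
PRIVATE_UNIT_COST = 1.0  # hypothetical cost per server-equivalent per period
CLOUD_UNIT_COST = 4.0    # hypothetical cloud premium over private

all_cloud = total_cost(demand, 0, PRIVATE_UNIT_COST, CLOUD_UNIT_COST)
all_private = total_cost(demand, max(demand), PRIVATE_UNIT_COST, CLOUD_UNIT_COST)
hybrid = total_cost(demand, min(demand), PRIVATE_UNIT_COST, CLOUD_UNIT_COST)

print(f"all cloud:   {all_cloud}")
print(f"all private: {all_private}")
print(f"hybrid:      {hybrid}")
```

In this toy setup the hybrid option comes out cheapest, and the all-private option only looks close because it ignores the lead times and migration pain described earlier. Change the burst size or the premium and the ranking can shift, which is the point: the answer depends on how elastic the workload really is.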

Sometimes you just have so much money that you don't care. I've seen companies paying north of $500K per month for cloud infrastructure that would have cost them $80K fully loaded (including tech salaries and lease payments for the equipment).

The cloud wins hands down in 100% elastic environments like dev/test, where you want to build the infrastructure fast, use it for some period and then just destroy it. You can't predict your development cycles far in advance, so you can't efficiently build private infrastructure for them.