It’s funny when you think about basic networking concepts and wonder if they will ever actually prove useful. Kind of like the old “Do I really need to learn complex geometry? When am I ever going to use this?” feeling. What I’m here to talk about today is IP addressing design. In many cases this will be something that is already in place and fairly solid, so there won’t be much to think about. That was the case at every company I worked at until the most recent one, a local cloud service provider. The type of architecture required for this environment is a bit different from what I’ve previously worked with.
So here is my first architecture tip:
No matter how small your organization is today, think about how your proposed design might look 5-10 years down the road.
The problem I ran into here was that this cloud provider was still using an IP addressing scheme originally designed for a different set of needs: it was built to support the business back when we had two data centers and no one thought we would expand. Well, today we have over a dozen locations, and there are constant discussions about adding more.
Let’s start with the original design, why it was a good idea, and why it doesn’t scale well today. Every data center location was assigned a few standard blocks of IP addresses, where each block corresponded to a logical network segment. The 10.0.0.0/8 space was used for this, broken into the following blocks:
- 10.1.0.0/16 – Reserved
- 10.11.0.0/16 – DMZ
- 10.111.0.0/16 – Out-of-band management
- 10.211.0.0/16 – Internal network
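To make the layout concrete, here’s a minimal sketch in Python of how blocks like these might be generated and looked up per location. The location numbering and second-octet offsets are my assumptions for illustration – the post only shows one location’s blocks – but they match the pattern above for location 1:

```python
import ipaddress

# Hypothetical reconstruction of the plan: location N gets 10.N.0.0/16
# (reserved), 10.(10+N).0.0/16 (DMZ), 10.(110+N).0.0/16 (out-of-band
# management), and 10.(210+N).0.0/16 (internal). The offsets are assumed.
def location_blocks(n: int) -> dict[str, ipaddress.IPv4Network]:
    return {
        "reserved": ipaddress.ip_network(f"10.{n}.0.0/16"),
        "dmz":      ipaddress.ip_network(f"10.{10 + n}.0.0/16"),
        "oob-mgmt": ipaddress.ip_network(f"10.{110 + n}.0.0/16"),
        "internal": ipaddress.ip_network(f"10.{210 + n}.0.0/16"),
    }

def classify(addr: str, n: int) -> str | None:
    """Return which of location n's blocks contains addr, if any."""
    ip = ipaddress.ip_address(addr)
    for role, net in location_blocks(n).items():
        if ip in net:
            return role
    return None

print(location_blocks(1))           # the four blocks listed above
print(classify("10.211.42.7", 1))   # -> 'internal'
```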
This was the bare minimum each location received; in some cases another /16 or two might be allocated. So first, let’s cover the reasons why this was a good design for the time. All subnets fell on octet boundaries, which meant there was never any confusion about a subnet mask. The association of the second octet with network role made the subnets easy to remember – it was quick for anyone to say “10.2xx? Oh yeah, that’s an internal segment”. Also, with a minimum of four /16 blocks, we would practically never run out of IP space in each location (>260,000 usable addresses). All that being said, the addressing scheme was perfect for what it was designed for: being easy for humans to read and remember.
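That >260,000 figure is just the four /16s added up. A quick back-of-the-envelope check (ignoring the extra overhead of subnetting each /16 further):

```python
# Each /16 has 2**16 - 2 usable host addresses (minus network and
# broadcast), and every location gets at least four of them.
usable_per_16 = 2**16 - 2           # 65,534
per_location = 4 * usable_per_16    # 262,136
print(per_location)                 # > 260,000 usable addresses per site
```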
While that may have been great for two data center locations, it doesn’t scale well roughly eight years later. So let’s take a look at why this design doesn’t work in the long run. At the number of locations we have today, we are left with only ~40 unused /16 blocks in the 10.0.0.0/8 space. That means we have room for ten or fewer new locations before we completely exhaust that IP space. Next, after some quick research it turned out that even our largest location was only consuming about 4,000 addresses – not even 2% of the total allocated to it. Routing tables in each data center were a nightmare, because each location had to have several discontiguous /16 blocks routed back to it. And to top it all off – it turned out that our site-to-site VPN tunnel architecture between locations was configured to use subnet-pair tunnels. This meant that for each pair of data centers (4 /16s per site), there would be 16 VPN tunnels. While 16 per pair isn’t a lot, the total grows quadratically as we add locations, since every location is configured for full-mesh VPN connectivity.
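To put numbers on that tunnel growth: with subnet-pair tunnels, every pair of sites needs one tunnel per (local subnet, remote subnet) combination, and a full mesh connects every pair of sites. A quick sketch (the site counts are just examples):

```python
from math import comb

# Subnet-pair tunnels: one tunnel for each (local /16, remote /16) pair,
# so 4 * 4 = 16 tunnels per site pair. Full mesh means every pair of
# sites is connected, i.e. C(n, 2) pairs.
SUBNETS_PER_SITE = 4

def full_mesh_tunnels(sites: int) -> int:
    return comb(sites, 2) * SUBNETS_PER_SITE**2

for n in (2, 4, 8, 13):
    print(f"{n:>2} sites -> {full_mesh_tunnels(n):>4} tunnels")
# 2 sites ->   16 tunnels
# 4 sites ->   96 tunnels
# 8 sites ->  448 tunnels
# 13 sites -> 1248 tunnels
```

Quadratic rather than exponential, strictly speaking, but over a thousand tunnels to manage at today’s site count is painful either way.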
I’m trying to keep these posts somewhat manageable – so look for a continuation of this post next week, where I’ll discuss the solution to this problem and how we implemented it.