The AWS console (and Twitter, and LinkedIn) has just lit up: the EC2 Dashboard is showing a new Availability Zone (AZ) in the Sydney Region: ap-southeast-2c.
Before I go any further, I should be clear on my position here – I do not work for AWS. I used to (until around a year and a half ago). The opinions disclosed here are mine alone and not based upon any inside knowledge I may have – that data is kept locked up.
What is an AZ?
AWS Regions (currently 11 in the main public global set, with around 5 publicly disclosed as coming soon) are composed of many data centres. For EC2, and the other services that exist within the Virtual Private Cloud (VPC) world, resources live within customer-defined networks that sit in Availability Zones. You can think of an Availability Zone as a logical collection of data centre facilities (one or more) that appear as one virtual data centre.
Each Region generally has at least two Availability Zones so that customers can split workloads geographically, but that separation is generally within the same city. You can estimate the separation by deploying an instance in each AZ and then doing a ping from one to the other: they should be less than 10 milliseconds apart.
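As a rough sketch of that check (the private IPs below are placeholders for instances you'd launch yourself, one per AZ):

    import subprocess

    # Placeholder private IPs for an instance in each AZ - substitute your own.
    HOSTS = {"ap-southeast-2a": "10.0.0.10", "ap-southeast-2b": "10.0.1.10"}

    for az, ip in HOSTS.items():
        # -c 5: five probes; then pull out the min/avg/max summary line.
        out = subprocess.run(["ping", "-c", "5", ip],
                             capture_output=True, text=True).stdout
        summary = [l for l in out.splitlines() if "min/avg/max" in l]
        print(az, summary[0] if summary else "no reply")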
This separation should be sufficient to mitigate risks such as shared power grids, flood plains and other localised factors, but close enough to make synchronous replication suitable for this environment. Any further apart and synchronous replication becomes a significant performance overhead.
So each AZ is at least one building, and, transparently to the customer, its physical footprint can grow (and shrink) over time.
What’s New?
Until now, customers have had a choice of two Availability Zones in the Sydney AWS Region, and the general advice was to deploy your service spread evenly across both of them in order to get some level of high availability. Indeed, the EC2 SLA talks about using multiple Availability Zones as part of your strategy for obtaining its 99.95% SLA. Should one of those AZs "become unavailable to you", you stand a reasonable chance of remaining operational.
In the event of such unavailability, customers who had designed AutoScale groups around their EC2 compute fleet would find their lost capacity being redeployed automatically (subject to their sizings, and any scale-up/down alarms) in the surviving AZ. The cost implication was running two instances instead of one – though potentially two slightly smaller instances than you may have traditionally chosen – and the benefit of this automatic recovery to service was wonderful. It did mean you ran the risk of losing ~50% of your capacity in one hit (one AZ, evenly split), but that's better than cold standby elsewhere.
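A minimal boto3 sketch of that two-AZ pattern (the AMI and subnet IDs are placeholders for your own):

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="ap-southeast-2")

    # Placeholder launch configuration; substitute your own AMI and size.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="web-lc",
        ImageId="ami-12345678",
        InstanceType="t2.small",
    )

    # Two subnets, one per AZ: lose an AZ and the group refills the survivor.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        LaunchConfigurationName="web-lc",
        MinSize=2,
        MaxSize=4,
        DesiredCapacity=2,
        VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    )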
With three AZs, you now have a chance to rethink this. Should you use a third AZ?
Divide by 3!
If your EC2 fleet is already >= 3 instances, then this is probably a no-brainer. You're already paying for the compute, so why not spread it around to reduce the loss-of-AZ risk exposure. Should an AZ fail, you're only risking 1/3 of your footprint. The inter-AZ data transfer cost (@1c/GB) is, in my experience, negligible – and if you were split across two AZs anyway, you're already paying it.
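Picking up the hypothetical group from earlier, widening it to a third AZ is a one-call change (again, subnet IDs are placeholders):

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="ap-southeast-2")

    # Add the new 2c subnet so the group spreads itself across three AZs.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222,subnet-cccc3333",
    )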
Your ELBs can be expanded to be present in the new AZ as well – at no real increased cost. If ELBs fronting instances is your architecture, then you would not spread compute across 3 AZs without also adjusting the ELBs they sit behind to do likewise.
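For a Classic ELB inside a VPC, that adjustment is a sketch like this (load balancer name and subnet ID are placeholders):

    import boto3

    elb = boto3.client("elb", region_name="ap-southeast-2")

    # Attach the new 2c subnet so the ELB serves traffic in all three AZs.
    elb.attach_load_balancer_to_subnets(
        LoadBalancerName="web-elb",
        Subnets=["subnet-cccc3333"],
    )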
But I don’t need three EC2 instances for my service!
That's fine – if you're running two instances, and you're happy with the risk profile, SLA, and service impact of losing an AZ that you already have in place, then do nothing. Your existing VPCs won't sprout a new Subnet in this new AZ by themselves; that's generally a customer-initiated action.
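If you do decide to expand, that customer-initiated action is little more than a subnet creation and a tag – a minimal sketch, with a placeholder VPC ID and CIDR:

    import boto3

    ec2 = boto3.client("ec2", region_name="ap-southeast-2")

    # Carve a new subnet in the new AZ (substitute your VPC ID and CIDR).
    subnet = ec2.create_subnet(
        VpcId="vpc-11223344",
        CidrBlock="10.0.3.0/24",
        AvailabilityZone="ap-southeast-2c",
    )["Subnet"]

    # Name it consistently with its siblings in the other AZs.
    ec2.create_tags(
        Resources=[subnet["SubnetId"]],
        Tags=[{"Key": "Name", "Value": "App Servers C"}],
    )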
What you may want to do is review any IAM Policies you have in place that are explicit in their naming of AZs and/or subnets. You can’t always assume there will only ever be 2 AZs, and you can’t always assume there will only ever be 3 from now on!
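Here's a hypothetical fragment of the kind of policy worth hunting for – the explicit AZ list silently excludes the new AZ until someone updates it:

    # A hypothetical policy statement, expressed as a Python dict: the
    # hard-coded AZ list means RunInstances quietly fails in ap-southeast-2c.
    statement = {
        "Effect": "Allow",
        "Action": "ec2:RunInstances",
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "ec2:AvailabilityZone": ["ap-southeast-2a", "ap-southeast-2b"]
            }
        },
    }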
Why is there a 3rd AZ in Sydney?
We're unlikely to ever know for sure (or be permitted to discuss). Marketing (hat tip to my friends there) will say "unprecedented customer demand". This may well be true. The existing AZs may be starting to become quite busy. There may be no more additional data centre capacity within a reasonable distance of the existing building(s) of the existing two AZs. And as we know, certain AWS services require a third AZ: for example, RDS SQL Server uses a witness server in a 3rd AZ as part of its multi-AZ solution – perhaps there's been lots of customer demand for these services rather than exhaustion of the existing ones.
But there are other possible reasons. Cost optimisation on the data centre space may mean the time is right to expand in a different geographical area. There's the constant question as to whether AWS services run from AWS-owned buildings or 3rd-party facilities; at certain scales some options become more palatable than others, and some become more possible. Tax implications, staffing implications, economies of scale, etc. Perhaps a new piece of industrial land became available – perhaps at a good price. Perhaps a new operator built a facility and leased it at the right price for a long term.
Perhaps the existing data centre suppliers (and/or land) in the existing areas became outpriced as AWS swallowed up the available capacity. As Mark Twain allegedly said: "buy land, they're not making any more of it". If you owned a data centre and were aware of your competitors nearby being out of spare capacity, surely that supply-and-demand equation would push pricing up.
So what is clear here?
In my humble opinion, this is a signal that the Cloud market in Australia is a strong enough prospect that it warrants the additional overhead of developing this third AZ. That's good news for customers who are required – or desire – to keep their content in this Region (such as the public sector), as a whole lot of the more modern AWS services that depend upon three *customer accessible* AZs being present in a Region now become a possibility. I say possibility, as each of those individual service teams needs to justify its expansion on its own merits – it's not a fait accompli that a 3rd AZ means these services will come. What helps is customers telling AWS what their requirements are – via the support team, via the forums, and via the AWS team in-country. If you don't ask, you don't get.
How do I balance my VPC?
Hm, so you have an addressing scheme you've used to split by two? Something like: even-numbered third octet in an IPv4 address is in AZ A, and odd-numbered is in AZ B?
I'd suggest letting go of those constraints. Give your subnets a Name tag (App Servers A, App Servers B, App Servers C), and balance with whatever address space you have. You're never going to have a long-term perfect allocation in the uncharted future.
If you've exhausted your address space, then you may want to renumber – over time – into smaller, more distributed subnets. If you're architecting a VPC, make it large enough to contain enough residual address space that you can use it in future in ways you haven't even thought of yet. The largest VPC you can define is a /16, but you may feel quite comfortable allocating each subnet within that VPC as a /24. That's 256 subnets of /24 size that you could make; but you don't have to define them all now. Heck, you may (in an enterprise/large corporate) need a /22 network one day for 1000+ big data processing nodes or Workspaces desktops.
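A quick sanity check of that arithmetic with Python's ipaddress module (the 10.0.0.0/16 range here is purely illustrative):

    import ipaddress

    # A /16 VPC holds 256 /24 subnets; carve a few now, keep the rest in reserve.
    vpc = ipaddress.ip_network("10.0.0.0/16")
    subnets = list(vpc.subnets(new_prefix=24))
    print(len(subnets))     # 256
    print(subnets[:3])      # the first three /24s you might allocate today

    # And there's still room for the odd /22 (1,024 addresses) later:
    print(list(vpc.subnets(new_prefix=22))[-1])   # 10.0.252.0/22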