The move to 3 AZs in Sydney

I’ve been working on a significant public-sector AWS cloud deployment (as previously mentioned) and surfed our way confidently through the June 2016 storms with a then 2-Availability-Zone architecture, complete with AWS RDS Multi-AZ, ELB and AutoScale services making the incurred AZ failure almost transparent.

However, not resting on one’s laurels, I’ve been looking at what further innovations are in place, and what impact these would have.

Earlier this year, AWS introduced a third AZ in Sydney. This has meant that a set of higher order AWS services have now appeared that themselves depend upon 3 AZs. This also gives us a chance to — optionally — spread ourselves wider across more AZs. I’ve been evaluating this for some time, and came to the following considerations and conclusions.

RDS Multi-AZ during an AZ outage

Multi-AZ was definitely failing-over and doing its thing in June, but for the following 6-8 hours or so the failed AZ stayed off-line, along with my configured VPC subnets in that AZ that were members of my RDS DB Subnet Group. This meant that during this period I didn’t have multi-AZ synchronous protection.

Sure, this returned automatically when the AZ came back online but I thought about this small window a fair bit.

While AWS deploys its AZs on separate flood plains and power distribution grids, the same storm is passing over many parts of the Region at one time, so there’s a possibility that lightning could strike twice.

In order to alleviate this, I added a third subnet from a third AZ to my DB Subnet Group. Multi-AZ RDS only (at this time) has one replica. During a single AZ failure event, RDS would still have a choice of two operating AZs to provide me with the same level of protection.

The data in my relational database is important, and the configuration to add a third Subnet in AZ C has no real financial overhead (just inter-AZ traffic at around US1c/GB for SQL traffic). Thus the change is mostly configuration.

I thought about not having three AZs for my RDS instance(s), and considered what I would say to my customer if there was a double failure and I didn’t move to this solution. I was picturing the reaction I would get when I tell them it was mostly just a configuration change that could have helped protect against a second failure after an AZ outage.

That was a potential conversation that I didn’t want to have! Time to innovate.

EC2 AutoScale during an AZ outage

I then thought about what happens to my Auto Scaling groups (ASGs) the moment an AZ goes dark. The ASG reacts correctly, and tries to recover capacity into the surviving AZ(s) that it is configured for. In the case where I have two AZs and two instances (normally one per AZ), I have lost 50% of my capacity. To return to my minimum configuration (two instances), I need the ASG to launch from only the one surviving AZ.

Me, and everyone else who is still configured for the original two AZs in the Region. Meanwhile a third AZ is sitting there, possibly idle, just not configured to be in use.

If I had three AZs configured, with each of my ASGs randomly placing its two instances across those three AZs, then 1/3rd of my ASGs would have no instances in the failing AZ. So my immediate demand from this failure has decreased: I would need to replace just 33% of my fleet. Furthermore, I am not constrained to one AZ to satisfy my (reduced) demand.
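
The arithmetic can be checked by brute force. This is only a sketch, and it assumes each ASG holds exactly two instances, placed in two distinct AZs chosen uniformly at random; the function names are mine, not anything from AWS.

```python
from fractions import Fraction
from itertools import combinations


def placements(az_count, instances_per_asg=2):
    """Every way an ASG can put its instances in distinct AZs."""
    return list(combinations(range(az_count), instances_per_asg))


def share_of_fleet_lost(az_count, instances_per_asg=2, failed_az=0):
    """Average fraction of instances lost when one AZ fails."""
    options = placements(az_count, instances_per_asg)
    lost = sum(1 for option in options if failed_az in option)
    return Fraction(lost, len(options) * instances_per_asg)


def share_of_asgs_untouched(az_count, instances_per_asg=2, failed_az=0):
    """Fraction of ASGs with no instance at all in the failed AZ."""
    options = placements(az_count, instances_per_asg)
    untouched = sum(1 for option in options if failed_az not in option)
    return Fraction(untouched, len(options))


for azs in (2, 3):
    print(azs, "AZs:", share_of_fleet_lost(azs), "of the fleet lost,",
          share_of_asgs_untouched(azs), "of ASGs untouched")
```

With two AZs every ASG is hit and half the fleet goes; with three AZs a third of the fleet goes and a third of the ASGs never notice.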

Migrating from 2 AZ to 3 AZ

So I had a VPC, created from a CloudFormation template. It was reasonably simple, as when starting out on deploying a reasonably sized cloud-native workload, we had no idea what it would look like before we started it. Here’s a summary:

Purpose            AZ A           AZ B           Total IPs (approx)
Internal ELBs      10.x.0.0/24    10.x.1.0/24    500 (/23)
App Servers        10.x.2.0/24    10.x.3.0/24    500 (/23)
Backend Services   10.x.4.0/24    10.x.5.0/24    500 (/23)
Databases          10.x.6.0/24    10.x.7.0/24    500 (/23)
Misc               10.x.8.0/24    10.x.9.0/24    500 (/23)

My entire VPC design was a /20 constraint (some 4000 IPs), designated from a corporate topology. Hence using a /24 as a subnet, we would have 16 subnets possible in the VPC.

On-premise firewalls (connected by both Direct Connect and VPN) would permit access from on-premise to specific subnets, and from in-cloud subnets to specific on-premise destinations.

It became clear as our architecture evolved that having some 500 IPs for what ended up being a few Multi-AZ databases (plus a few read replicas) was probably overkill. We also didn’t need 500 addresses for miscellaneous instances and services – again overkill that’s only appreciated with 20/20 hindsight.

Moving Instances

This workload was live, so there was no chance of any extended downtime. So we refer to the rules of the VPC: you can only delete a subnet if there are no interfaces present in it. Thus we have to look at ENIs, and see where they are.

Elastic Network Interfaces are visible in the AWS console and CLI. You’ll note that EC2 instances, ELBs and RDS instances all have ENIs. Anything that is “in” the VPC likely has them. So we need to jostle these around in order to reallocate.
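
Checking whether a subnet has been vacated can be scripted. The sketch below assumes a boto3 EC2 client is passed in (it uses the `describe_network_interfaces` call with a `subnet-id` filter); the helper name and the subnet ID in the usage comment are hypothetical.

```python
def enis_in_subnet(ec2_client, subnet_id):
    """Return (ENI id, description) pairs for every interface in a subnet."""
    found = []
    kwargs = {"Filters": [{"Name": "subnet-id", "Values": [subnet_id]}]}
    while True:
        page = ec2_client.describe_network_interfaces(**kwargs)
        for eni in page.get("NetworkInterfaces", []):
            found.append((eni["NetworkInterfaceId"],
                          eni.get("Description", "")))
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token  # keep going if results were paged


# Usage (needs boto3 and credentials; the subnet ID is made up):
#   import boto3
#   for eni_id, what in enis_in_subnet(boto3.client("ec2"),
#                                      "subnet-0123456789abcdef0"):
#       print(eni_id, what)
```

An empty list means the subnet is ready to be replaced; the description column usually gives away which ELB or RDS instance is still holding on.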

Let’s look at the first pair of subnets for the ELBs. I want to spread this same allocation across three AZs. Three is an unfortunate number, as splitting subnets doesn’t nicely go into threes. However, four is a good number. Taking the existing pair of /24 networks (a contiguous /23), we would re-distribute this as four /25 networks (around 120 usable IP addresses apiece). I only have 40 internal ELBs (TCP pass through), so there’s enough room there. This would leave me with a /25 unused – possibly spare should a 4th AZ ever come along.
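
The split is easy to reproduce with Python’s `ipaddress` module. In this sketch I’ve assumed `10.0.0.0/23` stands in for the real `10.x` range, and that AWS holds back five addresses in every subnet, which is why a /25 yields 123 usable addresses rather than 128.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def split_for_azs(block, new_prefix):
    """Split a CIDR block into equal subnets: (CIDR, usable address count)."""
    network = ipaddress.ip_network(block)
    return [(str(subnet), subnet.num_addresses - AWS_RESERVED_PER_SUBNET)
            for subnet in network.subnets(new_prefix=new_prefix)]


# 10.0.0.0/23 is a placeholder for the ELB pair of /24s.
for cidr, usable in split_for_azs("10.0.0.0/23", 25):
    print(cidr, usable)
```

Three of the four results go to AZs A, B and C; the fourth is the spare.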

And thus it began. ELBs were updated to remove their nodes from AZ A. This meant that instances in AZ A were out of service (an ELB must be present in the same AZ as the instances it serves). So at the same time, ASGs were updated to likewise vacate AZ A. ELB reacted by deploying two ENIs in AZ B. ASGs reacted by satisfying their minimum requirements all from subnets in AZ B.

While EC2 instances were quick to vacate AZ A, ELB took some time to do so. Partly this is because ELB uses DNS (with low TTLs), and needs to wait until a sufficient amount of time has passed that most clients would have refreshed their cached lookups and discovered the node(s) of ELB only in AZ B. In my case (and on more than one occasion) the ELB got stuck shutting down its ENIs in AZ A.

A support call or two later, and the AZs were vacated (but we’re still up!).

At this stage, the template I used to create the VPCs was ready for its first update in 18 months. One of the parameters to my VPC is the CIDR range it holds, so the update was going to be as simple as updating this ONLY for the now-vacant subnets.

However, there’s a catch. For some reason, CloudFormation wants to create NEW subnets before deleting old ones. I was taking my existing 10.x.0.0/24 and going to use 10.x.0.0/25 as the address space. However, since the new subnet was to be created before the old was deleted, this caused an address conflict, and the update safely rolled back (of course, this was learnt in lower environments, not production).

The solution was to stage a two-phase update to the CFN stack. The first update was to set a new temporary range that didn’t conflict – from the spare space in the VPC. Anything would be fine to use so long as (a) it was currently unused, and (b) it didn’t conflict with my final requirements.

So my first update was to set the ELB subnets in AZ A to 10.x.15.0/25, and a follow-up a few seconds later to 10.x.0.0/25. Similarly with the other subnets for App servers and back-end servers.

With these subnets redefined (new subnet IDs), we could reverse the earlier shuffle: define ELBs back into AZ A, then define ASGs to span the two AZs. Next was the move to vacate AZ B. Just as with the ELBs when they left AZ A, there was a few hours’ wait for the ENIs to finally disappear.

However this time, I was moving from 10.x.1.0/24, to 10.x.0.128/25. This didn’t overlap, and wasn’t in use, so was a simple one step CloudFormation parameter update to apply.
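
Both cases come down to one question: does the new CIDR overlap anything that still exists at the moment CloudFormation creates it? Here is a minimal check, again assuming `10.0` in place of `10.x`; the helper name is mine.

```python
import ipaddress


def conflicts(new_cidr, existing_cidrs):
    """Return the existing subnets that a proposed subnet would overlap."""
    new = ipaddress.ip_network(new_cidr)
    return [cidr for cidr in existing_cidrs
            if new.overlaps(ipaddress.ip_network(cidr))]


# The original ten /24 subnets (placeholder addressing).
original = ["10.0.%d.0/24" % octet for octet in range(10)]

print(conflicts("10.0.0.0/25", original))   # direct shrink of AZ A
print(conflicts("10.0.15.0/25", original))  # temporary range, phase one

# After AZ A has become 10.0.0.0/25, moving AZ B is a single step.
after_az_a = ["10.0.0.0/25"] + original[1:]
print(conflicts("10.0.0.128/25", after_az_a))
```

Only the first call reports a conflict (with the old `10.0.0.0/24`), which is the rollback I hit; the temporary range and the AZ B move both come back clean.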

Next was a template update (not just a parameter update) to define the subnets in AZ C, and provide their new CIDR allocations.

The final move here is to update the ELBs and ASGs to now use their third subnets.

Moving RDS

RDS Multi-AZ is a key feature underpinning the databases we use. In this mode, the ENIs for the master and the standby are in place from the moment that Multi-AZ is selected.

My first move was to force a fail-over of any RDS nodes active in AZ A. This is a reboot “with fail-over”, and incurs about a 3 minute outage. My app is durable to this, but it’s still done outside of peak service hours with notification to the client.

After failing over, we then temporarily modify the RDS instance to NOT be multi-AZ. Sure enough, the ENI from AZ A is duly removed, and the subnet when vacant can be replaced with the smaller allocation (in my case, a /26 per AZ suffices). With the replaced subnet created, I can then update my DB Subnet Group to include this new SubnetID, and re-enable Multi-AZ. Another reboot “with fail-over”, and convert again to Single AZ, and I can re-define the second subnet. Once more we update the DB Subnet Group again, and re-enable Multi-AZ.
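
One round of that shuffle, written down as an ordered plan of RDS API calls. This is a sketch rather than what I actually ran: the method and parameter names are the boto3 RDS client’s, the helper names and identifiers are hypothetical, the subnet replacement itself happens elsewhere (CloudFormation in my case), and a real run has to wait for the instance to return to “available” between steps.

```python
def rds_subnet_swap_plan(db_instance_id, subnet_group, new_subnet_ids):
    """Ordered (method, kwargs) steps for vacating one AZ's DB subnet."""
    return [
        # 1. Move the active node out of the AZ being vacated.
        ("reboot_db_instance",
         {"DBInstanceIdentifier": db_instance_id, "ForceFailover": True}),
        # 2. Drop to Single-AZ so the standby's ENI is released.
        ("modify_db_instance",
         {"DBInstanceIdentifier": db_instance_id, "MultiAZ": False,
          "ApplyImmediately": True}),
        # (the vacant subnet is replaced with the smaller one here)
        # 3. Point the DB Subnet Group at the replacement subnet.
        ("modify_db_subnet_group",
         {"DBSubnetGroupName": subnet_group,
          "SubnetIds": list(new_subnet_ids)}),
        # 4. Restore synchronous protection.
        ("modify_db_instance",
         {"DBInstanceIdentifier": db_instance_id, "MultiAZ": True,
          "ApplyImmediately": True}),
    ]


def run_plan(rds_client, plan):
    """Apply each step in order; waiting between steps is omitted here."""
    for method, kwargs in plan:
        getattr(rds_client, method)(**kwargs)
```

Keeping the plan as data means it can be printed and reviewed with the client before anything is touched.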

The final chess move was to define the third subnet in the third AZ, and include that in the DB Subnet Group.

Purpose            AZ A           AZ B             AZ C             ‘Spare’                          Total IPs (approx)
Internal ELBs      10.x.0.0/25    10.x.0.128/25    10.x.1.0/25      10.x.1.128/25                    500 (/23), only 370 available now
App Servers        10.x.2.0/25    10.x.2.128/25    10.x.3.0/25      10.x.3.128/25                    500 (/23), only 370 available now
Backend Services   10.x.4.0/25    10.x.4.128/25    10.x.5.0/25      10.x.5.128/25                    500 (/23), only 370 available now
Databases          10.x.6.0/26    10.x.6.64/26     10.x.6.128/26    10.x.6.192/26 and 10.x.7.0/24    500 (/23), only 190 available now
Misc               10.x.8.0/24    10.x.9.0/24                                                        Same, yet to be re-distributed

Things you can’t easily move

What I found was there are a few resources that once created, actually require deletion. WorkSpaces and Directory services were two that, once present, aren’t currently easy to transfer between subnets. Technically instances aren’t transferable, but since I am in an ASG world (cattle, not pets), I can terminate and instantiate at will.

Closing Thoughts

With spare addressing space available for a fourth subnet, I don’t think I’m going to have to re-organise for a while. My CIDR ranges are still consistent with their original purposes. I have plenty of addressing space to define more subnets in future (perhaps a set of subnets for Lambda-in-VPC).

There are other VPC improvements I’ve added at the same time, but I’ll save those for my next post.

#TemplateAllTheThings

Note: I also run some of the most advanced security and operation training on AWS. See https://nephology.net.au/ for information.

Arlec Wireless LED Sensor Kit review

I was wandering around my esteemed local hardware store (Bunnings) and obtained an Arlec Wireless LED Sensor Kit. I’d been looking for something to give “under cabinet” lighting, particularly at night in my bathroom. Not wanting to get mains power routed to near floor height in a wet room, this looked like a great solution.

The kit consists of three “bars” of LEDs. One of these bars is a control unit, and also has an IR sensor in it. The other two are slave units to the master containing just an array of LEDs.

For the most part, this does as you want, but with a bit of thinking the product could do so much more. So Arlec, here’s some product research that frankly, you could have done in 20 minutes of thinking about your product:

  1. The light stays on for one minute, and then goes off. Regardless of whether the IR has been triggered again during those 60 seconds, it’s on for 60 and then off. This is followed by madly trying to re-trigger it in the darkness. Surely if the IR triggers again you should reset the timer.
  2. Who chose 60 seconds? This should be configurable by the user. Minimum 5 seconds, maximum an hour?
  3. When the lights come on, they come on at 100%. Making an ease-in, ease-out to bring them up to “full brightness” would be much nicer.
  4. Why have the LEDs always go to 100% brightness? Perhaps that should be configurable.
  5. The LEDs are quite a cold white colour. For me, warm white would have been nicer. Others may want a specific colour.
  6. When triggering, the slave units take some time to come on, and they trigger on in a random order. I have mine all in a tight row, and I’d be happy to run a small 2 or 3 wire cable between them and have them trigger simultaneously. Furthermore, with only 3 channels available, I’m limited in deploying this in larger settings. If I wire slaves together, then they should ignore their wireless receivers.
  7. The random order triggering of the slave units should be configurable. I may have a set of 10 of them going up some stairs, and want to put a 50ms delay as the light appears going up the stairs. Coupled with ease-in and brightness control this could look quite good.
  8. The master unit has the IR and a bank of LEDs, but I may not want LEDs where my IR trigger is: separate the IR and control unit into its own module.
  9. Give me the option of having multiple IR sensors (perhaps either end of the array of lights) to trigger the LEDs.
  10. Sell additional slave units individually, and in 5 packs.
  11. 3 channels is not enough if I have multiple sets in close proximity, and subject to interference. So give me the option of disabling the wireless signalling completely.

A smaller form factor would also be neat – perhaps a hard-wired version that could sit flatter under surfaces and be less obtrusive. But those are the first few things that I think a bit of R&D would uncover.

List AWS’ IPs


#!/usr/bin/env python3
"""List the IP prefixes AWS publishes for a given region and service."""
from datetime import datetime
import argparse
import json
import sys

import requests

URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

parser = argparse.ArgumentParser(description="AWS IP Range Display")
# default=0 so the numeric comparisons below work when -v is not given
parser.add_argument('--verbose', '-v', action='count', default=0,
                    help='Verbose')
parser.add_argument('--region', '-r', help="Region to print",
                    default='ap-southeast-2')
parser.add_argument('--service', '-s', help="Service to print", default='ec2')
parser.add_argument('--listservices', default=False, action='store_true')
parser.add_argument('--listregions', default=False, action='store_true')
args = parser.parse_args()
resp = requests.get(URL, timeout=30)
if resp.status_code != 200:
    print("Failed to get JSON from {}: {}".format(URL, resp.status_code))
    sys.exit(1)
d = json.loads(resp.text)
if args.verbose > 1:
    print(json.dumps(d, sort_keys=True, indent=4, separators=(',', ': ')))
created = datetime.strptime(d['createDate'], "%Y-%m-%d-%H-%M-%S")
if args.verbose:
    print("File created %d days ago (%s), sync token %s" %
          ((datetime.now() - created).days, d['createDate'], d['syncToken']))
if args.listregions:
    print(json.dumps(sorted(set([prefix['region'].lower() for
                                 prefix in d['prefixes']]))))
elif args.listservices:
    print(json.dumps(sorted(set([prefix['service'].lower() for
                                 prefix in d['prefixes']]))))
else:
    for prefix in d['prefixes']:
        if prefix['service'].lower() == args.service.lower():
            if ((prefix['region'].lower() == args.region.lower() or
                 prefix['region'].lower() == 'global')):
                print(prefix['ip_prefix'])
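
The same JSON structure can also answer the reverse question: which region and service does a given address belong to? A small offline sketch follows; the `sample` prefixes are made-up documentation ranges, not real AWS data, and only the `prefixes`, `ip_prefix`, `region` and `service` keys are taken from the published file.

```python
import ipaddress


def owners_of(address, ip_ranges):
    """Return sorted (region, service) pairs whose prefix holds the address."""
    ip = ipaddress.ip_address(address)
    return sorted({(p["region"], p["service"])
                   for p in ip_ranges["prefixes"]
                   if ip in ipaddress.ip_network(p["ip_prefix"])})


# Illustrative data only, shaped like ip-ranges.json.
sample = {"prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "ap-southeast-2",
     "service": "AMAZON"},
    {"ip_prefix": "192.0.2.0/25", "region": "ap-southeast-2",
     "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "GLOBAL",
     "service": "CLOUDFRONT"},
]}

print(owners_of("192.0.2.10", sample))
```

An address commonly matches more than one entry, since the file lists overlapping prefixes under different service names.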

 

3rd AWS Availability Zone hits Sydney

AWS Ap-Southeast-2 has a new AZ

The AWS console (and Twitter, and LinkedIn) has just lit up with the EC2 Dashboard Console page showing a service status with a new Availability Zone (AZ): ap-southeast-2c.

Before I go any further, I should be clear on my position here – I do not work for AWS. I used to in the past (~a year and a half ago). These opinions disclosed here are mine and not based upon any inside knowledge I may have – that data is kept locked up.

What is an AZ

AWS Regions (currently 11 in the main public global set, with around 5 publicly disclosed as coming soon) are composed of many data centres. For the EC2 services (and those services that exist within the Virtual Private Cloud or VPC world) these exist within customer defined networks that live in Availability Zones. You can think of an Availability Zone as being a logical collection of data centre facilities (one or more) that appear as one virtual data centre.

Each Region generally has at least two Availability Zones in order for customers to split workloads geographically; but that separation is generally within the same city. You can guess the separation by deploying services in each AZ, and then doing a ping from one to the other. They should be less than 10 milliseconds separated.

This separation should be sufficient to have separate power grids, flood plains and other risk factors mitigated, but close enough to make synchronous replication suitable for this environment. With any further separation, synchronous replication becomes a significant performance overhead.

So each AZ is at least one building, and transparent to the customer, this can grow (and shrink) as a physical footprint over time.

What’s New?

Until now, customers have had a choice of two Availability Zones in the Sydney AWS Region, and the general advice was to deploy your service by spreading across both of them evenly in order to get some level of high availability. Indeed, the EC2 SLA talks about having availability zones as part of your strategy for obtaining their 99.95% SLA. Should one of those AZs “become unavailable to you” then you stand a reasonable chance of remaining operational.

In this event of unavailability, those customers that had designed AutoScale groups around their EC2 compute fleet would then find their lost capacity being deployed automatically (subject to their sizings, and any scale-up/down alarms) in the surviving AZ. The cost implication was running two instances instead of one, though potentially two slightly smaller instances than you might traditionally have chosen, and the benefit of this automatic recovery to service was wonderful. It did mean that you ran a risk of losing ~50% of your capacity in one hit (one AZ, evenly split), but that’s better than cold standby elsewhere.

With three AZs, you now have a chance to rethink this. Should you use a third AZ?

Divide by 3!

If your EC2 fleet is already >= 3 instances, then probably this is a no-brainer. You’re already paying for the compute, so why not spread it around to reduce the loss-of-AZ risk exposure? Should an AZ fail then you’re only risking 1/3 of your footprint. The inter-AZ cost (@1c/GB) is in my experience negligible – and if you were split across two anyway then you’re already paying it.

Your ELBs can be expanded to be present in the new AZ as well – at no real increased cost; if ELBs to instances is your architecture, then you would not spread compute across 3 AZs without also adjusting the ELBs they may sit behind to do likewise.

But I don’t need three EC2 instances for my service!

That’s fine – if you’re running two instances, and you’re happy with the risk profile, SLA, and service impact of losing an AZ that you already have in place, then do nothing. Your existing VPCs that you created won’t sprout a new Subnet in this new AZ by themselves; that’s generally a customer initiated action.

What you may want to do is review any IAM Policies you have in place that are explicit in their naming of AZs and/or subnets. You can’t always assume there will only ever be 2 AZs, and you can’t always assume there will only ever be 3 from now on!
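
That review can be partly automated. The sketch below walks a policy document and reports anything that looks like an AZ name; the regular expression is my own guess at the naming pattern (region name plus a trailing letter), so it can produce false positives, and the sample policy is invented for illustration.

```python
import re

# Assumed shape of an AZ name, e.g. ap-southeast-2a or us-east-1c.
AZ_NAME = re.compile(r"\b[a-z]{2}(?:-[a-z]+)+-\d[a-z]\b")


def hard_coded_azs(policy_document):
    """Return the sorted AZ-like names found anywhere in a policy."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            found.update(AZ_NAME.findall(node))

    walk(policy_document)
    return sorted(found)


# Invented example: launches are pinned to the two original AZs.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "ec2:RunInstances",
        "Resource": "*",
        "Condition": {"StringEquals": {"ec2:AvailabilityZone": [
            "ap-southeast-2a", "ap-southeast-2b"]}},
    }],
}

print(hard_coded_azs(policy))
```

A policy like this one would quietly block launches in the new AZ until someone remembered to add it.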

Why is there a 3rd AZ in Sydney?

We’re unlikely to ever know for sure (or be permitted to discuss). Marketing (hat tip to my friends there) will say “unprecedented customer demand”.  This may well be true. The existing AZs may be starting to become quite busy. There may be no more additional data centre capacity within a reasonable distance of the existing building(s) of the existing two AZs. And as we know, certain AWS services require a third AZ: for example, RDS SQL Server uses a witness server in a 3rd AZ as part of the multi-AZ solution – perhaps there’s been lots of customer demand for these services rather than exhaustion on the existing services.

But there are other reasons for this. Cost optimisation on the data centre space may mean the time is right to expand in a different geographical area. There’s the constant question as to whether the AWS services run from AWS-owned buildings or 3rd parties. At certain scales some options become more palatable than others. Some options become more possible. Tax implications, staffing implications, economies of scale, etc. Perhaps a new piece of industrial land became available – perhaps at a good price. Perhaps a new operator built a facility and leased it at the right price for a long term.

Perhaps the existing data centre suppliers (and/or land) in the existing areas became too expensive as AWS swallowed up the available capacity. As Mark Twain allegedly said: “buy land, they’re not making any more of it”. If you owned a data centre and were aware of your nearby competitors being out of spare capacity, surely that supply-and-demand equation would push pricing up.

So what is clear here?

In my humble opinion, this is a signal that the Cloud market in Australia is a strong enough prospect that it warrants the additional overhead of developing this third AZ. That’s good news for customers who are required – or desire – to keep their content in this Region (such as public sector), as a whole lot of the more modern AWS services that depend upon three *customer accessible* AZs being present in a Region now become a possibility. I say possibility, as each of those individual service teams needs to justify its expansion on its own merits – it’s not a fait accompli that a 3rd AZ means these services will come. What helps is customers telling AWS what their requirements are – via the support team, via the forums, and via the AWS team in-country. If you don’t ask, you don’t get.

How do I balance my VPC?

Hm, so you have an addressing scheme you’ve used to split by two? Something like an even-numbered third octet in an IPv4 address is in AZ A, and odd-numbered is in AZ B?

I’d suggest letting go of those constraints. Give your subnets a Name tag (App Servers A, App Servers B, App Servers C), and balance with whatever address space you have. You’re never going to have a long term perfect allocation in the uncharted future.

If you’ve exhausted your address space, then you may want to renumber – over time – into smaller more distributed subnets. If you’re architecting a VPC, make it large enough to contain enough residual address space that you can use it in future in ways you haven’t even thought of yet. The largest VPC you can define is a /16, but you may feel quite comfortable allocating each subnet within that VPC as a /24. That’s 256 subnets of /24 size that you could make; but you don’t have to define them all now. Heck, you may (in an enterprise/large corporate) need a /22 network one day for 1000+ big data processing nodes or Workspaces desktops.

CloudTrail, now with scalable logging of AWS APIs

AWS CloudTrail had some quiet updates in 2015 to make it a smoother ride when new Regions launch.

When AWS CloudTrail launched in 2013 as a free service (except for the consumed storage of its logs it dumped into S3) it was filling a hole — not advertised as an audit trail, but as close as AWS could get without fear of it becoming a blocking internal service on legitimate API calls. CloudTrail has to work quick enough to keep up with the constant stream of APIs.

Having a log of API calls that a customer makes is a key enabler for compliance reasons. CloudTrail did (at least) one thing that was pretty awesome — cross-account logging. Logs from CloudTrail in Account A could log to Account B, without anyone in Account A having been able to modify the log. For this to work, the recipient account had to configure their S3 bucket with appropriate permissions to receive the logs, with the correct originating identity, and to the specific paths — matching the account numbers of the source account(s) that will be permitted to log to it.

Clearly, one wouldn’t authorise the entire name-space too wide, or you would potentially let any account choose to log to you. They’d have to know the name of your bucket, but once discovered, they could generate enough API activity to start generating logs into your receiving account. Now these logs are quite small (and gzipped), but it’s the principle!

If we think of this ‘receiving’ account as being our security and governance team, then their workload was to:

  1. Add additional paths (account numbers) as the organisation added AWS accounts
  2. White-list user IDs matching the AWS CloudTrail identity in each region as it came on line.

This second item is important. As AWS expands — it’s added a region already this year (2016), with plans for another 5 to come before Christmas — then the Security team in this account would have a race to find the CloudTrail ID for the new region, add it to the S3 Bucket policy for receiving logs, and then contact each of its sending accounts and get them to visit the region purely to turn on CloudTrail. Here’s what that looked like in the S3 bucket policy:

    {
      "Sid": "AWSCloudTrailAclCheck20131101",
      "Effect": "Allow",
      "Principal": {"AWS": [
        "arn:aws:iam::903692715234:root",
        "arn:aws:iam::859597730677:root",
        "arn:aws:iam::814480443879:root",
        "arn:aws:iam::216624486486:root",
        "arn:aws:iam::086441151436:root",
        "arn:aws:iam::388731089494:root",
        "arn:aws:iam::284668455005:root",
        "arn:aws:iam::113285607260:root"
      ]},
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::my-sec-team-logs"
    },

But the CloudTrail and IAM teams didn’t stand still. In mid 2015, the race to find the new region ID was removed with the ability to specify a global Service Principal ID that mapped to CloudTrail in all Regions – with AWS updating this to include new Regions as they come on line:

    {
      "Sid": "AWSCloudTrailAclCheck20150319",
      "Effect": "Allow",
      "Principal": {"Service": "cloudtrail.amazonaws.com"},
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::myBucketName"
    },
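
The companion statement, the one that lets CloudTrail write into per-account paths, is the part that grows as the organisation adds accounts, so it is worth generating. A sketch, assuming the standard `AWSLogs/<account-id>/` delivery prefix and the `bucket-owner-full-control` ACL condition; the function name and the account numbers are placeholders.

```python
import json


def cloudtrail_write_statement(bucket, account_ids):
    """Build the S3 bucket policy statement that accepts CloudTrail logs
    from the listed source accounts only."""
    return {
        "Sid": "AWSCloudTrailWrite",
        "Effect": "Allow",
        "Principal": {"Service": "cloudtrail.amazonaws.com"},
        "Action": "s3:PutObject",
        # One path per permitted sending account, nothing wider.
        "Resource": ["arn:aws:s3:::%s/AWSLogs/%s/*" % (bucket, account)
                     for account in sorted(account_ids)],
        "Condition": {"StringEquals": {
            "s3:x-amz-acl": "bucket-owner-full-control"}},
    }


# Placeholder account numbers, not real ones.
statement = cloudtrail_write_statement(
    "my-sec-team-logs", ["222222222222", "111111111111"])
print(json.dumps(statement, indent=2))
```

Adding an account then becomes a one-line change to the list rather than hand-editing the policy.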

Turning to the ‘sending’ account, it had that same race – to turn on CloudTrail in a new region. Some questioned the need for doing this – if you’re not planning on using AWS in ap-northeast-2, then why turn on logs there? The simple reason is – to catch any activity that may happen, that you’re not aware of or expecting. Again during 2015, CloudTrail updated to change what used to be 1 ‘Trail’ per region, to having a ‘Shadow Trail’ that was actually configured in one Region, but applied to all, with AWS turning on CloudTrail in new regions as they come online.
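
Turning this on is two API calls. A sketch, assuming a boto3 CloudTrail client is passed in; `create_trail` and `start_logging` are the client’s methods, while the helper, trail and bucket names are made up.

```python
def enable_global_trail(cloudtrail_client, name, bucket):
    """Create one trail that applies to all Regions, then start it."""
    cloudtrail_client.create_trail(
        Name=name,
        S3BucketName=bucket,
        IsMultiRegionTrail=True,          # covers new Regions as they launch
        IncludeGlobalServiceEvents=True,  # IAM, STS and friends
    )
    # A new trail does not log until it is explicitly started.
    cloudtrail_client.start_logging(Name=name)


# Usage (needs boto3 and credentials; names are hypothetical):
#   import boto3
#   enable_global_trail(boto3.client("cloudtrail"),
#                       "org-wide-trail", "my-sec-team-logs")
```

The receiving bucket policy has to be in place first, or the create call is rejected.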

This replaced a CloudFormation template I’d developed to uniformly do the old Region-by-Region turn on of CloudTrail — and helps future proof the rapidly expanding service to reduce the ‘fog of war’ — the blind spots where activity may happen, but you don’t have any logging of it.

Lastly, a single trail per region was the default – and if you configured that to be handed immediately and directly to a separate account, then you may miss out on being able to inspect it yourself in the service account that generated the events! CloudTrail team fixed that too – permitting multiple trails per Region. This means I can pass one copy of the API log to the central security team, and then direct a duplicate stream to my own bucket for me to review should I need to.

When configuring the delivery of these logs, it’s also important to think about the long term retention – and automatic deletion of these logs. S3 LifeCycle policies are perfect for this – setting a deletion policy couldn’t be easier – just specify the number of days until deleted.
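
As a sketch of how small that policy is: the function below builds the lifecycle document in the shape boto3’s `put_bucket_lifecycle_configuration` expects. The function name, rule ID, retention period and the `AWSLogs/` prefix are my assumptions.

```python
def expiry_lifecycle(days, prefix="AWSLogs/"):
    """Lifecycle configuration that deletes objects after a number of days."""
    if days < 1:
        raise ValueError("retention must be at least one day")
    return {"Rules": [{
        "ID": "expire-after-%d-days" % days,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }]}


# Usage (needs boto3 and credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-sec-team-logs",
#       LifecycleConfiguration=expiry_lifecycle(400))
```

On a versioned bucket this only expires the current version of each object; older versions need their own noncurrent-version expiry rule.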

Should you be worried about one of your security team deleting the log – turn on Versioning for the receiving S3 bucket, and MFA delete. Whenever you access a log, you can always check to see if there are any “previous revisions” that are a result of an overwrite.

Lastly, it’s important to do something with these logs. CloudTrail Logs, Alerts, or a 3rd party suite like Splunk or managed service like SumoLogic works OK; but the key element is starting to wrap rules around your API calls that map to your activity. If you know you’re only ever going to access the API from a certain range, then set up an alert for when this happens from somewhere else. If you know you’re only going to access during office hours, set up an alert for when this happens outside of these hours. Easy stuff! Here’s a few others I like:

  • If using federated identity (SAML, OAuth), look for the definition of a new Identity Provider. Also look for updates (overwrites) of the MetadataDocument for existing Identity Providers – this will happen periodically (often yearly, as SAML metadata contains X509 certificates that have expiry dates in them)
  • If using local users, check for additional users being created, especially if you have a pattern for usernames or know that you only create additional users from the office IP range
  • Check for IAM policies being modified
  • Check for new VPNs being established, new DirectConnect interfaces being offered (including sub-interfaces from another account), new Peering Requests being offered/accepted
  • Check for routing table changes; this is often stable after initial set-up
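
Rules like these are simple to express against a single CloudTrail record. The sketch below assumes the record has already been parsed from a delivered log file and uses its `eventTime`, `eventName` and `sourceIPAddress` fields; the office network, the office hours (kept in UTC for simplicity) and the watch list are illustrative only.

```python
import ipaddress
from datetime import datetime

# Illustrative watch list; extend it to match your own account.
WATCHED_EVENTS = {
    "CreateSAMLProvider", "UpdateSAMLProvider", "CreateUser",
    "CreatePolicyVersion", "CreateVpnConnection",
    "CreateVpcPeeringConnection", "AcceptVpcPeeringConnection",
    "CreateRoute", "ReplaceRoute", "DeleteRoute",
}


def alerts_for(record, office_networks, office_hours=(8, 18)):
    """Return the reasons, if any, that a CloudTrail record needs a look."""
    reasons = []
    try:
        source = ipaddress.ip_address(record.get("sourceIPAddress", ""))
    except ValueError:
        source = None  # calls made by AWS services carry a hostname here
    if source is not None and not any(
            source in ipaddress.ip_network(net) for net in office_networks):
        reasons.append("outside office network")
    when = datetime.strptime(record["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    if not office_hours[0] <= when.hour < office_hours[1]:
        reasons.append("outside office hours (UTC)")
    if record.get("eventName") in WATCHED_EVENTS:
        reasons.append("watched API call: " + record["eventName"])
    return reasons


record = {"eventTime": "2016-07-01T02:30:00Z", "eventName": "CreateUser",
          "sourceIPAddress": "203.0.113.7"}
print(alerts_for(record, ["198.51.100.0/24"], office_hours=(0, 10)))
```

The example record trips two rules: it comes from outside the office range, and it creates a user.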

There are many more situations to think about, and your profile of what wraps around your use of AWS may vary from account to account (eg, between Development and Production, or a payroll workload versus an account used purely for off-site backup).

If you’re already receiving S3 logs, swap over to the Service Principal; it will stop you from having to react when the AWS notification of a new region happens. If you’re already sending CloudTrail logs to your security team then switch to a global Trail and rest easy as the new Regions come on line.

There’s more that CloudTrail has done – including validation files containing signatures of log files delivered, along with a chain of delivery proof where each validation file also has information about the previous one, so there can be no break in the chain of log files delivered.