Transitioning to IPv6 in AWS

There are a large number of workloads that operate in the AWS Cloud using traditional virtual machines (Instances) on traditional IPv4 networking. And for the last few years, we’ve seen steady growth in IPv6 adoption globally. For those who haven’t started this journey yet, here are some notes on what you may want to look at as you start to embrace the future of the Internet.

It should be noted that this transition is a two-way street:

  1. you need to get ready to offer your digital services to your clients over both IPv4 and IPv6 (Dual Stack)
  2. you need the services you depend on to offer (listen on) an IPv6 address, probably via a gradual transition of offering both IPv4 and IPv6 for a (long) period of time

Within your internal (to your VPC) network architecture you can use either network protocol: the initial focus needs to be on enabling your incoming traffic to use either IPv4 or IPv6.

Your transport layer security (TLS) should be identical on either network protocol. IP is just the network layer that carries it.

Here are the steps:

  1. VPC Changes
  2. Subnet Changes
  3. Routing Changes
  4. Load Balancer Changes
  5. Security Group Changes
  6. DNS Changes

VPC Configuration

Adding an IPv6 address block is reasonably simple in VPC. While you can allocate from your own assigned pool, it’s far easier to use the AWS pool; it’s ready to go and doesn’t need any other preparation.

There are three ways to add an IPv6 address allocation:

  • In the console, via ClickOps
  • Via the API (including the CLI)
  • Via the CloudFormation template that defines your VPC – highly recommended

Assigning the address block to the VPC does not actually use it, and should have zero impact on already running workloads. You should be safe to apply this at any time.

Subnet Configuration

Once the VPC has an allocation, we can then update existing subnets to also include an allocation from within the VPC’s range. The key difference we see here is that in IPv4 we can choose the size of the subnet, in IPv6 you cannot: every IPv6 allocation to a subnet is a /64, which is about 18 billion billion IP addresses.

You can undo an allocation if no Network interfaces (ENIs) are present in the subnet using those addresses.

The configuration is relatively simple: you get to choose which slice of the VPC IPv6 address block will be used for which subnet. I follow a pretty simple rule: I anticipate that my VPCs will perhaps one day spread across 4 Availability Zones, so I allocate subnets sequentially across Availability Zones in order to be able to reference the range via a supernet.

The reason for this is:

  • subnetting is done in powers of two: so for contiguous addressing (supernetting) we’re looking at using two AZs, four AZs, or eight AZs, etc.
  • two Availability Zones is insufficient. If one fails, then you are running on a single Availability Zone during the incident (which may last several hours). This AZ may be constrained in capacity, while other AZs may be underutilised. Hence we want to use three AZs, so that fault tolerance can be restored DURING a single AZ outage

Most Regions have between three and five AZs. Preparing for eight in most Regions would reserve address space we’ll likely never allocate.

Hence, starting with public subnets, we want to sequentially allocate them with space to accommodate four AZs. These allocations are a hexadecimal number between 00 and FF – and hence a 256 limit on the total number of subnets. If we recall the four AZ allocation, then that’s 64 sets of Subnets across all AZs.
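
The allocation scheme above can be sketched with Python’s standard ipaddress module. This is only a minimal illustration, not a definitive layout: the VPC block 2001:db8:1234:1a00::/56 is a made-up documentation prefix, the tier names are hypothetical, and it assumes the VPC holds a /56 (which is what gives the 00 to FF subnet range).

```python
import ipaddress

AZ_SLOTS = 4  # reserve room for four Availability Zones per tier

def plan_subnets(vpc_cidr, tiers):
    """Map each tier name to (supernet, [one /64 per AZ slot])."""
    vpc = ipaddress.IPv6Network(vpc_cidr)
    # Four /64 subnets need 2 extra bits, so each tier gets a /62 supernet.
    supernet_prefix = 64 - (AZ_SLOTS.bit_length() - 1)
    supernets = vpc.subnets(new_prefix=supernet_prefix)
    plan = {}
    for tier, supernet in zip(tiers, supernets):
        plan[tier] = (supernet, list(supernet.subnets(new_prefix=64)))
    return plan

# Hypothetical VPC block and tier names, for illustration only.
plan = plan_subnets("2001:db8:1234:1a00::/56", ["public", "private"])
for tier, (supernet, subnets) in plan.items():
    print(tier, "supernet", supernet)
    for az_slot, subnet in enumerate(subnets):
        print("  AZ slot", az_slot, subnet)
```

In a three-AZ Region the fourth slot of each tier simply stays unallocated, and a single /62 still describes the whole tier.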

Again, you can allocate these by:

  • Click Ops in the console on each existing subnet (or when creating new subnets)
  • API call (including the CLI)
  • CloudFormation template – recommended – in which case, look at the Fn::Cidr to calculate the allocation. Check out my post from March 2018 on this.

If your focus is to start with your services being dual-stack available, then the only subnets you need to allocate initially are the Public Subnets: the subnets where your client facing (internet facing) load balancers are.

Once again, there’s no interruption to existing traffic during this change; indeed you’re less than half way through the required changes.

You may also allocate the rest of your private subnets at this time if you wish.

Routing Changes

For public subnets to function, they need a route for the default IPv6 address via the existing Internet Gateway (IGW). This looks like “::/0”, and when pointing to the IGW, it permits two-way traffic just like IPv4. Your set of public subnets will need this route, and this can be done at any time: permitting IPv6 routing won’t start clients using it.

If you have private subnets with IPv6 allocations, and you want them to be able to make outbound requests on IPv6 to the Internet, then you may want to consider an Egress Only IGW as the destination for “::/0” for private subnets. Note your public subnets still will use the standard IGW.

The Egress only IGW resource does what it says, and supplants the need for NAT Gateway as used in IPv4 (more on NAT GW later).

Again, you can add the Egress Only IGW and the Routing changes in several ways:

  • Click Ops on the console
  • Via the API (including the CLI)
  • In your CloudFormation template for your VPC – recommended

Load Balancer Changes

Now you have public load balancers in public subnets that have IPv6 available, you can modify your load balancer to have it get an IPv6 address. This is yet another action that will have no impact on current traffic.

You can modify the existing load balancers by:

  • Click ops on the console
  • An API call (including the CLI)
  • In your CloudFormation template for your Workload – recommended

Security Group Changes

Now we’re down to the last two items. By default, your security group is closed unless you have made changes. Your typical load balancer will be listening on TCP 80 and/or 443 for web traffic, and be open to the entire [IPv4] Internet with a source of 0.0.0.0/0.

To enable this security group for IPv6, we add a set of rules for source of ::/0 for the same ports you have for IPv4 (typically 80 and 443 for web traffic, different for other protocols).
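
As a rough sketch of that change, the snippet below takes a list of existing ingress rules and builds the matching IPv6 rules. The dictionary shape follows the IpPermissions structure used by the EC2 security group ingress API, but the function name and the sample rules are illustrative assumptions, and nothing here actually calls AWS.

```python
import copy

def ipv6_rules_for(ipv4_permissions):
    """Return new rules opening ::/0 for every rule that is open to 0.0.0.0/0."""
    ipv6_permissions = []
    for rule in ipv4_permissions:
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
        )
        if not open_to_world:
            continue  # restricted rules need a deliberately chosen IPv6 source
        # Copy protocol and ports, but swap the IPv4 ranges for the IPv6 default.
        new_rule = {k: copy.deepcopy(v) for k, v in rule.items() if k != "IpRanges"}
        new_rule["Ipv6Ranges"] = [{"CidrIpv6": "::/0"}]
        ipv6_permissions.append(new_rule)
    return ipv6_permissions

# Sample rules (assumed): web ports open to the world, SSH restricted.
existing = [
    {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
]
for rule in ipv6_rules_for(existing):
    print(rule)
```

Only the rules that were already open to the whole IPv4 Internet are mirrored; the restricted SSH rule is skipped on purpose.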

It’s at this time you can test connectivity to your load balancer using IPv6 end-to-end – assuming you have another end on the IPv6 Internet somewhere.

If your workstation/cellphone is using IPv6, then you could browse to the IPv6 address – but you’ll probably get a certificate warning as the name in the certificate doesn’t match the raw IP address.

As with the earlier steps, this should ideally also be a CloudFormation template update.

DNS Changes

This is when we announce to the world that your service can be accessed with IPv6. You want to make sure you have done the above test to ensure you can connect, as this is the final piece in the puzzle.

Typically a custom DNS name for a load balancer is a Route53 ALIAS record of type A (Address). The custom DNS name is what also appears in any TLS Certificates.

To finally flick the switch on IPv6, you add an additional Route53 ALIAS record of type AAAA (four As), with the destination being the same as you have used for the existing Alias A record (one A).

You should now be able to check that you can resolve your service using the nslookup utility. From a command prompt or PowerShell, type:

  • nslookup -type=AAAA my.custom.load.balancer.name
  • nslookup -type=A my.custom.load.balancer.name
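
The same check can be scripted. The sketch below uses Python’s standard socket.getaddrinfo to list the addresses of one family for a name. The helper name addresses_for and its injectable resolver parameter are hypothetical, added only so the function can be exercised without a network. Note that getaddrinfo goes through the operating system’s resolver, so local hosts-file entries can show up too.

```python
import socket

def addresses_for(hostname, family, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses of one family for hostname."""
    try:
        answers = resolver(hostname, 443, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # no records of this family, or the name does not resolve
    # Each answer is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({answer[4][0] for answer in answers})

# Example calls (substitute your own record for the placeholder name):
#   addresses_for("my.custom.load.balancer.name", socket.AF_INET)   # A records
#   addresses_for("my.custom.load.balancer.name", socket.AF_INET6)  # AAAA records
```

An empty list for AF_INET6 with a populated list for AF_INET suggests the AAAA alias record is missing or has not propagated yet.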

Your Dependencies

Now you’re up and running, you need to think about the services you depend upon. Services within your VPC, such as RDS, require AWS to enable these to be dual stack. Some services already are, such as the Link-Local MetaData service, Time Sync Service and VPC DNS resolver (note: always use the DNS resolver).

Some services will be outside of your VPC but still AWS-run, like SQS, and S3: in which case, look to use VPC Endpoints to communicate with them.

But other third party resources across the Internet may be stuck back on IPv4. If you have an EC2 Linux Instance then it’s sometimes worth running tcpdump to inspect the traffic you see using IPv4. A command like tcpdump ip and port not 22 may be useful. You can extend that to also exclude HTTP/HTTPS traffic with tcpdump ip and port not 22 and port not 80 and port not 443. Remember, your service port on your instance may be a different number on the inside of your network.

You’ll need to ask your dependencies to include dual-stack support on their services. In the mean time, you’ll be having to fall back to using IPv4 from your network to communicate with these dependencies. There’s two ways this can happen:

  1. If the subnet with your EC2 instance in it is dual-stack, then the host can use an IPv4 connection itself, possibly via a NAT Gateway, to communicate with the external IPv4 dependency
  2. If the subnet with your EC2 instance is IPv6 only (which is rather new), then the subnet can be configured to use DNS64 addressing (a subnet level configuration), and can route its traffic via the NAT GW, which will translate from IPv6 on the VPC-internal network, to IPv4 across the Internet (and back). See this.
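
To make the DNS64 option a little more concrete: when a name has only an A record, the resolver synthesises an AAAA answer by embedding the IPv4 address inside a NAT64 prefix. The sketch below assumes the well-known prefix 64:ff9b::/96 from RFC 6052 and uses a documentation address (192.0.2.10) as a stand-in for the IPv4-only dependency; the function names are hypothetical.

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix (assumed here for illustration).
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_aaaa(ipv4_text):
    """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
    ipv4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(ipv4))

def original_ipv4(ipv6_text):
    """Recover the IPv4 address a NAT64 gateway would connect to."""
    ipv6 = ipaddress.IPv6Address(ipv6_text)
    if ipv6 not in NAT64_PREFIX:
        raise ValueError(f"{ipv6} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(ipv6) & 0xFFFFFFFF)

mapped = synthesize_aaaa("192.0.2.10")
print(mapped)                      # 64:ff9b::c000:20a
print(original_ipv4(str(mapped)))  # 192.0.2.10
```

The IPv6-only host just sees an ordinary IPv6 destination; traffic to that prefix is what gets routed to the NAT Gateway for translation.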

Moving to IPv6 only internal networks is a long term goal, probably in the order of half a decade or so. A number of additional AWS updates will be needed before this becomes a default.

Additional IPv6 Notes in AWS

In this transition period (which has been going for nearly 25 years), you’re going to find stuff that silently falls back to IPv4. With hosts able to simultaneously have two addresses (IPv4 A, and IPv6 AAAA), things that look them up have a choice. For most things this is the newer AAAA, with a fall-back to A if needed (see the Happy Eyeballs RFC).
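
As a simplified illustration of that preference, the sketch below orders a mixed set of candidate addresses so that IPv6 is tried first and the two families then alternate, in the spirit of Happy Eyeballs (RFC 8305). It covers only the ordering step, not the staggered connection attempts, and the function name and sample addresses are illustrative assumptions.

```python
import ipaddress
from itertools import zip_longest

def order_candidates(addresses):
    """Interleave addresses by family, starting with IPv6."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    v6 = [a for a in parsed if a.version == 6]
    v4 = [a for a in parsed if a.version == 4]
    ordered = []
    # Pair up one IPv6 and one IPv4 candidate at a time; the longer list
    # simply runs on once the shorter one is exhausted.
    for pair in zip_longest(v6, v4):
        ordered.extend(a for a in pair if a is not None)
    return [str(a) for a in ordered]

print(order_candidates(
    ["192.0.2.1", "2001:db8::1", "192.0.2.2", "2001:db8::2", "192.0.2.3"]
))
```

A client that instead sorts IPv4 first, or never looks up AAAA at all, is the kind of silent fall-back described above.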

However, at this time (Mar 2022), CloudFront still preferences IPv4 origins when the origin is dual-stack. CloudFront also still uses TLS 1.2 instead of the newer and faster TLS 1.3, and HTTP/1.1 instead of the slightly more efficient HTTP/2 request protocol.

AWS IoT Core exposes IPv4 endpoints, which is unusual as a key element of IoT is having millions of devices connected, a situation best served by IPv6.

Similar considerations exist for Route53 Health Checks, and others.

Summary

If you’re thinking this is all very new in cloud, you’d be mistaken. I was transitioning customer environments (including production) in AWS to dual stack in 2018 – four years ago. I’ve been dual-stack for my home Internet connection since I swapped to Aussie Broadband (I churned away from iiNet, who once had an IPv6 blog and strong implementation plans).

For several years, Australia’s dominant telco, Telstra, has had IPv6 dual stack for its consumer mobile broadband, something that the other players like Optus are yet to enable.

But these changes are inevitable.

The future is here, it’s just not evenly distributed.

AWS Local Zones expansion 2022

AWS recently made a bold announcement: at re:Invent it specified a few countries it planned to open Local Zones in, but last week it revealed some 32 locations, including Perth, Brisbane, and Auckland.

Perth is isolated by the vast distance between the east and west coasts of Australia – 2044 miles, a similar distance to that across the continental United States between DC and LA (2200 miles), or London to Moscow (2500 miles). The Round Trip Time (RTT) of packets online is around 50ms, which for many applications is not immediately noticeable.

But for some time-critical workloads, it’s a deal breaker.

Local Zones offer a very cut down version of an AWS Region, targeting compute workloads that use a virtual machine Instance. First available in Japan, there are currently 16 in service; this recent announcement of 32 more will make 48 Local Zones.

While many have become familiar with AWS, the minimum viable product of a Local Zone may leave some confused: the options at your disposal are listed here.

Local Zone attachments

Local Zones are attached to a host Region. In the case of the announced Perth Local Zone, the API designation for this indicates this will be linked to the yet-to-launch Melbourne Region.

When it comes to load balancing within the Local Zone, typically only Application Load Balancing (ALB) is available. That’s perfect for HTTP based workloads with multiple local application servers, but if you’re looking to then add a managed RDS database behind that, you’ll be reaching back to the host Region. Same for SQS, SNS, and most everything else.

Instance types will also be limited, typically focusing on a subset of the latest general purpose families; this is likely to be true of the Elastic Block Store (EBS) volumes, where until now, GP2 (General Purpose SSD) has been the primary option.

When it comes to networking, it appears that Local Zones do not yet support IPv6 dual-stack addressing, as shown in the Console option for defining a subnet with the current Oregon/Los Angeles Local Zone:

IPv4 only subnet creation in Oregon/LA

So, what would benefit from Local Zones? Well, architectures with local access direct to instances, that perhaps transform and validate requests on the edge, or perhaps cache responses at the edge before forwarding more efficient queries across the “VPC-internal” connectivity to the host Region. Another use case may be local EC2 Windows Instances, where the reduced latency may make RDP access a seamless desktop experience.

Perhaps some Local Zones will supplant the need for on-premises Outposts deployments.

Perhaps over time more architectural patterns will come about, and more services will start to make their way into the common Local Zone implementation. Some Local Zones may grow to become full Regions, as happened with the original Osaka (Japan) Local Zone.

Regardless of the way it ends up being used, the expansion is a massive step up in the globally deployed infrastructure.

Stronger SSH Keys for EC2

For those not familiar, SSH is the Secure Shell, an encrypted login system that has been in use for over 25 years. It replaced unencrypted Telnet for remote (text) terminal connections used to access (and administer) systems over remote networks.

Authentication for SSH can be done in multiple ways: simple passwords (not recommended), SSH Keys, and even MFA.

Using SSH keys is perhaps one of the most common ways; it’s simple, free, and relatively easy to understand. It uses asymmetric key pairs, consisting of a Private key, and a Public Key.

Understandably, the Private key is kept private, perhaps only on your local system, and the Public key is openly distributed to any system that wishes to give you access.

For a long time, the Key algorithm used here was the RSA algorithm, and keys had a particular size (length) measured in bits. In the 1990s, 128 bits was considered enough, but more recently, 2048 bits and beyond has been used. The length of the key was one factor in the complexity of guessing the correct combination: fewer bits means smaller numbers. However, the RSA algorithm becomes quite slow when key sizes start to get quite large, and people (and systems) start to notice a few seconds of very busy CPU when trying to connect across the network.

Luckily, a replacement key algorithm has been around for some time, leveraging Elliptic Curves. This article gives some overview of the Edwards-curve form of Elliptic Curve used for creating the public and private key.

What we see is keys that are smaller compared to RSA keys of similar cryptographic strength, but more importantly, the CPU load is not as high.

OpenSSH and PuTTY have supported Edwards curves for some time (as at 2022), and several years ago, I requested support from AWS for the EC2 environment. Today, that suggestion/wish-list item has come to fruition with this:

Amazon EC2 customers can now use ED25519 keys for authentication with EC2 Instance Connect

AWS has been one of the last places I was still using RSA based keys, so now I can start planning their total removal.

  • Clearly generating a new ED25519 key is the first step. PuTTYgen can do this, as can ssh-keygen. Save the key, and make sure you grab a copy of the OpenSSH format of the key (a single line that starts with ssh-ed25519 and is followed by a string representing the key, and optionally a space and comment at the end). I would recommend having the Comment include the person’s name, year and possibly even the key type, so that you can identify which key belongs to which individual.
  • You can publish the Public Key to systems that will accept this key – and this can be done in parallel to the existing key still being in place. The public key has no problem with being shared and advertised publicly – it’s in the name. The worst thing that someone can do with your public key is give you access to their system. In Linux systems, this is typically by adding a line to the ~/.ssh/authorized_keys file (note: US spelling); just add a new line starting with “ssh-ed25519”. From this point, these systems will trust the key.
  • Next you can test access using this key for the people (or systems) that will need access. Ensure you only give the key to those systems or people that should use it. Eg, yourself. When you sign in, look for evidence that shows the new key was used. For example, the Comment on the key (see point 1 above) may be displayed, such as:
  • Lastly you can remove the older key being trusted for remote access from those systems. For your first system, you may want to leave one SSH session connected, remove the older SSH key from the Authorized Keys file, and then initiate a second new connection to ensure you still have access.
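
When auditing authorized_keys files during a rollover like this, it helps to see at a glance which entries are which key type. The sketch below parses one OpenSSH public key line into its type, key blob and comment, and checks that the type named on the line matches the type encoded inside the blob. The function names are hypothetical, it assumes a plain line with no leading options field, and the sample key is an all-zero placeholder built in code, not a real key.

```python
import base64
import struct

def parse_public_key(line):
    """Split an OpenSSH public key line into (key_type, blob_bytes, comment)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, encoded = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    blob = base64.b64decode(encoded, validate=True)
    # The blob starts with a 4-byte big-endian length, then the key type name.
    (name_length,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_length].decode("ascii")
    if embedded_type != key_type:
        raise ValueError(f"type mismatch: {key_type} vs {embedded_type}")
    return key_type, blob, comment

def placeholder_ed25519_line(comment):
    """Build a well-formed line around 32 zero bytes (NOT a usable key)."""
    name = b"ssh-ed25519"
    blob = struct.pack(">I", len(name)) + name + struct.pack(">I", 32) + bytes(32)
    return f"ssh-ed25519 {base64.b64encode(blob).decode('ascii')} {comment}"

key_type, blob, comment = parse_public_key(
    placeholder_ed25519_line("jane.doe 2022 ed25519")
)
print(key_type, len(blob), comment)
```

Running something like this over each line of an authorized_keys file would show which entries still start with ssh-rsa, and the comments would say whose they are.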

Now that we have familiarity with this, we need to look at places where the older key may be used.

In the AWS environment, SSH Public Keys are stored in the Amazon EC2 environment for provision to new EC2 instances (hosts). This may be referenced and deployed during instance start time; but it can also be referenced as part of a Launch Configuration (LC) or Launch Template (LT). These LCs and LTs will need to be updated, so that any subsequent EC2 launches are provisioned with the new key. Ideally you have these defined in a CloudFormation Template; hence adjusting this template and updating the stack is necessary; this will likely trigger a replacement of the current instances, so schedule this operation accordingly (and test in lower environments first).

There’s no sudden emergency for this switch; it is part of the continual sunrise and sunset of technologies, and addresses technical debt in a systematic and continual way, just as you would migrate in AWS from GP2 to GP3 SSD EBS volumes, from one EC2 instance family to the next, from the Instance MetaData v1 to v2, or from IPv4 to dual-stack IPv6.

Gartner Magic Quadrant for Cloud Database: 2021 v 2020

In December, Gartner produced another one of their Magic Quadrants comparing the offerings from various Cloud service providers, focusing on their database offerings. While it’s like reading tea leaves, it’s interesting to see the jostling of the players, the new entrants who are excited (funded) enough to run an analyst relations team, and those who are dropping out.

You can get a copy of the current report from Gartner, AWS, or the 2020 version from Google.

Here’s a mash up comparing the two years; the darker navy blue is 2021, and the lighter blue dots are 2020.

New to this in 2021 are:

  • Intersystems
  • MariaDB
  • Single Store
  • Exasol
  • Cockroach Labs

Leaving the magic quadrant in 2021 is Tencent.

Much improved are AWS and Microsoft who continue to lead – these two are now ranked neck and neck, with Oracle sitting behind them (but also improved). IMHO, those increasing in position are SAP, Teradata, Snowflake, Databricks and Cloudera, and even Huawei.

At the same time, two are dropping relative to the others in this list: Redis and MarkLogic – but only slightly.

Review your Tech for the New Year

As an arbitrary point in time, the start of a new year is as good as any to do those activities which ideally are done regularly, but often only happen annually. Like checking the batteries in your smoke detectors, there’s a set of really trivial steps that anyone with any online technology interface can, and should do.

  1. Check back with online services you use and see if they now support Multi-Factor Authentication (MFA); enable it if they do. MFA comes in multiple types, from an SMS to your mobile, an automated telephone call, an app installed on your phone that generates a unique code every 30 seconds, and offline hardware devices such as a Yubikey (FIDO2).
  2. Uninstall programs you no longer use. On Windows, go to Add or Remove Programs and review the entries. On Debian and similar, run “dpkg --get-selections | grep -w install” to list what is installed.
  3. Update those programs you do use, from verified authoritative sources. In particular if you have a Password Manager (take a backup first!), web browser, email client.
  4. Ensure all OS patches are installed. Your OS should have support for this, but some patches may be held back.
  5. Update the firmware on your home WiFi router. If your ISP provided you with the router, then ask them for the updates and how to apply them. If there have been no updates for the last 4 years (eg: since the WPA2 KRACK attacks), then go buy a new one that will come with firmware updates from the manufacturer.
  6. Update the firmware on your printers, network security cameras, desk phones and any other devices you have.