The Move to VPC NAT Gateway

Once more on the move to… this time, the move to RE-move Public and Elastic IP addresses from EC2 Instances…

As previously stated, one of my architectural decisions for an AWS VPC that relies on S3 and other critical operational services is to avoid Single Points of Failure, or anything overly complicated, between my service Instances and the remote services they depend upon. Having addressed private access to S3 via VPC Endpoints for S3, the volume of traffic that must traverse out of my VPC has reduced. The AWS team has indicated that additional Endpoints are coming, and I know this is going to further reduce my EC2 Instances' need for outbound Internet access in future.

But for now, we still need to get reliable outbound traffic, with minimal SPOFs.

Until recently, the only options for outbound traffic from the VPC were:

  1. Randomly assigned Public IPs & a route to the Internet
  2. Persistently allocated Elastic IPs & a route to the Internet
  3. A route to on-premise (or other – but outside of the VPC), with NAT to the Internet performed there
  4. A SOCKS Proxy that itself had Public or Elastic IP & a route to the Internet
  5. An HTTP Proxy, perhaps behind an internal ELB, that itself had a Public IP or Elastic IP & a route to the Internet

All five of these options would leave some part of my architecture with an interface directly exposed externally. While Security Groups give very good protection, VPC Flow Logs would continue to remind us that there are persistent “knocks on the door” from the Internet as ‘bots and scripts test every port combination they can. Of course, these attempts bounce off the SGs and/or NACLs.

We can engineer solutions within the instance (host OS) with host-based firewalls and port knocking, but we can also engineer more gracefully outside in the VPC as well.

Until this year, the way to do this was with VPC routing table entries that use another Instance (in a separate subnet) as a gateway. Using iptables or its nftables replacement, you could control this traffic quite well; however, that’s an additional instance to pay for and, far worse, to maintain. If this NAT Instance was terminated, then it would have to be replaced: something that AutoScale could handle for us. However, a new NAT Instance would have to adjust the routing table(s) of its dependent subnets so that its Elastic Network Interface (ENI) is inserted as the gateway. Sure, we can script that, but it’s more moving parts. Lastly, the network throughput was constrained to that of the NAT Instance.
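The "script that" step can be sketched roughly as below. This is a minimal sketch, not the script I actually ran: the function name and arguments are hypothetical, and `ec2` is assumed to be a boto3 EC2 client (or anything exposing the same `modify_instance_attribute` and `replace_route` calls). It assumes each routing table already has a default route to replace.

```python
def take_over_default_route(ec2, instance_id, eni_id, route_table_ids,
                            destination="0.0.0.0/0"):
    """Point each routing table's default route at this NAT Instance's ENI.

    `ec2` is assumed to be a boto3 EC2 client; all IDs are supplied by the
    caller (for example from instance metadata and tags).
    """
    # A NAT Instance forwards packets that are not addressed to itself,
    # so the source/destination check has to be switched off first.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        SourceDestCheck={"Value": False},
    )

    updated = []
    for rtb_id in route_table_ids:
        # replace_route only succeeds if a route for `destination`
        # already exists in the table (assumption stated above).
        ec2.replace_route(
            RouteTableId=rtb_id,
            DestinationCidrBlock=destination,
            NetworkInterfaceId=eni_id,
        )
        updated.append(rtb_id)
    return updated
```

A replacement instance launched by AutoScale would call this at boot with something like `take_over_default_route(boto3.client("ec2"), my_instance_id, my_eni_id, dependent_route_tables)`, which is exactly the kind of moving part the managed service removes.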


Then came NAT Gateway as a managed service. Managed NAT, no bandwidth limit, nothing to manage or maintain. The only downside is that managed NAT is not multi-AZ: a NAT Gateway exists in only a single Availability Zone (AZ).

Similar to NAT Instances, a NAT Gateway should be defined within a Public subnet of our VPC (i.e., with a direct route to the Internet via IGW). The NAT Gateway gets assigned an Elastic IP, and is still used as a target for a routing rule.

To get around the single-AZ nature of the current NAT Gateway implementation, we define a new routing table per AZ. Each AZ gets its own NAT Gateway, with its own EIP. Other subnets in the same AZ then use a routing table rule to route outbound via the NAT Gateway.
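To make the per-AZ layout concrete, here is a small planning sketch. It makes no AWS calls; it only turns a list of subnets into the "one NAT Gateway, one EIP, one routing table per AZ" shape described above. The function name, the input format and the output keys are all my own illustration, not anything AWS defines.

```python
from collections import defaultdict


def plan_nat_per_az(subnets):
    """Build a per-AZ NAT plan.

    `subnets` is a list of dicts with hypothetical keys:
    'id' (str), 'az' (str) and 'public' (bool, True if routed via the IGW).
    """
    by_az = defaultdict(lambda: {"public": [], "private": []})
    for subnet in subnets:
        kind = "public" if subnet["public"] else "private"
        by_az[subnet["az"]][kind].append(subnet["id"])

    plan = {}
    for az in sorted(by_az):
        groups = by_az[az]
        if not groups["public"]:
            # The NAT Gateway itself must live in a public subnet.
            raise ValueError(f"{az} has no public subnet to host a NAT Gateway")
        plan[az] = {
            # One NAT Gateway per AZ, placed in the first public subnet,
            # with its own Elastic IP.
            "nat_gateway_subnet": groups["public"][0],
            "needs_eip": True,
            # One routing table per AZ: the default route targets the
            # NAT Gateway in the same AZ.
            "route_table": {
                "destination": "0.0.0.0/0",
                "target": f"nat-gateway-in-{az}",
                "associated_subnets": groups["private"],
            },
        }
    return plan
```

For a three-AZ VPC this yields three NAT Gateways, three EIPs and three routing tables, so the loss of one AZ only affects egress for the subnets in that AZ.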

The biggest downside of Managed NAT is that you can’t do interesting hacks on the traffic as it traverses the NAT Gateway at this time. I’ve previously used Instance-NAT to transparently redirect outbound HTTP (TCP 80) traffic via a Squid Proxy, which would then do URL inspection and white/black listing to permit or block the content. In the NAT Gateway world, you’d have to do that in another layer: but then perhaps that proxy server sits in a NAT-routed subnet.

Having said that, the ability to intercept plain HTTP traffic will soon go away, replaced by an all-SSL enabled HTTP/2 world (well, over the next 5 years perhaps). Filtering would then require SSL interception: using Server Name Indication (SNI) “sneak & peek” to determine the desired target hostname, then generating on the fly a matching SSL certificate issued by our own private CA. That CA would have to be already trusted by the client devices going through the network, as otherwise this would be a clear violation of the chain of trust.

What else is on my wishlist?

  • All the VPC Endpoints I can dream of: SQS, AutoScaling (for signaling ASG events SUCCESS/FAILURE), CloudWatch (for submitting metrics), DynamoDB, EC2 and CloudFormation API
  • Packet mangling/redirection on Managed NAT on a port-by-port basis.
  • Not having to create one NAT Gateway per AZ, but one NAT Gateway spanning multiple subnets, such that it selects the egress subnet/EIP in the same AZ as the instances behind it if it is healthy, so I don’t need to make one Routing table per NAT Gateway, with automatic fail-over of the gateway to other AZs should there be an issue in any AZ
  • VPC peering between Regions (Encrypted, no SPOFs)
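The multi-AZ NAT Gateway wish above boils down to a simple selection rule, sketched here. To be clear, this is purely hypothetical: it is not an AWS feature or API, and the function name and the shape of `gateways` are invented for illustration.

```python
def pick_egress(instance_az, gateways):
    """Choose the egress point for an instance in `instance_az`.

    `gateways` maps AZ name -> {'id': str, 'healthy': bool} (hypothetical).
    Prefer the egress point in the same AZ; fail over to any other
    healthy AZ; return None if nothing is healthy.
    """
    local = gateways.get(instance_az)
    if local and local["healthy"]:
        return local["id"]

    # Fail over: iterate other AZs in sorted order so the choice is stable.
    for az in sorted(gateways):
        gateway = gateways[az]
        if az != instance_az and gateway["healthy"]:
            return gateway["id"]
    return None
```

With that behaviour built into the service, a single routing table entry could serve every private subnet, and the per-AZ routing tables would no longer be needed.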


So where is this architecture headed?

I suspect this will end up with a subnet (per AZ) of instances that do require NAT egress outbound, and a subnet of instances that only require access to the array of VPC Endpoints that are yet to come. Compliance may mean passing that outbound traffic through explicit proxies for filtering and scanning. Those proxies themselves would be in the “subnet that requires NAT egress”; however, their usage would be greatly reduced by the availability of the VPC Endpoints.

The good news is that, based upon the diligent work of the VPC team at AWS, we’re sure to get some great capabilities and controls. VPC has now started a next wave of evolution: after its launch (back in 2009) it had kind of stagnated for a while, but in the last 2 years it’s back in gear (peering, NAT Gateway, S3 Endpoints).

Looking back over my last three posts (The Move to Three AZs in Sydney, The Move to S3 Endpoints, and now this Move to NAT Gateway), you can see there are still significant improvements to a VPC architecture that the administrator needs to undertake to continue to improve security and operational resilience in-cloud. Introducing these changes incrementally over time while your workload is live is possible, but takes planning.

More importantly, it sets a direction, a pattern: this is a journey, not just a destination. Additional improvements and approaches will become recommendations in future, and we need to be ready to evaluate and implement them.