Migrating & Operating Public DNS with AWS Route53

DNS is the fundamental directory service of the Internet, and these days, of all corporate systems performing discovery of their various components in digital deployments. It is used at least hundreds of times per day per device, and no one really notices until it breaks.

For the most part, its what changes the hostname you have used to navigate to this article – blog.james.rcpt.to – into an IP address of some sort, the numbered address that is supposed to uniquely identify the server or load balancer that is connected to the public Internet to which your browser will then make a network connection.

If your DNS breaks, then your service, or perhaps your entire company, is “off the air”, no longer discoverable by clients wishing to use human-readable labels and names to find your address.

I recently gave a 45 minute presentation at the AWS User Group in Perth, Western Australia, highlighting some of the advantages of using Route53 for DNS, an some of the more modern security protections therein; slides can be found here.

I have helped various organisations transition their authoritative DNS from their existing services to AWS Route53. This migration, in isolation, requires coordination, preparedness, and awareness of the impact of poor planning, and opportunity for service improvement.

DNS Migration Planning

DNS manages to serve the scale of the planet by effective use of caching. This is done at multiple layers in the service.

When a DNS service does a looking for a query it has not already cached (or has expired from cache), it must start a recursive resolve. For this to happen, it needs to find the Authoritative Name Server for a Zone. This Authoritative Name Server will be delegated down from a parent domain, and so on, until we reach the top of the DNS tree.

A common example in this scenario is the address “www.example.com”. Before answering the address for this exact name, a client must determine the address of the Authoritative Name Server(s) for “example.com”. If this is unknown, then the client must ask the address of the Authoritative Name Server(s) for “.com”.  Again, if “.com” Name Servers are not already known (cached), then the level above that must be queried, commonly called the Public DNS Roots, or “.”.

The Global Root DNS Servers

These DNS roots are globally agreed, and every DNS server is given a (relatively) static file of the addresses for these. They are given generic names, A-M, and are operated by a variety of organisations who agree to share the same data for the 1000+ Top Level Domains under the globally agreed root.

These Root DNS Servers are all accessible by both the existing IPv4 address scheme, and the newer IPv6 Internet. These addresses exist in a file commonly called the “root.hints” file and is distributed with all DNS server software as the initial glue. It rarely changes.

In a trick of Internet routing (BGP), many of these 13 hosts (A-M) are also each replicated multiple times by a process called Anycast: the same small address segment that the server lives on is “announced” to the world from multiple locations, and a duplicate server, performing the same process, and responding with the same answers.

This first layer of scalability helps the root servers deal with billions of devices using the DNS service every second.

The Global Top Level Domains (TLDs): Registry Operator and Registrars

Each of the global TLDS are operated by a Registry Operator, but records are added and removed by multiple Registrar organisations. For example, the “.com” zone is operated by Verisign, but there are many Registrar organisations you can obtain a DNS name from, amongst which is Route53 itself.

These operators have a selection of innovations and policies they apply to their delegation. Some operate their service with just IPv4, and some are dual-stack IPv4 and IPv6. Some operators have their DNS zones cryptographically signed (using DNSSEC) to provide some validation of the DNS queries.

Route53

Starting in December 2010, Route53 originally provided support only for hosting DNS Zones for customers. The engineering for the service at that time was designed to eclipse what most organisations had in place, providing higher reliability and scalability.

Back in the day, the authoritative references on running DNS services were the Bind Operators Guide (aka The BOG), and the O’Reilly book by author Cricket Liu. Most organisations organised just two DNS servers to respond to their customers queries, and the most common software for doing so was ISC’s Berkeley Interned Name Daemon, or BIND.

Of course, to have your own DNS server, you need a fixed IP address that your server would operator from, as this IP address is what the upstream zone would respond with to clients. And thus the initial problem for most organisations was getting a pool of static IP addresses.

Most ISPs only hand out dynamic addresses, and charge substantially more for static routes. Other (typically larger) organisations went through a laborious process of having IP addresses themselves assigned to their organisation (through ARIN, APNIC, or other IP address registries), and then deploying BGP to announce their range to their connected ISP(s) – could be multiple.

This overhead of assigned IP address ranges, setting up corporate BGP (and trying to secure it) all went away with the launch of hosted DNS services, and Route53 turned out to be one of the most well engineered and cost-effective solutions.

With Zones hosted we can delegate from the parent domain to the name servers that Route53 provides us; each individual record (e.g., “www”) can then point to any IP address (i.e., anywhere). The entire need for corporations acquiring large blocks of IP addressing for their organisation went away.

Indeed, I have helped organisations who previously had very large, fixed blocks of IPv4 addresses to relinquish some of these in a commercial market (for millions of dollars).

Route53 has expanded its remit in the AWS Cloud environment. In addition to hosting authoritative DNS zones, it also offers Registrar Services for hundreds of domains, as well as tuning the use of DNS with in the Virtual Private Cloud Environment. Each of these functions can be used completely separately for example, you can:

  • Register a domain with Route53 to handle the re-registration, but delegate to your own (or a 3rd party) DNS servers.
  • Register a domain with another Registrar (e.g., Go Daddy, Verisign) and delegate to Route53 Hosted Zone.
  • Configure complex routing and protection mechanisms for your Virtual Cloud Environment.
  • Host private DNS for your VPC, invisible to the outside world.

In this article, we are going to concentrate on running Public DNS Zones, and the protections you can put in place.

Route53 Public DNS Zone Hosting: Scalability

By default, Route53 gives the operator a choice of 4 DNS servers to pass to the parent domain for delegation. Each of the 4 names are themselves given DNS Names, from four different TLDs. The Four names also themselves resolved to both IPv4 and IPv6 addresses.

The parent domain and then record (and cache) the delegation addresses of these 4 DNS server endpoints; and can instruct end clients doing lookups to also cache this delegation.

Each of those four endpoint addresses are also potentially themselves Anycast announced from multiple locations worldwide. This helps clients reach the closest deployed endpoint for each of the four names, reducing DNS latency.

This set of 4 DNS servers provides much greater reliability than the traditional two, and the multiple anycast presentation further improves this. The chance of any other AWS Route53 customer having the same set of 4 DNS server endpoints is very small, so any Denial of service on specific set of delegated IP address for another Zone is unlikely to affect your zone significantly. This is part of the reason why Route53 offers a 100% availability Service Level Agreement (SLA).

(Note, the control plane, for providing updates to records, is not covered by this SLA)

Route53: Questions

A number of configuration questions arise when planning the migration:

  1. do you want query logging turned on?
    1. this is delivered to an S3 bucket: what’s the retention policy on this (look at S3 life cycle policies) – always set an automated time to delete, perhaps in 12 months?
    2. what’s the analytics processing done on this data, if any?
    3. who has access to this log data, is the bucket marked private, default encryption, versioning?
  2. do you want DNSSEC enabled on the zone –perhaps do this after service migration if you don’t currently have DNSSEC enabled.
  3. What integrations for automated updates are in place, if any?
  4. Who needs access to the console to see and/or update records?

Route53: Migration

The process for migrating to Route53 is relatively simple:

  1. Reduce the parent domains Cache time (TTL – time to live; see below) for the delegation records that point at your current service: a value of 300 seconds may be reasonable.
  2. Prepare the DNS Export form the older service; review the records in there before doing a test import into the new service to ensure that no records cause any issue. This is perfectly safe as we have not redelegated yet. You should also review the individual records’ TTL values, and potentially reduce them down as well as part of the export/import. Any web site or load balancer should run with a TTL no higher than 300 seconds. Once the export/import has been successful, then delete the imported records – we will take a fresh update later…
  3. Determine if there are any processes that are automatically updating your DNS. These will have to be integrated to call the Route53 API for those updates after the migration.
  4. Ensure you have access to redelegate with the Registrar. Test username & password to log in.
  5. Schedule a time for the transition, during which we will avoid updates, or update both old and new. You will have to wait for 5 periods of the previous original TTL time to pass since step 1. For example, if the Delegation TTL was a week, then wait 5 weeks before proceeding (see below on TTL)!
  6. At the agreed time:
    1. Record the current (old) delegation IP addresses the registrar has configured.
    2. re-export the values
    3. update the record TTLs in the export.
    4. import into the Route53.
    5. update the Registrar with the four new Delegation address.
    6. test the DNS immediately; if it fails, revert the Delegation to the addresses in step (a) above.
    7. watch logs/metrics from other systems that rely upon this domain name, such as Web traffic for the zone, or mail traffic.
    8. test the DNS again after 25 minutes.

DNS Caching: TTL & Delegation records

A key element of DNS’s scalability is caching as much as possible. Records all have a customer defined Time To Live value, the duration (in seconds) that a record can be kept by a client.

When making changes to records, we typically observe 50% of clients seeing cached values updated after the TTL period; a further 50% of the remainder see the update after another TTL period; in practice, the TTL almost acts like a half-life:

DurationCumulative % clients seeing update after this time
1 * TTL50%
2 * TTL75%
3 * TTL87.5%
4 * TTL93.75%
5 * TTL96.875%
6 * TTL98.4375%

We would typically wait at least 5 periods of the time-to-live value on a delegation before making a change. As many organisations have this value set to a week (or a month), this could take some time; we’d also recommend keeping this value relatively small once migrated, to ensure you have flexibility in re-delegation in future. 24 hours is a reasonable time for delegation records, unless you’re about to do a migration, in which case 300 seconds is reasonable (25 minutes for 96% of clients to see the update).

During this period, however, you could have your new DNS zone hosting the identical records of the old one, and any updates during this period should be applied to both (or updates can be avoided during this window).

However, some network operators chose to override values as they see fit. A certain ISP in Australia would not honour any small TTL values, and this would result in at least a 24-hour TTL being enforced. Given the 5-TTL-period duration to get 96% of clients seeing the update, you may have to adjust your time frame to accommodate a parallel run of old and new DNS service. Unfortunately, you cannot force update this ISPs.

Route53: Records to Add and Modify

Route53 will automatically create a Start of Authority (SOA) record for your zone. This standard record type has two fields of interest: the RNAME (the responsible person’s email address), and the default TTL value (used for Negative DNS responses, when a query tries to find something not defined. You can leave these as the default, but if you adjust them, then the RNAME field must point to a monitored mailbox, and the TTL adjustment may result in higher query traffic.

Outside of the SOA, there’s a number of other DNS records you should put in place:

  • SPF on the apex, now implemented as a Text (TXT) record, to indicate where your Email is permitted to originate from (low volume lookup traffic). Something like “v=spf1 mx -all” may do. You should set a different record even if your domain is not hosting email, to indicate that all fraudulently generated email from it is SPAM.
  • CAA on the apex, to indicate to Certificate Authorities, who your permitted CAs for your domain are (extremely low volume lookup traffic). Something like:
    0 issue “letsencrypt.org”
    0 issue “amazon.com”
    may do.
  • DMARC: a TXT record on hostname _dmarc.yourdomain, with value “v=DMARC1; p=quarantine; rua=mailto:youremail@yourdomain
  • SMT TLS Reporting: a TXT record on the hostname _smtp._tls.yourdomain with value v=TLSRPTv1;rua=mailto:youremail@yourdomain
  • An MTA (Mail Transfer Agent) STS (Strict Transport Security) record: a TEXT record for hostname _mta-sts.yourdomain with value “v=STSv1; id=2021021000;” – the number can be a representation of the current datetime in yyyymmddhhmm that can be incremented. You should also set up a static web site to host your MTA STS policy document itself on https://mta-sts.yourdomain/.well-known/mta-sts.txt.

For checking this domain’s security configuration, have a look at Hardenize.com, by Ivan Risti?.

Post-Migration

Ensure that all administrative staff have access to set and update the records they need.

Lastly, don’t forget to decommission the existing DNS service once you are convinced you do not need to go back to it.

(Previously-) Symantec Run Certificate Authority distrust is about to hit

Sometime in the next week, a large swathe of web sites around the world, from Fortune 500 companies, to governments and beyond will stop being available securely. All with the next release version of Google Chrome (version 70) and Firefox. The words “NET::ERR_CERT_SYMANTEC_LEGACY” are about to become well known.

For those wishing to look into the future, Google Chrome makes its future releases available to those interested under the labels of Beta, and prior to that, as a “Canary” (ie, in the coal mine). And if you’d cast your eyes over a few sites with these pre-release versions, you’ll see examples like that shown here (name removed).

A web site not available as the issuing certificate authority has been distrusted.

Sadly, the operators of these sites may well be looking at the embedded certificate expiry (Valid Until) date, and think this is not an issue for them. Some of these certificates may have many more years of appearing to be valid.

The case is much worse: the organisation that these web site operators obtained the certificate from — the Certificate Authority — is about to have its status revoked, having been caught acting in ways that undermine the trust instilled in it. These are all powered by Symantec’s legacy root certificates, which includes the Thawte, GeoTrust, and RapidSSL brands.
You can read plenty online about this, for example: form DigiCert, Mozilla, and Google. Here’s Scott Helm’s February 2018 post, and his follow up from a recent Alexa Top sites crawl. Several of these have since updated their certificates (Well done, Tigereair.com.au, stratco.com.au, naati.com.au; fixed it!).

So what’s actually going to happen?

Some disruption.

I’m sure a large number of these will be smeared across mainstream media for being “hacked”, or “offline”, when in reality, “oblivious” is closer to the point.

Poor service providers are going to tell vulnerable people to “ignore the security warnings”, and to “proceed” to the site regardless. This is BAD advise. If you are told this, you are better off ceasing to do business with the organisation as they do not under stand the security they are dealing with. If this is the advice of your employer, then you should consider what this means to the security of your personal HR (and other) data.

There’s far too many people operating, controlling, or otherwise “responsible” for large numbers of web sites who have no idea about what they are actually operating. It’s evident from scanning site and seeing those that still have legacy, vulnerable encryption on their HTTPS configuration, or worse, serve content over unencrypted HTTP. Just because you don’t value you’re content from modification, doesn’t mean your web visitors don’t value NOT being compromised when visiting you.

Web traffic interception happens every second of every day. In Wifi Cafes, Airports, air planes, corporate LANs. TLS (formerly SSL) is the best way we have to protect the integrity of the content across untrusted networks, but we’re in a constant capability race to ensure that services only offer ways to connect that minimise the risk of using untrusted networks.

Driven by a desire to not change things that appear to be working (or indeed, being either lazy, overworked, under resourced/funded, or unaware), organisations are not bringing up their drawbridge of security on their most vulnerable interfaces: those services that are facing the Internet, such as their web site or web services. This issue, when it breaks, will help highlight that some organisations and individuals should probably not be in charge of the services they currently operate.

Case in point: check out BankGradeSecurity.com, a ranking of financial institutions around the world and how well they have adopted modern encryption and security capabilities on their web site and Internet banking services.

It’s clear we’re constantly in the middle of technology transitions – IT Services are not simply done; they are either in-use and actively well-maintained, or they should be archived or removed. Anything else demonstrates cost cutting and under-valuation of the digital capability that allows an organisation to operate.

Organisations face a choice of two types of Managed Services providers today: those that understand service maintenance on behalf of their customers and those that do not (and are still running with the same HTTPS configuration they went live with years ago.

It’s easy to spot these services — they haven’t enabled GCM based AES block ciphers or Eliptical Curve Diffie-Hellman Ephemeral (ECDHE) key exchange mechanisms. Worse, those permitting the use of SSLv2, SSLv3, or TLS 1.0, or not yet permitting the use of TLS 1.2 (or the shiny new TLS 1.3). And unbelievably, those that don’t enable HTTPS at all.

There’s more signs of stagnation if you know what to look for; lack of HTTP/2, lack of IPv6, long TTLs on DNS records, etc, that all indicate organisations that are stifled, or don’t have capability to understand what they are doing. Sometimes its corporate direction to use 3rd party IT operations who again, use the cheapest unskilled and unqualified labour to delivery IT services, dressed up in marketing to make it look like they save the earth.


If you’re affected by this, consider attending Nephology’s Web Security training.

Changing the definition of High Availability

I was recently speaking with someone about high availability. Their approach to high availability was two hand crafted instances (servers), in each in a separate Availability Zones on AWS, behind an ELB.

My approach to high availability is two (or more) instances in two (or more) Availability Zones behind an ELB, auto-created (and replaced upon failure) by AutoScale, with health-checks to ensure the instances are processing their workloads correctly, and replacing those that are not.

I extend this to include:

  • rolling updates so when the need arises, such as replacing the underlying OS every 6 – 12 months I can do so as seamlessly as possible
  • applying critical security updates daily without human intervention
  • ensuring that logs are sent off the instance to persistent storage and analytics where they can’t be overwritten or tampered with
  • ensuring data workloads are on object storage, or managed database platforms

Of course the ASG provisioning approach to EC2 deployment requires that everything during install is scripted and repeatable.And it gives me the freedom to size my ASG to zero instances outside of service hours, and ensure that the system deploys every single time.

I’ve done this resize ASGs to zero in non production times for many years now, across hundreds of instances per day. You can bet its pretty well tested, has saved hundreds of thousands of dollars, and ensured that the process works.

I bake my own AMIs, and re-baseline these semi-regularly using automation to do so – a process that takes 10 minutes of wall time, and about 60 seconds of my time. I treat AMIs as artefacts, as precious and as important as the code they run.

I have to stop myself sometimes when others rely upon pre-cloud approaches to high availability.

The snowflakes must stop. Cattle, not pets.

The Move to VPC NAT Gateway

Once more on the move to… this time, the move to RE-move Public and Elastic IP addresses from EC2 Instances…

As previously stated, one of my architectural decisions with AWS VPC using S3 and other critical operational services is to not have Single Points of Failure or anything overly complicated between my service Instances, and the remote services they depend upon. Having addressed private access to S3 via VPC Endpoints for S3, the volume of traffic I have that must traverse out of my VPC has reduced. Additional Endpoints have been indicated by the AWS team, I know this is going to further reduce the requirement my EC2 Instances will have on outbound Internet access in future.

But for now, we still need to get reliable outbound traffic, with minimal SPOFs.

Until recently, the only options for outbound traffic from the VPC were:

  1. Randomly assigned Public IPs & a route to the Internet
  2. Persistently allocated Elastic IPs & a route to the Internet
  3. A route to on-premise (or other – but outside of the VPC), with NAT to the Internet performed there
  4. A SOCKS Proxy that itself had Public or Elastic IP & a route to the Internet
  5. An HTTP Proxy, perhaps behind an internal ELB, that itself had a Public IP or Elastic IP & a route to the Internet

All four of these options would remain with some part of my architecture having an interface directly externally exposed. While Security Groups give very good protection, VPC Flow Logs would continue to remind us that there are persistent “knocks on the door” from the Internet as ‘bots and scripts would test every port combination they could. Of course, these attempts bounce off the SGs and/or NACLs.

We can engineer solutions within the instance (host OS) with host-based firewalls and port knocking, but we can also engineer more gracefully outside in the VPC as well.

Until this year, VPC supported routing table entries that would use another Instance (an a separate subnet) as a gateway. Using IPTables or its nfTables replacement, you could control this traffic quite well, however that’s an additional instance to pay for, and far worse, to maintain. If this NAT Instance was terminated, then it would have to be replaced: something that AutoScale could handle for us. However, a new NAT instance would have to adjust routing table(s) in order to add it’s Elastic Network Interface (ENI) to be inserted as a gateway in its dependent subnet’s routing tales. Sure we can script that, but its more moving parts. Lastly, the network throughput was constrained to that of the NAT Instance.

AWS VPC NAT Gateway
AWS VPC NAT Gateway

Then came NAT Gateway as a managed service. Managed NAT, no bandwidth limit, nothing to manage or maintain. The only downside is that managed NAT is not multi-AZ: a NAT Gateway exists in only a single Availability Zone (AZ).

Similar to NAT Instances, a NAT Gateway should be defined within a Public subnet of our VPC (i.e., with a direct route to the Internet via IGW). The NAT Gateway gets assigned an Elastic IP, and is still used as a target for a routing rule.

To get around the single-AZ nature of the current NAT Gateway implementation, we define a new routing table per AZ. Each AZ gets its own NAT Gateway, with its own EIP. Other subnets in the same AZ then use a routing table rule to route outbound via the NAT Gateway.

The biggest downside of Managed NAT is that you can’t do interesting hacks on the traffic as it traverses the NAT Gateway at this time. I’ve previously used Instance-NAT to transparently redirect outbound HTTP (TCP 80) traffic via a Squid Proxy, which would then do URL inspection and white/black listing to permit or block the content. In the NAT Gateway world, you’d have to do that in another layer: but then perhaps that proxy server sits in a NAT-routed subnet.

Having said that, soon the ability to do interception of HTTP traffic will go away, soon to be replaced by an all-SSL enabled HTTP/2 world — well, over the next 5 years perhaps. This would require SSL-interception using Server Name Indication (SNI) “sneek & peak” to determine the desired target hostname, then on-the-fly generate a matching SSL certificate issued by our own private CA — a CA that would have to be already trusted by the client devices going through the network, as otherwise this would be a clear violation of the chain of trust.

What else is on my wishlist?

  • All the VPC Endpoints I can dream of: SQS, AutoScaling (for signaling ASG events SUCCESS/FAILURE), CloudWatch (for submitting metrics), DynamoDB, EC2 and CloudFormation API
  • Packet mangling/redirection on Managed NAT on a port-by-port basis.
  • Not having to create one NAT Gateway per-AZ, but one NAT Gateway with multiple subnets such that it selects the egress subnet/EIP in the same AZ as the instances behind it if it is healthy, so I don’t need to make one Routing table per NAT Gateway, and auto fail-over of the gateway to other AZs should here be an issue in any AZ
  • VPC peering between Regions (Encrypted, no SPOFs)

Conclusion

So where is this architecture headed?

I suspect this will end up with a subnet (per AZ) of instances that do require NAT egress outbound, and a subnet of instances that only require access to the array of VPC Endpoints that are yet to come. Compliance may mean filtering that through explicit proxies for filtering and scanning. Those proxies themselves would be in the “subnet that requires NAT egress” — however their usage would be greatly reduced by the availability of the VPC Endpoints.

The good news is, that based upon the diligent work of the VPC team at AWS, we’re sure to get some great capabilities and controls. VPC has now started a next wave of evolution: its launch (back in 2009) had kind of stagnated for a while, but in the last 2 years its back in gear (peering, NAT Gateway, S3 Endpoints).

Looking back over my last three posts – The Move to Three AZs in Sydney, The Move to S3 Endpoints, and now this Move to NAT Gateway, you can see there is still significant improvements to a VPC architecture that needs to be undertaken by the administrator to continue to improve the security and operational resilience in-cloud. Introducing these changes incrementally over time while your workload is live is possible, but takes planning.

More importantly, it sets a direction, a pattern: this is a journey, not just a destination. Additional improvements and approaches will become recommendations in future, and we need to be ready to evaluate and implement them.

The move to S3 Endpoints

And now, continuing my current theme of “the move to…” with the further adventures of running important workloads and continuing the evolution of reliability and security at scale; the next improvement is the enabling of S3 endpoints for our VPC.

VPC Perspective: going to S3

Access to S3 for many workloads is critical. Minimising the SPOFs (Single Points of Failure), artificial maximum bandwidth or latency constraints, and maximising the end-to-end security is often required. For fleets of instances, the options used to be:

  1. Public IPs to communicate directly over the (local) Internet network within the Region to talk to S3 – but these are randomly assigned, so you’d rely on the API credentials alone
  2. Elastic IPs in place of Public IPs, similar to above, and then have to manage the request, release, limits and additional charges associated with EIPs.
  3. NAT Instances in AutoScale groups, with boot time scripts and role permission to update dependent routing tables to recover from failed NAT instances
  4. Proxy servers, or an ASG of proxy servers behind an internal ELB

In all of these scenarios, you’d look to use EC2 Instances with temporary, auto-rotating IAM Role credentials to access S3 over an encrypted channel (HTTPS). TLS 1.2, modern ciphers, and a solid (SHA256) chain of trust to the issuing CA was about as good as it got to ensure end-to-end encryption and validation that your process had connected to S3 reliably.

But S3 Endpoints enhances this, and in more than just a simple way.

Let’s take a basic example: an Endpoint is attached to a VPC with a policy (default, open) for a outbound access to a particular AWS Service (S3 for now), and the use of this Endpoint is made available to the EC2 Instances in the VPC by way of the VPC Routing table(s) and their association to a set of subnets. You may have multiple routing tables; perhaps you’d permit some of your subnets to use the endpoint, and perhaps not others.

With the Endpoint configured as above it permits direct access to S3 in the same Region without traversing the Internet network. The configured S3 Logging will start to reflect the individual Instance Private IPs (within the VPC) and no longer have the Public or Elastic IPs they may have previously used. They don’t need to use a NAT (Instance or NAT Gateway) or other Proxy: the Endpoint provides reliable, high through-put access to S3.

However, the innovation doesn’t stop there. That policy mentioned above on the Endpoint can place restrictions on the APIs and Buckets that are accessible via this Endpoint. For example, a subnet of Instances that I want to ensure they can ONLY access only my named bucket(s) Endpoint policy. As they have no other route to S3, then they can’t access 3rd party anonymously accessible buckets.

I can also limit the API calls via the Endpoint: perhaps permitting on Get, Put, List operations. These instance couldn’t assume another role (sts:assumeRole) that may have s3:DeleteBucket privileges, and use it via this restricted Endpoint.

Let’s make it a little more complex, with a second Endpoint on the VPC. Perhaps I’ll associate this second Endpoint with my administrative  subnet, and permit an open policy on it.

S3 Perspective: Restricting sources

An S3 bucket, once created in a Region, accepts valid signed requests from the Principals you permit in IAM policy. You can add Bucket Policies to them to restrict this to a set of trusted IP CIDR blocks (both IPv4 and IPv6 now – IPv6 only for the S3 public API Service Endpoint, not the optionally enabled S3 website or VPC Endpoint). For example, a DENY policy with a condition of:

"Condition": {
   "NotIpAddress": {
     "aws:SourceIp": [
       "54.240.143.0/24",
       "2001:DB8:1234:5678::/64"
     ]
   },
}

But with VPC Endpoints, you would instead add a DENY role with a condition of:

"Condition": {
  "StringNotEquals": {
    "aws:sourceVpc": "vpc-1234beef"
  }
}

Items in the condition block are AND-ed together at this time, so if you’re writing a policy with both VPC endpoint requirement OR an on-premise IP block, things get interesting: you’re going to want to Boolean OR these two separate Conditions in a Deny block:

"Condition": { 
  "NotIpAddress": { 
    "aws:SourceIp": [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ],
  },
  "StringNotEquals": {
    "aws:sourceVpc": "vpc-1234beef"
  }
} # FAILS EVERY TIME AS BOTH ARE EVALUATED!!

Luckily there’s a work around. IfExists can conditionally check a Condition key, and skip it if its not defined:

"Condition": {
  "NotIpAddressIfExists": {
    "aws:SourceIp" : [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ]
  },
  "StringNotEqualsIfExists" : {
    "aws:SourceVpc", [ "vpc-1234beef" ]
  }
}

Thus these two can be ANDED together and still pass if either one is TRUE. Kind of like an OR! Add the Action: DENY to this and we should be looking pretty good.

In summary

So what’s this got us now?

  1. Our S3 logs should only contain IP addresses from within the VPC now, so it’s fairly obvious to pick out any other access attempts.
  2. Our reliance on external Internet access has slightly reduced – but there are other sites and services in use (eg, SQS, CloudWatch for metric submission, or even AutoScale for signaling ASG scaling action results) then these are still required to go our the Internet Gateway (IGW) one way or another
  3. Our S3 buckets can have additional constrains to further limit the scope of credentials.
  4. We’ve avoided complex scenarios of lashing together scripts that dynamically adjust routing tables, intercept SSL traffic on proxy servers, or other nasty hacks

The AWS team has publicly indicated more Endpoints are to come, so this shows a clear trajectory: less reliance on “Internet” access for instances. All of this is a long, long way from what VPC looked like back in 2008, when it was S3-backed instances with no IGW – just private subnets with an IPSEC VGW to on-premise.

The underlying theme, however, is that the security model is not set and forget, but to continue this journey as the platform further improves.

So, key recommendations:

  1. Use IAM Roles for EC2 instances (unless you have multiple un-trusted clients using SSH/RDP to the instance). These credentials auto-rotate multiple times per day, and are transparently used by the AWS SDKs.
  2. Turn on S3 Bucket Logging (to a separate bucket). When setting he bucket logging destination, make sure you end the prefix with a trailing slash (/). Eg, “MyBucket” logs to bucket “MyLoggingBucket”, with prefix “S3logs/MyBucket/”. S3 Logging is a Trusted Advsior recommendation: setting a Lifecycle policy on these logs is my recommendation (dev/test at X days, Production at Y years?).
  3. Create (at least one) VPC S3 Endpoint for the buckets in region, and adjust routing tables accordingly. Perhaps start with an open policy if you’re comfortable (it’s no worse than the previous access to S3 over Internet), and iterate from there.
  4. Consider locking your S3 buckets down to just your VPC, or your VPC and some well known ranges.