And now, continuing my current theme of “the move to…” with further adventures in running important workloads and evolving reliability and security at scale, the next improvement is enabling S3 Endpoints for our VPC.
VPC Perspective: going to S3
Access to S3 is critical for many workloads. Minimising SPOFs (Single Points of Failure) and artificial bandwidth or latency constraints, while maximising end-to-end security, is often a requirement. For fleets of instances, the options used to be:
- Public IPs, talking directly to S3 over the (local) Internet network within the Region, but these are randomly assigned, so you couldn’t pin them in a policy and would rely on the API credentials alone
- Elastic IPs in place of Public IPs: similar to the above, but you then have to manage the request, release, limits and additional charges associated with EIPs
- NAT Instances in AutoScale groups, with boot-time scripts and IAM role permissions to update dependent routing tables so they recover from failed NAT instances
- Proxy servers, or an ASG of proxy servers behind an internal ELB
In all of these scenarios, you’d look to use EC2 Instances with temporary, auto-rotating IAM Role credentials to access S3 over an encrypted channel (HTTPS). TLS 1.2, modern ciphers, and a solid (SHA256) chain of trust to the issuing CA were about as good as it got to ensure end-to-end encryption and to validate that your process really had connected to S3.
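For comparison, that client-side TLS baseline can be sketched with Python’s standard `ssl` module. This is a minimal sketch, not what any AWS SDK does internally: the helper name `make_s3_tls_context` is hypothetical, and the context simply refuses anything older than TLS 1.2 while requiring a valid certificate chain and a matching hostname.

```python
import ssl


def make_s3_tls_context() -> ssl.SSLContext:
    """Hypothetical helper: a client TLS context suitable for HTTPS to S3."""
    # create_default_context() loads the system CA bundle and turns on
    # certificate verification (CERT_REQUIRED) and hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse to negotiate anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = make_s3_tls_context()
    print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

You would then wrap a connected socket with `ctx.wrap_socket(sock, server_hostname=...)` using the regional S3 hostname; in practice the AWS SDKs handle TLS for you.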
But S3 Endpoints enhance this, and in more than one way.
Let’s take a basic example: an Endpoint is attached to a VPC with a policy (default, open) for outbound access to a particular AWS Service (S3 for now). The EC2 Instances in the VPC can use this Endpoint by way of the VPC Routing table(s) it is added to, and those tables’ association with a set of subnets. You may have multiple routing tables; perhaps you’d permit some of your subnets to use the endpoint, and not others.
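A sketch of the parameters for creating such an Endpoint, assuming the EC2 `CreateVpcEndpoint` API (as exposed by boto3’s `create_vpc_endpoint`). The helper names and every ID here are hypothetical; the function only assembles the request, so it runs without AWS credentials. Subnets whose routing tables are not listed simply don’t get the route to S3.

```python
import json


def open_endpoint_policy() -> dict:
    # Equivalent of the default, open ("full access") endpoint policy.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": "*", "Action": "*", "Resource": "*"}
        ],
    }


def s3_gateway_endpoint_request(vpc_id, region, route_table_ids, policy=None):
    """Hypothetical helper: CreateVpcEndpoint parameters for an S3 endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Only subnets associated with these routing tables get the S3 route.
        "RouteTableIds": list(route_table_ids),
        # The API takes the policy document as a JSON string.
        "PolicyDocument": json.dumps(policy or open_endpoint_policy()),
    }
```

For example, `ec2.create_vpc_endpoint(**s3_gateway_endpoint_request("vpc-1234beef", "ap-southeast-2", ["rtb-11111111"]))`, where `ec2` is a boto3 EC2 client for that Region.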
With the Endpoint configured as above, those instances get direct access to S3 in the same Region without traversing the Internet network. The configured S3 Logging will start to show the individual Instances’ Private IPs (within the VPC) rather than the Public or Elastic IPs they may have previously used. They don’t need to use a NAT (Instance or NAT Gateway) or other Proxy: the Endpoint provides reliable, high-throughput access to S3.
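Because the logged remote IP becomes a private VPC address, spotting anything else is a simple filter. This sketch assumes the standard S3 server access log line layout (bucket owner, bucket, bracketed timestamp, then remote IP) and a VPC CIDR that you supply; the function names are hypothetical. It yields any line whose remote IP falls outside the VPC, or can’t be parsed.

```python
import ipaddress
import re

# Fields 1-3 are bucket owner, bucket and a bracketed timestamp (which itself
# contains a space); field 4 is the remote IP.
_REMOTE_IP = re.compile(r"^\S+ \S+ \[[^\]]*\] (\S+) ")


def remote_ip(log_line):
    """Return the remote IP of an S3 access log line, or None if unparsable."""
    match = _REMOTE_IP.match(log_line)
    if not match:
        return None
    try:
        return ipaddress.ip_address(match.group(1))
    except ValueError:
        return None


def outside_vpc(log_lines, vpc_cidrs):
    """Yield log lines whose remote IP is not inside any of the VPC CIDRs."""
    networks = [ipaddress.ip_network(c) for c in vpc_cidrs]
    for line in log_lines:
        ip = remote_ip(line)
        # An IPv6 address is never "in" an IPv4 network, so it is flagged too.
        if ip is None or not any(ip in net for net in networks):
            yield line
```

Feeding it a day’s log files with your VPC’s CIDR (for example `["10.0.0.0/16"]`) leaves only the access attempts worth a closer look.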
However, the innovation doesn’t stop there. The policy mentioned above on the Endpoint can restrict the APIs and Buckets that are accessible via this Endpoint. For example, for a subnet of Instances that I want to ensure can ONLY access my named bucket(s), I list just those buckets in the Endpoint policy. As they have no other route to S3, they can’t access third-party, anonymously accessible buckets.
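As a sketch, an Endpoint policy limited to named buckets (and, optionally, to a subset of actions) might be generated like this. The helper name, the bucket name, and the choice of `s3:GetObject`, `s3:PutObject` and `s3:ListBucket` as the Get/Put/List set are illustrative assumptions. Each bucket needs two ARNs: one for bucket-level calls and one for its objects.

```python
import json


def bucket_only_endpoint_policy(buckets, actions=("s3:*",)):
    """Hypothetical helper: Endpoint policy allowing only the named buckets."""
    resources = []
    for name in buckets:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level, e.g. ListBucket
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level, e.g. GetObject
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OnlyMyBuckets",
                "Effect": "Allow",
                "Principal": "*",
                "Action": list(actions),
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    policy = bucket_only_endpoint_policy(
        ["my-app-bucket"], actions=["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
    )
    print(json.dumps(policy, indent=2))
```

Anything not matched by this Allow (another bucket, or another action) is refused for traffic through this Endpoint.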
I can also limit the API calls via the Endpoint: perhaps permitting only Get, Put and List operations. Even if these instances assumed another role (sts:AssumeRole) with s3:DeleteBucket privileges, they couldn’t use those privileges via this restricted Endpoint.
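Why can’t a more privileged role get around this? A request through the Endpoint has to be allowed by the caller’s own IAM policy and by the Endpoint policy. The toy model below shows only that intersection; it is hypothetical and heavily simplified (no explicit Deny, resources, conditions or bucket policies), and is not how IAM is implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified model: only action names are considered.
ENDPOINT_ALLOWED_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
ADMIN_ROLE_ALLOWED_ACTIONS = ["s3:*"]  # e.g. a role obtained via sts:AssumeRole


def allows(patterns, action):
    # IAM action names are case-insensitive and may use wildcards like "s3:*".
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def permitted_via_endpoint(role_actions, endpoint_actions, action):
    # Both the caller's policy and the Endpoint policy must allow the action.
    return allows(role_actions, action) and allows(endpoint_actions, action)
```

So `permitted_via_endpoint(ADMIN_ROLE_ALLOWED_ACTIONS, ENDPOINT_ALLOWED_ACTIONS, "s3:DeleteBucket")` is False even though the role itself allows `s3:*`.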
Let’s make it a little more complex, with a second Endpoint on the VPC. Perhaps I’ll associate this second Endpoint with the routing table for my administrative subnet, and give it an open policy.
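One way to picture the two-Endpoint layout: each subnet uses one routing table, and each routing table carries the route to at most one S3 Endpoint, so the Endpoint (and therefore the policy) a subnet gets is decided by its routing table. All IDs are hypothetical, and the model ignores details such as the main routing table for subnets without an explicit association.

```python
# Hypothetical IDs throughout.
SUBNET_ROUTE_TABLE = {
    "subnet-app-a": "rtb-app",
    "subnet-app-b": "rtb-app",
    "subnet-admin": "rtb-admin",
    "subnet-dmz": "rtb-dmz",
}

ROUTE_TABLE_S3_ENDPOINT = {
    "rtb-app": "vpce-restricted",
    "rtb-admin": "vpce-open",
    # rtb-dmz has no Endpoint route: its S3 traffic goes via IGW/NAT, if at all.
}

ENDPOINT_POLICY = {
    "vpce-restricted": "named buckets only; Get/Put/List",
    "vpce-open": "default open policy",
}


def s3_path_for(subnet_id):
    """Return (routing table, endpoint or None, policy description) for a subnet."""
    rtb = SUBNET_ROUTE_TABLE[subnet_id]
    endpoint = ROUTE_TABLE_S3_ENDPOINT.get(rtb)
    if endpoint is None:
        return (rtb, None, "no S3 Endpoint route")
    return (rtb, endpoint, ENDPOINT_POLICY[endpoint])
```

Moving a subnet between the restricted and open Endpoints is then just a change of routing table association.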
S3 Perspective: Restricting sources
An S3 bucket, once created in a Region, accepts valid signed requests from the Principals you permit in IAM policy. You can add a Bucket Policy to restrict this to a set of trusted IP CIDR blocks (both IPv4 and IPv6 now, though IPv6 applies only to the public S3 API Service Endpoint, not the optionally enabled S3 website endpoint or a VPC Endpoint). For example, a Deny statement with a condition of:
"Condition": {
"NotIpAddress": {
"aws:SourceIp": [
"54.240.143.0/24",
"2001:DB8:1234:5678::/64"
]
},
}
But with VPC Endpoints, you would instead add a Deny statement with a condition of:
"Condition": {
"StringNotEquals": {
"aws:sourceVpc": "vpc-1234beef
"
}
}
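Wrapped into a complete bucket policy, the VPC condition might look like this sketch. The helper and bucket names are hypothetical, and `"Principal": "*"` with `"Action": "s3:*"` is an assumption for illustration: it denies everyone, including your own console sessions, unless the request arrives via the named VPC, so try it on a non-critical bucket first.

```python
import json


def vpc_only_bucket_policy(bucket, vpc_id):
    """Hypothetical helper: deny any request on `bucket` not arriving via `vpc_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromMyVpc",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(vpc_only_bucket_policy("my-app-bucket", "vpc-1234beef"), indent=2))
```

It could be applied with boto3’s `s3.put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`.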
Conditions in the Condition block are AND-ed together at this time, so if you’re writing a policy that permits access via the VPC Endpoint OR from an on-premises IP block, things get interesting. You want to Boolean OR these two separate Conditions, but putting both in one Deny block looks like this:
"Condition": {
"NotIpAddress": {
"aws:SourceIp": [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ],
},
"StringNotEquals": {
"aws:sourceVpc": "vpc-1234beef"
}
} # FAILS EVERY TIME AS BOTH ARE EVALUATED!!
Luckily, there’s a workaround. Appending IfExists to a condition operator makes it check the Condition key only if that key is present in the request; if the key is absent, that condition evaluates to true:
"Condition": {
"NotIpAddressIfExists": {
"aws:SourceIp" : [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ]
},
"StringNotEqualsIfExists" : {
"aws:SourceVpc", [ "vpc-1234beef" ]
}
}
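Thus these two are still AND-ed together, yet the request gets through if either one is satisfied: the Deny only fires when the request came from neither the trusted IP ranges nor the VPC. Kind of like an OR! Put this in a statement with "Effect": "Deny" and we should be looking pretty good.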
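To check the logic, here is a small model of that Deny statement, assuming the documented IfExists behaviour (when the key is absent from the request, that condition evaluates to true). It models only these two conditions and is not an IAM policy evaluator; the function names are hypothetical.

```python
import ipaddress

TRUSTED_CIDRS = ["54.240.143.0/24", "2001:DB8:1234:5678::/64"]
TRUSTED_VPCS = ["vpc-1234beef"]


def not_ip_address_if_exists(context, key, cidrs):
    if key not in context:
        return True  # IfExists: an absent key makes the condition true
    ip = ipaddress.ip_address(context[key])
    return not any(ip in ipaddress.ip_network(c) for c in cidrs)


def string_not_equals_if_exists(context, key, values):
    if key not in context:
        return True  # IfExists: an absent key makes the condition true
    return context[key] not in values


def denied(context):
    # The two conditions are AND-ed: the Deny fires only when the request came
    # from neither a trusted range nor the trusted VPC (De Morgan's OR).
    return (not_ip_address_if_exists(context, "aws:SourceIp", TRUSTED_CIDRS)
            and string_not_equals_if_exists(context, "aws:SourceVpc", TRUSTED_VPCS))
```

For example, `denied({"aws:SourceIp": "198.51.100.7"})` is True, while a request carrying `"aws:SourceVpc": "vpc-1234beef"` is not denied.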
In summary
So what’s this got us now?
- Our S3 logs should only contain IP addresses from within the VPC now, so it’s fairly obvious to pick out any other access attempts.
- Our reliance on external Internet access has slightly reduced, but if other sites and services are in use (e.g., SQS, CloudWatch for metric submission, or even AutoScale for signalling ASG scaling action results), that traffic is still required to go out through the Internet Gateway (IGW) one way or another
- Our S3 buckets can have additional constraints to further limit the scope of credentials.
- We’ve avoided complex scenarios of lashing together scripts that dynamically adjust routing tables, intercept SSL traffic on proxy servers, or other nasty hacks
The AWS team has publicly indicated more Endpoints are to come, so this shows a clear trajectory: less reliance on “Internet” access for instances. All of this is a long, long way from what VPC looked like back in 2008, when it was S3-backed instances with no IGW – just private subnets with an IPSEC VGW to on-premise.
The underlying theme, however, is that the security model is not set-and-forget: it’s a journey to continue as the platform further improves.
So, key recommendations:
- Use IAM Roles for EC2 instances (unless you have multiple un-trusted clients using SSH/RDP to the instance). These credentials auto-rotate multiple times per day, and are transparently used by the AWS SDKs.
- Turn on S3 Bucket Logging (to a separate bucket). When setting the bucket logging destination, make sure you end the prefix with a trailing slash (/). E.g., “MyBucket” logs to bucket “MyLoggingBucket”, with prefix “S3logs/MyBucket/”. S3 Logging is a Trusted Advisor recommendation; setting a Lifecycle policy on these logs is my recommendation (dev/test at X days, Production at Y years?).
- Create (at least one) VPC S3 Endpoint for the buckets in region, and adjust routing tables accordingly. Perhaps start with an open policy if you’re comfortable (it’s no worse than the previous access to S3 over the Internet), and iterate from there.
- Consider locking your S3 buckets down to just your VPC, or your VPC and some well known ranges.
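The logging recommendation can be captured in a small helper that always produces a trailing-slash prefix, alongside a Lifecycle rule to expire the logs. The structures follow the S3 `PutBucketLogging` and `PutBucketLifecycleConfiguration` request shapes (boto3’s `put_bucket_logging` and `put_bucket_lifecycle_configuration`); the helper names and rule ID are hypothetical, and the retention period is left for you to choose.

```python
def logging_config(target_bucket, source_bucket, root_prefix="S3logs/"):
    """Hypothetical helper: BucketLoggingStatus with a prefix ending in '/'."""
    prefix = f"{root_prefix.rstrip('/')}/{source_bucket}/"
    return {
        "LoggingEnabled": {"TargetBucket": target_bucket, "TargetPrefix": prefix}
    }


def log_expiry_lifecycle(prefix, retention_days):
    """Hypothetical helper: expire log objects under `prefix` after N days."""
    return {
        "Rules": [
            {
                "ID": "expire-s3-access-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": retention_days},
            }
        ]
    }
```

E.g., `logging_config("MyLoggingBucket", "MyBucket")` gives the prefix `S3logs/MyBucket/`, and `log_expiry_lifecycle("S3logs/", ...)` goes on the logging bucket with whatever retention suits dev/test or Production.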