AWS Re:Invent Day 1 thoughts

This is going to be a long week of learning how the world has changed. I’m already tired, and I’m not even there. My brain hurts (you’d not believe how many typos I am correcting here).

While (once again) I am not at Re:Invent in Las Vegas, Nevada, I’m tuned in to as many news sources as possible to try and catch which parts of the undifferentiated heavy lifting have changed. I’ve been one of the AWS Cloud Warriors for the last two years (2017-2018), which has made me lucky enough to be given a conference ticket, but unfortunately I’ve not been able to get there.

While I may not be physically there, I am in spirit, having been nominated as one of the AWS Ambassadors.

However, the live stream video (which has improved dramatically since 2014), the Tweets from various people, the updates on LinkedIn, RSS feeds, Release Notes, What’s New page, AWS Blog (hi Jeff), and indeed, the Recent Changes/Release History sections of lots of the documentation pages (such as this Release History page for CloudFormation) have given me more information to trawl through.

It’s now Tuesday night in Perth, Western Australia, and day two of Re:Invent, but it’s only 7am Tuesday morning in Las Vegas (yes, I’m 16 hours in the future). Here are my thoughts on the releases thus far:

100 Gb/s networking in VPC

The ENA network interface was previously limited to 25 Gb/sec per instance on the largest instance types. Indeed, it’s worth noting that most network resources are limited to some degree by the instance size within an instance family. But now a new family – the C5n instances – has interfaces capable of up to 100 Gb/sec (that’s 12.5 GB/s – little b is bits, big B is bytes).
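The bits-versus-bytes conversion is easy to get wrong, so here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration; the only fact it relies on is 8 bits per byte.

```python
BITS_PER_BYTE = 8


def gigabits_to_gigabytes(gbps: float) -> float:
    """Convert a rate in gigabits per second (Gb/s) to gigabytes per second (GB/s)."""
    return gbps / BITS_PER_BYTE


# The previous ENA ceiling and the new C5n ceiling mentioned above.
for rate in (25, 100):
    print(f"{rate} Gb/s = {gigabits_to_gigabytes(rate)} GB/s")
```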

Much has been said about network throughput, and the comparison between ENA and SR-IOV in the AWS Cloud, and comparisons to other Cloud environments. 100 Gb/s now sets a new high bar that other vendors are yet to reach.

While it’s wonderful to have that level of throughput, it’s also worth noting that scale-out is still sometimes a good idea. 100 instances at 1 Gb/s each may provide a better solution sometimes, but then again sometimes a problem doesn’t split nicely between multiple server instances. YMMV.

Transit gateway

Managing an enterprise within AWS is usually a case of managing multiple AWS accounts. The ultimate in separation from a console/account level sometimes reverts to integration questions around network, governance and other considerations.

In March 2014 (yes, 4 and a half years ago), the VPC service team introduced VPC Peering, a non-transitive peering arrangement between VPCs – a non-throttling, no-single-point-of-failure way of meshing two separate VPCs together (including in separate accounts).

This announcement now gives a transitive way (hence the name) of meshing a spread enterprise deployment. There are multiple reasons for doing so:

  • Compliance: your corporate policy requires all outbound (to Internet) traffic to funnel via a specific centralised gateway.
  • Management overhead: organising N VPCs to mesh together means creating (N-1)*N/2 peering arrangements, and double that number of routing table entries. If we have 4 environments (dev, test, UAT and Production), and 10 applications in their own environments, then that’s 40 VPCs, 780 peering relationships and 1560 routing table updates.
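To check the full-mesh numbers above, here is a small sketch in Python. The function and variable names are illustrative only. The assumption is that every VPC peers with every other VPC and each peering needs a route entry on both sides.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed so every VPC peers with every other."""
    return vpc_count * (vpc_count - 1) // 2


environments = 4    # dev, test, UAT and Production
applications = 10   # each application has its own VPC per environment

vpcs = environments * applications
peerings = full_mesh_peerings(vpcs)
route_updates = peerings * 2  # one routing table entry on each side of a peering

print(f"{vpcs} VPCs -> {peerings} peerings, {route_updates} routing table updates")
```

The count grows quadratically, which is the real argument for a central hub: doubling the VPC count roughly quadruples the peering work.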

It’s worth noting that in some organisations, an account’s administrative users may themselves not have access to create an IGW for access to the internet; a Transit gateway may be the only way permitted for connectivity so it can be centrally managed.

But in taking central management, you now have a few considerations:

  1. Blast radius: if you stuff up the Transit gateway configuration, you take down the organisation. With separation and peering, each VPC is its own blast radius.
  2. Cost: Transit gateway isn’t free. You probably want to permit S3 Endpoints for large volume object storage.
  3. Throughput: 50 Gb/s may seem a lot, but now there are 100 Gb/s instances.

ARM based A1 Instances

In 2013, when I worked at AWS, I spoke with friends at ARM and AWS Service teams about the possibility of this happening. The attractiveness of the reduced power envelope, and cost comparison of the chip itself made it already look compelling then. This was before Windows was compiled to ARM – and that support is only strengthening. It’s heartening now to see this coming out the door, giving customers choice.

Earlier this month we saw an announcement about AMD CPUs. Now we have three CPU manufacturers to choose from in the cloud when looking to run Virtual Machines. Customers can now vote with their workloads as to what they want to use. The CPU manufacturers now have more reason to innovate and make better, faster or cheaper CPUs available. When you can switch platforms easily (you do DevOps, right? All scripted installs?) then it’s perhaps down to the cost question now.

Now, recently it was announced there will be a t3a. Wonder if there will be a t3a1?

Compute in the cloud just got even more commodity. Simon Wardley, fire up your maps.

S3 improvements (lots here)

Gosh, so much here already.

Firstly, an admission that AWS Glacier is no longer its own service, but folded under S3 and renamed as S3 Glacier. There’s a new API for glacier to make it easier to work with, and the ability to put objects to S3 and have them stored immediately as Glacier objects without having to have zero day archive Lifecycle policies.

SFTP transfers – finally, a commodity protocol for file uploads that simple integrators can use, without having to deploy your own maintained, patched, fault-tolerant, scalable ingestion fleet of servers. This right here is the definition of undifferentiated heavy lifting being simplified, but with a price of 30c/hour, you’re looking at US$216 per month before you include any data transfer charges.
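The US$216 figure comes from running the endpoint around the clock. A quick sketch of that arithmetic in Python follows. It assumes a 30-day month and an always-on endpoint, and it works in whole cents to avoid floating-point rounding.

```python
HOURLY_RATE_CENTS = 30   # the 30c/hour price quoted above
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30      # assumption: a 30-day month, endpoint never switched off

monthly_cents = HOURLY_RATE_CENTS * HOURS_PER_DAY * DAYS_PER_MONTH
print(f"US${monthly_cents / 100:.2f} per month, before data transfer charges")
```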

Object Lock: the ability to put files and not be able to delete them for a period. For when you have strict compliance requirements. Currently can only be defined on a Bucket during Bucket creation.

S3 events seem to have got a lot more detailed as well, with more trigger types that can be sent to SQS, SNS, or straight to a Lambda function.

KMS with dedicated HSM storage

KMS has simplified the way that key management is done, but some organisations require a dedicated HSM for compliance reasons. Now you can tell KMS to use your custom key store (single-tenanted CloudHSM devices in your VPC) as the storage for these keys, but still use KMS APIs for your own key interaction, and use those keys for your services.

A dedicated Security Conference

Boston, End of June. Two days.

Not so new (but really recent)

CLI Version 2

Something so critical – the CLI – used by so many poor-man (poor-person) integrations and CI/CD pipelines, now with a version 2 in the works. It’s breaking changes time – but in the meantime, the v1 CLI continues to get updates.

Predictive AutoScaling

Having EC2 AutoScaling reactively scale when thresholds are breached has been great, but combining that with machine learning based upon previous scaling events to make predictive scaling is next-level.

Lambda Support for Python 3.7

You may initially think this is trivial, stepping up from Python 3.6 to Lambda with Python 3.7, but it means that Python Lambda code can now make TLS 1.3 requests. Updating from Python 3.6 to 3.7 is mostly trivial; from 2.7 to 3.x normally means re-factoring urllib/requests client libraries and liberal use of parentheses where previously they weren’t required (eg, for print()).

S3: Public Access Blocking

Block Public Access finally removes the need for custom Bucket policies to prevent accidental uploads with acl:public (which, when you’re using a 3rd party s3 client for which you can’t see or control the ACL used, may be scary). The downfall of the previous policies that rejected uploads if ACL:public (or not acl:private) was used is that they interfered with the ability to do multi-part puts (different API).

There have been way too many cases of customers leaving objects publicly accessible. This will become a critical control in future. Most organisations don’t want public access to S3: those that do want public, anonymous access probably should be using CloudFront to do so (and a CloudFront Origin Access Identity for this as well, with Lambda@Edge to handle auto indexing and trailing slash redirects).

DynamoDB: Encrypted by default

A big step up. In reality, the ‘encryption at rest’ scenario within AWS is a formality: as one of the few people in Australia who has actually been inside a US-East-1 facility (hey QuinnyPig, I recall that from your slide two weeks ago at Latency Conf), I can say the physical security is superb; the logical allocation of data and the knowledge of its physical location are the responsibility of separate teams.

So given that someone in the facility doesn’t know where your data is, and someone who knows where it is doesn’t have physical access (and those with physical access can’t smuggle storage devices in or out), we’re at a high bar (physical devices only leave facilities when crushed into a very fine powder, particularly for SSD based storage).

So the Encrypted At Rest capability is more a nice to have – an extra protection should the standard storage wiping techniques (already very robust) have an issue. But given the bulk of the AES algorithm has been in CPU extensions for years, the overhead of processing encryption is essentially nil.

Summary

I’ve tried my best to stay aware of so much, but the last 24 months have stretched the definition of what Cloud is so very wide. IoT, Robotics, Machine Learning, Vision Processing, Connect, Alexa, Analytics, DeepLens: this list seems so wide before you dive deep into the details. And the existing stalwarts: EC2, S3, SQS, and even VPC keep getting richer and richer.

The above are the services I’ve been interested in – there is definitely a hell of a lot more in the last 24 hours as well.

What’s today (US time) going to bring? I need to get some sleep, because this is exhausting just trying to keep the brain up to date.

The move to S3 Endpoints

And now, continuing my current theme of “the move to…” with the further adventures of running important workloads and continuing the evolution of reliability and security at scale; the next improvement is the enabling of S3 endpoints for our VPC.

VPC Perspective: going to S3

Access to S3 for many workloads is critical. Minimising the SPOFs (Single Points of Failure), artificial maximum bandwidth or latency constraints, and maximising the end-to-end security is often required. For fleets of instances, the options used to be:

  1. Public IPs to communicate directly over the (local) Internet network within the Region to talk to S3 – but these are randomly assigned, so you’d rely on the API credentials alone
  2. Elastic IPs in place of Public IPs, similar to above, and then have to manage the request, release, limits and additional charges associated with EIPs.
  3. NAT Instances in AutoScale groups, with boot time scripts and role permission to update dependent routing tables to recover from failed NAT instances
  4. Proxy servers, or an ASG of proxy servers behind an internal ELB

In all of these scenarios, you’d look to use EC2 Instances with temporary, auto-rotating IAM Role credentials to access S3 over an encrypted channel (HTTPS). TLS 1.2, modern ciphers, and a solid (SHA256) chain of trust to the issuing CA was about as good as it got to ensure end-to-end encryption and validation that your process had connected to S3 reliably.

But S3 Endpoints enhance this, and in more than just a simple way.

Let’s take a basic example: an Endpoint is attached to a VPC with a policy (default, open) for outbound access to a particular AWS Service (S3 for now), and the use of this Endpoint is made available to the EC2 Instances in the VPC by way of the VPC Routing table(s) and their association to a set of subnets. You may have multiple routing tables; perhaps you’d permit some of your subnets to use the endpoint, and perhaps not others.

With the Endpoint configured as above, it permits direct access to S3 in the same Region without traversing the Internet network. The configured S3 Logging will start to reflect the individual Instance Private IPs (within the VPC) and no longer show the Public or Elastic IPs they may have previously used. Instances don’t need to use a NAT (Instance or NAT Gateway) or other Proxy: the Endpoint provides reliable, high-throughput access to S3.

However, the innovation doesn’t stop there. That policy mentioned above on the Endpoint can place restrictions on the APIs and Buckets that are accessible via this Endpoint. For example, I may have a subnet of Instances that I want to ensure can access ONLY my named bucket(s), which I enforce with the Endpoint policy. As they have no other route to S3, they can’t access 3rd party anonymously accessible buckets.

I can also limit the API calls via the Endpoint: perhaps permitting only Get, Put and List operations. These instances couldn’t assume another role (sts:AssumeRole) that may have s3:DeleteBucket privileges, and use it via this restricted Endpoint.
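As a rough illustration of such a restricted Endpoint policy, here is a sketch that builds one in Python and prints it as JSON. The bucket name is hypothetical, and the three actions are just one reading of “Get, Put, List”; treat it as a starting point rather than a vetted policy.

```python
import json

BUCKET = "my-named-bucket"  # hypothetical bucket name, for illustration only

# Endpoint policy sketch: allow only object reads/writes and bucket listing,
# and only against the one named bucket. Anything else sent via this Endpoint
# (other buckets, s3:DeleteBucket, ...) is not allowed by this policy.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OnlyMyBucketGetPutList",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",      # the bucket itself (for ListBucket)
                f"arn:aws:s3:::{BUCKET}/*",    # the objects in it (for Get/Put)
            ],
        }
    ],
}

print(json.dumps(endpoint_policy, indent=2))
```

Building the document as a dictionary and serialising it avoids the stray-comma mistakes that are easy to make when hand-editing policy JSON.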

Let’s make it a little more complex, with a second Endpoint on the VPC. Perhaps I’ll associate this second Endpoint with my administrative subnet, and permit an open policy on it.

S3 Perspective: Restricting sources

An S3 bucket, once created in a Region, accepts valid signed requests from the Principals you permit in IAM policy. You can add Bucket Policies to them to restrict this to a set of trusted IP CIDR blocks (both IPv4 and IPv6 now – IPv6 only for the S3 public API Service Endpoint, not the optionally enabled S3 website or VPC Endpoint). For example, a DENY policy with a condition of:

"Condition": {
   "NotIpAddress": {
     "aws:SourceIp": [
       "54.240.143.0/24",
       "2001:DB8:1234:5678::/64"
     ]
   }
}

But with VPC Endpoints, you would instead add a DENY rule with a condition of:

"Condition": {
  "StringNotEquals": {
    "aws:sourceVpc": "vpc-1234beef"
  }
}

Items in the condition block are AND-ed together at this time, so if you’re writing a policy that should accept either the VPC Endpoint OR an on-premise IP block, things get interesting: you’re going to want to Boolean OR these two separate Conditions in a Deny block, and simply listing both does not do that:

"Condition": { 
  "NotIpAddress": { 
    "aws:SourceIp": [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ]
  },
  "StringNotEquals": {
    "aws:sourceVpc": "vpc-1234beef"
  }
} # FAILS EVERY TIME AS BOTH ARE EVALUATED!!

Luckily there’s a workaround. IfExists can conditionally check a Condition key, and skip it if it’s not defined:

"Condition": {
  "NotIpAddressIfExists": {
    "aws:SourceIp" : [ "54.240.143.0/24", "2001:DB8:1234:5678::/64" ]
  },
  "StringNotEqualsIfExists" : {
    "aws:SourceVpc": [ "vpc-1234beef" ]
  }
}

Thus these two can be ANDed together and still pass if either one is TRUE. Kind of like an OR! Put this Condition in a statement with "Effect": "Deny" and we should be looking pretty good.
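Putting the pieces together, a complete Deny statement might look something like the sketch below, built in Python so the JSON is guaranteed to be well-formed. The bucket name, the Sid and the choice of s3:* as the denied action are assumptions for illustration; the Condition block is the IfExists version from above.

```python
import json

BUCKET = "my-bucket"  # hypothetical bucket name, for illustration only

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnlessFromVpcOrKnownRanges",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",  # assumption: deny everything; narrow as required
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Same two conditions as above, each with the IfExists suffix.
            "Condition": {
                "NotIpAddressIfExists": {
                    "aws:SourceIp": ["54.240.143.0/24", "2001:DB8:1234:5678::/64"]
                },
                "StringNotEqualsIfExists": {
                    "aws:SourceVpc": ["vpc-1234beef"]
                },
            },
        }
    ],
}

policy_json = json.dumps(bucket_policy, indent=2)
json.loads(policy_json)  # raises ValueError if the document is not valid JSON
print(policy_json)
```

It is worth trying a policy like this on a scratch bucket first: a broad Deny applies to your own administrators too, so a mistake in the Condition can lock everyone out.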

In summary

So what’s this got us now?

  1. Our S3 logs should only contain IP addresses from within the VPC now, so it’s fairly obvious to pick out any other access attempts.
  2. Our reliance on external Internet access has slightly reduced – but if there are other sites and services in use (eg, SQS, CloudWatch for metric submission, or even AutoScale for signaling ASG scaling action results), then these are still required to go out via the Internet Gateway (IGW) one way or another
  3. Our S3 buckets can have additional constraints to further limit the scope of credentials.
  4. We’ve avoided complex scenarios of lashing together scripts that dynamically adjust routing tables, intercept SSL traffic on proxy servers, or other nasty hacks

The AWS team has publicly indicated more Endpoints are to come, so this shows a clear trajectory: less reliance on “Internet” access for instances. All of this is a long, long way from what VPC looked like back in 2009, when it was S3-backed instances with no IGW – just private subnets with an IPsec VGW to on-premise.

The underlying theme, however, is that the security model is not set and forget: the journey continues as the platform further improves.

So, key recommendations:

  1. Use IAM Roles for EC2 instances (unless you have multiple un-trusted clients using SSH/RDP to the instance). These credentials auto-rotate multiple times per day, and are transparently used by the AWS SDKs.
  2. Turn on S3 Bucket Logging (to a separate bucket). When setting the bucket logging destination, make sure you end the prefix with a trailing slash (/). Eg, “MyBucket” logs to bucket “MyLoggingBucket”, with prefix “S3logs/MyBucket/”. S3 Logging is a Trusted Advisor recommendation: setting a Lifecycle policy on these logs is my recommendation (dev/test at X days, Production at Y years?).
  3. Create (at least one) VPC S3 Endpoint for the buckets in region, and adjust routing tables accordingly. Perhaps start with an open policy if you’re comfortable (it’s no worse than the previous access to S3 over Internet), and iterate from there.
  4. Consider locking your S3 buckets down to just your VPC, or your VPC and some well known ranges.
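For recommendation 2, the trailing slash on the logging prefix is easy to forget when many buckets are configured by script. A small helper along these lines could build the prefix consistently. The function name and the “S3logs” default are illustrative only, taken from the example above.

```python
def logging_prefix(source_bucket: str, base: str = "S3logs") -> str:
    """Build an S3 access-log prefix that always ends with a trailing slash."""
    parts = [base.strip("/"), source_bucket.strip("/")]
    # Drop empty parts so an empty base does not produce a leading slash.
    return "/".join(part for part in parts if part) + "/"


print(logging_prefix("MyBucket"))
```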

Official Debian Images on Amazon Web Services EC2

Official Debian AMIs are now on Amazon Web Services

Please Note: this article is written from my personal perspective as a Debian Developer, and is not the opinion or expression of my employer.

Amazon Web Services’ EC2 offers customers a number of Operating Systems to run. There are many Linux Distributions available, however for all this time, there has never been an ‘Official’ Debian Image – or Amazon Machine Image (AMI) – created by Debian.

For some Debian users this has not been an issue, as there are several ways of creating your own personal AMI. However for the AWS Users who wanted to run a recognised image, it has been a little confusing at times; several Debian AMIs have been made available by other customers, but the source of those images has not been ‘Debian’.

In October 2012 the AWS Marketplace engaged in discussions with the Debian Project Leader, Stefano Zacchiroli. A group of Debian Developers and the wider community formed to generate a set of AMIs using Anders Ingemann’s ec2debian-build-ami script. These AMIs are published in the AWS Marketplace, and you can find the listing here:

No fees are collected for Debian for the use of these images via the AWS Marketplace; they are listed here for your convenience. This is the same AMI that you may generate yourself, but this one has been put together by Debian Developers.

If you plan to use this AMI, I suggest you read http://wiki.debian.org/Cloud/AmazonEC2Image, and more explicitly, SSH as the user ‘admin’ and then ‘sudo -i’ to root.

Additional details

Anders Ingemann and others maintain a GitHub project called ec2debian-build-ami which generates a Debian AMI. This script supports several desired features, and was also updated to add in some new requirements. This means the generated image supports:

  • non-root SSH (use the user ‘admin’)
  • secure deletion of files in the generation of the image
  • using the Eucalyptus toolchain for generation of the image
  • ensuring that this script and all its dependencies are DFSG compliant
  • using the http.debian.net redirector service in APT’s sources.list to select a reasonably ‘close’ mirror site
  • and the generated image contains only packages from ‘main’
  • plus minimal additional scripts (under the Apache 2.0 license as in ec2debian-build-ami) to support:
    • fetching the SSH Public Key for the ‘admin’ user (sudo -i to gain root)
    • executing UserData shell scripts (example here)

Debian Stable (Squeeze; 6.0.6 at this point in time) does not contain the cloud-init package, and neither does Debian Testing (Wheezy).

A fresh AWS account (ID 379101102735) was used for the initial generation of this image. Any Debian Developer who would like access is welcome to contact me. Minimal charges for the resource utilisation of this account (storage, some EC2 instances for testing) are being absorbed by Amazon for this. Co-ordination of this effort is held on the debian-cloud mailing list.

The current Debian stable is 6.0.6 ‘Squeeze’, and we’re in deep freeze for the ‘Wheezy’ release. Squeeze has a Xen kernel that works on Paravirtual Machine (PVM) EC2 instances, and hence this is what we support on EC2. (HVM images are a next phase, being headed up by Yasuhiro Akarki <ar@d.o>).

Marketplace Listing

The process of listing in the AWS Marketplace was conducted as follows:

  • 32-bit and 64-bit images were generated in US-East-1, with AMI IDs:
    • ami-1977f070: 379101102735/debian-squeeze-i386-20121119
    • ami-8568efec: 379101102735/debian-squeeze-amd64-20121119
  • The images were shared ‘public’ with all other AWS users (as were the underlying EBS snapshots, for completeness)
  • The AWS Marketplace team duplicated these two AMIs into their AWS account
  • The AWS Marketplace team further duplicated these into other AWS Marketplace-supported Regions

This image went out on the 19th of November 2012. Additional documentation was put into the Wiki at: http://wiki.debian.org/Cloud/AmazonEC2Image/Squeeze

A CloudFormation template may help you launch a Debian instance by containing a mapping to the relevant AMI in the region you’re using: see the wiki link above.
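The idea behind such a mapping is just a lookup from region and architecture to an AMI ID. Here is a sketch of that lookup in Python, not the wiki’s actual template. It holds only the two US-East-1 IDs listed above (from the build account, not the Marketplace copies); other regions are deliberately left out rather than guessed.

```python
# Region -> architecture -> AMI ID. Only the IDs stated in this article are
# included; entries for other regions would need to come from the wiki page.
AMI_MAP = {
    "us-east-1": {
        "i386": "ami-1977f070",   # debian-squeeze-i386-20121119
        "amd64": "ami-8568efec",  # debian-squeeze-amd64-20121119
    },
}


def find_ami(region: str, arch: str) -> str:
    """Return the AMI ID recorded for a region/architecture pair."""
    try:
        return AMI_MAP[region][arch]
    except KeyError:
        raise LookupError(f"no AMI recorded for {region}/{arch}") from None


print(find_ami("us-east-1", "amd64"))
```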

What’s Next

The goal is to continue stable releases as they come out. Further work is happening to support generation of Wheezy images, and HVM (which may all collapse into one effort with a Linux 3.x kernel in Wheezy). If you’re a Debian Developer and would like a login to the AWS account we’ve been using, then please drop me a line.

Further work to improve this process has come from Marcin Kulisz, who is starting to package ec2debian-build-ami into a Debian package: this will complete the circle of the entire stack being in main (one day)!

Thanks go to Stefano, Anders, Charles, and everyone who contributed to this effort.

Resources